* [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
@ 2022-11-25 17:30 Alex Bennée
  2022-11-25 17:30 ` [PATCH v2 1/5] include/hw: attempt to document VirtIO feature variables Alex Bennée
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Alex Bennée @ 2022-11-25 17:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, Alex Bennée

Hi,

This series continues the effort to fix the various vhost-user issues
that are currently plaguing the release. One concrete bug I've come
across is that all qtest MMIO devices were being treated as legacy,
which caused the VIRTIO_F_VERSION_1 flag to be missed, in turn causing
s390x to fall back to trying to set the endian value for the virt-queues.
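As a minimal self-contained illustration of the failure mode (the
helper name and the force_legacy flag here are hypothetical stand-ins
for what the virtio-mmio transport does, not QEMU's actual API):

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32 /* feature bit number, per the VirtIO spec */

/* A legacy (pre-1.0) transport never offers VIRTIO_F_VERSION_1, so
 * forcing legacy mode silently drops the bit from the offered set. */
static uint64_t offer_features(uint64_t host_features, bool force_legacy)
{
    return force_legacy ? host_features & ~(1ULL << VIRTIO_F_VERSION_1)
                        : host_features;
}
```

Without that bit the s390x code assumes a legacy device and falls back
to negotiating virt-queue endianness.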

I've patched it for the GPIO tests and raised a tracking bug (#1342)
for the general problem. This might explain why the only other VirtIO
vhost-user MMIO devices tested via qtest are the virtio-net-tests. The
vhost networking support is its own special implementation, so it's
hard to compare the code with GPIO. It does make me wonder if disabling
the mmio version of the test for now would be worthwhile. FWIW I did
try disabling force-legacy for all machine types and that caused a
bunch of the other tests to fail.

I made some progress in tracking down the memory leak that clang
complains about. It comes down to the line:

  gpio->vhost_dev.vqs = g_new0(struct vhost_virtqueue, gpio->vhost_dev.nvqs);

which is never cleaned up because we never call
vu_gpio_device_unrealize() in the test. However, it's unclear why this
is the case. We don't seem to unrealize the vhost-user-network tests
either, and clang doesn't complain about those.
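As a minimal stand-alone sketch of the leak pattern (the names below
are illustrative stand-ins, not QEMU's actual realize/unrealize code):

```c
#include <stdlib.h>

/* Simplified stand-ins for the structures involved. */
struct vhost_virtqueue { int kick, call; };

struct gpio_dev {
    struct vhost_virtqueue *vqs;
    int nvqs;
};

/* realize allocates the virtqueue array, as in the g_new0() call above */
static void gpio_realize(struct gpio_dev *d, int nvqs)
{
    d->nvqs = nvqs;
    d->vqs = calloc(nvqs, sizeof(*d->vqs));
}

/* the matching cleanup the qtest path never reaches */
static void gpio_unrealize(struct gpio_dev *d)
{
    free(d->vqs);
    d->vqs = NULL;
    d->nvqs = 0;
}
```

Unless the unrealize path runs, the allocation made at realize time is
never freed, which is exactly what clang's leak checker reports.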

I can replicate some of the other failures I've been seeing in CI by
running:

  ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test

however this seems to run everything in parallel, which may be better
at exposing race conditions. Perhaps the CI system makes those races
easier to hit? Unfortunately I've not been able to figure out exactly
how things go wrong in the failure case.

I've included Stefano's:

  vhost: enable vrings in vhost_dev_start() for vhost-user devices

in this series as it makes sense and improves the vring state errors.
However it's up to you if you want to include it in the eventual PR.
There are still CI errors I'm trying to track down but I thought it
would be worth posting the current state of my tree.

Please review.


Alex Bennée (4):
  include/hw: attempt to document VirtIO feature variables
  include/hw: VM state takes precedence in virtio_device_should_start
  tests/qtests: override "force-legacy" for gpio virtio-mmio tests
  hw/virtio: ensure a valid host_feature set for virtio-user-gpio

Stefano Garzarella (1):
  vhost: enable vrings in vhost_dev_start() for vhost-user devices

 include/hw/virtio/vhost.h        | 31 ++++++++++++++++++----
 include/hw/virtio/virtio.h       | 43 ++++++++++++++++++++++++++-----
 backends/cryptodev-vhost.c       |  4 +--
 backends/vhost-user.c            |  4 +--
 hw/block/vhost-user-blk.c        |  4 +--
 hw/net/vhost_net.c               |  8 +++---
 hw/scsi/vhost-scsi-common.c      |  4 +--
 hw/virtio/vhost-user-fs.c        |  4 +--
 hw/virtio/vhost-user-gpio.c      | 10 ++++++--
 hw/virtio/vhost-user-i2c.c       |  4 +--
 hw/virtio/vhost-user-rng.c       |  4 +--
 hw/virtio/vhost-vsock-common.c   |  4 +--
 hw/virtio/vhost.c                | 44 ++++++++++++++++++++++++++++----
 tests/qtest/libqos/virtio-gpio.c |  3 ++-
 hw/virtio/trace-events           |  4 +--
 15 files changed, 134 insertions(+), 41 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/5] include/hw: attempt to document VirtIO feature variables
  2022-11-25 17:30 [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Alex Bennée
@ 2022-11-25 17:30 ` Alex Bennée
  2022-11-25 17:30 ` [PATCH v2 2/5] include/hw: VM state takes precedence in virtio_device_should_start Alex Bennée
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Alex Bennée @ 2022-11-25 17:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, Alex Bennée, Stefan Hajnoczi

We have a bunch of variables associated with the device and the vhost
backend which are used inconsistently throughout the code base. Let's
start trying to bring some order by agreeing what each variable is
for.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>

---
v2
  - dropped DISCUSS and commentary
  - separated protocol section for clarity
  - updated wording on vhost->backend_features
  - made clear guest_features was the written state
---
 include/hw/virtio/vhost.h  | 25 ++++++++++++++++++++++---
 include/hw/virtio/virtio.h | 19 ++++++++++++++++++-
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 353252ac3e..eaf628f656 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -88,13 +88,32 @@ struct vhost_dev {
     int vq_index_end;
     /* if non-zero, minimum required value for max_queues */
     int num_queues;
+    /**
+     * vhost feature handling requires matching the feature set
+     * offered by a backend which may be a subset of the total
+     * features eventually offered to the guest.
+     *
+     * @features: available features provided by the backend
+     * @acked_features: final negotiated features with front-end driver
+     *
+     * @backend_features: this is used in a couple of places to either
+     * store VHOST_USER_F_PROTOCOL_FEATURES to apply to
+     * VHOST_USER_SET_FEATURES or VHOST_NET_F_VIRTIO_NET_HDR. Its
+     * future use should be discouraged and the variable retired as
+     * it's easy to confuse with the VirtIO backend_features.
+     */
     uint64_t features;
-    /** @acked_features: final set of negotiated features */
     uint64_t acked_features;
-    /** @backend_features: backend specific feature bits */
     uint64_t backend_features;
-    /** @protocol_features: final negotiated protocol features */
+
+    /**
+     * @protocol_features: the vhost-user-only feature set, set by
+     * VHOST_USER_SET_PROTOCOL_FEATURES. Protocol features are only
+     * negotiated if VHOST_USER_F_PROTOCOL_FEATURES has been offered
+     * by the backend (see @features).
+     */
     uint64_t protocol_features;
+
     uint64_t max_queues;
     uint64_t backend_cap;
     /* @started: is the vhost device started? */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index a973811cbf..0f612067f7 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -93,6 +93,12 @@ enum virtio_device_endian {
     VIRTIO_DEVICE_ENDIAN_BIG,
 };
 
+/**
+ * struct VirtIODevice - common VirtIO structure
+ * @name: name of the device
+ * @status: VirtIO Device Status field
+ *
+ */
 struct VirtIODevice
 {
     DeviceState parent_obj;
@@ -100,9 +106,20 @@ struct VirtIODevice
     uint8_t status;
     uint8_t isr;
     uint16_t queue_sel;
-    uint64_t guest_features;
+    /**
+     * These fields represent a set of VirtIO features at various
+     * levels of the stack. @host_features indicates the complete
+     * feature set the VirtIO device can offer to the driver.
+     * @guest_features indicates which features the VirtIO driver has
+     * selected by writing to the feature register. Finally
+     * @backend_features represents everything supported by the
+     * backend (e.g. vhost) and could potentially be a subset of the
+     * total feature set offered by QEMU.
+     */
     uint64_t host_features;
+    uint64_t guest_features;
     uint64_t backend_features;
+
     size_t config_len;
     void *config;
     uint16_t config_vector;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/5] include/hw: VM state takes precedence in virtio_device_should_start
  2022-11-25 17:30 [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Alex Bennée
  2022-11-25 17:30 ` [PATCH v2 1/5] include/hw: attempt to document VirtIO feature variables Alex Bennée
@ 2022-11-25 17:30 ` Alex Bennée
  2022-11-27 10:17   ` Bernhard Beschow
  2022-11-25 17:30 ` [PATCH v2 3/5] tests/qtests: override "force-legacy" for gpio virtio-mmio tests Alex Bennée
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Alex Bennée @ 2022-11-25 17:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, Alex Bennée, Christian Borntraeger

The VM status should always take precedence over the device status for
these checks. This ensures the device is in the correct state when we
suspend the VM prior to migration. This restores the checks to the
order they were in before the refactoring moved things around.

While we are at it, let's improve our documentation of the various
fields involved and document the two functions.

Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started)
Fixes: 259d69c00b (hw/virtio: introduce virtio_device_should_start)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
---
 include/hw/virtio/virtio.h | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 0f612067f7..48f539d0fe 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -133,6 +133,13 @@ struct VirtIODevice
     bool broken; /* device in invalid state, needs reset */
     bool use_disabled_flag; /* allow use of 'disable' flag when needed */
     bool disabled; /* device in temporarily disabled state */
+    /**
+     * @use_started: true if the @started flag should be used to check the
+     * current state of the VirtIO device. Otherwise the status
+     * bits should be checked to determine the current device state.
+     * @use_started is only set via QMP and defaults to true for all
+     * modern machines (since 4.1).
+     */
     bool use_started;
     bool started;
     bool start_on_kick; /* when virtio 1.0 feature has not been negotiated */
@@ -408,6 +415,17 @@ static inline bool virtio_is_big_endian(VirtIODevice *vdev)
     return false;
 }
 
+
+/**
+ * virtio_device_started() - check if device started
+ * @vdev: the VirtIO device
+ * @status: the device's status bits
+ *
+ * Check if the device is started. For most modern machines this is
+ * tracked via the @vdev->started field (to support migration),
+ * otherwise we check for the final negotiated status bit that
+ * indicates everything is ready.
+ */
 static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
 {
     if (vdev->use_started) {
@@ -428,15 +446,11 @@ static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
  */
 static inline bool virtio_device_should_start(VirtIODevice *vdev, uint8_t status)
 {
-    if (vdev->use_started) {
-        return vdev->started;
-    }
-
     if (!vdev->vm_running) {
         return false;
     }
 
-    return status & VIRTIO_CONFIG_S_DRIVER_OK;
+    return virtio_device_started(vdev, status);
 }
 
 static inline void virtio_set_started(VirtIODevice *vdev, bool started)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/5] tests/qtests: override "force-legacy" for gpio virtio-mmio tests
  2022-11-25 17:30 [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Alex Bennée
  2022-11-25 17:30 ` [PATCH v2 1/5] include/hw: attempt to document VirtIO feature variables Alex Bennée
  2022-11-25 17:30 ` [PATCH v2 2/5] include/hw: VM state takes precedence in virtio_device_should_start Alex Bennée
@ 2022-11-25 17:30 ` Alex Bennée
  2022-11-25 17:30 ` [PATCH v2 4/5] hw/virtio: ensure a valid host_feature set for virtio-user-gpio Alex Bennée
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Alex Bennée @ 2022-11-25 17:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, Alex Bennée, Thomas Huth,
	Laurent Vivier, Paolo Bonzini

The GPIO device is a VIRTIO_F_VERSION_1 device, but when running with
a legacy MMIO interface we miss out on that feature bit, causing
confusion. For the GPIO test, force the mmio bus to support non-legacy
mode so we can properly test it.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1333
---
 tests/qtest/libqos/virtio-gpio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/libqos/virtio-gpio.c b/tests/qtest/libqos/virtio-gpio.c
index 762aa6695b..f22d7b5eb5 100644
--- a/tests/qtest/libqos/virtio-gpio.c
+++ b/tests/qtest/libqos/virtio-gpio.c
@@ -154,7 +154,8 @@ static void virtio_gpio_register_nodes(void)
     QOSGraphEdgeOptions edge_opts = { };
 
     /* vhost-user-gpio-device */
-    edge_opts.extra_device_opts = "id=gpio0,chardev=chr-vhost-user-test";
+    edge_opts.extra_device_opts = "id=gpio0,chardev=chr-vhost-user-test "
+        "-global virtio-mmio.force-legacy=false";
     qos_node_create_driver("vhost-user-gpio-device",
                             virtio_gpio_device_create);
     qos_node_consumes("vhost-user-gpio-device", "virtio-bus", &edge_opts);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/5] hw/virtio: ensure a valid host_feature set for virtio-user-gpio
  2022-11-25 17:30 [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Alex Bennée
                   ` (2 preceding siblings ...)
  2022-11-25 17:30 ` [PATCH v2 3/5] tests/qtests: override "force-legacy" for gpio virtio-mmio tests Alex Bennée
@ 2022-11-25 17:30 ` Alex Bennée
  2022-11-25 17:30   ` [Virtio-fs] " Alex Bennée
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Alex Bennée @ 2022-11-25 17:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, Alex Bennée

There was a disconnect here because vdev->host_features was set to
random rubbish. This caused a weird negotiation between the driver and
device that took no account of the features provided by the backend.
To fix this we must set vdev->host_features once we have initialised
the vhost backend.

[AJB: however this is confusing because AFAICT none of the other
vhost-user devices do this.]

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 hw/virtio/vhost-user-gpio.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index 5851cb3bc9..b2496c824c 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -228,6 +228,12 @@ static int vu_gpio_connect(DeviceState *dev, Error **errp)
         return ret;
     }
 
+    /*
+     * Once we have initialised the vhost backend we can finally set
+     * what host features are available for this device.
+     */
+    vdev->host_features = vhost_dev->features;
+
     /* restore vhost state */
     if (virtio_device_started(vdev, vdev->status)) {
         vu_gpio_start(vdev);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/5] vhost: enable vrings in vhost_dev_start() for vhost-user devices
  2022-11-25 17:30 [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Alex Bennée
@ 2022-11-25 17:30   ` Alex Bennée
  2022-11-25 17:30 ` [PATCH v2 2/5] include/hw: VM state takes precedence in virtio_device_should_start Alex Bennée
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Alex Bennée @ 2022-11-25 17:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, German Maglione, Raphael Norwitz,
	Alex Bennée, Gonglei (Arei),
	Kevin Wolf, Hanna Reitz, Jason Wang, Paolo Bonzini, Fam Zheng,
	Dr. David Alan Gilbert, open list:Block layer core,
	open list:virtiofs

From: Stefano Garzarella <sgarzare@redhat.com>

Commit 02b61f38d3 ("hw/virtio: incorporate backend features in features")
properly negotiates VHOST_USER_F_PROTOCOL_FEATURES with the vhost-user
backend, but we forgot to enable vrings as specified in
docs/interop/vhost-user.rst:

    If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
    ring starts directly in the enabled state.

    If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
    initialized in a disabled state and is enabled by
    ``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.

Some vhost-user front-ends already did this by calling
vhost_ops.vhost_set_vring_enable() directly:
- backends/cryptodev-vhost.c
- hw/net/virtio-net.c
- hw/virtio/vhost-user-gpio.c

But most didn't do that, so we would leave the vrings disabled and some
backends would not work. We observed this issue with the Rust version of
virtiofsd [1], which uses the event loop [2] provided by the
vhost-user-backend crate, where requests are not processed if the vring
is not enabled.

Let's fix this issue by enabling the vrings in vhost_dev_start() for
vhost-user front-ends that don't already do this directly. We do the
same in vhost_dev_stop(), where we disable the vrings.

[1] https://gitlab.com/virtio-fs/virtiofsd
[2] https://github.com/rust-vmm/vhost/blob/240fc2966/crates/vhost-user-backend/src/event_loop.rs#L217

Fixes: 02b61f38d3 ("hw/virtio: incorporate backend features in features")
Reported-by: German Maglione <gmaglione@redhat.com>
Tested-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20221123131630.52020-1-sgarzare@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 include/hw/virtio/vhost.h      |  6 +++--
 backends/cryptodev-vhost.c     |  4 ++--
 backends/vhost-user.c          |  4 ++--
 hw/block/vhost-user-blk.c      |  4 ++--
 hw/net/vhost_net.c             |  8 +++----
 hw/scsi/vhost-scsi-common.c    |  4 ++--
 hw/virtio/vhost-user-fs.c      |  4 ++--
 hw/virtio/vhost-user-gpio.c    |  4 ++--
 hw/virtio/vhost-user-i2c.c     |  4 ++--
 hw/virtio/vhost-user-rng.c     |  4 ++--
 hw/virtio/vhost-vsock-common.c |  4 ++--
 hw/virtio/vhost.c              | 44 ++++++++++++++++++++++++++++++----
 hw/virtio/trace-events         |  4 ++--
 13 files changed, 67 insertions(+), 31 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index eaf628f656..1cafa0d776 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -203,24 +203,26 @@ static inline bool vhost_dev_is_started(struct vhost_dev *hdev)
  * vhost_dev_start() - start the vhost device
  * @hdev: common vhost_dev structure
  * @vdev: the VirtIODevice structure
+ * @vrings: true to have vrings enabled in this call
  *
  * Starts the vhost device. From this point VirtIO feature negotiation
  * can start and the device can start processing VirtIO transactions.
  *
  * Return: 0 on success, < 0 on error.
  */
-int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings);
 
 /**
  * vhost_dev_stop() - stop the vhost device
  * @hdev: common vhost_dev structure
  * @vdev: the VirtIODevice structure
+ * @vrings: true to have vrings disabled in this call
  *
  * Stop the vhost device. After the device is stopped the notifiers
  * can be disabled (@vhost_dev_disable_notifiers) and the device can
  * be torn down (@vhost_dev_cleanup).
  */
-void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings);
 
 /**
  * DOC: vhost device configuration handling
diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
index bc13e466b4..572f87b3be 100644
--- a/backends/cryptodev-vhost.c
+++ b/backends/cryptodev-vhost.c
@@ -94,7 +94,7 @@ cryptodev_vhost_start_one(CryptoDevBackendVhost *crypto,
         goto fail_notifiers;
     }
 
-    r = vhost_dev_start(&crypto->dev, dev);
+    r = vhost_dev_start(&crypto->dev, dev, false);
     if (r < 0) {
         goto fail_start;
     }
@@ -111,7 +111,7 @@ static void
 cryptodev_vhost_stop_one(CryptoDevBackendVhost *crypto,
                                  VirtIODevice *dev)
 {
-    vhost_dev_stop(&crypto->dev, dev);
+    vhost_dev_stop(&crypto->dev, dev, false);
     vhost_dev_disable_notifiers(&crypto->dev, dev);
 }
 
diff --git a/backends/vhost-user.c b/backends/vhost-user.c
index 5dedb2d987..7bfcaef976 100644
--- a/backends/vhost-user.c
+++ b/backends/vhost-user.c
@@ -85,7 +85,7 @@ vhost_user_backend_start(VhostUserBackend *b)
     }
 
     b->dev.acked_features = b->vdev->guest_features;
-    ret = vhost_dev_start(&b->dev, b->vdev);
+    ret = vhost_dev_start(&b->dev, b->vdev, true);
     if (ret < 0) {
         error_report("Error start vhost dev");
         goto err_guest_notifiers;
@@ -120,7 +120,7 @@ vhost_user_backend_stop(VhostUserBackend *b)
         return;
     }
 
-    vhost_dev_stop(&b->dev, b->vdev);
+    vhost_dev_stop(&b->dev, b->vdev, true);
 
     if (k->set_guest_notifiers) {
         ret = k->set_guest_notifiers(qbus->parent,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 0d5190accf..1177064631 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -178,7 +178,7 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
     }
 
     s->dev.vq_index_end = s->dev.nvqs;
-    ret = vhost_dev_start(&s->dev, vdev);
+    ret = vhost_dev_start(&s->dev, vdev, true);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Error starting vhost");
         goto err_guest_notifiers;
@@ -213,7 +213,7 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&s->dev, vdev);
+    vhost_dev_stop(&s->dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 26e4930676..043058ff43 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -259,7 +259,7 @@ static int vhost_net_start_one(struct vhost_net *net,
         goto fail_notifiers;
     }
 
-    r = vhost_dev_start(&net->dev, dev);
+    r = vhost_dev_start(&net->dev, dev, false);
     if (r < 0) {
         goto fail_start;
     }
@@ -308,7 +308,7 @@ fail:
     if (net->nc->info->poll) {
         net->nc->info->poll(net->nc, true);
     }
-    vhost_dev_stop(&net->dev, dev);
+    vhost_dev_stop(&net->dev, dev, false);
 fail_start:
     vhost_dev_disable_notifiers(&net->dev, dev);
 fail_notifiers:
@@ -329,7 +329,7 @@ static void vhost_net_stop_one(struct vhost_net *net,
     if (net->nc->info->poll) {
         net->nc->info->poll(net->nc, true);
     }
-    vhost_dev_stop(&net->dev, dev);
+    vhost_dev_stop(&net->dev, dev, false);
     if (net->nc->info->stop) {
         net->nc->info->stop(net->nc);
     }
@@ -606,7 +606,7 @@ err_start:
         assert(r >= 0);
     }
 
-    vhost_dev_stop(&net->dev, vdev);
+    vhost_dev_stop(&net->dev, vdev, false);
 
     return r;
 }
diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index 767f827e55..18ea5dcfa1 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -68,7 +68,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
         goto err_guest_notifiers;
     }
 
-    ret = vhost_dev_start(&vsc->dev, vdev);
+    ret = vhost_dev_start(&vsc->dev, vdev, true);
     if (ret < 0) {
         error_report("Error start vhost dev");
         goto err_guest_notifiers;
@@ -101,7 +101,7 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret = 0;
 
-    vhost_dev_stop(&vsc->dev, vdev);
+    vhost_dev_stop(&vsc->dev, vdev, true);
 
     if (k->set_guest_notifiers) {
         ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index dc4014cdef..d97b179e6f 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -76,7 +76,7 @@ static void vuf_start(VirtIODevice *vdev)
     }
 
     fs->vhost_dev.acked_features = vdev->guest_features;
-    ret = vhost_dev_start(&fs->vhost_dev, vdev);
+    ret = vhost_dev_start(&fs->vhost_dev, vdev, true);
     if (ret < 0) {
         error_report("Error starting vhost: %d", -ret);
         goto err_guest_notifiers;
@@ -110,7 +110,7 @@ static void vuf_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&fs->vhost_dev, vdev);
+    vhost_dev_stop(&fs->vhost_dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index b2496c824c..b38e4d4cf0 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -81,7 +81,7 @@ static int vu_gpio_start(VirtIODevice *vdev)
      */
     vhost_ack_features(&gpio->vhost_dev, feature_bits, vdev->guest_features);
 
-    ret = vhost_dev_start(&gpio->vhost_dev, vdev);
+    ret = vhost_dev_start(&gpio->vhost_dev, vdev, false);
     if (ret < 0) {
         error_report("Error starting vhost-user-gpio: %d", ret);
         goto err_guest_notifiers;
@@ -139,7 +139,7 @@ static void vu_gpio_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(vhost_dev, vdev);
+    vhost_dev_stop(vhost_dev, vdev, false);
 
     ret = k->set_guest_notifiers(qbus->parent, vhost_dev->nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost-user-i2c.c b/hw/virtio/vhost-user-i2c.c
index 1c9f3d20dc..dc5c828ba6 100644
--- a/hw/virtio/vhost-user-i2c.c
+++ b/hw/virtio/vhost-user-i2c.c
@@ -46,7 +46,7 @@ static void vu_i2c_start(VirtIODevice *vdev)
 
     i2c->vhost_dev.acked_features = vdev->guest_features;
 
-    ret = vhost_dev_start(&i2c->vhost_dev, vdev);
+    ret = vhost_dev_start(&i2c->vhost_dev, vdev, true);
     if (ret < 0) {
         error_report("Error starting vhost-user-i2c: %d", -ret);
         goto err_guest_notifiers;
@@ -80,7 +80,7 @@ static void vu_i2c_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&i2c->vhost_dev, vdev);
+    vhost_dev_stop(&i2c->vhost_dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, i2c->vhost_dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost-user-rng.c b/hw/virtio/vhost-user-rng.c
index f9084cde58..201a39e220 100644
--- a/hw/virtio/vhost-user-rng.c
+++ b/hw/virtio/vhost-user-rng.c
@@ -47,7 +47,7 @@ static void vu_rng_start(VirtIODevice *vdev)
     }
 
     rng->vhost_dev.acked_features = vdev->guest_features;
-    ret = vhost_dev_start(&rng->vhost_dev, vdev);
+    ret = vhost_dev_start(&rng->vhost_dev, vdev, true);
     if (ret < 0) {
         error_report("Error starting vhost-user-rng: %d", -ret);
         goto err_guest_notifiers;
@@ -81,7 +81,7 @@ static void vu_rng_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&rng->vhost_dev, vdev);
+    vhost_dev_stop(&rng->vhost_dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, rng->vhost_dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
index a67a275de2..d21c72b401 100644
--- a/hw/virtio/vhost-vsock-common.c
+++ b/hw/virtio/vhost-vsock-common.c
@@ -70,7 +70,7 @@ int vhost_vsock_common_start(VirtIODevice *vdev)
     }
 
     vvc->vhost_dev.acked_features = vdev->guest_features;
-    ret = vhost_dev_start(&vvc->vhost_dev, vdev);
+    ret = vhost_dev_start(&vvc->vhost_dev, vdev, true);
     if (ret < 0) {
         error_report("Error starting vhost: %d", -ret);
         goto err_guest_notifiers;
@@ -105,7 +105,7 @@ void vhost_vsock_common_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&vvc->vhost_dev, vdev);
+    vhost_dev_stop(&vvc->vhost_dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, vvc->vhost_dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index d1c4c20b8c..7fb008bc9e 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1777,15 +1777,36 @@ int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
     return 0;
 }
 
+static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
+{
+    if (!hdev->vhost_ops->vhost_set_vring_enable) {
+        return 0;
+    }
+
+    /*
+     * For vhost-user devices, if VHOST_USER_F_PROTOCOL_FEATURES has not
+     * been negotiated, the rings start directly in the enabled state, and
+     * .vhost_set_vring_enable callback will fail since
+     * VHOST_USER_SET_VRING_ENABLE is not supported.
+     */
+    if (hdev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER &&
+        !virtio_has_feature(hdev->backend_features,
+                            VHOST_USER_F_PROTOCOL_FEATURES)) {
+        return 0;
+    }
+
+    return hdev->vhost_ops->vhost_set_vring_enable(hdev, enable);
+}
+
 /* Host notifiers must be enabled at this point. */
-int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
+int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
 {
     int i, r;
 
     /* should only be called after backend is connected */
     assert(hdev->vhost_ops);
 
-    trace_vhost_dev_start(hdev, vdev->name);
+    trace_vhost_dev_start(hdev, vdev->name, vrings);
 
     vdev->vhost_started = true;
     hdev->started = true;
@@ -1830,10 +1851,16 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
             goto fail_log;
         }
     }
+    if (vrings) {
+        r = vhost_dev_set_vring_enable(hdev, true);
+        if (r) {
+            goto fail_log;
+        }
+    }
     if (hdev->vhost_ops->vhost_dev_start) {
         r = hdev->vhost_ops->vhost_dev_start(hdev, true);
         if (r) {
-            goto fail_log;
+            goto fail_start;
         }
     }
     if (vhost_dev_has_iommu(hdev) &&
@@ -1848,6 +1875,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         }
     }
     return 0;
+fail_start:
+    if (vrings) {
+        vhost_dev_set_vring_enable(hdev, false);
+    }
 fail_log:
     vhost_log_put(hdev, false);
 fail_vq:
@@ -1866,18 +1897,21 @@ fail_features:
 }
 
 /* Host notifiers must be enabled at this point. */
-void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
+void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
 {
     int i;
 
     /* should only be called after backend is connected */
     assert(hdev->vhost_ops);
 
-    trace_vhost_dev_stop(hdev, vdev->name);
+    trace_vhost_dev_stop(hdev, vdev->name, vrings);
 
     if (hdev->vhost_ops->vhost_dev_start) {
         hdev->vhost_ops->vhost_dev_start(hdev, false);
     }
+    if (vrings) {
+        vhost_dev_set_vring_enable(hdev, false);
+    }
     for (i = 0; i < hdev->nvqs; ++i) {
         vhost_virtqueue_stop(hdev,
                              vdev,
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 820dadc26c..14fc5b9bb2 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -9,8 +9,8 @@ vhost_section(const char *name) "%s"
 vhost_reject_section(const char *name, int d) "%s:%d"
 vhost_iotlb_miss(void *dev, int step) "%p step %d"
 vhost_dev_cleanup(void *dev) "%p"
-vhost_dev_start(void *dev, const char *name) "%p:%s"
-vhost_dev_stop(void *dev, const char *name) "%p:%s"
+vhost_dev_start(void *dev, const char *name, bool vrings) "%p:%s vrings:%d"
+vhost_dev_stop(void *dev, const char *name, bool vrings) "%p:%s vrings:%d"
 
 
 # vhost-user.c
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Virtio-fs] [PATCH v2 5/5] vhost: enable vrings in vhost_dev_start() for vhost-user devices
@ 2022-11-25 17:30   ` Alex Bennée
  0 siblings, 0 replies; 15+ messages in thread
From: Alex Bennée @ 2022-11-25 17:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, German Maglione, Raphael Norwitz,
	Alex Bennée, Gonglei (Arei),
	Kevin Wolf, Hanna Reitz, Jason Wang, Paolo Bonzini, Fam Zheng,
	Dr. David Alan Gilbert, open list:Block layer core,
	open list:virtiofs

From: Stefano Garzarella <sgarzare@redhat.com>

Commit 02b61f38d3 ("hw/virtio: incorporate backend features in features")
properly negotiates VHOST_USER_F_PROTOCOL_FEATURES with the vhost-user
backend, but we forgot to enable vrings as specified in
docs/interop/vhost-user.rst:

    If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
    ring starts directly in the enabled state.

    If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
    initialized in a disabled state and is enabled by
    ``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.

Some vhost-user front-ends already did this by calling
vhost_ops.vhost_set_vring_enable() directly:
- backends/cryptodev-vhost.c
- hw/net/virtio-net.c
- hw/virtio/vhost-user-gpio.c

Most front-ends, however, did not, so the vrings were left disabled and
some backends did not work. We observed this issue with the Rust version
of virtiofsd [1], which uses the event loop [2] provided by the
vhost-user-backend crate, where requests are not processed if the vring
is not enabled.

Let's fix this issue by enabling the vrings in vhost_dev_start() for
vhost-user front-ends that don't already do this directly, and likewise
by disabling them in vhost_dev_stop().

[1] https://gitlab.com/virtio-fs/virtiofsd
[2] https://github.com/rust-vmm/vhost/blob/240fc2966/crates/vhost-user-backend/src/event_loop.rs#L217

Fixes: 02b61f38d3 ("hw/virtio: incorporate backend features in features")
Reported-by: German Maglione <gmaglione@redhat.com>
Tested-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20221123131630.52020-1-sgarzare@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 include/hw/virtio/vhost.h      |  6 +++--
 backends/cryptodev-vhost.c     |  4 ++--
 backends/vhost-user.c          |  4 ++--
 hw/block/vhost-user-blk.c      |  4 ++--
 hw/net/vhost_net.c             |  8 +++----
 hw/scsi/vhost-scsi-common.c    |  4 ++--
 hw/virtio/vhost-user-fs.c      |  4 ++--
 hw/virtio/vhost-user-gpio.c    |  4 ++--
 hw/virtio/vhost-user-i2c.c     |  4 ++--
 hw/virtio/vhost-user-rng.c     |  4 ++--
 hw/virtio/vhost-vsock-common.c |  4 ++--
 hw/virtio/vhost.c              | 44 ++++++++++++++++++++++++++++++----
 hw/virtio/trace-events         |  4 ++--
 13 files changed, 67 insertions(+), 31 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index eaf628f656..1cafa0d776 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -203,24 +203,26 @@ static inline bool vhost_dev_is_started(struct vhost_dev *hdev)
  * vhost_dev_start() - start the vhost device
  * @hdev: common vhost_dev structure
  * @vdev: the VirtIODevice structure
+ * @vrings: true to have vrings enabled in this call
  *
  * Starts the vhost device. From this point VirtIO feature negotiation
  * can start and the device can start processing VirtIO transactions.
  *
  * Return: 0 on success, < 0 on error.
  */
-int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings);
 
 /**
  * vhost_dev_stop() - stop the vhost device
  * @hdev: common vhost_dev structure
  * @vdev: the VirtIODevice structure
+ * @vrings: true to have vrings disabled in this call
  *
  * Stop the vhost device. After the device is stopped the notifiers
  * can be disabled (@vhost_dev_disable_notifiers) and the device can
  * be torn down (@vhost_dev_cleanup).
  */
-void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings);
 
 /**
  * DOC: vhost device configuration handling
diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
index bc13e466b4..572f87b3be 100644
--- a/backends/cryptodev-vhost.c
+++ b/backends/cryptodev-vhost.c
@@ -94,7 +94,7 @@ cryptodev_vhost_start_one(CryptoDevBackendVhost *crypto,
         goto fail_notifiers;
     }
 
-    r = vhost_dev_start(&crypto->dev, dev);
+    r = vhost_dev_start(&crypto->dev, dev, false);
     if (r < 0) {
         goto fail_start;
     }
@@ -111,7 +111,7 @@ static void
 cryptodev_vhost_stop_one(CryptoDevBackendVhost *crypto,
                                  VirtIODevice *dev)
 {
-    vhost_dev_stop(&crypto->dev, dev);
+    vhost_dev_stop(&crypto->dev, dev, false);
     vhost_dev_disable_notifiers(&crypto->dev, dev);
 }
 
diff --git a/backends/vhost-user.c b/backends/vhost-user.c
index 5dedb2d987..7bfcaef976 100644
--- a/backends/vhost-user.c
+++ b/backends/vhost-user.c
@@ -85,7 +85,7 @@ vhost_user_backend_start(VhostUserBackend *b)
     }
 
     b->dev.acked_features = b->vdev->guest_features;
-    ret = vhost_dev_start(&b->dev, b->vdev);
+    ret = vhost_dev_start(&b->dev, b->vdev, true);
     if (ret < 0) {
         error_report("Error start vhost dev");
         goto err_guest_notifiers;
@@ -120,7 +120,7 @@ vhost_user_backend_stop(VhostUserBackend *b)
         return;
     }
 
-    vhost_dev_stop(&b->dev, b->vdev);
+    vhost_dev_stop(&b->dev, b->vdev, true);
 
     if (k->set_guest_notifiers) {
         ret = k->set_guest_notifiers(qbus->parent,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 0d5190accf..1177064631 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -178,7 +178,7 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
     }
 
     s->dev.vq_index_end = s->dev.nvqs;
-    ret = vhost_dev_start(&s->dev, vdev);
+    ret = vhost_dev_start(&s->dev, vdev, true);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Error starting vhost");
         goto err_guest_notifiers;
@@ -213,7 +213,7 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&s->dev, vdev);
+    vhost_dev_stop(&s->dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 26e4930676..043058ff43 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -259,7 +259,7 @@ static int vhost_net_start_one(struct vhost_net *net,
         goto fail_notifiers;
     }
 
-    r = vhost_dev_start(&net->dev, dev);
+    r = vhost_dev_start(&net->dev, dev, false);
     if (r < 0) {
         goto fail_start;
     }
@@ -308,7 +308,7 @@ fail:
     if (net->nc->info->poll) {
         net->nc->info->poll(net->nc, true);
     }
-    vhost_dev_stop(&net->dev, dev);
+    vhost_dev_stop(&net->dev, dev, false);
 fail_start:
     vhost_dev_disable_notifiers(&net->dev, dev);
 fail_notifiers:
@@ -329,7 +329,7 @@ static void vhost_net_stop_one(struct vhost_net *net,
     if (net->nc->info->poll) {
         net->nc->info->poll(net->nc, true);
     }
-    vhost_dev_stop(&net->dev, dev);
+    vhost_dev_stop(&net->dev, dev, false);
     if (net->nc->info->stop) {
         net->nc->info->stop(net->nc);
     }
@@ -606,7 +606,7 @@ err_start:
         assert(r >= 0);
     }
 
-    vhost_dev_stop(&net->dev, vdev);
+    vhost_dev_stop(&net->dev, vdev, false);
 
     return r;
 }
diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index 767f827e55..18ea5dcfa1 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -68,7 +68,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
         goto err_guest_notifiers;
     }
 
-    ret = vhost_dev_start(&vsc->dev, vdev);
+    ret = vhost_dev_start(&vsc->dev, vdev, true);
     if (ret < 0) {
         error_report("Error start vhost dev");
         goto err_guest_notifiers;
@@ -101,7 +101,7 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret = 0;
 
-    vhost_dev_stop(&vsc->dev, vdev);
+    vhost_dev_stop(&vsc->dev, vdev, true);
 
     if (k->set_guest_notifiers) {
         ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index dc4014cdef..d97b179e6f 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -76,7 +76,7 @@ static void vuf_start(VirtIODevice *vdev)
     }
 
     fs->vhost_dev.acked_features = vdev->guest_features;
-    ret = vhost_dev_start(&fs->vhost_dev, vdev);
+    ret = vhost_dev_start(&fs->vhost_dev, vdev, true);
     if (ret < 0) {
         error_report("Error starting vhost: %d", -ret);
         goto err_guest_notifiers;
@@ -110,7 +110,7 @@ static void vuf_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&fs->vhost_dev, vdev);
+    vhost_dev_stop(&fs->vhost_dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index b2496c824c..b38e4d4cf0 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -81,7 +81,7 @@ static int vu_gpio_start(VirtIODevice *vdev)
      */
     vhost_ack_features(&gpio->vhost_dev, feature_bits, vdev->guest_features);
 
-    ret = vhost_dev_start(&gpio->vhost_dev, vdev);
+    ret = vhost_dev_start(&gpio->vhost_dev, vdev, false);
     if (ret < 0) {
         error_report("Error starting vhost-user-gpio: %d", ret);
         goto err_guest_notifiers;
@@ -139,7 +139,7 @@ static void vu_gpio_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(vhost_dev, vdev);
+    vhost_dev_stop(vhost_dev, vdev, false);
 
     ret = k->set_guest_notifiers(qbus->parent, vhost_dev->nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost-user-i2c.c b/hw/virtio/vhost-user-i2c.c
index 1c9f3d20dc..dc5c828ba6 100644
--- a/hw/virtio/vhost-user-i2c.c
+++ b/hw/virtio/vhost-user-i2c.c
@@ -46,7 +46,7 @@ static void vu_i2c_start(VirtIODevice *vdev)
 
     i2c->vhost_dev.acked_features = vdev->guest_features;
 
-    ret = vhost_dev_start(&i2c->vhost_dev, vdev);
+    ret = vhost_dev_start(&i2c->vhost_dev, vdev, true);
     if (ret < 0) {
         error_report("Error starting vhost-user-i2c: %d", -ret);
         goto err_guest_notifiers;
@@ -80,7 +80,7 @@ static void vu_i2c_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&i2c->vhost_dev, vdev);
+    vhost_dev_stop(&i2c->vhost_dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, i2c->vhost_dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost-user-rng.c b/hw/virtio/vhost-user-rng.c
index f9084cde58..201a39e220 100644
--- a/hw/virtio/vhost-user-rng.c
+++ b/hw/virtio/vhost-user-rng.c
@@ -47,7 +47,7 @@ static void vu_rng_start(VirtIODevice *vdev)
     }
 
     rng->vhost_dev.acked_features = vdev->guest_features;
-    ret = vhost_dev_start(&rng->vhost_dev, vdev);
+    ret = vhost_dev_start(&rng->vhost_dev, vdev, true);
     if (ret < 0) {
         error_report("Error starting vhost-user-rng: %d", -ret);
         goto err_guest_notifiers;
@@ -81,7 +81,7 @@ static void vu_rng_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&rng->vhost_dev, vdev);
+    vhost_dev_stop(&rng->vhost_dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, rng->vhost_dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
index a67a275de2..d21c72b401 100644
--- a/hw/virtio/vhost-vsock-common.c
+++ b/hw/virtio/vhost-vsock-common.c
@@ -70,7 +70,7 @@ int vhost_vsock_common_start(VirtIODevice *vdev)
     }
 
     vvc->vhost_dev.acked_features = vdev->guest_features;
-    ret = vhost_dev_start(&vvc->vhost_dev, vdev);
+    ret = vhost_dev_start(&vvc->vhost_dev, vdev, true);
     if (ret < 0) {
         error_report("Error starting vhost: %d", -ret);
         goto err_guest_notifiers;
@@ -105,7 +105,7 @@ void vhost_vsock_common_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&vvc->vhost_dev, vdev);
+    vhost_dev_stop(&vvc->vhost_dev, vdev, true);
 
     ret = k->set_guest_notifiers(qbus->parent, vvc->vhost_dev.nvqs, false);
     if (ret < 0) {
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index d1c4c20b8c..7fb008bc9e 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1777,15 +1777,36 @@ int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
     return 0;
 }
 
+static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
+{
+    if (!hdev->vhost_ops->vhost_set_vring_enable) {
+        return 0;
+    }
+
+    /*
+     * For vhost-user devices, if VHOST_USER_F_PROTOCOL_FEATURES has not
+     * been negotiated, the rings start directly in the enabled state, and
+     * .vhost_set_vring_enable callback will fail since
+     * VHOST_USER_SET_VRING_ENABLE is not supported.
+     */
+    if (hdev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER &&
+        !virtio_has_feature(hdev->backend_features,
+                            VHOST_USER_F_PROTOCOL_FEATURES)) {
+        return 0;
+    }
+
+    return hdev->vhost_ops->vhost_set_vring_enable(hdev, enable);
+}
+
 /* Host notifiers must be enabled at this point. */
-int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
+int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
 {
     int i, r;
 
     /* should only be called after backend is connected */
     assert(hdev->vhost_ops);
 
-    trace_vhost_dev_start(hdev, vdev->name);
+    trace_vhost_dev_start(hdev, vdev->name, vrings);
 
     vdev->vhost_started = true;
     hdev->started = true;
@@ -1830,10 +1851,16 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
             goto fail_log;
         }
     }
+    if (vrings) {
+        r = vhost_dev_set_vring_enable(hdev, true);
+        if (r) {
+            goto fail_log;
+        }
+    }
     if (hdev->vhost_ops->vhost_dev_start) {
         r = hdev->vhost_ops->vhost_dev_start(hdev, true);
         if (r) {
-            goto fail_log;
+            goto fail_start;
         }
     }
     if (vhost_dev_has_iommu(hdev) &&
@@ -1848,6 +1875,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         }
     }
     return 0;
+fail_start:
+    if (vrings) {
+        vhost_dev_set_vring_enable(hdev, false);
+    }
 fail_log:
     vhost_log_put(hdev, false);
 fail_vq:
@@ -1866,18 +1897,21 @@ fail_features:
 }
 
 /* Host notifiers must be enabled at this point. */
-void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
+void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
 {
     int i;
 
     /* should only be called after backend is connected */
     assert(hdev->vhost_ops);
 
-    trace_vhost_dev_stop(hdev, vdev->name);
+    trace_vhost_dev_stop(hdev, vdev->name, vrings);
 
     if (hdev->vhost_ops->vhost_dev_start) {
         hdev->vhost_ops->vhost_dev_start(hdev, false);
     }
+    if (vrings) {
+        vhost_dev_set_vring_enable(hdev, false);
+    }
     for (i = 0; i < hdev->nvqs; ++i) {
         vhost_virtqueue_stop(hdev,
                              vdev,
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 820dadc26c..14fc5b9bb2 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -9,8 +9,8 @@ vhost_section(const char *name) "%s"
 vhost_reject_section(const char *name, int d) "%s:%d"
 vhost_iotlb_miss(void *dev, int step) "%p step %d"
 vhost_dev_cleanup(void *dev) "%p"
-vhost_dev_start(void *dev, const char *name) "%p:%s"
-vhost_dev_stop(void *dev, const char *name) "%p:%s"
+vhost_dev_start(void *dev, const char *name, bool vrings) "%p:%s vrings:%d"
+vhost_dev_stop(void *dev, const char *name, bool vrings) "%p:%s vrings:%d"
 
 
 # vhost-user.c
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
  2022-11-25 17:30 [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Alex Bennée
                   ` (4 preceding siblings ...)
  2022-11-25 17:30   ` [Virtio-fs] " Alex Bennée
@ 2022-11-25 18:20 ` Stefan Weil via
  2022-11-25 20:08   ` Alex Bennée
  2022-11-25 19:58 ` Alex Bennée
  6 siblings, 1 reply; 15+ messages in thread
From: Stefan Weil via @ 2022-11-25 18:20 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare

Am 25.11.22 um 18:30 schrieb Alex Bennée:
> Hi,
> 
> This is continuing to attempt to fix the various vhost-user issues
> that are currently plaguing the release. One concrete bug I've come
> across is that all qtest MMIO devices were being treated as legacy,
> which caused the VIRTIO_F_VERSION_1 flag to be missed and s390x to
> fall back to trying to set the endian value for the virt-queues.

Do you want to add my 4 commits, which fix format strings for
libvhost-user, to your series, or should they be handled separately?

Regards
Stefan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
  2022-11-25 17:30 [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Alex Bennée
                   ` (5 preceding siblings ...)
  2022-11-25 18:20 ` [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Stefan Weil via
@ 2022-11-25 19:58 ` Alex Bennée
  2022-11-26  9:42   ` Alex Bennée
  6 siblings, 1 reply; 15+ messages in thread
From: Alex Bennée @ 2022-11-25 19:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, Alex Bennée


Alex Bennée <alex.bennee@linaro.org> writes:

> Hi,
>
<snip>
> I can replicate some of the other failures I've been seeing in CI by
> running:
>
>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
>
> however this seems to run everything in parallel and maybe is better
> at exposing race conditions. Perhaps the CI system makes those races
> easier to hit? Unfortunately I've not been able to figure out exactly
> how things go wrong in the failure case. 
>
<snip>

There is a circular call: we are in vu_gpio_stop(), which triggers a write
to the vhost-user socket, which in turn lets us catch a disconnect event:

  #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
  #1  0x0000557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
  #2  0x0000557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
  #3  0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
  #4  0x0000557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
  #5  0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
  #6  0x0000557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
  #7  0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
  #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
  #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
  #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
  #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
  #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
  #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
  #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
  #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
  #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
  #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
  #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
  #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
  #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
  #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
  #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
  #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
  #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
  (rr) p hdev->started
  $9 = true
  (rr) info thread
    Id   Target Id                                Frame 
  * 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
    2    Thread 2140414.2140439 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
    3    Thread 2140414.2140442 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
    4    Thread 2140414.2140443 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()

While handling that disconnect we wipe the vhost_dev with a memset:

  Thread 1 hit Hardware watchpoint 2: *(unsigned int *) 0x557adf80da30

  Old value = 2
  New value = 0
  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
  Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S.
  220     ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
  (rr) bt
  #0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
  #1  0x0000557adbdd67f8 in vhost_dev_cleanup (hdev=0x557adf80d878) at ../../hw/virtio/vhost.c:1501
  #2  0x0000557adbe04d68 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:256
  #3  0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
  #4  0x0000557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
  #5  0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
  #6  0x0000557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
  #7  0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
  #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
  #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
  #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
  #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
  #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
  #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
  #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
  #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
  #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
  #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
  #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
  #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
  #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
  #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
  #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
  #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
  #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48

Until we finally trip the assertion:

  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0x00007f24dc269537 in __GI_abort () at abort.c:79
  #2  0x00007f24dc26940f in __assert_fail_base (fmt=0x7f24dc3e16a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=<optimized out>) at assert.c:92
  #3  0x00007f24dc278662 in __GI___assert_fail (assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=0x557adc28d922 "int virtio_pci_set_guest_notifiers(DeviceState *, int, _Bool)") at assert.c:101
  #4  0x0000557adb8e97f1 in virtio_pci_set_guest_notifiers (d=0x557adf805280, nvqs=0, assign=false) at ../../hw/virtio/virtio-pci.c:1029
  #5  0x0000557adbe051c7 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:144
  #6  0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
  #7  0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
  #8  0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
  #9  0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
  #10 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
  #11 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
  #12 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
  #13 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
  #14 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48

Because, of course, we have already torn the guest notifiers down as part
of the disconnect handling.

I'm not sure yet what the cleanest way to avoid that is. Maybe it will be
clearer on Monday morning.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
  2022-11-25 18:20 ` [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Stefan Weil via
@ 2022-11-25 20:08   ` Alex Bennée
  0 siblings, 0 replies; 15+ messages in thread
From: Alex Bennée @ 2022-11-25 20:08 UTC (permalink / raw)
  To: Stefan Weil
  Cc: qemu-devel, slp, mst, marcandre.lureau, stefanha,
	mathieu.poirier, viresh.kumar, sgarzare


Stefan Weil <sw@weilnetz.de> writes:

> Am 25.11.22 um 18:30 schrieb Alex Bennée:
>> Hi,
>> This is continuing to attempt to fix the various vhost-user issues
>> that are currently plaguing the release. One concrete bug I've come
>> across is that all qtest MMIO devices were being treated as legacy,
>> which caused the VIRTIO_F_VERSION_1 flag to be missed and s390x to
>> fall back to trying to set the endian value for the virt-queues.
>
> Do you want to add my 4 commits which fix format strings for
> libvhost-user to your series, or should they be handled separately?

I'm going to leave the choice of which VirtIO patches to take to MST,
given that he is the maintainer and I've obviously broken things enough
this release :-/

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
  2022-11-25 19:58 ` Alex Bennée
@ 2022-11-26  9:42   ` Alex Bennée
  2022-11-26 11:24     ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Bennée @ 2022-11-26  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, Alex Bennée, Paolo Bonzini


Alex Bennée <alex.bennee@linaro.org> writes:

> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Hi,
>>
> <snip>
>> I can replicate some of the other failures I've been seeing in CI by
>> running:
>>
>>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
>>
>> however this seems to run everything in parallel and maybe is better
>> at exposing race conditions. Perhaps the CI system makes those races
>> easier to hit? Unfortunately I've not been able to figure out exactly
>> how things go wrong in the failure case. 
>>
> <snip>
>
> There is a circular call - we are in vu_gpio_stop which triggers a write
> to vhost-user which allows us to catch a disconnect event:
>
>   #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
>   #1  0x0000557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
>   #2  0x0000557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
>   #3  0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274

I suspect the best choice here is to schedule the cleanup at a later
date. Should I use the aio_bh one-shots for this, or maybe an RCU cleanup
event?

Paolo, any suggestions?

>   #4  0x0000557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
>   #5  0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
>   #6  0x0000557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
>   #7  0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
>   #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
>   #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
>   #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
>   #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
>   #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
>   #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
>   #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
>   #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
>   #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
>   #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
>   #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
>   #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
>   #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
>   #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
>   #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
>   #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
>   #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
>   (rr) p hdev->started
>   $9 = true
>   (rr) info thread
>     Id   Target Id                                Frame 
>   * 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
>     2    Thread 2140414.2140439 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
>     3    Thread 2140414.2140442 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
>     4    Thread 2140414.2140443 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
>
> During which we eliminate the vhost_dev with a memset:
>
>   Thread 1 hit Hardware watchpoint 2: *(unsigned int *) 0x557adf80da30
>
>   Old value = 2
>   New value = 0
>   __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
>   Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S.
>   220     ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
>   (rr) bt
>   #0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
>   #1  0x0000557adbdd67f8 in vhost_dev_cleanup (hdev=0x557adf80d878) at ../../hw/virtio/vhost.c:1501
>   #2  0x0000557adbe04d68 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:256
>   #3  0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
>   #4  0x0000557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
>   #5  0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
>   #6  0x0000557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
>   #7  0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
>   #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
>   #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
>   #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
>   #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
>   #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
>   #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
>   #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
>   #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
>   #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
>   #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
>   #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
>   #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
>   #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
>   #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
>   #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
>   #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
>   #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
>
> Before finally:
>
>   #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>   #1  0x00007f24dc269537 in __GI_abort () at abort.c:79
>   #2  0x00007f24dc26940f in __assert_fail_base (fmt=0x7f24dc3e16a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=<optimized out>) at assert.c:92
>   #3  0x00007f24dc278662 in __GI___assert_fail (assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=0x557adc28d922 "int virtio_pci_set_guest_notifiers(DeviceState *, int, _Bool)") at assert.c:101
>   #4  0x0000557adb8e97f1 in virtio_pci_set_guest_notifiers (d=0x557adf805280, nvqs=0, assign=false) at ../../hw/virtio/virtio-pci.c:1029
>   #5  0x0000557adbe051c7 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:144
>   #6  0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
>   #7  0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
>   #8  0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
>   #9  0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
>   #10 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
>   #11 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
>   #12 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
>   #13 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
>   #14 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
>
> Because of course we've just done that on disconnect.
>
> Not sure what the cleanest way to avoid that is yet. Maybe it will be
> clearer on Monday morning.


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
  2022-11-26  9:42   ` Alex Bennée
@ 2022-11-26 11:24     ` Stefan Hajnoczi
  2022-11-26 14:12       ` Alex Bennée
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2022-11-26 11:24 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, slp, mst, marcandre.lureau, stefanha,
	mathieu.poirier, viresh.kumar, sgarzare, Paolo Bonzini

On Sat, 26 Nov 2022 at 04:45, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
> > Alex Bennée <alex.bennee@linaro.org> writes:
> >
> >> Hi,
> >>
> > <snip>
> >> I can replicate some of the other failures I've been seeing in CI by
> >> running:
> >>
> >>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
> >>
> >> however this seems to run everything in parallel and maybe is better
> >> at exposing race conditions. Perhaps the CI system makes those races
> >> easier to hit? Unfortunately I've not been able to figure out exactly
> >> how things go wrong in the failure case.
> >>
> > <snip>
> >
> > There is a circular call - we are in vu_gpio_stop which triggers a write
> > to vhost-user which allows us to catch a disconnect event:
> >
> >   #0  vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> >   #1  0x0000557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
> >   #2  0x0000557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
> >   #3  0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
>
> I suspect the best choice here is to schedule the cleanup at a later
> date. Should I use one of the aio_bh one-shots for this or maybe an
> RCU cleanup event?
>
> Paolo, any suggestions?
>
> >   #4  0x0000557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> >   #5  0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> >   #6  0x0000557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> >   #7  0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129

Does this mean the backend closed the connection before receiving all
the vhost-user protocol messages sent by QEMU?

This looks like a backend bug. It prevents QEMU's vhost-user client
from cleanly stopping the virtqueue (vhost_virtqueue_stop()).

QEMU is still broken if it cannot handle disconnect at any time. Maybe
a simple solution for that is to check for reentrancy (either by
checking an existing variable or adding a new one to prevent
vu_gpio_stop() from calling itself).

> >   #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
> >   #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
> >   #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
> >   #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
> >   #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
> >   #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
> >   #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
> >   #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
> >   #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> >   #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> >   #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> >   #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> >   #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> >   #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> >   #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> >   #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> >   #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> >   (rr) p hdev->started
> >   $9 = true
> >   (rr) info thread
> >     Id   Target Id                                Frame
> >   * 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> >     2    Thread 2140414.2140439 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
> >     3    Thread 2140414.2140442 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
> >     4    Thread 2140414.2140443 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
> >
> > During which we eliminate the vhost_dev with a memset:
> >
> >   Thread 1 hit Hardware watchpoint 2: *(unsigned int *) 0x557adf80da30
> >
> >   Old value = 2
> >   New value = 0
> >   __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
> >   Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S.
> >   220     ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
> >   (rr) bt
> >   #0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
> >   #1  0x0000557adbdd67f8 in vhost_dev_cleanup (hdev=0x557adf80d878) at ../../hw/virtio/vhost.c:1501
> >   #2  0x0000557adbe04d68 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:256
> >   #3  0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
> >   #4  0x0000557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> >   #5  0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> >   #6  0x0000557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> >   #7  0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
> >   #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
> >   #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
> >   #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
> >   #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
> >   #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
> >   #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
> >   #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
> >   #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
> >   #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> >   #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> >   #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> >   #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> >   #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> >   #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> >   #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> >   #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> >   #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> >
> > Before finally:
> >
> >   #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> >   #1  0x00007f24dc269537 in __GI_abort () at abort.c:79
> >   #2  0x00007f24dc26940f in __assert_fail_base (fmt=0x7f24dc3e16a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=<optimized out>) at assert.c:92
> >   #3  0x00007f24dc278662 in __GI___assert_fail (assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=0x557adc28d922 "int virtio_pci_set_guest_notifiers(DeviceState *, int, _Bool)") at assert.c:101
> >   #4  0x0000557adb8e97f1 in virtio_pci_set_guest_notifiers (d=0x557adf805280, nvqs=0, assign=false) at ../../hw/virtio/virtio-pci.c:1029
> >   #5  0x0000557adbe051c7 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:144
> >   #6  0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> >   #7  0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> >   #8  0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> >   #9  0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> >   #10 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> >   #11 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> >   #12 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> >   #13 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> >   #14 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> >
> > Because of course we've just done that on disconnect.
> >
> > Not sure what the cleanest way to avoid that is yet. Maybe it will be
> > clearer on Monday morning.
>
>
> --
> Alex Bennée
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
  2022-11-26 11:24     ` Stefan Hajnoczi
@ 2022-11-26 14:12       ` Alex Bennée
  2022-11-27 18:04         ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Bennée @ 2022-11-26 14:12 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, slp, mst, marcandre.lureau, stefanha,
	mathieu.poirier, viresh.kumar, sgarzare, Paolo Bonzini


Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Sat, 26 Nov 2022 at 04:45, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>> > Alex Bennée <alex.bennee@linaro.org> writes:
>> >
>> >> Hi,
>> >>
>> > <snip>
>> >> I can replicate some of the other failures I've been seeing in CI by
>> >> running:
>> >>
>> >>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
>> >>
>> >> however this seems to run everything in parallel and maybe is better
>> >> at exposing race conditions. Perhaps the CI system makes those races
>> >> easier to hit? Unfortunately I've not been able to figure out exactly
>> >> how things go wrong in the failure case.
>> >>
>> > <snip>
>> >
>> > There is a circular call - we are in vu_gpio_stop which triggers a write
>> > to vhost-user which allows us to catch a disconnect event:
>> >
>> >   #0 vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
>> >   #1  0x0000557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
>> >   #2 0x0000557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:255
>> >   #3 0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
>>
>> I suspect the best choice here is to schedule the cleanup at a later
>> date. Should I use one of the aio_bh one-shots for this or maybe an
>> RCU cleanup event?
>>
>> Paolo, any suggestions?
>>
>> >   #4 0x0000557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
>> >   #5 0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
>> >   #6 0x0000557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
>> >   #7 0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
>
> Does this mean the backend closed the connection before receiving all
> the vhost-user protocol messages sent by QEMU?
>
> This looks like a backend bug. It prevents QEMU's vhost-user client
> from cleanly stopping the virtqueue (vhost_virtqueue_stop()).

Well the backend in this case is the qtest framework, so not the
world's most complete implementation.

> QEMU is still broken if it cannot handle disconnect at any time. Maybe
> a simple solution for that is to check for reentrancy (either by
> checking an existing variable or adding a new one to prevent
> vu_gpio_stop() from calling itself).

vhost-user-blk introduced an additional flag:

    /*
     * There are at least two steps of initialization of the
     * vhost-user device. The first is a "connect" step and
     * second is a "start" step. Make a separation between
     * those initialization phases by using two fields.
     */
    /* vhost_user_blk_connect/vhost_user_blk_disconnect */
    bool connected;
    /* vhost_user_blk_start/vhost_user_blk_stop */
    bool started_vu;

but that in itself is not enough. If you look at the various cases of
handling CHR_EVENT_CLOSED you'll see that some schedule the shutdown
with an aio bottom half and some don't even bother (so they will
probably break the same way).

Rather than have a mish-mash of solutions, maybe we should introduce a
new vhost function - vhost_user_async_close() - which can take care of
the scheduling and wrap it with a check for a valid vhost structure in
case it gets shut down in the meantime?

>
>> >   #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
>> >   #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
>> >   #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
>> >   #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
>> >   #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
>> >   #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
>> >   #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
>> >   #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
>> >   #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
>> >   #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
>> >   #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
>> >   #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
>> >   #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
>> >   #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
>> >   #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
>> >   #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
>> >   #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
>> >   (rr) p hdev->started
>> >   $9 = true
>> >   (rr) info thread
>> >     Id   Target Id                                Frame
>> >   * 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
>> >     2    Thread 2140414.2140439 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
>> >     3    Thread 2140414.2140442 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
>> >     4    Thread 2140414.2140443 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
>> >
>> > During which we eliminate the vhost_dev with a memset:
>> >
>> >   Thread 1 hit Hardware watchpoint 2: *(unsigned int *) 0x557adf80da30
>> >
>> >   Old value = 2
>> >   New value = 0
>> >   __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
>> >   Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S.
>> >   220     ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
>> >   (rr) bt
>> >   #0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
>> >   #1  0x0000557adbdd67f8 in vhost_dev_cleanup (hdev=0x557adf80d878) at ../../hw/virtio/vhost.c:1501
>> >   #2  0x0000557adbe04d68 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:256
>> >   #3  0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
>> >   #4  0x0000557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
>> >   #5  0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
>> >   #6  0x0000557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
>> >   #7  0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
>> >   #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
>> >   #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
>> >   #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
>> >   #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
>> >   #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
>> >   #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
>> >   #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
>> >   #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
>> >   #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
>> >   #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
>> >   #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
>> >   #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
>> >   #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
>> >   #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
>> >   #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
>> >   #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
>> >   #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
>> >
>> > Before finally:
>> >
>> >   #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>> >   #1  0x00007f24dc269537 in __GI_abort () at abort.c:79
>> >   #2  0x00007f24dc26940f in __assert_fail_base (fmt=0x7f24dc3e16a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=<optimized out>) at assert.c:92
>> >   #3  0x00007f24dc278662 in __GI___assert_fail (assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=0x557adc28d922 "int virtio_pci_set_guest_notifiers(DeviceState *, int, _Bool)") at assert.c:101
>> >   #4  0x0000557adb8e97f1 in virtio_pci_set_guest_notifiers (d=0x557adf805280, nvqs=0, assign=false) at ../../hw/virtio/virtio-pci.c:1029
>> >   #5  0x0000557adbe051c7 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:144
>> >   #6  0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
>> >   #7  0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
>> >   #8  0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
>> >   #9  0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
>> >   #10 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
>> >   #11 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
>> >   #12 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
>> >   #13 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
>> >   #14 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
>> >
>> > Because of course we've just done that on disconnect.
>> >
>> > Not sure what the cleanest way to avoid that is yet. Maybe it will be
>> > clearer on Monday morning.
>>
>>
>> --
>> Alex Bennée
>>


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/5] include/hw: VM state takes precedence in virtio_device_should_start
  2022-11-25 17:30 ` [PATCH v2 2/5] include/hw: VM state takes precedence in virtio_device_should_start Alex Bennée
@ 2022-11-27 10:17   ` Bernhard Beschow
  0 siblings, 0 replies; 15+ messages in thread
From: Bernhard Beschow @ 2022-11-27 10:17 UTC (permalink / raw)
  To: qemu-devel, Alex Bennée
  Cc: slp, mst, marcandre.lureau, stefanha, mathieu.poirier,
	viresh.kumar, sgarzare, Christian Borntraeger



Am 25. November 2022 17:30:40 UTC schrieb "Alex Bennée" <alex.bennee@linaro.org>:
>The VM status should always preempt the device status for these
>checks. This ensures the device is in the correct state when we
>suspend the VM prior to migrations. This restores the checks to the
>order they were in before the refactoring moved things around.
>
>While we are at it lets improve our documentation of the various
>fields involved and document the two functions.
>
>Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started)
>Fixes: 259d69c00b (hw/virtio: introduce virtio_device_should_start)
>Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
>---
> include/hw/virtio/virtio.h | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)
>
>diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>index 0f612067f7..48f539d0fe 100644
>--- a/include/hw/virtio/virtio.h
>+++ b/include/hw/virtio/virtio.h
>@@ -133,6 +133,13 @@ struct VirtIODevice
>     bool broken; /* device in invalid state, needs reset */
>     bool use_disabled_flag; /* allow use of 'disable' flag when needed */
>     bool disabled; /* device in temporarily disabled state */
>+    /**
>+     * @use_started: true if the @started flag should be used to check the
>+     * current state of the VirtIO device. Otherwise status bits
>+     * should be checked for a current status of the device.
>+     * @use_started is only set via QMP and defaults to true for all
>+     * modern machines (since 4.1).
>+     */
>     bool use_started;
>     bool started;
>     bool start_on_kick; /* when virtio 1.0 feature has not been negotiated */
>@@ -408,6 +415,17 @@ static inline bool virtio_is_big_endian(VirtIODevice *vdev)
>     return false;
> }
> 
>+

This adds an extra empty line.

>+/**
>+ * virtio_device_should_start() - check if device started

s/virtio_device_should_start/virtio_device_started/

Best regards,
Bernhard

>+ * @vdev - the VirtIO device
>+ * @status - the devices status bits
>+ *
>+ * Check if the device is started. For most modern machines this is
>+ * tracked via the @vdev->started field (to support migration),
>+ * otherwise we check for the final negotiated status bit that
>+ * indicates everything is ready.
>+ */
> static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
> {
>     if (vdev->use_started) {
>@@ -428,15 +446,11 @@ static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
>  */
> static inline bool virtio_device_should_start(VirtIODevice *vdev, uint8_t status)
> {
>-    if (vdev->use_started) {
>-        return vdev->started;
>-    }
>-
>     if (!vdev->vm_running) {
>         return false;
>     }
> 
>-    return status & VIRTIO_CONFIG_S_DRIVER_OK;
>+    return virtio_device_started(vdev, status);
> }
> 
> static inline void virtio_set_started(VirtIODevice *vdev, bool started)



* Re: [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues
  2022-11-26 14:12       ` Alex Bennée
@ 2022-11-27 18:04         ` Stefan Hajnoczi
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2022-11-27 18:04 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, slp, mst, marcandre.lureau, stefanha,
	mathieu.poirier, viresh.kumar, sgarzare, Paolo Bonzini

On Sat, 26 Nov 2022 at 09:18, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
> > On Sat, 26 Nov 2022 at 04:45, Alex Bennée <alex.bennee@linaro.org> wrote:
> >>
> >>
> >> Alex Bennée <alex.bennee@linaro.org> writes:
> >>
> >> > Alex Bennée <alex.bennee@linaro.org> writes:
> >> >
> >> >> Hi,
> >> >>
> >> > <snip>
> >> >> I can replicate some of the other failures I've been seeing in CI by
> >> >> running:
> >> >>
> >> >>   ../../meson/meson.py test --repeat 10 --print-errorlogs qtest-arm/qos-test
> >> >>
> >> >> however this seems to run everything in parallel and maybe is better
> >> >> at exposing race conditions. Perhaps the CI system makes those races
> >> >> easier to hit? Unfortunately I've not been able to figure out exactly
> >> >> how things go wrong in the failure case.
> >> >>
> >> > <snip>
> >> >
> >> > There is a circular call - we are in vu_gpio_stop which triggers a write
> >> > to vhost-user which allows us to catch a disconnect event:
> >> >
> >> >   #0 vhost_dev_is_started (hdev=0x557adf80d878) at
> >> > /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> >> >   #1  0x0000557adbe0518a in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:138
> >> >   #2 0x0000557adbe04d56 in vu_gpio_disconnect (dev=0x557adf80d640)
> >> > at ../../hw/virtio/vhost-user-gpio.c:255
> >> >   #3 0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640,
> >> > event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
> >>
> >> I suspect the best choice here is to schedule the cleanup at a later
> >> date. Should I use the aio_bh one-shots for this or maybe an RCU
> >> cleanup event?
> >>
> >> Paolo, any suggestions?
> >>
> >> >   #4 0x0000557adc0539ef in chr_be_event (s=0x557adea51f10,
> >> > event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> >> >   #5 0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10,
> >> > event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> >> >   #6 0x0000557adc04f666 in tcp_chr_disconnect_locked
> >> > (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> >> >   #7 0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10,
> >> > buf=0x7ffe8588cce0 "\v", len=20) at
> >> > ../../chardev/char-socket.c:129
> >
> > Does this mean the backend closed the connection before receiving all
> > the vhost-user protocol messages sent by QEMU?
> >
> > This looks like a backend bug. It prevents QEMU's vhost-user client
> > from cleanly stopping the virtqueue (vhost_virtqueue_stop()).
>
> Well, the backend in this case is the qtest framework, so not the world's
> most complete implementation.
>
> > QEMU is still broken if it cannot handle disconnect at any time. Maybe
> > a simple solution for that is to check for reentrancy (either by
> > checking an existing variable or adding a new one to prevent
> > vu_gpio_stop() from calling itself).
>
> vhost-user-blk introduced an additional flag:
>
>     /*
>      * There are at least two steps of initialization of the
>      * vhost-user device. The first is a "connect" step and
>      * second is a "start" step. Make a separation between
>      * those initialization phases by using two fields.
>      */
>     /* vhost_user_blk_connect/vhost_user_blk_disconnect */
>     bool connected;
>     /* vhost_user_blk_start/vhost_user_blk_stop */
>     bool started_vu;
>
> but that in itself is not enough. If you look at the various cases of
> handling CHR_EVENT_CLOSED you'll see some schedule the shutdown with aio
> and some don't even bother (so will probably break the same way).
>
> Rather than have a mish-mash of solutions, maybe we should introduce a
> new vhost function - vhost_user_async_close() - which can take care of
> the scheduling and wrap it with a check for a valid vhost structure in
> case it gets shut down in the meantime?

Handling this in core vhost code would be great.

I suggested checking a variable because it's not async. Async is more
complicated because it creates a new in-between state while waiting
for the operation to complete. Async approaches are more likely to
have bugs for this reason.

vhost-user-blk.c's async shutdown is a good example of that:
    case CHR_EVENT_CLOSED:
        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
            /*
             * A close event may happen during a read/write, but vhost
             * code assumes the vhost_dev remains setup, so delay the
             * stop & clear.
             */
            AioContext *ctx = qemu_get_current_aio_context();

            qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
                    NULL, NULL, false);
            aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);

            /*
             * Move vhost device to the stopped state. The vhost-user device
             * will be clean up and disconnected in BH. This can be useful in
             * the vhost migration code. If disconnect was caught there is an
             * option for the general vhost code to get the dev state without
             * knowing its type (in this case vhost-user).
             *
             * FIXME: this is sketchy to be reaching into vhost_dev
             * now because we are forcing something that implies we
             * have executed vhost_dev_stop() but that won't happen
             * until vhost_user_blk_stop() gets called from the bh.
             * Really this state check should be tracked locally.
             */
            s->dev.started = false;

That said, maybe async is really needed here. I haven't looked at the code.

>
> >
> >> >   #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
> >> >   #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
> >> >   #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
> >> >   #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
> >> >   #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
> >> >   #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
> >> >   #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
> >> >   #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
> >> >   #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> >> >   #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> >> >   #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> >> >   #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> >> >   #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> >> >   #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> >> >   #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> >> >   #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> >> >   #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> >> >   (rr) p hdev->started
> >> >   $9 = true
> >> >   (rr) info thread
> >> >     Id   Target Id                                Frame
> >> >   * 1    Thread 2140414.2140414 (qemu-system-aar) vhost_dev_is_started (hdev=0x557adf80d878) at /home/alex/lsrc/qemu.git/include/hw/virtio/vhost.h:199
> >> >     2    Thread 2140414.2140439 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
> >> >     3    Thread 2140414.2140442 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
> >> >     4    Thread 2140414.2140443 (qemu-system-aar) 0x0000000070000002 in syscall_traced ()
> >> >
> >> > During which we eliminate the vhost_dev with a memset:
> >> >
> >> >   Thread 1 hit Hardware watchpoint 2: *(unsigned int *) 0x557adf80da30
> >> >
> >> >   Old value = 2
> >> >   New value = 0
> >> >   __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
> >> >   Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S.
> >> >   220     ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
> >> >   (rr) bt
> >> >   #0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:220
> >> >   #1  0x0000557adbdd67f8 in vhost_dev_cleanup (hdev=0x557adf80d878) at ../../hw/virtio/vhost.c:1501
> >> >   #2  0x0000557adbe04d68 in vu_gpio_disconnect (dev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:256
> >> >   #3  0x0000557adbe049bb in vu_gpio_event (opaque=0x557adf80d640, event=CHR_EVENT_CLOSED) at ../../hw/virtio/vhost-user-gpio.c:274
> >> >   #4  0x0000557adc0539ef in chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:61
> >> >   #5  0x0000557adc0506aa in qemu_chr_be_event (s=0x557adea51f10, event=CHR_EVENT_CLOSED) at ../../chardev/char.c:81
> >> >   #6  0x0000557adc04f666 in tcp_chr_disconnect_locked (chr=0x557adea51f10) at ../../chardev/char-socket.c:470
> >> >   #7  0x0000557adc04c81a in tcp_chr_write (chr=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-socket.c:129
> >> >   #8  0x0000557adc050999 in qemu_chr_write_buffer (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, offset=0x7ffe8588cbe4, write_all=true) at ../../chardev/char.c:121
> >> >   #9  0x0000557adc0507c7 in qemu_chr_write (s=0x557adea51f10, buf=0x7ffe8588cce0 "\v", len=20, write_all=true) at ../../chardev/char.c:173
> >> >   #10 0x0000557adc046f3a in qemu_chr_fe_write_all (be=0x557adf80d830, buf=0x7ffe8588cce0 "\v", len=20) at ../../chardev/char-fe.c:53
> >> >   #11 0x0000557adbddc02f in vhost_user_write (dev=0x557adf80d878, msg=0x7ffe8588cce0, fds=0x0, fd_num=0) at ../../hw/virtio/vhost-user.c:490
> >> >   #12 0x0000557adbddd48f in vhost_user_get_vring_base (dev=0x557adf80d878, ring=0x7ffe8588d000) at ../../hw/virtio/vhost-user.c:1260
> >> >   #13 0x0000557adbdd4bd6 in vhost_virtqueue_stop (dev=0x557adf80d878, vdev=0x557adf80d640, vq=0x557adf843570, idx=0) at ../../hw/virtio/vhost.c:1220
> >> >   #14 0x0000557adbdd7eda in vhost_dev_stop (hdev=0x557adf80d878, vdev=0x557adf80d640, vrings=false) at ../../hw/virtio/vhost.c:1916
> >> >   #15 0x0000557adbe051a6 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:142
> >> >   #16 0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> >> >   #17 0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> >> >   #18 0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> >> >   #19 0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> >> >   #20 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> >> >   #21 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> >> >   #22 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> >> >   #23 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> >> >   #24 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> >> >
> >> > Before finally:
> >> >
> >> >   #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> >> >   #1  0x00007f24dc269537 in __GI_abort () at abort.c:79
> >> >   #2  0x00007f24dc26940f in __assert_fail_base (fmt=0x7f24dc3e16a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=<optimized out>) at assert.c:92
> >> >   #3  0x00007f24dc278662 in __GI___assert_fail (assertion=0x557adc28d8f5 "assign || nvqs == proxy->nvqs_with_notifiers", file=0x557adc28d7ab "../../hw/virtio/virtio-pci.c", line=1029, function=0x557adc28d922 "int virtio_pci_set_guest_notifiers(DeviceState *, int, _Bool)") at assert.c:101
> >> >   #4  0x0000557adb8e97f1 in virtio_pci_set_guest_notifiers (d=0x557adf805280, nvqs=0, assign=false) at ../../hw/virtio/virtio-pci.c:1029
> >> >   #5  0x0000557adbe051c7 in vu_gpio_stop (vdev=0x557adf80d640) at ../../hw/virtio/vhost-user-gpio.c:144
> >> >   #6  0x0000557adbe04849 in vu_gpio_set_status (vdev=0x557adf80d640, status=15 '\017') at ../../hw/virtio/vhost-user-gpio.c:173
> >> >   #7  0x0000557adbdc87ff in virtio_set_status (vdev=0x557adf80d640, val=15 '\017') at ../../hw/virtio/virtio.c:2442
> >> >   #8  0x0000557adbdcbfa0 in virtio_vmstate_change (opaque=0x557adf80d640, running=false, state=RUN_STATE_SHUTDOWN) at ../../hw/virtio/virtio.c:3736
> >> >   #9  0x0000557adb91ad27 in vm_state_notify (running=false, state=RUN_STATE_SHUTDOWN) at ../../softmmu/runstate.c:334
> >> >   #10 0x0000557adb910e88 in do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=false) at ../../softmmu/cpus.c:262
> >> >   #11 0x0000557adb910e30 in vm_shutdown () at ../../softmmu/cpus.c:280
> >> >   #12 0x0000557adb91b9c3 in qemu_cleanup () at ../../softmmu/runstate.c:827
> >> >   #13 0x0000557adb522975 in qemu_default_main () at ../../softmmu/main.c:38
> >> >   #14 0x0000557adb5229a8 in main (argc=27, argv=0x7ffe8588d2f8) at ../../softmmu/main.c:48
> >> >
> >> > Because of course we've just done that on disconnect.
> >> >
> >> > Not sure what the cleanest way to avoid that is yet. Maybe it will be
> >> > clearer on Monday morning.
> >>
> >>
> >> --
> >> Alex Bennée
> >>
>
>
> --
> Alex Bennée



end of thread, other threads:[~2022-11-27 18:05 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-25 17:30 [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Alex Bennée
2022-11-25 17:30 ` [PATCH v2 1/5] include/hw: attempt to document VirtIO feature variables Alex Bennée
2022-11-25 17:30 ` [PATCH v2 2/5] include/hw: VM state takes precedence in virtio_device_should_start Alex Bennée
2022-11-27 10:17   ` Bernhard Beschow
2022-11-25 17:30 ` [PATCH v2 3/5] tests/qtests: override "force-legacy" for gpio virtio-mmio tests Alex Bennée
2022-11-25 17:30 ` [PATCH v2 4/5] hw/virtio: ensure a valid host_feature set for virtio-user-gpio Alex Bennée
2022-11-25 17:30 ` [PATCH v2 5/5] vhost: enable vrings in vhost_dev_start() for vhost-user devices Alex Bennée
2022-11-25 17:30   ` [Virtio-fs] " Alex Bennée
2022-11-25 18:20 ` [PATCH for 7.2-rc? v2 0/5] continuing efforts to fix vhost-user issues Stefan Weil via
2022-11-25 20:08   ` Alex Bennée
2022-11-25 19:58 ` Alex Bennée
2022-11-26  9:42   ` Alex Bennée
2022-11-26 11:24     ` Stefan Hajnoczi
2022-11-26 14:12       ` Alex Bennée
2022-11-27 18:04         ` Stefan Hajnoczi
