[PATCH 0/5] vhost-user-blk: Error handling fixes during initialistion

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/5] vhost-user-blk: Error handling fixes during initialistion
@ 2021-04-22 17:02 Kevin Wolf
  2021-04-22 17:02 ` [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation Kevin Wolf
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Kevin Wolf @ 2021-04-22 17:02 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, den-plotnikov, mst, qemu-devel, raphael.norwitz

vhost-user-blk neglects for several properties to check whether the
configured value is even compatible with the backend. This results
sometimes in crashes because of buggy error handling code, and sometimes
in devices that are presented differently to the guest than the backend
would expect and that don't work properly therefore.

This series fixes some of these bugs.

Kevin Wolf (5):
  vhost-user-blk: Don't reconnect during initialisation
  vhost-user-blk: Use Error more consistently
  vhost-user-blk: Get more feature flags from vhost device
  virtio: Fail if iommu_platform is requested, but unsupported
  vhost-user-blk: Check that num-queues is supported by backend

 include/hw/virtio/vhost.h |  2 +
 hw/block/vhost-user-blk.c | 77 +++++++++++++--------------------------
 hw/virtio/vhost-user.c    |  5 +++
 hw/virtio/virtio-bus.c    |  5 +++
 4 files changed, 38 insertions(+), 51 deletions(-)

-- 
2.30.2



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation
  2021-04-22 17:02 [PATCH 0/5] vhost-user-blk: Error handling fixes during initialistion Kevin Wolf
@ 2021-04-22 17:02 ` Kevin Wolf
  2021-04-23  7:17   ` Denis Plotnikov
                     ` (3 more replies)
  2021-04-22 17:02 ` [PATCH 2/5] vhost-user-blk: Use Error more consistently Kevin Wolf
                   ` (3 subsequent siblings)
  4 siblings, 4 replies; 21+ messages in thread
From: Kevin Wolf @ 2021-04-22 17:02 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, den-plotnikov, mst, qemu-devel, raphael.norwitz

This is a partial revert of commits 77542d43149 and bc79c87bcde.

Usually, an error during initialisation means that the configuration was
wrong. Reconnecting won't make the error go away, but just turn the
error condition into an endless loop. Avoid this and return errors
again.

Additionally, calling vhost_user_blk_disconnect() from the chardev event
handler could result in use-after-free because none of the
initialisation code expects that the device could just go away in the
middle. So removing the call fixes crashes in several places.

For example, using a num-queues setting that is incompatible with the
backend would result in a crash like this (dereferencing dev->opaque,
which is already NULL):

 #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
 #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
 #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
 #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
 #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
 #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
 #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
 #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
 #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
 #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/block/vhost-user-blk.c | 54 ++++++++++-----------------------------
 1 file changed, 13 insertions(+), 41 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index f5e9682703..e824b0a759 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -50,6 +50,8 @@ static const int user_feature_bits[] = {
     VHOST_INVALID_FEATURE_BIT
 };
 
+static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
+
 static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
@@ -362,19 +364,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
     vhost_dev_cleanup(&s->dev);
 }
 
-static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
-                                 bool realized);
-
-static void vhost_user_blk_event_realize(void *opaque, QEMUChrEvent event)
-{
-    vhost_user_blk_event(opaque, event, false);
-}
-
-static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
-{
-    vhost_user_blk_event(opaque, event, true);
-}
-
 static void vhost_user_blk_chr_closed_bh(void *opaque)
 {
     DeviceState *dev = opaque;
@@ -382,12 +371,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
 
     vhost_user_blk_disconnect(dev);
-    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
-            vhost_user_blk_event_oper, NULL, opaque, NULL, true);
+    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
+                             NULL, opaque, NULL, true);
 }
 
-static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
-                                 bool realized)
+static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
 {
     DeviceState *dev = opaque;
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -401,17 +389,7 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
         }
         break;
     case CHR_EVENT_CLOSED:
-        /*
-         * Closing the connection should happen differently on device
-         * initialization and operation stages.
-         * On initalization, we want to re-start vhost_dev initialization
-         * from the very beginning right away when the connection is closed,
-         * so we clean up vhost_dev on each connection closing.
-         * On operation, we want to postpone vhost_dev cleanup to let the
-         * other code perform its own cleanup sequence using vhost_dev data
-         * (e.g. vhost_dev_set_log).
-         */
-        if (realized && !runstate_check(RUN_STATE_SHUTDOWN)) {
+        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
             /*
              * A close event may happen during a read/write, but vhost
              * code assumes the vhost_dev remains setup, so delay the
@@ -431,8 +409,6 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
              * knowing its type (in this case vhost-user).
              */
             s->dev.started = false;
-        } else {
-            vhost_user_blk_disconnect(dev);
         }
         break;
     case CHR_EVENT_BREAK:
@@ -490,31 +466,27 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
     s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
     s->connected = false;
 
-    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
-                             vhost_user_blk_event_realize, NULL, (void *)dev,
-                             NULL, true);
-
-reconnect:
     if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
         error_report_err(err);
         goto virtio_err;
     }
 
-    /* check whether vhost_user_blk_connect() failed or not */
-    if (!s->connected) {
-        goto reconnect;
+    if (vhost_user_blk_connect(dev) < 0) {
+        qemu_chr_fe_disconnect(&s->chardev);
+        goto virtio_err;
     }
+    assert(s->connected);
 
     ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
                                sizeof(struct virtio_blk_config));
     if (ret < 0) {
         error_report("vhost-user-blk: get block config failed");
-        goto reconnect;
+        goto virtio_err;
     }
 
-    /* we're fully initialized, now we can operate, so change the handler */
+    /* we're fully initialized, now we can operate, so add the handler */
     qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
-                             vhost_user_blk_event_oper, NULL, (void *)dev,
+                             vhost_user_blk_event, NULL, (void *)dev,
                              NULL, true);
     return;
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/5] vhost-user-blk: Use Error more consistently
  2021-04-22 17:02 [PATCH 0/5] vhost-user-blk: Error handling fixes during initialistion Kevin Wolf
  2021-04-22 17:02 ` [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation Kevin Wolf
@ 2021-04-22 17:02 ` Kevin Wolf
  2021-04-28 18:08   ` Raphael Norwitz
  2021-04-22 17:02 ` [PATCH 3/5] vhost-user-blk: Get more feature flags from vhost device Kevin Wolf
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2021-04-22 17:02 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, den-plotnikov, mst, qemu-devel, raphael.norwitz

Now that vhost_user_blk_connect() is not called from an event handler
any more, but directly from vhost_user_blk_device_realize(), we don't
have to resort to error_report() any more, but can use Error again.

With Error, the callers are responsible for adding context if necessary
(such as the "-device" option the error refers to). Additional prefixes
are redundant and better omitted.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/block/vhost-user-blk.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index e824b0a759..8422a29470 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -311,7 +311,7 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
     vhost_dev_free_inflight(s->inflight);
 }
 
-static int vhost_user_blk_connect(DeviceState *dev)
+static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
@@ -331,8 +331,7 @@ static int vhost_user_blk_connect(DeviceState *dev)
 
     ret = vhost_dev_init(&s->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
-        error_report("vhost-user-blk: vhost initialization failed: %s",
-                     strerror(-ret));
+        error_setg_errno(errp, -ret, "vhost initialization failed");
         return ret;
     }
 
@@ -340,8 +339,7 @@ static int vhost_user_blk_connect(DeviceState *dev)
     if (virtio_device_started(vdev, vdev->status)) {
         ret = vhost_user_blk_start(vdev);
         if (ret < 0) {
-            error_report("vhost-user-blk: vhost start failed: %s",
-                         strerror(-ret));
+            error_setg_errno(errp, -ret, "vhost start failed");
             return ret;
         }
     }
@@ -380,10 +378,12 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
     DeviceState *dev = opaque;
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
+    Error *local_err = NULL;
 
     switch (event) {
     case CHR_EVENT_OPENED:
-        if (vhost_user_blk_connect(dev) < 0) {
+        if (vhost_user_blk_connect(dev, &local_err) < 0) {
+            error_report_err(local_err);
             qemu_chr_fe_disconnect(&s->chardev);
             return;
         }
@@ -427,7 +427,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
     int i, ret;
 
     if (!s->chardev.chr) {
-        error_setg(errp, "vhost-user-blk: chardev is mandatory");
+        error_setg(errp, "chardev is mandatory");
         return;
     }
 
@@ -435,16 +435,16 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         s->num_queues = 1;
     }
     if (!s->num_queues || s->num_queues > VIRTIO_QUEUE_MAX) {
-        error_setg(errp, "vhost-user-blk: invalid number of IO queues");
+        error_setg(errp, "invalid number of IO queues");
         return;
     }
 
     if (!s->queue_size) {
-        error_setg(errp, "vhost-user-blk: queue size must be non-zero");
+        error_setg(errp, "queue size must be non-zero");
         return;
     }
     if (s->queue_size > VIRTQUEUE_MAX_SIZE) {
-        error_setg(errp, "vhost-user-blk: queue size must not exceed %d",
+        error_setg(errp, "queue size must not exceed %d",
                    VIRTQUEUE_MAX_SIZE);
         return;
     }
@@ -471,7 +471,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         goto virtio_err;
     }
 
-    if (vhost_user_blk_connect(dev) < 0) {
+    if (vhost_user_blk_connect(dev, errp) < 0) {
         qemu_chr_fe_disconnect(&s->chardev);
         goto virtio_err;
     }
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/5] vhost-user-blk: Get more feature flags from vhost device
  2021-04-22 17:02 [PATCH 0/5] vhost-user-blk: Error handling fixes during initialistion Kevin Wolf
  2021-04-22 17:02 ` [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation Kevin Wolf
  2021-04-22 17:02 ` [PATCH 2/5] vhost-user-blk: Use Error more consistently Kevin Wolf
@ 2021-04-22 17:02 ` Kevin Wolf
  2021-04-28 18:46   ` [PATCH 3/5] vhost-user-blk: Get more feature flags from vhost devicey Raphael Norwitz
  2021-04-22 17:02 ` [PATCH 4/5] virtio: Fail if iommu_platform is requested, but unsupported Kevin Wolf
  2021-04-22 17:02 ` [PATCH 5/5] vhost-user-blk: Check that num-queues is supported by backend Kevin Wolf
  4 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2021-04-22 17:02 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, den-plotnikov, mst, qemu-devel, raphael.norwitz

VIRTIO_F_RING_PACKED and VIRTIO_F_IOMMU_PLATFORM need to be supported by
the vhost device, otherwise advertising it to the guest doesn't result
in a working configuration. They are currently not supported by the
vhost-user-blk export in QEMU.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935020
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/block/vhost-user-blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 8422a29470..b6f4bb3f6f 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -47,6 +47,8 @@ static const int user_feature_bits[] = {
     VIRTIO_RING_F_INDIRECT_DESC,
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_F_NOTIFY_ON_EMPTY,
+    VIRTIO_F_RING_PACKED,
+    VIRTIO_F_IOMMU_PLATFORM,
     VHOST_INVALID_FEATURE_BIT
 };
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/5] virtio: Fail if iommu_platform is requested, but unsupported
  2021-04-22 17:02 [PATCH 0/5] vhost-user-blk: Error handling fixes during initialistion Kevin Wolf
                   ` (2 preceding siblings ...)
  2021-04-22 17:02 ` [PATCH 3/5] vhost-user-blk: Get more feature flags from vhost device Kevin Wolf
@ 2021-04-22 17:02 ` Kevin Wolf
  2021-04-28 19:24   ` Raphael Norwitz
  2021-04-22 17:02 ` [PATCH 5/5] vhost-user-blk: Check that num-queues is supported by backend Kevin Wolf
  4 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2021-04-22 17:02 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, den-plotnikov, mst, qemu-devel, raphael.norwitz

Commit 2943b53f6 (' virtio: force VIRTIO_F_IOMMU_PLATFORM') made sure
that vhost can't just reject VIRTIO_F_IOMMU_PLATFORM when it was
requested. However, just adding it back to the negotiated flags isn't
right either because it promises support to the guest that the device
actually doesn't support. One example of a vhost-user device that
doesn't have support for the flag is the vhost-user-blk export of QEMU.

Instead of successfully creating a device that doesn't work, just fail
to plug the device when it doesn't support the feature, but it was
requested. This results in much clearer error messages.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935019
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/virtio/virtio-bus.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index d6332d45c3..859978d248 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -69,6 +69,11 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
         return;
     }
 
+    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+        error_setg(errp, "iommu_platform=true is not supported by the device");
+        return;
+    }
+
     if (klass->device_plugged != NULL) {
         klass->device_plugged(qbus->parent, &local_err);
     }
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/5] vhost-user-blk: Check that num-queues is supported by backend
  2021-04-22 17:02 [PATCH 0/5] vhost-user-blk: Error handling fixes during initialistion Kevin Wolf
                   ` (3 preceding siblings ...)
  2021-04-22 17:02 ` [PATCH 4/5] virtio: Fail if iommu_platform is requested, but unsupported Kevin Wolf
@ 2021-04-22 17:02 ` Kevin Wolf
  2021-04-28 20:16   ` Raphael Norwitz
  4 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2021-04-22 17:02 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, den-plotnikov, mst, qemu-devel, raphael.norwitz

Creating a device with a number of queues that isn't supported by the
backend is pointless, the device won't work properly and the error
messages are rather confusing.

Just fail to create the device if num-queues is higher than what the
backend supports.

Since the relationship between num-queues and the number of virtqueues
depends on the specific device, this is an additional value that needs
to be initialised by the device. For convenience, allow leaving it 0 if
the check should be skipped. This makes sense for vhost-user-net where
separate vhost devices are used for the queues and custom initialisation
code is needed to perform the check.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935031
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/hw/virtio/vhost.h | 2 ++
 hw/block/vhost-user-blk.c | 1 +
 hw/virtio/vhost-user.c    | 5 +++++
 3 files changed, 8 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 4a8bc75415..21a9a52088 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -74,6 +74,8 @@ struct vhost_dev {
     int nvqs;
     /* the first virtqueue which would be used by this vhost dev */
     int vq_index;
+    /* if non-zero, minimum required value for max_queues */
+    int num_queues;
     uint64_t features;
     uint64_t acked_features;
     uint64_t backend_features;
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index b6f4bb3f6f..ac2193abef 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -324,6 +324,7 @@ static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
     }
     s->connected = true;
 
+    s->dev.num_queues = s->num_queues;
     s->dev.nvqs = s->num_queues;
     s->dev.vqs = s->vhost_vqs;
     s->dev.vq_index = 0;
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index ded0c10453..ee57abe045 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1909,6 +1909,11 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
                 return err;
             }
         }
+        if (dev->num_queues && dev->max_queues < dev->num_queues) {
+            error_report("The maximum number of queues supported by the "
+                         "backend is %" PRIu64, dev->max_queues);
+            return -EINVAL;
+        }
 
         if (virtio_has_feature(features, VIRTIO_F_IOMMU_PLATFORM) &&
                 !(virtio_has_feature(dev->protocol_features,
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation
  2021-04-22 17:02 ` [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation Kevin Wolf
@ 2021-04-23  7:17   ` Denis Plotnikov
  2021-04-28 16:52   ` Raphael Norwitz
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Denis Plotnikov @ 2021-04-23  7:17 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mst, qemu-devel, raphael.norwitz

Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>

On 22.04.2021 20:02, Kevin Wolf wrote:
> This is a partial revert of commits 77542d43149 and bc79c87bcde.
>
> Usually, an error during initialisation means that the configuration was
> wrong. Reconnecting won't make the error go away, but just turn the
> error condition into an endless loop. Avoid this and return errors
> again.
>
> Additionally, calling vhost_user_blk_disconnect() from the chardev event
> handler could result in use-after-free because none of the
> initialisation code expects that the device could just go away in the
> middle. So removing the call fixes crashes in several places.
>
> For example, using a num-queues setting that is incompatible with the
> backend would result in a crash like this (dereferencing dev->opaque,
> which is already NULL):
>
>   #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
>   #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
>   #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
>   #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
>   #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
>   #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
>   #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
>   #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
>   #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
>   #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   hw/block/vhost-user-blk.c | 54 ++++++++++-----------------------------
>   1 file changed, 13 insertions(+), 41 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index f5e9682703..e824b0a759 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -50,6 +50,8 @@ static const int user_feature_bits[] = {
>       VHOST_INVALID_FEATURE_BIT
>   };
>   
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> +
>   static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config)
>   {
>       VHostUserBlk *s = VHOST_USER_BLK(vdev);
> @@ -362,19 +364,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>       vhost_dev_cleanup(&s->dev);
>   }
>   
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> -                                 bool realized);
> -
> -static void vhost_user_blk_event_realize(void *opaque, QEMUChrEvent event)
> -{
> -    vhost_user_blk_event(opaque, event, false);
> -}
> -
> -static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
> -{
> -    vhost_user_blk_event(opaque, event, true);
> -}
> -
>   static void vhost_user_blk_chr_closed_bh(void *opaque)
>   {
>       DeviceState *dev = opaque;
> @@ -382,12 +371,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
>       VHostUserBlk *s = VHOST_USER_BLK(vdev);
>   
>       vhost_user_blk_disconnect(dev);
> -    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
> -            vhost_user_blk_event_oper, NULL, opaque, NULL, true);
> +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> +                             NULL, opaque, NULL, true);
>   }
>   
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> -                                 bool realized)
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>   {
>       DeviceState *dev = opaque;
>       VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -401,17 +389,7 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
>           }
>           break;
>       case CHR_EVENT_CLOSED:
> -        /*
> -         * Closing the connection should happen differently on device
> -         * initialization and operation stages.
> -         * On initalization, we want to re-start vhost_dev initialization
> -         * from the very beginning right away when the connection is closed,
> -         * so we clean up vhost_dev on each connection closing.
> -         * On operation, we want to postpone vhost_dev cleanup to let the
> -         * other code perform its own cleanup sequence using vhost_dev data
> -         * (e.g. vhost_dev_set_log).
> -         */
> -        if (realized && !runstate_check(RUN_STATE_SHUTDOWN)) {
> +        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
>               /*
>                * A close event may happen during a read/write, but vhost
>                * code assumes the vhost_dev remains setup, so delay the
> @@ -431,8 +409,6 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
>                * knowing its type (in this case vhost-user).
>                */
>               s->dev.started = false;
> -        } else {
> -            vhost_user_blk_disconnect(dev);
>           }
>           break;
>       case CHR_EVENT_BREAK:
> @@ -490,31 +466,27 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>       s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
>       s->connected = false;
>   
> -    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
> -                             vhost_user_blk_event_realize, NULL, (void *)dev,
> -                             NULL, true);
> -
> -reconnect:
>       if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
>           error_report_err(err);
>           goto virtio_err;
>       }
>   
> -    /* check whether vhost_user_blk_connect() failed or not */
> -    if (!s->connected) {
> -        goto reconnect;
> +    if (vhost_user_blk_connect(dev) < 0) {
> +        qemu_chr_fe_disconnect(&s->chardev);
> +        goto virtio_err;
>       }
> +    assert(s->connected);
>   
>       ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
>                                  sizeof(struct virtio_blk_config));
>       if (ret < 0) {
>           error_report("vhost-user-blk: get block config failed");
> -        goto reconnect;
> +        goto virtio_err;
>       }
>   
> -    /* we're fully initialized, now we can operate, so change the handler */
> +    /* we're fully initialized, now we can operate, so add the handler */
>       qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
> -                             vhost_user_blk_event_oper, NULL, (void *)dev,
> +                             vhost_user_blk_event, NULL, (void *)dev,
>                                NULL, true);
>       return;
>   


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation
  2021-04-22 17:02 ` [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation Kevin Wolf
  2021-04-23  7:17   ` Denis Plotnikov
@ 2021-04-28 16:52   ` Raphael Norwitz
  2021-04-28 17:31     ` Kevin Wolf
  2021-04-28 19:37   ` Philippe Mathieu-Daudé
  2021-04-29 12:48   ` Kevin Wolf
  3 siblings, 1 reply; 21+ messages in thread
From: Raphael Norwitz @ 2021-04-28 16:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den-plotnikov, mst, qemu-devel, qemu-block, Raphael Norwitz

Given what you've shown with the use-after-free, I agree the changes
need to be reverted. I'm a little uneasy about removing the reconnect
logic from the device realization completely though.

On Thu, Apr 22, 2021 at 07:02:17PM +0200, Kevin Wolf wrote:
> This is a partial revert of commits 77542d43149 and bc79c87bcde.
> 
> Usually, an error during initialisation means that the configuration was
> wrong. Reconnecting won't make the error go away, but just turn the
> error condition into an endless loop. Avoid this and return errors
> again.
> 

Is that nessesarily true? As I understand it the main usecases for
device reconnect are to allow a device backend to be restarted after a
failure or to allow the backend to be upgraded without restarting the
guest. I agree - misconfiguration could be a common cause of a device
backend crashing at realize time, but couldn't there be others? Maybe
transient memory pressure?

Especially in the case where one process is connecting to many different
vhost-user-blk instances, I could imagine power-ons and incoming
migrations racing with backend restarts quite frequently. Should
these cases cause failures?

We can still hit the infinite looping case you describe post-realize.
Why should we treat pre-realize differently?

> Additionally, calling vhost_user_blk_disconnect() from the chardev event
> handler could result in use-after-free because none of the
> initialisation code expects that the device could just go away in the
> middle. So removing the call fixes crashes in several places.
> 
> For example, using a num-queues setting that is incompatible with the
> backend would result in a crash like this (dereferencing dev->opaque,
> which is already NULL):
> 
>  #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
>  #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
>  #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
>  #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
>  #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
>  #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
>  #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
>  #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
>  #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
>  #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  hw/block/vhost-user-blk.c | 54 ++++++++++-----------------------------
>  1 file changed, 13 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index f5e9682703..e824b0a759 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -50,6 +50,8 @@ static const int user_feature_bits[] = {
>      VHOST_INVALID_FEATURE_BIT
>  };
>  
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> +
>  static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config)
>  {
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> @@ -362,19 +364,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>      vhost_dev_cleanup(&s->dev);
>  }
>  
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> -                                 bool realized);
> -
> -static void vhost_user_blk_event_realize(void *opaque, QEMUChrEvent event)
> -{
> -    vhost_user_blk_event(opaque, event, false);
> -}
> -
> -static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
> -{
> -    vhost_user_blk_event(opaque, event, true);
> -}
> -
>  static void vhost_user_blk_chr_closed_bh(void *opaque)
>  {
>      DeviceState *dev = opaque;
> @@ -382,12 +371,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>  
>      vhost_user_blk_disconnect(dev);
> -    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
> -            vhost_user_blk_event_oper, NULL, opaque, NULL, true);
> +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> +                             NULL, opaque, NULL, true);
>  }
>  
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> -                                 bool realized)
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>  {
>      DeviceState *dev = opaque;
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -401,17 +389,7 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
>          }
>          break;
>      case CHR_EVENT_CLOSED:
> -        /*
> -         * Closing the connection should happen differently on device
> -         * initialization and operation stages.
> -         * On initalization, we want to re-start vhost_dev initialization
> -         * from the very beginning right away when the connection is closed,
> -         * so we clean up vhost_dev on each connection closing.
> -         * On operation, we want to postpone vhost_dev cleanup to let the
> -         * other code perform its own cleanup sequence using vhost_dev data
> -         * (e.g. vhost_dev_set_log).
> -         */
> -        if (realized && !runstate_check(RUN_STATE_SHUTDOWN)) {
> +        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
>              /*
>               * A close event may happen during a read/write, but vhost
>               * code assumes the vhost_dev remains setup, so delay the
> @@ -431,8 +409,6 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
>               * knowing its type (in this case vhost-user).
>               */
>              s->dev.started = false;
> -        } else {
> -            vhost_user_blk_disconnect(dev);
>          }
>          break;
>      case CHR_EVENT_BREAK:
> @@ -490,31 +466,27 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>      s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
>      s->connected = false;
>  
> -    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
> -                             vhost_user_blk_event_realize, NULL, (void *)dev,
> -                             NULL, true);
> -
> -reconnect:
>      if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
>          error_report_err(err);
>          goto virtio_err;
>      }
>  
> -    /* check whether vhost_user_blk_connect() failed or not */
> -    if (!s->connected) {
> -        goto reconnect;
> +    if (vhost_user_blk_connect(dev) < 0) {
> +        qemu_chr_fe_disconnect(&s->chardev);
> +        goto virtio_err;
>      }
> +    assert(s->connected);

Maybe a good compromise here would be to retry some small number of
times (or even just once) so that cases like daemon upgrades and
recoverable crashes racing with power-ons and incoming migrations
don't result in failures? 

As a more general solution, we could have a user defined parameter to
specify a number of repeated connection failures to allow both pre and
post realize before bringing QEMU down. Thoughts?

>  
>      ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
>                                 sizeof(struct virtio_blk_config));
>      if (ret < 0) {
>          error_report("vhost-user-blk: get block config failed");
> -        goto reconnect;
> +        goto virtio_err;
>      }
>  
> -    /* we're fully initialized, now we can operate, so change the handler */
> +    /* we're fully initialized, now we can operate, so add the handler */
>      qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
> -                             vhost_user_blk_event_oper, NULL, (void *)dev,
> +                             vhost_user_blk_event, NULL, (void *)dev,
>                               NULL, true);
>      return;
>  
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation
  2021-04-28 16:52   ` Raphael Norwitz
@ 2021-04-28 17:31     ` Kevin Wolf
  2021-04-28 18:22       ` Raphael Norwitz
  0 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2021-04-28 17:31 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: den-plotnikov, qemu-devel, qemu-block, mst

Am 28.04.2021 um 18:52 hat Raphael Norwitz geschrieben:
> Given what you've shown with the use-after-free, I agree the changes
> need to be reverted. I'm a little uneasy about removing the reconnect
> logic from the device realization completely though.
> 
> On Thu, Apr 22, 2021 at 07:02:17PM +0200, Kevin Wolf wrote:
> > This is a partial revert of commits 77542d43149 and bc79c87bcde.
> > 
> > Usually, an error during initialisation means that the configuration was
> > wrong. Reconnecting won't make the error go away, but just turn the
> > error condition into an endless loop. Avoid this and return errors
> > again.
> > 
> 
> Is that nessesarily true? As I understand it the main usecases for
> device reconnect are to allow a device backend to be restarted after a
> failure or to allow the backend to be upgraded without restarting the
> guest. I agree - misconfiguration could be a common cause of a device
> backend crashing at realize time, but couldn't there be others? Maybe
> transient memory pressure?
> 
> Especially in the case where one process is connecting to many different
> vhost-user-blk instances, I could imagine power-ons and incoming
> migrations racing with backend restarts quite frequently. Should
> these cases cause failures?
> 
> We can still hit the infinite looping case you describe post-realize.
> Why should we treat pre-realize differently?

I think there is one main difference between realize() and later
operation, which is that we can actually deliver an error to the user
during realize(). When we're just starting QEMU and processing the CLI
arguments, failure is very obvious, and in the context of QMP
device-add, the client is actively waiting for a result, too.

Later on, there is no good way to communicate an error (writes to stderr
just end up in some logfile at best, QAPI events can be missed), and
even if there were, we would have to do something with the guest until
the user/management application actually reacts to the error. The latter
is not a problem during realize() because the device wasn't even plugged
in yet.

So I think there are good reasons why it could make sense to distinguish
initialisation and operation.

> > Additionally, calling vhost_user_blk_disconnect() from the chardev event
> > handler could result in use-after-free because none of the
> > initialisation code expects that the device could just go away in the
> > middle. So removing the call fixes crashes in several places.
> > 
> > For example, using a num-queues setting that is incompatible with the
> > backend would result in a crash like this (dereferencing dev->opaque,
> > which is already NULL):
> > 
> >  #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
> >  #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
> >  #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> >  #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
> >  #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
> >  #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
> >  #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
> >  #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
> >  #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
> >  #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  hw/block/vhost-user-blk.c | 54 ++++++++++-----------------------------
> >  1 file changed, 13 insertions(+), 41 deletions(-)
> > 
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index f5e9682703..e824b0a759 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -50,6 +50,8 @@ static const int user_feature_bits[] = {
> >      VHOST_INVALID_FEATURE_BIT
> >  };
> >  
> > +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> > +
> >  static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > @@ -362,19 +364,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
> >      vhost_dev_cleanup(&s->dev);
> >  }
> >  
> > -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> > -                                 bool realized);
> > -
> > -static void vhost_user_blk_event_realize(void *opaque, QEMUChrEvent event)
> > -{
> > -    vhost_user_blk_event(opaque, event, false);
> > -}
> > -
> > -static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
> > -{
> > -    vhost_user_blk_event(opaque, event, true);
> > -}
> > -
> >  static void vhost_user_blk_chr_closed_bh(void *opaque)
> >  {
> >      DeviceState *dev = opaque;
> > @@ -382,12 +371,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >  
> >      vhost_user_blk_disconnect(dev);
> > -    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
> > -            vhost_user_blk_event_oper, NULL, opaque, NULL, true);
> > +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> > +                             NULL, opaque, NULL, true);
> >  }
> >  
> > -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> > -                                 bool realized)
> > +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> >  {
> >      DeviceState *dev = opaque;
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > @@ -401,17 +389,7 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> >          }
> >          break;
> >      case CHR_EVENT_CLOSED:
> > -        /*
> > -         * Closing the connection should happen differently on device
> > -         * initialization and operation stages.
> > -         * On initalization, we want to re-start vhost_dev initialization
> > -         * from the very beginning right away when the connection is closed,
> > -         * so we clean up vhost_dev on each connection closing.
> > -         * On operation, we want to postpone vhost_dev cleanup to let the
> > -         * other code perform its own cleanup sequence using vhost_dev data
> > -         * (e.g. vhost_dev_set_log).
> > -         */
> > -        if (realized && !runstate_check(RUN_STATE_SHUTDOWN)) {
> > +        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> >              /*
> >               * A close event may happen during a read/write, but vhost
> >               * code assumes the vhost_dev remains setup, so delay the
> > @@ -431,8 +409,6 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> >               * knowing its type (in this case vhost-user).
> >               */
> >              s->dev.started = false;
> > -        } else {
> > -            vhost_user_blk_disconnect(dev);
> >          }
> >          break;
> >      case CHR_EVENT_BREAK:
> > @@ -490,31 +466,27 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >      s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
> >      s->connected = false;
> >  
> > -    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
> > -                             vhost_user_blk_event_realize, NULL, (void *)dev,
> > -                             NULL, true);
> > -
> > -reconnect:
> >      if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> >          error_report_err(err);
> >          goto virtio_err;
> >      }
> >  
> > -    /* check whether vhost_user_blk_connect() failed or not */
> > -    if (!s->connected) {
> > -        goto reconnect;
> > +    if (vhost_user_blk_connect(dev) < 0) {
> > +        qemu_chr_fe_disconnect(&s->chardev);
> > +        goto virtio_err;
> >      }
> > +    assert(s->connected);
> 
> Maybe a good compromise here would be to retry some small number of
> times (or even just once) so that cases like daemon upgrades and
> recoverable crashes racing with power-ons and incoming migrations
> don't result in failures?
> 
> As a more general solution, we could have a user defined parameter to
> specify a number of repeated connection failures to allow both pre and
> post realize before bringing QEMU down. Thoughts?

Retrying once or even a small number of times sounds reasonable enough.
At first I thought it wouldn't help because restarting the daemon might
take some time, but with qemu_chr_fe_wait_connected() we already wait
until we can successfully connect in each iteration, and we would only
retry for errors that happen afterwards.

But I think what we really want to do before retrying is distinguishing
errors that are actually related to the connection itself from errors
that relate to the content of the communication (i.e. invalid requests
or configuration).
In fact, I think the error I hit wasn't even produced on the remote
side, it came from QEMU itself. Making vhost_user_blk_connect() return
-EAGAIN in the right cases and reconnecting only there sounds much
better than just blindly retrying.

Of course, adjusting error reporting so that we can distinguish these
cases will probably touch much more places than would be appropriate for
this patch, so I'd suggest that we indeed just revert the reconnection
during realize() in this series, but then try to add some more selective
retry logic back on top of it (without using the event handler, which
only made everything more complicated in a function that waits
synchronously anyway).

Kevin



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] vhost-user-blk: Use Error more consistently
  2021-04-22 17:02 ` [PATCH 2/5] vhost-user-blk: Use Error more consistently Kevin Wolf
@ 2021-04-28 18:08   ` Raphael Norwitz
  2021-04-29  9:26     ` Kevin Wolf
  0 siblings, 1 reply; 21+ messages in thread
From: Raphael Norwitz @ 2021-04-28 18:08 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den-plotnikov, mst, qemu-devel, qemu-block, Raphael Norwitz

Code looks ok - question about the commit message.

Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

On Thu, Apr 22, 2021 at 07:02:18PM +0200, Kevin Wolf wrote:
> Now that vhost_user_blk_connect() is not called from an event handler
> any more, but directly from vhost_user_blk_device_realize(), we don't
> have to resort to error_report() any more, but can use Error again.

vhost_user_blk_connect() is still called by vhost_user_blk_event() which
is registered as an event handler. I don't understand your point around
us having had to use error_report() before, but not anymore. Can you
clarify?

> 
> With Error, the callers are responsible for adding context if necessary
> (such as the "-device" option the error refers to). Additional prefixes
> are redundant and better omitted.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  hw/block/vhost-user-blk.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index e824b0a759..8422a29470 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -311,7 +311,7 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
>      vhost_dev_free_inflight(s->inflight);
>  }
>  
> -static int vhost_user_blk_connect(DeviceState *dev)
> +static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> @@ -331,8 +331,7 @@ static int vhost_user_blk_connect(DeviceState *dev)
>  
>      ret = vhost_dev_init(&s->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
>      if (ret < 0) {
> -        error_report("vhost-user-blk: vhost initialization failed: %s",
> -                     strerror(-ret));
> +        error_setg_errno(errp, -ret, "vhost initialization failed");
>          return ret;
>      }
>  
> @@ -340,8 +339,7 @@ static int vhost_user_blk_connect(DeviceState *dev)
>      if (virtio_device_started(vdev, vdev->status)) {
>          ret = vhost_user_blk_start(vdev);
>          if (ret < 0) {
> -            error_report("vhost-user-blk: vhost start failed: %s",
> -                         strerror(-ret));
> +            error_setg_errno(errp, -ret, "vhost start failed");
>              return ret;
>          }
>      }
> @@ -380,10 +378,12 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>      DeviceState *dev = opaque;
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +    Error *local_err = NULL;
>  
>      switch (event) {
>      case CHR_EVENT_OPENED:
> -        if (vhost_user_blk_connect(dev) < 0) {
> +        if (vhost_user_blk_connect(dev, &local_err) < 0) {
> +            error_report_err(local_err);
>              qemu_chr_fe_disconnect(&s->chardev);
>              return;
>          }
> @@ -427,7 +427,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>      int i, ret;
>  
>      if (!s->chardev.chr) {
> -        error_setg(errp, "vhost-user-blk: chardev is mandatory");
> +        error_setg(errp, "chardev is mandatory");
>          return;
>      }
>  
> @@ -435,16 +435,16 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>          s->num_queues = 1;
>      }
>      if (!s->num_queues || s->num_queues > VIRTIO_QUEUE_MAX) {
> -        error_setg(errp, "vhost-user-blk: invalid number of IO queues");
> +        error_setg(errp, "invalid number of IO queues");
>          return;
>      }
>  
>      if (!s->queue_size) {
> -        error_setg(errp, "vhost-user-blk: queue size must be non-zero");
> +        error_setg(errp, "queue size must be non-zero");
>          return;
>      }
>      if (s->queue_size > VIRTQUEUE_MAX_SIZE) {
> -        error_setg(errp, "vhost-user-blk: queue size must not exceed %d",
> +        error_setg(errp, "queue size must not exceed %d",
>                     VIRTQUEUE_MAX_SIZE);
>          return;
>      }
> @@ -471,7 +471,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>          goto virtio_err;
>      }
>  
> -    if (vhost_user_blk_connect(dev) < 0) {
> +    if (vhost_user_blk_connect(dev, errp) < 0) {
>          qemu_chr_fe_disconnect(&s->chardev);
>          goto virtio_err;
>      }
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation
  2021-04-28 17:31     ` Kevin Wolf
@ 2021-04-28 18:22       ` Raphael Norwitz
  0 siblings, 0 replies; 21+ messages in thread
From: Raphael Norwitz @ 2021-04-28 18:22 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den-plotnikov, mst, qemu-devel, qemu-block, Raphael Norwitz

On Wed, Apr 28, 2021 at 07:31:13PM +0200, Kevin Wolf wrote:
> Am 28.04.2021 um 18:52 hat Raphael Norwitz geschrieben:
> > Given what you've shown with the use-after-free, I agree the changes
> > need to be reverted. I'm a little uneasy about removing the reconnect
> > logic from the device realization completely though.
> > 
> > On Thu, Apr 22, 2021 at 07:02:17PM +0200, Kevin Wolf wrote:
> > > This is a partial revert of commits 77542d43149 and bc79c87bcde.
> > > 
> > > Usually, an error during initialisation means that the configuration was
> > > wrong. Reconnecting won't make the error go away, but just turn the
> > > error condition into an endless loop. Avoid this and return errors
> > > again.
> > > 
> > 
> > Is that nessesarily true? As I understand it the main usecases for
> > device reconnect are to allow a device backend to be restarted after a
> > failure or to allow the backend to be upgraded without restarting the
> > guest. I agree - misconfiguration could be a common cause of a device
> > backend crashing at realize time, but couldn't there be others? Maybe
> > transient memory pressure?
> > 
> > Especially in the case where one process is connecting to many different
> > vhost-user-blk instances, I could imagine power-ons and incoming
> > migrations racing with backend restarts quite frequently. Should
> > these cases cause failures?
> > 
> > We can still hit the infinite looping case you describe post-realize.
> > Why should we treat pre-realize differently?
> 
> I think there is one main difference between realize() and later
> operation, which is that we can actually deliver an error to the user
> during realize(). When we're just starting QEMU and processing the CLI
> arguments, failure is very obvious, and in the context of QMP
> device-add, the client is actively waiting for a result, too.
> 
> Later on, there is no good way to communicate an error (writes to stderr
> just end up in some logfile at best, QAPI events can be missed), and
> even if there were, we would have to do something with the guest until
> the user/management application actually reacts to the error. The latter
> is not a problem during realize() because the device wasn't even plugged
> in yet.
> 
> So I think there are good reasons why it could make sense to distinguish
> initialisation and operation.
>

Fair enough. I'm happy in this case, provided we remain resiliant
against one-off daemon restarts during realize.

> > > Additionally, calling vhost_user_blk_disconnect() from the chardev event
> > > handler could result in use-after-free because none of the
> > > initialisation code expects that the device could just go away in the
> > > middle. So removing the call fixes crashes in several places.
> > > 
> > > For example, using a num-queues setting that is incompatible with the
> > > backend would result in a crash like this (dereferencing dev->opaque,
> > > which is already NULL):
> > > 
> > >  #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
> > >  #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
> > >  #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> > >  #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
> > >  #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
> > >  #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
> > >  #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
> > >  #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
> > >  #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
> > >  #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
> > > 
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > ---
> > >  hw/block/vhost-user-blk.c | 54 ++++++++++-----------------------------
> > >  1 file changed, 13 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > > index f5e9682703..e824b0a759 100644
> > > --- a/hw/block/vhost-user-blk.c
> > > +++ b/hw/block/vhost-user-blk.c
> > > @@ -50,6 +50,8 @@ static const int user_feature_bits[] = {
> > >      VHOST_INVALID_FEATURE_BIT
> > >  };
> > >  
> > > +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> > > +
> > >  static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> > >  {
> > >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > @@ -362,19 +364,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
> > >      vhost_dev_cleanup(&s->dev);
> > >  }
> > >  
> > > -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> > > -                                 bool realized);
> > > -
> > > -static void vhost_user_blk_event_realize(void *opaque, QEMUChrEvent event)
> > > -{
> > > -    vhost_user_blk_event(opaque, event, false);
> > > -}
> > > -
> > > -static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
> > > -{
> > > -    vhost_user_blk_event(opaque, event, true);
> > > -}
> > > -
> > >  static void vhost_user_blk_chr_closed_bh(void *opaque)
> > >  {
> > >      DeviceState *dev = opaque;
> > > @@ -382,12 +371,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
> > >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > >  
> > >      vhost_user_blk_disconnect(dev);
> > > -    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
> > > -            vhost_user_blk_event_oper, NULL, opaque, NULL, true);
> > > +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> > > +                             NULL, opaque, NULL, true);
> > >  }
> > >  
> > > -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> > > -                                 bool realized)
> > > +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> > >  {
> > >      DeviceState *dev = opaque;
> > >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > @@ -401,17 +389,7 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> > >          }
> > >          break;
> > >      case CHR_EVENT_CLOSED:
> > > -        /*
> > > -         * Closing the connection should happen differently on device
> > > -         * initialization and operation stages.
> > > -         * On initalization, we want to re-start vhost_dev initialization
> > > -         * from the very beginning right away when the connection is closed,
> > > -         * so we clean up vhost_dev on each connection closing.
> > > -         * On operation, we want to postpone vhost_dev cleanup to let the
> > > -         * other code perform its own cleanup sequence using vhost_dev data
> > > -         * (e.g. vhost_dev_set_log).
> > > -         */
> > > -        if (realized && !runstate_check(RUN_STATE_SHUTDOWN)) {
> > > +        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> > >              /*
> > >               * A close event may happen during a read/write, but vhost
> > >               * code assumes the vhost_dev remains setup, so delay the
> > > @@ -431,8 +409,6 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> > >               * knowing its type (in this case vhost-user).
> > >               */
> > >              s->dev.started = false;
> > > -        } else {
> > > -            vhost_user_blk_disconnect(dev);
> > >          }
> > >          break;
> > >      case CHR_EVENT_BREAK:
> > > @@ -490,31 +466,27 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> > >      s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
> > >      s->connected = false;
> > >  
> > > -    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
> > > -                             vhost_user_blk_event_realize, NULL, (void *)dev,
> > > -                             NULL, true);
> > > -
> > > -reconnect:
> > >      if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> > >          error_report_err(err);
> > >          goto virtio_err;
> > >      }
> > >  
> > > -    /* check whether vhost_user_blk_connect() failed or not */
> > > -    if (!s->connected) {
> > > -        goto reconnect;
> > > +    if (vhost_user_blk_connect(dev) < 0) {
> > > +        qemu_chr_fe_disconnect(&s->chardev);
> > > +        goto virtio_err;
> > >      }
> > > +    assert(s->connected);
> > 
> > Maybe a good compromise here would be to retry some small number of
> > times (or even just once) so that cases like daemon upgrades and
> > recoverable crashes racing with power-ons and incoming migrations
> > don't result in failures?
> > 
> > As a more general solution, we could have a user defined parameter to
> > specify a number of repeated connection failures to allow both pre and
> > post realize before bringing QEMU down. Thoughts?
> 
> Retrying once or even a small number of times sounds reasonable enough.
> At first I thought it wouldn't help because restarting the daemon might
> take some time, but with qemu_chr_fe_wait_connected() we already wait
> until we can successfully connect in each iteration, and we would only
> retry for errors that happen afterwards.
> 
> But I think what we really want to do before retrying is distinguishing
> errors that are actually related to the connection itself from errors
> that relate to the content of the communication (i.e. invalid requests
> or configuration).
> In fact, I think the error I hit wasn't even produced on the remote
> side, it came from QEMU itself. Making vhost_user_blk_connect() return
> -EAGAIN in the right cases and reconnecting only there sounds much
> better than just blindly retrying.
> 

Agreed - we should definitely identify any invalid requests or
configuration and treat them differently to legitimate cases.

> Of course, adjusting error reporting so that we can distinguish these
> cases will probably touch much more places than would be appropriate for
> this patch, so I'd suggest that we indeed just revert the reconnection
> during realize() in this series, but then try to add some more selective
> retry logic back on top of it (without using the event handler, which
> only made everything more complicated in a function that waits
> synchronously anyway).

Sounds good to me. I don't see a good reason to use an event handler in
realize since it is synchronous.

> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] vhost-user-blk: Get more feature flags from vhost devicey
  2021-04-22 17:02 ` [PATCH 3/5] vhost-user-blk: Get more feature flags from vhost device Kevin Wolf
@ 2021-04-28 18:46   ` Raphael Norwitz
  0 siblings, 0 replies; 21+ messages in thread
From: Raphael Norwitz @ 2021-04-28 18:46 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den-plotnikov, mst, qemu-devel, qemu-block, Raphael Norwitz

Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

On Thu, Apr 22, 2021 at 07:02:19PM +0200, Kevin Wolf wrote:
> VIRTIO_F_RING_PACKED and VIRTIO_F_IOMMU_PLATFORM need to be supported by
> the vhost device, otherwise advertising it to the guest doesn't result
> in a working configuration. They are currently not supported by the
> vhost-user-blk export in QEMU.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935020
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  hw/block/vhost-user-blk.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 8422a29470..b6f4bb3f6f 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -47,6 +47,8 @@ static const int user_feature_bits[] = {
>      VIRTIO_RING_F_INDIRECT_DESC,
>      VIRTIO_RING_F_EVENT_IDX,
>      VIRTIO_F_NOTIFY_ON_EMPTY,
> +    VIRTIO_F_RING_PACKED,
> +    VIRTIO_F_IOMMU_PLATFORM,
>      VHOST_INVALID_FEATURE_BIT
>  };
>  
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] virtio: Fail if iommu_platform is requested, but unsupported
  2021-04-22 17:02 ` [PATCH 4/5] virtio: Fail if iommu_platform is requested, but unsupported Kevin Wolf
@ 2021-04-28 19:24   ` Raphael Norwitz
  2021-04-29  9:34     ` Kevin Wolf
  0 siblings, 1 reply; 21+ messages in thread
From: Raphael Norwitz @ 2021-04-28 19:24 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den-plotnikov, mst, qemu-devel, qemu-block, Raphael Norwitz

On Thu, Apr 22, 2021 at 07:02:20PM +0200, Kevin Wolf wrote:
> Commit 2943b53f6 (' virtio: force VIRTIO_F_IOMMU_PLATFORM') made sure
> that vhost can't just reject VIRTIO_F_IOMMU_PLATFORM when it was
> requested. However, just adding it back to the negotiated flags isn't
> right either because it promises support to the guest that the device
> actually doesn't support. One example of a vhost-user device that
> doesn't have support for the flag is the vhost-user-blk export of QEMU.
> 
> Instead of successfully creating a device that doesn't work, just fail
> to plug the device when it doesn't support the feature, but it was
> requested. This results in much clearer error messages.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935031
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  hw/virtio/virtio-bus.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index d6332d45c3..859978d248 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -69,6 +69,11 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>          return;
>      }
>

Can you explain this check a little more?

Above we have:
bool has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);

and then we get the host features from the bckend:
vdev->host_features = vdc->get_features(vdev, vdev->host_features

So as is this is catching the case where vdev->host_features had
VIRTIO_F_IOMMU_PLATFORM set before (by default?), but doesn't now that
the features have been retrieved? 

Why not just:
    if (!virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {

> +    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> +        error_setg(errp, "iommu_platform=true is not supported by the device");
> +        return;
> +    }
> +
>      if (klass->device_plugged != NULL) {
>          klass->device_plugged(qbus->parent, &local_err);
>      }
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation
  2021-04-22 17:02 ` [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation Kevin Wolf
  2021-04-23  7:17   ` Denis Plotnikov
  2021-04-28 16:52   ` Raphael Norwitz
@ 2021-04-28 19:37   ` Philippe Mathieu-Daudé
  2021-04-29 12:48   ` Kevin Wolf
  3 siblings, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-04-28 19:37 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: den-plotnikov, raphael.norwitz, qemu-devel, mst

On 4/22/21 7:02 PM, Kevin Wolf wrote:
> This is a partial revert of commits 77542d43149 and bc79c87bcde.
> 
> Usually, an error during initialisation means that the configuration was
> wrong. Reconnecting won't make the error go away, but just turn the
> error condition into an endless loop. Avoid this and return errors
> again.
> 
> Additionally, calling vhost_user_blk_disconnect() from the chardev event
> handler could result in use-after-free because none of the
> initialisation code expects that the device could just go away in the

TIL initialisation wording.

> middle. So removing the call fixes crashes in several places.
> 
> For example, using a num-queues setting that is incompatible with the
> backend would result in a crash like this (dereferencing dev->opaque,
> which is already NULL):
> 
>  #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
>  #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
>  #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
>  #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
>  #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
>  #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
>  #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
>  #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
>  #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
>  #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  hw/block/vhost-user-blk.c | 54 ++++++++++-----------------------------
>  1 file changed, 13 insertions(+), 41 deletions(-)



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] vhost-user-blk: Check that num-queues is supported by backend
  2021-04-22 17:02 ` [PATCH 5/5] vhost-user-blk: Check that num-queues is supported by backend Kevin Wolf
@ 2021-04-28 20:16   ` Raphael Norwitz
  0 siblings, 0 replies; 21+ messages in thread
From: Raphael Norwitz @ 2021-04-28 20:16 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den-plotnikov, mst, qemu-devel, qemu-block, Raphael Norwitz

Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

On Thu, Apr 22, 2021 at 07:02:21PM +0200, Kevin Wolf wrote:
> Creating a device with a number of queues that isn't supported by the
> backend is pointless, the device won't work properly and the error
> messages are rather confusing.
> 
> Just fail to create the device if num-queues is higher than what the
> backend supports.
> 
> Since the relationship between num-queues and the number of virtqueues
> depends on the specific device, this is an additional value that needs
> to be initialised by the device. For convenience, allow leaving it 0 if
> the check should be skipped. This makes sense for vhost-user-net where
> separate vhost devices are used for the queues and custom initialisation
> code is needed to perform the check.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935031
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/hw/virtio/vhost.h | 2 ++
>  hw/block/vhost-user-blk.c | 1 +
>  hw/virtio/vhost-user.c    | 5 +++++
>  3 files changed, 8 insertions(+)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 4a8bc75415..21a9a52088 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -74,6 +74,8 @@ struct vhost_dev {
>      int nvqs;
>      /* the first virtqueue which would be used by this vhost dev */
>      int vq_index;
> +    /* if non-zero, minimum required value for max_queues */
> +    int num_queues;
>      uint64_t features;
>      uint64_t acked_features;
>      uint64_t backend_features;
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index b6f4bb3f6f..ac2193abef 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -324,6 +324,7 @@ static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
>      }
>      s->connected = true;
>  
> +    s->dev.num_queues = s->num_queues;
>      s->dev.nvqs = s->num_queues;
>      s->dev.vqs = s->vhost_vqs;
>      s->dev.vq_index = 0;
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index ded0c10453..ee57abe045 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1909,6 +1909,11 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>                  return err;
>              }
>          }
> +        if (dev->num_queues && dev->max_queues < dev->num_queues) {
> +            error_report("The maximum number of queues supported by the "
> +                         "backend is %" PRIu64, dev->max_queues);
> +            return -EINVAL;
> +        }
>  
>          if (virtio_has_feature(features, VIRTIO_F_IOMMU_PLATFORM) &&
>                  !(virtio_has_feature(dev->protocol_features,
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] vhost-user-blk: Use Error more consistently
  2021-04-28 18:08   ` Raphael Norwitz
@ 2021-04-29  9:26     ` Kevin Wolf
  2021-04-29 12:56       ` Raphael Norwitz
  0 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2021-04-29  9:26 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: den-plotnikov, qemu-devel, qemu-block, mst

Am 28.04.2021 um 20:08 hat Raphael Norwitz geschrieben:
> Code looks ok - question about the commit message.
> 
> Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> 
> On Thu, Apr 22, 2021 at 07:02:18PM +0200, Kevin Wolf wrote:
> > Now that vhost_user_blk_connect() is not called from an event handler
> > any more, but directly from vhost_user_blk_device_realize(), we don't
> > have to resort to error_report() any more, but can use Error again.
> 
> vhost_user_blk_connect() is still called by vhost_user_blk_event() which
> is registered as an event handler. I don't understand your point around
> us having had to use error_report() before, but not anymore. Can you
> clarify?

What I meant is that vhost_user_blk_event() can't really make use of an
Error object other than passing it to error_report_err(), which has the
same result as directly using error_report().

With the new code where vhost_user_blk_device_realize() calls the
function directly, we can actually return the error to its caller so
that it ends up in the QMP result or the command line error message.

The result is still not great because vhost_user_blk_connect() doesn't
know the original error message. We'd have to add Error to
vhost_dev_init() and the functions that it calls to get the real error
messages, but at least it's a first step in the right direction.

We already figured that we need to change error reporting so we can know
whether we should retry, so I guess this can be solved at the same time.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] virtio: Fail if iommu_platform is requested, but unsupported
  2021-04-28 19:24   ` Raphael Norwitz
@ 2021-04-29  9:34     ` Kevin Wolf
  2021-04-29 12:48       ` Raphael Norwitz
  0 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2021-04-29  9:34 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: den-plotnikov, qemu-devel, qemu-block, mst

Am 28.04.2021 um 21:24 hat Raphael Norwitz geschrieben:
> On Thu, Apr 22, 2021 at 07:02:20PM +0200, Kevin Wolf wrote:
> > Commit 2943b53f6 (' virtio: force VIRTIO_F_IOMMU_PLATFORM') made sure
> > that vhost can't just reject VIRTIO_F_IOMMU_PLATFORM when it was
> > requested. However, just adding it back to the negotiated flags isn't
> > right either because it promises support to the guest that the device
> > actually doesn't support. One example of a vhost-user device that
> > doesn't have support for the flag is the vhost-user-blk export of QEMU.
> > 
> > Instead of successfully creating a device that doesn't work, just fail
> > to plug the device when it doesn't support the feature, but it was
> > requested. This results in much clearer error messages.
> > 
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935031
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  hw/virtio/virtio-bus.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> > index d6332d45c3..859978d248 100644
> > --- a/hw/virtio/virtio-bus.c
> > +++ b/hw/virtio/virtio-bus.c
> > @@ -69,6 +69,11 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> >          return;
> >      }
> >
> 
> Can you explain this check a little more?
> 
> Above we have:
> bool has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);

If I underdstand the code correctly, at this point this is still the
unchanged value of the iommu_platform=on|off qdev property as given by
the user.

> and then we get the host features from the bckend:
> vdev->host_features = vdc->get_features(vdev, vdev->host_features

Yes, and now a flag is only set if the user had requested it and the
backend also supports it.

> So as is this is catching the case where vdev->host_features had
> VIRTIO_F_IOMMU_PLATFORM set before (by default?), but doesn't now that
> the features have been retrieved? 
> 
> Why not just:
>     if (!virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {

We don't want to fail if the user hadn't even requested the feature, but
just if it was requested, but could not be provided.

> > +    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> > +        error_setg(errp, "iommu_platform=true is not supported by the device");
> > +        return;
> > +    }
> > +
> >      if (klass->device_plugged != NULL) {
> >          klass->device_plugged(qbus->parent, &local_err);
> >      }

Kevin



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] virtio: Fail if iommu_platform is requested, but unsupported
  2021-04-29  9:34     ` Kevin Wolf
@ 2021-04-29 12:48       ` Raphael Norwitz
  0 siblings, 0 replies; 21+ messages in thread
From: Raphael Norwitz @ 2021-04-29 12:48 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den-plotnikov, mst, qemu-devel, qemu-block, Raphael Norwitz

Got it - thanks for the clarification.

Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

On Thu, Apr 29, 2021 at 11:34:11AM +0200, Kevin Wolf wrote:
> Am 28.04.2021 um 21:24 hat Raphael Norwitz geschrieben:
> > On Thu, Apr 22, 2021 at 07:02:20PM +0200, Kevin Wolf wrote:
> > > Commit 2943b53f6 (' virtio: force VIRTIO_F_IOMMU_PLATFORM') made sure
> > > that vhost can't just reject VIRTIO_F_IOMMU_PLATFORM when it was
> > > requested. However, just adding it back to the negotiated flags isn't
> > > right either because it promises support to the guest that the device
> > > actually doesn't support. One example of a vhost-user device that
> > > doesn't have support for the flag is the vhost-user-blk export of QEMU.
> > > 
> > > Instead of successfully creating a device that doesn't work, just fail
> > > to plug the device when it doesn't support the feature, but it was
> > > requested. This results in much clearer error messages.
> > > 
> > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935019
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > ---
> > >  hw/virtio/virtio-bus.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> > > index d6332d45c3..859978d248 100644
> > > --- a/hw/virtio/virtio-bus.c
> > > +++ b/hw/virtio/virtio-bus.c
> > > @@ -69,6 +69,11 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> > >          return;
> > >      }
> > >
> > 
> > Can you explain this check a little more?
> > 
> > Above we have:
> > bool has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> 
> If I underdstand the code correctly, at this point this is still the
> unchanged value of the iommu_platform=on|off qdev property as given by
> the user.
> 
> > and then we get the host features from the bckend:
> > vdev->host_features = vdc->get_features(vdev, vdev->host_features
> 
> Yes, and now a flag is only set if the user had requested it and the
> backend also supports it.
> 
> > So as is this is catching the case where vdev->host_features had
> > VIRTIO_F_IOMMU_PLATFORM set before (by default?), but doesn't now that
> > the features have been retrieved? 
> > 
> > Why not just:
> >     if (!virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> 
> We don't want to fail if the user hadn't even requested the feature, but
> just if it was requested, but could not be provided.
>
> > > +    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> > > +        error_setg(errp, "iommu_platform=true is not supported by the device");
> > > +        return;
> > > +    }
> > > +
> > >      if (klass->device_plugged != NULL) {
> > >          klass->device_plugged(qbus->parent, &local_err);
> > >      }
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation
  2021-04-22 17:02 ` [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation Kevin Wolf
                     ` (2 preceding siblings ...)
  2021-04-28 19:37   ` Philippe Mathieu-Daudé
@ 2021-04-29 12:48   ` Kevin Wolf
  2021-04-29 16:53     ` Raphael Norwitz
  3 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2021-04-29 12:48 UTC (permalink / raw)
  To: qemu-block; +Cc: den-plotnikov, mst, qemu-devel, raphael.norwitz

Am 22.04.2021 um 19:02 hat Kevin Wolf geschrieben:
> This is a partial revert of commits 77542d43149 and bc79c87bcde.
> 
> Usually, an error during initialisation means that the configuration was
> wrong. Reconnecting won't make the error go away, but just turn the
> error condition into an endless loop. Avoid this and return errors
> again.
> 
> Additionally, calling vhost_user_blk_disconnect() from the chardev event
> handler could result in use-after-free because none of the
> initialisation code expects that the device could just go away in the
> middle. So removing the call fixes crashes in several places.
> 
> For example, using a num-queues setting that is incompatible with the
> backend would result in a crash like this (dereferencing dev->opaque,
> which is already NULL):
> 
>  #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
>  #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
>  #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
>  #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
>  #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
>  #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
>  #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
>  #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
>  #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
>  #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  hw/block/vhost-user-blk.c | 54 ++++++++++-----------------------------
>  1 file changed, 13 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index f5e9682703..e824b0a759 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -50,6 +50,8 @@ static const int user_feature_bits[] = {
>      VHOST_INVALID_FEATURE_BIT
>  };
>  
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> +
>  static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config)
>  {
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> @@ -362,19 +364,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>      vhost_dev_cleanup(&s->dev);
>  }
>  
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> -                                 bool realized);
> -
> -static void vhost_user_blk_event_realize(void *opaque, QEMUChrEvent event)
> -{
> -    vhost_user_blk_event(opaque, event, false);
> -}
> -
> -static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
> -{
> -    vhost_user_blk_event(opaque, event, true);
> -}
> -
>  static void vhost_user_blk_chr_closed_bh(void *opaque)
>  {
>      DeviceState *dev = opaque;
> @@ -382,12 +371,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>  
>      vhost_user_blk_disconnect(dev);
> -    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
> -            vhost_user_blk_event_oper, NULL, opaque, NULL, true);
> +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> +                             NULL, opaque, NULL, true);
>  }
>  
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> -                                 bool realized)
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>  {
>      DeviceState *dev = opaque;
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -401,17 +389,7 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
>          }
>          break;
>      case CHR_EVENT_CLOSED:
> -        /*
> -         * Closing the connection should happen differently on device
> -         * initialization and operation stages.
> -         * On initalization, we want to re-start vhost_dev initialization
> -         * from the very beginning right away when the connection is closed,
> -         * so we clean up vhost_dev on each connection closing.
> -         * On operation, we want to postpone vhost_dev cleanup to let the
> -         * other code perform its own cleanup sequence using vhost_dev data
> -         * (e.g. vhost_dev_set_log).
> -         */
> -        if (realized && !runstate_check(RUN_STATE_SHUTDOWN)) {
> +        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
>              /*
>               * A close event may happen during a read/write, but vhost
>               * code assumes the vhost_dev remains setup, so delay the
> @@ -431,8 +409,6 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
>               * knowing its type (in this case vhost-user).
>               */
>              s->dev.started = false;
> -        } else {
> -            vhost_user_blk_disconnect(dev);
>          }
>          break;
>      case CHR_EVENT_BREAK:
> @@ -490,31 +466,27 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>      s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
>      s->connected = false;
>  
> -    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
> -                             vhost_user_blk_event_realize, NULL, (void *)dev,
> -                             NULL, true);
> -
> -reconnect:
>      if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
>          error_report_err(err);
>          goto virtio_err;
>      }

Preexisting bug: We have to set errp before jumping to virtio_err,
otherwise the caller (virtio_device_realize()) will take this as success
and crash when it later tries to access things that we've already freed
in the error path.

> -    /* check whether vhost_user_blk_connect() failed or not */
> -    if (!s->connected) {
> -        goto reconnect;
> +    if (vhost_user_blk_connect(dev) < 0) {
> +        qemu_chr_fe_disconnect(&s->chardev);
> +        goto virtio_err;

Here I'm newly introducing the same bug (fixed again by patch 2).

>      }
> +    assert(s->connected);
>  
>      ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
>                                 sizeof(struct virtio_blk_config));
>      if (ret < 0) {
>          error_report("vhost-user-blk: get block config failed");
> -        goto reconnect;
> +        goto virtio_err;

And this one is wrong, too.

>      }
>  
> -    /* we're fully initialized, now we can operate, so change the handler */
> +    /* we're fully initialized, now we can operate, so add the handler */
>      qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
> -                             vhost_user_blk_event_oper, NULL, (void *)dev,
> +                             vhost_user_blk_event, NULL, (void *)dev,
>                               NULL, true);
>      return;

So maybe patch 2 should come first and also fix the preexisting bug, and
of course this patch needs a v2 that doesn't introduce the new instances
of the bug.

Kevin



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] vhost-user-blk: Use Error more consistently
  2021-04-29  9:26     ` Kevin Wolf
@ 2021-04-29 12:56       ` Raphael Norwitz
  0 siblings, 0 replies; 21+ messages in thread
From: Raphael Norwitz @ 2021-04-29 12:56 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den-plotnikov, mst, qemu-devel, qemu-block, Raphael Norwitz

Makes sense - I see why it makes reporting better at realize time.

Thanks for the clarification.

On Thu, Apr 29, 2021 at 11:26:29AM +0200, Kevin Wolf wrote:
> Am 28.04.2021 um 20:08 hat Raphael Norwitz geschrieben:
> > Code looks ok - question about the commit message.
> > 
> > Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > 
> > On Thu, Apr 22, 2021 at 07:02:18PM +0200, Kevin Wolf wrote:
> > > Now that vhost_user_blk_connect() is not called from an event handler
> > > any more, but directly from vhost_user_blk_device_realize(), we don't
> > > have to resort to error_report() any more, but can use Error again.
> > 
> > vhost_user_blk_connect() is still called by vhost_user_blk_event() which
> > is registered as an event handler. I don't understand your point around
> > us having had to use error_report() before, but not anymore. Can you
> > clarify?
> 
> What I meant is that vhost_user_blk_event() can't really make use of an
> Error object other than passing it to error_report_err(), which has the
> same result as directly using error_report().
> 
> With the new code where vhost_user_blk_device_realize() calls the
> function directly, we can actually return the error to its caller so
> that it ends up in the QMP result or the command line error message.
> 
> The result is still not great because vhost_user_blk_connect() doesn't
> know the original error message. We'd have to add Error to
> vhost_dev_init() and the functions that it calls to get the real error
> messages, but at least it's a first step in the right direction.
> 
> We already figured that we need to change error reporting so we can know
> whether we should retry, so I guess this can be solved at the same time.
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation
  2021-04-29 12:48   ` Kevin Wolf
@ 2021-04-29 16:53     ` Raphael Norwitz
  0 siblings, 0 replies; 21+ messages in thread
From: Raphael Norwitz @ 2021-04-29 16:53 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den-plotnikov, mst, qemu-devel, qemu-block, Raphael Norwitz

On Thu, Apr 29, 2021 at 02:48:52PM +0200, Kevin Wolf wrote:
> So maybe patch 2 should come first and also fix the preexisting bug, and
> of course this patch needs a v2 that doesn't introduce the new instances
> of the bug.

Sounds good to me.

> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-04-29 17:26 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-22 17:02 [PATCH 0/5] vhost-user-blk: Error handling fixes during initialistion Kevin Wolf
2021-04-22 17:02 ` [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation Kevin Wolf
2021-04-23  7:17   ` Denis Plotnikov
2021-04-28 16:52   ` Raphael Norwitz
2021-04-28 17:31     ` Kevin Wolf
2021-04-28 18:22       ` Raphael Norwitz
2021-04-28 19:37   ` Philippe Mathieu-Daudé
2021-04-29 12:48   ` Kevin Wolf
2021-04-29 16:53     ` Raphael Norwitz
2021-04-22 17:02 ` [PATCH 2/5] vhost-user-blk: Use Error more consistently Kevin Wolf
2021-04-28 18:08   ` Raphael Norwitz
2021-04-29  9:26     ` Kevin Wolf
2021-04-29 12:56       ` Raphael Norwitz
2021-04-22 17:02 ` [PATCH 3/5] vhost-user-blk: Get more feature flags from vhost device Kevin Wolf
2021-04-28 18:46   ` [PATCH 3/5] vhost-user-blk: Get more feature flags from vhost devicey Raphael Norwitz
2021-04-22 17:02 ` [PATCH 4/5] virtio: Fail if iommu_platform is requested, but unsupported Kevin Wolf
2021-04-28 19:24   ` Raphael Norwitz
2021-04-29  9:34     ` Kevin Wolf
2021-04-29 12:48       ` Raphael Norwitz
2021-04-22 17:02 ` [PATCH 5/5] vhost-user-blk: Check that num-queues is supported by backend Kevin Wolf
2021-04-28 20:16   ` Raphael Norwitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.