All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes
@ 2016-06-21 10:02 marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 01/24] misc: indentation marcandre.lureau
                   ` (25 more replies)
  0 siblings, 26 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Hi,

Since 'vhost-user: simple reconnection support' was merged, it is
possible to disconnect and reconnect a vhost-user backend. However,
many code paths in qemu may trigger assert() when the backend is
disconnected.

Some assert() could simply be replaced by error_report() or silently
fail since they are recoverable cases. Some missing error checks can
also help prevent later issues. In many cases, the code assumes
get_vhost_net() will be non-NULL after a succesful connection, so I
changed it to stay after a disconnect until the new connection comes
(as suggested by Michael). There are also code paths that are wrong,
see "don't assume opaque is a fd" patch for an example.

Since there is feature checks on reconnection, qemu should wait for
the initial connection feature negotiation to complete. The test added
demonstrates this.

For convenience, the series is also available on:
https://github.com/elmarco/qemu, branch vhost-user-reconnect

Marc-André Lureau (24):
  misc: indentation
  vhost-user: minor simplification
  qemu-char: check socket is actually connected
  vhost-user: check qemu_chr_fe_set_msgfds() return value
  vhost: change some assert() for error_report() or silent fail
  vhost-user: check vhost_user_write() return value
  vhost: use error_report() instead of fprintf(stderr,...)
  vhost-user: return a read error
  vhost: make vhost_log_put() idempotent
  vhost: call vhost_log_put() on cleanup
  vhost: add vhost device only after all success
  vhost: make vhost_dev_cleanup() idempotent
  vhost-net: always call vhost_dev_cleanup() on failure
  vhost: don't assume opaque is a fd, use backend cleanup
  vhost: fix calling vhost_dev_cleanup() after vhost_dev_init()
  vhost-user: keep vhost_net after a disconnection
  Revert "vhost-net: do not crash if backend is not present"
  get_vhost_net() should be != null after vhost_user_init
  vhost-net: success if backend has no ops->vhost_migration_done
  vhost: add assert() to check runtime behaviour
  char: add chr_wait_connected callback
  char: add and use tcp_chr_wait_connected
  vhost-user: wait until link is up
  tests: add /vhost-user/connect-fail test

 hw/net/vhost_net.c      |  19 +++-----
 hw/virtio/vhost-user.c  |  54 +++++++++++++++--------
 hw/virtio/vhost.c       | 114 +++++++++++++++++++++++++++---------------------
 include/sysemu/char.h   |   8 ++++
 net/tap.c               |   1 +
 net/vhost-user.c        |  36 ++++++++-------
 qemu-char.c             |  82 ++++++++++++++++++++++++----------
 tests/vhost-user-test.c |  37 ++++++++++++++++
 8 files changed, 233 insertions(+), 118 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 01/24] misc: indentation
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 02/24] vhost-user: minor simplification marcandre.lureau
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 50f4dcd..3f1fecc 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -313,7 +313,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
          * properly.
          */
         if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
-                dev->use_guest_notifier_mask = false;
+            dev->use_guest_notifier_mask = false;
         }
      }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 02/24] vhost-user: minor simplification
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 01/24] misc: indentation marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 03/24] qemu-char: check socket is actually connected marcandre.lureau
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 net/vhost-user.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index d72ce9b..392f982 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -310,7 +310,6 @@ static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
 {
     const char *name = opaque;
     const char *driver, *netdev;
-    const char virtio_name[] = "virtio-net-";
 
     driver = qemu_opt_get(opts, "driver");
     netdev = qemu_opt_get(opts, "netdev");
@@ -320,7 +319,7 @@ static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
     }
 
     if (strcmp(netdev, name) == 0 &&
-        strncmp(driver, virtio_name, strlen(virtio_name)) != 0) {
+        !g_str_has_prefix(driver, "virtio-net-")) {
         error_setg(errp, "vhost-user requires frontend driver virtio-net-*");
         return -1;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 03/24] qemu-char: check socket is actually connected
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 01/24] misc: indentation marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 02/24] vhost-user: minor simplification marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 04/24] vhost-user: check qemu_chr_fe_set_msgfds() return value marcandre.lureau
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Calling qemu_chr_fe_set_msgfds() on unconnected socket can lead to crash
if s->ioc is NULL.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 qemu-char.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 84f49ac..06eaba3 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2761,14 +2761,16 @@ static int tcp_set_msgfds(CharDriverState *chr, int *fds, int num)
 {
     TCPCharDriver *s = chr->opaque;
 
-    if (!qio_channel_has_feature(s->ioc,
-                                 QIO_CHANNEL_FEATURE_FD_PASS)) {
-        return -1;
-    }
     /* clear old pending fd array */
     g_free(s->write_msgfds);
     s->write_msgfds = NULL;
 
+    if (!s->connected ||
+        !qio_channel_has_feature(s->ioc,
+                                 QIO_CHANNEL_FEATURE_FD_PASS)) {
+        return -1;
+    }
+
     if (num) {
         s->write_msgfds = g_new(int, num);
         memcpy(s->write_msgfds, fds, num * sizeof(int));
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 04/24] vhost-user: check qemu_chr_fe_set_msgfds() return value
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (2 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 03/24] qemu-char: check socket is actually connected marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail marcandre.lureau
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Check return value, and drop the unnecessary 'if' check for fd_num.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost-user.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 495e09f..5dae496 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -187,8 +187,8 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
         return 0;
     }
 
-    if (fd_num) {
-        qemu_chr_fe_set_msgfds(chr, fds, fd_num);
+    if (qemu_chr_fe_set_msgfds(chr, fds, fd_num) < 0) {
+        return -1;
     }
 
     return qemu_chr_fe_write_all(chr, (const uint8_t *) msg, size) == size ?
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (3 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 04/24] vhost-user: check qemu_chr_fe_set_msgfds() return value marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-23  4:27   ` Michael S. Tsirkin
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value marcandre.lureau
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Calling a vhost operation may fail, especially with disconnectable
backends. Treat some that look harmless as recoverable errors (print
error, or ignore on error code path).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 81cc5b0..e2d1614 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -398,7 +398,10 @@ static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
     /* inform backend of log switching, this must be done before
        releasing the current log, to ensure no logging is lost */
     r = dev->vhost_ops->vhost_set_log_base(dev, log_base, log);
-    assert(r >= 0);
+    if (r < 0) {
+        error_report("Failed to change backend log");
+    }
+
     vhost_log_put(dev, true);
     dev->log = log;
     dev->log_size = size;
@@ -565,7 +568,9 @@ static void vhost_commit(MemoryListener *listener)
 
     if (!dev->log_enabled) {
         r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
-        assert(r >= 0);
+        if (r < 0) {
+            error_report("Failed to set mem table");
+        }
         dev->memory_changed = false;
         return;
     }
@@ -578,7 +583,9 @@ static void vhost_commit(MemoryListener *listener)
         vhost_dev_log_resize(dev, log_size + VHOST_LOG_BUFFER);
     }
     r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
-    assert(r >= 0);
+    if (r < 0) {
+        error_report("Failed to set mem table");
+    }
     /* To log less, can only decrease log size after table update. */
     if (dev->log_size > log_size + VHOST_LOG_BUFFER) {
         vhost_dev_log_resize(dev, log_size);
@@ -647,6 +654,7 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
     };
     int r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
     if (r < 0) {
+        error_report("Failed to set vring addr");
         return -errno;
     }
     return 0;
@@ -660,12 +668,15 @@ static int vhost_dev_set_features(struct vhost_dev *dev, bool enable_log)
         features |= 0x1ULL << VHOST_F_LOG_ALL;
     }
     r = dev->vhost_ops->vhost_set_features(dev, features);
+    if (r < 0) {
+        error_report("Failed to set features");
+    }
     return r < 0 ? -errno : 0;
 }
 
 static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
 {
-    int r, t, i, idx;
+    int r, i, idx;
     r = vhost_dev_set_features(dev, enable_log);
     if (r < 0) {
         goto err_features;
@@ -682,12 +693,10 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
 err_vq:
     for (; i >= 0; --i) {
         idx = dev->vhost_ops->vhost_get_vq_index(dev, dev->vq_index + i);
-        t = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
-                                     dev->log_enabled);
-        assert(t >= 0);
+        vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
+                                 dev->log_enabled);
     }
-    t = vhost_dev_set_features(dev, dev->log_enabled);
-    assert(t >= 0);
+    vhost_dev_set_features(dev, dev->log_enabled);
 err_features:
     return r;
 }
@@ -937,7 +946,6 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
         }
     }
 
-    assert (r >= 0);
     cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
                               0, virtio_queue_get_ring_size(vdev, idx));
     cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
@@ -1187,7 +1195,9 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n,
 
     file.index = hdev->vhost_ops->vhost_get_vq_index(hdev, n);
     r = hdev->vhost_ops->vhost_set_vring_call(hdev, &file);
-    assert(r >= 0);
+    if (r < 0) {
+        error_report("Failed to set vring call");
+    }
 }
 
 uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (4 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-23  4:27   ` Michael S. Tsirkin
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 07/24] vhost: use error_report() instead of fprintf(stderr, ...) marcandre.lureau
                   ` (19 subsequent siblings)
  25 siblings, 1 reply; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Just some more error checking.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost-user.c | 44 +++++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 5dae496..e51df27 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -214,7 +214,9 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
         fds[fd_num++] = log->fd;
     }
 
-    vhost_user_write(dev, &msg, fds, fd_num);
+    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+        return -1;
+    }
 
     if (shmfd) {
         msg.size = 0;
@@ -275,7 +277,9 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
     msg.size += sizeof(msg.payload.memory.padding);
     msg.size += fd_num * sizeof(VhostUserMemoryRegion);
 
-    vhost_user_write(dev, &msg, fds, fd_num);
+    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+        return -1;
+    }
 
     return 0;
 }
@@ -290,7 +294,9 @@ static int vhost_user_set_vring_addr(struct vhost_dev *dev,
         .size = sizeof(msg.payload.addr),
     };
 
-    vhost_user_write(dev, &msg, NULL, 0);
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
 
     return 0;
 }
@@ -313,7 +319,9 @@ static int vhost_set_vring(struct vhost_dev *dev,
         .size = sizeof(msg.payload.state),
     };
 
-    vhost_user_write(dev, &msg, NULL, 0);
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
 
     return 0;
 }
@@ -360,7 +368,9 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
         .size = sizeof(msg.payload.state),
     };
 
-    vhost_user_write(dev, &msg, NULL, 0);
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
 
     if (vhost_user_read(dev, &msg) < 0) {
         return 0;
@@ -401,7 +411,9 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
         msg.payload.u64 |= VHOST_USER_VRING_NOFD_MASK;
     }
 
-    vhost_user_write(dev, &msg, fds, fd_num);
+    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+        return -1;
+    }
 
     return 0;
 }
@@ -427,7 +439,9 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64)
         .size = sizeof(msg.payload.u64),
     };
 
-    vhost_user_write(dev, &msg, NULL, 0);
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
 
     return 0;
 }
@@ -455,7 +469,9 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
         return 0;
     }
 
-    vhost_user_write(dev, &msg, NULL, 0);
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
 
     if (vhost_user_read(dev, &msg) < 0) {
         return 0;
@@ -489,7 +505,9 @@ static int vhost_user_set_owner(struct vhost_dev *dev)
         .flags = VHOST_USER_VERSION,
     };
 
-    vhost_user_write(dev, &msg, NULL, 0);
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
 
     return 0;
 }
@@ -501,7 +519,9 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
         .flags = VHOST_USER_VERSION,
     };
 
-    vhost_user_write(dev, &msg, NULL, 0);
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
 
     return 0;
 }
@@ -588,7 +608,6 @@ static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
 static int vhost_user_migration_done(struct vhost_dev *dev, char* mac_addr)
 {
     VhostUserMsg msg = { 0 };
-    int err;
 
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
@@ -605,8 +624,7 @@ static int vhost_user_migration_done(struct vhost_dev *dev, char* mac_addr)
         memcpy((char *)&msg.payload.u64, mac_addr, 6);
         msg.size = sizeof(msg.payload.u64);
 
-        err = vhost_user_write(dev, &msg, NULL, 0);
-        return err;
+        return vhost_user_write(dev, &msg, NULL, 0);
     }
     return -1;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 07/24] vhost: use error_report() instead of fprintf(stderr, ...)
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (5 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 08/24] vhost-user: return a read error marcandre.lureau
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Let's use qemu proper error reporting API.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e2d1614..4e23d73 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -425,11 +425,11 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
         l = vq->ring_size;
         p = cpu_physical_memory_map(vq->ring_phys, &l, 1);
         if (!p || l != vq->ring_size) {
-            fprintf(stderr, "Unable to map ring buffer for ring %d\n", i);
+            error_report("Unable to map ring buffer for ring %d", i);
             r = -ENOMEM;
         }
         if (p != vq->ring) {
-            fprintf(stderr, "Ring buffer relocated for ring %d\n", i);
+            error_report("Ring buffer relocated for ring %d", i);
             r = -EBUSY;
         }
         cpu_physical_memory_unmap(p, l, 0, 0);
@@ -928,8 +928,7 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
 
     r = dev->vhost_ops->vhost_get_vring_base(dev, &state);
     if (r < 0) {
-        fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r);
-        fflush(stderr);
+        error_report("vhost VQ %d ring restore failed: %d", idx, r);
     }
     virtio_queue_set_last_avail_idx(vdev, idx, state.num);
     virtio_queue_invalidate_signalled_used(vdev, idx);
@@ -1016,8 +1015,8 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     }
 
     if (used_memslots > hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
-        fprintf(stderr, "vhost backend memory slots limit is less"
-                " than current number of present memory slots\n");
+        error_report("vhost backend memory slots limit is less"
+                " than current number of present memory slots");
         close((uintptr_t)opaque);
         return -1;
     }
@@ -1119,7 +1118,7 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     int i, r, e;
     if (!k->set_host_notifier) {
-        fprintf(stderr, "binding does not support host notifiers\n");
+        error_report("binding does not support host notifiers");
         r = -ENOSYS;
         goto fail;
     }
@@ -1127,7 +1126,7 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     for (i = 0; i < hdev->nvqs; ++i) {
         r = k->set_host_notifier(qbus->parent, hdev->vq_index + i, true);
         if (r < 0) {
-            fprintf(stderr, "vhost VQ %d notifier binding failed: %d\n", i, -r);
+            error_report("vhost VQ %d notifier binding failed: %d", i, -r);
             goto fail_vq;
         }
     }
@@ -1137,8 +1136,7 @@ fail_vq:
     while (--i >= 0) {
         e = k->set_host_notifier(qbus->parent, hdev->vq_index + i, false);
         if (e < 0) {
-            fprintf(stderr, "vhost VQ %d notifier cleanup error: %d\n", i, -r);
-            fflush(stderr);
+            error_report("vhost VQ %d notifier cleanup error: %d", i, -r);
         }
         assert (e >= 0);
     }
@@ -1161,8 +1159,7 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     for (i = 0; i < hdev->nvqs; ++i) {
         r = k->set_host_notifier(qbus->parent, hdev->vq_index + i, false);
         if (r < 0) {
-            fprintf(stderr, "vhost VQ %d notifier cleanup failed: %d\n", i, -r);
-            fflush(stderr);
+            error_report("vhost VQ %d notifier cleanup failed: %d", i, -r);
         }
         assert (r >= 0);
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (6 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 07/24] vhost: use error_report() instead of fprintf(stderr, ...) marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-23  4:27   ` Michael S. Tsirkin
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 09/24] vhost: make vhost_log_put() idempotent marcandre.lureau
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Return read errors (not sure why those were ignored)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost-user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e51df27..819481d 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -221,7 +221,7 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
     if (shmfd) {
         msg.size = 0;
         if (vhost_user_read(dev, &msg) < 0) {
-            return 0;
+            return -1;
         }
 
         if (msg.request != VHOST_USER_SET_LOG_BASE) {
@@ -373,7 +373,7 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
     }
 
     if (vhost_user_read(dev, &msg) < 0) {
-        return 0;
+        return -1;
     }
 
     if (msg.request != VHOST_USER_GET_VRING_BASE) {
@@ -474,7 +474,7 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
     }
 
     if (vhost_user_read(dev, &msg) < 0) {
-        return 0;
+        return -1;
     }
 
     if (msg.request != request) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 09/24] vhost: make vhost_log_put() idempotent
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (7 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 08/24] vhost-user: return a read error marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 10/24] vhost: call vhost_log_put() on cleanup marcandre.lureau
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Clear dev->log* when calling vhost_log_put()

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 4e23d73..febf64f 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -362,6 +362,8 @@ static void vhost_log_put(struct vhost_dev *dev, bool sync)
     if (!log) {
         return;
     }
+    dev->log = NULL;
+    dev->log_size = 0;
 
     --log->refcnt;
     if (log->refcnt == 0) {
@@ -719,8 +721,6 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
             return r;
         }
         vhost_log_put(dev, false);
-        dev->log = NULL;
-        dev->log_size = 0;
     } else {
         vhost_dev_log_resize(dev, vhost_get_log_size(dev));
         r = vhost_dev_set_log(dev, true);
@@ -1298,7 +1298,4 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
 
     vhost_log_put(hdev, true);
     hdev->started = false;
-    hdev->log = NULL;
-    hdev->log_size = 0;
 }
-
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 10/24] vhost: call vhost_log_put() on cleanup
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (8 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 09/24] vhost: make vhost_log_put() idempotent marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 11/24] vhost: add vhost device only after all success marcandre.lureau
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Make sure the log is being released on cleanup and the structure
cleared.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index febf64f..237db77 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1105,6 +1105,7 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
     g_free(hdev->mem);
     g_free(hdev->mem_sections);
     hdev->vhost_ops->vhost_backend_cleanup(hdev);
+    vhost_log_put(hdev, false);
     QLIST_REMOVE(hdev, entry);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 11/24] vhost: add vhost device only after all success
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (9 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 10/24] vhost: call vhost_log_put() on cleanup marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 12/24] vhost: make vhost_dev_cleanup() idempotent marcandre.lureau
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

This helps following vhost_dev_cleanup() patch to check if the device
was added before removing it from the list.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 237db77..2e5cedc 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1020,7 +1020,6 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
         close((uintptr_t)opaque);
         return -1;
     }
-    QLIST_INSERT_HEAD(&vhost_devices, hdev, entry);
 
     r = hdev->vhost_ops->vhost_set_owner(hdev);
     if (r < 0) {
@@ -1079,6 +1078,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     hdev->started = false;
     hdev->memory_changed = false;
     memory_listener_register(&hdev->memory_listener, &address_space_memory);
+    QLIST_INSERT_HEAD(&vhost_devices, hdev, entry);
     return 0;
 fail_vq:
     while (--i >= 0) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 12/24] vhost: make vhost_dev_cleanup() idempotent
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (10 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 11/24] vhost: add vhost device only after all success marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 13/24] vhost-net: always call vhost_dev_cleanup() on failure marcandre.lureau
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

May be called on multiple code path, so it's easier to make it safe in
the first place.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2e5cedc..c3d3481 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1094,19 +1094,27 @@ fail:
 void vhost_dev_cleanup(struct vhost_dev *hdev)
 {
     int i;
+
     for (i = 0; i < hdev->nvqs; ++i) {
         vhost_virtqueue_cleanup(hdev->vqs + i);
     }
-    memory_listener_unregister(&hdev->memory_listener);
+    if (hdev->mem) {
+        /* those are only safe after successful init */
+        memory_listener_unregister(&hdev->memory_listener);
+        QLIST_REMOVE(hdev, entry);
+    }
     if (hdev->migration_blocker) {
         migrate_del_blocker(hdev->migration_blocker);
         error_free(hdev->migration_blocker);
     }
     g_free(hdev->mem);
     g_free(hdev->mem_sections);
-    hdev->vhost_ops->vhost_backend_cleanup(hdev);
+    if (hdev->vhost_ops) {
+        hdev->vhost_ops->vhost_backend_cleanup(hdev);
+    }
     vhost_log_put(hdev, false);
-    QLIST_REMOVE(hdev, entry);
+
+    memset(hdev, 0, sizeof(struct vhost_dev));
 }
 
 /* Stop processing guest IO notifications in qemu.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 13/24] vhost-net: always call vhost_dev_cleanup() on failure
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (11 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 12/24] vhost: make vhost_dev_cleanup() idempotent marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 14/24] vhost: don't assume opaque is a fd, use backend cleanup marcandre.lureau
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

vhost_dev_init() may need to be cleaned up on failure too. Just call
vhost_dev_cleanup() in all cases. But first zero-alloc the struct to
avoid the garbage.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 3f1fecc..1da05d0 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -140,7 +140,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
     int r;
     bool backend_kernel = options->backend_type == VHOST_BACKEND_TYPE_KERNEL;
-    struct vhost_net *net = g_malloc(sizeof *net);
+    struct vhost_net *net = g_new0(struct vhost_net, 1);
     uint64_t features = 0;
 
     if (!options->net_backend) {
@@ -185,7 +185,6 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
             fprintf(stderr, "vhost lacks feature mask %" PRIu64
                    " for backend\n",
                    (uint64_t)(~net->dev.features & net->dev.backend_features));
-            vhost_dev_cleanup(&net->dev);
             goto fail;
         }
     }
@@ -197,7 +196,6 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
             fprintf(stderr, "vhost lacks feature mask %" PRIu64
                     " for backend\n",
                     (uint64_t)(~net->dev.features & features));
-            vhost_dev_cleanup(&net->dev);
             goto fail;
         }
     }
@@ -205,7 +203,9 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
     vhost_net_ack_features(net, features);
 
     return net;
+
 fail:
+    vhost_dev_cleanup(&net->dev);
     g_free(net);
     return NULL;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 14/24] vhost: don't assume opaque is a fd, use backend cleanup
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (12 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 13/24] vhost-net: always call vhost_dev_cleanup() on failure marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 15/24] vhost: fix calling vhost_dev_cleanup() after vhost_dev_init() marcandre.lureau
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index c3d3481..2c6535c 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1004,21 +1004,19 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 
     hdev->migration_blocker = NULL;
 
-    if (vhost_set_backend_type(hdev, backend_type) < 0) {
-        close((uintptr_t)opaque);
-        return -1;
-    }
+    r = vhost_set_backend_type(hdev, backend_type);
+    assert(r >= 0);
 
-    if (hdev->vhost_ops->vhost_backend_init(hdev, opaque) < 0) {
-        close((uintptr_t)opaque);
-        return -errno;
+    r = hdev->vhost_ops->vhost_backend_init(hdev, opaque);
+    if (r < 0) {
+        goto fail;
     }
 
     if (used_memslots > hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
         error_report("vhost backend memory slots limit is less"
                 " than current number of present memory slots");
-        close((uintptr_t)opaque);
-        return -1;
+        r = -1;
+        goto fail;
     }
 
     r = hdev->vhost_ops->vhost_set_owner(hdev);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 15/24] vhost: fix calling vhost_dev_cleanup() after vhost_dev_init()
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (13 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 14/24] vhost: don't assume opaque is a fd, use backend cleanup marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 16/24] vhost-user: keep vhost_net after a disconnection marcandre.lureau
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

vhost_net_init() calls vhost_dev_init() and in case of failure, calls
vhost_dev_cleanup() directly. However, the structure is already
partially cleaned on error. Calling vhost_dev_cleanup() again will call
vhost_virtqueue_cleanup() on already clean queues, and causing potential
double-close. Instead, adjust dev->nvqs and simplify vhost_dev_init()
code to call vhost_virtqueue_cleanup().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2c6535c..e4eb97e 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1032,7 +1032,8 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     for (i = 0; i < hdev->nvqs; ++i) {
         r = vhost_virtqueue_init(hdev, hdev->vqs + i, hdev->vq_index + i);
         if (r < 0) {
-            goto fail_vq;
+            hdev->nvqs = i;
+            goto fail;
         }
     }
     hdev->features = features;
@@ -1078,14 +1079,9 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     memory_listener_register(&hdev->memory_listener, &address_space_memory);
     QLIST_INSERT_HEAD(&vhost_devices, hdev, entry);
     return 0;
-fail_vq:
-    while (--i >= 0) {
-        vhost_virtqueue_cleanup(hdev->vqs + i);
-    }
+
 fail:
-    r = -errno;
-    hdev->vhost_ops->vhost_backend_cleanup(hdev);
-    QLIST_REMOVE(hdev, entry);
+    vhost_dev_cleanup(hdev);
     return r;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 16/24] vhost-user: keep vhost_net after a disconnection
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (14 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 15/24] vhost: fix calling vhost_dev_cleanup() after vhost_dev_init() marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 17/24] Revert "vhost-net: do not crash if backend is not present" marcandre.lureau
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Lot of code path assumes get_vhost_net() return non-null. Keep it
after a successful vhost_net_init(). vhost_net_cleanup() will clear the
vhost-dev connection-related data after vhost_user_stop().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c |  1 -
 net/tap.c          |  1 +
 net/vhost-user.c   | 21 ++++++++-------------
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 1da05d0..08c7d1e 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -378,7 +378,6 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
 void vhost_net_cleanup(struct vhost_net *net)
 {
     vhost_dev_cleanup(&net->dev);
-    g_free(net);
 }
 
 int vhost_net_notify_migration_done(struct vhost_net *net, char* mac_addr)
diff --git a/net/tap.c b/net/tap.c
index 49817c7..31726e7 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -299,6 +299,7 @@ static void tap_cleanup(NetClientState *nc)
 
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
+        g_free(s->vhost_net);
         s->vhost_net = NULL;
     }
 
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 392f982..4c7702f 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -45,11 +45,6 @@ uint64_t vhost_user_get_acked_features(NetClientState *nc)
     return s->acked_features;
 }
 
-static int vhost_user_running(VhostUserState *s)
-{
-    return (s->vhost_net) ? 1 : 0;
-}
-
 static void vhost_user_stop(int queues, NetClientState *ncs[])
 {
     VhostUserState *s;
@@ -59,15 +54,11 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
         assert (ncs[i]->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
 
         s = DO_UPCAST(VhostUserState, nc, ncs[i]);
-        if (!vhost_user_running(s)) {
-            continue;
-        }
 
         if (s->vhost_net) {
             /* save acked features */
-            s->acked_features = vhost_net_get_acked_features(s->vhost_net);
+            s->acked_features |= vhost_net_get_acked_features(s->vhost_net);
             vhost_net_cleanup(s->vhost_net);
-            s->vhost_net = NULL;
         }
     }
 }
@@ -85,12 +76,15 @@ static int vhost_user_start(int queues, NetClientState *ncs[])
         assert (ncs[i]->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
 
         s = DO_UPCAST(VhostUserState, nc, ncs[i]);
-        if (vhost_user_running(s)) {
-            continue;
-        }
 
         options.net_backend = ncs[i];
         options.opaque      = s->chr;
+
+        if (s->vhost_net) {
+            vhost_net_cleanup(s->vhost_net);
+            g_free(s->vhost_net);
+        }
+
         s->vhost_net = vhost_net_init(&options);
         if (!s->vhost_net) {
             error_report("failed to init vhost_net for queue %d", i);
@@ -149,6 +143,7 @@ static void vhost_user_cleanup(NetClientState *nc)
 
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
+        g_free(s->vhost_net);
         s->vhost_net = NULL;
     }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 17/24] Revert "vhost-net: do not crash if backend is not present"
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (15 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 16/24] vhost-user: keep vhost_net after a disconnection marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 18/24] get_vhost_net() should be != null after vhost_user_init marcandre.lureau
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Now that get_vhost_net() returns non-null after a successful
vhost_net_init(), we no longer need to check this case.

This reverts commit ecd34898596c60f79886061618dd7e01001113ad.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 08c7d1e..f1dd367 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -428,15 +428,10 @@ VHostNetState *get_vhost_net(NetClientState *nc)
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
     VHostNetState *net = get_vhost_net(nc);
-    const VhostOps *vhost_ops;
+    const VhostOps *vhost_ops = net->dev.vhost_ops;
 
     nc->vring_enable = enable;
 
-    if (!net) {
-        return 0;
-    }
-
-    vhost_ops = net->dev.vhost_ops;
     if (vhost_ops->vhost_set_vring_enable) {
         return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 18/24] get_vhost_net() should be != null after vhost_user_init
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (16 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 17/24] Revert "vhost-net: do not crash if backend is not present" marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 19/24] vhost-net: success if backend has no ops->vhost_migration_done marcandre.lureau
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c | 1 +
 net/vhost-user.c   | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index f1dd367..22ea653 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -417,6 +417,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
         break;
     case NET_CLIENT_OPTIONS_KIND_VHOST_USER:
         vhost_net = vhost_user_get_vhost_net(nc);
+        assert(vhost_net != NULL);
         break;
     default:
         break;
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 4c7702f..95ed2d2 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -250,6 +250,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 
     qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, nc[0].name);
 
+    assert(s->vhost_net != NULL);
+
     return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 19/24] vhost-net: success if backend has no ops->vhost_migration_done
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (17 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 18/24] get_vhost_net() should be != null after vhost_user_init marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 20/24] vhost: add assert() to check runtime behaviour marcandre.lureau
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 22ea653..cc2f68a 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -383,7 +383,7 @@ void vhost_net_cleanup(struct vhost_net *net)
 int vhost_net_notify_migration_done(struct vhost_net *net, char* mac_addr)
 {
     const VhostOps *vhost_ops = net->dev.vhost_ops;
-    int r = -1;
+    int r = 0;
 
     if (vhost_ops->vhost_migration_done) {
         r = vhost_ops->vhost_migration_done(&net->dev, mac_addr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 20/24] vhost: add assert() to check runtime behaviour
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (18 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 19/24] vhost-net: success if backend has no ops->vhost_migration_done marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 21/24] char: add chr_wait_connected callback marcandre.lureau
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

All these functions must be called only after the backend is connected.
They are called from virtio-net.c, after either virtio or link status
change.

The check for nc->peer->link_down should ensure vhost_net_{start,stop}()
are always called between vhost_user_{start,stop}().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e4eb97e..e34bd39 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1186,6 +1186,9 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n,
     int r, index = n - hdev->vq_index;
     struct vhost_vring_file file;
 
+    /* should only be called after backend is connected */
+    assert(hdev->vhost_ops != NULL);
+
     if (mask) {
         assert(vdev->use_guest_notifier_mask);
         file.fd = event_notifier_get_fd(&hdev->vqs[index].masked_notifier);
@@ -1232,6 +1235,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
     int i, r;
 
+    /* should only be called after backend is connected */
+    assert(hdev->vhost_ops != NULL);
+
     hdev->started = true;
 
     r = vhost_dev_set_features(hdev, hdev->log_enabled);
@@ -1292,6 +1298,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
     int i;
 
+    /* should only be called after backend is connected */
+    assert(hdev->vhost_ops != NULL);
+
     for (i = 0; i < hdev->nvqs; ++i) {
         vhost_virtqueue_stop(hdev,
                              vdev,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 21/24] char: add chr_wait_connected callback
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (19 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 20/24] vhost: add assert() to check runtime behaviour marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 22/24] char: add and use tcp_chr_wait_connected marcandre.lureau
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

A function to wait on the backend to be connected, to be used in the
following patches.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 include/sysemu/char.h | 8 ++++++++
 qemu-char.c           | 9 +++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index 1eb2d0f..0e3c35f 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -70,6 +70,7 @@ struct CharDriverState {
     int (*get_msgfds)(struct CharDriverState *s, int* fds, int num);
     int (*set_msgfds)(struct CharDriverState *s, int *fds, int num);
     int (*chr_add_client)(struct CharDriverState *chr, int fd);
+    int (*chr_wait_connected)(struct CharDriverState *chr, Error **errp);
     IOEventHandler *chr_event;
     IOCanReadHandler *chr_can_read;
     IOReadHandler *chr_read;
@@ -152,6 +153,13 @@ CharDriverState *qemu_chr_new(const char *label, const char *filename,
 void qemu_chr_disconnect(CharDriverState *chr);
 
 /**
+ * @qemu_chr_wait_connected:
+ *
+ * Wait for characted backend to be connected.
+ */
+int qemu_chr_wait_connected(CharDriverState *chr, Error **errp);
+
+/**
  * @qemu_chr_new_noreplay:
  *
  * Create a new character backend from a URI.
diff --git a/qemu-char.c b/qemu-char.c
index 06eaba3..a0e7736 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -3140,6 +3140,15 @@ static gboolean tcp_chr_accept(QIOChannel *channel,
     return TRUE;
 }
 
+int qemu_chr_wait_connected(CharDriverState *chr, Error **errp)
+{
+    if (chr->chr_wait_connected) {
+        return chr->chr_wait_connected(chr, errp);
+    }
+
+    return 0;
+}
+
 static void tcp_chr_close(CharDriverState *chr)
 {
     TCPCharDriver *s = chr->opaque;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 22/24] char: add and use tcp_chr_wait_connected
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (20 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 21/24] char: add chr_wait_connected callback marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 23/24] vhost-user: wait until link is up marcandre.lureau
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Add a chr_wait_connected for the tcp backend, and use it in the
open_socket() function.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 qemu-char.c | 63 ++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 19 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index a0e7736..dca859d 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -3140,6 +3140,32 @@ static gboolean tcp_chr_accept(QIOChannel *channel,
     return TRUE;
 }
 
+static int tcp_chr_wait_connected(CharDriverState *chr, Error **errp)
+{
+    TCPCharDriver *s = chr->opaque;
+    QIOChannelSocket *sioc;
+
+    while (!s->connected) {
+        if (s->is_listen) {
+            fprintf(stderr, "QEMU waiting for connection on: %s\n",
+                    chr->filename);
+            qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), true, NULL);
+            tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr);
+            qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), false, NULL);
+        } else {
+            sioc = qio_channel_socket_new();
+            if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) {
+                object_unref(OBJECT(sioc));
+                return -1;
+            }
+            tcp_chr_new_client(chr, sioc);
+            object_unref(OBJECT(sioc));
+        }
+    }
+
+    return 0;
+}
+
 int qemu_chr_wait_connected(CharDriverState *chr, Error **errp)
 {
     if (chr->chr_wait_connected) {
@@ -4403,6 +4429,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
     qapi_copy_SocketAddress(&s->addr, sock->addr);
 
     chr->opaque = s;
+    chr->chr_wait_connected = tcp_chr_wait_connected;
     chr->chr_write = tcp_chr_write;
     chr->chr_sync_read = tcp_chr_sync_read;
     chr->chr_close = tcp_chr_close;
@@ -4426,32 +4453,30 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
         s->reconnect_time = reconnect;
     }
 
-    sioc = qio_channel_socket_new();
     if (s->reconnect_time) {
+        sioc = qio_channel_socket_new();
         qio_channel_socket_connect_async(sioc, s->addr,
                                          qemu_chr_socket_connected,
                                          chr, NULL);
-    } else if (s->is_listen) {
-        if (qio_channel_socket_listen_sync(sioc, s->addr, errp) < 0) {
-            goto error;
-        }
-        s->listen_ioc = sioc;
-        if (is_waitconnect) {
-            fprintf(stderr, "QEMU waiting for connection on: %s\n",
-                    chr->filename);
-            tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr);
-        }
-        qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), false, NULL);
-        if (!s->ioc) {
-            s->listen_tag = qio_channel_add_watch(
-                QIO_CHANNEL(s->listen_ioc), G_IO_IN, tcp_chr_accept, chr, NULL);
-        }
     } else {
-        if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) {
+        if (s->is_listen) {
+            sioc = qio_channel_socket_new();
+            if (qio_channel_socket_listen_sync(sioc, s->addr, errp) < 0) {
+                goto error;
+            }
+            s->listen_ioc = sioc;
+            if (is_waitconnect &&
+                qemu_chr_wait_connected(chr, errp) < 0) {
+                goto error;
+            }
+            if (!s->ioc) {
+                s->listen_tag = qio_channel_add_watch(
+                    QIO_CHANNEL(s->listen_ioc), G_IO_IN,
+                    tcp_chr_accept, chr, NULL);
+            }
+        } else if (qemu_chr_wait_connected(chr, errp) < 0) {
             goto error;
         }
-        tcp_chr_new_client(chr, sioc);
-        object_unref(OBJECT(sioc));
     }
 
     return chr;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 23/24] vhost-user: wait until link is up
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (21 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 22/24] char: add and use tcp_chr_wait_connected marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-23  4:25   ` Michael S. Tsirkin
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 24/24] tests: add /vhost-user/connect-fail test marcandre.lureau
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

The chardev waits for an initial connection before starting qemu,
vhost-user wants the backend negotiation to be completed. vhost-user is
started in the net_vhost_user_event callback, which is synchronously
called after the socket is connected.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 net/vhost-user.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 95ed2d2..bb4a253 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -248,7 +248,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
         s->chr = chr;
     }
 
-    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, nc[0].name);
+    do {
+        Error *err = NULL;
+        if (qemu_chr_wait_connected(chr, &err) < 0) {
+            error_report_err(err);
+            return -1;
+        }
+        qemu_chr_add_handlers(chr, NULL, NULL,
+                              net_vhost_user_event, nc[0].name);
+    } while (nc->link_down);
 
     assert(s->vhost_net != NULL);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 24/24] tests: add /vhost-user/connect-fail test
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (22 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 23/24] vhost-user: wait until link is up marcandre.lureau
@ 2016-06-21 10:02 ` marcandre.lureau
  2016-06-23  4:31 ` [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes Michael S. Tsirkin
  2016-06-23  4:32 ` Michael S. Tsirkin
  25 siblings, 0 replies; 60+ messages in thread
From: marcandre.lureau @ 2016-06-21 10:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: mukawa, yuanhan.liu, victork, jonshin, mst, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Check early connection failure and resume.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/vhost-user-test.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index 8b2164b..7fe12e6 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -131,6 +131,7 @@ typedef struct TestServer {
     GCond data_cond;
     int log_fd;
     uint64_t rings;
+    bool test_fail;
 } TestServer;
 
 #if !GLIB_CHECK_VERSION(2, 32, 0)
@@ -234,6 +235,12 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
     uint8_t *p = (uint8_t *) &msg;
     int fd;
 
+    if (s->test_fail) {
+        qemu_chr_disconnect(chr);
+        /* now switch to non-failure */
+        s->test_fail = false;
+    }
+
     if (size != VHOST_USER_HDR_SIZE) {
         g_test_message("Wrong message size received %d\n", size);
         return;
@@ -697,6 +704,33 @@ static void test_reconnect(void)
     g_test_trap_subprocess(path, 0, 0);
     g_test_trap_assert_passed();
 }
+
+static void test_connect_fail_subprocess(void)
+{
+    TestServer *s = test_server_new("reconnect");
+    char *cmd;
+
+    s->test_fail = true;
+    g_thread_new("connect", connect_thread, s);
+    cmd = GET_QEMU_CMDE(s, 2, ",server", "");
+    qtest_start(cmd);
+    g_free(cmd);
+
+    wait_for_fds(s);
+    wait_for_rings_started(s, 2);
+
+    qtest_end();
+    test_server_free(s);
+}
+
+static void test_connect_fail(void)
+{
+    gchar *path = g_strdup_printf("/%s/vhost-user/connect-fail/subprocess",
+                                  qtest_get_arch());
+    g_test_trap_subprocess(path, 0, 0);
+    g_test_trap_assert_passed();
+}
+
 #endif
 
 int main(int argc, char **argv)
@@ -747,6 +781,9 @@ int main(int argc, char **argv)
     qtest_add_func("/vhost-user/reconnect/subprocess",
                    test_reconnect_subprocess);
     qtest_add_func("/vhost-user/reconnect", test_reconnect);
+    qtest_add_func("/vhost-user/connect-fail/subprocess",
+                   test_connect_fail_subprocess);
+    qtest_add_func("/vhost-user/connect-fail", test_connect_fail);
 #endif
 
     ret = g_test_run();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 23/24] vhost-user: wait until link is up
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 23/24] vhost-user: wait until link is up marcandre.lureau
@ 2016-06-23  4:25   ` Michael S. Tsirkin
  2016-06-23  9:11     ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23  4:25 UTC (permalink / raw)
  To: marcandre.lureau; +Cc: qemu-devel, mukawa, yuanhan.liu, victork, jonshin

On Tue, Jun 21, 2016 at 12:02:51PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> The chardev waits for an initial connection before starting qemu,
> vhost-user wants the backend negotiation to be completed. vhost-user is
> started in the net_vhost_user_event callback, which is synchronously
> called after the socket is connected.
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  net/vhost-user.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 95ed2d2..bb4a253 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -248,7 +248,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>          s->chr = chr;
>      }
>  
> -    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, nc[0].name);
> +    do {
> +        Error *err = NULL;
> +        if (qemu_chr_wait_connected(chr, &err) < 0) {
> +            error_report_err(err);
> +            return -1;
> +        }
> +        qemu_chr_add_handlers(chr, NULL, NULL,
> +                              net_vhost_user_event, nc[0].name);
> +    } while (nc->link_down);
>  
>      assert(s->vhost_net != NULL);


I don't get why this makes sense.
Why should vhost user poke at link down at all?

> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail marcandre.lureau
@ 2016-06-23  4:27   ` Michael S. Tsirkin
  2016-06-23  9:19     ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23  4:27 UTC (permalink / raw)
  To: marcandre.lureau; +Cc: qemu-devel, mukawa, yuanhan.liu, victork, jonshin

On Tue, Jun 21, 2016 at 12:02:33PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Calling a vhost operation may fail, especially with disconnectable
> backends. Treat some that look harmless as recoverable errors (print
> error, or ignore on error code path).
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

If it's ok and we can recover, then why should we print errors?

> ---
>  hw/virtio/vhost.c | 32 +++++++++++++++++++++-----------
>  1 file changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 81cc5b0..e2d1614 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -398,7 +398,10 @@ static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
>      /* inform backend of log switching, this must be done before
>         releasing the current log, to ensure no logging is lost */
>      r = dev->vhost_ops->vhost_set_log_base(dev, log_base, log);
> -    assert(r >= 0);
> +    if (r < 0) {
> +        error_report("Failed to change backend log");
> +    }
> +
>      vhost_log_put(dev, true);
>      dev->log = log;
>      dev->log_size = size;
> @@ -565,7 +568,9 @@ static void vhost_commit(MemoryListener *listener)
>  
>      if (!dev->log_enabled) {
>          r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
> -        assert(r >= 0);
> +        if (r < 0) {
> +            error_report("Failed to set mem table");
> +        }
>          dev->memory_changed = false;
>          return;
>      }
> @@ -578,7 +583,9 @@ static void vhost_commit(MemoryListener *listener)
>          vhost_dev_log_resize(dev, log_size + VHOST_LOG_BUFFER);
>      }
>      r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
> -    assert(r >= 0);
> +    if (r < 0) {
> +        error_report("Failed to set mem table");
> +    }
>      /* To log less, can only decrease log size after table update. */
>      if (dev->log_size > log_size + VHOST_LOG_BUFFER) {
>          vhost_dev_log_resize(dev, log_size);
> @@ -647,6 +654,7 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>      };
>      int r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
>      if (r < 0) {
> +        error_report("Failed to set vring addr");
>          return -errno;
>      }
>      return 0;
> @@ -660,12 +668,15 @@ static int vhost_dev_set_features(struct vhost_dev *dev, bool enable_log)
>          features |= 0x1ULL << VHOST_F_LOG_ALL;
>      }
>      r = dev->vhost_ops->vhost_set_features(dev, features);
> +    if (r < 0) {
> +        error_report("Failed to set features");
> +    }
>      return r < 0 ? -errno : 0;
>  }
>  
>  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>  {
> -    int r, t, i, idx;
> +    int r, i, idx;
>      r = vhost_dev_set_features(dev, enable_log);
>      if (r < 0) {
>          goto err_features;
> @@ -682,12 +693,10 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>  err_vq:
>      for (; i >= 0; --i) {
>          idx = dev->vhost_ops->vhost_get_vq_index(dev, dev->vq_index + i);
> -        t = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> -                                     dev->log_enabled);
> -        assert(t >= 0);
> +        vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> +                                 dev->log_enabled);
>      }
> -    t = vhost_dev_set_features(dev, dev->log_enabled);
> -    assert(t >= 0);
> +    vhost_dev_set_features(dev, dev->log_enabled);
>  err_features:
>      return r;
>  }
> @@ -937,7 +946,6 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
>          }
>      }
>  
> -    assert (r >= 0);
>      cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
>                                0, virtio_queue_get_ring_size(vdev, idx));
>      cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
> @@ -1187,7 +1195,9 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n,
>  
>      file.index = hdev->vhost_ops->vhost_get_vq_index(hdev, n);
>      r = hdev->vhost_ops->vhost_set_vring_call(hdev, &file);
> -    assert(r >= 0);
> +    if (r < 0) {
> +        error_report("Failed to set vring call");
> +    }
>  }
>  
>  uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value marcandre.lureau
@ 2016-06-23  4:27   ` Michael S. Tsirkin
  2016-06-23  9:16     ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23  4:27 UTC (permalink / raw)
  To: marcandre.lureau; +Cc: qemu-devel, mukawa, yuanhan.liu, victork, jonshin

On Tue, Jun 21, 2016 at 12:02:34PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Just some more error checking.
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Point being? Callers just ignore it afterwards ...


> ---
>  hw/virtio/vhost-user.c | 44 +++++++++++++++++++++++++++++++-------------
>  1 file changed, 31 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 5dae496..e51df27 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -214,7 +214,9 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>          fds[fd_num++] = log->fd;
>      }
>  
> -    vhost_user_write(dev, &msg, fds, fd_num);
> +    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> +        return -1;
> +    }
>  
>      if (shmfd) {
>          msg.size = 0;
> @@ -275,7 +277,9 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>      msg.size += sizeof(msg.payload.memory.padding);
>      msg.size += fd_num * sizeof(VhostUserMemoryRegion);
>  
> -    vhost_user_write(dev, &msg, fds, fd_num);
> +    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> +        return -1;
> +    }
>  
>      return 0;
>  }
> @@ -290,7 +294,9 @@ static int vhost_user_set_vring_addr(struct vhost_dev *dev,
>          .size = sizeof(msg.payload.addr),
>      };
>  
> -    vhost_user_write(dev, &msg, NULL, 0);
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -1;
> +    }
>  
>      return 0;
>  }
> @@ -313,7 +319,9 @@ static int vhost_set_vring(struct vhost_dev *dev,
>          .size = sizeof(msg.payload.state),
>      };
>  
> -    vhost_user_write(dev, &msg, NULL, 0);
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -1;
> +    }
>  
>      return 0;
>  }
> @@ -360,7 +368,9 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>          .size = sizeof(msg.payload.state),
>      };
>  
> -    vhost_user_write(dev, &msg, NULL, 0);
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -1;
> +    }
>  
>      if (vhost_user_read(dev, &msg) < 0) {
>          return 0;
> @@ -401,7 +411,9 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
>          msg.payload.u64 |= VHOST_USER_VRING_NOFD_MASK;
>      }
>  
> -    vhost_user_write(dev, &msg, fds, fd_num);
> +    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> +        return -1;
> +    }
>  
>      return 0;
>  }
> @@ -427,7 +439,9 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64)
>          .size = sizeof(msg.payload.u64),
>      };
>  
> -    vhost_user_write(dev, &msg, NULL, 0);
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -1;
> +    }
>  
>      return 0;
>  }
> @@ -455,7 +469,9 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
>          return 0;
>      }
>  
> -    vhost_user_write(dev, &msg, NULL, 0);
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -1;
> +    }
>  
>      if (vhost_user_read(dev, &msg) < 0) {
>          return 0;
> @@ -489,7 +505,9 @@ static int vhost_user_set_owner(struct vhost_dev *dev)
>          .flags = VHOST_USER_VERSION,
>      };
>  
> -    vhost_user_write(dev, &msg, NULL, 0);
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -1;
> +    }
>  
>      return 0;
>  }
> @@ -501,7 +519,9 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
>          .flags = VHOST_USER_VERSION,
>      };
>  
> -    vhost_user_write(dev, &msg, NULL, 0);
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -1;
> +    }
>  
>      return 0;
>  }
> @@ -588,7 +608,6 @@ static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
>  static int vhost_user_migration_done(struct vhost_dev *dev, char* mac_addr)
>  {
>      VhostUserMsg msg = { 0 };
> -    int err;
>  
>      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
>  
> @@ -605,8 +624,7 @@ static int vhost_user_migration_done(struct vhost_dev *dev, char* mac_addr)
>          memcpy((char *)&msg.payload.u64, mac_addr, 6);
>          msg.size = sizeof(msg.payload.u64);
>  
> -        err = vhost_user_write(dev, &msg, NULL, 0);
> -        return err;
> +        return vhost_user_write(dev, &msg, NULL, 0);
>      }
>      return -1;
>  }
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 08/24] vhost-user: return a read error marcandre.lureau
@ 2016-06-23  4:27   ` Michael S. Tsirkin
  2016-06-23  9:14     ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23  4:27 UTC (permalink / raw)
  To: marcandre.lureau; +Cc: qemu-devel, mukawa, yuanhan.liu, victork, jonshin

On Tue, Jun 21, 2016 at 12:02:36PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Return read errors (not sure why those were ignored)
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

why bother?  So callers can just ignore them in turn?


> ---
>  hw/virtio/vhost-user.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index e51df27..819481d 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -221,7 +221,7 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>      if (shmfd) {
>          msg.size = 0;
>          if (vhost_user_read(dev, &msg) < 0) {
> -            return 0;
> +            return -1;
>          }
>  
>          if (msg.request != VHOST_USER_SET_LOG_BASE) {
> @@ -373,7 +373,7 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>      }
>  
>      if (vhost_user_read(dev, &msg) < 0) {
> -        return 0;
> +        return -1;
>      }
>  
>      if (msg.request != VHOST_USER_GET_VRING_BASE) {
> @@ -474,7 +474,7 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
>      }
>  
>      if (vhost_user_read(dev, &msg) < 0) {
> -        return 0;
> +        return -1;
>      }
>  
>      if (msg.request != request) {
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (23 preceding siblings ...)
  2016-06-21 10:02 ` [Qemu-devel] [PATCH 24/24] tests: add /vhost-user/connect-fail test marcandre.lureau
@ 2016-06-23  4:31 ` Michael S. Tsirkin
  2016-06-24 13:22   ` Marc-André Lureau
  2016-06-23  4:32 ` Michael S. Tsirkin
  25 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23  4:31 UTC (permalink / raw)
  To: marcandre.lureau; +Cc: qemu-devel, mukawa, yuanhan.liu, victork, jonshin

On Tue, Jun 21, 2016 at 12:02:28PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Hi,
> 
> Since 'vhost-user: simple reconnection support' was merged, it is
> possible to disconnect and reconnect a vhost-user backend. However,
> many code paths in qemu may trigger assert() when the backend is
> disconnected.
> 
> Some assert() could simply be replaced by error_report() or silently
> fail since they are recoverable cases. Some missing error checks can
> also help prevent later issues. In many cases, the code assumes
> get_vhost_net() will be non-NULL after a succesful connection, so I
> changed it to stay after a disconnect until the new connection comes
> (as suggested by Michael). There are also code paths that are wrong,
> see "don't assume opaque is a fd" patch for an example.

OK so if it's ok for read or write to fail, then I think
the callback should just return success.
This will make sure backends such as vhost net in kernel
where failure indicates an internal bug are handled
correctly, while vhost user ignores errors and attempts
to go on.

> Since there is feature checks on reconnection, qemu should wait for
> the initial connection feature negotiation to complete. The test added
> demonstrates this.


So on disconnect, we keep the guest going? But then when connect is
attempted, then we for some reason block guest until negotiation
is done? Sounds rather strange to me.

> For convenience, the series is also available on:
> https://github.com/elmarco/qemu, branch vhost-user-reconnect
> 
> Marc-André Lureau (24):
>   misc: indentation
>   vhost-user: minor simplification
>   qemu-char: check socket is actually connected
>   vhost-user: check qemu_chr_fe_set_msgfds() return value
>   vhost: change some assert() for error_report() or silent fail
>   vhost-user: check vhost_user_write() return value
>   vhost: use error_report() instead of fprintf(stderr,...)
>   vhost-user: return a read error
>   vhost: make vhost_log_put() idempotent
>   vhost: call vhost_log_put() on cleanup
>   vhost: add vhost device only after all success
>   vhost: make vhost_dev_cleanup() idempotent
>   vhost-net: always call vhost_dev_cleanup() on failure
>   vhost: don't assume opaque is a fd, use backend cleanup
>   vhost: fix calling vhost_dev_cleanup() after vhost_dev_init()
>   vhost-user: keep vhost_net after a disconnection
>   Revert "vhost-net: do not crash if backend is not present"
>   get_vhost_net() should be != null after vhost_user_init
>   vhost-net: success if backend has no ops->vhost_migration_done
>   vhost: add assert() to check runtime behaviour
>   char: add chr_wait_connected callback
>   char: add and use tcp_chr_wait_connected
>   vhost-user: wait until link is up
>   tests: add /vhost-user/connect-fail test
> 
>  hw/net/vhost_net.c      |  19 +++-----
>  hw/virtio/vhost-user.c  |  54 +++++++++++++++--------
>  hw/virtio/vhost.c       | 114 +++++++++++++++++++++++++++---------------------
>  include/sysemu/char.h   |   8 ++++
>  net/tap.c               |   1 +
>  net/vhost-user.c        |  36 ++++++++-------
>  qemu-char.c             |  82 ++++++++++++++++++++++++----------
>  tests/vhost-user-test.c |  37 ++++++++++++++++
>  8 files changed, 233 insertions(+), 118 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes
  2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
                   ` (24 preceding siblings ...)
  2016-06-23  4:31 ` [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes Michael S. Tsirkin
@ 2016-06-23  4:32 ` Michael S. Tsirkin
  2016-06-24 13:17   ` Marc-André Lureau
  25 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23  4:32 UTC (permalink / raw)
  To: marcandre.lureau; +Cc: qemu-devel, mukawa, yuanhan.liu, victork, jonshin

On Tue, Jun 21, 2016 at 12:02:28PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Hi,
> 
> Since 'vhost-user: simple reconnection support' was merged, it is
> possible to disconnect and reconnect a vhost-user backend. However,
> many code paths in qemu may trigger assert() when the backend is
> disconnected.
> 
> Some assert() could simply be replaced by error_report() or silently
> fail since they are recoverable cases. Some missing error checks can
> also help prevent later issues. In many cases, the code assumes
> get_vhost_net() will be non-NULL after a succesful connection, so I
> changed it to stay after a disconnect until the new connection comes
> (as suggested by Michael). There are also code paths that are wrong,
> see "don't assume opaque is a fd" patch for an example.

Can this be rarranged, first patches fixing the code paths that are wrong,
then patches handling the asserts etc?

> Since there is feature checks on reconnection, qemu should wait for
> the initial connection feature negotiation to complete. The test added
> demonstrates this.
> 
> For convenience, the series is also available on:
> https://github.com/elmarco/qemu, branch vhost-user-reconnect
> 
> Marc-André Lureau (24):
>   misc: indentation
>   vhost-user: minor simplification
>   qemu-char: check socket is actually connected
>   vhost-user: check qemu_chr_fe_set_msgfds() return value
>   vhost: change some assert() for error_report() or silent fail
>   vhost-user: check vhost_user_write() return value
>   vhost: use error_report() instead of fprintf(stderr,...)
>   vhost-user: return a read error
>   vhost: make vhost_log_put() idempotent
>   vhost: call vhost_log_put() on cleanup
>   vhost: add vhost device only after all success
>   vhost: make vhost_dev_cleanup() idempotent
>   vhost-net: always call vhost_dev_cleanup() on failure
>   vhost: don't assume opaque is a fd, use backend cleanup
>   vhost: fix calling vhost_dev_cleanup() after vhost_dev_init()
>   vhost-user: keep vhost_net after a disconnection
>   Revert "vhost-net: do not crash if backend is not present"
>   get_vhost_net() should be != null after vhost_user_init
>   vhost-net: success if backend has no ops->vhost_migration_done
>   vhost: add assert() to check runtime behaviour
>   char: add chr_wait_connected callback
>   char: add and use tcp_chr_wait_connected
>   vhost-user: wait until link is up
>   tests: add /vhost-user/connect-fail test
> 
>  hw/net/vhost_net.c      |  19 +++-----
>  hw/virtio/vhost-user.c  |  54 +++++++++++++++--------
>  hw/virtio/vhost.c       | 114 +++++++++++++++++++++++++++---------------------
>  include/sysemu/char.h   |   8 ++++
>  net/tap.c               |   1 +
>  net/vhost-user.c        |  36 ++++++++-------
>  qemu-char.c             |  82 ++++++++++++++++++++++++----------
>  tests/vhost-user-test.c |  37 ++++++++++++++++
>  8 files changed, 233 insertions(+), 118 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 23/24] vhost-user: wait until link is up
  2016-06-23  4:25   ` Michael S. Tsirkin
@ 2016-06-23  9:11     ` Marc-André Lureau
  2016-06-23 16:45       ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-23  9:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: marcandre lureau, qemu-devel, mukawa, yuanhan liu, victork, jonshin

Hi

----- Original Message -----
> On Tue, Jun 21, 2016 at 12:02:51PM +0200, marcandre.lureau@redhat.com wrote:
> > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > The chardev waits for an initial connection before starting qemu,
> > vhost-user wants the backend negotiation to be completed. vhost-user is
> > started in the net_vhost_user_event callback, which is synchronously
> > called after the socket is connected.
> > 
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > ---
> >  net/vhost-user.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > index 95ed2d2..bb4a253 100644
> > --- a/net/vhost-user.c
> > +++ b/net/vhost-user.c
> > @@ -248,7 +248,15 @@ static int net_vhost_user_init(NetClientState *peer,
> > const char *device,
> >          s->chr = chr;
> >      }
> >  
> > -    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event,
> > nc[0].name);
> > +    do {
> > +        Error *err = NULL;
> > +        if (qemu_chr_wait_connected(chr, &err) < 0) {
> > +            error_report_err(err);
> > +            return -1;
> > +        }
> > +        qemu_chr_add_handlers(chr, NULL, NULL,
> > +                              net_vhost_user_event, nc[0].name);
> > +    } while (nc->link_down);
> >  
> >      assert(s->vhost_net != NULL);
> 
> 
> I don't get why this makes sense.
> Why should vhost user poke at link down at all?

It should wait for an initial valid connection (see the test case). Only after vhost_user_start() should it keep going. We could have VhostUserState.ready set to true when done, or we could check after qmp_set_link(.., true) status, like I did here. Does that make sense?

thanks

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-06-23  4:27   ` Michael S. Tsirkin
@ 2016-06-23  9:14     ` Marc-André Lureau
  2016-06-23 17:03       ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-23  9:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: marcandre lureau, qemu-devel, mukawa, yuanhan liu, victork, jonshin

Hi

----- Original Message -----
> On Tue, Jun 21, 2016 at 12:02:36PM +0200, marcandre.lureau@redhat.com wrote:
> > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > Return read errors (not sure why those were ignored)
> > 
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> why bother?  So callers can just ignore them in turn?

The callers do not always ignore errors, fortunately (see vhost_user_init() for ex). That at least prevents flooding terminal/monitor with multiple errors and helps finding out the origin of the error.

> 
> > ---
> >  hw/virtio/vhost-user.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index e51df27..819481d 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -221,7 +221,7 @@ static int vhost_user_set_log_base(struct vhost_dev
> > *dev, uint64_t base,
> >      if (shmfd) {
> >          msg.size = 0;
> >          if (vhost_user_read(dev, &msg) < 0) {
> > -            return 0;
> > +            return -1;
> >          }
> >  
> >          if (msg.request != VHOST_USER_SET_LOG_BASE) {
> > @@ -373,7 +373,7 @@ static int vhost_user_get_vring_base(struct vhost_dev
> > *dev,
> >      }
> >  
> >      if (vhost_user_read(dev, &msg) < 0) {
> > -        return 0;
> > +        return -1;
> >      }
> >  
> >      if (msg.request != VHOST_USER_GET_VRING_BASE) {
> > @@ -474,7 +474,7 @@ static int vhost_user_get_u64(struct vhost_dev *dev,
> > int request, uint64_t *u64)
> >      }
> >  
> >      if (vhost_user_read(dev, &msg) < 0) {
> > -        return 0;
> > +        return -1;
> >      }
> >  
> >      if (msg.request != request) {
> > --
> > 2.7.4
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value
  2016-06-23  4:27   ` Michael S. Tsirkin
@ 2016-06-23  9:16     ` Marc-André Lureau
  2016-06-23 17:08       ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-23  9:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: marcandre lureau, qemu-devel, mukawa, yuanhan liu, victork, jonshin

Hi

----- Original Message -----
> On Tue, Jun 21, 2016 at 12:02:34PM +0200, marcandre.lureau@redhat.com wrote:
> > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > Just some more error checking.
> > 
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Point being? Callers just ignore it afterwards ...

Callers do not always ignore it (and in general it should not, should it?), this helps breaking the execution at right moment, help debugging, code consistency, good practices etc (perhaps it's too obvious to me and I am missing something?)

> 
> 
> > ---
> >  hw/virtio/vhost-user.c | 44 +++++++++++++++++++++++++++++++-------------
> >  1 file changed, 31 insertions(+), 13 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 5dae496..e51df27 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -214,7 +214,9 @@ static int vhost_user_set_log_base(struct vhost_dev
> > *dev, uint64_t base,
> >          fds[fd_num++] = log->fd;
> >      }
> >  
> > -    vhost_user_write(dev, &msg, fds, fd_num);
> > +    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> > +        return -1;
> > +    }
> >  
> >      if (shmfd) {
> >          msg.size = 0;
> > @@ -275,7 +277,9 @@ static int vhost_user_set_mem_table(struct vhost_dev
> > *dev,
> >      msg.size += sizeof(msg.payload.memory.padding);
> >      msg.size += fd_num * sizeof(VhostUserMemoryRegion);
> >  
> > -    vhost_user_write(dev, &msg, fds, fd_num);
> > +    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> > +        return -1;
> > +    }
> >  
> >      return 0;
> >  }
> > @@ -290,7 +294,9 @@ static int vhost_user_set_vring_addr(struct vhost_dev
> > *dev,
> >          .size = sizeof(msg.payload.addr),
> >      };
> >  
> > -    vhost_user_write(dev, &msg, NULL, 0);
> > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > +        return -1;
> > +    }
> >  
> >      return 0;
> >  }
> > @@ -313,7 +319,9 @@ static int vhost_set_vring(struct vhost_dev *dev,
> >          .size = sizeof(msg.payload.state),
> >      };
> >  
> > -    vhost_user_write(dev, &msg, NULL, 0);
> > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > +        return -1;
> > +    }
> >  
> >      return 0;
> >  }
> > @@ -360,7 +368,9 @@ static int vhost_user_get_vring_base(struct vhost_dev
> > *dev,
> >          .size = sizeof(msg.payload.state),
> >      };
> >  
> > -    vhost_user_write(dev, &msg, NULL, 0);
> > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > +        return -1;
> > +    }
> >  
> >      if (vhost_user_read(dev, &msg) < 0) {
> >          return 0;
> > @@ -401,7 +411,9 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
> >          msg.payload.u64 |= VHOST_USER_VRING_NOFD_MASK;
> >      }
> >  
> > -    vhost_user_write(dev, &msg, fds, fd_num);
> > +    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> > +        return -1;
> > +    }
> >  
> >      return 0;
> >  }
> > @@ -427,7 +439,9 @@ static int vhost_user_set_u64(struct vhost_dev *dev,
> > int request, uint64_t u64)
> >          .size = sizeof(msg.payload.u64),
> >      };
> >  
> > -    vhost_user_write(dev, &msg, NULL, 0);
> > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > +        return -1;
> > +    }
> >  
> >      return 0;
> >  }
> > @@ -455,7 +469,9 @@ static int vhost_user_get_u64(struct vhost_dev *dev,
> > int request, uint64_t *u64)
> >          return 0;
> >      }
> >  
> > -    vhost_user_write(dev, &msg, NULL, 0);
> > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > +        return -1;
> > +    }
> >  
> >      if (vhost_user_read(dev, &msg) < 0) {
> >          return 0;
> > @@ -489,7 +505,9 @@ static int vhost_user_set_owner(struct vhost_dev *dev)
> >          .flags = VHOST_USER_VERSION,
> >      };
> >  
> > -    vhost_user_write(dev, &msg, NULL, 0);
> > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > +        return -1;
> > +    }
> >  
> >      return 0;
> >  }
> > @@ -501,7 +519,9 @@ static int vhost_user_reset_device(struct vhost_dev
> > *dev)
> >          .flags = VHOST_USER_VERSION,
> >      };
> >  
> > -    vhost_user_write(dev, &msg, NULL, 0);
> > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > +        return -1;
> > +    }
> >  
> >      return 0;
> >  }
> > @@ -588,7 +608,6 @@ static bool vhost_user_requires_shm_log(struct
> > vhost_dev *dev)
> >  static int vhost_user_migration_done(struct vhost_dev *dev, char*
> >  mac_addr)
> >  {
> >      VhostUserMsg msg = { 0 };
> > -    int err;
> >  
> >      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
> >  
> > @@ -605,8 +624,7 @@ static int vhost_user_migration_done(struct vhost_dev
> > *dev, char* mac_addr)
> >          memcpy((char *)&msg.payload.u64, mac_addr, 6);
> >          msg.size = sizeof(msg.payload.u64);
> >  
> > -        err = vhost_user_write(dev, &msg, NULL, 0);
> > -        return err;
> > +        return vhost_user_write(dev, &msg, NULL, 0);
> >      }
> >      return -1;
> >  }
> > --
> > 2.7.4
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
  2016-06-23  4:27   ` Michael S. Tsirkin
@ 2016-06-23  9:19     ` Marc-André Lureau
  2016-06-23 17:13       ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-23  9:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: marcandre lureau, qemu-devel, mukawa, yuanhan liu, victork, jonshin

Hi

----- Original Message -----
> On Tue, Jun 21, 2016 at 12:02:33PM +0200, marcandre.lureau@redhat.com wrote:
> > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > Calling a vhost operation may fail, especially with disconnectable
> > backends. Treat some that look harmless as recoverable errors (print
> > error, or ignore on error code path).
> > 
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> If it's ok and we can recover, then why should we print errors?

To me, the current disconnect handling is not handled cleanly. There is not much/nothing in the protocol that tells when and how you can disconnect. If qemu vhost-user reconnection works today, it is mostly by luck. Imho, we should print errors if any call to vhost user fails to help further analysis of broken behaviour. And we shoud have a clean mechanism to handle shutdown/disconnect/reconnect.

> 
> > ---
> >  hw/virtio/vhost.c | 32 +++++++++++++++++++++-----------
> >  1 file changed, 21 insertions(+), 11 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 81cc5b0..e2d1614 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -398,7 +398,10 @@ static inline void vhost_dev_log_resize(struct
> > vhost_dev *dev, uint64_t size)
> >      /* inform backend of log switching, this must be done before
> >         releasing the current log, to ensure no logging is lost */
> >      r = dev->vhost_ops->vhost_set_log_base(dev, log_base, log);
> > -    assert(r >= 0);
> > +    if (r < 0) {
> > +        error_report("Failed to change backend log");
> > +    }
> > +
> >      vhost_log_put(dev, true);
> >      dev->log = log;
> >      dev->log_size = size;
> > @@ -565,7 +568,9 @@ static void vhost_commit(MemoryListener *listener)
> >  
> >      if (!dev->log_enabled) {
> >          r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
> > -        assert(r >= 0);
> > +        if (r < 0) {
> > +            error_report("Failed to set mem table");
> > +        }
> >          dev->memory_changed = false;
> >          return;
> >      }
> > @@ -578,7 +583,9 @@ static void vhost_commit(MemoryListener *listener)
> >          vhost_dev_log_resize(dev, log_size + VHOST_LOG_BUFFER);
> >      }
> >      r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
> > -    assert(r >= 0);
> > +    if (r < 0) {
> > +        error_report("Failed to set mem table");
> > +    }
> >      /* To log less, can only decrease log size after table update. */
> >      if (dev->log_size > log_size + VHOST_LOG_BUFFER) {
> >          vhost_dev_log_resize(dev, log_size);
> > @@ -647,6 +654,7 @@ static int vhost_virtqueue_set_addr(struct vhost_dev
> > *dev,
> >      };
> >      int r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
> >      if (r < 0) {
> > +        error_report("Failed to set vring addr");
> >          return -errno;
> >      }
> >      return 0;
> > @@ -660,12 +668,15 @@ static int vhost_dev_set_features(struct vhost_dev
> > *dev, bool enable_log)
> >          features |= 0x1ULL << VHOST_F_LOG_ALL;
> >      }
> >      r = dev->vhost_ops->vhost_set_features(dev, features);
> > +    if (r < 0) {
> > +        error_report("Failed to set features");
> > +    }
> >      return r < 0 ? -errno : 0;
> >  }
> >  
> >  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >  {
> > -    int r, t, i, idx;
> > +    int r, i, idx;
> >      r = vhost_dev_set_features(dev, enable_log);
> >      if (r < 0) {
> >          goto err_features;
> > @@ -682,12 +693,10 @@ static int vhost_dev_set_log(struct vhost_dev *dev,
> > bool enable_log)
> >  err_vq:
> >      for (; i >= 0; --i) {
> >          idx = dev->vhost_ops->vhost_get_vq_index(dev, dev->vq_index + i);
> > -        t = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> > -                                     dev->log_enabled);
> > -        assert(t >= 0);
> > +        vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> > +                                 dev->log_enabled);
> >      }
> > -    t = vhost_dev_set_features(dev, dev->log_enabled);
> > -    assert(t >= 0);
> > +    vhost_dev_set_features(dev, dev->log_enabled);
> >  err_features:
> >      return r;
> >  }
> > @@ -937,7 +946,6 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
> >          }
> >      }
> >  
> > -    assert (r >= 0);
> >      cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev,
> >      idx),
> >                                0, virtio_queue_get_ring_size(vdev, idx));
> >      cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev,
> >      idx),
> > @@ -1187,7 +1195,9 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev,
> > VirtIODevice *vdev, int n,
> >  
> >      file.index = hdev->vhost_ops->vhost_get_vq_index(hdev, n);
> >      r = hdev->vhost_ops->vhost_set_vring_call(hdev, &file);
> > -    assert(r >= 0);
> > +    if (r < 0) {
> > +        error_report("Failed to set vring call");
> > +    }
> >  }
> >  
> >  uint64_t vhost_get_features(struct vhost_dev *hdev, const int
> >  *feature_bits,
> > --
> > 2.7.4
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 23/24] vhost-user: wait until link is up
  2016-06-23  9:11     ` Marc-André Lureau
@ 2016-06-23 16:45       ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23 16:45 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: marcandre lureau, qemu-devel, mukawa, yuanhan liu, victork, jonshin

On Thu, Jun 23, 2016 at 05:11:37AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Tue, Jun 21, 2016 at 12:02:51PM +0200, marcandre.lureau@redhat.com wrote:
> > > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > > 
> > > The chardev waits for an initial connection before starting qemu,
> > > vhost-user wants the backend negotiation to be completed. vhost-user is
> > > started in the net_vhost_user_event callback, which is synchronously
> > > called after the socket is connected.
> > > 
> > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > > ---
> > >  net/vhost-user.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > index 95ed2d2..bb4a253 100644
> > > --- a/net/vhost-user.c
> > > +++ b/net/vhost-user.c
> > > @@ -248,7 +248,15 @@ static int net_vhost_user_init(NetClientState *peer,
> > > const char *device,
> > >          s->chr = chr;
> > >      }
> > >  
> > > -    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event,
> > > nc[0].name);
> > > +    do {
> > > +        Error *err = NULL;
> > > +        if (qemu_chr_wait_connected(chr, &err) < 0) {
> > > +            error_report_err(err);
> > > +            return -1;
> > > +        }
> > > +        qemu_chr_add_handlers(chr, NULL, NULL,
> > > +                              net_vhost_user_event, nc[0].name);
> > > +    } while (nc->link_down);
> > >  
> > >      assert(s->vhost_net != NULL);
> > 
> > 
> > I don't get why this makes sense.
> > Why should vhost user poke at link down at all?
> 
> It should wait for an initial valid connection (see the test case). Only after vhost_user_start() should it keep going. We could have VhostUserState.ready set to true when done, or we could check after qmp_set_link(.., true) status, like I did here. Does that make sense?
> 
> thanks

I'd prefer something well contained. A dedicated flags sounds
reasonable.

	commit d39c87d70763f2755d1d7a719817b06f0281fb01
	Author: Wen Congyang <wency@cn.fujitsu.com>
	Date:   Wed Nov 11 14:53:29 2015 +0800

	    vhost-user: set link down when the char device is closed

was a quick hack. Disconnect should be transparent to guest and not
cause dhcp renegotiation and stuff.

The reason it's there is because we do not yet handle tx packets
when disconnected, so it reduces the chance of tx watchdog if we tell guest it
should not transmit anything new.

Once we fix that we will drop the link down thing, so don't rely on link
state please.

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-06-23  9:14     ` Marc-André Lureau
@ 2016-06-23 17:03       ` Michael S. Tsirkin
  2016-06-24 12:46         ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23 17:03 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: marcandre lureau, qemu-devel, mukawa, yuanhan liu, victork, jonshin

On Thu, Jun 23, 2016 at 05:14:04AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Tue, Jun 21, 2016 at 12:02:36PM +0200, marcandre.lureau@redhat.com wrote:
> > > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > > 
> > > Return read errors (not sure why those were ignored)
> > > 
> > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > why bother?  So callers can just ignore them in turn?
> 
> The callers do not always ignore errors, fortunately (see
> vhost_user_init() for ex).

But that doesn't call set log base, right?

> That at least prevents flooding
> terminal/monitor with multiple errors and helps finding out the origin
> of the error.
> 
> > 
> > > ---
> > >  hw/virtio/vhost-user.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index e51df27..819481d 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -221,7 +221,7 @@ static int vhost_user_set_log_base(struct vhost_dev
> > > *dev, uint64_t base,
> > >      if (shmfd) {
> > >          msg.size = 0;
> > >          if (vhost_user_read(dev, &msg) < 0) {
> > > -            return 0;
> > > +            return -1;
> > >          }
> > >  
> > >          if (msg.request != VHOST_USER_SET_LOG_BASE) {
> > > @@ -373,7 +373,7 @@ static int vhost_user_get_vring_base(struct vhost_dev
> > > *dev,
> > >      }
> > >  
> > >      if (vhost_user_read(dev, &msg) < 0) {
> > > -        return 0;
> > > +        return -1;
> > >      }
> > >  
> > >      if (msg.request != VHOST_USER_GET_VRING_BASE) {
> > > @@ -474,7 +474,7 @@ static int vhost_user_get_u64(struct vhost_dev *dev,
> > > int request, uint64_t *u64)
> > >      }
> > >  
> > >      if (vhost_user_read(dev, &msg) < 0) {
> > > -        return 0;
> > > +        return -1;
> > >      }
> > >  
> > >      if (msg.request != request) {
> > > --
> > > 2.7.4
> > 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value
  2016-06-23  9:16     ` Marc-André Lureau
@ 2016-06-23 17:08       ` Michael S. Tsirkin
  2016-06-24 12:49         ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23 17:08 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: marcandre lureau, qemu-devel, mukawa, yuanhan liu, victork, jonshin

On Thu, Jun 23, 2016 at 05:16:18AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Tue, Jun 21, 2016 at 12:02:34PM +0200, marcandre.lureau@redhat.com wrote:
> > > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > > 
> > > Just some more error checking.
> > > 
> > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > Point being? Callers just ignore it afterwards ...
> 
> Callers do not always ignore it (and in general it should not, should it?), this helps breaking the execution at right moment, help debugging, code consistency, good practices etc (perhaps it's too obvious to me and I am missing something?)

Reporting error up the stack is helpful if it's handled in some way.
If we just keep guest going on this error, then we could
maybe log it for debug build but that's all.


> > 
> > 
> > > ---
> > >  hw/virtio/vhost-user.c | 44 +++++++++++++++++++++++++++++++-------------
> > >  1 file changed, 31 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 5dae496..e51df27 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -214,7 +214,9 @@ static int vhost_user_set_log_base(struct vhost_dev
> > > *dev, uint64_t base,
> > >          fds[fd_num++] = log->fd;
> > >      }
> > >  
> > > -    vhost_user_write(dev, &msg, fds, fd_num);
> > > +    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      if (shmfd) {
> > >          msg.size = 0;
> > > @@ -275,7 +277,9 @@ static int vhost_user_set_mem_table(struct vhost_dev
> > > *dev,
> > >      msg.size += sizeof(msg.payload.memory.padding);
> > >      msg.size += fd_num * sizeof(VhostUserMemoryRegion);
> > >  
> > > -    vhost_user_write(dev, &msg, fds, fd_num);
> > > +    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      return 0;
> > >  }
> > > @@ -290,7 +294,9 @@ static int vhost_user_set_vring_addr(struct vhost_dev
> > > *dev,
> > >          .size = sizeof(msg.payload.addr),
> > >      };
> > >  
> > > -    vhost_user_write(dev, &msg, NULL, 0);
> > > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      return 0;
> > >  }
> > > @@ -313,7 +319,9 @@ static int vhost_set_vring(struct vhost_dev *dev,
> > >          .size = sizeof(msg.payload.state),
> > >      };
> > >  
> > > -    vhost_user_write(dev, &msg, NULL, 0);
> > > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      return 0;
> > >  }
> > > @@ -360,7 +368,9 @@ static int vhost_user_get_vring_base(struct vhost_dev
> > > *dev,
> > >          .size = sizeof(msg.payload.state),
> > >      };
> > >  
> > > -    vhost_user_write(dev, &msg, NULL, 0);
> > > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      if (vhost_user_read(dev, &msg) < 0) {
> > >          return 0;
> > > @@ -401,7 +411,9 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
> > >          msg.payload.u64 |= VHOST_USER_VRING_NOFD_MASK;
> > >      }
> > >  
> > > -    vhost_user_write(dev, &msg, fds, fd_num);
> > > +    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      return 0;
> > >  }
> > > @@ -427,7 +439,9 @@ static int vhost_user_set_u64(struct vhost_dev *dev,
> > > int request, uint64_t u64)
> > >          .size = sizeof(msg.payload.u64),
> > >      };
> > >  
> > > -    vhost_user_write(dev, &msg, NULL, 0);
> > > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      return 0;
> > >  }
> > > @@ -455,7 +469,9 @@ static int vhost_user_get_u64(struct vhost_dev *dev,
> > > int request, uint64_t *u64)
> > >          return 0;
> > >      }
> > >  
> > > -    vhost_user_write(dev, &msg, NULL, 0);
> > > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      if (vhost_user_read(dev, &msg) < 0) {
> > >          return 0;
> > > @@ -489,7 +505,9 @@ static int vhost_user_set_owner(struct vhost_dev *dev)
> > >          .flags = VHOST_USER_VERSION,
> > >      };
> > >  
> > > -    vhost_user_write(dev, &msg, NULL, 0);
> > > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      return 0;
> > >  }
> > > @@ -501,7 +519,9 @@ static int vhost_user_reset_device(struct vhost_dev
> > > *dev)
> > >          .flags = VHOST_USER_VERSION,
> > >      };
> > >  
> > > -    vhost_user_write(dev, &msg, NULL, 0);
> > > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > +        return -1;
> > > +    }
> > >  
> > >      return 0;
> > >  }
> > > @@ -588,7 +608,6 @@ static bool vhost_user_requires_shm_log(struct
> > > vhost_dev *dev)
> > >  static int vhost_user_migration_done(struct vhost_dev *dev, char*
> > >  mac_addr)
> > >  {
> > >      VhostUserMsg msg = { 0 };
> > > -    int err;
> > >  
> > >      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
> > >  
> > > @@ -605,8 +624,7 @@ static int vhost_user_migration_done(struct vhost_dev
> > > *dev, char* mac_addr)
> > >          memcpy((char *)&msg.payload.u64, mac_addr, 6);
> > >          msg.size = sizeof(msg.payload.u64);
> > >  
> > > -        err = vhost_user_write(dev, &msg, NULL, 0);
> > > -        return err;
> > > +        return vhost_user_write(dev, &msg, NULL, 0);
> > >      }
> > >      return -1;
> > >  }
> > > --
> > > 2.7.4
> > 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
  2016-06-23  9:19     ` Marc-André Lureau
@ 2016-06-23 17:13       ` Michael S. Tsirkin
  2016-06-24 13:25         ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-23 17:13 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: marcandre lureau, qemu-devel, mukawa, yuanhan liu, victork, jonshin

On Thu, Jun 23, 2016 at 05:19:55AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Tue, Jun 21, 2016 at 12:02:33PM +0200, marcandre.lureau@redhat.com wrote:
> > > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > > 
> > > Calling a vhost operation may fail, especially with disconnectable
> > > backends. Treat some that look harmless as recoverable errors (print
> > > error, or ignore on error code path).
> > > 
> > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > If it's ok and we can recover, then why should we print errors?
> 
> To me, the current disconnect handling is not handled cleanly. There
> is not much/nothing in the protocol that tells when and how you can
> disconnect. If qemu vhost-user reconnection works today, it is mostly
> by luck. Imho, we should print errors if any call to vhost user fails
> to help further analysis of broken behaviour.

But we decided disconnect is OK, did we not? So now a failure like this
is just expected. It's not broken behaviour anymore ...

> And we shoud have a
> clean mechanism to handle shutdown/disconnect/reconnect.


At some level this is something that's missing here.

How about we patch docs/specs/vhost-user.txt
and describe how is reconnection supposed to work?

All we have is:
	If Master is unable to send the full message or receives a wrong reply
	it will close the connection. An optional reconnection mechanism can be
	implemented.


> > 
> > > ---
> > >  hw/virtio/vhost.c | 32 +++++++++++++++++++++-----------
> > >  1 file changed, 21 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > index 81cc5b0..e2d1614 100644
> > > --- a/hw/virtio/vhost.c
> > > +++ b/hw/virtio/vhost.c
> > > @@ -398,7 +398,10 @@ static inline void vhost_dev_log_resize(struct
> > > vhost_dev *dev, uint64_t size)
> > >      /* inform backend of log switching, this must be done before
> > >         releasing the current log, to ensure no logging is lost */
> > >      r = dev->vhost_ops->vhost_set_log_base(dev, log_base, log);
> > > -    assert(r >= 0);
> > > +    if (r < 0) {
> > > +        error_report("Failed to change backend log");
> > > +    }
> > > +
> > >      vhost_log_put(dev, true);
> > >      dev->log = log;
> > >      dev->log_size = size;
> > > @@ -565,7 +568,9 @@ static void vhost_commit(MemoryListener *listener)
> > >  
> > >      if (!dev->log_enabled) {
> > >          r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
> > > -        assert(r >= 0);
> > > +        if (r < 0) {
> > > +            error_report("Failed to set mem table");
> > > +        }
> > >          dev->memory_changed = false;
> > >          return;
> > >      }
> > > @@ -578,7 +583,9 @@ static void vhost_commit(MemoryListener *listener)
> > >          vhost_dev_log_resize(dev, log_size + VHOST_LOG_BUFFER);
> > >      }
> > >      r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
> > > -    assert(r >= 0);
> > > +    if (r < 0) {
> > > +        error_report("Failed to set mem table");
> > > +    }
> > >      /* To log less, can only decrease log size after table update. */
> > >      if (dev->log_size > log_size + VHOST_LOG_BUFFER) {
> > >          vhost_dev_log_resize(dev, log_size);
> > > @@ -647,6 +654,7 @@ static int vhost_virtqueue_set_addr(struct vhost_dev
> > > *dev,
> > >      };
> > >      int r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
> > >      if (r < 0) {
> > > +        error_report("Failed to set vring addr");
> > >          return -errno;
> > >      }
> > >      return 0;
> > > @@ -660,12 +668,15 @@ static int vhost_dev_set_features(struct vhost_dev
> > > *dev, bool enable_log)
> > >          features |= 0x1ULL << VHOST_F_LOG_ALL;
> > >      }
> > >      r = dev->vhost_ops->vhost_set_features(dev, features);
> > > +    if (r < 0) {
> > > +        error_report("Failed to set features");
> > > +    }
> > >      return r < 0 ? -errno : 0;
> > >  }
> > >  
> > >  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> > >  {
> > > -    int r, t, i, idx;
> > > +    int r, i, idx;
> > >      r = vhost_dev_set_features(dev, enable_log);
> > >      if (r < 0) {
> > >          goto err_features;
> > > @@ -682,12 +693,10 @@ static int vhost_dev_set_log(struct vhost_dev *dev,
> > > bool enable_log)
> > >  err_vq:
> > >      for (; i >= 0; --i) {
> > >          idx = dev->vhost_ops->vhost_get_vq_index(dev, dev->vq_index + i);
> > > -        t = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> > > -                                     dev->log_enabled);
> > > -        assert(t >= 0);
> > > +        vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> > > +                                 dev->log_enabled);
> > >      }
> > > -    t = vhost_dev_set_features(dev, dev->log_enabled);
> > > -    assert(t >= 0);
> > > +    vhost_dev_set_features(dev, dev->log_enabled);
> > >  err_features:
> > >      return r;
> > >  }
> > > @@ -937,7 +946,6 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
> > >          }
> > >      }
> > >  
> > > -    assert (r >= 0);
> > >      cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev,
> > >      idx),
> > >                                0, virtio_queue_get_ring_size(vdev, idx));
> > >      cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev,
> > >      idx),
> > > @@ -1187,7 +1195,9 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev,
> > > VirtIODevice *vdev, int n,
> > >  
> > >      file.index = hdev->vhost_ops->vhost_get_vq_index(hdev, n);
> > >      r = hdev->vhost_ops->vhost_set_vring_call(hdev, &file);
> > > -    assert(r >= 0);
> > > +    if (r < 0) {
> > > +        error_report("Failed to set vring call");
> > > +    }
> > >  }
> > >  
> > >  uint64_t vhost_get_features(struct vhost_dev *hdev, const int
> > >  *feature_bits,
> > > --
> > > 2.7.4
> > 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-06-23 17:03       ` Michael S. Tsirkin
@ 2016-06-24 12:46         ` Marc-André Lureau
  2016-07-04 15:47           ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-24 12:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

Hi

On Thu, Jun 23, 2016 at 7:03 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > why bother?  So callers can just ignore them in turn?
>>
>> The callers do not always ignore errors, fortunately (see
>> vhost_user_init() for ex).
>
> But that doesn't call set log base, right?

No, what is the point you are making?


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value
  2016-06-23 17:08       ` Michael S. Tsirkin
@ 2016-06-24 12:49         ` Marc-André Lureau
  2016-07-04 15:46           ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-24 12:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

Hi

On Thu, Jun 23, 2016 at 7:08 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> Callers do not always ignore it (and in general it should not, should it?), this helps breaking the execution at right moment, help debugging, code consistency, good practices etc (perhaps it's too obvious to me and I am missing something?)
>
> Reporting error up the stack is helpful if it's handled in some way.
> If we just keep guest going on this error, then we could
> maybe log it for debug build but that's all.

Reporting up to guest somehow would be a good thing at some point, so
I think we should start from the bottom. vhost-user lacks error
handling, let's add it.

Regarding debug build messages, I don't think it's enough. As long as
we don't have an official supported way to handle disconnect. It's
better to report an error than be silent.


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes
  2016-06-23  4:32 ` Michael S. Tsirkin
@ 2016-06-24 13:17   ` Marc-André Lureau
  0 siblings, 0 replies; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-24 13:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tetsuya Mukawa, Yuanhan Liu, Victor Kaplansky, jonshin, QEMU

Hi

On Thu, Jun 23, 2016 at 6:32 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Can this be rarranged, first patches fixing the code paths that are wrong,
> then patches handling the asserts etc?

Ok, I'll try to improve ordering (minor fix, close(fd) fix,
assert/fprintf->error_report, check and return error,
vhost_dev_cleanup() fixes, keep vhost_net after a disconnect, wait
until connection is ready)


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes
  2016-06-23  4:31 ` [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes Michael S. Tsirkin
@ 2016-06-24 13:22   ` Marc-André Lureau
  2016-06-24 22:19     ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-24 13:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tetsuya Mukawa, Yuanhan Liu, Victor Kaplansky, jonshin, QEMU

Hi

On Thu, Jun 23, 2016 at 6:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> OK so if it's ok for read or write to fail, then I think
> the callback should just return success.
> This will make sure backends such as vhost net in kernel
> where failure indicates an internal bug are handled
> correctly, while vhost user ignores errors and attempts
> to go on.
>

That's what we are trying to do, but since it's not yet well defined
how to disconnect/reconnect and there are many bugs around this, it
would help a lot to add error checking and error reporting for now.

>> Since there is feature checks on reconnection, qemu should wait for
>> the initial connection feature negotiation to complete. The test added
>> demonstrates this.
>
>
> So on disconnect, we keep the guest going? But then when connect is
> attempted, then we for some reason block guest until negotiation
> is done? Sounds rather strange to me.

When qemu is the server, it already waits for an initial connection
from vhost-user backend. Here it is just enhanced to wait for the
initial nego to complete, which is necessary for proper device
initialization and also further reconnections checks.

-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
  2016-06-23 17:13       ` Michael S. Tsirkin
@ 2016-06-24 13:25         ` Marc-André Lureau
  2016-06-24 22:38           ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-24 13:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > If it's ok and we can recover, then why should we print errors?
>>
>> To me, the current disconnect handling is not handled cleanly. There
>> is not much/nothing in the protocol that tells when and how you can
>> disconnect. If qemu vhost-user reconnection works today, it is mostly
>> by luck. Imho, we should print errors if any call to vhost user fails
>> to help further analysis of broken behaviour.
>
> But we decided disconnect is OK, did we not? So now a failure like this
> is just expected. It's not broken behaviour anymore ...

As you know, there are many ways qemu or the vm can break when the
backend is disconnected.  For now, it would help a lot if we had a bit
of error messages in this case. But do we really want to support
spurious disconnect/reconnect at any time? It's going to be
challenging, to maintain, to test... Is it really worthwhile? I doubt
it. I think it would be easier and more future-proof to have a
dedicated vhost-user request for that and only do a best effort for
the ungraceful disconnect.

>> And we shoud have a
>> clean mechanism to handle shutdown/disconnect/reconnect.
>
>
> At some level this is something that's missing here.
>
> How about we patch docs/specs/vhost-user.txt
> and describe how is reconnection supposed to work?
>
> All we have is:
>         If Master is unable to send the full message or receives a wrong reply
>         it will close the connection. An optional reconnection mechanism can be
>         implemented.

This text is there from the original commit but it doesn't say how to
reconnect and or how to recover from the ring. I don't have enough
experience with virtio in general to say when it is acceptable for
backend to be gone (not processing the ring), I can imagine the
implementation vary widely, and the requirements depend on the kind of
device.

If we limit the spec to vhost-user protocol only and leave it open on
how each virtio device should support ungraceful disconnect/reconnect,
then it sounds reasonable to document that after such disconnect, on
reconnect:
- the server assumes the backend has no knowledge of previous connection
- the state can be changed between reconnections (new ring state, mem table etc)
- the server will check feature compatibility with previous backend
and reject incompatible backends
- backend must restore ring processing from used->idx and ignore
VHOST_USER_SET_VRING_BASE (or should we change and document that in
this case the ring base is updated from used->idx?)

(it's probably not a complete list, I am not good at imagining all
possible cases)


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes
  2016-06-24 13:22   ` Marc-André Lureau
@ 2016-06-24 22:19     ` Michael S. Tsirkin
  2016-06-27 12:37       ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-24 22:19 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Tetsuya Mukawa, Yuanhan Liu, Victor Kaplansky, jonshin, QEMU

On Fri, Jun 24, 2016 at 03:22:19PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Jun 23, 2016 at 6:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > OK so if it's ok for read or write to fail, then I think
> > the callback should just return success.
> > This will make sure backends such as vhost net in kernel
> > where failure indicates an internal bug are handled
> > correctly, while vhost user ignores errors and attempts
> > to go on.
> >
> 
> That's what we are trying to do, but since it's not yet well defined
> how to disconnect/reconnect and there are many bugs around this, it
> would help a lot to add error checking and error reporting for now.

Just for debugging?  OK, how about a macro with #ifdef DEBUG or
something like that then?


> >> Since there is feature checks on reconnection, qemu should wait for
> >> the initial connection feature negotiation to complete. The test added
> >> demonstrates this.
> >
> >
> > So on disconnect, we keep the guest going? But then when connect is
> > attempted, then we for some reason block guest until negotiation
> > is done? Sounds rather strange to me.
> 
> When qemu is the server, it already waits for an initial connection
> from vhost-user backend. Here it is just enhanced to wait for the
> initial nego to complete, which is necessary for proper device
> initialization and also further reconnections checks.

OH, it's just for the initial connection? I didn't realize.
Will have to re-read it to figure it out. Some comments in commit log
and code pointing this out could also help.

> -- 
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
  2016-06-24 13:25         ` Marc-André Lureau
@ 2016-06-24 22:38           ` Michael S. Tsirkin
  2016-06-27  9:01             ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-06-24 22:38 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Tetsuya Mukawa, yuanhan liu, Victor Kaplansky, jonshin, QEMU

On Fri, Jun 24, 2016 at 03:25:30PM +0200, Marc-André Lureau wrote:
> On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > If it's ok and we can recover, then why should we print errors?
> >>
> >> To me, the current disconnect handling is not handled cleanly. There
> >> is not much/nothing in the protocol that tells when and how you can
> >> disconnect. If qemu vhost-user reconnection works today, it is mostly
> >> by luck. Imho, we should print errors if any call to vhost user fails
> >> to help further analysis of broken behaviour.
> >
> > But we decided disconnect is OK, did we not? So now a failure like this
> > is just expected. It's not broken behaviour anymore ...
> 
> As you know, there are many ways qemu or the vm can break when the
> backend is disconnected.  For now, it would help a lot if we had a bit
> of error messages in this case. But do we really want to support
> spurious disconnect/reconnect at any time? It's going to be
> challenging, to maintain, to test... Is it really worthwhile? I doubt
> it. I think it would be easier and more future-proof to have a
> dedicated vhost-user request for that and only do a best effort for
> the ungraceful disconnect.

ungraceful might take a while. But I don't see a need for
message exchanges for shutdown. Do you?

> >> And we shoud have a
> >> clean mechanism to handle shutdown/disconnect/reconnect.
> >
> >
> > At some level this is something that's missing here.
> >
> > How about we patch docs/specs/vhost-user.txt
> > and describe how is reconnection supposed to work?
> >
> > All we have is:
> >         If Master is unable to send the full message or receives a wrong reply
> >         it will close the connection. An optional reconnection mechanism can be
> >         implemented.
> 
> This text is there from the original commit but it doesn't say how to
> reconnect and or how to recover from the ring. I don't have enough
> experience with virtio in general to say when it is acceptable for
> backend to be gone (not processing the ring), I can imagine the
> implementation vary widely, and the requirements depend on the kind of
> device.
> 
> If we limit the spec to vhost-user protocol only and leave it open on
> how each virtio device should support ungraceful disconnect/reconnect,
> then it sounds reasonable to document that after such disconnect, on
> reconnect:
> - the server assumes the backend has no knowledge of previous connection
> - the state can be changed between reconnections (new ring state, mem table etc)
> - the server will check feature compatibility with previous backend
> and reject incompatible backends
> - backend must restore ring processing from used->idx and ignore
> VHOST_USER_SET_VRING_BASE (or should we change and document that in
> this case the ring base is updated from used->idx?)

I think it's cleaner to change VHOST_USER_SET_VRING_BASE to get stuff
from used->idx.

> (it's probably not a complete list, I am not good at imagining all
> possible cases)

used ring must be flushed before disconnect (so all entries
before used->idx have been processed, and no entries
after used->idx have been processed).

If enabled, dirty log must be flushed before disconnect.


> 
> -- 
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
  2016-06-24 22:38           ` Michael S. Tsirkin
@ 2016-06-27  9:01             ` Marc-André Lureau
  2016-07-04 15:43               ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-27  9:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tetsuya Mukawa, yuanhan liu, Victor Kaplansky, jonshin, QEMU

Hi

On Sat, Jun 25, 2016 at 12:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Jun 24, 2016 at 03:25:30PM +0200, Marc-André Lureau wrote:
>> On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> > If it's ok and we can recover, then why should we print errors?
>> >>
>> >> To me, the current disconnect handling is not handled cleanly. There
>> >> is not much/nothing in the protocol that tells when and how you can
>> >> disconnect. If qemu vhost-user reconnection works today, it is mostly
>> >> by luck. Imho, we should print errors if any call to vhost user fails
>> >> to help further analysis of broken behaviour.
>> >
>> > But we decided disconnect is OK, did we not? So now a failure like this
>> > is just expected. It's not broken behaviour anymore ...
>>
>> As you know, there are many ways qemu or the vm can break when the
>> backend is disconnected.  For now, it would help a lot if we had a bit
>> of error messages in this case. But do we really want to support
>> spurious disconnect/reconnect at any time? It's going to be
>> challenging, to maintain, to test... Is it really worthwhile? I doubt
>> it. I think it would be easier and more future-proof to have a
>> dedicated vhost-user request for that and only do a best effort for
>> the ungraceful disconnect.
>
> ungraceful might take a while. But I don't see a need for
> message exchanges for shutdown. Do you?

A message exchange would allow the server to do something before
actually shuting down.: to check if shutdown is allowed/clean, to
flush some pending operations, to change device state, to request
something before shutdown (such as ring base etc),...

>> >> And we shoud have a
>> >> clean mechanism to handle shutdown/disconnect/reconnect.
>> >
>> >
>> > At some level this is something that's missing here.
>> >
>> > How about we patch docs/specs/vhost-user.txt
>> > and describe how is reconnection supposed to work?
>> >
>> > All we have is:
>> >         If Master is unable to send the full message or receives a wrong reply
>> >         it will close the connection. An optional reconnection mechanism can be
>> >         implemented.
>>
>> This text is there from the original commit but it doesn't say how to
>> reconnect and or how to recover from the ring. I don't have enough
>> experience with virtio in general to say when it is acceptable for
>> backend to be gone (not processing the ring), I can imagine the
>> implementation vary widely, and the requirements depend on the kind of
>> device.
>>
>> If we limit the spec to vhost-user protocol only and leave it open on
>> how each virtio device should support ungraceful disconnect/reconnect,
>> then it sounds reasonable to document that after such disconnect, on
>> reconnect:
>> - the server assumes the backend has no knowledge of previous connection
>> - the state can be changed between reconnections (new ring state, mem table etc)
>> - the server will check feature compatibility with previous backend
>> and reject incompatible backends
>> - backend must restore ring processing from used->idx and ignore
>> VHOST_USER_SET_VRING_BASE (or should we change and document that in
>> this case the ring base is updated from used->idx?)
>
> I think it's cleaner to change VHOST_USER_SET_VRING_BASE to get stuff
> from used->idx.

The problem with this approach is that it depends on how the backend
process the rings. Processing packets in order doesn't seem to be
required in general, or is it?

>> (it's probably not a complete list, I am not good at imagining all
>> possible cases)
>
> used ring must be flushed before disconnect (so all entries
> before used->idx have been processed, and no entries
> after used->idx have been processed).
> If enabled, dirty log must be flushed before disconnect.

Ok that sounds like a good requirement for "non-explicit graceful
shutdown" (how would you name it?) and seem to solve the processing
order problem. Yet, at this point, I think it would be easy for the
backend to do a proper shutdown request with all the benefits I
listed.


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes
  2016-06-24 22:19     ` Michael S. Tsirkin
@ 2016-06-27 12:37       ` Marc-André Lureau
  0 siblings, 0 replies; 60+ messages in thread
From: Marc-André Lureau @ 2016-06-27 12:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tetsuya Mukawa, Yuanhan Liu, Victor Kaplansky, jonshin, QEMU

On Sat, Jun 25, 2016 at 12:19 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Thu, Jun 23, 2016 at 6:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >
>> > OK so if it's ok for read or write to fail, then I think
>> > the callback should just return success.
>> > This will make sure backends such as vhost net in kernel
>> > where failure indicates an internal bug are handled
>> > correctly, while vhost user ignores errors and attempts
>> > to go on.
>> >
>>
>> That's what we are trying to do, but since it's not yet well defined
>> how to disconnect/reconnect and there are many bugs around this, it
>> would help a lot to add error checking and error reporting for now.
>
> Just for debugging?  OK, how about a macro with #ifdef DEBUG or
> something like that then?

As long as the disconnection is not well specified and tested, I would
rather see errors. I think we have a fundamental disagreement on how
to deal with disconnections and the previous series. To me, it is a
best effort so far (works if you are lucky), it is not the way
qemu/vhost-user should support disconnect, which I always though
should be handled by an initial request from the backend, something we
don't have yet.

>> >> Since there is feature checks on reconnection, qemu should wait for
>> >> the initial connection feature negotiation to complete. The test added
>> >> demonstrates this.
>> >
>> >
>> > So on disconnect, we keep the guest going? But then when connect is
>> > attempted, then we for some reason block guest until negotiation
>> > is done? Sounds rather strange to me.
>>
>> When qemu is the server, it already waits for an initial connection
>> from vhost-user backend. Here it is just enhanced to wait for the
>> initial nego to complete, which is necessary for proper device
>> initialization and also further reconnections checks.
>
> OH, it's just for the initial connection? I didn't realize.
> Will have to re-read it to figure it out. Some comments in commit log
> and code pointing this out could also help.

Sorry if the patch comments aren't clear, what about?


vhost-user: wait until backend init is completed

The chardev waits for an initial connection before starting qemu, and
vhost-user should wait for the backend negotiation to be completed
before starting qemu too.

vhost-user is started in the net_vhost_user_event callback, which is
synchronously called after the socket is connected. Use a
VhostUserState.started flag to indicate vhost-user init completed
successfully and qemu can be started.


 (feel free to suggest improvements)

thanks

-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
  2016-06-27  9:01             ` Marc-André Lureau
@ 2016-07-04 15:43               ` Michael S. Tsirkin
  2016-07-06 13:47                 ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 15:43 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Tetsuya Mukawa, yuanhan liu, Victor Kaplansky, jonshin, QEMU

On Mon, Jun 27, 2016 at 11:01:42AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Sat, Jun 25, 2016 at 12:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Jun 24, 2016 at 03:25:30PM +0200, Marc-André Lureau wrote:
> >> On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> > If it's ok and we can recover, then why should we print errors?
> >> >>
> >> >> To me, the current disconnect handling is not handled cleanly. There
> >> >> is not much/nothing in the protocol that tells when and how you can
> >> >> disconnect. If qemu vhost-user reconnection works today, it is mostly
> >> >> by luck. Imho, we should print errors if any call to vhost user fails
> >> >> to help further analysis of broken behaviour.
> >> >
> >> > But we decided disconnect is OK, did we not? So now a failure like this
> >> > is just expected. It's not broken behaviour anymore ...
> >>
> >> As you know, there are many ways qemu or the vm can break when the
> >> backend is disconnected.  For now, it would help a lot if we had a bit
> >> of error messages in this case. But do we really want to support
> >> spurious disconnect/reconnect at any time? It's going to be
> >> challenging, to maintain, to test... Is it really worthwhile? I doubt
> >> it. I think it would be easier and more future-proof to have a
> >> dedicated vhost-user request for that and only do a best effort for
> >> the ungraceful disconnect.
> >
> > ungraceful might take a while. But I don't see a need for
> > message exchanges for shutdown. Do you?
> 
> A message exchange would allow the server to do something before
> actually shuting down.: to check if shutdown is allowed/clean, to
> flush some pending operations, to change device state, to request
> something before shutdown (such as ring base etc),...

Who's the server here? It makes sense for backend to flush things,
but qemu doesn't seem to maintain too much state.

Could you be a bit more explicit?


I think it's ok to add a new message if it actually brings some
benefit, but I'm not sure why it makes sense to do it just in case.

> 
> >> >> And we shoud have a
> >> >> clean mechanism to handle shutdown/disconnect/reconnect.
> >> >
> >> >
> >> > At some level this is something that's missing here.
> >> >
> >> > How about we patch docs/specs/vhost-user.txt
> >> > and describe how is reconnection supposed to work?
> >> >
> >> > All we have is:
> >> >         If Master is unable to send the full message or receives a wrong reply
> >> >         it will close the connection. An optional reconnection mechanism can be
> >> >         implemented.
> >>
> >> This text is there from the original commit but it doesn't say how to
> >> reconnect and or how to recover from the ring. I don't have enough
> >> experience with virtio in general to say when it is acceptable for
> >> backend to be gone (not processing the ring), I can imagine the
> >> implementation vary widely, and the requirements depend on the kind of
> >> device.
> >>
> >> If we limit the spec to vhost-user protocol only and leave it open on
> >> how each virtio device should support ungraceful disconnect/reconnect,
> >> then it sounds reasonable to document that after such disconnect, on
> >> reconnect:
> >> - the server assumes the backend has no knowledge of previous connection
> >> - the state can be changed between reconnections (new ring state, mem table etc)
> >> - the server will check feature compatibility with previous backend
> >> and reject incompatible backends
> >> - backend must restore ring processing from used->idx and ignore
> >> VHOST_USER_SET_VRING_BASE (or should we change and document that in
> >> this case the ring base is updated from used->idx?)
> >
> > I think it's cleaner to change VHOST_USER_SET_VRING_BASE to get stuff
> > from used->idx.
> 
> The problem with this approach is that it depends on how the backend
> process the rings. Processing packets in order doesn't seem to be
> required in general, or is it?


The requirement is that packets are flushed before socket
is closed.

If they are in order then you can close at any point.

With out of order, you need to flush them before closing socket.




> >> (it's probably not a complete list, I am not good at imagining all
> >> possible cases)
> >
> > used ring must be flushed before disconnect (so all entries
> > before used->idx have been processed, and no entries
> > after used->idx have been processed).
> > If enabled, dirty log must be flushed before disconnect.
> 
> Ok that sounds like a good requirement for "non-explicit graceful
> shutdown" (how would you name it?) and seem to solve the processing
> order problem. Yet, at this point, I think it would be easy for the
> backend to do a proper shutdown request with all the benefits I
> listed.
> 
> 
> -- 
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value
  2016-06-24 12:49         ` Marc-André Lureau
@ 2016-07-04 15:46           ` Michael S. Tsirkin
  2016-07-04 22:01             ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 15:46 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

On Fri, Jun 24, 2016 at 02:49:22PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Jun 23, 2016 at 7:08 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>
> >> Callers do not always ignore it (and in general it should not, should it?), this helps breaking the execution at right moment, help debugging, code consistency, good practices etc (perhaps it's too obvious to me and I am missing something?)
> >
> > Reporting error up the stack is helpful if it's handled in some way.
> > If we just keep guest going on this error, then we could
> > maybe log it for debug build but that's all.
> 
> Reporting up to guest somehow would be a good thing at some point,

I'm not so sure. If backend will reconnect shortly, we can
handle everything transparently.


> so
> I think we should start from the bottom. vhost-user lacks error
> handling, let's add it.
> 
> Regarding debug build messages, I don't think it's enough. As long as
> we don't have an official supported way to handle disconnect. It's
> better to report an error than be silent.

Let's just work on handling it. If we need debug messages to help us
reach that goal fine. But I don't see many reasons to propagate
return codes back and forth if caller just prints and ignores it.
Print it where it's detected :)


> 
> -- 
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-06-24 12:46         ` Marc-André Lureau
@ 2016-07-04 15:47           ` Michael S. Tsirkin
  2016-07-04 21:56             ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 15:47 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

On Fri, Jun 24, 2016 at 02:46:28PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Jun 23, 2016 at 7:03 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > why bother?  So callers can just ignore them in turn?
> >>
> >> The callers do not always ignore errors, fortunately (see
> >> vhost_user_init() for ex).
> >
> > But that doesn't call set log base, right?
> 
> No, what is the point you are making?

Why does vhost_user_set_log_base need to return error?
If backend is not there to handle this message,
then it is not changing memory so it's ok to ignore the error.

Same logic applies to many other messages.


> 
> -- 
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-07-04 15:47           ` Michael S. Tsirkin
@ 2016-07-04 21:56             ` Marc-André Lureau
  2016-07-04 22:35               ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-07-04 21:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

Hi

On Mon, Jul 4, 2016 at 5:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> Why does vhost_user_set_log_base need to return error?
> If backend is not there to handle this message,
> then it is not changing memory so it's ok to ignore the error.

How do you know it's not changing the memory?

Furthermore, if the migration happened, it's because backend claim
VHOST_F_LOG_ALL, thus it should really not fail

> Same logic applies to many other messages.

Pretty much all messages, the error can't be ignored, or operations
will just fail silentely randomly. I don't understand why vhost-user
io error can be ignored. Also it's quite inconsistent the way the code
is today, vhost_user_write() returns an error that is mostly ignored,
while vhost_user_read() is checked. Why having an error later when you
can report it earlier? I fail to understand the rationale of this
error handling.


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value
  2016-07-04 15:46           ` Michael S. Tsirkin
@ 2016-07-04 22:01             ` Marc-André Lureau
  2016-07-04 22:36               ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-07-04 22:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

On Mon, Jul 4, 2016 at 5:46 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> Let's just work on handling it. If we need debug messages to help us
> reach that goal fine. But I don't see many reasons to propagate
> return codes back and forth if caller just prints and ignores it.
> Print it where it's detected :)


Adding more debug messages is not the point of this patch though, it
is to break the code flow when an error occurs, not later. Sure we can
add error_report() in vhost_user_write() that would be more consistent
with vhost_user_read() actually.


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-07-04 21:56             ` Marc-André Lureau
@ 2016-07-04 22:35               ` Michael S. Tsirkin
  2016-07-05  9:18                 ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 22:35 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

On Mon, Jul 04, 2016 at 11:56:56PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Jul 4, 2016 at 5:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Why does vhost_user_set_log_base need to return error?
> > If backend is not there to handle this message,
> > then it is not changing memory so it's ok to ignore the error.
> 
> How do you know it's not changing the memory?

either it closed socket intentionally or it exited
and kernel cleaned up.

> Furthermore, if the migration happened, it's because backend claim
> VHOST_F_LOG_ALL, thus it should really not fail

I don't see why - could you explain pls?

> > Same logic applies to many other messages.
> 
> Pretty much all messages, the error can't be ignored, or operations
> will just fail silentely randomly. I don't understand why vhost-user
> io error can be ignored. Also it's quite inconsistent the way the code
> is today, vhost_user_write() returns an error that is mostly ignored,
> while vhost_user_read() is checked. Why having an error later when you
> can report it earlier? I fail to understand the rationale of this
> error handling.

It's historical.  the way I see it, most errors due to disconnect
can be ignored except
maybe for the initial feature negotiation which is needed
so we know what to tell guest.

Errors due to e.g. buffer being full should cause an assert
as it's an internal qemu error.


> 
> -- 
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value
  2016-07-04 22:01             ` Marc-André Lureau
@ 2016-07-04 22:36               ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 22:36 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

On Tue, Jul 05, 2016 at 12:01:49AM +0200, Marc-André Lureau wrote:
> On Mon, Jul 4, 2016 at 5:46 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Let's just work on handling it. If we need debug messages to help us
> > reach that goal fine. But I don't see many reasons to propagate
> > return codes back and forth if caller just prints and ignores it.
> > Print it where it's detected :)
> 
> 
> Adding more debug messages is not the point of this patch though, it
> is to break the code flow when an error occurs, not later. Sure we can
> add error_report() in vhost_user_write() that would be more consistent
> with vhost_user_read() actually.


Yes but we should look for more ways to continue after an error,
not for ways to break the flow IMHO.


> 
> -- 
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-07-04 22:35               ` Michael S. Tsirkin
@ 2016-07-05  9:18                 ` Marc-André Lureau
  2016-07-05 11:12                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-07-05  9:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

Hi

On Tue, Jul 5, 2016 at 12:35 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Jul 04, 2016 at 11:56:56PM +0200, Marc-André Lureau wrote:
>> Hi
>>
>> On Mon, Jul 4, 2016 at 5:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > Why does vhost_user_set_log_base need to return error?
>> > If backend is not there to handle this message,
>> > then it is not changing memory so it's ok to ignore the error.
>>
>> How do you know it's not changing the memory?
>
> either it closed socket intentionally or it exited
> and kernel cleaned up.

And if it closed intentionally during migration, we want to catch this
as a bug since it may still modify the memory

>> Furthermore, if the migration happened, it's because backend claim
>> VHOST_F_LOG_ALL, thus it should really not fail
>
> I don't see why - could you explain pls?

If the backend claims migration support, it shouldn't have bad
migration behaviour such as closing the vhost-user socket.

>> > Same logic applies to many other messages.
>>
>> Pretty much all messages, the error can't be ignored, or operations
>> will just fail silentely randomly. I don't understand why vhost-user
>> io error can be ignored. Also it's quite inconsistent the way the code
>> is today, vhost_user_write() returns an error that is mostly ignored,
>> while vhost_user_read() is checked. Why having an error later when you
>> can report it earlier? I fail to understand the rationale of this
>> error handling.
>
> It's historical.  the way I see it, most errors due to disconnect
> can be ignored except
> maybe for the initial feature negotiation which is needed
> so we know what to tell guest.

The way I see it is that errors should not be ignored because it makes
it harder to track what is going on.

> Errors due to e.g. buffer being full should cause an assert
> as it's an internal qemu error.

I don't see why qemu would be responsible for say, a suspended backend.

>
>
>>
>> --
>> Marc-André Lureau



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-07-05  9:18                 ` Marc-André Lureau
@ 2016-07-05 11:12                   ` Michael S. Tsirkin
  2016-07-06 13:40                     ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2016-07-05 11:12 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

On Tue, Jul 05, 2016 at 11:18:38AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Jul 5, 2016 at 12:35 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Jul 04, 2016 at 11:56:56PM +0200, Marc-André Lureau wrote:
> >> Hi
> >>
> >> On Mon, Jul 4, 2016 at 5:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > Why does vhost_user_set_log_base need to return error?
> >> > If backend is not there to handle this message,
> >> > then it is not changing memory so it's ok to ignore the error.
> >>
> >> How do you know it's not changing the memory?
> >
> > either it closed socket intentionally or it exited
> > and kernel cleaned up.
> 
> And if it closed intentionally during migration, we want to catch this
> as a bug since it may still modify the memory

You can't prevent backend bugs I think.


> >> Furthermore, if the migration happened, it's because backend claim
> >> VHOST_F_LOG_ALL, thus it should really not fail
> >
> > I don't see why - could you explain pls?
> 
> If the backend claims migration support, it shouldn't have bad
> migration behaviour such as closing the vhost-user socket.

But I don't see why it's bad. If it's not modifying memory then
it does not need to log any changes.

> >> > Same logic applies to many other messages.
> >>
> >> Pretty much all messages, the error can't be ignored, or operations
> >> will just fail silentely randomly. I don't understand why vhost-user
> >> io error can be ignored. Also it's quite inconsistent the way the code
> >> is today, vhost_user_write() returns an error that is mostly ignored,
> >> while vhost_user_read() is checked. Why having an error later when you
> >> can report it earlier? I fail to understand the rationale of this
> >> error handling.
> >
> > It's historical.  the way I see it, most errors due to disconnect
> > can be ignored except
> > maybe for the initial feature negotiation which is needed
> > so we know what to tell guest.
> 
> The way I see it is that errors should not be ignored because it makes
> it harder to track what is going on.
> 
> > Errors due to e.g. buffer being full should cause an assert
> > as it's an internal qemu error.
> 
> I don't see why qemu would be responsible for say, a suspended backend.

Protocol limits # of messages in flight so it should not trigger
even if backend is stuck.

> >
> >
> >>
> >> --
> >> Marc-André Lureau
> 
> 
> 
> -- 
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-07-05 11:12                   ` Michael S. Tsirkin
@ 2016-07-06 13:40                     ` Marc-André Lureau
  2016-07-06 13:52                       ` Marc-André Lureau
  0 siblings, 1 reply; 60+ messages in thread
From: Marc-André Lureau @ 2016-07-06 13:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

Hi

On Tue, Jul 5, 2016 at 1:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Jul 05, 2016 at 11:18:38AM +0200, Marc-André Lureau wrote:
>> Hi
>>
>> On Tue, Jul 5, 2016 at 12:35 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Mon, Jul 04, 2016 at 11:56:56PM +0200, Marc-André Lureau wrote:
>> >> Hi
>> >>
>> >> On Mon, Jul 4, 2016 at 5:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> > Why does vhost_user_set_log_base need to return error?
>> >> > If backend is not there to handle this message,
>> >> > then it is not changing memory so it's ok to ignore the error.
>> >>
>> >> How do you know it's not changing the memory?
>> >
>> > either it closed socket intentionally or it exited
>> > and kernel cleaned up.
>>
>> And if it closed intentionally during migration, we want to catch this
>> as a bug since it may still modify the memory
>
> You can't prevent backend bugs I think.

Right, but it's best to provide an error when you can detect backend bugs.

>> >> Furthermore, if the migration happened, it's because backend claim
>> >> VHOST_F_LOG_ALL, thus it should really not fail
>> >
>> > I don't see why - could you explain pls?
>>
>> If the backend claims migration support, it shouldn't have bad
>> migration behaviour such as closing the vhost-user socket.
>
> But I don't see why it's bad. If it's not modifying memory then
> it does not need to log any changes.

"if it's not modifying memory"...

I fail to understand why some code path check error code, and some
don't. Ignoring error and running further may lead to wrong
assumptions and later issues. I also fail to understand why providing
more useful error messages is bad. I feel quite strongly about having
more consistent error checking in vhost-user, I don't get why you
don't.



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
  2016-07-04 15:43               ` Michael S. Tsirkin
@ 2016-07-06 13:47                 ` Marc-André Lureau
  0 siblings, 0 replies; 60+ messages in thread
From: Marc-André Lureau @ 2016-07-06 13:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tetsuya Mukawa, yuanhan liu, Victor Kaplansky, jonshin, QEMU

On Mon, Jul 4, 2016 at 5:43 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> I think it's ok to add a new message if it actually brings some
> benefit, but I'm not sure why it makes sense to do it just in case.

The main benefit today would be to have a single code path to handle
disconnection, not dozens. The experimental branch I have (I have not
touched it for a while, it's unfinished etc), does also check that the
backend support a stop state, changing the link status state for ex.
Anyway, it's not part of this series, but I believe it would be way
easier to support this method in the long run than any time backend
disconnect...

-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error
  2016-07-06 13:40                     ` Marc-André Lureau
@ 2016-07-06 13:52                       ` Marc-André Lureau
  0 siblings, 0 replies; 60+ messages in thread
From: Marc-André Lureau @ 2016-07-06 13:52 UTC (permalink / raw)
  To: Michael S. Tsirkin, Gonglei
  Cc: yuanhan liu, Victor Kaplansky, QEMU, jonshin, Tetsuya Mukawa

On Wed, Jul 6, 2016 at 3:40 PM, Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
> Hi
>
> On Tue, Jul 5, 2016 at 1:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Tue, Jul 05, 2016 at 11:18:38AM +0200, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Tue, Jul 5, 2016 at 12:35 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> > On Mon, Jul 04, 2016 at 11:56:56PM +0200, Marc-André Lureau wrote:
>>> >> Hi
>>> >>
>>> >> On Mon, Jul 4, 2016 at 5:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> >> > Why does vhost_user_set_log_base need to return error?
>>> >> > If backend is not there to handle this message,
>>> >> > then it is not changing memory so it's ok to ignore the error.
>>> >>
>>> >> How do you know it's not changing the memory?
>>> >
>>> > either it closed socket intentionally or it exited
>>> > and kernel cleaned up.
>>>
>>> And if it closed intentionally during migration, we want to catch this
>>> as a bug since it may still modify the memory
>>
>> You can't prevent backend bugs I think.
>
> Right, but it's best to provide an error when you can detect backend bugs.
>
>>> >> Furthermore, if the migration happened, it's because backend claim
>>> >> VHOST_F_LOG_ALL, thus it should really not fail
>>> >
>>> > I don't see why - could you explain pls?
>>>
>>> If the backend claims migration support, it shouldn't have bad
>>> migration behaviour such as closing the vhost-user socket.
>>
>> But I don't see why it's bad. If it's not modifying memory then
>> it does not need to log any changes.
>
> "if it's not modifying memory"...
>
> I fail to understand why some code path check error code, and some
> don't. Ignoring error and running further may lead to wrong
> assumptions and later issues. I also fail to understand why providing
> more useful error messages is bad. I feel quite strongly about having
> more consistent error checking in vhost-user, I don't get why you
> don't.
>

Fwiw, Gonglei came up with the same patch in "vhost-user: fix
unreasonable return value when vhost-user read failed" and he gave
extra reasons for it that are hard to deny.



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2016-07-06 13:52 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 01/24] misc: indentation marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 02/24] vhost-user: minor simplification marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 03/24] qemu-char: check socket is actually connected marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 04/24] vhost-user: check qemu_chr_fe_set_msgfds() return value marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail marcandre.lureau
2016-06-23  4:27   ` Michael S. Tsirkin
2016-06-23  9:19     ` Marc-André Lureau
2016-06-23 17:13       ` Michael S. Tsirkin
2016-06-24 13:25         ` Marc-André Lureau
2016-06-24 22:38           ` Michael S. Tsirkin
2016-06-27  9:01             ` Marc-André Lureau
2016-07-04 15:43               ` Michael S. Tsirkin
2016-07-06 13:47                 ` Marc-André Lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value marcandre.lureau
2016-06-23  4:27   ` Michael S. Tsirkin
2016-06-23  9:16     ` Marc-André Lureau
2016-06-23 17:08       ` Michael S. Tsirkin
2016-06-24 12:49         ` Marc-André Lureau
2016-07-04 15:46           ` Michael S. Tsirkin
2016-07-04 22:01             ` Marc-André Lureau
2016-07-04 22:36               ` Michael S. Tsirkin
2016-06-21 10:02 ` [Qemu-devel] [PATCH 07/24] vhost: use error_report() instead of fprintf(stderr, ...) marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 08/24] vhost-user: return a read error marcandre.lureau
2016-06-23  4:27   ` Michael S. Tsirkin
2016-06-23  9:14     ` Marc-André Lureau
2016-06-23 17:03       ` Michael S. Tsirkin
2016-06-24 12:46         ` Marc-André Lureau
2016-07-04 15:47           ` Michael S. Tsirkin
2016-07-04 21:56             ` Marc-André Lureau
2016-07-04 22:35               ` Michael S. Tsirkin
2016-07-05  9:18                 ` Marc-André Lureau
2016-07-05 11:12                   ` Michael S. Tsirkin
2016-07-06 13:40                     ` Marc-André Lureau
2016-07-06 13:52                       ` Marc-André Lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 09/24] vhost: make vhost_log_put() idempotent marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 10/24] vhost: call vhost_log_put() on cleanup marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 11/24] vhost: add vhost device only after all success marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 12/24] vhost: make vhost_dev_cleanup() idempotent marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 13/24] vhost-net: always call vhost_dev_cleanup() on failure marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 14/24] vhost: don't assume opaque is a fd, use backend cleanup marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 15/24] vhost: fix calling vhost_dev_cleanup() after vhost_dev_init() marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 16/24] vhost-user: keep vhost_net after a disconnection marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 17/24] Revert "vhost-net: do not crash if backend is not present" marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 18/24] get_vhost_net() should be != null after vhost_user_init marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 19/24] vhost-net: success if backend has no ops->vhost_migration_done marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 20/24] vhost: add assert() to check runtime behaviour marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 21/24] char: add chr_wait_connected callback marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 22/24] char: add and use tcp_chr_wait_connected marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 23/24] vhost-user: wait until link is up marcandre.lureau
2016-06-23  4:25   ` Michael S. Tsirkin
2016-06-23  9:11     ` Marc-André Lureau
2016-06-23 16:45       ` Michael S. Tsirkin
2016-06-21 10:02 ` [Qemu-devel] [PATCH 24/24] tests: add /vhost-user/connect-fail test marcandre.lureau
2016-06-23  4:31 ` [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes Michael S. Tsirkin
2016-06-24 13:22   ` Marc-André Lureau
2016-06-24 22:19     ` Michael S. Tsirkin
2016-06-27 12:37       ` Marc-André Lureau
2016-06-23  4:32 ` Michael S. Tsirkin
2016-06-24 13:17   ` Marc-André Lureau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.