* [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
@ 2018-12-06  6:35 elohimes
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode elohimes
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: elohimes @ 2018-12-06  6:35 UTC (permalink / raw)
  To: mst, marcandre.lureau
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patchset enables qemu to reconnect to the vhost-user-blk
backend after the backend crashes or restarts.

Patch 1 implements synchronous connection during open() for the
"reconnect socket".

Patch 2 introduces a new message, VHOST_USER_SET_VRING_INFLIGHT,
which offers shared memory to the backend so that it can record
its inflight I/O.

Patches 3 and 4 are the corresponding libvhost-user changes for
patch 2: they make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.

Patch 5 allows vhost-user-blk to reconnect to the backend when the
connection is closed.

Patch 6 tells qemu that the backend now supports reconnecting.

To use it, we could start qemu with:

qemu-system-x86_64 \
        -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
        -device vhost-user-blk-pci,chardev=char0 \

and start vhost-user-blk backend with:

vhost-user-blk -b /path/file -s /path/vhost.socket

Then vhost-user-blk can be restarted at any time while the VM is running.
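
The core idea, sketched roughly below, is that the backend keeps one
byte per descriptor head in the per-queue shared memory: the byte is
set when a request is popped and cleared when it completes, so a
restarted backend can find and resubmit whatever was still in flight.
The helper below is purely illustrative (it is not part of the series;
the real code is in patches 2 and 4):

/* Illustrative sketch only: one byte per descriptor head in the
 * per-queue shared memory region; non-zero means "still in flight". */
static void resubmit_inflight(const char *inflight, unsigned int num_descs,
                              void (*resubmit)(unsigned int head))
{
    unsigned int head;

    for (head = 0; head < num_descs; head++) {
        if (inflight[head]) {
            /* popped before the crash but never completed */
            resubmit(head);
        }
    }
}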

Xie Yongji (6):
  char-socket: Enable "wait" option for client mode
  vhost-user: Add shared memory to record inflight I/O
  libvhost-user: Introduce vu_queue_map_desc()
  libvhost-user: Support recording inflight I/O in shared memory
  vhost-user-blk: Add support for reconnecting backend
  contrib/vhost-user-blk: enable inflight I/O recording

 chardev/char-socket.c                   |   5 +-
 contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
 contrib/libvhost-user/libvhost-user.h   |  19 +++
 contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
 hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
 hw/virtio/vhost-user.c                  |  69 ++++++++
 hw/virtio/vhost.c                       |   8 +
 include/hw/virtio/vhost-backend.h       |   4 +
 include/hw/virtio/vhost-user-blk.h      |   4 +
 include/hw/virtio/vhost-user.h          |   8 +
 10 files changed, 452 insertions(+), 52 deletions(-)

-- 
2.17.1

* [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
@ 2018-12-06  6:35 ` elohimes
  2018-12-06  7:23   ` Marc-André Lureau
  2018-12-06  9:31   ` Yury Kotov
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 2/6] vhost-user: Add shared memory to record inflight I/O elohimes
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: elohimes @ 2018-12-06  6:35 UTC (permalink / raw)
  To: mst, marcandre.lureau
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

Currently we connect asynchronously for a "reconnect socket"
during open(). But a vhost-user device prefers a connected socket
during initialization. That means we may still need to support
synchronous connection during open() for the "reconnect socket".
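
For example (mirroring the cover letter), with this change a client-mode
chardev can request a blocking initial connection while still using
reconnect afterwards:

        -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait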

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 chardev/char-socket.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index eaa8e8b68f..f2819d52e7 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -1072,7 +1072,7 @@ static void qmp_chardev_open_socket(Chardev *chr,
         s->reconnect_time = reconnect;
     }
 
-    if (s->reconnect_time) {
+    if (s->reconnect_time && !is_waitconnect) {
         tcp_chr_connect_async(chr);
     } else {
         if (s->is_listen) {
@@ -1120,7 +1120,8 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
                                   Error **errp)
 {
     bool is_listen      = qemu_opt_get_bool(opts, "server", false);
-    bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true);
+    bool is_waitconnect = is_listen ? qemu_opt_get_bool(opts, "wait", true) :
+                          qemu_opt_get_bool(opts, "wait", false);
     bool is_telnet      = qemu_opt_get_bool(opts, "telnet", false);
     bool is_tn3270      = qemu_opt_get_bool(opts, "tn3270", false);
     bool is_websock     = qemu_opt_get_bool(opts, "websocket", false);
-- 
2.17.1

* [Qemu-devel] [PATCH for-4.0 2/6] vhost-user: Add shared memory to record inflight I/O
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode elohimes
@ 2018-12-06  6:35 ` elohimes
  2018-12-06  7:19   ` Marc-André Lureau
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 3/6] libvhost-user: Introduce vu_queue_map_desc() elohimes
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: elohimes @ 2018-12-06  6:35 UTC (permalink / raw)
  To: mst, marcandre.lureau
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This introduces a new message, VHOST_USER_SET_VRING_INFLIGHT,
to offer shared memory to the backend so that it can record
its inflight I/O.

With this new message, the backend is able to restart without
losing inflight I/O, which would otherwise cause I/O to hang
for the block device.
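
The payload added here is small (see the diff below); the shared memory
file descriptor is passed alongside it as ancillary data. Annotated for
clarity:

typedef struct VhostUserVringInflight {
    uint32_t size;   /* size of the shared memory region, in bytes */
    uint32_t idx;    /* index of the virtqueue the region belongs to */
} VhostUserVringInflight;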

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 hw/virtio/vhost-user.c            | 69 +++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 |  8 ++++
 include/hw/virtio/vhost-backend.h |  4 ++
 include/hw/virtio/vhost-user.h    |  8 ++++
 4 files changed, 89 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e09bed0e4a..4c0e64891d 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -19,6 +19,7 @@
 #include "sysemu/kvm.h"
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
+#include "qemu/memfd.h"
 #include "sysemu/cryptodev.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
@@ -52,6 +53,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+    VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -89,6 +91,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_POSTCOPY_END     = 30,
+    VHOST_USER_SET_VRING_INFLIGHT = 31,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -147,6 +150,11 @@ typedef struct VhostUserVringArea {
     uint64_t offset;
 } VhostUserVringArea;
 
+typedef struct VhostUserVringInflight {
+    uint32_t size;
+    uint32_t idx;
+} VhostUserVringInflight;
+
 typedef struct {
     VhostUserRequest request;
 
@@ -169,6 +177,7 @@ typedef union {
         VhostUserConfig config;
         VhostUserCryptoSession session;
         VhostUserVringArea area;
+        VhostUserVringInflight inflight;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -1739,6 +1748,58 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
     return result;
 }
 
+static int vhost_user_set_vring_inflight(struct vhost_dev *dev, int idx)
+{
+    struct vhost_user *u = dev->opaque;
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
+        return 0;
+    }
+
+    if (!u->user->inflight[idx].addr) {
+        Error *err = NULL;
+
+        u->user->inflight[idx].size = qemu_real_host_page_size;
+        u->user->inflight[idx].addr = qemu_memfd_alloc("vhost-inflight",
+                                      u->user->inflight[idx].size,
+                                      F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
+                                      &u->user->inflight[idx].fd, &err);
+        if (err) {
+            error_report_err(err);
+            u->user->inflight[idx].addr = NULL;
+            return -1;
+        }
+    }
+
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_SET_VRING_INFLIGHT,
+        .hdr.flags = VHOST_USER_VERSION,
+        .payload.inflight.size = u->user->inflight[idx].size,
+        .payload.inflight.idx = idx,
+        .hdr.size = sizeof(msg.payload.inflight),
+    };
+
+    if (vhost_user_write(dev, &msg, &u->user->inflight[idx].fd, 1) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
+void vhost_user_inflight_reset(VhostUserState *user)
+{
+    int i;
+
+    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+        if (!user->inflight[i].addr) {
+            continue;
+        }
+
+        memset(user->inflight[i].addr, 0, user->inflight[i].size);
+    }
+}
+
 VhostUserState *vhost_user_init(void)
 {
     VhostUserState *user = g_new0(struct VhostUserState, 1);
@@ -1756,6 +1817,13 @@ void vhost_user_cleanup(VhostUserState *user)
             munmap(user->notifier[i].addr, qemu_real_host_page_size);
             user->notifier[i].addr = NULL;
         }
+
+        if (user->inflight[i].addr) {
+            munmap(user->inflight[i].addr, user->inflight[i].size);
+            user->inflight[i].addr = NULL;
+            close(user->inflight[i].fd);
+            user->inflight[i].fd = -1;
+        }
     }
 }
 
@@ -1790,4 +1858,5 @@ const VhostOps user_ops = {
         .vhost_crypto_create_session = vhost_user_crypto_create_session,
         .vhost_crypto_close_session = vhost_user_crypto_close_session,
         .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
+        .vhost_set_vring_inflight = vhost_user_set_vring_inflight,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 569c4053ea..2ca7b4e841 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -973,6 +973,14 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
         return -errno;
     }
 
+    if (dev->vhost_ops->vhost_set_vring_inflight) {
+        r = dev->vhost_ops->vhost_set_vring_inflight(dev, vhost_vq_index);
+        if (r) {
+            VHOST_OPS_DEBUG("vhost_set_vring_inflight failed");
+            return -errno;
+        }
+    }
+
     state.num = virtio_queue_get_last_avail_idx(vdev, idx);
     r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
     if (r) {
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 81283ec50f..8110e09089 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -104,6 +104,9 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
 typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
                                                 MemoryRegionSection *section);
 
+typedef int (*vhost_set_vring_inflight_op)(struct vhost_dev *dev,
+                                           int idx);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -142,6 +145,7 @@ typedef struct VhostOps {
     vhost_crypto_create_session_op vhost_crypto_create_session;
     vhost_crypto_close_session_op vhost_crypto_close_session;
     vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
+    vhost_set_vring_inflight_op vhost_set_vring_inflight;
 } VhostOps;
 
 extern const VhostOps user_ops;
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index fd660393a0..ff13433153 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -17,11 +17,19 @@ typedef struct VhostUserHostNotifier {
     bool set;
 } VhostUserHostNotifier;
 
+typedef struct VhostUserInflight {
+    void *addr;
+    uint32_t size;
+    int fd;
+} VhostUserInflight;
+
 typedef struct VhostUserState {
     CharBackend *chr;
     VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
+    VhostUserInflight inflight[VIRTIO_QUEUE_MAX];
 } VhostUserState;
 
+void vhost_user_inflight_reset(VhostUserState *user);
 VhostUserState *vhost_user_init(void);
 void vhost_user_cleanup(VhostUserState *user);
 
-- 
2.17.1

* [Qemu-devel] [PATCH for-4.0 3/6] libvhost-user: Introduce vu_queue_map_desc()
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode elohimes
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 2/6] vhost-user: Add shared memory to record inflight I/O elohimes
@ 2018-12-06  6:35 ` elohimes
  2018-12-06  7:16   ` Marc-André Lureau
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 4/6] libvhost-user: Support recording inflight I/O in shared memory elohimes
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: elohimes @ 2018-12-06  6:35 UTC (permalink / raw)
  To: mst, marcandre.lureau
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

Introduce vu_queue_map_desc(), which is independent
of vu_queue_pop().

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 contrib/libvhost-user/libvhost-user.c | 86 +++++++++++++++------------
 1 file changed, 49 insertions(+), 37 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index a6b46cdc03..4432bd8bb4 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -1853,49 +1853,20 @@ virtqueue_alloc_element(size_t sz,
     return elem;
 }
 
-void *
-vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+static void *
+vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
 {
-    unsigned int i, head, max, desc_len;
+    struct vring_desc *desc = vq->vring.desc;
     uint64_t desc_addr, read_len;
+    unsigned int desc_len;
+    unsigned int max = vq->vring.num;
+    unsigned int i = idx;
     VuVirtqElement *elem;
-    unsigned out_num, in_num;
+    unsigned int out_num = 0, in_num = 0;
     struct iovec iov[VIRTQUEUE_MAX_SIZE];
     struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE];
-    struct vring_desc *desc;
     int rc;
 
-    if (unlikely(dev->broken) ||
-        unlikely(!vq->vring.avail)) {
-        return NULL;
-    }
-
-    if (vu_queue_empty(dev, vq)) {
-        return NULL;
-    }
-    /* Needed after virtio_queue_empty(), see comment in
-     * virtqueue_num_heads(). */
-    smp_rmb();
-
-    /* When we start there are none of either input nor output. */
-    out_num = in_num = 0;
-
-    max = vq->vring.num;
-    if (vq->inuse >= vq->vring.num) {
-        vu_panic(dev, "Virtqueue size exceeded");
-        return NULL;
-    }
-
-    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
-        return NULL;
-    }
-
-    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
-        vring_set_avail_event(vq, vq->last_avail_idx);
-    }
-
-    i = head;
-    desc = vq->vring.desc;
     if (desc[i].flags & VRING_DESC_F_INDIRECT) {
         if (desc[i].len % sizeof(struct vring_desc)) {
             vu_panic(dev, "Invalid size for indirect buffer table");
@@ -1947,12 +1918,13 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
     } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
     if (rc == VIRTQUEUE_READ_DESC_ERROR) {
+        vu_panic(dev, "read descriptor error");
         return NULL;
     }
 
     /* Now copy what we have collected and mapped */
     elem = virtqueue_alloc_element(sz, out_num, in_num);
-    elem->index = head;
+    elem->index = idx;
     for (i = 0; i < out_num; i++) {
         elem->out_sg[i] = iov[i];
     }
@@ -1960,6 +1932,46 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         elem->in_sg[i] = iov[out_num + i];
     }
 
+    return elem;
+}
+
+void *
+vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+{
+    unsigned int head;
+    VuVirtqElement *elem;
+
+    if (unlikely(dev->broken) ||
+        unlikely(!vq->vring.avail)) {
+        return NULL;
+    }
+
+    if (vu_queue_empty(dev, vq)) {
+        return NULL;
+    }
+    /* Needed after virtio_queue_empty(), see comment in
+     * virtqueue_num_heads(). */
+    smp_rmb();
+
+    if (vq->inuse >= vq->vring.num) {
+        vu_panic(dev, "Virtqueue size exceeded");
+        return NULL;
+    }
+
+    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
+        return NULL;
+    }
+
+    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
+        vring_set_avail_event(vq, vq->last_avail_idx);
+    }
+
+    elem = vu_queue_map_desc(dev, vq, head, sz);
+
+    if (!elem) {
+        return NULL;
+    }
+
     vq->inuse++;
 
     return elem;
-- 
2.17.1

* [Qemu-devel] [PATCH for-4.0 4/6] libvhost-user: Support recording inflight I/O in shared memory
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (2 preceding siblings ...)
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 3/6] libvhost-user: Introduce vu_queue_map_desc() elohimes
@ 2018-12-06  6:35 ` elohimes
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 5/6] vhost-user-blk: Add support for reconnecting backend elohimes
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: elohimes @ 2018-12-06  6:35 UTC (permalink / raw)
  To: mst, marcandre.lureau
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch adds support for the VHOST_USER_SET_VRING_INFLIGHT
message. We now maintain a "bitmap" of all descriptors in the
shared memory for each queue: an entry is set in vu_queue_pop()
and cleared in vu_queue_push().
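
When the kick fd is set up after a reconnect, the queue state is rebuilt
from that bitmap (see vu_check_queue_inflights() in the diff). A rough,
simplified sketch of the recovery step, with an illustrative name:

/* Sketch only -- the real code is vu_check_queue_inflights() below. */
static void recover_inflight(VuVirtq *vq)
{
    unsigned int i;

    vq->used_idx = vq->vring.used->idx;
    vq->inflight_num = 0;
    for (i = 0; i < vq->vring.num; i++) {
        if (vq->inflight.addr[i]) {
            /* popped but not completed before the backend went away */
            vq->inflight_desc[vq->inflight_num++] = i;
            vq->inuse++;
        }
    }
    /* heads consumed from the avail ring = completed + still in flight */
    vq->shadow_avail_idx = vq->last_avail_idx = vq->inuse + vq->used_idx;
}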

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 contrib/libvhost-user/libvhost-user.c | 129 ++++++++++++++++++++++++++
 contrib/libvhost-user/libvhost-user.h |  19 ++++
 2 files changed, 148 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 4432bd8bb4..38ef1f5898 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -100,6 +100,7 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_POSTCOPY_ADVISE),
         REQ(VHOST_USER_POSTCOPY_LISTEN),
         REQ(VHOST_USER_POSTCOPY_END),
+        REQ(VHOST_USER_SET_VRING_INFLIGHT),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -890,6 +891,41 @@ vu_check_queue_msg_file(VuDev *dev, VhostUserMsg *vmsg)
     return true;
 }
 
+static int
+vu_check_queue_inflights(VuDev *dev, VuVirtq *vq)
+{
+    int i = 0;
+
+    if ((dev->protocol_features &
+         VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD) == 0) {
+        return 0;
+    }
+
+    if (unlikely(!vq->inflight.addr)) {
+        return -1;
+    }
+
+    vq->used_idx = vq->vring.used->idx;
+    vq->inflight_num = 0;
+    for (i = 0; i < vq->vring.num; i++) {
+        if (vq->inflight.addr[i] == 0) {
+            continue;
+        }
+
+        vq->inflight_desc[vq->inflight_num++] = i;
+        vq->inuse++;
+    }
+    vq->shadow_avail_idx = vq->last_avail_idx = vq->inuse + vq->used_idx;
+
+    /* in case of I/O hang after reconnecting */
+    if (eventfd_write(vq->kick_fd, 1) ||
+        eventfd_write(vq->call_fd, 1)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 static bool
 vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -925,6 +961,10 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
                dev->vq[index].kick_fd, index);
     }
 
+    if (vu_check_queue_inflights(dev, &dev->vq[index])) {
+        vu_panic(dev, "Failed to check inflights for vq: %d\n", index);
+    }
+
     return false;
 }
 
@@ -1215,6 +1255,44 @@ vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
     return true;
 }
 
+static bool
+vu_set_vring_inflight(VuDev *dev, VhostUserMsg *vmsg)
+{
+    int fd;
+    uint32_t size, idx;
+    void *rc;
+
+    if (vmsg->fd_num != 1 ||
+        vmsg->size != sizeof(vmsg->payload.inflight)) {
+        vu_panic(dev, "Invalid vring_inflight message size:%d fds:%d",
+                 vmsg->size, vmsg->fd_num);
+        return false;
+    }
+
+    fd = vmsg->fds[0];
+    idx = vmsg->payload.inflight.idx;
+    size = vmsg->payload.inflight.size;
+    DPRINT("vring_inflight idx: %"PRId32"\n", idx);
+    DPRINT("vring_inflight size: %"PRId32"\n", size);
+
+    rc = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+    close(fd);
+
+    if (rc == MAP_FAILED) {
+        vu_panic(dev, "vring_inflight mmap error: %s", strerror(errno));
+        return false;
+    }
+
+    if (dev->vq[idx].inflight.addr) {
+        munmap(dev->vq[idx].inflight.addr, dev->vq[idx].inflight.size);
+    }
+    dev->vq[idx].inflight.addr = (char *)rc;
+    dev->vq[idx].inflight.size = size;
+
+    return false;
+}
+
 static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -1292,6 +1370,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_set_postcopy_listen(dev, vmsg);
     case VHOST_USER_POSTCOPY_END:
         return vu_set_postcopy_end(dev, vmsg);
+    case VHOST_USER_SET_VRING_INFLIGHT:
+        return vu_set_vring_inflight(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
@@ -1359,6 +1439,11 @@ vu_deinit(VuDev *dev)
             close(vq->err_fd);
             vq->err_fd = -1;
         }
+
+        if (vq->inflight.addr) {
+            munmap(vq->inflight.addr, vq->inflight.size);
+            vq->inflight.addr = NULL;
+        }
     }
 
 
@@ -1935,9 +2020,44 @@ vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
     return elem;
 }
 
+static int
+vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+    if ((dev->protocol_features &
+         VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD) == 0) {
+        return 0;
+    }
+
+    if (unlikely(!vq->inflight.addr)) {
+        return -1;
+    }
+
+    vq->inflight.addr[desc_idx] = 1;
+
+    return 0;
+}
+
+static int
+vu_queue_inflight_put(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+    if ((dev->protocol_features &
+         VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD) == 0) {
+        return 0;
+    }
+
+    if (unlikely(!vq->inflight.addr)) {
+        return -1;
+    }
+
+    vq->inflight.addr[desc_idx] = 0;
+
+    return 0;
+}
+
 void *
 vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 {
+    int i;
     unsigned int head;
     VuVirtqElement *elem;
 
@@ -1946,6 +2066,12 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         return NULL;
     }
 
+    if (unlikely(vq->inflight_num > 0)) {
+        i = (--vq->inflight_num);
+        elem = vu_queue_map_desc(dev, vq, vq->inflight_desc[i], sz);
+        return elem;
+    }
+
     if (vu_queue_empty(dev, vq)) {
         return NULL;
     }
@@ -1974,6 +2100,8 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 
     vq->inuse++;
 
+    vu_queue_inflight_get(dev, vq, head);
+
     return elem;
 }
 
@@ -2119,4 +2247,5 @@ vu_queue_push(VuDev *dev, VuVirtq *vq,
 {
     vu_queue_fill(dev, vq, elem, len, 0);
     vu_queue_flush(dev, vq, 1);
+    vu_queue_inflight_put(dev, vq, elem->index);
 }
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 4aa55b4d2d..2b0d14fd41 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -53,6 +53,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+    VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -91,6 +92,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_POSTCOPY_END     = 30,
+    VHOST_USER_SET_VRING_INFLIGHT = 31,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -138,6 +140,11 @@ typedef struct VhostUserVringArea {
     uint64_t offset;
 } VhostUserVringArea;
 
+typedef struct VhostUserVringInflight {
+    uint32_t size;
+    uint32_t idx;
+} VhostUserVringInflight;
+
 #if defined(_WIN32)
 # define VU_PACKED __attribute__((gcc_struct, packed))
 #else
@@ -163,6 +170,7 @@ typedef struct VhostUserMsg {
         VhostUserLog log;
         VhostUserConfig config;
         VhostUserVringArea area;
+        VhostUserVringInflight inflight;
     } payload;
 
     int fds[VHOST_MEMORY_MAX_NREGIONS];
@@ -234,9 +242,20 @@ typedef struct VuRing {
     uint32_t flags;
 } VuRing;
 
+typedef struct VuInflight {
+    char *addr;
+    uint32_t size;
+} VuInflight;
+
 typedef struct VuVirtq {
     VuRing vring;
 
+    VuInflight inflight;
+
+    uint16_t inflight_desc[VIRTQUEUE_MAX_SIZE];
+
+    uint16_t inflight_num;
+
     /* Next head to pop */
     uint16_t last_avail_idx;
 
-- 
2.17.1

* [Qemu-devel] [PATCH for-4.0 5/6] vhost-user-blk: Add support for reconnecting backend
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (3 preceding siblings ...)
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 4/6] libvhost-user: Support recording inflight I/O in shared memory elohimes
@ 2018-12-06  6:35 ` elohimes
  2018-12-06 12:21   ` Yury Kotov
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 6/6] contrib/vhost-user-blk: enable inflight I/O recording elohimes
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: elohimes @ 2018-12-06  6:35 UTC (permalink / raw)
  To: mst, marcandre.lureau
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

With the new message VHOST_USER_SET_VRING_INFLIGHT,
the backend is able to restart safely. This patch
allows qemu to reconnect to the backend after the
connection is closed.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Ni Xun <nixun@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 hw/block/vhost-user-blk.c          | 169 +++++++++++++++++++++++++++--
 include/hw/virtio/vhost-user-blk.h |   4 +
 2 files changed, 161 insertions(+), 12 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 1451940845..663e91bcf6 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
     .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
 };
 
-static void vhost_user_blk_start(VirtIODevice *vdev)
+static int vhost_user_blk_start(VirtIODevice *vdev)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
@@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
 
     if (!k->set_guest_notifiers) {
         error_report("binding does not support guest notifiers");
-        return;
+        return -ENOSYS;
     }
 
     ret = vhost_dev_enable_notifiers(&s->dev, vdev);
     if (ret < 0) {
         error_report("Error enabling host notifiers: %d", -ret);
-        return;
+        return ret;
     }
 
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
@@ -140,12 +140,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
         vhost_virtqueue_mask(&s->dev, vdev, i, false);
     }
 
-    return;
+    return ret;
 
 err_guest_notifiers:
     k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
+    return ret;
 }
 
 static void vhost_user_blk_stop(VirtIODevice *vdev)
@@ -164,7 +165,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
-        return;
     }
 
     vhost_dev_disable_notifiers(&s->dev, vdev);
@@ -174,21 +174,39 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
+    int ret;
 
     if (!vdev->vm_running) {
         should_start = false;
     }
 
-    if (s->dev.started == should_start) {
+    if (s->should_start == should_start) {
+        return;
+    }
+
+    if (!s->connected || s->dev.started == should_start) {
+        s->should_start = should_start;
         return;
     }
 
     if (should_start) {
-        vhost_user_blk_start(vdev);
+        s->should_start = true;
+        /* make sure we ignore fake guest kick by
+         * vhost_dev_enable_notifiers() */
+        barrier();
+        ret = vhost_user_blk_start(vdev);
+        if (ret < 0) {
+            error_report("vhost-user-blk: vhost start failed: %s",
+                         strerror(-ret));
+            qemu_chr_fe_disconnect(&s->chardev);
+        }
     } else {
         vhost_user_blk_stop(vdev);
+        /* make sure we ignore fake guest kick by
+         * vhost_dev_disable_notifiers() */
+        barrier();
+        s->should_start = false;
     }
-
 }
 
 static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
@@ -218,13 +236,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
 static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
-    int i;
+    int i, ret;
 
     if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
         !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
         return;
     }
 
+    if (s->should_start) {
+        return;
+    }
+    s->should_start = true;
+
+    if (!s->connected) {
+        return;
+    }
+
     if (s->dev.started) {
         return;
     }
@@ -232,7 +259,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
      * vhost here instead of waiting for .set_status().
      */
-    vhost_user_blk_start(vdev);
+    ret = vhost_user_blk_start(vdev);
+    if (ret < 0) {
+        error_report("vhost-user-blk: vhost start failed: %s",
+                     strerror(-ret));
+        qemu_chr_fe_disconnect(&s->chardev);
+        return;
+    }
 
     /* Kick right away to begin processing requests already in vring */
     for (i = 0; i < s->dev.nvqs; i++) {
@@ -245,6 +278,106 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     }
 }
 
+static void vhost_user_blk_reset(VirtIODevice *vdev)
+{
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    if (s->vhost_user) {
+        vhost_user_inflight_reset(s->vhost_user);
+    }
+}
+
+static int vhost_user_blk_connect(DeviceState *dev)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+    int ret = 0;
+
+    if (s->connected) {
+        return 0;
+    }
+    s->connected = true;
+
+    s->dev.nvqs = s->num_queues;
+    s->dev.vqs = s->vqs;
+    s->dev.vq_index = 0;
+    s->dev.backend_features = 0;
+
+    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
+
+    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
+    if (ret < 0) {
+        error_report("vhost-user-blk: vhost initialization failed: %s",
+                     strerror(-ret));
+        return ret;
+    }
+
+    if (s->should_start && !s->dev.started) {
+        ret = vhost_user_blk_start(vdev);
+        if (ret < 0) {
+            error_report("vhost-user-blk: vhost start failed: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+static void vhost_user_blk_disconnect(DeviceState *dev)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    if (!s->connected) {
+        return;
+    }
+    s->connected = false;
+
+    if (s->dev.started) {
+        vhost_user_blk_stop(vdev);
+    }
+
+    vhost_dev_cleanup(&s->dev);
+}
+
+static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
+                                     void *opaque)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    qemu_chr_fe_disconnect(&s->chardev);
+
+    return true;
+}
+
+static void vhost_user_blk_event(void *opaque, int event)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    switch (event) {
+    case CHR_EVENT_OPENED:
+        if (vhost_user_blk_connect(dev) < 0) {
+            qemu_chr_fe_disconnect(&s->chardev);
+            return;
+        }
+        s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
+                                         vhost_user_blk_watch, dev);
+        break;
+    case CHR_EVENT_CLOSED:
+        vhost_user_blk_disconnect(dev);
+        if (s->watch) {
+            g_source_remove(s->watch);
+            s->watch = 0;
+        }
+        break;
+    }
+}
+
 static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -284,8 +417,13 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
                          vhost_user_blk_handle_output);
     }
 
+    s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
+    s->watch = 0;
+    s->should_start = false;
+    s->connected = true;
+
     s->dev.nvqs = s->num_queues;
-    s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
+    s->dev.vqs = s->vqs;
     s->dev.vq_index = 0;
     s->dev.backend_features = 0;
 
@@ -309,6 +447,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         s->blkcfg.num_queues = s->num_queues;
     }
 
+    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, vhost_user_blk_event,
+                             NULL, (void *)dev, NULL, true);
+
     return;
 
 vhost_err:
@@ -328,8 +469,11 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
     VHostUserBlk *s = VHOST_USER_BLK(dev);
 
     vhost_user_blk_set_status(vdev, 0);
+    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, NULL,
+                             NULL, NULL, NULL, false);
     vhost_dev_cleanup(&s->dev);
-    g_free(s->dev.vqs);
+
+    g_free(s->vqs);
     virtio_cleanup(vdev);
 
     if (s->vhost_user) {
@@ -379,6 +523,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data)
     vdc->set_config = vhost_user_blk_set_config;
     vdc->get_features = vhost_user_blk_get_features;
     vdc->set_status = vhost_user_blk_set_status;
+    vdc->reset = vhost_user_blk_reset;
 }
 
 static const TypeInfo vhost_user_blk_info = {
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index d52944aeeb..560bb21459 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -37,6 +37,10 @@ typedef struct VHostUserBlk {
     uint32_t config_wce;
     struct vhost_dev dev;
     VhostUserState *vhost_user;
+    struct vhost_virtqueue *vqs;
+    guint watch;
+    bool should_start;
+    bool connected;
 } VHostUserBlk;
 
 #endif
-- 
2.17.1

* [Qemu-devel] [PATCH for-4.0 6/6] contrib/vhost-user-blk: enable inflight I/O recording
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (4 preceding siblings ...)
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 5/6] vhost-user-blk: Add support for reconnecting backend elohimes
@ 2018-12-06  6:35 ` elohimes
  2018-12-06  7:23 ` [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting Marc-André Lureau
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: elohimes @ 2018-12-06  6:35 UTC (permalink / raw)
  To: mst, marcandre.lureau
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch tells qemu that we now support inflight I/O recording,
so that qemu can offer the shared memory to the backend.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 contrib/vhost-user-blk/vhost-user-blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 571f114a56..f87f9de8cd 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -328,7 +328,8 @@ vub_get_features(VuDev *dev)
 static uint64_t
 vub_get_protocol_features(VuDev *dev)
 {
-    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
+    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
+           1uLL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
 }
 
 static int
-- 
2.17.1

* Re: [Qemu-devel] [PATCH for-4.0 3/6] libvhost-user: Introduce vu_queue_map_desc()
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 3/6] libvhost-user: Introduce vu_queue_map_desc() elohimes
@ 2018-12-06  7:16   ` Marc-André Lureau
  0 siblings, 0 replies; 44+ messages in thread
From: Marc-André Lureau @ 2018-12-06  7:16 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, nixun, QEMU, lilin24, zhangyu31, chaiwen, xieyongji

On Thu, Dec 6, 2018 at 10:40 AM <elohimes@gmail.com> wrote:
>
> From: Xie Yongji <xieyongji@baidu.com>
>
> Introduce vu_queue_map_desc() which should be
> independent with vu_queue_pop();
>
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

> ---
>  contrib/libvhost-user/libvhost-user.c | 86 +++++++++++++++------------
>  1 file changed, 49 insertions(+), 37 deletions(-)
>
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index a6b46cdc03..4432bd8bb4 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -1853,49 +1853,20 @@ virtqueue_alloc_element(size_t sz,
>      return elem;
>  }
>
> -void *
> -vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
> +static void *
> +vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
>  {
> -    unsigned int i, head, max, desc_len;
> +    struct vring_desc *desc = vq->vring.desc;
>      uint64_t desc_addr, read_len;
> +    unsigned int desc_len;
> +    unsigned int max = vq->vring.num;
> +    unsigned int i = idx;
>      VuVirtqElement *elem;
> -    unsigned out_num, in_num;
> +    unsigned int out_num = 0, in_num = 0;
>      struct iovec iov[VIRTQUEUE_MAX_SIZE];
>      struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE];
> -    struct vring_desc *desc;
>      int rc;
>
> -    if (unlikely(dev->broken) ||
> -        unlikely(!vq->vring.avail)) {
> -        return NULL;
> -    }
> -
> -    if (vu_queue_empty(dev, vq)) {
> -        return NULL;
> -    }
> -    /* Needed after virtio_queue_empty(), see comment in
> -     * virtqueue_num_heads(). */
> -    smp_rmb();
> -
> -    /* When we start there are none of either input nor output. */
> -    out_num = in_num = 0;
> -
> -    max = vq->vring.num;
> -    if (vq->inuse >= vq->vring.num) {
> -        vu_panic(dev, "Virtqueue size exceeded");
> -        return NULL;
> -    }
> -
> -    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
> -        return NULL;
> -    }
> -
> -    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
> -        vring_set_avail_event(vq, vq->last_avail_idx);
> -    }
> -
> -    i = head;
> -    desc = vq->vring.desc;
>      if (desc[i].flags & VRING_DESC_F_INDIRECT) {
>          if (desc[i].len % sizeof(struct vring_desc)) {
>              vu_panic(dev, "Invalid size for indirect buffer table");
> @@ -1947,12 +1918,13 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
>      } while (rc == VIRTQUEUE_READ_DESC_MORE);
>
>      if (rc == VIRTQUEUE_READ_DESC_ERROR) {
> +        vu_panic(dev, "read descriptor error");
>          return NULL;
>      }
>
>      /* Now copy what we have collected and mapped */
>      elem = virtqueue_alloc_element(sz, out_num, in_num);
> -    elem->index = head;
> +    elem->index = idx;
>      for (i = 0; i < out_num; i++) {
>          elem->out_sg[i] = iov[i];
>      }
> @@ -1960,6 +1932,46 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
>          elem->in_sg[i] = iov[out_num + i];
>      }
>
> +    return elem;
> +}
> +
> +void *
> +vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
> +{
> +    unsigned int head;
> +    VuVirtqElement *elem;
> +
> +    if (unlikely(dev->broken) ||
> +        unlikely(!vq->vring.avail)) {
> +        return NULL;
> +    }
> +
> +    if (vu_queue_empty(dev, vq)) {
> +        return NULL;
> +    }
> +    /* Needed after virtio_queue_empty(), see comment in
> +     * virtqueue_num_heads(). */
> +    smp_rmb();
> +
> +    if (vq->inuse >= vq->vring.num) {
> +        vu_panic(dev, "Virtqueue size exceeded");
> +        return NULL;
> +    }
> +
> +    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
> +        return NULL;
> +    }
> +
> +    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
> +        vring_set_avail_event(vq, vq->last_avail_idx);
> +    }
> +
> +    elem = vu_queue_map_desc(dev, vq, head, sz);
> +
> +    if (!elem) {
> +        return NULL;
> +    }
> +
>      vq->inuse++;
>
>      return elem;
> --
> 2.17.1
>
>


-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH for-4.0 2/6] vhost-user: Add shared memory to record inflight I/O
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 2/6] vhost-user: Add shared memory to record inflight I/O elohimes
@ 2018-12-06  7:19   ` Marc-André Lureau
  2018-12-06  7:22     ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Marc-André Lureau @ 2018-12-06  7:19 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, nixun, QEMU, lilin24, zhangyu31, chaiwen, xieyongji

Hi
On Thu, Dec 6, 2018 at 10:40 AM <elohimes@gmail.com> wrote:
>
> From: Xie Yongji <xieyongji@baidu.com>
>
> This introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> to support offering shared memory to backend to record
> its inflight I/O.
>
> With this new message, the backend is able to restart without
> missing I/O which would cause I/O hung for block device.
>
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Chai Wen <chaiwen@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> ---
>  hw/virtio/vhost-user.c            | 69 +++++++++++++++++++++++++++++++
>  hw/virtio/vhost.c                 |  8 ++++
>  include/hw/virtio/vhost-backend.h |  4 ++
>  include/hw/virtio/vhost-user.h    |  8 ++++

Please update docs/interop/vhost-user.txt to describe the new message

>  4 files changed, 89 insertions(+)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index e09bed0e4a..4c0e64891d 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -19,6 +19,7 @@
>  #include "sysemu/kvm.h"
>  #include "qemu/error-report.h"
>  #include "qemu/sockets.h"
> +#include "qemu/memfd.h"
>  #include "sysemu/cryptodev.h"
>  #include "migration/migration.h"
>  #include "migration/postcopy-ram.h"
> @@ -52,6 +53,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_CONFIG = 9,
>      VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
>      VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> +    VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
>      VHOST_USER_PROTOCOL_F_MAX
>  };
>
> @@ -89,6 +91,7 @@ typedef enum VhostUserRequest {
>      VHOST_USER_POSTCOPY_ADVISE  = 28,
>      VHOST_USER_POSTCOPY_LISTEN  = 29,
>      VHOST_USER_POSTCOPY_END     = 30,
> +    VHOST_USER_SET_VRING_INFLIGHT = 31,

why VRING? it seems to be free/arbitrary memory area.

Oh, I understand later that this has an explicit layout and behaviour
later described in "libvhost-user: Support recording inflight I/O in
shared memory"

Please update the vhost-user spec first to describe expected usage/behaviour.


>      VHOST_USER_MAX
>  } VhostUserRequest;
>
> @@ -147,6 +150,11 @@ typedef struct VhostUserVringArea {
>      uint64_t offset;
>  } VhostUserVringArea;
>
> +typedef struct VhostUserVringInflight {
> +    uint32_t size;
> +    uint32_t idx;
> +} VhostUserVringInflight;
> +
>  typedef struct {
>      VhostUserRequest request;
>
> @@ -169,6 +177,7 @@ typedef union {
>          VhostUserConfig config;
>          VhostUserCryptoSession session;
>          VhostUserVringArea area;
> +        VhostUserVringInflight inflight;
>  } VhostUserPayload;
>
>  typedef struct VhostUserMsg {
> @@ -1739,6 +1748,58 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
>      return result;
>  }
>
> +static int vhost_user_set_vring_inflight(struct vhost_dev *dev, int idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> +        return 0;
> +    }
> +
> +    if (!u->user->inflight[idx].addr) {
> +        Error *err = NULL;
> +
> +        u->user->inflight[idx].size = qemu_real_host_page_size;
> +        u->user->inflight[idx].addr = qemu_memfd_alloc("vhost-inflight",
> +                                      u->user->inflight[idx].size,
> +                                      F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
> +                                      &u->user->inflight[idx].fd, &err);
> +        if (err) {
> +            error_report_err(err);
> +            u->user->inflight[idx].addr = NULL;
> +            return -1;
> +        }
> +    }
> +
> +    VhostUserMsg msg = {
> +        .hdr.request = VHOST_USER_SET_VRING_INFLIGHT,
> +        .hdr.flags = VHOST_USER_VERSION,
> +        .payload.inflight.size = u->user->inflight[idx].size,
> +        .payload.inflight.idx = idx,
> +        .hdr.size = sizeof(msg.payload.inflight),
> +    };
> +
> +    if (vhost_user_write(dev, &msg, &u->user->inflight[idx].fd, 1) < 0) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +void vhost_user_inflight_reset(VhostUserState *user)
> +{
> +    int i;
> +
> +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> +        if (!user->inflight[i].addr) {
> +            continue;
> +        }
> +
> +        memset(user->inflight[i].addr, 0, user->inflight[i].size);
> +    }
> +}
> +
>  VhostUserState *vhost_user_init(void)
>  {
>      VhostUserState *user = g_new0(struct VhostUserState, 1);
> @@ -1756,6 +1817,13 @@ void vhost_user_cleanup(VhostUserState *user)
>              munmap(user->notifier[i].addr, qemu_real_host_page_size);
>              user->notifier[i].addr = NULL;
>          }
> +
> +        if (user->inflight[i].addr) {
> +            munmap(user->inflight[i].addr, user->inflight[i].size);
> +            user->inflight[i].addr = NULL;
> +            close(user->inflight[i].fd);
> +            user->inflight[i].fd = -1;
> +        }
>      }
>  }
>
> @@ -1790,4 +1858,5 @@ const VhostOps user_ops = {
>          .vhost_crypto_create_session = vhost_user_crypto_create_session,
>          .vhost_crypto_close_session = vhost_user_crypto_close_session,
>          .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
> +        .vhost_set_vring_inflight = vhost_user_set_vring_inflight,
>  };
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 569c4053ea..2ca7b4e841 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -973,6 +973,14 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
>          return -errno;
>      }
>
> +    if (dev->vhost_ops->vhost_set_vring_inflight) {
> +        r = dev->vhost_ops->vhost_set_vring_inflight(dev, vhost_vq_index);
> +        if (r) {
> +            VHOST_OPS_DEBUG("vhost_set_vring_inflight failed");
> +            return -errno;
> +        }
> +    }
> +
>      state.num = virtio_queue_get_last_avail_idx(vdev, idx);
>      r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
>      if (r) {
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 81283ec50f..8110e09089 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -104,6 +104,9 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
>  typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
>                                                  MemoryRegionSection *section);
>
> +typedef int (*vhost_set_vring_inflight_op)(struct vhost_dev *dev,
> +                                           int idx);
> +
>  typedef struct VhostOps {
>      VhostBackendType backend_type;
>      vhost_backend_init vhost_backend_init;
> @@ -142,6 +145,7 @@ typedef struct VhostOps {
>      vhost_crypto_create_session_op vhost_crypto_create_session;
>      vhost_crypto_close_session_op vhost_crypto_close_session;
>      vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
> +    vhost_set_vring_inflight_op vhost_set_vring_inflight;
>  } VhostOps;
>
>  extern const VhostOps user_ops;
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index fd660393a0..ff13433153 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -17,11 +17,19 @@ typedef struct VhostUserHostNotifier {
>      bool set;
>  } VhostUserHostNotifier;
>
> +typedef struct VhostUserInflight {
> +    void *addr;
> +    uint32_t size;
> +    int fd;
> +} VhostUserInflight;
> +
>  typedef struct VhostUserState {
>      CharBackend *chr;
>      VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> +    VhostUserInflight inflight[VIRTIO_QUEUE_MAX];
>  } VhostUserState;
>
> +void vhost_user_inflight_reset(VhostUserState *user);
>  VhostUserState *vhost_user_init(void);
>  void vhost_user_cleanup(VhostUserState *user);
>
> --
> 2.17.1
>
>


--
Marc-André Lureau

* Re: [Qemu-devel] [PATCH for-4.0 2/6] vhost-user: Add shared memory to record inflight I/O
  2018-12-06  7:19   ` Marc-André Lureau
@ 2018-12-06  7:22     ` Yongji Xie
  0 siblings, 0 replies; 44+ messages in thread
From: Yongji Xie @ 2018-12-06  7:22 UTC (permalink / raw)
  To: marcandre.lureau
  Cc: Michael S. Tsirkin, nixun, qemu-devel, lilin24, zhangyu31,
	chaiwen, Xie Yongji

On Thu, 6 Dec 2018 at 15:19, Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
>
> Hi
> On Thu, Dec 6, 2018 at 10:40 AM <elohimes@gmail.com> wrote:
> >
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > to support offering shared memory to backend to record
> > its inflight I/O.
> >
> > With this new message, the backend is able to restart without
> > missing I/O which would cause I/O hung for block device.
> >
> > Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> > Signed-off-by: Chai Wen <chaiwen@baidu.com>
> > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> > ---
> >  hw/virtio/vhost-user.c            | 69 +++++++++++++++++++++++++++++++
> >  hw/virtio/vhost.c                 |  8 ++++
> >  include/hw/virtio/vhost-backend.h |  4 ++
> >  include/hw/virtio/vhost-user.h    |  8 ++++
>
> Please update docs/interop/vhost-user.txt to describe the new message
>

Will do it in v2.

Thanks,
Yongji

> >  4 files changed, 89 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index e09bed0e4a..4c0e64891d 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -19,6 +19,7 @@
> >  #include "sysemu/kvm.h"
> >  #include "qemu/error-report.h"
> >  #include "qemu/sockets.h"
> > +#include "qemu/memfd.h"
> >  #include "sysemu/cryptodev.h"
> >  #include "migration/migration.h"
> >  #include "migration/postcopy-ram.h"
> > @@ -52,6 +53,7 @@ enum VhostUserProtocolFeature {
> >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> >      VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
> >      VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> > +    VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
> >      VHOST_USER_PROTOCOL_F_MAX
> >  };
> >
> > @@ -89,6 +91,7 @@ typedef enum VhostUserRequest {
> >      VHOST_USER_POSTCOPY_ADVISE  = 28,
> >      VHOST_USER_POSTCOPY_LISTEN  = 29,
> >      VHOST_USER_POSTCOPY_END     = 30,
> > +    VHOST_USER_SET_VRING_INFLIGHT = 31,
>
> why VRING? it seems to be free/arbitrary memory area.
>
> Oh, I understand later that this has an explicit layout and behaviour
> later described in "libvhost-user: Support recording inflight I/O in
> shared memory"
>
> Please update the vhost-user spec first to describe expected usage/behaviour.
>
>
> >      VHOST_USER_MAX
> >  } VhostUserRequest;
> >
> > @@ -147,6 +150,11 @@ typedef struct VhostUserVringArea {
> >      uint64_t offset;
> >  } VhostUserVringArea;
> >
> > +typedef struct VhostUserVringInflight {
> > +    uint32_t size;
> > +    uint32_t idx;
> > +} VhostUserVringInflight;
> > +
> >  typedef struct {
> >      VhostUserRequest request;
> >
> > @@ -169,6 +177,7 @@ typedef union {
> >          VhostUserConfig config;
> >          VhostUserCryptoSession session;
> >          VhostUserVringArea area;
> > +        VhostUserVringInflight inflight;
> >  } VhostUserPayload;
> >
> >  typedef struct VhostUserMsg {
> > @@ -1739,6 +1748,58 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
> >      return result;
> >  }
> >
> > +static int vhost_user_set_vring_inflight(struct vhost_dev *dev, int idx)
> > +{
> > +    struct vhost_user *u = dev->opaque;
> > +
> > +    if (!virtio_has_feature(dev->protocol_features,
> > +                            VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> > +        return 0;
> > +    }
> > +
> > +    if (!u->user->inflight[idx].addr) {
> > +        Error *err = NULL;
> > +
> > +        u->user->inflight[idx].size = qemu_real_host_page_size;
> > +        u->user->inflight[idx].addr = qemu_memfd_alloc("vhost-inflight",
> > +                                      u->user->inflight[idx].size,
> > +                                      F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
> > +                                      &u->user->inflight[idx].fd, &err);
> > +        if (err) {
> > +            error_report_err(err);
> > +            u->user->inflight[idx].addr = NULL;
> > +            return -1;
> > +        }
> > +    }
> > +
> > +    VhostUserMsg msg = {
> > +        .hdr.request = VHOST_USER_SET_VRING_INFLIGHT,
> > +        .hdr.flags = VHOST_USER_VERSION,
> > +        .payload.inflight.size = u->user->inflight[idx].size,
> > +        .payload.inflight.idx = idx,
> > +        .hdr.size = sizeof(msg.payload.inflight),
> > +    };
> > +
> > +    if (vhost_user_write(dev, &msg, &u->user->inflight[idx].fd, 1) < 0) {
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +void vhost_user_inflight_reset(VhostUserState *user)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > +        if (!user->inflight[i].addr) {
> > +            continue;
> > +        }
> > +
> > +        memset(user->inflight[i].addr, 0, user->inflight[i].size);
> > +    }
> > +}
> > +
> >  VhostUserState *vhost_user_init(void)
> >  {
> >      VhostUserState *user = g_new0(struct VhostUserState, 1);
> > @@ -1756,6 +1817,13 @@ void vhost_user_cleanup(VhostUserState *user)
> >              munmap(user->notifier[i].addr, qemu_real_host_page_size);
> >              user->notifier[i].addr = NULL;
> >          }
> > +
> > +        if (user->inflight[i].addr) {
> > +            munmap(user->inflight[i].addr, user->inflight[i].size);
> > +            user->inflight[i].addr = NULL;
> > +            close(user->inflight[i].fd);
> > +            user->inflight[i].fd = -1;
> > +        }
> >      }
> >  }
> >
> > @@ -1790,4 +1858,5 @@ const VhostOps user_ops = {
> >          .vhost_crypto_create_session = vhost_user_crypto_create_session,
> >          .vhost_crypto_close_session = vhost_user_crypto_close_session,
> >          .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
> > +        .vhost_set_vring_inflight = vhost_user_set_vring_inflight,
> >  };
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 569c4053ea..2ca7b4e841 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -973,6 +973,14 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
> >          return -errno;
> >      }
> >
> > +    if (dev->vhost_ops->vhost_set_vring_inflight) {
> > +        r = dev->vhost_ops->vhost_set_vring_inflight(dev, vhost_vq_index);
> > +        if (r) {
> > +            VHOST_OPS_DEBUG("vhost_set_vring_inflight failed");
> > +            return -errno;
> > +        }
> > +    }
> > +
> >      state.num = virtio_queue_get_last_avail_idx(vdev, idx);
> >      r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
> >      if (r) {
> > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > index 81283ec50f..8110e09089 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -104,6 +104,9 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
> >  typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
> >                                                  MemoryRegionSection *section);
> >
> > +typedef int (*vhost_set_vring_inflight_op)(struct vhost_dev *dev,
> > +                                           int idx);
> > +
> >  typedef struct VhostOps {
> >      VhostBackendType backend_type;
> >      vhost_backend_init vhost_backend_init;
> > @@ -142,6 +145,7 @@ typedef struct VhostOps {
> >      vhost_crypto_create_session_op vhost_crypto_create_session;
> >      vhost_crypto_close_session_op vhost_crypto_close_session;
> >      vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
> > +    vhost_set_vring_inflight_op vhost_set_vring_inflight;
> >  } VhostOps;
> >
> >  extern const VhostOps user_ops;
> > diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> > index fd660393a0..ff13433153 100644
> > --- a/include/hw/virtio/vhost-user.h
> > +++ b/include/hw/virtio/vhost-user.h
> > @@ -17,11 +17,19 @@ typedef struct VhostUserHostNotifier {
> >      bool set;
> >  } VhostUserHostNotifier;
> >
> > +typedef struct VhostUserInflight {
> > +    void *addr;
> > +    uint32_t size;
> > +    int fd;
> > +} VhostUserInflight;
> > +
> >  typedef struct VhostUserState {
> >      CharBackend *chr;
> >      VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > +    VhostUserInflight inflight[VIRTIO_QUEUE_MAX];
> >  } VhostUserState;
> >
> > +void vhost_user_inflight_reset(VhostUserState *user);
> >  VhostUserState *vhost_user_init(void);
> >  void vhost_user_cleanup(VhostUserState *user);
> >
> > --
> > 2.17.1
> >
> >
>
>
> --
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (5 preceding siblings ...)
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 6/6] contrib/vhost-user-blk: enable inflight I/O recording elohimes
@ 2018-12-06  7:23 ` Marc-André Lureau
  2018-12-06  7:43   ` Yongji Xie
  2018-12-06  9:21 ` Yury Kotov
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Marc-André Lureau @ 2018-12-06  7:23 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, nixun, QEMU, lilin24, zhangyu31, chaiwen, xieyongji

Hi

On Thu, Dec 6, 2018 at 10:36 AM <elohimes@gmail.com> wrote:
>
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patchset is aimed at supporting qemu to reconnect
> vhost-user-blk backend after vhost-user-blk backend crash or
> restart.
>
> The patch 1 tries to implenment the sync connection for
> "reconnect socket".
>
> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> to support offering shared memory to backend to record
> its inflight I/O.
>
> The patch 3,4 are the corresponding libvhost-user patches of
> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>
> The patch 5 supports vhost-user-blk to reconnect backend when
> connection closed.
>
> The patch 6 tells qemu that we support reconnecting now.
>
> To use it, we could start qemu with:
>
> qemu-system-x86_64 \
>         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>         -device vhost-user-blk-pci,chardev=char0 \

Why do you want qemu to be the client since it is actually the one
that serves and remains alive?  Why make it try to reconnect regularly
when it could instead wait for a connection to come up?

>
> and start vhost-user-blk backend with:
>
> vhost-user-blk -b /path/file -s /path/vhost.socket
>
> Then we can restart vhost-user-blk at any time during VM running.
>
> Xie Yongji (6):
>   char-socket: Enable "wait" option for client mode
>   vhost-user: Add shared memory to record inflight I/O
>   libvhost-user: Introduce vu_queue_map_desc()
>   libvhost-user: Support recording inflight I/O in shared memory
>   vhost-user-blk: Add support for reconnecting backend
>   contrib/vhost-user-blk: enable inflight I/O recording
>
>  chardev/char-socket.c                   |   5 +-
>  contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h   |  19 +++
>  contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
>  hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
>  hw/virtio/vhost-user.c                  |  69 ++++++++
>  hw/virtio/vhost.c                       |   8 +
>  include/hw/virtio/vhost-backend.h       |   4 +
>  include/hw/virtio/vhost-user-blk.h      |   4 +
>  include/hw/virtio/vhost-user.h          |   8 +
>  10 files changed, 452 insertions(+), 52 deletions(-)
>
> --
> 2.17.1
>
>


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode elohimes
@ 2018-12-06  7:23   ` Marc-André Lureau
  2018-12-06  7:53     ` Yongji Xie
  2018-12-06  9:31   ` Yury Kotov
  1 sibling, 1 reply; 44+ messages in thread
From: Marc-André Lureau @ 2018-12-06  7:23 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, nixun, QEMU, lilin24, zhangyu31, chaiwen, xieyongji

Hi

On Thu, Dec 6, 2018 at 10:38 AM <elohimes@gmail.com> wrote:
>
> From: Xie Yongji <xieyongji@baidu.com>
>
> Now we attempt to connect asynchronously for "reconnect socket"
> during open(). But vhost-user device prefer a connected socket
> during initialization. That means we may still need to support
> sync connection during open() for the "reconnect socket".
>
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>

I am not sure this makes much sense, since upon reconnect it won't
"wait" (if I am not mistaken)

> ---
>  chardev/char-socket.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index eaa8e8b68f..f2819d52e7 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -1072,7 +1072,7 @@ static void qmp_chardev_open_socket(Chardev *chr,
>          s->reconnect_time = reconnect;
>      }
>
> -    if (s->reconnect_time) {
> +    if (s->reconnect_time && !is_waitconnect) {
>          tcp_chr_connect_async(chr);
>      } else {
>          if (s->is_listen) {
> @@ -1120,7 +1120,8 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
>                                    Error **errp)
>  {
>      bool is_listen      = qemu_opt_get_bool(opts, "server", false);
> -    bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true);
> +    bool is_waitconnect = is_listen ? qemu_opt_get_bool(opts, "wait", true) :
> +                          qemu_opt_get_bool(opts, "wait", false);
>      bool is_telnet      = qemu_opt_get_bool(opts, "telnet", false);
>      bool is_tn3270      = qemu_opt_get_bool(opts, "tn3270", false);
>      bool is_websock     = qemu_opt_get_bool(opts, "websocket", false);
> --
> 2.17.1
>
>


--
Marc-André Lureau

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06  7:23 ` [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting Marc-André Lureau
@ 2018-12-06  7:43   ` Yongji Xie
  0 siblings, 0 replies; 44+ messages in thread
From: Yongji Xie @ 2018-12-06  7:43 UTC (permalink / raw)
  To: marcandre.lureau
  Cc: Michael S. Tsirkin, nixun, qemu-devel, lilin24, zhangyu31,
	chaiwen, Xie Yongji

On Thu, 6 Dec 2018 at 15:23, Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
>
> Hi
>
> On Thu, Dec 6, 2018 at 10:36 AM <elohimes@gmail.com> wrote:
> >
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patchset is aimed at supporting qemu to reconnect
> > vhost-user-blk backend after vhost-user-blk backend crash or
> > restart.
> >
> > The patch 1 tries to implenment the sync connection for
> > "reconnect socket".
> >
> > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > to support offering shared memory to backend to record
> > its inflight I/O.
> >
> > The patch 3,4 are the corresponding libvhost-user patches of
> > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >
> > The patch 5 supports vhost-user-blk to reconnect backend when
> > connection closed.
> >
> > The patch 6 tells qemu that we support reconnecting now.
> >
> > To use it, we could start qemu with:
> >
> > qemu-system-x86_64 \
> >         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >         -device vhost-user-blk-pci,chardev=char0 \
>
> Why do you want qemu to be the client since it is actually the one
> that serves and remains alive?  Why make it try to reconnect regularly
> when it could instead wait for a connection to come up?
>

Actually, this patchset should also work when qemu is in server mode.
The reason I make qemu the client is that some vhost-user backends,
such as SPDK's vhost-user-blk, may still work in server mode. And it
seems we cannot make sure that every vhost-user backend works in
client mode.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode
  2018-12-06  7:23   ` Marc-André Lureau
@ 2018-12-06  7:53     ` Yongji Xie
  0 siblings, 0 replies; 44+ messages in thread
From: Yongji Xie @ 2018-12-06  7:53 UTC (permalink / raw)
  To: marcandre.lureau
  Cc: Michael S. Tsirkin, nixun, qemu-devel, lilin24, zhangyu31,
	chaiwen, Xie Yongji

On Thu, 6 Dec 2018 at 15:23, Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
>
> Hi
>
> On Thu, Dec 6, 2018 at 10:38 AM <elohimes@gmail.com> wrote:
> >
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > Now we attempt to connect asynchronously for "reconnect socket"
> > during open(). But vhost-user device prefer a connected socket
> > during initialization. That means we may still need to support
> > sync connection during open() for the "reconnect socket".
> >
> > Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
>
> I am not sure this makes much sense, since upon reconnect it won't
> "wait" (if I am not mistaken)
>

Yes, qemu won't wait when reconnecting. The "wait" here just means that
we should wait for the connection the first time (during open()). I'm not
sure whether reusing the "wait" option is OK here.

Without this option, current qemu fails to initialize the vhost-user-blk
device whenever the "reconnect" option is specified, no matter whether the
backend is running or not.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (6 preceding siblings ...)
  2018-12-06  7:23 ` [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting Marc-André Lureau
@ 2018-12-06  9:21 ` Yury Kotov
  2018-12-06  9:41   ` Yongji Xie
  2018-12-06 13:57 ` Jason Wang
  2018-12-13 14:45 ` Michael S. Tsirkin
  9 siblings, 1 reply; 44+ messages in thread
From: Yury Kotov @ 2018-12-06  9:21 UTC (permalink / raw)
  To: elohimes, mst, marcandre.lureau
  Cc: nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji

Hi, it's a very interesting patchset.

I am also researching the reconnect issue for vhost-user-blk and SPDK.
Do you support the case where the vhost backend has not been started but
QEMU has?

Regards,
Yury

06.12.2018, 09:37, "elohimes@gmail.com" <elohimes@gmail.com>:
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patchset is aimed at supporting qemu to reconnect
> vhost-user-blk backend after vhost-user-blk backend crash or
> restart.
>
> The patch 1 tries to implenment the sync connection for
> "reconnect socket".
>
> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> to support offering shared memory to backend to record
> its inflight I/O.
>
> The patch 3,4 are the corresponding libvhost-user patches of
> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>
> The patch 5 supports vhost-user-blk to reconnect backend when
> connection closed.
>
> The patch 6 tells qemu that we support reconnecting now.
>
> To use it, we could start qemu with:
>
> qemu-system-x86_64 \
>         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>         -device vhost-user-blk-pci,chardev=char0 \
>
> and start vhost-user-blk backend with:
>
> vhost-user-blk -b /path/file -s /path/vhost.socket
>
> Then we can restart vhost-user-blk at any time during VM running.
>
> Xie Yongji (6):
>   char-socket: Enable "wait" option for client mode
>   vhost-user: Add shared memory to record inflight I/O
>   libvhost-user: Introduce vu_queue_map_desc()
>   libvhost-user: Support recording inflight I/O in shared memory
>   vhost-user-blk: Add support for reconnecting backend
>   contrib/vhost-user-blk: enable inflight I/O recording
>
>  chardev/char-socket.c | 5 +-
>  contrib/libvhost-user/libvhost-user.c | 215 ++++++++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h | 19 +++
>  contrib/vhost-user-blk/vhost-user-blk.c | 3 +-
>  hw/block/vhost-user-blk.c | 169 +++++++++++++++++--
>  hw/virtio/vhost-user.c | 69 ++++++++
>  hw/virtio/vhost.c | 8 +
>  include/hw/virtio/vhost-backend.h | 4 +
>  include/hw/virtio/vhost-user-blk.h | 4 +
>  include/hw/virtio/vhost-user.h | 8 +
>  10 files changed, 452 insertions(+), 52 deletions(-)
>
> --
> 2.17.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode elohimes
  2018-12-06  7:23   ` Marc-André Lureau
@ 2018-12-06  9:31   ` Yury Kotov
  1 sibling, 0 replies; 44+ messages in thread
From: Yury Kotov @ 2018-12-06  9:31 UTC (permalink / raw)
  To: elohimes, mst, marcandre.lureau
  Cc: nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji

Hi,

06.12.2018, 09:37, "elohimes@gmail.com" <elohimes@gmail.com>:
> From: Xie Yongji <xieyongji@baidu.com>
>
> Now we attempt to connect asynchronously for "reconnect socket"
> during open(). But vhost-user device prefer a connected socket
> during initialization. That means we may still need to support
> sync connection during open() for the "reconnect socket".
>
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> ---
>  chardev/char-socket.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index eaa8e8b68f..f2819d52e7 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -1072,7 +1072,7 @@ static void qmp_chardev_open_socket(Chardev *chr,
>          s->reconnect_time = reconnect;
>      }
>
> - if (s->reconnect_time) {
> + if (s->reconnect_time && !is_waitconnect) {
>          tcp_chr_connect_async(chr);

Does reconnect+wait allow QEMU to start when the first connection has
failed? I think that was the main reason to use tcp_chr_connect_async for
the reconnect case.

>      } else {
>          if (s->is_listen) {
> @@ -1120,7 +1120,8 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
>                                    Error **errp)
>  {
>      bool is_listen = qemu_opt_get_bool(opts, "server", false);
> - bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true);
> + bool is_waitconnect = is_listen ? qemu_opt_get_bool(opts, "wait", true) :
> + qemu_opt_get_bool(opts, "wait", false);
>      bool is_telnet = qemu_opt_get_bool(opts, "telnet", false);
>      bool is_tn3270 = qemu_opt_get_bool(opts, "tn3270", false);
>      bool is_websock = qemu_opt_get_bool(opts, "websocket", false);
> --
> 2.17.1

Regards,
Yury

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06  9:21 ` Yury Kotov
@ 2018-12-06  9:41   ` Yongji Xie
  2018-12-06  9:52     ` Yury Kotov
  0 siblings, 1 reply; 44+ messages in thread
From: Yongji Xie @ 2018-12-06  9:41 UTC (permalink / raw)
  To: yury-kotov
  Cc: Michael S. Tsirkin, marcandre.lureau, nixun, qemu-devel, lilin24,
	zhangyu31, chaiwen, Xie Yongji

On Thu, 6 Dec 2018 at 17:21, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
> Hi, it's very interesting patchset.
>
> I also research reconnecting issue for vhost-user-blk and SPDK.
> Did you support a case when vhost backend is not started but QEMU does?
>

Currently we do not support this case, because qemu has to get the config
from the vhost-user backend through VHOST_USER_GET_CONFIG when we
initialize the vhost-user-blk device.

If we delayed getting the config until we connect to the vhost-user
backend, the guest driver might read an incorrect configuration. That's
why I modified the "wait" option to support client mode.
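
Just to show where the connected socket is needed, this is roughly what
realize has to do before the device is exposed to the guest (a simplified
sketch of the existing code, with error handling trimmed):

/* Simplified view of the existing realize path: the virtio-blk config
 * space is fetched over the vhost-user socket, so the backend has to
 * be connected at this point. */
ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
                           sizeof(struct virtio_blk_config));
if (ret < 0) {
    error_setg(errp, "vhost-user-blk: get block config failed");
    return;
}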

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06  9:41   ` Yongji Xie
@ 2018-12-06  9:52     ` Yury Kotov
  2018-12-06 10:35       ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Yury Kotov @ 2018-12-06  9:52 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, marcandre.lureau, nixun, qemu-devel, lilin24,
	zhangyu31, chaiwen, Xie Yongji

Yes, I also think that realize should be sync.

But maybe it's better to add a 'disconnected' option to init the chardev
in a disconnected state, and then do the first connection with
qemu_chr_fe_wait_connected() from vhost_user_blk_realize(). So when the
connection gets broken during realize we can try again, roughly as in the
sketch below. What do you think about it?
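
Something like this, just as a rough sketch — the 'disconnected' option
does not exist yet, vhost_user_blk_wait_for_backend() is a made-up helper
name and the retry limit is arbitrary; it simply reuses the existing
qemu_chr_fe_wait_connected() helper to do a blocking connect:

/* Rough sketch only, not part of any posted patch. */
static int vhost_user_blk_wait_for_backend(VHostUserBlk *s)
{
    int retries = 3;

    while (retries-- > 0) {
        Error *local_err = NULL;

        if (qemu_chr_fe_wait_connected(&s->chardev, &local_err) == 0) {
            /* connected: realize can go on with vhost_dev_init() etc. */
            return 0;
        }
        error_report_err(local_err);
    }

    return -1;  /* give up, realize fails */
}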

Regards,
Yury

06.12.2018, 12:42, "Yongji Xie" <elohimes@gmail.com>:
> On Thu, 6 Dec 2018 at 17:21, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>>  Hi, it's very interesting patchset.
>>
>>  I also research reconnecting issue for vhost-user-blk and SPDK.
>>  Did you support a case when vhost backend is not started but QEMU does?
>
> Now we do not support this case. Because qemu have to get config from vhost-user
> backend through VHOST_USER_GET_CONFIG when we initialize the
> vhost-user-blk device.
>
> If we delay getting config until we connect vhost-user backend, guest
> driver may get incorrect
> configuration? That's why I modify the "wait" option to support client mode.
>
> Thanks,
> Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06  9:52     ` Yury Kotov
@ 2018-12-06 10:35       ` Yongji Xie
  0 siblings, 0 replies; 44+ messages in thread
From: Yongji Xie @ 2018-12-06 10:35 UTC (permalink / raw)
  To: yury-kotov
  Cc: Michael S. Tsirkin, marcandre.lureau, nixun, qemu-devel, lilin24,
	zhangyu31, chaiwen, Xie Yongji

On Thu, 6 Dec 2018 at 17:52, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
> Yes, I also think that realize shout be sync.
>
> But may be it's better to add an 'disconnected' option to init the chardev
> in disconnected state, then do the first connection with
> qemu_chr_fe_wait_connected from vhost_user_blk_realize. So when
> connection will be broken in realize we can try again.
> What do you think about it?
>

Sounds good to me. And maybe we need to add a timeout for the retries.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 5/6] vhost-user-blk: Add support for reconnecting backend
  2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 5/6] vhost-user-blk: Add support for reconnecting backend elohimes
@ 2018-12-06 12:21   ` Yury Kotov
  2018-12-06 13:26     ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Yury Kotov @ 2018-12-06 12:21 UTC (permalink / raw)
  To: elohimes, marcandre.lureau, mst
  Cc: nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji

06.12.2018, 09:38, "elohimes@gmail.com" <elohimes@gmail.com>:
> From: Xie Yongji <xieyongji@baidu.com>
>
> Since the new message VHOST_USER_SET_VRING_INFLIGHT,
> the backend is able to restart safely. This patch
> allow qemu to reconnect the backend after connection
> closed.
>
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Ni Xun <nixun@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> ---
>  hw/block/vhost-user-blk.c | 169 +++++++++++++++++++++++++++--
>  include/hw/virtio/vhost-user-blk.h | 4 +
>  2 files changed, 161 insertions(+), 12 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 1451940845..663e91bcf6 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
>      .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
>  };
>
> -static void vhost_user_blk_start(VirtIODevice *vdev)
> +static int vhost_user_blk_start(VirtIODevice *vdev)
>  {
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> @@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
>
>      if (!k->set_guest_notifiers) {
>          error_report("binding does not support guest notifiers");
> - return;
> + return -ENOSYS;
>      }
>
>      ret = vhost_dev_enable_notifiers(&s->dev, vdev);
>      if (ret < 0) {
>          error_report("Error enabling host notifiers: %d", -ret);
> - return;
> + return ret;
>      }
>
>      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
> @@ -140,12 +140,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
>          vhost_virtqueue_mask(&s->dev, vdev, i, false);
>      }
>
> - return;
> + return ret;
>
>  err_guest_notifiers:
>      k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
>  err_host_notifiers:
>      vhost_dev_disable_notifiers(&s->dev, vdev);
> + return ret;
>  }
>
>  static void vhost_user_blk_stop(VirtIODevice *vdev)
> @@ -164,7 +165,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
>      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
>      if (ret < 0) {
>          error_report("vhost guest notifier cleanup failed: %d", ret);
> - return;
>      }
>
>      vhost_dev_disable_notifiers(&s->dev, vdev);
> @@ -174,21 +174,39 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>      bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
> + int ret;
>
>      if (!vdev->vm_running) {
>          should_start = false;
>      }
>
> - if (s->dev.started == should_start) {
> + if (s->should_start == should_start) {
> + return;
> + }
> +
> + if (!s->connected || s->dev.started == should_start) {
> + s->should_start = should_start;
>          return;
>      }
>
>      if (should_start) {
> - vhost_user_blk_start(vdev);
> + s->should_start = true;
> + /* make sure we ignore fake guest kick by
> + * vhost_dev_enable_notifiers() */
> + barrier();
> + ret = vhost_user_blk_start(vdev);
> + if (ret < 0) {
> + error_report("vhost-user-blk: vhost start failed: %s",
> + strerror(-ret));
> + qemu_chr_fe_disconnect(&s->chardev);
> + }
>      } else {
>          vhost_user_blk_stop(vdev);
> + /* make sure we ignore fake guest kick by
> + * vhost_dev_disable_notifiers() */
> + barrier();
> + s->should_start = false;
>      }
> -
>  }
>
>  static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
> @@ -218,13 +236,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
>  static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> - int i;
> + int i, ret;
>
>      if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
>          !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {

This condition looks complicated. Is it correct? I'm not sure, but maybe
the second '!' is odd. Sorry, I know that it isn't part of this patch, but
it looks like this condition was added in one of your old patches.

>          return;
>      }
>
> + if (s->should_start) {
> + return;
> + }
> + s->should_start = true;
> +
> + if (!s->connected) {
> + return;
> + }
> +
>      if (s->dev.started) {
>          return;
>      }
> @@ -232,7 +259,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>      /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
>       * vhost here instead of waiting for .set_status().
>       */
> - vhost_user_blk_start(vdev);
> + ret = vhost_user_blk_start(vdev);
> + if (ret < 0) {
> + error_report("vhost-user-blk: vhost start failed: %s",
> + strerror(-ret));
> + qemu_chr_fe_disconnect(&s->chardev);
> + return;
> + }
>
>      /* Kick right away to begin processing requests already in vring */
>      for (i = 0; i < s->dev.nvqs; i++) {
> @@ -245,6 +278,106 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>      }
>  }
>
> +static void vhost_user_blk_reset(VirtIODevice *vdev)
> +{
> + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +
> + if (s->vhost_user) {
> + vhost_user_inflight_reset(s->vhost_user);
> + }
> +}
> +
> +static int vhost_user_blk_connect(DeviceState *dev)
> +{
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> + int ret = 0;
> +
> + if (s->connected) {
> + return 0;
> + }
> + s->connected = true;
> +
> + s->dev.nvqs = s->num_queues;
> + s->dev.vqs = s->vqs;
> + s->dev.vq_index = 0;
> + s->dev.backend_features = 0;
> +
> + vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> +
> + ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> + if (ret < 0) {
> + error_report("vhost-user-blk: vhost initialization failed: %s",
> + strerror(-ret));
> + return ret;
> + }
> +
> + if (s->should_start && !s->dev.started) {

It seems that `!s->dev.started` is always true here because
vhost_dev_init sets dev.started = 0 on success.

> + ret = vhost_user_blk_start(vdev);
> + if (ret < 0) {
> + error_report("vhost-user-blk: vhost start failed: %s",
> + strerror(-ret));
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static void vhost_user_blk_disconnect(DeviceState *dev)
> +{
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +
> + if (!s->connected) {
> + return;
> + }
> + s->connected = false;
> +
> + if (s->dev.started) {
> + vhost_user_blk_stop(vdev);
> + }
> +
> + vhost_dev_cleanup(&s->dev);
> +}
> +
> +static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
> + void *opaque)
> +{
> + DeviceState *dev = opaque;
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +
> + qemu_chr_fe_disconnect(&s->chardev);
> +
> + return true;
> +}
> +
> +static void vhost_user_blk_event(void *opaque, int event)
> +{
> + DeviceState *dev = opaque;
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +
> + switch (event) {
> + case CHR_EVENT_OPENED:
> + if (vhost_user_blk_connect(dev) < 0) {
> + qemu_chr_fe_disconnect(&s->chardev);
> + return;
> + }
> + s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
> + vhost_user_blk_watch, dev);
> + break;
> + case CHR_EVENT_CLOSED:
> + vhost_user_blk_disconnect(dev);
> + if (s->watch) {
> + g_source_remove(s->watch);
> + s->watch = 0;
> + }
> + break;
> + }
> +}
> +
>  static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -284,8 +417,13 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>                           vhost_user_blk_handle_output);
>      }
>
> + s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
> + s->watch = 0;
> + s->should_start = false;
> + s->connected = true;
> +
>      s->dev.nvqs = s->num_queues;
> - s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
> + s->dev.vqs = s->vqs;
>      s->dev.vq_index = 0;
>      s->dev.backend_features = 0;
>
> @@ -309,6 +447,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>          s->blkcfg.num_queues = s->num_queues;
>      }
>
> + qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> + NULL, (void *)dev, NULL, true);
> +
>      return;
>
>  vhost_err:
> @@ -328,8 +469,11 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
>      VHostUserBlk *s = VHOST_USER_BLK(dev);
>
>      vhost_user_blk_set_status(vdev, 0);
> + qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL,
> + NULL, NULL, NULL, false);
>      vhost_dev_cleanup(&s->dev);
> - g_free(s->dev.vqs);
> +
> + g_free(s->vqs);
>      virtio_cleanup(vdev);
>
>      if (s->vhost_user) {
> @@ -379,6 +523,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data)
>      vdc->set_config = vhost_user_blk_set_config;
>      vdc->get_features = vhost_user_blk_get_features;
>      vdc->set_status = vhost_user_blk_set_status;
> + vdc->reset = vhost_user_blk_reset;
>  }
>
>  static const TypeInfo vhost_user_blk_info = {
> diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
> index d52944aeeb..560bb21459 100644
> --- a/include/hw/virtio/vhost-user-blk.h
> +++ b/include/hw/virtio/vhost-user-blk.h
> @@ -37,6 +37,10 @@ typedef struct VHostUserBlk {
>      uint32_t config_wce;
>      struct vhost_dev dev;
>      VhostUserState *vhost_user;
> + struct vhost_virtqueue *vqs;
> + guint watch;
> + bool should_start;
> + bool connected;
>  } VHostUserBlk;
>
>  #endif
> --
> 2.17.1

Regards,
Yury

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 5/6] vhost-user-blk: Add support for reconnecting backend
  2018-12-06 12:21   ` Yury Kotov
@ 2018-12-06 13:26     ` Yongji Xie
  0 siblings, 0 replies; 44+ messages in thread
From: Yongji Xie @ 2018-12-06 13:26 UTC (permalink / raw)
  To: yury-kotov
  Cc: marcandre.lureau, Michael S. Tsirkin, nixun, qemu-devel, lilin24,
	zhangyu31, chaiwen, Xie Yongji

On Thu, 6 Dec 2018 at 20:21, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
> 06.12.2018, 09:38, "elohimes@gmail.com" <elohimes@gmail.com>:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > Since the new message VHOST_USER_SET_VRING_INFLIGHT,
> > the backend is able to restart safely. This patch
> > allow qemu to reconnect the backend after connection
> > closed.
> >
> > Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> > Signed-off-by: Ni Xun <nixun@baidu.com>
> > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> > ---
> >  hw/block/vhost-user-blk.c | 169 +++++++++++++++++++++++++++--
> >  include/hw/virtio/vhost-user-blk.h | 4 +
> >  2 files changed, 161 insertions(+), 12 deletions(-)
> >
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 1451940845..663e91bcf6 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
> >      .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
> >  };
> >
> > -static void vhost_user_blk_start(VirtIODevice *vdev)
> > +static int vhost_user_blk_start(VirtIODevice *vdev)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> > @@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
> >
> >      if (!k->set_guest_notifiers) {
> >          error_report("binding does not support guest notifiers");
> > - return;
> > + return -ENOSYS;
> >      }
> >
> >      ret = vhost_dev_enable_notifiers(&s->dev, vdev);
> >      if (ret < 0) {
> >          error_report("Error enabling host notifiers: %d", -ret);
> > - return;
> > + return ret;
> >      }
> >
> >      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
> > @@ -140,12 +140,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
> >          vhost_virtqueue_mask(&s->dev, vdev, i, false);
> >      }
> >
> > - return;
> > + return ret;
> >
> >  err_guest_notifiers:
> >      k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> >  err_host_notifiers:
> >      vhost_dev_disable_notifiers(&s->dev, vdev);
> > + return ret;
> >  }
> >
> >  static void vhost_user_blk_stop(VirtIODevice *vdev)
> > @@ -164,7 +165,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
> >      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> >      if (ret < 0) {
> >          error_report("vhost guest notifier cleanup failed: %d", ret);
> > - return;
> >      }
> >
> >      vhost_dev_disable_notifiers(&s->dev, vdev);
> > @@ -174,21 +174,39 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >      bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
> > + int ret;
> >
> >      if (!vdev->vm_running) {
> >          should_start = false;
> >      }
> >
> > - if (s->dev.started == should_start) {
> > + if (s->should_start == should_start) {
> > + return;
> > + }
> > +
> > + if (!s->connected || s->dev.started == should_start) {
> > + s->should_start = should_start;
> >          return;
> >      }
> >
> >      if (should_start) {
> > - vhost_user_blk_start(vdev);
> > + s->should_start = true;
> > + /* make sure we ignore fake guest kick by
> > + * vhost_dev_enable_notifiers() */
> > + barrier();
> > + ret = vhost_user_blk_start(vdev);
> > + if (ret < 0) {
> > + error_report("vhost-user-blk: vhost start failed: %s",
> > + strerror(-ret));
> > + qemu_chr_fe_disconnect(&s->chardev);
> > + }
> >      } else {
> >          vhost_user_blk_stop(vdev);
> > + /* make sure we ignore fake guest kick by
> > + * vhost_dev_disable_notifiers() */
> > + barrier();
> > + s->should_start = false;
> >      }
> > -
> >  }
> >
> >  static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
> > @@ -218,13 +236,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
> >  static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > - int i;
> > + int i, ret;
> >
> >      if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
> >          !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
>
> This condition looks complicated. Is it correct? I'm not sure but may be
> the second '!' is odd. Sorry, I know that it isn't part of this patch, but it
> looks like this cond was added in one of your old patches.
>

This means we only support the case where the guest driver is virtio 0.9
and the host device is virtio 1.0.
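
To spell out how I read that check (comments are mine, the code itself is
unchanged):

if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&   /* device offers 1.0  */
      !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) { /* guest didn't ack it */
    /* every other combination: don't start here, wait for .set_status() */
    return;
}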

> >          return;
> >      }
> >
> > + if (s->should_start) {
> > + return;
> > + }
> > + s->should_start = true;
> > +
> > + if (!s->connected) {
> > + return;
> > + }
> > +
> >      if (s->dev.started) {
> >          return;
> >      }
> > @@ -232,7 +259,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >      /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> >       * vhost here instead of waiting for .set_status().
> >       */
> > - vhost_user_blk_start(vdev);
> > + ret = vhost_user_blk_start(vdev);
> > + if (ret < 0) {
> > + error_report("vhost-user-blk: vhost start failed: %s",
> > + strerror(-ret));
> > + qemu_chr_fe_disconnect(&s->chardev);
> > + return;
> > + }
> >
> >      /* Kick right away to begin processing requests already in vring */
> >      for (i = 0; i < s->dev.nvqs; i++) {
> > @@ -245,6 +278,106 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >      }
> >  }
> >
> > +static void vhost_user_blk_reset(VirtIODevice *vdev)
> > +{
> > + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > +
> > + if (s->vhost_user) {
> > + vhost_user_inflight_reset(s->vhost_user);
> > + }
> > +}
> > +
> > +static int vhost_user_blk_connect(DeviceState *dev)
> > +{
> > + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > + int ret = 0;
> > +
> > + if (s->connected) {
> > + return 0;
> > + }
> > + s->connected = true;
> > +
> > + s->dev.nvqs = s->num_queues;
> > + s->dev.vqs = s->vqs;
> > + s->dev.vq_index = 0;
> > + s->dev.backend_features = 0;
> > +
> > + vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> > +
> > + ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> > + if (ret < 0) {
> > + error_report("vhost-user-blk: vhost initialization failed: %s",
> > + strerror(-ret));
> > + return ret;
> > + }
> > +
> > + if (s->should_start && !s->dev.started) {
>
> It seems that `!s->dev.started` is always true here because
> vhost_dev_init sets dev.started = 0 on success.
>

Good catch! Will fix it in v2.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (7 preceding siblings ...)
  2018-12-06  9:21 ` Yury Kotov
@ 2018-12-06 13:57 ` Jason Wang
  2018-12-06 13:59   ` Michael S. Tsirkin
                     ` (2 more replies)
  2018-12-13 14:45 ` Michael S. Tsirkin
  9 siblings, 3 replies; 44+ messages in thread
From: Jason Wang @ 2018-12-06 13:57 UTC (permalink / raw)
  To: elohimes, mst, marcandre.lureau
  Cc: nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji


On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patchset is aimed at supporting qemu to reconnect
> vhost-user-blk backend after vhost-user-blk backend crash or
> restart.
>
> The patch 1 tries to implenment the sync connection for
> "reconnect socket".
>
> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> to support offering shared memory to backend to record
> its inflight I/O.
>
> The patch 3,4 are the corresponding libvhost-user patches of
> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>
> The patch 5 supports vhost-user-blk to reconnect backend when
> connection closed.
>
> The patch 6 tells qemu that we support reconnecting now.
>
> To use it, we could start qemu with:
>
> qemu-system-x86_64 \
>          -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>          -device vhost-user-blk-pci,chardev=char0 \
>
> and start vhost-user-blk backend with:
>
> vhost-user-blk -b /path/file -s /path/vhost.socket
>
> Then we can restart vhost-user-blk at any time during VM running.


I wonder whether or not it's better to handle this at the level of the
virtio protocol itself instead of the vhost-user level. E.g. exposing
last_avail_idx to the driver might be sufficient?

Another possible issue is that it looks like you need to deal with
different kinds of ring layouts, e.g. packed virtqueues.

Thanks


>
> Xie Yongji (6):
>    char-socket: Enable "wait" option for client mode
>    vhost-user: Add shared memory to record inflight I/O
>    libvhost-user: Introduce vu_queue_map_desc()
>    libvhost-user: Support recording inflight I/O in shared memory
>    vhost-user-blk: Add support for reconnecting backend
>    contrib/vhost-user-blk: enable inflight I/O recording
>
>   chardev/char-socket.c                   |   5 +-
>   contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
>   contrib/libvhost-user/libvhost-user.h   |  19 +++
>   contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
>   hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
>   hw/virtio/vhost-user.c                  |  69 ++++++++
>   hw/virtio/vhost.c                       |   8 +
>   include/hw/virtio/vhost-backend.h       |   4 +
>   include/hw/virtio/vhost-user-blk.h      |   4 +
>   include/hw/virtio/vhost-user.h          |   8 +
>   10 files changed, 452 insertions(+), 52 deletions(-)
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06 13:57 ` Jason Wang
@ 2018-12-06 13:59   ` Michael S. Tsirkin
  2018-12-10  9:32     ` Jason Wang
  2018-12-06 14:00   ` Jason Wang
  2018-12-07  8:56   ` Yongji Xie
  2 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2018-12-06 13:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: elohimes, marcandre.lureau, nixun, qemu-devel, lilin24,
	zhangyu31, chaiwen, Xie Yongji

On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> 
> On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> > 
> > This patchset is aimed at supporting qemu to reconnect
> > vhost-user-blk backend after vhost-user-blk backend crash or
> > restart.
> > 
> > The patch 1 tries to implenment the sync connection for
> > "reconnect socket".
> > 
> > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > to support offering shared memory to backend to record
> > its inflight I/O.
> > 
> > The patch 3,4 are the corresponding libvhost-user patches of
> > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> > 
> > The patch 5 supports vhost-user-blk to reconnect backend when
> > connection closed.
> > 
> > The patch 6 tells qemu that we support reconnecting now.
> > 
> > To use it, we could start qemu with:
> > 
> > qemu-system-x86_64 \
> >          -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >          -device vhost-user-blk-pci,chardev=char0 \
> > 
> > and start vhost-user-blk backend with:
> > 
> > vhost-user-blk -b /path/file -s /path/vhost.socket
> > 
> > Then we can restart vhost-user-blk at any time during VM running.
> 
> 
> I wonder whether or not it's better to handle this at the level of virtio
> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> driver might be sufficient?
> 
> Another possible issue is, looks like you need to deal with different kinds
> of ring layouts e.g packed virtqueues.
> 
> Thanks

I'm not sure I understand your comments here.
All these would be guest-visible extensions.
Possible for sure but how is this related to
a patch supporting transparent reconnects?

> 
> > 
> > Xie Yongji (6):
> >    char-socket: Enable "wait" option for client mode
> >    vhost-user: Add shared memory to record inflight I/O
> >    libvhost-user: Introduce vu_queue_map_desc()
> >    libvhost-user: Support recording inflight I/O in shared memory
> >    vhost-user-blk: Add support for reconnecting backend
> >    contrib/vhost-user-blk: enable inflight I/O recording
> > 
> >   chardev/char-socket.c                   |   5 +-
> >   contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
> >   contrib/libvhost-user/libvhost-user.h   |  19 +++
> >   contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
> >   hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
> >   hw/virtio/vhost-user.c                  |  69 ++++++++
> >   hw/virtio/vhost.c                       |   8 +
> >   include/hw/virtio/vhost-backend.h       |   4 +
> >   include/hw/virtio/vhost-user-blk.h      |   4 +
> >   include/hw/virtio/vhost-user.h          |   8 +
> >   10 files changed, 452 insertions(+), 52 deletions(-)
> > 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06 13:57 ` Jason Wang
  2018-12-06 13:59   ` Michael S. Tsirkin
@ 2018-12-06 14:00   ` Jason Wang
  2018-12-07  8:56   ` Yongji Xie
  2 siblings, 0 replies; 44+ messages in thread
From: Jason Wang @ 2018-12-06 14:00 UTC (permalink / raw)
  To: elohimes, mst, marcandre.lureau
  Cc: nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji,
	Maxime Coquelin


On 2018/12/6 下午9:57, Jason Wang wrote:
>
> On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
>> From: Xie Yongji <xieyongji@baidu.com>
>>
>> This patchset is aimed at supporting qemu to reconnect
>> vhost-user-blk backend after vhost-user-blk backend crash or
>> restart.
>>
>> The patch 1 tries to implenment the sync connection for
>> "reconnect socket".
>>
>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>> to support offering shared memory to backend to record
>> its inflight I/O.
>>
>> The patch 3,4 are the corresponding libvhost-user patches of
>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>
>> The patch 5 supports vhost-user-blk to reconnect backend when
>> connection closed.
>>
>> The patch 6 tells qemu that we support reconnecting now.
>>
>> To use it, we could start qemu with:
>>
>> qemu-system-x86_64 \
>>          -chardev 
>> socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>          -device vhost-user-blk-pci,chardev=char0 \
>>
>> and start vhost-user-blk backend with:
>>
>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>
>> Then we can restart vhost-user-blk at any time during VM running.
>
>
> I wonder whether or not it's better to handle this at the level of 
> virtio protocol itself instead of vhost-user level. E.g expose 
> last_avail_idx to driver might be sufficient?
>
> Another possible issue is, looks like you need to deal with different 
> kinds of ring layouts e.g packed virtqueues.
>
> Thanks 


Cc Maxime, who wrote vhost-user reconnecting, for more thoughts.

Thanks

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06 13:57 ` Jason Wang
  2018-12-06 13:59   ` Michael S. Tsirkin
  2018-12-06 14:00   ` Jason Wang
@ 2018-12-07  8:56   ` Yongji Xie
  2 siblings, 0 replies; 44+ messages in thread
From: Yongji Xie @ 2018-12-07  8:56 UTC (permalink / raw)
  To: jasowang
  Cc: Michael S. Tsirkin, marcandre.lureau, nixun, qemu-devel, lilin24,
	zhangyu31, chaiwen, Xie Yongji

On Thu, 6 Dec 2018 at 21:57, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patchset is aimed at supporting qemu to reconnect
> > vhost-user-blk backend after vhost-user-blk backend crash or
> > restart.
> >
> > The patch 1 tries to implenment the sync connection for
> > "reconnect socket".
> >
> > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > to support offering shared memory to backend to record
> > its inflight I/O.
> >
> > The patch 3,4 are the corresponding libvhost-user patches of
> > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >
> > The patch 5 supports vhost-user-blk to reconnect backend when
> > connection closed.
> >
> > The patch 6 tells qemu that we support reconnecting now.
> >
> > To use it, we could start qemu with:
> >
> > qemu-system-x86_64 \
> >          -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >          -device vhost-user-blk-pci,chardev=char0 \
> >
> > and start vhost-user-blk backend with:
> >
> > vhost-user-blk -b /path/file -s /path/vhost.socket
> >
> > Then we can restart vhost-user-blk at any time during VM running.
>
>
> I wonder whether or not it's better to handle this at the level of
> virtio protocol itself instead of vhost-user level. E.g expose
> last_avail_idx to driver might be sufficient?
>
> Another possible issue is, looks like you need to deal with different
> kinds of ring layouts e.g packed virtqueues.
>

Yes, we should be able to deal with packed virtqueues. But this should
be handled in the backend rather than in qemu. Qemu just sends a piece of
shared memory to the backend. Then the backend uses it to record inflight
I/O and does I/O replay when necessary.
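
For example, one possible per-queue layout for that shared region could be
something like the sketch below; this is only an illustration of the idea,
not necessarily the exact format used in this series:

#include <stdint.h>

/* Hypothetical layout, for illustration only: one byte per descriptor
 * head, set by the backend when it fetches the head from the avail
 * ring and cleared after the corresponding used element is pushed.
 * After a restart the backend walks this table and resubmits every
 * request that is still marked as inflight. */
typedef struct InflightRegion {
    uint16_t desc_num;      /* number of entries, i.e. the queue size  */
    uint8_t  inflight[];    /* inflight[i] != 0: head i not completed  */
} InflightRegion;

Whatever the exact format, the point is that this state lives in memory
shared with qemu, so it survives a backend crash or restart.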

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06 13:59   ` Michael S. Tsirkin
@ 2018-12-10  9:32     ` Jason Wang
  2018-12-12  2:48       ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Wang @ 2018-12-10  9:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: zhangyu31, Xie Yongji, lilin24, qemu-devel, elohimes, chaiwen,
	marcandre.lureau, nixun


On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
>>> From: Xie Yongji<xieyongji@baidu.com>
>>>
>>> This patchset is aimed at supporting qemu to reconnect
>>> vhost-user-blk backend after vhost-user-blk backend crash or
>>> restart.
>>>
>>> The patch 1 tries to implenment the sync connection for
>>> "reconnect socket".
>>>
>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>>> to support offering shared memory to backend to record
>>> its inflight I/O.
>>>
>>> The patch 3,4 are the corresponding libvhost-user patches of
>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>>
>>> The patch 5 supports vhost-user-blk to reconnect backend when
>>> connection closed.
>>>
>>> The patch 6 tells qemu that we support reconnecting now.
>>>
>>> To use it, we could start qemu with:
>>>
>>> qemu-system-x86_64 \
>>>           -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>>           -device vhost-user-blk-pci,chardev=char0 \
>>>
>>> and start vhost-user-blk backend with:
>>>
>>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>>
>>> Then we can restart vhost-user-blk at any time during VM running.
>> I wonder whether or not it's better to handle this at the level of virtio
>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
>> driver might be sufficient?
>>
>> Another possible issue is, looks like you need to deal with different kinds
>> of ring layouts e.g packed virtqueues.
>>
>> Thanks
> I'm not sure I understand your comments here.
> All these would be guest-visible extensions.


Looks like it's not; it only introduces a piece of shared memory between
qemu and the vhost-user backend?


> Possible for sure but how is this related to
> a patch supporting transparent reconnects?


I might be missing something. My understanding is that we support
transparent reconnects, but we can't deduce an accurate last_avail_idx,
and that is the capability this series tries to add. To me, this series
is functionally equivalent to exposing last_avail_idx (or avail_idx_cons)
in the available ring. Then the information is inside guest memory, and
the vhost-user backend can access it and update it directly. I believe
some modern NICs do this as well (but the index is in an MMIO area, of
course).

Thanks

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-10  9:32     ` Jason Wang
@ 2018-12-12  2:48       ` Yongji Xie
  2018-12-12  3:00         ` Jason Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Yongji Xie @ 2018-12-12  2:48 UTC (permalink / raw)
  To: jasowang
  Cc: Michael S. Tsirkin, zhangyu31, Xie Yongji, lilin24, qemu-devel,
	chaiwen, marcandre.lureau, nixun

On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> > On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
> >>> From: Xie Yongji<xieyongji@baidu.com>
> >>>
> >>> This patchset is aimed at supporting qemu to reconnect
> >>> vhost-user-blk backend after vhost-user-blk backend crash or
> >>> restart.
> >>>
> >>> The patch 1 tries to implenment the sync connection for
> >>> "reconnect socket".
> >>>
> >>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> >>> to support offering shared memory to backend to record
> >>> its inflight I/O.
> >>>
> >>> The patch 3,4 are the corresponding libvhost-user patches of
> >>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >>>
> >>> The patch 5 supports vhost-user-blk to reconnect backend when
> >>> connection closed.
> >>>
> >>> The patch 6 tells qemu that we support reconnecting now.
> >>>
> >>> To use it, we could start qemu with:
> >>>
> >>> qemu-system-x86_64 \
> >>>           -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >>>           -device vhost-user-blk-pci,chardev=char0 \
> >>>
> >>> and start vhost-user-blk backend with:
> >>>
> >>> vhost-user-blk -b /path/file -s /path/vhost.socket
> >>>
> >>> Then we can restart vhost-user-blk at any time during VM running.
> >> I wonder whether or not it's better to handle this at the level of virtio
> >> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> >> driver might be sufficient?
> >>
> >> Another possible issue is, looks like you need to deal with different kinds
> >> of ring layouts e.g packed virtqueues.
> >>
> >> Thanks
> > I'm not sure I understand your comments here.
> > All these would be guest-visible extensions.
>
>
> Looks not, it only introduces a shared memory between qemu and
> vhost-user backend?
>
>
> > Possible for sure but how is this related to
> > a patch supporting transparent reconnects?
>
>
> I might miss something. My understanding is that we support transparent
> reconnects, but we can't deduce an accurate last_avail_idx and this is
> what capability this series try to add. To me, this series is functional
> equivalent to expose last_avail_idx (or avail_idx_cons) in available
> ring. So the information is inside guest memory, vhost-user backend can
> access it and update it directly. I believe this is some modern NIC did
> as well (but index is in MMIO area of course).
>

Hi Jason,

If my understanding is correct, it might not be enough to only expose
last_avail_idx, because we do not always process descriptors in the same
order in which they were made available. If so, we can't get correct
inflight I/O information from the available ring.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-12  2:48       ` Yongji Xie
@ 2018-12-12  3:00         ` Jason Wang
  2018-12-12  3:21           ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Wang @ 2018-12-12  3:00 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, zhangyu31, Xie Yongji, lilin24, qemu-devel,
	chaiwen, marcandre.lureau, nixun


On 2018/12/12 上午10:48, Yongji Xie wrote:
> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
>>>>> From: Xie Yongji<xieyongji@baidu.com>
>>>>>
>>>>> This patchset is aimed at supporting qemu to reconnect
>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
>>>>> restart.
>>>>>
>>>>> The patch 1 tries to implenment the sync connection for
>>>>> "reconnect socket".
>>>>>
>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>>>>> to support offering shared memory to backend to record
>>>>> its inflight I/O.
>>>>>
>>>>> The patch 3,4 are the corresponding libvhost-user patches of
>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>>>>
>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
>>>>> connection closed.
>>>>>
>>>>> The patch 6 tells qemu that we support reconnecting now.
>>>>>
>>>>> To use it, we could start qemu with:
>>>>>
>>>>> qemu-system-x86_64 \
>>>>>            -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>>>>            -device vhost-user-blk-pci,chardev=char0 \
>>>>>
>>>>> and start vhost-user-blk backend with:
>>>>>
>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>>>>
>>>>> Then we can restart vhost-user-blk at any time during VM running.
>>>> I wonder whether or not it's better to handle this at the level of virtio
>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
>>>> driver might be sufficient?
>>>>
>>>> Another possible issue is, looks like you need to deal with different kinds
>>>> of ring layouts e.g packed virtqueues.
>>>>
>>>> Thanks
>>> I'm not sure I understand your comments here.
>>> All these would be guest-visible extensions.
>>
>> Looks not, it only introduces a shared memory between qemu and
>> vhost-user backend?
>>
>>
>>> Possible for sure but how is this related to
>>> a patch supporting transparent reconnects?
>>
>> I might miss something. My understanding is that we support transparent
>> reconnects, but we can't deduce an accurate last_avail_idx and this is
>> what capability this series try to add. To me, this series is functional
>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
>> ring. So the information is inside guest memory, vhost-user backend can
>> access it and update it directly. I believe this is some modern NIC did
>> as well (but index is in MMIO area of course).
>>
> Hi Jason,
>
> If my understand is correct, it might be not enough to only expose
> last_avail_idx.
> Because we would not process descriptors in the same order in which they have
> been made available sometimes. If so, we can't get correct inflight
> I/O information
> from available ring.


You can get this with the help of both the used ring and last_avail_idx, 
I believe. Or maybe you can give us an example?

Thanks


>
> Thanks,
> Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-12  3:00         ` Jason Wang
@ 2018-12-12  3:21           ` Yongji Xie
  2018-12-12  4:06             ` Jason Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Yongji Xie @ 2018-12-12  3:21 UTC (permalink / raw)
  To: jasowang
  Cc: Michael S. Tsirkin, zhangyu31, Xie Yongji, lilin24, qemu-devel,
	chaiwen, marcandre.lureau, nixun

On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/12 上午10:48, Yongji Xie wrote:
> > On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> >>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
> >>>>> From: Xie Yongji<xieyongji@baidu.com>
> >>>>>
> >>>>> This patchset is aimed at supporting qemu to reconnect
> >>>>> vhost-user-blk backend after vhost-user-blk backend crash or
> >>>>> restart.
> >>>>>
> >>>>> The patch 1 tries to implenment the sync connection for
> >>>>> "reconnect socket".
> >>>>>
> >>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> >>>>> to support offering shared memory to backend to record
> >>>>> its inflight I/O.
> >>>>>
> >>>>> The patch 3,4 are the corresponding libvhost-user patches of
> >>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >>>>>
> >>>>> The patch 5 supports vhost-user-blk to reconnect backend when
> >>>>> connection closed.
> >>>>>
> >>>>> The patch 6 tells qemu that we support reconnecting now.
> >>>>>
> >>>>> To use it, we could start qemu with:
> >>>>>
> >>>>> qemu-system-x86_64 \
> >>>>>            -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >>>>>            -device vhost-user-blk-pci,chardev=char0 \
> >>>>>
> >>>>> and start vhost-user-blk backend with:
> >>>>>
> >>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
> >>>>>
> >>>>> Then we can restart vhost-user-blk at any time during VM running.
> >>>> I wonder whether or not it's better to handle this at the level of virtio
> >>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> >>>> driver might be sufficient?
> >>>>
> >>>> Another possible issue is, looks like you need to deal with different kinds
> >>>> of ring layouts e.g packed virtqueues.
> >>>>
> >>>> Thanks
> >>> I'm not sure I understand your comments here.
> >>> All these would be guest-visible extensions.
> >>
> >> Looks not, it only introduces a shared memory between qemu and
> >> vhost-user backend?
> >>
> >>
> >>> Possible for sure but how is this related to
> >>> a patch supporting transparent reconnects?
> >>
> >> I might miss something. My understanding is that we support transparent
> >> reconnects, but we can't deduce an accurate last_avail_idx and this is
> >> what capability this series try to add. To me, this series is functional
> >> equivalent to expose last_avail_idx (or avail_idx_cons) in available
> >> ring. So the information is inside guest memory, vhost-user backend can
> >> access it and update it directly. I believe this is some modern NIC did
> >> as well (but index is in MMIO area of course).
> >>
> > Hi Jason,
> >
> > If my understand is correct, it might be not enough to only expose
> > last_avail_idx.
> > Because we would not process descriptors in the same order in which they have
> > been made available sometimes. If so, we can't get correct inflight
> > I/O information
> > from available ring.
>
>
> You can get this with the help of the both used ring and last_avail_idx
> I believe. Or maybe you can give us an example?
>

A simple example, assuming a ring size of 8:

1. guest fills the avail ring

avail ring: 0 1 2 3 4 5 6 7
used ring:

2. vhost-user backend completes 4,5,6,7 and fills the used ring

avail ring: 0 1 2 3 4 5 6 7
used ring: 4 5 6 7

3. guest fills the avail ring again

avail ring: 4 5 6 7 4 5 6 7
used ring: 4 5 6 7

4. vhost-user backend crashes

The inflight descriptors 0, 1, 2, 3 are lost.
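
(To make the recovery concrete: below is a minimal sketch of the kind of
per-virtqueue in-flight table a backend could keep in the shared memory
region. The struct and function names are only assumptions for the sake
of this example, not the layout this series actually defines.)

#include <stdint.h>

struct inflight_region {
    uint16_t version;      /* backend-defined layout version */
    uint16_t desc_num;     /* ring size, 8 in the example above */
    uint8_t  inflight[];   /* one flag per descriptor id */
};

/* Backend marks a request in flight when it pops the head from the
 * avail ring (a real implementation also needs proper ordering, e.g.
 * making sure the flag is visible before the request is submitted)... */
static void mark_inflight(struct inflight_region *r, uint16_t head)
{
    r->inflight[head] = 1;
}

/* ...and clears the flag only after the used ring entry is written. */
static void mark_done(struct inflight_region *r, uint16_t head)
{
    r->inflight[head] = 0;
}

After the crash in step 4, a restarted backend would still find
inflight[0..3] == 1 in the shared region and could resubmit those
requests before consuming any new avail entries.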

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-12  3:21           ` Yongji Xie
@ 2018-12-12  4:06             ` Jason Wang
  2018-12-12  6:41               ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Wang @ 2018-12-12  4:06 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, zhangyu31, Xie Yongji, lilin24, qemu-devel,
	chaiwen, marcandre.lureau, nixun


On 2018/12/12 11:21 AM, Yongji Xie wrote:
> On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2018/12/12 上午10:48, Yongji Xie wrote:
>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
>>>>>>> From: Xie Yongji<xieyongji@baidu.com>
>>>>>>>
>>>>>>> This patchset is aimed at supporting qemu to reconnect
>>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
>>>>>>> restart.
>>>>>>>
>>>>>>> The patch 1 tries to implenment the sync connection for
>>>>>>> "reconnect socket".
>>>>>>>
>>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>>>>>>> to support offering shared memory to backend to record
>>>>>>> its inflight I/O.
>>>>>>>
>>>>>>> The patch 3,4 are the corresponding libvhost-user patches of
>>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>>>>>>
>>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
>>>>>>> connection closed.
>>>>>>>
>>>>>>> The patch 6 tells qemu that we support reconnecting now.
>>>>>>>
>>>>>>> To use it, we could start qemu with:
>>>>>>>
>>>>>>> qemu-system-x86_64 \
>>>>>>>             -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>>>>>>             -device vhost-user-blk-pci,chardev=char0 \
>>>>>>>
>>>>>>> and start vhost-user-blk backend with:
>>>>>>>
>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>>>>>>
>>>>>>> Then we can restart vhost-user-blk at any time during VM running.
>>>>>> I wonder whether or not it's better to handle this at the level of virtio
>>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
>>>>>> driver might be sufficient?
>>>>>>
>>>>>> Another possible issue is, looks like you need to deal with different kinds
>>>>>> of ring layouts e.g packed virtqueues.
>>>>>>
>>>>>> Thanks
>>>>> I'm not sure I understand your comments here.
>>>>> All these would be guest-visible extensions.
>>>> Looks not, it only introduces a shared memory between qemu and
>>>> vhost-user backend?
>>>>
>>>>
>>>>> Possible for sure but how is this related to
>>>>> a patch supporting transparent reconnects?
>>>> I might miss something. My understanding is that we support transparent
>>>> reconnects, but we can't deduce an accurate last_avail_idx and this is
>>>> what capability this series try to add. To me, this series is functional
>>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
>>>> ring. So the information is inside guest memory, vhost-user backend can
>>>> access it and update it directly. I believe this is some modern NIC did
>>>> as well (but index is in MMIO area of course).
>>>>
>>> Hi Jason,
>>>
>>> If my understand is correct, it might be not enough to only expose
>>> last_avail_idx.
>>> Because we would not process descriptors in the same order in which they have
>>> been made available sometimes. If so, we can't get correct inflight
>>> I/O information
>>> from available ring.
>>
>> You can get this with the help of the both used ring and last_avail_idx
>> I believe. Or maybe you can give us an example?
>>
> A simple example, we assume ring size is 8:
>
> 1. guest fill avail ring
>
> avail ring: 0 1 2 3 4 5 6 7
> used ring:
>
> 2. vhost-user backend complete 4,5,6,7 and fill used ring
>
> avail ring: 0 1 2 3 4 5 6 7
> used ring: 4 5 6 7
>
> 3. guest fill avail ring again
>
> avail ring: 4 5 6 7 4 5 6 7
> used ring: 4 5 6 7
>
> 4. vhost-user backend crash
>
> The inflight descriptors 0, 1, 2, 3 lost.
>
> Thanks,
> Yongji


Ok, then we can simply forbid increasing the avail_idx in this case?

Basically, it's a question of whether or not it's better to do it at the 
level of virtio instead of vhost. I'm pretty sure that if we expose 
sufficient information, it could be done without touching vhost-user, 
and we won't have to deal with e.g. migration and other cases.

Thanks

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-12  4:06             ` Jason Wang
@ 2018-12-12  6:41               ` Yongji Xie
  2018-12-12  7:47                 ` Jason Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Yongji Xie @ 2018-12-12  6:41 UTC (permalink / raw)
  To: jasowang
  Cc: Michael S. Tsirkin, zhangyu31, Xie Yongji, lilin24, qemu-devel,
	chaiwen, marcandre.lureau, nixun

On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/12 上午11:21, Yongji Xie wrote:
> > On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2018/12/12 上午10:48, Yongji Xie wrote:
> >>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> >>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
> >>>>>>> From: Xie Yongji<xieyongji@baidu.com>
> >>>>>>>
> >>>>>>> This patchset is aimed at supporting qemu to reconnect
> >>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
> >>>>>>> restart.
> >>>>>>>
> >>>>>>> The patch 1 tries to implenment the sync connection for
> >>>>>>> "reconnect socket".
> >>>>>>>
> >>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> >>>>>>> to support offering shared memory to backend to record
> >>>>>>> its inflight I/O.
> >>>>>>>
> >>>>>>> The patch 3,4 are the corresponding libvhost-user patches of
> >>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >>>>>>>
> >>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
> >>>>>>> connection closed.
> >>>>>>>
> >>>>>>> The patch 6 tells qemu that we support reconnecting now.
> >>>>>>>
> >>>>>>> To use it, we could start qemu with:
> >>>>>>>
> >>>>>>> qemu-system-x86_64 \
> >>>>>>>             -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >>>>>>>             -device vhost-user-blk-pci,chardev=char0 \
> >>>>>>>
> >>>>>>> and start vhost-user-blk backend with:
> >>>>>>>
> >>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
> >>>>>>>
> >>>>>>> Then we can restart vhost-user-blk at any time during VM running.
> >>>>>> I wonder whether or not it's better to handle this at the level of virtio
> >>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> >>>>>> driver might be sufficient?
> >>>>>>
> >>>>>> Another possible issue is, looks like you need to deal with different kinds
> >>>>>> of ring layouts e.g packed virtqueues.
> >>>>>>
> >>>>>> Thanks
> >>>>> I'm not sure I understand your comments here.
> >>>>> All these would be guest-visible extensions.
> >>>> Looks not, it only introduces a shared memory between qemu and
> >>>> vhost-user backend?
> >>>>
> >>>>
> >>>>> Possible for sure but how is this related to
> >>>>> a patch supporting transparent reconnects?
> >>>> I might miss something. My understanding is that we support transparent
> >>>> reconnects, but we can't deduce an accurate last_avail_idx and this is
> >>>> what capability this series try to add. To me, this series is functional
> >>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
> >>>> ring. So the information is inside guest memory, vhost-user backend can
> >>>> access it and update it directly. I believe this is some modern NIC did
> >>>> as well (but index is in MMIO area of course).
> >>>>
> >>> Hi Jason,
> >>>
> >>> If my understand is correct, it might be not enough to only expose
> >>> last_avail_idx.
> >>> Because we would not process descriptors in the same order in which they have
> >>> been made available sometimes. If so, we can't get correct inflight
> >>> I/O information
> >>> from available ring.
> >>
> >> You can get this with the help of the both used ring and last_avail_idx
> >> I believe. Or maybe you can give us an example?
> >>
> > A simple example, we assume ring size is 8:
> >
> > 1. guest fill avail ring
> >
> > avail ring: 0 1 2 3 4 5 6 7
> > used ring:
> >
> > 2. vhost-user backend complete 4,5,6,7 and fill used ring
> >
> > avail ring: 0 1 2 3 4 5 6 7
> > used ring: 4 5 6 7
> >
> > 3. guest fill avail ring again
> >
> > avail ring: 4 5 6 7 4 5 6 7
> > used ring: 4 5 6 7
> >
> > 4. vhost-user backend crash
> >
> > The inflight descriptors 0, 1, 2, 3 lost.
> >
> > Thanks,
> > Yongji
>
>
> Ok, then we can simply forbid increasing the avail_idx in this case?
>
> Basically, it's a question of whether or not it's better to done it in
> the level of virtio instead of vhost. I'm pretty sure if we expose
> sufficient information, it could be done without touching vhost-user.
> And we won't deal with e.g migration and other cases.
>

OK, I get your point. That's indeed an alternative way. But this feature
seems to be only useful to the vhost-user backend. I'm not sure whether it
makes sense to touch the virtio protocol for this feature.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-12  6:41               ` Yongji Xie
@ 2018-12-12  7:47                 ` Jason Wang
  2018-12-12  9:18                   ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Wang @ 2018-12-12  7:47 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, zhangyu31, Xie Yongji, lilin24, qemu-devel,
	chaiwen, marcandre.lureau, nixun


On 2018/12/12 2:41 PM, Yongji Xie wrote:
> On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2018/12/12 上午11:21, Yongji Xie wrote:
>>> On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2018/12/12 上午10:48, Yongji Xie wrote:
>>>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
>>>>>>>>> From: Xie Yongji<xieyongji@baidu.com>
>>>>>>>>>
>>>>>>>>> This patchset is aimed at supporting qemu to reconnect
>>>>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
>>>>>>>>> restart.
>>>>>>>>>
>>>>>>>>> The patch 1 tries to implenment the sync connection for
>>>>>>>>> "reconnect socket".
>>>>>>>>>
>>>>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>>>>>>>>> to support offering shared memory to backend to record
>>>>>>>>> its inflight I/O.
>>>>>>>>>
>>>>>>>>> The patch 3,4 are the corresponding libvhost-user patches of
>>>>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>>>>>>>>
>>>>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
>>>>>>>>> connection closed.
>>>>>>>>>
>>>>>>>>> The patch 6 tells qemu that we support reconnecting now.
>>>>>>>>>
>>>>>>>>> To use it, we could start qemu with:
>>>>>>>>>
>>>>>>>>> qemu-system-x86_64 \
>>>>>>>>>              -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>>>>>>>>              -device vhost-user-blk-pci,chardev=char0 \
>>>>>>>>>
>>>>>>>>> and start vhost-user-blk backend with:
>>>>>>>>>
>>>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>>>>>>>>
>>>>>>>>> Then we can restart vhost-user-blk at any time during VM running.
>>>>>>>> I wonder whether or not it's better to handle this at the level of virtio
>>>>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
>>>>>>>> driver might be sufficient?
>>>>>>>>
>>>>>>>> Another possible issue is, looks like you need to deal with different kinds
>>>>>>>> of ring layouts e.g packed virtqueues.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> I'm not sure I understand your comments here.
>>>>>>> All these would be guest-visible extensions.
>>>>>> Looks not, it only introduces a shared memory between qemu and
>>>>>> vhost-user backend?
>>>>>>
>>>>>>
>>>>>>> Possible for sure but how is this related to
>>>>>>> a patch supporting transparent reconnects?
>>>>>> I might miss something. My understanding is that we support transparent
>>>>>> reconnects, but we can't deduce an accurate last_avail_idx and this is
>>>>>> what capability this series try to add. To me, this series is functional
>>>>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
>>>>>> ring. So the information is inside guest memory, vhost-user backend can
>>>>>> access it and update it directly. I believe this is some modern NIC did
>>>>>> as well (but index is in MMIO area of course).
>>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> If my understand is correct, it might be not enough to only expose
>>>>> last_avail_idx.
>>>>> Because we would not process descriptors in the same order in which they have
>>>>> been made available sometimes. If so, we can't get correct inflight
>>>>> I/O information
>>>>> from available ring.
>>>> You can get this with the help of the both used ring and last_avail_idx
>>>> I believe. Or maybe you can give us an example?
>>>>
>>> A simple example, we assume ring size is 8:
>>>
>>> 1. guest fill avail ring
>>>
>>> avail ring: 0 1 2 3 4 5 6 7
>>> used ring:
>>>
>>> 2. vhost-user backend complete 4,5,6,7 and fill used ring
>>>
>>> avail ring: 0 1 2 3 4 5 6 7
>>> used ring: 4 5 6 7
>>>
>>> 3. guest fill avail ring again
>>>
>>> avail ring: 4 5 6 7 4 5 6 7
>>> used ring: 4 5 6 7
>>>
>>> 4. vhost-user backend crash
>>>
>>> The inflight descriptors 0, 1, 2, 3 lost.
>>>
>>> Thanks,
>>> Yongji
>>
>> Ok, then we can simply forbid increasing the avail_idx in this case?
>>
>> Basically, it's a question of whether or not it's better to done it in
>> the level of virtio instead of vhost. I'm pretty sure if we expose
>> sufficient information, it could be done without touching vhost-user.
>> And we won't deal with e.g migration and other cases.
>>
> OK, I get your point. That's indeed an alternative way. But this feature seems
> to be only useful to vhost-user backend.


I admit I could not think of a use case other than vhost-user.


>   I'm not sure whether it make sense to
> touch virtio protocol for this feature.


Some possible advantages:

- Feature could be determined and noticed by the user or management layer.

- There's no need to invent a ring-layout-specific protocol to record 
in-flight descriptors. E.g. if my understanding is correct, for this series 
and for the example above, it still cannot work for packed virtqueues, 
since the descriptor id is not sufficient (the descriptor could be 
overwritten by a used one). You probably need to keep a (partial) copy of 
the descriptor ring for this (see the sketch after this list).

- No need to deal with migration, all information is in guest memory.
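
(For the packed-ring case above, a sketch of what such a partial copy could
look like -- purely hypothetical names, since this series does not define a
packed-ring layout:)

#include <stdint.h>

/* The descriptor slot in a packed ring may be overwritten by a used
 * element, so the id alone is not enough; the backend would have to
 * save its own copy of each descriptor it fetches. */
struct inflight_desc_copy {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

struct packed_inflight {
    uint16_t version;
    uint16_t desc_num;
    struct inflight_desc_copy desc[];  /* filled when a descriptor is
                                          fetched, invalidated when the
                                          corresponding used element is
                                          written */
};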

Thanks

>
> Thanks,
> Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-12  7:47                 ` Jason Wang
@ 2018-12-12  9:18                   ` Yongji Xie
  2018-12-13  2:58                     ` Jason Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Yongji Xie @ 2018-12-12  9:18 UTC (permalink / raw)
  To: jasowang
  Cc: Michael S. Tsirkin, zhangyu31, Xie Yongji, lilin24, qemu-devel,
	chaiwen, marcandre.lureau, nixun

On Wed, 12 Dec 2018 at 15:47, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/12 下午2:41, Yongji Xie wrote:
> > On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2018/12/12 上午11:21, Yongji Xie wrote:
> >>> On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2018/12/12 上午10:48, Yongji Xie wrote:
> >>>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> >>>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >>>>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
> >>>>>>>>> From: Xie Yongji<xieyongji@baidu.com>
> >>>>>>>>>
> >>>>>>>>> This patchset is aimed at supporting qemu to reconnect
> >>>>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
> >>>>>>>>> restart.
> >>>>>>>>>
> >>>>>>>>> The patch 1 tries to implenment the sync connection for
> >>>>>>>>> "reconnect socket".
> >>>>>>>>>
> >>>>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> >>>>>>>>> to support offering shared memory to backend to record
> >>>>>>>>> its inflight I/O.
> >>>>>>>>>
> >>>>>>>>> The patch 3,4 are the corresponding libvhost-user patches of
> >>>>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >>>>>>>>>
> >>>>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
> >>>>>>>>> connection closed.
> >>>>>>>>>
> >>>>>>>>> The patch 6 tells qemu that we support reconnecting now.
> >>>>>>>>>
> >>>>>>>>> To use it, we could start qemu with:
> >>>>>>>>>
> >>>>>>>>> qemu-system-x86_64 \
> >>>>>>>>>              -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >>>>>>>>>              -device vhost-user-blk-pci,chardev=char0 \
> >>>>>>>>>
> >>>>>>>>> and start vhost-user-blk backend with:
> >>>>>>>>>
> >>>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
> >>>>>>>>>
> >>>>>>>>> Then we can restart vhost-user-blk at any time during VM running.
> >>>>>>>> I wonder whether or not it's better to handle this at the level of virtio
> >>>>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> >>>>>>>> driver might be sufficient?
> >>>>>>>>
> >>>>>>>> Another possible issue is, looks like you need to deal with different kinds
> >>>>>>>> of ring layouts e.g packed virtqueues.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>> I'm not sure I understand your comments here.
> >>>>>>> All these would be guest-visible extensions.
> >>>>>> Looks not, it only introduces a shared memory between qemu and
> >>>>>> vhost-user backend?
> >>>>>>
> >>>>>>
> >>>>>>> Possible for sure but how is this related to
> >>>>>>> a patch supporting transparent reconnects?
> >>>>>> I might miss something. My understanding is that we support transparent
> >>>>>> reconnects, but we can't deduce an accurate last_avail_idx and this is
> >>>>>> what capability this series try to add. To me, this series is functional
> >>>>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
> >>>>>> ring. So the information is inside guest memory, vhost-user backend can
> >>>>>> access it and update it directly. I believe this is some modern NIC did
> >>>>>> as well (but index is in MMIO area of course).
> >>>>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> If my understand is correct, it might be not enough to only expose
> >>>>> last_avail_idx.
> >>>>> Because we would not process descriptors in the same order in which they have
> >>>>> been made available sometimes. If so, we can't get correct inflight
> >>>>> I/O information
> >>>>> from available ring.
> >>>> You can get this with the help of the both used ring and last_avail_idx
> >>>> I believe. Or maybe you can give us an example?
> >>>>
> >>> A simple example, we assume ring size is 8:
> >>>
> >>> 1. guest fill avail ring
> >>>
> >>> avail ring: 0 1 2 3 4 5 6 7
> >>> used ring:
> >>>
> >>> 2. vhost-user backend complete 4,5,6,7 and fill used ring
> >>>
> >>> avail ring: 0 1 2 3 4 5 6 7
> >>> used ring: 4 5 6 7
> >>>
> >>> 3. guest fill avail ring again
> >>>
> >>> avail ring: 4 5 6 7 4 5 6 7
> >>> used ring: 4 5 6 7
> >>>
> >>> 4. vhost-user backend crash
> >>>
> >>> The inflight descriptors 0, 1, 2, 3 lost.
> >>>
> >>> Thanks,
> >>> Yongji
> >>
> >> Ok, then we can simply forbid increasing the avail_idx in this case?
> >>
> >> Basically, it's a question of whether or not it's better to done it in
> >> the level of virtio instead of vhost. I'm pretty sure if we expose
> >> sufficient information, it could be done without touching vhost-user.
> >> And we won't deal with e.g migration and other cases.
> >>
> > OK, I get your point. That's indeed an alternative way. But this feature seems
> > to be only useful to vhost-user backend.
>
>
> I admit I could not think of a use case other than vhost-user.
>
>
> >   I'm not sure whether it make sense to
> > touch virtio protocol for this feature.
>
>
> Some possible advantages:
>
> - Feature could be determined and noticed by user or management layer.
>
> - There's no need to invent ring layout specific protocol to record in
> flight descriptors. E.g if my understanding is correct, for this series
> and for the example above, it still can not work for packed virtqueue
> since descriptor id is not sufficient (descriptor could be overwritten
> by used one). You probably need to have a (partial) copy of descriptor
> ring for this.
>
> - No need to deal with migration, all information was in guest memory.
>

Yes, we have those advantages. But it seems like handling this at the
vhost-user level could be easier to maintain in a production environment.
We can support old guests, and bug fixes will not depend on guest kernel
updates.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-12  9:18                   ` Yongji Xie
@ 2018-12-13  2:58                     ` Jason Wang
  2018-12-13  3:41                       ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Wang @ 2018-12-13  2:58 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, zhangyu31, Xie Yongji, lilin24, qemu-devel,
	chaiwen, marcandre.lureau, nixun


On 2018/12/12 5:18 PM, Yongji Xie wrote:
>>>> Ok, then we can simply forbid increasing the avail_idx in this case?
>>>>
>>>> Basically, it's a question of whether or not it's better to done it in
>>>> the level of virtio instead of vhost. I'm pretty sure if we expose
>>>> sufficient information, it could be done without touching vhost-user.
>>>> And we won't deal with e.g migration and other cases.
>>>>
>>> OK, I get your point. That's indeed an alternative way. But this feature seems
>>> to be only useful to vhost-user backend.
>> I admit I could not think of a use case other than vhost-user.
>>
>>
>>>    I'm not sure whether it make sense to
>>> touch virtio protocol for this feature.
>> Some possible advantages:
>>
>> - Feature could be determined and noticed by user or management layer.
>>
>> - There's no need to invent ring layout specific protocol to record in
>> flight descriptors. E.g if my understanding is correct, for this series
>> and for the example above, it still can not work for packed virtqueue
>> since descriptor id is not sufficient (descriptor could be overwritten
>> by used one). You probably need to have a (partial) copy of descriptor
>> ring for this.
>>
>> - No need to deal with migration, all information was in guest memory.
>>
> Yes, we have those advantages. But seems like handle this in vhost-user
> level could be easier to be maintained in production environment. We can
> support old guest. And the bug fix will not depend on guest kernel updating.


Yes. But my main concern is the layout-specific data structure. If 
it could be done through a generic structure (can it?), it would be 
fine. Otherwise, I believe we don't want another negotiation about what 
kind of layout the backend supports for reconnect.

Thanks


>
> Thanks,
> Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-13  2:58                     ` Jason Wang
@ 2018-12-13  3:41                       ` Yongji Xie
  2018-12-13 14:56                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 44+ messages in thread
From: Yongji Xie @ 2018-12-13  3:41 UTC (permalink / raw)
  To: jasowang
  Cc: Michael S. Tsirkin, zhangyu31, Xie Yongji, lilin24, qemu-devel,
	chaiwen, marcandre.lureau, nixun

On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/12 下午5:18, Yongji Xie wrote:
> >>>> Ok, then we can simply forbid increasing the avail_idx in this case?
> >>>>
> >>>> Basically, it's a question of whether or not it's better to done it in
> >>>> the level of virtio instead of vhost. I'm pretty sure if we expose
> >>>> sufficient information, it could be done without touching vhost-user.
> >>>> And we won't deal with e.g migration and other cases.
> >>>>
> >>> OK, I get your point. That's indeed an alternative way. But this feature seems
> >>> to be only useful to vhost-user backend.
> >> I admit I could not think of a use case other than vhost-user.
> >>
> >>
> >>>    I'm not sure whether it make sense to
> >>> touch virtio protocol for this feature.
> >> Some possible advantages:
> >>
> >> - Feature could be determined and noticed by user or management layer.
> >>
> >> - There's no need to invent ring layout specific protocol to record in
> >> flight descriptors. E.g if my understanding is correct, for this series
> >> and for the example above, it still can not work for packed virtqueue
> >> since descriptor id is not sufficient (descriptor could be overwritten
> >> by used one). You probably need to have a (partial) copy of descriptor
> >> ring for this.
> >>
> >> - No need to deal with migration, all information was in guest memory.
> >>
> > Yes, we have those advantages. But seems like handle this in vhost-user
> > level could be easier to be maintained in production environment. We can
> > support old guest. And the bug fix will not depend on guest kernel updating.
>
>
> Yes. But the my main concern is the layout specific data structure. If
> it could be done through a generic structure (can it?), it would be
> fine. Otherwise, I believe we don't want another negotiation about what
> kind of layout that backend support for reconnect.
>

Yes, the current layout in shared memory doesn't support packed virtqueues,
because the information about a descriptor in the descriptor ring is no
longer available once the device has fetched it.

I also thought about a generic structure before, but I failed... So I tried
another way to achieve that in this series: on the QEMU side, we just
provide a shared memory region to the backend and don't define anything
about its contents; on the backend side, the backend knows how to use that
memory to record inflight I/O no matter what kind of virtqueue it uses.
Thus, if the virtqueue is updated for a new virtio spec in the future, we
don't need to touch QEMU or the guest. What do you think about it?
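
(A rough sketch of what such a backend-private, versioned layout could look
like -- all names here are assumptions for illustration, nothing in this
series defines them; QEMU would only mmap() the region and hand the same
bytes back on reconnect:)

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MY_INFLIGHT_VERSION 1

struct my_inflight_hdr {
    uint16_t version;   /* 0 means a freshly allocated (zeroed) region */
    uint16_t desc_num;
    /* backend-private, ring-layout-specific data follows */
};

static bool inflight_region_usable(void *addr, size_t size)
{
    struct my_inflight_hdr *hdr = addr;

    if (size < sizeof(*hdr)) {
        return false;
    }
    if (hdr->version == 0) {
        /* first connection: take ownership of the region */
        hdr->version = MY_INFLIGHT_VERSION;
        return true;
    }
    /* reconnect: only trust a layout this backend understands */
    return hdr->version == MY_INFLIGHT_VERSION;
}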

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (8 preceding siblings ...)
  2018-12-06 13:57 ` Jason Wang
@ 2018-12-13 14:45 ` Michael S. Tsirkin
  2018-12-14  1:56   ` Yongji Xie
  9 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2018-12-13 14:45 UTC (permalink / raw)
  To: elohimes
  Cc: marcandre.lureau, qemu-devel, zhangyu31, chaiwen, nixun, lilin24,
	Xie Yongji

On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
> 
> This patchset is aimed at supporting qemu to reconnect
> vhost-user-blk backend after vhost-user-blk backend crash or
> restart.
> 
> The patch 1 tries to implenment the sync connection for
> "reconnect socket".
> 
> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> to support offering shared memory to backend to record
> its inflight I/O.
> 
> The patch 3,4 are the corresponding libvhost-user patches of
> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> 
> The patch 5 supports vhost-user-blk to reconnect backend when
> connection closed.
> 
> The patch 6 tells qemu that we support reconnecting now.
> 
> To use it, we could start qemu with:
> 
> qemu-system-x86_64 \
>         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>         -device vhost-user-blk-pci,chardev=char0 \
> 
> and start vhost-user-blk backend with:
> 
> vhost-user-blk -b /path/file -s /path/vhost.socket
> 
> Then we can restart vhost-user-blk at any time during VM running.
> 
> Xie Yongji (6):
>   char-socket: Enable "wait" option for client mode
>   vhost-user: Add shared memory to record inflight I/O
>   libvhost-user: Introduce vu_queue_map_desc()
>   libvhost-user: Support recording inflight I/O in shared memory
>   vhost-user-blk: Add support for reconnecting backend
>   contrib/vhost-user-blk: enable inflight I/O recording

What is missing in all this is documentation.
Specifically docs/interop/vhost-user.txt.

At a high level the design is IMO a good one.

However I would prefer reading the protocol first before
the code.

So here's what I managed to figure out, and it matches
how I imagined it would work when I was still
thinking about out of order for net:

- backend allocates memory to keep its stuff around
- sends it to qemu so it can maintain it
- gets it back on reconnect

Format, size, etc. are all up to the backend;
a good implementation would probably implement some
kind of versioning.

Is this what this implements?
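
(Below is a minimal sketch of the underlying mechanism as I read it:
backend-allocated memory passed as an fd over the existing unix socket
and handed back on reconnect. This is not the actual vhost-user message
defined by this series, just an illustration of the plumbing:)

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Backend backs its reconnect state with a memfd... */
static int alloc_state_fd(size_t size)
{
    int fd = memfd_create("inflight-state", MFD_CLOEXEC);

    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* ...and passes the fd to QEMU over the socket with SCM_RIGHTS; QEMU
 * keeps it and sends the same fd back after a reconnect, so the
 * backend sees its old contents again. */
static int send_state_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}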

>  chardev/char-socket.c                   |   5 +-
>  contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h   |  19 +++
>  contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
>  hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
>  hw/virtio/vhost-user.c                  |  69 ++++++++
>  hw/virtio/vhost.c                       |   8 +
>  include/hw/virtio/vhost-backend.h       |   4 +
>  include/hw/virtio/vhost-user-blk.h      |   4 +
>  include/hw/virtio/vhost-user.h          |   8 +
>  10 files changed, 452 insertions(+), 52 deletions(-)
> 
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-13  3:41                       ` Yongji Xie
@ 2018-12-13 14:56                         ` Michael S. Tsirkin
  2018-12-14  4:36                           ` Jason Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2018-12-13 14:56 UTC (permalink / raw)
  To: Yongji Xie
  Cc: jasowang, zhangyu31, Xie Yongji, lilin24, qemu-devel, chaiwen,
	marcandre.lureau, nixun

On Thu, Dec 13, 2018 at 11:41:06AM +0800, Yongji Xie wrote:
> On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2018/12/12 下午5:18, Yongji Xie wrote:
> > >>>> Ok, then we can simply forbid increasing the avail_idx in this case?
> > >>>>
> > >>>> Basically, it's a question of whether or not it's better to done it in
> > >>>> the level of virtio instead of vhost. I'm pretty sure if we expose
> > >>>> sufficient information, it could be done without touching vhost-user.
> > >>>> And we won't deal with e.g migration and other cases.
> > >>>>
> > >>> OK, I get your point. That's indeed an alternative way. But this feature seems
> > >>> to be only useful to vhost-user backend.
> > >> I admit I could not think of a use case other than vhost-user.
> > >>
> > >>
> > >>>    I'm not sure whether it make sense to
> > >>> touch virtio protocol for this feature.
> > >> Some possible advantages:
> > >>
> > >> - Feature could be determined and noticed by user or management layer.
> > >>
> > >> - There's no need to invent ring layout specific protocol to record in
> > >> flight descriptors. E.g if my understanding is correct, for this series
> > >> and for the example above, it still can not work for packed virtqueue
> > >> since descriptor id is not sufficient (descriptor could be overwritten
> > >> by used one). You probably need to have a (partial) copy of descriptor
> > >> ring for this.
> > >>
> > >> - No need to deal with migration, all information was in guest memory.
> > >>
> > > Yes, we have those advantages. But seems like handle this in vhost-user
> > > level could be easier to be maintained in production environment. We can
> > > support old guest. And the bug fix will not depend on guest kernel updating.
> >
> >
> > Yes. But the my main concern is the layout specific data structure. If
> > it could be done through a generic structure (can it?), it would be
> > fine. Otherwise, I believe we don't want another negotiation about what
> > kind of layout that backend support for reconnect.
> >
> 
> Yes, the current layout in shared memory didn't support packed virtqueue because
> the information of one descriptor in descriptor ring will not be
> available once device fetch it.
> 
> I also thought about a generic structure before. But I failed... So I
> tried another way
> to acheive that in this series. In QEMU side, we just provide a shared
> memory to backend
> and we didn't define anything for this memory. In backend side, they
> should know how to
> use those memory to record inflight I/O no matter what kind of
> virtqueue they used.
> Thus,  If we updates virtqueue for new virtio spec in the feature, we
> don't need to touch
> QEMU and guest. What do you think about it?
> 
> Thanks,
> Yongji

I think that's a good direction to take, yes.
Backends need to be very careful about the layout,
with versioning etc.

-- 
MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-13 14:45 ` Michael S. Tsirkin
@ 2018-12-14  1:56   ` Yongji Xie
  2018-12-14  2:20     ` Michael S. Tsirkin
  0 siblings, 1 reply; 44+ messages in thread
From: Yongji Xie @ 2018-12-14  1:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: marcandre.lureau, qemu-devel, zhangyu31, chaiwen, nixun, lilin24,
	Xie Yongji

On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patchset is aimed at supporting qemu to reconnect
> > vhost-user-blk backend after vhost-user-blk backend crash or
> > restart.
> >
> > The patch 1 tries to implenment the sync connection for
> > "reconnect socket".
> >
> > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > to support offering shared memory to backend to record
> > its inflight I/O.
> >
> > The patch 3,4 are the corresponding libvhost-user patches of
> > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >
> > The patch 5 supports vhost-user-blk to reconnect backend when
> > connection closed.
> >
> > The patch 6 tells qemu that we support reconnecting now.
> >
> > To use it, we could start qemu with:
> >
> > qemu-system-x86_64 \
> >         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >         -device vhost-user-blk-pci,chardev=char0 \
> >
> > and start vhost-user-blk backend with:
> >
> > vhost-user-blk -b /path/file -s /path/vhost.socket
> >
> > Then we can restart vhost-user-blk at any time during VM running.
> >
> > Xie Yongji (6):
> >   char-socket: Enable "wait" option for client mode
> >   vhost-user: Add shared memory to record inflight I/O
> >   libvhost-user: Introduce vu_queue_map_desc()
> >   libvhost-user: Support recording inflight I/O in shared memory
> >   vhost-user-blk: Add support for reconnecting backend
> >   contrib/vhost-user-blk: enable inflight I/O recording
>
> What is missing in all this is documentation.
> Specifically docs/interop/vhost-user.txt.
>
> At a high level the design is IMO a good one.
>
> However I would prefer reading the protocol first before
> the code.
>
> So here's what I managed to figure out, and it matches
> how I imagined it would work when I was still
> thinking about out of order for net:
>
> - backend allocates memory to keep its stuff around
> - sends it to qemu so it can maintain it
> - gets it back on reconnect
>
> format and size etc are all up to the backend,
> a good implementation would probably implement some
> kind of versioning.
>
> Is this what this implements?
>

Definitely, yes. And the comments look good to me: QEMU gets the size and
version from the backend, then allocates memory and sends it back together
with the version; the backend knows how to use the memory according to the
version. If we do that, should we allocate the memory per device rather
than per virtqueue?
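
(For illustration, a hypothetical payload for that handshake -- the field
and type names below are made up, not what the series defines:)

#include <stdint.h>

typedef struct InflightInfo {
    uint64_t mmap_size;   /* size requested by the backend */
    uint16_t version;     /* backend-defined layout version */
    uint16_t num_queues;  /* per-device variant; drop if per-virtqueue */
    uint16_t queue_size;
    uint16_t padding;
} InflightInfo;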

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-14  1:56   ` Yongji Xie
@ 2018-12-14  2:20     ` Michael S. Tsirkin
  2018-12-14  2:33       ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2018-12-14  2:20 UTC (permalink / raw)
  To: Yongji Xie
  Cc: marcandre.lureau, qemu-devel, zhangyu31, chaiwen, nixun, lilin24,
	Xie Yongji

On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote:
> On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote:
> > > From: Xie Yongji <xieyongji@baidu.com>
> > >
> > > This patchset is aimed at supporting qemu to reconnect
> > > vhost-user-blk backend after vhost-user-blk backend crash or
> > > restart.
> > >
> > > The patch 1 tries to implenment the sync connection for
> > > "reconnect socket".
> > >
> > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > > to support offering shared memory to backend to record
> > > its inflight I/O.
> > >
> > > The patch 3,4 are the corresponding libvhost-user patches of
> > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> > >
> > > The patch 5 supports vhost-user-blk to reconnect backend when
> > > connection closed.
> > >
> > > The patch 6 tells qemu that we support reconnecting now.
> > >
> > > To use it, we could start qemu with:
> > >
> > > qemu-system-x86_64 \
> > >         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> > >         -device vhost-user-blk-pci,chardev=char0 \
> > >
> > > and start vhost-user-blk backend with:
> > >
> > > vhost-user-blk -b /path/file -s /path/vhost.socket
> > >
> > > Then we can restart vhost-user-blk at any time during VM running.
> > >
> > > Xie Yongji (6):
> > >   char-socket: Enable "wait" option for client mode
> > >   vhost-user: Add shared memory to record inflight I/O
> > >   libvhost-user: Introduce vu_queue_map_desc()
> > >   libvhost-user: Support recording inflight I/O in shared memory
> > >   vhost-user-blk: Add support for reconnecting backend
> > >   contrib/vhost-user-blk: enable inflight I/O recording
> >
> > What is missing in all this is documentation.
> > Specifically docs/interop/vhost-user.txt.
> >
> > At a high level the design is IMO a good one.
> >
> > However I would prefer reading the protocol first before
> > the code.
> >
> > So here's what I managed to figure out, and it matches
> > how I imagined it would work when I was still
> > thinking about out of order for net:
> >
> > - backend allocates memory to keep its stuff around
> > - sends it to qemu so it can maintain it
> > - gets it back on reconnect
> >
> > format and size etc are all up to the backend,
> > a good implementation would probably implement some
> > kind of versioning.
> >
> > Is this what this implements?
> >
> 
> Definitely, yes. And the comments looks good to me. Qemu get size and
> version from backend, then allocate memory and send it back with
> version. Backend knows how to use the memory according to the version.
> If we do that, should we allocate the memory per device rather than
> per virtqueue?
> 
> Thanks,
> Yongji

It's up to you. Maybe both.

-- 
MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-14  2:20     ` Michael S. Tsirkin
@ 2018-12-14  2:33       ` Yongji Xie
  2018-12-14 21:23         ` Michael S. Tsirkin
  0 siblings, 1 reply; 44+ messages in thread
From: Yongji Xie @ 2018-12-14  2:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: marcandre.lureau, qemu-devel, zhangyu31, chaiwen, nixun, lilin24,
	Xie Yongji

On Fri, 14 Dec 2018 at 10:20, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote:
> > On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote:
> > > > From: Xie Yongji <xieyongji@baidu.com>
> > > >
> > > > This patchset is aimed at supporting qemu to reconnect
> > > > vhost-user-blk backend after vhost-user-blk backend crash or
> > > > restart.
> > > >
> > > > The patch 1 tries to implenment the sync connection for
> > > > "reconnect socket".
> > > >
> > > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > > > to support offering shared memory to backend to record
> > > > its inflight I/O.
> > > >
> > > > The patch 3,4 are the corresponding libvhost-user patches of
> > > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> > > >
> > > > The patch 5 supports vhost-user-blk to reconnect backend when
> > > > connection closed.
> > > >
> > > > The patch 6 tells qemu that we support reconnecting now.
> > > >
> > > > To use it, we could start qemu with:
> > > >
> > > > qemu-system-x86_64 \
> > > >         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> > > >         -device vhost-user-blk-pci,chardev=char0 \
> > > >
> > > > and start vhost-user-blk backend with:
> > > >
> > > > vhost-user-blk -b /path/file -s /path/vhost.socket
> > > >
> > > > Then we can restart vhost-user-blk at any time during VM running.
> > > >
> > > > Xie Yongji (6):
> > > >   char-socket: Enable "wait" option for client mode
> > > >   vhost-user: Add shared memory to record inflight I/O
> > > >   libvhost-user: Introduce vu_queue_map_desc()
> > > >   libvhost-user: Support recording inflight I/O in shared memory
> > > >   vhost-user-blk: Add support for reconnecting backend
> > > >   contrib/vhost-user-blk: enable inflight I/O recording
> > >
> > > What is missing in all this is documentation.
> > > Specifically docs/interop/vhost-user.txt.
> > >
> > > At a high level the design is IMO a good one.
> > >
> > > However I would prefer reading the protocol first before
> > > the code.
> > >
> > > So here's what I managed to figure out, and it matches
> > > how I imagined it would work when I was still
> > > thinking about out of order for net:
> > >
> > > - backend allocates memory to keep its stuff around
> > > - sends it to qemu so it can maintain it
> > > - gets it back on reconnect
> > >
> > > format and size etc are all up to the backend,
> > > a good implementation would probably implement some
> > > kind of versioning.
> > >
> > > Is this what this implements?
> > >
> >
> > Definitely, yes. And the comments looks good to me. Qemu get size and
> > version from backend, then allocate memory and send it back with
> > version. Backend knows how to use the memory according to the version.
> > If we do that, should we allocate the memory per device rather than
> > per virtqueue?
> >
> > Thanks,
> > Yongji
>
> It's up to you. Maybe both.
>

OK. I think I may still keep it at the virtqueue level in v2. Thank you.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-13 14:56                         ` Michael S. Tsirkin
@ 2018-12-14  4:36                           ` Jason Wang
  2018-12-14 13:31                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Wang @ 2018-12-14  4:36 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yongji Xie
  Cc: nixun, zhangyu31, lilin24, qemu-devel, chaiwen, marcandre.lureau,
	Xie Yongji


On 2018/12/13 10:56 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 13, 2018 at 11:41:06AM +0800, Yongji Xie wrote:
>> On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> On 2018/12/12 下午5:18, Yongji Xie wrote:
>>>>>>> Ok, then we can simply forbid increasing the avail_idx in this case?
>>>>>>>
>>>>>>> Basically, it's a question of whether or not it's better to done it in
>>>>>>> the level of virtio instead of vhost. I'm pretty sure if we expose
>>>>>>> sufficient information, it could be done without touching vhost-user.
>>>>>>> And we won't deal with e.g migration and other cases.
>>>>>>>
>>>>>> OK, I get your point. That's indeed an alternative way. But this feature seems
>>>>>> to be only useful to vhost-user backend.
>>>>> I admit I could not think of a use case other than vhost-user.
>>>>>
>>>>>
>>>>>>     I'm not sure whether it make sense to
>>>>>> touch virtio protocol for this feature.
>>>>> Some possible advantages:
>>>>>
>>>>> - Feature could be determined and noticed by user or management layer.
>>>>>
>>>>> - There's no need to invent ring layout specific protocol to record in
>>>>> flight descriptors. E.g if my understanding is correct, for this series
>>>>> and for the example above, it still can not work for packed virtqueue
>>>>> since descriptor id is not sufficient (descriptor could be overwritten
>>>>> by used one). You probably need to have a (partial) copy of descriptor
>>>>> ring for this.
>>>>>
>>>>> - No need to deal with migration, all information was in guest memory.
>>>>>
>>>> Yes, we have those advantages. But seems like handle this in vhost-user
>>>> level could be easier to be maintained in production environment. We can
>>>> support old guest. And the bug fix will not depend on guest kernel updating.
>>>
>>> Yes. But the my main concern is the layout specific data structure. If
>>> it could be done through a generic structure (can it?), it would be
>>> fine. Otherwise, I believe we don't want another negotiation about what
>>> kind of layout that backend support for reconnect.
>>>
>> Yes, the current layout in shared memory didn't support packed virtqueue because
>> the information of one descriptor in descriptor ring will not be
>> available once device fetch it.
>>
>> I also thought about a generic structure before. But I failed... So I
>> tried another way
>> to acheive that in this series. In QEMU side, we just provide a shared
>> memory to backend
>> and we didn't define anything for this memory. In backend side, they
>> should know how to
>> use those memory to record inflight I/O no matter what kind of
>> virtqueue they used.
>> Thus,  If we updates virtqueue for new virtio spec in the feature, we
>> don't need to touch
>> QEMU and guest. What do you think about it?
>>
>> Thanks,
>> Yongji
> I think that's a good direction to take, yes.
> Backends need to be very careful about the layout,
> with versioning etc.


I'm not sure this could be done 100% transparently to qemu. E.g. you need 
to deal with reset, I think, and you need to carefully choose the size of 
the region. Which means you need to negotiate the size and layout with the 
backend, and need to deal with migration for them. This is another sin 
of splitting the virtio dataplane from qemu anyway.


Thanks


>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-14  4:36                           ` Jason Wang
@ 2018-12-14 13:31                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2018-12-14 13:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yongji Xie, nixun, zhangyu31, lilin24, qemu-devel, chaiwen,
	marcandre.lureau, Xie Yongji, maxime.coquelin

On Fri, Dec 14, 2018 at 12:36:01PM +0800, Jason Wang wrote:
> 
> On 2018/12/13 下午10:56, Michael S. Tsirkin wrote:
> > On Thu, Dec 13, 2018 at 11:41:06AM +0800, Yongji Xie wrote:
> > > On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > On 2018/12/12 下午5:18, Yongji Xie wrote:
> > > > > > > > Ok, then we can simply forbid increasing the avail_idx in this case?
> > > > > > > > 
> > > > > > > > Basically, it's a question of whether or not it's better to done it in
> > > > > > > > the level of virtio instead of vhost. I'm pretty sure if we expose
> > > > > > > > sufficient information, it could be done without touching vhost-user.
> > > > > > > > And we won't deal with e.g migration and other cases.
> > > > > > > > 
> > > > > > > OK, I get your point. That's indeed an alternative way. But this feature seems
> > > > > > > to be only useful to vhost-user backend.
> > > > > > I admit I could not think of a use case other than vhost-user.
> > > > > > 
> > > > > > 
> > > > > > >     I'm not sure whether it make sense to
> > > > > > > touch virtio protocol for this feature.
> > > > > > Some possible advantages:
> > > > > > 
> > > > > > - Feature could be determined and noticed by user or management layer.
> > > > > > 
> > > > > > - There's no need to invent ring layout specific protocol to record in
> > > > > > flight descriptors. E.g if my understanding is correct, for this series
> > > > > > and for the example above, it still can not work for packed virtqueue
> > > > > > since descriptor id is not sufficient (descriptor could be overwritten
> > > > > > by used one). You probably need to have a (partial) copy of descriptor
> > > > > > ring for this.
> > > > > > 
> > > > > > - No need to deal with migration, all information was in guest memory.
> > > > > > 
> > > > > Yes, we have those advantages. But seems like handle this in vhost-user
> > > > > level could be easier to be maintained in production environment. We can
> > > > > support old guest. And the bug fix will not depend on guest kernel updating.
> > > > 
> > > > Yes. But the my main concern is the layout specific data structure. If
> > > > it could be done through a generic structure (can it?), it would be
> > > > fine. Otherwise, I believe we don't want another negotiation about what
> > > > kind of layout that backend support for reconnect.
> > > > 
> > > Yes, the current layout in shared memory doesn't support packed virtqueue
> > > because the information of one descriptor in the descriptor ring will no
> > > longer be available once the device fetches it.
> > > 
> > > I also thought about a generic structure before, but I failed... So I tried
> > > another way to achieve that in this series. On the QEMU side, we just provide
> > > a shared memory region to the backend and we don't define anything for this
> > > memory. On the backend side, it should know how to use that memory to record
> > > inflight I/O no matter what kind of virtqueue it uses. Thus, if we update the
> > > virtqueue for a new virtio spec in the future, we don't need to touch QEMU or
> > > the guest. What do you think about it?
> > > 
> > > Thanks,
> > > Yongji
> > I think that's a good direction to take, yes.
> > Backends need to be very careful about the layout,
> > with versioning etc.
> 
> 
> I'm not sure this could be done 100% transparently to qemu. E.g. you need to
> deal with reset, I think, and you need to carefully choose the size of the
> region. Which means you need to negotiate the size and layout with the backend.

I am not sure I follow. The point is all this state is internal to the
backend. QEMU does not care at all - it just helps a little by hanging
on to it.
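
On the backend side that could be as simple as the sketch below. The types
and helper names are made up for illustration, this is not libvhost-user
code, just the idea of "mark on fetch, clear on used, resubmit on reconnect":

#include <stdint.h>

/* Hypothetical backend bookkeeping; 'entries' lives inside the opaque
 * region that QEMU hands back after a reconnect. */
struct inflight_state {
    uint8_t *entries;      /* one byte per descriptor slot */
    uint16_t queue_size;
};

/* A descriptor chain with head index 'head' was fetched from the avail
 * ring: remember that it is now in flight. */
static void inflight_mark(struct inflight_state *s, uint16_t head)
{
    s->entries[head] = 1;
}

/* The used element for that chain was pushed: the request is done. */
static void inflight_clear(struct inflight_state *s, uint16_t head)
{
    s->entries[head] = 0;
}

/* After a crash and reconnect, resubmit whatever is still marked.
 * (For a packed ring this alone is not enough: the descriptor slot may
 * already have been overwritten by a used element, so the backend would
 * also have to keep a copy of the descriptors in the region.) */
static void inflight_resubmit(struct inflight_state *s,
                              void (*resubmit)(uint16_t head))
{
    uint16_t i;

    for (i = 0; i < s->queue_size; i++) {
        if (s->entries[i]) {
            resubmit(i);
        }
    }
}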

> And you
> need to deal with migration for them.

Good catch.
There definitely is an issue in that you cannot migrate while the backend
is disconnected: migration needs to flush the backend, and we can't do that
when it's disconnected.  This needs to be addressed.
I think it's cleanest to just defer migration
until the backend reconnects.
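
A rough sketch of "defer until the backend reconnects", assuming VHostUserBlk
grew a migration_blocker field for this purpose (the field and the function
names are assumptions, not code from this series):

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "migration/blocker.h"
#include "hw/virtio/vhost-user-blk.h"

/* Called from the chardev disconnect path. */
static void vhost_user_blk_backend_gone(VHostUserBlk *s)
{
    if (s->migration_blocker) {
        return;
    }
    error_setg(&s->migration_blocker,
               "vhost-user-blk backend is disconnected");
    if (migrate_add_blocker(s->migration_blocker, NULL) < 0) {
        /* e.g. --only-migratable: give up on the blocker */
        error_free(s->migration_blocker);
        s->migration_blocker = NULL;
    }
}

/* Called once the backend has reconnected and inflight state is restored. */
static void vhost_user_blk_backend_back(VHostUserBlk *s)
{
    if (s->migration_blocker) {
        migrate_del_blocker(s->migration_blocker);
        error_free(s->migration_blocker);
        s->migration_blocker = NULL;
    }
}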


Backend cross-version migration is all messed up in vhost-user, I agree.
There was a plan to fix it that was never executed, unfortunately.
Maxime, do you still plan to look into it?

> This is another sin of splitting the
> virtio dataplane from qemu anyway.
> 
> 
> Thanks

It wasn't split as such - dpdk was never a part of qemu.  We just
enabled it without fuse hacks.

-- 
MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-14  2:33       ` Yongji Xie
@ 2018-12-14 21:23         ` Michael S. Tsirkin
  2018-12-15 11:34           ` Yongji Xie
  0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2018-12-14 21:23 UTC (permalink / raw)
  To: Yongji Xie
  Cc: marcandre.lureau, qemu-devel, zhangyu31, chaiwen, nixun, lilin24,
	Xie Yongji

On Fri, Dec 14, 2018 at 10:33:54AM +0800, Yongji Xie wrote:
> On Fri, 14 Dec 2018 at 10:20, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote:
> > > On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote:
> > > > > From: Xie Yongji <xieyongji@baidu.com>
> > > > >
> > > > > This patchset is aimed at supporting qemu to reconnect
> > > > > vhost-user-blk backend after vhost-user-blk backend crash or
> > > > > restart.
> > > > >
> > > > > The patch 1 tries to implement the sync connection for
> > > > > "reconnect socket".
> > > > >
> > > > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > > > > to support offering shared memory to backend to record
> > > > > its inflight I/O.
> > > > >
> > > > > The patch 3,4 are the corresponding libvhost-user patches of
> > > > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> > > > >
> > > > > The patch 5 supports vhost-user-blk to reconnect backend when
> > > > > connection closed.
> > > > >
> > > > > The patch 6 tells qemu that we support reconnecting now.
> > > > >
> > > > > To use it, we could start qemu with:
> > > > >
> > > > > qemu-system-x86_64 \
> > > > >         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> > > > >         -device vhost-user-blk-pci,chardev=char0 \
> > > > >
> > > > > and start vhost-user-blk backend with:
> > > > >
> > > > > vhost-user-blk -b /path/file -s /path/vhost.socket
> > > > >
> > > > > Then we can restart vhost-user-blk at any time during VM running.
> > > > >
> > > > > Xie Yongji (6):
> > > > >   char-socket: Enable "wait" option for client mode
> > > > >   vhost-user: Add shared memory to record inflight I/O
> > > > >   libvhost-user: Introduce vu_queue_map_desc()
> > > > >   libvhost-user: Support recording inflight I/O in shared memory
> > > > >   vhost-user-blk: Add support for reconnecting backend
> > > > >   contrib/vhost-user-blk: enable inflight I/O recording
> > > >
> > > > What is missing in all this is documentation.
> > > > Specifically docs/interop/vhost-user.txt.
> > > >
> > > > At a high level the design is IMO a good one.
> > > >
> > > > However I would prefer reading the protocol first before
> > > > the code.
> > > >
> > > > So here's what I managed to figure out, and it matches
> > > > how I imagined it would work when I was still
> > > > thinking about out of order for net:
> > > >
> > > > - backend allocates memory to keep its stuff around
> > > > - sends it to qemu so it can maintain it
> > > > - gets it back on reconnect
> > > >
> > > > format and size etc are all up to the backend,
> > > > a good implementation would probably implement some
> > > > kind of versioning.
> > > >
> > > > Is this what this implements?
> > > >
> > >
> > > Definitely, yes. And the comments look good to me. QEMU gets the size and
> > > version from the backend, then allocates memory and sends it back with the
> > > version. The backend knows how to use the memory according to the version.
> > > If we do that, should we allocate the memory per device rather than
> > > per virtqueue?
> > >
> > > Thanks,
> > > Yongji
> >
> > It's up to you. Maybe both.
> >
> 
> OK. I think I may still keep it at the virtqueue level in v2. Thank you.
> 
> Thanks,
> Yongji

I'd actually add options for both, and backend can set size 0 if it
wants to.
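
As a sketch of what that could look like on the wire (illustrative only, not
the actual vhost-user payload; a size of 0 simply means the backend does not
want that granularity):

#include <stdint.h>

typedef struct inflight_request {
    uint32_t version;      /* backend-defined layout version            */
    uint32_t dev_area;     /* bytes wanted for device-level state, or 0 */
    uint32_t queue_area;   /* bytes wanted per virtqueue, or 0          */
    uint32_t num_queues;   /* number of per-queue areas                 */
} inflight_request;

QEMU would then allocate dev_area + num_queues * queue_area bytes, mmap them,
and hand the fd back over the socket as ancillary data, the same way memory
region fds are already passed.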

-- 
MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
  2018-12-14 21:23         ` Michael S. Tsirkin
@ 2018-12-15 11:34           ` Yongji Xie
  0 siblings, 0 replies; 44+ messages in thread
From: Yongji Xie @ 2018-12-15 11:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: marcandre.lureau, qemu-devel, zhangyu31, chaiwen, nixun, lilin24,
	Xie Yongji

On Sat, 15 Dec 2018 at 05:23, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Dec 14, 2018 at 10:33:54AM +0800, Yongji Xie wrote:
> > On Fri, 14 Dec 2018 at 10:20, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote:
> > > > On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote:
> > > > > > From: Xie Yongji <xieyongji@baidu.com>
> > > > > >
> > > > > > This patchset is aimed at supporting qemu to reconnect
> > > > > > vhost-user-blk backend after vhost-user-blk backend crash or
> > > > > > restart.
> > > > > >
> > > > > > The patch 1 tries to implement the sync connection for
> > > > > > "reconnect socket".
> > > > > >
> > > > > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > > > > > to support offering shared memory to backend to record
> > > > > > its inflight I/O.
> > > > > >
> > > > > > The patch 3,4 are the corresponding libvhost-user patches of
> > > > > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> > > > > >
> > > > > > The patch 5 supports vhost-user-blk to reconnect backend when
> > > > > > connection closed.
> > > > > >
> > > > > > The patch 6 tells qemu that we support reconnecting now.
> > > > > >
> > > > > > To use it, we could start qemu with:
> > > > > >
> > > > > > qemu-system-x86_64 \
> > > > > >         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> > > > > >         -device vhost-user-blk-pci,chardev=char0 \
> > > > > >
> > > > > > and start vhost-user-blk backend with:
> > > > > >
> > > > > > vhost-user-blk -b /path/file -s /path/vhost.socket
> > > > > >
> > > > > > Then we can restart vhost-user-blk at any time during VM running.
> > > > > >
> > > > > > Xie Yongji (6):
> > > > > >   char-socket: Enable "wait" option for client mode
> > > > > >   vhost-user: Add shared memory to record inflight I/O
> > > > > >   libvhost-user: Introduce vu_queue_map_desc()
> > > > > >   libvhost-user: Support recording inflight I/O in shared memory
> > > > > >   vhost-user-blk: Add support for reconnecting backend
> > > > > >   contrib/vhost-user-blk: enable inflight I/O recording
> > > > >
> > > > > What is missing in all this is documentation.
> > > > > Specifically docs/interop/vhost-user.txt.
> > > > >
> > > > > At a high level the design is IMO a good one.
> > > > >
> > > > > However I would prefer reading the protocol first before
> > > > > the code.
> > > > >
> > > > > So here's what I managed to figure out, and it matches
> > > > > how I imagined it would work when I was still
> > > > > thinking about out of order for net:
> > > > >
> > > > > - backend allocates memory to keep its stuff around
> > > > > - sends it to qemu so it can maintain it
> > > > > - gets it back on reconnect
> > > > >
> > > > > format and size etc are all up to the backend,
> > > > > a good implementation would probably implement some
> > > > > kind of versioning.
> > > > >
> > > > > Is this what this implements?
> > > > >
> > > >
> > > > Definitely, yes. And the comments look good to me. QEMU gets the size and
> > > > version from the backend, then allocates memory and sends it back with the
> > > > version. The backend knows how to use the memory according to the version.
> > > > If we do that, should we allocate the memory per device rather than
> > > > per virtqueue?
> > > >
> > > > Thanks,
> > > > Yongji
> > >
> > > It's up to you. Maybe both.
> > >
> >
> > OK. I think I may still keep it at the virtqueue level in v2. Thank you.
> >
> > Thanks,
> > Yongji
>
> I'd actually add options for both, and backend can set size 0 if it
> wants to.
>
OK, let me try it in v2.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2018-12-15 11:34 UTC | newest]

Thread overview: 44+ messages
2018-12-06  6:35 [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 1/6] char-socket: Enable "wait" option for client mode elohimes
2018-12-06  7:23   ` Marc-André Lureau
2018-12-06  7:53     ` Yongji Xie
2018-12-06  9:31   ` Yury Kotov
2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 2/6] vhost-user: Add shared memory to record inflight I/O elohimes
2018-12-06  7:19   ` Marc-André Lureau
2018-12-06  7:22     ` Yongji Xie
2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 3/6] libvhost-user: Introduce vu_queue_map_desc() elohimes
2018-12-06  7:16   ` Marc-André Lureau
2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 4/6] libvhost-user: Support recording inflight I/O in shared memory elohimes
2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 5/6] vhost-user-blk: Add support for reconnecting backend elohimes
2018-12-06 12:21   ` Yury Kotov
2018-12-06 13:26     ` Yongji Xie
2018-12-06  6:35 ` [Qemu-devel] [PATCH for-4.0 6/6] contrib/vhost-user-blk: enable inflight I/O recording elohimes
2018-12-06  7:23 ` [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting Marc-André Lureau
2018-12-06  7:43   ` Yongji Xie
2018-12-06  9:21 ` Yury Kotov
2018-12-06  9:41   ` Yongji Xie
2018-12-06  9:52     ` Yury Kotov
2018-12-06 10:35       ` Yongji Xie
2018-12-06 13:57 ` Jason Wang
2018-12-06 13:59   ` Michael S. Tsirkin
2018-12-10  9:32     ` Jason Wang
2018-12-12  2:48       ` Yongji Xie
2018-12-12  3:00         ` Jason Wang
2018-12-12  3:21           ` Yongji Xie
2018-12-12  4:06             ` Jason Wang
2018-12-12  6:41               ` Yongji Xie
2018-12-12  7:47                 ` Jason Wang
2018-12-12  9:18                   ` Yongji Xie
2018-12-13  2:58                     ` Jason Wang
2018-12-13  3:41                       ` Yongji Xie
2018-12-13 14:56                         ` Michael S. Tsirkin
2018-12-14  4:36                           ` Jason Wang
2018-12-14 13:31                             ` Michael S. Tsirkin
2018-12-06 14:00   ` Jason Wang
2018-12-07  8:56   ` Yongji Xie
2018-12-13 14:45 ` Michael S. Tsirkin
2018-12-14  1:56   ` Yongji Xie
2018-12-14  2:20     ` Michael S. Tsirkin
2018-12-14  2:33       ` Yongji Xie
2018-12-14 21:23         ` Michael S. Tsirkin
2018-12-15 11:34           ` Yongji Xie
