All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting
@ 2019-01-22  8:31 elohimes
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend elohimes
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: elohimes @ 2019-01-22  8:31 UTC (permalink / raw)
  To: mst, stefanha, marcandre.lureau, berrange, jasowang,
	maxime.coquelin, yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patchset is aimed at supporting qemu to reconnect
vhost-user-blk backend after vhost-user-blk backend crash or
restart.

The patch 1 introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support support transferring shared
buffer between qemu and backend.

The patch 2,3 are the corresponding libvhost-user patches of
patch 1. Make libvhost-user support VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD.

The patch 4 allows vhost-user-blk to use the two new messages
to get/set inflight buffer from/to backend.

The patch 5 supports vhost-user-blk to reconnect backend when
connection closed.

The patch 6 introduces VHOST_USER_PROTOCOL_F_SLAVE_SHMFD
to vhost-user-blk backend which is used to tell qemu that
we support reconnecting now.

This series is based on Daniel P. Berrangé's patchset:

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html

To use it, we could start qemu with:

qemu-system-x86_64 \
        -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1, \
        -device vhost-user-blk-pci,chardev=char0 \

and start vhost-user-blk backend with:

vhost-user-blk -b /path/file -s /path/vhost.socket

Then we can restart vhost-user-blk at any time during VM running.

V4 to V5:
- Drop patch that enables "nowait" option on client sockets
- Support resubmitting inflight I/O in order
- Make inflight I/O tracking more robust
- Remove align field and add queue size field in VhostUserInflight
- Document more details in vhost-user.txt

V3 to V4:
- Drop messages VHOST_USER_GET_SHM_SIZE and VHOST_USER_SET_SHM_FD
- Introduce two new messages VHOST_USER_GET_INFLIGHT_FD
  and VHOST_USER_SET_INFLIGHT_FD
- Allocate inflight buffer in backend rather than in qemu
- Document a recommended format for inflight buffer

V2 to V3:
- Using exisiting wait/nowait options to control connection on
  client sockets instead of introducing "disconnected" option.
- Support the case that vhost-user backend restart during initialzation
  of vhost-user-blk device.

V1 to V2:
- Introduce "disconnected" option for chardev instead of reuse "wait"
  option
- Support the case that QEMU starts before vhost-user backend
- Drop message VHOST_USER_SET_VRING_INFLIGHT
- Introduce two new messages VHOST_USER_GET_SHM_SIZE
  and VHOST_USER_SET_SHM_FD

Xie Yongji (6):
  vhost-user: Support transferring inflight buffer between qemu and
    backend
  libvhost-user: Introduce vu_queue_map_desc()
  libvhost-user: Support tracking inflight I/O in shared memory
  vhost-user-blk: Add support to get/set inflight buffer
  vhost-user-blk: Add support to reconnect backend
  contrib/vhost-user-blk: enable inflight I/O tracking

 Makefile                                |   2 +-
 contrib/libvhost-user/libvhost-user.c   | 402 ++++++++++++++++++++----
 contrib/libvhost-user/libvhost-user.h   |  40 +++
 contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
 docs/interop/vhost-user.txt             | 101 ++++++
 hw/block/vhost-user-blk.c               | 227 ++++++++++---
 hw/virtio/vhost-user.c                  | 110 +++++++
 hw/virtio/vhost.c                       | 105 +++++++
 include/hw/virtio/vhost-backend.h       |  10 +
 include/hw/virtio/vhost-user-blk.h      |   5 +
 include/hw/virtio/vhost.h               |  19 ++
 11 files changed, 925 insertions(+), 99 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-22  8:31 [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
@ 2019-01-22  8:31 ` elohimes
  2019-01-29  4:11   ` Stefan Hajnoczi
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 2/6] libvhost-user: Introduce vu_queue_map_desc() elohimes
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: elohimes @ 2019-01-22  8:31 UTC (permalink / raw)
  To: mst, stefanha, marcandre.lureau, berrange, jasowang,
	maxime.coquelin, yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.

Firstly, qemu uses VHOST_USER_GET_INFLIGHT_FD to get the
shared buffer from backend. Then qemu should send it back
through VHOST_USER_SET_INFLIGHT_FD each time we start vhost-user.

This shared buffer is used to track inflight I/O by backend.
Qemu should clear it when vm reset.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 docs/interop/vhost-user.txt       | 101 +++++++++++++++++++++++++++
 hw/virtio/vhost-user.c            | 110 ++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 | 105 ++++++++++++++++++++++++++++
 include/hw/virtio/vhost-backend.h |  10 +++
 include/hw/virtio/vhost.h         |  19 ++++++
 5 files changed, 345 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index c2194711d9..fa470a8fe2 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -142,6 +142,18 @@ Depending on the request type, payload can be:
    Offset: a 64-bit offset of this area from the start of the
        supplied file descriptor
 
+ * Inflight description
+   ---------------------------------------------------------------
+   | mmap size | mmap offset | version | num queues | queue size |
+   ---------------------------------------------------------------
+
+   mmap size: a 64-bit size of area to track inflight I/O
+   mmap offset: a 64-bit offset of this area from the start
+                of the supplied file descriptor
+   version: a 16-bit version of this area
+   num queues: a 16-bit number of virtqueues
+   queue size: a 16-bit size of virtqueues
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -157,6 +169,7 @@ typedef struct VhostUserMsg {
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
         VhostUserVringArea area;
+        VhostUserInflight inflight;
     };
 } QEMU_PACKED VhostUserMsg;
 
@@ -175,6 +188,7 @@ the ones that do:
  * VHOST_USER_GET_PROTOCOL_FEATURES
  * VHOST_USER_GET_VRING_BASE
  * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
+ * VHOST_USER_GET_INFLIGHT_FD (if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)
 
 [ Also see the section on REPLY_ACK protocol extension. ]
 
@@ -188,6 +202,7 @@ in the ancillary data:
  * VHOST_USER_SET_VRING_CALL
  * VHOST_USER_SET_VRING_ERR
  * VHOST_USER_SET_SLAVE_REQ_FD
+ * VHOST_USER_SET_INFLIGHT_FD (if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)
 
 If Master is unable to send the full message or receives a wrong reply it will
 close the connection. An optional reconnection mechanism can be implemented.
@@ -382,6 +397,71 @@ If VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD protocol feature is negotiated,
 slave can send file descriptors (at most 8 descriptors in each message)
 to master via ancillary data using this fd communication channel.
 
+Inflight I/O tracking
+---------------------
+
+To support reconnecting after restart or crash, slave may need to get
+and resubmit inflight I/O. If virtqueue is processed in order, we can
+easily achieve that by getting the inflight head-descriptor from avail ring.
+However, it can't work when we process descriptors out-of-order because
+some entries which store inflight head-descriptor in avail ring might be
+overrided by new entries. To solve this problem, slave need to allocate a
+buffer to track inflight head-descriptor and share it with master for
+persistent. VHOST_USER_GET_INFLIGHT_FD and VHOST_USER_SET_INFLIGHT_FD are
+used to transfer this buffer between master and slave. And the format of
+this buffer is described below:
+
+-------------------------------------------------------
+| queue0 region | queue1 region | ... | queueN region |
+-------------------------------------------------------
+
+N is the number of available virtqueues. Slave could get it from num queues
+field of VhostUserInflight.
+
+Queue region can be implemented as:
+
+typedef struct DescState {
+    uint8_t inuse;
+    uint8_t version;
+    uint16_t used_idx;
+    uint16_t avail_idx;
+    uint16_t reserved;
+} DescState;
+
+typedef struct QueueRegion {
+    uint8_t valid;
+    uint16_t desc_num;
+    DescState desc[0];
+} QueueRegion;
+
+The struct DescState is used to describe one head-descriptor's state. The
+fields have following meanings:
+
+    inuse: Indicate whether the descriptor is inuse or not.
+
+    version: Indicate whether we have an atomic update to used ring and
+    inflight buffer when slave crash at that point. This field should be
+    increased by one before and after this two updates. An odd version
+    indicates an in-progress update.
+
+    used_idx: Store old index of used ring before we update used ring and
+    inflight buffer so that slave can know whether an odd version inflight
+    head-descriptor in inflight buffer is processed or not.
+
+    avail_idx: Used to preserve the descriptor's order in avail ring so that
+    slave can resubmit descriptors in order.
+
+    reserved: Reserved for future use.
+
+An array of struct DescState is maintained to track all head-descriptors' state
+in queue region. The desc_num is the number of those head-descriptors which is
+also equal to the queue size of the virtqueue. Slave could get it from queue size
+field of VhostUserInflight. The valid field is used to specify whether the region
+is valid or not. Slave can use it to detect whether vm is reset because qemu will
+clear the buffer when the reset happens.
+
+Note that slave should align each region to cache line size.
+
 Protocol features
 -----------------
 
@@ -397,6 +477,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_CONFIG         9
 #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
+#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
 
 Master message types
 --------------------
@@ -761,6 +842,26 @@ Master message types
       was previously sent.
       The value returned is an error indication; 0 is success.
 
+ * VHOST_USER_GET_INFLIGHT_FD
+      Id: 31
+      Equivalent ioctl: N/A
+      Master payload: inflight description
+
+      When VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD protocol feature has been
+      successfully negotiated, this message is submitted by master to get
+      a shared buffer from slave. The shared buffer will be used to track
+      inflight I/O by slave. QEMU should clear it when vm reset.
+
+ * VHOST_USER_SET_INFLIGHT_FD
+      Id: 32
+      Equivalent ioctl: N/A
+      Master payload: inflight description
+
+      When VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD protocol feature has been
+      successfully negotiated, this message is submitted by master to send
+      the shared inflight buffer back to slave so that slave could get
+      inflight I/O after a crash or restart.
+
 Slave message types
 -------------------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 564a31d12c..9aa6204c91 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -52,6 +52,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+    VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -89,6 +90,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_POSTCOPY_END     = 30,
+    VHOST_USER_GET_INFLIGHT_FD = 31,
+    VHOST_USER_SET_INFLIGHT_FD = 32,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -147,6 +150,14 @@ typedef struct VhostUserVringArea {
     uint64_t offset;
 } VhostUserVringArea;
 
+typedef struct VhostUserInflight {
+    uint64_t mmap_size;
+    uint64_t mmap_offset;
+    uint16_t version;
+    uint16_t num_queues;
+    uint16_t queue_size;
+} VhostUserInflight;
+
 typedef struct {
     VhostUserRequest request;
 
@@ -169,6 +180,7 @@ typedef union {
         VhostUserConfig config;
         VhostUserCryptoSession session;
         VhostUserVringArea area;
+        VhostUserInflight inflight;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -1739,6 +1751,102 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
     return result;
 }
 
+static int vhost_user_get_inflight_fd(struct vhost_dev *dev,
+                                      uint16_t queue_size,
+                                      struct vhost_inflight *inflight)
+{
+    void *addr;
+    int fd;
+    struct vhost_user *u = dev->opaque;
+    CharBackend *chr = u->user->chr;
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_GET_INFLIGHT_FD,
+        .hdr.flags = VHOST_USER_VERSION,
+        .payload.inflight.num_queues = dev->nvqs,
+        .payload.inflight.queue_size = queue_size,
+        .hdr.size = sizeof(msg.payload.inflight),
+    };
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
+        return 0;
+    }
+
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
+
+    if (vhost_user_read(dev, &msg) < 0) {
+        return -1;
+    }
+
+    if (msg.hdr.request != VHOST_USER_GET_INFLIGHT_FD) {
+        error_report("Received unexpected msg type. "
+                     "Expected %d received %d",
+                     VHOST_USER_GET_INFLIGHT_FD, msg.hdr.request);
+        return -1;
+    }
+
+    if (msg.hdr.size != sizeof(msg.payload.inflight)) {
+        error_report("Received bad msg size.");
+        return -1;
+    }
+
+    if (!msg.payload.inflight.mmap_size) {
+        return 0;
+    }
+
+    fd = qemu_chr_fe_get_msgfd(chr);
+    if (fd < 0) {
+        error_report("Failed to get mem fd");
+        return -1;
+    }
+
+    addr = mmap(0, msg.payload.inflight.mmap_size, PROT_READ | PROT_WRITE,
+                MAP_SHARED, fd, msg.payload.inflight.mmap_offset);
+
+    if (addr == MAP_FAILED) {
+        error_report("Failed to mmap mem fd");
+        close(fd);
+        return -1;
+    }
+
+    inflight->addr = addr;
+    inflight->fd = fd;
+    inflight->size = msg.payload.inflight.mmap_size;
+    inflight->offset = msg.payload.inflight.mmap_offset;
+    inflight->version = msg.payload.inflight.version;
+    inflight->queue_size = queue_size;
+
+    return 0;
+}
+
+static int vhost_user_set_inflight_fd(struct vhost_dev *dev,
+                                      struct vhost_inflight *inflight)
+{
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_SET_INFLIGHT_FD,
+        .hdr.flags = VHOST_USER_VERSION,
+        .payload.inflight.mmap_size = inflight->size,
+        .payload.inflight.mmap_offset = inflight->offset,
+        .payload.inflight.version = inflight->version,
+        .payload.inflight.num_queues = dev->nvqs,
+        .payload.inflight.queue_size = inflight->queue_size,
+        .hdr.size = sizeof(msg.payload.inflight),
+    };
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
+        return 0;
+    }
+
+    if (vhost_user_write(dev, &msg, &inflight->fd, 1) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
 VhostUserState *vhost_user_init(void)
 {
     VhostUserState *user = g_new0(struct VhostUserState, 1);
@@ -1790,4 +1898,6 @@ const VhostOps user_ops = {
         .vhost_crypto_create_session = vhost_user_crypto_create_session,
         .vhost_crypto_close_session = vhost_user_crypto_close_session,
         .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
+        .vhost_get_inflight_fd = vhost_user_get_inflight_fd,
+        .vhost_set_inflight_fd = vhost_user_set_inflight_fd,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 569c4053ea..0a4de6b91b 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1481,6 +1481,111 @@ void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
     hdev->config_ops = ops;
 }
 
+void vhost_dev_reset_inflight(struct vhost_inflight *inflight)
+{
+    if (inflight->addr) {
+        memset(inflight->addr, 0, inflight->size);
+    }
+}
+
+void vhost_dev_free_inflight(struct vhost_inflight *inflight)
+{
+    if (inflight->addr) {
+        qemu_memfd_free(inflight->addr, inflight->size, inflight->fd);
+        inflight->addr = NULL;
+        inflight->fd = -1;
+    }
+}
+
+static int vhost_dev_resize_inflight(struct vhost_inflight *inflight,
+                                     uint64_t new_size)
+{
+    Error *err = NULL;
+    int fd = -1;
+    void *addr = qemu_memfd_alloc("vhost-inflight", new_size,
+                                  F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
+                                  &fd, &err);
+
+    if (err) {
+        error_report_err(err);
+        return -1;
+    }
+
+    vhost_dev_free_inflight(inflight);
+    inflight->offset = 0;
+    inflight->addr = addr;
+    inflight->fd = fd;
+    inflight->size = new_size;
+
+    return 0;
+}
+
+void vhost_dev_save_inflight(struct vhost_inflight *inflight, QEMUFile *f)
+{
+    if (inflight->addr) {
+        qemu_put_be64(f, inflight->size);
+        qemu_put_be16(f, inflight->version);
+        qemu_put_be16(f, inflight->queue_size);
+        qemu_put_buffer(f, inflight->addr, inflight->size);
+    } else {
+        qemu_put_be64(f, 0);
+    }
+}
+
+int vhost_dev_load_inflight(struct vhost_inflight *inflight, QEMUFile *f)
+{
+    uint64_t size;
+
+    size = qemu_get_be64(f);
+    if (!size) {
+        return 0;
+    }
+
+    if (inflight->size != size) {
+        if (vhost_dev_resize_inflight(inflight, size)) {
+            return -1;
+        }
+    }
+    inflight->version = qemu_get_be16(f);
+    inflight->queue_size = qemu_get_be16(f);
+
+    qemu_get_buffer(f, inflight->addr, size);
+
+    return 0;
+}
+
+int vhost_dev_set_inflight(struct vhost_dev *dev,
+                           struct vhost_inflight *inflight)
+{
+    int r;
+
+    if (dev->vhost_ops->vhost_set_inflight_fd && inflight->addr) {
+        r = dev->vhost_ops->vhost_set_inflight_fd(dev, inflight);
+        if (r) {
+            VHOST_OPS_DEBUG("vhost_set_inflight_fd failed");
+            return -errno;
+        }
+    }
+
+    return 0;
+}
+
+int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
+                           struct vhost_inflight *inflight)
+{
+    int r;
+
+    if (dev->vhost_ops->vhost_get_inflight_fd) {
+        r = dev->vhost_ops->vhost_get_inflight_fd(dev, queue_size, inflight);
+        if (r) {
+            VHOST_OPS_DEBUG("vhost_get_inflight_fd failed");
+            return -errno;
+        }
+    }
+
+    return 0;
+}
+
 /* Host notifiers must be enabled at this point. */
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 81283ec50f..d6632a18e6 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -25,6 +25,7 @@ typedef enum VhostSetConfigType {
     VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
 } VhostSetConfigType;
 
+struct vhost_inflight;
 struct vhost_dev;
 struct vhost_log;
 struct vhost_memory;
@@ -104,6 +105,13 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
 typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
                                                 MemoryRegionSection *section);
 
+typedef int (*vhost_get_inflight_fd_op)(struct vhost_dev *dev,
+                                        uint16_t queue_size,
+                                        struct vhost_inflight *inflight);
+
+typedef int (*vhost_set_inflight_fd_op)(struct vhost_dev *dev,
+                                        struct vhost_inflight *inflight);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -142,6 +150,8 @@ typedef struct VhostOps {
     vhost_crypto_create_session_op vhost_crypto_create_session;
     vhost_crypto_close_session_op vhost_crypto_close_session;
     vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
+    vhost_get_inflight_fd_op vhost_get_inflight_fd;
+    vhost_set_inflight_fd_op vhost_set_inflight_fd;
 } VhostOps;
 
 extern const VhostOps user_ops;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a7f449fa87..f44f93f54a 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -7,6 +7,16 @@
 #include "exec/memory.h"
 
 /* Generic structures common for any vhost based device. */
+
+struct vhost_inflight {
+    int fd;
+    void *addr;
+    uint64_t size;
+    uint64_t offset;
+    uint16_t version;
+    uint16_t queue_size;
+};
+
 struct vhost_virtqueue {
     int kick;
     int call;
@@ -120,4 +130,13 @@ int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
  */
 void vhost_dev_set_config_notifier(struct vhost_dev *dev,
                                    const VhostDevConfigOps *ops);
+
+void vhost_dev_reset_inflight(struct vhost_inflight *inflight);
+void vhost_dev_free_inflight(struct vhost_inflight *inflight);
+void vhost_dev_save_inflight(struct vhost_inflight *inflight, QEMUFile *f);
+int vhost_dev_load_inflight(struct vhost_inflight *inflight, QEMUFile *f);
+int vhost_dev_set_inflight(struct vhost_dev *dev,
+                           struct vhost_inflight *inflight);
+int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
+                           struct vhost_inflight *inflight);
 #endif
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v5 2/6] libvhost-user: Introduce vu_queue_map_desc()
  2019-01-22  8:31 [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend elohimes
@ 2019-01-22  8:31 ` elohimes
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory elohimes
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: elohimes @ 2019-01-22  8:31 UTC (permalink / raw)
  To: mst, stefanha, marcandre.lureau, berrange, jasowang,
	maxime.coquelin, yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

Introduce vu_queue_map_desc() which should be
independent with vu_queue_pop();

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 88 ++++++++++++++++-----------
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index a6b46cdc03..23bd52264c 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -1853,49 +1853,20 @@ virtqueue_alloc_element(size_t sz,
     return elem;
 }
 
-void *
-vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+static void *
+vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
 {
-    unsigned int i, head, max, desc_len;
+    struct vring_desc *desc = vq->vring.desc;
     uint64_t desc_addr, read_len;
+    unsigned int desc_len;
+    unsigned int max = vq->vring.num;
+    unsigned int i = idx;
     VuVirtqElement *elem;
-    unsigned out_num, in_num;
+    unsigned int out_num = 0, in_num = 0;
     struct iovec iov[VIRTQUEUE_MAX_SIZE];
     struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE];
-    struct vring_desc *desc;
     int rc;
 
-    if (unlikely(dev->broken) ||
-        unlikely(!vq->vring.avail)) {
-        return NULL;
-    }
-
-    if (vu_queue_empty(dev, vq)) {
-        return NULL;
-    }
-    /* Needed after virtio_queue_empty(), see comment in
-     * virtqueue_num_heads(). */
-    smp_rmb();
-
-    /* When we start there are none of either input nor output. */
-    out_num = in_num = 0;
-
-    max = vq->vring.num;
-    if (vq->inuse >= vq->vring.num) {
-        vu_panic(dev, "Virtqueue size exceeded");
-        return NULL;
-    }
-
-    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
-        return NULL;
-    }
-
-    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
-        vring_set_avail_event(vq, vq->last_avail_idx);
-    }
-
-    i = head;
-    desc = vq->vring.desc;
     if (desc[i].flags & VRING_DESC_F_INDIRECT) {
         if (desc[i].len % sizeof(struct vring_desc)) {
             vu_panic(dev, "Invalid size for indirect buffer table");
@@ -1947,12 +1918,13 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
     } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
     if (rc == VIRTQUEUE_READ_DESC_ERROR) {
+        vu_panic(dev, "read descriptor error");
         return NULL;
     }
 
     /* Now copy what we have collected and mapped */
     elem = virtqueue_alloc_element(sz, out_num, in_num);
-    elem->index = head;
+    elem->index = idx;
     for (i = 0; i < out_num; i++) {
         elem->out_sg[i] = iov[i];
     }
@@ -1960,6 +1932,48 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         elem->in_sg[i] = iov[out_num + i];
     }
 
+    return elem;
+}
+
+void *
+vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+{
+    unsigned int head;
+    VuVirtqElement *elem;
+
+    if (unlikely(dev->broken) ||
+        unlikely(!vq->vring.avail)) {
+        return NULL;
+    }
+
+    if (vu_queue_empty(dev, vq)) {
+        return NULL;
+    }
+    /*
+     * Needed after virtio_queue_empty(), see comment in
+     * virtqueue_num_heads().
+     */
+    smp_rmb();
+
+    if (vq->inuse >= vq->vring.num) {
+        vu_panic(dev, "Virtqueue size exceeded");
+        return NULL;
+    }
+
+    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
+        return NULL;
+    }
+
+    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
+        vring_set_avail_event(vq, vq->last_avail_idx);
+    }
+
+    elem = vu_queue_map_desc(dev, vq, head, sz);
+
+    if (!elem) {
+        return NULL;
+    }
+
     vq->inuse++;
 
     return elem;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory
  2019-01-22  8:31 [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend elohimes
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 2/6] libvhost-user: Introduce vu_queue_map_desc() elohimes
@ 2019-01-22  8:31 ` elohimes
  2019-01-30  2:31   ` Jason Wang
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 4/6] vhost-user-blk: Add support to get/set inflight buffer elohimes
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: elohimes @ 2019-01-22  8:31 UTC (permalink / raw)
  To: mst, stefanha, marcandre.lureau, berrange, jasowang,
	maxime.coquelin, yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch adds support for VHOST_USER_GET_INFLIGHT_FD and
VHOST_USER_SET_INFLIGHT_FD message to set/get shared buffer
to/from qemu. Then backend can track inflight I/O in this buffer.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 Makefile                              |   2 +-
 contrib/libvhost-user/libvhost-user.c | 314 ++++++++++++++++++++++++--
 contrib/libvhost-user/libvhost-user.h |  40 ++++
 3 files changed, 335 insertions(+), 21 deletions(-)

diff --git a/Makefile b/Makefile
index dccba1dca2..3f3a5ace66 100644
--- a/Makefile
+++ b/Makefile
@@ -474,7 +474,7 @@ Makefile: $(version-obj-y)
 # Build libraries
 
 libqemuutil.a: $(util-obj-y) $(trace-obj-y) $(stub-obj-y)
-libvhost-user.a: $(libvhost-user-obj-y)
+libvhost-user.a: $(libvhost-user-obj-y) $(util-obj-y) $(stub-obj-y)
 
 ######################################################################
 
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 23bd52264c..066fea668b 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -41,6 +41,8 @@
 #endif
 
 #include "qemu/atomic.h"
+#include "qemu/osdep.h"
+#include "qemu/memfd.h"
 
 #include "libvhost-user.h"
 
@@ -53,6 +55,18 @@
             _min1 < _min2 ? _min1 : _min2; })
 #endif
 
+/* Round number down to multiple */
+#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
+
+/* Round number up to multiple */
+#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
+
+/* Align each region to cache line size in inflight buffer */
+#define INFLIGHT_ALIGNMENT 64
+
+/* The version of inflight buffer */
+#define INFLIGHT_VERSION 1
+
 #define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
 
 /* The version of the protocol we support */
@@ -66,6 +80,20 @@
         }                                       \
     } while (0)
 
+static inline
+bool has_feature(uint64_t features, unsigned int fbit)
+{
+    assert(fbit < 64);
+    return !!(features & (1ULL << fbit));
+}
+
+static inline
+bool vu_has_feature(VuDev *dev,
+                    unsigned int fbit)
+{
+    return has_feature(dev->features, fbit);
+}
+
 static const char *
 vu_request_to_string(unsigned int req)
 {
@@ -100,6 +128,8 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_POSTCOPY_ADVISE),
         REQ(VHOST_USER_POSTCOPY_LISTEN),
         REQ(VHOST_USER_POSTCOPY_END),
+        REQ(VHOST_USER_GET_INFLIGHT_FD),
+        REQ(VHOST_USER_SET_INFLIGHT_FD),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -890,6 +920,59 @@ vu_check_queue_msg_file(VuDev *dev, VhostUserMsg *vmsg)
     return true;
 }
 
+static int
+vu_check_queue_inflights(VuDev *dev, VuVirtq *vq)
+{
+    int i = 0;
+
+    if (!has_feature(dev->protocol_features,
+        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
+        return 0;
+    }
+
+    if (unlikely(!vq->inflight)) {
+        return -1;
+    }
+
+    /* check whether vm is reset */
+    if (unlikely(!vq->inflight->valid)) {
+        vq->inflight->valid = 1;
+        return 0;
+    }
+
+    vq->used_idx = vq->vring.used->idx;
+    vq->inflight_num = 0;
+    for (i = 0; i < vq->inflight->desc_num; i++) {
+        if (unlikely(vq->inflight->desc[i].version % 2 == 1)) {
+            if (vq->inflight->desc[i].used_idx != vq->used_idx) {
+                vq->inflight->desc[i].inuse = 0;
+            } else {
+                vq->inflight->desc[i].inuse = 1;
+            }
+
+            barrier();
+
+            vq->inflight->desc[i].version++;
+        }
+
+        if (vq->inflight->desc[i].inuse == 0) {
+            continue;
+        }
+
+        vq->inflight_desc[vq->inflight_num++] = i;
+        vq->inuse++;
+    }
+    vq->shadow_avail_idx = vq->last_avail_idx = vq->inuse + vq->used_idx;
+
+    /* in case of I/O hang after reconnecting */
+    if (eventfd_write(vq->kick_fd, 1) ||
+        eventfd_write(vq->call_fd, 1)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 static bool
 vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -925,6 +1008,10 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
                dev->vq[index].kick_fd, index);
     }
 
+    if (vu_check_queue_inflights(dev, &dev->vq[index])) {
+        vu_panic(dev, "Failed to check inflights for vq: %d\n", index);
+    }
+
     return false;
 }
 
@@ -1215,6 +1302,127 @@ vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
     return true;
 }
 
+static inline uint64_t
+vu_inflight_queue_size(uint16_t queue_size)
+{
+    return ALIGN_UP(sizeof(VuDescState) * queue_size +
+           sizeof(uint16_t), INFLIGHT_ALIGNMENT);
+}
+
+static bool
+vu_get_inflight_fd(VuDev *dev, VhostUserMsg *vmsg)
+{
+    int fd;
+    void *addr;
+    uint64_t mmap_size;
+    uint16_t num_queues, queue_size;
+
+    if (vmsg->size != sizeof(vmsg->payload.inflight)) {
+        vu_panic(dev, "Invalid get_inflight_fd message:%d", vmsg->size);
+        vmsg->payload.inflight.mmap_size = 0;
+        return true;
+    }
+
+    num_queues = vmsg->payload.inflight.num_queues;
+    queue_size = vmsg->payload.inflight.queue_size;
+
+    DPRINT("set_inflight_fd num_queues: %"PRId16"\n", num_queues);
+    DPRINT("set_inflight_fd queue_size: %"PRId16"\n", queue_size);
+
+    mmap_size = vu_inflight_queue_size(queue_size) * num_queues;
+
+    addr = qemu_memfd_alloc("vhost-inflight", mmap_size,
+                            F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
+                            &fd, NULL);
+
+    if (!addr) {
+        vu_panic(dev, "Failed to alloc vhost inflight area");
+        vmsg->payload.inflight.mmap_size = 0;
+        return true;
+    }
+
+    memset(addr, 0, mmap_size);
+
+    dev->inflight_info.addr = addr;
+    dev->inflight_info.size = vmsg->payload.inflight.mmap_size = mmap_size;
+    dev->inflight_info.fd = vmsg->fds[0] = fd;
+    vmsg->fd_num = 1;
+    vmsg->payload.inflight.mmap_offset = 0;
+    vmsg->payload.inflight.version = INFLIGHT_VERSION;
+
+    DPRINT("send inflight mmap_size: %"PRId64"\n",
+           vmsg->payload.inflight.mmap_size);
+    DPRINT("send inflight mmap offset: %"PRId64"\n",
+           vmsg->payload.inflight.mmap_offset);
+    DPRINT("send inflight version: %"PRId16"\n",
+           vmsg->payload.inflight.version);
+
+    return true;
+}
+
+static bool
+vu_set_inflight_fd(VuDev *dev, VhostUserMsg *vmsg)
+{
+    int fd, i;
+    uint64_t mmap_size, mmap_offset;
+    uint16_t num_queues, queue_size, version;
+    void *rc;
+
+    if (vmsg->fd_num != 1 ||
+        vmsg->size != sizeof(vmsg->payload.inflight)) {
+        vu_panic(dev, "Invalid set_inflight_fd message size:%d fds:%d",
+                 vmsg->size, vmsg->fd_num);
+        return false;
+    }
+
+    fd = vmsg->fds[0];
+    mmap_size = vmsg->payload.inflight.mmap_size;
+    mmap_offset = vmsg->payload.inflight.mmap_offset;
+    version = vmsg->payload.inflight.version;
+    num_queues = vmsg->payload.inflight.num_queues;
+    queue_size = vmsg->payload.inflight.queue_size;
+
+    DPRINT("set_inflight_fd mmap_size: %"PRId64"\n", mmap_size);
+    DPRINT("set_inflight_fd mmap_offset: %"PRId64"\n", mmap_offset);
+    DPRINT("set_inflight_fd version: %"PRId16"\n", version);
+    DPRINT("set_inflight_fd num_queues: %"PRId16"\n", num_queues);
+    DPRINT("set_inflight_fd queue_size: %"PRId16"\n", queue_size);
+
+    if (version > INFLIGHT_VERSION) {
+        vu_panic(dev, "Invalid set_inflight_fd version: %d", version);
+        return false;
+    }
+
+    rc = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+              fd, mmap_offset);
+
+    if (rc == MAP_FAILED) {
+        vu_panic(dev, "set_inflight_fd mmap error: %s", strerror(errno));
+        return false;
+    }
+
+    if (dev->inflight_info.fd) {
+        close(dev->inflight_info.fd);
+    }
+
+    if (dev->inflight_info.addr) {
+        munmap(dev->inflight_info.addr, dev->inflight_info.size);
+    }
+
+    dev->inflight_info.fd = fd;
+    dev->inflight_info.addr = rc;
+    dev->inflight_info.size = mmap_size;
+    dev->inflight_info.version = version;
+
+    for (i = 0; i < num_queues; i++) {
+        dev->vq[i].inflight = (VuVirtqInflight *)rc;
+        dev->vq[i].inflight->desc_num = queue_size;
+        rc = (void *)((char *)rc + vu_inflight_queue_size(queue_size));
+    }
+
+    return false;
+}
+
 static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -1292,6 +1500,10 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_set_postcopy_listen(dev, vmsg);
     case VHOST_USER_POSTCOPY_END:
         return vu_set_postcopy_end(dev, vmsg);
+    case VHOST_USER_GET_INFLIGHT_FD:
+        return vu_get_inflight_fd(dev, vmsg);
+    case VHOST_USER_SET_INFLIGHT_FD:
+        return vu_set_inflight_fd(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
@@ -1359,8 +1571,18 @@ vu_deinit(VuDev *dev)
             close(vq->err_fd);
             vq->err_fd = -1;
         }
+        vq->inflight = NULL;
     }
 
+    if (dev->inflight_info.addr) {
+        munmap(dev->inflight_info.addr, dev->inflight_info.size);
+        dev->inflight_info.addr = NULL;
+    }
+
+    if (dev->inflight_info.fd > 0) {
+        close(dev->inflight_info.fd);
+        dev->inflight_info.fd = -1;
+    }
 
     vu_close_log(dev);
     if (dev->slave_fd != -1) {
@@ -1687,20 +1909,6 @@ vu_queue_empty(VuDev *dev, VuVirtq *vq)
     return vring_avail_idx(vq) == vq->last_avail_idx;
 }
 
-static inline
-bool has_feature(uint64_t features, unsigned int fbit)
-{
-    assert(fbit < 64);
-    return !!(features & (1ULL << fbit));
-}
-
-static inline
-bool vu_has_feature(VuDev *dev,
-                    unsigned int fbit)
-{
-    return has_feature(dev->features, fbit);
-}
-
 static bool
 vring_notify(VuDev *dev, VuVirtq *vq)
 {
@@ -1829,12 +2037,6 @@ virtqueue_map_desc(VuDev *dev,
     *p_num_sg = num_sg;
 }
 
-/* Round number down to multiple */
-#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
-
-/* Round number up to multiple */
-#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
-
 static void *
 virtqueue_alloc_element(size_t sz,
                                      unsigned out_num, unsigned in_num)
@@ -1935,9 +2137,71 @@ vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
     return elem;
 }
 
+static int
+vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+    if (!has_feature(dev->protocol_features,
+        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
+        return 0;
+    }
+
+    if (unlikely(!vq->inflight)) {
+        return -1;
+    }
+
+    vq->inflight->desc[desc_idx].inuse = 1;
+
+    vq->inflight->desc[desc_idx].avail_idx = vq->last_avail_idx;
+
+    return 0;
+}
+
+static int
+vu_queue_inflight_pre_put(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+    if (!has_feature(dev->protocol_features,
+        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
+        return 0;
+    }
+
+    if (unlikely(!vq->inflight)) {
+        return -1;
+    }
+
+    vq->inflight->desc[desc_idx].used_idx = vq->used_idx;
+
+    barrier();
+
+    vq->inflight->desc[desc_idx].version++;
+
+    return 0;
+}
+
+static int
+vu_queue_inflight_post_put(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+    if (!has_feature(dev->protocol_features,
+        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
+        return 0;
+    }
+
+    if (unlikely(!vq->inflight)) {
+        return -1;
+    }
+
+    vq->inflight->desc[desc_idx].inuse = 0;
+
+    barrier();
+
+    vq->inflight->desc[desc_idx].version++;
+
+    return 0;
+}
+
 void *
 vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 {
+    int i;
     unsigned int head;
     VuVirtqElement *elem;
 
@@ -1946,6 +2210,12 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         return NULL;
     }
 
+    if (unlikely(vq->inflight_num > 0)) {
+        i = (--vq->inflight_num);
+        elem = vu_queue_map_desc(dev, vq, vq->inflight_desc[i], sz);
+        return elem;
+    }
+
     if (vu_queue_empty(dev, vq)) {
         return NULL;
     }
@@ -1976,6 +2246,8 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 
     vq->inuse++;
 
+    vu_queue_inflight_get(dev, vq, head);
+
     return elem;
 }
 
@@ -2120,5 +2392,7 @@ vu_queue_push(VuDev *dev, VuVirtq *vq,
               const VuVirtqElement *elem, unsigned int len)
 {
     vu_queue_fill(dev, vq, elem, len, 0);
+    vu_queue_inflight_pre_put(dev, vq, elem->index);
     vu_queue_flush(dev, vq, 1);
+    vu_queue_inflight_post_put(dev, vq, elem->index);
 }
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 4aa55b4d2d..a244342129 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -53,6 +53,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+    VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -91,6 +92,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_POSTCOPY_END     = 30,
+    VHOST_USER_GET_INFLIGHT_FD = 31,
+    VHOST_USER_SET_INFLIGHT_FD = 32,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -138,6 +141,14 @@ typedef struct VhostUserVringArea {
     uint64_t offset;
 } VhostUserVringArea;
 
+typedef struct VhostUserInflight {
+    uint64_t mmap_size;
+    uint64_t mmap_offset;
+    uint16_t version;
+    uint16_t num_queues;
+    uint16_t queue_size;
+} VhostUserInflight;
+
 #if defined(_WIN32)
 # define VU_PACKED __attribute__((gcc_struct, packed))
 #else
@@ -163,6 +174,7 @@ typedef struct VhostUserMsg {
         VhostUserLog log;
         VhostUserConfig config;
         VhostUserVringArea area;
+        VhostUserInflight inflight;
     } payload;
 
     int fds[VHOST_MEMORY_MAX_NREGIONS];
@@ -234,9 +246,29 @@ typedef struct VuRing {
     uint32_t flags;
 } VuRing;
 
+typedef struct VuDescState {
+    uint8_t inuse;
+    uint8_t version;
+    uint16_t used_idx;
+    uint16_t avail_idx;
+    uint16_t reserved;
+} VuDescState;
+
+typedef struct VuVirtqInflight {
+    uint8_t valid;
+    uint16_t desc_num;
+    VuDescState desc[0];
+} VuVirtqInflight;
+
 typedef struct VuVirtq {
     VuRing vring;
 
+    VuVirtqInflight *inflight;
+
+    uint16_t inflight_desc[VIRTQUEUE_MAX_SIZE];
+
+    uint16_t inflight_num;
+
     /* Next head to pop */
     uint16_t last_avail_idx;
 
@@ -279,11 +311,19 @@ typedef void (*vu_set_watch_cb) (VuDev *dev, int fd, int condition,
                                  vu_watch_cb cb, void *data);
 typedef void (*vu_remove_watch_cb) (VuDev *dev, int fd);
 
+typedef struct VuDevInflightInfo {
+    int fd;
+    void *addr;
+    uint64_t size;
+    uint16_t version;
+} VuDevInflightInfo;
+
 struct VuDev {
     int sock;
     uint32_t nregions;
     VuDevRegion regions[VHOST_MEMORY_MAX_NREGIONS];
     VuVirtq vq[VHOST_MAX_NR_VIRTQUEUE];
+    VuDevInflightInfo inflight_info;
     int log_call_fd;
     int slave_fd;
     uint64_t log_size;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v5 4/6] vhost-user-blk: Add support to get/set inflight buffer
  2019-01-22  8:31 [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (2 preceding siblings ...)
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory elohimes
@ 2019-01-22  8:31 ` elohimes
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 5/6] vhost-user-blk: Add support to reconnect backend elohimes
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: elohimes @ 2019-01-22  8:31 UTC (permalink / raw)
  To: mst, stefanha, marcandre.lureau, berrange, jasowang,
	maxime.coquelin, yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch adds support for vhost-user-blk device to get/set
inflight buffer from/to backend.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 hw/block/vhost-user-blk.c          | 26 ++++++++++++++++++++++++++
 include/hw/virtio/vhost-user-blk.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index c3af28fad4..1f899ac1c0 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -126,6 +126,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
     }
 
     s->dev.acked_features = vdev->guest_features;
+
+    ret = vhost_dev_set_inflight(&s->dev, s->inflight);
+    if (ret < 0) {
+        error_report("Error set inflight: %d", -ret);
+        goto err_guest_notifiers;
+    }
+
     ret = vhost_dev_start(&s->dev, vdev);
     if (ret < 0) {
         error_report("Error starting vhost: %d", -ret);
@@ -245,6 +252,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     }
 }
 
+static void vhost_user_blk_reset(VirtIODevice *vdev)
+{
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    vhost_dev_reset_inflight(s->inflight);
+}
+
 static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -285,6 +299,8 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
                          vhost_user_blk_handle_output);
     }
 
+    s->inflight = g_new0(struct vhost_inflight, 1);
+
     s->dev.nvqs = s->num_queues;
     s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
     s->dev.vq_index = 0;
@@ -311,12 +327,19 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         s->blkcfg.num_queues = s->num_queues;
     }
 
+    ret = vhost_dev_get_inflight(&s->dev, s->queue_size, s->inflight);
+    if (ret < 0) {
+        error_setg(errp, "vhost-user-blk: get inflight failed");
+        goto vhost_err;
+    }
+
     return;
 
 vhost_err:
     vhost_dev_cleanup(&s->dev);
 virtio_err:
     g_free(vqs);
+    g_free(s->inflight);
     virtio_cleanup(vdev);
 
     vhost_user_cleanup(user);
@@ -332,7 +355,9 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
 
     vhost_user_blk_set_status(vdev, 0);
     vhost_dev_cleanup(&s->dev);
+    vhost_dev_free_inflight(s->inflight);
     g_free(vqs);
+    g_free(s->inflight);
     virtio_cleanup(vdev);
 
     if (s->vhost_user) {
@@ -382,6 +407,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data)
     vdc->set_config = vhost_user_blk_set_config;
     vdc->get_features = vhost_user_blk_get_features;
     vdc->set_status = vhost_user_blk_set_status;
+    vdc->reset = vhost_user_blk_reset;
 }
 
 static const TypeInfo vhost_user_blk_info = {
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index d52944aeeb..445516604a 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -36,6 +36,7 @@ typedef struct VHostUserBlk {
     uint32_t queue_size;
     uint32_t config_wce;
     struct vhost_dev dev;
+    struct vhost_inflight *inflight;
     VhostUserState *vhost_user;
 } VHostUserBlk;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v5 5/6] vhost-user-blk: Add support to reconnect backend
  2019-01-22  8:31 [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (3 preceding siblings ...)
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 4/6] vhost-user-blk: Add support to get/set inflight buffer elohimes
@ 2019-01-22  8:31 ` elohimes
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 6/6] contrib/vhost-user-blk: enable inflight I/O tracking elohimes
  2019-01-30  2:29 ` [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting Jason Wang
  6 siblings, 0 replies; 26+ messages in thread
From: elohimes @ 2019-01-22  8:31 UTC (permalink / raw)
  To: mst, stefanha, marcandre.lureau, berrange, jasowang,
	maxime.coquelin, yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

Since we now support the message VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD. The backend is able to restart
safely because it can track inflight I/O in shared memory.
This patch allows qemu to reconnect the backend after
connection closed.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Ni Xun <nixun@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 hw/block/vhost-user-blk.c          | 209 +++++++++++++++++++++++------
 include/hw/virtio/vhost-user-blk.h |   4 +
 2 files changed, 169 insertions(+), 44 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 1f899ac1c0..0fab7b1d2f 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
     .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
 };
 
-static void vhost_user_blk_start(VirtIODevice *vdev)
+static int vhost_user_blk_start(VirtIODevice *vdev)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
@@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
 
     if (!k->set_guest_notifiers) {
         error_report("binding does not support guest notifiers");
-        return;
+        return -ENOSYS;
     }
 
     ret = vhost_dev_enable_notifiers(&s->dev, vdev);
     if (ret < 0) {
         error_report("Error enabling host notifiers: %d", -ret);
-        return;
+        return ret;
     }
 
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
@@ -147,12 +147,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
         vhost_virtqueue_mask(&s->dev, vdev, i, false);
     }
 
-    return;
+    return ret;
 
 err_guest_notifiers:
     k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
+    return ret;
 }
 
 static void vhost_user_blk_stop(VirtIODevice *vdev)
@@ -171,7 +172,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
-        return;
     }
 
     vhost_dev_disable_notifiers(&s->dev, vdev);
@@ -181,21 +181,43 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
+    int ret;
 
     if (!vdev->vm_running) {
         should_start = false;
     }
 
-    if (s->dev.started == should_start) {
+    if (s->should_start == should_start) {
+        return;
+    }
+
+    if (!s->connected || s->dev.started == should_start) {
+        s->should_start = should_start;
         return;
     }
 
     if (should_start) {
-        vhost_user_blk_start(vdev);
+        s->should_start = true;
+        /*
+         * make sure vhost_user_blk_handle_output() ignores fake
+         * guest kick by vhost_dev_enable_notifiers()
+         */
+        barrier();
+        ret = vhost_user_blk_start(vdev);
+        if (ret < 0) {
+            error_report("vhost-user-blk: vhost start failed: %s",
+                         strerror(-ret));
+            qemu_chr_fe_disconnect(&s->chardev);
+        }
     } else {
         vhost_user_blk_stop(vdev);
+        /*
+         * make sure vhost_user_blk_handle_output() ignore fake
+         * guest kick by vhost_dev_disable_notifiers()
+         */
+        barrier();
+        s->should_start = false;
     }
-
 }
 
 static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
@@ -225,13 +247,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
 static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
-    int i;
+    int i, ret;
 
     if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
         !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
         return;
     }
 
+    if (s->should_start) {
+        return;
+    }
+    s->should_start = true;
+
+    if (!s->connected) {
+        return;
+    }
+
     if (s->dev.started) {
         return;
     }
@@ -239,7 +270,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
      * vhost here instead of waiting for .set_status().
      */
-    vhost_user_blk_start(vdev);
+    ret = vhost_user_blk_start(vdev);
+    if (ret < 0) {
+        error_report("vhost-user-blk: vhost start failed: %s",
+                     strerror(-ret));
+        qemu_chr_fe_disconnect(&s->chardev);
+        return;
+    }
 
     /* Kick right away to begin processing requests already in vring */
     for (i = 0; i < s->dev.nvqs; i++) {
@@ -259,13 +296,106 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
     vhost_dev_reset_inflight(s->inflight);
 }
 
+static int vhost_user_blk_connect(DeviceState *dev)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+    int ret = 0;
+
+    if (s->connected) {
+        return 0;
+    }
+    s->connected = true;
+
+    s->dev.nvqs = s->num_queues;
+    s->dev.vqs = s->vqs;
+    s->dev.vq_index = 0;
+    s->dev.backend_features = 0;
+
+    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
+
+    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
+    if (ret < 0) {
+        error_report("vhost-user-blk: vhost initialization failed: %s",
+                     strerror(-ret));
+        return ret;
+    }
+
+    /* restore vhost state */
+    if (s->should_start) {
+        ret = vhost_user_blk_start(vdev);
+        if (ret < 0) {
+            error_report("vhost-user-blk: vhost start failed: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+static void vhost_user_blk_disconnect(DeviceState *dev)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    if (!s->connected) {
+        return;
+    }
+    s->connected = false;
+
+    if (s->dev.started) {
+        vhost_user_blk_stop(vdev);
+    }
+
+    vhost_dev_cleanup(&s->dev);
+}
+
+static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
+                                     void *opaque)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    qemu_chr_fe_disconnect(&s->chardev);
+
+    return true;
+}
+
+static void vhost_user_blk_event(void *opaque, int event)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    switch (event) {
+    case CHR_EVENT_OPENED:
+        if (vhost_user_blk_connect(dev) < 0) {
+            qemu_chr_fe_disconnect(&s->chardev);
+            return;
+        }
+        s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
+                                         vhost_user_blk_watch, dev);
+        break;
+    case CHR_EVENT_CLOSED:
+        vhost_user_blk_disconnect(dev);
+        if (s->watch) {
+            g_source_remove(s->watch);
+            s->watch = 0;
+        }
+        break;
+    }
+}
+
+
 static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     VhostUserState *user;
-    struct vhost_virtqueue *vqs = NULL;
     int i, ret;
+    Error *err = NULL;
 
     if (!s->chardev.chr) {
         error_setg(errp, "vhost-user-blk: chardev is mandatory");
@@ -300,27 +430,28 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
     }
 
     s->inflight = g_new0(struct vhost_inflight, 1);
-
-    s->dev.nvqs = s->num_queues;
-    s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
-    s->dev.vq_index = 0;
-    s->dev.backend_features = 0;
-    vqs = s->dev.vqs;
-
-    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
-
-    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
-    if (ret < 0) {
-        error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
-                   strerror(-ret));
-        goto virtio_err;
-    }
+    s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
+    s->watch = 0;
+    s->should_start = false;
+    s->connected = false;
+
+    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, vhost_user_blk_event,
+                             NULL, (void *)dev, NULL, true);
+
+reconnect:
+    do {
+        if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
+            error_report_err(err);
+            err = NULL;
+            sleep(1);
+        }
+    } while (!s->connected);
 
     ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
-                              sizeof(struct virtio_blk_config));
+                               sizeof(struct virtio_blk_config));
     if (ret < 0) {
-        error_setg(errp, "vhost-user-blk: get block config failed");
-        goto vhost_err;
+        error_report("vhost-user-blk: get block config failed");
+        goto reconnect;
     }
 
     if (s->blkcfg.num_queues != s->num_queues) {
@@ -329,34 +460,24 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 
     ret = vhost_dev_get_inflight(&s->dev, s->queue_size, s->inflight);
     if (ret < 0) {
-        error_setg(errp, "vhost-user-blk: get inflight failed");
-        goto vhost_err;
+        error_report("vhost-user-blk: get inflight failed");
+        goto reconnect;
     }
 
     return;
-
-vhost_err:
-    vhost_dev_cleanup(&s->dev);
-virtio_err:
-    g_free(vqs);
-    g_free(s->inflight);
-    virtio_cleanup(vdev);
-
-    vhost_user_cleanup(user);
-    g_free(user);
-    s->vhost_user = NULL;
 }
 
 static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(dev);
-    struct vhost_virtqueue *vqs = s->dev.vqs;
 
     vhost_user_blk_set_status(vdev, 0);
+    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, NULL,
+                             NULL, NULL, NULL, false);
     vhost_dev_cleanup(&s->dev);
     vhost_dev_free_inflight(s->inflight);
-    g_free(vqs);
+    g_free(s->vqs);
     g_free(s->inflight);
     virtio_cleanup(vdev);
 
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index 445516604a..4849aa5eb5 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -38,6 +38,10 @@ typedef struct VHostUserBlk {
     struct vhost_dev dev;
     struct vhost_inflight *inflight;
     VhostUserState *vhost_user;
+    struct vhost_virtqueue *vqs;
+    guint watch;
+    bool should_start;
+    bool connected;
 } VHostUserBlk;
 
 #endif
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v5 6/6] contrib/vhost-user-blk: enable inflight I/O tracking
  2019-01-22  8:31 [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (4 preceding siblings ...)
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 5/6] vhost-user-blk: Add support to reconnect backend elohimes
@ 2019-01-22  8:31 ` elohimes
  2019-01-30  2:29 ` [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting Jason Wang
  6 siblings, 0 replies; 26+ messages in thread
From: elohimes @ 2019-01-22  8:31 UTC (permalink / raw)
  To: mst, stefanha, marcandre.lureau, berrange, jasowang,
	maxime.coquelin, yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch enables inflight I/O tracking for
vhost-user-blk backend so that we could restart it safely.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 contrib/vhost-user-blk/vhost-user-blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 858221ad95..8cc033946a 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -327,7 +327,8 @@ vub_get_features(VuDev *dev)
 static uint64_t
 vub_get_protocol_features(VuDev *dev)
 {
-    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
+    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
+           1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
 }
 
 static int
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend elohimes
@ 2019-01-29  4:11   ` Stefan Hajnoczi
  2019-01-29  4:26     ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2019-01-29  4:11 UTC (permalink / raw)
  To: elohimes
  Cc: mst, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh, qemu-devel, zhangyu31, chaiwen, nixun,
	lilin24, Xie Yongji

[-- Attachment #1: Type: text/plain, Size: 1385 bytes --]

On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohimes@gmail.com wrote:
> +typedef struct DescState {
> +    uint8_t inuse;
> +    uint8_t version;
> +    uint16_t used_idx;
> +    uint16_t avail_idx;
> +    uint16_t reserved;
> +} DescState;
> +
> +typedef struct QueueRegion {
> +    uint8_t valid;
> +    uint16_t desc_num;
> +    DescState desc[0];
> +} QueueRegion;
> +
> +The struct DescState is used to describe one head-descriptor's state. The
> +fields have following meanings:
> +
> +    inuse: Indicate whether the descriptor is inuse or not.
> +
> +    version: Indicate whether we have an atomic update to used ring and
> +    inflight buffer when slave crash at that point. This field should be
> +    increased by one before and after this two updates. An odd version
> +    indicates an in-progress update.
> +
> +    used_idx: Store old index of used ring before we update used ring and
> +    inflight buffer so that slave can know whether an odd version inflight
> +    head-descriptor in inflight buffer is processed or not.
> +
> +    avail_idx: Used to preserve the descriptor's order in avail ring so that
> +    slave can resubmit descriptors in order.

Will a completely new "packed vring" inflight shm layout be necessary to
support the packed vring layout in VIRTIO 1.1?

https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-29  4:11   ` Stefan Hajnoczi
@ 2019-01-29  4:26     ` Michael S. Tsirkin
  2019-01-29  6:15       ` Yongji Xie
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2019-01-29  4:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elohimes, marcandre.lureau, berrange, jasowang, maxime.coquelin,
	yury-kotov, wrfsh, qemu-devel, zhangyu31, chaiwen, nixun,
	lilin24, Xie Yongji

On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohimes@gmail.com wrote:
> > +typedef struct DescState {
> > +    uint8_t inuse;
> > +    uint8_t version;
> > +    uint16_t used_idx;
> > +    uint16_t avail_idx;
> > +    uint16_t reserved;
> > +} DescState;
> > +
> > +typedef struct QueueRegion {
> > +    uint8_t valid;

what's this?

> > +    uint16_t desc_num;

there's padding before this field. Pls make it explicit.

> > +    DescState desc[0];
> > +} QueueRegion;
> > +
> > +The struct DescState is used to describe one head-descriptor's state. The
> > +fields have following meanings:
> > +
> > +    inuse: Indicate whether the descriptor is inuse or not.

inuse by what?

> > +
> > +    version: Indicate whether we have an atomic update to used ring and
> > +    inflight buffer when slave crash at that point. This field should be
> > +    increased by one before and after this two updates. An odd version
> > +    indicates an in-progress update.

I'm not sure I understand what does the above say. Also does this
require two atomics? Seems pretty expensive. And why is it called
version?

> > +
> > +    used_idx: Store old index of used ring before we update used ring and
> > +    inflight buffer so that slave can know whether an odd version inflight
> > +    head-descriptor in inflight buffer is processed or not.

Here too.

> > +
> > +    avail_idx: Used to preserve the descriptor's order in avail ring so that
> > +    slave can resubmit descriptors in order.

Why would that be necessary?

> 
> Will a completely new "packed vring" inflight shm layout be necessary to
> support the packed vring layout in VIRTIO 1.1?
> 
> https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007

Probably.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-29  4:26     ` Michael S. Tsirkin
@ 2019-01-29  6:15       ` Yongji Xie
  2019-01-29 14:15         ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Yongji Xie @ 2019-01-29  6:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrangé,
	Jason Wang, Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohimes@gmail.com wrote:
> > > +typedef struct DescState {
> > > +    uint8_t inuse;
> > > +    uint8_t version;
> > > +    uint16_t used_idx;
> > > +    uint16_t avail_idx;
> > > +    uint16_t reserved;
> > > +} DescState;
> > > +
> > > +typedef struct QueueRegion {
> > > +    uint8_t valid;
>
> what's this?
>

We can use this to check whether this buffer is reset by qemu.

> > > +    uint16_t desc_num;
>
> there's padding before this field. Pls make it explicit.
>

Will do it.

> > > +    DescState desc[0];
> > > +} QueueRegion;
> > > +
> > > +The struct DescState is used to describe one head-descriptor's state. The
> > > +fields have following meanings:
> > > +
> > > +    inuse: Indicate whether the descriptor is inuse or not.
>
> inuse by what?
>

Maybe inflight is better?

> > > +
> > > +    version: Indicate whether we have an atomic update to used ring and
> > > +    inflight buffer when slave crash at that point. This field should be
> > > +    increased by one before and after this two updates. An odd version
> > > +    indicates an in-progress update.
>
> I'm not sure I understand what does the above say. Also does this
> require two atomics? Seems pretty expensive. And why is it called
> version?
>
> > > +
> > > +    used_idx: Store old index of used ring before we update used ring and
> > > +    inflight buffer so that slave can know whether an odd version inflight
> > > +    head-descriptor in inflight buffer is processed or not.
>
> Here too.
>

Sorry, the above description may be not clear. This two fields are
used to indicate whether we have an in-progress update to used ring
and inflight buffer. If slave crash before the update to used_ring and
after the update to inflight buffer, the version should be odd and
used_idx should be equal to used_ring.idx. Then we need to roll back
the update to inflight buffer. As for the name of the version filed,
actually I didn't find a good one, so I just copy it from struct
kvm_steal_time...

> > > +
> > > +    avail_idx: Used to preserve the descriptor's order in avail ring so that
> > > +    slave can resubmit descriptors in order.
>
> Why would that be necessary?
>

Maybe some devices will be able to use it to preserve order after
reconnecting in future?

> >
> > Will a completely new "packed vring" inflight shm layout be necessary to
> > support the packed vring layout in VIRTIO 1.1?
> >
> > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
>
> Probably.
>

How about supporting packed virtqueue in guest driver?

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-29  6:15       ` Yongji Xie
@ 2019-01-29 14:15         ` Michael S. Tsirkin
  2019-01-30  2:07           ` Yongji Xie
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2019-01-29 14:15 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrangé,
	Jason Wang, Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohimes@gmail.com wrote:
> > > > +typedef struct DescState {
> > > > +    uint8_t inuse;
> > > > +    uint8_t version;
> > > > +    uint16_t used_idx;
> > > > +    uint16_t avail_idx;
> > > > +    uint16_t reserved;
> > > > +} DescState;
> > > > +
> > > > +typedef struct QueueRegion {
> > > > +    uint8_t valid;
> >
> > what's this?
> >
> 
> We can use this to check whether this buffer is reset by qemu.


I'd use version here. Document that it's 1 currently.
Or do we want feature flags so we can support downgrades too?



> > > > +    uint16_t desc_num;
> >
> > there's padding before this field. Pls make it explicit.
> >
> 
> Will do it.
> 
> > > > +    DescState desc[0];
> > > > +} QueueRegion;
> > > > +
> > > > +The struct DescState is used to describe one head-descriptor's state. The
> > > > +fields have following meanings:
> > > > +
> > > > +    inuse: Indicate whether the descriptor is inuse or not.
> >
> > inuse by what?
> >
> 
> Maybe inflight is better?
> 
> > > > +
> > > > +    version: Indicate whether we have an atomic update to used ring and
> > > > +    inflight buffer when slave crash at that point. This field should be
> > > > +    increased by one before and after this two updates. An odd version
> > > > +    indicates an in-progress update.
> >
> > I'm not sure I understand what does the above say. Also does this
> > require two atomics? Seems pretty expensive. And why is it called
> > version?
> >
> > > > +
> > > > +    used_idx: Store old index of used ring before we update used ring and
> > > > +    inflight buffer so that slave can know whether an odd version inflight
> > > > +    head-descriptor in inflight buffer is processed or not.
> >
> > Here too.
> >
> 
> Sorry, the above description may be not clear. This two fields are
> used to indicate whether we have an in-progress update to used ring
> and inflight buffer. If slave crash before the update to used_ring and
> after the update to inflight buffer, the version should be odd and
> used_idx should be equal to used_ring.idx. Then we need to roll back
> the update to inflight buffer. As for the name of the version filed,
> actually I didn't find a good one, so I just copy it from struct
> kvm_steal_time...
> 
> > > > +
> > > > +    avail_idx: Used to preserve the descriptor's order in avail ring so that
> > > > +    slave can resubmit descriptors in order.
> >
> > Why would that be necessary?
> >
> 
> Maybe some devices will be able to use it to preserve order after
> reconnecting in future?

If buffers are used in order then old entries in the ring
are not overwritten so inflight tracking is not
necessary. This is exactly the case with vhost user net today.

> > >
> > > Will a completely new "packed vring" inflight shm layout be necessary to
> > > support the packed vring layout in VIRTIO 1.1?
> > >
> > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> >
> > Probably.
> >
> 
> How about supporting packed virtqueue in guest driver?
> 
> Thanks,
> Yongji

Depends on the guest right? Linux has it:


commit f959a128fe83090981add69aadc87a4e496e9369
Author: Tiwei Bie <tiwei.bie@intel.com>
Date:   Wed Nov 21 18:03:30 2018 +0800

    virtio_ring: advertize packed ring layout

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-29 14:15         ` Michael S. Tsirkin
@ 2019-01-30  2:07           ` Yongji Xie
  2019-01-30  2:30             ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Yongji Xie @ 2019-01-30  2:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrangé,
	Jason Wang, Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohimes@gmail.com wrote:
> > > > > +typedef struct DescState {
> > > > > +    uint8_t inuse;
> > > > > +    uint8_t version;
> > > > > +    uint16_t used_idx;
> > > > > +    uint16_t avail_idx;
> > > > > +    uint16_t reserved;
> > > > > +} DescState;
> > > > > +
> > > > > +typedef struct QueueRegion {
> > > > > +    uint8_t valid;
> > >
> > > what's this?
> > >
> >
> > We can use this to check whether this buffer is reset by qemu.
>
>
> I'd use version here. Document that it's 1 currently.

If we put version into the shared buffer, QEMU will reset it when vm
reset. Then if backend restart at the same time, the version of this
buffer will be lost. So I still let QEMU maintain it and send it back
through message's payload.

> Or do we want feature flags so we can support downgrades too?
>

We faced the same problem that the feature flags will be lost if QEMU
do not maintain it. So maybe we should put this into message's
payload?

>
> > > > > +    uint16_t desc_num;
> > >
> > > there's padding before this field. Pls make it explicit.
> > >
> >
> > Will do it.
> >
> > > > > +    DescState desc[0];
> > > > > +} QueueRegion;
> > > > > +
> > > > > +The struct DescState is used to describe one head-descriptor's state. The
> > > > > +fields have following meanings:
> > > > > +
> > > > > +    inuse: Indicate whether the descriptor is inuse or not.
> > >
> > > inuse by what?
> > >
> >
> > Maybe inflight is better?
> >
> > > > > +
> > > > > +    version: Indicate whether we have an atomic update to used ring and
> > > > > +    inflight buffer when slave crash at that point. This field should be
> > > > > +    increased by one before and after this two updates. An odd version
> > > > > +    indicates an in-progress update.
> > >
> > > I'm not sure I understand what does the above say. Also does this
> > > require two atomics? Seems pretty expensive. And why is it called
> > > version?
> > >
> > > > > +
> > > > > +    used_idx: Store old index of used ring before we update used ring and
> > > > > +    inflight buffer so that slave can know whether an odd version inflight
> > > > > +    head-descriptor in inflight buffer is processed or not.
> > >
> > > Here too.
> > >
> >
> > Sorry, the above description may be not clear. This two fields are
> > used to indicate whether we have an in-progress update to used ring
> > and inflight buffer. If slave crash before the update to used_ring and
> > after the update to inflight buffer, the version should be odd and
> > used_idx should be equal to used_ring.idx. Then we need to roll back
> > the update to inflight buffer. As for the name of the version filed,
> > actually I didn't find a good one, so I just copy it from struct
> > kvm_steal_time...
> >
> > > > > +
> > > > > +    avail_idx: Used to preserve the descriptor's order in avail ring so that
> > > > > +    slave can resubmit descriptors in order.
> > >
> > > Why would that be necessary?
> > >
> >
> > Maybe some devices will be able to use it to preserve order after
> > reconnecting in future?
>
> If buffers are used in order then old entries in the ring
> are not overwritten so inflight tracking is not
> necessary. This is exactly the case with vhost user net today.
>

OK, looks reasonable to me.

> > > >
> > > > Will a completely new "packed vring" inflight shm layout be necessary to
> > > > support the packed vring layout in VIRTIO 1.1?
> > > >
> > > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> > >
> > > Probably.
> > >
> >
> > How about supporting packed virtqueue in guest driver?
> >
> > Thanks,
> > Yongji
>
> Depends on the guest right? Linux has it:
>

Sorry, actually I mean I prefer to support inflight tracking for
packed virtqueue in guest driver. This feature is only used by legacy
virtio 1.0 or virtio 0.9 device. What do you think about it?

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting
  2019-01-22  8:31 [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (5 preceding siblings ...)
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 6/6] contrib/vhost-user-blk: enable inflight I/O tracking elohimes
@ 2019-01-30  2:29 ` Jason Wang
  2019-01-30  3:40   ` Michael S. Tsirkin
  6 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2019-01-30  2:29 UTC (permalink / raw)
  To: elohimes, mst, stefanha, marcandre.lureau, berrange,
	maxime.coquelin, yury-kotov, wrfsh
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji


On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patchset is aimed at supporting qemu to reconnect
> vhost-user-blk backend after vhost-user-blk backend crash or
> restart.
>
> The patch 1 introduces two new messages VHOST_USER_GET_INFLIGHT_FD
> and VHOST_USER_SET_INFLIGHT_FD to support support transferring shared
> buffer between qemu and backend.
>
> The patch 2,3 are the corresponding libvhost-user patches of
> patch 1. Make libvhost-user support VHOST_USER_GET_INFLIGHT_FD
> and VHOST_USER_SET_INFLIGHT_FD.
>
> The patch 4 allows vhost-user-blk to use the two new messages
> to get/set inflight buffer from/to backend.
>
> The patch 5 supports vhost-user-blk to reconnect backend when
> connection closed.
>
> The patch 6 introduces VHOST_USER_PROTOCOL_F_SLAVE_SHMFD
> to vhost-user-blk backend which is used to tell qemu that
> we support reconnecting now.
>
> This series is based on Daniel P. Berrangé's patchset:
>
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html
>
> To use it, we could start qemu with:
>
> qemu-system-x86_64 \
>          -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1, \
>          -device vhost-user-blk-pci,chardev=char0 \
>
> and start vhost-user-blk backend with:
>
> vhost-user-blk -b /path/file -s /path/vhost.socket
>
> Then we can restart vhost-user-blk at any time during VM running.
>
> V4 to V5:
> - Drop patch that enables "nowait" option on client sockets
> - Support resubmitting inflight I/O in order
> - Make inflight I/O tracking more robust
> - Remove align field and add queue size field in VhostUserInflight
> - Document more details in vhost-user.txt


I'm still not convinced about this approach. If the maintainer decide to 
merge, at least two things needs to be added besides the correctness of 
the code:

- you need prove that this approach can work for packed ring

-  an unit-test to test the crash during logging in-flight descriptor.

Thanks


>
> V3 to V4:
> - Drop messages VHOST_USER_GET_SHM_SIZE and VHOST_USER_SET_SHM_FD
> - Introduce two new messages VHOST_USER_GET_INFLIGHT_FD
>    and VHOST_USER_SET_INFLIGHT_FD
> - Allocate inflight buffer in backend rather than in qemu
> - Document a recommended format for inflight buffer
>
> V2 to V3:
> - Using exisiting wait/nowait options to control connection on
>    client sockets instead of introducing "disconnected" option.
> - Support the case that vhost-user backend restart during initialzation
>    of vhost-user-blk device.
>
> V1 to V2:
> - Introduce "disconnected" option for chardev instead of reuse "wait"
>    option
> - Support the case that QEMU starts before vhost-user backend
> - Drop message VHOST_USER_SET_VRING_INFLIGHT
> - Introduce two new messages VHOST_USER_GET_SHM_SIZE
>    and VHOST_USER_SET_SHM_FD
>
> Xie Yongji (6):
>    vhost-user: Support transferring inflight buffer between qemu and
>      backend
>    libvhost-user: Introduce vu_queue_map_desc()
>    libvhost-user: Support tracking inflight I/O in shared memory
>    vhost-user-blk: Add support to get/set inflight buffer
>    vhost-user-blk: Add support to reconnect backend
>    contrib/vhost-user-blk: enable inflight I/O tracking
>
>   Makefile                                |   2 +-
>   contrib/libvhost-user/libvhost-user.c   | 402 ++++++++++++++++++++----
>   contrib/libvhost-user/libvhost-user.h   |  40 +++
>   contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
>   docs/interop/vhost-user.txt             | 101 ++++++
>   hw/block/vhost-user-blk.c               | 227 ++++++++++---
>   hw/virtio/vhost-user.c                  | 110 +++++++
>   hw/virtio/vhost.c                       | 105 +++++++
>   include/hw/virtio/vhost-backend.h       |  10 +
>   include/hw/virtio/vhost-user-blk.h      |   5 +
>   include/hw/virtio/vhost.h               |  19 ++
>   11 files changed, 925 insertions(+), 99 deletions(-)
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-30  2:07           ` Yongji Xie
@ 2019-01-30  2:30             ` Michael S. Tsirkin
  2019-01-30  3:49               ` Yongji Xie
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2019-01-30  2:30 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrangé,
	Jason Wang, Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Wed, Jan 30, 2019 at 10:07:28AM +0800, Yongji Xie wrote:
> On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohimes@gmail.com wrote:
> > > > > > +typedef struct DescState {
> > > > > > +    uint8_t inuse;
> > > > > > +    uint8_t version;
> > > > > > +    uint16_t used_idx;
> > > > > > +    uint16_t avail_idx;
> > > > > > +    uint16_t reserved;
> > > > > > +} DescState;
> > > > > > +
> > > > > > +typedef struct QueueRegion {
> > > > > > +    uint8_t valid;
> > > >
> > > > what's this?
> > > >
> > >
> > > We can use this to check whether this buffer is reset by qemu.
> >
> >
> > I'd use version here. Document that it's 1 currently.
> 
> If we put version into the shared buffer, QEMU will reset it when vm
> reset. Then if backend restart at the same time, the version of this
> buffer will be lost. So I still let QEMU maintain it and send it back
> through message's payload.

I don't get it. If it's reset there's no contents.
If there's no contents then who cares what the version is?


> > Or do we want feature flags so we can support downgrades too?
> >
> 
> We faced the same problem that the feature flags will be lost if QEMU
> do not maintain it. So maybe we should put this into message's
> payload?

I don't understand why we care. We only maintain inflight so we
don't need to reset the device. if device is reset we don't
need to maintain them at all.

> >
> > > > > > +    uint16_t desc_num;
> > > >
> > > > there's padding before this field. Pls make it explicit.
> > > >
> > >
> > > Will do it.
> > >
> > > > > > +    DescState desc[0];
> > > > > > +} QueueRegion;
> > > > > > +
> > > > > > +The struct DescState is used to describe one head-descriptor's state. The
> > > > > > +fields have following meanings:
> > > > > > +
> > > > > > +    inuse: Indicate whether the descriptor is inuse or not.
> > > >
> > > > inuse by what?
> > > >
> > >
> > > Maybe inflight is better?
> > >
> > > > > > +
> > > > > > +    version: Indicate whether we have an atomic update to used ring and
> > > > > > +    inflight buffer when slave crash at that point. This field should be
> > > > > > +    increased by one before and after this two updates. An odd version
> > > > > > +    indicates an in-progress update.
> > > >
> > > > I'm not sure I understand what does the above say. Also does this
> > > > require two atomics? Seems pretty expensive. And why is it called
> > > > version?
> > > >
> > > > > > +
> > > > > > +    used_idx: Store old index of used ring before we update used ring and
> > > > > > +    inflight buffer so that slave can know whether an odd version inflight
> > > > > > +    head-descriptor in inflight buffer is processed or not.
> > > >
> > > > Here too.
> > > >
> > >
> > > Sorry, the above description may be not clear. This two fields are
> > > used to indicate whether we have an in-progress update to used ring
> > > and inflight buffer. If slave crash before the update to used_ring and
> > > after the update to inflight buffer, the version should be odd and
> > > used_idx should be equal to used_ring.idx. Then we need to roll back
> > > the update to inflight buffer. As for the name of the version filed,
> > > actually I didn't find a good one, so I just copy it from struct
> > > kvm_steal_time...
> > >
> > > > > > +
> > > > > > +    avail_idx: Used to preserve the descriptor's order in avail ring so that
> > > > > > +    slave can resubmit descriptors in order.
> > > >
> > > > Why would that be necessary?
> > > >
> > >
> > > Maybe some devices will be able to use it to preserve order after
> > > reconnecting in future?
> >
> > If buffers are used in order then old entries in the ring
> > are not overwritten so inflight tracking is not
> > necessary. This is exactly the case with vhost user net today.
> >
> 
> OK, looks reasonable to me.
> 
> > > > >
> > > > > Will a completely new "packed vring" inflight shm layout be necessary to
> > > > > support the packed vring layout in VIRTIO 1.1?
> > > > >
> > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> > > >
> > > > Probably.
> > > >
> > >
> > > How about supporting packed virtqueue in guest driver?
> > >
> > > Thanks,
> > > Yongji
> >
> > Depends on the guest right? Linux has it:
> >
> 
> Sorry, actually I mean I prefer to support inflight tracking for
> packed virtqueue in guest driver. This feature is only used by legacy
> virtio 1.0 or virtio 0.9 device. What do you think about it?
> 
> Thanks,
> Yongji

I don't see what does one have to do with the other.  Either we do or we
don't want to do it without downtime and retries. If we don't mind
resets then let's do it by making guest changes.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory
  2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory elohimes
@ 2019-01-30  2:31   ` Jason Wang
  2019-01-30  3:14     ` Michael S. Tsirkin
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Jason Wang @ 2019-01-30  2:31 UTC (permalink / raw)
  To: elohimes, mst, stefanha, marcandre.lureau, berrange,
	maxime.coquelin, yury-kotov, wrfsh
  Cc: nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji


On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
> +static int
> +vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
> +{
> +    if (!has_feature(dev->protocol_features,
> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> +        return 0;
> +    }
> +
> +    if (unlikely(!vq->inflight)) {
> +        return -1;
> +    }
> +
> +    vq->inflight->desc[desc_idx].inuse = 1;
> +
> +    vq->inflight->desc[desc_idx].avail_idx = vq->last_avail_idx;
> +
> +    return 0;
> +}
> +
> +static int
> +vu_queue_inflight_pre_put(VuDev *dev, VuVirtq *vq, int desc_idx)
> +{
> +    if (!has_feature(dev->protocol_features,
> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> +        return 0;
> +    }
> +
> +    if (unlikely(!vq->inflight)) {
> +        return -1;
> +    }
> +
> +    vq->inflight->desc[desc_idx].used_idx = vq->used_idx;
> +
> +    barrier();
> +
> +    vq->inflight->desc[desc_idx].version++;
> +
> +    return 0;
> +}


You probably need WRITE_ONCE() semantic (e.g volatile) to make sure the 
value reach memory.

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory
  2019-01-30  2:31   ` Jason Wang
@ 2019-01-30  3:14     ` Michael S. Tsirkin
  2019-01-30  9:52       ` Jason Wang
  2019-01-30  3:58     ` Yongji Xie
  2019-01-30  5:48     ` Yongji Xie
  2 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2019-01-30  3:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: elohimes, stefanha, marcandre.lureau, berrange, maxime.coquelin,
	yury-kotov, wrfsh, nixun, qemu-devel, lilin24, zhangyu31,
	chaiwen, Xie Yongji

On Wed, Jan 30, 2019 at 10:31:49AM +0800, Jason Wang wrote:
> 
> On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
> > +static int
> > +vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
> > +{
> > +    if (!has_feature(dev->protocol_features,
> > +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> > +        return 0;
> > +    }
> > +
> > +    if (unlikely(!vq->inflight)) {
> > +        return -1;
> > +    }
> > +
> > +    vq->inflight->desc[desc_idx].inuse = 1;
> > +
> > +    vq->inflight->desc[desc_idx].avail_idx = vq->last_avail_idx;
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +vu_queue_inflight_pre_put(VuDev *dev, VuVirtq *vq, int desc_idx)
> > +{
> > +    if (!has_feature(dev->protocol_features,
> > +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> > +        return 0;
> > +    }
> > +
> > +    if (unlikely(!vq->inflight)) {
> > +        return -1;
> > +    }
> > +
> > +    vq->inflight->desc[desc_idx].used_idx = vq->used_idx;
> > +
> > +    barrier();
> > +
> > +    vq->inflight->desc[desc_idx].version++;
> > +
> > +    return 0;
> > +}
> 
> 
> You probably need WRITE_ONCE() semantic (e.g volatile) to make sure the
> value reach memory.
> 
> Thanks
> 

WRITE_ONCE is literally volatile + dependency memory barrier.
So unless compiler is a very agressive one, it does not
buy you much.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting
  2019-01-30  2:29 ` [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting Jason Wang
@ 2019-01-30  3:40   ` Michael S. Tsirkin
  0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2019-01-30  3:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: elohimes, stefanha, marcandre.lureau, berrange, maxime.coquelin,
	yury-kotov, wrfsh, qemu-devel, zhangyu31, chaiwen, nixun,
	lilin24, Xie Yongji

On Wed, Jan 30, 2019 at 10:29:10AM +0800, Jason Wang wrote:
> 
> On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> > 
> > This patchset is aimed at supporting qemu to reconnect
> > vhost-user-blk backend after vhost-user-blk backend crash or
> > restart.
> > 
> > The patch 1 introduces two new messages VHOST_USER_GET_INFLIGHT_FD
> > and VHOST_USER_SET_INFLIGHT_FD to support support transferring shared
> > buffer between qemu and backend.
> > 
> > The patch 2,3 are the corresponding libvhost-user patches of
> > patch 1. Make libvhost-user support VHOST_USER_GET_INFLIGHT_FD
> > and VHOST_USER_SET_INFLIGHT_FD.
> > 
> > The patch 4 allows vhost-user-blk to use the two new messages
> > to get/set inflight buffer from/to backend.
> > 
> > The patch 5 supports vhost-user-blk to reconnect backend when
> > connection closed.
> > 
> > The patch 6 introduces VHOST_USER_PROTOCOL_F_SLAVE_SHMFD
> > to vhost-user-blk backend which is used to tell qemu that
> > we support reconnecting now.
> > 
> > This series is based on Daniel P. Berrangé's patchset:
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html
> > 
> > To use it, we could start qemu with:
> > 
> > qemu-system-x86_64 \
> >          -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1, \
> >          -device vhost-user-blk-pci,chardev=char0 \
> > 
> > and start vhost-user-blk backend with:
> > 
> > vhost-user-blk -b /path/file -s /path/vhost.socket
> > 
> > Then we can restart vhost-user-blk at any time during VM running.
> > 
> > V4 to V5:
> > - Drop patch that enables "nowait" option on client sockets
> > - Support resubmitting inflight I/O in order
> > - Make inflight I/O tracking more robust
> > - Remove align field and add queue size field in VhostUserInflight
> > - Document more details in vhost-user.txt
> 
> 
> I'm still not convinced about this approach. If the maintainer decide to
> merge, at least two things needs to be added besides the correctness of the
> code:
> 
> - you need prove that this approach can work for packed ring

Or rather document how it's used.

> -  an unit-test to test the crash during logging in-flight descriptor.
> 
> Thanks

For contrib/vhost-user-blk? Well I don't think it's used by unit
tests right now, it is? So it's a worthwhile goal but not
necessarily a requirement for any specific patch I think.


> 
> > 
> > V3 to V4:
> > - Drop messages VHOST_USER_GET_SHM_SIZE and VHOST_USER_SET_SHM_FD
> > - Introduce two new messages VHOST_USER_GET_INFLIGHT_FD
> >    and VHOST_USER_SET_INFLIGHT_FD
> > - Allocate inflight buffer in backend rather than in qemu
> > - Document a recommended format for inflight buffer
> > 
> > V2 to V3:
> > - Using exisiting wait/nowait options to control connection on
> >    client sockets instead of introducing "disconnected" option.
> > - Support the case that vhost-user backend restart during initialzation
> >    of vhost-user-blk device.
> > 
> > V1 to V2:
> > - Introduce "disconnected" option for chardev instead of reuse "wait"
> >    option
> > - Support the case that QEMU starts before vhost-user backend
> > - Drop message VHOST_USER_SET_VRING_INFLIGHT
> > - Introduce two new messages VHOST_USER_GET_SHM_SIZE
> >    and VHOST_USER_SET_SHM_FD
> > 
> > Xie Yongji (6):
> >    vhost-user: Support transferring inflight buffer between qemu and
> >      backend
> >    libvhost-user: Introduce vu_queue_map_desc()
> >    libvhost-user: Support tracking inflight I/O in shared memory
> >    vhost-user-blk: Add support to get/set inflight buffer
> >    vhost-user-blk: Add support to reconnect backend
> >    contrib/vhost-user-blk: enable inflight I/O tracking
> > 
> >   Makefile                                |   2 +-
> >   contrib/libvhost-user/libvhost-user.c   | 402 ++++++++++++++++++++----
> >   contrib/libvhost-user/libvhost-user.h   |  40 +++
> >   contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
> >   docs/interop/vhost-user.txt             | 101 ++++++
> >   hw/block/vhost-user-blk.c               | 227 ++++++++++---
> >   hw/virtio/vhost-user.c                  | 110 +++++++
> >   hw/virtio/vhost.c                       | 105 +++++++
> >   include/hw/virtio/vhost-backend.h       |  10 +
> >   include/hw/virtio/vhost-user-blk.h      |   5 +
> >   include/hw/virtio/vhost.h               |  19 ++
> >   11 files changed, 925 insertions(+), 99 deletions(-)
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-30  2:30             ` Michael S. Tsirkin
@ 2019-01-30  3:49               ` Yongji Xie
  2019-01-30  4:07                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Yongji Xie @ 2019-01-30  3:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrangé,
	Jason Wang, Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Wed, 30 Jan 2019 at 10:30, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 30, 2019 at 10:07:28AM +0800, Yongji Xie wrote:
> > On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > > > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohimes@gmail.com wrote:
> > > > > > > +typedef struct DescState {
> > > > > > > +    uint8_t inuse;
> > > > > > > +    uint8_t version;
> > > > > > > +    uint16_t used_idx;
> > > > > > > +    uint16_t avail_idx;
> > > > > > > +    uint16_t reserved;
> > > > > > > +} DescState;
> > > > > > > +
> > > > > > > +typedef struct QueueRegion {
> > > > > > > +    uint8_t valid;
> > > > >
> > > > > what's this?
> > > > >
> > > >
> > > > We can use this to check whether this buffer is reset by qemu.
> > >
> > >
> > > I'd use version here. Document that it's 1 currently.
> >
> > If we put version into the shared buffer, QEMU will reset it when vm
> > reset. Then if backend restart at the same time, the version of this
> > buffer will be lost. So I still let QEMU maintain it and send it back
> > through message's payload.
>
> I don't get it. If it's reset there's no contents.
> If there's no contents then who cares what the version is?
>

What I thought before is that we should not update the buffer when vm
reset. But now seems like it's unnecessary. I will update this patch
as you said. Thank you!

> > > Or do we want feature flags so we can support downgrades too?
> > >
> >
> > We faced the same problem that the feature flags will be lost if QEMU
> > do not maintain it. So maybe we should put this into message's
> > payload?
>
> I don't understand why we care. We only maintain inflight so we
> don't need to reset the device. if device is reset we don't
> need to maintain them at all.
>
> > >
> > > > > > > +    uint16_t desc_num;
> > > > >
> > > > > there's padding before this field. Pls make it explicit.
> > > > >
> > > >
> > > > Will do it.
> > > >
> > > > > > > +    DescState desc[0];
> > > > > > > +} QueueRegion;
> > > > > > > +
> > > > > > > +The struct DescState is used to describe one head-descriptor's state. The
> > > > > > > +fields have following meanings:
> > > > > > > +
> > > > > > > +    inuse: Indicate whether the descriptor is inuse or not.
> > > > >
> > > > > inuse by what?
> > > > >
> > > >
> > > > Maybe inflight is better?
> > > >
> > > > > > > +
> > > > > > > +    version: Indicate whether we have an atomic update to used ring and
> > > > > > > +    inflight buffer when slave crash at that point. This field should be
> > > > > > > +    increased by one before and after this two updates. An odd version
> > > > > > > +    indicates an in-progress update.
> > > > >
> > > > > I'm not sure I understand what does the above say. Also does this
> > > > > require two atomics? Seems pretty expensive. And why is it called
> > > > > version?
> > > > >
> > > > > > > +
> > > > > > > +    used_idx: Store old index of used ring before we update used ring and
> > > > > > > +    inflight buffer so that slave can know whether an odd version inflight
> > > > > > > +    head-descriptor in inflight buffer is processed or not.
> > > > >
> > > > > Here too.
> > > > >
> > > >
> > > > Sorry, the above description may be not clear. This two fields are
> > > > used to indicate whether we have an in-progress update to used ring
> > > > and inflight buffer. If slave crash before the update to used_ring and
> > > > after the update to inflight buffer, the version should be odd and
> > > > used_idx should be equal to used_ring.idx. Then we need to roll back
> > > > the update to inflight buffer. As for the name of the version filed,
> > > > actually I didn't find a good one, so I just copy it from struct
> > > > kvm_steal_time...
> > > >
> > > > > > > +
> > > > > > > +    avail_idx: Used to preserve the descriptor's order in avail ring so that
> > > > > > > +    slave can resubmit descriptors in order.
> > > > >
> > > > > Why would that be necessary?
> > > > >
> > > >
> > > > Maybe some devices will be able to use it to preserve order after
> > > > reconnecting in future?
> > >
> > > If buffers are used in order then old entries in the ring
> > > are not overwritten so inflight tracking is not
> > > necessary. This is exactly the case with vhost user net today.
> > >
> >
> > OK, looks reasonable to me.
> >
> > > > > >
> > > > > > Will a completely new "packed vring" inflight shm layout be necessary to
> > > > > > support the packed vring layout in VIRTIO 1.1?
> > > > > >
> > > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> > > > >
> > > > > Probably.
> > > > >
> > > >
> > > > How about supporting packed virtqueue in guest driver?
> > > >
> > > > Thanks,
> > > > Yongji
> > >
> > > Depends on the guest right? Linux has it:
> > >
> >
> > Sorry, actually I mean I prefer to support inflight tracking for
> > packed virtqueue in guest driver. This feature is only used by legacy
> > virtio 1.0 or virtio 0.9 device. What do you think about it?
> >
> > Thanks,
> > Yongji
>
> I don't see what does one have to do with the other.  Either we do or we
> don't want to do it without downtime and retries. If we don't mind
> resets then let's do it by making guest changes.
>

OK, I see. But seems like now we don't support packed virtqueue in
qemu. Is it better to do that in a splited patchset? Add packed
virtqueue implenment in contrib/libvhost-user.c, then document the
inflight I/O tracking behavior.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory
  2019-01-30  2:31   ` Jason Wang
  2019-01-30  3:14     ` Michael S. Tsirkin
@ 2019-01-30  3:58     ` Yongji Xie
  2019-02-01  2:26       ` Jason Wang
  2019-01-30  5:48     ` Yongji Xie
  2 siblings, 1 reply; 26+ messages in thread
From: Yongji Xie @ 2019-01-30  3:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrangé,
	Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji

On Wed, 30 Jan 2019 at 10:32, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
> > +static int
> > +vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
> > +{
> > +    if (!has_feature(dev->protocol_features,
> > +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> > +        return 0;
> > +    }
> > +
> > +    if (unlikely(!vq->inflight)) {
> > +        return -1;
> > +    }
> > +
> > +    vq->inflight->desc[desc_idx].inuse = 1;
> > +
> > +    vq->inflight->desc[desc_idx].avail_idx = vq->last_avail_idx;
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +vu_queue_inflight_pre_put(VuDev *dev, VuVirtq *vq, int desc_idx)
> > +{
> > +    if (!has_feature(dev->protocol_features,
> > +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> > +        return 0;
> > +    }
> > +
> > +    if (unlikely(!vq->inflight)) {
> > +        return -1;
> > +    }
> > +
> > +    vq->inflight->desc[desc_idx].used_idx = vq->used_idx;
> > +
> > +    barrier();
> > +
> > +    vq->inflight->desc[desc_idx].version++;
> > +
> > +    return 0;
> > +}
>
>
> You probably need WRITE_ONCE() semantic (e.g volatile) to make sure the
> value reach memory.
>

Is it enough to declare those variables as volatile?

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-30  3:49               ` Yongji Xie
@ 2019-01-30  4:07                 ` Michael S. Tsirkin
  2019-01-30  4:11                   ` Yongji Xie
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2019-01-30  4:07 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrangé,
	Jason Wang, Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Wed, Jan 30, 2019 at 11:49:56AM +0800, Yongji Xie wrote:
> On Wed, 30 Jan 2019 at 10:30, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 10:07:28AM +0800, Yongji Xie wrote:
> > > On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > > > > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohimes@gmail.com wrote:
> > > > > > > > +typedef struct DescState {
> > > > > > > > +    uint8_t inuse;
> > > > > > > > +    uint8_t version;
> > > > > > > > +    uint16_t used_idx;
> > > > > > > > +    uint16_t avail_idx;
> > > > > > > > +    uint16_t reserved;
> > > > > > > > +} DescState;
> > > > > > > > +
> > > > > > > > +typedef struct QueueRegion {
> > > > > > > > +    uint8_t valid;
> > > > > >
> > > > > > what's this?
> > > > > >
> > > > >
> > > > > We can use this to check whether this buffer is reset by qemu.
> > > >
> > > >
> > > > I'd use version here. Document that it's 1 currently.
> > >
> > > If we put version into the shared buffer, QEMU will reset it when vm
> > > reset. Then if backend restart at the same time, the version of this
> > > buffer will be lost. So I still let QEMU maintain it and send it back
> > > through message's payload.
> >
> > I don't get it. If it's reset there's no contents.
> > If there's no contents then who cares what the version is?
> >
> 
> What I thought before is that we should not update the buffer when vm
> reset. But now seems like it's unnecessary. I will update this patch
> as you said. Thank you!
> 
> > > > Or do we want feature flags so we can support downgrades too?
> > > >
> > >
> > > We faced the same problem that the feature flags will be lost if QEMU
> > > do not maintain it. So maybe we should put this into message's
> > > payload?
> >
> > I don't understand why we care. We only maintain inflight so we
> > don't need to reset the device. if device is reset we don't
> > need to maintain them at all.
> >
> > > >
> > > > > > > > +    uint16_t desc_num;
> > > > > >
> > > > > > there's padding before this field. Pls make it explicit.
> > > > > >
> > > > >
> > > > > Will do it.
> > > > >
> > > > > > > > +    DescState desc[0];
> > > > > > > > +} QueueRegion;
> > > > > > > > +
> > > > > > > > +The struct DescState is used to describe one head-descriptor's state. The
> > > > > > > > +fields have following meanings:
> > > > > > > > +
> > > > > > > > +    inuse: Indicate whether the descriptor is inuse or not.
> > > > > >
> > > > > > inuse by what?
> > > > > >
> > > > >
> > > > > Maybe inflight is better?
> > > > >
> > > > > > > > +
> > > > > > > > +    version: Indicate whether we have an atomic update to used ring and
> > > > > > > > +    inflight buffer when slave crash at that point. This field should be
> > > > > > > > +    increased by one before and after this two updates. An odd version
> > > > > > > > +    indicates an in-progress update.
> > > > > >
> > > > > > I'm not sure I understand what does the above say. Also does this
> > > > > > require two atomics? Seems pretty expensive. And why is it called
> > > > > > version?
> > > > > >
> > > > > > > > +
> > > > > > > > +    used_idx: Store old index of used ring before we update used ring and
> > > > > > > > +    inflight buffer so that slave can know whether an odd version inflight
> > > > > > > > +    head-descriptor in inflight buffer is processed or not.
> > > > > >
> > > > > > Here too.
> > > > > >
> > > > >
> > > > > Sorry, the above description may be not clear. This two fields are
> > > > > used to indicate whether we have an in-progress update to used ring
> > > > > and inflight buffer. If slave crash before the update to used_ring and
> > > > > after the update to inflight buffer, the version should be odd and
> > > > > used_idx should be equal to used_ring.idx. Then we need to roll back
> > > > > the update to inflight buffer. As for the name of the version filed,
> > > > > actually I didn't find a good one, so I just copy it from struct
> > > > > kvm_steal_time...
> > > > >
> > > > > > > > +
> > > > > > > > +    avail_idx: Used to preserve the descriptor's order in avail ring so that
> > > > > > > > +    slave can resubmit descriptors in order.
> > > > > >
> > > > > > Why would that be necessary?
> > > > > >
> > > > >
> > > > > Maybe some devices will be able to use it to preserve order after
> > > > > reconnecting in future?
> > > >
> > > > If buffers are used in order then old entries in the ring
> > > > are not overwritten so inflight tracking is not
> > > > necessary. This is exactly the case with vhost user net today.
> > > >
> > >
> > > OK, looks reasonable to me.
> > >
> > > > > > >
> > > > > > > Will a completely new "packed vring" inflight shm layout be necessary to
> > > > > > > support the packed vring layout in VIRTIO 1.1?
> > > > > > >
> > > > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> > > > > >
> > > > > > Probably.
> > > > > >
> > > > >
> > > > > How about supporting packed virtqueue in guest driver?
> > > > >
> > > > > Thanks,
> > > > > Yongji
> > > >
> > > > Depends on the guest right? Linux has it:
> > > >
> > >
> > > Sorry, actually I mean I prefer to support inflight tracking for
> > > packed virtqueue in guest driver. This feature is only used by legacy
> > > virtio 1.0 or virtio 0.9 device. What do you think about it?
> > >
> > > Thanks,
> > > Yongji
> >
> > I don't see what does one have to do with the other.  Either we do or we
> > don't want to do it without downtime and retries. If we don't mind
> > resets then let's do it by making guest changes.
> >
> 
> OK, I see. But seems like now we don't support packed virtqueue in
> qemu. Is it better to do that in a splited patchset? Add packed
> virtqueue implenment in contrib/libvhost-user.c, then document the
> inflight I/O tracking behavior.
> 
> Thanks,
> Yongji

it's just documentation so it does not have to depend
on actual code.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend
  2019-01-30  4:07                 ` Michael S. Tsirkin
@ 2019-01-30  4:11                   ` Yongji Xie
  0 siblings, 0 replies; 26+ messages in thread
From: Yongji Xie @ 2019-01-30  4:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, Marc-André Lureau, Daniel P. Berrangé,
	Jason Wang, Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Wed, 30 Jan 2019 at 12:07, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 30, 2019 at 11:49:56AM +0800, Yongji Xie wrote:
> > On Wed, 30 Jan 2019 at 10:30, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jan 30, 2019 at 10:07:28AM +0800, Yongji Xie wrote:
> > > > On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > > > > > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > > > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohimes@gmail.com wrote:
> > > > > > > > > +typedef struct DescState {
> > > > > > > > > +    uint8_t inuse;
> > > > > > > > > +    uint8_t version;
> > > > > > > > > +    uint16_t used_idx;
> > > > > > > > > +    uint16_t avail_idx;
> > > > > > > > > +    uint16_t reserved;
> > > > > > > > > +} DescState;
> > > > > > > > > +
> > > > > > > > > +typedef struct QueueRegion {
> > > > > > > > > +    uint8_t valid;
> > > > > > >
> > > > > > > what's this?
> > > > > > >
> > > > > >
> > > > > > We can use this to check whether this buffer is reset by qemu.
> > > > >
> > > > >
> > > > > I'd use version here. Document that it's 1 currently.
> > > >
> > > > If we put version into the shared buffer, QEMU will reset it when vm
> > > > reset. Then if backend restart at the same time, the version of this
> > > > buffer will be lost. So I still let QEMU maintain it and send it back
> > > > through message's payload.
> > >
> > > I don't get it. If it's reset there's no contents.
> > > If there's no contents then who cares what the version is?
> > >
> >
> > What I thought before is that we should not update the buffer when vm
> > reset. But now seems like it's unnecessary. I will update this patch
> > as you said. Thank you!
> >
> > > > > Or do we want feature flags so we can support downgrades too?
> > > > >
> > > >
> > > > We faced the same problem that the feature flags will be lost if QEMU
> > > > do not maintain it. So maybe we should put this into message's
> > > > payload?
> > >
> > > I don't understand why we care. We only maintain inflight so we
> > > don't need to reset the device. if device is reset we don't
> > > need to maintain them at all.
> > >
> > > > >
> > > > > > > > > +    uint16_t desc_num;
> > > > > > >
> > > > > > > there's padding before this field. Pls make it explicit.
> > > > > > >
> > > > > >
> > > > > > Will do it.
> > > > > >
> > > > > > > > > +    DescState desc[0];
> > > > > > > > > +} QueueRegion;
> > > > > > > > > +
> > > > > > > > > +The struct DescState is used to describe one head-descriptor's state. The
> > > > > > > > > +fields have following meanings:
> > > > > > > > > +
> > > > > > > > > +    inuse: Indicate whether the descriptor is inuse or not.
> > > > > > >
> > > > > > > inuse by what?
> > > > > > >
> > > > > >
> > > > > > Maybe inflight is better?
> > > > > >
> > > > > > > > > +
> > > > > > > > > +    version: Indicate whether we have an atomic update to used ring and
> > > > > > > > > +    inflight buffer when slave crash at that point. This field should be
> > > > > > > > > +    increased by one before and after this two updates. An odd version
> > > > > > > > > +    indicates an in-progress update.
> > > > > > >
> > > > > > > I'm not sure I understand what does the above say. Also does this
> > > > > > > require two atomics? Seems pretty expensive. And why is it called
> > > > > > > version?
> > > > > > >
> > > > > > > > > +
> > > > > > > > > +    used_idx: Store old index of used ring before we update used ring and
> > > > > > > > > +    inflight buffer so that slave can know whether an odd version inflight
> > > > > > > > > +    head-descriptor in inflight buffer is processed or not.
> > > > > > >
> > > > > > > Here too.
> > > > > > >
> > > > > >
> > > > > > Sorry, the above description may be not clear. This two fields are
> > > > > > used to indicate whether we have an in-progress update to used ring
> > > > > > and inflight buffer. If slave crash before the update to used_ring and
> > > > > > after the update to inflight buffer, the version should be odd and
> > > > > > used_idx should be equal to used_ring.idx. Then we need to roll back
> > > > > > the update to inflight buffer. As for the name of the version filed,
> > > > > > actually I didn't find a good one, so I just copy it from struct
> > > > > > kvm_steal_time...
> > > > > >
> > > > > > > > > +
> > > > > > > > > +    avail_idx: Used to preserve the descriptor's order in avail ring so that
> > > > > > > > > +    slave can resubmit descriptors in order.
> > > > > > >
> > > > > > > Why would that be necessary?
> > > > > > >
> > > > > >
> > > > > > Maybe some devices will be able to use it to preserve order after
> > > > > > reconnecting in future?
> > > > >
> > > > > If buffers are used in order then old entries in the ring
> > > > > are not overwritten so inflight tracking is not
> > > > > necessary. This is exactly the case with vhost user net today.
> > > > >
> > > >
> > > > OK, looks reasonable to me.
> > > >
> > > > > > > >
> > > > > > > > Will a completely new "packed vring" inflight shm layout be necessary to
> > > > > > > > support the packed vring layout in VIRTIO 1.1?
> > > > > > > >
> > > > > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> > > > > > >
> > > > > > > Probably.
> > > > > > >
> > > > > >
> > > > > > How about supporting packed virtqueue in guest driver?
> > > > > >
> > > > > > Thanks,
> > > > > > Yongji
> > > > >
> > > > > Depends on the guest right? Linux has it:
> > > > >
> > > >
> > > > Sorry, actually I mean I prefer to support inflight tracking for
> > > > packed virtqueue in guest driver. This feature is only used by legacy
> > > > virtio 1.0 or virtio 0.9 device. What do you think about it?
> > > >
> > > > Thanks,
> > > > Yongji
> > >
> > > I don't see what does one have to do with the other.  Either we do or we
> > > don't want to do it without downtime and retries. If we don't mind
> > > resets then let's do it by making guest changes.
> > >
> >
> > OK, I see. But seems like now we don't support packed virtqueue in
> > qemu. Is it better to do that in a splited patchset? Add packed
> > virtqueue implenment in contrib/libvhost-user.c, then document the
> > inflight I/O tracking behavior.
> >
> > Thanks,
> > Yongji
>
> it's just documentation so it does not have to depend
> on actual code.

I get it.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory
  2019-01-30  2:31   ` Jason Wang
  2019-01-30  3:14     ` Michael S. Tsirkin
  2019-01-30  3:58     ` Yongji Xie
@ 2019-01-30  5:48     ` Yongji Xie
  2019-02-01  2:27       ` Jason Wang
  2 siblings, 1 reply; 26+ messages in thread
From: Yongji Xie @ 2019-01-30  5:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrangé,
	Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji

On Wed, 30 Jan 2019 at 10:32, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
> > +static int
> > +vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
> > +{
> > +    if (!has_feature(dev->protocol_features,
> > +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> > +        return 0;
> > +    }
> > +
> > +    if (unlikely(!vq->inflight)) {
> > +        return -1;
> > +    }
> > +
> > +    vq->inflight->desc[desc_idx].inuse = 1;
> > +
> > +    vq->inflight->desc[desc_idx].avail_idx = vq->last_avail_idx;
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +vu_queue_inflight_pre_put(VuDev *dev, VuVirtq *vq, int desc_idx)
> > +{
> > +    if (!has_feature(dev->protocol_features,
> > +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> > +        return 0;
> > +    }
> > +
> > +    if (unlikely(!vq->inflight)) {
> > +        return -1;
> > +    }
> > +
> > +    vq->inflight->desc[desc_idx].used_idx = vq->used_idx;
> > +
> > +    barrier();
> > +
> > +    vq->inflight->desc[desc_idx].version++;
> > +
> > +    return 0;
> > +}
>
>
> You probably need WRITE_ONCE() semantic (e.g volatile) to make sure the
> value reach memory.
>

The cache line should have been flushed during crash. So we can see
the correct value when backend reconnecting. If so, compile barrier
should be enough here, right?

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory
  2019-01-30  3:14     ` Michael S. Tsirkin
@ 2019-01-30  9:52       ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2019-01-30  9:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: elohimes, stefanha, marcandre.lureau, berrange, maxime.coquelin,
	yury-kotov, wrfsh, nixun, qemu-devel, lilin24, zhangyu31,
	chaiwen, Xie Yongji


On 2019/1/30 上午11:14, Michael S. Tsirkin wrote:
> On Wed, Jan 30, 2019 at 10:31:49AM +0800, Jason Wang wrote:
>> On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
>>> +static int
>>> +vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
>>> +{
>>> +    if (!has_feature(dev->protocol_features,
>>> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (unlikely(!vq->inflight)) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    vq->inflight->desc[desc_idx].inuse = 1;
>>> +
>>> +    vq->inflight->desc[desc_idx].avail_idx = vq->last_avail_idx;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +vu_queue_inflight_pre_put(VuDev *dev, VuVirtq *vq, int desc_idx)
>>> +{
>>> +    if (!has_feature(dev->protocol_features,
>>> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (unlikely(!vq->inflight)) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    vq->inflight->desc[desc_idx].used_idx = vq->used_idx;
>>> +
>>> +    barrier();
>>> +
>>> +    vq->inflight->desc[desc_idx].version++;
>>> +
>>> +    return 0;
>>> +}
>>
>> You probably need WRITE_ONCE() semantic (e.g volatile) to make sure the
>> value reach memory.
>>
>> Thanks
>>
> WRITE_ONCE is literally volatile + dependency memory barrier.
> So unless compiler is a very agressive one, it does not
> buy you much.


Well, since version is increased twice, if compiler decide the inline 
both vu_queue_inflight_pre_put() and vu_queue_inflight_post_put(), can 
we make sure it always generate instructions that write to memory 
instead of registers?

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory
  2019-01-30  3:58     ` Yongji Xie
@ 2019-02-01  2:26       ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2019-02-01  2:26 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrangé,
	Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji


On 2019/1/30 上午11:58, Yongji Xie wrote:
> On Wed, 30 Jan 2019 at 10:32, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
>>> +static int
>>> +vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
>>> +{
>>> +    if (!has_feature(dev->protocol_features,
>>> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (unlikely(!vq->inflight)) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    vq->inflight->desc[desc_idx].inuse = 1;
>>> +
>>> +    vq->inflight->desc[desc_idx].avail_idx = vq->last_avail_idx;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +vu_queue_inflight_pre_put(VuDev *dev, VuVirtq *vq, int desc_idx)
>>> +{
>>> +    if (!has_feature(dev->protocol_features,
>>> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (unlikely(!vq->inflight)) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    vq->inflight->desc[desc_idx].used_idx = vq->used_idx;
>>> +
>>> +    barrier();
>>> +
>>> +    vq->inflight->desc[desc_idx].version++;
>>> +
>>> +    return 0;
>>> +}
>>
>> You probably need WRITE_ONCE() semantic (e.g volatile) to make sure the
>> value reach memory.
>>
> Is it enough to declare those variables as volatile?
>
> Thanks,
> Yongji


I think so.

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory
  2019-01-30  5:48     ` Yongji Xie
@ 2019-02-01  2:27       ` Jason Wang
  2019-02-05  1:37         ` Yongji Xie
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2019-02-01  2:27 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrangé,
	Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji


On 2019/1/30 下午1:48, Yongji Xie wrote:
> On Wed, 30 Jan 2019 at 10:32, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
>>> +static int
>>> +vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
>>> +{
>>> +    if (!has_feature(dev->protocol_features,
>>> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (unlikely(!vq->inflight)) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    vq->inflight->desc[desc_idx].inuse = 1;
>>> +
>>> +    vq->inflight->desc[desc_idx].avail_idx = vq->last_avail_idx;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +vu_queue_inflight_pre_put(VuDev *dev, VuVirtq *vq, int desc_idx)
>>> +{
>>> +    if (!has_feature(dev->protocol_features,
>>> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (unlikely(!vq->inflight)) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    vq->inflight->desc[desc_idx].used_idx = vq->used_idx;
>>> +
>>> +    barrier();
>>> +
>>> +    vq->inflight->desc[desc_idx].version++;
>>> +
>>> +    return 0;
>>> +}
>>
>> You probably need WRITE_ONCE() semantic (e.g volatile) to make sure the
>> value reach memory.
>>
> The cache line should have been flushed during crash. So we can see
> the correct value when backend reconnecting. If so, compile barrier
> should be enough here, right?


Maybe I worry too much but it's not about flushing cache, but about 
whether or not compiler can generate mov to memory instead of mov to 
registers.

Thanks


>
> Thanks,
> Yongji

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory
  2019-02-01  2:27       ` Jason Wang
@ 2019-02-05  1:37         ` Yongji Xie
  0 siblings, 0 replies; 26+ messages in thread
From: Yongji Xie @ 2019-02-05  1:37 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, Marc-André Lureau,
	Daniel P. Berrangé,
	Coquelin, Maxime, Yury Kotov,
	Евгений
	Яковлев,
	nixun, qemu-devel, lilin24, zhangyu31, chaiwen, Xie Yongji

On Fri, 1 Feb 2019 at 10:28, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2019/1/30 下午1:48, Yongji Xie wrote:
> > On Wed, 30 Jan 2019 at 10:32, Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2019/1/22 下午4:31, elohimes@gmail.com wrote:
> >>> +static int
> >>> +vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
> >>> +{
> >>> +    if (!has_feature(dev->protocol_features,
> >>> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    if (unlikely(!vq->inflight)) {
> >>> +        return -1;
> >>> +    }
> >>> +
> >>> +    vq->inflight->desc[desc_idx].inuse = 1;
> >>> +
> >>> +    vq->inflight->desc[desc_idx].avail_idx = vq->last_avail_idx;
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static int
> >>> +vu_queue_inflight_pre_put(VuDev *dev, VuVirtq *vq, int desc_idx)
> >>> +{
> >>> +    if (!has_feature(dev->protocol_features,
> >>> +        VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    if (unlikely(!vq->inflight)) {
> >>> +        return -1;
> >>> +    }
> >>> +
> >>> +    vq->inflight->desc[desc_idx].used_idx = vq->used_idx;
> >>> +
> >>> +    barrier();
> >>> +
> >>> +    vq->inflight->desc[desc_idx].version++;
> >>> +
> >>> +    return 0;
> >>> +}
> >>
> >> You probably need WRITE_ONCE() semantic (e.g volatile) to make sure the
> >> value reach memory.
> >>
> > The cache line should have been flushed during crash. So we can see
> > the correct value when backend reconnecting. If so, compile barrier
> > should be enough here, right?
>
>
> Maybe I worry too much but it's not about flushing cache, but about
> whether or not compiler can generate mov to memory instead of mov to
> registers.
>

OK, I see. I will declare those variables as volatile in v6. Thank you.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-02-05  1:42 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-22  8:31 [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting elohimes
2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend elohimes
2019-01-29  4:11   ` Stefan Hajnoczi
2019-01-29  4:26     ` Michael S. Tsirkin
2019-01-29  6:15       ` Yongji Xie
2019-01-29 14:15         ` Michael S. Tsirkin
2019-01-30  2:07           ` Yongji Xie
2019-01-30  2:30             ` Michael S. Tsirkin
2019-01-30  3:49               ` Yongji Xie
2019-01-30  4:07                 ` Michael S. Tsirkin
2019-01-30  4:11                   ` Yongji Xie
2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 2/6] libvhost-user: Introduce vu_queue_map_desc() elohimes
2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 3/6] libvhost-user: Support tracking inflight I/O in shared memory elohimes
2019-01-30  2:31   ` Jason Wang
2019-01-30  3:14     ` Michael S. Tsirkin
2019-01-30  9:52       ` Jason Wang
2019-01-30  3:58     ` Yongji Xie
2019-02-01  2:26       ` Jason Wang
2019-01-30  5:48     ` Yongji Xie
2019-02-01  2:27       ` Jason Wang
2019-02-05  1:37         ` Yongji Xie
2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 4/6] vhost-user-blk: Add support to get/set inflight buffer elohimes
2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 5/6] vhost-user-blk: Add support to reconnect backend elohimes
2019-01-22  8:31 ` [Qemu-devel] [PATCH v5 6/6] contrib/vhost-user-blk: enable inflight I/O tracking elohimes
2019-01-30  2:29 ` [Qemu-devel] [PATCH v5 0/6] vhost-user-blk: Add support for backend reconnecting Jason Wang
2019-01-30  3:40   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.