* [Qemu-devel] [PATCH v2 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting
@ 2018-12-18  9:59 elohimes
From: elohimes @ 2018-12-18  9:59 UTC
  To: mst, marcandre.lureau, jasowang, yury-kotov, maxime.coquelin
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patchset enables QEMU to reconnect to the vhost-user-blk
backend after the backend crashes or restarts.

Patch 1 adds a "disconnected" option to initialize the chardev
socket in the disconnected state.

Patch 2 introduces two new messages, VHOST_USER_GET_SHM_SIZE
and VHOST_USER_SET_SHM_FD, to support providing shared
memory to the backend.

Patches 3 and 4 are the libvhost-user counterparts of patch 2:
they make libvhost-user support VHOST_USER_GET_SHM_SIZE
and VHOST_USER_SET_SHM_FD.

Patch 5 allows vhost-user-blk to use the two new messages
to provide shared memory to the backend.

Patch 6 makes vhost-user-blk reconnect to the backend when
the connection is closed.

Patch 7 enables VHOST_USER_PROTOCOL_F_SLAVE_SHMFD in the
contrib vhost-user-blk backend, which tells QEMU that the
backend now supports reconnecting.

To use it, we can start QEMU with:

qemu-system-x86_64 \
        -chardev socket,id=char0,path=/path/vhost.socket,disconnected,reconnect=1 \
        -device vhost-user-blk-pci,chardev=char0

and start the vhost-user-blk backend with:

vhost-user-blk -b /path/file -s /path/vhost.socket

Then vhost-user-blk can be restarted at any time while the VM is running.

V1 to V2:
- Introduce "disconnected" option for chardev instead of reuse "wait"
  option
- Support the case that QEMU starts before vhost-user backend
- Drop message VHOST_USER_SET_VRING_INFLIGHT
- Introduce two new messages VHOST_USER_GET_SHM_SIZE
  and VHOST_USER_SET_SHM_FD

Xie Yongji (7):
  chardev: Add disconnected option for chardev socket
  vhost-user: Support providing shared memory to backend
  libvhost-user: Introduce vu_queue_map_desc()
  libvhost-user: Support recording inflight I/O in shared memory
  vhost-user-blk: Add support to provide shared memory to backend
  vhost-user-blk: Add support to reconnect backend
  contrib/vhost-user-blk: enable inflight I/O recording

 chardev/char-socket.c                   |  10 +
 chardev/char.c                          |   3 +
 contrib/libvhost-user/libvhost-user.c   | 309 ++++++++++++++++++++----
 contrib/libvhost-user/libvhost-user.h   |  33 +++
 contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
 docs/interop/vhost-user.txt             |  41 ++++
 hw/block/vhost-user-blk.c               | 205 ++++++++++++++--
 hw/virtio/vhost-user.c                  |  86 +++++++
 hw/virtio/vhost.c                       | 117 +++++++++
 include/hw/virtio/vhost-backend.h       |   9 +
 include/hw/virtio/vhost-user-blk.h      |   5 +
 include/hw/virtio/vhost.h               |  19 ++
 qapi/char.json                          |   3 +
 qemu-options.hx                         |  28 ++-
 14 files changed, 790 insertions(+), 81 deletions(-)

-- 
2.17.1

* [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
From: elohimes @ 2018-12-18  9:59 UTC
  To: mst, marcandre.lureau, jasowang, yury-kotov, maxime.coquelin
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

New option "disconnected" is added to init the chardev socket
in disconnected state. Then we can use qemu_chr_fe_wait_connected()
to connect when necessary. Now it would be used for unix domain
socket of vhost-user-blk device to support reconnect.
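
A device can then connect explicitly once it is ready. A minimal
sketch, modeled on the vhost-user-blk realize path later in this
series (assuming a CharBackend *chardev opened with "disconnected"):

    Error *err = NULL;

    /* The socket starts unconnected, so block here until the
     * backend becomes available. */
    while (qemu_chr_fe_wait_connected(chardev, &err) < 0) {
        error_report_err(err);
        err = NULL;
        sleep(1);
    }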

Suggested-by: Yury Kotov <yury-kotov@yandex-team.ru>
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 chardev/char-socket.c | 10 ++++++++++
 chardev/char.c        |  3 +++
 qapi/char.json        |  3 +++
 qemu-options.hx       | 28 ++++++++++++++++------------
 4 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index eaa8e8b68f..2464d7abad 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -70,6 +70,7 @@ typedef struct {
     TCPChardevTelnetInit *telnet_init;
 
     bool is_websock;
+    bool is_disconnected;
 
     GSource *reconnect_timer;
     int64_t reconnect_time;
@@ -1000,6 +1001,7 @@ static void qmp_chardev_open_socket(Chardev *chr,
     bool is_tn3270      = sock->has_tn3270  ? sock->tn3270  : false;
     bool is_waitconnect = sock->has_wait    ? sock->wait    : false;
     bool is_websock     = sock->has_websocket ? sock->websocket : false;
+    bool is_disconnected = sock->has_disconnected ? sock->disconnected : false;
     int64_t reconnect   = sock->has_reconnect ? sock->reconnect : 0;
     QIOChannelSocket *sioc = NULL;
     SocketAddress *addr;
@@ -1072,6 +1074,10 @@ static void qmp_chardev_open_socket(Chardev *chr,
         s->reconnect_time = reconnect;
     }
 
+    if (!s->is_listen && is_disconnected) {
+        return;
+    }
+
     if (s->reconnect_time) {
         tcp_chr_connect_async(chr);
     } else {
@@ -1125,6 +1131,8 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
     bool is_tn3270      = qemu_opt_get_bool(opts, "tn3270", false);
     bool is_websock     = qemu_opt_get_bool(opts, "websocket", false);
     bool do_nodelay     = !qemu_opt_get_bool(opts, "delay", true);
+    bool is_disconnected = !is_listen &&
+                           qemu_opt_get_bool(opts, "disconnected", false);
     int64_t reconnect   = qemu_opt_get_number(opts, "reconnect", 0);
     const char *path = qemu_opt_get(opts, "path");
     const char *host = qemu_opt_get(opts, "host");
@@ -1176,6 +1184,8 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
     sock->websocket = is_websock;
     sock->has_wait = true;
     sock->wait = is_waitconnect;
+    sock->has_disconnected = true;
+    sock->disconnected = is_disconnected;
     sock->has_reconnect = qemu_opt_find(opts, "reconnect");
     sock->reconnect = reconnect;
     sock->tls_creds = g_strdup(tls_creds);
diff --git a/chardev/char.c b/chardev/char.c
index ccba36bafb..fb916eb176 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -862,6 +862,9 @@ QemuOptsList qemu_chardev_opts = {
         },{
             .name = "delay",
             .type = QEMU_OPT_BOOL,
+        },{
+            .name = "disconnected",
+            .type = QEMU_OPT_BOOL,
         },{
             .name = "reconnect",
             .type = QEMU_OPT_NUMBER,
diff --git a/qapi/char.json b/qapi/char.json
index 77ed847972..b634442229 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -258,6 +258,8 @@
 #          sockets (default: false) (Since: 2.10)
 # @websocket: enable websocket protocol on server
 #           sockets (default: false) (Since: 3.1)
+# @disconnected: initialize a client socket in the disconnected
+#                state (default: false) (Since: 4.0)
 # @reconnect: For a client socket, if a socket is disconnected,
 #          then attempt a reconnect after the given number of seconds.
 #          Setting this to zero disables this function. (default: 0)
@@ -274,6 +276,7 @@
             '*telnet': 'bool',
             '*tn3270': 'bool',
             '*websocket': 'bool',
+            '*disconnected': 'bool',
             '*reconnect': 'int' },
   'base': 'ChardevCommon' }
 
diff --git a/qemu-options.hx b/qemu-options.hx
index df42116ecc..f9d3495dc8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2418,10 +2418,10 @@ DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
     "-chardev help\n"
     "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
     "-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds]\n"
-    "         [,server][,nowait][,telnet][,websocket][,reconnect=seconds][,mux=on|off]\n"
-    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID] (tcp)\n"
-    "-chardev socket,id=id,path=path[,server][,nowait][,telnet][,websocket][,reconnect=seconds]\n"
-    "         [,mux=on|off][,logfile=PATH][,logappend=on|off] (unix)\n"
+    "         [,server][,nowait][,telnet][,websocket][,disconnected][,reconnect=seconds]\n"
+    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,tls-creds=ID] (tcp)\n"
+    "-chardev socket,id=id,path=path[,server][,nowait][,telnet][,websocket][,disconnected]\n"
+    "         [,reconnect=seconds][,mux=on|off][,logfile=PATH][,logappend=on|off] (unix)\n"
     "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
     "         [,localport=localport][,ipv4][,ipv6][,mux=on|off]\n"
     "         [,logfile=PATH][,logappend=on|off]\n"
@@ -2548,7 +2548,7 @@ The available backends are:
 A void device. This device will not emit any data, and will drop any data it
 receives. The null backend does not take any options.
 
-@item -chardev socket,id=@var{id}[,@var{TCP options} or @var{unix options}][,server][,nowait][,telnet][,websocket][,reconnect=@var{seconds}][,tls-creds=@var{id}]
+@item -chardev socket,id=@var{id}[,@var{TCP options} or @var{unix options}][,server][,nowait][,telnet][,websocket][,disconnected][,reconnect=@var{seconds}][,tls-creds=@var{id}]
 
 Create a two-way stream socket, which can be either a TCP or a unix socket. A
 unix socket will be created if @option{path} is specified. Behaviour is
@@ -2565,6 +2565,9 @@ escape sequences.
 @option{websocket} specifies that the socket uses WebSocket protocol for
 communication.
 
+@option{disconnected} specifies that a non-server socket should not try
+to connect to the remote during initialization.
+
 @option{reconnect} sets the timeout for reconnecting on non-server sockets when
 the remote end goes away.  qemu will delay this many seconds and then attempt
 to reconnect.  Zero disables reconnecting, and is the default.
@@ -3087,18 +3090,19 @@ telnet on port 5555 to access the QEMU port.
 localhost 5555
 @end table
 
-@item tcp:[@var{host}]:@var{port}[,@var{server}][,nowait][,nodelay][,reconnect=@var{seconds}]
+@item tcp:[@var{host}]:@var{port}[,@var{server}][,nowait][,nodelay][,disconnected]
+[,reconnect=@var{seconds}]
 The TCP Net Console has two modes of operation.  It can send the serial
 I/O to a location or wait for a connection from a location.  By default
 the TCP Net Console is sent to @var{host} at the @var{port}.  If you use
 the @var{server} option QEMU will wait for a client socket application
 to connect to the port before continuing, unless the @code{nowait}
 option was specified.  The @code{nodelay} option disables the Nagle buffering
-algorithm.  The @code{reconnect} option only applies if @var{noserver} is
-set, if the connection goes down it will attempt to reconnect at the
-given interval.  If @var{host} is omitted, 0.0.0.0 is assumed. Only
-one TCP connection at a time is accepted. You can use @code{telnet} to
-connect to the corresponding character device.
+algorithm.  If @var{noserver} is set, @code{disconnected} prevents QEMU from
+connecting during initialization, and @code{reconnect} makes QEMU attempt to
+reconnect at the given interval when the connection goes down.  If @var{host}
+is omitted, 0.0.0.0 is assumed. Only one TCP connection at a time is accepted.
+You can use @code{telnet} to connect to the corresponding character device.
 @table @code
 @item Example to send tcp console to 192.168.0.2 port 4444
 -serial tcp:192.168.0.2:4444
@@ -3121,7 +3125,7 @@ type "send break" followed by pressing the enter key.
 The WebSocket protocol is used instead of raw tcp socket. The port acts as
 a WebSocket server. Client mode is not supported.
 
-@item unix:@var{path}[,server][,nowait][,reconnect=@var{seconds}]
+@item unix:@var{path}[,server][,nowait][,disconnected][,reconnect=@var{seconds}]
 A unix domain socket is used instead of a tcp socket.  The option works the
 same as if you had specified @code{-serial tcp} except the unix domain socket
 @var{path} is used for connections.
-- 
2.17.1

* [Qemu-devel] [PATCH v2 for-4.0 2/7] vhost-user: Support providing shared memory to backend
From: elohimes @ 2018-12-18  9:59 UTC
  To: mst, marcandre.lureau, jasowang, yury-kotov, maxime.coquelin
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch introduces two new messages, VHOST_USER_GET_SHM_SIZE
and VHOST_USER_SET_SHM_FD, to support providing shared memory
to the backend.

First, QEMU uses VHOST_USER_GET_SHM_SIZE to get the required
size of shared memory from the backend. Then QEMU allocates the
memory and passes it back to the backend through
VHOST_USER_SET_SHM_FD.

Note that the backend should use the shared memory to record
inflight I/O. QEMU will clear it on VM reset.
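
For context, a rough master-side sketch of the flow using the
helpers introduced below; setup_backend_shm() is a hypothetical
wrapper for illustration, not an actual QEMU call site:

    static int setup_backend_shm(struct vhost_dev *dev,
                                 struct vhost_shm *shm)
    {
        /* VHOST_USER_GET_SHM_SIZE: fetch dev_size/vq_size/align
         * from the slave, then allocate a memfd of size
         * align_up(dev_size) + nvqs * align_up(vq_size) */
        if (vhost_dev_init_shm(dev, shm) < 0) {
            return -1;
        }

        /* VHOST_USER_SET_SHM_FD: pass the memfd back to the slave */
        return vhost_dev_set_shm(dev, shm);
    }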

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 docs/interop/vhost-user.txt       |  41 +++++++++++
 hw/virtio/vhost-user.c            |  86 ++++++++++++++++++++++
 hw/virtio/vhost.c                 | 117 ++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-backend.h |   9 +++
 include/hw/virtio/vhost.h         |  19 +++++
 5 files changed, 272 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index c2194711d9..5ee9c28ab0 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -142,6 +142,19 @@ Depending on the request type, payload can be:
    Offset: a 64-bit offset of this area from the start of the
        supplied file descriptor
 
+ * Shm description
+   ------------------------------------------------------------------
+   | mmap_size | mmap_offset | dev_size | vq_size | align | version |
+   ------------------------------------------------------------------
+
+   Mmap_size: a 64-bit size of the shared memory
+   Mmap_offset: a 64-bit offset of the shared memory from the start
+                of the supplied file descriptor
+   Dev_size: a 32-bit size of device region in shared memory
+   Vq_size: a 32-bit size of each virtqueue region in shared memory
+   Align: a 32-bit alignment of each region in shared memory
+   Version: a 32-bit version of this shared memory
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -157,6 +170,7 @@ typedef struct VhostUserMsg {
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
         VhostUserVringArea area;
+        VhostUserShm shm;
     };
 } QEMU_PACKED VhostUserMsg;
 
@@ -175,6 +189,7 @@ the ones that do:
  * VHOST_USER_GET_PROTOCOL_FEATURES
  * VHOST_USER_GET_VRING_BASE
  * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
+ * VHOST_USER_GET_SHM_SIZE (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
 
 [ Also see the section on REPLY_ACK protocol extension. ]
 
@@ -188,6 +203,7 @@ in the ancillary data:
  * VHOST_USER_SET_VRING_CALL
  * VHOST_USER_SET_VRING_ERR
  * VHOST_USER_SET_SLAVE_REQ_FD
+ * VHOST_USER_SET_SHM_FD (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
 
 If Master is unable to send the full message or receives a wrong reply it will
 close the connection. An optional reconnection mechanism can be implemented.
@@ -397,6 +413,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_CONFIG         9
 #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
+#define VHOST_USER_PROTOCOL_F_SLAVE_SHMFD    12
 
 Master message types
 --------------------
@@ -761,6 +778,30 @@ Master message types
       was previously sent.
       The value returned is an error indication; 0 is success.
 
+ * VHOST_USER_GET_SHM_SIZE
+      Id: 31
+      Equivalent ioctl: N/A
+      Master payload: shm description
+
+      When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
+      successfully negotiated, the master needs to provide shared memory to
+      the slave. This message is used by the master to get the required size
+      from the slave. The shared memory contains one region for the device
+      and several regions for the virtqueues. The sizes of those two kinds
+      of regions are specified by the dev_size and vq_size fields. The align
+      field specifies the alignment of those regions.
+
+ * VHOST_USER_SET_SHM_FD
+      Id: 32
+      Equivalent ioctl: N/A
+      Master payload: shm description
+
+      When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
+      successfully negotiated, the master uses this message to set shared
+      memory for the slave. The memory fd is passed in the ancillary data.
+      The slave should use the shared memory to record inflight I/O. The
+      master will clear it on vm reset.
+
 Slave message types
 -------------------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e09bed0e4a..8cdf3b5121 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -52,6 +52,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+    VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -89,6 +90,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_POSTCOPY_END     = 30,
+    VHOST_USER_GET_SHM_SIZE = 31,
+    VHOST_USER_SET_SHM_FD = 32,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -147,6 +150,15 @@ typedef struct VhostUserVringArea {
     uint64_t offset;
 } VhostUserVringArea;
 
+typedef struct VhostUserShm {
+    uint64_t mmap_size;
+    uint64_t mmap_offset;
+    uint32_t dev_size;
+    uint32_t vq_size;
+    uint32_t align;
+    uint32_t version;
+} VhostUserShm;
+
 typedef struct {
     VhostUserRequest request;
 
@@ -169,6 +181,7 @@ typedef union {
         VhostUserConfig config;
         VhostUserCryptoSession session;
         VhostUserVringArea area;
+        VhostUserShm shm;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -1739,6 +1752,77 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
     return result;
 }
 
+static int vhost_user_get_shm_size(struct vhost_dev *dev,
+                                   struct vhost_shm *shm)
+{
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_GET_SHM_SIZE,
+        .hdr.flags = VHOST_USER_VERSION,
+        .hdr.size = sizeof(msg.payload.shm),
+    };
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
+        shm->dev_size = 0;
+        shm->vq_size = 0;
+        return 0;
+    }
+
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
+
+    if (vhost_user_read(dev, &msg) < 0) {
+        return -1;
+    }
+
+    if (msg.hdr.request != VHOST_USER_GET_SHM_SIZE) {
+        error_report("Received unexpected msg type. "
+                     "Expected %d received %d",
+                     VHOST_USER_GET_SHM_SIZE, msg.hdr.request);
+        return -1;
+    }
+
+    if (msg.hdr.size != sizeof(msg.payload.shm)) {
+        error_report("Received bad msg size.");
+        return -1;
+    }
+
+    shm->dev_size = msg.payload.shm.dev_size;
+    shm->vq_size = msg.payload.shm.vq_size;
+    shm->align = msg.payload.shm.align;
+    shm->version = msg.payload.shm.version;
+
+    return 0;
+}
+
+static int vhost_user_set_shm_fd(struct vhost_dev *dev,
+                                 struct vhost_shm *shm)
+{
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_SET_SHM_FD,
+        .hdr.flags = VHOST_USER_VERSION,
+        .payload.shm.mmap_size = shm->mmap_size,
+        .payload.shm.mmap_offset = 0,
+        .payload.shm.dev_size = shm->dev_size,
+        .payload.shm.vq_size = shm->vq_size,
+        .payload.shm.align = shm->align,
+        .payload.shm.version = shm->version,
+        .hdr.size = sizeof(msg.payload.shm),
+    };
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
+        return 0;
+    }
+
+    if (vhost_user_write(dev, &msg, &shm->fd, 1) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
 VhostUserState *vhost_user_init(void)
 {
     VhostUserState *user = g_new0(struct VhostUserState, 1);
@@ -1790,4 +1874,6 @@ const VhostOps user_ops = {
         .vhost_crypto_create_session = vhost_user_crypto_create_session,
         .vhost_crypto_close_session = vhost_user_crypto_close_session,
         .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
+        .vhost_get_shm_size = vhost_user_get_shm_size,
+        .vhost_set_shm_fd = vhost_user_set_shm_fd,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 569c4053ea..7a38fed50f 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1481,6 +1481,123 @@ void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
     hdev->config_ops = ops;
 }
 
+void vhost_dev_reset_shm(struct vhost_shm *shm)
+{
+    if (shm->addr) {
+        memset(shm->addr, 0, shm->mmap_size);
+    }
+}
+
+void vhost_dev_free_shm(struct vhost_shm *shm)
+{
+    if (shm->addr) {
+        qemu_memfd_free(shm->addr, shm->mmap_size, shm->fd);
+        shm->addr = NULL;
+        shm->fd = -1;
+    }
+}
+
+int vhost_dev_alloc_shm(struct vhost_shm *shm)
+{
+    Error *err = NULL;
+    int fd = -1;
+    void *addr = qemu_memfd_alloc("vhost-shm", shm->mmap_size,
+                                  F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
+                                  &fd, &err);
+    if (err) {
+        error_report_err(err);
+        return -1;
+    }
+
+    shm->addr = addr;
+    shm->fd = fd;
+
+    return 0;
+}
+
+void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f)
+{
+    if (shm->addr) {
+        qemu_put_be64(f, shm->mmap_size);
+        qemu_put_be32(f, shm->dev_size);
+        qemu_put_be32(f, shm->vq_size);
+        qemu_put_be32(f, shm->align);
+        qemu_put_be32(f, shm->version);
+        qemu_put_buffer(f, shm->addr, shm->mmap_size);
+    } else {
+        qemu_put_be64(f, 0);
+    }
+}
+
+int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f)
+{
+    uint64_t mmap_size;
+
+    mmap_size = qemu_get_be64(f);
+    if (!mmap_size) {
+        return 0;
+    }
+
+    vhost_dev_free_shm(shm);
+
+    shm->mmap_size = mmap_size;
+    shm->dev_size = qemu_get_be32(f);
+    shm->vq_size = qemu_get_be32(f);
+    shm->align = qemu_get_be32(f);
+    shm->version = qemu_get_be32(f);
+
+    if (vhost_dev_alloc_shm(shm)) {
+        return -ENOMEM;
+    }
+
+    qemu_get_buffer(f, shm->addr, mmap_size);
+
+    return 0;
+}
+
+int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm)
+{
+    int r;
+
+    if (dev->vhost_ops->vhost_set_shm_fd && shm->addr) {
+        r = dev->vhost_ops->vhost_set_shm_fd(dev, shm);
+        if (r) {
+            VHOST_OPS_DEBUG("vhost_set_vring_shm_fd failed");
+            return -errno;
+        }
+    }
+
+    return 0;
+}
+
+int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm)
+{
+    int r;
+
+    if (dev->vhost_ops->vhost_get_shm_size) {
+        r = dev->vhost_ops->vhost_get_shm_size(dev, shm);
+        if (r) {
+            VHOST_OPS_DEBUG("vhost_get_vring_shm_size failed");
+            return -errno;
+        }
+
+        if (!shm->dev_size && !shm->vq_size) {
+            return 0;
+        }
+
+        shm->mmap_size = QEMU_ALIGN_UP(shm->dev_size, shm->align) +
+                         dev->nvqs * QEMU_ALIGN_UP(shm->vq_size, shm->align);
+
+        if (vhost_dev_alloc_shm(shm)) {
+            return -ENOMEM;
+        }
+
+        vhost_dev_reset_shm(shm);
+    }
+
+    return 0;
+}
+
 /* Host notifiers must be enabled at this point. */
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 81283ec50f..4e7f13c9e9 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -25,6 +25,7 @@ typedef enum VhostSetConfigType {
     VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
 } VhostSetConfigType;
 
+struct vhost_shm;
 struct vhost_dev;
 struct vhost_log;
 struct vhost_memory;
@@ -104,6 +105,12 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
 typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
                                                 MemoryRegionSection *section);
 
+typedef int (*vhost_get_shm_size_op)(struct vhost_dev *dev,
+                                     struct vhost_shm *shm);
+
+typedef int (*vhost_set_shm_fd_op)(struct vhost_dev *dev,
+                                   struct vhost_shm *shm);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -142,6 +149,8 @@ typedef struct VhostOps {
     vhost_crypto_create_session_op vhost_crypto_create_session;
     vhost_crypto_close_session_op vhost_crypto_close_session;
     vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
+    vhost_get_shm_size_op vhost_get_shm_size;
+    vhost_set_shm_fd_op vhost_set_shm_fd;
 } VhostOps;
 
 extern const VhostOps user_ops;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a7f449fa87..b6e3d6ab56 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -7,6 +7,17 @@
 #include "exec/memory.h"
 
 /* Generic structures common for any vhost based device. */
+
+struct vhost_shm {
+    void *addr;
+    uint64_t mmap_size;
+    uint32_t dev_size;
+    uint32_t vq_size;
+    uint32_t align;
+    uint32_t version;
+    int fd;
+};
+
 struct vhost_virtqueue {
     int kick;
     int call;
@@ -120,4 +131,12 @@ int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
  */
 void vhost_dev_set_config_notifier(struct vhost_dev *dev,
                                    const VhostDevConfigOps *ops);
+
+void vhost_dev_reset_shm(struct vhost_shm *shm);
+void vhost_dev_free_shm(struct vhost_shm *shm);
+int vhost_dev_alloc_shm(struct vhost_shm *shm);
+void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f);
+int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f);
+int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm);
+int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm);
 #endif
-- 
2.17.1

* [Qemu-devel] [PATCH v2 for-4.0 3/7] libvhost-user: Introduce vu_queue_map_desc()
From: elohimes @ 2018-12-18  9:59 UTC
  To: mst, marcandre.lureau, jasowang, yury-kotov, maxime.coquelin
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

Introduce vu_queue_map_desc(), splitting the descriptor-mapping
logic out of vu_queue_pop() so it can be used independently.
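
After the split, the tail of vu_queue_pop() reduces to roughly the
following (a simplified view of the code below):

    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
        return NULL;
    }

    elem = vu_queue_map_desc(dev, vq, head, sz);
    if (!elem) {
        return NULL;
    }

    vq->inuse++;

    return elem;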

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 88 ++++++++++++++++-----------
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index a6b46cdc03..23bd52264c 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -1853,49 +1853,20 @@ virtqueue_alloc_element(size_t sz,
     return elem;
 }
 
-void *
-vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+static void *
+vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
 {
-    unsigned int i, head, max, desc_len;
+    struct vring_desc *desc = vq->vring.desc;
     uint64_t desc_addr, read_len;
+    unsigned int desc_len;
+    unsigned int max = vq->vring.num;
+    unsigned int i = idx;
     VuVirtqElement *elem;
-    unsigned out_num, in_num;
+    unsigned int out_num = 0, in_num = 0;
     struct iovec iov[VIRTQUEUE_MAX_SIZE];
     struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE];
-    struct vring_desc *desc;
     int rc;
 
-    if (unlikely(dev->broken) ||
-        unlikely(!vq->vring.avail)) {
-        return NULL;
-    }
-
-    if (vu_queue_empty(dev, vq)) {
-        return NULL;
-    }
-    /* Needed after virtio_queue_empty(), see comment in
-     * virtqueue_num_heads(). */
-    smp_rmb();
-
-    /* When we start there are none of either input nor output. */
-    out_num = in_num = 0;
-
-    max = vq->vring.num;
-    if (vq->inuse >= vq->vring.num) {
-        vu_panic(dev, "Virtqueue size exceeded");
-        return NULL;
-    }
-
-    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
-        return NULL;
-    }
-
-    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
-        vring_set_avail_event(vq, vq->last_avail_idx);
-    }
-
-    i = head;
-    desc = vq->vring.desc;
     if (desc[i].flags & VRING_DESC_F_INDIRECT) {
         if (desc[i].len % sizeof(struct vring_desc)) {
             vu_panic(dev, "Invalid size for indirect buffer table");
@@ -1947,12 +1918,13 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
     } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
     if (rc == VIRTQUEUE_READ_DESC_ERROR) {
+        vu_panic(dev, "read descriptor error");
         return NULL;
     }
 
     /* Now copy what we have collected and mapped */
     elem = virtqueue_alloc_element(sz, out_num, in_num);
-    elem->index = head;
+    elem->index = idx;
     for (i = 0; i < out_num; i++) {
         elem->out_sg[i] = iov[i];
     }
@@ -1960,6 +1932,48 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         elem->in_sg[i] = iov[out_num + i];
     }
 
+    return elem;
+}
+
+void *
+vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+{
+    unsigned int head;
+    VuVirtqElement *elem;
+
+    if (unlikely(dev->broken) ||
+        unlikely(!vq->vring.avail)) {
+        return NULL;
+    }
+
+    if (vu_queue_empty(dev, vq)) {
+        return NULL;
+    }
+    /*
+     * Needed after virtio_queue_empty(), see comment in
+     * virtqueue_num_heads().
+     */
+    smp_rmb();
+
+    if (vq->inuse >= vq->vring.num) {
+        vu_panic(dev, "Virtqueue size exceeded");
+        return NULL;
+    }
+
+    if (!virtqueue_get_head(dev, vq, vq->last_avail_idx++, &head)) {
+        return NULL;
+    }
+
+    if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
+        vring_set_avail_event(vq, vq->last_avail_idx);
+    }
+
+    elem = vu_queue_map_desc(dev, vq, head, sz);
+
+    if (!elem) {
+        return NULL;
+    }
+
     vq->inuse++;
 
     return elem;
-- 
2.17.1

* [Qemu-devel] [PATCH v2 for-4.0 4/7] libvhost-user: Support recording inflight I/O in shared memory
From: elohimes @ 2018-12-18  9:59 UTC
  To: mst, marcandre.lureau, jasowang, yury-kotov, maxime.coquelin
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch adds support for the VHOST_USER_GET_SHM_SIZE and
VHOST_USER_SET_SHM_FD messages to get shared memory from QEMU.
We then maintain a "bitmap" of all descriptors in the shared
memory for each queue to record inflight I/O.
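
A toy model of the per-queue region, assuming the layout added in
this patch (one byte per descriptor: set when a request is popped,
cleared when its response is pushed; the two helpers mirror
vu_queue_inflight_get()/vu_queue_inflight_put() below):

    #define VIRTQUEUE_MAX_SIZE 1024

    typedef struct VuVirtqShm {
        char inflight[VIRTQUEUE_MAX_SIZE];
    } VuVirtqShm;

    /* vu_queue_pop(): mark the descriptor as in flight */
    static void inflight_get(VuVirtqShm *shm, int desc_idx)
    {
        shm->inflight[desc_idx] = 1;
    }

    /* vu_queue_push(): mark the descriptor as completed */
    static void inflight_put(VuVirtqShm *shm, int desc_idx)
    {
        shm->inflight[desc_idx] = 0;
    }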

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 contrib/libvhost-user/libvhost-user.c | 221 +++++++++++++++++++++++++-
 contrib/libvhost-user/libvhost-user.h |  33 ++++
 2 files changed, 248 insertions(+), 6 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 23bd52264c..f18f5e6e62 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -53,6 +53,18 @@
             _min1 < _min2 ? _min1 : _min2; })
 #endif
 
+/* Round number down to multiple */
+#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
+
+/* Round number up to multiple */
+#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
+
+/* Align each region to cache line size in shared memory */
+#define SHM_ALIGNMENT 64
+
+/* The version of shared memory */
+#define SHM_VERSION 1
+
 #define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
 
 /* The version of the protocol we support */
@@ -100,6 +112,8 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_POSTCOPY_ADVISE),
         REQ(VHOST_USER_POSTCOPY_LISTEN),
         REQ(VHOST_USER_POSTCOPY_END),
+        REQ(VHOST_USER_GET_SHM_SIZE),
+        REQ(VHOST_USER_SET_SHM_FD),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -890,6 +904,41 @@ vu_check_queue_msg_file(VuDev *dev, VhostUserMsg *vmsg)
     return true;
 }
 
+static int
+vu_check_queue_inflights(VuDev *dev, VuVirtq *vq)
+{
+    int i = 0;
+
+    if ((dev->protocol_features &
+        VHOST_USER_PROTOCOL_F_SLAVE_SHMFD) == 0) {
+        return 0;
+    }
+
+    if (unlikely(!vq->shm)) {
+        return -1;
+    }
+
+    vq->used_idx = vq->vring.used->idx;
+    vq->inflight_num = 0;
+    for (i = 0; i < vq->vring.num; i++) {
+        if (vq->shm->inflight[i] == 0) {
+            continue;
+        }
+
+        vq->inflight_desc[vq->inflight_num++] = i;
+        vq->inuse++;
+    }
+    vq->shadow_avail_idx = vq->last_avail_idx = vq->inuse + vq->used_idx;
+
+    /* in case of I/O hang after reconnecting */
+    if (eventfd_write(vq->kick_fd, 1) ||
+        eventfd_write(vq->call_fd, 1)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 static bool
 vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -925,6 +974,10 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
                dev->vq[index].kick_fd, index);
     }
 
+    if (vu_check_queue_inflights(dev, &dev->vq[index])) {
+        vu_panic(dev, "Failed to check inflights for vq: %d\n", index);
+    }
+
     return false;
 }
 
@@ -1215,6 +1268,115 @@ vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
     return true;
 }
 
+static int
+vu_setup_shm(VuDev *dev)
+{
+    int i;
+    char *addr = (char *)dev->shm_info.addr;
+    uint64_t size = 0;
+    uint32_t vq_size = ALIGN_UP(dev->shm_info.vq_size, dev->shm_info.align);
+
+    if (dev->shm_info.version != SHM_VERSION) {
+        DPRINT("Invalid version for shm: %d", dev->shm_info.version);
+        return -1;
+    }
+
+    if (dev->shm_info.dev_size != 0) {
+        DPRINT("Invalid dev_size for shm: %d", dev->shm_info.dev_size);
+        return -1;
+    }
+
+    if (dev->shm_info.vq_size != sizeof(VuVirtqShm)) {
+        DPRINT("Invalid vq_size for shm: %d", dev->shm_info.vq_size);
+        return -1;
+    }
+
+    for (i = 0; i < VHOST_MAX_NR_VIRTQUEUE; i++) {
+        size += vq_size;
+        if (size > dev->shm_info.mmap_size) {
+            break;
+        }
+        dev->vq[i].shm = (VuVirtqShm *)addr;
+        addr += vq_size;
+    }
+
+    return 0;
+}
+
+static bool
+vu_get_shm_size(VuDev *dev, VhostUserMsg *vmsg)
+{
+    if (vmsg->size != sizeof(vmsg->payload.shm)) {
+        vu_panic(dev, "Invalid get_shm_size message:%d", vmsg->size);
+        vmsg->size = 0;
+        return true;
+    }
+
+    vmsg->payload.shm.dev_size = 0;
+    vmsg->payload.shm.vq_size = sizeof(VuVirtqShm);
+    vmsg->payload.shm.align = SHM_ALIGNMENT;
+    vmsg->payload.shm.version = SHM_VERSION;
+
+    DPRINT("send shm dev_size: %"PRId32"\n", vmsg->payload.shm.dev_size);
+    DPRINT("send shm vq_size: %"PRId32"\n", vmsg->payload.shm.vq_size);
+    DPRINT("send shm align: %"PRId32"\n", vmsg->payload.shm.align);
+    DPRINT("send shm version: %"PRId32"\n", vmsg->payload.shm.version);
+
+    return true;
+}
+
+static bool
+vu_set_shm_fd(VuDev *dev, VhostUserMsg *vmsg)
+{
+    int fd;
+    uint64_t mmap_size, mmap_offset;
+    void *rc;
+
+    if (vmsg->fd_num != 1 ||
+        vmsg->size != sizeof(vmsg->payload.shm)) {
+        vu_panic(dev, "Invalid set_shm_fd message size:%d fds:%d",
+                 vmsg->size, vmsg->fd_num);
+        return false;
+    }
+
+    fd = vmsg->fds[0];
+    mmap_size = vmsg->payload.shm.mmap_size;
+    mmap_offset = vmsg->payload.shm.mmap_offset;
+    DPRINT("set_shm_fd mmap_size: %"PRId64"\n", mmap_size);
+    DPRINT("set_shm_fd mmap_offset: %"PRId64"\n", mmap_offset);
+    DPRINT("set_shm_fd dev_size: %"PRId32"\n", vmsg->payload.shm.dev_size);
+    DPRINT("set_shm_fd vq_size: %"PRId32"\n", vmsg->payload.shm.vq_size);
+    DPRINT("set_shm_fd align: %"PRId32"\n", vmsg->payload.shm.align);
+    DPRINT("set_shm_fd version: %"PRId32"\n", vmsg->payload.shm.version);
+
+    rc = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+              fd, mmap_offset);
+
+    close(fd);
+
+    if (rc == MAP_FAILED) {
+        vu_panic(dev, "set_shm_fd mmap error: %s", strerror(errno));
+        return false;
+    }
+
+    if (dev->shm_info.addr) {
+        munmap(dev->shm_info.addr, dev->shm_info.mmap_size);
+    }
+    dev->shm_info.addr = rc;
+    dev->shm_info.mmap_size = mmap_size;
+    dev->shm_info.dev_size = vmsg->payload.shm.dev_size;
+    dev->shm_info.vq_size = vmsg->payload.shm.vq_size;
+    dev->shm_info.align = vmsg->payload.shm.align;
+    dev->shm_info.version = vmsg->payload.shm.version;
+
+    if (vu_setup_shm(dev)) {
+        vu_panic(dev, "setup shm failed");
+        return false;
+    }
+
+    return false;
+}
+
 static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -1292,6 +1454,10 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_set_postcopy_listen(dev, vmsg);
     case VHOST_USER_POSTCOPY_END:
         return vu_set_postcopy_end(dev, vmsg);
+    case VHOST_USER_GET_SHM_SIZE:
+        return vu_get_shm_size(dev, vmsg);
+    case VHOST_USER_SET_SHM_FD:
+        return vu_set_shm_fd(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
@@ -1359,8 +1525,13 @@ vu_deinit(VuDev *dev)
             close(vq->err_fd);
             vq->err_fd = -1;
         }
+        vq->shm = NULL;
     }
 
+    if (dev->shm_info.addr) {
+        munmap(dev->shm_info.addr, dev->shm_info.mmap_size);
+        dev->shm_info.addr = NULL;
+    }
 
     vu_close_log(dev);
     if (dev->slave_fd != -1) {
@@ -1829,12 +2000,6 @@ virtqueue_map_desc(VuDev *dev,
     *p_num_sg = num_sg;
 }
 
-/* Round number down to multiple */
-#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
-
-/* Round number up to multiple */
-#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
-
 static void *
 virtqueue_alloc_element(size_t sz,
                                      unsigned out_num, unsigned in_num)
@@ -1935,9 +2100,44 @@ vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
     return elem;
 }
 
+static int
+vu_queue_inflight_get(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+    if ((dev->protocol_features &
+        VHOST_USER_PROTOCOL_F_SLAVE_SHMFD) == 0) {
+        return 0;
+    }
+
+    if (unlikely(!vq->shm)) {
+        return -1;
+    }
+
+    vq->shm->inflight[desc_idx] = 1;
+
+    return 0;
+}
+
+static int
+vu_queue_inflight_put(VuDev *dev, VuVirtq *vq, int desc_idx)
+{
+    if ((dev->protocol_features &
+        VHOST_USER_PROTOCOL_F_SLAVE_SHMFD) == 0) {
+        return 0;
+    }
+
+    if (unlikely(!vq->shm)) {
+        return -1;
+    }
+
+    vq->shm->inflight[desc_idx] = 0;
+
+    return 0;
+}
+
 void *
 vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 {
+    int i;
     unsigned int head;
     VuVirtqElement *elem;
 
@@ -1946,6 +2146,12 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         return NULL;
     }
 
+    if (unlikely(vq->inflight_num > 0)) {
+        i = (--vq->inflight_num);
+        elem = vu_queue_map_desc(dev, vq, vq->inflight_desc[i], sz);
+        return elem;
+    }
+
     if (vu_queue_empty(dev, vq)) {
         return NULL;
     }
@@ -1976,6 +2182,8 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 
     vq->inuse++;
 
+    vu_queue_inflight_get(dev, vq, head);
+
     return elem;
 }
 
@@ -2121,4 +2329,5 @@ vu_queue_push(VuDev *dev, VuVirtq *vq,
 {
     vu_queue_fill(dev, vq, elem, len, 0);
     vu_queue_flush(dev, vq, 1);
+    vu_queue_inflight_put(dev, vq, elem->index);
 }
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 4aa55b4d2d..fdfda688d2 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -53,6 +53,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+    VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -91,6 +92,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_POSTCOPY_END     = 30,
+    VHOST_USER_GET_SHM_SIZE = 31,
+    VHOST_USER_SET_SHM_FD = 32,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -138,6 +141,15 @@ typedef struct VhostUserVringArea {
     uint64_t offset;
 } VhostUserVringArea;
 
+typedef struct VhostUserShm {
+    uint64_t mmap_size;
+    uint64_t mmap_offset;
+    uint32_t dev_size;
+    uint32_t vq_size;
+    uint32_t align;
+    uint32_t version;
+} VhostUserShm;
+
 #if defined(_WIN32)
 # define VU_PACKED __attribute__((gcc_struct, packed))
 #else
@@ -163,6 +175,7 @@ typedef struct VhostUserMsg {
         VhostUserLog log;
         VhostUserConfig config;
         VhostUserVringArea area;
+        VhostUserShm shm;
     } payload;
 
     int fds[VHOST_MEMORY_MAX_NREGIONS];
@@ -234,9 +247,19 @@ typedef struct VuRing {
     uint32_t flags;
 } VuRing;
 
+typedef struct VuVirtqShm {
+    char inflight[VIRTQUEUE_MAX_SIZE];
+} VuVirtqShm;
+
 typedef struct VuVirtq {
     VuRing vring;
 
+    VuVirtqShm *shm;
+
+    uint16_t inflight_desc[VIRTQUEUE_MAX_SIZE];
+
+    uint16_t inflight_num;
+
     /* Next head to pop */
     uint16_t last_avail_idx;
 
@@ -279,11 +302,21 @@ typedef void (*vu_set_watch_cb) (VuDev *dev, int fd, int condition,
                                  vu_watch_cb cb, void *data);
 typedef void (*vu_remove_watch_cb) (VuDev *dev, int fd);
 
+typedef struct VuDevShmInfo {
+    void *addr;
+    uint64_t mmap_size;
+    uint32_t dev_size;
+    uint32_t vq_size;
+    uint32_t align;
+    uint32_t version;
+} VuDevShmInfo;
+
 struct VuDev {
     int sock;
     uint32_t nregions;
     VuDevRegion regions[VHOST_MEMORY_MAX_NREGIONS];
     VuVirtq vq[VHOST_MAX_NR_VIRTQUEUE];
+    VuDevShmInfo shm_info;
     int log_call_fd;
     int slave_fd;
     uint64_t log_size;
-- 
2.17.1

* [Qemu-devel] [PATCH v2 for-4.0 5/7] vhost-user-blk: Add support to provide shared memory to backend
From: elohimes @ 2018-12-18 10:00 UTC
  To: mst, marcandre.lureau, jasowang, yury-kotov, maxime.coquelin
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch adds support for the vhost-user-blk device to provide
shared memory to the backend.
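
In outline, the new calls land at three points in the device
lifecycle (a condensed view of the hooks added below, error
handling elided):

    /* realize: query sizes from the backend, allocate the region */
    vhost_dev_init_shm(&s->dev, s->shm);

    /* start: hand the memfd to the backend before vhost_dev_start() */
    vhost_dev_set_shm(&s->dev, s->shm);

    /* reset: clear any recorded inflight state */
    vhost_dev_reset_shm(s->shm);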

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 hw/block/vhost-user-blk.c          | 26 ++++++++++++++++++++++++++
 include/hw/virtio/vhost-user-blk.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 1451940845..27028cf996 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -126,6 +126,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
     }
 
     s->dev.acked_features = vdev->guest_features;
+
+    ret = vhost_dev_set_shm(&s->dev, s->shm);
+    if (ret < 0) {
+        error_report("Error set shared memory: %d", -ret);
+        goto err_guest_notifiers;
+    }
+
     ret = vhost_dev_start(&s->dev, vdev);
     if (ret < 0) {
         error_report("Error starting vhost: %d", -ret);
@@ -245,6 +252,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     }
 }
 
+static void vhost_user_blk_reset(VirtIODevice *vdev)
+{
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    vhost_dev_reset_shm(s->shm);
+}
+
 static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -284,6 +298,8 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
                          vhost_user_blk_handle_output);
     }
 
+    s->shm = g_new0(struct vhost_shm, 1);
+
     s->dev.nvqs = s->num_queues;
     s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
     s->dev.vq_index = 0;
@@ -309,12 +325,19 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         s->blkcfg.num_queues = s->num_queues;
     }
 
+    ret = vhost_dev_init_shm(&s->dev, s->shm);
+    if (ret < 0) {
+        error_setg(errp, "vhost-user-blk: init shared memory failed");
+        goto vhost_err;
+    }
+
     return;
 
 vhost_err:
     vhost_dev_cleanup(&s->dev);
 virtio_err:
     g_free(s->dev.vqs);
+    g_free(s->shm);
     virtio_cleanup(vdev);
 
     vhost_user_cleanup(user);
@@ -329,7 +352,9 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
 
     vhost_user_blk_set_status(vdev, 0);
     vhost_dev_cleanup(&s->dev);
+    vhost_dev_free_shm(s->shm);
     g_free(s->dev.vqs);
+    g_free(s->shm);
     virtio_cleanup(vdev);
 
     if (s->vhost_user) {
@@ -379,6 +404,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data)
     vdc->set_config = vhost_user_blk_set_config;
     vdc->get_features = vhost_user_blk_get_features;
     vdc->set_status = vhost_user_blk_set_status;
+    vdc->reset = vhost_user_blk_reset;
 }
 
 static const TypeInfo vhost_user_blk_info = {
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index d52944aeeb..bb706d70b3 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -36,6 +36,7 @@ typedef struct VHostUserBlk {
     uint32_t queue_size;
     uint32_t config_wce;
     struct vhost_dev dev;
+    struct vhost_shm *shm;
     VhostUserState *vhost_user;
 } VHostUserBlk;
 
-- 
2.17.1

* [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend
From: elohimes @ 2018-12-18 10:00 UTC
  To: mst, marcandre.lureau, jasowang, yury-kotov, maxime.coquelin
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

Since we now support the messages VHOST_USER_GET_SHM_SIZE and
VHOST_USER_SET_SHM_FD, the backend is able to restart safely
because it can record inflight I/O in shared memory. This patch
allows QEMU to reconnect to the backend after the connection is
closed.
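
In outline, reconnect handling hangs off chardev events; a
condensed sketch of the handler added below:

    static void vhost_user_blk_event(void *opaque, int event)
    {
        VHostUserBlk *s = VHOST_USER_BLK(opaque);

        switch (event) {
        case CHR_EVENT_OPENED:
            /* run vhost_dev_init() again and, if the device was
             * running, restart vhost with the recorded shm state */
            if (vhost_user_blk_connect(opaque) < 0) {
                qemu_chr_fe_disconnect(&s->chardev);
            }
            break;
        case CHR_EVENT_CLOSED:
            /* stop vhost and tear down; the chardev layer will
             * reconnect later */
            vhost_user_blk_disconnect(opaque);
            break;
        }
    }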

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Ni Xun <nixun@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 hw/block/vhost-user-blk.c          | 183 ++++++++++++++++++++++++-----
 include/hw/virtio/vhost-user-blk.h |   4 +
 2 files changed, 160 insertions(+), 27 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 27028cf996..80f2e2d765 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
     .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
 };
 
-static void vhost_user_blk_start(VirtIODevice *vdev)
+static int vhost_user_blk_start(VirtIODevice *vdev)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
@@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
 
     if (!k->set_guest_notifiers) {
         error_report("binding does not support guest notifiers");
-        return;
+        return -ENOSYS;
     }
 
     ret = vhost_dev_enable_notifiers(&s->dev, vdev);
     if (ret < 0) {
         error_report("Error enabling host notifiers: %d", -ret);
-        return;
+        return ret;
     }
 
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
@@ -147,12 +147,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
         vhost_virtqueue_mask(&s->dev, vdev, i, false);
     }
 
-    return;
+    return ret;
 
 err_guest_notifiers:
     k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
+    return ret;
 }
 
 static void vhost_user_blk_stop(VirtIODevice *vdev)
@@ -171,7 +172,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
     ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
-        return;
     }
 
     vhost_dev_disable_notifiers(&s->dev, vdev);
@@ -181,21 +181,43 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
+    int ret;
 
     if (!vdev->vm_running) {
         should_start = false;
     }
 
-    if (s->dev.started == should_start) {
+    if (s->should_start == should_start) {
+        return;
+    }
+
+    if (!s->connected || s->dev.started == should_start) {
+        s->should_start = should_start;
         return;
     }
 
     if (should_start) {
-        vhost_user_blk_start(vdev);
+        s->should_start = true;
+        /*
+         * make sure vhost_user_blk_handle_output() ignores the fake
+         * guest kick triggered by vhost_dev_enable_notifiers()
+         */
+        barrier();
+        ret = vhost_user_blk_start(vdev);
+        if (ret < 0) {
+            error_report("vhost-user-blk: vhost start failed: %s",
+                         strerror(-ret));
+            qemu_chr_fe_disconnect(&s->chardev);
+        }
     } else {
         vhost_user_blk_stop(vdev);
+        /*
+         * make sure vhost_user_blk_handle_output() ignores the fake
+         * guest kick triggered by vhost_dev_disable_notifiers()
+         */
+        barrier();
+        s->should_start = false;
     }
-
 }
 
 static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
@@ -225,13 +247,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
 static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
-    int i;
+    int i, ret;
 
     if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
         !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
         return;
     }
 
+    if (s->should_start) {
+        return;
+    }
+    s->should_start = true;
+
+    if (!s->connected) {
+        return;
+    }
+
     if (s->dev.started) {
         return;
     }
@@ -239,7 +270,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
      * vhost here instead of waiting for .set_status().
      */
-    vhost_user_blk_start(vdev);
+    ret = vhost_user_blk_start(vdev);
+    if (ret < 0) {
+        error_report("vhost-user-blk: vhost start failed: %s",
+                     strerror(-ret));
+        qemu_chr_fe_disconnect(&s->chardev);
+        return;
+    }
 
     /* Kick right away to begin processing requests already in vring */
     for (i = 0; i < s->dev.nvqs; i++) {
@@ -259,12 +296,105 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
     vhost_dev_reset_shm(s->shm);
 }
 
+static int vhost_user_blk_connect(DeviceState *dev)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+    int ret = 0;
+
+    if (s->connected) {
+        return 0;
+    }
+    s->connected = true;
+
+    s->dev.nvqs = s->num_queues;
+    s->dev.vqs = s->vqs;
+    s->dev.vq_index = 0;
+    s->dev.backend_features = 0;
+
+    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
+
+    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
+    if (ret < 0) {
+        error_report("vhost-user-blk: vhost initialization failed: %s",
+                     strerror(-ret));
+        return ret;
+    }
+
+    /* restore vhost state */
+    if (s->should_start) {
+        ret = vhost_user_blk_start(vdev);
+        if (ret < 0) {
+            error_report("vhost-user-blk: vhost start failed: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+static void vhost_user_blk_disconnect(DeviceState *dev)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    if (!s->connected) {
+        return;
+    }
+    s->connected = false;
+
+    if (s->dev.started) {
+        vhost_user_blk_stop(vdev);
+    }
+
+    vhost_dev_cleanup(&s->dev);
+}
+
+static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
+                                     void *opaque)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    qemu_chr_fe_disconnect(&s->chardev);
+
+    return true;
+}
+
+static void vhost_user_blk_event(void *opaque, int event)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    switch (event) {
+    case CHR_EVENT_OPENED:
+        if (vhost_user_blk_connect(dev) < 0) {
+            qemu_chr_fe_disconnect(&s->chardev);
+            return;
+        }
+        s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
+                                         vhost_user_blk_watch, dev);
+        break;
+    case CHR_EVENT_CLOSED:
+        vhost_user_blk_disconnect(dev);
+        if (s->watch) {
+            g_source_remove(s->watch);
+            s->watch = 0;
+        }
+        break;
+    }
+}
+
 static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     VhostUserState *user;
     int i, ret;
+    Error *err = NULL;
 
     if (!s->chardev.chr) {
         error_setg(errp, "vhost-user-blk: chardev is mandatory");
@@ -299,23 +429,21 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
     }
 
     s->shm = g_new0(struct vhost_shm, 1);
-
-    s->dev.nvqs = s->num_queues;
-    s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
-    s->dev.vq_index = 0;
-    s->dev.backend_features = 0;
-
-    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
-
-    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
-    if (ret < 0) {
-        error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
-                   strerror(-ret));
-        goto virtio_err;
+    s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
+    s->watch = 0;
+    s->should_start = false;
+    s->connected = false;
+
+    while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
+        error_report_err(err);
+        err = NULL;
+        sleep(1);
     }
+    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
+                             NULL, (void *)dev, NULL, true);
 
     ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
-                              sizeof(struct virtio_blk_config));
+                               sizeof(struct virtio_blk_config));
     if (ret < 0) {
         error_setg(errp, "vhost-user-blk: get block config failed");
         goto vhost_err;
@@ -335,8 +463,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 
 vhost_err:
     vhost_dev_cleanup(&s->dev);
-virtio_err:
-    g_free(s->dev.vqs);
+    g_free(s->vqs);
     g_free(s->shm);
     virtio_cleanup(vdev);
 
@@ -351,9 +478,11 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
     VHostUserBlk *s = VHOST_USER_BLK(dev);
 
     vhost_user_blk_set_status(vdev, 0);
+    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL,
+                             NULL, NULL, NULL, false);
     vhost_dev_cleanup(&s->dev);
     vhost_dev_free_shm(s->shm);
-    g_free(s->dev.vqs);
+    g_free(s->vqs);
     g_free(s->shm);
     virtio_cleanup(vdev);
 
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index bb706d70b3..c17d47402b 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -38,6 +38,10 @@ typedef struct VHostUserBlk {
     struct vhost_dev dev;
     struct vhost_shm *shm;
     VhostUserState *vhost_user;
+    struct vhost_virtqueue *vqs;
+    guint watch;
+    bool should_start;
+    bool connected;
 } VHostUserBlk;
 
 #endif
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v2 for-4.0 7/7] contrib/vhost-user-blk: enable inflight I/O recording
  2018-12-18  9:59 [Qemu-devel] [PATCH v2 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
                   ` (5 preceding siblings ...)
  2018-12-18 10:00 ` [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend elohimes
@ 2018-12-18 10:00 ` elohimes
  6 siblings, 0 replies; 30+ messages in thread
From: elohimes @ 2018-12-18 10:00 UTC (permalink / raw)
  To: mst, marcandre.lureau, jasowang, yury-kotov, maxime.coquelin
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

From: Xie Yongji <xieyongji@baidu.com>

This patch enables inflight I/O recording for the
vhost-user-blk backend so that it can be restarted safely.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 contrib/vhost-user-blk/vhost-user-blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 571f114a56..b144490fd6 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -328,7 +328,8 @@ vub_get_features(VuDev *dev)
 static uint64_t
 vub_get_protocol_features(VuDev *dev)
 {
-    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
+    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
+           1ull << VHOST_USER_PROTOCOL_F_SLAVE_SHMFD;
 }
 
 static int
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-18  9:59 ` [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket elohimes
@ 2018-12-18 12:24   ` Marc-André Lureau
  2018-12-18 13:33     ` Yongji Xie
  2018-12-18 15:25     ` Daniel P. Berrangé
  0 siblings, 2 replies; 30+ messages in thread
From: Marc-André Lureau @ 2018-12-18 12:24 UTC (permalink / raw)
  To: elohimes
  Cc: Michael S . Tsirkin, Jason Wang, Yury Kotov, Coquelin, Maxime,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, xieyongji

Hi

On Tue, Dec 18, 2018 at 2:01 PM <elohimes@gmail.com> wrote:
>
> From: Xie Yongji <xieyongji@baidu.com>
>
> A new option "disconnected" is added to initialize the chardev socket
> in a disconnected state. Then we can use qemu_chr_fe_wait_connected()
> to connect when necessary. For now it is used for the unix domain
> socket of the vhost-user-blk device to support reconnecting.

What difference does it make whether you wait for the connection in
qemu_chr_fe_wait_connected() or during chardev setup?

"disconnected" is misleading; would it be possible to reuse
"wait/nowait" instead?

Tests would be welcome.
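
For reference, here is roughly the distinction being discussed (ids and
paths are placeholders, not taken from the series):

    # server socket: "nowait" already lets QEMU continue without a peer
    qemu-system-x86_64 -chardev socket,id=c0,path=/tmp/s.sock,server,nowait ...

    # client socket: today QEMU tries to connect during chardev setup;
    # the proposed "disconnected" defers the connect until the device
    # calls qemu_chr_fe_wait_connected() itself
    qemu-system-x86_64 -chardev socket,id=c0,path=/tmp/s.sock,disconnected ...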

>
> Suggested-by: Yury Kotov <yury-kotov@yandex-team.ru>
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> ---
>  chardev/char-socket.c | 10 ++++++++++
>  chardev/char.c        |  3 +++
>  qapi/char.json        |  3 +++
>  qemu-options.hx       | 28 ++++++++++++++++------------
>  4 files changed, 32 insertions(+), 12 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index eaa8e8b68f..2464d7abad 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -70,6 +70,7 @@ typedef struct {
>      TCPChardevTelnetInit *telnet_init;
>
>      bool is_websock;
> +    bool is_disconnected;
>
>      GSource *reconnect_timer;
>      int64_t reconnect_time;
> @@ -1000,6 +1001,7 @@ static void qmp_chardev_open_socket(Chardev *chr,
>      bool is_tn3270      = sock->has_tn3270  ? sock->tn3270  : false;
>      bool is_waitconnect = sock->has_wait    ? sock->wait    : false;
>      bool is_websock     = sock->has_websocket ? sock->websocket : false;
> +    bool is_disconnected = sock->has_disconnected ? sock->disconnected : false;
>      int64_t reconnect   = sock->has_reconnect ? sock->reconnect : 0;
>      QIOChannelSocket *sioc = NULL;
>      SocketAddress *addr;
> @@ -1072,6 +1074,10 @@ static void qmp_chardev_open_socket(Chardev *chr,
>          s->reconnect_time = reconnect;
>      }
>
> +    if (!s->is_listen && is_disconnected) {
> +        return;
> +    }
> +
>      if (s->reconnect_time) {
>          tcp_chr_connect_async(chr);
>      } else {
> @@ -1125,6 +1131,8 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
>      bool is_tn3270      = qemu_opt_get_bool(opts, "tn3270", false);
>      bool is_websock     = qemu_opt_get_bool(opts, "websocket", false);
>      bool do_nodelay     = !qemu_opt_get_bool(opts, "delay", true);
> +    bool is_disconnected = !is_listen &&
> +                           qemu_opt_get_bool(opts, "disconnected", false);
>      int64_t reconnect   = qemu_opt_get_number(opts, "reconnect", 0);
>      const char *path = qemu_opt_get(opts, "path");
>      const char *host = qemu_opt_get(opts, "host");
> @@ -1176,6 +1184,8 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
>      sock->websocket = is_websock;
>      sock->has_wait = true;
>      sock->wait = is_waitconnect;
> +    sock->has_disconnected = true;
> +    sock->disconnected = is_disconnected;
>      sock->has_reconnect = qemu_opt_find(opts, "reconnect");
>      sock->reconnect = reconnect;
>      sock->tls_creds = g_strdup(tls_creds);
> diff --git a/chardev/char.c b/chardev/char.c
> index ccba36bafb..fb916eb176 100644
> --- a/chardev/char.c
> +++ b/chardev/char.c
> @@ -862,6 +862,9 @@ QemuOptsList qemu_chardev_opts = {
>          },{
>              .name = "delay",
>              .type = QEMU_OPT_BOOL,
> +        },{
> +            .name = "disconnected",
> +            .type = QEMU_OPT_BOOL,
>          },{
>              .name = "reconnect",
>              .type = QEMU_OPT_NUMBER,
> diff --git a/qapi/char.json b/qapi/char.json
> index 77ed847972..b634442229 100644
> --- a/qapi/char.json
> +++ b/qapi/char.json
> @@ -258,6 +258,8 @@
>  #          sockets (default: false) (Since: 2.10)
>  # @websocket: enable websocket protocol on server
>  #           sockets (default: false) (Since: 3.1)
> +# @disconnected: init a client socket in disconnected
> +#                state (default: false) (Since: 4.0)
>  # @reconnect: For a client socket, if a socket is disconnected,
>  #          then attempt a reconnect after the given number of seconds.
>  #          Setting this to zero disables this function. (default: 0)
> @@ -274,6 +276,7 @@
>              '*telnet': 'bool',
>              '*tn3270': 'bool',
>              '*websocket': 'bool',
> +            '*disconnected': 'bool',
>              '*reconnect': 'int' },
>    'base': 'ChardevCommon' }
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index df42116ecc..f9d3495dc8 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2418,10 +2418,10 @@ DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
>      "-chardev help\n"
>      "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
>      "-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds]\n"
> -    "         [,server][,nowait][,telnet][,websocket][,reconnect=seconds][,mux=on|off]\n"
> -    "         [,logfile=PATH][,logappend=on|off][,tls-creds=ID] (tcp)\n"
> -    "-chardev socket,id=id,path=path[,server][,nowait][,telnet][,websocket][,reconnect=seconds]\n"
> -    "         [,mux=on|off][,logfile=PATH][,logappend=on|off] (unix)\n"
> +    "         [,server][,nowait][,telnet][,websocket][,disconnected][,reconnect=seconds]\n"
> +    "         [,mux=on|off][,logfile=PATH][,logappend=on|off][,tls-creds=ID] (tcp)\n"
> +    "-chardev socket,id=id,path=path[,server][,nowait][,telnet][,websocket][,disconnected]\n"
> +    "         [,reconnect=seconds][,mux=on|off][,logfile=PATH][,logappend=on|off] (unix)\n"
>      "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
>      "         [,localport=localport][,ipv4][,ipv6][,mux=on|off]\n"
>      "         [,logfile=PATH][,logappend=on|off]\n"
> @@ -2548,7 +2548,7 @@ The available backends are:
>  A void device. This device will not emit any data, and will drop any data it
>  receives. The null backend does not take any options.
>
> -@item -chardev socket,id=@var{id}[,@var{TCP options} or @var{unix options}][,server][,nowait][,telnet][,websocket][,reconnect=@var{seconds}][,tls-creds=@var{id}]
> +@item -chardev socket,id=@var{id}[,@var{TCP options} or @var{unix options}][,server][,nowait][,telnet][,websocket][,disconnected][,reconnect=@var{seconds}][,tls-creds=@var{id}]
>
>  Create a two-way stream socket, which can be either a TCP or a unix socket. A
>  unix socket will be created if @option{path} is specified. Behaviour is
> @@ -2565,6 +2565,9 @@ escape sequences.
>  @option{websocket} specifies that the socket uses WebSocket protocol for
>  communication.
>
> +@option{disconnected} specifies that the non-server socket should not try
> +to connect to the remote end during initialization.
> +
>  @option{reconnect} sets the timeout for reconnecting on non-server sockets when
>  the remote end goes away.  qemu will delay this many seconds and then attempt
>  to reconnect.  Zero disables reconnecting, and is the default.
> @@ -3087,18 +3090,19 @@ telnet on port 5555 to access the QEMU port.
>  localhost 5555
>  @end table
>
> -@item tcp:[@var{host}]:@var{port}[,@var{server}][,nowait][,nodelay][,reconnect=@var{seconds}]
> +@item tcp:[@var{host}]:@var{port}[,@var{server}][,nowait][,nodelay][,disconnected]
> +[,reconnect=@var{seconds}]
>  The TCP Net Console has two modes of operation.  It can send the serial
>  I/O to a location or wait for a connection from a location.  By default
>  the TCP Net Console is sent to @var{host} at the @var{port}.  If you use
>  the @var{server} option QEMU will wait for a client socket application
>  to connect to the port before continuing, unless the @code{nowait}
>  option was specified.  The @code{nodelay} option disables the Nagle buffering
> -algorithm.  The @code{reconnect} option only applies if @var{noserver} is
> -set, if the connection goes down it will attempt to reconnect at the
> -given interval.  If @var{host} is omitted, 0.0.0.0 is assumed. Only
> -one TCP connection at a time is accepted. You can use @code{telnet} to
> -connect to the corresponding character device.
> +algorithm.  If @var{noserver} is specified, the @code{disconnected} option
> +prevents QEMU from connecting during initialization, and the @code{reconnect}
> +option makes QEMU reconnect at the given interval when the connection goes down.
> +If @var{host} is omitted, 0.0.0.0 is assumed. Only one TCP connection at a time
> +is accepted. You can use @code{telnet} to connect to the corresponding character device.
>  @table @code
>  @item Example to send tcp console to 192.168.0.2 port 4444
>  -serial tcp:192.168.0.2:4444
> @@ -3121,7 +3125,7 @@ type "send break" followed by pressing the enter key.
>  The WebSocket protocol is used instead of raw tcp socket. The port acts as
>  a WebSocket server. Client mode is not supported.
>
> -@item unix:@var{path}[,server][,nowait][,reconnect=@var{seconds}]
> +@item unix:@var{path}[,server][,nowait][,disconnected][,reconnect=@var{seconds}]
>  A unix domain socket is used instead of a tcp socket.  The option works the
>  same as if you had specified @code{-serial tcp} except the unix domain socket
>  @var{path} is used for connections.
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend
  2018-12-18 10:00 ` [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend elohimes
@ 2018-12-18 12:30   ` Yury Kotov
  2018-12-18 14:16     ` Yongji Xie
  0 siblings, 1 reply; 30+ messages in thread
From: Yury Kotov @ 2018-12-18 12:30 UTC (permalink / raw)
  To: elohimes, mst, marcandre.lureau, jasowang, maxime.coquelin
  Cc: qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji,
	Евгений
	Яковлев

+ wrfsh@

Hi,

18.12.2018, 13:01, "elohimes@gmail.com" <elohimes@gmail.com>:
> From: Xie Yongji <xieyongji@baidu.com>
>
> Since we now support the messages VHOST_USER_GET_SHM_SIZE
> and VHOST_USER_SET_SHM_FD, the backend is able to restart
> safely because it can record inflight I/O in shared memory.
> This patch allows QEMU to reconnect to the backend after
> the connection is closed.
>
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Ni Xun <nixun@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> ---
>  hw/block/vhost-user-blk.c | 183 ++++++++++++++++++++++++-----
>  include/hw/virtio/vhost-user-blk.h | 4 +
>  2 files changed, 160 insertions(+), 27 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 27028cf996..80f2e2d765 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
>      .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
>  };
>
> -static void vhost_user_blk_start(VirtIODevice *vdev)
> +static int vhost_user_blk_start(VirtIODevice *vdev)
>  {
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> @@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
>
>      if (!k->set_guest_notifiers) {
>          error_report("binding does not support guest notifiers");
> - return;
> + return -ENOSYS;
>      }
>
>      ret = vhost_dev_enable_notifiers(&s->dev, vdev);
>      if (ret < 0) {
>          error_report("Error enabling host notifiers: %d", -ret);
> - return;
> + return ret;
>      }
>
>      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
> @@ -147,12 +147,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
>          vhost_virtqueue_mask(&s->dev, vdev, i, false);
>      }
>
> - return;
> + return ret;
>
>  err_guest_notifiers:
>      k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
>  err_host_notifiers:
>      vhost_dev_disable_notifiers(&s->dev, vdev);
> + return ret;
>  }
>
>  static void vhost_user_blk_stop(VirtIODevice *vdev)
> @@ -171,7 +172,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
>      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
>      if (ret < 0) {
>          error_report("vhost guest notifier cleanup failed: %d", ret);
> - return;
>      }
>
>      vhost_dev_disable_notifiers(&s->dev, vdev);
> @@ -181,21 +181,43 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>      bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
> + int ret;
>
>      if (!vdev->vm_running) {
>          should_start = false;
>      }
>
> - if (s->dev.started == should_start) {
> + if (s->should_start == should_start) {
> + return;
> + }
> +
> + if (!s->connected || s->dev.started == should_start) {
> + s->should_start = should_start;
>          return;
>      }
>
>      if (should_start) {
> - vhost_user_blk_start(vdev);
> + s->should_start = true;
> + /*
> + * make sure vhost_user_blk_handle_output() ignores fake
> + * guest kick by vhost_dev_enable_notifiers()
> + */
> + barrier();
> + ret = vhost_user_blk_start(vdev);
> + if (ret < 0) {
> + error_report("vhost-user-blk: vhost start failed: %s",
> + strerror(-ret));
> + qemu_chr_fe_disconnect(&s->chardev);
> + }
>      } else {
>          vhost_user_blk_stop(vdev);
> + /*
> + * make sure vhost_user_blk_handle_output() ignore fake
> + * guest kick by vhost_dev_disable_notifiers()
> + */
> + barrier();
> + s->should_start = false;
>      }
> -
>  }
>
>  static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
> @@ -225,13 +247,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
>  static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> - int i;
> + int i, ret;
>
>      if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
>          !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
>          return;
>      }
>
> + if (s->should_start) {
> + return;
> + }
> + s->should_start = true;
> +
> + if (!s->connected) {
> + return;
> + }
> +
>      if (s->dev.started) {
>          return;
>      }
> @@ -239,7 +270,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>      /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
>       * vhost here instead of waiting for .set_status().
>       */
> - vhost_user_blk_start(vdev);
> + ret = vhost_user_blk_start(vdev);
> + if (ret < 0) {
> + error_report("vhost-user-blk: vhost start failed: %s",
> + strerror(-ret));
> + qemu_chr_fe_disconnect(&s->chardev);
> + return;
> + }
>
>      /* Kick right away to begin processing requests already in vring */
>      for (i = 0; i < s->dev.nvqs; i++) {
> @@ -259,12 +296,105 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
>      vhost_dev_reset_shm(s->shm);
>  }
>
> +static int vhost_user_blk_connect(DeviceState *dev)
> +{
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> + int ret = 0;
> +
> + if (s->connected) {
> + return 0;
> + }
> + s->connected = true;
> +
> + s->dev.nvqs = s->num_queues;
> + s->dev.vqs = s->vqs;
> + s->dev.vq_index = 0;
> + s->dev.backend_features = 0;
> +
> + vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> +
> + ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> + if (ret < 0) {
> + error_report("vhost-user-blk: vhost initialization failed: %s",
> + strerror(-ret));
> + return ret;
> + }
> +
> + /* restore vhost state */
> + if (s->should_start) {
> + ret = vhost_user_blk_start(vdev);
> + if (ret < 0) {
> + error_report("vhost-user-blk: vhost start failed: %s",
> + strerror(-ret));
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static void vhost_user_blk_disconnect(DeviceState *dev)
> +{
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +
> + if (!s->connected) {
> + return;
> + }
> + s->connected = false;
> +
> + if (s->dev.started) {
> + vhost_user_blk_stop(vdev);
> + }
> +
> + vhost_dev_cleanup(&s->dev);
> +}
> +
> +static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
> + void *opaque)
> +{
> + DeviceState *dev = opaque;
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +
> + qemu_chr_fe_disconnect(&s->chardev);
> +
> + return true;
> +}
> +
> +static void vhost_user_blk_event(void *opaque, int event)
> +{
> + DeviceState *dev = opaque;
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +
> + switch (event) {
> + case CHR_EVENT_OPENED:
> + if (vhost_user_blk_connect(dev) < 0) {
> + qemu_chr_fe_disconnect(&s->chardev);
> + return;
> + }
> + s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
> + vhost_user_blk_watch, dev);
> + break;
> + case CHR_EVENT_CLOSED:
> + vhost_user_blk_disconnect(dev);
> + if (s->watch) {
> + g_source_remove(s->watch);
> + s->watch = 0;
> + }
> + break;
> + }
> +}
> +
>  static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>      VhostUserState *user;
>      int i, ret;
> + Error *err = NULL;
>
>      if (!s->chardev.chr) {
>          error_setg(errp, "vhost-user-blk: chardev is mandatory");
> @@ -299,23 +429,21 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>      }
>
>      s->shm = g_new0(struct vhost_shm, 1);
> -
> - s->dev.nvqs = s->num_queues;
> - s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
> - s->dev.vq_index = 0;
> - s->dev.backend_features = 0;
> -
> - vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> -
> - ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> - if (ret < 0) {
> - error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
> - strerror(-ret));
> - goto virtio_err;
> + s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
> + s->watch = 0;
> + s->should_start = false;
> + s->connected = false;
> +
> + while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> + error_report_err(err);
> + err = NULL;
> + sleep(1);
>      }

After the reconnect loop we have some function calls to the vhost backend:
* qemu_chr_fe_set_handlers implicitly calls vhost_dev_init
* vhost_dev_get_config
* vhost_dev_init_shm

If the vhost backend restarts during their execution, the initialization
will fail. What do you think? Maybe we can call these functions inside
the reconnect loop to fix it?
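
Purely as a sketch of that idea (reusing the names from the patch above;
error paths and cleanup are simplified):

    for (;;) {
        while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
            error_report_err(err);
            err = NULL;
            sleep(1);
        }
        if (vhost_dev_init(&s->dev, s->vhost_user,
                           VHOST_BACKEND_TYPE_USER, 0) < 0) {
            continue;   /* backend went away again; reconnect and retry */
        }
        if (vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
                                 sizeof(struct virtio_blk_config)) < 0 ||
            vhost_dev_init_shm(&s->dev, s->shm) < 0) {
            vhost_dev_cleanup(&s->dev);
            continue;   /* restart the whole handshake from scratch */
        }
        break;          /* fully initialized */
    }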

> + qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> + NULL, (void *)dev, NULL, true);
>
>      ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
> - sizeof(struct virtio_blk_config));
> + sizeof(struct virtio_blk_config));
>      if (ret < 0) {
>          error_setg(errp, "vhost-user-blk: get block config failed");
>          goto vhost_err;
> @@ -335,8 +463,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>
>  vhost_err:
>      vhost_dev_cleanup(&s->dev);
> -virtio_err:
> - g_free(s->dev.vqs);
> + g_free(s->vqs);
>      g_free(s->shm);
>      virtio_cleanup(vdev);
>
> @@ -351,9 +478,11 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
>      VHostUserBlk *s = VHOST_USER_BLK(dev);
>
>      vhost_user_blk_set_status(vdev, 0);
> + qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL,
> + NULL, NULL, NULL, false);
>      vhost_dev_cleanup(&s->dev);
>      vhost_dev_free_shm(s->shm);
> - g_free(s->dev.vqs);
> + g_free(s->vqs);
>      g_free(s->shm);
>      virtio_cleanup(vdev);
>
> diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
> index bb706d70b3..c17d47402b 100644
> --- a/include/hw/virtio/vhost-user-blk.h
> +++ b/include/hw/virtio/vhost-user-blk.h
> @@ -38,6 +38,10 @@ typedef struct VHostUserBlk {
>      struct vhost_dev dev;
>      struct vhost_shm *shm;
>      VhostUserState *vhost_user;
> + struct vhost_virtqueue *vqs;
> + guint watch;
> + bool should_start;
> + bool connected;
>  } VHostUserBlk;
>
>  #endif
> --
> 2.17.1

Regards,
Yury

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-18 12:24   ` Marc-André Lureau
@ 2018-12-18 13:33     ` Yongji Xie
  2018-12-18 15:25     ` Daniel P. Berrangé
  1 sibling, 0 replies; 30+ messages in thread
From: Yongji Xie @ 2018-12-18 13:33 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Michael S . Tsirkin, Jason Wang, Yury Kotov, Coquelin, Maxime,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Tue, 18 Dec 2018 at 20:24, Marc-André Lureau
<marcandre.lureau@redhat.com> wrote:
>
> Hi
>
> On Tue, Dec 18, 2018 at 2:01 PM <elohimes@gmail.com> wrote:
> >
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > A new option "disconnected" is added to initialize the chardev socket
> > in a disconnected state. Then we can use qemu_chr_fe_wait_connected()
> > to connect when necessary. For now it is used for the unix domain
> > socket of the vhost-user-blk device to support reconnecting.
>
> What difference does it make whether you wait for the connection in
> qemu_chr_fe_wait_connected() or during chardev setup?
>

Waiting for the connection in qemu_chr_fe_wait_connected() makes it
possible to start QEMU before the backend. Maybe we need to support
this case?

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend
  2018-12-18 12:30   ` Yury Kotov
@ 2018-12-18 14:16     ` Yongji Xie
  2018-12-18 14:35       ` Yury Kotov
  0 siblings, 1 reply; 30+ messages in thread
From: Yongji Xie @ 2018-12-18 14:16 UTC (permalink / raw)
  To: Yury Kotov
  Cc: mst, marcandre.lureau, jasowang, maxime.coquelin, qemu-devel,
	zhangyu31, chaiwen, nixun, lilin24, Xie Yongji,
	Евгений
	Яковлев

On Tue, 18 Dec 2018 at 20:30, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
> + wrfsh@
>
> Hi,
>
> 18.12.2018, 13:01, "elohimes@gmail.com" <elohimes@gmail.com>:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > Since we now support the messages VHOST_USER_GET_SHM_SIZE
> > and VHOST_USER_SET_SHM_FD, the backend is able to restart
> > safely because it can record inflight I/O in shared memory.
> > This patch allows QEMU to reconnect to the backend after
> > the connection is closed.
> >
> > Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> > Signed-off-by: Ni Xun <nixun@baidu.com>
> > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> > ---
> >  hw/block/vhost-user-blk.c | 183 ++++++++++++++++++++++++-----
> >  include/hw/virtio/vhost-user-blk.h | 4 +
> >  2 files changed, 160 insertions(+), 27 deletions(-)
> >
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 27028cf996..80f2e2d765 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
> >      .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
> >  };
> >
> > -static void vhost_user_blk_start(VirtIODevice *vdev)
> > +static int vhost_user_blk_start(VirtIODevice *vdev)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> > @@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
> >
> >      if (!k->set_guest_notifiers) {
> >          error_report("binding does not support guest notifiers");
> > - return;
> > + return -ENOSYS;
> >      }
> >
> >      ret = vhost_dev_enable_notifiers(&s->dev, vdev);
> >      if (ret < 0) {
> >          error_report("Error enabling host notifiers: %d", -ret);
> > - return;
> > + return ret;
> >      }
> >
> >      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
> > @@ -147,12 +147,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
> >          vhost_virtqueue_mask(&s->dev, vdev, i, false);
> >      }
> >
> > - return;
> > + return ret;
> >
> >  err_guest_notifiers:
> >      k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> >  err_host_notifiers:
> >      vhost_dev_disable_notifiers(&s->dev, vdev);
> > + return ret;
> >  }
> >
> >  static void vhost_user_blk_stop(VirtIODevice *vdev)
> > @@ -171,7 +172,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
> >      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> >      if (ret < 0) {
> >          error_report("vhost guest notifier cleanup failed: %d", ret);
> > - return;
> >      }
> >
> >      vhost_dev_disable_notifiers(&s->dev, vdev);
> > @@ -181,21 +181,43 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >      bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
> > + int ret;
> >
> >      if (!vdev->vm_running) {
> >          should_start = false;
> >      }
> >
> > - if (s->dev.started == should_start) {
> > + if (s->should_start == should_start) {
> > + return;
> > + }
> > +
> > + if (!s->connected || s->dev.started == should_start) {
> > + s->should_start = should_start;
> >          return;
> >      }
> >
> >      if (should_start) {
> > - vhost_user_blk_start(vdev);
> > + s->should_start = true;
> > + /*
> > + * make sure vhost_user_blk_handle_output() ignores fake
> > + * guest kick by vhost_dev_enable_notifiers()
> > + */
> > + barrier();
> > + ret = vhost_user_blk_start(vdev);
> > + if (ret < 0) {
> > + error_report("vhost-user-blk: vhost start failed: %s",
> > + strerror(-ret));
> > + qemu_chr_fe_disconnect(&s->chardev);
> > + }
> >      } else {
> >          vhost_user_blk_stop(vdev);
> > + /*
> > + * make sure vhost_user_blk_handle_output() ignore fake
> > + * guest kick by vhost_dev_disable_notifiers()
> > + */
> > + barrier();
> > + s->should_start = false;
> >      }
> > -
> >  }
> >
> >  static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
> > @@ -225,13 +247,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
> >  static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > - int i;
> > + int i, ret;
> >
> >      if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
> >          !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
> >          return;
> >      }
> >
> > + if (s->should_start) {
> > + return;
> > + }
> > + s->should_start = true;
> > +
> > + if (!s->connected) {
> > + return;
> > + }
> > +
> >      if (s->dev.started) {
> >          return;
> >      }
> > @@ -239,7 +270,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >      /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> >       * vhost here instead of waiting for .set_status().
> >       */
> > - vhost_user_blk_start(vdev);
> > + ret = vhost_user_blk_start(vdev);
> > + if (ret < 0) {
> > + error_report("vhost-user-blk: vhost start failed: %s",
> > + strerror(-ret));
> > + qemu_chr_fe_disconnect(&s->chardev);
> > + return;
> > + }
> >
> >      /* Kick right away to begin processing requests already in vring */
> >      for (i = 0; i < s->dev.nvqs; i++) {
> > @@ -259,12 +296,105 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
> >      vhost_dev_reset_shm(s->shm);
> >  }
> >
> > +static int vhost_user_blk_connect(DeviceState *dev)
> > +{
> > + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > + int ret = 0;
> > +
> > + if (s->connected) {
> > + return 0;
> > + }
> > + s->connected = true;
> > +
> > + s->dev.nvqs = s->num_queues;
> > + s->dev.vqs = s->vqs;
> > + s->dev.vq_index = 0;
> > + s->dev.backend_features = 0;
> > +
> > + vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> > +
> > + ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> > + if (ret < 0) {
> > + error_report("vhost-user-blk: vhost initialization failed: %s",
> > + strerror(-ret));
> > + return ret;
> > + }
> > +
> > + /* restore vhost state */
> > + if (s->should_start) {
> > + ret = vhost_user_blk_start(vdev);
> > + if (ret < 0) {
> > + error_report("vhost-user-blk: vhost start failed: %s",
> > + strerror(-ret));
> > + return ret;
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void vhost_user_blk_disconnect(DeviceState *dev)
> > +{
> > + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > +
> > + if (!s->connected) {
> > + return;
> > + }
> > + s->connected = false;
> > +
> > + if (s->dev.started) {
> > + vhost_user_blk_stop(vdev);
> > + }
> > +
> > + vhost_dev_cleanup(&s->dev);
> > +}
> > +
> > +static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
> > + void *opaque)
> > +{
> > + DeviceState *dev = opaque;
> > + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > +
> > + qemu_chr_fe_disconnect(&s->chardev);
> > +
> > + return true;
> > +}
> > +
> > +static void vhost_user_blk_event(void *opaque, int event)
> > +{
> > + DeviceState *dev = opaque;
> > + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > + VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > +
> > + switch (event) {
> > + case CHR_EVENT_OPENED:
> > + if (vhost_user_blk_connect(dev) < 0) {
> > + qemu_chr_fe_disconnect(&s->chardev);
> > + return;
> > + }
> > + s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
> > + vhost_user_blk_watch, dev);
> > + break;
> > + case CHR_EVENT_CLOSED:
> > + vhost_user_blk_disconnect(dev);
> > + if (s->watch) {
> > + g_source_remove(s->watch);
> > + s->watch = 0;
> > + }
> > + break;
> > + }
> > +}
> > +
> >  static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >  {
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >      VhostUserState *user;
> >      int i, ret;
> > + Error *err = NULL;
> >
> >      if (!s->chardev.chr) {
> >          error_setg(errp, "vhost-user-blk: chardev is mandatory");
> > @@ -299,23 +429,21 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >      }
> >
> >      s->shm = g_new0(struct vhost_shm, 1);
> > -
> > - s->dev.nvqs = s->num_queues;
> > - s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
> > - s->dev.vq_index = 0;
> > - s->dev.backend_features = 0;
> > -
> > - vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> > -
> > - ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> > - if (ret < 0) {
> > - error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
> > - strerror(-ret));
> > - goto virtio_err;
> > + s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
> > + s->watch = 0;
> > + s->should_start = false;
> > + s->connected = false;
> > +
> > + while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> > + error_report_err(err);
> > + err = NULL;
> > + sleep(1);
> >      }
>
> After the reconnect loop we have some function calls to the vhost backend:
> * qemu_chr_fe_set_handlers implicitly calls vhost_dev_init
> * vhost_dev_get_config
> * vhost_dev_init_shm
>
> If the vhost backend restarts during their execution, the initialization
> will fail. What do you think? Maybe we can call these functions inside
> the reconnect loop to fix it?
>

qemu_chr_fe_wait_connected() should be called only when we are in the
disconnected state. Otherwise, it will conflict with the reconnect
thread.
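
To make the constraint concrete, this is a condensed view of the realize()
path from patch 6 (the comments are mine):

    /* block-waiting is only safe here, before the event handler is
     * installed, i.e. while nothing else owns the socket */
    while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
        error_report_err(err);
        err = NULL;
        sleep(1);
    }

    /* from this point on reconnection is event-driven: CHR_EVENT_OPENED
     * and CHR_EVENT_CLOSED arrive via vhost_user_blk_event(), so calling
     * qemu_chr_fe_wait_connected() again would race with that machinery */
    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
                             NULL, (void *)dev, NULL, true);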

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2018-12-18  9:59 ` [Qemu-devel] [PATCH v2 for-4.0 2/7] vhost-user: Support providing shared memory to backend elohimes
@ 2018-12-18 14:25   ` Michael S. Tsirkin
  2018-12-18 14:47     ` Yongji Xie
  0 siblings, 1 reply; 30+ messages in thread
From: Michael S. Tsirkin @ 2018-12-18 14:25 UTC (permalink / raw)
  To: elohimes
  Cc: marcandre.lureau, jasowang, yury-kotov, maxime.coquelin,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Tue, Dec 18, 2018 at 05:59:57PM +0800, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
> 
> This patch introduces two new messages, VHOST_USER_GET_SHM_SIZE
> and VHOST_USER_SET_SHM_FD, to support providing shared
> memory to the backend.
> 
> First, QEMU uses VHOST_USER_GET_SHM_SIZE to get the
> required size of shared memory from the backend. Then QEMU
> allocates the memory and sends it back to the backend through
> VHOST_USER_SET_SHM_FD.
> 
> Note that the shared memory should be used by the backend to
> record inflight I/O. QEMU will clear it when the VM is reset.

An interesting design choice. Why not let the backend clear it
on start?
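
For concreteness, something like this on the slave side (hypothetical
sketch; the names are invented for illustration):

    #include <string.h>

    /* zero the inflight region only on a clean start; keep it across a
     * crash-restart so pending requests can be resubmitted */
    static void slave_init_inflight(void *addr, size_t mmap_size,
                                    bool clean_start)
    {
        if (clean_start) {
            memset(addr, 0, mmap_size);
        }
    }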

> 
> Signed-off-by: Xie Yongji <xieyongji@baidu.com>
> Signed-off-by: Chai Wen <chaiwen@baidu.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> ---
>  docs/interop/vhost-user.txt       |  41 +++++++++++
>  hw/virtio/vhost-user.c            |  86 ++++++++++++++++++++++
>  hw/virtio/vhost.c                 | 117 ++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-backend.h |   9 +++
>  include/hw/virtio/vhost.h         |  19 +++++
>  5 files changed, 272 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index c2194711d9..5ee9c28ab0 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -142,6 +142,19 @@ Depending on the request type, payload can be:
>     Offset: a 64-bit offset of this area from the start of the
>         supplied file descriptor
>  
> + * Shm description
> +   ------------------------------------------------------------------
> +   | mmap_size | mmap_offset | dev_size | vq_size | align | version |
> +   ------------------------------------------------------------------
> +
> +   Mmap_size: a 64-bit size of the shared memory
> +   Mmap_offset: a 64-bit offset of the shared memory from the start
> +                of the supplied file descriptor
> +   Dev_size: a 32-bit size of the device region in shared memory
> +   Vq_size: a 32-bit size of each virtqueue region in shared memory
> +   Align: a 32-bit alignment of the regions in shared memory
> +   Version: a 32-bit version of this shared memory
> +
>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {
> @@ -157,6 +170,7 @@ typedef struct VhostUserMsg {
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
>          VhostUserVringArea area;
> +        VhostUserShm shm;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -175,6 +189,7 @@ the ones that do:
>   * VHOST_USER_GET_PROTOCOL_FEATURES
>   * VHOST_USER_GET_VRING_BASE
>   * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> + * VHOST_USER_GET_SHM_SIZE (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
>  
>  [ Also see the section on REPLY_ACK protocol extension. ]
>  
> @@ -188,6 +203,7 @@ in the ancillary data:
>   * VHOST_USER_SET_VRING_CALL
>   * VHOST_USER_SET_VRING_ERR
>   * VHOST_USER_SET_SLAVE_REQ_FD
> + * VHOST_USER_SET_SHM_FD (if VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)
>  
>  If Master is unable to send the full message or receives a wrong reply it will
>  close the connection. An optional reconnection mechanism can be implemented.
> @@ -397,6 +413,7 @@ Protocol features
>  #define VHOST_USER_PROTOCOL_F_CONFIG         9
>  #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
>  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
> +#define VHOST_USER_PROTOCOL_F_SLAVE_SHMFD 12
>  
>  Master message types
>  --------------------
> @@ -761,6 +778,30 @@ Master message types
>        was previously sent.
>        The value returned is an error indication; 0 is success.
>  
> + * VHOST_USER_GET_SHM_SIZE
> +      Id: 31
> +      Equivalent ioctl: N/A
> +      Master payload: shm description
> +
> +      When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
> +      successfully negotiated, the master needs to provide shared memory to
> +      the slave. This message is used by the master to get the required size
> +      from the slave. The shared memory contains one region for the device
> +      and several regions for the virtqueues. The sizes of these two kinds
> +      of regions are specified by the dev_size and vq_size fields. The align
> +      field specifies the alignment of those regions.
> +
> + * VHOST_USER_SET_SHM_FD
> +      Id: 32
> +      Equivalent ioctl: N/A
> +      Master payload: shm description
> +
> +      When the VHOST_USER_PROTOCOL_F_SLAVE_SHMFD protocol feature has been
> +      successfully negotiated, the master uses this message to set shared
> +      memory for the slave. The memory fd is passed in the ancillary data.
> +      The shared memory should be used by the slave to record inflight I/O,
> +      and the master will clear it when the VM is reset.
> +
>  Slave message types
>  -------------------
>  
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index e09bed0e4a..8cdf3b5121 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -52,6 +52,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_CONFIG = 9,
>      VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
>      VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> +    VHOST_USER_PROTOCOL_F_SLAVE_SHMFD = 12,
>      VHOST_USER_PROTOCOL_F_MAX
>  };
>  
> @@ -89,6 +90,8 @@ typedef enum VhostUserRequest {
>      VHOST_USER_POSTCOPY_ADVISE  = 28,
>      VHOST_USER_POSTCOPY_LISTEN  = 29,
>      VHOST_USER_POSTCOPY_END     = 30,
> +    VHOST_USER_GET_SHM_SIZE = 31,
> +    VHOST_USER_SET_SHM_FD = 32,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>  
> @@ -147,6 +150,15 @@ typedef struct VhostUserVringArea {
>      uint64_t offset;
>  } VhostUserVringArea;
>  
> +typedef struct VhostUserShm {
> +    uint64_t mmap_size;
> +    uint64_t mmap_offset;
> +    uint32_t dev_size;
> +    uint32_t vq_size;
> +    uint32_t align;
> +    uint32_t version;
> +} VhostUserShm;
> +
>  typedef struct {
>      VhostUserRequest request;
>  
> @@ -169,6 +181,7 @@ typedef union {
>          VhostUserConfig config;
>          VhostUserCryptoSession session;
>          VhostUserVringArea area;
> +        VhostUserShm shm;
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -1739,6 +1752,77 @@ static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
>      return result;
>  }
>  
> +static int vhost_user_get_shm_size(struct vhost_dev *dev,
> +                                   struct vhost_shm *shm)
> +{
> +    VhostUserMsg msg = {
> +        .hdr.request = VHOST_USER_GET_SHM_SIZE,
> +        .hdr.flags = VHOST_USER_VERSION,
> +        .hdr.size = sizeof(msg.payload.shm),
> +    };
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
> +        shm->dev_size = 0;
> +        shm->vq_size = 0;
> +        return 0;
> +    }
> +
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -1;
> +    }
> +
> +    if (vhost_user_read(dev, &msg) < 0) {
> +        return -1;
> +    }
> +
> +    if (msg.hdr.request != VHOST_USER_GET_SHM_SIZE) {
> +        error_report("Received unexpected msg type. "
> +                     "Expected %d received %d",
> +                     VHOST_USER_GET_SHM_SIZE, msg.hdr.request);
> +        return -1;
> +    }
> +
> +    if (msg.hdr.size != sizeof(msg.payload.shm)) {
> +        error_report("Received bad msg size.");
> +        return -1;
> +    }
> +
> +    shm->dev_size = msg.payload.shm.dev_size;
> +    shm->vq_size = msg.payload.shm.vq_size;
> +    shm->align = msg.payload.shm.align;
> +    shm->version = msg.payload.shm.version;
> +
> +    return 0;
> +}
> +
> +static int vhost_user_set_shm_fd(struct vhost_dev *dev,
> +                                 struct vhost_shm *shm)
> +{
> +    VhostUserMsg msg = {
> +        .hdr.request = VHOST_USER_SET_SHM_FD,
> +        .hdr.flags = VHOST_USER_VERSION,
> +        .payload.shm.mmap_size = shm->mmap_size,
> +        .payload.shm.mmap_offset = 0,
> +        .payload.shm.dev_size = shm->dev_size,
> +        .payload.shm.vq_size = shm->vq_size,
> +        .payload.shm.align = shm->align,
> +        .payload.shm.version = shm->version,
> +        .hdr.size = sizeof(msg.payload.shm),
> +    };
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_SLAVE_SHMFD)) {
> +        return 0;
> +    }
> +
> +    if (vhost_user_write(dev, &msg, &shm->fd, 1) < 0) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  VhostUserState *vhost_user_init(void)
>  {
>      VhostUserState *user = g_new0(struct VhostUserState, 1);
> @@ -1790,4 +1874,6 @@ const VhostOps user_ops = {
>          .vhost_crypto_create_session = vhost_user_crypto_create_session,
>          .vhost_crypto_close_session = vhost_user_crypto_close_session,
>          .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
> +        .vhost_get_shm_size = vhost_user_get_shm_size,
> +        .vhost_set_shm_fd = vhost_user_set_shm_fd,
>  };
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 569c4053ea..7a38fed50f 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1481,6 +1481,123 @@ void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
>      hdev->config_ops = ops;
>  }
>  
> +void vhost_dev_reset_shm(struct vhost_shm *shm)
> +{
> +    if (shm->addr) {
> +        memset(shm->addr, 0, shm->mmap_size);
> +    }
> +}
> +
> +void vhost_dev_free_shm(struct vhost_shm *shm)
> +{
> +    if (shm->addr) {
> +        qemu_memfd_free(shm->addr, shm->mmap_size, shm->fd);
> +        shm->addr = NULL;
> +        shm->fd = -1;
> +    }
> +}
> +
> +int vhost_dev_alloc_shm(struct vhost_shm *shm)
> +{
> +    Error *err = NULL;
> +    int fd = -1;
> +    void *addr = qemu_memfd_alloc("vhost-shm", shm->mmap_size,
> +                                  F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
> +                                  &fd, &err);
> +    if (err) {
> +        error_report_err(err);
> +        return -1;
> +    }
> +
> +    shm->addr = addr;
> +    shm->fd = fd;
> +
> +    return 0;
> +}
> +
> +void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f)
> +{
> +    if (shm->addr) {
> +        qemu_put_be64(f, shm->mmap_size);
> +        qemu_put_be32(f, shm->dev_size);
> +        qemu_put_be32(f, shm->vq_size);
> +        qemu_put_be32(f, shm->align);
> +        qemu_put_be32(f, shm->version);
> +        qemu_put_buffer(f, shm->addr, shm->mmap_size);
> +    } else {
> +        qemu_put_be64(f, 0);
> +    }
> +}
> +
> +int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f)
> +{
> +    uint64_t mmap_size;
> +
> +    mmap_size = qemu_get_be64(f);
> +    if (!mmap_size) {
> +        return 0;
> +    }
> +
> +    vhost_dev_free_shm(shm);
> +
> +    shm->mmap_size = mmap_size;
> +    shm->dev_size = qemu_get_be32(f);
> +    shm->vq_size = qemu_get_be32(f);
> +    shm->align = qemu_get_be32(f);
> +    shm->version = qemu_get_be32(f);
> +
> +    if (vhost_dev_alloc_shm(shm)) {
> +        return -ENOMEM;
> +    }
> +
> +    qemu_get_buffer(f, shm->addr, mmap_size);
> +
> +    return 0;
> +}
> +
> +int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm)
> +{
> +    int r;
> +
> +    if (dev->vhost_ops->vhost_set_shm_fd && shm->addr) {
> +        r = dev->vhost_ops->vhost_set_shm_fd(dev, shm);
> +        if (r) {
> +            VHOST_OPS_DEBUG("vhost_set_vring_shm_fd failed");
> +            return -errno;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm)
> +{
> +    int r;
> +
> +    if (dev->vhost_ops->vhost_get_shm_size) {
> +        r = dev->vhost_ops->vhost_get_shm_size(dev, shm);
> +        if (r) {
> +            VHOST_OPS_DEBUG("vhost_get_vring_shm_size failed");
> +            return -errno;
> +        }
> +
> +        if (!shm->dev_size && !shm->vq_size) {
> +            return 0;
> +        }
> +
> +        shm->mmap_size = QEMU_ALIGN_UP(shm->dev_size, shm->align) +
> +                         dev->nvqs * QEMU_ALIGN_UP(shm->vq_size, shm->align);
> +
> +        if (vhost_dev_alloc_shm(shm)) {
> +            return -ENOMEM;
> +        }
> +
> +        vhost_dev_reset_shm(shm);
> +    }
> +
> +    return 0;
> +}
> +
>  /* Host notifiers must be enabled at this point. */
>  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>  {
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 81283ec50f..4e7f13c9e9 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -25,6 +25,7 @@ typedef enum VhostSetConfigType {
>      VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>  } VhostSetConfigType;
>  
> +struct vhost_shm;
>  struct vhost_dev;
>  struct vhost_log;
>  struct vhost_memory;
> @@ -104,6 +105,12 @@ typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
>  typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
>                                                  MemoryRegionSection *section);
>  
> +typedef int (*vhost_get_shm_size_op)(struct vhost_dev *dev,
> +                                     struct vhost_shm *shm);
> +
> +typedef int (*vhost_set_shm_fd_op)(struct vhost_dev *dev,
> +                                   struct vhost_shm *shm);
> +
>  typedef struct VhostOps {
>      VhostBackendType backend_type;
>      vhost_backend_init vhost_backend_init;
> @@ -142,6 +149,8 @@ typedef struct VhostOps {
>      vhost_crypto_create_session_op vhost_crypto_create_session;
>      vhost_crypto_close_session_op vhost_crypto_close_session;
>      vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
> +    vhost_get_shm_size_op vhost_get_shm_size;
> +    vhost_set_shm_fd_op vhost_set_shm_fd;
>  } VhostOps;
>  
>  extern const VhostOps user_ops;
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index a7f449fa87..b6e3d6ab56 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -7,6 +7,17 @@
>  #include "exec/memory.h"
>  
>  /* Generic structures common for any vhost based device. */
> +
> +struct vhost_shm {
> +    void *addr;
> +    uint64_t mmap_size;
> +    uint32_t dev_size;
> +    uint32_t vq_size;
> +    uint32_t align;
> +    uint32_t version;
> +    int fd;
> +};
> +
>  struct vhost_virtqueue {
>      int kick;
>      int call;
> @@ -120,4 +131,12 @@ int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
>   */
>  void vhost_dev_set_config_notifier(struct vhost_dev *dev,
>                                     const VhostDevConfigOps *ops);
> +
> +void vhost_dev_reset_shm(struct vhost_shm *shm);
> +void vhost_dev_free_shm(struct vhost_shm *shm);
> +int vhost_dev_alloc_shm(struct vhost_shm *shm);
> +void vhost_dev_save_shm(struct vhost_shm *shm, QEMUFile *f);
> +int vhost_dev_load_shm(struct vhost_shm *shm, QEMUFile *f);
> +int vhost_dev_set_shm(struct vhost_dev *dev, struct vhost_shm *shm);
> +int vhost_dev_init_shm(struct vhost_dev *dev, struct vhost_shm *shm);
>  #endif
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend
  2018-12-18 14:16     ` Yongji Xie
@ 2018-12-18 14:35       ` Yury Kotov
  2018-12-18 14:59         ` Yongji Xie
  0 siblings, 1 reply; 30+ messages in thread
From: Yury Kotov @ 2018-12-18 14:35 UTC (permalink / raw)
  To: Yongji Xie
  Cc: mst, marcandre.lureau, jasowang, maxime.coquelin, qemu-devel,
	zhangyu31, chaiwen, nixun, lilin24, Xie Yongji,
	Евгений
	Яковлев

18.12.2018, 17:16, "Yongji Xie" <elohimes@gmail.com>:
> On Tue, 18 Dec 2018 at 20:30, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>>  + wrfsh@
>>
>>  Hi,
>>
>>  18.12.2018, 13:01, "elohimes@gmail.com" <elohimes@gmail.com>:
>>  > From: Xie Yongji <xieyongji@baidu.com>
>>  >
>>  > Since we now support the messages VHOST_USER_GET_SHM_SIZE
>>  > and VHOST_USER_SET_SHM_FD, the backend is able to restart
>>  > safely because it can record inflight I/O in shared memory.
>>  > This patch allows QEMU to reconnect to the backend after
>>  > the connection is closed.
>>  >
>>  > Signed-off-by: Xie Yongji <xieyongji@baidu.com>
>>  > Signed-off-by: Ni Xun <nixun@baidu.com>
>>  > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
>>  > ---
>>  >  hw/block/vhost-user-blk.c          | 183 ++++++++++++++++++++++++-----
>>  >  include/hw/virtio/vhost-user-blk.h |   4 +
>>  >  2 files changed, 160 insertions(+), 27 deletions(-)
>>  >
>>  > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
>>  > index 27028cf996..80f2e2d765 100644
>>  > --- a/hw/block/vhost-user-blk.c
>>  > +++ b/hw/block/vhost-user-blk.c
>>  > @@ -101,7 +101,7 @@ const VhostDevConfigOps blk_ops = {
>>  >      .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
>>  >  };
>>  >
>>  > -static void vhost_user_blk_start(VirtIODevice *vdev)
>>  > +static int vhost_user_blk_start(VirtIODevice *vdev)
>>  >  {
>>  >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>  >      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
>>  > @@ -110,13 +110,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
>>  >
>>  >      if (!k->set_guest_notifiers) {
>>  >          error_report("binding does not support guest notifiers");
>>  > -        return;
>>  > +        return -ENOSYS;
>>  >      }
>>  >
>>  >      ret = vhost_dev_enable_notifiers(&s->dev, vdev);
>>  >      if (ret < 0) {
>>  >          error_report("Error enabling host notifiers: %d", -ret);
>>  > -        return;
>>  > +        return ret;
>>  >      }
>>  >
>>  >      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
>>  > @@ -147,12 +147,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
>>  >          vhost_virtqueue_mask(&s->dev, vdev, i, false);
>>  >      }
>>  >
>>  > -    return;
>>  > +    return ret;
>>  >
>>  >  err_guest_notifiers:
>>  >      k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
>>  >  err_host_notifiers:
>>  >      vhost_dev_disable_notifiers(&s->dev, vdev);
>>  > +    return ret;
>>  >  }
>>  >
>>  >  static void vhost_user_blk_stop(VirtIODevice *vdev)
>>  > @@ -171,7 +172,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
>>  >      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
>>  >      if (ret < 0) {
>>  >          error_report("vhost guest notifier cleanup failed: %d", ret);
>>  > -        return;
>>  >      }
>>  >
>>  >      vhost_dev_disable_notifiers(&s->dev, vdev);
>>  > @@ -181,21 +181,43 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
>>  >  {
>>  >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>  >      bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
>>  > +    int ret;
>>  >
>>  >      if (!vdev->vm_running) {
>>  >          should_start = false;
>>  >      }
>>  >
>>  > -    if (s->dev.started == should_start) {
>>  > +    if (s->should_start == should_start) {
>>  > +        return;
>>  > +    }
>>  > +
>>  > +    if (!s->connected || s->dev.started == should_start) {
>>  > +        s->should_start = should_start;
>>  >          return;
>>  >      }
>>  >
>>  >      if (should_start) {
>>  > -        vhost_user_blk_start(vdev);
>>  > +        s->should_start = true;
>>  > +        /*
>>  > +         * make sure vhost_user_blk_handle_output() ignores fake
>>  > +         * guest kick by vhost_dev_enable_notifiers()
>>  > +         */
>>  > +        barrier();
>>  > +        ret = vhost_user_blk_start(vdev);
>>  > +        if (ret < 0) {
>>  > +            error_report("vhost-user-blk: vhost start failed: %s",
>>  > +                         strerror(-ret));
>>  > +            qemu_chr_fe_disconnect(&s->chardev);
>>  > +        }
>>  >      } else {
>>  >          vhost_user_blk_stop(vdev);
>>  > +        /*
>>  > +         * make sure vhost_user_blk_handle_output() ignore fake
>>  > +         * guest kick by vhost_dev_disable_notifiers()
>>  > +         */
>>  > +        barrier();
>>  > +        s->should_start = false;
>>  >      }
>>  > -
>>  >  }
>>  >
>>  >  static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
>>  > @@ -225,13 +247,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
>>  >  static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>>  >  {
>>  >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>  > -    int i;
>>  > +    int i, ret;
>>  >
>>  >      if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
>>  >            !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
>>  >          return;
>>  >      }
>>  >
>>  > +    if (s->should_start) {
>>  > +        return;
>>  > +    }
>>  > +    s->should_start = true;
>>  > +
>>  > +    if (!s->connected) {
>>  > +        return;
>>  > +    }
>>  > +
>>  >      if (s->dev.started) {
>>  >          return;
>>  >      }
>>  > @@ -239,7 +270,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>>  >      /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
>>  >       * vhost here instead of waiting for .set_status().
>>  >       */
>>  > -    vhost_user_blk_start(vdev);
>>  > +    ret = vhost_user_blk_start(vdev);
>>  > +    if (ret < 0) {
>>  > +        error_report("vhost-user-blk: vhost start failed: %s",
>>  > +                     strerror(-ret));
>>  > +        qemu_chr_fe_disconnect(&s->chardev);
>>  > +        return;
>>  > +    }
>>  >
>>  >      /* Kick right away to begin processing requests already in vring */
>>  >      for (i = 0; i < s->dev.nvqs; i++) {
>>  > @@ -259,12 +296,105 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
>>  >      vhost_dev_reset_shm(s->shm);
>>  >  }
>>  >
>>  > +static int vhost_user_blk_connect(DeviceState *dev)
>>  > +{
>>  > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>  > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>  > +    int ret = 0;
>>  > +
>>  > +    if (s->connected) {
>>  > +        return 0;
>>  > +    }
>>  > +    s->connected = true;
>>  > +
>>  > +    s->dev.nvqs = s->num_queues;
>>  > +    s->dev.vqs = s->vqs;
>>  > +    s->dev.vq_index = 0;
>>  > +    s->dev.backend_features = 0;
>>  > +
>>  > +    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
>>  > +
>>  > +    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
>>  > +    if (ret < 0) {
>>  > +        error_report("vhost-user-blk: vhost initialization failed: %s",
>>  > +                     strerror(-ret));
>>  > +        return ret;
>>  > +    }
>>  > +
>>  > +    /* restore vhost state */
>>  > +    if (s->should_start) {
>>  > +        ret = vhost_user_blk_start(vdev);
>>  > +        if (ret < 0) {
>>  > +            error_report("vhost-user-blk: vhost start failed: %s",
>>  > +                         strerror(-ret));
>>  > +            return ret;
>>  > +        }
>>  > +    }
>>  > +
>>  > +    return 0;
>>  > +}
>>  > +
>>  > +static void vhost_user_blk_disconnect(DeviceState *dev)
>>  > +{
>>  > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>  > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>  > +
>>  > +    if (!s->connected) {
>>  > +        return;
>>  > +    }
>>  > +    s->connected = false;
>>  > +
>>  > +    if (s->dev.started) {
>>  > +        vhost_user_blk_stop(vdev);
>>  > +    }
>>  > +
>>  > +    vhost_dev_cleanup(&s->dev);
>>  > +}
>>  > +
>>  > +static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
>>  > +                                     void *opaque)
>>  > +{
>>  > +    DeviceState *dev = opaque;
>>  > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>  > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>  > +
>>  > +    qemu_chr_fe_disconnect(&s->chardev);
>>  > +
>>  > +    return true;
>>  > +}
>>  > +
>>  > +static void vhost_user_blk_event(void *opaque, int event)
>>  > +{
>>  > +    DeviceState *dev = opaque;
>>  > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>  > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>  > +
>>  > +    switch (event) {
>>  > +    case CHR_EVENT_OPENED:
>>  > +        if (vhost_user_blk_connect(dev) < 0) {
>>  > +            qemu_chr_fe_disconnect(&s->chardev);
>>  > +            return;
>>  > +        }
>>  > +        s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
>>  > +                                         vhost_user_blk_watch, dev);
>>  > +        break;
>>  > +    case CHR_EVENT_CLOSED:
>>  > +        vhost_user_blk_disconnect(dev);
>>  > +        if (s->watch) {
>>  > +            g_source_remove(s->watch);
>>  > +            s->watch = 0;
>>  > +        }
>>  > +        break;
>>  > +    }
>>  > +}
>>  > +
>>  >  static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>>  >  {
>>  >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>  >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>  >      VhostUserState *user;
>>  >      int i, ret;
>>  > +    Error *err = NULL;
>>  >
>>  >      if (!s->chardev.chr) {
>>  >          error_setg(errp, "vhost-user-blk: chardev is mandatory");
>>  > @@ -299,23 +429,21 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>>  >      }
>>  >
>>  >      s->shm = g_new0(struct vhost_shm, 1);
>>  > -
>>  > -    s->dev.nvqs = s->num_queues;
>>  > -    s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
>>  > -    s->dev.vq_index = 0;
>>  > -    s->dev.backend_features = 0;
>>  > -
>>  > -    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
>>  > -
>>  > -    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
>>  > -    if (ret < 0) {
>>  > -        error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
>>  > -                   strerror(-ret));
>>  > -        goto virtio_err;
>>  > +    s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
>>  > +    s->watch = 0;
>>  > +    s->should_start = false;
>>  > +    s->connected = false;
>>  > +
>>  > +    while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
>>  > +        error_report_err(err);
>>  > +        err = NULL;
>>  > +        sleep(1);
>>  >      }
>>
>>  After the reconnect loop we have some function calls to the vhost backend:
>>  * qemu_chr_fe_set_handlers implicitly calls vhost_dev_init
>>  * vhost_dev_get_config
>>  * vhost_dev_init_shm
>>
>>  If the vhost backend restarts during their execution, the initialization
>>  will fail. What do you think? Maybe we can call these functions inside
>>  the reconnect loop to fix it?
>
> qemu_chr_fe_wait_connected() should be called only when we are in the
> disconnected state. Otherwise, it will conflict with the reconnect
> thread.
>

The reconnect thread is started by the reconnect timer callback from the main
loop, but vhost_user_blk_device_realize() also runs in the main loop, so I
think we don't have a race in this case.

Regards,
Yury

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2018-12-18 14:25   ` Michael S. Tsirkin
@ 2018-12-18 14:47     ` Yongji Xie
  2018-12-18 14:57       ` Michael S. Tsirkin
  0 siblings, 1 reply; 30+ messages in thread
From: Yongji Xie @ 2018-12-18 14:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, Jason Wang, Yury Kotov, Coquelin, Maxime,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Tue, 18 Dec 2018 at 22:25, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Dec 18, 2018 at 05:59:57PM +0800, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> > and VHOST_USER_SET_SHM_FD to support providing shared
> > memory to backend.
> >
> > Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> > required size of shared memory from backend. Then, qemu
> > allocates memory and sends them back to backend through
> > VHOST_USER_SET_SHM_FD.
> >
> > Note that the shared memory should be used to record
> > inflight I/O by backend. Qemu will clear it when vm reset.
>
> An interesting design choice. Why not let the backend clear it
> on start?
>

The backend might restart when it has some inflight I/Os. In this case,
it should not clear the memory on start, right?

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2018-12-18 14:47     ` Yongji Xie
@ 2018-12-18 14:57       ` Michael S. Tsirkin
  2018-12-18 15:10         ` Yongji Xie
  0 siblings, 1 reply; 30+ messages in thread
From: Michael S. Tsirkin @ 2018-12-18 14:57 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Marc-André Lureau, Jason Wang, Yury Kotov, Coquelin, Maxime,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Tue, Dec 18, 2018 at 10:47:32PM +0800, Yongji Xie wrote:
> On Tue, 18 Dec 2018 at 22:25, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Dec 18, 2018 at 05:59:57PM +0800, elohimes@gmail.com wrote:
> > > From: Xie Yongji <xieyongji@baidu.com>
> > >
> > > This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> > > and VHOST_USER_SET_SHM_FD to support providing shared
> > > memory to backend.
> > >
> > > Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> > > required size of shared memory from backend. Then, qemu
> > > allocates memory and sends them back to backend through
> > > VHOST_USER_SET_SHM_FD.
> > >
> > > Note that the shared memory should be used to record
> > > inflight I/O by backend. Qemu will clear it when vm reset.
> >
> > An interesting design choice. Why not let the backend clear it
> > on start?
> >
> 
> The backend might restart when it has some inflight I/Os. In this case,
> it should not clear the memory on start, right?
> 
> Thanks,
> Yongji

I see. So this allows the backend to detect a non-initialized buffer
by checking e.g. that a version field is 0? Clever.

-- 
MST

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend
  2018-12-18 14:35       ` Yury Kotov
@ 2018-12-18 14:59         ` Yongji Xie
  2018-12-18 15:33           ` Yury Kotov
  0 siblings, 1 reply; 30+ messages in thread
From: Yongji Xie @ 2018-12-18 14:59 UTC (permalink / raw)
  To: Yury Kotov
  Cc: mst, marcandre.lureau, jasowang, maxime.coquelin, qemu-devel,
	zhangyu31, chaiwen, nixun, lilin24, Xie Yongji,
	Евгений
	Яковлев

On Tue, 18 Dec 2018 at 22:35, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
> 18.12.2018, 17:16, "Yongji Xie" <elohimes@gmail.com>:
> > On Tue, 18 Dec 2018 at 20:30, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
> >>  + wrfsh@
> >>
> >>  Hi,
> >>
> >>  18.12.2018, 13:01, "elohimes@gmail.com" <elohimes@gmail.com>:
> >>  > From: Xie Yongji <xieyongji@baidu.com>
> >>  >
> >>  > [...]
> >>  >
> >>  > +    while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> >>  > +        error_report_err(err);
> >>  > +        err = NULL;
> >>  > +        sleep(1);
> >>  >      }
> >>
> >>  After the reconnect loop we have some function calls to the vhost backend:
> >>  * qemu_chr_fe_set_handlers implicitly calls vhost_dev_init
> >>  * vhost_dev_get_config
> >>  * vhost_dev_init_shm
> >>
> >>  If the vhost backend restarts during their execution, the initialization
> >>  will fail. What do you think? Maybe we can call these functions inside
> >>  the reconnect loop to fix it?
> >
> > qemu_chr_fe_wait_connected() should be called only when we are in the
> > disconnected state. Otherwise, it will conflict with the reconnect
> > thread.
> >
>
> The reconnect thread is started by the reconnect timer callback from the main
> loop, but vhost_user_blk_device_realize() also runs in the main loop, so I
> think we don't have a race in this case.
>

The reconnect timer callback has an assert: assert(s->connected == 0).
And it's odd to call qemu_chr_fe_wait_connected() when the reconnect
thread is running.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 2/7] vhost-user: Support providing shared memory to backend
  2018-12-18 14:57       ` Michael S. Tsirkin
@ 2018-12-18 15:10         ` Yongji Xie
  0 siblings, 0 replies; 30+ messages in thread
From: Yongji Xie @ 2018-12-18 15:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, Jason Wang, Yury Kotov, Coquelin, Maxime,
	qemu-devel, zhangyu31, chaiwen, nixun, lilin24, Xie Yongji

On Tue, 18 Dec 2018 at 22:57, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Dec 18, 2018 at 10:47:32PM +0800, Yongji Xie wrote:
> > On Tue, 18 Dec 2018 at 22:25, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Dec 18, 2018 at 05:59:57PM +0800, elohimes@gmail.com wrote:
> > > > From: Xie Yongji <xieyongji@baidu.com>
> > > >
> > > > This patch introduces two new messages VHOST_USER_GET_SHM_SIZE
> > > > and VHOST_USER_SET_SHM_FD to support providing shared
> > > > memory to backend.
> > > >
> > > > Firstly, qemu uses VHOST_USER_GET_SHM_SIZE to get the
> > > > required size of shared memory from backend. Then, qemu
> > > > allocates memory and sends them back to backend through
> > > > VHOST_USER_SET_SHM_FD.
> > > >
> > > > Note that the shared memory should be used to record
> > > > inflight I/O by backend. Qemu will clear it when vm reset.
> > >
> > > An interesting design choice. Why not let the backend clear it
> > > on start?
> > >
> >
> > The backend might restart when it has some inflight I/Os. In this case,
> > it should not clear the memory on start, right?
> >
> > Thanks,
> > Yongji
>
> I see. So this allows the backend to detect a non-initialized buffer
> by checking e.g. that a version field is 0? Clever.
>

If the version is a variable in the buffer, yes, we can detect whether
the buffer is initialized or not by checking it.
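
For illustration, the check on the backend side could look roughly like
this (a rough sketch only; the region layout and field names here are
made up for the example and are not defined by this series):

    #include <stdint.h>

    /* Hypothetical layout of the region QEMU hands over via
     * VHOST_USER_SET_SHM_FD. QEMU zeroes it on vm reset, so an
     * all-zero region means "no recorded state". */
    struct inflight_region {
        uint16_t version;      /* 0 => region was cleared by QEMU */
        uint16_t num_inflight; /* requests a dead instance left pending */
    };

    static void backend_init_inflight(struct inflight_region *r)
    {
        if (r->version == 0) {
            /* First start or vm reset: nothing to recover. */
            r->version = 1;
            r->num_inflight = 0;
            return;
        }
        /* Non-zero version: a previous instance died with inflight
         * requests recorded here, so keep the contents and resubmit
         * them instead of clearing. */
    }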

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-18 12:24   ` Marc-André Lureau
  2018-12-18 13:33     ` Yongji Xie
@ 2018-12-18 15:25     ` Daniel P. Berrangé
  2018-12-18 16:02       ` Michael S. Tsirkin
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel P. Berrangé @ 2018-12-18 15:25 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: elohimes, zhangyu31, Michael S . Tsirkin, xieyongji, Jason Wang,
	qemu-devel, lilin24, Yury Kotov, Coquelin, Maxime, chaiwen,
	nixun

On Tue, Dec 18, 2018 at 04:24:26PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Dec 18, 2018 at 2:01 PM <elohimes@gmail.com> wrote:
> >
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > New option "disconnected" is added to init the chardev socket
> > in disconnected state. Then we can use qemu_chr_fe_wait_connected()
> > to connect when necessary. Now it would be used for unix domain
> > socket of vhost-user-blk device to support reconnect.
> 
> What difference does that make if you wait for connection in
> qemu_chr_fe_wait_connected(), or during chardev setup?
> 
> "disconnected" is misleading, would it be possible to reuse
> "wait/nowait" instead?

Currently we default to doing a blocking connect in foreground,
except if reconnect is non-zero, in which case we do a connect
async in the background. This "disconnected" proposal effectively
does a blocking connect, but delayed to later in startup.

IOW, this could already be achieved if "reconnect" were set to
non-zero. If the usage doesn't want reconnect though, I tend
to agree that we should use the existing wait/nowait options
to let it be controlled. I don't see that this "disconnected"
option gives a compelling benefit over using wait/nowait.
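
To make the comparison concrete (illustrative command lines only; the
client-socket "nowait" semantics shown here are what I'm suggesting,
not something that works today):

  # proposed in this series:
  -chardev socket,id=char0,path=/path/vhost.socket,disconnected,reconnect=1

  # reusing the existing flag instead:
  -chardev socket,id=char0,path=/path/vhost.socket,nowait,reconnect=1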


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend
  2018-12-18 14:59         ` Yongji Xie
@ 2018-12-18 15:33           ` Yury Kotov
  2018-12-19  8:42             ` Yongji Xie
  0 siblings, 1 reply; 30+ messages in thread
From: Yury Kotov @ 2018-12-18 15:33 UTC (permalink / raw)
  To: Yongji Xie
  Cc: mst, marcandre.lureau, jasowang, maxime.coquelin, qemu-devel,
	zhangyu31, chaiwen, nixun, lilin24, Xie Yongji,
	Евгений
	Яковлев



18.12.2018, 17:59, "Yongji Xie" <elohimes@gmail.com>:
> On Tue, 18 Dec 2018 at 22:35, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>>  18.12.2018, 17:16, "Yongji Xie" <elohimes@gmail.com>:
>>  > On Tue, 18 Dec 2018 at 20:30, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>>  >> + wrfsh@
>>  >>
>>  >> Hi,
>>  >>
>>  >> 18.12.2018, 13:01, "elohimes@gmail.com" <elohimes@gmail.com>:
>>  >> > From: Xie Yongji <xieyongji@baidu.com>
>>  >> >
>>  >> > [...]
>>  >> >
>>  >> > +    while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
>>  >> > +        error_report_err(err);
>>  >> > +        err = NULL;
>>  >> > +        sleep(1);
>>  >> >      }
>>  >>
>>  >> After the reconnect loop we have some function calls to the vhost backend:
>>  >> * qemu_chr_fe_set_handlers implicitly calls vhost_dev_init
>>  >> * vhost_dev_get_config
>>  >> * vhost_dev_init_shm
>>  >>
>>  >> If the vhost backend restarts during their execution, the initialization
>>  >> will fail. What do you think? Maybe we can call these functions inside
>>  >> the reconnect loop to fix it?
>>  >
>>  > qemu_chr_fe_wait_connected() should be called only when we are in the
>>  > disconnected state. Otherwise, it will conflict with the reconnect
>>  > thread.
>>  >
>>
>>  The reconnect thread is started by the reconnect timer callback from the main
>>  loop, but vhost_user_blk_device_realize() also runs in the main loop, so I
>>  think we don't have a race in this case.
>
> The reconnect timer callback has an assert: assert(s->connected == 0).
> And it's odd to call qemu_chr_fe_wait_connected() when the reconnect
> thread is running.
>

The reconnect timer callback is the function socket_reconnect_timeout(). This
function doesn't have any asserts. Do you mean qemu_chr_socket_restart_timer()?
It has an assert, yes. But it is called from tcp_chr_disconnect() in the same
thread.

E.g. vhost_user_get_config ->
     vhost_user_read ->
     qemu_chr_fe_read_all ->
     tcp_chr_sync_read ->
     tcp_chr_disconnect ->
     qemu_chr_socket_restart_timer

Maybe I misunderstand something, but I don't see any problems here...

I see that the reconnect timer callback is postponed until the main loop runs,
so we may safely call qemu_chr_fe_wait_connected().

Regards,
Yury

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-18 15:25     ` Daniel P. Berrangé
@ 2018-12-18 16:02       ` Michael S. Tsirkin
  2018-12-18 16:09         ` Daniel P. Berrangé
  0 siblings, 1 reply; 30+ messages in thread
From: Michael S. Tsirkin @ 2018-12-18 16:02 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Marc-André Lureau, elohimes, zhangyu31, xieyongji,
	Jason Wang, qemu-devel, lilin24, Yury Kotov, Coquelin, Maxime,
	chaiwen, nixun

On Tue, Dec 18, 2018 at 03:25:20PM +0000, Daniel P. Berrangé wrote:
> On Tue, Dec 18, 2018 at 04:24:26PM +0400, Marc-André Lureau wrote:
> > Hi
> > 
> > On Tue, Dec 18, 2018 at 2:01 PM <elohimes@gmail.com> wrote:
> > >
> > > From: Xie Yongji <xieyongji@baidu.com>
> > >
> > > New option "disconnected" is added to init the chardev socket
> > > in disconnected state. Then we can use qemu_chr_fe_wait_connected()
> > > to connect when necessary. Now it would be used for unix domain
> > > socket of vhost-user-blk device to support reconnect.
> > 
> > What difference does that make if you wait for connection in
> > qemu_chr_fe_wait_connected(), or during chardev setup?
> > 
> > "disconnected" is misleading, would it be possible to reuse
> > "wait/nowait" instead?
> 
> Currently we default to doing a blocking connect in foreground,
> except if reconnect is non-zero, in which case we do a connect
> async in the background. This "disconnected" proposal effectively
> does a blocking connect, but delayed to later in startup.
> 
> IOW, this could already be achieved if "reconnect" were set to
> non-zero. If the usage doesn't want reconnect though, I tend
> to agree that we should use the existing wait/nowait options
> to let it be controlled. I don't see that this "disconnected"
> option gives a compelling benefit over using wait/nowait.
> 
> 
> Regards,
> Daniel

So the tricky thing is that we cannot expose the device to the guest
until we get a connection and can query it for the first time. After
that is completed, we can reasonably support disconnected operation,
at least for net.

We could do hotplug at connect time, I guess ...


> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-18 16:02       ` Michael S. Tsirkin
@ 2018-12-18 16:09         ` Daniel P. Berrangé
  2018-12-19  9:01           ` Yongji Xie
  2018-12-19 15:55           ` Michael S. Tsirkin
  0 siblings, 2 replies; 30+ messages in thread
From: Daniel P. Berrangé @ 2018-12-18 16:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, elohimes, zhangyu31, xieyongji,
	Jason Wang, qemu-devel, lilin24, Yury Kotov, Coquelin, Maxime,
	chaiwen, nixun

On Tue, Dec 18, 2018 at 11:02:46AM -0500, Michael S. Tsirkin wrote:
> On Tue, Dec 18, 2018 at 03:25:20PM +0000, Daniel P. Berrangé wrote:
> > On Tue, Dec 18, 2018 at 04:24:26PM +0400, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > On Tue, Dec 18, 2018 at 2:01 PM <elohimes@gmail.com> wrote:
> > > >
> > > > From: Xie Yongji <xieyongji@baidu.com>
> > > >
> > > > New option "disconnected" is added to init the chardev socket
> > > > in disconnected state. Then we can use qemu_chr_fe_wait_connected()
> > > > to connect when necessary. Now it would be used for unix domain
> > > > socket of vhost-user-blk device to support reconnect.
> > > 
> > > What difference does that make if you wait for connection in
> > > qemu_chr_fe_wait_connected(), or during chardev setup?
> > > 
> > > "disconnected" is misleading, would it be possible to reuse
> > > "wait/nowait" instead?
> > 
> > Currently we default to doing a blocking connect in foreground,
> > except if reconnect is non-zero, in which case we do a connect
> > async in the background. This "disconnected" proposal effectively
> > does a blocking connect, but delayed to later in startup.
> > 
> > IOW, this could already be achieved if "reconnect" were set to
> > non-zero. If the usage doesn't want reconnect though, I tend
> > to agree that we should use the existing wait/nowait options
> > to let it be controlled. I don't see that this "disconnected"
> > option gives a compelling benefit over using wait/nowait.
> 
> So the tricky thing is that we cannot expose the device to the guest
> until we get a connection and can query it for the first time. After
> that is completed, we can reasonably support disconnected operation,
> at least for net.

The device code would still use qemu_chr_fe_wait_connected() in my proposal,
so it's no different from the situation with the proposed "disconnected"
flag.
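
Roughly (an untested sketch, mirroring the loop this series already adds
to vhost_user_blk_device_realize(); only the chardev flag would differ):

    Error *err = NULL;

    /* chardev created with "nowait", so realize starts disconnected
     * and blocks here until the backend appears */
    while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
        error_report_err(err);
        err = NULL;
        sleep(1);
    }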

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend
  2018-12-18 15:33           ` Yury Kotov
@ 2018-12-19  8:42             ` Yongji Xie
  0 siblings, 0 replies; 30+ messages in thread
From: Yongji Xie @ 2018-12-19  8:42 UTC (permalink / raw)
  To: Yury Kotov
  Cc: mst, marcandre.lureau, jasowang, maxime.coquelin, qemu-devel,
	zhangyu31, chaiwen, nixun, lilin24, Xie Yongji,
	Евгений
	Яковлев

On Tue, 18 Dec 2018 at 23:33, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
>
>
> 18.12.2018, 17:59, "Yongji Xie" <elohimes@gmail.com>:
> > On Tue, 18 Dec 2018 at 22:35, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
> >>  18.12.2018, 17:16, "Yongji Xie" <elohimes@gmail.com>:
> >>  > On Tue, 18 Dec 2018 at 20:30, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
> >>  >> + wrfsh@
> >>  >>
> >>  >> Hi,
> >>  >>
> >>  >> 18.12.2018, 13:01, "elohimes@gmail.com" <elohimes@gmail.com>:
> >>  >> > From: Xie Yongji <xieyongji@baidu.com>
> >>  >> >
> >>  >> > [...]
> >>  >> >
> >>  >> > +    while (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> >>  >> > +        error_report_err(err);
> >>  >> > +        err = NULL;
> >>  >> > +        sleep(1);
> >>  >> >      }
> >>  >>
> >>  >> After the reconnect loop we have some function calls to the vhost backend:
> >>  >> * qemu_chr_fe_set_handlers implicitly calls vhost_dev_init
> >>  >> * vhost_dev_get_config
> >>  >> * vhost_dev_init_shm
> >>  >>
> >>  >> If the vhost backend restarts during their execution, the initialization
> >>  >> will fail. What do you think? Maybe we can call these functions inside
> >>  >> the reconnect loop to fix it?
> >>  >
> >>  > qemu_chr_fe_wait_connected() should be called only when we are in the
> >>  > disconnected state. Otherwise, it will conflict with the reconnect
> >>  > thread.
> >>  >
> >>
> >>  The reconnect thread is started by the reconnect timer callback from the main
> >>  loop, but vhost_user_blk_device_realize() also runs in the main loop, so I
> >>  think we don't have a race in this case.
> >
> > The reconnect timer callback has an assert: assert(s->connected == 0).
> > And it's odd to call qemu_chr_fe_wait_connected() when the reconnect
> > thread is running.
> >
>
> The reconnect timer callback is the function socket_reconnect_timeout(). This
> function doesn't have any asserts. Do you mean qemu_chr_socket_restart_timer()?
> It has an assert, yes. But it is called from tcp_chr_disconnect() in the same
> thread.
>
> E.g. vhost_user_get_config ->
>      vhost_user_read ->
>      qemu_chr_fe_read_all ->
>      tcp_chr_sync_read ->
>      tcp_chr_disconnect ->
>      qemu_chr_socket_restart_timer
>
> Maybe I misunderstand something, but I don't see any problems here...
>
> I see that the reconnect timer callback is postponed until the main loop runs,
> so we may safely call qemu_chr_fe_wait_connected().
>

Oh, yes, you are right! I made a mistake here. The reconnect thread
would not run in this case.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-18 16:09         ` Daniel P. Berrangé
@ 2018-12-19  9:01           ` Yongji Xie
  2018-12-19 15:55           ` Michael S. Tsirkin
  1 sibling, 0 replies; 30+ messages in thread
From: Yongji Xie @ 2018-12-19  9:01 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Michael S. Tsirkin, Marc-André Lureau, zhangyu31,
	Xie Yongji, Jason Wang, qemu-devel, lilin24, Yury Kotov,
	Coquelin, Maxime, chaiwen, nixun

On Wed, 19 Dec 2018 at 00:09, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Dec 18, 2018 at 11:02:46AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Dec 18, 2018 at 03:25:20PM +0000, Daniel P. Berrangé wrote:
> > > On Tue, Dec 18, 2018 at 04:24:26PM +0400, Marc-André Lureau wrote:
> > > > Hi
> > > >
> > > > On Tue, Dec 18, 2018 at 2:01 PM <elohimes@gmail.com> wrote:
> > > > >
> > > > > From: Xie Yongji <xieyongji@baidu.com>
> > > > >
> > > > > New option "disconnected" is added to init the chardev socket
> > > > > in disconnected state. Then we can use qemu_chr_fe_wait_connected()
> > > > > to connect when necessary. Now it would be used for unix domain
> > > > > socket of vhost-user-blk device to support reconnect.
> > > >
> > > > What difference does that make if you wait for connection in
> > > > qemu_chr_fe_wait_connected(), or during chardev setup?
> > > >
> > > > "disconnected" is misleading, would it be possible to reuse
> > > > "wait/nowait" instead?
> > >
> > > Currently we default to doing a blocking connect in foreground,
> > > except if reconnect is non-zero, in which case we do a connect
> > > async in the background. This "disconnected" proposal effectively
> > > does a blocking connect, but delayed to later in startup.
> > >
> > > IOW, this could already be achieved if "reconnect" were set to
> > > non-zero. If the usage doesn't want reconnect though, I tend
> > > to agree that we should use the existing wait/nowait options
> > > to let it be controlled. I don't see that this "disconnected"
> > > option gives a compelling benefit over using wait/nowait.
> >
> > So the tricky thing is that we cannot expose the
> > device to the guest until we get a connection and can query
> > it for the first time. After that is completed,
> > we can reasonably support disconnected operation at least for net.
>
> The device code would still use qemu_chr_fe_wait_connected() in my proposal,
> so that it's no different to the situation with the proposed "disconnected"
> flag.
>

I considered reusing wait/nowait before. One problem is that we will
have a thread doing an async connect if the backend is not started
during chardev setup.
Then if we use qemu_chr_fe_wait_connected() manually to connect in the
device code, we may get a conflict in the main thread like:

qemu_chr_fe_wait_connected()
 tcp_chr_wait_connected()
  tcp_chr_connect()
   s->connected = 1;

    qemu_chr_socket_connected()
     check_report_connect_error()
      qemu_chr_socket_restart_timer()
       assert(s->connected == 0) /* qemu crash */

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-18 16:09         ` Daniel P. Berrangé
  2018-12-19  9:01           ` Yongji Xie
@ 2018-12-19 15:55           ` Michael S. Tsirkin
  2018-12-19 16:01             ` Daniel P. Berrangé
  2018-12-20  4:25             ` Yongji Xie
  1 sibling, 2 replies; 30+ messages in thread
From: Michael S. Tsirkin @ 2018-12-19 15:55 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Marc-André Lureau, elohimes, zhangyu31, xieyongji,
	Jason Wang, qemu-devel, lilin24, Yury Kotov, Coquelin, Maxime,
	chaiwen, nixun

On Tue, Dec 18, 2018 at 04:09:19PM +0000, Daniel P. Berrangé wrote:
> On Tue, Dec 18, 2018 at 11:02:46AM -0500, Michael S. Tsirkin wrote:
> > [...]
> 
> The device code would still use qemu_chr_fe_wait_connected() in my proposal,
> so that it's no different to the situation with the proposed "disconnected"
> flag.
> 
> Regards,
> Daniel

I guess so, but wouldn't it be confusing to users that one says
"nowait" and qemu still waits for a connection and does not
run the VM? That's different from how nowait behaves normally.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-19 15:55           ` Michael S. Tsirkin
@ 2018-12-19 16:01             ` Daniel P. Berrangé
  2018-12-19 16:50               ` Michael S. Tsirkin
  2018-12-20  4:25             ` Yongji Xie
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel P. Berrangé @ 2018-12-19 16:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, elohimes, zhangyu31, xieyongji,
	Jason Wang, qemu-devel, lilin24, Yury Kotov, Coquelin, Maxime,
	chaiwen, nixun

On Wed, Dec 19, 2018 at 10:55:09AM -0500, Michael S. Tsirkin wrote:
> On Tue, Dec 18, 2018 at 04:09:19PM +0000, Daniel P. Berrangé wrote:
> > [...]
> 
> I guess so, but wouldn't it be confusing to users that one says
> "nowait" and qemu still waits for a connection and does not
> run the VM? That's different from how nowait behaves normally.

Well "nowait" is only referring to the chardev behaviour which is
still an accurate description.

The fact that the network then waits during startup is a seperate
matter. We don't define chardev semantics in terms of possible
users as there are many distinct users of chardevs in QEMU.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-19 16:01             ` Daniel P. Berrangé
@ 2018-12-19 16:50               ` Michael S. Tsirkin
  2018-12-19 17:09                 ` Daniel P. Berrangé
  0 siblings, 1 reply; 30+ messages in thread
From: Michael S. Tsirkin @ 2018-12-19 16:50 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Marc-André Lureau, elohimes, zhangyu31, xieyongji,
	Jason Wang, qemu-devel, lilin24, Yury Kotov, Coquelin, Maxime,
	chaiwen, nixun

On Wed, Dec 19, 2018 at 04:01:02PM +0000, Daniel P. Berrangé wrote:
> On Wed, Dec 19, 2018 at 10:55:09AM -0500, Michael S. Tsirkin wrote:
> > [...]
> > 
> > I guess so, but wouldn't it be confusing to users that one says
> > "nowait" and qemu still waits for a connection and does not
> > run the VM? That's different from how nowait behaves normally.
> 
> Well, "nowait" only refers to the chardev behaviour, which is
> still an accurate description.

The man page seems to say

           nowait specifies that QEMU should not block waiting for a client to connect to a listening socket.

If we do wait for a client to connect, then how is it accurate?

> The fact that the network then waits during startup is a separate
> matter. We don't define chardev semantics in terms of possible
> users, as there are many distinct users of chardevs in QEMU.
> 
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-19 16:50               ` Michael S. Tsirkin
@ 2018-12-19 17:09                 ` Daniel P. Berrangé
  2018-12-19 18:18                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel P. Berrangé @ 2018-12-19 17:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, elohimes, zhangyu31, xieyongji,
	Jason Wang, qemu-devel, lilin24, Yury Kotov, Coquelin, Maxime,
	chaiwen, nixun

On Wed, Dec 19, 2018 at 11:50:38AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 19, 2018 at 04:01:02PM +0000, Daniel P. Berrangé wrote:
> > [...]
> > 
> > Well, "nowait" only refers to the chardev behaviour, which is
> > still an accurate description.
> 
> The man page seems to say
> 
>            nowait specifies that QEMU should not block waiting
>            for a client to connect to a listening socket.
> 
> If we do wait for a client to connect, then how is it accurate?

Well, the man page needs to be clarified to say that this applies to the
initialization of the chardev. What a downstream object/device
does with the chardev is outside the scope of chardev config
options.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-19 17:09                 ` Daniel P. Berrangé
@ 2018-12-19 18:18                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 30+ messages in thread
From: Michael S. Tsirkin @ 2018-12-19 18:18 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Marc-André Lureau, elohimes, zhangyu31, xieyongji,
	Jason Wang, qemu-devel, lilin24, Yury Kotov, Coquelin, Maxime,
	chaiwen, nixun

On Wed, Dec 19, 2018 at 05:09:09PM +0000, Daniel P. Berrangé wrote:
> On Wed, Dec 19, 2018 at 11:50:38AM -0500, Michael S. Tsirkin wrote:
> > [...]
> 
> Well, the man page needs to be clarified to say that this applies to the
> initialization of the chardev. What a downstream object/device
> does with the chardev is outside the scope of chardev config
> options.
> 
> Regards,
> Daniel

Practically, qemu blocks. I suspect the distinction of which object
within qemu does the blocking will be lost on users.

And it's not exactly about waiting anyway.
It's about whether we should stop the VM on disconnect,
which I feel is different.

So I think we need a new flag along the lines of
	disconnect=ignore/exit/stop

with proper documentation.
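
For illustration only (hypothetical syntax; no such flag exists yet),
that could look like:

  -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,disconnect=stop \
  -device vhost-user-blk-pci,chardev=char0

where "stop" would pause the VM while the backend is away, "ignore"
would keep it running, and "exit" would terminate qemu.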


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket
  2018-12-19 15:55           ` Michael S. Tsirkin
  2018-12-19 16:01             ` Daniel P. Berrangé
@ 2018-12-20  4:25             ` Yongji Xie
  1 sibling, 0 replies; 30+ messages in thread
From: Yongji Xie @ 2018-12-20  4:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Daniel P. Berrangé,
	Marc-André Lureau, zhangyu31, Xie Yongji, Jason Wang,
	qemu-devel, lilin24, Yury Kotov, Coquelin, Maxime, chaiwen,
	nixun

On Wed, 19 Dec 2018 at 23:55, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Dec 18, 2018 at 04:09:19PM +0000, Daniel P. Berrangé wrote:
> > [...]
>
> I guess so, but wouldn't it be confusing to users that one says
> "nowait" and qemu still waits for a connection and does not
> run the VM? That's different from how nowait behaves normally.
>

With this patch:

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 2464d7a..ffe3a60 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -97,7 +97,10 @@ static void qemu_chr_socket_restart_timer(Chardev *chr)
     SocketChardev *s = SOCKET_CHARDEV(chr);
     char *name;

-    assert(s->connected == 0);
+    if (s->connected) {
+        return;
+    }
+
     name = g_strdup_printf("chardev-socket-reconnect-%s", chr->label);
     s->reconnect_timer = qemu_chr_timeout_add_ms(chr,
                                                  s->reconnect_time * 1000,

We can use qemu_chr_fe_wait_connected() without adding a redundant
"wait" or "disconnected" option.
The disadvantage is that the backend may get two connections from qemu
during initialization. But I think the backend should support this case.
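
To make this concrete, the device side could then simply do the
following during realize (a rough sketch; the error handling is
illustrative and not part of the patch above):

    Error *err = NULL;

    /* the chardev connects asynchronously in the background, so it
     * may still be disconnected at realize time; block here until
     * the backend shows up */
    if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
        error_propagate(errp, err);
        return;
    }
    /* now it is safe to query the backend's config and expose
     * the device to the guest */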

Thanks,
Yongji

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2018-12-20  4:26 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-18  9:59 [Qemu-devel] [PATCH v2 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting elohimes
2018-12-18  9:59 ` [Qemu-devel] [PATCH v2 for-4.0 1/7] chardev: Add disconnected option for chardev socket elohimes
2018-12-18 12:24   ` Marc-André Lureau
2018-12-18 13:33     ` Yongji Xie
2018-12-18 15:25     ` Daniel P. Berrangé
2018-12-18 16:02       ` Michael S. Tsirkin
2018-12-18 16:09         ` Daniel P. Berrangé
2018-12-19  9:01           ` Yongji Xie
2018-12-19 15:55           ` Michael S. Tsirkin
2018-12-19 16:01             ` Daniel P. Berrangé
2018-12-19 16:50               ` Michael S. Tsirkin
2018-12-19 17:09                 ` Daniel P. Berrangé
2018-12-19 18:18                   ` Michael S. Tsirkin
2018-12-20  4:25             ` Yongji Xie
2018-12-18  9:59 ` [Qemu-devel] [PATCH v2 for-4.0 2/7] vhost-user: Support providing shared memory to backend elohimes
2018-12-18 14:25   ` Michael S. Tsirkin
2018-12-18 14:47     ` Yongji Xie
2018-12-18 14:57       ` Michael S. Tsirkin
2018-12-18 15:10         ` Yongji Xie
2018-12-18  9:59 ` [Qemu-devel] [PATCH v2 for-4.0 3/7] libvhost-user: Introduce vu_queue_map_desc() elohimes
2018-12-18  9:59 ` [Qemu-devel] [PATCH v2 for-4.0 4/7] libvhost-user: Support recording inflight I/O in shared memory elohimes
2018-12-18 10:00 ` [Qemu-devel] [PATCH v2 for-4.0 5/7] vhost-user-blk: Add support to provide shared memory to backend elohimes
2018-12-18 10:00 ` [Qemu-devel] [PATCH v2 for-4.0 6/7] vhost-user-blk: Add support to reconnect backend elohimes
2018-12-18 12:30   ` Yury Kotov
2018-12-18 14:16     ` Yongji Xie
2018-12-18 14:35       ` Yury Kotov
2018-12-18 14:59         ` Yongji Xie
2018-12-18 15:33           ` Yury Kotov
2018-12-19  8:42             ` Yongji Xie
2018-12-18 10:00 ` [Qemu-devel] [PATCH v2 for-4.0 7/7] contrib/vhost-user-blk: enable inflight I/O recording elohimes
