* [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
@ 2015-09-08 23:09 marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 01/14] vhost-user: Add ability to know vhost-user backend disconnection marcandre.lureau
                   ` (14 more replies)
  0 siblings, 15 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

In a previous series, "Add feature to start QEMU without vhost-user
backend", Tetsuya Mukawa proposed to allow the vhost-user backend to
disconnect and reconnect. However, Michael Tsirkin pointed out that
this can't be done without extra care, because the guest and the
hypervisor don't know the state of the slave's ring manipulation;
there might be pending replies, for example, that could be lost. He
suggested resetting the guest queues instead, but that requires
kernel changes, and it may have to clear the rings and lose queued
packets.

The following series starts from the idea that the slave can instead
request a "managed" shutdown and later recover (I guess the use case
for this is to allow, for example, updating static dispatching/filter
rules, etc.).

In order to do this, the slave must be in a good state; that is, it
should flush all pending buffers, so that resuming after
VHOST_SET_VRING_BASE is enough to pick up where it left off. The
guest is made aware of the virtio-net disconnection thanks to the
VIRTIO_NET_S_LINK_UP status, so communication can be stopped.

Unfortunately, the vhost-user protocol isn't bidirectional, so a new
optional communication channel is added for the slave to make
requests to the master, such as the new shutdown request.
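
To make the flow concrete, here is a minimal sketch of the slave side
of such a managed shutdown; it assumes the VhostUserMsg layout used in
this series, and flush_pending_buffers()/save_vring_state() are
hypothetical stand-ins for backend-specific code:

    /* Slave-side sketch: request a managed shutdown over the channel
     * received earlier via VHOST_USER_SET_SLAVE_FD. */
    static int slave_managed_shutdown(int slave_fd)
    {
        VhostUserMsg msg = {
            .request = VHOST_USER_SLAVE_SHUTDOWN,
            .flags = VHOST_USER_VERSION,
        };

        /* get the rings into a resumable state first */
        flush_pending_buffers();

        if (write(slave_fd, &msg, VHOST_USER_HDR_SIZE) < 0) {
            return -1;
        }
        /* the master may issue other requests (e.g. GET_VRING_BASE on
         * the main channel) before replying here */
        if (read(slave_fd, &msg, VHOST_USER_HDR_SIZE + sizeof(msg.u64)) < 0) {
            return -1;
        }
        if (msg.u64 != 0) {
            return -1; /* the master cancelled the shutdown request */
        }

        /* resume later from the base given by VHOST_USER_SET_VRING_BASE */
        save_vring_state();
        close(slave_fd);
        return 0;
    }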

I have done some testing with a modified vapp and Linux 4.2, and it
seems to work just fine. But more intensive testing and review are
required, as I am not sure this approach can be made solid enough.
Before going further, I would welcome any comments or testing
suggestions!

The series is based on top of the pending vhost-user migration
series, but for easier testing you may just use the following git
repo: https://github.com/elmarco/qemu, branch vhost-user-reconnect.

Marc-André Lureau (12):
  vhost-user: remove useless is_server field
  qemu-char: avoid potential double-free
  qemu-char: remove all msgfds on disconnect
  qemu-char: make tcp_chr_disconnect() reentrant-safe
  vhost-net: keep VIRTIO_NET_F_STATUS for vhost-user
  virtio-net: enable tx notification if up and vhost started
  vhost: add vhost_dev stop callback
  vhost-user: add vhost_user to hold the chr
  qemu-char: add qemu_chr_free()
  vhost-user: add slave-fd support
  vhost-user: add shutdown support
  test: start vhost-user reconnect test

Tetsuya Mukawa (2):
  vhost-user: Add ability to know vhost-user backend disconnection
  qemu-char: Add qemu_chr_disconnect to close a fd accepted by listen fd

 docs/specs/vhost-user.txt |  38 +++++++++++
 hw/net/vhost_net.c        |  14 +++-
 hw/net/virtio-net.c       |   4 ++
 hw/virtio/vhost-user.c    | 121 +++++++++++++++++++++++++++++++++++++++++++--
 include/hw/virtio/vhost.h |   4 ++
 include/sysemu/char.h     |  17 ++++-
 net/vhost-user.c          |  20 +++++-
 qemu-char.c               |  23 ++++++-
 tests/Makefile            |   2 +-
 tests/vhost-user-test.c   | 168 ++++++++++++++++++++++++++++++++++++++++++----
 10 files changed, 385 insertions(+), 26 deletions(-)

-- 
2.4.3


* [Qemu-devel] [PATCH RFC 01/14] vhost-user: Add ability to know vhost-user backend disconnection
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
@ 2015-09-08 23:09 ` marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 02/14] vhost-user: remove useless is_server field marcandre.lureau
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, mst

From: Tetsuya Mukawa <mukawa@igel.co.jp>

Currently, QEMU cannot detect vhost-user backend disconnection; this
patch adds the ability to detect it. A watch is added for the G_IO_HUP
event, and when G_IO_HUP is detected, the disconnected socket is read
to trigger a CHR_EVENT_CLOSED.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 net/vhost-user.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 5657992..266b54d 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -20,6 +20,7 @@ typedef struct VhostUserState {
     NetClientState nc;
     CharDriverState *chr;
     VHostNetState *vhost_net;
+    int watch;
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -120,6 +121,20 @@ static void net_vhost_link_down(VhostUserState *s, bool link_down)
     }
 }
 
+static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
+                                           void *opaque)
+{
+    VhostUserState *s = opaque;
+    uint8_t buf[1];
+
+    /* We don't actually want to read anything, but CHR_EVENT_CLOSED will be
+     * raised as a side-effect of the read.
+     */
+    qemu_chr_fe_read_all(s->chr, buf, sizeof(buf));
+
+    return FALSE;
+}
+
 static void net_vhost_user_event(void *opaque, int event)
 {
     VhostUserState *s = opaque;
@@ -128,12 +143,15 @@ static void net_vhost_user_event(void *opaque, int event)
 
     switch (event) {
     case CHR_EVENT_OPENED:
+        s->watch = qemu_chr_fe_add_watch(s->chr, G_IO_HUP, net_vhost_user_watch, s);
         vhost_user_start(s);
         net_vhost_link_down(s, false);
         break;
     case CHR_EVENT_CLOSED:
         net_vhost_link_down(s, true);
         vhost_user_stop(s);
+        g_source_remove(s->watch);
+        s->watch = 0;
         break;
     }
 }
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 02/14] vhost-user: remove useless is_server field
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 01/14] vhost-user: Add ability to know vhost-user backend disconnection marcandre.lureau
@ 2015-09-08 23:09 ` marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 03/14] qemu-char: avoid potential double-free marcandre.lureau
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 net/vhost-user.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 266b54d..5d24129 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -26,7 +26,6 @@ typedef struct VhostUserState {
 typedef struct VhostUserChardevProps {
     bool is_socket;
     bool is_unix;
-    bool is_server;
 } VhostUserChardevProps;
 
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
@@ -187,7 +186,6 @@ static int net_vhost_chardev_opts(void *opaque,
     } else if (strcmp(name, "path") == 0) {
         props->is_unix = true;
     } else if (strcmp(name, "server") == 0) {
-        props->is_server = true;
     } else {
         error_setg(errp,
                    "vhost-user does not support a chardev with option %s=%s",
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 03/14] qemu-char: avoid potential double-free
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 01/14] vhost-user: Add ability to know vhost-user backend disconnection marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 02/14] vhost-user: remove useless is_server field marcandre.lureau
@ 2015-09-08 23:09 ` marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 04/14] qemu-char: remove all msgfds on disconnect marcandre.lureau
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

If tcp_set_msgfds() is called several times with NULL fds, this
could lead to a double free.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 qemu-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/qemu-char.c b/qemu-char.c
index d956f8d..bc37628 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2713,6 +2713,7 @@ static int tcp_set_msgfds(CharDriverState *chr, int *fds, int num)
     /* clear old pending fd array */
     if (s->write_msgfds) {
         g_free(s->write_msgfds);
+        s->write_msgfds = NULL;
     }
 
     if (num) {
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 04/14] qemu-char: remove all msgfds on disconnect
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (2 preceding siblings ...)
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 03/14] qemu-char: avoid potential double-free marcandre.lureau
@ 2015-09-08 23:09 ` marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 05/14] qemu-char: make tcp_chr_disconnect() reentrant-safe marcandre.lureau
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

A disconnect should reset the msgfds context.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 qemu-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/qemu-char.c b/qemu-char.c
index bc37628..d34bfd1 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2846,6 +2846,7 @@ static void tcp_chr_disconnect(CharDriverState *chr)
     SocketAddress_to_str(chr->filename, CHR_MAX_FILENAME_SIZE,
                          "disconnected:", s->addr, s->is_listen, s->is_telnet);
     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
+    tcp_set_msgfds(chr, NULL, 0);
     if (s->reconnect_time) {
         qemu_chr_socket_restart_timer(chr);
     }
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 05/14] qemu-char: make tcp_chr_disconnect() reentrant-safe
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (3 preceding siblings ...)
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 04/14] qemu-char: remove all msgfds on disconnect marcandre.lureau
@ 2015-09-08 23:09 ` marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 06/14] vhost-net: keep VIRTIO_NET_F_STATUS for vhost-user marcandre.lureau
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

During CHR_EVENT_CLOSED, the function could be reentered; make this
case safe.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 qemu-char.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/qemu-char.c b/qemu-char.c
index d34bfd1..c37a9f9 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2833,6 +2833,10 @@ static void tcp_chr_disconnect(CharDriverState *chr)
 {
     TCPCharDriver *s = chr->opaque;
 
+    if (!s->connected) {
+        return;
+    }
+
     s->connected = 0;
     if (s->listen_chan) {
         s->listen_tag = g_io_add_watch(s->listen_chan, G_IO_IN,
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 06/14] vhost-net: keep VIRTIO_NET_F_STATUS for vhost-user
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (4 preceding siblings ...)
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 05/14] qemu-char: make tcp_chr_disconnect() reentrant-safe marcandre.lureau
@ 2015-09-08 23:09 ` marcandre.lureau
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 07/14] virtio-net: enable tx notification if up and vhost started marcandre.lureau
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Even if the slave does not declare VIRTIO_NET_F_STATUS, we can keep
this feature enabled, as it is driven by QEMU.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 9850520..ea15220 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -77,7 +77,6 @@ static const int user_feature_bits[] = {
     VIRTIO_NET_F_HOST_ECN,
     VIRTIO_NET_F_HOST_UFO,
     VIRTIO_NET_F_MRG_RXBUF,
-    VIRTIO_NET_F_STATUS,
     VIRTIO_NET_F_CTRL_VQ,
     VIRTIO_NET_F_CTRL_RX,
     VIRTIO_NET_F_CTRL_VLAN,
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 07/14] virtio-net: enable tx notification if up and vhost started
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (5 preceding siblings ...)
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 06/14] vhost-net: keep VIRTIO_NET_F_STATUS for vhost-user marcandre.lureau
@ 2015-09-08 23:09 ` marcandre.lureau
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 08/14] vhost: add vhost_dev stop callback marcandre.lureau
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

When vhost is restarted, make sure the tx notification is on.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/virtio-net.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 8d28e45..d494c45 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -195,6 +195,10 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
             } else {
                 qemu_bh_cancel(q->tx_bh);
             }
+
+            if (n->vhost_started) {
+                virtio_queue_set_notification(q->tx_vq, 1);
+            }
         }
     }
 }
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 08/14] vhost: add vhost_dev stop callback
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (6 preceding siblings ...)
  2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 07/14] virtio-net: enable tx notification if up and vhost started marcandre.lureau
@ 2015-09-08 23:10 ` marcandre.lureau
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 09/14] vhost-user: add vhost_user to hold the chr marcandre.lureau
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

The vhost backend may want to stop the device, for example if it
wants to restart itself (this translates to a link down for
vhost-net).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c        | 13 +++++++++++++
 include/hw/virtio/vhost.h |  4 ++++
 2 files changed, 17 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ea15220..f977e2d 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -134,6 +134,18 @@ static int vhost_net_get_fd(NetClientState *backend)
     }
 }
 
+static void vhost_net_backend_stop(struct vhost_dev *dev)
+{
+    struct vhost_net *net = container_of(dev, struct vhost_net, dev);
+    NetClientState *nc = net->nc;
+    NetClientState *peer = nc->peer;
+
+    peer->link_down = 1;
+    if (peer->info->link_status_changed) {
+        peer->info->link_status_changed(peer);
+    }
+}
+
 struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
     int r;
@@ -163,6 +175,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
 
     net->dev.nvqs = 2;
     net->dev.vqs = net->vqs;
+    net->dev.stop = vhost_net_backend_stop;
 
     r = vhost_dev_init(&net->dev, options->opaque,
                        options->backend_type);
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index ab1dcac..48efd87 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -35,6 +35,8 @@ struct vhost_log {
     vhost_log_chunk_t *log;
 };
 
+typedef void (*vhost_stop)(struct vhost_dev *dev);
+
 struct vhost_memory;
 struct vhost_dev {
     MemoryListener memory_listener;
@@ -59,6 +61,8 @@ struct vhost_dev {
     const VhostOps *vhost_ops;
     void *opaque;
     struct vhost_log *log;
+    /* backend request to stop */
+    vhost_stop stop;
 };
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 09/14] vhost-user: add vhost_user to hold the chr
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (7 preceding siblings ...)
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 08/14] vhost: add vhost_dev stop callback marcandre.lureau
@ 2015-09-08 23:10 ` marcandre.lureau
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 10/14] qemu-char: add qemu_chr_free() marcandre.lureau
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Subsequent patches will add more fields to the structure.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost-user.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b2f46a9..2875b69 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -91,6 +91,10 @@ static VhostUserMsg m __attribute__ ((unused));
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION    (0x1)
 
+struct vhost_user {
+    CharDriverState *chr;
+};
+
 static bool ioeventfd_enabled(void)
 {
     return kvm_enabled() && kvm_eventfds_enabled();
@@ -134,7 +138,8 @@ static VhostUserRequest vhost_user_request_translate(unsigned long int request)
 
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
-    CharDriverState *chr = dev->opaque;
+    struct vhost_user *u = dev->opaque;
+    CharDriverState *chr = u->chr;
     uint8_t *p = (uint8_t *) msg;
     int r, size = VHOST_USER_HDR_SIZE;
 
@@ -181,7 +186,8 @@ fail:
 static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
                             int *fds, int fd_num)
 {
-    CharDriverState *chr = dev->opaque;
+    struct vhost_user *u = dev->opaque;
+    CharDriverState *chr = u->chr;
     int size = VHOST_USER_HDR_SIZE + msg->size;
 
     if (fd_num) {
@@ -358,11 +364,14 @@ end:
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
     VhostUserMsg msg = { 0 };
+    struct vhost_user *u;
     int err;
 
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
-    dev->opaque = opaque;
+    u = g_new0(struct vhost_user, 1);
+    u->chr = opaque;
+    dev->opaque = u;
 
     msg.request = VHOST_USER_GET_FEATURES;
     msg.flags = VHOST_USER_VERSION;
@@ -413,8 +422,12 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 
 static int vhost_user_cleanup(struct vhost_dev *dev)
 {
+    struct vhost_user *u;
+
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
+    u = dev->opaque;
+    g_free(u);
     dev->opaque = 0;
 
     return 0;
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 10/14] qemu-char: add qemu_chr_free()
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (8 preceding siblings ...)
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 09/14] vhost-user: add vhost_user to hold the chr marcandre.lureau
@ 2015-09-08 23:10 ` marcandre.lureau
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 11/14] vhost-user: add slave-fd support marcandre.lureau
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

If a chardev is allowed to be created outside of QMP, then it must
also be possible to free it. This is useful for ivshmem, which
creates chardevs anonymously and must be able to free them.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/sysemu/char.h | 10 +++++++++-
 qemu-char.c           |  9 +++++++--
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index 832b7fe..5fd0a09 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -128,11 +128,19 @@ CharDriverState *qemu_chr_new(const char *label, const char *filename,
 /**
  * @qemu_chr_delete:
  *
- * Destroy a character backend.
+ * Destroy a character backend and remove it from the list of
+ * identified character backends.
  */
 void qemu_chr_delete(CharDriverState *chr);
 
 /**
+ * @qemu_chr_free:
+ *
+ * Destroy a character backend.
+ */
+void qemu_chr_free(CharDriverState *chr);
+
+/**
  * @qemu_chr_fe_set_echo:
  *
  * Ask the backend to override its normal echo setting.  This only really
diff --git a/qemu-char.c b/qemu-char.c
index c37a9f9..a6b9bd2 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -3849,9 +3849,8 @@ void qemu_chr_fe_release(CharDriverState *s)
     s->avail_connections++;
 }
 
-void qemu_chr_delete(CharDriverState *chr)
+void qemu_chr_free(CharDriverState *chr)
 {
-    QTAILQ_REMOVE(&chardevs, chr, next);
     if (chr->chr_close) {
         chr->chr_close(chr);
     }
@@ -3861,6 +3860,12 @@ void qemu_chr_delete(CharDriverState *chr)
     g_free(chr);
 }
 
+void qemu_chr_delete(CharDriverState *chr)
+{
+    QTAILQ_REMOVE(&chardevs, chr, next);
+    qemu_chr_free(chr);
+}
+
 ChardevInfoList *qmp_query_chardev(Error **errp)
 {
     ChardevInfoList *chr_list = NULL;
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 11/14] vhost-user: add slave-fd support
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (9 preceding siblings ...)
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 10/14] qemu-char: add qemu_chr_free() marcandre.lureau
@ 2015-09-08 23:10 ` marcandre.lureau
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 12/14] vhost-user: add shutdown support marcandre.lureau
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Learn to give a socket to the slave to let it make requests to the
master.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
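
As a sketch of the backend side (not included in this series),
receiving the fd sent with VHOST_USER_SET_SLAVE_FD boils down to
standard SCM_RIGHTS handling; recv_slave_fd() below is a hypothetical
helper:

    #include <string.h>
    #include <sys/socket.h>

    /* Receive the VhostUserMsg header into msg_buf and return the fd
     * carried as SCM_RIGHTS ancillary data, or -1 on error. */
    static int recv_slave_fd(int master_fd, void *msg_buf, size_t len)
    {
        char control[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = msg_buf, .iov_len = len };
        struct msghdr mh = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = control, .msg_controllen = sizeof(control),
        };
        struct cmsghdr *cmsg;
        int fd = -1;

        if (recvmsg(master_fd, &mh, 0) <= 0) {
            return -1;
        }
        for (cmsg = CMSG_FIRSTHDR(&mh); cmsg; cmsg = CMSG_NXTHDR(&mh, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET &&
                cmsg->cmsg_type == SCM_RIGHTS) {
                memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
            }
        }
        return fd;
    }
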
 docs/specs/vhost-user.txt | 23 +++++++++++++++
 hw/virtio/vhost-user.c    | 71 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 1bc6adb..5e00bd3 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -136,10 +136,23 @@ As older slaves don't support negotiating protocol features,
 a feature bit was dedicated for this purpose:
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 
+Slave communication
+-------------------
+
+Since the vhost-user protocol does not let the slave make requests
+back to the master, an optional communication channel is provided if
+the slave declares the VHOST_USER_PROTOCOL_F_SLAVE_REQ feature.
+
+The fd is provided via VHOST_USER_SET_SLAVE_FD ancillary data.
+
+A slave may then send VHOST_USER_SLAVE_* messages to the master by
+using this fd.
+
 Protocol features
 -----------------
 
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 0
+#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 1
 
 Message types
 -------------
@@ -308,6 +321,16 @@ Message types
       invalid FD flag. This flag is set when there is no file descriptor
       in the ancillary data.
 
+ * VHOST_USER_SET_SLAVE_FD
+      Id: 17
+      Equivalent ioctl: N/A
+      Master payload: N/A
+
+      Set the file descriptor for the slave to make VHOST_USER_SLAVE_*
+      requests to the master. It is passed in the ancillary data.
+      This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_REQ
+      feature is available.
+
 Migration
 ---------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 2875b69..49f566c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -27,8 +27,9 @@
 
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 
-#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x1ULL
+#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x3ULL
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 0
+#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 1
 
 typedef enum VhostUserRequest {
     VHOST_USER_NONE = 0,
@@ -48,9 +49,15 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_VRING_ERR = 14,
     VHOST_USER_GET_PROTOCOL_FEATURES = 15,
     VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+    VHOST_USER_SET_SLAVE_FD = 17,
     VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+    VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_MAX
+}  VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
     uint64_t guest_phys_addr;
     uint64_t memory_size;
@@ -93,6 +100,7 @@ static VhostUserMsg m __attribute__ ((unused));
 
 struct vhost_user {
     CharDriverState *chr;
+    CharDriverState *slave_chr;
 };
 
 static bool ioeventfd_enabled(void)
@@ -277,6 +285,7 @@ static int vhost_user_call(struct vhost_dev *dev,
 
         break;
 
+    case VHOST_USER_SET_SLAVE_FD:
     case VHOST_USER_SET_LOG_FD:
         fds[fd_num++] = *((int *) arg);
         break;
@@ -361,6 +370,43 @@ end:
     return ret;
 }
 
+static int slave_can_receive(void *opaque)
+{
+    return VHOST_USER_HDR_SIZE;
+}
+
+static void slave_receive(void *opaque, const uint8_t *buf, int size)
+{
+    struct vhost_dev *dev = opaque;
+    VhostUserMsg *msg = (VhostUserMsg *)buf;
+
+    if (size != VHOST_USER_HDR_SIZE) {
+        error_report("Failed to read from slave.");
+        return;
+    }
+
+    switch (msg->request) {
+    default:
+        error_report("Received unexpected msg type.");
+    }
+}
+
+static void slave_event(void *opaque, int event)
+{
+    struct vhost_dev *dev = opaque;
+    struct vhost_user *u = dev->opaque;
+    CharDriverState *slave_chr = u->slave_chr;
+
+    switch (event) {
+    case CHR_EVENT_CLOSED:
+        if (slave_chr) {
+            u->slave_chr = NULL;
+            qemu_chr_free(slave_chr);
+        }
+        break;
+    }
+}
+
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
     VhostUserMsg msg = { 0 };
@@ -417,16 +463,39 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
         }
     }
 
+    if (__virtio_has_feature(dev->protocol_features,
+                             VHOST_USER_PROTOCOL_F_SLAVE_REQ)) {
+        int sv[2];
+
+        if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+            error_report("socketpair() failed");
+            return -1;
+        }
+
+        vhost_user_call(dev, VHOST_USER_SET_SLAVE_FD, &sv[1]);
+
+        u->slave_chr = qemu_chr_open_eventfd(sv[0]);
+        qemu_chr_add_handlers(u->slave_chr, slave_can_receive, slave_receive,
+                              slave_event, dev);
+    }
+
+
     return 0;
 }
 
 static int vhost_user_cleanup(struct vhost_dev *dev)
 {
     struct vhost_user *u;
+    CharDriverState *slave_chr;
 
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
     u = dev->opaque;
+    if (u->slave_chr) {
+        slave_chr = u->slave_chr;
+        u->slave_chr = NULL;
+        qemu_chr_free(slave_chr);
+    }
     g_free(u);
     dev->opaque = 0;
 
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 12/14] vhost-user: add shutdown support
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (10 preceding siblings ...)
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 11/14] vhost-user: add slave-fd support marcandre.lureau
@ 2015-09-08 23:10 ` marcandre.lureau
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 13/14] qemu-char: Add qemu_chr_disconnect to close a fd accepted by listen fd marcandre.lureau
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
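
To illustrate the exchange this patch implements (assuming a
virtio-net device with two rings; GET_VRING_BASE travels on the
regular master channel, the shutdown request and its reply on the
slave channel):

    slave  -> master: VHOST_USER_SLAVE_SHUTDOWN           (slave channel)
    master:           dev->stop() -> link down, vhost stopped
    master -> slave:  VHOST_USER_GET_VRING_BASE, ring 0   (master channel)
    master -> slave:  VHOST_USER_GET_VRING_BASE, ring 1   (master channel)
    master -> slave:  VHOST_USER_SLAVE_SHUTDOWN reply, u64 = 0
    slave:            closes all connections and quits
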
 docs/specs/vhost-user.txt | 15 +++++++++++++++
 hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 5e00bd3..854493e 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -331,6 +331,21 @@ Message types
       This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_REQ
       feature is available.
 
+Slave message types
+-------------------
+
+ * VHOST_USER_SLAVE_SHUTDOWN:
+      Id: 1
+      Master payload: N/A
+      Slave payload: u64
+
+      Request the master to shut down the slave. A reply of 0 means
+      success, in which case the slave may close all connections
+      immediately and quit. A non-zero reply cancels the request.
+
+      Before a reply comes, the master may make other requests in
+      order to flush or sync state.
+
 Migration
 ---------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 49f566c..949382c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -55,6 +55,7 @@ typedef enum VhostUserRequest {
 
 typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_SHUTDOWN = 1,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -140,7 +141,7 @@ static VhostUserRequest vhost_user_request_translate(unsigned long int request)
     case VHOST_SET_VRING_ERR:
         return VHOST_USER_SET_VRING_ERR;
     default:
-        return VHOST_USER_MAX;
+        return request;
     }
 }
 
@@ -375,6 +376,21 @@ static int slave_can_receive(void *opaque)
     return VHOST_USER_HDR_SIZE;
 }
 
+static int vhost_user_slave_write(struct vhost_dev *dev, VhostUserMsg *msg,
+                                  int *fds, int fd_num)
+{
+    struct vhost_user *u = dev->opaque;
+    CharDriverState *chr = u->slave_chr;
+    int size = VHOST_USER_HDR_SIZE + msg->size;
+
+    if (fd_num) {
+        qemu_chr_fe_set_msgfds(chr, fds, fd_num);
+    }
+
+    return qemu_chr_fe_write_all(chr, (const uint8_t *) msg, size) == size ?
+        0 : -1;
+}
+
 static void slave_receive(void *opaque, const uint8_t *buf, int size)
 {
     struct vhost_dev *dev = opaque;
@@ -386,6 +402,19 @@ static void slave_receive(void *opaque, const uint8_t *buf, int size)
     }
 
     switch (msg->request) {
+    case VHOST_USER_SLAVE_SHUTDOWN: {
+        uint64_t success = 1;
+
+        if (dev->stop) {
+            dev->stop(dev);
+            success = 0;
+        }
+
+        msg->u64 = success;
+        msg->size = sizeof(msg->u64);
+        vhost_user_slave_write(dev, msg, NULL, 0);
+        return;
+    }
     default:
         error_report("Received unexpected msg type.");
     }
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 13/14] qemu-char: Add qemu_chr_disconnect to close a fd accepted by listen fd
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (11 preceding siblings ...)
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 12/14] vhost-user: add shutdown support marcandre.lureau
@ 2015-09-08 23:10 ` marcandre.lureau
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 14/14] test: start vhost-user reconnect test marcandre.lureau
  2015-11-26 10:33 ` [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection Michael S. Tsirkin
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, mst

From: Tetsuya Mukawa <mukawa@igel.co.jp>

This patch introduces qemu_chr_disconnect(). The function is used to
close an fd accepted from a listen fd. We already have
qemu_chr_delete(), but it closes not only the accepted fd but also
the listen fd. The new function is used when we still want to keep
the listen fd.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 include/sysemu/char.h | 7 +++++++
 qemu-char.c           | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index 5fd0a09..9f84f95 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -70,6 +70,7 @@ struct CharDriverState {
     IOReadHandler *chr_read;
     void *handler_opaque;
     void (*chr_close)(struct CharDriverState *chr);
+    void (*chr_disconnect)(struct CharDriverState *chr);
     void (*chr_accept_input)(struct CharDriverState *chr);
     void (*chr_set_echo)(struct CharDriverState *chr, bool echo);
     void (*chr_set_fe_open)(struct CharDriverState *chr, int fe_open);
@@ -124,6 +125,12 @@ CharDriverState *qemu_chr_new_from_opts(QemuOpts *opts,
  */
 CharDriverState *qemu_chr_new(const char *label, const char *filename,
                               void (*init)(struct CharDriverState *s));
+/**
+ * @qemu_chr_disconnect:
+ *
+ * Close a fd accepted by the character backend.
+ */
+void qemu_chr_disconnect(CharDriverState *chr);
 
 /**
  * @qemu_chr_delete:
diff --git a/qemu-char.c b/qemu-char.c
index a6b9bd2..67155e9 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -3849,6 +3849,13 @@ void qemu_chr_fe_release(CharDriverState *s)
     s->avail_connections++;
 }
 
+void qemu_chr_disconnect(CharDriverState *chr)
+{
+    if (chr->chr_disconnect) {
+        chr->chr_disconnect(chr);
+    }
+}
+
 void qemu_chr_free(CharDriverState *chr)
 {
     if (chr->chr_close) {
@@ -4172,6 +4179,7 @@ static CharDriverState *qmp_chardev_open_socket(ChardevSocket *sock,
     chr->chr_write = tcp_chr_write;
     chr->chr_sync_read = tcp_chr_sync_read;
     chr->chr_close = tcp_chr_close;
+    chr->chr_disconnect = tcp_chr_disconnect;
     chr->get_msgfds = tcp_get_msgfds;
     chr->set_msgfds = tcp_set_msgfds;
     chr->chr_add_client = tcp_chr_add_client;
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 14/14] test: start vhost-user reconnect test
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (12 preceding siblings ...)
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 13/14] qemu-char: Add qemu_chr_disconnect to close a fd accepted by listen fd marcandre.lureau
@ 2015-09-08 23:10 ` marcandre.lureau
  2015-11-26 10:33 ` [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection Michael S. Tsirkin
  14 siblings, 0 replies; 25+ messages in thread
From: marcandre.lureau @ 2015-09-08 23:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: mukawa, Marc-André Lureau, mst

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Check that the SLAVE_FD & SHUTDOWN messages work and that the master
asks for the vring base on shutdown.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/Makefile          |   2 +-
 tests/vhost-user-test.c | 168 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 155 insertions(+), 15 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index 34c6136..740542c 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -412,7 +412,7 @@ tests/usb-hcd-uhci-test$(EXESUF): tests/usb-hcd-uhci-test.o $(libqos-usb-obj-y)
 tests/usb-hcd-ehci-test$(EXESUF): tests/usb-hcd-ehci-test.o $(libqos-usb-obj-y)
 tests/usb-hcd-xhci-test$(EXESUF): tests/usb-hcd-xhci-test.o $(libqos-usb-obj-y)
 tests/pc-cpu-test$(EXESUF): tests/pc-cpu-test.o
-tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o $(qtest-obj-y)
+tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o $(qtest-obj-y) $(libqos-pc-obj-y) $(libqos-virtio-obj-y)
 tests/qemu-iotests/socket_scm_helper$(EXESUF): tests/qemu-iotests/socket_scm_helper.o
 tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o libqemuutil.a libqemustub.a
 tests/test-write-threshold$(EXESUF): tests/test-write-threshold.o $(block-obj-y) libqemuutil.a libqemustub.a
diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index f78483d..dba50c0 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -21,6 +21,12 @@
 #include <sys/mman.h>
 #include <sys/vfs.h>
 #include <qemu/sockets.h>
+#include "libqos/pci-pc.h"
+#include "libqos/virtio.h"
+#include "libqos/virtio-pci.h"
+#include "libqos/malloc.h"
+#include "libqos/malloc-pc.h"
+#include "libqos/malloc-generic.h"
 
 /* GLIB version compatibility flags */
 #if !GLIB_CHECK_VERSION(2, 26, 0)
@@ -40,12 +46,12 @@
 #define QEMU_CMD_ACCEL  " -machine accel=tcg"
 #define QEMU_CMD_MEM    " -m %d -object memory-backend-file,id=mem,size=%dM,"\
                         "mem-path=%s,share=on -numa node,memdev=mem"
-#define QEMU_CMD_CHR    " -chardev socket,id=%s,path=%s"
+#define QEMU_CMD_CHR    " -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=%s,vhostforce"
 #define QEMU_CMD_NET    " -device virtio-net-pci,netdev=net0 "
 #define QEMU_CMD_ROM    " -option-rom ../pc-bios/pxe-virtio.rom"
 
-#define QEMU_CMD        QEMU_CMD_ACCEL QEMU_CMD_MEM QEMU_CMD_CHR \
+#define QEMU_CMD        QEMU_CMD_ACCEL QEMU_CMD_MEM QEMU_CMD_CHR        \
                         QEMU_CMD_NETDEV QEMU_CMD_NET QEMU_CMD_ROM
 
 #define HUGETLBFS_MAGIC       0x958458f6
@@ -56,6 +62,7 @@
 
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 0
+#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 1
 
 #define VHOST_LOG_PAGE 0x1000
 
@@ -77,9 +84,16 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_VRING_ERR = 14,
     VHOST_USER_GET_PROTOCOL_FEATURES = 15,
     VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+    VHOST_USER_SET_SLAVE_FD = 17,
     VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+    VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_SHUTDOWN = 1,
+    VHOST_USER_SLAVE_MAX
+} VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
     uint64_t guest_phys_addr;
     uint64_t memory_size;
@@ -118,6 +132,7 @@ static VhostUserMsg m __attribute__ ((unused));
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION    (0x1)
 /*****************************************************************************/
+#define VIRTIO_QUEUE_MAX 1024
 
 typedef struct TestServer {
     gchar *socket_path;
@@ -129,6 +144,9 @@ typedef struct TestServer {
     GMutex *data_mutex;
     GCond *data_cond;
     int log_fd;
+    int req_fd;
+    int get_base_count;
+    uint16_t set_base[VIRTIO_QUEUE_MAX];
 } TestServer;
 
 static const char *hugefs;
@@ -331,7 +349,8 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
         /* send back features to qemu */
         msg.flags |= VHOST_USER_REPLY_MASK;
         msg.size = sizeof(m.u64);
-        msg.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
+        msg.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
+            1 << VHOST_USER_PROTOCOL_F_SLAVE_REQ;
         p = (uint8_t *) &msg;
         qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
         break;
@@ -343,6 +362,11 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
         msg.state.num = 0;
         p = (uint8_t *) &msg;
         qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
+        s->get_base_count++;
+        break;
+
+    case VHOST_USER_SET_VRING_BASE:
+        s->set_base[msg.state.index] = msg.state.num;
         break;
 
     case VHOST_USER_SET_MEM_TABLE:
@@ -371,6 +395,11 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
         g_cond_signal(s->data_cond);
         break;
 
+    case VHOST_USER_SET_SLAVE_FD:
+        qemu_chr_fe_get_msgfds(chr, &s->req_fd, 1);
+        g_cond_signal(s->data_cond);
+        break;
+
     case VHOST_USER_RESET_OWNER:
         s->fds_num = 0;
         break;
@@ -382,7 +411,7 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
     g_mutex_unlock(s->data_mutex);
 }
 
-static TestServer *test_server_new(const gchar *name)
+static TestServer *test_server_new(const gchar *name, const gchar *chr_opt)
 {
     TestServer *server = g_new0(TestServer, 1);
     gchar *chr_path;
@@ -390,7 +419,8 @@ static TestServer *test_server_new(const gchar *name)
     server->socket_path = g_strdup_printf("/tmp/vhost-%s-%d.sock",
                                           name, getpid());
 
-    chr_path = g_strdup_printf("unix:%s,server,nowait", server->socket_path);
+    chr_path = g_strdup_printf(chr_opt ?: "unix:%s,server,nowait",
+                               server->socket_path);
     server->chr_name = g_strdup_printf("chr-%s", name);
     server->chr = qemu_chr_new(server->chr_name, chr_path, NULL);
     g_free(chr_path);
@@ -402,17 +432,19 @@ static TestServer *test_server_new(const gchar *name)
     server->data_cond = _cond_new();
 
     server->log_fd = -1;
+    server->req_fd = -1;
 
     return server;
 }
 
 #define GET_QEMU_CMD(s)                                 \
     g_strdup_printf(QEMU_CMD, 512, 512, (hugefs),       \
-        (s)->chr_name, (s)->socket_path, (s)->chr_name)
+        (s)->chr_name, (s)->socket_path, "", (s)->chr_name)
 
-#define GET_QEMU_CMDE(s, mem, extra, ...)                               \
-    g_strdup_printf(QEMU_CMD extra, (mem), (mem), (hugefs),             \
-        (s)->chr_name, (s)->socket_path, (s)->chr_name, ##__VA_ARGS__)
+#define GET_QEMU_CMDE(s, mem, chr_opts, extra, ...)             \
+    g_strdup_printf(QEMU_CMD extra, (mem), (mem), (hugefs),     \
+        (s)->chr_name, (s)->socket_path, (chr_opts),            \
+        (s)->chr_name, ##__VA_ARGS__)
 
 static void test_server_free(TestServer *server)
 {
@@ -428,6 +460,10 @@ static void test_server_free(TestServer *server)
         close(server->log_fd);
     }
 
+    if (server->req_fd != -1) {
+        close(server->req_fd);
+    }
+
     unlink(server->socket_path);
     g_free(server->socket_path);
 
@@ -532,8 +568,8 @@ GSourceFuncs test_migrate_source_funcs = {
 
 static void test_migrate(void)
 {
-    TestServer *s = test_server_new("src");
-    TestServer *dest = test_server_new("dest");
+    TestServer *s = test_server_new("src", NULL);
+    TestServer *dest = test_server_new("dest", NULL);
     const char *uri = "tcp:127.0.0.1:1234";
     QTestState *global = global_qtest, *from, *to;
     GSource *source;
@@ -542,7 +578,7 @@ static void test_migrate(void)
     guint8 *log;
     guint64 size;
 
-    cmd = GET_QEMU_CMDE(s, 2, "");
+    cmd = GET_QEMU_CMDE(s, 2, "", "");
     from = qtest_start(cmd);
     g_free(cmd);
 
@@ -550,7 +586,7 @@ static void test_migrate(void)
     size = get_log_size(s);
     g_assert_cmpint(size, ==, (2 * 1024 * 1024) / (VHOST_LOG_PAGE * 8));
 
-    cmd = GET_QEMU_CMDE(dest, 2, " -incoming %s", uri);
+    cmd = GET_QEMU_CMDE(dest, 2, "", " -incoming %s", uri);
     to = qtest_init(cmd);
     g_free(cmd);
 
@@ -609,6 +645,108 @@ static void test_migrate(void)
     global_qtest = global;
 }
 
+static void wait_for_req_fd(TestServer *s)
+{
+    gint64 end_time;
+
+    g_mutex_lock(s->data_mutex);
+    end_time = _get_time() + 5 * G_TIME_SPAN_SECOND;
+    while (s->req_fd == -1) {
+        if (!_cond_wait_until(s->data_cond, s->data_mutex, end_time)) {
+            /* timeout has passed */
+            g_assert(s->req_fd != -1);
+            break;
+        }
+    }
+
+    g_mutex_unlock(s->data_mutex);
+}
+
+#define PCI_SLOT                0x04
+
+static QVirtioPCIDevice *virtio_net_pci_init(QPCIBus *bus, int slot)
+{
+    QVirtioPCIDevice *dev;
+
+    dev = qvirtio_pci_device_find(bus, QVIRTIO_NET_DEVICE_ID);
+    g_assert(dev != NULL);
+    g_assert_cmphex(dev->vdev.device_type, ==, QVIRTIO_NET_DEVICE_ID);
+
+    qvirtio_pci_device_enable(dev);
+    qvirtio_reset(&qvirtio_pci, &dev->vdev);
+    qvirtio_set_acknowledge(&qvirtio_pci, &dev->vdev);
+    qvirtio_set_driver(&qvirtio_pci, &dev->vdev);
+
+    return dev;
+}
+
+static void driver_init(const QVirtioBus *bus, QVirtioDevice *dev)
+{
+    uint32_t features;
+
+    features = qvirtio_get_features(bus, dev);
+    features = features & ~(QVIRTIO_F_BAD_FEATURE |
+                            QVIRTIO_F_RING_INDIRECT_DESC |
+                            QVIRTIO_F_RING_EVENT_IDX);
+    qvirtio_set_features(bus, dev, features);
+
+    qvirtio_set_driver_ok(bus, dev);
+}
+
+static void test_reconnect(void)
+{
+    TestServer *s = test_server_new("src", "unix:%s,reconnect=1");
+    QTestState *global = global_qtest, *from;
+    VhostUserMsg msg = { .request = VHOST_USER_SLAVE_SHUTDOWN };
+    gchar *cmd;
+    int ret;
+    QPCIBus *bus;
+    QVirtioPCIDevice *dev;
+    QGuestAllocator *alloc;
+
+    msg.flags &= ~VHOST_USER_VERSION_MASK;
+    msg.flags |= VHOST_USER_VERSION;
+
+    cmd = GET_QEMU_CMDE(s, 2, ",server", "");
+    from = qtest_start(cmd);
+    wait_for_req_fd(s);
+    g_free(cmd);
+
+    bus = qpci_init_pc();
+    dev = virtio_net_pci_init(bus, PCI_SLOT);
+    alloc = pc_alloc_init();
+    driver_init(&qvirtio_pci, &dev->vdev);
+
+    s->get_base_count = 0;
+    ret = send(s->req_fd, &msg, VHOST_USER_HDR_SIZE, 0);
+    g_assert_cmpint(ret, ==, VHOST_USER_HDR_SIZE);
+    ret = read(s->req_fd, &msg, VHOST_USER_HDR_SIZE + sizeof(msg.u64));
+    g_assert_cmpint(ret, ==, VHOST_USER_HDR_SIZE + sizeof(msg.u64));
+    g_assert_cmpint(msg.u64, ==, 0);
+
+    /* check that get_base was called for each queue */
+    g_assert_cmpint(s->get_base_count, ==, 2);
+
+    close(s->req_fd);
+    s->req_fd = -1;
+
+    /* reconnect */
+    qemu_chr_disconnect(s->chr);
+    wait_for_fds(s);
+    /* FIXME: manipulate vring to change last used index */
+    g_assert_cmpint(s->set_base[0], ==, 0);
+    g_assert_cmpint(s->set_base[1], ==, 0);
+    pc_alloc_uninit(alloc);
+    qvirtio_pci_device_disable(dev);
+    g_free(dev);
+    qpci_free_pc(bus);
+
+    qtest_quit(from);
+    test_server_free(s);
+
+    global_qtest = global;
+}
+
 static const char *init_hugepagefs(void)
 {
     const char *path;
@@ -656,18 +794,20 @@ int main(int argc, char **argv)
 
     hugefs = init_hugepagefs();
     if (!hugefs) {
+        g_debug("no hugepagefs available, skipping");
         return 0;
     }
 
     _thread_new(NULL, thread_function, NULL);
 
-    server = test_server_new("test");
+    server = test_server_new("test", NULL);
     qemu_cmd = GET_QEMU_CMD(server);
     s = qtest_start(qemu_cmd);
     g_free(qemu_cmd);
 
     qtest_add_data_func("/vhost-user/read-guest-mem", server, read_guest_mem);
     qtest_add_func("/vhost-user/migrate", test_migrate);
+    qtest_add_func("/vhost-user/reconnect", test_reconnect);
 
     ret = g_test_run();
 
-- 
2.4.3


* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
                   ` (13 preceding siblings ...)
  2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 14/14] test: start vhost-user reconnect test marcandre.lureau
@ 2015-11-26 10:33 ` Michael S. Tsirkin
  2016-02-23 18:00   ` Marc-André Lureau
  2016-03-24  7:10   ` Yuanhan Liu
  14 siblings, 2 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2015-11-26 10:33 UTC (permalink / raw)
  To: marcandre.lureau; +Cc: mukawa, qemu-devel

On Wed, Sep 09, 2015 at 01:09:52AM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> In a previous series, "Add feature to start QEMU without vhost-user
> backend", Tetsuya Mukawa proposed to allow the vhost-user backend to
> disconnect and reconnect. However, Michael Tsirkin pointed out that
> this can't be done without extra care, because the guest and the
> hypervisor don't know the state of the slave's ring manipulation;
> there might be pending replies, for example, that could be lost. He
> suggested resetting the guest queues instead, but that requires
> kernel changes, and it may have to clear the rings and lose queued
> packets.
> 
> The following series starts from the idea that the slave can instead
> request a "managed" shutdown and later recover (I guess the use case
> for this is to allow, for example, updating static dispatching/filter
> rules, etc.).

I'm still not sure users actually need this.  I am inclined to think we
should teach guests to respond to NEED_RESET status. Then to handle
disconnect, we would
- deactivate the disconnected backend
- stop VM, and wait for a reconnect
- set NEED_RESET status, and re-activate the backend
  after guest reset

-- 
MST


* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2015-11-26 10:33 ` [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection Michael S. Tsirkin
@ 2016-02-23 18:00   ` Marc-André Lureau
  2016-03-02  6:44     ` Michael S. Tsirkin
  2016-03-24  7:10   ` Yuanhan Liu
  1 sibling, 1 reply; 25+ messages in thread
From: Marc-André Lureau @ 2016-02-23 18:00 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Tetsuya Mukawa, QEMU

Hi

On Thu, Nov 26, 2015 at 11:33 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> I'm still not sure users actually need this.  I am inclined to think we
> should teach guests to respond to NEED_RESET status. Then to handle
> disconnect, we would
> - deactivate the disconnected backend
> - stop VM, and wait for a reconnect
> - set NEED_RESET status, and re-activate the backend
>   after guest reset

Sorry your answer came late, so my reply did too ;). It may depend on
the use case; what I suggest wouldn't need to stop the VM, but merely
set the link-down status. Furthermore, how do you disconnect/deactivate
the backend cleanly? Having an explicit message for that and documenting
it may make things easier.

It has been a while since I last looked at vhost-user, so I am short
on details, but I have rebased this series, if there is still interest.
I'll send an RFCv2; there are a couple of worthwhile fixes to pick from
it.

thanks

-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2016-02-23 18:00   ` Marc-André Lureau
@ 2016-03-02  6:44     ` Michael S. Tsirkin
  0 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2016-03-02  6:44 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: Tetsuya Mukawa, QEMU

On Tue, Feb 23, 2016 at 07:00:27PM +0100, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Nov 26, 2015 at 11:33 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > I'm still not sure users actually need this.  I am inclined to think we
> > should teach guests to respond to NEED_RESET status. Then to handle
> > disconnect, we would
> > - deactivate the disconnected backend
> > - stop VM, and wait for a reconnect
> > - set NEED_RESET status, and re-activate the backend
> >   after guest reset
> 
> Sorry your answer came late, so my reply did too ;). It may depend on
> the use case; what I suggest wouldn't need to stop the VM, but merely
> set the link-down status. Furthermore, how do you disconnect/deactivate
> the backend cleanly? Having an explicit message for that and documenting
> it may make things easier.
> 
> It has been a while since I last looked at vhost-user, so I am short
> on details, but I have rebased this series, if there is still interest.
> I'll send an RFCv2; there are a couple of worthwhile fixes to pick from
> it.
> 
> thanks

OK, sure. Let's split this up: fixes, then new features.

> -- 
> Marc-André Lureau


* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2015-11-26 10:33 ` [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection Michael S. Tsirkin
  2016-02-23 18:00   ` Marc-André Lureau
@ 2016-03-24  7:10   ` Yuanhan Liu
  2016-03-25 18:00     ` Marc-André Lureau
  1 sibling, 1 reply; 25+ messages in thread
From: Yuanhan Liu @ 2016-03-24  7:10 UTC (permalink / raw)
  To: Michael S. Tsirkin, marcandre.lureau; +Cc: mukawa, qemu-devel


On Thu, Nov 26, 2015 at 12:33:22PM +0200, Michael S. Tsirkin wrote:
> On Wed, Sep 09, 2015 at 01:09:52AM +0200, marcandre.lureau@redhat.com wrote:
> > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > In a previous series, "Add feature to start QEMU without vhost-user
> > backend", Tetsuya Mukawa proposed to allow the vhost-user backend to
> > disconnect and reconnect. However, Michael Tsirkin pointed out that
> > this can't be done without extra care, because the guest and the
> > hypervisor don't know the state of the slave's ring manipulation;
> > there might be pending replies, for example, that could be lost. He
> > suggested resetting the guest queues instead, but that requires
> > kernel changes, and it may have to clear the rings and lose queued
> > packets.

Hi Michael & Marc,

I said I would like to have a try last week, and I did. However, I
have only recently been able to have a closer look at the code.

> > The following series starts from the idea that the slave can instead
> > request a "managed" shutdown and later recover (I guess the use case
> > for this is to allow, for example, updating static dispatching/filter
> > rules, etc.).

What if the backend crashes, so that no such request is sent? And
I'm wondering why this request is needed, as we are able to detect
the disconnect now (with your patches).

BTW, you meant to have QEMU be the server and the backend be the
client here, right? Honestly, that's what we thought of, too, at
first.

However, I'm wondering whether we could still go with QEMU as the
client and the backend as the server (the default and the only mode
DPDK supports), and let QEMU try to reconnect when the backend
crashes and restarts. In that case, we need to enable the "reconnect"
option for vhost-user, and once I had done that, it basically worked
in my test:

- start DPDK vhost-switch example

- start QEMU, which will connect to DPDK vhost-user

  link is good now.

- kill DPDK vhost-switch

  link is broken at this stage

- start DPDK vhost-switch again

  you will find that the link is back again.


Does that make sense to you? If so, we may need no (or just very
few) changes at all to DPDK to get reconnect working.
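
For reference, the QEMU side of the above test would look something
like this (socket path and ids are illustrative; "reconnect" takes a
retry delay in seconds):

    qemu-system-x86_64 ... \
        -chardev socket,id=char0,path=/tmp/vhost.sock,reconnect=1 \
        -netdev vhost-user,id=net0,chardev=char0,vhostforce \
        -device virtio-net-pci,netdev=net0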

	--yliu

> I'm still not sure users actually need this.  I am inclined to think we
> should teach guests to respond to NEED_RESET status. Then to handle
> disconnect, we would
> - deactivate the disconnected backend
> - stop VM, and wait for a reconnect
> - set NEED_RESET status, and re-activate the backend
>   after guest reset
> 
> -- 
> MST


* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2016-03-24  7:10   ` Yuanhan Liu
@ 2016-03-25 18:00     ` Marc-André Lureau
  2016-03-28  1:53       ` Tetsuya Mukawa
  2016-03-29  8:10       ` Yuanhan Liu
  0 siblings, 2 replies; 25+ messages in thread
From: Marc-André Lureau @ 2016-03-25 18:00 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Tetsuya Mukawa, QEMU, Michael S. Tsirkin

Hi

On Thu, Mar 24, 2016 at 8:10 AM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
>> > The following series starts from the idea that the slave can request a
>> > "managed" shutdown instead and later recover (I guess the use case for
>> > this is to allow for example to update static dispatching/filter rules
>> > etc)
>
> What if the backend crashes, so that no such request will be sent? And
> I'm wondering why this request is needed, as we are able to detect
> the disconnect now (with your patches).

I don't think trying to handle backend crashes is really a thing we
need to take care of. If the backend is bad enough to crash, it may as
well corrupt the guest memory (mst: my understanding of vhost-user is
that the backend must be trusted, or it could just throw garbage in the
queue descriptors, with surprising consequences, or elsewhere in the
guest memory, right?).

> BTW, you meant to let QEMU be the server and the backend be the client
> here, right? Honestly, that's what we thought of, too, at first.
> However, I'm wondering whether we could still go with QEMU as the client
> and the backend as the server (the default and the only way DPDK
> supports), and let QEMU try to reconnect when the backend crashes
> and restarts. In that case, we need to enable the "reconnect" option
> for vhost-user, and once I have done that, it basically works in my
> test:
>

Conceptually, I think if we allow the backend to disconnect, it makes
sense that qemu is actually the socket server. But it doesn't matter
much; it's simple to teach qemu to reconnect on a timer... So we should
probably allow both cases anyway.
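
(For illustration, a minimal sketch of such a timer with glib; the
try_connect() helper is hypothetical, not the actual qemu-char code:)

    static gboolean reconnect_cb(gpointer opaque)
    {
        CharDriverState *chr = opaque;

        if (try_connect(chr) < 0) {       /* hypothetical helper */
            return G_SOURCE_CONTINUE;     /* still down, retry later */
        }
        return G_SOURCE_REMOVE;           /* connected, stop the timer */
    }

    /* on disconnect, retry every 3 seconds until the peer is back */
    g_timeout_add_seconds(3, reconnect_cb, chr);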

> - start DPDK vhost-switch example
>
> - start QEMU, which will connect to DPDK vhost-user
>
>   link is good now.
>
> - kill DPDK vhost-switch
>
>   link is broken at this stage
>
> - start DPDK vhost-switch again
>
>   you will find that the link is back again.
>
>
> Does that make sense to you? If so, we may need no (or just
> very few) changes at all to DPDK to get the reconnect working.

The main issue with handling crashes (gone at any time) is that the
backend may not have time to sync the used idx (at the least). It may
already have processed incoming packets, so on reconnect, it may
duplicate the receiving/dispatching work. Similarly, on the backend
receiving end, some packets may be lost, never received by the VM, and
later overwritten by the backend after reconnect (for the same used
idx update reason). This may not be a big deal for unreliable
protocols, but I am not familiar enough with network usage to know if
that's fine in all cases. It may be fine for some packets, such as
UDP.
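
(To make the used idx point concrete, here is roughly how a backend
publishes a completed buffer with the standard vring layout from
linux/virtio_ring.h; the barrier macro is illustrative:)

    struct vring_used_elem { uint32_t id; uint32_t len; };

    /* publish a completed descriptor chain to the guest */
    used->ring[used->idx % num].id  = head;
    used->ring[used->idx % num].len = len;
    wmb();         /* the entry must be visible before the index */
    used->idx++;   /* only now does the guest see the buffer */

A crash that lands between processing a buffer and that final index
store is exactly the window where work is replayed or lost on reconnect.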

However, in general, vhost-user should not be specific to network
transmission, and it would be nice to have a reliable way for the
backend to reconnect. That's what I am trying to do in this series. I'll
repost it after I have done more testing.
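
(Sketchily, reusing the message framing from docs/specs/vhost-user.txt;
the payload union is omitted, and the request name below is the one this
RFC introduces for the slave channel, so it may still change:)

    typedef struct VhostUserMsg {
        uint32_t request;   /* VHOST_USER_SLAVE_SHUTDOWN (this RFC) */
        uint32_t flags;     /* VHOST_USER_VERSION */
        uint32_t size;      /* no payload for this request */
    } VhostUserMsg;

    /* slave: flush pending buffers, sync used->idx, then ask the
     * master for a managed shutdown over the slave channel */
    VhostUserMsg msg = {
        .request = VHOST_USER_SLAVE_SHUTDOWN,
        .flags   = VHOST_USER_VERSION,
    };
    write(slave_fd, &msg, sizeof(msg));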

thanks

-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2016-03-25 18:00     ` Marc-André Lureau
@ 2016-03-28  1:53       ` Tetsuya Mukawa
  2016-03-28  2:06         ` Tetsuya Mukawa
  2016-03-29  8:10       ` Yuanhan Liu
  1 sibling, 1 reply; 25+ messages in thread
From: Tetsuya Mukawa @ 2016-03-28  1:53 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Marc-André Lureau, QEMU, Michael S. Tsirkin

On 2016/03/26 3:00, Marc-André Lureau wrote:
> Hi
>
> On Thu, Mar 24, 2016 at 8:10 AM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
>>>> The following series starts from the idea that the slave can request a
>>>> "managed" shutdown instead and later recover (I guess the use case for
>>>> this is to allow for example to update static dispatching/filter rules
>>>> etc)
>> What if the backend crashes, so that no such request will be sent? And
>> I'm wondering why this request is needed, as we are able to detect
>> the disconnect now (with your patches).
> I don't think trying to handle backend crashes is really a thing we
> need to take care of. If the backend is bad enough to crash, it may as
> well corrupt the guest memory (mst: my understanding of vhost-user is
> that the backend must be trusted, or it could just throw garbage in the
> queue descriptors, with surprising consequences, or elsewhere in the
> guest memory, right?).
>
>> BTW, you meant to let QEMU be the server and the backend be the client
>> here, right? Honestly, that's what we thought of, too, at first.
>> However, I'm wondering whether we could still go with QEMU as the client
>> and the backend as the server (the default and the only way DPDK
>> supports), and let QEMU try to reconnect when the backend crashes
>> and restarts. In that case, we need to enable the "reconnect" option
>> for vhost-user, and once I have done that, it basically works in my
>> test:
>>
> Conceptually, I think if we allow the backend to disconnect, it makes
> sense that qemu is actually the socket server. But it doesn't matter
> much; it's simple to teach qemu to reconnect on a timer... So we should
> probably allow both cases anyway.
>
>> - start DPDK vhost-switch example
>>
>> - start QEMU, which will connect to DPDK vhost-user
>>
>>   link is good now.
>>
>> - kill DPDK vhost-switch
>>
>>   link is broken at this stage
>>
>> - start DPDK vhost-switch again
>>
>>   you will find that the link is back again.
>>
>>
>> Does that make sense to you? If so, we may need no (or just
>> very few) changes at all to DPDK to get the reconnect working.
> The main issue with handling crashes (gone at any time) is that the
> backend may not have time to sync the used idx (at the least). It may
> already have processed incoming packets, so on reconnect, it may
> duplicate the receiving/dispatching work. Similarly, on the backend
> receiving end, some packets may be lost, never received by the VM, and
> later overwritten by the backend after reconnect (for the same used
> idx update reason). This may not be a big deal for unreliable
> protocols, but I am not familiar enough with network usage to know if
> that's fine in all cases. It may be fine for some packets, such as
> UDP.
>
> However, in general, vhost-user should not be specific to network
> transmission, and it would be nice to have a reliable way for the
> backend to reconnect. That's what I am trying to do in this series. I'll
> repost it after I have done more testing.
>
> thanks
>

Hi Yuanhan,

We probably have 2 options here.
One is using DEVICE_NEEDS_RESET, or adding one more new status like
QUEUE_NEEDS_RESET to the virtio specification.
In this case, we will need to fix the virtio-net drivers and the
virtio-net device of QEMU, so it might require a lot of code changes, but
we can handle unexpected shutdown of the vhost-user backend.
The other option is Marc's simpler solution. In this case, we don't need
to change the virtio-net drivers, but we cannot handle unexpected shutdown.
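
(For the first option, virtio 1.0 already defines the status bit,
VIRTIO_CONFIG_S_NEEDS_RESET; the handling sketched below is hypothetical
on both sides:)

    /* QEMU, on unexpected backend disconnect: flag the device */
    vdev->status |= VIRTIO_CONFIG_S_NEEDS_RESET;
    virtio_notify_config(vdev);    /* config interrupt to the guest */

    /* guest driver, on noticing the bit (hypothetical handler) */
    if (get_status(dev) & VIRTIO_CONFIG_S_NEEDS_RESET)
        reset_and_reinit(dev);     /* tear down and re-init the device */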

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2016-03-28  1:53       ` Tetsuya Mukawa
@ 2016-03-28  2:06         ` Tetsuya Mukawa
  0 siblings, 0 replies; 25+ messages in thread
From: Tetsuya Mukawa @ 2016-03-28  2:06 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Marc-André Lureau, QEMU, Michael S. Tsirkin

On 2016/03/28 10:53, Tetsuya Mukawa wrote:
> On 2016/03/26 3:00, Marc-André Lureau wrote:
>> [snip]
>>
> Hi Yuanhan,
>
> We probably have 2 options here.
> One is using DEVICE_NEEDS_RESET, or adding one more new status like
> QUEUE_NEEDS_RESET to the virtio specification.
> In this case, we will need to fix the virtio-net drivers and the
> virtio-net device of QEMU, so it might require a lot of code changes, but
> we can handle unexpected shutdown of the vhost-user backend.
> The other option is Marc's simpler solution. In this case, we don't need
> to change the virtio-net drivers, but we cannot handle unexpected shutdown.

Let me add a bit.
Actually, we can use both options at the same time.
For example, use the DEVICE_NEEDS_RESET status only when the vhost-user
backend closes unexpectedly.
So it's probably nice to start by merging Marc's patches first.

Anyway, if we want to handle unexpected shutdown properly, we may need
to use a kind of DEVICE_NEEDS_RESET status.
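
(Roughly, the combined handling; the names here are hypothetical:)

    if (slave_requested_shutdown) {
        /* managed path (Marc's series): the backend has flushed the
         * rings, so just suspend and wait for it to come back */
        vhost_dev_suspend(hdev);                  /* hypothetical */
    } else {
        /* unexpected close: ring state is unknown, force a reset */
        vdev->status |= VIRTIO_CONFIG_S_NEEDS_RESET;
        virtio_notify_config(vdev);
    }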

Tetsuya

> Thanks,
> Tetsuya
>
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2016-03-25 18:00     ` Marc-André Lureau
  2016-03-28  1:53       ` Tetsuya Mukawa
@ 2016-03-29  8:10       ` Yuanhan Liu
  2016-03-29 10:52         ` Marc-André Lureau
  1 sibling, 1 reply; 25+ messages in thread
From: Yuanhan Liu @ 2016-03-29  8:10 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Michael S. Tsirkin, QEMU, Wang, Zhihong, Tetsuya Mukawa,
	Kavanagh, Mark B, Traynor, Kevin

On Fri, Mar 25, 2016 at 07:00:15PM +0100, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Mar 24, 2016 at 8:10 AM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
> >> > The following series starts from the idea that the slave can request a
> >> > "managed" shutdown instead and later recover (I guess the use case for
> >> > this is to allow for example to update static dispatching/filter rules
> >> > etc)
> >
> > What if the backend crashes, so that no such request will be sent? And
> > I'm wondering why this request is needed, as we are able to detect
> > the disconnect now (with your patches).

Hi, sorry for the late reply, and thanks for the input.

(CC'ed more guys, as they are the hungry users who wish this feature
 to be done ASAP; they may provide more valuable input)

> 
> I don't think trying to handle backend crashes is really a thing we
> need to take care of.

Yes, me neither. The reason I mentioned backend crashes is that they are
one of the reasons why we need the vhost-user reconnect stuff. The
other reason is backend restart.

Backend crashes may not be that common in production usage, but they are
in the development stage. It would be handy if we could handle them as
well, as it's a painful experience for me that every time I do a VM2VM
test and my changes to the backend cause a crash (unfortunately, it
happens often in the early stage), I have to reboot the two VMs.

If we have reconnect, life could be easier. I then don't need to worry
that I will mess up the backend and crash it any more.

And again, the reason I mentioned crashes here is not because
we need to handle them specially. Instead, I was thinking we might be
able to handle both.

> If the backend is bad enough to crash, it may as
> well corrupt the guest memory (mst: my understanding of vhost-user is
> that the backend must be trusted, or it could just throw garbage in the
> queue descriptors, with surprising consequences, or elsewhere in the
> guest memory, right?).
> 
> > BTW, you meant to let QEMU be the server and the backend be the client
> > here, right? Honestly, that's what we thought of, too, at first.
> > However, I'm wondering whether we could still go with QEMU as the client
> > and the backend as the server (the default and the only way DPDK
> > supports), and let QEMU try to reconnect when the backend crashes
> > and restarts. In that case, we need to enable the "reconnect" option
> > for vhost-user, and once I have done that, it basically works in my
> > test:
> >
> 
> Conceptually, I think if we allow the backend to disconnect, it makes
> sense that qemu is actually the socket server. But it doesn't matter
> much; it's simple to teach qemu to reconnect on a timer...

We already have that, right? I mean, the char-dev "reconnect" option.

> So we should
> probably allow both cases anyway.

Yes, I think both should work. I may be wrong (I may be missing
something), but it seems (far) easier to me to keep QEMU
as the client and add the "reconnect" option, given that
we already have all the stuff to make it work. In this way,

- if the backend crashes/quits, you just need to restart the backend,
  and QEMU will retry the connection.

- if QEMU crashes/quits, you just need to restart QEMU, and
  then QEMU will start a fresh connection.


However, if we let QEMU be the server, there are 2 major pieces of work
that need to be done on the backend side (take DPDK as an example, which
acts only as the vhost-user server so far):

- Introducing a new API, or extending the current API, to let it
  connect to the server socket from QEMU.

- If QEMU crashes/quits, we need to add code to the backend to keep
  reconnecting until the connection is established, which is something
  similar to the "reconnect" stuff in QEMU.

As you can see, it needs more effort (though it's not something
you care about :). And it duplicates work.
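
(The second item is essentially a plain retry loop. A sketch, assuming
a plain AF_UNIX client; nothing like this exists in the DPDK vhost
library today:)

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    static int connect_with_retry(const char *path)
    {
        struct sockaddr_un un = { .sun_family = AF_UNIX };

        strncpy(un.sun_path, path, sizeof(un.sun_path) - 1);
        for (;;) {
            int fd = socket(AF_UNIX, SOCK_STREAM, 0);
            if (fd >= 0 &&
                connect(fd, (struct sockaddr *)&un, sizeof(un)) == 0)
                return fd;    /* QEMU (the server) is back */
            if (fd >= 0)
                close(fd);
            sleep(1);         /* keep trying until it comes up */
        }
    }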

> 
> > - start DPDK vhost-switch example
> >
> > - start QEMU, which will connect to DPDK vhost-user
> >
> >   link is good now.
> >
> > - kill DPDK vhost-switch
> >
> >   link is broken at this stage
> >
> > - start DPDK vhost-switch again
> >
> >   you will find that the link is back again.
> >
> >
> > Does that make sense to you? If so, we may need no (or just
> > very few) changes at all to DPDK to get the reconnect working.
> 
> The main issue with handling crashes (gone at any time) is that the
> backend may not have time to sync the used idx (at the least). It may
> already have processed incoming packets, so on reconnect, it may
> duplicate the receiving/dispatching work.

That's not the case for the DPDK backend implementation: incoming packets
won't be delivered for processing before we update the used idx.

> Similarly, on the backend
> receiving end, some packets may be lost, never received by the VM, and
> later overwritten by the backend after reconnect (for the same used
> idx update reason). This may not be a big deal for unreliable
> protocols, but I am not familiar enough with network usage to know if
> that's fine in all cases. It may be fine for some packets, such as
> UDP.
> 
> However, in general, vhost-user should not be specific to network
> transmission, and it would be nice to have a reliable way for the
> backend to reconnect. That's what I am trying to do in this series.

Agreed.

> I'll
> repost it after I have done more testing.

Great, and thanks for the work. I will also take some time to give
it a closer review.

	--yliu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2016-03-29  8:10       ` Yuanhan Liu
@ 2016-03-29 10:52         ` Marc-André Lureau
  2016-03-29 14:28           ` Yuanhan Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Marc-André Lureau @ 2016-03-29 10:52 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Michael S. Tsirkin, QEMU, Wang, Zhihong, Tetsuya Mukawa,
	Kavanagh, Mark B, Traynor, Kevin

Hi

On Tue, Mar 29, 2016 at 10:10 AM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
> Backend crashes may not be that common in production usage, but they are
> in the development stage. It would be handy if we could handle them as
> well, as it's a painful experience for me that every time I do a VM2VM
> test and my changes to the backend cause a crash (unfortunately, it
> happens often in the early stage), I have to reboot the two VMs.
>
> If we have reconnect, life could be easier. I then don't need to worry
> that I will mess up the backend and crash it any more.
>
> And again, the reason I mentioned crashes here is not because
> we need to handle them specially. Instead, I was thinking we might be
> able to handle both.

I think crashes could be handled with the queue reset Michael proposed, but
that would probably require guest-side changes.

>> Conceptually, I think if we allow the backend to disconnect, it makes
>> sense that qemu is actually the socket server. But it doesn't matter
>> much; it's simple to teach qemu to reconnect on a timer...
>
> We already have that, right? I mean, the char-dev "reconnect" option.

Sure, what I mean is that it sounds cleaner from a design pov for qemu
to be the server (since it is the one actually waiting for the backend in
this case); besides, a timer is often a pretty rough solution to
anything.

>> So we should
>> probably allow both cases anyway.
>
> Yes, I think both should work. I may be wrong (I may be missing
> something), but it seems (far) easier to me to keep QEMU
> as the client and add the "reconnect" option, given that
> we already have all the stuff to make it work. In this way,
>
> - if the backend crashes/quits, you just need to restart the backend,
>   and QEMU will retry the connection.
>
> - if QEMU crashes/quits, you just need to restart QEMU, and
>   then QEMU will start a fresh connection.

It may all depend on the use case; neither option seems more obvious or
easier than the other to me.

> However, if we let QEMU be the server, there are 2 major pieces of work
> that need to be done on the backend side (take DPDK as an example, which
> acts only as the vhost-user server so far):
>
> - Introducing a new API, or extending the current API, to let it
>   connect to the server socket from QEMU.
>
> - If QEMU crashes/quits, we need to add code to the backend to keep
>   reconnecting until the connection is established, which is something
>   similar to the "reconnect" stuff in QEMU.
> As you can see, it needs more effort (though it's not something
> you care about :). And it duplicates work.


Ah, I am looking at this from the qemu angle; the backend may need to adapt
if it doesn't already handle both socket roles (client & server).

>
>>
>> > - start DPDK vhost-switch example
>> >
>> > - start QEMU, which will connect to DPDK vhost-user
>> >
>> >   link is good now.
>> >
>> > - kill DPDK vhost-switch
>> >
>> >   link is broken at this stage
>> >
>> > - start DPDK vhost-switch again
>> >
>> >   you will find that the link is back again.
>> >
>> >
>> > Does that make sense to you? If so, we may need no (or just
>> > very few) changes at all to DPDK to get the reconnect working.
>>
>> The main issue with handling crashes (gone at any time) is that the
>> backend may not have time to sync the used idx (at the least). It may
>> already have processed incoming packets, so on reconnect, it may
>> duplicate the receiving/dispatching work.
>
> That's not the case for the DPDK backend implementation: incoming packets
> won't be delivered for processing before we update the used idx.

It could be OK on the incoming packet side (vm->dpdk), but I don't see how
you could avoid packet loss on the other side (dpdk->vm), since the
packets must be added to the queues before updating the used idx.

>
>> Similarly, on the backend
>> receiving end, some packets may be lost, never received by the VM, and
>> later overwritten by the backend after reconnect (for the same used
>> idx update reason). This may not be a big deal for unreliable
>> protocols, but I am not familiar enough with network usage to know if
>> that's fine in all cases. It may be fine for some packets, such as
>> UDP.
>>
>> However, in general, vhost-user should not be specific to network
>> transmission, and it would be nice to have a reliable way for the
>> backend to reconnect. That's what I am trying to do in this series.
>
> Agreed.
>
>> I'll
>> repost it after I have done more testing.
>
> Great, and thanks for the work. I will also take some time to give
> it a closer review.

thanks


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
  2016-03-29 10:52         ` Marc-André Lureau
@ 2016-03-29 14:28           ` Yuanhan Liu
  0 siblings, 0 replies; 25+ messages in thread
From: Yuanhan Liu @ 2016-03-29 14:28 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Yuanhan Liu, Michael S. Tsirkin, QEMU, Wang, Zhihong,
	Tetsuya Mukawa, Kavanagh, Mark B, Traynor, Kevin

On Tue, Mar 29, 2016 at 12:52:32PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Mar 29, 2016 at 10:10 AM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
> > Backend crashes may not be that common in production usage, but they are
> > in the development stage. It would be handy if we could handle them as
> > well, as it's a painful experience for me that every time I do a VM2VM
> > test and my changes to the backend cause a crash (unfortunately, it
> > happens often in the early stage), I have to reboot the two VMs.
> >
> > If we have reconnect, life could be easier. I then don't need to worry
> > that I will mess up the backend and crash it any more.
> >
> > And again, the reason I mentioned crashes here is not because
> > we need to handle them specially. Instead, I was thinking we might be
> > able to handle both.
> 
> I think crashes could be handled with the queue reset Michael proposed, but
> that would probably require guest-side changes.

Is the reply from Michael in this thread the whole story? If not, would
you please give me a link to that discussion?

> 
> >> Conceptually, I think if we allow the backend to disconnect, it makes
> >> sense that qemu is actually the socket server. But it doesn't matter
> >> much; it's simple to teach qemu to reconnect on a timer...
> >
> > We already have that, right? I mean, the char-dev "reconnect" option.
> 
> Sure, what I mean is that it sounds cleaner from a design pov for qemu
> to be the server (since it is the one actually waiting for the backend in
> this case);

Yeah, I agree with you on that. However, think of it this way: if we let
QEMU be the server and the backend be the client, the client still needs
to keep trying to connect to the server when the server
crashes/quits/restarts.

My point is, either way, we need to bear in mind that the server could
also go down (due to crashes/restarts), so we have to teach the client to
reconnect when disconnected. Given that QEMU already has that support,
I'd slightly prefer to let QEMU still be the client, and do the reconnect
tries, if it works.

And to make it clear, both should work, but it's more a question of
which one will be the better option (for us, DPDK): QEMU as the server or
the client?  What's sure here is that, either way, your patches would
work and are required.

> beside a timer is often a pretty rough solution to
> anything.

As stated above, I'm afraid that is needed somewhere. It might be in QEMU
or in the backend, though.

> >> So we should
> >> probably allow both cases anyway.
> >
> > Yes, I think both should work. I may be wrong (I may be missing
> > something), but it seems (far) easier to me to keep QEMU
> > as the client and add the "reconnect" option, given that
> > we already have all the stuff to make it work. In this way,
> >
> > - if the backend crashes/quits, you just need to restart the backend,
> >   and QEMU will retry the connection.
> >
> > - if QEMU crashes/quits, you just need to restart QEMU, and
> >   then QEMU will start a fresh connection.
> 
> It may all depend on the use case; neither option seems more obvious or
> easier than the other to me.
> 
> > However, if we let QEMU be the server, there are 2 major pieces of work
> > that need to be done on the backend side (take DPDK as an example, which
> > acts only as the vhost-user server so far):
> >
> > - Introducing a new API, or extending the current API, to let it
> >   connect to the server socket from QEMU.
> >
> > - If QEMU crashes/quits, we need to add code to the backend to keep
> >   reconnecting until the connection is established, which is something
> >   similar to the "reconnect" stuff in QEMU.
> > As you can see, it needs more effort (though it's not something
> > you care about :). And it duplicates work.
> 
> 
> Ah, I am looking at this from the qemu angle; the backend may need to adapt
> if it doesn't already handle both socket roles (client & server).

Agreed.

> >
> >>
> >> > - start DPDK vhost-switch example
> >> >
> >> > - start QEMU, which will connect to DPDK vhost-user
> >> >
> >> >   link is good now.
> >> >
> >> > - kill DPDK vhost-switch
> >> >
> >> >   link is broken at this stage
> >> >
> >> > - start DPDK vhost-switch again
> >> >
> >> >   you will find that the link is back again.
> >> >
> >> >
> >> > Does that make sense to you? If so, we may need no (or just
> >> > very few) changes at all to DPDK to get the reconnect working.
> >>
> >> The main issue with handling crashes (gone at any time) is that the
> >> backend may not have time to sync the used idx (at the least). It may
> >> already have processed incoming packets, so on reconnect, it may
> >> duplicate the receiving/dispatching work.
> >
> > That's not the case for the DPDK backend implementation: incoming packets
> > won't be delivered for processing before we update the used idx.
> 
> It could be OK on the incoming packet side (vm->dpdk), but I don't see how
> you could avoid packet loss on the other side (dpdk->vm), since the
> packets must be added to the queues before updating the used idx.

You could check the return value to see how many packets have been
successfully delivered. You could then retry in case of errors. However,
I am not quite sure such a disconnect error can be detected, though.
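
(I mean something along these lines, with a hypothetical enqueue()
standing in for the real burst API:)

    /* hand 'count' packets to the guest; enqueue() returns how many
     * actually made it into the ring (hypothetical API) */
    unsigned int sent = 0;
    while (sent < count) {
        unsigned int n = enqueue(dev, queue_id,
                                 pkts + sent, count - sent);
        if (n == 0) {
            if (link_is_down(dev))  /* hypothetical disconnect check */
                break;              /* backend is gone, stop trying */
            continue;               /* ring full, try again */
        }
        sent += n;
    }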

Thanks.

	--yliu

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-03-29 14:30 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-08 23:09 [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection marcandre.lureau
2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 01/14] vhost-user: Add ability to know vhost-user backend disconnection marcandre.lureau
2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 02/14] vhost-user: remove useless is_server field marcandre.lureau
2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 03/14] qemu-char: avoid potential double-free marcandre.lureau
2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 04/14] qemu-char: remove all msgfds on disconnect marcandre.lureau
2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 05/14] qemu-char: make tcp_chr_disconnect() reentrant-safe marcandre.lureau
2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 06/14] vhost-net: keep VIRTIO_NET_F_STATUS for vhost-user marcandre.lureau
2015-09-08 23:09 ` [Qemu-devel] [PATCH RFC 07/14] virtio-net: enable tx notification if up and vhost started marcandre.lureau
2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 08/14] vhost: add vhost_dev stop callback marcandre.lureau
2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 09/14] vhost-user: add vhost_user to hold the chr marcandre.lureau
2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 10/14] qemu-char: add qemu_chr_free() marcandre.lureau
2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 11/14] vhost-user: add slave-fd support marcandre.lureau
2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 12/14] vhost-user: add shutdown support marcandre.lureau
2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 13/14] qemu-char: Add qemu_chr_disconnect to close a fd accepted by listen fd marcandre.lureau
2015-09-08 23:10 ` [Qemu-devel] [PATCH RFC 14/14] test: start vhost-user reconnect test marcandre.lureau
2015-11-26 10:33 ` [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection Michael S. Tsirkin
2016-02-23 18:00   ` Marc-André Lureau
2016-03-02  6:44     ` Michael S. Tsirkin
2016-03-24  7:10   ` Yuanhan Liu
2016-03-25 18:00     ` Marc-André Lureau
2016-03-28  1:53       ` Tetsuya Mukawa
2016-03-28  2:06         ` Tetsuya Mukawa
2016-03-29  8:10       ` Yuanhan Liu
2016-03-29 10:52         ` Marc-André Lureau
2016-03-29 14:28           ` Yuanhan Liu
