* [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection
@ 2016-04-01 11:16 marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 01/18] tests: append i386 tests marcandre.lureau
                   ` (17 more replies)
  0 siblings, 18 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Hi

This is a follow-up to the previous RFC allowing the slave to request a
"managed" shutdown and reconnect later. A new optional communication
socket is added for the slave to make requests to the master (since the
vhost-user protocol isn't bidirectional).

The initial connection must be made before the guest is started for
the feature negotiation to complete. In "server" mode, qemu already
waits before starting the VM. In "client" mode with "reconnect",
however, you have to specify "wait" so that the initial connection is
made before the VM starts (in contrast with the "nowait"+backend
features proposed by Tetsuya [1]).

In order to do a clean shutdown, the slave should flush all pending
buffers, so that after VHOST_SET_VRING_BASE it is enough to resume.
The guest is made aware of the virtio-net disconnection through the
VIRTIO_NET_S_LINK_UP status bit, which is reflected as a link-down on
the NIC.
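
For illustration, a slave that wants a managed shutdown could send the
request on the slave channel roughly as follows (a minimal sketch, not
part of this series; it assumes "slave_fd" was received earlier via
VHOST_USER_SET_SLAVE_FD and reuses the VhostUserMsg layout from
docs/specs/vhost-user.txt):

    VhostUserMsg msg = {
        .request = VHOST_USER_SLAVE_SHUTDOWN,
        .flags   = VHOST_USER_VERSION,
        .size    = 0,
    };

    /* header-only request; the master replies with a u64 payload */
    if (write(slave_fd, &msg, VHOST_USER_HDR_SIZE) != VHOST_USER_HDR_SIZE) {
        /* handle error */
    }
    /* the master may send other requests (flush/sync) before replying */
    if (read(slave_fd, &msg, VHOST_USER_HDR_SIZE + sizeof(msg.payload.u64)) > 0
        && msg.payload.u64 == 0) {
        /* 0 means success: the slave may close all connections and quit */
    }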

RFCv2:
- rebased, added a few preliminary patches
- fix a few mistakes in shutdown message recv/send
- enforce "wait" when using "reconnect"
- save & restore features & check backend compatibility
- save & restore the ring state
- add shutdown support to vhost-user-bridge

Testing:
- the vhost-user-test has a simple reconnect test.
- vhost-user-bridge can be used to run the test interactively:

1) Run a slirp/vlan in a background process:
$ qemu -net none -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 -net user,vlan=0

2) Start vubr (it'll use the slirp/vlan process above by default):
$ tests/vhost-user-bridge

3) Start qemu with vhost-user / virtio-net:
$ qemu ... -chardev socket,id=char0,path=/tmp/vubr.sock,reconnect=1,wait -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1

4) Play in the guest, interrupt (ctrl-c) vubr, check the NIC link status, restart vubr, etc.

[1] In a previous series, "Add feature to start QEMU without
vhost-user backend", Tetsuya Mukawa proposed to allow the vhost-user
backend to disconnect and reconnect. However, Michael Tsirkin pointed
out that this can't be done without extra care: the guest and the
hypervisor don't know the slave's ring manipulation state, and pending
replies, for example, could be lost. He suggested resetting the guest
queues instead, but that requires kernel changes and may mean clearing
the rings and losing queued packets. That series also introduced a new
option to specify backend features on the qemu command line, to be
able to boot the VM without waiting for the backend. I consider this a
separate enhancement.

Marc-André Lureau (16):
  tests: append i386 tests
  char: lower reconnect error to trace event
  char: use a trace for when the char is waiting
  char: add wait support for reconnect
  vhost-user: check reconnect comes with wait
  vhost: add vhost_dev stop callback
  vhost-user: add vhost_user to hold the chr
  vhost-user: add slave-fd support
  vhost-user: add shutdown support
  vhost-user: disconnect on start failure
  vhost-net: do not crash if backend is not present
  vhost-net: save & restore vhost-user acked features
  vhost-net: save & restore vring enable state
  test: vubr check vring enable state
  test: start vhost-user reconnect test
  test: add shutdown support vubr test

Tetsuya Mukawa (2):
  vhost-user: add ability to know vhost-user backend disconnection
  qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd

 docs/specs/vhost-user.txt |  38 ++++++++++
 hw/net/vhost_net.c        |  53 ++++++++++++-
 hw/virtio/vhost-user.c    | 104 +++++++++++++++++++++++++-
 include/hw/virtio/vhost.h |   4 +
 include/net/net.h         |   1 +
 include/net/vhost-user.h  |   1 +
 include/net/vhost_net.h   |   3 +
 include/sysemu/char.h     |   7 ++
 net/vhost-user.c          |  44 ++++++++++-
 qemu-char.c               |  46 +++++++++---
 tests/Makefile            |   4 +-
 tests/vhost-user-bridge.c | 112 ++++++++++++++++++++++++++--
 tests/vhost-user-test.c   | 184 ++++++++++++++++++++++++++++++++++++++++++----
 trace-events              |   4 +
 14 files changed, 564 insertions(+), 41 deletions(-)

-- 
2.5.5

* [Qemu-devel] [PATCH 01/18] tests: append i386 tests
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 20:30   ` [Qemu-devel] [PATCH 01/18 for-2.6] " Eric Blake
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 02/18] char: lower reconnect error to trace event marcandre.lureau
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Do not overwrite x86-64 tests, re-enable vhost-user-test.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/Makefile b/tests/Makefile
index 45b9048..9293d49 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -222,7 +222,7 @@ endif
 check-qtest-i386-y += tests/test-netfilter$(EXESUF)
 check-qtest-i386-y += tests/test-filter-mirror$(EXESUF)
 check-qtest-i386-y += tests/test-filter-redirector$(EXESUF)
-check-qtest-x86_64-y = $(check-qtest-i386-y)
+check-qtest-x86_64-y += $(check-qtest-i386-y)
 gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
 gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
 check-qtest-mips-y = tests/endianness-test$(EXESUF)
-- 
2.5.5

* [Qemu-devel] [PATCH 02/18] char: lower reconnect error to trace event
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 01/18] tests: append i386 tests marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 03/18] char: use a trace for when the char is waiting marcandre.lureau
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

This message is informational rather than an error. Use a trace event
instead.

This allows using reconnect in tests without extra error messages.
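
For example (assuming a QEMU build with a suitable trace backend), the
event can still be observed with something like:

    $ echo char_socket_reconnect_error > /tmp/events
    $ qemu ... -trace events=/tmp/events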

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 qemu-char.c  | 4 ++--
 trace-events | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 270819a..a944ee0 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -40,6 +40,7 @@
 #include "io/channel-file.h"
 #include "io/channel-tls.h"
 #include "sysemu/replay.h"
+#include "trace.h"
 
 #include <zlib.h>
 
@@ -2639,8 +2640,7 @@ static void check_report_connect_error(CharDriverState *chr,
     TCPCharDriver *s = chr->opaque;
 
     if (!s->connect_err_reported) {
-        error_report("Unable to connect character device %s: %s",
-                     chr->label, error_get_pretty(err));
+        trace_char_socket_reconnect_error(chr->label, error_get_pretty(err));
         s->connect_err_reported = true;
     }
     qemu_chr_socket_restart_timer(chr);
diff --git a/trace-events b/trace-events
index 996a77f..19e0a05 100644
--- a/trace-events
+++ b/trace-events
@@ -689,6 +689,9 @@ grlib_apbuart_readl_unknown(uint64_t addr) "addr 0x%"PRIx64
 leon3_set_irq(int intno) "Set CPU IRQ %d"
 leon3_reset_irq(int intno) "Reset CPU IRQ %d"
 
+# qemu-char.c
+char_socket_reconnect_error(const char *label, const char *err) "Unable to connect character device %s: %s"
+
 # spice-qemu-char.c
 spice_vmc_write(ssize_t out, int len) "spice wrottn %zd of requested %d"
 spice_vmc_read(int bytes, int len) "spice read %d of requested %d"
-- 
2.5.5

* [Qemu-devel] [PATCH 03/18] char: use a trace for when the char is waiting
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 01/18] tests: append i386 tests marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 02/18] char: lower reconnect error to trace event marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 04/18] char: add wait support for reconnect marcandre.lureau
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Similar to the reconnect error, this is an informational message rather
than an error. Use a trace event instead.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 qemu-char.c  | 3 +--
 trace-events | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index a944ee0..8702931 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -4434,8 +4434,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
         }
         s->listen_ioc = sioc;
         if (is_waitconnect) {
-            fprintf(stderr, "QEMU waiting for connection on: %s\n",
-                    chr->filename);
+            trace_char_socket_waiting(chr->filename);
             tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr);
         }
         qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), false, NULL);
diff --git a/trace-events b/trace-events
index 19e0a05..e59c0f5 100644
--- a/trace-events
+++ b/trace-events
@@ -691,6 +691,7 @@ leon3_reset_irq(int intno) "Reset CPU IRQ %d"
 
 # qemu-char.c
 char_socket_reconnect_error(const char *label, const char *err) "Unable to connect character device %s: %s"
+char_socket_waiting(const char *filename) "QEMU waiting for connection on: %s"
 
 # spice-qemu-char.c
 spice_vmc_write(ssize_t out, int len) "spice wrottn %zd of requested %d"
-- 
2.5.5

* [Qemu-devel] [PATCH 04/18] char: add wait support for reconnect
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (2 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 03/18] char: use a trace for when the char is waiting marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-28  4:33   ` Yuanhan Liu
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 05/18] vhost-user: check reconnect comes with wait marcandre.lureau
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Until now, 'wait' was only used for listening sockets. However, it can
also be useful for the 'reconnect' socket kind, where the first
connection must succeed before continuing.

This allows, for instance (with other support patches), having
vhost-user wait for an initial connection to set up vhost-net, before
eventually accepting new connections.
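
For example (as used in the cover letter), a client chardev that must
connect once before the VM starts and then keeps reconnecting:

    -chardev socket,id=char0,path=/tmp/vubr.sock,reconnect=1,wait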

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 qemu-char.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 8702931..3e25c08 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -3659,7 +3659,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
                                   Error **errp)
 {
     bool is_listen      = qemu_opt_get_bool(opts, "server", false);
-    bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true);
+    bool is_wait        = qemu_opt_get_bool(opts, "wait", is_listen);
     bool is_telnet      = qemu_opt_get_bool(opts, "telnet", false);
     bool do_nodelay     = !qemu_opt_get_bool(opts, "delay", true);
     int64_t reconnect   = qemu_opt_get_number(opts, "reconnect", 0);
@@ -3696,7 +3696,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
     sock->has_telnet = true;
     sock->telnet = is_telnet;
     sock->has_wait = true;
-    sock->wait = is_waitconnect;
+    sock->wait = is_wait;
     sock->has_reconnect = true;
     sock->reconnect = reconnect;
     sock->tls_creds = g_strdup(tls_creds);
@@ -4350,7 +4350,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
     bool do_nodelay     = sock->has_nodelay ? sock->nodelay : false;
     bool is_listen      = sock->has_server  ? sock->server  : true;
     bool is_telnet      = sock->has_telnet  ? sock->telnet  : false;
-    bool is_waitconnect = sock->has_wait    ? sock->wait    : false;
+    bool is_wait        = sock->has_wait    ? sock->wait    : false;
     int64_t reconnect   = sock->has_reconnect ? sock->reconnect : 0;
     ChardevCommon *common = qapi_ChardevSocket_base(sock);
     QIOChannelSocket *sioc = NULL;
@@ -4424,7 +4424,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
     }
 
     sioc = qio_channel_socket_new();
-    if (s->reconnect_time) {
+    if (s->reconnect_time && !is_wait) {
         qio_channel_socket_connect_async(sioc, s->addr,
                                          qemu_chr_socket_connected,
                                          chr, NULL);
@@ -4433,7 +4433,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
             goto error;
         }
         s->listen_ioc = sioc;
-        if (is_waitconnect) {
+        if (is_wait) {
             trace_char_socket_waiting(chr->filename);
             tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr);
         }
@@ -4443,9 +4443,24 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
                 QIO_CHANNEL(s->listen_ioc), G_IO_IN, tcp_chr_accept, chr, NULL);
         }
     } else {
-        if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) {
-            goto error;
-        }
+        do {
+            Error *err = NULL;
+
+            if (qio_channel_socket_connect_sync(sioc, s->addr, &err) < 0) {
+                if (reconnect) {
+                    trace_char_socket_reconnect_error(chr->label,
+                                                      error_get_pretty(err));
+                    error_free(err);
+                    continue;
+                } else {
+                    error_propagate(errp, err);
+                    goto error;
+                }
+            } else {
+                break;
+            }
+        } while (true);
+
         tcp_chr_new_client(chr, sioc);
         object_unref(OBJECT(sioc));
     }
-- 
2.5.5

* [Qemu-devel] [PATCH 05/18] vhost-user: check reconnect comes with wait
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (3 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 04/18] char: add wait support for reconnect marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 06/18] vhost-user: add ability to know vhost-user backend disconnection marcandre.lureau
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

If the client socket has the 'reconnect' option, make sure the 'wait'
option is also used. That way, an initial connection is ensured before
the VM starts and the virtio device is configured.
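
For example, a configuration with 'reconnect' but without 'wait':

    -chardev socket,id=char0,path=/tmp/vubr.sock,reconnect=1
    -netdev type=vhost-user,id=mynet1,chardev=char0

is now rejected with: chardev "char0" must also 'wait' with 'reconnect'.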

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 net/vhost-user.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 1b9e73a..9007d0b 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -27,6 +27,8 @@ typedef struct VhostUserState {
 typedef struct VhostUserChardevProps {
     bool is_socket;
     bool is_unix;
+    bool is_reconnect;
+    bool is_wait;
 } VhostUserChardevProps;
 
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
@@ -239,6 +241,10 @@ static int net_vhost_chardev_opts(void *opaque,
     } else if (strcmp(name, "path") == 0) {
         props->is_unix = true;
     } else if (strcmp(name, "server") == 0) {
+    } else if (strcmp(name, "reconnect") == 0) {
+        props->is_reconnect = true;
+    } else if (strcmp(name, "wait") == 0) {
+        props->is_wait = true;
     } else {
         error_setg(errp,
                    "vhost-user does not support a chardev with option %s=%s",
@@ -271,6 +277,12 @@ static CharDriverState *net_vhost_parse_chardev(
         return NULL;
     }
 
+    if (props.is_reconnect && !props.is_wait) {
+        error_setg(errp, "chardev \"%s\" must also 'wait' with 'reconnect'",
+                   opts->chardev);
+        return NULL;
+    }
+
     qemu_chr_fe_claim_no_fail(chr);
 
     return chr;
-- 
2.5.5

* [Qemu-devel] [PATCH 06/18] vhost-user: add ability to know vhost-user backend disconnection
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (4 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 05/18] vhost-user: check reconnect comes with wait marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 07/18] vhost: add vhost_dev stop callback marcandre.lureau
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tetsuya Mukawa, Ilya Maximets, Yuanhan Liu, jonshin, Michael S. Tsirkin

From: Tetsuya Mukawa <mukawa@igel.co.jp>

Currently, QEMU cannot detect vhost-user backend disconnection. This
patch adds the ability to detect it: a watcher is added for the
G_IO_HUP event, and when it triggers, the disconnected socket is read
so that a CHR_EVENT_CLOSED event is raised.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 net/vhost-user.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 9007d0b..3dae53c 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -22,6 +22,7 @@ typedef struct VhostUserState {
     NetClientState nc;
     CharDriverState *chr;
     VHostNetState *vhost_net;
+    int watch;
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -169,6 +170,20 @@ static NetClientInfo net_vhost_user_info = {
         .has_ufo = vhost_user_has_ufo,
 };
 
+static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
+                                           void *opaque)
+{
+    VhostUserState *s = opaque;
+    uint8_t buf[1];
+
+    /* We don't actually want to read anything, but CHR_EVENT_CLOSED will be
+     * raised as a side-effect of the read.
+     */
+    qemu_chr_fe_read_all(s->chr, buf, sizeof(buf));
+
+    return FALSE;
+}
+
 static void net_vhost_user_event(void *opaque, int event)
 {
     const char *name = opaque;
@@ -186,6 +201,8 @@ static void net_vhost_user_event(void *opaque, int event)
     trace_vhost_user_event(s->chr->label, event);
     switch (event) {
     case CHR_EVENT_OPENED:
+        s->watch = qemu_chr_fe_add_watch(s->chr, G_IO_HUP,
+                                         net_vhost_user_watch, s);
         if (vhost_user_start(queues, ncs) < 0) {
             exit(1);
         }
@@ -194,6 +211,8 @@ static void net_vhost_user_event(void *opaque, int event)
     case CHR_EVENT_CLOSED:
         qmp_set_link(name, false, &err);
         vhost_user_stop(queues, ncs);
+        g_source_remove(s->watch);
+        s->watch = 0;
         break;
     }
 
-- 
2.5.5

* [Qemu-devel] [PATCH 07/18] vhost: add vhost_dev stop callback
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (5 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 06/18] vhost-user: add ability to know vhost-user backend disconnection marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 08/18] vhost-user: add vhost_user to hold the chr marcandre.lureau
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

A vhost backend may want to stop the device, for example if it wants to
restart itself (for vhost-net this translates to a link down).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c        | 14 ++++++++++++++
 include/hw/virtio/vhost.h |  4 ++++
 2 files changed, 18 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6e1032f..1e4710d 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -131,6 +131,18 @@ static int vhost_net_get_fd(NetClientState *backend)
     }
 }
 
+static void vhost_net_backend_stop(struct vhost_dev *dev)
+{
+    struct vhost_net *net = container_of(dev, struct vhost_net, dev);
+    NetClientState *nc = net->nc;
+    NetClientState *peer = nc->peer;
+
+    peer->link_down = 1;
+    if (peer->info->link_status_changed) {
+        peer->info->link_status_changed(peer);
+    }
+}
+
 struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
     int r;
@@ -165,6 +177,8 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
         net->dev.vq_index = net->nc->queue_index * net->dev.nvqs;
     }
 
+    net->dev.stop = vhost_net_backend_stop;
+
     r = vhost_dev_init(&net->dev, options->opaque,
                        options->backend_type);
     if (r < 0) {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index b60d758..859be64 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -35,6 +35,8 @@ struct vhost_log {
     vhost_log_chunk_t *log;
 };
 
+typedef void (*vhost_stop)(struct vhost_dev *dev);
+
 struct vhost_memory;
 struct vhost_dev {
     MemoryListener memory_listener;
@@ -61,6 +63,8 @@ struct vhost_dev {
     void *opaque;
     struct vhost_log *log;
     QLIST_ENTRY(vhost_dev) entry;
+    /* backend request to stop */
+    vhost_stop stop;
 };
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
-- 
2.5.5

* [Qemu-devel] [PATCH 08/18] vhost-user: add vhost_user to hold the chr
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (6 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 07/18] vhost: add vhost_dev stop callback marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 09/18] qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd marcandre.lureau
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

The next patches will add more fields to the structure.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/virtio/vhost-user.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 5914e85..02ac1fc 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -108,6 +108,10 @@ static VhostUserMsg m __attribute__ ((unused));
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION    (0x1)
 
+struct vhost_user {
+    CharDriverState *chr;
+};
+
 static bool ioeventfd_enabled(void)
 {
     return kvm_enabled() && kvm_eventfds_enabled();
@@ -115,7 +119,8 @@ static bool ioeventfd_enabled(void)
 
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
-    CharDriverState *chr = dev->opaque;
+    struct vhost_user *u = dev->opaque;
+    CharDriverState *chr = u->chr;
     uint8_t *p = (uint8_t *) msg;
     int r, size = VHOST_USER_HDR_SIZE;
 
@@ -176,7 +181,8 @@ static bool vhost_user_one_time_request(VhostUserRequest request)
 static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
                             int *fds, int fd_num)
 {
-    CharDriverState *chr = dev->opaque;
+    struct vhost_user *u = dev->opaque;
+    CharDriverState *chr = u->chr;
     int size = VHOST_USER_HDR_SIZE + msg->size;
 
     /*
@@ -510,11 +516,14 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
     uint64_t features;
+    struct vhost_user *u;
     int err;
 
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
-    dev->opaque = opaque;
+    u = g_new0(struct vhost_user, 1);
+    u->chr = opaque;
+    dev->opaque = u;
 
     err = vhost_user_get_features(dev, &features);
     if (err < 0) {
@@ -559,8 +568,12 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 
 static int vhost_user_cleanup(struct vhost_dev *dev)
 {
+    struct vhost_user *u;
+
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
+    u = dev->opaque;
+    g_free(u);
     dev->opaque = 0;
 
     return 0;
-- 
2.5.5

* [Qemu-devel] [PATCH 09/18] qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (7 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 08/18] vhost-user: add vhost_user to hold the chr marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 10/18] vhost-user: add slave-fd support marcandre.lureau
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tetsuya Mukawa, Ilya Maximets, Yuanhan Liu, jonshin, Michael S. Tsirkin

From: Tetsuya Mukawa <mukawa@igel.co.jp>

This patch introduces qemu_chr_disconnect(). The function is used to
close an fd accepted from a listening fd. We already have
qemu_chr_delete(), but it closes not only the accepted fd but also the
listening fd. The new function is used when we still want to keep the
listening fd.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 include/sysemu/char.h | 7 +++++++
 qemu-char.c           | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index 307fd8f..2c39987 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -75,6 +75,7 @@ struct CharDriverState {
     IOReadHandler *chr_read;
     void *handler_opaque;
     void (*chr_close)(struct CharDriverState *chr);
+    void (*chr_disconnect)(struct CharDriverState *chr);
     void (*chr_accept_input)(struct CharDriverState *chr);
     void (*chr_set_echo)(struct CharDriverState *chr, bool echo);
     void (*chr_set_fe_open)(struct CharDriverState *chr, int fe_open);
@@ -143,6 +144,12 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon *backend);
  */
 CharDriverState *qemu_chr_new(const char *label, const char *filename,
                               void (*init)(struct CharDriverState *s));
+/**
+ * @qemu_chr_disconnect:
+ *
 + * Close a fd accepted by the character backend.
+ */
+void qemu_chr_disconnect(CharDriverState *chr);
 
 /**
  * @qemu_chr_new_noreplay:
diff --git a/qemu-char.c b/qemu-char.c
index 3e25c08..e249b0a 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -4011,6 +4011,13 @@ void qemu_chr_fe_release(CharDriverState *s)
     s->avail_connections++;
 }
 
+void qemu_chr_disconnect(CharDriverState *chr)
+{
+    if (chr->chr_disconnect) {
+        chr->chr_disconnect(chr);
+    }
+}
+
 static void qemu_chr_free_common(CharDriverState *chr)
 {
     g_free(chr->filename);
@@ -4404,6 +4411,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
     chr->chr_write = tcp_chr_write;
     chr->chr_sync_read = tcp_chr_sync_read;
     chr->chr_close = tcp_chr_close;
+    chr->chr_disconnect = tcp_chr_disconnect;
     chr->get_msgfds = tcp_get_msgfds;
     chr->set_msgfds = tcp_set_msgfds;
     chr->chr_add_client = tcp_chr_add_client;
-- 
2.5.5

* [Qemu-devel] [PATCH 10/18] vhost-user: add slave-fd support
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (8 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 09/18] qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 20:33   ` Eric Blake
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support marcandre.lureau
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Learn to give a socket to the slave to let it make requests to the
master.
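
On the slave side, the fd arrives as SCM_RIGHTS ancillary data along
with the VHOST_USER_SET_SLAVE_FD message; extracting it could look
roughly like this (a sketch only, not part of this patch, assuming
"sock" is the vhost-user socket and "msg" a VhostUserMsg):

    struct iovec iov = { .iov_base = &msg, .iov_len = VHOST_USER_HDR_SIZE };
    char control[CMSG_SPACE(sizeof(int))];
    struct msghdr mh = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = control, .msg_controllen = sizeof(control),
    };
    struct cmsghdr *cmsg;
    int slave_fd = -1;

    if (recvmsg(sock, &mh, 0) > 0) {
        for (cmsg = CMSG_FIRSTHDR(&mh); cmsg; cmsg = CMSG_NXTHDR(&mh, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET &&
                cmsg->cmsg_type == SCM_RIGHTS) {
                memcpy(&slave_fd, CMSG_DATA(cmsg), sizeof(int));
            }
        }
    }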

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 docs/specs/vhost-user.txt | 23 ++++++++++++++++
 hw/virtio/vhost-user.c    | 69 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 0312d40..8a635fa 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -248,12 +248,25 @@ Once the source has finished migration, rings will be stopped by
 the source. No further update must be done before rings are
 restarted.
 
+Slave communication
+-------------------
+
+An optional communication channel is provided if the slave
+declares VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL feature, to allow
+the slave to make requests to the master.
+
+The fd is provided via VHOST_USER_SET_SLAVE_FD ancillary data.
+
+A slave may then send VHOST_USER_SLAVE_* messages to the master
+using this fd communication channel.
+
 Protocol features
 -----------------
 
 #define VHOST_USER_PROTOCOL_F_MQ             0
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD      1
 #define VHOST_USER_PROTOCOL_F_RARP           2
+#define VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL  3
 
 Message types
 -------------
@@ -464,3 +477,13 @@ Message types
       is present in VHOST_USER_GET_PROTOCOL_FEATURES.
       The first 6 bytes of the payload contain the mac address of the guest to
       allow the vhost user backend to construct and broadcast the fake RARP.
+
+ * VHOST_USER_SET_SLAVE_FD
+      Id: 20
+      Equivalent ioctl: N/A
+      Master payload: N/A
+
 +      Set the file descriptor for the slave to make VHOST_USER_SLAVE_*
 +      requests to the master. It is passed in the ancillary data.
+      This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
+      feature is available.
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 02ac1fc..890c1bd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -32,6 +32,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_MQ = 0,
     VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
     VHOST_USER_PROTOCOL_F_RARP = 2,
+    VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL = 3,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -59,9 +60,16 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_QUEUE_NUM = 17,
     VHOST_USER_SET_VRING_ENABLE = 18,
     VHOST_USER_SEND_RARP = 19,
+    VHOST_USER_SET_SLAVE_FD = 20,
+
     VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+    VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_MAX
+}  VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
     uint64_t guest_phys_addr;
     uint64_t memory_size;
@@ -110,6 +118,7 @@ static VhostUserMsg m __attribute__ ((unused));
 
 struct vhost_user {
     CharDriverState *chr;
+    int slave_fd;
 };
 
 static bool ioeventfd_enabled(void)
@@ -513,6 +522,55 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
     return 0;
 }
 
+static void slave_read(void *opaque)
+{
+    struct vhost_dev *dev = opaque;
+    struct vhost_user *u = dev->opaque;
+    VhostUserMsg msg = { 0, };
+    int size;
+
+    size = read(u->slave_fd, &msg, VHOST_USER_HDR_SIZE);
+    if (size != VHOST_USER_HDR_SIZE) {
+        error_report("Failed to read from slave.");
+        qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
+        close(u->slave_fd);
+        u->slave_fd = -1;
+        return;
+    }
+
+    switch (msg.request) {
+    default:
+        error_report("Received unexpected msg type.");
+    }
+}
+
+static int vhost_setup_slave_channel(struct vhost_dev *dev)
+{
+    VhostUserMsg msg = {
+        .request = VHOST_USER_SET_SLAVE_FD,
+        .flags = VHOST_USER_VERSION,
+    };
+    struct vhost_user *u = dev->opaque;
+    int sv[2];
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL)) {
+        return 0;
+    }
+
+    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+        error_report("socketpair() failed");
+        return -1;
+    }
+
+    u->slave_fd = sv[0];
+    qemu_set_fd_handler(u->slave_fd, slave_read, NULL, dev);
+
+    vhost_user_write(dev, &msg, &sv[1], 1);
+
+    return 0;
+}
+
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
     uint64_t features;
@@ -523,6 +581,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 
     u = g_new0(struct vhost_user, 1);
     u->chr = opaque;
+    u->slave_fd = -1;
     dev->opaque = u;
 
     err = vhost_user_get_features(dev, &features);
@@ -563,6 +622,12 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
                    "VHOST_USER_PROTOCOL_F_LOG_SHMFD feature.");
     }
 
+
+    err = vhost_setup_slave_channel(dev);
+    if (err < 0) {
+        return err;
+    }
+
     return 0;
 }
 
@@ -573,6 +638,10 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
     u = dev->opaque;
+    if (u->slave_fd >= 0) {
+        close(u->slave_fd);
+        u->slave_fd = -1;
+    }
     g_free(u);
     dev->opaque = 0;
 
-- 
2.5.5

* [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (9 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 10/18] vhost-user: add slave-fd support marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-13  2:49   ` Yuanhan Liu
  2016-04-28  5:23   ` Yuanhan Liu
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 12/18] vhost-user: disconnect on start failure marcandre.lureau
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>
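
Handle the new VHOST_USER_SLAVE_SHUTDOWN request on the slave channel:
when the slave asks to shut down, invoke the vhost_dev stop callback
(for vhost-net this brings the link down) and reply 0 on success, or
non-zero to cancel the request, as documented in the new
vhost-user.txt section.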

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 docs/specs/vhost-user.txt | 15 +++++++++++++++
 hw/virtio/vhost-user.c    | 16 ++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 8a635fa..60d6d13 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -487,3 +487,18 @@ Message types
       request to the master. It is passed in the ancillary data.
       This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
       feature is available.
+
+Slave message types
+-------------------
+
+ * VHOST_USER_SLAVE_SHUTDOWN:
+      Id: 1
+      Master payload: N/A
+      Slave payload: u64
+
 +      Request the master to shut down the slave. A 0 reply is for
+      success, in which case the slave may close all connections
+      immediately and quit. A non-zero reply cancels the request.
+
+      Before a reply comes, the master may make other requests in
+      order to flush or sync state.
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 890c1bd..f91baee 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -67,6 +67,8 @@ typedef enum VhostUserRequest {
 
 typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_SHUTDOWN = 1,
+
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -539,6 +541,20 @@ static void slave_read(void *opaque)
     }
 
     switch (msg.request) {
+    case VHOST_USER_SLAVE_SHUTDOWN: {
+        uint64_t success = 1; /* 0 is for success */
+        if (dev->stop) {
+            dev->stop(dev);
+            success = 0;
+        }
+        msg.payload.u64 = success;
+        msg.size = sizeof(msg.payload.u64);
+        size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0);
+        if (size != VHOST_USER_HDR_SIZE + msg.size) {
+            error_report("Failed to write reply.");
+        }
+        break;
+    }
     default:
         error_report("Received unexpected msg type.");
     }
-- 
2.5.5

* [Qemu-devel] [PATCH 12/18] vhost-user: disconnect on start failure
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (10 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 13/18] vhost-net: do not crash if backend is not present marcandre.lureau
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

If the backend failed to start (for example, feature negotiation
failed), do not exit; disconnect the char device instead.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 net/vhost-user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 3dae53c..54849e9 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -204,7 +204,8 @@ static void net_vhost_user_event(void *opaque, int event)
         s->watch = qemu_chr_fe_add_watch(s->chr, G_IO_HUP,
                                          net_vhost_user_watch, s);
         if (vhost_user_start(queues, ncs) < 0) {
-            exit(1);
+            qemu_chr_disconnect(s->chr);
+            return;
         }
         qmp_set_link(name, true, &err);
         break;
-- 
2.5.5

* [Qemu-devel] [PATCH 13/18] vhost-net: do not crash if backend is not present
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (11 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 12/18] vhost-user: disconnect on start failure marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 14/18] vhost-net: save & restore vhost-user acked features marcandre.lureau
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Do not crash when the backend is not present while enabling the ring. A
following patch will save the enable state so it can be restored once
the backend is started.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 1e4710d..6c40c4e 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -415,8 +415,13 @@ VHostNetState *get_vhost_net(NetClientState *nc)
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
     VHostNetState *net = get_vhost_net(nc);
-    const VhostOps *vhost_ops = net->dev.vhost_ops;
+    const VhostOps *vhost_ops;
+
+    if (!net) {
+        return 0;
+    }
 
+    vhost_ops = net->dev.vhost_ops;
     if (vhost_ops->vhost_set_vring_enable) {
         return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
     }
-- 
2.5.5

* [Qemu-devel] [PATCH 14/18] vhost-net: save & restore vhost-user acked features
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (12 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 13/18] vhost-net: do not crash if backend is not present marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 15/18] vhost-net: save & restore vring enable state marcandre.lureau
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without a device reset.

To handle reconnection of the vhost-user backend, ensure the same set
of features is provided, and reuse the already acked features.
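
As an illustration (with made-up feature values, not taken from the
code), the compatibility check amounts to:

    uint64_t dev_features   = 0x40008000; /* offered by the reconnected backend */
    uint64_t acked_features = 0x40018000; /* previously acked by the guest */

    if (~dev_features & acked_features) {
        /* bit 16 was acked before but is now missing: fail init rather
         * than silently renegotiating */
    }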

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c       | 21 ++++++++++++++++++++-
 include/net/vhost-user.h |  1 +
 include/net/vhost_net.h  |  3 +++
 net/vhost-user.c         | 10 ++++++++++
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6c40c4e..ee39e9c 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -120,6 +120,11 @@ uint64_t vhost_net_get_max_queues(VHostNetState *net)
     return net->dev.max_queues;
 }
 
+uint64_t vhost_net_get_acked_features(VHostNetState *net)
+{
+    return net->dev.acked_features;
+}
+
 static int vhost_net_get_fd(NetClientState *backend)
 {
     switch (backend->info->type) {
@@ -148,6 +153,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
     int r;
     bool backend_kernel = options->backend_type == VHOST_BACKEND_TYPE_KERNEL;
     struct vhost_net *net = g_malloc(sizeof *net);
+    uint64_t features = 0;
 
     if (!options->net_backend) {
         fprintf(stderr, "vhost-net requires net backend to be setup\n");
@@ -197,8 +203,21 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
             goto fail;
         }
     }
+
     /* Set sane init value. Override when guest acks. */
-    vhost_net_ack_features(net, 0);
+    if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
+        features = vhost_user_get_acked_features(net->nc);
+        if (~net->dev.features & features) {
+            fprintf(stderr, "vhost lacks feature mask %" PRIu64
+                    " for backend\n",
+                    (uint64_t)(~net->dev.features & features));
+            vhost_dev_cleanup(&net->dev);
+            goto fail;
+        }
+    }
+
+    vhost_net_ack_features(net, features);
+
     return net;
 fail:
     g_free(net);
diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
index 85109f6..efae35d 100644
--- a/include/net/vhost-user.h
+++ b/include/net/vhost-user.h
@@ -13,5 +13,6 @@
 
 struct vhost_net;
 struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
+uint64_t vhost_user_get_acked_features(NetClientState *nc);
 
 #endif /* VHOST_USER_H_ */
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 3389b41..0bd4877 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -31,4 +31,7 @@ int vhost_net_notify_migration_done(VHostNetState *net, char* mac_addr);
 VHostNetState *get_vhost_net(NetClientState *nc);
 
 int vhost_set_vring_enable(NetClientState * nc, int enable);
+
+uint64_t vhost_net_get_acked_features(VHostNetState *net);
+
 #endif
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 54849e9..ae63e20 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -23,6 +23,7 @@ typedef struct VhostUserState {
     CharDriverState *chr;
     VHostNetState *vhost_net;
     int watch;
+    uint64_t acked_features;
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -39,6 +40,13 @@ VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
     return s->vhost_net;
 }
 
+uint64_t vhost_user_get_acked_features(NetClientState *nc)
+{
+    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
+    return s->acked_features;
+}
+
 static int vhost_user_running(VhostUserState *s)
 {
     return (s->vhost_net) ? 1 : 0;
@@ -58,6 +66,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
         }
 
         if (s->vhost_net) {
+            /* save acked features */
+            s->acked_features = vhost_net_get_acked_features(s->vhost_net);
             vhost_net_cleanup(s->vhost_net);
             s->vhost_net = NULL;
         }
-- 
2.5.5

* [Qemu-devel] [PATCH 15/18] vhost-net: save & restore vring enable state
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (13 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 14/18] vhost-net: save & restore vhost-user acked features marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 16/18] test: vubr check " marcandre.lureau
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

A driver may change the vring enable state at run time, but the
vhost-user backend may not be present at that moment (a contrived
example is when the backend is disconnected and the device is
reconfigured after driver rebinding).

Restore the vring enable state when the vhost-user backend is started,
so it can process the rings.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 hw/net/vhost_net.c | 11 +++++++++++
 include/net/net.h  |  1 +
 2 files changed, 12 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ee39e9c..a56a794 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -343,6 +343,15 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         if (r < 0) {
             goto err_start;
         }
+
+        if (ncs[i].peer->vring_enable) {
+            /* restore vring enable state */
+            r = vhost_set_vring_enable(ncs[i].peer, ncs[i].peer->vring_enable);
+
+            if (r < 0) {
+                goto err_start;
+            }
+        }
     }
 
     return 0;
@@ -436,6 +445,8 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
     VHostNetState *net = get_vhost_net(nc);
     const VhostOps *vhost_ops;
 
+    nc->vring_enable = enable;
+
     if (!net) {
         return 0;
     }
diff --git a/include/net/net.h b/include/net/net.h
index 73e4c46..2c4b23a 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -92,6 +92,7 @@ struct NetClientState {
     NetClientDestructor *destructor;
     unsigned int queue_index;
     unsigned rxfilter_notify_enabled:1;
+    int vring_enable;
     QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
 };
 
-- 
2.5.5

* [Qemu-devel] [PATCH 16/18] test: vubr check vring enable state
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (14 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 15/18] vhost-net: save & restore vring enable state marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 17/18] test: start vhost-user reconnect test marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 18/18] test: add shutdown support vubr test marcandre.lureau
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

The rings shouldn't be processed unless previously enabled.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/vhost-user-bridge.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index 0779ba2..42450a6 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -710,7 +710,8 @@ vubr_backend_recv_cb(int sock, void *ctx)
     int buflen = sizeof(buf);
     int len;
 
-    if (!dev->ready) {
+    if (!dev->ready || !rx_vq->enable) {
+        DPRINT("\n NOT READY: dev: %d, rx: %d\n", dev->ready, rx_vq->enable);
         return;
     }
 
@@ -747,8 +748,11 @@ vubr_kick_cb(int sock, void *ctx)
     if (rc == -1) {
         vubr_die("eventfd_read()");
     } else {
-        DPRINT("Got kick_data: %016"PRIx64"\n", kick_data);
-        vubr_process_avail(dev, &dev->vq[1]);
+        DPRINT("Got kick_data: %016"PRIx64", vq enabled: %d\n",
+               kick_data, dev->vq[1].enable);
+        if (dev->vq[1].enable) {
+            vubr_process_avail(dev, &dev->vq[1]);
+        }
     }
 }
 
-- 
2.5.5

* [Qemu-devel] [PATCH 17/18] test: start vhost-user reconnect test
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (15 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 16/18] test: vubr check " marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 18/18] test: add shutdown support vubr test marcandre.lureau
  17 siblings, 0 replies; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Check that the SLAVE_FD & SHUTDOWN messages work and that the master
asked for the ring read status.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/Makefile          |   2 +-
 tests/vhost-user-test.c | 184 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 170 insertions(+), 16 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index 9293d49..8e4807e 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -577,7 +577,7 @@ tests/usb-hcd-uhci-test$(EXESUF): tests/usb-hcd-uhci-test.o $(libqos-usb-obj-y)
 tests/usb-hcd-ehci-test$(EXESUF): tests/usb-hcd-ehci-test.o $(libqos-usb-obj-y)
 tests/usb-hcd-xhci-test$(EXESUF): tests/usb-hcd-xhci-test.o $(libqos-usb-obj-y)
 tests/pc-cpu-test$(EXESUF): tests/pc-cpu-test.o
-tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o $(qtest-obj-y) $(test-io-obj-y)
+tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o $(qtest-obj-y) $(test-io-obj-y) $(libqos-pc-obj-y) $(libqos-virtio-obj-y)
 tests/qemu-iotests/socket_scm_helper$(EXESUF): tests/qemu-iotests/socket_scm_helper.o
 tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o $(test-util-obj-y)
 tests/test-write-threshold$(EXESUF): tests/test-write-threshold.o $(test-block-obj-y)
diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index 6961596..17ebfd1 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -21,6 +21,12 @@
 #include <sys/mman.h>
 #include <sys/vfs.h>
 #include <qemu/sockets.h>
+#include "libqos/pci-pc.h"
+#include "libqos/virtio.h"
+#include "libqos/virtio-pci.h"
+#include "libqos/malloc.h"
+#include "libqos/malloc-pc.h"
+#include "libqos/malloc-generic.h"
 
 /* GLIB version compatibility flags */
 #if !GLIB_CHECK_VERSION(2, 26, 0)
@@ -34,7 +40,7 @@
 #define QEMU_CMD_ACCEL  " -machine accel=tcg"
 #define QEMU_CMD_MEM    " -m %d -object memory-backend-file,id=mem,size=%dM,"\
                         "mem-path=%s,share=on -numa node,memdev=mem"
-#define QEMU_CMD_CHR    " -chardev socket,id=%s,path=%s"
+#define QEMU_CMD_CHR    " -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=%s,vhostforce"
 #define QEMU_CMD_NET    " -device virtio-net-pci,netdev=net0,romfile=./pc-bios/pxe-virtio.rom"
 
@@ -49,6 +55,7 @@
 
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
+#define VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL 3
 
 #define VHOST_LOG_PAGE 0x1000
 
@@ -71,9 +78,16 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_PROTOCOL_FEATURES = 15,
     VHOST_USER_SET_PROTOCOL_FEATURES = 16,
     VHOST_USER_SET_VRING_ENABLE = 18,
+    VHOST_USER_SET_SLAVE_FD = 20,
     VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+    VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_SHUTDOWN = 1,
+    VHOST_USER_SLAVE_MAX
+} VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
     uint64_t guest_phys_addr;
     uint64_t memory_size;
@@ -119,6 +133,8 @@ static VhostUserMsg m __attribute__ ((unused));
 
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION    (0x1)
+
+#define VIRTIO_QUEUE_MAX 1024
 /*****************************************************************************/
 
 typedef struct TestServer {
@@ -133,6 +149,8 @@ typedef struct TestServer {
     GCond data_cond;
     int log_fd;
     uint64_t rings;
+    int slave_fd;
+    uint16_t set_base[VIRTIO_QUEUE_MAX];
 } TestServer;
 
 #if !GLIB_CHECK_VERSION(2, 32, 0)
@@ -269,7 +287,8 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
         /* send back features to qemu */
         msg.flags |= VHOST_USER_REPLY_MASK;
         msg.size = sizeof(m.payload.u64);
-        msg.payload.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
+        msg.payload.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
+            1 << VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL;
         p = (uint8_t *) &msg;
         qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
         break;
@@ -324,6 +343,12 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
     case VHOST_USER_SET_VRING_BASE:
         assert(msg.payload.state.index < 2);
         s->rings |= 0x1ULL << msg.payload.state.index;
+        s->set_base[msg.payload.state.index] = msg.payload.state.num;
+        break;
+
+    case VHOST_USER_SET_SLAVE_FD:
+        qemu_chr_fe_get_msgfds(chr, &s->slave_fd, 1);
+        g_cond_signal(&s->data_cond);
         break;
 
     default:
@@ -360,7 +385,7 @@ static const char *init_hugepagefs(const char *path)
     return path;
 }
 
-static TestServer *test_server_new(const gchar *name)
+static TestServer *test_server_new(const gchar *name, const gchar *chr_opt)
 {
     TestServer *server = g_new0(TestServer, 1);
     gchar *chr_path;
@@ -368,7 +393,8 @@ static TestServer *test_server_new(const gchar *name)
     server->socket_path = g_strdup_printf("%s/%s.sock", tmpfs, name);
     server->mig_path = g_strdup_printf("%s/%s.mig", tmpfs, name);
 
-    chr_path = g_strdup_printf("unix:%s,server,nowait", server->socket_path);
+    chr_path = g_strdup_printf(chr_opt ?: "unix:%s,server,nowait",
+                               server->socket_path);
     server->chr_name = g_strdup_printf("chr-%s", name);
     server->chr = qemu_chr_new(server->chr_name, chr_path, NULL);
     g_free(chr_path);
@@ -379,17 +405,18 @@ static TestServer *test_server_new(const gchar *name)
     g_cond_init(&server->data_cond);
 
     server->log_fd = -1;
+    server->slave_fd = -1;
 
     return server;
 }
 
-#define GET_QEMU_CMD(s)                                                        \
-    g_strdup_printf(QEMU_CMD, 512, 512, (root), (s)->chr_name,                 \
-                    (s)->socket_path, (s)->chr_name)
+#define GET_QEMU_CMD(s)                                         \
+    g_strdup_printf(QEMU_CMD, 512, 512, (root), (s)->chr_name,  \
+                    (s)->socket_path, "", (s)->chr_name)
 
-#define GET_QEMU_CMDE(s, mem, extra, ...)                                      \
-    g_strdup_printf(QEMU_CMD extra, (mem), (mem), (root), (s)->chr_name,       \
-                    (s)->socket_path, (s)->chr_name, ##__VA_ARGS__)
+#define GET_QEMU_CMDE(s, mem, chr_opts, extra, ...)                     \
+    g_strdup_printf(QEMU_CMD extra, (mem), (mem), (root), (s)->chr_name, \
+                    (s)->socket_path, (chr_opts), (s)->chr_name, ##__VA_ARGS__)
 
 static gboolean _test_server_free(TestServer *server)
 {
@@ -405,6 +432,10 @@ static gboolean _test_server_free(TestServer *server)
         close(server->log_fd);
     }
 
+    if (server->slave_fd != -1) {
+        close(server->slave_fd);
+    }
+
     unlink(server->socket_path);
     g_free(server->socket_path);
 
@@ -527,8 +558,8 @@ GSourceFuncs test_migrate_source_funcs = {
 
 static void test_migrate(void)
 {
-    TestServer *s = test_server_new("src");
-    TestServer *dest = test_server_new("dest");
+    TestServer *s = test_server_new("src", NULL);
+    TestServer *dest = test_server_new("dest", NULL);
     char *uri = g_strdup_printf("%s%s", "unix:", dest->mig_path);
     QTestState *global = global_qtest, *from, *to;
     GSource *source;
@@ -537,7 +568,7 @@ static void test_migrate(void)
     guint8 *log;
     guint64 size;
 
-    cmd = GET_QEMU_CMDE(s, 2, "");
+    cmd = GET_QEMU_CMDE(s, 2, "", "");
     from = qtest_start(cmd);
     g_free(cmd);
 
@@ -545,7 +576,7 @@ static void test_migrate(void)
     size = get_log_size(s);
     g_assert_cmpint(size, ==, (2 * 1024 * 1024) / (VHOST_LOG_PAGE * 8));
 
-    cmd = GET_QEMU_CMDE(dest, 2, " -incoming %s", uri);
+    cmd = GET_QEMU_CMDE(dest, 2, "", " -incoming %s", uri);
     to = qtest_init(cmd);
     g_free(cmd);
 
@@ -605,6 +636,128 @@ static void test_migrate(void)
     global_qtest = global;
 }
 
+static void wait_for_slave_fd(TestServer *s)
+{
+    gint64 end_time;
+
+    g_mutex_lock(&s->data_mutex);
+    end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
+    while (s->slave_fd == -1) {
+        if (!g_cond_wait_until(&s->data_cond, &s->data_mutex, end_time)) {
+            /* timeout has passed */
+            g_assert(s->slave_fd != -1);
+            break;
+        }
+    }
+
+    g_mutex_unlock(&s->data_mutex);
+}
+
+static void wait_for_rings_started(TestServer *s, size_t count)
+{
+    gint64 end_time;
+
+    g_mutex_lock(&s->data_mutex);
+    end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
+    while (ctpop64(s->rings) != count) {
+        if (!g_cond_wait_until(&s->data_cond, &s->data_mutex, end_time)) {
+            /* timeout has passed */
+            g_assert_cmpint(ctpop64(s->rings), ==, count);
+            break;
+        }
+    }
+
+    g_mutex_unlock(&s->data_mutex);
+}
+
+#define PCI_SLOT                0x04
+
+static QVirtioPCIDevice *virtio_net_pci_init(QPCIBus *bus, int slot)
+{
+    QVirtioPCIDevice *dev;
+
+    dev = qvirtio_pci_device_find(bus, QVIRTIO_NET_DEVICE_ID);
+    g_assert(dev != NULL);
+    g_assert_cmphex(dev->vdev.device_type, ==, QVIRTIO_NET_DEVICE_ID);
+
+    qvirtio_pci_device_enable(dev);
+    qvirtio_reset(&qvirtio_pci, &dev->vdev);
+    qvirtio_set_acknowledge(&qvirtio_pci, &dev->vdev);
+    qvirtio_set_driver(&qvirtio_pci, &dev->vdev);
+
+    return dev;
+}
+
+static void virtio_driver_init(const QVirtioBus *bus, QVirtioDevice *dev)
+{
+    uint32_t features;
+
+    features = qvirtio_get_features(bus, dev);
+    features = features & ~(QVIRTIO_F_BAD_FEATURE |
+                            QVIRTIO_F_RING_INDIRECT_DESC |
+                            QVIRTIO_F_RING_EVENT_IDX);
+    qvirtio_set_features(bus, dev, features);
+
+    qvirtio_set_driver_ok(bus, dev);
+}
+
+static void test_reconnect(void)
+{
+    TestServer *s = test_server_new("reconnect", "unix:%s,reconnect=1");
+    QTestState *global = global_qtest, *from;
+    VhostUserMsg msg = {
+        .request = VHOST_USER_SLAVE_SHUTDOWN,
+        .flags = VHOST_USER_VERSION
+    };
+    gchar *cmd;
+    int ret;
+    QPCIBus *bus;
+    QVirtioPCIDevice *dev;
+    QGuestAllocator *alloc;
+
+    cmd = GET_QEMU_CMDE(s, 2, ",server", "");
+    from = qtest_start(cmd);
+    wait_for_slave_fd(s);
+    g_free(cmd);
+
+    bus = qpci_init_pc();
+    dev = virtio_net_pci_init(bus, PCI_SLOT);
+    alloc = pc_alloc_init();
+    virtio_driver_init(&qvirtio_pci, &dev->vdev);
+
+    wait_for_rings_started(s, 2);
+
+    /* request shutdown */
+    ret = send(s->slave_fd, &msg, VHOST_USER_HDR_SIZE, 0);
+    g_assert_cmpint(ret, ==, VHOST_USER_HDR_SIZE);
+    ret = read(s->slave_fd, &msg, VHOST_USER_HDR_SIZE);
+    g_assert_cmpint(ret, ==, VHOST_USER_HDR_SIZE);
+    g_assert_cmpint(msg.payload.u64, ==, 0);
+    g_assert_cmpint(s->rings, ==, 0);
+
+    close(s->slave_fd);
+    s->slave_fd = -1;
+
+    /* reconnect */
+    s->fds_num = 0;
+    qemu_chr_disconnect(s->chr);
+    wait_for_fds(s);
+
+    /* FIXME: manipulate vring to change last used index */
+    g_assert_cmpint(s->set_base[0], ==, 0);
+    g_assert_cmpint(s->set_base[1], ==, 0);
+
+    pc_alloc_uninit(alloc);
+    qvirtio_pci_device_disable(dev);
+    g_free(dev);
+    qpci_free_pc(bus);
+
+    qtest_quit(from);
+    test_server_free(s);
+
+    global_qtest = global;
+}
+
 int main(int argc, char **argv)
 {
     QTestState *s = NULL;
@@ -635,7 +788,7 @@ int main(int argc, char **argv)
         root = tmpfs;
     }
 
-    server = test_server_new("test");
+    server = test_server_new("test", NULL);
 
     loop = g_main_loop_new(NULL, FALSE);
     /* run the main loop thread so the chardev may operate */
@@ -648,6 +801,7 @@ int main(int argc, char **argv)
 
     qtest_add_data_func("/vhost-user/read-guest-mem", server, read_guest_mem);
     qtest_add_func("/vhost-user/migrate", test_migrate);
+    qtest_add_func("/vhost-user/reconnect", test_reconnect);
 
     ret = g_test_run();
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 18/18] test: add shutdown support vubr test
  2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
                   ` (16 preceding siblings ...)
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 17/18] test: start vhost-user reconnect test marcandre.lureau
@ 2016-04-01 11:16 ` marcandre.lureau
  2016-04-13  2:52   ` Yuanhan Liu
  17 siblings, 1 reply; 42+ messages in thread
From: marcandre.lureau @ 2016-04-01 11:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yuanhan Liu, Michael S. Tsirkin, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

The bridge can now be interrupted with ctrl-c. Once the slave channel is
up, it will request a shutdown and wait for a success reply before exiting.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/vhost-user-bridge.c | 102 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 99 insertions(+), 3 deletions(-)

diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index 42450a6..ea123be 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -135,6 +135,9 @@ dispatcher_wait(Dispatcher *dispr, uint32_t timeout)
     int rc = select(dispr->max_sock + 1, &fdset, 0, 0, &tv);
 
     if (rc == -1) {
+        if (errno == EINTR) {
+            return 0;
+        }
         vubr_die("select");
     }
 
@@ -186,6 +189,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_MQ = 0,
     VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
     VHOST_USER_PROTOCOL_F_RARP = 2,
+    VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL = 3,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -213,9 +217,16 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_QUEUE_NUM = 17,
     VHOST_USER_SET_VRING_ENABLE = 18,
     VHOST_USER_SEND_RARP = 19,
+    VHOST_USER_SET_SLAVE_FD = 20,
     VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+    VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_SHUTDOWN = 1,
+    VHOST_USER_SLAVE_MAX
+} VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
     uint64_t guest_phys_addr;
     uint64_t memory_size;
@@ -288,6 +299,8 @@ typedef struct VubrDev {
     int ready;
     uint64_t features;
     int hdrlen;
+    int slave_fd;
+    bool shutdown_requested;
 } VubrDev;
 
 static const char *vubr_request_str[] = {
@@ -311,7 +324,14 @@ static const char *vubr_request_str[] = {
     [VHOST_USER_GET_QUEUE_NUM]          =  "VHOST_USER_GET_QUEUE_NUM",
     [VHOST_USER_SET_VRING_ENABLE]       =  "VHOST_USER_SET_VRING_ENABLE",
     [VHOST_USER_SEND_RARP]              =  "VHOST_USER_SEND_RARP",
-    [VHOST_USER_MAX]                    =  "VHOST_USER_MAX",
+    [VHOST_USER_SET_SLAVE_FD]           =  "VHOST_USER_SET_SLAVE_FD",
+    [VHOST_USER_MAX]                    =  "VHOST_USER_MAX"
+};
+
+static const char *vubr_slave_request_str[] = {
+    [VHOST_USER_SLAVE_NONE]             =  "VHOST_USER_SLAVE_NONE",
+    [VHOST_USER_SLAVE_SHUTDOWN]         =  "VHOST_USER_SLAVE_SHUTDOWN",
+    [VHOST_USER_SLAVE_MAX]              =  "VHOST_USER_SLAVE_MAX"
 };
 
 static void
@@ -638,7 +658,7 @@ vubr_process_desc(VubrDev *dev, VubrVirtq *vq)
     size_t buf_size = 4096;
     uint8_t buf[4096];
 
-    DPRINT("Chunks: ");
+    DPRINT("Chunks: aidx:%d   ", a_index);
     i = d_index;
     do {
         void *chunk_start = (void *)(uintptr_t)gpa_to_va(dev, desc[i].addr);
@@ -1063,7 +1083,9 @@ vubr_set_vring_err_exec(VubrDev *dev, VhostUserMsg *vmsg)
 static int
 vubr_get_protocol_features_exec(VubrDev *dev, VhostUserMsg *vmsg)
 {
-    vmsg->payload.u64 = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
+    vmsg->payload.u64 =
+        1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
+        1ULL << VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL;
     DPRINT("u64: 0x%016"PRIx64"\n", vmsg->payload.u64);
     vmsg->size = sizeof(vmsg->payload.u64);
 
@@ -1105,6 +1127,46 @@ vubr_send_rarp_exec(VubrDev *dev, VhostUserMsg *vmsg)
     return 0;
 }
 
+static void
+vubr_handle_slave_reply(VhostUserMsg *vmsg)
+{
+    DPRINT(
+        "==================   Vhost slave reply from QEMU   ==================\n");
+    DPRINT("Request: %s (%d)\n", vubr_slave_request_str[vmsg->request],
+           vmsg->request);
+    DPRINT("Flags:   0x%x\n", vmsg->flags);
+    DPRINT("Size:    %d\n", vmsg->size);
+
+    switch (vmsg->request) {
+    case VHOST_USER_SLAVE_SHUTDOWN:
+        DPRINT("Shutdown success: 0x%016"PRIx64"\n", vmsg->payload.u64);
+        if (vmsg->payload.u64 == 0) {
+            exit(0);
+        }
+    default:
+        DPRINT("Invalid slave reply");
+    };
+}
+
+static void
+slave_receive_cb(int sock, void *ctx)
+{
+    VhostUserMsg vmsg;
+
+    vubr_message_read(sock, &vmsg);
+    vubr_handle_slave_reply(&vmsg);
+}
+
+static int
+vubr_set_slave_fd_exec(VubrDev *dev, VhostUserMsg *vmsg)
+{
+    assert(vmsg->fd_num == 1);
+    dev->slave_fd = vmsg->fds[0];
+    DPRINT("Got slave_fd: %d\n", vmsg->fds[0]);
+    dispatcher_add(&dev->dispatcher, dev->slave_fd, dev, slave_receive_cb);
+    return 0;
+}
+
 static int
 vubr_execute_request(VubrDev *dev, VhostUserMsg *vmsg)
 {
@@ -1166,6 +1228,8 @@ vubr_execute_request(VubrDev *dev, VhostUserMsg *vmsg)
         return vubr_set_vring_enable_exec(dev, vmsg);
     case VHOST_USER_SEND_RARP:
         return vubr_send_rarp_exec(dev, vmsg);
+    case VHOST_USER_SET_SLAVE_FD:
+        return vubr_set_slave_fd_exec(dev, vmsg);
 
     case VHOST_USER_MAX:
         assert(vmsg->request != VHOST_USER_MAX);
@@ -1226,6 +1290,7 @@ vubr_new(const char *path)
         };
     }
 
+    dev->slave_fd = -1;
     /* Init log */
     dev->log_call_fd = -1;
     dev->log_size = 0;
@@ -1333,12 +1398,42 @@ vubr_backend_udp_setup(VubrDev *dev,
 }
 
 static void
+vubr_request_shutdown(VubrDev *dev)
+{
+    VhostUserMsg vmsg = {
+        .request = VHOST_USER_SLAVE_SHUTDOWN,
+        .flags = VHOST_USER_VERSION
+    };
+
+    DPRINT("requesting shutdown\n");
+    vubr_message_write(dev->slave_fd, &vmsg);
+}
+
+static volatile int interrupted;
+
+static void interrupt_handler(int dummy)
+{
+    interrupted = 1;
+}
+
+static void
 vubr_run(VubrDev *dev)
 {
     while (1) {
         /* timeout 200ms */
         dispatcher_wait(&dev->dispatcher, 200000);
         /* Here one can try polling strategy. */
+
+        if (interrupted) {
+            if (dev->slave_fd == -1) {
+                return;
+            }
+
+            if (!dev->shutdown_requested) {
+                vubr_request_shutdown(dev);
+                dev->shutdown_requested = 1;
+            }
+        }
     }
 }
 
@@ -1405,6 +1500,7 @@ main(int argc, char *argv[])
     }
 
     vubr_backend_udp_setup(dev, lhost, lport, rhost, rport);
+    signal(SIGINT, interrupt_handler);
     vubr_run(dev);
     return 0;
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 01/18 for-2.6] tests: append i386 tests
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 01/18] tests: append i386 tests marcandre.lureau
@ 2016-04-01 20:30   ` Eric Blake
  0 siblings, 0 replies; 42+ messages in thread
From: Eric Blake @ 2016-04-01 20:30 UTC (permalink / raw)
  To: marcandre.lureau, qemu-devel
  Cc: Ilya Maximets, Tetsuya Mukawa, Yuanhan Liu, jonshin, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 1175 bytes --]

On 04/01/2016 05:16 AM, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Do not overwrite x86-64 tests, re-enable vhost-user-test.
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  tests/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This one should go in 2.6.

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/tests/Makefile b/tests/Makefile
> index 45b9048..9293d49 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -222,7 +222,7 @@ endif
>  check-qtest-i386-y += tests/test-netfilter$(EXESUF)
>  check-qtest-i386-y += tests/test-filter-mirror$(EXESUF)
>  check-qtest-i386-y += tests/test-filter-redirector$(EXESUF)
> -check-qtest-x86_64-y = $(check-qtest-i386-y)
> +check-qtest-x86_64-y += $(check-qtest-i386-y)
>  gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
>  gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
>  check-qtest-mips-y = tests/endianness-test$(EXESUF)
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 10/18] vhost-user: add slave-fd support
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 10/18] vhost-user: add slave-fd support marcandre.lureau
@ 2016-04-01 20:33   ` Eric Blake
  0 siblings, 0 replies; 42+ messages in thread
From: Eric Blake @ 2016-04-01 20:33 UTC (permalink / raw)
  To: marcandre.lureau, qemu-devel
  Cc: Ilya Maximets, Tetsuya Mukawa, Yuanhan Liu, jonshin, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 2316 bytes --]

On 04/01/2016 05:16 AM, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Learn to give a socket to the slave to let him make requests to the
> master.
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  docs/specs/vhost-user.txt | 23 ++++++++++++++++
>  hw/virtio/vhost-user.c    | 69 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 92 insertions(+)
> 
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 0312d40..8a635fa 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -248,12 +248,25 @@ Once the source has finished migration, rings will be stopped by
>  the source. No further update must be done before rings are
>  restarted.
>  
> +Slave communication
> +-------------------
> +
> +An optional communication channel is provided if the slave
> +declares VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL feature, to allow

s/declares/declares the/

> +the slave to make requests to the master.
> +
> +The fd is provided via VHOST_USER_SET_SLAVE_FD ancillary data.
> +
> +A slave may then send VHOST_USER_SLAVE_* messages to the master
> +using this fd communication channel.
> +
>  Protocol features
>  -----------------
>  
>  #define VHOST_USER_PROTOCOL_F_MQ             0
>  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD      1
>  #define VHOST_USER_PROTOCOL_F_RARP           2
> +#define VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL  3
>  
>  Message types
>  -------------
> @@ -464,3 +477,13 @@ Message types
>        is present in VHOST_USER_GET_PROTOCOL_FEATURES.
>        The first 6 bytes of the payload contain the mac address of the guest to
>        allow the vhost user backend to construct and broadcast the fake RARP.
> +
> + * VHOST_USER_SET_SLAVE_FD
> +      Id: 20
> +      Equivalent ioctl: N/A
> +      Master payload: N/A
> +
> +      Set the file descriptor for the salve to make VHOST_USER_SLAVE_*

s/salve/slave/

> +      request to the master. It is passed in the ancillary data.
> +      This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
> +      feature is available.
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support marcandre.lureau
@ 2016-04-13  2:49   ` Yuanhan Liu
  2016-04-13  9:51     ` Marc-André Lureau
  2016-04-28  5:23   ` Yuanhan Liu
  1 sibling, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2016-04-13  2:49 UTC (permalink / raw)
  To: marcandre.lureau
  Cc: qemu-devel, Michael S. Tsirkin, Tetsuya Mukawa, jonshin, Ilya Maximets

Hi Marc,

First of all, sorry again for the late response!

Last time I tried your first version, I found a few issues related
to reconnect, mainly the acked_features being lost. While checking your
new code, I found that you've already solved that, which is great.

So I tried harder this time, and your patches work great, except that I
found a few nits.

On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
...
> +Slave message types
> +-------------------
> +
> + * VHOST_USER_SLAVE_SHUTDOWN:
> +      Id: 1
> +      Master payload: N/A
> +      Slave payload: u64
> +
> +      Request the master to shutdown the slave. A 0 reply is for
> +      success, in which case the slave may close all connections
> +      immediately and quit.

Assume we are using OVS + DPDK here, so we could have two
vhost-user connections. When OVS initiates a restart, it might
unregister the two connections one by one. In that case, two
VHOST_USER_SLAVE_SHUTDOWN requests will be sent, and two replies
will be received. Therefore, I don't think it's proper to ask the
backend implementation to quit here.


>  
>      switch (msg.request) {
> +    case VHOST_USER_SLAVE_SHUTDOWN: {
> +        uint64_t success = 1; /* 0 is for success */
> +        if (dev->stop) {
> +            dev->stop(dev);
> +            success = 0;
> +        }
> +        msg.payload.u64 = success;
> +        msg.size = sizeof(msg.payload.u64);
> +        size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0);
> +        if (size != VHOST_USER_HDR_SIZE + msg.size) {
> +            error_report("Failed to write reply.");
> +        }
> +        break;

You might want to remove the slave_fd from the watch list? We
might also need to close slave_fd here, assuming that we
will no longer use it once VHOST_USER_SLAVE_SHUTDOWN has been
received?
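
Something like this, perhaps (just a sketch; I'm assuming the handler keeps
the fd in VhostUser *u and that the watch was installed with
qemu_set_fd_handler(), which is not necessarily how the final patch does it):

    /* after sending the reply to VHOST_USER_SLAVE_SHUTDOWN */
    qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);  /* drop the watch */
    close(u->slave_fd);
    u->slave_fd = -1;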

I'm asking because I found a seg fault issue sometimes,
due to opaque being NULL.


	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 18/18] test: add shutdown support vubr test
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 18/18] test: add shutdown support vubr test marcandre.lureau
@ 2016-04-13  2:52   ` Yuanhan Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Yuanhan Liu @ 2016-04-13  2:52 UTC (permalink / raw)
  To: marcandre.lureau
  Cc: qemu-devel, Michael S. Tsirkin, Tetsuya Mukawa, jonshin, Ilya Maximets

On Fri, Apr 01, 2016 at 01:16:28PM +0200, marcandre.lureau@redhat.com wrote:
> +static void
> +vubr_handle_slave_reply(VhostUserMsg *vmsg)
> +{
> +    DPRINT(
> +        "==================   Vhost slave reply from QEMU   ==================\n");
> +    DPRINT("Request: %s (%d)\n", vubr_slave_request_str[vmsg->request],
> +           vmsg->request);
> +    DPRINT("Flags:   0x%x\n", vmsg->flags);
> +    DPRINT("Size:    %d\n", vmsg->size);
> +
> +    switch (vmsg->request) {
> +    case VHOST_USER_SLAVE_SHUTDOWN:
> +        DPRINT("Shutdown success: 0x%016"PRIx64"\n", vmsg->payload.u64);
> +        if (vmsg->payload.u64 == 0) {
> +            exit(0);
> +        }
> +    default:
> +        DPRINT("Invalid slave reply");
> +    };
       ^^

Minor nit: redundant ';'.

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-13  2:49   ` Yuanhan Liu
@ 2016-04-13  9:51     ` Marc-André Lureau
  2016-04-13 17:32       ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Marc-André Lureau @ 2016-04-13  9:51 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: marcandre lureau, qemu-devel, Michael S. Tsirkin, Tetsuya Mukawa,
	jonshin, Ilya Maximets

Hi

----- Original Message -----
> Hi Marc,
> 
> First of all, sorry again for late response!
> 
> Last time I tried with your first version, I found few issues related
> with reconnect, mainly on the acked_feautres lost. While checking your
> new code, I found that you've already solved that, which is great.
> 
> So, I tried harder this time, your patches work great, except that I
> found few nits.
> 
> On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote:
> > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> ...
> > +Slave message types
> > +-------------------
> > +
> > + * VHOST_USER_SLAVE_SHUTDOWN:
> > +      Id: 1
> > +      Master payload: N/A
> > +      Slave payload: u64
> > +
> > +      Request the master to shutdown the slave. A 0 reply is for
> > +      success, in which case the slave may close all connections
> > +      immediately and quit.
> 
> Assume we are using ovs + dpdk here, that we could have two
> vhost-user connections. While ovs tries to initiate a restart,
> it might unregister the two connections one by one. In such
> case, two VHOST_USER_SLAVE_SHUTDOWN request will be sent,
> and two replies will get. Therefore, I don't think it's a
> proper ask here to let the backend implementation to do quit
> here.
> 

On success reply, the master sent all the commands to finish the connection. So the slave must flush/finish all pending requests first. I think this should be enough, otherwise we may need a new explicit message?
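
To be explicit, the flow I have in mind is roughly the following (my reading
of it, not normative; the exact set of wind-down messages depends on the
device, GET_VRING_BASE being the important one here):

    slave  -> master : VHOST_USER_SLAVE_SHUTDOWN
    master -> slave  : GET_VRING_BASE, ... (flush/sync remaining state)
    master -> slave  : reply with u64 = 0 (success)
    slave            : may now close all connections and quit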

> 
> >  
> >      switch (msg.request) {
> > +    case VHOST_USER_SLAVE_SHUTDOWN: {
> > +        uint64_t success = 1; /* 0 is for success */
> > +        if (dev->stop) {
> > +            dev->stop(dev);
> > +            success = 0;
> > +        }
> > +        msg.payload.u64 = success;
> > +        msg.size = sizeof(msg.payload.u64);
> > +        size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0);
> > +        if (size != VHOST_USER_HDR_SIZE + msg.size) {
> > +            error_report("Failed to write reply.");
> > +        }
> > +        break;
> 
> You might want to remove the slave_fd from watch list? We
> might also need to close slave_fd here, assuming that we
> will no longer use it when VHOST_USER_SLAVE_SHUTDOWN is
> received?

Makes sense, I will change that in next update.

> I'm asking because I found a seg fault issue sometimes,
> due to opaque is NULL.
>

I would be interested to see the backtrace or have a reproducer.

thanks

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-13  9:51     ` Marc-André Lureau
@ 2016-04-13 17:32       ` Yuanhan Liu
  2016-04-13 21:43         ` Marc-André Lureau
  0 siblings, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2016-04-13 17:32 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: marcandre lureau, qemu-devel, Michael S. Tsirkin, Tetsuya Mukawa,
	jonshin, Ilya Maximets

On Wed, Apr 13, 2016 at 05:51:15AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > Hi Marc,
> > 
> > First of all, sorry again for late response!
> > 
> > Last time I tried with your first version, I found few issues related
> > with reconnect, mainly on the acked_feautres lost. While checking your
> > new code, I found that you've already solved that, which is great.
> > 
> > So, I tried harder this time, your patches work great, except that I
> > found few nits.
> > 
> > On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote:
> > > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > ...
> > > +Slave message types
> > > +-------------------
> > > +
> > > + * VHOST_USER_SLAVE_SHUTDOWN:
> > > +      Id: 1
> > > +      Master payload: N/A
> > > +      Slave payload: u64
> > > +
> > > +      Request the master to shutdown the slave. A 0 reply is for
> > > +      success, in which case the slave may close all connections
> > > +      immediately and quit.
> > 
> > Assume we are using ovs + dpdk here, that we could have two
> > vhost-user connections. While ovs tries to initiate a restart,
> > it might unregister the two connections one by one. In such
> > case, two VHOST_USER_SLAVE_SHUTDOWN request will be sent,
> > and two replies will get. Therefore, I don't think it's a
> > proper ask here to let the backend implementation to do quit
> > here.
> > 
> 
> On success reply, the master sent all the commands to finish the connection. So the slave must flush/finish all pending requests first.

Yes, that's okay. Here I just mean the "close __all__ connections"
and "quit" part.

First, the backend should do the cleanup/flush/finish for its own
connection, but not for all of them, right?

Second, as stated, quitting might not make sense, as we may
have more connections.

> I think this should be enough, otherwise we may need a new explicit message?
> 
> > 
> > >  
> > >      switch (msg.request) {
> > > +    case VHOST_USER_SLAVE_SHUTDOWN: {
> > > +        uint64_t success = 1; /* 0 is for success */
> > > +        if (dev->stop) {
> > > +            dev->stop(dev);
> > > +            success = 0;
> > > +        }
> > > +        msg.payload.u64 = success;
> > > +        msg.size = sizeof(msg.payload.u64);
> > > +        size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0);
> > > +        if (size != VHOST_USER_HDR_SIZE + msg.size) {
> > > +            error_report("Failed to write reply.");
> > > +        }
> > > +        break;
> > 
> > You might want to remove the slave_fd from watch list? We
> > might also need to close slave_fd here, assuming that we
> > will no longer use it when VHOST_USER_SLAVE_SHUTDOWN is
> > received?
> 
> Makes sense, I will change that in next update.
> 
> > I'm asking because I found a seg fault issue sometimes,
> > due to opaque is NULL.

Oh, I was wrong, it's u being NULL, but not opaque.
> >
> 
> I would be interested to see the backtrace or have a reproducer.

It's a normal test sequence: start a vhost-user switch (I'm using the DPDK
vhost-switch example), kill it, and wait for a while (something like
more than 10s or even longer); then I saw a seg fault:

    (gdb) p dev
    $4 = (struct vhost_dev *) 0x555556571bf0
    (gdb) p u
    $5 = (struct vhost_user *) 0x0
    (gdb) where
    #0  0x0000555555798612 in slave_read (opaque=0x555556571bf0)
        at /home/yliu/qemu/hw/virtio/vhost-user.c:539
    #1  0x0000555555a343a4 in aio_dispatch (ctx=0x55555655f560) at /home/yliu/qemu/aio-posix.c:327
    #2  0x0000555555a2738b in aio_ctx_dispatch (source=0x55555655f560, callback=0x0, user_data=0x0)
        at /home/yliu/qemu/async.c:233
    #3  0x00007ffff51032a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
    #4  0x0000555555a3239e in glib_pollfds_poll () at /home/yliu/qemu/main-loop.c:213
    #5  0x0000555555a3247b in os_host_main_loop_wait (timeout=29875848) at /home/yliu/qemu/main-loop.c:258
    #6  0x0000555555a3252b in main_loop_wait (nonblocking=0) at /home/yliu/qemu/main-loop.c:506
    #7  0x0000555555846e35 in main_loop () at /home/yliu/qemu/vl.c:1934
    #8  0x000055555584e6bf in main (argc=31, argv=0x7fffffffe078, envp=0x7fffffffe178)
        at /home/yliu/qemu/vl.c:4658


	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-13 17:32       ` Yuanhan Liu
@ 2016-04-13 21:43         ` Marc-André Lureau
  2016-04-13 21:57           ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Marc-André Lureau @ 2016-04-13 21:43 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Michael S. Tsirkin, QEMU, Ilya Maximets, jonshin, Tetsuya Mukawa

On Wed, Apr 13, 2016 at 7:32 PM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
>>
>> > I'm asking because I found a seg fault issue sometimes,
>> > due to opaque is NULL.
>
> Oh, I was wrong, it's u being NULL, but not opaque.
>> >
>>
>> I would be interested to see the backtrace or have a reproducer.
>
> It's a normal test steps: start a vhost-user switch (I'm using DPDK
> vhost-switch example), kill it, and wait for a while (something like
> more than 10s or even longer), then I saw a seg fault:
>
>     (gdb) p dev
>     $4 = (struct vhost_dev *) 0x555556571bf0
>     (gdb) p u
>     $5 = (struct vhost_user *) 0x0
>     (gdb) where
>     #0  0x0000555555798612 in slave_read (opaque=0x555556571bf0)
>         at /home/yliu/qemu/hw/virtio/vhost-user.c:539
>     #1  0x0000555555a343a4 in aio_dispatch (ctx=0x55555655f560) at /home/yliu/qemu/aio-posix.c:327
>     #2  0x0000555555a2738b in aio_ctx_dispatch (source=0x55555655f560, callback=0x0, user_data=0x0)
>         at /home/yliu/qemu/async.c:233
>     #3  0x00007ffff51032a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>     #4  0x0000555555a3239e in glib_pollfds_poll () at /home/yliu/qemu/main-loop.c:213
>     #5  0x0000555555a3247b in os_host_main_loop_wait (timeout=29875848) at /home/yliu/qemu/main-loop.c:258
>     #6  0x0000555555a3252b in main_loop_wait (nonblocking=0) at /home/yliu/qemu/main-loop.c:506
>     #7  0x0000555555846e35 in main_loop () at /home/yliu/qemu/vl.c:1934
>     #8  0x000055555584e6bf in main (argc=31, argv=0x7fffffffe078, envp=0x7fffffffe178)
>         at /home/yliu/qemu/vl.c:4658
>

This patch set doesn't try to handle crashes from backend. This would
require a much more detailed study of the existing code path. A lot of
places assume the backend is fully working as expected. I think
handling backend crashes should be a different, later, patch set.



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-13 21:43         ` Marc-André Lureau
@ 2016-04-13 21:57           ` Yuanhan Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Yuanhan Liu @ 2016-04-13 21:57 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Michael S. Tsirkin, QEMU, Ilya Maximets, jonshin, Tetsuya Mukawa

On Wed, Apr 13, 2016 at 11:43:56PM +0200, Marc-André Lureau wrote:
> On Wed, Apr 13, 2016 at 7:32 PM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
> >>
> >> > I'm asking because I found a seg fault issue sometimes,
> >> > due to opaque is NULL.
> >
> > Oh, I was wrong, it's u being NULL, but not opaque.
> >> >
> >>
> >> I would be interested to see the backtrace or have a reproducer.
> >
> > It's a normal test steps: start a vhost-user switch (I'm using DPDK
> > vhost-switch example), kill it, and wait for a while (something like
> > more than 10s or even longer), then I saw a seg fault:
> >
> >     (gdb) p dev
> >     $4 = (struct vhost_dev *) 0x555556571bf0
> >     (gdb) p u
> >     $5 = (struct vhost_user *) 0x0
> >     (gdb) where
> >     #0  0x0000555555798612 in slave_read (opaque=0x555556571bf0)
> >         at /home/yliu/qemu/hw/virtio/vhost-user.c:539
> >     #1  0x0000555555a343a4 in aio_dispatch (ctx=0x55555655f560) at /home/yliu/qemu/aio-posix.c:327
> >     #2  0x0000555555a2738b in aio_ctx_dispatch (source=0x55555655f560, callback=0x0, user_data=0x0)
> >         at /home/yliu/qemu/async.c:233
> >     #3  0x00007ffff51032a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> >     #4  0x0000555555a3239e in glib_pollfds_poll () at /home/yliu/qemu/main-loop.c:213
> >     #5  0x0000555555a3247b in os_host_main_loop_wait (timeout=29875848) at /home/yliu/qemu/main-loop.c:258
> >     #6  0x0000555555a3252b in main_loop_wait (nonblocking=0) at /home/yliu/qemu/main-loop.c:506
> >     #7  0x0000555555846e35 in main_loop () at /home/yliu/qemu/vl.c:1934
> >     #8  0x000055555584e6bf in main (argc=31, argv=0x7fffffffe078, envp=0x7fffffffe178)
> >         at /home/yliu/qemu/vl.c:4658
> >
> 
> This patch set doesn't try to handle crashes from backend. This would
> require a much more detailed study of the existing code path. A lot of
> places assume the backend is fully working as expected. I think
> handling backend crashes should be a different, later, patch set.

Oh, sorry for not making it clear. I actually did the kill with "ctrl-c".
That is then caught and a SLAVE_SHUTDOWN request is sent. So, I would say
it's a normal quit.

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 04/18] char: add wait support for reconnect
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 04/18] char: add wait support for reconnect marcandre.lureau
@ 2016-04-28  4:33   ` Yuanhan Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Yuanhan Liu @ 2016-04-28  4:33 UTC (permalink / raw)
  To: marcandre.lureau
  Cc: qemu-devel, Michael S. Tsirkin, Tetsuya Mukawa, jonshin, Ilya Maximets

On Fri, Apr 01, 2016 at 01:16:14PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Until now, 'wait' was solely used for listening sockets. However, it can
> also be useful for 'reconnect' socket kind, where the first open must
> succeed before continuing.
> 
> This allows for instance (with other support patches) having vhost-user
> wait for an initial connection to setup the vhost-net, before eventually
> accepting new connections.

I have hit a tricky issue with this patch. Assume the socket exists
before we start QEMU, but somehow we cannot connect to it, say due to
a permission error. In that case, QEMU would wait there forever, without
any error message. I would assume it's a bug, right?
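
To make the failure at least visible, I would have expected something along
these lines (purely illustrative, reusing the variables from the hunk quoted
below; a proper fix probably also needs a retry delay or some rate limiting):

    if (qio_channel_socket_connect_sync(sioc, s->addr, &err) < 0) {
        if (reconnect) {
            /* report instead of only tracing, so the user sees why we wait */
            error_report("%s: waiting for connection: %s",
                         chr->label, error_get_pretty(err));
            error_free(err);
            continue;
        } else {
            error_propagate(errp, err);
            goto error;
        }
    }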

	--yliu
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  qemu-char.c | 31 +++++++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/qemu-char.c b/qemu-char.c
> index 8702931..3e25c08 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -3659,7 +3659,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
>                                    Error **errp)
>  {
>      bool is_listen      = qemu_opt_get_bool(opts, "server", false);
> -    bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true);
> +    bool is_wait        = qemu_opt_get_bool(opts, "wait", is_listen);
>      bool is_telnet      = qemu_opt_get_bool(opts, "telnet", false);
>      bool do_nodelay     = !qemu_opt_get_bool(opts, "delay", true);
>      int64_t reconnect   = qemu_opt_get_number(opts, "reconnect", 0);
> @@ -3696,7 +3696,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
>      sock->has_telnet = true;
>      sock->telnet = is_telnet;
>      sock->has_wait = true;
> -    sock->wait = is_waitconnect;
> +    sock->wait = is_wait;
>      sock->has_reconnect = true;
>      sock->reconnect = reconnect;
>      sock->tls_creds = g_strdup(tls_creds);
> @@ -4350,7 +4350,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
>      bool do_nodelay     = sock->has_nodelay ? sock->nodelay : false;
>      bool is_listen      = sock->has_server  ? sock->server  : true;
>      bool is_telnet      = sock->has_telnet  ? sock->telnet  : false;
> -    bool is_waitconnect = sock->has_wait    ? sock->wait    : false;
> +    bool is_wait        = sock->has_wait    ? sock->wait    : false;
>      int64_t reconnect   = sock->has_reconnect ? sock->reconnect : 0;
>      ChardevCommon *common = qapi_ChardevSocket_base(sock);
>      QIOChannelSocket *sioc = NULL;
> @@ -4424,7 +4424,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
>      }
>  
>      sioc = qio_channel_socket_new();
> -    if (s->reconnect_time) {
> +    if (s->reconnect_time && !is_wait) {
>          qio_channel_socket_connect_async(sioc, s->addr,
>                                           qemu_chr_socket_connected,
>                                           chr, NULL);
> @@ -4433,7 +4433,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
>              goto error;
>          }
>          s->listen_ioc = sioc;
> -        if (is_waitconnect) {
> +        if (is_wait) {
>              trace_char_socket_waiting(chr->filename);
>              tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr);
>          }
> @@ -4443,9 +4443,24 @@ static CharDriverState *qmp_chardev_open_socket(const char *id,
>                  QIO_CHANNEL(s->listen_ioc), G_IO_IN, tcp_chr_accept, chr, NULL);
>          }
>      } else {
> -        if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) {
> -            goto error;
> -        }
> +        do {
> +            Error *err = NULL;
> +
> +            if (qio_channel_socket_connect_sync(sioc, s->addr, &err) < 0) {
> +                if (reconnect) {
> +                    trace_char_socket_reconnect_error(chr->label,
> +                                                      error_get_pretty(err));
> +                    error_free(err);
> +                    continue;
> +                } else {
> +                    error_propagate(errp, err);
> +                    goto error;
> +                }
> +            } else {
> +                break;
> +            }
> +        } while (true);
> +
>          tcp_chr_new_client(chr, sioc);
>          object_unref(OBJECT(sioc));
>      }
> -- 
> 2.5.5

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-01 11:16 ` [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support marcandre.lureau
  2016-04-13  2:49   ` Yuanhan Liu
@ 2016-04-28  5:23   ` Yuanhan Liu
  2016-04-29 10:40     ` Marc-André Lureau
  1 sibling, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2016-04-28  5:23 UTC (permalink / raw)
  To: marcandre.lureau
  Cc: qemu-devel, Michael S. Tsirkin, Tetsuya Mukawa, jonshin,
	Ilya Maximets, Xie, Huawei

On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  docs/specs/vhost-user.txt | 15 +++++++++++++++
>  hw/virtio/vhost-user.c    | 16 ++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 8a635fa..60d6d13 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -487,3 +487,18 @@ Message types
>        request to the master. It is passed in the ancillary data.
>        This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
>        feature is available.
> +
> +Slave message types
> +-------------------
> +
> + * VHOST_USER_SLAVE_SHUTDOWN:
> +      Id: 1
> +      Master payload: N/A
> +      Slave payload: u64
> +
> +      Request the master to shutdown the slave. A 0 reply is for
> +      success, in which case the slave may close all connections
> +      immediately and quit. A non-zero reply cancels the request.
> +
> +      Before a reply comes, the master may make other requests in
> +      order to flush or sync state.

Hi all,

I took this proposal, as well as DPDK's implementation, to our customers
(OVS, OpenStack and some other teams) who made such requests before. I'm
sorry to say that none of them really liked that we can't handle a crash.
Making reconnect work after a vhost-user backend crash is exactly what
they are after.

And to handle the crash, I was thinking of the proposal from Michael,
that is, to do a reset from the guest OS. This would fix the issue for
good. However, old kernels will not benefit from this, nor will guests
other than Linux, making it not that useful for current usage.

Given that VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance
to get the vring base (last used idx) from the backend, Huawei suggests
that we could still bring it into a consistent state after the crash if
we get the vring base from vring->used->idx.  That worked as expected
in my test. The only tricky thing might be how to detect a crash;
we could do a simple comparison of the vring base from QEMU with
vring->used->idx at the initialization stage. If a mismatch is found,
take the base from vring->used->idx instead.
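
A rough sketch of that check on the backend side (the field names
last_used_idx and used only loosely follow DPDK's vhost structures; treat
them as assumptions, not actual patch code):

    /* called when the vring is (re-)initialized after a reconnect */
    static void fixup_vring_base(struct vhost_virtqueue *vq)
    {
        /* the base replayed by QEMU predates the crash and may be stale */
        uint16_t ring_idx = vq->used->idx;   /* what guest memory says */

        if (vq->last_used_idx != ring_idx) {
            /* mismatch: assume we crashed; resume from the ring itself */
            vq->last_used_idx = ring_idx;
        }
    }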

Comments/thoughts? Is it a solid enough solution for you?  If so, we
could make things much simpler and, most importantly, we would
be able to handle crashes.

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-28  5:23   ` Yuanhan Liu
@ 2016-04-29 10:40     ` Marc-André Lureau
  2016-04-29 17:48       ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Marc-André Lureau @ 2016-04-29 10:40 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Michael S. Tsirkin, QEMU, Ilya Maximets, jonshin, Tetsuya Mukawa,
	Xie, Huawei

Hi

On Thu, Apr 28, 2016 at 7:23 AM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
> On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote:
>> From: Marc-André Lureau <marcandre.lureau@redhat.com>
>>
>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> ---
>>  docs/specs/vhost-user.txt | 15 +++++++++++++++
>>  hw/virtio/vhost-user.c    | 16 ++++++++++++++++
>>  2 files changed, 31 insertions(+)
>>
>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>> index 8a635fa..60d6d13 100644
>> --- a/docs/specs/vhost-user.txt
>> +++ b/docs/specs/vhost-user.txt
>> @@ -487,3 +487,18 @@ Message types
>>        request to the master. It is passed in the ancillary data.
>>        This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
>>        feature is available.
>> +
>> +Slave message types
>> +-------------------
>> +
>> + * VHOST_USER_SLAVE_SHUTDOWN:
>> +      Id: 1
>> +      Master payload: N/A
>> +      Slave payload: u64
>> +
>> +      Request the master to shutdown the slave. A 0 reply is for
>> +      success, in which case the slave may close all connections
>> +      immediately and quit. A non-zero reply cancels the request.
>> +
>> +      Before a reply comes, the master may make other requests in
>> +      order to flush or sync state.
>
> Hi all,
>
> I threw this proposal as well as DPDK's implementation to our customer
> (OVS, Openstack and some other teams) who made such request before. I'm
> sorry to say that none of them really liked that we can't handle crash.
> Making reconnect work from a vhost-user backend crash is exactly something
> they are after.

Handling crashes is not quite the same as what I propose here. I see
it as a different goal. But I doubt the usefulness and reliability
of a backend that crashes. In many cases, vhost-user was designed after
kernel vhost, and the qemu code has the same expectations of the kernel
or the vhost-user backend: many calls are synchronous and will simply
assert() on unexpected results.

> And to handle the crash, I was thinking of the proposal from Michael.
> That is to do reset from the guest OS. This would fix this issue
> ultimately. However, old kernel will not benefit from this, as well
> as other guest other than Linux, making it not that useful for current
> usage.

Yes, I hope Michael can help with that, I am not very familiar with
the kernel code.

> Thinking of that the VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance
> to get the vring base (last used idx) from the backend, Huawei suggests

Right, but after this message, the backend should have flushed all
pending ring packets and stopped processing them. So it's also a clean
sync point.

> that we could still make it in a consistent state after the crash, if
> we get the vring base from vring->used->idx.  That worked as expected

You can have a backend that has already processed packets but
not updated the used idx. You could also have out-of-order packets in
flight (ex: desc 1-2-3 avail, 1-3 used, 2 pending..). I can't see a
clean way to restore this other than to reset the queues and start over,
with either packet loss or packet duplication. If the backend
guarantees to process packets in order, that may simplify things, but it
would be a special case.

> from my test. The only tricky thing might be how to detect a crash,
> and we could do a simple compare of the vring base from QEMU with
> the vring->used->idx at the initiation stage. If mismatch found, get
> it from vring->used->idx instead.

I don't follow, would the backend restore its last vring->used->idx
after a crash?

> Comments/thoughts? Is it a solid enough solution to you?  If so, we
> could make things much simpler, and what's most important, we could
> be able to handle crash.

That's not so easy: many of the vhost_ops->vhost*() calls are followed by
assert(r>0), which will be hard to change to handle failure. But I
would worry first that a backend that crashes may have corrupted the
VM memory too...

-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-29 10:40     ` Marc-André Lureau
@ 2016-04-29 17:48       ` Yuanhan Liu
  2016-05-01 11:37         ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2016-04-29 17:48 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Michael S. Tsirkin, QEMU, Ilya Maximets, jonshin, Tetsuya Mukawa,
	Xie, Huawei

On Fri, Apr 29, 2016 at 12:40:09PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Apr 28, 2016 at 7:23 AM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
> > On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote:
> >> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> >> +Slave message types
> >> +-------------------
> >> +
> >> + * VHOST_USER_SLAVE_SHUTDOWN:
> >> +      Id: 1
> >> +      Master payload: N/A
> >> +      Slave payload: u64
> >> +
> >> +      Request the master to shutdown the slave. A 0 reply is for
> >> +      success, in which case the slave may close all connections
> >> +      immediately and quit. A non-zero reply cancels the request.
> >> +
> >> +      Before a reply comes, the master may make other requests in
> >> +      order to flush or sync state.
> >
> > Hi all,
> >
> > I threw this proposal as well as DPDK's implementation to our customer
> > (OVS, Openstack and some other teams) who made such request before. I'm
> > sorry to say that none of them really liked that we can't handle crash.
> > Making reconnect work from a vhost-user backend crash is exactly something
> > they are after.
> 
> Handling crashes is not quite the same as what I propose here.

Yes, I know. However, handling crashes is exactly what our customers
want. And I just want to let you know that; I'm not asking you to
do that :)

> I see
> it as a different goal. But I doubt about usefulness and reliability
> of a backend that crashes.

Agreed with you on that. However, I guess you have to admit that crashes
just happen. The kernel sometimes crashes, too. So, it would be great if
we could make the whole stack work again after an unexpected crash (say,
of OVS), without restarting all the guests.

> In many case, vhost-user was designed after
> kernel vhost, and qemu code has the same expectation about the kernel
> or the vhost-user backend: many calls are sync and will simply
> assert() on unexpected results.

I guess we could at least try to diminish it, if we can't avoid it completely.

> > And to handle the crash, I was thinking of the proposal from Michael.
> > That is to do reset from the guest OS. This would fix this issue
> > ultimately. However, old kernel will not benefit from this, as well
> > as other guest other than Linux, making it not that useful for current
> > usage.
> 
> Yes, I hope Michael can help with that, I am not very familiar with
> the kernel code.
> 
> > Thinking of that the VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance
> > to get the vring base (last used idx) from the backend, Huawei suggests
> 
> Right, but after this message, the backend should have flushed all
> pending ring packets and stop processing them. So it's also a clean
> sync point.
> 
> > that we could still make it in a consistent state after the crash, if
> > we get the vring base from vring->used->idx.  That worked as expected
> 
> You can have a backend that would have already processed packets and
> not updated used idx. You could also have out-of-order packets in
> flights (ex: desc 1-2-3 avail, 1-3 used, 2 pending..). I can't see a
> clean way to restore this, but to reset the queues and start over,
> with either packet loss or packet duplication.

Given that it (a crash or restart) happens so rarely, I don't think
it matters. Moreover, doesn't that happen in the real world :)

> If the backend
> guarantees to process packets in order, it may simplify things, but it
> would be a special case.

Well, it's more of a backend thing: it's up to the backend to try to
set a saner vring base, as stated in the above proposal. Therefore, I
would not say it's a special case.

> 
> > from my test. The only tricky thing might be how to detect a crash,
> > and we could do a simple compare of the vring base from QEMU with
> > the vring->used->idx at the initiation stage. If mismatch found, get
> > it from vring->used->idx instead.
> 
> I don't follow, would the backend restore its last vring->used->idx
> after a crash?

Yes, it is restored from the SET_VRING_BASE from QEMU. But that's a stale
value, normally 0 if no start/stop happened before. Therefore, we can't use
it after the crash; instead, we could try to detect the mismatch and
fix it at the SET_VRING_ADDR request.

> 
> > Comments/thoughts? Is it a solid enough solution to you?  If so, we
> > could make things much simpler, and what's most important, we could
> > be able to handle crash.
> 
> That's not so easy, many of the vhost_ops->vhost*() are followed by
> assert(r>0), which will be hard to change to handle failure. But, I
> would worry first about a backend that crashes that it may corrupt the
> VM memory too...

Not quite sure I follow this. But the backend just touches the
virtio-related memory, so it will do no harm to the VM?

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-04-29 17:48       ` Yuanhan Liu
@ 2016-05-01 11:37         ` Michael S. Tsirkin
  2016-05-01 21:04           ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2016-05-01 11:37 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Marc-André Lureau, QEMU, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Xie, Huawei

On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote:
> > But, I
> > would worry first about a backend that crashes that it may corrupt the
> > VM memory too...
> 
> Not quite sure I follow this. But, backend just touches the virtio
> related memory, so it will do no harm to the VM?

It crashed so apparently it's not behaving as designed -
how can we be sure it does not harm the VM?

-- 
MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-01 11:37         ` Michael S. Tsirkin
@ 2016-05-01 21:04           ` Yuanhan Liu
  2016-05-02 10:45             ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2016-05-01 21:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, QEMU, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Xie, Huawei

On Sun, May 01, 2016 at 02:37:19PM +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote:
> > > But, I
> > > would worry first about a backend that crashes that it may corrupt the
> > > VM memory too...
> > 
> > Not quite sure I follow this. But, backend just touches the virtio
> > related memory, so it will do no harm to the VM?
> 
> It crashed so apparently it's not behaving as designed -
> how can we be sure it does not harm the VM?

Hi Michael,

From what I know so far, a crash could corrupt the virtio memory in two
ways:

- vring_used_elem might be in a stale state when we are in the middle of
  updating the used vring. Say, we might have updated the "id" field but
  left "len" untouched.

- a vring desc buffer might also be in a stale state, when we are copying
  data into it.

However, the two issues are not real issues, as used->idx is not
updated in either case. Therefore, that corrupted memory will not be
processed by the guest. So, no harm I'd say. Or am I missing anything?
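
For reference, that argument relies on the backend publishing a used entry
before bumping the index, roughly like this (illustrative code, not taken
from any particular backend):

    #include <stdint.h>
    #include <linux/virtio_ring.h>

    /* complete the descriptor chain 'head' with 'len' bytes written */
    static void publish_used(struct vring *vr, uint16_t head, uint32_t len)
    {
        uint16_t idx = vr->used->idx;

        vr->used->ring[idx % vr->num].id  = head;
        vr->used->ring[idx % vr->num].len = len;
        __sync_synchronize();          /* the entry must be visible first */
        vr->used->idx = idx + 1;       /* only now can the guest consume it */
    }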

BTW, Michael, what are your thoughts on the whole reconnect stuff?

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-01 21:04           ` Yuanhan Liu
@ 2016-05-02 10:45             ` Michael S. Tsirkin
  2016-05-02 11:29               ` Marc-André Lureau
  2016-05-02 17:37               ` Yuanhan Liu
  0 siblings, 2 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2016-05-02 10:45 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Marc-André Lureau, QEMU, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Xie, Huawei

On Sun, May 01, 2016 at 02:04:42PM -0700, Yuanhan Liu wrote:
> On Sun, May 01, 2016 at 02:37:19PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote:
> > > > But, I
> > > > would worry first about a backend that crashes that it may corrupt the
> > > > VM memory too...
> > > 
> > > Not quite sure I follow this. But, backend just touches the virtio
> > > related memory, so it will do no harm to the VM?
> > 
> > It crashed so apparently it's not behaving as designed -
> > how can we be sure it does not harm the VM?
> 
> Hi Michael,
> 
> From what I know so far, a crash could corrupt the virtio memory in two
> ways:
> 
> - vring_used_elem might be in stale state when we are in the middle of
>   updating used vring. Say, we might have updated the "id" field, but
>   leaving "len" untouched.
> 
> - vring desc buff might also be in stale state, when we are copying
>   data into it.


- a random write into VM memory due to backend bug corrupts it.

> However, the two issues will not be a real issue, as used->idx is not
> updated in either case. Therefore, the corrupted memory will not be
> processed by the guest. So, no harm I'd say. Or am I missing anything?

Why is the backend crashing? It shouldn't, so maybe this means its
memory is corrupt. That is the claim.

> BTW, Michael, what's your thoughts on the whole reconnect stuff?
> 
> 	--yliu

My thoughts are that we need to split these patchsets up.

There are several issues here:


1. Graceful disconnect
- One should be able to do vmstop, disconnect, connect then vm start
  This looks like a nice intermediate step.
- Why would we always assume it's always remote initiating the disconnect?
  Monitor commands for disconnecting seem called for.
  

2. Unexpected disconnect
- If buffers are processed in order or flushed before socket close,
  then on disconnect status can be recovered
  from ring alone. If that is of interest, we might want to add
  a protocol flag for that. F_DISCONNECT_FLUSH ? Without this,
  only graceful disconnect or reset with guest's help can be supported.
- As Marc-André said, without graceful shutdown it is not enough to
  handle ring state though.  We must handle socket getting disconnected
  in the middle of send/receive.  While it's more work,
  it does seem like a nice interface to support.
- I understand the concern that existing guests do not handle NEED_RESET
  status. One way to fix this would be a new feature bit. If NEED_RESET not
  handled, request hot-unplug instead.

3. Running while disconnected
- At the moment, we wait on vm start for remote to connect,
  if we support disconnecting backend without stopping
  we probably should also support starting it without waiting
  for connection
- Guests expect tx buffers to be used in a timely manner, thus:
- If vm is running we must use them in qemu, maybe discarding packets
  in the process.
  There already are commands for link down, I'm not sure there's value
  in doing this automatically in qemu.
- Alternatively, we should also have an option to stop vm automatically (like we do on
  disk errors) to reduce number of dropped packets.

4. Reconnecting
- When acting as a server, we might want an option to go back to
  listening state, but it should not be the only option,
  there should be a monitor command for moving it back to
  listening state.
- When acting as a client, auto-reconnect might be handy sometimes, but should not be the only
  option, there should be a way to manually request connect, possibly to
  a different target, so a monitor command for re-connecting seems called for.
- We'll also need monitor events for disconnects so management knows it
  must re-connect/restart listening.
- If we stopped VM, there could be an option to restart on reconnect.


-- 
MST


* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-02 10:45             ` Michael S. Tsirkin
@ 2016-05-02 11:29               ` Marc-André Lureau
  2016-05-02 12:04                 ` Michael S. Tsirkin
  2016-05-02 17:37               ` Yuanhan Liu
  1 sibling, 1 reply; 42+ messages in thread
From: Marc-André Lureau @ 2016-05-02 11:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yuanhan Liu, QEMU, Ilya Maximets, jonshin, Tetsuya Mukawa, Xie, Huawei

Hi

On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 1. Graceful disconnect
> - One should be able to do vmstop, disconnect, connect then vm start
>   This looks like a nice intermediate step.
> - Why would we always assume it's always remote initiating the disconnect?
>   Monitor commands for disconnecting seem called for.

Those two solutions involve VM management. This looks more complex to
communicate/synchronize vhost-user backend & vm management & qemu. The
use case I cover is request from the backend to shutdown, because
that's what the users wanted (and it is already possible to stop the
backend and disconnect it from qemu, we would only need to know when,
with a new command..)

>
> 2. Unexpected disconnect
> - If buffers are processed in order or flushed before socket close,
>   then on disconnect status can be recovered
>   from ring alone. If that is of interest, we might want to add
>   a protocol flag for that. F_DISCONNECT_FLUSH ? Without this,
>   only graceful disconnect or reset with guest's help can be supported.

(doing all this, at this point, I don't see much difference with
having an explicit shutdown)

> - As Marc-André said, without graceful shutdown it is not enough to
>   handle ring state though.  We must handle socket getting disconnected
>   in the middle of send/receive.  While it's more work,
>   it does seem like a nice interface to support.
> - I understand the concern that existing guests do not handle NEED_RESET
>   status. One way to fix this would be a new feature bit. If NEED_RESET not
>   handled, request hot-unplug instead.
>
> 3. Running while disconnected
> - At the moment, we wait on vm start for remote to connect,
>   if we support disconnecting backend without stopping
>   we probably should also support starting it without waiting
>   for connection

That's what Tetsuya proposed in his initial series, but handling the
flags was quite tedious. I think this can easily be considered a
separate enhancement. What I proposed is to keep waiting for the
initial connect, and check that the flags remain compatible on reconnect.

> - Guests expect tx buffers to be used in a timely manner, thus:
> - If vm is running we must use them in qemu, maybe discarding packets
>   in the process.
>   There already are commands for link down, I'm not sure there's value
>   in doing this automatically in qemu.

Testing doesn't show such buffer issues when the link is down (this
can be tested with vubr example above)

> - Alternatively, we should also have an option to stop vm automatically (like we do on
>   disk errors) to reduce number of dropped packets.

Why not, we would need to know if this is actually useful.

>
> 4. Reconnecting
> - When acting as a server, we might want an option to go back to
>   listening state, but it should not be the only option,
>   there should be a monitor command for moving it back to
>   listening state.
> - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only
>   option, there should be a way to manually request connect, possibly to
>   a different target, so a monitor command for re-connecting seems called for.
> - We'll also need monitor events for disconnects so management knows it
>   must re-connect/restart listening.
> - If we stopped VM, there could be an option to restart on reconnect.

That's all involving a third party, adding complexity but the benefit
isn't so clear.

-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-02 11:29               ` Marc-André Lureau
@ 2016-05-02 12:04                 ` Michael S. Tsirkin
  2016-05-02 17:50                   ` Yuanhan Liu
  2016-05-04 13:16                   ` Marc-André Lureau
  0 siblings, 2 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2016-05-02 12:04 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Yuanhan Liu, QEMU, Ilya Maximets, jonshin, Tetsuya Mukawa, Xie, Huawei

On Mon, May 02, 2016 at 01:29:18PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 1. Graceful disconnect
> > - One should be able to do vmstop, disconnect, connect then vm start
> >   This looks like a nice intermediate step.
> > - Why would we always assume it's always remote initiating the disconnect?
> >   Monitor commands for disconnecting seem called for.
> 
> Those two solutions involve VM management. This looks more complex to
> communicate/synchronize vhost-user backend & vm management & qemu. The
> use case I cover is request from the backend to shutdown,

Right, but flushing buffers + closing the socket looks like
a cleaner interface than a ton of messages going back and forth.

> because
> that's what the users wanted (and it is already possible to stop the
> backend and disconnect it from qemu, we would only need to know when,
> with a new command..)

You assume the backend has a monitor interface to disconnect though.
That's not a given.

> >
> > 2. Unexpected disconnect
> > - If buffers are processed in order or flushed before socket close,
> >   then on disconnect status can be recovered
> >   from ring alone. If that is of interest, we might want to add
> >   a protocol flag for that. F_DISCONNECT_FLUSH ? Without this,
> >   only graceful disconnect or reset with guest's help can be supported.
> 
> (doing all this, at this point, I don't see much difference with
> having an explicit shutdown)

With graceful shutdown we implicitly request flush when we ask
backend to stop.

> > - As Marc-André said, without graceful shutdown it is not enough to
> >   handle ring state though.  We must handle socket getting disconnected
> >   in the middle of send/receive.  While it's more work,
> >   it does seem like a nice interface to support.
> > - I understand the concern that existing guests do not handle NEED_RESET
> >   status. One way to fix this would be a new feature bit. If NEED_RESET not
> >   handled, request hot-unplug instead.
> >
> > 3. Running while disconnected
> > - At the moment, we wait on vm start for remote to connect,
> >   if we support disconnecting backend without stopping
> >   we probably should also support starting it without waiting
> >   for connection
> 
> That's what Tetsuya proposed in his initial series, but handling the
> flags was quite tedious.

That would be up to management. E.g. let backend export the list
of flags it supports in some file, and apply to QEMU.

> I think this can easily be considered a
> separate enhancement. What I proposed is to keep waiting for the
> initial connect, and check that the flags remain compatible on reconnect.

Seems asymmetrical unless we stop the vm.

> > - Guests expect tx buffers to be used in a timely manner, thus:
> > - If vm is running we must use them in qemu, maybe discarding packets
> >   in the process.
> >   There already are commands for link down, I'm not sure there's value
> >   in doing this automatically in qemu.
> 
> Testing doesn't show such buffer issues when the link is down (this
> can be tested with vubr example above)

Not enough testing then - it's a race condition: buffers can be sent
before link down.

> > - Alternatively, we should also have an option to stop vm automatically (like we do on
> >   disk errors) to reduce number of dropped packets.
> 
> Why not, we would need to know if this is actually useful.
> 
> >
> > 4. Reconnecting
> > - When acting as a server, we might want an option to go back to
> >   listening state, but it should not be the only option,
> >   there should be a monitor command for moving it back to
> >   listening state.
> > - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only
> >   option, there should be a way to manually request connect, possibly to
> >   a different target, so a monitor command for re-connecting seems called for.
> > - We'll also need monitor events for disconnects so management knows it
> >   must re-connect/restart listening.
> > - If we stopped VM, there could be an option to restart on reconnect.
> 
> That's all involving a third party, adding complexity but the benefit
> isn't so clear.

It's rather clear to me. Let's assume you want to restart bridge,
so you trigger disconnect.
If qemu auto-reconnects there's a race as it might re-connect
to the old bridge.
Management is required to make this robust, auto-reconnect
is handy for people bypassing management.

> -- 
> Marc-André Lureau


* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-02 10:45             ` Michael S. Tsirkin
  2016-05-02 11:29               ` Marc-André Lureau
@ 2016-05-02 17:37               ` Yuanhan Liu
  1 sibling, 0 replies; 42+ messages in thread
From: Yuanhan Liu @ 2016-05-02 17:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, QEMU, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Xie, Huawei

On Mon, May 02, 2016 at 01:45:31PM +0300, Michael S. Tsirkin wrote:
> On Sun, May 01, 2016 at 02:04:42PM -0700, Yuanhan Liu wrote:
> > On Sun, May 01, 2016 at 02:37:19PM +0300, Michael S. Tsirkin wrote:
> > > On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote:
> > > > > But, I
> > > > > would worry first about a backend that crashes that it may corrupt the
> > > > > VM memory too...
> > > > 
> > > > Not quite sure I follow this. But, backend just touches the virtio
> > > > related memory, so it will do no harm to the VM?
> > > 
> > > It crashed so apparently it's not behaving as designed -
> > > how can we be sure it does not harm the VM?
> > 
> > Hi Michael,
> > 
> > From what I know so far, a crash could corrupt the virtio memory in two
> > ways:
> > 
> > - vring_used_elem might be in stale state when we are in the middle of
> >   updating used vring. Say, we might have updated the "id" field, but
> >   leaving "len" untouched.
> > 
> > - vring desc buff might also be in stale state, when we are copying
> >   data into it.
> 
> 
> - a random write into VM memory due to backend bug corrupts it.
> 
> > However, the two issues will not be a real issue, as used->idx is not
> > updated in either case. Therefore, the corrupted memory will not be
> > processed by the guest. So, no harm I'd say. Or am I missing anything?
> 
> Why is the backend crashing? It shouldn't, so maybe this means its
> memory is corrupt. That is the claim.

As long as the virtio memory is not corrupted (or is only corrupted in
the ways above), there would be no issue. But, yes, you made a good
point: we have no guarantee that it wasn't virtio memory corruption
that caused the crash.

> > BTW, Michael, what's your thoughts on the whole reconnect stuff?
> > 
> > 	--yliu
> 
> My thoughts are that we need to split these patchsets up.
> 
> There are several issues here:
> 
> 
> 1. Graceful disconnect
> - One should be able to do vmstop, disconnect, connect then vm start
>   This looks like a nice intermediate step.
> - Why would we always assume it's always remote initiating the disconnect?
>   Monitor commands for disconnecting seem called for.

A monitor command is a good suggestion. But I'm wondering why vmstop is
necessary. Basically, a disconnect is like unplugging the cable from a
physical NIC; we don't need to pause the VM before unplugging.

> 
> 2. Unexpected disconnect
> - If buffers are processed in order or flushed before socket close,
>   then on disconnect status can be recovered
>   from ring alone. If that is of interest, we might want to add
>   a protocol flag for that. F_DISCONNECT_FLUSH ?

Sorry, how is this flag supposed to work here?

> Without this,
>   only graceful disconnect or reset with guest's help can be supported.
> - As Marc-André said, without graceful shutdown it is not enough to
>   handle ring state though.  We must handle socket getting disconnected
>   in the middle of send/receive.  While it's more work,
>   it does seem like a nice interface to support.

Same as above: what can the backend (or QEMU) do in this case without
the guest's (reset) help?


> - I understand the concern that existing guests do not handle NEED_RESET
>   status. One way to fix this would be a new feature bit. If NEED_RESET not
>   handled,

I could check the code by myself, but I'm thinking it might be trivial
for you to answer my question: how do we know that NEED_RESET is not
handled?

> request hot-unplug instead.

Same as above, may I know how to request a hot-unplug?

> 
> 3. Running while disconnected
> - At the moment, we wait on vm start for remote to connect,
>   if we support disconnecting backend without stopping
>   we probably should also support starting it without waiting
>   for connection

Agreed. I guess that would answer my first question: we don't actually
need to do vmstop before disconnect.

	--yliu

> - Guests expect tx buffers to be used in a timely manner, thus:
> - If vm is running we must use them in qemu, maybe discarding packets
>   in the process.
>   There already are commands for link down, I'm not sure there's value
>   in doing this automatically in qemu.
> - Alternatively, we should also have an option to stop vm automatically (like we do on
>   disk errors) to reduce number of dropped packets.
> 
> 4. Reconnecting
> - When acting as a server, we might want an option to go back to
>   listening state, but it should not be the only option,
>   there should be a monitor command for moving it back to
>   listening state.
> - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only
>   option, there should be a way to manually request connect, possibly to
>   a different target, so a monitor command for re-connecting seems called for.
> - We'll also need monitor events for disconnects so management knows it
>   must re-connect/restart listening.
> - If we stopped VM, there could be an option to restart on reconnect.
> 
> 
> -- 
> MST


* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-02 12:04                 ` Michael S. Tsirkin
@ 2016-05-02 17:50                   ` Yuanhan Liu
  2016-05-04 13:16                   ` Marc-André Lureau
  1 sibling, 0 replies; 42+ messages in thread
From: Yuanhan Liu @ 2016-05-02 17:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, QEMU, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Xie, Huawei

On Mon, May 02, 2016 at 03:04:30PM +0300, Michael S. Tsirkin wrote:
> On Mon, May 02, 2016 at 01:29:18PM +0200, Marc-André Lureau wrote:
> > Hi
> > 
> > On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > 1. Graceful disconnect
> > > - One should be able to do vmstop, disconnect, connect then vm start
> > >   This looks like a nice intermediate step.
> > > - Why would we always assume it's always remote initiating the disconnect?
> > >   Monitor commands for disconnecting seem called for.
> > 
> > Those two solutions involve VM management. This looks more complex to
> > communicate/synchronize vhost-user backend & vm management & qemu. The
> > use case I cover is request from the backend to shutdown,
> 
> Right, but flushing buffers + closing the socket looks like
> a cleaner interface than a ton of messages going back and forth.

I'd agree with Michael on that. It needs more care when dealing with
the two stream buffers: you can't quit the backend right after the
shutdown request; instead, you have to wait for the ACK from QEMU.
Making the request from QEMU avoids that.

	--yliu


* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-02 12:04                 ` Michael S. Tsirkin
  2016-05-02 17:50                   ` Yuanhan Liu
@ 2016-05-04 13:16                   ` Marc-André Lureau
  2016-05-04 19:13                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 42+ messages in thread
From: Marc-André Lureau @ 2016-05-04 13:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yuanhan Liu, QEMU, Ilya Maximets, jonshin, Tetsuya Mukawa, Xie, Huawei

Hi

On Mon, May 2, 2016 at 2:04 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, May 02, 2016 at 01:29:18PM +0200, Marc-André Lureau wrote:
>> Hi
>>
>> On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > 1. Graceful disconnect
>> > - One should be able to do vmstop, disconnect, connect then vm start
>> >   This looks like a nice intermediate step.
>> > - Why would we always assume it's always remote initiating the disconnect?
>> >   Monitor commands for disconnecting seem called for.
>>
>> Those two solutions involve VM management. This looks more complex to
>> communicate/synchronize vhost-user backend & vm management & qemu. The
>> use case I cover is request from the backend to shutdown,
>
> Right, but flushing buffers + closing the socket looks like
> a cleaner interface than a ton of messages going back and forth.

What do you mean by "a ton of messages"? It adds
VHOST_USER_SET_SLAVE_FD (generic) and VHOST_USER_SLAVE_SHUTDOWN. The
amount of work to flush and close is the same regardless. I figured
out later that if we refactor the vhost-user socket handling, we may
be able to accept requests from the main channel socket, without an
extra "slave channel".
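
(For concreteness, a backend-side sketch of the proposed slave-channel
shutdown is below. The framing follows the usual vhost-user header
(request/flags/size); the request number, the ack convention and the
helper are illustrative assumptions, not the final protocol.)

#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define VHOST_USER_SLAVE_SHUTDOWN  1      /* placeholder request number */
#define VHOST_USER_VERSION         0x1

typedef struct VhostUserMsg {
    uint32_t request;
    uint32_t flags;     /* protocol version bits */
    uint32_t size;      /* payload size, 0 for this request */
    uint64_t payload;
} __attribute__((packed)) VhostUserMsg;

/* hypothetical helper: ask QEMU for a managed shutdown over the slave
 * channel and wait for its ack before tearing anything down */
static int slave_request_shutdown(int slave_fd)
{
    VhostUserMsg msg = {
        .request = VHOST_USER_SLAVE_SHUTDOWN,
        .flags   = VHOST_USER_VERSION,
        .size    = 0,
    };

    /* the backend is expected to have flushed its rings already */
    if (write(slave_fd, &msg, offsetof(VhostUserMsg, payload)) < 0)
        return -1;

    /* block until QEMU acknowledges; only then is it safe to close the
     * sockets and exit (or sit idle waiting to be reconnected) */
    if (read(slave_fd, &msg, sizeof(msg)) <= 0)
        return -1;

    return 0;
}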

>
>> because
>> that's what the users wanted (and it is already possible to stop the
>> backend and disconnect it from qemu, we would only need to know when,
>> with a new command..)
>
> You assume the backend has a monitor interface to disconnect though.
> That's not a given.

What do you mean? The backend must indeed have a way to request to
close/quit. How the backend gets this information (via signals or
other means) is out of scope. It's external; getting this information
from the VM management layer would be the same (someone has to trigger
it somehow).

>> > 3. Running while disconnected
>> > - At the moment, we wait on vm start for remote to connect,
>> >   if we support disconnecting backend without stopping
>> >   we probably should also support starting it without waiting
>> >   for connection
>>
>> That's what Tetsuya proposed in his initial series, but handling the
>> flags was quite tedious.
>
> That would be up to management. E.g. let backend export the list
> of flags it supports in some file, and apply to QEMU.

That makes me worry that such unfriendly connection details have to
spread outside of vhost-user to the VM management layer, making usage
& maintenance harder for no clear benefit. It's similar to the concern
you have with "the backend has a monitor interface"; here, "the
backend must have an introspection interface", or at least export
vhost-user details somehow.

>
>> I think this can easily be considered a
>> separate enhancement. What I proposed is to keep waiting for the
>> initial connect, and check that the flags remain compatible on reconnect.
>
> Seems asymmetrical unless we stop the vm.

That's the point, there will be time with and without the backend if
we keep the VM running.

>> > - Guests expect tx buffers to be used in a timely manner, thus:
>> > - If vm is running we must use them in qemu, maybe discarding packets
>> >   in the process.
>> >   There already are commands for link down, I'm not sure there's value
>> >   in doing this automatically in qemu.
>>
>> Testing doesn't show such buffer issues when the link is down (this
>> can be tested with vubr example above)
>
> Not enough testing then - it's a race condition: buffers can be sent
> before link down.

Ok, I'll do more testing. In all cases, looks reasonable to discard.

>
>> > - Alternatively, we should also have an option to stop vm automatically (like we do on
>> >   disk errors) to reduce number of dropped packets.
>>
>> Why not, we would need to know if this is actually useful.
>>
>> >
>> > 4. Reconnecting
>> > - When acting as a server, we might want an option to go back to
>> >   listening state, but it should not be the only option,
>> >   there should be a monitor command for moving it back to
>> >   listening state.
>> > - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only
>> >   option, there should be a way to manually request connect, possibly to
>> >   a different target, so a monitor command for re-connecting seems called for.
>> > - We'll also need monitor events for disconnects so management knows it
>> >   must re-connect/restart listening.
>> > - If we stopped VM, there could be an option to restart on reconnect.
>>
>> That's all involving a third party, adding complexity but the benefit
>> isn't so clear.
>
> It's rather clear to me. Let's assume you want to restart bridge,
> so you trigger disconnect.
> If qemu auto-reconnects there's a race as it might re-connect
> to the old bridge.

I would say that race can trivially be avoided, so it's a backend bug.

> Management is required to make this robust, auto-reconnect
> is handy for people bypassing management.

tbh, I don't like autoreconnect. My previous series didn't include
this and assumed the feature would be supported only when qemu is
configured to be the server. I added reconnect upon request by users.

-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-04 13:16                   ` Marc-André Lureau
@ 2016-05-04 19:13                     ` Michael S. Tsirkin
  2016-05-05  3:44                       ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2016-05-04 19:13 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Yuanhan Liu, QEMU, Ilya Maximets, jonshin, Tetsuya Mukawa, Xie, Huawei

On Wed, May 04, 2016 at 03:16:44PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Mon, May 2, 2016 at 2:04 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, May 02, 2016 at 01:29:18PM +0200, Marc-André Lureau wrote:
> >> Hi
> >>
> >> On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > 1. Graceful disconnect
> >> > - One should be able to do vmstop, disconnect, connect then vm start
> >> >   This looks like a nice intermediate step.
> >> > - Why would we always assume it's always remote initiating the disconnect?
> >> >   Monitor commands for disconnecting seem called for.
> >>
> >> Those two solutions involve VM management. This looks more complex to
> >> communicate/synchronize vhost-user backend & vm management & qemu. The
> >> use case I cover is request from the backend to shutdown,
> >
> > Right, but flushing buffers + closing the socket looks like
> > a cleaner interface than a ton of messages going back and forth.
> 
> What do you mean by "a ton of messages"? It adds
> VHOST_USER_SET_SLAVE_FD (generic) and VHOST_USER_SLAVE_SHUTDOWN. The
> amount of work to flush and close is the same regardless. I figured
> out later that if we refactor the vhost-user socket handling, we may
> be able to accept requests from the main channel socket, without an
> extra "slave channel".
> 
> >
> >> because
> >> that's what the users wanted (and it is already possible to stop the
> >> backend and disconnect it from qemu, we would only need to know when,
> >> with a new command..)
> >
> > You assume the backend has a monitor interface to disconnect though.
> > That's not a given.
> 
> What do you mean? The backend must indeed have a way to request to
> close/quit. How the backend gets this information (via signals or
> other means) is out of scope. It's external; getting this information
> from the VM management layer would be the same (someone has to trigger
> it somehow).

Correct. So for symmetry if nothing else, besides handling
slave close request, we should be able to initiate close
from qemu with a new command, and get event when not connected.

Afterwards client can be killed with -9 as it's no longer
connected to qemu.


> >> > 3. Running while disconnected
> >> > - At the moment, we wait on vm start for remote to connect,
> >> >   if we support disconnecting backend without stopping
> >> >   we probably should also support starting it without waiting
> >> >   for connection
> >>
> >> That's what Tetsuya proposed in his initial series, but handling the
> >> flags was quite tedious.
> >
> > That would be up to management. E.g. let backend export the list
> > of flags it supports in some file, and apply to QEMU.
> 
> That makes me worry that such unfriendly connection details have to
> spread outside of vhost-user to the VM management layer, making usage
> & maintenance harder for no clear benefit. It's similar to the concern
> you have with "the backend has a monitor interface"; here, "the
> backend must have an introspection interface", or at least export
> vhost-user details somehow.

How can we start VM before backend connects otherwise?
Have better ideas?


> >
> >> I think this can easily be considered a
> >> separate enhancement. What I proposed is to keep waiting for the
> >> initial connect, and check that the flags remain compatible on reconnect.
> >
> > Seems asymmetrical unless we stop the vm.
> 
> That's the point, there will be time with and without the backend if
> we keep the VM running.
> 
> >> > - Guests expect tx buffers to be used in a timely manner, thus:
> >> > - If vm is running we must use them in qemu, maybe discarding packets
> >> >   in the process.
> >> >   There already are commands for link down, I'm not sure there's value
> >> >   in doing this automatically in qemu.
> >>
> >> Testing doesn't show such buffer issues when the link is down (this
> >> can be tested with vubr example above)
> >
> > Not enough testing then - it's a race condition: buffers can be sent
> > before link down.
> 
> Ok, I'll do more testing. In all cases, looks reasonable to discard.

Discard has some issues - for example processing ring in qemu
sometimes exposes us to more security risks.

> >
> >> > - Alternatively, we should also have an option to stop vm automatically (like we do on
> >> >   disk errors) to reduce number of dropped packets.
> >>
> >> Why not, we would need to know if this is actually useful.
> >>
> >> >
> >> > 4. Reconnecting
> >> > - When acting as a server, we might want an option to go back to
> >> >   listening state, but it should not be the only option,
> >> >   there should be a monitor command for moving it back to
> >> >   listening state.
> >> > - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only
> >> >   option, there should be a way to manually request connect, possibly to
> >> >   a different target, so a monitor command for re-connecting seems called for.
> >> > - We'll also need monitor events for disconnects so management knows it
> >> >   must re-connect/restart listening.
> >> > - If we stopped VM, there could be an option to restart on reconnect.
> >>
> >> That's all involving a third party, adding complexity but the benefit
> >> isn't so clear.
> >
> > It's rather clear to me. Let's assume you want to restart bridge,
> > so you trigger disconnect.
> > If qemu auto-reconnects there's a race as it might re-connect
> > to the old bridge.
> 
> I would say that race can trivially be avoided, so it's a backend bug.

How do you avoid it?

> > Management is required to make this robust, auto-reconnect
> > is handy for people bypassing management.
> 
> tbh, I don't like autoreconnect. My previous series didn't include
> this and assumed the feature would be supported only when qemu is
> configured to be the server. I added reconnect upon request by users.

I don't have better solutions so OK I guess.

> -- 
> Marc-André Lureau


* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-04 19:13                     ` Michael S. Tsirkin
@ 2016-05-05  3:44                       ` Yuanhan Liu
  2016-05-05 13:42                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2016-05-05  3:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, QEMU, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Xie, Huawei

Hello,

On Wed, May 04, 2016 at 10:13:49PM +0300, Michael S. Tsirkin wrote:
> How do you avoid it?
> 
> > > Management is required to make this robust, auto-reconnect
> > > is handy for people bypassing management.
> > 
> > tbh, I don't like autoreconnect. My previous series didn't include
> > this and assumed the feature would be supported only when qemu is
> > configured to be the server. I added reconnect upon request by users.
> 
> I don't have better solutions so OK I guess.

Yes, it's a request from me :)
Well, a few others have requested it as well.

The reason I asked for this is that, so far, we have only one
vhost-user frontend, and that is QEMU. But we may have many backends;
I'm aware of 4 at the time of writing, including the vubr from QEMU.
While we could do the vhost-user client and reconnect implementation
in all backends, it's clear that implementing it in the only frontend
(QEMU) brings more benefit.

OTOH, I could implement the DPDK vhost-user side as a client and try
reconnect there, if that would make things a bit easier.

	--yliu


* Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
  2016-05-05  3:44                       ` Yuanhan Liu
@ 2016-05-05 13:42                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2016-05-05 13:42 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Marc-André Lureau, QEMU, Ilya Maximets, jonshin,
	Tetsuya Mukawa, Xie, Huawei

On Wed, May 04, 2016 at 08:44:37PM -0700, Yuanhan Liu wrote:
> Hello,
> 
> On Wed, May 04, 2016 at 10:13:49PM +0300, Michael S. Tsirkin wrote:
> > How do you avoid it?
> > 
> > > > Management is required to make this robust, auto-reconnect
> > > > is handy for people bypassing management.
> > > 
> > > tbh, I don't like autoreconnect. My previous series didn't include
> > > this and assumed the feature would be supported only when qemu is
> > > configured to be the server. I added reconnect upon request by users.
> > 
> > I don't have better solutions so OK I guess.
> 
> Yes, it's a request from me :)
> Well, a few others have requested it as well.
> 
> The reason I asked for this is that, so far, we have only one
> vhost-user frontend, and that is QEMU. But we may have many backends;
> I'm aware of 4 at the time of writing, including the vubr from QEMU.
> While we could do the vhost-user client and reconnect implementation
> in all backends, it's clear that implementing it in the only frontend
> (QEMU) brings more benefit.
> 
> OTOH, I could implement the DPDK vhost-user side as a client and try
> reconnect there, if that would make things a bit easier.
> 
> 	--yliu

In my opinion, if the slave initiates disconnecting then the slave should
also initiate connecting.

In a sense, autoretry of connections is not a vhost-user
feature. It's a general chardev feature. It also does not
have to be related to disconnect: retrying on first connect
failure makes exactly as much sense.

-- 
MST


Thread overview: 42+ messages
2016-04-01 11:16 [Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 01/18] tests: append i386 tests marcandre.lureau
2016-04-01 20:30   ` [Qemu-devel] [PATCH 01/18 for-2.6] " Eric Blake
2016-04-01 11:16 ` [Qemu-devel] [PATCH 02/18] char: lower reconnect error to trace event marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 03/18] char: use a trace for when the char is waiting marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 04/18] char: add wait support for reconnect marcandre.lureau
2016-04-28  4:33   ` Yuanhan Liu
2016-04-01 11:16 ` [Qemu-devel] [PATCH 05/18] vhost-user: check reconnect comes with wait marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 06/18] vhost-user: add ability to know vhost-user backend disconnection marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 07/18] vhost: add vhost_dev stop callback marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 08/18] vhost-user: add vhost_user to hold the chr marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 09/18] qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 10/18] vhost-user: add slave-fd support marcandre.lureau
2016-04-01 20:33   ` Eric Blake
2016-04-01 11:16 ` [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support marcandre.lureau
2016-04-13  2:49   ` Yuanhan Liu
2016-04-13  9:51     ` Marc-André Lureau
2016-04-13 17:32       ` Yuanhan Liu
2016-04-13 21:43         ` Marc-André Lureau
2016-04-13 21:57           ` Yuanhan Liu
2016-04-28  5:23   ` Yuanhan Liu
2016-04-29 10:40     ` Marc-André Lureau
2016-04-29 17:48       ` Yuanhan Liu
2016-05-01 11:37         ` Michael S. Tsirkin
2016-05-01 21:04           ` Yuanhan Liu
2016-05-02 10:45             ` Michael S. Tsirkin
2016-05-02 11:29               ` Marc-André Lureau
2016-05-02 12:04                 ` Michael S. Tsirkin
2016-05-02 17:50                   ` Yuanhan Liu
2016-05-04 13:16                   ` Marc-André Lureau
2016-05-04 19:13                     ` Michael S. Tsirkin
2016-05-05  3:44                       ` Yuanhan Liu
2016-05-05 13:42                         ` Michael S. Tsirkin
2016-05-02 17:37               ` Yuanhan Liu
2016-04-01 11:16 ` [Qemu-devel] [PATCH 12/18] vhost-user: disconnect on start failure marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 13/18] vhost-net: do not crash if backend is not present marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 14/18] vhost-net: save & restore vhost-user acked features marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 15/18] vhost-net: save & restore vring enable state marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 16/18] test: vubr check " marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 17/18] test: start vhost-user reconnect test marcandre.lureau
2016-04-01 11:16 ` [Qemu-devel] [PATCH 18/18] test: add shutdown support vubr test marcandre.lureau
2016-04-13  2:52   ` Yuanhan Liu
