* [RFC PATCH v1 0/7] vhost-user reconnect issues during vhost initialization
@ 2020-04-23 18:39 Dima Stepanov
  2020-04-23 18:39 ` [RFC PATCH v1 1/7] contrib/vhost-user-blk: add option to simulate disconnect on init Dima Stepanov
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Dima Stepanov @ 2020-04-23 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

While testing the vhost-user reconnect functionality we hit several
issues when the vhost-user-blk daemon crashes or disconnects during
vhost initialization. The general scenario is as follows:
  - the vhost start routine is called
  - a vhost write fails due to SIGPIPE
  - this calls the disconnect routine and vhost_dev_cleanup(), which
    zeroes all fields of the vhost_dev structure
  - control returns to the vhost start routine with an error
  - on the failure path the vhost start routine tries to roll back the
    changes using vhost_dev struct fields which were already reset
  - sometimes this leads to a SIGSEGV, sometimes to a SIGABRT
Before revising the vhost-user initialization code, we suggest adding
sanity checks so the code is aware of a possible disconnect event and of
the fact that the vhost_dev structure can be left in an "uninitialized"
state.
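
For illustration, the following is a minimal self-contained sketch of
this failure pattern and of the kind of sanity check we propose. It is
not the actual QEMU code: struct dev_state, cleanup_on_disconnect() and
start_device() are simplified stand-ins for vhost_dev,
vhost_dev_cleanup() and the vhost start path.

  #include <stdio.h>
  #include <string.h>

  /* Simplified stand-in for struct vhost_dev (illustration only). */
  struct dev_state {
      int nvqs;
      void *vqs;              /* per-queue state in the real code */
  };

  /* Models vhost_dev_cleanup(): the disconnect handler zeroes the struct. */
  static void cleanup_on_disconnect(struct dev_state *d)
  {
      memset(d, 0, sizeof(*d));
  }

  static int start_device(struct dev_state *d)
  {
      static int queue_state;

      d->nvqs = 1;
      d->vqs = &queue_state;

      /* A vhost write fails here (SIGPIPE), the disconnect handler runs. */
      cleanup_on_disconnect(d);

      /* Rollback path: without the check d->vqs is already NULL. */
      if (d->vqs) {
          /* ... undo per-queue setup ... */
      } else {
          printf("disconnect detected during init, skipping rollback\n");
      }
      return -1;
  }

  int main(void)
  {
      struct dev_state d = { 0 };

      return start_device(&d) == -1 ? 0 : 1;
  }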

The vhost-user-blk daemon is updated with an additional
"--simulate-disconnect-stage=CASENUM" argument to simulate a disconnect during
vhost device initialization. For instance:
  1. $ ./vhost-user-blk -s ./vhost.sock -b test-img.raw --simulate-disconnect-stage=1
     This command will simulate a disconnect in the SET_VRING_CALL handler.
     In this case the vhost device in QEMU does not have its started field
     set to true.
  2. $ ./vhost-user-blk -s ./vhost.sock -b test-img.raw --simulate-disconnect-stage=2
     This command will simulate a disconnect in the SET_VRING_NUM handler.
     In this case the started field is set to true.
These two cases exercise different parts of QEMU. Also, to trigger different
code paths, the disconnect should be simulated in two ways:
  - before any successful initialization
  - after one successful initialization, simulating disconnects on the
    following reconnects
We also catch a SIGABRT on migration start if the vhost-user daemon
disconnects while the vhost-user set log commands are being exchanged.

Dima Stepanov (7):
  contrib/vhost-user-blk: add option to simulate disconnect on init
  char-socket: return -1 in case of disconnect during tcp_chr_write
  char-socket: initialize reconnect timer only if close is emitted
  vhost: introduce wrappers to set guest notifiers for virtio device
  vhost-user-blk: add mechanism to track the guest notifiers init state
  vhost: check vring address before calling unmap
  vhost: add device started check in migration set log

 backends/cryptodev-vhost.c              |  26 +++++---
 backends/vhost-user.c                   |  16 ++---
 chardev/char-socket.c                   |  18 ++---
 contrib/libvhost-user/libvhost-user.c   |  30 +++++++++
 contrib/libvhost-user/libvhost-user.h   |  13 ++++
 contrib/vhost-user-blk/vhost-user-blk.c |  14 +++-
 hw/block/vhost-user-blk.c               |  23 +++----
 hw/net/vhost_net.c                      |  30 +++++----
 hw/scsi/vhost-scsi-common.c             |  15 ++---
 hw/virtio/vhost-user-fs.c               |  17 ++---
 hw/virtio/vhost-vsock.c                 |  18 +++--
 hw/virtio/vhost.c                       | 115 +++++++++++++++++++++++++++++---
 hw/virtio/virtio.c                      |  13 ++++
 include/hw/virtio/vhost.h               |   5 ++
 include/hw/virtio/virtio.h              |   1 +
 15 files changed, 256 insertions(+), 98 deletions(-)

-- 
2.7.4




* [RFC PATCH v1 1/7] contrib/vhost-user-blk: add option to simulate disconnect on init
  2020-04-23 18:39 [RFC PATCH v1 0/7] vhost-user reconnect issues during vhost initialization Dima Stepanov
@ 2020-04-23 18:39 ` Dima Stepanov
       [not found]   ` <20200422160206.GA30919@localhost.localdomain>
  2020-04-23 18:39 ` [RFC PATCH v1 2/7] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Dima Stepanov @ 2020-04-23 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

Add "--simulate-disconnect-stage" option for the testing purposes.
This option can be used to test the vhost-user reconnect functionality:
  ./vhost-user-blk ... --simulate-disconnect-stage=<CASENUM>
In this case the daemon will "crash" in the middle of the VHOST comands
communication. Case nums are as follows:
  1 - make assert in the handler of the SET_VRING_CALL command
  2 - make assert in the handler of the SET_VRING_NUM command
The main purpose is to test the QEMU reconnect functionality. Such fault
injection should not lead to a QEMU crash and should be handled
gracefully.
Also update the "GOptionEntry entries" definition with the terminating
NULL item required by the API.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 contrib/libvhost-user/libvhost-user.c   | 30 ++++++++++++++++++++++++++++++
 contrib/libvhost-user/libvhost-user.h   | 13 +++++++++++++
 contrib/vhost-user-blk/vhost-user-blk.c | 14 +++++++++++++-
 3 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 3bca996..5215214 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -73,6 +73,14 @@
 #define VHOST_USER_VERSION 1
 #define LIBVHOST_USER_DEBUG 0
 
+/*
+ * Inject fail in different places in daemon. This will trigger different
+ * paths in QEMU. Main purpose is to test the reconnect functionality
+ * during vhost initialization step.
+ */
+#define VHOST_SDISCONNECT_SET_VRING_CALL 1
+#define VHOST_SDISCONNECT_SET_VRING_NUM 2
+
 #define DPRINT(...)                             \
     do {                                        \
         if (LIBVHOST_USER_DEBUG) {              \
@@ -861,6 +869,11 @@ vu_set_vring_num_exec(VuDev *dev, VhostUserMsg *vmsg)
     DPRINT("State.index: %d\n", index);
     DPRINT("State.num:   %d\n", num);
     dev->vq[index].vring.num = num;
+    if (dev->simulate_init_disconnect == VHOST_SDISCONNECT_SET_VRING_NUM) {
+        DPRINT("Simulate vhost daemon crash during initialization.\n");
+        assert(0);
+        return false;
+    }
 
     return false;
 }
@@ -1161,6 +1174,13 @@ vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
 
     DPRINT("u64: 0x%016"PRIx64"\n", vmsg->payload.u64);
 
+    /* Simulate crash during initialization. */
+    if (dev->simulate_init_disconnect == VHOST_SDISCONNECT_SET_VRING_CALL) {
+        DPRINT("Simulate vhost daemon crash during initialization.\n");
+        assert(0);
+        return false;
+    }
+
     if (!vu_check_queue_msg_file(dev, vmsg)) {
         return false;
     }
@@ -2073,6 +2093,16 @@ vu_queue_empty(VuDev *dev, VuVirtq *vq)
     return vring_avail_idx(vq) == vq->last_avail_idx;
 }
 
+/*
+ * Set the flag to simulate the vhost-user daemon crash during
+ * initialization. This is used to test reconnect functionality.
+ */
+void
+vu_simulate_init_disconnect(VuDev *dev, int should_simulate)
+{
+    dev->simulate_init_disconnect = should_simulate;
+}
+
 static bool
 vring_notify(VuDev *dev, VuVirtq *vq)
 {
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index f30394f..9f75e86 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -388,6 +388,9 @@ struct VuDev {
     /* Postcopy data */
     int postcopy_ufd;
     bool postcopy_listening;
+
+    /* Fields to simulate test cases. */
+    int simulate_init_disconnect;
 };
 
 typedef struct VuVirtqElement {
@@ -645,4 +648,14 @@ void vu_queue_get_avail_bytes(VuDev *vdev, VuVirtq *vq, unsigned int *in_bytes,
 bool vu_queue_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned int in_bytes,
                           unsigned int out_bytes);
 
+/**
+ * vu_simulate_init_disconnect:
+ * @dev: a VuDev context
+ * @should_simulate: expected simulation behaviour (true or false)
+ *
+ * Set the flag to simulate the vhost-user daemon crash during
+ * initialization. This is used to test reconnect functionality.
+ */
+void vu_simulate_init_disconnect(VuDev *dev, int should_simulate);
+
 #endif /* LIBVHOST_USER_H */
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 6fd91c7..6ac37ca 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -581,6 +581,7 @@ static char *opt_socket_path;
 static char *opt_blk_file;
 static gboolean opt_print_caps;
 static gboolean opt_read_only;
+static gboolean opt_simulate_disconnect;
 
 static GOptionEntry entries[] = {
     { "print-capabilities", 'c', 0, G_OPTION_ARG_NONE, &opt_print_caps,
@@ -592,7 +593,14 @@ static GOptionEntry entries[] = {
     {"blk-file", 'b', 0, G_OPTION_ARG_FILENAME, &opt_blk_file,
      "block device or file path", "PATH"},
     { "read-only", 'r', 0, G_OPTION_ARG_NONE, &opt_read_only,
-      "Enable read-only", NULL }
+      "Enable read-only", NULL },
+    { "simulate-disconnect-stage", 0, 0, G_OPTION_ARG_INT,
+      &opt_simulate_disconnect,
+      "Simulate disconnect during initialization for the testing purposes.\n"
+      "\t1 - make assert in the handler of the SET_VRING_CALL command\n"
+      "\t2 - make assert in the handler of the SET_VRING_NUM command\n",
+      "CASENUM" },
+    { NULL },
 };
 
 int main(int argc, char **argv)
@@ -656,6 +664,10 @@ int main(int argc, char **argv)
         exit(EXIT_FAILURE);
     }
 
+    /* Set testing flags. */
+    vu_simulate_init_disconnect(&vdev_blk->parent.parent,
+            opt_simulate_disconnect);
+
     g_main_loop_run(vdev_blk->loop);
     g_main_loop_unref(vdev_blk->loop);
     g_option_context_free(context);
-- 
2.7.4




* [RFC PATCH v1 2/7] char-socket: return -1 in case of disconnect during tcp_chr_write
  2020-04-23 18:39 [RFC PATCH v1 0/7] vhost-user reconnect issues during vhost initialization Dima Stepanov
  2020-04-23 18:39 ` [RFC PATCH v1 1/7] contrib/vhost-user-blk: add option to simulate disconnect on init Dima Stepanov
@ 2020-04-23 18:39 ` Dima Stepanov
  2020-04-23 20:10   ` Marc-André Lureau
       [not found]   ` <ca921f6f56104bcbb664424f97558ec3@HE1PR08MB2650.eurprd08.prod.outlook.com>
  2020-04-23 18:39 ` [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if close is emitted Dima Stepanov
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 16+ messages in thread
From: Dima Stepanov @ 2020-04-23 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

During testing of the vhost-user-blk reconnect functionality a QEMU
SIGSEGV was triggered:
 start qemu as:
 x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \
   -object memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \
   -numa node,cpus=0,memdev=ram-node0 \
   -chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \
   -device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm
 start vhost-user-blk daemon:
 ./vhost-user-blk -s ./vhost.sock -b test-img.raw

If vhost-user-blk is killed during the vhost initialization process, for
instance after receiving the VHOST_SET_VRING_CALL command, then QEMU
fails with the following backtrace:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
    at ./hw/virtio/vhost-user.c:260
260         CharBackend *chr = u->user->chr;

 #0  0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
    at ./hw/virtio/vhost-user.c:260
 #1  0x000055555592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
    at ./hw/virtio/vhost-user.c:1645
 #2  0x0000555555925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
    at ./hw/virtio/vhost.c:1490
 #3  0x00005555558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd8f0)
    at ./hw/block/vhost-user-blk.c:429
 #4  0x0000555555920090 in virtio_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd948)
    at ./hw/virtio/virtio.c:3615
 #5  0x0000555555a9779c in device_set_realized (obj=0x7fffef2d51a0, value=true, errp=0x7fffffffdb88)
    at ./hw/core/qdev.c:891
 ...

The problem is that vhost_user_write() doesn't get an error after the
disconnect and goes on to call vhost_user_read(). The tcp_chr_write()
routine should return -1 in case of a disconnect. Indicate an EIO error
if this routine is called in the disconnected state.
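
To illustrate why the caller needs the error value, here is a
simplified, self-contained sketch of the write-then-read pattern. It is
not the actual vhost-user code: chr_write_stub() and send_request() are
illustrative stand-ins for the chardev write path (after this patch) and
for a vhost-user request/reply exchange.

  #include <errno.h>
  #include <stdio.h>

  /*
   * Stand-in for the chardev write path after this patch: it returns -1
   * with errno set to EIO when the socket is already disconnected.
   */
  static int chr_write_stub(const void *buf, int len, int connected)
  {
      if (!connected) {
          errno = EIO;
          return -1;
      }
      return len;
  }

  /* Stand-in for a vhost-user request/reply exchange. */
  static int send_request(int connected)
  {
      char msg[16] = { 0 };
      int ret = chr_write_stub(msg, sizeof(msg), connected);

      if (ret < 0) {
          /*
           * Before the fix a "success" return value led the caller to go
           * on and read the reply from already cleaned up backend state.
           */
          fprintf(stderr, "write failed (%d), skip reading the reply\n", ret);
          return -1;
      }
      /* ... read and process the reply here ... */
      return 0;
  }

  int main(void)
  {
      return send_request(0) == -1 ? 0 : 1;
  }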

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 chardev/char-socket.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 185fe38..c128cca 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
         if (ret < 0 && errno != EAGAIN) {
             if (tcp_chr_read_poll(chr) <= 0) {
                 tcp_chr_disconnect_locked(chr);
-                return len;
+                /* Return an error since we made a disconnect. */
+                return ret;
             } /* else let the read handler finish it properly */
         }
 
         return ret;
     } else {
-        /* XXX: indicate an error ? */
-        return len;
+        /* Indicate an error. */
+        errno = EIO;
+        return -1;
     }
 }
 
-- 
2.7.4




* [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if close is emitted
  2020-04-23 18:39 [RFC PATCH v1 0/7] vhost-user reconnect issues during vhost initialization Dima Stepanov
  2020-04-23 18:39 ` [RFC PATCH v1 1/7] contrib/vhost-user-blk: add option to simulate disconnect on init Dima Stepanov
  2020-04-23 18:39 ` [RFC PATCH v1 2/7] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
@ 2020-04-23 18:39 ` Dima Stepanov
  2020-04-23 19:16   ` Marc-André Lureau
  2020-04-23 18:39 ` [RFC PATCH v1 4/7] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Dima Stepanov @ 2020-04-23 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

During vhost-user reconnect functionality testing the following assert
was hit:
  qemu-system-x86_64: chardev/char-socket.c:125:
  qemu_chr_socket_restart_timer: Assertion `!s->reconnect_timer' failed.
  Aborted (core dumped)
This is observed only if the connection is closed by the vhost-user-blk
daemon during the initialization routine. In this case the
tcp_chr_disconnect_locked() routine is called twice. The first time it is
called from the tcp_chr_write() routine, after getting the SIGPIPE signal.
The second time it is called when the vhost_user_blk_connect() routine
returns an error. In general this looks correct, because the
initialization routine can return an error in many cases.
Fix the tcp_chr_disconnect_locked() routine so that the timer is
restarted only if the close event is emitted.
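
For clarity, below is a small self-contained model of the disconnect
path. It is an illustration only, not the chardev code: chardev_model,
disconnect_locked() and restart_timer() are simplified stand-ins for the
SocketChardev state, tcp_chr_disconnect_locked() and
qemu_chr_socket_restart_timer(). With the timer restart gated on the
close event, the second disconnect call no longer re-arms the timer and
the assert is not hit.

  #include <assert.h>
  #include <stdbool.h>
  #include <stdio.h>

  enum state { ST_CONNECTED, ST_DISCONNECTED };

  struct chardev_model {
      enum state state;
      bool reconnect_timer;       /* models s->reconnect_timer being armed */
      bool reconnect_enabled;     /* models s->reconnect_time != 0 */
  };

  static void restart_timer(struct chardev_model *c)
  {
      assert(!c->reconnect_timer);    /* the assert from the report */
      c->reconnect_timer = true;
  }

  /*
   * Disconnect path with the fix: restart the timer only when we really
   * leave the connected state, i.e. when the close event is emitted.
   */
  static void disconnect_locked(struct chardev_model *c)
  {
      bool was_connected = c->state == ST_CONNECTED;

      c->state = ST_DISCONNECTED;
      if (was_connected && c->reconnect_enabled) {
          restart_timer(c);
      }
  }

  int main(void)
  {
      struct chardev_model c = { ST_CONNECTED, false, true };

      disconnect_locked(&c);      /* first call: from the failed write */
      disconnect_locked(&c);      /* second call: from the failed connect */
      printf("second disconnect did not re-arm the timer\n");
      return 0;
  }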

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 chardev/char-socket.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index c128cca..83ca4d9 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -476,7 +476,7 @@ static void update_disconnected_filename(SocketChardev *s)
 static void tcp_chr_disconnect_locked(Chardev *chr)
 {
     SocketChardev *s = SOCKET_CHARDEV(chr);
-    bool emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED;
+    bool was_connected = s->state == TCP_CHARDEV_STATE_CONNECTED;
 
     tcp_chr_free_connection(chr);
 
@@ -485,11 +485,11 @@ static void tcp_chr_disconnect_locked(Chardev *chr)
                                               chr, NULL, chr->gcontext);
     }
     update_disconnected_filename(s);
-    if (emit_close) {
+    if (was_connected) {
         qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
-    }
-    if (s->reconnect_time) {
-        qemu_chr_socket_restart_timer(chr);
+        if (s->reconnect_time) {
+            qemu_chr_socket_restart_timer(chr);
+        }
     }
 }
 
-- 
2.7.4




* [RFC PATCH v1 4/7] vhost: introduce wrappers to set guest notifiers for virtio device
  2020-04-23 18:39 [RFC PATCH v1 0/7] vhost-user reconnect issues during vhost initialization Dima Stepanov
                   ` (2 preceding siblings ...)
  2020-04-23 18:39 ` [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if close is emitted Dima Stepanov
@ 2020-04-23 18:39 ` Dima Stepanov
  2020-04-23 18:39 ` [RFC PATCH v1 5/7] vhost-user-blk: add mechanism to track the guest notifiers init state Dima Stepanov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Dima Stepanov @ 2020-04-23 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

Introduce new wrappers to set/reset guest notifiers for the virtio
device in the vhost device module:
  vhost_dev_assign_guest_notifiers
    ->set_guest_notifiers(..., ..., true);
  vhost_dev_drop_guest_notifiers
    ->set_guest_notifiers(..., ..., false);
This is a preliminary refactoring step, so that the set_guest_notifiers
method can be called based on the vhost device state.
Update all devices that use vhost to call these wrappers instead of
invoking the method directly.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 backends/cryptodev-vhost.c  | 26 +++++++++++++++-----------
 backends/vhost-user.c       | 16 +++++-----------
 hw/block/vhost-user-blk.c   | 15 +++++----------
 hw/net/vhost_net.c          | 30 +++++++++++++++++-------------
 hw/scsi/vhost-scsi-common.c | 15 +++++----------
 hw/virtio/vhost-user-fs.c   | 17 +++++++----------
 hw/virtio/vhost-vsock.c     | 18 ++++++++----------
 hw/virtio/vhost.c           | 38 ++++++++++++++++++++++++++++++++++++++
 hw/virtio/virtio.c          | 13 +++++++++++++
 include/hw/virtio/vhost.h   |  4 ++++
 include/hw/virtio/virtio.h  |  1 +
 11 files changed, 118 insertions(+), 75 deletions(-)

diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
index 8337c9a..4522195 100644
--- a/backends/cryptodev-vhost.c
+++ b/backends/cryptodev-vhost.c
@@ -169,16 +169,13 @@ vhost_set_vring_enable(CryptoDevBackendClient *cc,
 int cryptodev_vhost_start(VirtIODevice *dev, int total_queues)
 {
     VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     int r, e;
     int i;
     CryptoDevBackend *b = vcrypto->cryptodev;
     CryptoDevBackendVhost *vhost_crypto;
     CryptoDevBackendClient *cc;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(dev)) {
         error_report("binding does not support guest notifiers");
         return -ENOSYS;
     }
@@ -198,9 +195,13 @@ int cryptodev_vhost_start(VirtIODevice *dev, int total_queues)
         }
      }
 
-    r = k->set_guest_notifiers(qbus->parent, total_queues, true);
+    /*
+     * Since all the states are handled by one vhost device,
+     * use the first one in array.
+     */
+    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
+    r = vhost_dev_assign_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
     if (r < 0) {
-        error_report("error binding guest notifier: %d", -r);
         goto err;
     }
 
@@ -232,7 +233,8 @@ err_start:
         vhost_crypto = cryptodev_get_vhost(cc, b, i);
         cryptodev_vhost_stop_one(vhost_crypto, dev);
     }
-    e = k->set_guest_notifiers(qbus->parent, total_queues, false);
+    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
+    e = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
     if (e < 0) {
         error_report("vhost guest notifier cleanup failed: %d", e);
     }
@@ -242,9 +244,6 @@ err:
 
 void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
     CryptoDevBackend *b = vcrypto->cryptodev;
     CryptoDevBackendVhost *vhost_crypto;
@@ -259,7 +258,12 @@ void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)
         cryptodev_vhost_stop_one(vhost_crypto, dev);
     }
 
-    r = k->set_guest_notifiers(qbus->parent, total_queues, false);
+    /*
+     * Since all the states are handled by one vhost device,
+     * use the first one in array.
+     */
+    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
+    r = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
     if (r < 0) {
         error_report("vhost guest notifier cleanup failed: %d", r);
     }
diff --git a/backends/vhost-user.c b/backends/vhost-user.c
index 2bf3406..e116bc6 100644
--- a/backends/vhost-user.c
+++ b/backends/vhost-user.c
@@ -60,15 +60,13 @@ vhost_user_backend_dev_init(VhostUserBackend *b, VirtIODevice *vdev,
 void
 vhost_user_backend_start(VhostUserBackend *b)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret, i ;
 
     if (b->started) {
         return;
     }
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(b->vdev)) {
         error_report("binding does not support guest notifiers");
         return;
     }
@@ -78,9 +76,8 @@ vhost_user_backend_start(VhostUserBackend *b)
         return;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, b->dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier");
         goto err_host_notifiers;
     }
 
@@ -104,7 +101,7 @@ vhost_user_backend_start(VhostUserBackend *b)
     return;
 
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, b->dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&b->dev, b->vdev);
 }
@@ -112,8 +109,6 @@ err_host_notifiers:
 void
 vhost_user_backend_stop(VhostUserBackend *b)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret = 0;
 
     if (!b->started) {
@@ -122,9 +117,8 @@ vhost_user_backend_stop(VhostUserBackend *b)
 
     vhost_dev_stop(&b->dev, b->vdev);
 
-    if (k->set_guest_notifiers) {
-        ret = k->set_guest_notifiers(qbus->parent,
-                                     b->dev.nvqs, false);
+    if (virtio_device_guest_notifiers_initialized(b->vdev)) {
+        ret = vhost_dev_drop_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
         if (ret < 0) {
             error_report("vhost guest notifier cleanup failed: %d", ret);
         }
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 17df533..70d7842 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -109,11 +109,9 @@ const VhostDevConfigOps blk_ops = {
 static int vhost_user_blk_start(VirtIODevice *vdev)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int i, ret;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         error_report("binding does not support guest notifiers");
         return -ENOSYS;
     }
@@ -124,9 +122,8 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
         return ret;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier: %d", -ret);
         goto err_host_notifiers;
     }
 
@@ -163,7 +160,7 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
     return ret;
 
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
     return ret;
@@ -172,17 +169,15 @@ err_host_notifiers:
 static void vhost_user_blk_stop(VirtIODevice *vdev)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         return;
     }
 
     vhost_dev_stop(&s->dev, vdev);
 
-    ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
+    ret = vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
         return;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6b82803..c13b444 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -303,19 +303,15 @@ static void vhost_net_stop_one(struct vhost_net *net,
 int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
                     int total_queues)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+    struct vhost_net *net;
     int r, e, i;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(dev)) {
         error_report("binding does not support guest notifiers");
         return -ENOSYS;
     }
 
     for (i = 0; i < total_queues; i++) {
-        struct vhost_net *net;
-
         net = get_vhost_net(ncs[i].peer);
         vhost_net_set_vq_index(net, i * 2);
 
@@ -328,9 +324,13 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         }
      }
 
-    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
+    /*
+     * Since all the states are handled by one vhost_net device,
+     * use the first one in array.
+     */
+    net = get_vhost_net(ncs[0].peer);
+    r = vhost_dev_assign_guest_notifiers(&net->dev, dev, total_queues * 2);
     if (r < 0) {
-        error_report("Error binding guest notifier: %d", -r);
         goto err;
     }
 
@@ -357,7 +357,8 @@ err_start:
     while (--i >= 0) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
     }
-    e = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
+    net = get_vhost_net(ncs[0].peer);
+    e = vhost_dev_drop_guest_notifiers(&net->dev, dev, total_queues * 2);
     if (e < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
         fflush(stderr);
@@ -369,16 +370,19 @@ err:
 void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
                     int total_queues)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+    struct vhost_net *net;
     int i, r;
 
     for (i = 0; i < total_queues; i++) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
     }
 
-    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
+    /*
+     * Since all the states are handled by one vhost_net device,
+     * use the first one in array.
+     */
+    net = get_vhost_net(ncs[0].peer);
+    r = vhost_dev_drop_guest_notifiers(&net->dev, dev, total_queues * 2);
     if (r < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
         fflush(stderr);
diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index 8ec49d7..8f51ec0 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -29,10 +29,8 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
 {
     int ret, i;
     VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         error_report("binding does not support guest notifiers");
         return -ENOSYS;
     }
@@ -42,9 +40,8 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
         return ret;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier");
         goto err_host_notifiers;
     }
 
@@ -66,7 +63,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
     return ret;
 
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&vsc->dev, vdev);
     return ret;
@@ -75,14 +72,12 @@ err_host_notifiers:
 void vhost_scsi_common_stop(VHostSCSICommon *vsc)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret = 0;
 
     vhost_dev_stop(&vsc->dev, vdev);
 
-    if (k->set_guest_notifiers) {
-        ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
+    if (virtio_device_guest_notifiers_initialized(vdev)) {
+        ret = vhost_dev_drop_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
         if (ret < 0) {
                 error_report("vhost guest notifier cleanup failed: %d", ret);
         }
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 6136768..6b101fc 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -38,12 +38,10 @@ static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 static void vuf_start(VirtIODevice *vdev)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
     int i;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         error_report("binding does not support guest notifiers");
         return;
     }
@@ -54,9 +52,9 @@ static void vuf_start(VirtIODevice *vdev)
         return;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&fs->vhost_dev, vdev,
+            fs->vhost_dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier: %d", -ret);
         goto err_host_notifiers;
     }
 
@@ -79,7 +77,7 @@ static void vuf_start(VirtIODevice *vdev)
     return;
 
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&fs->vhost_dev, vdev, fs->vhost_dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&fs->vhost_dev, vdev);
 }
@@ -87,17 +85,16 @@ err_host_notifiers:
 static void vuf_stop(VirtIODevice *vdev)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         return;
     }
 
     vhost_dev_stop(&fs->vhost_dev, vdev);
 
-    ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
+    ret = vhost_dev_drop_guest_notifiers(&fs->vhost_dev, vdev,
+            fs->vhost_dev.nvqs);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
         return;
diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
index 09b6b07..52489dd 100644
--- a/hw/virtio/vhost-vsock.c
+++ b/hw/virtio/vhost-vsock.c
@@ -75,12 +75,10 @@ static int vhost_vsock_set_running(VHostVSock *vsock, int start)
 static void vhost_vsock_start(VirtIODevice *vdev)
 {
     VHostVSock *vsock = VHOST_VSOCK(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
     int i;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         error_report("binding does not support guest notifiers");
         return;
     }
@@ -91,9 +89,9 @@ static void vhost_vsock_start(VirtIODevice *vdev)
         return;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&vsock->vhost_dev,
+            vdev, vsock->vhost_dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier: %d", -ret);
         goto err_host_notifiers;
     }
 
@@ -123,7 +121,8 @@ static void vhost_vsock_start(VirtIODevice *vdev)
 err_dev_start:
     vhost_dev_stop(&vsock->vhost_dev, vdev);
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&vsock->vhost_dev,
+            vdev, vsock->vhost_dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&vsock->vhost_dev, vdev);
 }
@@ -131,11 +130,9 @@ err_host_notifiers:
 static void vhost_vsock_stop(VirtIODevice *vdev)
 {
     VHostVSock *vsock = VHOST_VSOCK(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         return;
     }
 
@@ -147,7 +144,8 @@ static void vhost_vsock_stop(VirtIODevice *vdev)
 
     vhost_dev_stop(&vsock->vhost_dev, vdev);
 
-    ret = k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, false);
+    ret = vhost_dev_drop_guest_notifiers(&vsock->vhost_dev,
+            vdev, vsock->vhost_dev.nvqs);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
         return;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 01ebe12..fa3da9c 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1419,6 +1419,44 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     virtio_device_release_ioeventfd(vdev);
 }
 
+/*
+ * Assign guest notifiers.
+ * Should be called after vhost_dev_enable_notifiers.
+ */
+int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
+                                     VirtIODevice *vdev, int nvqs)
+{
+    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+    int ret;
+
+    ret = k->set_guest_notifiers(qbus->parent, nvqs, true);
+    if (ret < 0) {
+        error_report("Error binding guest notifier: %d", -ret);
+    }
+
+    return ret;
+}
+
+/*
+ * Drop guest notifiers.
+ * Should be called before vhost_dev_disable_notifiers.
+ */
+int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
+                                   VirtIODevice *vdev, int nvqs)
+{
+    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+    int ret;
+
+    ret = k->set_guest_notifiers(qbus->parent, nvqs, false);
+    if (ret < 0) {
+        error_report("Error reset guest notifier: %d", -ret);
+    }
+
+    return ret;
+}
+
 /* Test and clear event pending status.
  * Should be called after unmask to avoid losing events.
  */
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index b6c8ef5..8a95618 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3812,6 +3812,19 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
     return virtio_bus_ioeventfd_enabled(vbus);
 }
 
+/*
+ * Check if set_guest_notifiers() method is set by the init routine.
+ * Return true if yes, otherwise return false.
+ */
+bool virtio_device_guest_notifiers_initialized(VirtIODevice *vdev)
+{
+    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+    return k->set_guest_notifiers;
+}
+
+
 static const TypeInfo virtio_device_info = {
     .name = TYPE_VIRTIO_DEVICE,
     .parent = TYPE_DEVICE,
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 085450c..4d0d2e2 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -100,6 +100,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
+                                     VirtIODevice *vdev, int nvqs);
+int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
+                                   VirtIODevice *vdev, int nvqs);
 
 /* Test and clear masked event pending status.
  * Should be called after unmask to avoid losing events.
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b69d517..d9a3d72 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -323,6 +323,7 @@ void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
                                                 VirtIOHandleAIOOutput handle_output);
 VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
 VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
+bool virtio_device_guest_notifiers_initialized(VirtIODevice *vdev);
 
 static inline void virtio_add_feature(uint64_t *features, unsigned int fbit)
 {
-- 
2.7.4




* [RFC PATCH v1 5/7] vhost-user-blk: add mechanism to track the guest notifiers init state
  2020-04-23 18:39 [RFC PATCH v1 0/7] vhost-user reconnect issues during vhost initialization Dima Stepanov
                   ` (3 preceding siblings ...)
  2020-04-23 18:39 ` [RFC PATCH v1 4/7] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
@ 2020-04-23 18:39 ` Dima Stepanov
       [not found]   ` <20200422161750.GC31091@localhost.localdomain>
  2020-04-23 18:39 ` [RFC PATCH v1 6/7] vhost: check vring address before calling unmap Dima Stepanov
  2020-04-23 18:39 ` [RFC PATCH v1 7/7] vhost: add device started check in migration set log Dima Stepanov
  6 siblings, 1 reply; 16+ messages in thread
From: Dima Stepanov @ 2020-04-23 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

In the case of vhost-user devices the daemon can be killed at any
moment. Since QEMU supports the reconnect functionality, the guest
notifiers should be reset and disabled after a "disconnect" event. Most
of the issues were found when the "disconnect" event happened during the
vhost device initialization step.
The disconnect event leads to a call of the vhost_dev_cleanup() routine,
which memsets the vhost device structure to 0. Because of this, if the
device was not started (dev.started == false) and the connection is
broken, the set_guest_notifiers method will produce an assertion error.
The connection can also be broken after the dev.started field is set to
true.
A new notifiers_set field is added to the vhost_dev structure to track
the state of the guest notifiers during the initialization process.
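
The idea is the usual "remember what was set up" pattern, sketched below
in a self-contained form. This is an illustration only: the
notifiers_set name follows the patch, but dev_model and the two helper
functions are simplified stand-ins for the vhost wrappers.

  #include <stdbool.h>
  #include <stdio.h>

  struct dev_model {
      bool notifiers_set;             /* mirrors the new vhost_dev field */
  };

  static int assign_guest_notifiers(struct dev_model *d)
  {
      /* ... the bus set_guest_notifiers(..., true) call would run here ... */
      d->notifiers_set = true;
      return 0;
  }

  static int drop_guest_notifiers(struct dev_model *d)
  {
      if (!d->notifiers_set) {
          /*
           * Nothing was assigned (e.g. the disconnect hit before the
           * init finished), so there is nothing to drop.
           */
          return 0;
      }
      /* ... the bus set_guest_notifiers(..., false) call would run here ... */
      d->notifiers_set = false;
      return 0;
  }

  int main(void)
  {
      struct dev_model d = { false };

      drop_guest_notifiers(&d);       /* teardown after an early disconnect */
      assign_guest_notifiers(&d);
      drop_guest_notifiers(&d);
      printf("teardown is safe to call in any state\n");
      return 0;
  }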

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 hw/block/vhost-user-blk.c |  8 ++++----
 hw/virtio/vhost.c         | 11 +++++++++++
 include/hw/virtio/vhost.h |  1 +
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 70d7842..5a3de0f 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -175,7 +175,9 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&s->dev, vdev);
+    if (s->dev.started) {
+        vhost_dev_stop(&s->dev, vdev);
+    }
 
     ret = vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
     if (ret < 0) {
@@ -337,9 +339,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
     }
     s->connected = false;
 
-    if (s->dev.started) {
-        vhost_user_blk_stop(vdev);
-    }
+    vhost_user_blk_stop(vdev);
 
     vhost_dev_cleanup(&s->dev);
 }
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index fa3da9c..ddbdc53 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1380,6 +1380,7 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
             goto fail_vq;
         }
     }
+    hdev->notifiers_set = true;
 
     return 0;
 fail_vq:
@@ -1407,6 +1408,10 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
     int i, r;
 
+    if (!hdev->notifiers_set) {
+        return;
+    }
+
     for (i = 0; i < hdev->nvqs; ++i) {
         r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i,
                                          false);
@@ -1417,6 +1422,8 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
         virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i);
     }
     virtio_device_release_ioeventfd(vdev);
+
+    hdev->notifiers_set = false;
 }
 
 /*
@@ -1449,6 +1456,10 @@ int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
 
+    if (!hdev->notifiers_set) {
+        return 0;
+    }
+
     ret = k->set_guest_notifiers(qbus->parent, nvqs, false);
     if (ret < 0) {
         error_report("Error reset guest notifier: %d", -ret);
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 4d0d2e2..e3711a7 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -90,6 +90,7 @@ struct vhost_dev {
     QLIST_HEAD(, vhost_iommu) iommu_list;
     IOMMUNotifier n;
     const VhostDevConfigOps *config_ops;
+    bool notifiers_set;
 };
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
-- 
2.7.4




* [RFC PATCH v1 6/7] vhost: check vring address before calling unmap
  2020-04-23 18:39 [RFC PATCH v1 0/7] vhost-user reconnect issues during vhost initialization Dima Stepanov
                   ` (4 preceding siblings ...)
  2020-04-23 18:39 ` [RFC PATCH v1 5/7] vhost-user-blk: add mechanism to track the guest notifiers init state Dima Stepanov
@ 2020-04-23 18:39 ` Dima Stepanov
  2020-04-23 18:39 ` [RFC PATCH v1 7/7] vhost: add device started check in migration set log Dima Stepanov
  6 siblings, 0 replies; 16+ messages in thread
From: Dima Stepanov @ 2020-04-23 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

Since a disconnect can happen at any time during initialization, not all
vring buffers (for instance the used vring) may be initialized
successfully. If a buffer was not initialized, the vhost_memory_unmap()
call will lead to a SIGSEGV. Add checks for the vring address values
before calling unmap. Also add an assert() in the vhost_memory_unmap()
routine.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 hw/virtio/vhost.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddbdc53..3ee50c4 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
                                hwaddr len, int is_write,
                                hwaddr access_len)
 {
+    assert(buffer);
+
     if (!vhost_dev_has_iommu(dev)) {
         cpu_physical_memory_unmap(buffer, len, is_write, access_len);
     }
@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
                                                 vhost_vq_index);
     }
 
-    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
-                       1, virtio_queue_get_used_size(vdev, idx));
-    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
-                       0, virtio_queue_get_avail_size(vdev, idx));
-    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
-                       0, virtio_queue_get_desc_size(vdev, idx));
+    /*
+     * Since the vhost-user disconnect can happen during initialization
+     * check if vring was initialized, before making unmap.
+     */
+    if (vq->used) {
+        vhost_memory_unmap(dev, vq->used,
+                           virtio_queue_get_used_size(vdev, idx),
+                           1, virtio_queue_get_used_size(vdev, idx));
+    }
+    if (vq->avail) {
+        vhost_memory_unmap(dev, vq->avail,
+                           virtio_queue_get_avail_size(vdev, idx),
+                           0, virtio_queue_get_avail_size(vdev, idx));
+    }
+    if (vq->desc) {
+        vhost_memory_unmap(dev, vq->desc,
+                           virtio_queue_get_desc_size(vdev, idx),
+                           0, virtio_queue_get_desc_size(vdev, idx));
+    }
 }
 
 static void vhost_eventfd_add(MemoryListener *listener,
-- 
2.7.4




* [RFC PATCH v1 7/7] vhost: add device started check in migration set log
  2020-04-23 18:39 [RFC PATCH v1 0/7] vhost-user reconnect issues during vhost initialization Dima Stepanov
                   ` (5 preceding siblings ...)
  2020-04-23 18:39 ` [RFC PATCH v1 6/7] vhost: check vring address before calling unmap Dima Stepanov
@ 2020-04-23 18:39 ` Dima Stepanov
  6 siblings, 0 replies; 16+ messages in thread
From: Dima Stepanov @ 2020-04-23 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

If a vhost-user daemon is used as a backend for the vhost device, then
we should consider the possibility of a disconnect at any moment. If
such a disconnect happens in the vhost_migration_log() routine, the
vhost device structure will be cleaned up.
At the start of the vhost_migration_log() function there is a check:
  if (!dev->started) {
      dev->log_enabled = enable;
      return 0;
  }
To be consistent with this check, add the same check after calling the
vhost_dev_set_log() routine. In general this helps not to break a
migration due to the assert() message. But it looks like this code
should be revised to handle these errors more carefully.

In the case of a vhost-user device backend the failure paths should take
the state of the device into account. In this case we should skip some
function calls during rollback on the error paths, so as not to get NULL
dereference errors.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 3ee50c4..d5ab96d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
 static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
 {
     int r, i, idx;
+
+    if (!dev->started) {
+        /*
+         * If vhost-user daemon is used as a backend for the
+         * device and the connection is broken, then the vhost_dev
+         * structure will be reset all its values to 0.
+         * Add additional check for the device state.
+         */
+        return -1;
+    }
+
     r = vhost_dev_set_features(dev, enable_log);
     if (r < 0) {
         goto err_features;
@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
     }
     return 0;
 err_vq:
-    for (; i >= 0; --i) {
+    /*
+     * Disconnect with the vhost-user daemon can lead to the
+     * vhost_dev_cleanup() call which will clean up vhost_dev
+     * structure.
+     */
+    for (; dev->started && (i >= 0); --i) {
         idx = dev->vhost_ops->vhost_get_vq_index(dev, dev->vq_index + i);
         vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
                                  dev->log_enabled);
     }
-    vhost_dev_set_features(dev, dev->log_enabled);
+    if (dev->started) {
+        vhost_dev_set_features(dev, dev->log_enabled);
+    }
 err_features:
     return r;
 }
@@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
     } else {
         vhost_dev_log_resize(dev, vhost_get_log_size(dev));
         r = vhost_dev_set_log(dev, true);
-        if (r < 0) {
+        /*
+         * The dev log resize can fail, because of disconnect
+         * with the vhost-user-blk daemon. Check the device
+         * state before calling the vhost_dev_set_log()
+         * function.
+         * Don't return error if device isn't started to be
+         * consistent with the check above.
+         */
+        if (dev->started && r < 0) {
             return r;
         }
     }
@@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 fail_log:
     vhost_log_put(hdev, false);
 fail_vq:
-    while (--i >= 0) {
+    /*
+     * Disconnect with the vhost-user daemon can lead to the
+     * vhost_dev_cleanup() call which will clean up vhost_dev
+     * structure.
+     */
+    while ((--i >= 0) && (hdev->started)) {
         vhost_virtqueue_stop(hdev,
                              vdev,
                              hdev->vqs + i,
-- 
2.7.4




* Re: [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if close is emitted
  2020-04-23 18:39 ` [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if close is emitted Dima Stepanov
@ 2020-04-23 19:16   ` Marc-André Lureau
  2020-04-26  7:26     ` Li Feng
  0 siblings, 1 reply; 16+ messages in thread
From: Marc-André Lureau @ 2020-04-23 19:16 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: Fam Zheng, Kevin Wolf, Stefan Hajnoczi,
	open list:Block layer core, Michael S. Tsirkin, Jason Wang, QEMU,
	Dr. David Alan Gilbert, Raphael Norwitz, Gonglei, Li Feng,
	yc-core, Paolo Bonzini, Max Reitz

Hi

On Thu, Apr 23, 2020 at 8:41 PM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> During vhost-user reconnect functionality testing the following assert
> was hit:
>   qemu-system-x86_64: chardev/char-socket.c:125:
>   qemu_chr_socket_restart_timer: Assertion `!s->reconnect_timer' failed.
>   Aborted (core dumped)

That looks related to "[PATCH 3/4] char-socket: avoid double call
tcp_chr_free_connection"

> This is observed only if the connection is closed by the vhost-user-blk
> daemon during the initialization routine. In this case the
> tcp_chr_disconnect_locked() routine is called twice. First time it is
> called in the tcp_chr_write() routine, after getting the SIGPIPE signal.
> Second time it is called when vhost_user_blk_connect() routine return
> error. In general it looks correct, because the initialization routine
> can return error in many cases.
> The tcp_chr_disconnect_locked() routine could be fixed. The timer will
> be restarted only if the close event is emitted.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>  chardev/char-socket.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index c128cca..83ca4d9 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -476,7 +476,7 @@ static void update_disconnected_filename(SocketChardev *s)
>  static void tcp_chr_disconnect_locked(Chardev *chr)
>  {
>      SocketChardev *s = SOCKET_CHARDEV(chr);
> -    bool emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED;
> +    bool was_connected = s->state == TCP_CHARDEV_STATE_CONNECTED;
>
>      tcp_chr_free_connection(chr);
>
> @@ -485,11 +485,11 @@ static void tcp_chr_disconnect_locked(Chardev *chr)
>                                                chr, NULL, chr->gcontext);
>      }
>      update_disconnected_filename(s);
> -    if (emit_close) {
> +    if (was_connected) {
>          qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
> -    }
> -    if (s->reconnect_time) {
> -        qemu_chr_socket_restart_timer(chr);
> +        if (s->reconnect_time) {
> +            qemu_chr_socket_restart_timer(chr);
> +        }
>      }
>  }
>
> --
> 2.7.4
>
>


-- 
Marc-André Lureau



* Re: [RFC PATCH v1 2/7] char-socket: return -1 in case of disconnect during tcp_chr_write
  2020-04-23 18:39 ` [RFC PATCH v1 2/7] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
@ 2020-04-23 20:10   ` Marc-André Lureau
       [not found]   ` <ca921f6f56104bcbb664424f97558ec3@HE1PR08MB2650.eurprd08.prod.outlook.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Marc-André Lureau @ 2020-04-23 20:10 UTC (permalink / raw)
  To: Dima Stepanov, Anton Nefedov
  Cc: Fam Zheng, Kevin Wolf, Stefan Hajnoczi,
	open list:Block layer core, Michael S. Tsirkin, Jason Wang, QEMU,
	Dr. David Alan Gilbert, Raphael Norwitz, Gonglei, yc-core,
	Paolo Bonzini, Max Reitz

Hi

On Thu, Apr 23, 2020 at 8:43 PM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> During testing of the vhost-user-blk reconnect functionality the qemu
> SIGSEGV was triggered:
>  start qemu as:
>  x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \
>    -object memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \
>    -numa node,cpus=0,memdev=ram-node0 \
>    -chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \
>    -device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm
>  start vhost-user-blk daemon:
>  ./vhost-user-blk -s ./vhost.sock -b test-img.raw
>
> If vhost-user-blk will be killed during the vhost initialization
> process, for instance after getting VHOST_SET_VRING_CALL command, then
> QEMU will fail with the following backtrace:
>
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
>     at ./hw/virtio/vhost-user.c:260
> 260         CharBackend *chr = u->user->chr;
>
>  #0  0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
>     at ./hw/virtio/vhost-user.c:260
>  #1  0x000055555592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
>     at ./hw/virtio/vhost-user.c:1645
>  #2  0x0000555555925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
>     at ./hw/virtio/vhost.c:1490
>  #3  0x00005555558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd8f0)
>     at ./hw/block/vhost-user-blk.c:429
>  #4  0x0000555555920090 in virtio_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd948)
>     at ./hw/virtio/virtio.c:3615
>  #5  0x0000555555a9779c in device_set_realized (obj=0x7fffef2d51a0, value=true, errp=0x7fffffffdb88)
>     at ./hw/core/qdev.c:891
>  ...
>
> The problem is that vhost_user_write doesn't get an error after
> disconnect and try to call vhost_user_read(). The tcp_chr_write()
> routine should return -1 in case of disconnect. Indicate the EIO error
> if this routine is called in the disconnected state.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>  chardev/char-socket.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 185fe38..c128cca 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>          if (ret < 0 && errno != EAGAIN) {
>              if (tcp_chr_read_poll(chr) <= 0) {
>                  tcp_chr_disconnect_locked(chr);
> -                return len;
> +                /* Return an error since we made a disconnect. */
> +                return ret;

Looks ok, but this return was introduced in commit
b0a335e351103bf92f3f9d0bd5759311be8156ac ("qemu-char: socket backend:
disconnect on write error"). It doesn't say why it didn't return -1
though. Anton, could you review? thanks

>              } /* else let the read handler finish it properly */
>          }
>
>          return ret;
>      } else {
> -        /* XXX: indicate an error ? */
> -        return len;
> +        /* Indicate an error. */
> +        errno = EIO;
> +        return -1;
>      }
>  }
>
> --
> 2.7.4
>
>


-- 
Marc-André Lureau


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v1 5/7] vhost-user-blk: add mechanism to track the guest notifiers init state
       [not found]   ` <20200422161750.GC31091@localhost.localdomain>
@ 2020-04-24  9:17     ` Dima Stepanov
  0 siblings, 0 replies; 16+ messages in thread
From: Dima Stepanov @ 2020-04-24  9:17 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: qemu-devel

On Wed, Apr 22, 2020 at 12:17:50PM -0400, Raphael Norwitz wrote:
> There are some problems with this patch. It doesn't apply cleanly.
> 
> Are you sure you’re developing on an up to date master branch?
> 
> On Thu, Apr 23, 2020 at 09:39:36PM +0300, Dima Stepanov wrote:
> > 
> > In the case of vhost-user devices the daemon can be killed at any
> > moment. Since QEMU supports the reconnect functionality, the guest
> > notifiers should be reset and disabled after a "disconnect" event. Most
> > issues were found when the "disconnect" event happened during the vhost
> > device initialization step.
> > The disconnect event leads to a call of the vhost_dev_cleanup()
> > routine, which memsets the whole vhost device structure to 0. Because of
> > this, if the device was not started (dev.started == false) and the
> > connection is broken, the set_guest_notifiers method will produce an
> > assertion error. The connection can also be broken after the dev.started
> > field is set to true.
> > A new notifiers_set field is added to the vhost_dev structure to track
> > the state of the guest notifiers during the initialization process.
> > 
> > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > ---
> >  hw/block/vhost-user-blk.c |  8 ++++----
> >  hw/virtio/vhost.c         | 11 +++++++++++
> >  include/hw/virtio/vhost.h |  1 +
> >  3 files changed, 16 insertions(+), 4 deletions(-)
> > 
> >  @@ -1417,6 +1422,8 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
> 
> I can’t find the function vhost_dev_drop_guest_notifiers() in
> /hw/virtio/vhost.c, or anywhere in the codebase.
> 
> Where does this code come from?

These wrapper functions were introduced in this patch set by the
previous patch:
  [RFC PATCH v1 4/7] vhost: introduce wrappers to set guest notifiers for
virtio device
The problem was found in the vhost-user-blk stack, but it seems to me
that it is the same for all vhost-user devices. That is why I've tried
to suggest a solution that covers the other vhost-user devices, too,
and this was the reason to add the new wrappers.
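
A minimal sketch of the "assign" counterpart to
vhost_dev_drop_guest_notifiers() is below; the exact signature may differ
from the actual patch 4/7 code, only the notifiers_set bookkeeping mirrors
the diff quoted further down:

    /* Sketch only: enable guest notifiers through the virtio bus and
     * remember that they are active, so the disconnect/cleanup path can
     * tell whether dropping them is actually needed. */
    int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
                                         VirtIODevice *vdev, int nvqs)
    {
        BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
        int ret;

        ret = k->set_guest_notifiers(qbus->parent, nvqs, true);
        if (ret == 0) {
            hdev->notifiers_set = true;
        }
        return ret;
    }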

> 
> >          virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i);
> >      }
> >      virtio_device_release_ioeventfd(vdev);
> > +
> > +    hdev->notifiers_set = false;
> >  }
> >  
> >  /*
> > @@ -1449,6 +1456,10 @@ int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
> >      VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int ret;
> >  
> > +    if (!hdev->notifiers_set) {
> > +        return 0;
> > +    }
> > +
> >      ret = k->set_guest_notifiers(qbus->parent, nvqs, false);
> >      if (ret < 0) {
> >          error_report("Error reset guest notifier: %d", -ret);
> > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > index 4d0d2e2..e3711a7 100644
> > --- a/include/hw/virtio/vhost.h
> > +++ b/include/hw/virtio/vhost.h
> > @@ -90,6 +90,7 @@ struct vhost_dev {
> >      QLIST_HEAD(, vhost_iommu) iommu_list;
> >      IOMMUNotifier n;
> >      const VhostDevConfigOps *config_ops;
> > +    bool notifiers_set;
> >  };
> >  
> >  int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> > -- 
> > 2.7.4
> > 
> > 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v1 2/7] char-socket: return -1 in case of disconnect during tcp_chr_write
       [not found]   ` <ca921f6f56104bcbb664424f97558ec3@HE1PR08MB2650.eurprd08.prod.outlook.com>
@ 2020-04-24  9:55     ` Anton Nefedov
  0 siblings, 0 replies; 16+ messages in thread
From: Anton Nefedov @ 2020-04-24  9:55 UTC (permalink / raw)
  To: Marc-André Lureau, Dima Stepanov, Anton Nefedov
  Cc: Fam Zheng, Kevin Wolf, Stefan Hajnoczi,
	open list:Block layer core, Michael S. Tsirkin, Jason Wang, QEMU,
	Dr. David Alan Gilbert, Raphael Norwitz, Gonglei, yc-core,
	Paolo Bonzini, Max Reitz

> On Thu, Apr 23, 2020 at 8:43 PM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>> The problem is that vhost_user_write() doesn't get an error after the
>> disconnect, so QEMU goes on to call vhost_user_read(). The tcp_chr_write()
>> routine should return -1 in case of disconnect. Indicate an EIO error
>> if this routine is called in the disconnected state.
>>
>> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
>> ---
>>   chardev/char-socket.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
>> index 185fe38..c128cca 100644
>> --- a/chardev/char-socket.c
>> +++ b/chardev/char-socket.c
>> @@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>>           if (ret < 0 && errno != EAGAIN) {
>>               if (tcp_chr_read_poll(chr) <= 0) {
>>                   tcp_chr_disconnect_locked(chr);
>> -                return len;
>> +                /* Return an error since we made a disconnect. */
>> +                return ret;
> 
> Looks ok, but this return was introduced in commit
> b0a335e351103bf92f3f9d0bd5759311be8156ac ("qemu-char: socket backend:
> disconnect on write error"). It doesn't say why it didn't return -1
> though. Anton, could you review? thanks
> 

Hi,

I think I had no special intent other than to repeat the behaviour of the
snippet below, that is, to return @len when the socket is disconnected.

It seems that tcp_chr_write() has worked that way since the very beginning
(commit 0bab00f).

It looks OK to me to return an error, though. If some guest device doesn't
expect that, I guess it should ignore the error on its side.
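
For illustration, a caller along the lines of vhost_user_write() could
then simply abort the request instead of waiting for a reply. This is a
sketch only; the helper name and message type are placeholders:

    /* Once tcp_chr_write() reports -1/EIO for a disconnected socket,
     * the request path can stop before trying to read a reply that
     * will never arrive. */
    static int vhost_user_send_req(CharBackend *chr,
                                   const void *msg, int size)
    {
        int ret = qemu_chr_fe_write_all(chr, (const uint8_t *)msg, size);

        if (ret != size) {
            return -1;   /* do not call vhost_user_read() afterwards */
        }
        return 0;
    }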

>>               } /* else let the read handler finish it properly */
>>           }
>>
>>           return ret;
>>       } else {
>> -        /* XXX: indicate an error ? */
>> -        return len;
>> +        /* Indicate an error. */
>> +        errno = EIO;
>> +        return -1;
>>       }
>>   }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v1 1/7] contrib/vhost-user-blk: add option to simulate disconnect on init
       [not found]   ` <20200422160206.GA30919@localhost.localdomain>
@ 2020-04-24 10:17     ` Marc-André Lureau
  2020-04-25  9:30       ` Dima Stepanov
  0 siblings, 1 reply; 16+ messages in thread
From: Marc-André Lureau @ 2020-04-24 10:17 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: Dima Stepanov, qemu-devel

Hi

On Fri, Apr 24, 2020 at 4:32 AM Raphael Norwitz
<raphael.norwitz@nutanix.com> wrote:
>
> I’m not opposed to adding this kind of debugging functionality to the
> vhost-user-blk sample. It could be helpful to easily test these cases
> in the future.
>
> That said, I'm not sure how others will feel about adding this kind
> of debugging capability to libvhost-user. Marc-Andre, thoughts?

Maybe we should only enable this code if LIBVHOST_USER_DEBUG is set?

And to keep logging silent by default, we shouldn't print the messages
unless the VHOST_USER_DEBUG env variable is set?
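
Something along these lines, perhaps (just a rough idea; the helper and
the simulate_disconnect_stage field are made-up names, not existing
libvhost-user API):

    #ifdef LIBVHOST_USER_DEBUG
    /* Fail-injection helper compiled in only for debug builds; the
     * diagnostic message stays silent unless VHOST_USER_DEBUG is set
     * in the environment. */
    static void vu_maybe_simulate_disconnect(VuDev *dev, int stage)
    {
        if (dev->simulate_disconnect_stage != stage) {
            return;
        }
        if (getenv("VHOST_USER_DEBUG")) {
            fprintf(stderr, "simulating disconnect at stage %d\n", stage);
        }
        abort();  /* "crash" the daemon mid-way through the handshake */
    }
    #endif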

>
> If we go this route I would prefer to add the debugging options to the
> vhost-user-blk sample in a separate patch.
>
> On Thu, Apr 23, 2020 at 09:39:32PM +0300, Dima Stepanov wrote:
> >
> > Add "--simulate-disconnect-stage" option for the testing purposes.
> > This option can be used to test the vhost-user reconnect functionality:
> >   ./vhost-user-blk ... --simulate-disconnect-stage=<CASENUM>
> > In this case the daemon will "crash" in the middle of the VHOST comands
> > communication. Case nums are as follows:
> >   1 - make assert in the handler of the SET_VRING_CALL command
> >   2 - make assert in the handler of the SET_VRING_NUM command
> > Main purpose is to test QEMU reconnect functionality. Such fail
> > injection should not lead to QEMU crash and should be handled
> > successfully.
> > Also update the "GOptionEntry entries" definition with the final NULL
> > item according to API.
> >
> > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > ---
> >  contrib/libvhost-user/libvhost-user.c   | 30 ++++++++++++++++++++++++++++++
> >  contrib/libvhost-user/libvhost-user.h   | 13 +++++++++++++
> >  contrib/vhost-user-blk/vhost-user-blk.c | 14 +++++++++++++-
> >  3 files changed, 56 insertions(+), 1 deletion(-)
>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v1 1/7] contrib/vhost-user-blk: add option to simulate disconnect on init
  2020-04-24 10:17     ` Marc-André Lureau
@ 2020-04-25  9:30       ` Dima Stepanov
  0 siblings, 0 replies; 16+ messages in thread
From: Dima Stepanov @ 2020-04-25  9:30 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: qemu-devel, Raphael Norwitz

On Fri, Apr 24, 2020 at 12:17:54PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Apr 24, 2020 at 4:32 AM Raphael Norwitz
> <raphael.norwitz@nutanix.com> wrote:
> >
> > I’m not opposed to adding this kind of debugging functionality to the
> > vhost-user-blk sample. It could be helpful to easily test these cases
> > in the future.
> >
> > That said, I'm not sure how others will feel about adding this kind
> > of debugging capability to libvhost-user. Marc-Andre, thoughts?
> 
> Maybe we should only enable this code if LIBVHOST_USER_DEBUG is set?
> 
> And to keep logging silent by default, we shouldn't print the messages
> unless the VHOST_USER_DEBUG env variable is set?
Yes, it is a good idea to move this code under LIBVHOST_USER_DEBUG.
Agreed. I will update it in version 2, but I need more feedback on the
other patches first.

> 
> >
> > If we go this route I would prefer to add the debugging options to the
> > vhost-user-blk sample in a separate patch.
> >
> > On Thu, Apr 23, 2020 at 09:39:32PM +0300, Dima Stepanov wrote:
> > >
> > > Add "--simulate-disconnect-stage" option for the testing purposes.
> > > This option can be used to test the vhost-user reconnect functionality:
> > >   ./vhost-user-blk ... --simulate-disconnect-stage=<CASENUM>
> > > In this case the daemon will "crash" in the middle of the VHOST comands
> > > communication. Case nums are as follows:
> > >   1 - make assert in the handler of the SET_VRING_CALL command
> > >   2 - make assert in the handler of the SET_VRING_NUM command
> > > Main purpose is to test QEMU reconnect functionality. Such fail
> > > injection should not lead to QEMU crash and should be handled
> > > successfully.
> > > Also update the "GOptionEntry entries" definition with the final NULL
> > > item according to API.
> > >
> > > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > > ---
> > >  contrib/libvhost-user/libvhost-user.c   | 30 ++++++++++++++++++++++++++++++
> > >  contrib/libvhost-user/libvhost-user.h   | 13 +++++++++++++
> > >  contrib/vhost-user-blk/vhost-user-blk.c | 14 +++++++++++++-
> > >  3 files changed, 56 insertions(+), 1 deletion(-)
> >
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if close is emitted
  2020-04-23 19:16   ` Marc-André Lureau
@ 2020-04-26  7:26     ` Li Feng
  2020-04-27  9:38       ` Dima Stepanov
  0 siblings, 1 reply; 16+ messages in thread
From: Li Feng @ 2020-04-26  7:26 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Fam Zheng, Kevin Wolf, Stefan Hajnoczi,
	open list:Block layer core, Michael S. Tsirkin, Jason Wang, QEMU,
	Dr. David Alan Gilbert, Raphael Norwitz, Gonglei, yc-core,
	Paolo Bonzini, Max Reitz, Dima Stepanov

This patch is trying to fix the same issue as mine.
However, our fixes are different.

I think that checking s->reconnect_timer is better.

Thanks,
Feng Li

On Fri, Apr 24, 2020 at 3:16 AM, Marc-André Lureau <marcandre.lureau@gmail.com> wrote:


>
> Hi
>
> On Thu, Apr 23, 2020 at 8:41 PM Dima Stepanov <dimastep@yandex-team.ru> wrote:
> >
> > During vhost-user reconnect functionality testing the following assert
> > was hit:
> >   qemu-system-x86_64: chardev/char-socket.c:125:
> >   qemu_chr_socket_restart_timer: Assertion `!s->reconnect_timer' failed.
> >   Aborted (core dumped)
>
> That looks related to "[PATCH 3/4] char-socket: avoid double call
> tcp_chr_free_connection"
>
> > This is observed only if the connection is closed by the vhost-user-blk
> > daemon during the initialization routine. In this case the
> > tcp_chr_disconnect_locked() routine is called twice. The first time it
> > is called in the tcp_chr_write() routine, after getting the SIGPIPE
> > signal. The second time it is called when the vhost_user_blk_connect()
> > routine returns an error. In general this looks correct, because the
> > initialization routine can return an error in many cases.
> > The tcp_chr_disconnect_locked() routine could be fixed so that the
> > timer is restarted only if the close event is emitted.
> >
> > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > ---
> >  chardev/char-socket.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> > index c128cca..83ca4d9 100644
> > --- a/chardev/char-socket.c
> > +++ b/chardev/char-socket.c
> > @@ -476,7 +476,7 @@ static void update_disconnected_filename(SocketChardev *s)
> >  static void tcp_chr_disconnect_locked(Chardev *chr)
> >  {
> >      SocketChardev *s = SOCKET_CHARDEV(chr);
> > -    bool emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED;
> > +    bool was_connected = s->state == TCP_CHARDEV_STATE_CONNECTED;
> >
> >      tcp_chr_free_connection(chr);
> >
> > @@ -485,11 +485,11 @@ static void tcp_chr_disconnect_locked(Chardev *chr)
> >                                                chr, NULL, chr->gcontext);
> >      }
> >      update_disconnected_filename(s);
> > -    if (emit_close) {
> > +    if (was_connected) {
> >          qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
> > -    }
> > -    if (s->reconnect_time) {
> > -        qemu_chr_socket_restart_timer(chr);
> > +        if (s->reconnect_time) {
> > +            qemu_chr_socket_restart_timer(chr);
> > +        }
> >      }
> >  }
> >
> > --
> > 2.7.4
> >
> >
>
>
> --
> Marc-André Lureau





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if close is emitted
  2020-04-26  7:26     ` Li Feng
@ 2020-04-27  9:38       ` Dima Stepanov
  0 siblings, 0 replies; 16+ messages in thread
From: Dima Stepanov @ 2020-04-27  9:38 UTC (permalink / raw)
  To: Li Feng
  Cc: Fam Zheng, Kevin Wolf, Stefan Hajnoczi,
	open list:Block layer core, Michael S. Tsirkin, Jason Wang, QEMU,
	Dr. David Alan Gilbert, Raphael Norwitz, Gonglei,
	Marc-André Lureau, yc-core, Paolo Bonzini, Max Reitz

On Sun, Apr 26, 2020 at 03:26:58PM +0800, Li Feng wrote:
> This patch is trying to fix the same issue as mine.
> However, our fixes are different.
> 
> I think that checking s->reconnect_timer is better.

I also thought about your solution:
  - if (s->reconnect_time) {
  + if (s->reconnect_time && !s->reconnect_timer) {
But I was afraid of possible side effects. Since Marc-André approved your
fix, I'm also good with your approach. In this case I'll remove this
patch from the v2 patchset.
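
With that check, the tail of tcp_chr_disconnect_locked() would end up
roughly as below (a sketch of the alternative, not a tested patch):

        update_disconnected_filename(s);
        if (emit_close) {
            qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
        }
        /* guard against a second disconnect arming the timer twice */
        if (s->reconnect_time && !s->reconnect_timer) {
            qemu_chr_socket_restart_timer(chr);
        }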

Thanks for handling it!

> 
> Thanks,
> Feng Li
> 
> On Fri, Apr 24, 2020 at 3:16 AM, Marc-André Lureau <marcandre.lureau@gmail.com> wrote:
> 
> 
> >
> > Hi
> >
> > On Thu, Apr 23, 2020 at 8:41 PM Dima Stepanov <dimastep@yandex-team.ru> wrote:
> > >
> > > During vhost-user reconnect functionality testing the following assert
> > > was hit:
> > >   qemu-system-x86_64: chardev/char-socket.c:125:
> > >   qemu_chr_socket_restart_timer: Assertion `!s->reconnect_timer' failed.
> > >   Aborted (core dumped)
> >
> > That looks related to "[PATCH 3/4] char-socket: avoid double call
> > tcp_chr_free_connection"
> >
> > > This is observed only if the connection is closed by the vhost-user-blk
> > > daemon during the initialization routine. In this case the
> > > tcp_chr_disconnect_locked() routine is called twice. The first time it
> > > is called in the tcp_chr_write() routine, after getting the SIGPIPE
> > > signal. The second time it is called when the vhost_user_blk_connect()
> > > routine returns an error. In general this looks correct, because the
> > > initialization routine can return an error in many cases.
> > > The tcp_chr_disconnect_locked() routine could be fixed so that the
> > > timer is restarted only if the close event is emitted.
> > >
> > > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > > ---
> > >  chardev/char-socket.c | 10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> > > index c128cca..83ca4d9 100644
> > > --- a/chardev/char-socket.c
> > > +++ b/chardev/char-socket.c
> > > @@ -476,7 +476,7 @@ static void update_disconnected_filename(SocketChardev *s)
> > >  static void tcp_chr_disconnect_locked(Chardev *chr)
> > >  {
> > >      SocketChardev *s = SOCKET_CHARDEV(chr);
> > > -    bool emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED;
> > > +    bool was_connected = s->state == TCP_CHARDEV_STATE_CONNECTED;
> > >
> > >      tcp_chr_free_connection(chr);
> > >
> > > @@ -485,11 +485,11 @@ static void tcp_chr_disconnect_locked(Chardev *chr)
> > >                                                chr, NULL, chr->gcontext);
> > >      }
> > >      update_disconnected_filename(s);
> > > -    if (emit_close) {
> > > +    if (was_connected) {
> > >          qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
> > > -    }
> > > -    if (s->reconnect_time) {
> > > -        qemu_chr_socket_restart_timer(chr);
> > > +        if (s->reconnect_time) {
> > > +            qemu_chr_socket_restart_timer(chr);
> > > +        }
> > >      }
> > >  }
> > >
> > > --
> > > 2.7.4
> > >
> > >
> >
> >
> > --
> > Marc-André Lureau
> 
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-04-27  9:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-23 18:39 [RFC PATCH v1 0/7] vhost-user reconnect issues during vhost initialization Dima Stepanov
2020-04-23 18:39 ` [RFC PATCH v1 1/7] contrib/vhost-user-blk: add option to simulate disconnect on init Dima Stepanov
     [not found]   ` <20200422160206.GA30919@localhost.localdomain>
2020-04-24 10:17     ` Marc-André Lureau
2020-04-25  9:30       ` Dima Stepanov
2020-04-23 18:39 ` [RFC PATCH v1 2/7] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
2020-04-23 20:10   ` Marc-André Lureau
     [not found]   ` <ca921f6f56104bcbb664424f97558ec3@HE1PR08MB2650.eurprd08.prod.outlook.com>
2020-04-24  9:55     ` Anton Nefedov
2020-04-23 18:39 ` [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if close is emitted Dima Stepanov
2020-04-23 19:16   ` Marc-André Lureau
2020-04-26  7:26     ` Li Feng
2020-04-27  9:38       ` Dima Stepanov
2020-04-23 18:39 ` [RFC PATCH v1 4/7] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
2020-04-23 18:39 ` [RFC PATCH v1 5/7] vhost-user-blk: add mechanism to track the guest notifiers init state Dima Stepanov
     [not found]   ` <20200422161750.GC31091@localhost.localdomain>
2020-04-24  9:17     ` Dima Stepanov
2020-04-23 18:39 ` [RFC PATCH v1 6/7] vhost: check vring address before calling unmap Dima Stepanov
2020-04-23 18:39 ` [RFC PATCH v1 7/7] vhost: add device started check in migration set log Dima Stepanov
