* [PATCH v2 0/5] vhost-user reconnect issues during vhost initialization
@ 2020-04-30 13:36 Dima Stepanov
  2020-04-30 13:36 ` [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
                   ` (4 more replies)
  0 siblings, 5 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-04-30 13:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, fengli, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

Changes in v2:
- Add to CC list: Li Feng <fengli@smartx.com>, since it looks like we are
working on quite similar issues
- Remove [RFC PATCH v1 1/7] contrib/vhost-user-blk: add option to simulate
disconnect on init. This functionality will be sent as a separate patch,
together with the LIBVHOST_USER_DEBUG rework; we first need to figure out
how to reuse this option and silence the messages.
- Remove [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if
close is emitted. This will be handled in a separate patchset:
[PATCH 3/4] char-socket: avoid double call tcp_chr_free_connection by Li
Feng

v1:

While testing the vhost-user reconnect functionality we hit several issues
when the vhost-user-blk daemon crashes or disconnects during vhost
initialization. The general scenario is as follows:
  - the vhost start routine is called
  - a vhost write fails due to SIGPIPE
  - this triggers the disconnect routine and the vhost_dev_cleanup()
    routine, which zeroes all fields of the vhost_dev structure
  - control returns to the vhost start routine with an error
  - on the failure path the vhost start routine tries to roll back its
    changes using vhost_dev fields which have already been reset
  - sometimes this leads to SIGSEGV, sometimes to SIGABRT
Until the vhost-user initialization code is revised, we suggest adding
sanity checks that account for the possible disconnect event and for the
vhost_dev structure being in an "uninitialized" state.
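
To make the failure mode concrete, below is a minimal standalone C
program that reproduces this bug class. It is an illustrative sketch
only: the structure and function names merely mimic hw/virtio/vhost.c
and are not the real QEMU code.

    /* cc demo.c -o demo && ./demo  -> terminates with SIGSEGV */
    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    struct vhost_dev_demo {
        int nvqs;
        int *vqs;           /* stands in for struct vhost_virtqueue *vqs */
        bool started;
    };

    /* Models vhost_dev_cleanup(): runs from the disconnect handler and
     * zeroes every field of the device structure. */
    static void demo_cleanup(struct vhost_dev_demo *dev)
    {
        free(dev->vqs);
        memset(dev, 0, sizeof(*dev));
    }

    /* Models the vhost start routine: a write fails with SIGPIPE, the
     * disconnect handler cleans up, and the rollback path then uses
     * the already-reset fields. */
    static int demo_start(struct vhost_dev_demo *dev)
    {
        int i = dev->nvqs;      /* queues set up before the failure */

        demo_cleanup(dev);      /* the disconnect handler has fired */

        while (--i >= 0) {
            dev->vqs[i] = 0;    /* dev->vqs is NULL here -> SIGSEGV */
        }
        return -1;
    }

    int main(void)
    {
        struct vhost_dev_demo dev = {
            .nvqs = 4,
            .vqs = calloc(4, sizeof(int)),
        };
        return demo_start(&dev);
    }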

The vhost-user-blk daemon is extended with an additional
"--simulate-disconnect-stage=CASENUM" argument to simulate a disconnect
during VHOST device initialization. For instance:
  1. $ ./vhost-user-blk -s ./vhost.sock -b test-img.raw --simulate-disconnect-stage=1
     This command simulates a disconnect in the SET_VRING_CALL handler.
     In this case the vhost device in QEMU never sets its started field
     to true.
  2. $ ./vhost-user-blk -s ./vhost.sock -b test-img.raw --simulate-disconnect-stage=2
     This command simulates a disconnect in the SET_VRING_NUM handler.
     In this case the started field is set to true.
These two cases exercise different parts of QEMU. To trigger different
code paths, the disconnect should also be simulated in two ways:
  - before any successful initialization
  - after one successful initialization, simulating disconnects on the
    following reconnects
We also catch a SIGABRT at migration start if the vhost-user daemon
disconnects while the vhost-user set-log commands are being exchanged.
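
For reference, the disconnect simulation could hook into the daemon's
request dispatcher roughly as follows. This is a hypothetical sketch:
the option was dropped from this v2, maybe_simulate_disconnect() is an
invented helper, and the VHOST_USER_* request constants are assumed to
come from contrib/libvhost-user.

    #include <stdlib.h>
    #include <unistd.h>
    /* VHOST_USER_SET_VRING_CALL / VHOST_USER_SET_VRING_NUM are defined
     * in contrib/libvhost-user/libvhost-user.h */

    /* Set from the --simulate-disconnect-stage=CASENUM argument. */
    static int simulate_disconnect_stage;

    /* Hypothetical hook, called at the top of the message dispatcher. */
    static void maybe_simulate_disconnect(int request, int sock_fd)
    {
        if ((simulate_disconnect_stage == 1 &&
             request == VHOST_USER_SET_VRING_CALL) ||
            (simulate_disconnect_stage == 2 &&
             request == VHOST_USER_SET_VRING_NUM)) {
            /* Closing the socket and exiting makes QEMU observe a
             * disconnect in the middle of vhost initialization. */
            close(sock_fd);
            exit(EXIT_SUCCESS);
        }
    }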

Dima Stepanov (5):
  char-socket: return -1 in case of disconnect during tcp_chr_write
  vhost: introduce wrappers to set guest notifiers for virtio device
  vhost-user-blk: add mechanism to track the guest notifiers init state
  vhost: check vring address before calling unmap
  vhost: add device started check in migration set log

 backends/cryptodev-vhost.c  |  26 +++++-----
 backends/vhost-user.c       |  16 ++----
 chardev/char-socket.c       |   8 +--
 hw/block/vhost-user-blk.c   |  23 ++++-----
 hw/net/vhost_net.c          |  30 +++++++-----
 hw/scsi/vhost-scsi-common.c |  15 ++----
 hw/virtio/vhost-user-fs.c   |  17 +++----
 hw/virtio/vhost-vsock.c     |  18 +++----
 hw/virtio/vhost.c           | 115 ++++++++++++++++++++++++++++++++++++++++----
 hw/virtio/virtio.c          |  13 +++++
 include/hw/virtio/vhost.h   |   5 ++
 include/hw/virtio/virtio.h  |   1 +
 12 files changed, 195 insertions(+), 92 deletions(-)

-- 
2.7.4




* [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write
  2020-04-30 13:36 [PATCH v2 0/5] vhost-user reconnect issues during vhost initialization Dima Stepanov
@ 2020-04-30 13:36 ` Dima Stepanov
  2020-05-06  8:54   ` Li Feng
  2020-05-06  9:46   ` Marc-André Lureau
  2020-04-30 13:36 ` [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-04-30 13:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, fengli, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

During testing of the vhost-user-blk reconnect functionality a QEMU
SIGSEGV was triggered:
 start qemu as:
 x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \
   -object memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \
   -numa node,cpus=0,memdev=ram-node0 \
   -chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \
   -device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm
 start vhost-user-blk daemon:
 ./vhost-user-blk -s ./vhost.sock -b test-img.raw

If vhost-user-blk is killed during the vhost initialization
process, for instance after receiving the VHOST_SET_VRING_CALL command,
then QEMU fails with the following backtrace:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
    at ./hw/virtio/vhost-user.c:260
260         CharBackend *chr = u->user->chr;

 #0  0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
    at ./hw/virtio/vhost-user.c:260
 #1  0x000055555592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
    at ./hw/virtio/vhost-user.c:1645
 #2  0x0000555555925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
    at ./hw/virtio/vhost.c:1490
 #3  0x00005555558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd8f0)
    at ./hw/block/vhost-user-blk.c:429
 #4  0x0000555555920090 in virtio_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd948)
    at ./hw/virtio/virtio.c:3615
 #5  0x0000555555a9779c in device_set_realized (obj=0x7fffef2d51a0, value=true, errp=0x7fffffffdb88)
    at ./hw/core/qdev.c:891
 ...

The problem is that vhost_user_write() does not receive an error after
the disconnect and proceeds to call vhost_user_read(). The tcp_chr_write()
routine should return -1 in case of a disconnect. Indicate an EIO error
if this routine is called in the disconnected state.
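
For context, the caller flow that this fix protects looks roughly as
follows. This is a condensed sketch of vhost_user_get_config() from
hw/virtio/vhost-user.c; message setup and error handling are
abbreviated, so treat the field names as approximate.

    static int get_config_sketch(struct vhost_dev *dev, uint8_t *config,
                                 uint32_t config_len)
    {
        VhostUserMsg msg = {
            .hdr.request = VHOST_USER_GET_CONFIG,
            .hdr.size = config_len,
        };

        /* With tcp_chr_write() returning -1 on disconnect, the failed
         * write is caught here ... */
        if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
            return -1;
        }
        /* ... and vhost_user_read() is no longer reached with state
         * that the disconnect handler has already freed. */
        if (vhost_user_read(dev, &msg) < 0) {
            return -1;
        }
        memcpy(config, msg.payload.config.region, config_len);
        return 0;
    }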

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 chardev/char-socket.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 185fe38..c128cca 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
         if (ret < 0 && errno != EAGAIN) {
             if (tcp_chr_read_poll(chr) <= 0) {
                 tcp_chr_disconnect_locked(chr);
-                return len;
+                /* Return an error since a disconnect occurred. */
+                return ret;
             } /* else let the read handler finish it properly */
         }
 
         return ret;
     } else {
-        /* XXX: indicate an error ? */
-        return len;
+        /* Indicate an error. */
+        errno = EIO;
+        return -1;
     }
 }
 
-- 
2.7.4




* [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device
  2020-04-30 13:36 [PATCH v2 0/5] vhost-user reconnect issues during vhost initialization Dima Stepanov
  2020-04-30 13:36 ` [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
@ 2020-04-30 13:36 ` Dima Stepanov
  2020-05-04  0:36   ` Raphael Norwitz
  2020-05-11  3:03   ` Jason Wang
  2020-04-30 13:36 ` [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state Dima Stepanov
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-04-30 13:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, fengli, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

Introduce new wrappers to set/reset guest notifiers for the virtio
device in the vhost device module:
  vhost_dev_assign_guest_notifiers
    ->set_guest_notifiers(..., ..., true);
  vhost_dev_drop_guest_notifiers
    ->set_guest_notifiers(..., ..., false);
This is a preliminary refactoring step, so that the set_guest_notifiers
method can be called based on the vhost device state.
Update all devices using vhost to call these wrappers instead of
invoking the method directly.
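
The resulting call-site shape for a device start routine is sketched
below (condensed; see the per-device hunks that follow for the real
conversions, and note that the wrappers now own the error reporting):

    static int device_start_sketch(VirtIODevice *vdev, struct vhost_dev *hdev)
    {
        int ret;

        if (!virtio_device_guest_notifiers_initialized(vdev)) {
            error_report("binding does not support guest notifiers");
            return -ENOSYS;
        }

        ret = vhost_dev_enable_notifiers(hdev, vdev);
        if (ret < 0) {
            return ret;
        }

        ret = vhost_dev_assign_guest_notifiers(hdev, vdev, hdev->nvqs);
        if (ret < 0) {
            goto err_host_notifiers;
        }

        return 0;

    err_host_notifiers:
        vhost_dev_disable_notifiers(hdev, vdev);
        return ret;
    }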

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 backends/cryptodev-vhost.c  | 26 +++++++++++++++-----------
 backends/vhost-user.c       | 16 +++++-----------
 hw/block/vhost-user-blk.c   | 15 +++++----------
 hw/net/vhost_net.c          | 30 +++++++++++++++++-------------
 hw/scsi/vhost-scsi-common.c | 15 +++++----------
 hw/virtio/vhost-user-fs.c   | 17 +++++++----------
 hw/virtio/vhost-vsock.c     | 18 ++++++++----------
 hw/virtio/vhost.c           | 38 ++++++++++++++++++++++++++++++++++++++
 hw/virtio/virtio.c          | 13 +++++++++++++
 include/hw/virtio/vhost.h   |  4 ++++
 include/hw/virtio/virtio.h  |  1 +
 11 files changed, 118 insertions(+), 75 deletions(-)

diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
index 8337c9a..4522195 100644
--- a/backends/cryptodev-vhost.c
+++ b/backends/cryptodev-vhost.c
@@ -169,16 +169,13 @@ vhost_set_vring_enable(CryptoDevBackendClient *cc,
 int cryptodev_vhost_start(VirtIODevice *dev, int total_queues)
 {
     VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     int r, e;
     int i;
     CryptoDevBackend *b = vcrypto->cryptodev;
     CryptoDevBackendVhost *vhost_crypto;
     CryptoDevBackendClient *cc;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(dev)) {
         error_report("binding does not support guest notifiers");
         return -ENOSYS;
     }
@@ -198,9 +195,13 @@ int cryptodev_vhost_start(VirtIODevice *dev, int total_queues)
         }
      }
 
-    r = k->set_guest_notifiers(qbus->parent, total_queues, true);
+    /*
+     * Since all the states are handled by one vhost device,
+     * use the first one in array.
+     */
+    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
+    r = vhost_dev_assign_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
     if (r < 0) {
-        error_report("error binding guest notifier: %d", -r);
         goto err;
     }
 
@@ -232,7 +233,8 @@ err_start:
         vhost_crypto = cryptodev_get_vhost(cc, b, i);
         cryptodev_vhost_stop_one(vhost_crypto, dev);
     }
-    e = k->set_guest_notifiers(qbus->parent, total_queues, false);
+    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
+    e = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
     if (e < 0) {
         error_report("vhost guest notifier cleanup failed: %d", e);
     }
@@ -242,9 +244,6 @@ err:
 
 void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
     CryptoDevBackend *b = vcrypto->cryptodev;
     CryptoDevBackendVhost *vhost_crypto;
@@ -259,7 +258,12 @@ void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)
         cryptodev_vhost_stop_one(vhost_crypto, dev);
     }
 
-    r = k->set_guest_notifiers(qbus->parent, total_queues, false);
+    /*
+     * Since all the states are handled by one vhost device,
+     * use the first one in the array.
+     */
+    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
+    r = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
     if (r < 0) {
         error_report("vhost guest notifier cleanup failed: %d", r);
     }
diff --git a/backends/vhost-user.c b/backends/vhost-user.c
index 2bf3406..e116bc6 100644
--- a/backends/vhost-user.c
+++ b/backends/vhost-user.c
@@ -60,15 +60,13 @@ vhost_user_backend_dev_init(VhostUserBackend *b, VirtIODevice *vdev,
 void
 vhost_user_backend_start(VhostUserBackend *b)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret, i ;
 
     if (b->started) {
         return;
     }
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(b->vdev)) {
         error_report("binding does not support guest notifiers");
         return;
     }
@@ -78,9 +76,8 @@ vhost_user_backend_start(VhostUserBackend *b)
         return;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, b->dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier");
         goto err_host_notifiers;
     }
 
@@ -104,7 +101,7 @@ vhost_user_backend_start(VhostUserBackend *b)
     return;
 
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, b->dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&b->dev, b->vdev);
 }
@@ -112,8 +109,6 @@ err_host_notifiers:
 void
 vhost_user_backend_stop(VhostUserBackend *b)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret = 0;
 
     if (!b->started) {
@@ -122,9 +117,8 @@ vhost_user_backend_stop(VhostUserBackend *b)
 
     vhost_dev_stop(&b->dev, b->vdev);
 
-    if (k->set_guest_notifiers) {
-        ret = k->set_guest_notifiers(qbus->parent,
-                                     b->dev.nvqs, false);
+    if (virtio_device_guest_notifiers_initialized(b->vdev)) {
+        ret = vhost_dev_drop_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
         if (ret < 0) {
             error_report("vhost guest notifier cleanup failed: %d", ret);
         }
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 17df533..70d7842 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -109,11 +109,9 @@ const VhostDevConfigOps blk_ops = {
 static int vhost_user_blk_start(VirtIODevice *vdev)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int i, ret;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         error_report("binding does not support guest notifiers");
         return -ENOSYS;
     }
@@ -124,9 +122,8 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
         return ret;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier: %d", -ret);
         goto err_host_notifiers;
     }
 
@@ -163,7 +160,7 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
     return ret;
 
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&s->dev, vdev);
     return ret;
@@ -172,17 +169,15 @@ err_host_notifiers:
 static void vhost_user_blk_stop(VirtIODevice *vdev)
 {
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         return;
     }
 
     vhost_dev_stop(&s->dev, vdev);
 
-    ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
+    ret = vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
         return;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6b82803..c13b444 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -303,19 +303,15 @@ static void vhost_net_stop_one(struct vhost_net *net,
 int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
                     int total_queues)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+    struct vhost_net *net;
     int r, e, i;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(dev)) {
         error_report("binding does not support guest notifiers");
         return -ENOSYS;
     }
 
     for (i = 0; i < total_queues; i++) {
-        struct vhost_net *net;
-
         net = get_vhost_net(ncs[i].peer);
         vhost_net_set_vq_index(net, i * 2);
 
@@ -328,9 +324,13 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         }
      }
 
-    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
+    /*
+     * Since all the states are handled by one vhost_net device,
+     * use the first one in the array.
+     */
+    net = get_vhost_net(ncs[0].peer);
+    r = vhost_dev_assign_guest_notifiers(&net->dev, dev, total_queues * 2);
     if (r < 0) {
-        error_report("Error binding guest notifier: %d", -r);
         goto err;
     }
 
@@ -357,7 +357,8 @@ err_start:
     while (--i >= 0) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
     }
-    e = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
+    net = get_vhost_net(ncs[0].peer);
+    e = vhost_dev_drop_guest_notifiers(&net->dev, dev, total_queues * 2);
     if (e < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
         fflush(stderr);
@@ -369,16 +370,19 @@ err:
 void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
                     int total_queues)
 {
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+    struct vhost_net *net;
     int i, r;
 
     for (i = 0; i < total_queues; i++) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
     }
 
-    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
+    /*
+     * Since all the states are handled by one vhost_net device,
+     * use the first one in the array.
+     */
+    net = get_vhost_net(ncs[0].peer);
+    r = vhost_dev_drop_guest_notifiers(&net->dev, dev, total_queues * 2);
     if (r < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
         fflush(stderr);
diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index 8ec49d7..8f51ec0 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -29,10 +29,8 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
 {
     int ret, i;
     VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         error_report("binding does not support guest notifiers");
         return -ENOSYS;
     }
@@ -42,9 +40,8 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
         return ret;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier");
         goto err_host_notifiers;
     }
 
@@ -66,7 +63,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
     return ret;
 
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&vsc->dev, vdev);
     return ret;
@@ -75,14 +72,12 @@ err_host_notifiers:
 void vhost_scsi_common_stop(VHostSCSICommon *vsc)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret = 0;
 
     vhost_dev_stop(&vsc->dev, vdev);
 
-    if (k->set_guest_notifiers) {
-        ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
+    if (virtio_device_guest_notifiers_initialized(vdev)) {
+        ret = vhost_dev_drop_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
         if (ret < 0) {
                 error_report("vhost guest notifier cleanup failed: %d", ret);
         }
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 6136768..6b101fc 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -38,12 +38,10 @@ static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 static void vuf_start(VirtIODevice *vdev)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
     int i;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         error_report("binding does not support guest notifiers");
         return;
     }
@@ -54,9 +52,9 @@ static void vuf_start(VirtIODevice *vdev)
         return;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&fs->vhost_dev, vdev,
+            fs->vhost_dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier: %d", -ret);
         goto err_host_notifiers;
     }
 
@@ -79,7 +77,7 @@ static void vuf_start(VirtIODevice *vdev)
     return;
 
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&fs->vhost_dev, vdev, fs->vhost_dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&fs->vhost_dev, vdev);
 }
@@ -87,17 +85,16 @@ err_host_notifiers:
 static void vuf_stop(VirtIODevice *vdev)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         return;
     }
 
     vhost_dev_stop(&fs->vhost_dev, vdev);
 
-    ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
+    ret = vhost_dev_drop_guest_notifiers(&fs->vhost_dev, vdev,
+            fs->vhost_dev.nvqs);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
         return;
diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
index 09b6b07..52489dd 100644
--- a/hw/virtio/vhost-vsock.c
+++ b/hw/virtio/vhost-vsock.c
@@ -75,12 +75,10 @@ static int vhost_vsock_set_running(VHostVSock *vsock, int start)
 static void vhost_vsock_start(VirtIODevice *vdev)
 {
     VHostVSock *vsock = VHOST_VSOCK(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
     int i;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         error_report("binding does not support guest notifiers");
         return;
     }
@@ -91,9 +89,9 @@ static void vhost_vsock_start(VirtIODevice *vdev)
         return;
     }
 
-    ret = k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, true);
+    ret = vhost_dev_assign_guest_notifiers(&vsock->vhost_dev,
+            vdev, vsock->vhost_dev.nvqs);
     if (ret < 0) {
-        error_report("Error binding guest notifier: %d", -ret);
         goto err_host_notifiers;
     }
 
@@ -123,7 +121,8 @@ static void vhost_vsock_start(VirtIODevice *vdev)
 err_dev_start:
     vhost_dev_stop(&vsock->vhost_dev, vdev);
 err_guest_notifiers:
-    k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, false);
+    vhost_dev_drop_guest_notifiers(&vsock->vhost_dev,
+            vdev, vsock->vhost_dev.nvqs);
 err_host_notifiers:
     vhost_dev_disable_notifiers(&vsock->vhost_dev, vdev);
 }
@@ -131,11 +130,9 @@ err_host_notifiers:
 static void vhost_vsock_stop(VirtIODevice *vdev)
 {
     VHostVSock *vsock = VHOST_VSOCK(vdev);
-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
 
-    if (!k->set_guest_notifiers) {
+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
         return;
     }
 
@@ -147,7 +144,8 @@ static void vhost_vsock_stop(VirtIODevice *vdev)
 
     vhost_dev_stop(&vsock->vhost_dev, vdev);
 
-    ret = k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, false);
+    ret = vhost_dev_drop_guest_notifiers(&vsock->vhost_dev,
+            vdev, vsock->vhost_dev.nvqs);
     if (ret < 0) {
         error_report("vhost guest notifier cleanup failed: %d", ret);
         return;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 01ebe12..fa3da9c 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1419,6 +1419,44 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     virtio_device_release_ioeventfd(vdev);
 }
 
+/*
+ * Assign guest notifiers.
+ * Should be called after vhost_dev_enable_notifiers.
+ */
+int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
+                                     VirtIODevice *vdev, int nvqs)
+{
+    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+    int ret;
+
+    ret = k->set_guest_notifiers(qbus->parent, nvqs, true);
+    if (ret < 0) {
+        error_report("Error binding guest notifier: %d", -ret);
+    }
+
+    return ret;
+}
+
+/*
+ * Drop guest notifiers.
+ * Should be called before vhost_dev_disable_notifiers.
+ */
+int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
+                                   VirtIODevice *vdev, int nvqs)
+{
+    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+    int ret;
+
+    ret = k->set_guest_notifiers(qbus->parent, nvqs, false);
+    if (ret < 0) {
+        error_report("Error reset guest notifier: %d", -ret);
+    }
+
+    return ret;
+}
+
 /* Test and clear event pending status.
  * Should be called after unmask to avoid losing events.
  */
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index b6c8ef5..8a95618 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3812,6 +3812,19 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
     return virtio_bus_ioeventfd_enabled(vbus);
 }
 
+/*
+ * Check if set_guest_notifiers() method is set by the init routine.
+ * Return true if yes, otherwise return false.
+ */
+bool virtio_device_guest_notifiers_initialized(VirtIODevice *vdev)
+{
+    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+    return k->set_guest_notifiers;
+}
+
+
 static const TypeInfo virtio_device_info = {
     .name = TYPE_VIRTIO_DEVICE,
     .parent = TYPE_DEVICE,
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 085450c..4d0d2e2 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -100,6 +100,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
+                                     VirtIODevice *vdev, int nvqs);
+int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
+                                   VirtIODevice *vdev, int nvqs);
 
 /* Test and clear masked event pending status.
  * Should be called after unmask to avoid losing events.
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b69d517..d9a3d72 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -323,6 +323,7 @@ void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
                                                 VirtIOHandleAIOOutput handle_output);
 VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
 VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
+bool virtio_device_guest_notifiers_initialized(VirtIODevice *vdev);
 
 static inline void virtio_add_feature(uint64_t *features, unsigned int fbit)
 {
-- 
2.7.4




* [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state
  2020-04-30 13:36 [PATCH v2 0/5] vhost-user reconnect issues during vhost initialization Dima Stepanov
  2020-04-30 13:36 ` [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
  2020-04-30 13:36 ` [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
@ 2020-04-30 13:36 ` Dima Stepanov
  2020-05-04  1:06   ` Raphael Norwitz
  2020-04-30 13:36 ` [PATCH v2 4/5] vhost: check vring address before calling unmap Dima Stepanov
  2020-04-30 13:36 ` [PATCH v2 5/5] vhost: add device started check in migration set log Dima Stepanov
  4 siblings, 1 reply; 51+ messages in thread
From: Dima Stepanov @ 2020-04-30 13:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, fengli, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

In the case of vhost-user devices the daemon can be killed at any
moment. Since QEMU supports the reconnect functionality, the guest
notifiers should be reset and disabled after a "disconnect" event. Most
of the issues were found when the "disconnect" event happened during the
vhost device initialization step.
The disconnect event leads to a call of the vhost_dev_cleanup() routine,
which memsets the vhost device structure to 0. Because of this, if the
device was not started (dev.started == false) and the connection is
broken, the set_guest_notifiers method produces an assertion error. The
connection can also be broken after the dev.started field is set to
true.
A new notifiers_set field is added to the vhost_dev structure to track
the state of the guest notifiers during the initialization process.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 hw/block/vhost-user-blk.c |  8 ++++----
 hw/virtio/vhost.c         | 11 +++++++++++
 include/hw/virtio/vhost.h |  1 +
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 70d7842..5a3de0f 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -175,7 +175,9 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
         return;
     }
 
-    vhost_dev_stop(&s->dev, vdev);
+    if (s->dev.started) {
+        vhost_dev_stop(&s->dev, vdev);
+    }
 
     ret = vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
     if (ret < 0) {
@@ -337,9 +339,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
     }
     s->connected = false;
 
-    if (s->dev.started) {
-        vhost_user_blk_stop(vdev);
-    }
+    vhost_user_blk_stop(vdev);
 
     vhost_dev_cleanup(&s->dev);
 }
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index fa3da9c..ddbdc53 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1380,6 +1380,7 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
             goto fail_vq;
         }
     }
+    hdev->notifiers_set = true;
 
     return 0;
 fail_vq:
@@ -1407,6 +1408,10 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
     int i, r;
 
+    if (!hdev->notifiers_set) {
+        return;
+    }
+
     for (i = 0; i < hdev->nvqs; ++i) {
         r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i,
                                          false);
@@ -1417,6 +1422,8 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
         virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i);
     }
     virtio_device_release_ioeventfd(vdev);
+
+    hdev->notifiers_set = false;
 }
 
 /*
@@ -1449,6 +1456,10 @@ int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     int ret;
 
+    if (!hdev->notifiers_set) {
+        return 0;
+    }
+
     ret = k->set_guest_notifiers(qbus->parent, nvqs, false);
     if (ret < 0) {
         error_report("Error reset guest notifier: %d", -ret);
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 4d0d2e2..e3711a7 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -90,6 +90,7 @@ struct vhost_dev {
     QLIST_HEAD(, vhost_iommu) iommu_list;
     IOMMUNotifier n;
     const VhostDevConfigOps *config_ops;
+    bool notifiers_set;
 };
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
-- 
2.7.4




* [PATCH v2 4/5] vhost: check vring address before calling unmap
  2020-04-30 13:36 [PATCH v2 0/5] vhost-user reconnect issues during vhost initialization Dima Stepanov
                   ` (2 preceding siblings ...)
  2020-04-30 13:36 ` [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state Dima Stepanov
@ 2020-04-30 13:36 ` Dima Stepanov
  2020-05-04  1:13   ` Raphael Norwitz
  2020-05-11  3:05   ` Jason Wang
  2020-04-30 13:36 ` [PATCH v2 5/5] vhost: add device started check in migration set log Dima Stepanov
  4 siblings, 2 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-04-30 13:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, fengli, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

Since a disconnect can happen at any time during initialization, not all
vring buffers (for instance the used vring) may have been initialized
successfully. If a buffer was not initialized, the vhost_memory_unmap()
call will lead to SIGSEGV. Add checks of the vring address values before
calling unmap. Also add an assert() in the vhost_memory_unmap() routine.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 hw/virtio/vhost.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddbdc53..3ee50c4 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
                                hwaddr len, int is_write,
                                hwaddr access_len)
 {
+    assert(buffer);
+
     if (!vhost_dev_has_iommu(dev)) {
         cpu_physical_memory_unmap(buffer, len, is_write, access_len);
     }
@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
                                                 vhost_vq_index);
     }
 
-    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
-                       1, virtio_queue_get_used_size(vdev, idx));
-    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
-                       0, virtio_queue_get_avail_size(vdev, idx));
-    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
-                       0, virtio_queue_get_desc_size(vdev, idx));
+    /*
+     * Since the vhost-user disconnect can happen during initialization
+     * check if the vring was initialized before calling unmap.
+     */
+    if (vq->used) {
+        vhost_memory_unmap(dev, vq->used,
+                           virtio_queue_get_used_size(vdev, idx),
+                           1, virtio_queue_get_used_size(vdev, idx));
+    }
+    if (vq->avail) {
+        vhost_memory_unmap(dev, vq->avail,
+                           virtio_queue_get_avail_size(vdev, idx),
+                           0, virtio_queue_get_avail_size(vdev, idx));
+    }
+    if (vq->desc) {
+        vhost_memory_unmap(dev, vq->desc,
+                           virtio_queue_get_desc_size(vdev, idx),
+                           0, virtio_queue_get_desc_size(vdev, idx));
+    }
 }
 
 static void vhost_eventfd_add(MemoryListener *listener,
-- 
2.7.4




* [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-04-30 13:36 [PATCH v2 0/5] vhost-user reconnect issues during vhost initialization Dima Stepanov
                   ` (3 preceding siblings ...)
  2020-04-30 13:36 ` [PATCH v2 4/5] vhost: check vring address before calling unmap Dima Stepanov
@ 2020-04-30 13:36 ` Dima Stepanov
  2020-05-06 22:08   ` Raphael Norwitz
  2020-05-11  3:15   ` Jason Wang
  4 siblings, 2 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-04-30 13:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, jasowang, dgilbert, mreitz,
	arei.gonglei, fengli, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz

If a vhost-user daemon is used as a backend for the vhost device, then we
should consider the possibility of a disconnect at any moment. If such a
disconnect happens in the vhost_migration_log() routine, the vhost
device structure will be cleaned up.
At the start of the vhost_migration_log() function there is a check:
  if (!dev->started) {
      dev->log_enabled = enable;
      return 0;
  }
To be consistent with this check, add the same check after calling the
vhost_dev_set_log() routine. In general this helps not to break a
migration due to an assert(). It looks like this code should eventually
be revised to handle such errors more carefully.

In the case of a vhost-user device backend the failure paths should
consider the state of the device: some function calls must be skipped
during rollback on the error paths so as not to hit NULL dereference
errors.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 3ee50c4..d5ab96d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
 static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
 {
     int r, i, idx;
+
+    if (!dev->started) {
+        /*
+         * If a vhost-user daemon is used as a backend for the
+         * device and the connection is broken, then all the values
+         * of the vhost_dev structure will have been reset to 0.
+         * Add an additional check for the device state.
+         */
+        return -1;
+    }
+
     r = vhost_dev_set_features(dev, enable_log);
     if (r < 0) {
         goto err_features;
@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
     }
     return 0;
 err_vq:
-    for (; i >= 0; --i) {
+    /*
+     * A disconnect from the vhost-user daemon can lead to a
+     * vhost_dev_cleanup() call, which cleans up the vhost_dev
+     * structure.
+     */
+    for (; dev->started && (i >= 0); --i) {
         idx = dev->vhost_ops->vhost_get_vq_index(dev, dev->vq_index + i);
         vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
                                  dev->log_enabled);
     }
-    vhost_dev_set_features(dev, dev->log_enabled);
+    if (dev->started) {
+        vhost_dev_set_features(dev, dev->log_enabled);
+    }
 err_features:
     return r;
 }
@@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
     } else {
         vhost_dev_log_resize(dev, vhost_get_log_size(dev));
         r = vhost_dev_set_log(dev, true);
-        if (r < 0) {
+        /*
+         * The dev log resize can fail because of a disconnect
+         * from the vhost-user-blk daemon. Check the device
+         * state before calling the vhost_dev_set_log()
+         * function.
+         * Don't return an error if the device isn't started, to
+         * be consistent with the check above.
+         */
+        if (dev->started && r < 0) {
             return r;
         }
     }
@@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 fail_log:
     vhost_log_put(hdev, false);
 fail_vq:
-    while (--i >= 0) {
+    /*
+     * A disconnect from the vhost-user daemon can lead to a
+     * vhost_dev_cleanup() call, which cleans up the vhost_dev
+     * structure.
+     */
+    while ((--i >= 0) && (hdev->started)) {
         vhost_virtqueue_stop(hdev,
                              vdev,
                              hdev->vqs + i,
-- 
2.7.4




* Re: [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device
  2020-04-30 13:36 ` [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
@ 2020-05-04  0:36   ` Raphael Norwitz
  2020-05-06  8:54     ` Dima Stepanov
  2020-05-11  3:03   ` Jason Wang
  1 sibling, 1 reply; 51+ messages in thread
From: Raphael Norwitz @ 2020-05-04  0:36 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, stefanha, qemu-block, mst, jasowang, qemu-devel,
	dgilbert, raphael.norwitz, arei.gonglei, fengli, yc-core,
	pbonzini, marcandre.lureau, mreitz

I’m happy from the vhost, vhost-user-blk and vhost-user-scsi side. For
other device types it looks pretty straightforward, but their maintainers
should probably confirm.

Since you plan to change the behavior of these helpers in subsequent
patches, maybe consider sending the other device types separately
after the rest of the series has been merged? That way the changes to
individual devices will be much easier to review.

On Thu, Apr 30, 2020 at 9:48 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> Introduce new wrappers to set/reset guest notifiers for the virtio
> device in the vhost device module:
>   vhost_dev_assign_guest_notifiers
>     ->set_guest_notifiers(..., ..., true);
>   vhost_dev_drop_guest_notifiers
>     ->set_guest_notifiers(..., ..., false);
> This is a preliminary refactoring step, so that the set_guest_notifiers
> method can be called based on the vhost device state.
> Update all devices using vhost to call these wrappers instead of
> invoking the method directly.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>  backends/cryptodev-vhost.c  | 26 +++++++++++++++-----------
>  backends/vhost-user.c       | 16 +++++-----------
>  hw/block/vhost-user-blk.c   | 15 +++++----------
>  hw/net/vhost_net.c          | 30 +++++++++++++++++-------------
>  hw/scsi/vhost-scsi-common.c | 15 +++++----------
>  hw/virtio/vhost-user-fs.c   | 17 +++++++----------
>  hw/virtio/vhost-vsock.c     | 18 ++++++++----------
>  hw/virtio/vhost.c           | 38 ++++++++++++++++++++++++++++++++++++++
>  hw/virtio/virtio.c          | 13 +++++++++++++
>  include/hw/virtio/vhost.h   |  4 ++++
>  include/hw/virtio/virtio.h  |  1 +
>  11 files changed, 118 insertions(+), 75 deletions(-)
>



* Re: [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state
  2020-04-30 13:36 ` [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state Dima Stepanov
@ 2020-05-04  1:06   ` Raphael Norwitz
  2020-05-06  8:51     ` Dima Stepanov
  0 siblings, 1 reply; 51+ messages in thread
From: Raphael Norwitz @ 2020-05-04  1:06 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, stefanha, qemu-block, mst, jasowang, qemu-devel,
	dgilbert, raphael.norwitz, arei.gonglei, fengli, yc-core,
	pbonzini, marcandre.lureau, mreitz

Apologies for mixing up patches last time. This looks good from a
vhost-user-blk perspective, but I worry that some of these changes
could impact other vhost device types.

I agree with adding notifiers_set to struct vhost_dev, and setting it in
vhost_dev_enable/disable notifiers, but is there any reason notifiers_set
can’t be checked inside vhost-user-blk?

On Thu, Apr 30, 2020 at 9:55 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> In the case of vhost-user devices the daemon can be killed at any
> moment. Since QEMU supports the reconnect functionality, the guest
> notifiers should be reset and disabled after a "disconnect" event. Most
> of the issues were found when the "disconnect" event happened during the
> vhost device initialization step.
> The disconnect event leads to a call of the vhost_dev_cleanup() routine,
> which memsets the vhost device structure to 0. Because of this, if the
> device was not started (dev.started == false) and the connection is
> broken, the set_guest_notifiers method produces an assertion error. The
> connection can also be broken after the dev.started field is set to
> true.
> A new notifiers_set field is added to the vhost_dev structure to track
> the state of the guest notifiers during the initialization process.
>

From what I can tell this patch does two things:

(1)
In vhost.c you’re adding checks to abort early, while still returning
successfully, from vhost_dev_drop_guest_notifiers() and
vhost_dev_disable_notifiers() if notifiers have not been enabled. This
new logic will affect all existing vhost devices.

(2)
For vhost-user-blk backend disconnect, you are ensuring that notifiers
are dropped and disabled if and only if the notifiers are currently
enabled.

I completely agree with (2), but I don't think we need all of what
you've done for (1) to accomplish (2).

Either way, please clarify in your commit message.

> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>  hw/block/vhost-user-blk.c |  8 ++++----
>  hw/virtio/vhost.c         | 11 +++++++++++
>  include/hw/virtio/vhost.h |  1 +
>  3 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 70d7842..5a3de0f 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -175,7 +175,9 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
>          return;
>      }
>
> -    vhost_dev_stop(&s->dev, vdev);
> +    if (s->dev.started) {
> +        vhost_dev_stop(&s->dev, vdev);
> +    }
>

Couldn't we check if s->dev.notifiers_set here before calling
vhost_dev_drop_guest_notifiers()?

>      ret = vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
>      if (ret < 0) {
> @@ -337,9 +339,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>      }
>      s->connected = false;
>
> -    if (s->dev.started) {
> -        vhost_user_blk_stop(vdev);
> -    }
> +    vhost_user_blk_stop(vdev);
>
>      vhost_dev_cleanup(&s->dev);
>  }
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index fa3da9c..ddbdc53 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1380,6 +1380,7 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>              goto fail_vq;
>          }
>      }
> +    hdev->notifiers_set = true;
>
>      return 0;
>  fail_vq:
> @@ -1407,6 +1408,10 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
>      int i, r;
>

I’m a little wary of short-circuiting logic like this without at least
propagating an error up. Couldn’t we leave it to the backends to check
notifiers_set before they call vhost_dev_disable_notifiers() or
vhost_dev_drop_guest_notifiers()?

Then, if anything, maybe make this check an assert?
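
Something along these lines is what I have in mind (a sketch of the
suggestion only, not code from this series):

    /* hw/block/vhost-user-blk.c: the backend owns the check ... */
    static void vhost_user_blk_stop_sketch(VirtIODevice *vdev)
    {
        VHostUserBlk *s = VHOST_USER_BLK(vdev);

        if (s->dev.started) {
            vhost_dev_stop(&s->dev, vdev);
        }
        if (s->dev.notifiers_set) {
            vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
            vhost_dev_disable_notifiers(&s->dev, vdev);
        }
    }

    /* ... and hw/virtio/vhost.c keeps its existing contract, at most
     * asserting that the caller respected it: */
    void vhost_dev_disable_notifiers_sketch(struct vhost_dev *hdev,
                                            VirtIODevice *vdev)
    {
        assert(hdev->notifiers_set);
        /* existing notifier teardown follows unchanged */
    }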

> +    if (!hdev->notifiers_set) {
> +        return;
> +    }
> +
>      for (i = 0; i < hdev->nvqs; ++i) {
>          r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i,
>                                           false);
> @@ -1417,6 +1422,8 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>          virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i);
>      }
>      virtio_device_release_ioeventfd(vdev);
> +
> +    hdev->notifiers_set = false;
>  }
>
>  /*
> @@ -1449,6 +1456,10 @@ int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
>      VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>      int ret;
>

Same comment as above - I’d prefer vhost-user-blk (and other backends
supporting reconnect) check before calling the function instead of
changing existing API behavior for other vhost devices.

> +    if (!hdev->notifiers_set) {
> +        return 0;
> +    }
> +
>      ret = k->set_guest_notifiers(qbus->parent, nvqs, false);
>      if (ret < 0) {
>          error_report("Error resetting guest notifier: %d", -ret);
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 4d0d2e2..e3711a7 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -90,6 +90,7 @@ struct vhost_dev {
>      QLIST_HEAD(, vhost_iommu) iommu_list;
>      IOMMUNotifier n;
>      const VhostDevConfigOps *config_ops;
> +    bool notifiers_set;
>  };
>
>  int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> --
> 2.7.4
>
>



* Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
  2020-04-30 13:36 ` [PATCH v2 4/5] vhost: check vring address before calling unmap Dima Stepanov
@ 2020-05-04  1:13   ` Raphael Norwitz
  2020-05-11  3:05   ` Jason Wang
  1 sibling, 0 replies; 51+ messages in thread
From: Raphael Norwitz @ 2020-05-04  1:13 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, stefanha, qemu-block, mst, jasowang, qemu-devel,
	dgilbert, raphael.norwitz, arei.gonglei, fengli, yc-core,
	pbonzini, marcandre.lureau, mreitz

On Thu, Apr 30, 2020 at 9:50 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> Since a disconnect can happen at any time during initialization, not all
> vring buffers (for instance the used vring) may have been initialized
> successfully. If a buffer was not initialized, the vhost_memory_unmap()
> call will lead to SIGSEGV. Add checks of the vring address values before
> calling unmap. Also add an assert() in the vhost_memory_unmap() routine.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>

Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

> ---
>  hw/virtio/vhost.c | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index ddbdc53..3ee50c4 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
>                                 hwaddr len, int is_write,
>                                 hwaddr access_len)
>  {
> +    assert(buffer);
> +
>      if (!vhost_dev_has_iommu(dev)) {
>          cpu_physical_memory_unmap(buffer, len, is_write, access_len);
>      }
> @@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
>                                                  vhost_vq_index);
>      }
>
> -    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
> -                       1, virtio_queue_get_used_size(vdev, idx));
> -    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
> -                       0, virtio_queue_get_avail_size(vdev, idx));
> -    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
> -                       0, virtio_queue_get_desc_size(vdev, idx));
> +    /*
> +     * Since the vhost-user disconnect can happen during initialization
> +     * check if the vring was initialized before calling unmap.
> +     */
> +    if (vq->used) {
> +        vhost_memory_unmap(dev, vq->used,
> +                           virtio_queue_get_used_size(vdev, idx),
> +                           1, virtio_queue_get_used_size(vdev, idx));
> +    }
> +    if (vq->avail) {
> +        vhost_memory_unmap(dev, vq->avail,
> +                           virtio_queue_get_avail_size(vdev, idx),
> +                           0, virtio_queue_get_avail_size(vdev, idx));
> +    }
> +    if (vq->desc) {
> +        vhost_memory_unmap(dev, vq->desc,
> +                           virtio_queue_get_desc_size(vdev, idx),
> +                           0, virtio_queue_get_desc_size(vdev, idx));
> +    }
>  }
>
>  static void vhost_eventfd_add(MemoryListener *listener,
> --
> 2.7.4
>
>



* Re: [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state
  2020-05-04  1:06   ` Raphael Norwitz
@ 2020-05-06  8:51     ` Dima Stepanov
  0 siblings, 0 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-06  8:51 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: fam, kwolf, stefanha, qemu-block, mst, jasowang, qemu-devel,
	dgilbert, raphael.norwitz, arei.gonglei, fengli, yc-core,
	pbonzini, marcandre.lureau, mreitz

On Sun, May 03, 2020 at 09:06:38PM -0400, Raphael Norwitz wrote:
> Apologies for mixing up patches last time. This looks good from a
> vhost-user-blk perspective, but I worry that some of these changes
> could impact other vhost device types.
> 
> I agree with adding notifiers_set to struct vhost_dev, and setting it in
> vhost_dev_enable/disable notifiers, but is there any reason notifiers_set
> can’t be checked inside vhost-user-blk?
Thanks for your review. I also have some concerns about changing the
current API, but my idea was that these issues would be triggered for
all vhost-user/reconnect devices. But maybe you are right and we should
fix the vhost-user-blk issues first.
I'll try to modify patches 2 and 3 in my patchset, so that the new
notifiers_set field is added but no API change is made. We'll see how it
looks.



* Re: [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device
  2020-05-04  0:36   ` Raphael Norwitz
@ 2020-05-06  8:54     ` Dima Stepanov
  0 siblings, 0 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-06  8:54 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: fam, kwolf, stefanha, qemu-block, mst, jasowang, qemu-devel,
	dgilbert, raphael.norwitz, arei.gonglei, fengli, yc-core,
	pbonzini, marcandre.lureau, mreitz

On Sun, May 03, 2020 at 08:36:45PM -0400, Raphael Norwitz wrote:
> I’m happy from the vhost, vhost-user-blk and vhost-user-scsi side. For
> other device types it looks pretty straightforward, but their maintainers
> should probably confirm.
> 
> Since you plan to change the behavior of these helpers in subsequent
> patches, maybe consider sending the other device types separately
> after the rest of the series has been merged? That way the changes to
> individual devices will be much easier to review.

Thanks for the comments.
Agreed, I will make a more straightforward fix only for vhost-user-blk.
After that we can figure out how to propagate this change to other
devices.

> 
> On Thu, Apr 30, 2020 at 9:48 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
> >
> > Introduce new wrappers to set/reset guest notifiers for the virtio
> > device in the vhost device module:
> >   vhost_dev_assign_guest_notifiers
> >     ->set_guest_notifiers(..., ..., true);
> >   vhost_dev_drop_guest_notifiers
> >     ->set_guest_notifiers(..., ..., false);
> > This is a preliminary refactoring step, so that the set_guest_notifiers
> > method can be called based on the vhost device state.
> > Update all devices using vhost to call these wrappers instead of
> > invoking the method directly.
> >
> > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > ---
> >  backends/cryptodev-vhost.c  | 26 +++++++++++++++-----------
> >  backends/vhost-user.c       | 16 +++++-----------
> >  hw/block/vhost-user-blk.c   | 15 +++++----------
> >  hw/net/vhost_net.c          | 30 +++++++++++++++++-------------
> >  hw/scsi/vhost-scsi-common.c | 15 +++++----------
> >  hw/virtio/vhost-user-fs.c   | 17 +++++++----------
> >  hw/virtio/vhost-vsock.c     | 18 ++++++++----------
> >  hw/virtio/vhost.c           | 38 ++++++++++++++++++++++++++++++++++++++
> >  hw/virtio/virtio.c          | 13 +++++++++++++
> >  include/hw/virtio/vhost.h   |  4 ++++
> >  include/hw/virtio/virtio.h  |  1 +
> >  11 files changed, 118 insertions(+), 75 deletions(-)
> >


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write
  2020-04-30 13:36 ` [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
@ 2020-05-06  8:54   ` Li Feng
  2020-05-06  9:46   ` Marc-André Lureau
  1 sibling, 0 replies; 51+ messages in thread
From: Li Feng @ 2020-05-06  8:54 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: Fam Zheng, Kevin Wolf, yc-core, open list:Block layer core,
	Michael S. Tsirkin, Jason Wang, open list:All patches CC here,
	Dr. David Alan Gilbert, Gonglei, Raphael Norwitz,
	Stefan Hajnoczi, Marc-André Lureau, Paolo Bonzini,
	Max Reitz

Thanks,

Feng Li

Dima Stepanov <dimastep@yandex-team.ru> wrote on Thu, Apr 30, 2020 at 9:36 PM:
>
> During testing of the vhost-user-blk reconnect functionality the qemu
> SIGSEGV was triggered:
>  start qemu as:
>  x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \
>    -object memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \
>    -numa node,cpus=0,memdev=ram-node0 \
>    -chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \
>    -device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm
>  start vhost-user-blk daemon:
>  ./vhost-user-blk -s ./vhost.sock -b test-img.raw
>
> If vhost-user-blk is killed during the vhost initialization
> process, for instance after getting VHOST_SET_VRING_CALL command, then
> QEMU will fail with the following backtrace:
>
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
>     at ./hw/virtio/vhost-user.c:260
> 260         CharBackend *chr = u->user->chr;
>
>  #0  0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
>     at ./hw/virtio/vhost-user.c:260
>  #1  0x000055555592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
>     at ./hw/virtio/vhost-user.c:1645
>  #2  0x0000555555925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
>     at ./hw/virtio/vhost.c:1490
>  #3  0x00005555558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd8f0)
>     at ./hw/block/vhost-user-blk.c:429
>  #4  0x0000555555920090 in virtio_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd948)
>     at ./hw/virtio/virtio.c:3615
>  #5  0x0000555555a9779c in device_set_realized (obj=0x7fffef2d51a0, value=true, errp=0x7fffffffdb88)
>     at ./hw/core/qdev.c:891
>  ...
>
> The problem is that vhost_user_write doesn't get an error after
> disconnect and tries to call vhost_user_read(). The tcp_chr_write()
> routine should return -1 in case of disconnect. Indicate the EIO error
> if this routine is called in the disconnected state.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>  chardev/char-socket.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 185fe38..c128cca 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>          if (ret < 0 && errno != EAGAIN) {
>              if (tcp_chr_read_poll(chr) <= 0) {
>                  tcp_chr_disconnect_locked(chr);
> -                return len;
> +                /* Return an error since we made a disconnect. */
> +                return ret;
This `return` statement could be deleted: the function already falls
through to the `return ret;` below (see the sketch after the quoted
patch).

>              } /* else let the read handler finish it properly */
>          }
>
>          return ret;
>      } else {
> -        /* XXX: indicate an error ? */
> -        return len;
> +        /* Indicate an error. */
> +        errno = EIO;
> +        return -1;
>      }
>  }
>
> --
> 2.7.4
>
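
For reference, a sketch of how the first hunk would read with Li Feng's
simplification applied (untested; it only makes the fall-through
explicit):

        if (ret < 0 && errno != EAGAIN) {
            if (tcp_chr_read_poll(chr) <= 0) {
                /* Disconnected: fall through and return the error. */
                tcp_chr_disconnect_locked(chr);
            } /* else let the read handler finish it properly */
        }

        return ret;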


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write
  2020-04-30 13:36 ` [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
  2020-05-06  8:54   ` Li Feng
@ 2020-05-06  9:46   ` Marc-André Lureau
  1 sibling, 0 replies; 51+ messages in thread
From: Marc-André Lureau @ 2020-05-06  9:46 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: Fam Zheng, Wolf, Kevin, yc-core, qemu-block, Michael Tsirkin,
	Jason Wang, qemu-devel, David Gilbert, Gonglei, Raphael Norwitz,
	Li Feng, Stefan Hajnoczi, Bonzini, Paolo, Max Reitz

On Thu, Apr 30, 2020 at 3:37 PM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> During testing of the vhost-user-blk reconnect functionality the qemu
> SIGSEGV was triggered:
>  start qemu as:
>  x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \
>    -object memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \
>    -numa node,cpus=0,memdev=ram-node0 \
>    -chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \
>    -device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm
>  start vhost-user-blk daemon:
>  ./vhost-user-blk -s ./vhost.sock -b test-img.raw
>
> If vhost-user-blk is killed during the vhost initialization
> process, for instance after getting VHOST_SET_VRING_CALL command, then
> QEMU will fail with the following backtrace:
>
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
>     at ./hw/virtio/vhost-user.c:260
> 260         CharBackend *chr = u->user->chr;
>
>  #0  0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
>     at ./hw/virtio/vhost-user.c:260
>  #1  0x000055555592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
>     at ./hw/virtio/vhost-user.c:1645
>  #2  0x0000555555925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
>     at ./hw/virtio/vhost.c:1490
>  #3  0x00005555558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd8f0)
>     at ./hw/block/vhost-user-blk.c:429
>  #4  0x0000555555920090 in virtio_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd948)
>     at ./hw/virtio/virtio.c:3615
>  #5  0x0000555555a9779c in device_set_realized (obj=0x7fffef2d51a0, value=true, errp=0x7fffffffdb88)
>     at ./hw/core/qdev.c:891
>  ...
>
> The problem is that vhost_user_write doesn't get an error after
> disconnect and tries to call vhost_user_read(). The tcp_chr_write()
> routine should return -1 in case of disconnect. Indicate the EIO error
> if this routine is called in the disconnected state.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>


Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

> ---
>  chardev/char-socket.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 185fe38..c128cca 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -175,14 +175,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>          if (ret < 0 && errno != EAGAIN) {
>              if (tcp_chr_read_poll(chr) <= 0) {
>                  tcp_chr_disconnect_locked(chr);
> -                return len;
> +                /* Return an error since we made a disconnect. */
> +                return ret;
>              } /* else let the read handler finish it properly */
>          }
>
>          return ret;
>      } else {
> -        /* XXX: indicate an error ? */
> -        return len;
> +        /* Indicate an error. */
> +        errno = EIO;
> +        return -1;
>      }
>  }
>
> --
> 2.7.4
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-04-30 13:36 ` [PATCH v2 5/5] vhost: add device started check in migration set log Dima Stepanov
@ 2020-05-06 22:08   ` Raphael Norwitz
  2020-05-07  7:15     ` Michael S. Tsirkin
  2020-05-07 15:35     ` Dima Stepanov
  2020-05-11  3:15   ` Jason Wang
  1 sibling, 2 replies; 51+ messages in thread
From: Raphael Norwitz @ 2020-05-06 22:08 UTC (permalink / raw)
  To: Dima Stepanov, mst, fengli
  Cc: fam, kwolf, stefanha, qemu-block, jasowang, qemu-devel, dgilbert,
	raphael.norwitz, arei.gonglei, yc-core, pbonzini,
	marcandre.lureau, mreitz

As you correctly point out, this code needs to be looked at more
carefully so that
if the device does disconnect in the background we can handle the migration path
gracefully. In particular, we need to decide whether a migration
should be allowed
to continue if a device disconnects during the migration stage.

mst, any thoughts?

Have you looked at the suggestion I gave Li Feng to move vhost_dev_cleanup()
into the connection path in vhost-user-blk? I’m not sure if he’s
actively working on it,
but I would prefer if we can find a way to keep some state around
between reconnects
so we aren’t constantly checking dev->started. A device can be stopped
for reasons
other than backend disconnect so I’d rather not reuse this field to
check for backend
disconnect failures.
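
A rough sketch of that suggestion (the stale_connection flag and the
exact shape of the function are hypothetical; vhost_user_blk_connect()
is the reconnect entry point in hw/block/vhost-user-blk.c):

    static int vhost_user_blk_connect(DeviceState *dev)
    {
        VirtIODevice *vdev = VIRTIO_DEVICE(dev);
        VHostUserBlk *s = VHOST_USER_BLK(vdev);

        if (s->connected) {
            return 0;
        }
        /*
         * Clean up the previous connection lazily, here, instead of in
         * the disconnect handler: between disconnect and reconnect the
         * vhost_dev fields then stay intact for code that races with
         * the disconnect (e.g. migration log setup).
         */
        if (s->stale_connection) {
            vhost_dev_cleanup(&s->dev);
            s->stale_connection = false;
        }
        s->connected = true;
        return vhost_dev_init(&s->dev, &s->vhost_user,
                              VHOST_BACKEND_TYPE_USER, 0);
    }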

On Thu, Apr 30, 2020 at 9:57 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> If vhost-user daemon is used as a backend for the vhost device, then we
> should consider a possibility of disconnect at any moment. If such
> disconnect happened in the vhost_migration_log() routine the vhost
> device structure will be cleaned up.
> At the start of the vhost_migration_log() function there is a check:
>   if (!dev->started) {
>       dev->log_enabled = enable;
>       return 0;
>   }
> To be consistent with this check add the same check after calling the
> vhost_dev_set_log() routine. This in general helps not to break a

Could you point to the specific asserts which are being triggered?

> migration due to the assert() message. But it looks like this code
> should be revised to handle these errors more carefully.
>
> In case of vhost-user device backend the fail paths should consider the
> state of the device. In this case we should skip some function calls
> during rollback on the error paths, so as not to get NULL dereference
> errors.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 3ee50c4..d5ab96d 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>  {
>      int r, i, idx;

A couple points here


(1) This will fail the live migration if the device is disconnected.
That may be the right thing
      to do, but if there are cases where migrations can proceed with
a disconnected device,
      this may not be desirable.

(2) This looks racy. As far as I can tell vhost_dev_set_log() is only
called by vhost_migration_log(),
      and as you say one of the first things vhost_migration_log does
is return if dev->started is not
      set. What’s to stop a disconnect from clearing the vdev right
after this check, just before
      vhost_dev_set_features() is called?

As stated above, I would prefer it if we could add some state which
would persist between
reconnects which could then be checked in the vhost-user code before
interacting with
the backend. I understand this will be a much more involved change and
will require a lot
of thought.

Also, regarding (1) above, if the original check in
vhost_migration_log() returns success if the
device is not started why return an error here? I imagine this could
lead to some inconsistent
behavior if the device disconnects before the first check versus
before the second.

> +
> +    if (!dev->started) {
> +        /*
> +         * If vhost-user daemon is used as a backend for the
> +         * device and the connection is broken, then the vhost_dev
> +         * structure will be reset all its values to 0.
> +         * Add additional check for the device state.
> +         */
> +        return -1;
> +    }
> +
>      r = vhost_dev_set_features(dev, enable_log);
>      if (r < 0) {
>          goto err_features;
> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>      }


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-06 22:08   ` Raphael Norwitz
@ 2020-05-07  7:15     ` Michael S. Tsirkin
  2020-05-07 15:35     ` Dima Stepanov
  1 sibling, 0 replies; 51+ messages in thread
From: Michael S. Tsirkin @ 2020-05-07  7:15 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: fam, kwolf, stefanha, qemu-block, jasowang, qemu-devel, dgilbert,
	raphael.norwitz, arei.gonglei, fengli, yc-core, pbonzini,
	marcandre.lureau, mreitz, Dima Stepanov

On Wed, May 06, 2020 at 06:08:34PM -0400, Raphael Norwitz wrote:
> As you correctly point out, this code needs to be looked at more
> carefully so that
> if the device does disconnect in the background we can handle the migration path
> gracefully. In particular, we need to decide whether a migration
> should be allowed
> to continue if a device disconnects during the migration stage.
> 
> mst, any thoughts?

Why not? It can't change state while disconnected, so it just makes
things easier.

> Have you looked at the suggestion I gave Li Feng to move vhost_dev_cleanup()
> into the connection path in vhost-user-blk? I’m not sure if he’s
> actively working on it,
> but I would prefer if we can find a way to keep some state around
> between reconnects
> so we aren’t constantly checking dev->started. A device can be stopped
> for reasons
> other than backend disconnect so I’d rather not reuse this field to
> check for backend
> disconnect failures.
> 
> On Thu, Apr 30, 2020 at 9:57 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
> >
> > If vhost-user daemon is used as a backend for the vhost device, then we
> > should consider a possibility of disconnect at any moment. If such
> > disconnect happened in the vhost_migration_log() routine the vhost
> > device structure will be cleaned up.
> > At the start of the vhost_migration_log() function there is a check:
> >   if (!dev->started) {
> >       dev->log_enabled = enable;
> >       return 0;
> >   }
> > To be consistent with this check add the same check after calling the
> > vhost_dev_set_log() routine. This in general helps not to break a
> 
> Could you point to the specific asserts which are being triggered?
> 
> > migration due to the assert() message. But it looks like this code
> > should be revised to handle these errors more carefully.
> >
> > In case of vhost-user device backend the fail paths should consider the
> > state of the device. In this case we should skip some function calls
> > during rollback on the error paths, so as not to get NULL dereference
> > errors.
> >
> > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > ---
> >  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> >  1 file changed, 35 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 3ee50c4..d5ab96d 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> >  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >  {
> >      int r, i, idx;
> 
> A couple points here
> 
> 
> (1) This will fail the live migration if the device is disconnected.
> That may be the right thing
>       to do, but if there are cases where migrations can proceed with
> a disconnected device,
>       this may not be desirable.
> 
> (2) This looks racy. As far as I can tell vhost_dev_set_log() is only
> called by vhost_migration_log(),
>       and as you say one of the first things vhost_migration_log does
> is return if dev->started is not
>       set. What’s to stop a disconnect from clearing the vdev right
> after this check, just before
>       vhost_dev_set_features() is called?
> 
> As stated above, I would prefer it if we could add some state which
> would persist between
> reconnects which could then be checked in the vhost-user code before
> interacting with
> the backend. I understand this will be a much more involved change and
> will require a lot
> of thought.
> 
> Also, regarding (1) above, if the original check in
> vhost_migration_log() returns success if the
> device is not started why return an error here? I imagine this could
> lead to some inconsistent
> behavior if the device disconnects before the first check versus
> before the second.
> 
> > +
> > +    if (!dev->started) {
> > +        /*
> > +         * If vhost-user daemon is used as a backend for the
> > +         * device and the connection is broken, then the vhost_dev
> > +         * structure will be reset all its values to 0.
> > +         * Add additional check for the device state.
> > +         */
> > +        return -1;
> > +    }
> > +
> >      r = vhost_dev_set_features(dev, enable_log);
> >      if (r < 0) {
> >          goto err_features;
> > @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >      }



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-06 22:08   ` Raphael Norwitz
  2020-05-07  7:15     ` Michael S. Tsirkin
@ 2020-05-07 15:35     ` Dima Stepanov
  2020-05-11  0:03       ` Raphael Norwitz
  1 sibling, 1 reply; 51+ messages in thread
From: Dima Stepanov @ 2020-05-07 15:35 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: fam, kwolf, stefanha, qemu-block, mst, jasowang, qemu-devel,
	dgilbert, raphael.norwitz, arei.gonglei, fengli, yc-core,
	pbonzini, marcandre.lureau, mreitz

On Wed, May 06, 2020 at 06:08:34PM -0400, Raphael Norwitz wrote:
> As you correctly point out, this code needs to be looked at more
> carefully so that
> if the device does disconnect in the background we can handle the migration path
> gracefully. In particular, we need to decide whether a migration
> should be allowed
> to continue if a device disconnects durning the migration stage.
From what I see in the code, it is allowed. At the start of the
hw/virtio/vhost.c:vhost_migration_log() routine there is a check:
    if (!dev->started) {
        dev->log_enabled = enable;
        return 0;
    }
So our changes follow the same idea: if the device isn't started then 0
can be returned. Please note that if we want to return an error here
then the following abort() will be hit (hw/virtio/vhost.c):
    static void vhost_log_global_start(MemoryListener *listener)
    {
        int r;

        r = vhost_migration_log(listener, true);
        if (r < 0) {
            abort();
        }
    }
But as I mentioned, we didn't change this logic, we just propagated it
to the whole migration start process during the vhost handshake. After
that our tests passed successfully.

> 
> mst, any thoughts?
> 
> Have you looked at the suggestion I gave Li Feng to move vhost_dev_cleanup()
> into the connection path in vhost-user-blk? I’m not sure if he’s
> actively working on it,
> but I would prefer if we can find a way to keep some state around
> between reconnects
> so we aren’t constantly checking dev->started. A device can be stopped
> for reasons
> other than backend disconnect so I’d rather not reuse this field to
> check for backend
> disconnect failures.
In fact I didn't try to use the ->started field to signal a disconnect.
What I tried to follow is that if the device is not started (because of
a disconnect or any other reason), there is no need to continue
initialization and we can proceed with the next migration step.

> 
> On Thu, Apr 30, 2020 at 9:57 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
> >
> > If vhost-user daemon is used as a backend for the vhost device, then we
> > should consider a possibility of disconnect at any moment. If such
> > disconnect happened in the vhost_migration_log() routine the vhost
> > device structure will be cleaned up.
> > At the start of the vhost_migration_log() function there is a check:
> >   if (!dev->started) {
> >       dev->log_enabled = enable;
> >       return 0;
> >   }
> > To be consistent with this check add the same check after calling the
> > vhost_dev_set_log() routine. This in general helps not to break a
> 
> Could you point to the specific asserts which are being triggered?
Just to be clear here: the assert message I mentioned is described
above. I wanted to explain why we followed the "(!dev->started) return 0"
logic. And in this case we didn't return an error, we returned 0.

But the first error we hit during migration testing was SIGSEGV:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000056354db0a74a in vhost_dev_has_iommu (dev=0x563550562b00)
    at hw/virtio/vhost.c:299
299         return vdev->dma_as != &address_space_memory &&
(gdb) p vdev
$1 = (VirtIODevice *) 0x0
(gdb) bt
#0  0x000056354db0a74a in vhost_dev_has_iommu (dev=0x563550562b00)
    at hw/virtio/vhost.c:299
#1  0x000056354db0bb76 in vhost_dev_set_features (dev=0x563550562b00, enable_log=true)
    at hw/virtio/vhost.c:777
#2  0x000056354db0bc1e in vhost_dev_set_log (dev=0x563550562b00, enable_log=true)
    at hw/virtio/vhost.c:790
#3  0x000056354db0be58 in vhost_migration_log (listener=0x563550562b08, enable=1)
    at hw/virtio/vhost.c:834
#4  0x000056354db0be9b in vhost_log_global_start (listener=0x563550562b08)
    at hw/virtio/vhost.c:847
#5  0x000056354da72e7e in memory_global_dirty_log_start ()
    at memory.c:2611
...


> 
> > migration due to the assert() message. But it looks like this code
> > should be revised to handle these errors more carefully.
> >
> > In case of vhost-user device backend the fail paths should consider the
> > state of the device. In this case we should skip some function calls
> > during rollback on the error paths, so as not to get NULL dereference
> > errors.
> >
> > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > ---
> >  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> >  1 file changed, 35 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 3ee50c4..d5ab96d 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> >  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >  {
> >      int r, i, idx;
> 
> A couple points here
> 
> 
> (1) This will fail the live migration if the device is disconnected.
> That may be the right thing
>       to do, but if there are cases where migrations can proceed with
> a disconnected device,
>       this may not be desirable.
I'm not sure that is correct. The VM could be migrated successfully
while the device daemon is disconnected.

> 
> (2) This looks racy. As far as I can tell vhost_dev_set_log() is only
> called by vhost_migration_log(),
>       and as you say one of the first things vhost_migration_log does
> is return if dev->started is not
>       set. What’s to stop a disconnect from clearing the vdev right
> after this check, just before
>       vhost_dev_set_features() is called?
Sorry, but I'm not sure I've got your point here. We can't stop a
disconnect from happening just before vhost_dev_set_features() right
now. Or do you mean that we should skip the cleanup if the device is in
the migration step? Well, it is hard to say. If we can agree that the
dev->started logic is correct, then there is no reason for it. But if we
think that this logic is wrong, then yes, we should change something.

> 
> As stated above, I would prefer it if we could add some state which
> would persist between
> reconnects which could then be checked in the vhost-user code before
> interacting with
> the backend. I understand this will be a much more involved change and
> will require a lot
> of thought.
> 
> Also, regarding (1) above, if the original check in
> vhost_migration_log() returns success if the
> device is not started why return an error here? I imagine this could
> lead to some inconsistent
> behavior if the device disconnects before the first check versus
> before the second.
Yes, I agree with you. That is why I mentioned in the commit message
that maybe this code should be reviewed carefully. On the other side
our changes, as I see it:
  - follow the same logic with respect to the device state
  - fix the SIGSEGV error in case of disconnect
  - pass the migration/disconnect test successfully
Some words about the test:
  - run the src VM with the vhost-user-blk daemon used
  - run fio inside it
  - perform a reconnect every X seconds (just kill and restart the
    daemon), X is random
  - run the dst VM
  - perform migration
  - fio should complete in the dst VM
We run this test in a loop. The SIGSEGV during the vhost handshake is
hit once per ~30 tries. By adding an artificial delay in the QEMU source
code we were able to hit this SIGSEGV 100% of the time. After this patch
our test passed successfully.

What do you think?

No other comments mixed in below.

> 
> > +
> > +    if (!dev->started) {
> > +        /*
> > +         * If vhost-user daemon is used as a backend for the
> > +         * device and the connection is broken, then the vhost_dev
> > +         * structure will be reset all its values to 0.
> > +         * Add additional check for the device state.
> > +         */
> > +        return -1;
> > +    }
> > +
> >      r = vhost_dev_set_features(dev, enable_log);
> >      if (r < 0) {
> >          goto err_features;
> > @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >      }


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-07 15:35     ` Dima Stepanov
@ 2020-05-11  0:03       ` Raphael Norwitz
  2020-05-11  9:43         ` Dima Stepanov
  0 siblings, 1 reply; 51+ messages in thread
From: Raphael Norwitz @ 2020-05-11  0:03 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, stefanha, qemu-block, mst, jasowang, qemu-devel,
	dgilbert, raphael.norwitz, arei.gonglei, fengli, yc-core,
	pbonzini, marcandre.lureau, mreitz

On Thu, May 7, 2020 at 11:35 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> What do you think?
>

Apologies - I tripped over the if (dev->started && r < 0) check.
Never mind my point about race conditions and failing migrations.

Rather than modifying vhost_dev_set_log(), it may be clearer to put a
check after vhost_dev_log_resize()? Something like:

--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -829,11 +829,22 @@ static int vhost_migration_log(MemoryListener
*listener, int enable)
         vhost_log_put(dev, false);
     } else {
         vhost_dev_log_resize(dev, vhost_get_log_size(dev));
+        /*
+         * A device can be stopped because of backend disconnect inside
+         * vhost_dev_log_resize(). In this case we should mark logging
+         * enabled and return without attempting to set the backend
+         * logging state.
+         */
+        if (!dev->started) {
+            goto out_success;
+        }
         r = vhost_dev_set_log(dev, true);
         if (r < 0) {
             return r;
         }
     }
+
+out_success:
     dev->log_enabled = enable;
     return 0;
 }

This seems harmless enough to me, and I see how it fixes your
particular crash, but I would still prefer we worked towards a more
robust solution. In particular I think we could handle this inside
vhost-user-blk if we let the device state persist between connections
(i.e. call vhost_dev_cleanup() inside vhost_user_blk_connect() before
vhost_dev_init() on reconnect). This should also fix some of the
crashes Li Feng has hit, and probably others which haven’t been
reported yet. What do you think?

If that’s unworkable I guess we will need to add these vhost level
checks. In that case I would still prefer we add a “disconnected” flag
in struct vhost_dev struct, and make sure it isn’t cleared by
vhost_dev_cleanup(). That way we don’t conflate stopping a device with
backend disconnect at the vhost level and potentially regress behavior
for other device types.
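
A rough sketch of the "disconnected" flag idea (the field name and the
preservation logic are assumptions, not an agreed design):

    void vhost_dev_cleanup(struct vhost_dev *hdev)
    {
        bool disconnected = hdev->disconnected;

        /* ... existing teardown of vqs, log and memory listener ... */

        memset(hdev, 0, sizeof(struct vhost_dev));
        /*
         * Preserve the disconnect marker across the wipe, so code such
         * as vhost_migration_log() can tell "backend disconnected"
         * apart from an ordinary device stop.
         */
        hdev->disconnected = disconnected;
    }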


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device
  2020-04-30 13:36 ` [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
  2020-05-04  0:36   ` Raphael Norwitz
@ 2020-05-11  3:03   ` Jason Wang
  2020-05-11  8:55     ` Dima Stepanov
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Wang @ 2020-05-11  3:03 UTC (permalink / raw)
  To: Dima Stepanov, qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, dgilbert, mreitz,
	arei.gonglei, fengli, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz


On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> Introduce new wrappers to set/reset guest notifiers for the virtio
> device in the vhost device module:
>    vhost_dev_assign_guest_notifiers
>      ->set_guest_notifiers(..., ..., true);
>    vhost_dev_drop_guest_notifiers
>      ->set_guest_notifiers(..., ..., false);
> This is a preliminary step to refactor code,


Maybe I'm missing something, but I don't see any follow-up patch in this
series that modifies the new wrappers?


>   so the set_guest_notifiers
> methods could be called based on the vhost device state.
> Update all devices using vhost to use these wrappers instead of direct
> method calls.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>   backends/cryptodev-vhost.c  | 26 +++++++++++++++-----------
>   backends/vhost-user.c       | 16 +++++-----------
>   hw/block/vhost-user-blk.c   | 15 +++++----------
>   hw/net/vhost_net.c          | 30 +++++++++++++++++-------------
>   hw/scsi/vhost-scsi-common.c | 15 +++++----------
>   hw/virtio/vhost-user-fs.c   | 17 +++++++----------
>   hw/virtio/vhost-vsock.c     | 18 ++++++++----------
>   hw/virtio/vhost.c           | 38 ++++++++++++++++++++++++++++++++++++++
>   hw/virtio/virtio.c          | 13 +++++++++++++
>   include/hw/virtio/vhost.h   |  4 ++++
>   include/hw/virtio/virtio.h  |  1 +
>   11 files changed, 118 insertions(+), 75 deletions(-)
>
> diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
> index 8337c9a..4522195 100644
> --- a/backends/cryptodev-vhost.c
> +++ b/backends/cryptodev-vhost.c
> @@ -169,16 +169,13 @@ vhost_set_vring_enable(CryptoDevBackendClient *cc,
>   int cryptodev_vhost_start(VirtIODevice *dev, int total_queues)
>   {
>       VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> -    VirtioBusState *vbus = VIRTIO_BUS(qbus);
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>       int r, e;
>       int i;
>       CryptoDevBackend *b = vcrypto->cryptodev;
>       CryptoDevBackendVhost *vhost_crypto;
>       CryptoDevBackendClient *cc;
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(dev)) {
>           error_report("binding does not support guest notifiers");
>           return -ENOSYS;
>       }
> @@ -198,9 +195,13 @@ int cryptodev_vhost_start(VirtIODevice *dev, int total_queues)
>           }
>        }
>   
> -    r = k->set_guest_notifiers(qbus->parent, total_queues, true);
> +    /*
> +     * Since all the states are handled by one vhost device,
> +     * use the first one in array.
> +     */
> +    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
> +    r = vhost_dev_assign_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
>       if (r < 0) {
> -        error_report("error binding guest notifier: %d", -r);
>           goto err;
>       }
>   
> @@ -232,7 +233,8 @@ err_start:
>           vhost_crypto = cryptodev_get_vhost(cc, b, i);
>           cryptodev_vhost_stop_one(vhost_crypto, dev);
>       }
> -    e = k->set_guest_notifiers(qbus->parent, total_queues, false);
> +    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
> +    e = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
>       if (e < 0) {
>           error_report("vhost guest notifier cleanup failed: %d", e);
>       }
> @@ -242,9 +244,6 @@ err:
>   
>   void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)
>   {
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> -    VirtioBusState *vbus = VIRTIO_BUS(qbus);
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>       VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
>       CryptoDevBackend *b = vcrypto->cryptodev;
>       CryptoDevBackendVhost *vhost_crypto;
> @@ -259,7 +258,12 @@ void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)
>           cryptodev_vhost_stop_one(vhost_crypto, dev);
>       }
>   
> -    r = k->set_guest_notifiers(qbus->parent, total_queues, false);
> +    /*
> +     * Since all the states are handled by one vhost device,
> +     * use the first one in array.
> +     */
> +    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
> +    r = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
>       if (r < 0) {
>           error_report("vhost guest notifier cleanup failed: %d", r);
>       }
> diff --git a/backends/vhost-user.c b/backends/vhost-user.c
> index 2bf3406..e116bc6 100644
> --- a/backends/vhost-user.c
> +++ b/backends/vhost-user.c
> @@ -60,15 +60,13 @@ vhost_user_backend_dev_init(VhostUserBackend *b, VirtIODevice *vdev,
>   void
>   vhost_user_backend_start(VhostUserBackend *b)
>   {
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>       int ret, i ;
>   
>       if (b->started) {
>           return;
>       }
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(b->vdev)) {
>           error_report("binding does not support guest notifiers");
>           return;
>       }
> @@ -78,9 +76,8 @@ vhost_user_backend_start(VhostUserBackend *b)
>           return;
>       }
>   
> -    ret = k->set_guest_notifiers(qbus->parent, b->dev.nvqs, true);
> +    ret = vhost_dev_assign_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
>       if (ret < 0) {
> -        error_report("Error binding guest notifier");
>           goto err_host_notifiers;
>       }
>   
> @@ -104,7 +101,7 @@ vhost_user_backend_start(VhostUserBackend *b)
>       return;
>   
>   err_guest_notifiers:
> -    k->set_guest_notifiers(qbus->parent, b->dev.nvqs, false);
> +    vhost_dev_drop_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
>   err_host_notifiers:
>       vhost_dev_disable_notifiers(&b->dev, b->vdev);
>   }
> @@ -112,8 +109,6 @@ err_host_notifiers:
>   void
>   vhost_user_backend_stop(VhostUserBackend *b)
>   {
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>       int ret = 0;
>   
>       if (!b->started) {
> @@ -122,9 +117,8 @@ vhost_user_backend_stop(VhostUserBackend *b)
>   
>       vhost_dev_stop(&b->dev, b->vdev);
>   
> -    if (k->set_guest_notifiers) {
> -        ret = k->set_guest_notifiers(qbus->parent,
> -                                     b->dev.nvqs, false);
> +    if (virtio_device_guest_notifiers_initialized(b->vdev)) {
> +        ret = vhost_dev_drop_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
>           if (ret < 0) {
>               error_report("vhost guest notifier cleanup failed: %d", ret);
>           }
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 17df533..70d7842 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -109,11 +109,9 @@ const VhostDevConfigOps blk_ops = {
>   static int vhost_user_blk_start(VirtIODevice *vdev)
>   {
>       VHostUserBlk *s = VHOST_USER_BLK(vdev);
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>       int i, ret;
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(vdev)) {
>           error_report("binding does not support guest notifiers");
>           return -ENOSYS;
>       }
> @@ -124,9 +122,8 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
>           return ret;
>       }
>   
> -    ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
> +    ret = vhost_dev_assign_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
>       if (ret < 0) {
> -        error_report("Error binding guest notifier: %d", -ret);
>           goto err_host_notifiers;
>       }
>   
> @@ -163,7 +160,7 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
>       return ret;
>   
>   err_guest_notifiers:
> -    k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> +    vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
>   err_host_notifiers:
>       vhost_dev_disable_notifiers(&s->dev, vdev);
>       return ret;
> @@ -172,17 +169,15 @@ err_host_notifiers:
>   static void vhost_user_blk_stop(VirtIODevice *vdev)
>   {
>       VHostUserBlk *s = VHOST_USER_BLK(vdev);
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>       int ret;
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(vdev)) {
>           return;
>       }
>   
>       vhost_dev_stop(&s->dev, vdev);
>   
> -    ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> +    ret = vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
>       if (ret < 0) {
>           error_report("vhost guest notifier cleanup failed: %d", ret);
>           return;
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 6b82803..c13b444 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -303,19 +303,15 @@ static void vhost_net_stop_one(struct vhost_net *net,
>   int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>                       int total_queues)
>   {
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> -    VirtioBusState *vbus = VIRTIO_BUS(qbus);
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> +    struct vhost_net *net;
>       int r, e, i;
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(dev)) {
>           error_report("binding does not support guest notifiers");
>           return -ENOSYS;
>       }
>   
>       for (i = 0; i < total_queues; i++) {
> -        struct vhost_net *net;
> -
>           net = get_vhost_net(ncs[i].peer);
>           vhost_net_set_vq_index(net, i * 2);
>   
> @@ -328,9 +324,13 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>           }
>        }
>   
> -    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
> +    /*
> +     * Since all the states are handled by one vhost_net device,
> +     * use the first one in array.
> +     */


This comment is confusing; kernel vhost-net backends will use all of their peers.


> +    net = get_vhost_net(ncs[0].peer);
> +    r = vhost_dev_assign_guest_notifiers(&net->dev, dev, total_queues * 2);
>       if (r < 0) {
> -        error_report("Error binding guest notifier: %d", -r);
>           goto err;
>       }
>   
> @@ -357,7 +357,8 @@ err_start:
>       while (--i >= 0) {
>           vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
>       }
> -    e = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
> +    net = get_vhost_net(ncs[0].peer);
> +    e = vhost_dev_drop_guest_notifiers(&net->dev, dev, total_queues * 2);
>       if (e < 0) {
>           fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
>           fflush(stderr);
> @@ -369,16 +370,19 @@ err:
>   void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>                       int total_queues)
>   {
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> -    VirtioBusState *vbus = VIRTIO_BUS(qbus);
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> +    struct vhost_net *net;
>       int i, r;
>   
>       for (i = 0; i < total_queues; i++) {
>           vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
>       }
>   
> -    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
> +    /*
> +     * Since all the states are handled by one vhost_net device,
> +     * use the first one in array.
> +     */
> +    net = get_vhost_net(ncs[0].peer);
> +    r = vhost_dev_drop_guest_notifiers(&net->dev, dev, total_queues * 2);
>       if (r < 0) {
>           fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
>           fflush(stderr);
> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
> index 8ec49d7..8f51ec0 100644
> --- a/hw/scsi/vhost-scsi-common.c
> +++ b/hw/scsi/vhost-scsi-common.c
> @@ -29,10 +29,8 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>   {
>       int ret, i;
>       VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(vdev)) {
>           error_report("binding does not support guest notifiers");
>           return -ENOSYS;
>       }
> @@ -42,9 +40,8 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>           return ret;
>       }
>   
> -    ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, true);
> +    ret = vhost_dev_assign_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
>       if (ret < 0) {
> -        error_report("Error binding guest notifier");
>           goto err_host_notifiers;
>       }
>   
> @@ -66,7 +63,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>       return ret;
>   
>   err_guest_notifiers:
> -    k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
> +    vhost_dev_drop_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
>   err_host_notifiers:
>       vhost_dev_disable_notifiers(&vsc->dev, vdev);
>       return ret;
> @@ -75,14 +72,12 @@ err_host_notifiers:
>   void vhost_scsi_common_stop(VHostSCSICommon *vsc)
>   {
>       VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>       int ret = 0;
>   
>       vhost_dev_stop(&vsc->dev, vdev);
>   
> -    if (k->set_guest_notifiers) {
> -        ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
> +    if (virtio_device_guest_notifiers_initialized(vdev)) {
> +        ret = vhost_dev_drop_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
>           if (ret < 0) {
>                   error_report("vhost guest notifier cleanup failed: %d", ret);
>           }
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 6136768..6b101fc 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -38,12 +38,10 @@ static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
>   static void vuf_start(VirtIODevice *vdev)
>   {
>       VHostUserFS *fs = VHOST_USER_FS(vdev);
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>       int ret;
>       int i;
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(vdev)) {
>           error_report("binding does not support guest notifiers");
>           return;
>       }
> @@ -54,9 +52,9 @@ static void vuf_start(VirtIODevice *vdev)
>           return;
>       }
>   
> -    ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, true);
> +    ret = vhost_dev_assign_guest_notifiers(&fs->vhost_dev, vdev,
> +            fs->vhost_dev.nvqs);
>       if (ret < 0) {
> -        error_report("Error binding guest notifier: %d", -ret);
>           goto err_host_notifiers;
>       }
>   
> @@ -79,7 +77,7 @@ static void vuf_start(VirtIODevice *vdev)
>       return;
>   
>   err_guest_notifiers:
> -    k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
> +    vhost_dev_drop_guest_notifiers(&fs->vhost_dev, vdev, fs->vhost_dev.nvqs);
>   err_host_notifiers:
>       vhost_dev_disable_notifiers(&fs->vhost_dev, vdev);
>   }
> @@ -87,17 +85,16 @@ err_host_notifiers:
>   static void vuf_stop(VirtIODevice *vdev)
>   {
>       VHostUserFS *fs = VHOST_USER_FS(vdev);
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>       int ret;
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(vdev)) {
>           return;
>       }
>   
>       vhost_dev_stop(&fs->vhost_dev, vdev);
>   
> -    ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
> +    ret = vhost_dev_drop_guest_notifiers(&fs->vhost_dev, vdev,
> +            fs->vhost_dev.nvqs);
>       if (ret < 0) {
>           error_report("vhost guest notifier cleanup failed: %d", ret);
>           return;
> diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
> index 09b6b07..52489dd 100644
> --- a/hw/virtio/vhost-vsock.c
> +++ b/hw/virtio/vhost-vsock.c
> @@ -75,12 +75,10 @@ static int vhost_vsock_set_running(VHostVSock *vsock, int start)
>   static void vhost_vsock_start(VirtIODevice *vdev)
>   {
>       VHostVSock *vsock = VHOST_VSOCK(vdev);
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>       int ret;
>       int i;
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(vdev)) {
>           error_report("binding does not support guest notifiers");
>           return;
>       }
> @@ -91,9 +89,9 @@ static void vhost_vsock_start(VirtIODevice *vdev)
>           return;
>       }
>   
> -    ret = k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, true);
> +    ret = vhost_dev_assign_guest_notifiers(&vsock->vhost_dev,
> +            vdev, vsock->vhost_dev.nvqs);
>       if (ret < 0) {
> -        error_report("Error binding guest notifier: %d", -ret);
>           goto err_host_notifiers;
>       }
>   
> @@ -123,7 +121,8 @@ static void vhost_vsock_start(VirtIODevice *vdev)
>   err_dev_start:
>       vhost_dev_stop(&vsock->vhost_dev, vdev);
>   err_guest_notifiers:
> -    k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, false);
> +    vhost_dev_drop_guest_notifiers(&vsock->vhost_dev,
> +            vdev, vsock->vhost_dev.nvqs);
>   err_host_notifiers:
>       vhost_dev_disable_notifiers(&vsock->vhost_dev, vdev);
>   }
> @@ -131,11 +130,9 @@ err_host_notifiers:
>   static void vhost_vsock_stop(VirtIODevice *vdev)
>   {
>       VHostVSock *vsock = VHOST_VSOCK(vdev);
> -    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> -    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>       int ret;
>   
> -    if (!k->set_guest_notifiers) {
> +    if (!virtio_device_guest_notifiers_initialized(vdev)) {
>           return;
>       }
>   
> @@ -147,7 +144,8 @@ static void vhost_vsock_stop(VirtIODevice *vdev)
>   
>       vhost_dev_stop(&vsock->vhost_dev, vdev);
>   
> -    ret = k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, false);
> +    ret = vhost_dev_drop_guest_notifiers(&vsock->vhost_dev,
> +            vdev, vsock->vhost_dev.nvqs);
>       if (ret < 0) {
>           error_report("vhost guest notifier cleanup failed: %d", ret);
>           return;
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 01ebe12..fa3da9c 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1419,6 +1419,44 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>       virtio_device_release_ioeventfd(vdev);
>   }
>   
> +/*
> + * Assign guest notifiers.
> + * Should be called after vhost_dev_enable_notifiers.
> + */
> +int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
> +                                     VirtIODevice *vdev, int nvqs)
> +{
> +    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> +    int ret;
> +
> +    ret = k->set_guest_notifiers(qbus->parent, nvqs, true);
> +    if (ret < 0) {
> +        error_report("Error binding guest notifier: %d", -ret);
> +    }
> +
> +    return ret;
> +}
> +
> +/*
> + * Drop guest notifiers.
> + * Should be called before vhost_dev_disable_notifiers.
> + */
> +int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
> +                                   VirtIODevice *vdev, int nvqs)
> +{


hdev is not used?

Thanks


> +    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> +    int ret;
> +
> +    ret = k->set_guest_notifiers(qbus->parent, nvqs, false);
> +    if (ret < 0) {
> +        error_report("Error reset guest notifier: %d", -ret);
> +    }
> +
> +    return ret;
> +}
> +
>   /* Test and clear event pending status.
>    * Should be called after unmask to avoid losing events.
>    */
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index b6c8ef5..8a95618 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3812,6 +3812,19 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
>       return virtio_bus_ioeventfd_enabled(vbus);
>   }
>   
> +/*
> + * Check if set_guest_notifiers() method is set by the init routine.
> + * Return true if yes, otherwise return false.
> + */
> +bool virtio_device_guest_notifiers_initialized(VirtIODevice *vdev)
> +{
> +    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> +
> +    return k->set_guest_notifiers;
> +}
> +
> +
>   static const TypeInfo virtio_device_info = {
>       .name = TYPE_VIRTIO_DEVICE,
>       .parent = TYPE_DEVICE,
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 085450c..4d0d2e2 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -100,6 +100,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
> +int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
> +                                     VirtIODevice *vdev, int nvqs);
> +int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
> +                                   VirtIODevice *vdev, int nvqs);
>   
>   /* Test and clear masked event pending status.
>    * Should be called after unmask to avoid losing events.
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b69d517..d9a3d72 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -323,6 +323,7 @@ void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
>                                                   VirtIOHandleAIOOutput handle_output);
>   VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
>   VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
> +bool virtio_device_guest_notifiers_initialized(VirtIODevice *vdev);
>   
>   static inline void virtio_add_feature(uint64_t *features, unsigned int fbit)
>   {



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
  2020-04-30 13:36 ` [PATCH v2 4/5] vhost: check vring address before calling unmap Dima Stepanov
  2020-05-04  1:13   ` Raphael Norwitz
@ 2020-05-11  3:05   ` Jason Wang
  2020-05-11  9:11     ` Dima Stepanov
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Wang @ 2020-05-11  3:05 UTC (permalink / raw)
  To: Dima Stepanov, qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, dgilbert, mreitz,
	arei.gonglei, fengli, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz


On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> Since a disconnect can happen at any time during initialization, not all
> vring buffers (for instance the used vring) can be initialized successfully.
> If a buffer was not initialized then the vhost_memory_unmap() call will lead
> to SIGSEGV. Add checks for the vring address value before calling unmap.
> Also add assert() in the vhost_memory_unmap() routine.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>   hw/virtio/vhost.c | 27 +++++++++++++++++++++------
>   1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index ddbdc53..3ee50c4 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
>                                  hwaddr len, int is_write,
>                                  hwaddr access_len)
>   {
> +    assert(buffer);
> +
>       if (!vhost_dev_has_iommu(dev)) {
>           cpu_physical_memory_unmap(buffer, len, is_write, access_len);
>       }
> @@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
>                                                   vhost_vq_index);
>       }
>   
> -    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
> -                       1, virtio_queue_get_used_size(vdev, idx));
> -    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
> -                       0, virtio_queue_get_avail_size(vdev, idx));
> -    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
> -                       0, virtio_queue_get_desc_size(vdev, idx));
> +    /*
> +     * Since the vhost-user disconnect can happen during initialization
> +     * check if vring was initialized, before making unmap.
> +     */
> +    if (vq->used) {
> +        vhost_memory_unmap(dev, vq->used,
> +                           virtio_queue_get_used_size(vdev, idx),
> +                           1, virtio_queue_get_used_size(vdev, idx));
> +    }
> +    if (vq->avail) {
> +        vhost_memory_unmap(dev, vq->avail,
> +                           virtio_queue_get_avail_size(vdev, idx),
> +                           0, virtio_queue_get_avail_size(vdev, idx));
> +    }
> +    if (vq->desc) {
> +        vhost_memory_unmap(dev, vq->desc,
> +                           virtio_queue_get_desc_size(vdev, idx),
> +                           0, virtio_queue_get_desc_size(vdev, idx));
> +    }


Any reason not to check hdev->started instead? vhost_dev_start() will
set it to true if the virtqueues were correctly mapped.

Thanks


>   }
>   
>   static void vhost_eventfd_add(MemoryListener *listener,
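
A sketch of that alternative, guarding the unmaps with the started flag
instead of the per-pointer checks (untested):

        if (dev->started) {
            vhost_memory_unmap(dev, vq->used,
                               virtio_queue_get_used_size(vdev, idx),
                               1, virtio_queue_get_used_size(vdev, idx));
            vhost_memory_unmap(dev, vq->avail,
                               virtio_queue_get_avail_size(vdev, idx),
                               0, virtio_queue_get_avail_size(vdev, idx));
            vhost_memory_unmap(dev, vq->desc,
                               virtio_queue_get_desc_size(vdev, idx),
                               0, virtio_queue_get_desc_size(vdev, idx));
        }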



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-04-30 13:36 ` [PATCH v2 5/5] vhost: add device started check in migration set log Dima Stepanov
  2020-05-06 22:08   ` Raphael Norwitz
@ 2020-05-11  3:15   ` Jason Wang
  2020-05-11  9:25     ` Dima Stepanov
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Wang @ 2020-05-11  3:15 UTC (permalink / raw)
  To: Dima Stepanov, qemu-devel
  Cc: fam, kwolf, yc-core, qemu-block, mst, dgilbert, mreitz,
	arei.gonglei, fengli, stefanha, marcandre.lureau, pbonzini,
	raphael.norwitz


On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> If vhost-user daemon is used as a backend for the vhost device, then we
> should consider a possibility of disconnect at any moment. If such
> disconnect happened in the vhost_migration_log() routine the vhost
> device structure will be cleaned up.
> At the start of the vhost_migration_log() function there is a check:
>    if (!dev->started) {
>        dev->log_enabled = enable;
>        return 0;
>    }
> To be consistent with this check add the same check after calling the
> vhost_dev_set_log() routine. This in general helps not to break a
> migration due to the assert() message. But it looks like this code
> should be revised to handle these errors more carefully.
>
> In case of vhost-user device backend the fail paths should consider the
> state of the device. In this case we should skip some function calls
> during rollback on the error paths, so as not to get NULL dereference
> errors.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>   hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
>   1 file changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 3ee50c4..d5ab96d 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>   static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>   {
>       int r, i, idx;
> +
> +    if (!dev->started) {
> +        /*
> +         * If vhost-user daemon is used as a backend for the
> +         * device and the connection is broken, then the vhost_dev
> +         * structure will have all its values reset to 0.
> +         * Add additional check for the device state.
> +         */
> +        return -1;
> +    }
> +
>       r = vhost_dev_set_features(dev, enable_log);
>       if (r < 0) {
>           goto err_features;
> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>       }
>       return 0;
>   err_vq:
> -    for (; i >= 0; --i) {
> +    /*
> +     * Disconnect with the vhost-user daemon can lead to the
> +     * vhost_dev_cleanup() call which will clean up the
> +     * vhost_dev structure.
> +     */
> +    for (; dev->started && (i >= 0); --i) {
>           idx = dev->vhost_ops->vhost_get_vq_index(


Why is the check of dev->started needed here? Can started be modified
outside the main loop? If yes, I don't get the check of !dev->started at
the beginning of this function.


> dev, dev->vq_index + i);
>           vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
>                                    dev->log_enabled);
>       }
> -    vhost_dev_set_features(dev, dev->log_enabled);
> +    if (dev->started) {
> +        vhost_dev_set_features(dev, dev->log_enabled);
> +    }
>   err_features:
>       return r;
>   }
> @@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
>       } else {
>           vhost_dev_log_resize(dev, vhost_get_log_size(dev));
>           r = vhost_dev_set_log(dev, true);
> -        if (r < 0) {
> +        /*
> +         * The dev log resize can fail because of a disconnect
> +         * with the vhost-user-blk daemon. Check the device
> +         * state before calling the vhost_dev_set_log()
> +         * function.
> +         * Don't return error if device isn't started to be
> +         * consistent with the check above.
> +         */
> +        if (dev->started && r < 0) {
>               return r;
>           }
>       }
> @@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>   fail_log:
>       vhost_log_put(hdev, false);
>   fail_vq:
> -    while (--i >= 0) {
> +    /*
> +     * Disconnect with the vhost-user daemon can lead to the
> +     * vhost_dev_cleanup() call which will clean up the
> +     * vhost_dev structure.
> +     */
> +    while ((--i >= 0) && (hdev->started)) {
>           vhost_virtqueue_stop(hdev,
>                                vdev,
>                                hdev->vqs + i,


This should be a separate patch.

Thanks



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device
  2020-05-11  3:03   ` Jason Wang
@ 2020-05-11  8:55     ` Dima Stepanov
  0 siblings, 0 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-11  8:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, yc-core, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, raphael.norwitz, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz

On Mon, May 11, 2020 at 11:03:01AM +0800, Jason Wang wrote:
> 
> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> >Introduce new wrappers to set/reset guest notifiers for the virtio
> >device in the vhost device module:
> >   vhost_dev_assign_guest_notifiers
> >     ->set_guest_notifiers(..., ..., true);
> >   vhost_dev_drop_guest_notifiers
> >     ->set_guest_notifiers(..., ..., false);
> >This is a preliminary step to refactor code,
> 
> 
> Maybe I missed something, but I don't see any add-on patch modifying the
> new wrappers in this series?
Hi, in fact the next patch, 3/5:
  "[PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest
notifiers init state"
is about using these wrappers. But disregard it: I decided to follow
Raphael's suggestion, so we will fix the vhost-user-blk case first and I
will not introduce these wrappers. The code will be easier to read and
more straightforward.
I will send v3 as soon as we decide what to do with the migration fix
in this patchset.

No other comments mixed in below.

> 
> 
> >  so the set_guest_notifiers
> >methods could be called based on the vhost device state.
> >Update all vhost-based devices to use these wrappers instead of the
> >direct method call.
> >
> >Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> >---
> >  backends/cryptodev-vhost.c  | 26 +++++++++++++++-----------
> >  backends/vhost-user.c       | 16 +++++-----------
> >  hw/block/vhost-user-blk.c   | 15 +++++----------
> >  hw/net/vhost_net.c          | 30 +++++++++++++++++-------------
> >  hw/scsi/vhost-scsi-common.c | 15 +++++----------
> >  hw/virtio/vhost-user-fs.c   | 17 +++++++----------
> >  hw/virtio/vhost-vsock.c     | 18 ++++++++----------
> >  hw/virtio/vhost.c           | 38 ++++++++++++++++++++++++++++++++++++++
> >  hw/virtio/virtio.c          | 13 +++++++++++++
> >  include/hw/virtio/vhost.h   |  4 ++++
> >  include/hw/virtio/virtio.h  |  1 +
> >  11 files changed, 118 insertions(+), 75 deletions(-)
> >
> >diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
> >index 8337c9a..4522195 100644
> >--- a/backends/cryptodev-vhost.c
> >+++ b/backends/cryptodev-vhost.c
> >@@ -169,16 +169,13 @@ vhost_set_vring_enable(CryptoDevBackendClient *cc,
> >  int cryptodev_vhost_start(VirtIODevice *dev, int total_queues)
> >  {
> >      VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> >-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> >      int r, e;
> >      int i;
> >      CryptoDevBackend *b = vcrypto->cryptodev;
> >      CryptoDevBackendVhost *vhost_crypto;
> >      CryptoDevBackendClient *cc;
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(dev)) {
> >          error_report("binding does not support guest notifiers");
> >          return -ENOSYS;
> >      }
> >@@ -198,9 +195,13 @@ int cryptodev_vhost_start(VirtIODevice *dev, int total_queues)
> >          }
> >       }
> >-    r = k->set_guest_notifiers(qbus->parent, total_queues, true);
> >+    /*
> >+     * Since all the states are handled by one vhost device,
> >+     * use the first one in the array.
> >+     */
> >+    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
> >+    r = vhost_dev_assign_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
> >      if (r < 0) {
> >-        error_report("error binding guest notifier: %d", -r);
> >          goto err;
> >      }
> >@@ -232,7 +233,8 @@ err_start:
> >          vhost_crypto = cryptodev_get_vhost(cc, b, i);
> >          cryptodev_vhost_stop_one(vhost_crypto, dev);
> >      }
> >-    e = k->set_guest_notifiers(qbus->parent, total_queues, false);
> >+    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
> >+    e = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
> >      if (e < 0) {
> >          error_report("vhost guest notifier cleanup failed: %d", e);
> >      }
> >@@ -242,9 +244,6 @@ err:
> >  void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)
> >  {
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> >-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> >      VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
> >      CryptoDevBackend *b = vcrypto->cryptodev;
> >      CryptoDevBackendVhost *vhost_crypto;
> >@@ -259,7 +258,12 @@ void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)
> >          cryptodev_vhost_stop_one(vhost_crypto, dev);
> >      }
> >-    r = k->set_guest_notifiers(qbus->parent, total_queues, false);
> >+    /*
> >+     * Since all the states are handled by one vhost device,
> >+     * use the first one in the array.
> >+     */
> >+    vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
> >+    r = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
> >      if (r < 0) {
> >          error_report("vhost guest notifier cleanup failed: %d", r);
> >      }
> >diff --git a/backends/vhost-user.c b/backends/vhost-user.c
> >index 2bf3406..e116bc6 100644
> >--- a/backends/vhost-user.c
> >+++ b/backends/vhost-user.c
> >@@ -60,15 +60,13 @@ vhost_user_backend_dev_init(VhostUserBackend *b, VirtIODevice *vdev,
> >  void
> >  vhost_user_backend_start(VhostUserBackend *b)
> >  {
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int ret, i ;
> >      if (b->started) {
> >          return;
> >      }
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(b->vdev)) {
> >          error_report("binding does not support guest notifiers");
> >          return;
> >      }
> >@@ -78,9 +76,8 @@ vhost_user_backend_start(VhostUserBackend *b)
> >          return;
> >      }
> >-    ret = k->set_guest_notifiers(qbus->parent, b->dev.nvqs, true);
> >+    ret = vhost_dev_assign_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
> >      if (ret < 0) {
> >-        error_report("Error binding guest notifier");
> >          goto err_host_notifiers;
> >      }
> >@@ -104,7 +101,7 @@ vhost_user_backend_start(VhostUserBackend *b)
> >      return;
> >  err_guest_notifiers:
> >-    k->set_guest_notifiers(qbus->parent, b->dev.nvqs, false);
> >+    vhost_dev_drop_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
> >  err_host_notifiers:
> >      vhost_dev_disable_notifiers(&b->dev, b->vdev);
> >  }
> >@@ -112,8 +109,6 @@ err_host_notifiers:
> >  void
> >  vhost_user_backend_stop(VhostUserBackend *b)
> >  {
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int ret = 0;
> >      if (!b->started) {
> >@@ -122,9 +117,8 @@ vhost_user_backend_stop(VhostUserBackend *b)
> >      vhost_dev_stop(&b->dev, b->vdev);
> >-    if (k->set_guest_notifiers) {
> >-        ret = k->set_guest_notifiers(qbus->parent,
> >-                                     b->dev.nvqs, false);
> >+    if (virtio_device_guest_notifiers_initialized(b->vdev)) {
> >+        ret = vhost_dev_drop_guest_notifiers(&b->dev, b->vdev, b->dev.nvqs);
> >          if (ret < 0) {
> >              error_report("vhost guest notifier cleanup failed: %d", ret);
> >          }
> >diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> >index 17df533..70d7842 100644
> >--- a/hw/block/vhost-user-blk.c
> >+++ b/hw/block/vhost-user-blk.c
> >@@ -109,11 +109,9 @@ const VhostDevConfigOps blk_ops = {
> >  static int vhost_user_blk_start(VirtIODevice *vdev)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int i, ret;
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
> >          error_report("binding does not support guest notifiers");
> >          return -ENOSYS;
> >      }
> >@@ -124,9 +122,8 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
> >          return ret;
> >      }
> >-    ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
> >+    ret = vhost_dev_assign_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
> >      if (ret < 0) {
> >-        error_report("Error binding guest notifier: %d", -ret);
> >          goto err_host_notifiers;
> >      }
> >@@ -163,7 +160,7 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
> >      return ret;
> >  err_guest_notifiers:
> >-    k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> >+    vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
> >  err_host_notifiers:
> >      vhost_dev_disable_notifiers(&s->dev, vdev);
> >      return ret;
> >@@ -172,17 +169,15 @@ err_host_notifiers:
> >  static void vhost_user_blk_stop(VirtIODevice *vdev)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int ret;
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
> >          return;
> >      }
> >      vhost_dev_stop(&s->dev, vdev);
> >-    ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> >+    ret = vhost_dev_drop_guest_notifiers(&s->dev, vdev, s->dev.nvqs);
> >      if (ret < 0) {
> >          error_report("vhost guest notifier cleanup failed: %d", ret);
> >          return;
> >diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >index 6b82803..c13b444 100644
> >--- a/hw/net/vhost_net.c
> >+++ b/hw/net/vhost_net.c
> >@@ -303,19 +303,15 @@ static void vhost_net_stop_one(struct vhost_net *net,
> >  int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >                      int total_queues)
> >  {
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> >-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> >+    struct vhost_net *net;
> >      int r, e, i;
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(dev)) {
> >          error_report("binding does not support guest notifiers");
> >          return -ENOSYS;
> >      }
> >      for (i = 0; i < total_queues; i++) {
> >-        struct vhost_net *net;
> >-
> >          net = get_vhost_net(ncs[i].peer);
> >          vhost_net_set_vq_index(net, i * 2);
> >@@ -328,9 +324,13 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >          }
> >       }
> >-    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
> >+    /*
> >+     * Since all the states are handled by one vhost_net device,
> >+     * use the first one in the array.
> >+     */
> 
> 
> This comment is confusing; kernel vhost-net backends will use all their peers.
> 
> 
> >+    net = get_vhost_net(ncs[0].peer);
> >+    r = vhost_dev_assign_guest_notifiers(&net->dev, dev, total_queues * 2);
> >      if (r < 0) {
> >-        error_report("Error binding guest notifier: %d", -r);
> >          goto err;
> >      }
> >@@ -357,7 +357,8 @@ err_start:
> >      while (--i >= 0) {
> >          vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
> >      }
> >-    e = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
> >+    net = get_vhost_net(ncs[0].peer);
> >+    e = vhost_dev_drop_guest_notifiers(&net->dev, dev, total_queues * 2);
> >      if (e < 0) {
> >          fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
> >          fflush(stderr);
> >@@ -369,16 +370,19 @@ err:
> >  void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
> >                      int total_queues)
> >  {
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> >-    VirtioBusState *vbus = VIRTIO_BUS(qbus);
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> >+    struct vhost_net *net;
> >      int i, r;
> >      for (i = 0; i < total_queues; i++) {
> >          vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
> >      }
> >-    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
> >+    /*
> >+     * Since all the states are handled by one vhost_net device,
> >+     * use the first one in the array.
> >+     */
> >+    net = get_vhost_net(ncs[0].peer);
> >+    r = vhost_dev_drop_guest_notifiers(&net->dev, dev, total_queues * 2);
> >      if (r < 0) {
> >          fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
> >          fflush(stderr);
> >diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
> >index 8ec49d7..8f51ec0 100644
> >--- a/hw/scsi/vhost-scsi-common.c
> >+++ b/hw/scsi/vhost-scsi-common.c
> >@@ -29,10 +29,8 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> >  {
> >      int ret, i;
> >      VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
> >          error_report("binding does not support guest notifiers");
> >          return -ENOSYS;
> >      }
> >@@ -42,9 +40,8 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> >          return ret;
> >      }
> >-    ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, true);
> >+    ret = vhost_dev_assign_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
> >      if (ret < 0) {
> >-        error_report("Error binding guest notifier");
> >          goto err_host_notifiers;
> >      }
> >@@ -66,7 +63,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> >      return ret;
> >  err_guest_notifiers:
> >-    k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
> >+    vhost_dev_drop_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
> >  err_host_notifiers:
> >      vhost_dev_disable_notifiers(&vsc->dev, vdev);
> >      return ret;
> >@@ -75,14 +72,12 @@ err_host_notifiers:
> >  void vhost_scsi_common_stop(VHostSCSICommon *vsc)
> >  {
> >      VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int ret = 0;
> >      vhost_dev_stop(&vsc->dev, vdev);
> >-    if (k->set_guest_notifiers) {
> >-        ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
> >+    if (virtio_device_guest_notifiers_initialized(vdev)) {
> >+        ret = vhost_dev_drop_guest_notifiers(&vsc->dev, vdev, vsc->dev.nvqs);
> >          if (ret < 0) {
> >                  error_report("vhost guest notifier cleanup failed: %d", ret);
> >          }
> >diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> >index 6136768..6b101fc 100644
> >--- a/hw/virtio/vhost-user-fs.c
> >+++ b/hw/virtio/vhost-user-fs.c
> >@@ -38,12 +38,10 @@ static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
> >  static void vuf_start(VirtIODevice *vdev)
> >  {
> >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int ret;
> >      int i;
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
> >          error_report("binding does not support guest notifiers");
> >          return;
> >      }
> >@@ -54,9 +52,9 @@ static void vuf_start(VirtIODevice *vdev)
> >          return;
> >      }
> >-    ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, true);
> >+    ret = vhost_dev_assign_guest_notifiers(&fs->vhost_dev, vdev,
> >+            fs->vhost_dev.nvqs);
> >      if (ret < 0) {
> >-        error_report("Error binding guest notifier: %d", -ret);
> >          goto err_host_notifiers;
> >      }
> >@@ -79,7 +77,7 @@ static void vuf_start(VirtIODevice *vdev)
> >      return;
> >  err_guest_notifiers:
> >-    k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
> >+    vhost_dev_drop_guest_notifiers(&fs->vhost_dev, vdev, fs->vhost_dev.nvqs);
> >  err_host_notifiers:
> >      vhost_dev_disable_notifiers(&fs->vhost_dev, vdev);
> >  }
> >@@ -87,17 +85,16 @@ err_host_notifiers:
> >  static void vuf_stop(VirtIODevice *vdev)
> >  {
> >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int ret;
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
> >          return;
> >      }
> >      vhost_dev_stop(&fs->vhost_dev, vdev);
> >-    ret = k->set_guest_notifiers(qbus->parent, fs->vhost_dev.nvqs, false);
> >+    ret = vhost_dev_drop_guest_notifiers(&fs->vhost_dev, vdev,
> >+            fs->vhost_dev.nvqs);
> >      if (ret < 0) {
> >          error_report("vhost guest notifier cleanup failed: %d", ret);
> >          return;
> >diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
> >index 09b6b07..52489dd 100644
> >--- a/hw/virtio/vhost-vsock.c
> >+++ b/hw/virtio/vhost-vsock.c
> >@@ -75,12 +75,10 @@ static int vhost_vsock_set_running(VHostVSock *vsock, int start)
> >  static void vhost_vsock_start(VirtIODevice *vdev)
> >  {
> >      VHostVSock *vsock = VHOST_VSOCK(vdev);
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int ret;
> >      int i;
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
> >          error_report("binding does not support guest notifiers");
> >          return;
> >      }
> >@@ -91,9 +89,9 @@ static void vhost_vsock_start(VirtIODevice *vdev)
> >          return;
> >      }
> >-    ret = k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, true);
> >+    ret = vhost_dev_assign_guest_notifiers(&vsock->vhost_dev,
> >+            vdev, vsock->vhost_dev.nvqs);
> >      if (ret < 0) {
> >-        error_report("Error binding guest notifier: %d", -ret);
> >          goto err_host_notifiers;
> >      }
> >@@ -123,7 +121,8 @@ static void vhost_vsock_start(VirtIODevice *vdev)
> >  err_dev_start:
> >      vhost_dev_stop(&vsock->vhost_dev, vdev);
> >  err_guest_notifiers:
> >-    k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, false);
> >+    vhost_dev_drop_guest_notifiers(&vsock->vhost_dev,
> >+            vdev, vsock->vhost_dev.nvqs);
> >  err_host_notifiers:
> >      vhost_dev_disable_notifiers(&vsock->vhost_dev, vdev);
> >  }
> >@@ -131,11 +130,9 @@ err_host_notifiers:
> >  static void vhost_vsock_stop(VirtIODevice *vdev)
> >  {
> >      VHostVSock *vsock = VHOST_VSOCK(vdev);
> >-    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >      int ret;
> >-    if (!k->set_guest_notifiers) {
> >+    if (!virtio_device_guest_notifiers_initialized(vdev)) {
> >          return;
> >      }
> >@@ -147,7 +144,8 @@ static void vhost_vsock_stop(VirtIODevice *vdev)
> >      vhost_dev_stop(&vsock->vhost_dev, vdev);
> >-    ret = k->set_guest_notifiers(qbus->parent, vsock->vhost_dev.nvqs, false);
> >+    ret = vhost_dev_drop_guest_notifiers(&vsock->vhost_dev,
> >+            vdev, vsock->vhost_dev.nvqs);
> >      if (ret < 0) {
> >          error_report("vhost guest notifier cleanup failed: %d", ret);
> >          return;
> >diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >index 01ebe12..fa3da9c 100644
> >--- a/hw/virtio/vhost.c
> >+++ b/hw/virtio/vhost.c
> >@@ -1419,6 +1419,44 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
> >      virtio_device_release_ioeventfd(vdev);
> >  }
> >+/*
> >+ * Assign guest notifiers.
> >+ * Should be called after vhost_dev_enable_notifiers.
> >+ */
> >+int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
> >+                                     VirtIODevice *vdev, int nvqs)
> >+{
> >+    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >+    int ret;
> >+
> >+    ret = k->set_guest_notifiers(qbus->parent, nvqs, true);
> >+    if (ret < 0) {
> >+        error_report("Error binding guest notifier: %d", -ret);
> >+    }
> >+
> >+    return ret;
> >+}
> >+
> >+/*
> >+ * Drop guest notifiers.
> >+ * Should be called before vhost_dev_disable_notifiers.
> >+ */
> >+int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
> >+                                   VirtIODevice *vdev, int nvqs)
> >+{
> 
> 
> hdev is not used?
> 
> Thanks
> 
> 
> >+    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >+    int ret;
> >+
> >+    ret = k->set_guest_notifiers(qbus->parent, nvqs, false);
> >+    if (ret < 0) {
> >+        error_report("Error reset guest notifier: %d", -ret);
> >+    }
> >+
> >+    return ret;
> >+}
> >+
> >  /* Test and clear event pending status.
> >   * Should be called after unmask to avoid losing events.
> >   */
> >diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >index b6c8ef5..8a95618 100644
> >--- a/hw/virtio/virtio.c
> >+++ b/hw/virtio/virtio.c
> >@@ -3812,6 +3812,19 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
> >      return virtio_bus_ioeventfd_enabled(vbus);
> >  }
> >+/*
> >+ * Check if set_guest_notifiers() method is set by the init routine.
> >+ * Return true if yes, otherwise return false.
> >+ */
> >+bool virtio_device_guest_notifiers_initialized(VirtIODevice *vdev)
> >+{
> >+    BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> >+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >+
> >+    return k->set_guest_notifiers;
> >+}
> >+
> >+
> >  static const TypeInfo virtio_device_info = {
> >      .name = TYPE_VIRTIO_DEVICE,
> >      .parent = TYPE_DEVICE,
> >diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> >index 085450c..4d0d2e2 100644
> >--- a/include/hw/virtio/vhost.h
> >+++ b/include/hw/virtio/vhost.h
> >@@ -100,6 +100,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
> >  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> >  int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
> >  void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
> >+int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
> >+                                     VirtIODevice *vdev, int nvqs);
> >+int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
> >+                                   VirtIODevice *vdev, int nvqs);
> >  /* Test and clear masked event pending status.
> >   * Should be called after unmask to avoid losing events.
> >diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >index b69d517..d9a3d72 100644
> >--- a/include/hw/virtio/virtio.h
> >+++ b/include/hw/virtio/virtio.h
> >@@ -323,6 +323,7 @@ void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
> >                                                  VirtIOHandleAIOOutput handle_output);
> >  VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
> >  VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
> >+bool virtio_device_guest_notifiers_initialized(VirtIODevice *vdev);
> >  static inline void virtio_add_feature(uint64_t *features, unsigned int fbit)
> >  {
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
  2020-05-11  3:05   ` Jason Wang
@ 2020-05-11  9:11     ` Dima Stepanov
  2020-05-12  3:26       ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Dima Stepanov @ 2020-05-11  9:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, yc-core, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, raphael.norwitz, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz

On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:
> 
> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> >Since a disconnect can happen at any time during initialization, not all
> >vring buffers (for instance the used vring) can be initialized successfully.
> >If the buffer was not initialized, then the vhost_memory_unmap() call will
> >lead to SIGSEGV. Add checks for the vring address value before calling unmap.
> >Also add assert() in the vhost_memory_unmap() routine.
> >
> >Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> >---
> >  hw/virtio/vhost.c | 27 +++++++++++++++++++++------
> >  1 file changed, 21 insertions(+), 6 deletions(-)
> >
> >diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >index ddbdc53..3ee50c4 100644
> >--- a/hw/virtio/vhost.c
> >+++ b/hw/virtio/vhost.c
> >@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
> >                                 hwaddr len, int is_write,
> >                                 hwaddr access_len)
> >  {
> >+    assert(buffer);
> >+
> >      if (!vhost_dev_has_iommu(dev)) {
> >          cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> >      }
> >@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
> >                                                  vhost_vq_index);
> >      }
> >-    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
> >-                       1, virtio_queue_get_used_size(vdev, idx));
> >-    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
> >-                       0, virtio_queue_get_avail_size(vdev, idx));
> >-    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
> >-                       0, virtio_queue_get_desc_size(vdev, idx));
> >+    /*
> >+     * Since the vhost-user disconnect can happen during initialization,
> >+     * check if the vring was initialized before calling unmap.
> >+     */
> >+    if (vq->used) {
> >+        vhost_memory_unmap(dev, vq->used,
> >+                           virtio_queue_get_used_size(vdev, idx),
> >+                           1, virtio_queue_get_used_size(vdev, idx));
> >+    }
> >+    if (vq->avail) {
> >+        vhost_memory_unmap(dev, vq->avail,
> >+                           virtio_queue_get_avail_size(vdev, idx),
> >+                           0, virtio_queue_get_avail_size(vdev, idx));
> >+    }
> >+    if (vq->desc) {
> >+        vhost_memory_unmap(dev, vq->desc,
> >+                           virtio_queue_get_desc_size(vdev, idx),
> >+                           0, virtio_queue_get_desc_size(vdev, idx));
> >+    }
> 
> 
> Any reason for not checking hdev->started instead? vhost_dev_start() will set
> it to true if the virtqueues were correctly mapped.
> 
> Thanks
Well, I see it a little bit differently:
 - vhost_dev_start() sets hdev->started to true before starting
   virtqueues
 - vhost_virtqueue_start() maps all the memory
If we hit the vhost disconnect at the start of
vhost_virtqueue_start(), for instance in this call:
  r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
then we will call vhost_user_blk_disconnect:
  vhost_user_blk_disconnect()->
    vhost_user_blk_stop()->
      vhost_dev_stop()->
        vhost_virtqueue_stop()
As a result we will enter this routine with hdev->started still
set to true, but with the used/avail/desc fields still uninitialized
and set to 0.
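
The relevant ordering in vhost_dev_start() is roughly this (abbreviated
from hw/virtio/vhost.c, error handling dropped):

    hdev->started = true;   /* set before any vring is mapped */
    ...
    for (i = 0; i < hdev->nvqs; ++i) {
        /* a disconnect inside this call cleans the device up while
         * hdev->started is already true and vq->desc/avail/used may
         * still be 0 */
        r = vhost_virtqueue_start(hdev, vdev,
                                  hdev->vqs + i, hdev->vq_index + i);
    }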

> 
> 
> >  }
> >  static void vhost_eventfd_add(MemoryListener *listener,
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-11  3:15   ` Jason Wang
@ 2020-05-11  9:25     ` Dima Stepanov
  2020-05-12  3:32       ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Dima Stepanov @ 2020-05-11  9:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, yc-core, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, raphael.norwitz, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz

On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
> 
> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> >If a vhost-user daemon is used as a backend for the vhost device, then we
> >should consider the possibility of a disconnect at any moment. If such a
> >disconnect happens in the vhost_migration_log() routine, the vhost
> >device structure will be cleaned up.
> >At the start of the vhost_migration_log() function there is a check:
> >   if (!dev->started) {
> >       dev->log_enabled = enable;
> >       return 0;
> >   }
> >To be consistent with this check, add the same check after calling the
> >vhost_dev_set_log() routine. In general this helps not to break a
> >migration due to the assert() message. But it looks like this code
> >should be revised to handle these errors more carefully.
> >
> >In case of a vhost-user device backend the fail paths should consider the
> >state of the device. In this case we should skip some function calls
> >during rollback on the error paths, so as not to get NULL dereference
> >errors.
> >
> >Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> >---
> >  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> >  1 file changed, 35 insertions(+), 4 deletions(-)
> >
> >diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >index 3ee50c4..d5ab96d 100644
> >--- a/hw/virtio/vhost.c
> >+++ b/hw/virtio/vhost.c
> >@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> >  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >  {
> >      int r, i, idx;
> >+
> >+    if (!dev->started) {
> >+        /*
> >+         * If vhost-user daemon is used as a backend for the
> >+         * device and the connection is broken, then the vhost_dev
> >+         * structure will have all its values reset to 0.
> >+         * Add additional check for the device state.
> >+         */
> >+        return -1;
> >+    }
> >+
> >      r = vhost_dev_set_features(dev, enable_log);
> >      if (r < 0) {
> >          goto err_features;
> >@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >      }
> >      return 0;
> >  err_vq:
> >-    for (; i >= 0; --i) {
> >+    /*
> >+     * Disconnect with the vhost-user daemon can lead to the
> >+     * vhost_dev_cleanup() call which will clean up the
> >+     * vhost_dev structure.
> >+     */
> >+    for (; dev->started && (i >= 0); --i) {
> >          idx = dev->vhost_ops->vhost_get_vq_index(
> 
> 
> Why is the check of dev->started needed here? Can started be modified
> outside the main loop? If yes, I don't get the check of !dev->started at
> the beginning of this function.
> 
No, dev->started can't change outside the main loop. The problem only
exists for the vhost_user_blk daemon. Consider the case when we
successfully pass the dev->started check at the beginning of the
function, but after that we hit the disconnect on the next call, on the
second or third iteration:
     r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
The unix socket backend device will call the disconnect routine for this
device and reset the structure. So the structure will be reset (and
dev->started set to false) inside this set_addr() call. So
we shouldn't make the clean up calls, because these virtqueues were
already cleaned up in the disconnect call. But we should protect these
calls somehow, so they will not hit SIGSEGV and we can pass migration.

Just to summarize it:
For the vhost-user-blk devices we can hit the clean up calls twice in
case of a vhost disconnect:
1. The first time during the disconnect process; the clean up is called
inside it.
2. The second time during the roll back clean up.
If that is the case, we should skip step 2.
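
A rough sketch of the double clean up flow I mean (vhost-user-blk only,
names abbreviated):

    vhost_dev_set_log()
      vhost_virtqueue_set_addr()      <- write fails, disconnect
        vhost_user_blk_disconnect()
          vhost_dev_cleanup()         <- 1st clean up, vhost_dev zeroed
      err_vq:                         <- roll back starts
        vhost_virtqueue_set_addr()    <- 2nd clean up on the zeroed struct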

> 
> >dev, dev->vq_index + i);
> >          vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> >                                   dev->log_enabled);
> >      }
> >-    vhost_dev_set_features(dev, dev->log_enabled);
> >+    if (dev->started) {
> >+        vhost_dev_set_features(dev, dev->log_enabled);
> >+    }
> >  err_features:
> >      return r;
> >  }
> >@@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
> >      } else {
> >          vhost_dev_log_resize(dev, vhost_get_log_size(dev));
> >          r = vhost_dev_set_log(dev, true);
> >-        if (r < 0) {
> >+        /*
> >+         * The dev log resize can fail because of a disconnect
> >+         * with the vhost-user-blk daemon. Check the device
> >+         * state before calling the vhost_dev_set_log()
> >+         * function.
> >+         * Don't return error if device isn't started to be
> >+         * consistent with the check above.
> >+         */
> >+        if (dev->started && r < 0) {
> >              return r;
> >          }
> >      }
> >@@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >  fail_log:
> >      vhost_log_put(hdev, false);
> >  fail_vq:
> >-    while (--i >= 0) {
> >+    /*
> >+     * Disconnect with the vhost-user daemon can lead to the
> >+     * vhost_dev_cleanup() call which will clean up the
> >+     * vhost_dev structure.
> >+     */
> >+    while ((--i >= 0) && (hdev->started)) {
> >          vhost_virtqueue_stop(hdev,
> >                               vdev,
> >                               hdev->vqs + i,
> 
> 
> This should be a separate patch.
Do you mean I should split this patch into two patches?

Thanks.

> 
> Thanks
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-11  0:03       ` Raphael Norwitz
@ 2020-05-11  9:43         ` Dima Stepanov
  0 siblings, 0 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-11  9:43 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: fam, kwolf, stefanha, qemu-block, mst, jasowang, qemu-devel,
	dgilbert, raphael.norwitz, arei.gonglei, fengli, yc-core,
	pbonzini, marcandre.lureau, mreitz

On Sun, May 10, 2020 at 08:03:39PM -0400, Raphael Norwitz wrote:
> On Thu, May 7, 2020 at 11:35 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
> >
> > What do you think?
> >
> 
> Apologies - I tripped over the if (dev->started && r < 0) check.
> Never mind my point about race conditions and failing migrations.
> 
> Rather than modifying vhost_dev_set_log(), it may be clearer to put a
> check after vhost_dev_log_resize()? Something like:
> 
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -829,11 +829,22 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
>          vhost_log_put(dev, false);
>      } else {
>          vhost_dev_log_resize(dev, vhost_get_log_size(dev));
> +        /*
> +         * A device can be stopped because of backend disconnect inside
> +         * vhost_dev_log_resize(). In this case we should mark logging
> +         * enabled and return without attempting to set the backend
> +         * logging state.
> +         */
> +        if (!dev->started) {
> +            goto out_success;
> +        }
>          r = vhost_dev_set_log(dev, true);
>          if (r < 0) {
>              return r;
>          }
>      }
> +
> +out_success:
>      dev->log_enabled = enable;
>      return 0;
>  }
This patch will not fix all the issues. Consider the case where you hit
the disconnect inside vhost_dev_set_log(), for instance for the 3rd
virtqueue, in the following call:
  vhost_virtqueue_set_addr(...)
Maybe I didn't explain the problem very clearly. The problem I've tried
to fix exists only for the vhost-user-blk devices. This issue can be hit
during the VHOST_USER command "handshake". If we hit a disconnect on any
step of this "handshake", then we will try to clean up twice:
1. First during the disconnect cleanup (unix socket backend).
2. Second as the roll back for the initialization.
If this is the case, then we shouldn't perform step 2, as everything was
cleaned up in step 1. And the complicated thing is that there are
several VHOST_USER commands and we should consider the state after each.
Even more, the initialization could fail for some other reason and we
could hit the disconnect inside the roll back clean up; in this case we
should complete the clean up in the disconnect function and stop rolling
back.

Hope it helps ).
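
To illustrate, the initialization "handshake" is roughly the following
sequence of vhost-user requests (abbreviated; a disconnect can hit after
any of them, leaving the device in a different partial state each time):

    VHOST_USER_SET_FEATURES
    VHOST_USER_SET_MEM_TABLE
    /* and per virtqueue: */
    VHOST_USER_SET_VRING_CALL
    VHOST_USER_SET_VRING_NUM
    VHOST_USER_SET_VRING_BASE
    VHOST_USER_SET_VRING_ADDR
    VHOST_USER_SET_VRING_KICK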

> 
> This seems harmless enough to me, and I see how it fixes your
> particular crash, but I would still prefer we worked towards a more
> robust solution. In particular I think we could handle this inside
> vhost-user-blk if we let the device state persist between connections
> (i.e. call vhost_dev_cleanup() inside vhost_user_blk_connect() before
> vhost_dev_init() on reconnect). This should also fix some of the
> crashes Li Feng has hit, and probably others which haven’t been
> reported yet. What do you think?
Yes, this looks like a good direction, because all my patches are only
workarounds and I believe there can be other issues which haven't been
reported yet or will be introduced ).
I still think that these patches are good to submit, and that a more
complicated refactoring solution can be the next step.

> 
> If that’s unworkable I guess we will need to add these vhost level
> checks.
At least for now, I don't think it's unworkable, I just think that it
will take some time to figure out how to refactor it properly. But the
SIGSEGV issue is real.

> In that case I would still prefer we add a “disconnected” flag
> in the vhost_dev struct, and make sure it isn’t cleared by
> vhost_dev_cleanup(). That way we don’t conflate stopping a device with
> backend disconnect at the vhost level and potentially regress behavior
> for other device types.
That is also possible, but it should be analyzed and properly tested.
So, as I said, it will take some time to figure out how to refactor it
properly.
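
For the record, a minimal sketch of the "disconnected" flag idea as I
understand it (hypothetical field and placement, untested):

    struct vhost_dev {
        ...
        /* set on CHR_EVENT_CLOSED; intentionally NOT cleared by
         * vhost_dev_cleanup(), so the error paths can test it */
        bool disconnected;
    };

    /* in the vhost-user-blk chardev event handler */
    case CHR_EVENT_CLOSED:
        s->dev.disconnected = true;
        vhost_user_blk_disconnect(dev);
        break;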


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
  2020-05-11  9:11     ` Dima Stepanov
@ 2020-05-12  3:26       ` Jason Wang
  2020-05-12  9:08         ` Dima Stepanov
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2020-05-12  3:26 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, yc-core, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, raphael.norwitz, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz


On 2020/5/11 5:11 PM, Dima Stepanov wrote:
> On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:
>> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
>>> Since a disconnect can happen at any time during initialization, not all
>>> vring buffers (for instance the used vring) can be initialized successfully.
>>> If the buffer was not initialized, then the vhost_memory_unmap() call will
>>> lead to SIGSEGV. Add checks for the vring address value before calling unmap.
>>> Also add assert() in the vhost_memory_unmap() routine.
>>>
>>> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
>>> ---
>>>   hw/virtio/vhost.c | 27 +++++++++++++++++++++------
>>>   1 file changed, 21 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>> index ddbdc53..3ee50c4 100644
>>> --- a/hw/virtio/vhost.c
>>> +++ b/hw/virtio/vhost.c
>>> @@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
>>>                                  hwaddr len, int is_write,
>>>                                  hwaddr access_len)
>>>   {
>>> +    assert(buffer);
>>> +
>>>       if (!vhost_dev_has_iommu(dev)) {
>>>           cpu_physical_memory_unmap(buffer, len, is_write, access_len);
>>>       }
>>> @@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
>>>                                                   vhost_vq_index);
>>>       }
>>> -    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
>>> -                       1, virtio_queue_get_used_size(vdev, idx));
>>> -    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
>>> -                       0, virtio_queue_get_avail_size(vdev, idx));
>>> -    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
>>> -                       0, virtio_queue_get_desc_size(vdev, idx));
>>> +    /*
>>> +     * Since the vhost-user disconnect can happen during initialization,
>>> +     * check if the vring was initialized before calling unmap.
>>> +     */
>>> +    if (vq->used) {
>>> +        vhost_memory_unmap(dev, vq->used,
>>> +                           virtio_queue_get_used_size(vdev, idx),
>>> +                           1, virtio_queue_get_used_size(vdev, idx));
>>> +    }
>>> +    if (vq->avail) {
>>> +        vhost_memory_unmap(dev, vq->avail,
>>> +                           virtio_queue_get_avail_size(vdev, idx),
>>> +                           0, virtio_queue_get_avail_size(vdev, idx));
>>> +    }
>>> +    if (vq->desc) {
>>> +        vhost_memory_unmap(dev, vq->desc,
>>> +                           virtio_queue_get_desc_size(vdev, idx),
>>> +                           0, virtio_queue_get_desc_size(vdev, idx));
>>> +    }
>>
>> Any reason for not checking hdev->started instead? vhost_dev_start() will set
>> it to true if the virtqueues were correctly mapped.
>>
>> Thanks
> Well, I see it a little bit differently:
>   - vhost_dev_start() sets hdev->started to true before starting
>     virtqueues
>   - vhost_virtqueue_start() maps all the memory
> If we hit the vhost disconnect at the start of
> vhost_virtqueue_start(), for instance in this call:
>    r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
> then we will call vhost_user_blk_disconnect:
>    vhost_user_blk_disconnect()->
>      vhost_user_blk_stop()->
>        vhost_dev_stop()->
>          vhost_virtqueue_stop()
> As a result we will enter this routine with hdev->started still
> set to true, but with the used/avail/desc fields still uninitialized
> and set to 0.


I may be missing something, but consider that both vhost_dev_start()
and vhost_user_blk_disconnect() are serialized in the main loop. Can
this really happen?

Thanks


>
>>
>>>   }
>>>   static void vhost_eventfd_add(MemoryListener *listener,



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-11  9:25     ` Dima Stepanov
@ 2020-05-12  3:32       ` Jason Wang
  2020-05-12  3:47         ` Li Feng
  2020-05-12  9:35         ` Dima Stepanov
  0 siblings, 2 replies; 51+ messages in thread
From: Jason Wang @ 2020-05-12  3:32 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, stefanha, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, fengli, yc-core, pbonzini, marcandre.lureau,
	raphael.norwitz, mreitz


On 2020/5/11 5:25 PM, Dima Stepanov wrote:
> On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
>> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
>>> If a vhost-user daemon is used as a backend for the vhost device, then we
>>> should consider the possibility of a disconnect at any moment. If such a
>>> disconnect happens in the vhost_migration_log() routine, the vhost
>>> device structure will be cleaned up.
>>> At the start of the vhost_migration_log() function there is a check:
>>>    if (!dev->started) {
>>>        dev->log_enabled = enable;
>>>        return 0;
>>>    }
>>> To be consistent with this check, add the same check after calling the
>>> vhost_dev_set_log() routine. In general this helps not to break a
>>> migration due to the assert() message. But it looks like this code
>>> should be revised to handle these errors more carefully.
>>>
>>> In case of a vhost-user device backend the fail paths should consider the
>>> state of the device. In this case we should skip some function calls
>>> during rollback on the error paths, so as not to get NULL dereference
>>> errors.
>>>
>>> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
>>> ---
>>>   hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
>>>   1 file changed, 35 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>> index 3ee50c4..d5ab96d 100644
>>> --- a/hw/virtio/vhost.c
>>> +++ b/hw/virtio/vhost.c
>>> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>>>   static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>>>   {
>>>       int r, i, idx;
>>> +
>>> +    if (!dev->started) {
>>> +        /*
>>> +         * If vhost-user daemon is used as a backend for the
>>> +         * device and the connection is broken, then the vhost_dev
>>> +         * structure will have all its values reset to 0.
>>> +         * Add additional check for the device state.
>>> +         */
>>> +        return -1;
>>> +    }
>>> +
>>>       r = vhost_dev_set_features(dev, enable_log);
>>>       if (r < 0) {
>>>           goto err_features;
>>> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>>>       }
>>>       return 0;
>>>   err_vq:
>>> -    for (; i >= 0; --i) {
>>> +    /*
>>> +     * Disconnect with the vhost-user daemon can lead to the
>>> +     * vhost_dev_cleanup() call which will clean up the
>>> +     * vhost_dev structure.
>>> +     */
>>> +    for (; dev->started && (i >= 0); --i) {
>>>           idx = dev->vhost_ops->vhost_get_vq_index(
>>
>> Why is the check of dev->started needed here? Can started be modified
>> outside the main loop? If yes, I don't get the check of !dev->started at
>> the beginning of this function.
>>
> No, dev->started can't change outside the main loop. The problem only
> exists for the vhost_user_blk daemon. Consider the case when we
> successfully pass the dev->started check at the beginning of the
> function, but after that we hit the disconnect on the next call, on the
> second or third iteration:
>       r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
> The unix socket backend device will call the disconnect routine for this
> device and reset the structure. So the structure will be reset (and
> dev->started set to false) inside this set_addr() call.


I still don't get it. I think the disconnect cannot happen in the
middle of vhost_dev_set_log(), since both of them run in the main
loop. And even if it can, we probably need a synchronization mechanism
other than a simple check here.


>   So
> we shouldn't make the clean up calls, because these virtqueues were
> already cleaned up in the disconnect call. But we should protect these
> calls somehow, so they will not hit SIGSEGV and we can pass migration.
>
> Just to summarize it:
> For the vhost-user-blk devices we can hit the clean up calls twice in
> case of a vhost disconnect:
> 1. The first time during the disconnect process; the clean up is called
> inside it.
> 2. The second time during the roll back clean up.
> If that is the case, we should skip step 2.
>
>>> dev, dev->vq_index + i);
>>>           vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
>>>                                    dev->log_enabled);
>>>       }
>>> -    vhost_dev_set_features(dev, dev->log_enabled);
>>> +    if (dev->started) {
>>> +        vhost_dev_set_features(dev, dev->log_enabled);
>>> +    }
>>>   err_features:
>>>       return r;
>>>   }
>>> @@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
>>>       } else {
>>>           vhost_dev_log_resize(dev, vhost_get_log_size(dev));
>>>           r = vhost_dev_set_log(dev, true);
>>> -        if (r < 0) {
>>> +        /*
>>> +         * The dev log resize can fail because of a disconnect
>>> +         * with the vhost-user-blk daemon. Check the device
>>> +         * state before calling the vhost_dev_set_log()
>>> +         * function.
>>> +         * Don't return error if device isn't started to be
>>> +         * consistent with the check above.
>>> +         */
>>> +        if (dev->started && r < 0) {
>>>               return r;
>>>           }
>>>       }
>>> @@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>>>   fail_log:
>>>       vhost_log_put(hdev, false);
>>>   fail_vq:
>>> -    while (--i >= 0) {
>>> +    /*
>>> +     * Disconnect with the vhost-user daemon can lead to the
>>> +     * vhost_dev_cleanup() call which will clean up the
>>> +     * vhost_dev structure.
>>> +     */
>>> +    while ((--i >= 0) && (hdev->started)) {
>>>           vhost_virtqueue_stop(hdev,
>>>                                vdev,
>>>                                hdev->vqs + i,
>>
>> This should be a separate patch.
> Do you mean I should split this patch into two patches?


Yes.

Thanks


>
> Thanks.
>
>> Thanks
>>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-12  3:32       ` Jason Wang
@ 2020-05-12  3:47         ` Li Feng
  2020-05-12  9:23           ` Dima Stepanov
  2020-05-12  9:35         ` Dima Stepanov
  1 sibling, 1 reply; 51+ messages in thread
From: Li Feng @ 2020-05-12  3:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Fam Zheng, Kevin Wolf, Stefan Hajnoczi,
	open list:Block layer core, Michael S. Tsirkin,
	open list:All patches CC here, Dr. David Alan Gilbert, Gonglei,
	yc-core, Paolo Bonzini, Marc-André Lureau, Raphael Norwitz,
	Dima Stepanov, Max Reitz

Hi, Dima.

If vhost_migration_log() returns < 0, then vhost_log_global_start()
will trigger a crash.
Does your patch handle this abort?
If a disconnect happens during the migration stage, the correct
operation is to stop the migration, right?

 841 static void vhost_log_global_start(MemoryListener *listener)
 842 {
 843     int r;
 844
 845     r = vhost_migration_log(listener, true);
 846     if (r < 0) {
 847         abort();
 848     }
 849 }
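
For reference, this callback is wired up as the MemoryListener
.log_global_start hook, which returns void, so at this point there is
nobody left in the call chain who could fail gracefully (a sketch of the
shape, assuming the usual listener initialization in vhost_dev_init()):

    hdev->memory_listener = (MemoryListener) {
        ...
        .log_global_start = vhost_log_global_start, /* void return */
        ...
    };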

Thanks,

Feng Li

Jason Wang <jasowang@redhat.com> wrote on Tue, May 12, 2020 at 11:33 AM:
>
>
> On 2020/5/11 5:25 PM, Dima Stepanov wrote:
> > On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
> >> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> >>> If a vhost-user daemon is used as a backend for the vhost device, then we
> >>> should consider the possibility of a disconnect at any moment. If such a
> >>> disconnect happens in the vhost_migration_log() routine, the vhost
> >>> device structure will be cleaned up.
> >>> At the start of the vhost_migration_log() function there is a check:
> >>>    if (!dev->started) {
> >>>        dev->log_enabled = enable;
> >>>        return 0;
> >>>    }
> >>> To be consistent with this check, add the same check after calling the
> >>> vhost_dev_set_log() routine. In general this helps not to break a
> >>> migration due to the assert() message. But it looks like this code
> >>> should be revised to handle these errors more carefully.
> >>>
> >>> In case of a vhost-user device backend the fail paths should consider the
> >>> state of the device. In this case we should skip some function calls
> >>> during rollback on the error paths, so as not to get NULL dereference
> >>> errors.
> >>>
> >>> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> >>> ---
> >>>   hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> >>>   1 file changed, 35 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>> index 3ee50c4..d5ab96d 100644
> >>> --- a/hw/virtio/vhost.c
> >>> +++ b/hw/virtio/vhost.c
> >>> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> >>>   static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>   {
> >>>       int r, i, idx;
> >>> +
> >>> +    if (!dev->started) {
> >>> +        /*
> >>> +         * If vhost-user daemon is used as a backend for the
> >>> +         * device and the connection is broken, then the vhost_dev
> >>> +         * structure will have all its values reset to 0.
> >>> +         * Add additional check for the device state.
> >>> +         */
> >>> +        return -1;
> >>> +    }
> >>> +
> >>>       r = vhost_dev_set_features(dev, enable_log);
> >>>       if (r < 0) {
> >>>           goto err_features;
> >>> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>       }
> >>>       return 0;
> >>>   err_vq:
> >>> -    for (; i >= 0; --i) {
> >>> +    /*
> >>> +     * Disconnect with the vhost-user daemon can lead to the
> >>> +     * vhost_dev_cleanup() call which will clean up the
> >>> +     * vhost_dev structure.
> >>> +     */
> >>> +    for (; dev->started && (i >= 0); --i) {
> >>>           idx = dev->vhost_ops->vhost_get_vq_index(
> >>
> >> Why is the check of dev->started needed here? Can started be modified
> >> outside the main loop? If yes, I don't get the check of !dev->started at
> >> the beginning of this function.
> >>
> > No, dev->started can't change outside the main loop. The problem only
> > exists for the vhost_user_blk daemon. Consider the case when we
> > successfully pass the dev->started check at the beginning of the
> > function, but after that we hit the disconnect on the next call, on the
> > second or third iteration:
> >       r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
> > The unix socket backend device will call the disconnect routine for this
> > device and reset the structure. So the structure will be reset (and
> > dev->started set to false) inside this set_addr() call.
>
>
> I still don't get it. I think the disconnect cannot happen in the
> middle of vhost_dev_set_log(), since both of them run in the main
> loop. And even if it can, we probably need a synchronization mechanism
> other than a simple check here.
>
>
> >   So
> > we shouldn't make the clean up calls, because these virtqueues were
> > already cleaned up in the disconnect call. But we should protect these
> > calls somehow, so they will not hit SIGSEGV and we can pass migration.
> >
> > Just to summarize it:
> > For the vhost-user-blk devices we ca hit clean up calls twice in case of
> > vhost disconnect:
> > 1. The first time during the disconnect process. The clean up is called
> > inside it.
> > 2. The second time during roll back clean up.
> > So if it is the case we should skip p2.
> >
> >>> dev, dev->vq_index + i);
> >>>           vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> >>>                                    dev->log_enabled);
> >>>       }
> >>> -    vhost_dev_set_features(dev, dev->log_enabled);
> >>> +    if (dev->started) {
> >>> +        vhost_dev_set_features(dev, dev->log_enabled);
> >>> +    }
> >>>   err_features:
> >>>       return r;
> >>>   }
> >>> @@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
> >>>       } else {
> >>>           vhost_dev_log_resize(dev, vhost_get_log_size(dev));
> >>>           r = vhost_dev_set_log(dev, true);
> >>> -        if (r < 0) {
> >>> +        /*
> >>> +         * The dev log resize can fail, because of disconnect
> >>> +         * with the vhost-user-blk daemon. Check the device
> >>> +         * state before calling the vhost_dev_set_log()
> >>> +         * function.
> >>> +         * Don't return error if device isn't started to be
> >>> +         * consistent with the check above.
> >>> +         */
> >>> +        if (dev->started && r < 0) {
> >>>               return r;
> >>>           }
> >>>       }
> >>> @@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >>>   fail_log:
> >>>       vhost_log_put(hdev, false);
> >>>   fail_vq:
> >>> -    while (--i >= 0) {
> >>> +    /*
> >>> +     * Disconnect with the vhost-user daemon can lead to the
> >>> +     * vhost_dev_cleanup() call which will clean up vhost_dev
> >>> +     * structure.
> >>> +     */
> >>> +    while ((--i >= 0) && (hdev->started)) {
> >>>           vhost_virtqueue_stop(hdev,
> >>>                                vdev,
> >>>                                hdev->vqs + i,
> >>
> >> This should be a separate patch.
> > Do you mean I should split this patch into two patches?
>
>
> Yes.
>
> Thanks
>
>
> >
> > Thanks.
> >
> >> Thanks
> >>
>



* Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
  2020-05-12  3:26       ` Jason Wang
@ 2020-05-12  9:08         ` Dima Stepanov
  2020-05-13  3:00           ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Dima Stepanov @ 2020-05-12  9:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, yc-core, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, raphael.norwitz, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz

On Tue, May 12, 2020 at 11:26:11AM +0800, Jason Wang wrote:
> 
> On 2020/5/11 5:11 PM, Dima Stepanov wrote:
> >On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:
> >>On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> >>>Since a disconnect can happen at any time during initialization, not all
> >>>vring buffers (for instance the used vring) may be initialized
> >>>successfully. If a buffer was not initialized then the vhost_memory_unmap
> >>>call will lead to SIGSEGV. Add checks for the vring address values before
> >>>calling unmap. Also add an assert() in the vhost_memory_unmap() routine.
> >>>
> >>>Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> >>>---
> >>>  hw/virtio/vhost.c | 27 +++++++++++++++++++++------
> >>>  1 file changed, 21 insertions(+), 6 deletions(-)
> >>>
> >>>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>index ddbdc53..3ee50c4 100644
> >>>--- a/hw/virtio/vhost.c
> >>>+++ b/hw/virtio/vhost.c
> >>>@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
> >>>                                 hwaddr len, int is_write,
> >>>                                 hwaddr access_len)
> >>>  {
> >>>+    assert(buffer);
> >>>+
> >>>      if (!vhost_dev_has_iommu(dev)) {
> >>>          cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> >>>      }
> >>>@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
> >>>                                                  vhost_vq_index);
> >>>      }
> >>>-    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
> >>>-                       1, virtio_queue_get_used_size(vdev, idx));
> >>>-    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
> >>>-                       0, virtio_queue_get_avail_size(vdev, idx));
> >>>-    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
> >>>-                       0, virtio_queue_get_desc_size(vdev, idx));
> >>>+    /*
> >>>+     * Since the vhost-user disconnect can happen during initialization
> >>>+     * check if vring was initialized, before making unmap.
> >>>+     */
> >>>+    if (vq->used) {
> >>>+        vhost_memory_unmap(dev, vq->used,
> >>>+                           virtio_queue_get_used_size(vdev, idx),
> >>>+                           1, virtio_queue_get_used_size(vdev, idx));
> >>>+    }
> >>>+    if (vq->avail) {
> >>>+        vhost_memory_unmap(dev, vq->avail,
> >>>+                           virtio_queue_get_avail_size(vdev, idx),
> >>>+                           0, virtio_queue_get_avail_size(vdev, idx));
> >>>+    }
> >>>+    if (vq->desc) {
> >>>+        vhost_memory_unmap(dev, vq->desc,
> >>>+                           virtio_queue_get_desc_size(vdev, idx),
> >>>+                           0, virtio_queue_get_desc_size(vdev, idx));
> >>>+    }
> >>
> >>Any reason for not checking hdev->started instead? vhost_dev_start() will
> >>set it to true if the virtqueues were correctly mapped.
> >>
> >>Thanks
> >Well, I see it a little bit differently:
> >  - vhost_dev_start() sets hdev->started to true before starting
> >    virtqueues
> >  - vhost_virtqueue_start() maps all the memory
> >If we hit the vhost disconnect at the start of
> >vhost_virtqueue_start(), for instance for this call:
> >   r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
> >then we will call vhost_user_blk_disconnect:
> >   vhost_user_blk_disconnect()->
> >     vhost_user_blk_stop()->
> >       vhost_dev_stop()->
> >         vhost_virtqueue_stop()
> >As a result we will come into this routine with hdev->started still
> >set to true, but with the used/avail/desc fields still uninitialized
> >and set to 0.
> 
> 
> I may be missing something, but consider that both vhost_dev_start() and
> vhost_user_blk_disconnect() are serialized in the main loop. Can this
> really happen?
Yes, consider the case when we start the vhost-user-blk device:
  vhost_dev_start->
    vhost_virtqueue_start
And we get a disconnect in the middle of the vhost_virtqueue_start()
routine, for instance:
  1000     vq->num = state.num = virtio_queue_get_num(vdev, idx);
  1001     r = dev->vhost_ops->vhost_set_vring_num(dev, &state);
  1002     if (r) {
  1003         VHOST_OPS_DEBUG("vhost_set_vring_num failed");
  1004         return -errno;
  1005     }
  --> Here we got a disconnect <--
  1006 
  1007     state.num = virtio_queue_get_last_avail_idx(vdev, idx);
  1008     r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
  1009     if (r) {
  1010         VHOST_OPS_DEBUG("vhost_set_vring_base failed");
  1011         return -errno;
  1012     }
As a result, the call to vhost_set_vring_base will invoke the disconnect
routine. The backtrace log for the SIGSEGV is as follows:
  Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x7ffff2ea9700 (LWP 183150)]
  0x00007ffff4d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  (gdb) bt
  #0  0x00007ffff4d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x000055555590fd90 in flatview_write_continue (fv=0x7fffec4a2600,
      addr=0, attrs=..., ptr=0x0, len=1028, addr1=0, 
      l=1028, mr=0x555556b1b310) at ./exec.c:3142
  #2  0x000055555590fe98 in flatview_write (fv=0x7fffec4a2600, addr=0,
      attrs=..., buf=0x0, len=1028) at ./exec.c:3177
  #3  0x00005555559101ed in address_space_write (as=0x555556893940
      <address_space_memory>, addr=0, attrs=..., buf=0x0, 
      len=1028) at ./exec.c:3268
  #4  0x0000555555910caf in address_space_unmap (as=0x555556893940
      <address_space_memory>, buffer=0x0, len=1028, 
      is_write=true, access_len=1028) at ./exec.c:3592
  #5  0x0000555555910d82 in cpu_physical_memory_unmap (buffer=0x0,
      len=1028, is_write=true, access_len=1028) at ./exec.c:3613
  #6  0x0000555555a16fa1 in vhost_memory_unmap (dev=0x7ffff22723e8,
      buffer=0x0, len=1028, is_write=1, access_len=1028)
      at ./hw/virtio/vhost.c:318
  #7  0x0000555555a192a2 in vhost_virtqueue_stop (dev=0x7ffff22723e8,
      vdev=0x7ffff22721a0, vq=0x55555770abf0, idx=0) at ./hw/virtio/vhost.c:1136
  #8  0x0000555555a1acc0 in vhost_dev_stop (hdev=0x7ffff22723e8,
      vdev=0x7ffff22721a0) at ./hw/virtio/vhost.c:1702
  #9  0x00005555559b6532 in vhost_user_blk_stop (vdev=0x7ffff22721a0)
      at ./hw/block/vhost-user-blk.c:196
  #10 0x00005555559b6b73 in vhost_user_blk_disconnect (dev=0x7ffff22721a0)
      at ./hw/block/vhost-user-blk.c:365
  #11 0x00005555559b6c4f in vhost_user_blk_event (opaque=0x7ffff22721a0,
      event=CHR_EVENT_CLOSED) at ./hw/block/vhost-user-blk.c:384
  #12 0x0000555555e65f7e in chr_be_event (s=0x555556b182e0, event=CHR_EVENT_CLOSED)
      at chardev/char.c:60
  #13 0x0000555555e6601a in qemu_chr_be_event (s=0x555556b182e0,
      event=CHR_EVENT_CLOSED) at chardev/char.c:80
  #14 0x0000555555e6eef3 in tcp_chr_disconnect_locked (chr=0x555556b182e0)
      at chardev/char-socket.c:488
  #15 0x0000555555e6e23f in tcp_chr_write (chr=0x555556b182e0,
      buf=0x7ffff2ea8220 "\n", len=20) at chardev/char-socket.c:178
  #16 0x0000555555e6616c in qemu_chr_write_buffer (s=0x555556b182e0,
      buf=0x7ffff2ea8220 "\n", len=20, offset=0x7ffff2ea8150, write_all=true)
      at chardev/char.c:120
  #17 0x0000555555e662d9 in qemu_chr_write (s=0x555556b182e0, buf=0x7ffff2ea8220 "\n",
      len=20, write_all=true) at chardev/char.c:155
  #18 0x0000555555e693cc in qemu_chr_fe_write_all (be=0x7ffff2272360,
      buf=0x7ffff2ea8220 "\n", len=20) at chardev/char-fe.c:53
  #19 0x0000555555a1c489 in vhost_user_write (dev=0x7ffff22723e8,
      msg=0x7ffff2ea8220, fds=0x0, fd_num=0) at ./hw/virtio/vhost-user.c:350
  #20 0x0000555555a1d325 in vhost_set_vring (dev=0x7ffff22723e8,
      request=10, ring=0x7ffff2ea8520) at ./hw/virtio/vhost-user.c:660
  #21 0x0000555555a1d4c6 in vhost_user_set_vring_base (dev=0x7ffff22723e8,
      ring=0x7ffff2ea8520) at ./hw/virtio/vhost-user.c:704
  #22 0x0000555555a18c1b in vhost_virtqueue_start (dev=0x7ffff22723e8,
      vdev=0x7ffff22721a0, vq=0x55555770abf0, idx=0) at ./hw/virtio/vhost.c:1009
  #23 0x0000555555a1a9f5 in vhost_dev_start (hdev=0x7ffff22723e8, vdev=0x7ffff22721a0)
      at ./hw/virtio/vhost.c:1639
  #24 0x00005555559b6367 in vhost_user_blk_start (vdev=0x7ffff22721a0)
      at ./hw/block/vhost-user-blk.c:150
  #25 0x00005555559b6653 in vhost_user_blk_set_status (vdev=0x7ffff22721a0, status=15 '\017')
      at ./hw/block/vhost-user-blk.c:233
  #26 0x0000555555a1072d in virtio_set_status (vdev=0x7ffff22721a0, val=15 '\017')
      at ./hw/virtio/virtio.c:1956
---Type <return> to continue, or q <return> to quit---

So while we are inside vhost_user_blk_start() (frame #24) we end up
calling vhost_user_blk_disconnect() (frame #10). And during this call
the dev->started field is still set to true.
  (gdb) frame 8
  #8  0x0000555555a1acc0 in vhost_dev_stop (hdev=0x7ffff22723e8, vdev=0x7ffff22721a0)
      at ./hw/virtio/vhost.c:1702
  1702            vhost_virtqueue_stop(hdev,
  (gdb) p hdev->started
  $1 = true

It isn't an easy race to reproduce: the disconnect has to hit exactly
this window. We were able to hit it during long testing on one of the
reconnect iterations. After that we were able to reproduce it 100% of
the time by adding a sleep() call inside qemu to make the race window
bigger.
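
To make the double-stop path visibly safe, the idea can be sketched like
this (a minimal sketch only, not the exact patch; the NULL reset after
unmap is my addition to make a repeated stop a no-op):

    if (vq->used) {
        vhost_memory_unmap(dev, vq->used,
                           virtio_queue_get_used_size(vdev, idx),
                           1, virtio_queue_get_used_size(vdev, idx));
        vq->used = NULL;    /* a second stop now skips this ring */
    }
    if (vq->avail) {
        vhost_memory_unmap(dev, vq->avail,
                           virtio_queue_get_avail_size(vdev, idx),
                           0, virtio_queue_get_avail_size(vdev, idx));
        vq->avail = NULL;
    }
    if (vq->desc) {
        vhost_memory_unmap(dev, vq->desc,
                           virtio_queue_get_desc_size(vdev, idx),
                           0, virtio_queue_get_desc_size(vdev, idx));
        vq->desc = NULL;
    }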

> 
> Thanks
> 
> 
> >
> >>
> >>>  }
> >>>  static void vhost_eventfd_add(MemoryListener *listener,
> 



* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-12  3:47         ` Li Feng
@ 2020-05-12  9:23           ` Dima Stepanov
  0 siblings, 0 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-12  9:23 UTC (permalink / raw)
  To: Li Feng
  Cc: Fam Zheng, Kevin Wolf, Stefan Hajnoczi,
	open list:Block layer core, Michael S. Tsirkin, Jason Wang,
	open list:All patches CC here, Dr. David Alan Gilbert, Gonglei,
	yc-core, Paolo Bonzini, Marc-André Lureau, Raphael Norwitz,
	Max Reitz

On Tue, May 12, 2020 at 11:47:34AM +0800, Li Feng wrote:
> Hi, Dima.
> 
> If vhost_migration_log() returns < 0, then vhost_log_global_start will
> trigger a crash.
> Does your patch handle this abort?
> If a disconnect happens in the migration stage, the correct operation
> is to stop the migration, right?
> 
>  841 static void vhost_log_global_start(MemoryListener *listener)
>  842 {
>  843     int r;
>  844
>  845     r = vhost_migration_log(listener, true);
>  846     if (r < 0) {
>  847         abort();
>  848     }
>  849 }
Yes, my patch handles it by not returning an error :). That is one of the
points we've talked about with Raphael and Michael in this thread. First
of all, in my patches I'm still following the same logic which is
already in upstream ./hw/virtio/vhost.c:vhost_migration_log():
  ...
  820     if (!dev->started) {
  821         dev->log_enabled = enable;
  822         return 0;
  823     }
  ...
It means that if the device is not started, then migration continues
without returning any error. So I followed the same logic: if we get a
disconnect, it means that the device isn't started and we can continue
migration. As a result no error is returned and the assert() isn't
hit.
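
In other words, the resulting control flow in vhost_migration_log() can
be sketched like this (a simplified sketch of the logic, not the literal
patch hunk):

  r = vhost_dev_set_log(dev, true);
  if (dev->started && r < 0) {
      return r;   /* a real failure on a still-connected device */
  }
  /*
   * Otherwise dev->started was cleared by the disconnect cleanup path,
   * so fall through and let the migration continue, mirroring the
   * !dev->started check at the top of vhost_migration_log().
   */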
There is also a question from Raphael to Michael about it; you can find
it in this thread, but I will quote it here as well:
  > Subject: Re: [PATCH v2 5/5] vhost: add device started check in
  > migration set log

  > On Wed, May 06, 2020 at 06:08:34PM -0400, Raphael Norwitz wrote:
  >> In particular, we need to decide whether a migration should be
  >> allowed to continue if a device disconnects durning the migration
  >> stage.
  >>
  >> mst, any thoughts?

  > Why not? It can't change state while disconnected, so it just makes
  > things easier.

So it looks like the correct way to handle it. Also our internal tests
pass. A few words about our tests:
  - run the src VM with a vhost-user-blk daemon attached
  - run fio inside it
  - perform a reconnect every X seconds (just kill and restart
    the daemon), X is random
  - run the dst VM
  - perform migration
  - fio should complete in the dst VM
And we cycle this test basically forever. At least for now we see no new
issues.

No other comments mixed in below.

> 
> Thanks,
> 
> Feng Li
> 
> Jason Wang <jasowang@redhat.com> 于2020年5月12日周二 上午11:33写道:
> >
> >
> > On 2020/5/11 5:25 PM, Dima Stepanov wrote:
> > > On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
> > >> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> > >>> If vhost-user daemon is used as a backend for the vhost device, then we
> > >>> should consider a possibility of disconnect at any moment. If such a
> > >>> disconnect happens in the vhost_migration_log() routine, the vhost
> > >>> device structure will be cleaned up.
> > >>> At the start of the vhost_migration_log() function there is a check:
> > >>>    if (!dev->started) {
> > >>>        dev->log_enabled = enable;
> > >>>        return 0;
> > >>>    }
> > >>> To be consistent with this check, add the same check after calling the
> > >>> vhost_dev_set_log() routine. In general this helps not to break a
> > >>> migration due to the assert() message. But it looks like this code
> > >>> should be revised to handle these errors more carefully.
> > >>>
> > >>> In the case of a vhost-user device backend, the fail paths should
> > >>> consider the state of the device. In this case we should skip some
> > >>> function calls during rollback on the error paths, so as not to get
> > >>> NULL dereference errors.
> > >>>
> > >>> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > >>> ---
> > >>>   hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> > >>>   1 file changed, 35 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > >>> index 3ee50c4..d5ab96d 100644
> > >>> --- a/hw/virtio/vhost.c
> > >>> +++ b/hw/virtio/vhost.c
> > >>> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> > >>>   static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> > >>>   {
> > >>>       int r, i, idx;
> > >>> +
> > >>> +    if (!dev->started) {
> > >>> +        /*
> > >>> +         * If vhost-user daemon is used as a backend for the
> > >>> +         * device and the connection is broken, then the vhost_dev
> > >>> +         * structure will be reset all its values to 0.
> > >>> +         * Add additional check for the device state.
> > >>> +         */
> > >>> +        return -1;
> > >>> +    }
> > >>> +
> > >>>       r = vhost_dev_set_features(dev, enable_log);
> > >>>       if (r < 0) {
> > >>>           goto err_features;
> > >>> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> > >>>       }
> > >>>       return 0;
> > >>>   err_vq:
> > >>> -    for (; i >= 0; --i) {
> > >>> +    /*
> > >>> +     * Disconnect with the vhost-user daemon can lead to the
> > >>> +     * vhost_dev_cleanup() call which will clean up vhost_dev
> > >>> +     * structure.
> > >>> +     */
> > >>> +    for (; dev->started && (i >= 0); --i) {
> > >>>           idx = dev->vhost_ops->vhost_get_vq_index(
> > >>
> > >> Why do we need the check of dev->started here? Can started be modified
> > >> outside the mainloop? If yes, I don't get the check of !dev->started at
> > >> the beginning of this function.
> > >>
> > > No, dev->started can't change outside the mainloop. The main problem is
> > > only for the vhost_user_blk daemon. Consider the case when we
> > > successfully pass the dev->started check at the beginning of the
> > > function, but after it we hit the disconnect on the next call, on the
> > > second or third iteration:
> > >       r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
> > > The unix socket backend device will call the disconnect routine for this
> > > device and reset the structure. So the structure will be reset (and
> > > dev->started set to false) inside this set_addr() call.
> >
> >
> > I still don't get it. I think the disconnect can not happen in the
> > middle of vhost_dev_set_log() since both of them run in the
> > mainloop. And even if it can, we probably need a synchronization
> > mechanism other than a simple check here.
> >
> >
> > >   So
> > > we shouldn't make the cleanup calls, because these virtqueues were
> > > cleaned up in the disconnect call. But we should protect these calls
> > > somehow, so we will not hit SIGSEGV and we will be able to pass
> > > migration.
> > >
> > > Just to summarize it:
> > > For the vhost-user-blk devices we can hit cleanup calls twice in case
> > > of vhost disconnect:
> > > 1. The first time during the disconnect process. The cleanup is called
> > > inside it.
> > > 2. The second time during the rollback cleanup.
> > > So if that is the case, we should skip step 2.
> > >
> > >>> dev, dev->vq_index + i);
> > >>>           vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> > >>>                                    dev->log_enabled);
> > >>>       }
> > >>> -    vhost_dev_set_features(dev, dev->log_enabled);
> > >>> +    if (dev->started) {
> > >>> +        vhost_dev_set_features(dev, dev->log_enabled);
> > >>> +    }
> > >>>   err_features:
> > >>>       return r;
> > >>>   }
> > >>> @@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
> > >>>       } else {
> > >>>           vhost_dev_log_resize(dev, vhost_get_log_size(dev));
> > >>>           r = vhost_dev_set_log(dev, true);
> > >>> -        if (r < 0) {
> > >>> +        /*
> > >>> +         * The dev log resize can fail, because of disconnect
> > >>> +         * with the vhost-user-blk daemon. Check the device
> > >>> +         * state before calling the vhost_dev_set_log()
> > >>> +         * function.
> > >>> +         * Don't return error if device isn't started to be
> > >>> +         * consistent with the check above.
> > >>> +         */
> > >>> +        if (dev->started && r < 0) {
> > >>>               return r;
> > >>>           }
> > >>>       }
> > >>> @@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> > >>>   fail_log:
> > >>>       vhost_log_put(hdev, false);
> > >>>   fail_vq:
> > >>> -    while (--i >= 0) {
> > >>> +    /*
> > >>> +     * Disconnect with the vhost-user daemon can lead to the
> > >>> +     * vhost_dev_cleanup() call which will clean up vhost_dev
> > >>> +     * structure.
> > >>> +     */
> > >>> +    while ((--i >= 0) && (hdev->started)) {
> > >>>           vhost_virtqueue_stop(hdev,
> > >>>                                vdev,
> > >>>                                hdev->vqs + i,
> > >>
> > >> This should be a separate patch.
> > > Do you mean I should split this patch into two patches?
> >
> >
> > Yes.
> >
> > Thanks
> >
> >
> > >
> > > Thanks.
> > >
> > >> Thanks
> > >>
> >


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-12  3:32       ` Jason Wang
  2020-05-12  3:47         ` Li Feng
@ 2020-05-12  9:35         ` Dima Stepanov
  2020-05-13  3:20           ` Jason Wang
  2020-05-13  4:15           ` Michael S. Tsirkin
  1 sibling, 2 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-12  9:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, stefanha, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, fengli, yc-core, pbonzini, marcandre.lureau,
	raphael.norwitz, mreitz

On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote:
> 
> On 2020/5/11 5:25 PM, Dima Stepanov wrote:
> >On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
> >>On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> >>>If vhost-user daemon is used as a backend for the vhost device, then we
> >>>should consider a possibility of disconnect at any moment. If such a
> >>>disconnect happens in the vhost_migration_log() routine, the vhost
> >>>device structure will be cleaned up.
> >>>At the start of the vhost_migration_log() function there is a check:
> >>>   if (!dev->started) {
> >>>       dev->log_enabled = enable;
> >>>       return 0;
> >>>   }
> >>>To be consistent with this check, add the same check after calling the
> >>>vhost_dev_set_log() routine. In general this helps not to break a
> >>>migration due to the assert() message. But it looks like this code
> >>>should be revised to handle these errors more carefully.
> >>>
> >>>In the case of a vhost-user device backend, the fail paths should
> >>>consider the state of the device. In this case we should skip some
> >>>function calls during rollback on the error paths, so as not to get
> >>>NULL dereference errors.
> >>>
> >>>Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> >>>---
> >>>  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> >>>  1 file changed, 35 insertions(+), 4 deletions(-)
> >>>
> >>>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>index 3ee50c4..d5ab96d 100644
> >>>--- a/hw/virtio/vhost.c
> >>>+++ b/hw/virtio/vhost.c
> >>>@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> >>>  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>  {
> >>>      int r, i, idx;
> >>>+
> >>>+    if (!dev->started) {
> >>>+        /*
> >>>+         * If vhost-user daemon is used as a backend for the
> >>>+         * device and the connection is broken, then the vhost_dev
> >>>+         * structure will be reset all its values to 0.
> >>>+         * Add additional check for the device state.
> >>>+         */
> >>>+        return -1;
> >>>+    }
> >>>+
> >>>      r = vhost_dev_set_features(dev, enable_log);
> >>>      if (r < 0) {
> >>>          goto err_features;
> >>>@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>      }
> >>>      return 0;
> >>>  err_vq:
> >>>-    for (; i >= 0; --i) {
> >>>+    /*
> >>>+     * Disconnect with the vhost-user daemon can lead to the
> >>>+     * vhost_dev_cleanup() call which will clean up vhost_dev
> >>>+     * structure.
> >>>+     */
> >>>+    for (; dev->started && (i >= 0); --i) {
> >>>          idx = dev->vhost_ops->vhost_get_vq_index(
> >>
> >>Why do we need the check of dev->started here? Can started be modified
> >>outside the mainloop? If yes, I don't get the check of !dev->started at
> >>the beginning of this function.
> >>
> >No, dev->started can't change outside the mainloop. The main problem is
> >only for the vhost_user_blk daemon. Consider the case when we
> >successfully pass the dev->started check at the beginning of the
> >function, but after it we hit the disconnect on the next call, on the
> >second or third iteration:
> >      r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
> >The unix socket backend device will call the disconnect routine for this
> >device and reset the structure. So the structure will be reset (and
> >dev->started set to false) inside this set_addr() call.
> 
> 
> I still don't get it. I think the disconnect can not happen in the middle
> of vhost_dev_set_log() since both of them run in the mainloop. And even
> if it can, we probably need a synchronization mechanism other than a
> simple check here.
The disconnect doesn't happen in a separate thread, it happens in this
routine, inside vhost_dev_set_log. When, for instance, the
vhost_user_write() call fails:
  vhost_user_set_log_base()
    vhost_user_write()
      vhost_user_blk_disconnect()
        vhost_dev_cleanup()
          vhost_user_backend_cleanup()
So the point is that if we somehow get a disconnect from the
vhost-user-blk daemon before the vhost_user_write() call, then it will
continue cleaning up by running the vhost_user_blk_disconnect()
function. I wrote a more detailed backtrace stack in the separate
thread, which is pretty similar to what we have here:
  Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
The places are different but the problem is pretty similar.

So if the vhost-user command handshake succeeds then everything is fine
and reconnect will work as expected. The only problem is how to handle
reconnect properly between a vhost-user command send/receive.
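
One way to make that explicit at the call sites can be sketched as
follows (a sketch only; the -ENOTCONN return value is my illustration,
not something from the patch):

  r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
  /*
   * The write above may have triggered CHR_EVENT_CLOSED ->
   * vhost_user_blk_disconnect() -> vhost_dev_cleanup(), so re-check
   * the device state before touching any vhost_dev fields again.
   */
  if (!dev->started) {
      return -ENOTCONN;   /* disconnected, not a plain command failure */
  }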

As I wrote, we have a test:
  - run the src VM with a vhost-user-blk daemon attached
  - run fio inside it
  - perform a reconnect every X seconds (just kill and restart the
    daemon), X is random
  - run the dst VM
  - perform migration
  - fio should complete in the dst VM
And we cycle this test basically forever.
It fails once per ~25 iterations. By adding some delays inside qemu we
were able to make the race window larger.

> 
> 
> >  So
> >we shouldn't make the cleanup calls, because these virtqueues were
> >cleaned up in the disconnect call. But we should protect these calls
> >somehow, so we will not hit SIGSEGV and we will be able to pass
> >migration.
> >
> >Just to summarize it:
> >For the vhost-user-blk devices we can hit cleanup calls twice in case
> >of vhost disconnect:
> >1. The first time during the disconnect process. The cleanup is called
> >inside it.
> >2. The second time during the rollback cleanup.
> >So if that is the case, we should skip step 2.
> >
> >>>dev, dev->vq_index + i);
> >>>          vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> >>>                                   dev->log_enabled);
> >>>      }
> >>>-    vhost_dev_set_features(dev, dev->log_enabled);
> >>>+    if (dev->started) {
> >>>+        vhost_dev_set_features(dev, dev->log_enabled);
> >>>+    }
> >>>  err_features:
> >>>      return r;
> >>>  }
> >>>@@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
> >>>      } else {
> >>>          vhost_dev_log_resize(dev, vhost_get_log_size(dev));
> >>>          r = vhost_dev_set_log(dev, true);
> >>>-        if (r < 0) {
> >>>+        /*
> >>>+         * The dev log resize can fail, because of disconnect
> >>>+         * with the vhost-user-blk daemon. Check the device
> >>>+         * state before calling the vhost_dev_set_log()
> >>>+         * function.
> >>>+         * Don't return error if device isn't started to be
> >>>+         * consistent with the check above.
> >>>+         */
> >>>+        if (dev->started && r < 0) {
> >>>              return r;
> >>>          }
> >>>      }
> >>>@@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >>>  fail_log:
> >>>      vhost_log_put(hdev, false);
> >>>  fail_vq:
> >>>-    while (--i >= 0) {
> >>>+    /*
> >>>+     * Disconnect with the vhost-user daemon can lead to the
> >>>+     * vhost_dev_cleanup() call which will clean up vhost_dev
> >>>+     * structure.
> >>>+     */
> >>>+    while ((--i >= 0) && (hdev->started)) {
> >>>          vhost_virtqueue_stop(hdev,
> >>>                               vdev,
> >>>                               hdev->vqs + i,
> >>
> >>This should be a separate patch.
> >Do you mean I should split this patch into two patches?
> 
> 
> Yes.
> 
> Thanks

Got it. Will do it in v3.

No other comments mixed in below.

Thanks.

> 
> 
> >
> >Thanks.
> >
> >>Thanks
> >>
> 



* Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
  2020-05-12  9:08         ` Dima Stepanov
@ 2020-05-13  3:00           ` Jason Wang
  2020-05-13  9:36             ` Dima Stepanov
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2020-05-13  3:00 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, yc-core, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, raphael.norwitz, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz


On 2020/5/12 5:08 PM, Dima Stepanov wrote:
> On Tue, May 12, 2020 at 11:26:11AM +0800, Jason Wang wrote:
>> On 2020/5/11 5:11 PM, Dima Stepanov wrote:
>>> On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:
>>>> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
>>>>> Since a disconnect can happen at any time during initialization, not
>>>>> all vring buffers (for instance the used vring) may be initialized
>>>>> successfully. If a buffer was not initialized then the
>>>>> vhost_memory_unmap call will lead to SIGSEGV. Add checks for the vring
>>>>> address values before calling unmap. Also add an assert() in the
>>>>> vhost_memory_unmap() routine.
>>>>>
>>>>> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
>>>>> ---
>>>>>   hw/virtio/vhost.c | 27 +++++++++++++++++++++------
>>>>>   1 file changed, 21 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>> index ddbdc53..3ee50c4 100644
>>>>> --- a/hw/virtio/vhost.c
>>>>> +++ b/hw/virtio/vhost.c
>>>>> @@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
>>>>>                                  hwaddr len, int is_write,
>>>>>                                  hwaddr access_len)
>>>>>   {
>>>>> +    assert(buffer);
>>>>> +
>>>>>       if (!vhost_dev_has_iommu(dev)) {
>>>>>           cpu_physical_memory_unmap(buffer, len, is_write, access_len);
>>>>>       }
>>>>> @@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
>>>>>                                                   vhost_vq_index);
>>>>>       }
>>>>> -    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
>>>>> -                       1, virtio_queue_get_used_size(vdev, idx));
>>>>> -    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
>>>>> -                       0, virtio_queue_get_avail_size(vdev, idx));
>>>>> -    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
>>>>> -                       0, virtio_queue_get_desc_size(vdev, idx));
>>>>> +    /*
>>>>> +     * Since the vhost-user disconnect can happen during initialization
>>>>> +     * check if vring was initialized, before making unmap.
>>>>> +     */
>>>>> +    if (vq->used) {
>>>>> +        vhost_memory_unmap(dev, vq->used,
>>>>> +                           virtio_queue_get_used_size(vdev, idx),
>>>>> +                           1, virtio_queue_get_used_size(vdev, idx));
>>>>> +    }
>>>>> +    if (vq->avail) {
>>>>> +        vhost_memory_unmap(dev, vq->avail,
>>>>> +                           virtio_queue_get_avail_size(vdev, idx),
>>>>> +                           0, virtio_queue_get_avail_size(vdev, idx));
>>>>> +    }
>>>>> +    if (vq->desc) {
>>>>> +        vhost_memory_unmap(dev, vq->desc,
>>>>> +                           virtio_queue_get_desc_size(vdev, idx),
>>>>> +                           0, virtio_queue_get_desc_size(vdev, idx));
>>>>> +    }
>>>> Any reason for not checking hdev->started instead? vhost_dev_start()
>>>> will set it to true if the virtqueues were correctly mapped.
>>>>
>>>> Thanks
>>> Well, I see it a little bit differently:
>>>   - vhost_dev_start() sets hdev->started to true before starting
>>>     virtqueues
>>>   - vhost_virtqueue_start() maps all the memory
>>> If we hit the vhost disconnect at the start of
>>> vhost_virtqueue_start(), for instance for this call:
>>>    r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
>>> Then we will call vhost_user_blk_disconnect:
>>>    vhost_user_blk_disconnect()->
>>>      vhost_user_blk_stop()->
>>>        vhost_dev_stop()->
>>>          vhost_virtqueue_stop()
>>> As a result we will come into this routine with hdev->started still
>>> set to true, but with the used/avail/desc fields still uninitialized
>>> and set to 0.
>>
>> I may be missing something, but consider that both vhost_dev_start() and
>> vhost_user_blk_disconnect() are serialized in the main loop. Can this
>> really happen?
> Yes, consider the case when we start the vhost-user-blk device:
>    vhost_dev_start->
>      vhost_virtqueue_start
> And we get a disconnect in the middle of the vhost_virtqueue_start()
> routine, for instance:
>    1000     vq->num = state.num = virtio_queue_get_num(vdev, idx);
>    1001     r = dev->vhost_ops->vhost_set_vring_num(dev, &state);
>    1002     if (r) {
>    1003         VHOST_OPS_DEBUG("vhost_set_vring_num failed");
>    1004         return -errno;
>    1005     }
>    --> Here we got a disconnect <--
>    1006
>    1007     state.num = virtio_queue_get_last_avail_idx(vdev, idx);
>    1008     r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
>    1009     if (r) {
>    1010         VHOST_OPS_DEBUG("vhost_set_vring_base failed");
>    1011         return -errno;
>    1012     }
> As a result, the call to vhost_set_vring_base will invoke the disconnect
> routine. The backtrace log for the SIGSEGV is as follows:
>    Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>    [Switching to Thread 0x7ffff2ea9700 (LWP 183150)]
>    0x00007ffff4d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>    (gdb) bt
>    #0  0x00007ffff4d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>    #1  0x000055555590fd90 in flatview_write_continue (fv=0x7fffec4a2600,
>        addr=0, attrs=..., ptr=0x0, len=1028, addr1=0,
>        l=1028, mr=0x555556b1b310) at ./exec.c:3142
>    #2  0x000055555590fe98 in flatview_write (fv=0x7fffec4a2600, addr=0,
>        attrs=..., buf=0x0, len=1028) at ./exec.c:3177
>    #3  0x00005555559101ed in address_space_write (as=0x555556893940
>        <address_space_memory>, addr=0, attrs=..., buf=0x0,
>        len=1028) at ./exec.c:3268
>    #4  0x0000555555910caf in address_space_unmap (as=0x555556893940
>        <address_space_memory>, buffer=0x0, len=1028,
>        is_write=true, access_len=1028) at ./exec.c:3592
>    #5  0x0000555555910d82 in cpu_physical_memory_unmap (buffer=0x0,
>        len=1028, is_write=true, access_len=1028) at ./exec.c:3613
>    #6  0x0000555555a16fa1 in vhost_memory_unmap (dev=0x7ffff22723e8,
>        buffer=0x0, len=1028, is_write=1, access_len=1028)
>        at ./hw/virtio/vhost.c:318
>    #7  0x0000555555a192a2 in vhost_virtqueue_stop (dev=0x7ffff22723e8,
>        vdev=0x7ffff22721a0, vq=0x55555770abf0, idx=0) at ./hw/virtio/vhost.c:1136
>    #8  0x0000555555a1acc0 in vhost_dev_stop (hdev=0x7ffff22723e8,
>        vdev=0x7ffff22721a0) at ./hw/virtio/vhost.c:1702
>    #9  0x00005555559b6532 in vhost_user_blk_stop (vdev=0x7ffff22721a0)
>        at ./hw/block/vhost-user-blk.c:196
>    #10 0x00005555559b6b73 in vhost_user_blk_disconnect (dev=0x7ffff22721a0)
>        at ./hw/block/vhost-user-blk.c:365
>    #11 0x00005555559b6c4f in vhost_user_blk_event (opaque=0x7ffff22721a0,
>        event=CHR_EVENT_CLOSED) at ./hw/block/vhost-user-blk.c:384
>    #12 0x0000555555e65f7e in chr_be_event (s=0x555556b182e0, event=CHR_EVENT_CLOSED)
>        at chardev/char.c:60
>    #13 0x0000555555e6601a in qemu_chr_be_event (s=0x555556b182e0,
>        event=CHR_EVENT_CLOSED) at chardev/char.c:80
>    #14 0x0000555555e6eef3 in tcp_chr_disconnect_locked (chr=0x555556b182e0)
>        at chardev/char-socket.c:488
>    #15 0x0000555555e6e23f in tcp_chr_write (chr=0x555556b182e0,
>        buf=0x7ffff2ea8220 "\n", len=20) at chardev/char-socket.c:178
>    #16 0x0000555555e6616c in qemu_chr_write_buffer (s=0x555556b182e0,
>        buf=0x7ffff2ea8220 "\n", len=20, offset=0x7ffff2ea8150, write_all=true)
>        at chardev/char.c:120
>    #17 0x0000555555e662d9 in qemu_chr_write (s=0x555556b182e0, buf=0x7ffff2ea8220 "\n",
>        len=20, write_all=true) at chardev/char.c:155
>    #18 0x0000555555e693cc in qemu_chr_fe_write_all (be=0x7ffff2272360,
>        buf=0x7ffff2ea8220 "\n", len=20) at chardev/char-fe.c:53
>    #19 0x0000555555a1c489 in vhost_user_write (dev=0x7ffff22723e8,
>        msg=0x7ffff2ea8220, fds=0x0, fd_num=0) at ./hw/virtio/vhost-user.c:350
>    #20 0x0000555555a1d325 in vhost_set_vring (dev=0x7ffff22723e8,
>        request=10, ring=0x7ffff2ea8520) at ./hw/virtio/vhost-user.c:660
>    #21 0x0000555555a1d4c6 in vhost_user_set_vring_base (dev=0x7ffff22723e8,
>        ring=0x7ffff2ea8520) at ./hw/virtio/vhost-user.c:704
>    #22 0x0000555555a18c1b in vhost_virtqueue_start (dev=0x7ffff22723e8,
>        vdev=0x7ffff22721a0, vq=0x55555770abf0, idx=0) at ./hw/virtio/vhost.c:1009
>    #23 0x0000555555a1a9f5 in vhost_dev_start (hdev=0x7ffff22723e8, vdev=0x7ffff22721a0)
>        at ./hw/virtio/vhost.c:1639
>    #24 0x00005555559b6367 in vhost_user_blk_start (vdev=0x7ffff22721a0)
>        at ./hw/block/vhost-user-blk.c:150
>    #25 0x00005555559b6653 in vhost_user_blk_set_status (vdev=0x7ffff22721a0, status=15 '\017')
>        at ./hw/block/vhost-user-blk.c:233
>    #26 0x0000555555a1072d in virtio_set_status (vdev=0x7ffff22721a0, val=15 '\017')
>        at ./hw/virtio/virtio.c:1956
> ---Type <return> to continue, or q <return> to quit---
>
> So while we are inside vhost_user_blk_start() (frame #24) we end up
> calling vhost_user_blk_disconnect() (frame #10). And during this call
> the dev->started field is still set to true.
>    (gdb) frame 8
>    #8  0x0000555555a1acc0 in vhost_dev_stop (hdev=0x7ffff22723e8, vdev=0x7ffff22721a0)
>        at ./hw/virtio/vhost.c:1702
>    1702            vhost_virtqueue_stop(hdev,
>    (gdb) p hdev->started
>    $1 = true
>
> It isn't an easy race to reproduce: the disconnect has to hit exactly
> this window. We were able to hit it during long testing on one of the
> reconnect iterations. After that we were able to reproduce it 100% of
> the time by adding a sleep() call inside qemu to make the race window
> bigger.


Thanks for the patience.

I missed the fact that the disconnection routine could be triggered from
a chardev write.

But the code turns out to be very tricky and hard to debug, since it
was written to deal with the error returned from vhost_ops directly;
it doesn't expect vhost_dev_cleanup() to be called silently during each
vhost_user_write(). It would introduce trouble if we want to add new
code/operations to vhost.

More questions:

- Do we need to have some checking against hdev->started in each
vhost-user op?
- Do we need to reset vq->avail/vq->desc/vq->used in vhost_dev_stop()?
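
For the first option, a minimal sketch could look like this (the helper
name and the -ENOTCONN value are hypothetical, just to illustrate the
idea):

    /* hypothetical common guard, called at the top of each vhost-user
     * op; the backend may have been torn down by a disconnect that was
     * triggered inside an earlier vhost_user_write() */
    static inline int vhost_user_check_started(struct vhost_dev *dev)
    {
        return dev->started ? 0 : -ENOTCONN;
    }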

Thanks


>
>> Thanks
>>
>>
>>>>>   }
>>>>>   static void vhost_eventfd_add(MemoryListener *listener,




* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-12  9:35         ` Dima Stepanov
@ 2020-05-13  3:20           ` Jason Wang
  2020-05-13  9:39             ` Dima Stepanov
  2020-05-13  4:15           ` Michael S. Tsirkin
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Wang @ 2020-05-13  3:20 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, stefanha, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, fengli, yc-core, pbonzini, marcandre.lureau,
	raphael.norwitz, mreitz


On 2020/5/12 5:35 PM, Dima Stepanov wrote:
> On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote:
>> On 2020/5/11 5:25 PM, Dima Stepanov wrote:
>>> On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
>>>> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
>>>>> If vhost-user daemon is used as a backend for the vhost device, then we
>>>>> should consider a possibility of disconnect at any moment. If such a
>>>>> disconnect happens in the vhost_migration_log() routine, the vhost
>>>>> device structure will be cleaned up.
>>>>> At the start of the vhost_migration_log() function there is a check:
>>>>>    if (!dev->started) {
>>>>>        dev->log_enabled = enable;
>>>>>        return 0;
>>>>>    }
>>>>> To be consistent with this check, add the same check after calling the
>>>>> vhost_dev_set_log() routine. In general this helps not to break a
>>>>> migration due to the assert() message. But it looks like this code
>>>>> should be revised to handle these errors more carefully.
>>>>>
>>>>> In the case of a vhost-user device backend, the fail paths should
>>>>> consider the state of the device. In this case we should skip some
>>>>> function calls during rollback on the error paths, so as not to get
>>>>> NULL dereference errors.
>>>>>
>>>>> Signed-off-by: Dima Stepanov<dimastep@yandex-team.ru>
>>>>> ---
>>>>>   hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
>>>>>   1 file changed, 35 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>> index 3ee50c4..d5ab96d 100644
>>>>> --- a/hw/virtio/vhost.c
>>>>> +++ b/hw/virtio/vhost.c
>>>>> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>>>>>   static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>>>>>   {
>>>>>       int r, i, idx;
>>>>> +
>>>>> +    if (!dev->started) {
>>>>> +        /*
>>>>> +         * If vhost-user daemon is used as a backend for the
>>>>> +         * device and the connection is broken, then the vhost_dev
>>>>> +         * structure will be reset all its values to 0.
>>>>> +         * Add additional check for the device state.
>>>>> +         */
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>>       r = vhost_dev_set_features(dev, enable_log);
>>>>>       if (r < 0) {
>>>>>           goto err_features;
>>>>> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>>>>>       }
>>>>>       return 0;
>>>>>   err_vq:
>>>>> -    for (; i >= 0; --i) {
>>>>> +    /*
>>>>> +     * Disconnect with the vhost-user daemon can lead to the
>>>>> +     * vhost_dev_cleanup() call which will clean up vhost_dev
>>>>> +     * structure.
>>>>> +     */
>>>>> +    for (; dev->started && (i >= 0); --i) {
>>>>>           idx = dev->vhost_ops->vhost_get_vq_index(
>>>> Why do we need the check of dev->started here? Can started be modified
>>>> outside the mainloop? If yes, I don't get the check of !dev->started at
>>>> the beginning of this function.
>>>>
>>> No, dev->started can't change outside the mainloop. The main problem is
>>> only for the vhost_user_blk daemon. Consider the case when we
>>> successfully pass the dev->started check at the beginning of the
>>> function, but after it we hit the disconnect on the next call, on the
>>> second or third iteration:
>>>       r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
>>> The unix socket backend device will call the disconnect routine for this
>>> device and reset the structure. So the structure will be reset (and
>>> dev->started set to false) inside this set_addr() call.
>> I still don't get it. I think the disconnect can not happen in the middle
>> of vhost_dev_set_log() since both of them run in the mainloop. And even
>> if it can, we probably need a synchronization mechanism other than a
>> simple check here.
> The disconnect doesn't happen in a separate thread, it happens in this
> routine, inside vhost_dev_set_log. When, for instance, the
> vhost_user_write() call fails:
>    vhost_user_set_log_base()
>      vhost_user_write()
>        vhost_user_blk_disconnect()
>          vhost_dev_cleanup()
>            vhost_user_backend_cleanup()
> So the point is that if we somehow get a disconnect from the
> vhost-user-blk daemon before the vhost_user_write() call, then it will
> continue cleaning up by running the vhost_user_blk_disconnect()
> function. I wrote a more detailed backtrace stack in the separate
> thread, which is pretty similar to what we have here:
>    Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
> The places are different but the problem is pretty similar.


Yes.


>
> So if the vhost-user command handshake succeeds then everything is fine
> and reconnect will work as expected. The only problem is how to handle
> reconnect properly between a vhost-user command send/receive.
>
> As I wrote, we have a test:
>    - run the src VM with a vhost-user-blk daemon attached
>    - run fio inside it
>    - perform a reconnect every X seconds (just kill and restart the
>      daemon), X is random
>    - run the dst VM
>    - perform migration
>    - fio should complete in the dst VM
> And we cycle this test basically forever.
> It fails once per ~25 iterations. By adding some delays inside qemu we
> were able to make the race window larger.


It would be better if we could draft a qtest for this.

Thanks


>




* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-12  9:35         ` Dima Stepanov
  2020-05-13  3:20           ` Jason Wang
@ 2020-05-13  4:15           ` Michael S. Tsirkin
  2020-05-13  5:56             ` Jason Wang
  1 sibling, 1 reply; 51+ messages in thread
From: Michael S. Tsirkin @ 2020-05-13  4:15 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, stefanha, qemu-block, Jason Wang, qemu-devel,
	dgilbert, arei.gonglei, fengli, yc-core, pbonzini,
	marcandre.lureau, raphael.norwitz, mreitz

On Tue, May 12, 2020 at 12:35:30PM +0300, Dima Stepanov wrote:
> On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote:
> > 
> > On 2020/5/11 5:25 PM, Dima Stepanov wrote:
> > >On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
> > >>On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> > >>>If vhost-user daemon is used as a backend for the vhost device, then we
> > >>>should consider a possibility of disconnect at any moment. If such a
> > >>>disconnect happens in the vhost_migration_log() routine, the vhost
> > >>>device structure will be cleaned up.
> > >>>At the start of the vhost_migration_log() function there is a check:
> > >>>   if (!dev->started) {
> > >>>       dev->log_enabled = enable;
> > >>>       return 0;
> > >>>   }
> > >>>To be consistent with this check, add the same check after calling the
> > >>>vhost_dev_set_log() routine. In general this helps not to break a
> > >>>migration due to the assert() message. But it looks like this code
> > >>>should be revised to handle these errors more carefully.
> > >>>
> > >>>In the case of a vhost-user device backend, the fail paths should
> > >>>consider the state of the device. In this case we should skip some
> > >>>function calls during rollback on the error paths, so as not to get
> > >>>NULL dereference errors.
> > >>>
> > >>>Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> > >>>---
> > >>>  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> > >>>  1 file changed, 35 insertions(+), 4 deletions(-)
> > >>>
> > >>>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > >>>index 3ee50c4..d5ab96d 100644
> > >>>--- a/hw/virtio/vhost.c
> > >>>+++ b/hw/virtio/vhost.c
> > >>>@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> > >>>  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> > >>>  {
> > >>>      int r, i, idx;
> > >>>+
> > >>>+    if (!dev->started) {
> > >>>+        /*
> > >>>+         * If vhost-user daemon is used as a backend for the
> > >>>+         * device and the connection is broken, then the vhost_dev
> > >>>+         * structure will be reset all its values to 0.
> > >>>+         * Add additional check for the device state.
> > >>>+         */
> > >>>+        return -1;
> > >>>+    }
> > >>>+
> > >>>      r = vhost_dev_set_features(dev, enable_log);
> > >>>      if (r < 0) {
> > >>>          goto err_features;
> > >>>@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> > >>>      }
> > >>>      return 0;
> > >>>  err_vq:
> > >>>-    for (; i >= 0; --i) {
> > >>>+    /*
> > >>>+     * Disconnect with the vhost-user daemon can lead to the
> > >>>+     * vhost_dev_cleanup() call which will clean up vhost_dev
> > >>>+     * structure.
> > >>>+     */
> > >>>+    for (; dev->started && (i >= 0); --i) {
> > >>>          idx = dev->vhost_ops->vhost_get_vq_index(
> > >>
> > >>Why do we need the check of dev->started here? Can started be modified
> > >>outside the mainloop? If yes, I don't get the check of !dev->started at
> > >>the beginning of this function.
> > >>
> > >No, dev->started can't change outside the mainloop. The main problem is
> > >only for the vhost_user_blk daemon. Consider the case when we
> > >successfully pass the dev->started check at the beginning of the
> > >function, but after it we hit the disconnect on the next call, on the
> > >second or third iteration:
> > >      r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
> > >The unix socket backend device will call the disconnect routine for this
> > >device and reset the structure. So the structure will be reset (and
> > >dev->started set to false) inside this set_addr() call.
> > 
> > 
> > I still don't get it. I think the disconnect can not happen in the middle
> > of vhost_dev_set_log() since both of them run in the mainloop. And even
> > if it can, we probably need a synchronization mechanism other than a
> > simple check here.
> The disconnect doesn't happen in a separate thread, it happens in this
> routine, inside vhost_dev_set_log. When, for instance, the
> vhost_user_write() call fails:
>   vhost_user_set_log_base()
>     vhost_user_write()
>       vhost_user_blk_disconnect()
>         vhost_dev_cleanup()
>           vhost_user_backend_cleanup()
> So the point is that if we somehow get a disconnect from the
> vhost-user-blk daemon before the vhost_user_write() call, then it will
> continue cleaning up by running the vhost_user_blk_disconnect()
> function. I wrote a more detailed backtrace stack in the separate
> thread, which is pretty similar to what we have here:
>   Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
> The places are different but the problem is pretty similar.
> 
> So if the vhost-user command handshake succeeds then everything is fine
> and reconnect will work as expected. The only problem is how to handle
> reconnect properly between a vhost-user command send/receive.



So vhost net had this problem too.

commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Date:   Mon Feb 27 14:49:56 2017 +0400

    vhost-user: delay vhost_user_stop
    
    Since commit b0a335e351103bf92f3f9d0bd5759311be8156ac, a socket write
    may trigger a disconnect events, calling vhost_user_stop() and clearing
    all the vhost_dev strutures holding data that vhost.c functions expect
    to remain valid. Delay the cleanup to keep the vhost_dev structure
    valid during the vhost.c functions.
    
    Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-id: 20170227104956.24729-1-marcandre.lureau@redhat.com
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

it now has this code to address this:


    case CHR_EVENT_CLOSED:
        /* a close event may happen during a read/write, but vhost
         * code assumes the vhost_dev remains setup, so delay the
         * stop & clear to idle.
         * FIXME: better handle failure in vhost code, remove bh
         */
        if (s->watch) {
            AioContext *ctx = qemu_get_current_aio_context();

            g_source_remove(s->watch);
            s->watch = 0;
            qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
                                     NULL, NULL, false);

            aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
        }
        break;

I think it's time we dropped the FIXME and moved the handling to common
code. Jason? Marc-André?
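
As a rough sketch, the common variant for vhost-user-blk could mirror the
vhost-net workaround above (names like s->chardev and
vhost_user_blk_chr_closed_bh are hypothetical here):

    case CHR_EVENT_CLOSED:
        /* Defer the stop & cleanup to a bottom half so that the
         * vhost_dev stays valid for the rest of the current
         * vhost_user_read()/vhost_user_write() call chain. */
        qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
                                 NULL, NULL, false);
        aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                vhost_user_blk_chr_closed_bh, opaque);
        break;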





> As I wrote, we have a test:
>   - run the src VM with a vhost-user-blk daemon attached
>   - run fio inside it
>   - perform a reconnect every X seconds (just kill and restart the
>     daemon), X is random
>   - run the dst VM
>   - perform migration
>   - fio should complete in the dst VM
> And we cycle this test basically forever.
> It fails once per ~25 iterations. By adding some delays inside qemu we
> were able to make the race window larger.
> 
> > 
> > 
> > >  So
> > >we shouldn't make the cleanup calls, because these virtqueues were
> > >cleaned up in the disconnect call. But we should protect these calls
> > >somehow, so we will not hit SIGSEGV and we will be able to pass
> > >migration.
> > >
> > >Just to summarize it:
> > >For the vhost-user-blk devices we can hit cleanup calls twice in case
> > >of vhost disconnect:
> > >1. The first time during the disconnect process. The cleanup is called
> > >inside it.
> > >2. The second time during the rollback cleanup.
> > >So if that is the case, we should skip step 2.
> > >
> > >>>dev, dev->vq_index + i);
> > >>>          vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> > >>>                                   dev->log_enabled);
> > >>>      }
> > >>>-    vhost_dev_set_features(dev, dev->log_enabled);
> > >>>+    if (dev->started) {
> > >>>+        vhost_dev_set_features(dev, dev->log_enabled);
> > >>>+    }
> > >>>  err_features:
> > >>>      return r;
> > >>>  }
> > >>>@@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable)
> > >>>      } else {
> > >>>          vhost_dev_log_resize(dev, vhost_get_log_size(dev));
> > >>>          r = vhost_dev_set_log(dev, true);
> > >>>-        if (r < 0) {
> > >>>+        /*
> > >>>+         * The dev log resize can fail, because of disconnect
> > >>>+         * with the vhost-user-blk daemon. Check the device
> > >>>+         * state before calling the vhost_dev_set_log()
> > >>>+         * function.
> > >>>+         * Don't return error if device isn't started to be
> > >>>+         * consistent with the check above.
> > >>>+         */
> > >>>+        if (dev->started && r < 0) {
> > >>>              return r;
> > >>>          }
> > >>>      }
> > >>>@@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> > >>>  fail_log:
> > >>>      vhost_log_put(hdev, false);
> > >>>  fail_vq:
> > >>>-    while (--i >= 0) {
> > >>>+    /*
> > >>>+     * Disconnect with the vhost-user daemon can lead to the
> > >>>+     * vhost_dev_cleanup() call which will clean up vhost_dev
> > >>>+     * structure.
> > >>>+     */
> > >>>+    while ((--i >= 0) && (hdev->started)) {
> > >>>          vhost_virtqueue_stop(hdev,
> > >>>                               vdev,
> > >>>                               hdev->vqs + i,
> > >>
> > >>This should be a separate patch.
> > >Do you mean I should split this patch into two patches?
> > 
> > 
> > Yes.
> > 
> > Thanks
> 
> Got it. Will do it in v3.
> 
> No other comments mixed in below.
> 
> Thanks.
> 
> > 
> > 
> > >
> > >Thanks.
> > >
> > >>Thanks
> > >>
> > 




* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-13  4:15           ` Michael S. Tsirkin
@ 2020-05-13  5:56             ` Jason Wang
  2020-05-13  9:47               ` Dima Stepanov
  2020-05-19  9:13               ` Dima Stepanov
  0 siblings, 2 replies; 51+ messages in thread
From: Jason Wang @ 2020-05-13  5:56 UTC (permalink / raw)
  To: Michael S. Tsirkin, Dima Stepanov
  Cc: fam, kwolf, stefanha, qemu-block, dgilbert, qemu-devel,
	arei.gonglei, fengli, yc-core, pbonzini, marcandre.lureau,
	raphael.norwitz, mreitz


On 2020/5/13 下午12:15, Michael S. Tsirkin wrote:
> On Tue, May 12, 2020 at 12:35:30PM +0300, Dima Stepanov wrote:
>> On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote:
>>> On 2020/5/11 下午5:25, Dima Stepanov wrote:
>>>> On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
>>>>> On 2020/4/30 下午9:36, Dima Stepanov wrote:
>>>>>> If vhost-user daemon is used as a backend for the vhost device, then we
>>>>>> should consider a possibility of disconnect at any moment. If such
>>>>>> disconnect happened in the vhost_migration_log() routine the vhost
>>>>>> device structure will be cleaned up.
>>>>>> At the start of the vhost_migration_log() function there is a check:
>>>>>>    if (!dev->started) {
>>>>>>        dev->log_enabled = enable;
>>>>>>        return 0;
>>>>>>    }
>>>>>> To be consistent with this check add the same check after calling the
>>>>>> vhost_dev_set_log() routine. This in general helps not to break a
>>>>>> migration due to the assert() message. But it looks like this code
>>>>>> should be revised to handle these errors more carefully.
>>>>>>
>>>>>> In case of vhost-user device backend the fail paths should consider the
>>>>>> state of the device. In this case we should skip some function calls
>>>>>> during rollback on the error paths, so as not to get NULL dereference
>>>>>> errors.
>>>>>>
>>>>>> Signed-off-by: Dima Stepanov<dimastep@yandex-team.ru>
>>>>>> ---
>>>>>>   hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
>>>>>>   1 file changed, 35 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>>> index 3ee50c4..d5ab96d 100644
>>>>>> --- a/hw/virtio/vhost.c
>>>>>> +++ b/hw/virtio/vhost.c
>>>>>> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>>>>>>   static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>>>>>>   {
>>>>>>       int r, i, idx;
>>>>>> +
>>>>>> +    if (!dev->started) {
>>>>>> +        /*
>>>>>> +         * If vhost-user daemon is used as a backend for the
>>>>>> +         * device and the connection is broken, then the vhost_dev
>>>>>> +         * structure will have all its values reset to 0.
>>>>>> +         * Add additional check for the device state.
>>>>>> +         */
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>>       r = vhost_dev_set_features(dev, enable_log);
>>>>>>       if (r < 0) {
>>>>>>           goto err_features;
>>>>>> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>>>>>>       }
>>>>>>       return 0;
>>>>>>   err_vq:
>>>>>> -    for (; i >= 0; --i) {
>>>>>> +    /*
>>>>>> +     * Disconnect with the vhost-user daemon can lead to the
>>>>>> +     * vhost_dev_cleanup() call which will clean up vhost_dev
>>>>>> +     * structure.
>>>>>> +     */
>>>>>> +    for (; dev->started && (i >= 0); --i) {
>>>>>>           idx = dev->vhost_ops->vhost_get_vq_index(
>>>>> Why do we need the check of dev->started here? Can started be modified
>>>>> outside the mainloop? If yes, I don't get the check of !dev->started at
>>>>> the beginning of this function.
>>>>>
>>>> No, dev->started can't change outside the mainloop. The main problem is
>>>> only for the vhost_user_blk daemon. Consider the case when we
>>>> successfully pass the dev->started check at the beginning of the
>>>> function, but after it we hit the disconnect on the next call on the
>>>> second or third iteration:
>>>>       r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
>>>> The unix socket backend device will call the disconnect routine for this
>>>> device and reset the structure. So the structure will be reset (and
>>>> dev->started set to false) inside this set_addr() call.
>>> I still don't get it here. I think the disconnect cannot happen in the
>>> middle of vhost_dev_set_log() since both of them run in the mainloop. And
>>> even if it can, we probably need a synchronization mechanism other than a
>>> simple check here.
>> The disconnect doesn't happen in a separate thread; it happens in this
>> routine, inside vhost_dev_set_log(). For instance, when the vhost_user_write()
>> call fails:
>>    vhost_user_set_log_base()
>>      vhost_user_write()
>>        vhost_user_blk_disconnect()
>>          vhost_dev_cleanup()
>>            vhost_user_backend_cleanup()
>> So the point is that if we somehow get a disconnect from the
>> vhost-user-blk daemon before the vhost_user_write() call, then it will
>> continue the clean up by running the vhost_user_blk_disconnect() function. I
>> wrote a more detailed backtrace stack in a separate thread, which is
>> pretty similar to what we have here:
>>    Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
>> The places are different but the problem is pretty similar.
>>
>> So if the vhost-user command handshake succeeds then everything is fine and
>> reconnect will work as expected. The only problem is how to handle a
>> reconnect properly between a vhost-user command send/receive.
>
> So vhost net had this problem too.
>
> commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
> Author: Marc-André Lureau<marcandre.lureau@redhat.com>
> Date:   Mon Feb 27 14:49:56 2017 +0400
>
>      vhost-user: delay vhost_user_stop
>      
>      Since commit b0a335e351103bf92f3f9d0bd5759311be8156ac, a socket write
>      may trigger disconnect events, calling vhost_user_stop() and clearing
>      all the vhost_dev structures holding data that vhost.c functions expect
>      to remain valid. Delay the cleanup to keep the vhost_dev structure
>      valid during the vhost.c functions.
>      
>      Signed-off-by: Marc-André Lureau<marcandre.lureau@redhat.com>
>      Message-id:20170227104956.24729-1-marcandre.lureau@redhat.com
>      Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
>
> it now has this code to address this:
>
>
>      case CHR_EVENT_CLOSED:
>          /* a close event may happen during a read/write, but vhost
>           * code assumes the vhost_dev remains setup, so delay the
>           * stop & clear to idle.
>           * FIXME: better handle failure in vhost code, remove bh
>           */
>          if (s->watch) {
>              AioContext *ctx = qemu_get_current_aio_context();
>
>              g_source_remove(s->watch);
>              s->watch = 0;
>              qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
>                                       NULL, NULL, false);
>
>              aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
>          }
>          break;
>
> I think it's time we dropped the FIXME and moved the handling to common
> code. Jason? Marc-André?


I agree. Just to confirm, do you prefer a bh or doing changes like what is
done in this series? It looks to me the bh approach gives simpler code.
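
To illustrate, a minimal sketch of what such common code could look
like (the vhost_user_async_close() helper and the VhostAsyncCloseData
type are assumptions for this sketch, not an existing API):

    /* Deferred close, shared by vhost-user devices instead of the
     * per-device FIXME above: keep vhost_dev valid until idle. */
    typedef struct VhostAsyncCloseData {
        DeviceState *dev;
        void (*cb)(DeviceState *dev);  /* e.g. vhost_user_blk_disconnect */
    } VhostAsyncCloseData;

    static void vhost_user_async_close_bh(void *opaque)
    {
        VhostAsyncCloseData *data = opaque;

        data->cb(data->dev);
        g_free(data);
    }

    void vhost_user_async_close(DeviceState *d, CharBackend *chardev,
                                void (*cb)(DeviceState *dev))
    {
        VhostAsyncCloseData *data = g_new0(VhostAsyncCloseData, 1);

        data->dev = d;
        data->cb = cb;
        /* stop chardev events so nothing fires while the teardown
         * is pending in the bottom half */
        qemu_chr_fe_set_handlers(chardev, NULL, NULL, NULL, NULL,
                                 NULL, NULL, false);
        aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                vhost_user_async_close_bh, data);
    }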

Thanks


>
>
>
>
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
  2020-05-13  3:00           ` Jason Wang
@ 2020-05-13  9:36             ` Dima Stepanov
  2020-05-14  7:28               ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Dima Stepanov @ 2020-05-13  9:36 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, yc-core, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, raphael.norwitz, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz

On Wed, May 13, 2020 at 11:00:38AM +0800, Jason Wang wrote:
> 
> On 2020/5/12 下午5:08, Dima Stepanov wrote:
> >On Tue, May 12, 2020 at 11:26:11AM +0800, Jason Wang wrote:
> >>On 2020/5/11 下午5:11, Dima Stepanov wrote:
> >>>On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:
> >>>>On 2020/4/30 下午9:36, Dima Stepanov wrote:
> >>>>>Since disconnect can happen at any time during initialization, not all
> >>>>>vring buffers (for instance the used vring) can be initialized successfully.
> >>>>>If the buffer was not initialized then vhost_memory_unmap call will lead
> >>>>>to SIGSEGV. Add checks for the vring address value before calling unmap.
> >>>>>Also add assert() in the vhost_memory_unmap() routine.
> >>>>>
> >>>>>Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> >>>>>---
> >>>>>  hw/virtio/vhost.c | 27 +++++++++++++++++++++------
> >>>>>  1 file changed, 21 insertions(+), 6 deletions(-)
> >>>>>
> >>>>>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>>>index ddbdc53..3ee50c4 100644
> >>>>>--- a/hw/virtio/vhost.c
> >>>>>+++ b/hw/virtio/vhost.c
> >>>>>@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
> >>>>>                                 hwaddr len, int is_write,
> >>>>>                                 hwaddr access_len)
> >>>>>  {
> >>>>>+    assert(buffer);
> >>>>>+
> >>>>>      if (!vhost_dev_has_iommu(dev)) {
> >>>>>          cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> >>>>>      }
> >>>>>@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
> >>>>>                                                  vhost_vq_index);
> >>>>>      }
> >>>>>-    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
> >>>>>-                       1, virtio_queue_get_used_size(vdev, idx));
> >>>>>-    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
> >>>>>-                       0, virtio_queue_get_avail_size(vdev, idx));
> >>>>>-    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
> >>>>>-                       0, virtio_queue_get_desc_size(vdev, idx));
> >>>>>+    /*
> >>>>>+     * Since the vhost-user disconnect can happen during initialization
> >>>>>+     * check if vring was initialized, before making unmap.
> >>>>>+     */
> >>>>>+    if (vq->used) {
> >>>>>+        vhost_memory_unmap(dev, vq->used,
> >>>>>+                           virtio_queue_get_used_size(vdev, idx),
> >>>>>+                           1, virtio_queue_get_used_size(vdev, idx));
> >>>>>+    }
> >>>>>+    if (vq->avail) {
> >>>>>+        vhost_memory_unmap(dev, vq->avail,
> >>>>>+                           virtio_queue_get_avail_size(vdev, idx),
> >>>>>+                           0, virtio_queue_get_avail_size(vdev, idx));
> >>>>>+    }
> >>>>>+    if (vq->desc) {
> >>>>>+        vhost_memory_unmap(dev, vq->desc,
> >>>>>+                           virtio_queue_get_desc_size(vdev, idx),
> >>>>>+                           0, virtio_queue_get_desc_size(vdev, idx));
> >>>>>+    }
> >>>>Any reason not checking hdev->started instead? vhost_dev_start() will set it
> >>>>to true if virtqueues were correctly mapped.
> >>>>
> >>>>Thanks
> >>>Well, I see it a little bit differently:
> >>>  - vhost_dev_start() sets hdev->started to true before starting
> >>>    virtqueues
> >>>  - vhost_virtqueue_start() maps all the memory
> >>>If we hit the vhost disconnect at the start of the
> >>>vhost_virtqueue_start(), for instance for this call:
> >>>   r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
> >>>Then we will call vhost_user_blk_disconnect:
> >>>   vhost_user_blk_disconnect()->
> >>>     vhost_user_blk_stop()->
> >>>       vhost_dev_stop()->
> >>>         vhost_virtqueue_stop()
> >>>As a result we will come into this routine with hdev->started still
> >>>set to true, but with the used/avail/desc fields still uninitialized and
> >>>set to 0.
> >>
> >>I may be missing something, but consider that both vhost_dev_start() and
> >>vhost_user_blk_disconnect() are serialized in the main loop. Can this really
> >>happen?
> >Yes, consider the case when we start the vhost-user-blk device:
> >   vhost_dev_start->
> >     vhost_virtqueue_start
> >And we got a disconnect in the middle of vhost_virtqueue_start()
> >routine, for instance:
> >   1000     vq->num = state.num = virtio_queue_get_num(vdev, idx);
> >   1001     r = dev->vhost_ops->vhost_set_vring_num(dev, &state);
> >   1002     if (r) {
> >   1003         VHOST_OPS_DEBUG("vhost_set_vring_num failed");
> >   1004         return -errno;
> >   1005     }
> >   --> Here we got a disconnect <--
> >   1006
> >   1007     state.num = virtio_queue_get_last_avail_idx(vdev, idx);
> >   1008     r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
> >   1009     if (r) {
> >   1010         VHOST_OPS_DEBUG("vhost_set_vring_base failed");
> >   1011         return -errno;
> >   1012     }
> >As a result call to vhost_set_vring_base will call the disconnect
> >routine. The backtrace log for SIGSEGV is as follows:
> >   Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> >   [Switching to Thread 0x7ffff2ea9700 (LWP 183150)]
> >   0x00007ffff4d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> >   (gdb) bt
> >   #0  0x00007ffff4d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> >   #1  0x000055555590fd90 in flatview_write_continue (fv=0x7fffec4a2600,
> >       addr=0, attrs=..., ptr=0x0, len=1028, addr1=0,
> >       l=1028, mr=0x555556b1b310) at ./exec.c:3142
> >   #2  0x000055555590fe98 in flatview_write (fv=0x7fffec4a2600, addr=0,
> >       attrs=..., buf=0x0, len=1028) at ./exec.c:3177
> >   #3  0x00005555559101ed in address_space_write (as=0x555556893940
> >       <address_space_memory>, addr=0, attrs=..., buf=0x0,
> >       len=1028) at ./exec.c:3268
> >   #4  0x0000555555910caf in address_space_unmap (as=0x555556893940
> >       <address_space_memory>, buffer=0x0, len=1028,
> >       is_write=true, access_len=1028) at ./exec.c:3592
> >   #5  0x0000555555910d82 in cpu_physical_memory_unmap (buffer=0x0,
> >       len=1028, is_write=true, access_len=1028) at ./exec.c:3613
> >   #6  0x0000555555a16fa1 in vhost_memory_unmap (dev=0x7ffff22723e8,
> >       buffer=0x0, len=1028, is_write=1, access_len=1028)
> >       at ./hw/virtio/vhost.c:318
> >   #7  0x0000555555a192a2 in vhost_virtqueue_stop (dev=0x7ffff22723e8,
> >       vdev=0x7ffff22721a0, vq=0x55555770abf0, idx=0) at ./hw/virtio/vhost.c:1136
> >   #8  0x0000555555a1acc0 in vhost_dev_stop (hdev=0x7ffff22723e8,
> >       vdev=0x7ffff22721a0) at ./hw/virtio/vhost.c:1702
> >   #9  0x00005555559b6532 in vhost_user_blk_stop (vdev=0x7ffff22721a0)
> >       at ./hw/block/vhost-user-blk.c:196
> >   #10 0x00005555559b6b73 in vhost_user_blk_disconnect (dev=0x7ffff22721a0)
> >       at ./hw/block/vhost-user-blk.c:365
> >   #11 0x00005555559b6c4f in vhost_user_blk_event (opaque=0x7ffff22721a0,
> >       event=CHR_EVENT_CLOSED) at ./hw/block/vhost-user-blk.c:384
> >   #12 0x0000555555e65f7e in chr_be_event (s=0x555556b182e0, event=CHR_EVENT_CLOSED)
> >       at chardev/char.c:60
> >   #13 0x0000555555e6601a in qemu_chr_be_event (s=0x555556b182e0,
> >       event=CHR_EVENT_CLOSED) at chardev/char.c:80
> >   #14 0x0000555555e6eef3 in tcp_chr_disconnect_locked (chr=0x555556b182e0)
> >       at chardev/char-socket.c:488
> >   #15 0x0000555555e6e23f in tcp_chr_write (chr=0x555556b182e0,
> >       buf=0x7ffff2ea8220 "\n", len=20) at chardev/char-socket.c:178
> >   #16 0x0000555555e6616c in qemu_chr_write_buffer (s=0x555556b182e0,
> >       buf=0x7ffff2ea8220 "\n", len=20, offset=0x7ffff2ea8150, write_all=true)
> >       at chardev/char.c:120
> >   #17 0x0000555555e662d9 in qemu_chr_write (s=0x555556b182e0, buf=0x7ffff2ea8220 "\n",
> >       len=20, write_all=true) at chardev/char.c:155
> >   #18 0x0000555555e693cc in qemu_chr_fe_write_all (be=0x7ffff2272360,
> >       buf=0x7ffff2ea8220 "\n", len=20) at chardev/char-fe.c:53
> >   #19 0x0000555555a1c489 in vhost_user_write (dev=0x7ffff22723e8,
> >       msg=0x7ffff2ea8220, fds=0x0, fd_num=0) at ./hw/virtio/vhost-user.c:350
> >   #20 0x0000555555a1d325 in vhost_set_vring (dev=0x7ffff22723e8,
> >       request=10, ring=0x7ffff2ea8520) at ./hw/virtio/vhost-user.c:660
> >   #21 0x0000555555a1d4c6 in vhost_user_set_vring_base (dev=0x7ffff22723e8,
> >       ring=0x7ffff2ea8520) at ./hw/virtio/vhost-user.c:704
> >   #22 0x0000555555a18c1b in vhost_virtqueue_start (dev=0x7ffff22723e8,
> >       vdev=0x7ffff22721a0, vq=0x55555770abf0, idx=0) at ./hw/virtio/vhost.c:1009
> >   #23 0x0000555555a1a9f5 in vhost_dev_start (hdev=0x7ffff22723e8, vdev=0x7ffff22721a0)
> >       at ./hw/virtio/vhost.c:1639
> >   #24 0x00005555559b6367 in vhost_user_blk_start (vdev=0x7ffff22721a0)
> >       at ./hw/block/vhost-user-blk.c:150
> >   #25 0x00005555559b6653 in vhost_user_blk_set_status (vdev=0x7ffff22721a0, status=15 '\017')
> >       at ./hw/block/vhost-user-blk.c:233
> >   #26 0x0000555555a1072d in virtio_set_status (vdev=0x7ffff22721a0, val=15 '\017')
> >       at ./hw/virtio/virtio.c:1956
> >---Type <return> to continue, or q <return> to quit---
> >
> >So while we are inside vhost_user_blk_start() (frame #24) we are calling
> >vhost_user_blk_disconnect() (frame #10). And for this call the
> >dev->started field will be set to true.
> >   (gdb) frame 8
> >   #8  0x0000555555a1acc0 in vhost_dev_stop (hdev=0x7ffff22723e8, vdev=0x7ffff22721a0)
> >       at ./hw/virtio/vhost.c:1702
> >   1702            vhost_virtqueue_stop(hdev,
> >   (gdb) p hdev->started
> >   $1 = true
> >
> >It isn't an easy race to reproduce: hitting a disconnect in this window.
> >We were able to hit it during long testing on one of the reconnect
> >iterations. After that we were able to reproduce it 100% by adding a
> >sleep() call inside qemu to make the race window bigger.
> 
> 
> Thanks for the patience.
> 
> I missed the fact that the disconnection routine could be triggered from
> a chardev write.
> 
> But the code turns out to be very tricky and hard to debug, since it
> was written to deal with the error returned from vhost_ops directly; it
> doesn't expect vhost_dev_cleanup() to be called silently for each
> vhost_user_write(). It would introduce trouble if we want to add new
> code/operations to vhost.
> 
> More questions:
> 
Well these are good questions.

> - Do we need to have some checking against hdev->started in each vhost user
> ops?
I may be missing something, but it looks like no. The vhost_dev_set_log()
routine has some additional logic with vhost_virtqueue_set_addr()
and vhost_dev_set_features(); that is why we need those additional checks in
the migration code. For the vhost-user device initialization or
deinitialization code we don't need those checks. Even more, we couldn't
add this check to the vhost-user ops, since the ops themselves will be reset.
And I agree that even if we have no issues right now, by adding new code
and missing these cases we could break something, because the review process
becomes harder.

Maybe a good idea is to postpone the clean up if we are in the middle of the
initialization. So it could be something like (a rough sketch follows):
  - Put ourselves in the INITIALIZATION state
  - Start these vhost-user "handshake" commands
  - If we get a disconnect error, perform the disconnect, but don't clean up
    the device (it will be cleaned up on the roll back). It can be done by
    checking the state in the vhost_user_..._disconnect routine or something
    like it
  - The vhost-user command returns the error back to the _start() routine
  - Roll back in one place in the _start() routine, by calling this
    postponed clean up for the disconnect
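
A rough sketch of that idea on the vhost-user-blk side (the
"connecting" flag is hypothetical and would have to be added to
VHostUserBlk; only the s->connected field exists today):

    static void vhost_user_blk_disconnect(DeviceState *dev)
    {
        VHostUserBlk *s = VHOST_USER_BLK(dev);

        if (!s->connected) {
            return;
        }
        s->connected = false;

        if (s->connecting) {
            /* Disconnect hit in the middle of the init handshake:
             * don't tear vhost_dev down here.  The failed vhost-user
             * command returns an error to vhost_user_blk_start(),
             * which rolls back and cleans up in one place. */
            return;
        }

        vhost_user_blk_stop(VIRTIO_DEVICE(dev));
        vhost_dev_cleanup(&s->dev);
    }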

Michael wrote that there was a similar fix for vhost-net. I can't say for
now whether it could help or not, but I need to take a look at it as well.

> - Do we need to reset vq->avail/vq->desc/vq->used in vhost_dev_stop()?
I think yes. It is initialized in _start(), so it should be reset in
_stop(). Looks correct to me; we just want to be sure that we reset it
correctly by considering the progress in the initialization.
That is why I think this is okay, since we need to be sure that the
clean up routine itself can handle such cases.
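
For illustration, a sketch of folding such a reset into the unmap path
of vhost_virtqueue_stop(), on top of the checks from patch 4/5:

    if (vq->used) {
        vhost_memory_unmap(dev, vq->used,
                           virtio_queue_get_used_size(vdev, idx),
                           1, virtio_queue_get_used_size(vdev, idx));
        vq->used = NULL;   /* mark the ring as torn down */
    }
    if (vq->avail) {
        vhost_memory_unmap(dev, vq->avail,
                           virtio_queue_get_avail_size(vdev, idx),
                           0, virtio_queue_get_avail_size(vdev, idx));
        vq->avail = NULL;
    }
    if (vq->desc) {
        vhost_memory_unmap(dev, vq->desc,
                           virtio_queue_get_desc_size(vdev, idx),
                           0, virtio_queue_get_desc_size(vdev, idx));
        vq->desc = NULL;
    }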

> 
> Thanks
> 
> 
> >
> >>Thanks
> >>
> >>
> >>>>>  }
> >>>>>  static void vhost_eventfd_add(MemoryListener *listener,
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-13  3:20           ` Jason Wang
@ 2020-05-13  9:39             ` Dima Stepanov
  0 siblings, 0 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-13  9:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, stefanha, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, fengli, yc-core, pbonzini, marcandre.lureau,
	raphael.norwitz, mreitz

On Wed, May 13, 2020 at 11:20:50AM +0800, Jason Wang wrote:
> 
> On 2020/5/12 下午5:35, Dima Stepanov wrote:
> >On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote:
> >>On 2020/5/11 下午5:25, Dima Stepanov wrote:
> >>>On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
> >>>>On 2020/4/30 下午9:36, Dima Stepanov wrote:
> >>>>>If vhost-user daemon is used as a backend for the vhost device, then we
> >>>>>should consider a possibility of disconnect at any moment. If such
> >>>>>disconnect happened in the vhost_migration_log() routine the vhost
> >>>>>device structure will be cleaned up.
> >>>>>At the start of the vhost_migration_log() function there is a check:
> >>>>>   if (!dev->started) {
> >>>>>       dev->log_enabled = enable;
> >>>>>       return 0;
> >>>>>   }
> >>>>>To be consistent with this check add the same check after calling the
> >>>>>vhost_dev_set_log() routine. This in general helps not to break a
> >>>>>migration due to the assert() message. But it looks like this code
> >>>>>should be revised to handle these errors more carefully.
> >>>>>
> >>>>>In case of vhost-user device backend the fail paths should consider the
> >>>>>state of the device. In this case we should skip some function calls
> >>>>>during rollback on the error paths, so as not to get NULL dereference
> >>>>>errors.
> >>>>>
> >>>>>Signed-off-by: Dima Stepanov<dimastep@yandex-team.ru>
> >>>>>---
> >>>>>  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> >>>>>  1 file changed, 35 insertions(+), 4 deletions(-)
> >>>>>
> >>>>>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>>>index 3ee50c4..d5ab96d 100644
> >>>>>--- a/hw/virtio/vhost.c
> >>>>>+++ b/hw/virtio/vhost.c
> >>>>>@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> >>>>>  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>>>  {
> >>>>>      int r, i, idx;
> >>>>>+
> >>>>>+    if (!dev->started) {
> >>>>>+        /*
> >>>>>+         * If vhost-user daemon is used as a backend for the
> >>>>>+         * device and the connection is broken, then the vhost_dev
> >>>>>+         * structure will have all its values reset to 0.
> >>>>>+         * Add additional check for the device state.
> >>>>>+         */
> >>>>>+        return -1;
> >>>>>+    }
> >>>>>+
> >>>>>      r = vhost_dev_set_features(dev, enable_log);
> >>>>>      if (r < 0) {
> >>>>>          goto err_features;
> >>>>>@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>>>      }
> >>>>>      return 0;
> >>>>>  err_vq:
> >>>>>-    for (; i >= 0; --i) {
> >>>>>+    /*
> >>>>>+     * Disconnect with the vhost-user daemon can lead to the
> >>>>>+     * vhost_dev_cleanup() call which will clean up vhost_dev
> >>>>>+     * structure.
> >>>>>+     */
> >>>>>+    for (; dev->started && (i >= 0); --i) {
> >>>>>          idx = dev->vhost_ops->vhost_get_vq_index(
> >>>>Why do we need the check of dev->started here? Can started be modified
> >>>>outside the mainloop? If yes, I don't get the check of !dev->started at
> >>>>the beginning of this function.
> >>>>
> >>>No, dev->started can't change outside the mainloop. The main problem is
> >>>only for the vhost_user_blk daemon. Consider the case when we
> >>>successfully pass the dev->started check at the beginning of the
> >>>function, but after it we hit the disconnect on the next call on the
> >>>second or third iteration:
> >>>      r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
> >>>The unix socket backend device will call the disconnect routine for this
> >>>device and reset the structure. So the structure will be reset (and
> >>>dev->started set to false) inside this set_addr() call.
> >>I still don't get it here. I think the disconnect cannot happen in the
> >>middle of vhost_dev_set_log() since both of them run in the mainloop. And
> >>even if it can, we probably need a synchronization mechanism other than a
> >>simple check here.
> >The disconnect doesn't happen in a separate thread; it happens in this
> >routine, inside vhost_dev_set_log(). For instance, when the vhost_user_write()
> >call fails:
> >   vhost_user_set_log_base()
> >     vhost_user_write()
> >       vhost_user_blk_disconnect()
> >         vhost_dev_cleanup()
> >           vhost_user_backend_cleanup()
> >So the point is that if we somehow get a disconnect from the
> >vhost-user-blk daemon before the vhost_user_write() call, then it will
> >continue the clean up by running the vhost_user_blk_disconnect() function. I
> >wrote a more detailed backtrace stack in a separate thread, which is
> >pretty similar to what we have here:
> >   Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
> >The places are different but the problem is pretty similar.
> 
> 
> Yes.
> 
> 
> >
> >So if the vhost-user command handshake succeeds then everything is fine and
> >reconnect will work as expected. The only problem is how to handle a
> >reconnect properly between a vhost-user command send/receive.
> >
> >As I wrote, we have a test:
> >  - run src VM with the vhost-user-blk daemon
> >   - run fio inside it
> >   - perform reconnect every X seconds (just kill and restart daemon),
> >     X is random
> >   - run dst VM
> >   - perform migration
> >   - fio should complete in dst VM
> >And we cycle this test like forever.
> >So it fails once per ~25 iterations. By adding some delays inside qemu we
> >were able to make the race window larger.
> 
> 
> It would be better if we can draft some qtest for this.
Yes, I'm in the process of figuring out how to reproduce it in the
qtest framework instead of our internal one.

> 
> Thanks
> 
> 
> >
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-13  5:56             ` Jason Wang
@ 2020-05-13  9:47               ` Dima Stepanov
  2020-05-14  7:34                 ` Jason Wang
  2020-05-19  9:13               ` Dima Stepanov
  1 sibling, 1 reply; 51+ messages in thread
From: Dima Stepanov @ 2020-05-13  9:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, stefanha, qemu-block, Michael S. Tsirkin, qemu-devel,
	dgilbert, arei.gonglei, fengli, yc-core, pbonzini,
	marcandre.lureau, raphael.norwitz, mreitz

On Wed, May 13, 2020 at 01:56:18PM +0800, Jason Wang wrote:
> 
> On 2020/5/13 下午12:15, Michael S. Tsirkin wrote:
> >On Tue, May 12, 2020 at 12:35:30PM +0300, Dima Stepanov wrote:
> >>On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote:
> >>>On 2020/5/11 下午5:25, Dima Stepanov wrote:
> >>>>On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
> >>>>>On 2020/4/30 下午9:36, Dima Stepanov wrote:
> >>>>>>If vhost-user daemon is used as a backend for the vhost device, then we
> >>>>>>should consider a possibility of disconnect at any moment. If such
> >>>>>>disconnect happened in the vhost_migration_log() routine the vhost
> >>>>>>device structure will be cleaned up.
> >>>>>>At the start of the vhost_migration_log() function there is a check:
> >>>>>>   if (!dev->started) {
> >>>>>>       dev->log_enabled = enable;
> >>>>>>       return 0;
> >>>>>>   }
> >>>>>>To be consistent with this check add the same check after calling the
> >>>>>>vhost_dev_set_log() routine. This in general helps not to break a
> >>>>>>migration due to the assert() message. But it looks like this code
> >>>>>>should be revised to handle these errors more carefully.
> >>>>>>
> >>>>>>In case of vhost-user device backend the fail paths should consider the
> >>>>>>state of the device. In this case we should skip some function calls
> >>>>>>during rollback on the error paths, so as not to get NULL dereference
> >>>>>>errors.
> >>>>>>
> >>>>>>Signed-off-by: Dima Stepanov<dimastep@yandex-team.ru>
> >>>>>>---
> >>>>>>  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> >>>>>>  1 file changed, 35 insertions(+), 4 deletions(-)
> >>>>>>
> >>>>>>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>>>>index 3ee50c4..d5ab96d 100644
> >>>>>>--- a/hw/virtio/vhost.c
> >>>>>>+++ b/hw/virtio/vhost.c
> >>>>>>@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> >>>>>>  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>>>>  {
> >>>>>>      int r, i, idx;
> >>>>>>+
> >>>>>>+    if (!dev->started) {
> >>>>>>+        /*
> >>>>>>+         * If vhost-user daemon is used as a backend for the
> >>>>>>+         * device and the connection is broken, then the vhost_dev
> >>>>>>+         * structure will have all its values reset to 0.
> >>>>>>+         * Add additional check for the device state.
> >>>>>>+         */
> >>>>>>+        return -1;
> >>>>>>+    }
> >>>>>>+
> >>>>>>      r = vhost_dev_set_features(dev, enable_log);
> >>>>>>      if (r < 0) {
> >>>>>>          goto err_features;
> >>>>>>@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>>>>      }
> >>>>>>      return 0;
> >>>>>>  err_vq:
> >>>>>>-    for (; i >= 0; --i) {
> >>>>>>+    /*
> >>>>>>+     * Disconnect with the vhost-user daemon can lead to the
> >>>>>>+     * vhost_dev_cleanup() call which will clean up vhost_dev
> >>>>>>+     * structure.
> >>>>>>+     */
> >>>>>>+    for (; dev->started && (i >= 0); --i) {
> >>>>>>          idx = dev->vhost_ops->vhost_get_vq_index(
> >>>>>Why do we need the check of dev->started here? Can started be modified
> >>>>>outside the mainloop? If yes, I don't get the check of !dev->started at
> >>>>>the beginning of this function.
> >>>>>
> >>>>No, dev->started can't change outside the mainloop. The main problem is
> >>>>only for the vhost_user_blk daemon. Consider the case when we
> >>>>successfully pass the dev->started check at the beginning of the
> >>>>function, but after it we hit the disconnect on the next call on the
> >>>>second or third iteration:
> >>>>      r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
> >>>>The unix socket backend device will call the disconnect routine for this
> >>>>device and reset the structure. So the structure will be reset (and
> >>>>dev->started set to false) inside this set_addr() call.
> >>>I still don't get it here. I think the disconnect cannot happen in the
> >>>middle of vhost_dev_set_log() since both of them run in the mainloop. And
> >>>even if it can, we probably need a synchronization mechanism other than a
> >>>simple check here.
> >>The disconnect doesn't happen in a separate thread; it happens in this
> >>routine, inside vhost_dev_set_log(). For instance, when the vhost_user_write()
> >>call fails:
> >>   vhost_user_set_log_base()
> >>     vhost_user_write()
> >>       vhost_user_blk_disconnect()
> >>         vhost_dev_cleanup()
> >>           vhost_user_backend_cleanup()
> >>So the point is that if we somehow get a disconnect from the
> >>vhost-user-blk daemon before the vhost_user_write() call, then it will
> >>continue the clean up by running the vhost_user_blk_disconnect() function. I
> >>wrote a more detailed backtrace stack in a separate thread, which is
> >>pretty similar to what we have here:
> >>   Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
> >>The places are different but the problem is pretty similar.
> >>
> >>So if the vhost-user command handshake succeeds then everything is fine and
> >>reconnect will work as expected. The only problem is how to handle a
> >>reconnect properly between a vhost-user command send/receive.
> >
> >So vhost net had this problem too.
> >
> >commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
> >Author: Marc-André Lureau<marcandre.lureau@redhat.com>
> >Date:   Mon Feb 27 14:49:56 2017 +0400
> >
> >     vhost-user: delay vhost_user_stop
> >     Since commit b0a335e351103bf92f3f9d0bd5759311be8156ac, a socket write
> >     may trigger disconnect events, calling vhost_user_stop() and clearing
> >     all the vhost_dev structures holding data that vhost.c functions expect
> >     to remain valid. Delay the cleanup to keep the vhost_dev structure
> >     valid during the vhost.c functions.
> >     Signed-off-by: Marc-André Lureau<marcandre.lureau@redhat.com>
> >     Message-id:20170227104956.24729-1-marcandre.lureau@redhat.com
> >     Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> >
> >it now has this code to address this:
> >
> >
> >     case CHR_EVENT_CLOSED:
> >         /* a close event may happen during a read/write, but vhost
> >          * code assumes the vhost_dev remains setup, so delay the
> >          * stop & clear to idle.
> >          * FIXME: better handle failure in vhost code, remove bh
> >          */
> >         if (s->watch) {
> >             AioContext *ctx = qemu_get_current_aio_context();
> >
> >             g_source_remove(s->watch);
> >             s->watch = 0;
> >             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> >                                      NULL, NULL, false);
> >
> >             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> >         }
> >         break;
> >
> >I think it's time we dropped the FIXME and moved the handling to common
> >code. Jason? Marc-André?
> 
> 
> I agree. Just to confirm, do you prefer a bh or doing changes like what is
> done in this series? It looks to me the bh approach gives simpler code.

Could it be a good idea just to do the disconnect in the char device but
postpone the clean up in the vhost-user-blk (or any other vhost-user
device) itself? So we are moving the postpone logic and decision from
the char device to the vhost-user device. One of the ideas I have is as
follows:
  - Put ourselves in the INITIALIZATION state
  - Start these vhost-user "handshake" commands
  - If we get a disconnect error, perform the disconnect, but don't clean up
    the device (it will be cleaned up on the roll back). It can be done by
    checking the state in the vhost_user_..._disconnect routine or something
    like it
  - The vhost-user command returns the error back to the _start() routine
  - Roll back in one place in the _start() routine, by calling this
    postponed clean up for the disconnect

> 
> Thanks
> 
> 
> >
> >
> >
> >
> >
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
  2020-05-13  9:36             ` Dima Stepanov
@ 2020-05-14  7:28               ` Jason Wang
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Wang @ 2020-05-14  7:28 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, stefanha, qemu-block, mst, qemu-devel, dgilbert,
	arei.gonglei, fengli, yc-core, pbonzini, marcandre.lureau,
	raphael.norwitz, mreitz


On 2020/5/13 下午5:36, Dima Stepanov wrote:
> On Wed, May 13, 2020 at 11:00:38AM +0800, Jason Wang wrote:
>> On 2020/5/12 下午5:08, Dima Stepanov wrote:
>>> On Tue, May 12, 2020 at 11:26:11AM +0800, Jason Wang wrote:
>>>> On 2020/5/11 下午5:11, Dima Stepanov wrote:
>>>>> On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:
>>>>>> On 2020/4/30 下午9:36, Dima Stepanov wrote:
>>>>>>> Since disconnect can happen at any time during initialization, not all
>>>>>>> vring buffers (for instance the used vring) can be initialized successfully.
>>>>>>> If the buffer was not initialized then vhost_memory_unmap call will lead
>>>>>>> to SIGSEGV. Add checks for the vring address value before calling unmap.
>>>>>>> Also add assert() in the vhost_memory_unmap() routine.
>>>>>>>
>>>>>>> Signed-off-by: Dima Stepanov<dimastep@yandex-team.ru>
>>>>>>> ---
>>>>>>>   hw/virtio/vhost.c | 27 +++++++++++++++++++++------
>>>>>>>   1 file changed, 21 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>>>> index ddbdc53..3ee50c4 100644
>>>>>>> --- a/hw/virtio/vhost.c
>>>>>>> +++ b/hw/virtio/vhost.c
>>>>>>> @@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
>>>>>>>                                  hwaddr len, int is_write,
>>>>>>>                                  hwaddr access_len)
>>>>>>>   {
>>>>>>> +    assert(buffer);
>>>>>>> +
>>>>>>>       if (!vhost_dev_has_iommu(dev)) {
>>>>>>>           cpu_physical_memory_unmap(buffer, len, is_write, access_len);
>>>>>>>       }
>>>>>>> @@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
>>>>>>>                                                   vhost_vq_index);
>>>>>>>       }
>>>>>>> -    vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
>>>>>>> -                       1, virtio_queue_get_used_size(vdev, idx));
>>>>>>> -    vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
>>>>>>> -                       0, virtio_queue_get_avail_size(vdev, idx));
>>>>>>> -    vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
>>>>>>> -                       0, virtio_queue_get_desc_size(vdev, idx));
>>>>>>> +    /*
>>>>>>> +     * Since the vhost-user disconnect can happen during initialization
>>>>>>> +     * check if vring was initialized, before making unmap.
>>>>>>> +     */
>>>>>>> +    if (vq->used) {
>>>>>>> +        vhost_memory_unmap(dev, vq->used,
>>>>>>> +                           virtio_queue_get_used_size(vdev, idx),
>>>>>>> +                           1, virtio_queue_get_used_size(vdev, idx));
>>>>>>> +    }
>>>>>>> +    if (vq->avail) {
>>>>>>> +        vhost_memory_unmap(dev, vq->avail,
>>>>>>> +                           virtio_queue_get_avail_size(vdev, idx),
>>>>>>> +                           0, virtio_queue_get_avail_size(vdev, idx));
>>>>>>> +    }
>>>>>>> +    if (vq->desc) {
>>>>>>> +        vhost_memory_unmap(dev, vq->desc,
>>>>>>> +                           virtio_queue_get_desc_size(vdev, idx),
>>>>>>> +                           0, virtio_queue_get_desc_size(vdev, idx));
>>>>>>> +    }
>>>>>> Any reason not checking hdev->started instead? vhost_dev_start() will set it
>>>>>> to true if virtqueues were correctly mapped.
>>>>>>
>>>>>> Thanks
>>>>> Well, I see it a little bit differently:
>>>>>   - vhost_dev_start() sets hdev->started to true before starting
>>>>>     virtqueues
>>>>>   - vhost_virtqueue_start() maps all the memory
>>>>> If we hit the vhost disconnect at the start of the
>>>>> vhost_virtqueue_start(), for instance for this call:
>>>>>    r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
>>>>> Then we will call vhost_user_blk_disconnect:
>>>>>    vhost_user_blk_disconnect()->
>>>>>      vhost_user_blk_stop()->
>>>>>        vhost_dev_stop()->
>>>>>          vhost_virtqueue_stop()
>>>>> As a result we will come into this routine with hdev->started still
>>>>> set to true, but with the used/avail/desc fields still uninitialized and
>>>>> set to 0.
>>>> I may be missing something, but consider that both vhost_dev_start() and
>>>> vhost_user_blk_disconnect() are serialized in the main loop. Can this really
>>>> happen?
>>> Yes, consider the case when we start the vhost-user-blk device:
>>>    vhost_dev_start->
>>>      vhost_virtqueue_start
>>> And we got a disconnect in the middle of vhost_virtqueue_start()
>>> routine, for instance:
>>>    1000     vq->num = state.num = virtio_queue_get_num(vdev, idx);
>>>    1001     r = dev->vhost_ops->vhost_set_vring_num(dev, &state);
>>>    1002     if (r) {
>>>    1003         VHOST_OPS_DEBUG("vhost_set_vring_num failed");
>>>    1004         return -errno;
>>>    1005     }
>>>    --> Here we got a disconnect <--
>>>    1006
>>>    1007     state.num = virtio_queue_get_last_avail_idx(vdev, idx);
>>>    1008     r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
>>>    1009     if (r) {
>>>    1010         VHOST_OPS_DEBUG("vhost_set_vring_base failed");
>>>    1011         return -errno;
>>>    1012     }
>>> As a result call to vhost_set_vring_base will call the disconnect
>>> routine. The backtrace log for SIGSEGV is as follows:
>>>    Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>>    [Switching to Thread 0x7ffff2ea9700 (LWP 183150)]
>>>    0x00007ffff4d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>    (gdb) bt
>>>    #0  0x00007ffff4d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>    #1  0x000055555590fd90 in flatview_write_continue (fv=0x7fffec4a2600,
>>>        addr=0, attrs=..., ptr=0x0, len=1028, addr1=0,
>>>        l=1028, mr=0x555556b1b310) at ./exec.c:3142
>>>    #2  0x000055555590fe98 in flatview_write (fv=0x7fffec4a2600, addr=0,
>>>        attrs=..., buf=0x0, len=1028) at ./exec.c:3177
>>>    #3  0x00005555559101ed in address_space_write (as=0x555556893940
>>>        <address_space_memory>, addr=0, attrs=..., buf=0x0,
>>>        len=1028) at ./exec.c:3268
>>>    #4  0x0000555555910caf in address_space_unmap (as=0x555556893940
>>>        <address_space_memory>, buffer=0x0, len=1028,
>>>        is_write=true, access_len=1028) at ./exec.c:3592
>>>    #5  0x0000555555910d82 in cpu_physical_memory_unmap (buffer=0x0,
>>>        len=1028, is_write=true, access_len=1028) at ./exec.c:3613
>>>    #6  0x0000555555a16fa1 in vhost_memory_unmap (dev=0x7ffff22723e8,
>>>        buffer=0x0, len=1028, is_write=1, access_len=1028)
>>>        at ./hw/virtio/vhost.c:318
>>>    #7  0x0000555555a192a2 in vhost_virtqueue_stop (dev=0x7ffff22723e8,
>>>        vdev=0x7ffff22721a0, vq=0x55555770abf0, idx=0) at ./hw/virtio/vhost.c:1136
>>>    #8  0x0000555555a1acc0 in vhost_dev_stop (hdev=0x7ffff22723e8,
>>>        vdev=0x7ffff22721a0) at ./hw/virtio/vhost.c:1702
>>>    #9  0x00005555559b6532 in vhost_user_blk_stop (vdev=0x7ffff22721a0)
>>>        at ./hw/block/vhost-user-blk.c:196
>>>    #10 0x00005555559b6b73 in vhost_user_blk_disconnect (dev=0x7ffff22721a0)
>>>        at ./hw/block/vhost-user-blk.c:365
>>>    #11 0x00005555559b6c4f in vhost_user_blk_event (opaque=0x7ffff22721a0,
>>>        event=CHR_EVENT_CLOSED) at ./hw/block/vhost-user-blk.c:384
>>>    #12 0x0000555555e65f7e in chr_be_event (s=0x555556b182e0, event=CHR_EVENT_CLOSED)
>>>        at chardev/char.c:60
>>>    #13 0x0000555555e6601a in qemu_chr_be_event (s=0x555556b182e0,
>>>        event=CHR_EVENT_CLOSED) at chardev/char.c:80
>>>    #14 0x0000555555e6eef3 in tcp_chr_disconnect_locked (chr=0x555556b182e0)
>>>        at chardev/char-socket.c:488
>>>    #15 0x0000555555e6e23f in tcp_chr_write (chr=0x555556b182e0,
>>>        buf=0x7ffff2ea8220 "\n", len=20) at chardev/char-socket.c:178
>>>    #16 0x0000555555e6616c in qemu_chr_write_buffer (s=0x555556b182e0,
>>>        buf=0x7ffff2ea8220 "\n", len=20, offset=0x7ffff2ea8150, write_all=true)
>>>        at chardev/char.c:120
>>>    #17 0x0000555555e662d9 in qemu_chr_write (s=0x555556b182e0, buf=0x7ffff2ea8220 "\n",
>>>        len=20, write_all=true) at chardev/char.c:155
>>>    #18 0x0000555555e693cc in qemu_chr_fe_write_all (be=0x7ffff2272360,
>>>        buf=0x7ffff2ea8220 "\n", len=20) at chardev/char-fe.c:53
>>>    #19 0x0000555555a1c489 in vhost_user_write (dev=0x7ffff22723e8,
>>>        msg=0x7ffff2ea8220, fds=0x0, fd_num=0) at ./hw/virtio/vhost-user.c:350
>>>    #20 0x0000555555a1d325 in vhost_set_vring (dev=0x7ffff22723e8,
>>>        request=10, ring=0x7ffff2ea8520) at ./hw/virtio/vhost-user.c:660
>>>    #21 0x0000555555a1d4c6 in vhost_user_set_vring_base (dev=0x7ffff22723e8,
>>>        ring=0x7ffff2ea8520) at ./hw/virtio/vhost-user.c:704
>>>    #22 0x0000555555a18c1b in vhost_virtqueue_start (dev=0x7ffff22723e8,
>>>        vdev=0x7ffff22721a0, vq=0x55555770abf0, idx=0) at ./hw/virtio/vhost.c:1009
>>>    #23 0x0000555555a1a9f5 in vhost_dev_start (hdev=0x7ffff22723e8, vdev=0x7ffff22721a0)
>>>        at ./hw/virtio/vhost.c:1639
>>>    #24 0x00005555559b6367 in vhost_user_blk_start (vdev=0x7ffff22721a0)
>>>        at ./hw/block/vhost-user-blk.c:150
>>>    #25 0x00005555559b6653 in vhost_user_blk_set_status (vdev=0x7ffff22721a0, status=15 '\017')
>>>        at ./hw/block/vhost-user-blk.c:233
>>>    #26 0x0000555555a1072d in virtio_set_status (vdev=0x7ffff22721a0, val=15 '\017')
>>>        at ./hw/virtio/virtio.c:1956
>>> ---Type <return> to continue, or q <return> to quit---
>>>
>>> So while we are inside vhost_user_blk_start() (frame #24) we are calling
>>> vhost_user_blk_disconnect() (frame #10). And for this call the
>>> dev->started field will be set to true.
>>>    (gdb) frame 8
>>>    #8  0x0000555555a1acc0 in vhost_dev_stop (hdev=0x7ffff22723e8, vdev=0x7ffff22721a0)
>>>        at ./hw/virtio/vhost.c:1702
>>>    1702            vhost_virtqueue_stop(hdev,
>>>    (gdb) p hdev->started
>>>    $1 = true
>>>
>>> It isn't an easy race to reproduce: hitting a disconnect in this window.
>>> We were able to hit it during long testing on one of the reconnect
>>> iterations. After that we were able to reproduce it 100% by adding a
>>> sleep() call inside qemu to make the race window bigger.
>> Thanks for the patience.
>>
>> I missed the fact that the disconnection routine could be triggered from
>> a chardev write.
>>
>> But the code turns out to be very tricky and hard to debug, since it
>> was written to deal with the error returned from vhost_ops directly; it
>> doesn't expect vhost_dev_cleanup() to be called silently for each
>> vhost_user_write(). It would introduce trouble if we want to add new
>> code/operations to vhost.
>>
>> More questions:
>>
> Well these are good questions.
>
>> - Do we need to have some checking against hdev->started in each vhost user
>> ops?
> I may be missing something, but it looks like no. The vhost_dev_set_log()
> routine has some additional logic with vhost_virtqueue_set_addr()
> and vhost_dev_set_features(); that is why we need those additional checks in
> the migration code. For the vhost-user device initialization or
> deinitialization code we don't need those checks. Even more, we couldn't
> add this check to the vhost-user ops, since the ops themselves will be reset.
> And I agree that even if we have no issues right now, by adding new code
> and missing these cases we could break something, because the review process
> becomes harder.


Yes, such code would be very hard to maintain.


>
> Maybe a good idea is to postpone the clean up if we are in the middle of the
> initialization. So it could be something like:
>    - Put ourselves in the INITIALIZATION state
>    - Start these vhost-user "handshake" commands
>    - If we get a disconnect error, perform the disconnect, but don't clean up
>      the device (it will be cleaned up on the roll back). It can be done by
>      checking the state in the vhost_user_..._disconnect routine or something
>      like it
>    - The vhost-user command returns the error back to the _start() routine
>    - Roll back in one place in the _start() routine, by calling this
>      postponed clean up for the disconnect
>
> Michael wrote that there was a similar fix for vhost-net. I can't say for
> now whether it could help or not, but I need to take a look at it as well.


Right.

Thanks


>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-13  9:47               ` Dima Stepanov
@ 2020-05-14  7:34                 ` Jason Wang
  2020-05-15 16:54                   ` Dima Stepanov
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2020-05-14  7:34 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, yc-core, qemu-block, Michael S. Tsirkin, qemu-devel,
	dgilbert, arei.gonglei, fengli, stefanha, marcandre.lureau,
	pbonzini, raphael.norwitz, mreitz


On 2020/5/13 下午5:47, Dima Stepanov wrote:
>>>      case CHR_EVENT_CLOSED:
>>>          /* a close event may happen during a read/write, but vhost
>>>           * code assumes the vhost_dev remains setup, so delay the
>>>           * stop & clear to idle.
>>>           * FIXME: better handle failure in vhost code, remove bh
>>>           */
>>>          if (s->watch) {
>>>              AioContext *ctx = qemu_get_current_aio_context();
>>>
>>>              g_source_remove(s->watch);
>>>              s->watch = 0;
>>>              qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
>>>                                       NULL, NULL, false);
>>>
>>>              aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
>>>          }
>>>          break;
>>>
>>> I think it's time we dropped the FIXME and moved the handling to common
>>> code. Jason? Marc-André?
>> I agree. Just to confirm, do you prefer a bh or doing changes like what is
>> done in this series? It looks to me the bh approach gives simpler code.
> Could it be a good idea just to do the disconnect in the char device but
> postpone the clean up in the vhost-user-blk (or any other vhost-user
> device) itself? So we are moving the postpone logic and decision from
> the char device to the vhost-user device. One of the ideas I have is as
> follows:
>    - Put ourselves in the INITIALIZATION state
>    - Start these vhost-user "handshake" commands
>    - If we get a disconnect error, perform the disconnect, but don't clean up
>      the device (it will be cleaned up on the roll back). It can be done by
>      checking the state in the vhost_user_..._disconnect routine or something like it


Is there any issue you saw with just using the aio bh as Michael posted above?

Then we don't need to deal with the silent vhost_dev_stop() and we will
have code that is much easier to understand.

Thanks


>    - The vhost-user command returns the error back to the _start() routine
>    - Roll back in one place in the _start() routine, by calling this
>      postponed clean up for the disconnect
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-14  7:34                 ` Jason Wang
@ 2020-05-15 16:54                   ` Dima Stepanov
  2020-05-16  3:20                     ` Li Feng
                                       ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-15 16:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, yc-core, qemu-block, Michael S. Tsirkin, qemu-devel,
	dgilbert, arei.gonglei, fengli, stefanha, marcandre.lureau,
	pbonzini, raphael.norwitz, mreitz

On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> 
> On 2020/5/13 下午5:47, Dima Stepanov wrote:
> >>>     case CHR_EVENT_CLOSED:
> >>>         /* a close event may happen during a read/write, but vhost
> >>>          * code assumes the vhost_dev remains setup, so delay the
> >>>          * stop & clear to idle.
> >>>          * FIXME: better handle failure in vhost code, remove bh
> >>>          */
> >>>         if (s->watch) {
> >>>             AioContext *ctx = qemu_get_current_aio_context();
> >>>
> >>>             g_source_remove(s->watch);
> >>>             s->watch = 0;
> >>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> >>>                                      NULL, NULL, false);
> >>>
> >>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> >>>         }
> >>>         break;
> >>>
> >>>I think it's time we dropped the FIXME and moved the handling to common
> >>>code. Jason? Marc-André?
> >>I agree. Just to confirm, do you prefer a bh or doing changes like what is
> >>done in this series? It looks to me the bh approach gives simpler code.
> >Could it be a good idea just to do the disconnect in the char device but
> >postpone the clean up in the vhost-user-blk (or any other vhost-user
> >device) itself? So we are moving the postpone logic and decision from
> >the char device to the vhost-user device. One of the ideas I have is as
> >follows:
> >   - Put ourselves in the INITIALIZATION state
> >   - Start these vhost-user "handshake" commands
> >   - If we get a disconnect error, perform the disconnect, but don't clean up
> >     the device (it will be cleaned up on the roll back). It can be done by
> >     checking the state in the vhost_user_..._disconnect routine or something like it
> 
> 
> Is there any issue you saw with just using the aio bh as Michael posted above?
> 
> Then we don't need to deal with the silent vhost_dev_stop() and we will have
> code that is much easier to understand.
I've implemented this solution inside
hw/block/vhost-user-blk.c:vhost_user_blk_event() in a similar way by
using the s->connected field. Looks good and is a more correct fix ). I have
two questions here before I rework the fixes:
1. Is it okay to make a similar fix inside vhost_user_blk_event(), or
are we looking for a more generic vhost-user solution? What do you think?
2. For migration we require additional information: that for the
vhost-user device it isn't an error, because I'm triggering the
following assert error:
  Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
  Program terminated with signal SIGABRT, Aborted.
  #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
  [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]

  (gdb) bt
  #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
  #2  0x00005648ea376ee6 in vhost_log_global_start
      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
  #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
      at ./memory.c:2611
  #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
      at ./migration/ram.c:2305
  #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
      at ./migration/ram.c:2323
  #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
      opaque=0x5648eb1f0f20 <ram_state>)
      at ./migration/ram.c:2436
  #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
      migration/savevm.c:1176
  #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
      migration/migration.c:3416
  #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
      util/qemu-thread-posix.c:519
  #10 0x00007fb56eac56ba in start_thread () from
      /lib/x86_64-linux-gnu/libpthread.so.0
  #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
  (gdb) frame 2
  #2  0x00005648ea376ee6 in vhost_log_global_start
     (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
  857             abort();
  (gdb) list
  852     {
  853         int r;
  854
  855         r = vhost_migration_log(listener, true);
  856         if (r < 0) {
  857             abort();
  858         }
  859     }
  860
  861     static void vhost_log_global_stop(MemoryListener *listener)
Since the bh postpones the clean up, we can't use the ->started field.
Do we have any mechanism to get the device type/state in the common
vhost_migration_log() routine? So, for example, for a disconnected
vhost-user device we would be able to return 0. Or should we implement it
and introduce it in this patch set?
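
To sketch what I mean (vhost_log_soft_fail() below does not exist, it
only illustrates the question; after vhost_dev_cleanup() the structure
is zeroed, so a real implementation would need to remember the backend
type somewhere that survives the cleanup):

    static bool vhost_log_soft_fail(struct vhost_dev *dev)
    {
        /* a disconnected vhost-user backend will be restarted on
         * reconnect, so failing to start logging is not fatal for it */
        return !dev->started;
    }

    static void vhost_log_global_start(MemoryListener *listener)
    {
        struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                             memory_listener);
        int r = vhost_migration_log(listener, true);

        if (r < 0 && !vhost_log_soft_fail(dev)) {
            abort();
        }
    }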

Thanks, Dima.

> 
> Thank
> 
> 
> >   - vhost-user command returns error back to the _start() routine
> >   - Rollback in one place in the start() routine, by calling this
> >     postphoned clean up for the disconnect
> >
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-15 16:54                   ` Dima Stepanov
@ 2020-05-16  3:20                     ` Li Feng
  2020-05-18  2:52                       ` Jason Wang
  2020-05-18  9:27                       ` Dima Stepanov
  2020-05-18  2:50                     ` Jason Wang
  2020-05-19  9:59                     ` Michael S. Tsirkin
  2 siblings, 2 replies; 51+ messages in thread
From: Li Feng @ 2020-05-16  3:20 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: Fam Zheng, Kevin Wolf, yc-core, open list:Block layer core,
	Michael S. Tsirkin, Jason Wang, open list:All patches CC here,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini, Raphael Norwitz,
	Max Reitz

Hi, Dima.
This abort is what I have mentioned in my previous email.
I triggered this crash without any fix a week ago.
And I have written a test patch to let vhost_log_global_start return
int and propagate the error to the upper layer.
However, my change is a little large, because the original callback
returns void and doesn't do any rollback.
After testing, the migration completes to the dst successfully, and fio
is still running perfectly, but the src VM is still stuck here, with no
crash.

Is it right to return this error to the upper layer?
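
For reference, a rough sketch of that direction (today the
MemoryListener callback returns void, so the changed signature below is
an assumption about the rework; the rollback of already started
listeners is left out):

    /* hw/virtio/vhost.c: report the failure instead of abort() */
    static int vhost_log_global_start(MemoryListener *listener)
    {
        int r = vhost_migration_log(listener, true);

        if (r < 0) {
            error_report("vhost: failed to start migration logging");
        }
        /* memory_global_dirty_log_start() would pass this up so that
         * ram_init_bitmaps() can fail the migration gracefully */
        return r;
    }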

Thanks,
Feng Li

Dima Stepanov <dimastep@yandex-team.ru> 于2020年5月16日周六 上午12:55写道:
>
> On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> >
> > On 2020/5/13 下午5:47, Dima Stepanov wrote:
> > >>>     case CHR_EVENT_CLOSED:
> > >>>         /* a close event may happen during a read/write, but vhost
> > >>>          * code assumes the vhost_dev remains setup, so delay the
> > >>>          * stop & clear to idle.
> > >>>          * FIXME: better handle failure in vhost code, remove bh
> > >>>          */
> > >>>         if (s->watch) {
> > >>>             AioContext *ctx = qemu_get_current_aio_context();
> > >>>
> > >>>             g_source_remove(s->watch);
> > >>>             s->watch = 0;
> > >>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> > >>>                                      NULL, NULL, false);
> > >>>
> > >>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> > >>>         }
> > >>>         break;
> > >>>
> > >>>I think it's time we dropped the FIXME and moved the handling to common
> > >>>code. Jason? Marc-André?
> > >>I agree. Just to confirm, do you prefer bh or doing changes like what is
> > >>done in this series? It looks to me bh can have more easier codes.
> > >Could it be a good idea just to make disconnect in the char device but
> > >postphone clean up in the vhost-user-blk (or any other vhost-user
> > >device) itself? So we are moving the postphone logic and decision from
> > >the char device to vhost-user device. One of the idea i have is as
> > >follows:
> > >   - Put ourself in the INITIALIZATION state
> > >   - Start these vhost-user "handshake" commands
> > >   - If we got a disconnect error, perform disconnect, but don't clean up
> > >     device (it will be clean up on the roll back). I can be done by
> > >     checking the state in vhost_user_..._disconnect routine or smth like it
> >
> >
> > Any issue you saw just using the aio bh as Michael posted above.
> >
> > Then we don't need to deal with the silent vhost_dev_stop() and we will have
> > codes that is much more easier to understand.
> I've implemented this solution inside
> hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
> using the s->connected field. Looks good and more correct fix ). I have
> two questions here before i'll rework the fixes:
> 1. Is it okay to make the similar fix inside vhost_user_blk_event() or
> we are looking for more generic vhost-user solution? What do you think?
> 2. For migration we require an additional information that for the
> vhost-user device it isn't an error, because i'm trigerring the
> following assert error:
>   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
>   Program terminated with signal SIGABRT, Aborted.
>   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
>
>   (gdb) bt
>   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>   #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
>   #2  0x00005648ea376ee6 in vhost_log_global_start
>       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
>   #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
>       at ./memory.c:2611
>   #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
>       at ./migration/ram.c:2305
>   #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
>       at ./migration/ram.c:2323
>   #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
>       opaque=0x5648eb1f0f20 <ram_state>)
>       at ./migration/ram.c:2436
>   #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
>       migration/savevm.c:1176
>   #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
>       migration/migration.c:3416
>   #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
>       util/qemu-thread-posix.c:519
>   #10 0x00007fb56eac56ba in start_thread () from
>       /lib/x86_64-linux-gnu/libpthread.so.0
>   #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>   (gdb) frame 2
>   #2  0x00005648ea376ee6 in vhost_log_global_start
>      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
>   857             abort();
>   (gdb) list
>   852     {
>   853         int r;
>   854
>   855         r = vhost_migration_log(listener, true);
>   856         if (r < 0) {
>   857             abort();
>   858         }
>   859     }
>   860
>   861     static void vhost_log_global_stop(MemoryListener *listener)
> Since bh postphone the clean up, we can't use the ->started field.
> Do we have any mechanism to get the device type/state in the common
> vhost_migration_log() routine? So for example for the vhost-user/disconnect
> device we will be able to return 0. Or should we implement it and introduce
> it in this patch set?
>
> Thanks, Dima.
>
> >
> > Thank
> >
> >
> > >   - vhost-user command returns error back to the _start() routine
> > >   - Rollback in one place in the start() routine, by calling this
> > >     postphoned clean up for the disconnect
> > >
> >


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-15 16:54                   ` Dima Stepanov
  2020-05-16  3:20                     ` Li Feng
@ 2020-05-18  2:50                     ` Jason Wang
  2020-05-18  9:41                       ` Dima Stepanov
  2020-05-19  9:59                     ` Michael S. Tsirkin
  2 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2020-05-18  2:50 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, yc-core, qemu-block, Michael S. Tsirkin, qemu-devel,
	dgilbert, arei.gonglei, fengli, stefanha, marcandre.lureau,
	pbonzini, raphael.norwitz, mreitz


On 2020/5/16 12:54 AM, Dima Stepanov wrote:
> On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
>> On 2020/5/13 下午5:47, Dima Stepanov wrote:
>>>>>      case CHR_EVENT_CLOSED:
>>>>>          /* a close event may happen during a read/write, but vhost
>>>>>           * code assumes the vhost_dev remains setup, so delay the
>>>>>           * stop & clear to idle.
>>>>>           * FIXME: better handle failure in vhost code, remove bh
>>>>>           */
>>>>>          if (s->watch) {
>>>>>              AioContext *ctx = qemu_get_current_aio_context();
>>>>>
>>>>>              g_source_remove(s->watch);
>>>>>              s->watch = 0;
>>>>>              qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
>>>>>                                       NULL, NULL, false);
>>>>>
>>>>>              aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
>>>>>          }
>>>>>          break;
>>>>>
>>>>> I think it's time we dropped the FIXME and moved the handling to common
>>>>> code. Jason? Marc-André?
>>>> I agree. Just to confirm, do you prefer bh or doing changes like what is
>>>> done in this series? It looks to me bh can have more easier codes.
>>> Could it be a good idea just to make disconnect in the char device but
>>> postphone clean up in the vhost-user-blk (or any other vhost-user
>>> device) itself? So we are moving the postphone logic and decision from
>>> the char device to vhost-user device. One of the idea i have is as
>>> follows:
>>>    - Put ourself in the INITIALIZATION state
>>>    - Start these vhost-user "handshake" commands
>>>    - If we got a disconnect error, perform disconnect, but don't clean up
>>>      device (it will be clean up on the roll back). I can be done by
>>>      checking the state in vhost_user_..._disconnect routine or smth like it
>>
>> Any issue you saw just using the aio bh as Michael posted above.
>>
>> Then we don't need to deal with the silent vhost_dev_stop() and we will have
>> codes that is much more easier to understand.
> I've implemented this solution inside
> hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
> using the s->connected field. Looks good and more correct fix ). I have
> two questions here before i'll rework the fixes:
> 1. Is it okay to make the similar fix inside vhost_user_blk_event() or
> we are looking for more generic vhost-user solution? What do you think?


I think I agree with Michael: it's better to have a generic vhost-user 
solution. But if it turns out not to be easy, we can start by fixing 
vhost-user-blk.


> 2. For migration we require an additional information that for the
> vhost-user device it isn't an error, because i'm trigerring the
> following assert error:
>    Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
>    Program terminated with signal SIGABRT, Aborted.
>    #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>    [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
>
>    (gdb) bt
>    #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>    #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
>    #2  0x00005648ea376ee6 in vhost_log_global_start
>        (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
>    #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
>        at ./memory.c:2611
>    #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
>        at ./migration/ram.c:2305
>    #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
>        at ./migration/ram.c:2323
>    #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
>        opaque=0x5648eb1f0f20 <ram_state>)
>        at ./migration/ram.c:2436
>    #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
>        migration/savevm.c:1176
>    #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
>        migration/migration.c:3416
>    #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
>        util/qemu-thread-posix.c:519
>    #10 0x00007fb56eac56ba in start_thread () from
>        /lib/x86_64-linux-gnu/libpthread.so.0
>    #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>    (gdb) frame 2
>    #2  0x00005648ea376ee6 in vhost_log_global_start
>       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
>    857             abort();
>    (gdb) list
>    852     {
>    853         int r;
>    854
>    855         r = vhost_migration_log(listener, true);
>    856         if (r < 0) {
>    857             abort();
>    858         }
>    859     }
>    860
>    861     static void vhost_log_global_stop(MemoryListener *listener)
> Since bh postphone the clean up, we can't use the ->started field.
> Do we have any mechanism to get the device type/state in the common
> vhost_migration_log() routine? So for example for the vhost-user/disconnect
> device we will be able to return 0. Or should we implement it and introduce
> it in this patch set?


This requires more thought; I will reply in Feng's mail.

Thanks


>
> Thanks, Dima.
>
>> Thank
>>
>>
>>>    - vhost-user command returns error back to the _start() routine
>>>    - Rollback in one place in the start() routine, by calling this
>>>      postphoned clean up for the disconnect
>>>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-16  3:20                     ` Li Feng
@ 2020-05-18  2:52                       ` Jason Wang
  2020-05-18  9:33                         ` Dima Stepanov
  2020-05-18  9:27                       ` Dima Stepanov
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Wang @ 2020-05-18  2:52 UTC (permalink / raw)
  To: Li Feng, Dima Stepanov
  Cc: Fam Zheng, Kevin Wolf, Stefan Hajnoczi,
	open list:Block layer core, Michael S. Tsirkin,
	open list:All patches CC here, Dr. David Alan Gilbert, Gonglei,
	yc-core, Paolo Bonzini, Marc-André Lureau, Raphael Norwitz,
	Max Reitz


On 2020/5/16 11:20 AM, Li Feng wrote:
> Hi, Dima.
> This abort is what I have mentioned in my previous email.
> I have triggered this crash without any fix a week ago.
> And I have written a test patch to let vhost_log_global_start return
> int and propagate the error to up layer.
> However, my change is a little large, because the origin callback
> return void, and don't do some rollback.
> After test, the migration could migrate to dst successfully, and fio
> is still running perfectly, but the src vm is still stuck here, no
> crash.
>
> Is it right to return this error to the up layer?


That could be a solution, or we may ask David for more suggestions.

Another thing that might be useful is to block reconnection during 
migration.

Thanks


>
> Thanks,
> Feng Li
>
> Dima Stepanov <dimastep@yandex-team.ru> 于2020年5月16日周六 上午12:55写道:
>> On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
>>> On 2020/5/13 下午5:47, Dima Stepanov wrote:
>>>>>>      case CHR_EVENT_CLOSED:
>>>>>>          /* a close event may happen during a read/write, but vhost
>>>>>>           * code assumes the vhost_dev remains setup, so delay the
>>>>>>           * stop & clear to idle.
>>>>>>           * FIXME: better handle failure in vhost code, remove bh
>>>>>>           */
>>>>>>          if (s->watch) {
>>>>>>              AioContext *ctx = qemu_get_current_aio_context();
>>>>>>
>>>>>>              g_source_remove(s->watch);
>>>>>>              s->watch = 0;
>>>>>>              qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
>>>>>>                                       NULL, NULL, false);
>>>>>>
>>>>>>              aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
>>>>>>          }
>>>>>>          break;
>>>>>>
>>>>>> I think it's time we dropped the FIXME and moved the handling to common
>>>>>> code. Jason? Marc-André?
>>>>> I agree. Just to confirm, do you prefer bh or doing changes like what is
>>>>> done in this series? It looks to me bh can have more easier codes.
>>>> Could it be a good idea just to make disconnect in the char device but
>>>> postphone clean up in the vhost-user-blk (or any other vhost-user
>>>> device) itself? So we are moving the postphone logic and decision from
>>>> the char device to vhost-user device. One of the idea i have is as
>>>> follows:
>>>>    - Put ourself in the INITIALIZATION state
>>>>    - Start these vhost-user "handshake" commands
>>>>    - If we got a disconnect error, perform disconnect, but don't clean up
>>>>      device (it will be clean up on the roll back). I can be done by
>>>>      checking the state in vhost_user_..._disconnect routine or smth like it
>>>
>>> Any issue you saw just using the aio bh as Michael posted above.
>>>
>>> Then we don't need to deal with the silent vhost_dev_stop() and we will have
>>> codes that is much more easier to understand.
>> I've implemented this solution inside
>> hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
>> using the s->connected field. Looks good and more correct fix ). I have
>> two questions here before i'll rework the fixes:
>> 1. Is it okay to make the similar fix inside vhost_user_blk_event() or
>> we are looking for more generic vhost-user solution? What do you think?
>> 2. For migration we require an additional information that for the
>> vhost-user device it isn't an error, because i'm trigerring the
>> following assert error:
>>    Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
>>    Program terminated with signal SIGABRT, Aborted.
>>    #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>>    [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
>>
>>    (gdb) bt
>>    #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>>    #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
>>    #2  0x00005648ea376ee6 in vhost_log_global_start
>>        (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
>>    #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
>>        at ./memory.c:2611
>>    #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
>>        at ./migration/ram.c:2305
>>    #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
>>        at ./migration/ram.c:2323
>>    #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
>>        opaque=0x5648eb1f0f20 <ram_state>)
>>        at ./migration/ram.c:2436
>>    #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
>>        migration/savevm.c:1176
>>    #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
>>        migration/migration.c:3416
>>    #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
>>        util/qemu-thread-posix.c:519
>>    #10 0x00007fb56eac56ba in start_thread () from
>>        /lib/x86_64-linux-gnu/libpthread.so.0
>>    #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>    (gdb) frame 2
>>    #2  0x00005648ea376ee6 in vhost_log_global_start
>>       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
>>    857             abort();
>>    (gdb) list
>>    852     {
>>    853         int r;
>>    854
>>    855         r = vhost_migration_log(listener, true);
>>    856         if (r < 0) {
>>    857             abort();
>>    858         }
>>    859     }
>>    860
>>    861     static void vhost_log_global_stop(MemoryListener *listener)
>> Since bh postphone the clean up, we can't use the ->started field.
>> Do we have any mechanism to get the device type/state in the common
>> vhost_migration_log() routine? So for example for the vhost-user/disconnect
>> device we will be able to return 0. Or should we implement it and introduce
>> it in this patch set?
>>
>> Thanks, Dima.
>>
>>> Thank
>>>
>>>
>>>>    - vhost-user command returns error back to the _start() routine
>>>>    - Rollback in one place in the start() routine, by calling this
>>>>      postphoned clean up for the disconnect
>>>>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-16  3:20                     ` Li Feng
  2020-05-18  2:52                       ` Jason Wang
@ 2020-05-18  9:27                       ` Dima Stepanov
  1 sibling, 0 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-18  9:27 UTC (permalink / raw)
  To: Li Feng
  Cc: Fam Zheng, Kevin Wolf, yc-core, open list:Block layer core,
	Michael S. Tsirkin, Jason Wang, open list:All patches CC here,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini, Raphael Norwitz,
	Max Reitz

On Sat, May 16, 2020 at 11:20:03AM +0800, Li Feng wrote:
> Hi, Dima.
> This abort is what I have mentioned in my previous email.
Yes, I understood it, and this abort() message was fixed by the previous
patch. But since we are trying the new postpone approach, that patch no
longer works and we need to get the device state somehow:
  - vhost-user disconnect => device not started

> I have triggered this crash without any fix a week ago.
> And I have written a test patch to let vhost_log_global_start return
> int and propagate the error to up layer.
> However, my change is a little large, because the origin callback
> return void, and don't do some rollback.
> After test, the migration could migrate to dst successfully, and fio
> is still running perfectly, but the src vm is still stuck here, no
> crash.
> 
> Is it right to return this error to the up layer?
Well, that is the question we have been discussing; I'm also not sure. I
can only summarize some of the statements I used:
  - device state: not started -> okay for migration
  - device state: vhost-user disconnect, which is the same as "not
    started" -> okay for migration
  - at least my internal migration tests passed
So my idea for the fix is something like this:
add a device callback, for instance vhost_dev_started(), which will
return the device state. For the vhost-user (or at least
vhost-user-blk) device this callback will consider the connected field
and return true or false.
As a result, vhost_migration_log() will check the device state at the
start of the routine and before returning.
But if the disconnect state isn't okay for migration, then we should
return an error.
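
A minimal sketch of what I mean (all names here are tentative; none of
this exists yet):

    /* Hypothetical new VhostOps callback so a backend can refine the
     * "started" answer; vhost-user would fold in its connected state. */
    static bool vhost_dev_started(struct vhost_dev *dev)
    {
        if (dev->vhost_ops && dev->vhost_ops->vhost_dev_started) {
            return dev->vhost_ops->vhost_dev_started(dev);
        }
        return dev->started;
    }

vhost_migration_log() would then call this helper instead of reading
dev->started directly.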

No other comments mixed in below.

> 
> Thanks,
> Feng Li
> 
> Dima Stepanov <dimastep@yandex-team.ru> 于2020年5月16日周六 上午12:55写道:
> >
> > On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> > >
> > > On 2020/5/13 下午5:47, Dima Stepanov wrote:
> > > >>>     case CHR_EVENT_CLOSED:
> > > >>>         /* a close event may happen during a read/write, but vhost
> > > >>>          * code assumes the vhost_dev remains setup, so delay the
> > > >>>          * stop & clear to idle.
> > > >>>          * FIXME: better handle failure in vhost code, remove bh
> > > >>>          */
> > > >>>         if (s->watch) {
> > > >>>             AioContext *ctx = qemu_get_current_aio_context();
> > > >>>
> > > >>>             g_source_remove(s->watch);
> > > >>>             s->watch = 0;
> > > >>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> > > >>>                                      NULL, NULL, false);
> > > >>>
> > > >>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> > > >>>         }
> > > >>>         break;
> > > >>>
> > > >>>I think it's time we dropped the FIXME and moved the handling to common
> > > >>>code. Jason? Marc-André?
> > > >>I agree. Just to confirm, do you prefer bh or doing changes like what is
> > > >>done in this series? It looks to me bh can have more easier codes.
> > > >Could it be a good idea just to make disconnect in the char device but
> > > >postphone clean up in the vhost-user-blk (or any other vhost-user
> > > >device) itself? So we are moving the postphone logic and decision from
> > > >the char device to vhost-user device. One of the idea i have is as
> > > >follows:
> > > >   - Put ourself in the INITIALIZATION state
> > > >   - Start these vhost-user "handshake" commands
> > > >   - If we got a disconnect error, perform disconnect, but don't clean up
> > > >     device (it will be clean up on the roll back). I can be done by
> > > >     checking the state in vhost_user_..._disconnect routine or smth like it
> > >
> > >
> > > Any issue you saw just using the aio bh as Michael posted above.
> > >
> > > Then we don't need to deal with the silent vhost_dev_stop() and we will have
> > > codes that is much more easier to understand.
> > I've implemented this solution inside
> > hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
> > using the s->connected field. Looks good and more correct fix ). I have
> > two questions here before i'll rework the fixes:
> > 1. Is it okay to make the similar fix inside vhost_user_blk_event() or
> > we are looking for more generic vhost-user solution? What do you think?
> > 2. For migration we require an additional information that for the
> > vhost-user device it isn't an error, because i'm trigerring the
> > following assert error:
> >   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
> >   Program terminated with signal SIGABRT, Aborted.
> >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
> >
> >   (gdb) bt
> >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >   #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> >   #2  0x00005648ea376ee6 in vhost_log_global_start
> >       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> >   #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
> >       at ./memory.c:2611
> >   #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
> >       at ./migration/ram.c:2305
> >   #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
> >       at ./migration/ram.c:2323
> >   #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
> >       opaque=0x5648eb1f0f20 <ram_state>)
> >       at ./migration/ram.c:2436
> >   #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
> >       migration/savevm.c:1176
> >   #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
> >       migration/migration.c:3416
> >   #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
> >       util/qemu-thread-posix.c:519
> >   #10 0x00007fb56eac56ba in start_thread () from
> >       /lib/x86_64-linux-gnu/libpthread.so.0
> >   #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> >   (gdb) frame 2
> >   #2  0x00005648ea376ee6 in vhost_log_global_start
> >      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> >   857             abort();
> >   (gdb) list
> >   852     {
> >   853         int r;
> >   854
> >   855         r = vhost_migration_log(listener, true);
> >   856         if (r < 0) {
> >   857             abort();
> >   858         }
> >   859     }
> >   860
> >   861     static void vhost_log_global_stop(MemoryListener *listener)
> > Since bh postphone the clean up, we can't use the ->started field.
> > Do we have any mechanism to get the device type/state in the common
> > vhost_migration_log() routine? So for example for the vhost-user/disconnect
> > device we will be able to return 0. Or should we implement it and introduce
> > it in this patch set?
> >
> > Thanks, Dima.
> >
> > >
> > > Thank
> > >
> > >
> > > >   - vhost-user command returns error back to the _start() routine
> > > >   - Rollback in one place in the start() routine, by calling this
> > > >     postphoned clean up for the disconnect
> > > >
> > >


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-18  2:52                       ` Jason Wang
@ 2020-05-18  9:33                         ` Dima Stepanov
  0 siblings, 0 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-18  9:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Fam Zheng, Kevin Wolf, Stefan Hajnoczi,
	open list:Block layer core, Michael S. Tsirkin,
	open list:All patches CC here, Dr. David Alan Gilbert, Gonglei,
	Li Feng, yc-core, Paolo Bonzini, Marc-André Lureau,
	Raphael Norwitz, Max Reitz

On Mon, May 18, 2020 at 10:52:08AM +0800, Jason Wang wrote:
> 
> On 2020/5/16 上午11:20, Li Feng wrote:
> >Hi, Dima.
> >This abort is what I have mentioned in my previous email.
> >I have triggered this crash without any fix a week ago.
> >And I have written a test patch to let vhost_log_global_start return
> >int and propagate the error to up layer.
> >However, my change is a little large, because the origin callback
> >return void, and don't do some rollback.
> >After test, the migration could migrate to dst successfully, and fio
> >is still running perfectly, but the src vm is still stuck here, no
> >crash.
> >
> >Is it right to return this error to the up layer?
> 
> 
> That could be a solution or we may ask David for more suggestion.
> 
> Another thing that might be useful is to block re connection during
> migration.
I've written a little more information in my answer to Feng's mail. But
what if we add some new callback to get the device started state
(started or not)? The vhost-user (or at least vhost-user-blk) devices
would also use the connected field to return the device state:
  - disconnect -> not started
For other devices we can just return the started field value as it is
right now.
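
On the vhost-user-blk side the override could look something like this
(a sketch only; how the callback gets back to the VHostUserBlk state is
an open wiring question, I'm assuming the vdev back-pointer here):

    static bool vhost_user_blk_dev_started(struct vhost_dev *dev)
    {
        VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);

        /* A disconnected backend is reported as "not started". */
        return s->connected && dev->started;
    }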

No other comments mixed in below.

> 
> Thanks
> 
> 
> >
> >Thanks,
> >Feng Li
> >
> >Dima Stepanov <dimastep@yandex-team.ru> 于2020年5月16日周六 上午12:55写道:
> >>On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> >>>On 2020/5/13 下午5:47, Dima Stepanov wrote:
> >>>>>>     case CHR_EVENT_CLOSED:
> >>>>>>         /* a close event may happen during a read/write, but vhost
> >>>>>>          * code assumes the vhost_dev remains setup, so delay the
> >>>>>>          * stop & clear to idle.
> >>>>>>          * FIXME: better handle failure in vhost code, remove bh
> >>>>>>          */
> >>>>>>         if (s->watch) {
> >>>>>>             AioContext *ctx = qemu_get_current_aio_context();
> >>>>>>
> >>>>>>             g_source_remove(s->watch);
> >>>>>>             s->watch = 0;
> >>>>>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> >>>>>>                                      NULL, NULL, false);
> >>>>>>
> >>>>>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> >>>>>>         }
> >>>>>>         break;
> >>>>>>
> >>>>>>I think it's time we dropped the FIXME and moved the handling to common
> >>>>>>code. Jason? Marc-André?
> >>>>>I agree. Just to confirm, do you prefer bh or doing changes like what is
> >>>>>done in this series? It looks to me bh can have more easier codes.
> >>>>Could it be a good idea just to make disconnect in the char device but
> >>>>postphone clean up in the vhost-user-blk (or any other vhost-user
> >>>>device) itself? So we are moving the postphone logic and decision from
> >>>>the char device to vhost-user device. One of the idea i have is as
> >>>>follows:
> >>>>   - Put ourself in the INITIALIZATION state
> >>>>   - Start these vhost-user "handshake" commands
> >>>>   - If we got a disconnect error, perform disconnect, but don't clean up
> >>>>     device (it will be clean up on the roll back). I can be done by
> >>>>     checking the state in vhost_user_..._disconnect routine or smth like it
> >>>
> >>>Any issue you saw just using the aio bh as Michael posted above.
> >>>
> >>>Then we don't need to deal with the silent vhost_dev_stop() and we will have
> >>>codes that is much more easier to understand.
> >>I've implemented this solution inside
> >>hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
> >>using the s->connected field. Looks good and more correct fix ). I have
> >>two questions here before i'll rework the fixes:
> >>1. Is it okay to make the similar fix inside vhost_user_blk_event() or
> >>we are looking for more generic vhost-user solution? What do you think?
> >>2. For migration we require an additional information that for the
> >>vhost-user device it isn't an error, because i'm trigerring the
> >>following assert error:
> >>   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
> >>   Program terminated with signal SIGABRT, Aborted.
> >>   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >>   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
> >>
> >>   (gdb) bt
> >>   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >>   #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> >>   #2  0x00005648ea376ee6 in vhost_log_global_start
> >>       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> >>   #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
> >>       at ./memory.c:2611
> >>   #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
> >>       at ./migration/ram.c:2305
> >>   #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
> >>       at ./migration/ram.c:2323
> >>   #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
> >>       opaque=0x5648eb1f0f20 <ram_state>)
> >>       at ./migration/ram.c:2436
> >>   #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
> >>       migration/savevm.c:1176
> >>   #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
> >>       migration/migration.c:3416
> >>   #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
> >>       util/qemu-thread-posix.c:519
> >>   #10 0x00007fb56eac56ba in start_thread () from
> >>       /lib/x86_64-linux-gnu/libpthread.so.0
> >>   #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> >>   (gdb) frame 2
> >>   #2  0x00005648ea376ee6 in vhost_log_global_start
> >>      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> >>   857             abort();
> >>   (gdb) list
> >>   852     {
> >>   853         int r;
> >>   854
> >>   855         r = vhost_migration_log(listener, true);
> >>   856         if (r < 0) {
> >>   857             abort();
> >>   858         }
> >>   859     }
> >>   860
> >>   861     static void vhost_log_global_stop(MemoryListener *listener)
> >>Since bh postphone the clean up, we can't use the ->started field.
> >>Do we have any mechanism to get the device type/state in the common
> >>vhost_migration_log() routine? So for example for the vhost-user/disconnect
> >>device we will be able to return 0. Or should we implement it and introduce
> >>it in this patch set?
> >>
> >>Thanks, Dima.
> >>
> >>>Thank
> >>>
> >>>
> >>>>   - vhost-user command returns error back to the _start() routine
> >>>>   - Rollback in one place in the start() routine, by calling this
> >>>>     postphoned clean up for the disconnect
> >>>>
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-18  2:50                     ` Jason Wang
@ 2020-05-18  9:41                       ` Dima Stepanov
  2020-05-18  9:53                         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 51+ messages in thread
From: Dima Stepanov @ 2020-05-18  9:41 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, yc-core, qemu-block, Michael S. Tsirkin, qemu-devel,
	dgilbert, arei.gonglei, fengli, stefanha, marcandre.lureau,
	pbonzini, raphael.norwitz, mreitz

On Mon, May 18, 2020 at 10:50:39AM +0800, Jason Wang wrote:
> 
> On 2020/5/16 上午12:54, Dima Stepanov wrote:
> >On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> >>On 2020/5/13 下午5:47, Dima Stepanov wrote:
> >>>>>     case CHR_EVENT_CLOSED:
> >>>>>         /* a close event may happen during a read/write, but vhost
> >>>>>          * code assumes the vhost_dev remains setup, so delay the
> >>>>>          * stop & clear to idle.
> >>>>>          * FIXME: better handle failure in vhost code, remove bh
> >>>>>          */
> >>>>>         if (s->watch) {
> >>>>>             AioContext *ctx = qemu_get_current_aio_context();
> >>>>>
> >>>>>             g_source_remove(s->watch);
> >>>>>             s->watch = 0;
> >>>>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> >>>>>                                      NULL, NULL, false);
> >>>>>
> >>>>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> >>>>>         }
> >>>>>         break;
> >>>>>
> >>>>>I think it's time we dropped the FIXME and moved the handling to common
> >>>>>code. Jason? Marc-André?
> >>>>I agree. Just to confirm, do you prefer bh or doing changes like what is
> >>>>done in this series? It looks to me bh can have more easier codes.
> >>>Could it be a good idea just to make disconnect in the char device but
> >>>postphone clean up in the vhost-user-blk (or any other vhost-user
> >>>device) itself? So we are moving the postphone logic and decision from
> >>>the char device to vhost-user device. One of the idea i have is as
> >>>follows:
> >>>   - Put ourself in the INITIALIZATION state
> >>>   - Start these vhost-user "handshake" commands
> >>>   - If we got a disconnect error, perform disconnect, but don't clean up
> >>>     device (it will be clean up on the roll back). I can be done by
> >>>     checking the state in vhost_user_..._disconnect routine or smth like it
> >>
> >>Any issue you saw just using the aio bh as Michael posted above.
> >>
> >>Then we don't need to deal with the silent vhost_dev_stop() and we will have
> >>codes that is much more easier to understand.
> >I've implemented this solution inside
> >hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
> >using the s->connected field. Looks good and more correct fix ). I have
> >two questions here before i'll rework the fixes:
> >1. Is it okay to make the similar fix inside vhost_user_blk_event() or
> >we are looking for more generic vhost-user solution? What do you think?
> 
> 
> I think I agree with Michael, it's better to have a generic vhost-user
> solution. But if it turns out to be not easy, we can start from fixing
> vhost-user-blk.
I also agree, but as I see it right now, the connect/disconnect events
are handled inside each vhost-user device implementation file, so a
generic solution will need some global refactoring. I suggest having
this fix first and refactoring the code after it:
 - more devices will be involved
 - I see there are some differences in device handling

> 
> 
> >2. For migration we require an additional information that for the
> >vhost-user device it isn't an error, because i'm trigerring the
> >following assert error:
> >   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
> >   Program terminated with signal SIGABRT, Aborted.
> >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
> >
> >   (gdb) bt
> >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >   #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> >   #2  0x00005648ea376ee6 in vhost_log_global_start
> >       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> >   #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
> >       at ./memory.c:2611
> >   #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
> >       at ./migration/ram.c:2305
> >   #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
> >       at ./migration/ram.c:2323
> >   #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
> >       opaque=0x5648eb1f0f20 <ram_state>)
> >       at ./migration/ram.c:2436
> >   #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
> >       migration/savevm.c:1176
> >   #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
> >       migration/migration.c:3416
> >   #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
> >       util/qemu-thread-posix.c:519
> >   #10 0x00007fb56eac56ba in start_thread () from
> >       /lib/x86_64-linux-gnu/libpthread.so.0
> >   #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> >   (gdb) frame 2
> >   #2  0x00005648ea376ee6 in vhost_log_global_start
> >      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> >   857             abort();
> >   (gdb) list
> >   852     {
> >   853         int r;
> >   854
> >   855         r = vhost_migration_log(listener, true);
> >   856         if (r < 0) {
> >   857             abort();
> >   858         }
> >   859     }
> >   860
> >   861     static void vhost_log_global_stop(MemoryListener *listener)
> >Since bh postphone the clean up, we can't use the ->started field.
> >Do we have any mechanism to get the device type/state in the common
> >vhost_migration_log() routine? So for example for the vhost-user/disconnect
> >device we will be able to return 0. Or should we implement it and introduce
> >it in this patch set?
> 
> 
> This requires more thought, I will reply in Feng's mail.
Okay, let's continue the discussion there.

No other comments mixed in below.

Thanks, Dima.

> 
> Thanks
> 
> 
> >
> >Thanks, Dima.
> >
> >>Thank
> >>
> >>
> >>>   - vhost-user command returns error back to the _start() routine
> >>>   - Rollback in one place in the start() routine, by calling this
> >>>     postphoned clean up for the disconnect
> >>>
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-18  9:41                       ` Dima Stepanov
@ 2020-05-18  9:53                         ` Dr. David Alan Gilbert
  2020-05-19  9:07                           ` Dima Stepanov
  0 siblings, 1 reply; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2020-05-18  9:53 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, yc-core, qemu-block, Michael S. Tsirkin, Jason Wang,
	qemu-devel, raphael.norwitz, arei.gonglei, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz

* Dima Stepanov (dimastep@yandex-team.ru) wrote:
> On Mon, May 18, 2020 at 10:50:39AM +0800, Jason Wang wrote:
> > 
> > On 2020/5/16 上午12:54, Dima Stepanov wrote:
> > >On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> > >>On 2020/5/13 下午5:47, Dima Stepanov wrote:
> > >>>>>     case CHR_EVENT_CLOSED:
> > >>>>>         /* a close event may happen during a read/write, but vhost
> > >>>>>          * code assumes the vhost_dev remains setup, so delay the
> > >>>>>          * stop & clear to idle.
> > >>>>>          * FIXME: better handle failure in vhost code, remove bh
> > >>>>>          */
> > >>>>>         if (s->watch) {
> > >>>>>             AioContext *ctx = qemu_get_current_aio_context();
> > >>>>>
> > >>>>>             g_source_remove(s->watch);
> > >>>>>             s->watch = 0;
> > >>>>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> > >>>>>                                      NULL, NULL, false);
> > >>>>>
> > >>>>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> > >>>>>         }
> > >>>>>         break;
> > >>>>>
> > >>>>>I think it's time we dropped the FIXME and moved the handling to common
> > >>>>>code. Jason? Marc-André?
> > >>>>I agree. Just to confirm, do you prefer bh or doing changes like what is
> > >>>>done in this series? It looks to me bh can have more easier codes.
> > >>>Could it be a good idea just to make disconnect in the char device but
> > >>>postphone clean up in the vhost-user-blk (or any other vhost-user
> > >>>device) itself? So we are moving the postphone logic and decision from
> > >>>the char device to vhost-user device. One of the idea i have is as
> > >>>follows:
> > >>>   - Put ourself in the INITIALIZATION state
> > >>>   - Start these vhost-user "handshake" commands
> > >>>   - If we got a disconnect error, perform disconnect, but don't clean up
> > >>>     device (it will be clean up on the roll back). I can be done by
> > >>>     checking the state in vhost_user_..._disconnect routine or smth like it
> > >>
> > >>Any issue you saw just using the aio bh as Michael posted above.
> > >>
> > >>Then we don't need to deal with the silent vhost_dev_stop() and we will have
> > >>codes that is much more easier to understand.
> > >I've implemented this solution inside
> > >hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
> > >using the s->connected field. Looks good and more correct fix ). I have
> > >two questions here before i'll rework the fixes:
> > >1. Is it okay to make the similar fix inside vhost_user_blk_event() or
> > >we are looking for more generic vhost-user solution? What do you think?
> > 
> > 
> > I think I agree with Michael, it's better to have a generic vhost-user
> > solution. But if it turns out to be not easy, we can start from fixing
> > vhost-user-blk.
> I also agree, but as i see it right now the connect/disconnect events
> are handled inside each vhost-user device implementation file. So it will
> need some global refactoring. So i suggest having this fix first and
> after it refactoring the code:
>  - more devices will be involved
>  - i see there is some difference in device handling

I'm following bits of this discussion; some thoughts:
if your device doesn't support reconnect, then if, at the start of
migration, you find that you can't start the log, what is the correct
behaviour?  You can't carry on with the migration because you'd have an
inconsistent migration state; I guess that's why the abort() is there
- but I think I'd generally prefer to fail the migration and hope the
vhost device is still working for anything other than the log.

You're going to have to be pretty careful with the ordering of reconnect
- reconnecting on the source during a migration sounds pretty hairy, but
a migration can take many minutes, so if you really want to survive this
I guess you have to.

Dave


> > 
> > 
> > >2. For migration we require an additional information that for the
> > >vhost-user device it isn't an error, because i'm trigerring the
> > >following assert error:
> > >   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
> > >   Program terminated with signal SIGABRT, Aborted.
> > >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > >   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
> > >
> > >   (gdb) bt
> > >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > >   #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > >   #2  0x00005648ea376ee6 in vhost_log_global_start
> > >       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> > >   #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
> > >       at ./memory.c:2611
> > >   #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
> > >       at ./migration/ram.c:2305
> > >   #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
> > >       at ./migration/ram.c:2323
> > >   #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
> > >       opaque=0x5648eb1f0f20 <ram_state>)
> > >       at ./migration/ram.c:2436
> > >   #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
> > >       migration/savevm.c:1176
> > >   #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
> > >       migration/migration.c:3416
> > >   #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
> > >       util/qemu-thread-posix.c:519
> > >   #10 0x00007fb56eac56ba in start_thread () from
> > >       /lib/x86_64-linux-gnu/libpthread.so.0
> > >   #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > >   (gdb) frame 2
> > >   #2  0x00005648ea376ee6 in vhost_log_global_start
> > >      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> > >   857             abort();
> > >   (gdb) list
> > >   852     {
> > >   853         int r;
> > >   854
> > >   855         r = vhost_migration_log(listener, true);
> > >   856         if (r < 0) {
> > >   857             abort();
> > >   858         }
> > >   859     }
> > >   860
> > >   861     static void vhost_log_global_stop(MemoryListener *listener)
> > >Since bh postphone the clean up, we can't use the ->started field.
> > >Do we have any mechanism to get the device type/state in the common
> > >vhost_migration_log() routine? So for example for the vhost-user/disconnect
> > >device we will be able to return 0. Or should we implement it and introduce
> > >it in this patch set?
> > 
> > 
> > This requires more thought, I will reply in Feng's mail.
> Okay, let's continue discussion there.
> 
> No other comments mixed in below.
> 
> Thanks, Dima.
> 
> > 
> > Thanks
> > 
> > 
> > >
> > >Thanks, Dima.
> > >
> > >>Thank
> > >>
> > >>
> > >>>   - vhost-user command returns error back to the _start() routine
> > >>>   - Rollback in one place in the start() routine, by calling this
> > >>>     postphoned clean up for the disconnect
> > >>>
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-18  9:53                         ` Dr. David Alan Gilbert
@ 2020-05-19  9:07                           ` Dima Stepanov
  2020-05-19 10:24                             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 51+ messages in thread
From: Dima Stepanov @ 2020-05-19  9:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: fam, kwolf, yc-core, qemu-block, Michael S. Tsirkin, Jason Wang,
	qemu-devel, raphael.norwitz, arei.gonglei, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz

On Mon, May 18, 2020 at 10:53:59AM +0100, Dr. David Alan Gilbert wrote:
> * Dima Stepanov (dimastep@yandex-team.ru) wrote:
> > On Mon, May 18, 2020 at 10:50:39AM +0800, Jason Wang wrote:
> > > 
> > > On 2020/5/16 上午12:54, Dima Stepanov wrote:
> > > >On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> > > >>On 2020/5/13 下午5:47, Dima Stepanov wrote:
> > > >>>>>     case CHR_EVENT_CLOSED:
> > > >>>>>         /* a close event may happen during a read/write, but vhost
> > > >>>>>          * code assumes the vhost_dev remains setup, so delay the
> > > >>>>>          * stop & clear to idle.
> > > >>>>>          * FIXME: better handle failure in vhost code, remove bh
> > > >>>>>          */
> > > >>>>>         if (s->watch) {
> > > >>>>>             AioContext *ctx = qemu_get_current_aio_context();
> > > >>>>>
> > > >>>>>             g_source_remove(s->watch);
> > > >>>>>             s->watch = 0;
> > > >>>>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> > > >>>>>                                      NULL, NULL, false);
> > > >>>>>
> > > >>>>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> > > >>>>>         }
> > > >>>>>         break;
> > > >>>>>
> > > >>>>>I think it's time we dropped the FIXME and moved the handling to common
> > > >>>>>code. Jason? Marc-André?
> > > >>>>I agree. Just to confirm, do you prefer bh or doing changes like what is
> > > >>>>done in this series? It looks to me bh can have more easier codes.
> > > >>>Could it be a good idea just to make disconnect in the char device but
> > > >>>postphone clean up in the vhost-user-blk (or any other vhost-user
> > > >>>device) itself? So we are moving the postphone logic and decision from
> > > >>>the char device to vhost-user device. One of the idea i have is as
> > > >>>follows:
> > > >>>   - Put ourself in the INITIALIZATION state
> > > >>>   - Start these vhost-user "handshake" commands
> > > >>>   - If we got a disconnect error, perform disconnect, but don't clean up
> > > >>>     device (it will be clean up on the roll back). I can be done by
> > > >>>     checking the state in vhost_user_..._disconnect routine or smth like it
> > > >>
> > > >>Any issue you saw just using the aio bh as Michael posted above.
> > > >>
> > > >>Then we don't need to deal with the silent vhost_dev_stop() and we will have
> > > >>codes that is much more easier to understand.
> > > >I've implemented this solution inside
> > > >hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
> > > >using the s->connected field. Looks good and more correct fix ). I have
> > > >two questions here before i'll rework the fixes:
> > > >1. Is it okay to make the similar fix inside vhost_user_blk_event() or
> > > >we are looking for more generic vhost-user solution? What do you think?
> > > 
> > > 
> > > I think I agree with Michael, it's better to have a generic vhost-user
> > > solution. But if it turns out to be not easy, we can start from fixing
> > > vhost-user-blk.
> > I also agree, but as i see it right now the connect/disconnect events
> > are handled inside each vhost-user device implementation file. So it will
> > need some global refactoring. So i suggest having this fix first and
> > after it refactoring the code:
> >  - more devices will be involved
> >  - i see there is some difference in device handling
> 
> I'm following bits of this discussion, some thoughts;
> if your device doesn't support reconnect, then if, at the start of
> migration you find that you can't start the log what is the correct
> behaviour?
I'm not sure here, but it looks like in this case the device state
will be:
  disconnect -> stopped (it will not change during migration, because
  reconnect isn't supported)
Since the device state will not change during migration, there is no
need for the log, and the migration can complete successfully.
So as I see it (I could be wrong here):
 - it is okay if the device is not started, we will not change this
   state during migration, and the log start fails
 - it is not okay if the device is started and the log start fails
   (because then we can't track the dirty pages and so on during
   migration)
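
In code terms, the rule I'm describing would reduce to something like
this inside vhost_migration_log() (just a sketch, reusing the
vhost_dev_set_log() call from this patch; placement is approximate):

    r = vhost_dev_set_log(dev, enable);
    if (r < 0) {
        /* Tolerate the failure only for a device that is stopped and
         * will stay stopped (no reconnect) for the whole migration. */
        return dev->started ? r : 0;
    }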

> You can't carry on with the migration because you'd have an
> inconsistent migration state; so I guess that's why the abort() is there
> - but I think I'd generally prefer to fail the migration and hope the
> vhsot device is still working for anything other than the log.
> 
> You're going to have to be pretty careful with the ordering of reconect
> - reconnecting on the source during a migration sounds pretty hairy, but
> a migration can take many minutes, so if you really want to survive this
> I guess you have to.
Maybe if we get a disconnect during migration we could postpone the
reconnect, or not reconnect at all until the end of migration on the
source side. This will leave the device in the stopped state.
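
In vhost_user_blk_event() that could be as simple as something like
the following (a sketch; whether migration_is_idle() may be used from
this context is an assumption to verify):

    case CHR_EVENT_OPENED:
        if (!migration_is_idle()) {
            /* Postpone the reconnect: keep the device in the stopped
             * state until the migration on the source side finishes. */
            break;
        }
        /* ... the existing connect path would follow here ... */
        break;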

Thanks, Dima.

> 
> Dave
> 
> 
> > > 
> > > 
> > > >2. For migration we require an additional information that for the
> > > >vhost-user device it isn't an error, because i'm trigerring the
> > > >following assert error:
> > > >   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
> > > >   Program terminated with signal SIGABRT, Aborted.
> > > >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > > >   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
> > > >
> > > >   (gdb) bt
> > > >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > > >   #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > > >   #2  0x00005648ea376ee6 in vhost_log_global_start
> > > >       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> > > >   #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
> > > >       at ./memory.c:2611
> > > >   #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
> > > >       at ./migration/ram.c:2305
> > > >   #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
> > > >       at ./migration/ram.c:2323
> > > >   #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
> > > >       opaque=0x5648eb1f0f20 <ram_state>)
> > > >       at ./migration/ram.c:2436
> > > >   #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
> > > >       migration/savevm.c:1176
> > > >   #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
> > > >       migration/migration.c:3416
> > > >   #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
> > > >       util/qemu-thread-posix.c:519
> > > >   #10 0x00007fb56eac56ba in start_thread () from
> > > >       /lib/x86_64-linux-gnu/libpthread.so.0
> > > >   #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > > >   (gdb) frame 2
> > > >   #2  0x00005648ea376ee6 in vhost_log_global_start
> > > >      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> > > >   857             abort();
> > > >   (gdb) list
> > > >   852     {
> > > >   853         int r;
> > > >   854
> > > >   855         r = vhost_migration_log(listener, true);
> > > >   856         if (r < 0) {
> > > >   857             abort();
> > > >   858         }
> > > >   859     }
> > > >   860
> > > >   861     static void vhost_log_global_stop(MemoryListener *listener)
> > > >Since bh postphone the clean up, we can't use the ->started field.
> > > >Do we have any mechanism to get the device type/state in the common
> > > >vhost_migration_log() routine? So for example for the vhost-user/disconnect
> > > >device we will be able to return 0. Or should we implement it and introduce
> > > >it in this patch set?
> > > 
> > > 
> > > This requires more thought, I will reply in Feng's mail.
> > Okay, let's continue discussion there.
> > 
> > No other comments mixed in below.
> > 
> > Thanks, Dima.
> > 
> > > 
> > > Thanks
> > > 
> > > 
> > > >
> > > >Thanks, Dima.
> > > >
> > > >>Thank
> > > >>
> > > >>
> > > >>>   - vhost-user command returns error back to the _start() routine
> > > >>>   - Rollback in one place in the start() routine, by calling this
> > > >>>     postphoned clean up for the disconnect
> > > >>>
> > > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-13  5:56             ` Jason Wang
  2020-05-13  9:47               ` Dima Stepanov
@ 2020-05-19  9:13               ` Dima Stepanov
  1 sibling, 0 replies; 51+ messages in thread
From: Dima Stepanov @ 2020-05-19  9:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: fam, kwolf, stefanha, qemu-block, Michael S. Tsirkin, qemu-devel,
	dgilbert, arei.gonglei, fengli, yc-core, pbonzini,
	marcandre.lureau, raphael.norwitz, mreitz

On Wed, May 13, 2020 at 01:56:18PM +0800, Jason Wang wrote:
> 
> On 2020/5/13 下午12:15, Michael S. Tsirkin wrote:
> >On Tue, May 12, 2020 at 12:35:30PM +0300, Dima Stepanov wrote:
> >>On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote:
> >>>On 2020/5/11 下午5:25, Dima Stepanov wrote:
> >>>>On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
> >>>>>On 2020/4/30 下午9:36, Dima Stepanov wrote:
> >>>>>>If a vhost-user daemon is used as a backend for the vhost device, then we
> >>>>>>should consider the possibility of a disconnect at any moment. If such a
> >>>>>>disconnect happens in the vhost_migration_log() routine, the vhost
> >>>>>>device structure will be cleaned up.
> >>>>>>At the start of the vhost_migration_log() function there is a check:
> >>>>>>   if (!dev->started) {
> >>>>>>       dev->log_enabled = enable;
> >>>>>>       return 0;
> >>>>>>   }
> >>>>>>To be consistent with this check, add the same check after calling the
> >>>>>>vhost_dev_set_log() routine. In general this helps not to break a
> >>>>>>migration due to the assert() message. But it looks like this code
> >>>>>>should be revised to handle these errors more carefully.
> >>>>>>
> >>>>>>In case of a vhost-user device backend the fail paths should consider the
> >>>>>>state of the device. In this case we should skip some function calls
> >>>>>>during rollback on the error paths, so as not to get NULL dereference
> >>>>>>errors.
> >>>>>>
> >>>>>>Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> >>>>>>---
> >>>>>>  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
> >>>>>>  1 file changed, 35 insertions(+), 4 deletions(-)
> >>>>>>
> >>>>>>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>>>>index 3ee50c4..d5ab96d 100644
> >>>>>>--- a/hw/virtio/vhost.c
> >>>>>>+++ b/hw/virtio/vhost.c
> >>>>>>@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> >>>>>>  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>>>>  {
> >>>>>>      int r, i, idx;
> >>>>>>+
> >>>>>>+    if (!dev->started) {
> >>>>>>+        /*
> >>>>>>+         * If a vhost-user daemon is used as a backend for the
> >>>>>>+         * device and the connection is broken, then the vhost_dev
> >>>>>>+         * structure will have all its values reset to 0.
> >>>>>>+         * Add an additional check for the device state.
> >>>>>>+         */
> >>>>>>+        return -1;
> >>>>>>+    }
> >>>>>>+
> >>>>>>      r = vhost_dev_set_features(dev, enable_log);
> >>>>>>      if (r < 0) {
> >>>>>>          goto err_features;
> >>>>>>@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>>>>      }
> >>>>>>      return 0;
> >>>>>>  err_vq:
> >>>>>>-    for (; i >= 0; --i) {
> >>>>>>+    /*
> >>>>>>+     * A disconnect with the vhost-user daemon can lead to the
> >>>>>>+     * vhost_dev_cleanup() call which will clean up the vhost_dev
> >>>>>>+     * structure.
> >>>>>>+     */
> >>>>>>+    for (; dev->started && (i >= 0); --i) {
> >>>>>>          idx = dev->vhost_ops->vhost_get_vq_index(
> >>>>>Why do we need the check of dev->started here? Can started be modified
> >>>>>outside the mainloop? If yes, I don't get the check of !dev->started at
> >>>>>the beginning of this function.
> >>>>>
> >>>>No, dev->started can't change outside the mainloop. The main problem is
> >>>>only for the vhost_user_blk daemon. Consider the case when we
> >>>>successfully pass the dev->started check at the beginning of the
> >>>>function, but after it we hit a disconnect on the next call, on the
> >>>>second or third iteration:
> >>>>      r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
> >>>>The unix socket backend device will call the disconnect routine for this
> >>>>device and reset the structure. So the structure will be reset (and
> >>>>dev->started set to false) inside this set_addr() call.
> >>>I still don't get it. I think the disconnect cannot happen in the middle
> >>>of vhost_dev_set_log() since both of them run in the mainloop. And even
> >>>if it can, we probably need a synchronization mechanism other than the
> >>>simple check here.
> >>The disconnect doesn't happen in a separate thread; it happens in this
> >>routine, inside vhost_dev_set_log. For instance, when the
> >>vhost_user_write() call fails:
> >>   vhost_user_set_log_base()
> >>     vhost_user_write()
> >>       vhost_user_blk_disconnect()
> >>         vhost_dev_cleanup()
> >>           vhost_user_backend_cleanup()
> >>So the point is that if we somehow get a disconnect from the
> >>vhost-user-blk daemon before the vhost_user_write() call, then it will
> >>continue the clean up by running the vhost_user_blk_disconnect()
> >>function. I wrote a more detailed backtrace in a separate thread, which
> >>is pretty similar to what we have here:
> >>   Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
> >>The places are different but the problem is pretty similar.
> >>
> >>So if the vhost-user command handshake succeeds then everything is fine
> >>and reconnect will work as expected. The only problem is how to handle
> >>reconnect properly between vhost-user command send/receive.
> >
> >So vhost net had this problem too.
> >
> >commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
> >Author: Marc-André Lureau <marcandre.lureau@redhat.com>
> >Date:   Mon Feb 27 14:49:56 2017 +0400
> >
> >     vhost-user: delay vhost_user_stop
> >     Since commit b0a335e351103bf92f3f9d0bd5759311be8156ac, a socket write
> >     may trigger disconnect events, calling vhost_user_stop() and clearing
> >     all the vhost_dev structures holding data that vhost.c functions expect
> >     to remain valid. Delay the cleanup to keep the vhost_dev structure
> >     valid during the vhost.c functions.
> >     Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> >     Message-id: 20170227104956.24729-1-marcandre.lureau@redhat.com
> >     Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> >
> >it now has this code to address this:
> >
> >
> >     case CHR_EVENT_CLOSED:
> >         /* a close event may happen during a read/write, but vhost
> >          * code assumes the vhost_dev remains setup, so delay the
> >          * stop & clear to idle.
> >          * FIXME: better handle failure in vhost code, remove bh
> >          */
> >         if (s->watch) {
> >             AioContext *ctx = qemu_get_current_aio_context();
> >
> >             g_source_remove(s->watch);
> >             s->watch = 0;
> >             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> >                                      NULL, NULL, false);
> >
> >             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> >         }
> >         break;
> >
> >I think it's time we dropped the FIXME and moved the handling to common
> >code. Jason? Marc-André?
> 
> 
> I agree. Just to confirm, do you prefer the bh approach or changes like
> what is done in this series? It looks to me the bh approach gives
> simpler code.
While we are still having the discussion about migration and its states, I
think we can split this patchset. The first patchset will be about
fixing the vhost-user-blk reconnect issues. The follow-up patchset
will be about migration, if we decide to fix or work around this abort()
call at the start of migration. I'll prepare v3 with this bh approach and
will send it for review. What do you think?
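
For reference, a minimal sketch of the bh variant I have in mind, mirroring
the net/vhost-user.c CHR_EVENT_CLOSED handling quoted above. Helper names
like vhost_user_blk_chr_closed_bh() are assumptions for illustration, not
the final patch:

  static void vhost_user_blk_chr_closed_bh(void *opaque)
  {
      DeviceState *dev = opaque;
      VHostUserBlk *s = VHOST_USER_BLK(dev);

      /* Run the postponed teardown now that the vhost.c code has
       * unwound, then re-arm the event handler so a later OPENED
       * event can reconnect. */
      vhost_user_blk_disconnect(dev);
      qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
                               vhost_user_blk_event, NULL, opaque,
                               NULL, true);
  }

  /* in vhost_user_blk_event() */
  case CHR_EVENT_CLOSED:
      /* Delay stop & cleanup to a bh so the vhost_dev stays valid
       * for any vhost.c code still on the call stack. */
      if (s->connected) {
          AioContext *ctx = qemu_get_current_aio_context();

          qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
                                   NULL, NULL, false);
          aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh,
                                  opaque);
      }
      break;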

Thanks, Dima.

> 
> Thanks
> 
> 
> >
> >
> >
> >
> >
> 



* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-15 16:54                   ` Dima Stepanov
  2020-05-16  3:20                     ` Li Feng
  2020-05-18  2:50                     ` Jason Wang
@ 2020-05-19  9:59                     ` Michael S. Tsirkin
  2 siblings, 0 replies; 51+ messages in thread
From: Michael S. Tsirkin @ 2020-05-19  9:59 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, yc-core, qemu-block, Jason Wang, qemu-devel,
	dgilbert, arei.gonglei, fengli, stefanha, marcandre.lureau,
	pbonzini, raphael.norwitz, mreitz

On Fri, May 15, 2020 at 07:54:57PM +0300, Dima Stepanov wrote:
> On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> > 
> > On 2020/5/13 5:47 PM, Dima Stepanov wrote:
> > >>>     case CHR_EVENT_CLOSED:
> > >>>         /* a close event may happen during a read/write, but vhost
> > >>>          * code assumes the vhost_dev remains setup, so delay the
> > >>>          * stop & clear to idle.
> > >>>          * FIXME: better handle failure in vhost code, remove bh
> > >>>          */
> > >>>         if (s->watch) {
> > >>>             AioContext *ctx = qemu_get_current_aio_context();
> > >>>
> > >>>             g_source_remove(s->watch);
> > >>>             s->watch = 0;
> > >>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> > >>>                                      NULL, NULL, false);
> > >>>
> > >>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> > >>>         }
> > >>>         break;
> > >>>
> > >>>I think it's time we dropped the FIXME and moved the handling to common
> > >>>code. Jason? Marc-André?
> > >>I agree. Just to confirm, do you prefer the bh approach or changes like
> > >>what is done in this series? It looks to me the bh approach gives
> > >>simpler code.
> > >Could it be a good idea just to do the disconnect in the char device but
> > >postpone the clean up in the vhost-user-blk (or any other vhost-user
> > >device) itself? So we move the postponement logic and decision from
> > >the char device to the vhost-user device. One of the ideas I have is as
> > >follows:
> > >   - Put ourselves in the INITIALIZATION state
> > >   - Start these vhost-user "handshake" commands
> > >   - If we get a disconnect error, perform the disconnect, but don't clean
> > >     up the device (it will be cleaned up on the rollback). It can be done
> > >     by checking the state in the vhost_user_..._disconnect routine or
> > >     something like it
> > 
> > 
> > Did you see any issue with just using the aio bh as Michael posted above?
> > 
> > Then we don't need to deal with the silent vhost_dev_stop() and we will
> > have code that is much easier to understand.
> I've implemented this solution inside
> hw/block/vhost-user-blk.c:vhost_user_blk_event() in a similar way, by
> using the s->connected field. It looks like a cleaner and more correct
> fix ). I have two questions here before I rework the fixes:
> 1. Is it okay to make a similar fix inside vhost_user_blk_event(), or
> are we looking for a more generic vhost-user solution? What do you think?

Either works I think.
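
For the device-local variant, the guard being discussed would look roughly
like the following sketch. It assumes the s->connected field and the
helpers mentioned above; it is an outline of the idea, not the posted
patch:

  static void vhost_user_blk_disconnect(DeviceState *dev)
  {
      VHostUserBlk *s = VHOST_USER_BLK(dev);
      VirtIODevice *vdev = VIRTIO_DEVICE(dev);

      /* A repeated or mid-initialization CLOSED event becomes a
       * no-op instead of tearing down a half-built vhost_dev. */
      if (!s->connected) {
          return;
      }
      s->connected = false;

      if (s->dev.started) {
          vhost_user_blk_stop(vdev);
      }
      vhost_dev_cleanup(&s->dev);
  }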

> 2. For migration we require additional information that for the
> vhost-user device it isn't an error, because I'm triggering the
> following abort():
>   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
>   Program terminated with signal SIGABRT, Aborted.
>   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
> 
>   (gdb) bt
>   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>   #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
>   #2  0x00005648ea376ee6 in vhost_log_global_start
>       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
>   #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
>       at ./memory.c:2611
>   #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
>       at ./migration/ram.c:2305
>   #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
>       at ./migration/ram.c:2323
>   #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
>       opaque=0x5648eb1f0f20 <ram_state>)
>       at ./migration/ram.c:2436
>   #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
>       migration/savevm.c:1176
>   #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
>       migration/migration.c:3416
>   #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
>       util/qemu-thread-posix.c:519
>   #10 0x00007fb56eac56ba in start_thread () from
>       /lib/x86_64-linux-gnu/libpthread.so.0
>   #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>   (gdb) frame 2
>   #2  0x00005648ea376ee6 in vhost_log_global_start
>      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
>   857             abort();
>   (gdb) list
>   852     {
>   853         int r;
>   854
>   855         r = vhost_migration_log(listener, true);
>   856         if (r < 0) {
>   857             abort();
>   858         }
>   859     }
>   860
>   861     static void vhost_log_global_stop(MemoryListener *listener)
> Since the bh postpones the clean up, we can't use the ->started field.
> Do we have any mechanism to get the device type/state in the common
> vhost_migration_log() routine? So, for example, for a disconnected
> vhost-user device we would be able to return 0. Or should we implement
> it and introduce it in this patch set?
> 
> Thanks, Dima.
> 
> > 
> > Thanks
> > 
> > 
> > >   - vhost-user command returns an error back to the _start() routine
> > >   - Rollback in one place in the start() routine, by calling this
> > >     postponed clean up for the disconnect
> > >
> > 




* Re: [PATCH v2 5/5] vhost: add device started check in migration set log
  2020-05-19  9:07                           ` Dima Stepanov
@ 2020-05-19 10:24                             ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2020-05-19 10:24 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: fam, kwolf, yc-core, qemu-block, Michael S. Tsirkin, Jason Wang,
	qemu-devel, raphael.norwitz, arei.gonglei, fengli, stefanha,
	marcandre.lureau, pbonzini, mreitz

* Dima Stepanov (dimastep@yandex-team.ru) wrote:
> On Mon, May 18, 2020 at 10:53:59AM +0100, Dr. David Alan Gilbert wrote:
> > * Dima Stepanov (dimastep@yandex-team.ru) wrote:
> > > On Mon, May 18, 2020 at 10:50:39AM +0800, Jason Wang wrote:
> > > > 
> > > > On 2020/5/16 12:54 AM, Dima Stepanov wrote:
> > > > >On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> > > > >>On 2020/5/13 5:47 PM, Dima Stepanov wrote:
> > > > >>>>>     case CHR_EVENT_CLOSED:
> > > > >>>>>         /* a close event may happen during a read/write, but vhost
> > > > >>>>>          * code assumes the vhost_dev remains setup, so delay the
> > > > >>>>>          * stop & clear to idle.
> > > > >>>>>          * FIXME: better handle failure in vhost code, remove bh
> > > > >>>>>          */
> > > > >>>>>         if (s->watch) {
> > > > >>>>>             AioContext *ctx = qemu_get_current_aio_context();
> > > > >>>>>
> > > > >>>>>             g_source_remove(s->watch);
> > > > >>>>>             s->watch = 0;
> > > > >>>>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> > > > >>>>>                                      NULL, NULL, false);
> > > > >>>>>
> > > > >>>>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> > > > >>>>>         }
> > > > >>>>>         break;
> > > > >>>>>
> > > > >>>>>I think it's time we dropped the FIXME and moved the handling to common
> > > > >>>>>code. Jason? Marc-André?
> > > > >>>>I agree. Just to confirm, do you prefer the bh approach or changes
> > > > >>>>like what is done in this series? It looks to me the bh approach
> > > > >>>>gives simpler code.
> > > > >>>Could it be a good idea just to do the disconnect in the char device but
> > > > >>>postpone the clean up in the vhost-user-blk (or any other vhost-user
> > > > >>>device) itself? So we move the postponement logic and decision from
> > > > >>>the char device to the vhost-user device. One of the ideas I have is as
> > > > >>>follows:
> > > > >>>   - Put ourselves in the INITIALIZATION state
> > > > >>>   - Start these vhost-user "handshake" commands
> > > > >>>   - If we get a disconnect error, perform the disconnect, but don't
> > > > >>>     clean up the device (it will be cleaned up on the rollback). It can
> > > > >>>     be done by checking the state in the vhost_user_..._disconnect
> > > > >>>     routine or something like it
> > > > >>
> > > > >>Did you see any issue with just using the aio bh as Michael posted above?
> > > > >>
> > > > >>Then we don't need to deal with the silent vhost_dev_stop() and we will
> > > > >>have code that is much easier to understand.
> > > > >I've implemented this solution inside
> > > > >hw/block/vhost-user-blk.c:vhost_user_blk_event() in a similar way, by
> > > > >using the s->connected field. It looks like a cleaner and more correct
> > > > >fix ). I have two questions here before I rework the fixes:
> > > > >1. Is it okay to make a similar fix inside vhost_user_blk_event(), or
> > > > >are we looking for a more generic vhost-user solution? What do you think?
> > > > 
> > > > 
> > > > I think I agree with Michael, it's better to have a generic vhost-user
> > > > solution. But if it turns out not to be easy, we can start by fixing
> > > > vhost-user-blk.
> > > I also agree, but as I see it right now the connect/disconnect events
> > > are handled inside each vhost-user device implementation file, so it will
> > > need some global refactoring. So I suggest having this fix first and
> > > refactoring the code after it:
> > >  - more devices will be involved
> > >  - I see there are some differences in device handling
> > 
> > I'm following bits of this discussion, some thoughts;
> > if your device doesn't support reconnect, and at the start of
> > migration you find that you can't start the log, what is the correct
> > behaviour?
> I'm not sure here, but it looks like in this case the device state
> will be:
>   disconnect -> stopped (will not change during migration, because
>   reconnect isn't supported)
> And because of that the device state will not change during migration,
> so there is no need for the log and migration can complete
> successfully.
> So as I see it (I could be wrong here):
>  - it is okay: if the device is not started and we will not change this
>    state during migration + log start fails
>  - it is not okay: if the device is started + log start fails (because
>    we can't handle the dirty pages and so on during migration)

Yes, that does make sense to me.
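
As a sketch of where that rule could live, the abort() site might become
something like the following. vhost_dev_disconnected() is a hypothetical
predicate; as noted earlier in the thread, ->started can't be used once
the bh postpones the cleanup, so some backend-level state would be needed:

  static void vhost_log_global_start(MemoryListener *listener)
  {
      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                           memory_listener);
      int r;

      r = vhost_migration_log(listener, true);
      if (r < 0) {
          /* A stopped, disconnected backend writes no guest memory,
           * so a missing dirty log is tolerable; only a running
           * device that cannot start logging must fail hard. */
          if (vhost_dev_disconnected(dev)) {  /* hypothetical */
              return;
          }
          abort();
      }
  }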

> > You can't carry on with the migration because you'd have an
> > inconsistent migration state; so I guess that's why the abort() is there
> > - but I think I'd generally prefer to fail the migration and hope the
> > vhost device is still working for anything other than the log.
> > 
> > You're going to have to be pretty careful with the ordering of reconnect
> > - reconnecting on the source during a migration sounds pretty hairy, but
> > a migration can take many minutes, so if you really want to survive this
> > I guess you have to.
> Maybe if we get a disconnect during migration then we could postpone the
> reconnect, or just not reconnect at all till the end of migration on the
> source side. This will leave the device in the stopped state.

Yes, that's the easiest way, but you may find people object to it being
out for that long.
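
Purely illustrative, if you do go that way (migration_is_idle() from
migration/misc.h is real, but the deferral field is an assumption, and the
actual reconnect timer lives in chardev/char-socket.c, so a real fix would
need a hook there rather than in the device):

  /* Defer the source-side reconnect while a migration is in flight;
   * something would have to retry once the migration finishes. */
  static bool vhost_user_blk_may_reconnect(VHostUserBlk *s)
  {
      if (!migration_is_idle()) {
          s->reconnect_deferred = true;   /* hypothetical field */
          return false;
      }
      return true;
  }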

Dave

> Thanks, Dima.
> 
> > 
> > Dave
> > 
> > 
> > > > 
> > > > 
> > > > >2. For migration we require additional information that for the
> > > > >vhost-user device it isn't an error, because I'm triggering the
> > > > >following abort():
> > > > >   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
> > > > >   Program terminated with signal SIGABRT, Aborted.
> > > > >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > > > >   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
> > > > >
> > > > >   (gdb) bt
> > > > >   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > > > >   #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > > > >   #2  0x00005648ea376ee6 in vhost_log_global_start
> > > > >       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> > > > >   #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
> > > > >       at ./memory.c:2611
> > > > >   #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
> > > > >       at ./migration/ram.c:2305
> > > > >   #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
> > > > >       at ./migration/ram.c:2323
> > > > >   #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
> > > > >       opaque=0x5648eb1f0f20 <ram_state>)
> > > > >       at ./migration/ram.c:2436
> > > > >   #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
> > > > >       migration/savevm.c:1176
> > > > >   #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
> > > > >       migration/migration.c:3416
> > > > >   #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
> > > > >       util/qemu-thread-posix.c:519
> > > > >   #10 0x00007fb56eac56ba in start_thread () from
> > > > >       /lib/x86_64-linux-gnu/libpthread.so.0
> > > > >   #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > > > >   (gdb) frame 2
> > > > >   #2  0x00005648ea376ee6 in vhost_log_global_start
> > > > >      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> > > > >   857             abort();
> > > > >   (gdb) list
> > > > >   852     {
> > > > >   853         int r;
> > > > >   854
> > > > >   855         r = vhost_migration_log(listener, true);
> > > > >   856         if (r < 0) {
> > > > >   857             abort();
> > > > >   858         }
> > > > >   859     }
> > > > >   860
> > > > >   861     static void vhost_log_global_stop(MemoryListener *listener)
> > > > >Since the bh postpones the clean up, we can't use the ->started field.
> > > > >Do we have any mechanism to get the device type/state in the common
> > > > >vhost_migration_log() routine? So, for example, for a disconnected
> > > > >vhost-user device we would be able to return 0. Or should we implement
> > > > >it and introduce it in this patch set?
> > > > 
> > > > 
> > > > This requires more thought, I will reply in Feng's mail.
> > > Okay, let's continue discussion there.
> > > 
> > > No other comments mixed in below.
> > > 
> > > Thanks, Dima.
> > > 
> > > > 
> > > > Thanks
> > > > 
> > > > 
> > > > >
> > > > >Thanks, Dima.
> > > > >
> > > > >>Thanks
> > > > >>
> > > > >>
> > > > >>>   - vhost-user command returns an error back to the _start() routine
> > > > >>>   - Rollback in one place in the start() routine, by calling this
> > > > >>>     postponed clean up for the disconnect
> > > > >>>
> > > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




end of thread, other threads:[~2020-05-19 10:25 UTC | newest]

Thread overview: 51+ messages
2020-04-30 13:36 [PATCH v2 0/5] vhost-user reconnect issues during vhost initialization Dima Stepanov
2020-04-30 13:36 ` [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
2020-05-06  8:54   ` Li Feng
2020-05-06  9:46   ` Marc-André Lureau
2020-04-30 13:36 ` [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
2020-05-04  0:36   ` Raphael Norwitz
2020-05-06  8:54     ` Dima Stepanov
2020-05-11  3:03   ` Jason Wang
2020-05-11  8:55     ` Dima Stepanov
2020-04-30 13:36 ` [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state Dima Stepanov
2020-05-04  1:06   ` Raphael Norwitz
2020-05-06  8:51     ` Dima Stepanov
2020-04-30 13:36 ` [PATCH v2 4/5] vhost: check vring address before calling unmap Dima Stepanov
2020-05-04  1:13   ` Raphael Norwitz
2020-05-11  3:05   ` Jason Wang
2020-05-11  9:11     ` Dima Stepanov
2020-05-12  3:26       ` Jason Wang
2020-05-12  9:08         ` Dima Stepanov
2020-05-13  3:00           ` Jason Wang
2020-05-13  9:36             ` Dima Stepanov
2020-05-14  7:28               ` Jason Wang
2020-04-30 13:36 ` [PATCH v2 5/5] vhost: add device started check in migration set log Dima Stepanov
2020-05-06 22:08   ` Raphael Norwitz
2020-05-07  7:15     ` Michael S. Tsirkin
2020-05-07 15:35     ` Dima Stepanov
2020-05-11  0:03       ` Raphael Norwitz
2020-05-11  9:43         ` Dima Stepanov
2020-05-11  3:15   ` Jason Wang
2020-05-11  9:25     ` Dima Stepanov
2020-05-12  3:32       ` Jason Wang
2020-05-12  3:47         ` Li Feng
2020-05-12  9:23           ` Dima Stepanov
2020-05-12  9:35         ` Dima Stepanov
2020-05-13  3:20           ` Jason Wang
2020-05-13  9:39             ` Dima Stepanov
2020-05-13  4:15           ` Michael S. Tsirkin
2020-05-13  5:56             ` Jason Wang
2020-05-13  9:47               ` Dima Stepanov
2020-05-14  7:34                 ` Jason Wang
2020-05-15 16:54                   ` Dima Stepanov
2020-05-16  3:20                     ` Li Feng
2020-05-18  2:52                       ` Jason Wang
2020-05-18  9:33                         ` Dima Stepanov
2020-05-18  9:27                       ` Dima Stepanov
2020-05-18  2:50                     ` Jason Wang
2020-05-18  9:41                       ` Dima Stepanov
2020-05-18  9:53                         ` Dr. David Alan Gilbert
2020-05-19  9:07                           ` Dima Stepanov
2020-05-19 10:24                             ` Dr. David Alan Gilbert
2020-05-19  9:59                     ` Michael S. Tsirkin
2020-05-19  9:13               ` Dima Stepanov
