* [PATCH 0/7] vhost-vdpa multiqueue fixes
@ 2022-03-30  6:33 Si-Wei Liu
  2022-03-30  6:33 ` [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa Si-Wei Liu
                   ` (7 more replies)
  0 siblings, 8 replies; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30  6:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: si-wei.liu, eperezma, jasowang, eli, mst

Hi,

This patch series attempts to fix a few issues in vhost-vdpa multiqueue functionality.

Patch #1 is the formal submission of the RFC patch in:
https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/

Patches #2 and #3 were taken from a previous patchset posted on qemu-devel:
https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/

Although that patchset was abandoned, these two patches turn out to be useful for patch #4, which fixes a QEMU crash due to a race condition.

Patches #5 through #7 are small bug fixes. Please find the description of each in its commit log.

Thanks,
-Siwei

---

Eugenio Pérez (2):
  virtio-net: Fix indentation
  virtio-net: Only enable userland vq if using tap backend

Si-Wei Liu (5):
  virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
  virtio: don't read pending event on host notifier if disabled
  vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
  vhost-net: fix improper cleanup in vhost_net_start
  vhost-vdpa: backend feature should set only once

 hw/net/vhost_net.c         |  4 +++-
 hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
 hw/virtio/vhost-vdpa.c     |  2 +-
 hw/virtio/virtio-bus.c     |  3 ++-
 hw/virtio/virtio.c         | 21 +++++++++++++--------
 include/hw/virtio/virtio.h |  2 ++
 net/vhost-vdpa.c           |  4 +++-
 7 files changed, 45 insertions(+), 16 deletions(-)

-- 
1.8.3.1




* [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
  2022-03-30  6:33 [PATCH 0/7] vhost-vdpa multiqueue fixes Si-Wei Liu
@ 2022-03-30  6:33 ` Si-Wei Liu
  2022-03-30  9:00   ` Jason Wang
  2022-03-30  6:33 ` [PATCH 2/7] virtio-net: Fix indentation Si-Wei Liu
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30  6:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: si-wei.liu, eperezma, jasowang, eli, mst

With an MQ-enabled vdpa device and a guest that doesn't support MQ, e.g.
booting vdpa with mq=on over OVMF with a single vqp, the assert failure
below is seen:

../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.

0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
   at ../hw/virtio/virtio-pci.c:974
8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
   at ../hw/net/vhost_net.c:361
10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
   at ../softmmu/memory.c:492
15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
   at ../softmmu/memory.c:1504
17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
   at ../softmmu/physmem.c:2914
20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
   attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6

The assert failure is caused by the vhost_dev index for the ctrl vq not
being aligned with the one actually in use by the guest. Upon multiqueue
feature negotiation in virtio_net_set_multiqueue(), if the guest doesn't
support multiqueue, the guest vq layout shrinks to a single queue pair,
consisting of 3 vqs in total (rx, tx and ctrl). This results in ctrl_vq
taking a different vhost_dev group index than the default. We can map a
vq to the correct vhost_dev group by checking whether MQ is supported by
the guest and successfully negotiated. Since the MQ feature is only
present along with CTRL_VQ, we make sure index 2 is only meant for the
control vq while MQ is not supported by the guest.

Note that if QEMU or the guest doesn't support the control vq, there is
no point in exposing a vhost_dev and guest notifier for the control vq.
Since vhost_net_start/stop implies DRIVER_OK is set in the device status,
feature negotiation should be complete by the time virtio_net_vhost_status()
is reached.
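
For illustration only (not part of the patch; the logic is open-coded in the
two notifier callbacks in the diff below), the intended mapping from vq index
to net client can be summarized as a helper along these lines. The helper
name is hypothetical, and vq2q() is assumed to collapse a vq index to its
queue pair:

    /* Hypothetical sketch; mirrors the hunks below rather than adding
     * anything new. */
    static NetClientState *vq_index_to_nc(VirtIODevice *vdev, VirtIONet *n,
                                          int idx)
    {
        if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
            /* Non-MQ guest: vqs 0/1 are rx/tx of pair 0 and vq 2 is the
             * ctrl vq, whose net client sits at slot n->max_queue_pairs. */
            return qemu_get_subqueue(n->nic, n->max_queue_pairs);
        }
        return qemu_get_subqueue(n->nic, vq2q(idx));
    }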

Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/net/virtio-net.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1067e72..484b215 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -245,7 +245,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
     VirtIODevice *vdev = VIRTIO_DEVICE(n);
     NetClientState *nc = qemu_get_queue(n->nic);
     int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
-    int cvq = n->max_ncs - n->max_queue_pairs;
+    int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
+              n->max_ncs - n->max_queue_pairs : 0;
 
     if (!get_vhost_net(nc->peer)) {
         return;
@@ -3170,8 +3171,14 @@ static NetClientInfo net_virtio_info = {
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
-    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    NetClientState *nc;
     assert(n->vhost_started);
+    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
+        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
+        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
+    } else {
+        nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    }
     return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
 }
 
@@ -3179,8 +3186,14 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
                                            bool mask)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
-    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    NetClientState *nc;
     assert(n->vhost_started);
+    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
+        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
+        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
+    } else {
+        nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    }
     vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
                              vdev, idx, mask);
 }
-- 
1.8.3.1




* [PATCH 2/7] virtio-net: Fix indentation
  2022-03-30  6:33 [PATCH 0/7] vhost-vdpa multiqueue fixes Si-Wei Liu
  2022-03-30  6:33 ` [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa Si-Wei Liu
@ 2022-03-30  6:33 ` Si-Wei Liu
  2022-03-30  9:01   ` Jason Wang
  2022-03-30  6:33 ` [PATCH 3/7] virtio-net: Only enable userland vq if using tap backend Si-Wei Liu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30  6:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: si-wei.liu, eperezma, jasowang, eli, mst

From: Eugenio Pérez <eperezma@redhat.com>

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/net/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 484b215..ffaf481 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3523,7 +3523,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
     nc = qemu_get_queue(n->nic);
     nc->rxfilter_notify_enabled = 1;
 
-   if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
         struct virtio_net_config netcfg = {};
         memcpy(&netcfg.mac, &n->nic_conf.macaddr, ETH_ALEN);
         vhost_net_set_config(get_vhost_net(nc->peer),
-- 
1.8.3.1




* [PATCH 3/7] virtio-net: Only enable userland vq if using tap backend
  2022-03-30  6:33 [PATCH 0/7] vhost-vdpa multiqueue fixes Si-Wei Liu
  2022-03-30  6:33 ` [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa Si-Wei Liu
  2022-03-30  6:33 ` [PATCH 2/7] virtio-net: Fix indentation Si-Wei Liu
@ 2022-03-30  6:33 ` Si-Wei Liu
  2022-03-30  9:07   ` Jason Wang
  2022-03-30  6:33 ` [PATCH 4/7] virtio: don't read pending event on host notifier if disabled Si-Wei Liu
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30  6:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: si-wei.liu, eperezma, jasowang, eli, mst

From: Eugenio Pérez <eperezma@redhat.com>

QEMU falls back on userland handlers even in the vhost-user and vhost-vdpa
cases. These handlers assume a tap device can handle the packets.

If a vdpa device fails to start, this can trigger a sigsegv. Do not resort
to the userland handlers unless they can actually work.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/net/virtio-net.c        |  4 ++++
 hw/virtio/virtio.c         | 21 +++++++++++++--------
 include/hw/virtio/virtio.h |  2 ++
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ffaf481..9cdf777 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3523,6 +3523,10 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
     nc = qemu_get_queue(n->nic);
     nc->rxfilter_notify_enabled = 1;
 
+    if (!nc->peer || nc->peer->info->type != NET_CLIENT_DRIVER_TAP) {
+        /* Only tap can use userspace networking */
+        vdev->disable_ioeventfd_handler = true;
+    }
     if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
         struct virtio_net_config netcfg = {};
         memcpy(&netcfg.mac, &n->nic_conf.macaddr, ETH_ALEN);
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9d637e0..806603b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3708,17 +3708,22 @@ static int virtio_device_start_ioeventfd_impl(VirtIODevice *vdev)
             err = r;
             goto assign_error;
         }
-        event_notifier_set_handler(&vq->host_notifier,
-                                   virtio_queue_host_notifier_read);
+
+        if (!vdev->disable_ioeventfd_handler) {
+            event_notifier_set_handler(&vq->host_notifier,
+                                       virtio_queue_host_notifier_read);
+        }
     }
 
-    for (n = 0; n < VIRTIO_QUEUE_MAX; n++) {
-        /* Kick right away to begin processing requests already in vring */
-        VirtQueue *vq = &vdev->vq[n];
-        if (!vq->vring.num) {
-            continue;
+    if (!vdev->disable_ioeventfd_handler) {
+        for (n = 0; n < VIRTIO_QUEUE_MAX; n++) {
+            /* Kick right away to begin processing requests already in vring */
+            VirtQueue *vq = &vdev->vq[n];
+            if (!vq->vring.num) {
+                continue;
+            }
+            event_notifier_set(&vq->host_notifier);
         }
-        event_notifier_set(&vq->host_notifier);
     }
     memory_region_transaction_commit();
     return 0;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b31c450..b6ce5f0 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -105,6 +105,8 @@ struct VirtIODevice
     VMChangeStateEntry *vmstate;
     char *bus_name;
     uint8_t device_endian;
+    /* backend does not support userspace handler */
+    bool disable_ioeventfd_handler;
     bool use_guest_notifier_mask;
     AddressSpace *dma_as;
     QLIST_HEAD(, VirtQueue) *vector_queues;
-- 
1.8.3.1




* [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-03-30  6:33 [PATCH 0/7] vhost-vdpa multiqueue fixes Si-Wei Liu
                   ` (2 preceding siblings ...)
  2022-03-30  6:33 ` [PATCH 3/7] virtio-net: Only enable userland vq if using tap backend Si-Wei Liu
@ 2022-03-30  6:33 ` Si-Wei Liu
  2022-03-30  9:14   ` Jason Wang
  2022-03-30  6:33 ` [PATCH 5/7] vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa Si-Wei Liu
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30  6:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: si-wei.liu, eperezma, jasowang, eli, mst

The previous commit prevents vhost-user and vhost-vdpa from using the
userland vq handler via disable_ioeventfd_handler. The same needs to be
done for host notifier cleanup too, as the virtio_queue_host_notifier_read
handler would still read a pending event left behind on the ioeventfd and
attempt to handle outstanding kicks from the QEMU userland vq.

If the vq handler is not disabled on cleanup, this may lead to a sigsegv
with a recursive virtio_net_set_status call on the control vq:

0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
   at ../hw/virtio/virtio-pci.c:974
8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
   at ../hw/net/vhost_net.c:361
10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
   at ../softmmu/memory.c:492
15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
   at ../softmmu/memory.c:1504
17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at ../../../include/qemu/host-utils.h:165
18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
   at ../softmmu/physmem.c:2914
20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
   attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
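
As a rough sketch of the mechanism (an assumed shape of the read path, not
code copied from the tree): a kick that was signalled on the ioeventfd but
never consumed by the now-stopped vhost backend gets picked up at cleanup
time and handed to the userspace handler, along the lines of:

    /* Sketch only -- illustrates why cleanup must skip the read when the
     * userland handler is disabled. */
    if (event_notifier_test_and_clear(notifier)) {
        /* Runs the userland handle_output of the queue; for the ctrl vq
         * this can re-enter virtio_net_set_status() as described above. */
        virtio_queue_notify_vq(vq);
    }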

Fixes: 4023784 ("vhost-vdpa: multiqueue support")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/virtio-bus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 0f69d1c..3159b58 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -311,7 +311,8 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
     /* Test and clear notifier after disabling event,
      * in case poll callback didn't have time to run.
      */
-    virtio_queue_host_notifier_read(notifier);
+    if (!vdev->disable_ioeventfd_handler)
+        virtio_queue_host_notifier_read(notifier);
     event_notifier_cleanup(notifier);
 }
 
-- 
1.8.3.1




* [PATCH 5/7] vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
  2022-03-30  6:33 [PATCH 0/7] vhost-vdpa multiqueue fixes Si-Wei Liu
                   ` (3 preceding siblings ...)
  2022-03-30  6:33 ` [PATCH 4/7] virtio: don't read pending event on host notifier if disabled Si-Wei Liu
@ 2022-03-30  6:33 ` Si-Wei Liu
  2022-03-30  9:15   ` Jason Wang
  2022-03-30  6:33 ` [PATCH 6/7] vhost-net: fix improper cleanup in vhost_net_start Si-Wei Liu
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30  6:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: si-wei.liu, eperezma, jasowang, eli, mst

... such that no memory is leaked through dangling net clients in case
of error.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 1e9fe47..df1e69e 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -306,7 +306,9 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 
 err:
     if (i) {
-        qemu_del_net_client(ncs[0]);
+        for (i--; i >= 0; i--) {
+            qemu_del_net_client(ncs[i]);
+        }
     }
     qemu_close(vdpa_device_fd);
 
-- 
1.8.3.1




* [PATCH 6/7] vhost-net: fix improper cleanup in vhost_net_start
  2022-03-30  6:33 [PATCH 0/7] vhost-vdpa multiqueue fixes Si-Wei Liu
                   ` (4 preceding siblings ...)
  2022-03-30  6:33 ` [PATCH 5/7] vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa Si-Wei Liu
@ 2022-03-30  6:33 ` Si-Wei Liu
  2022-03-30  9:30   ` Jason Wang
  2022-03-30  6:33 ` [PATCH 7/7] vhost-vdpa: backend feature should set only once Si-Wei Liu
  2022-04-27  4:28 ` [PATCH 0/7] vhost-vdpa multiqueue fixes Jason Wang
  7 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30  6:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: si-wei.liu, eperezma, jasowang, eli, mst

vhost_net_start() was missing a corresponding stop_one() upon error from
vhost_set_vring_enable(). While at it, make the error handling at
err_start more robust. No real issue was found due to this, though.
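
For reference, an illustrative restatement of the hunk below (not new code):
the fix assumes that slots 0..data_queue_pairs-1 of ncs[] hold the data queue
pairs actually in use, while the control vq's net client sits at slot
n->max_queue_pairs, so the unwinding loop has to pick the peer like this:

    /* Hypothetical helper name; the body matches the hunk below. */
    static NetClientState *err_path_peer(NetClientState *ncs, VirtIONet *n,
                                         int i, int data_queue_pairs)
    {
        return qemu_get_peer(ncs, i < data_queue_pairs ?
                                  i : n->max_queue_pairs);
    }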

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/net/vhost_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..d6d7c51 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -381,6 +381,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
             r = vhost_set_vring_enable(peer, peer->vring_enable);
 
             if (r < 0) {
+                vhost_net_stop_one(get_vhost_net(peer), dev);
                 goto err_start;
             }
         }
@@ -390,7 +391,8 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
 err_start:
     while (--i >= 0) {
-        peer = qemu_get_peer(ncs , i);
+        peer = qemu_get_peer(ncs, i < data_queue_pairs ?
+                                  i : n->max_queue_pairs);
         vhost_net_stop_one(get_vhost_net(peer), dev);
     }
     e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
-- 
1.8.3.1




* [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30  6:33 [PATCH 0/7] vhost-vdpa multiqueue fixes Si-Wei Liu
                   ` (5 preceding siblings ...)
  2022-03-30  6:33 ` [PATCH 6/7] vhost-net: fix improper cleanup in vhost_net_start Si-Wei Liu
@ 2022-03-30  6:33 ` Si-Wei Liu
  2022-03-30  9:28   ` Jason Wang
                     ` (2 more replies)
  2022-04-27  4:28 ` [PATCH 0/7] vhost-vdpa multiqueue fixes Jason Wang
  7 siblings, 3 replies; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30  6:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: si-wei.liu, eperezma, jasowang, eli, mst

The vhost_vdpa_one_time_request() branch in
vhost_vdpa_set_backend_cap() incorrectly sends down
iotls on vhost_dev with non-zero index. This may
end up with multiple VHOST_SET_BACKEND_FEATURES
ioctl calls sent down on the vhost-vdpa fd that is
shared between all these vhost_dev's.

To fix it, send down the ioctl only once, via the first
vhost_dev with index 0. Toggling the polarity of the
vhost_vdpa_one_time_request() test does the trick.
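
Put differently, this relies on the assumption (illustrative only; the real
helper lives elsewhere in hw/virtio/vhost-vdpa.c and is not shown in this
patch) that vhost_vdpa_one_time_request() returns true for every vhost_dev
except the first one, i.e. for those on which a one-time request must not
be repeated:

    /* Assumed semantics, for illustration -- not the actual implementation. */
    static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
    {
        /* Only the vhost_dev owning vq index 0 issues one-time requests. */
        return dev->vq_index != 0;
    }

With that, flipping the test to !vhost_vdpa_one_time_request(dev) makes the
VHOST_SET_BACKEND_FEATURES ioctl go down exactly once on the shared fd.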

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c5ed7a3..27ea706 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 
     features &= f;
 
-    if (vhost_vdpa_one_time_request(dev)) {
+    if (!vhost_vdpa_one_time_request(dev)) {
         r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
         if (r) {
             return -EFAULT;
-- 
1.8.3.1




* Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
  2022-03-30  6:33 ` [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa Si-Wei Liu
@ 2022-03-30  9:00   ` Jason Wang
  2022-03-30 15:47     ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-03-30  9:00 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> With MQ enabled vdpa device and non-MQ supporting guest e.g.
> booting vdpa with mq=on over OVMF of single vqp, below assert
> failure is seen:
>
> ../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
>
> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>    at ../hw/virtio/virtio-pci.c:974
> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>    at ../hw/net/vhost_net.c:361
> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
>    at ../softmmu/memory.c:492
> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
>    at ../softmmu/memory.c:1504
> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>    at ../softmmu/physmem.c:2914
> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
>    attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>
> The cause for the assert failure is due to that the vhost_dev index
> for the ctrl vq was not aligned with actual one in use by the guest.
> Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
> if guest doesn't support multiqueue, the guest vq layout would shrink
> to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
> This results in ctrl_vq taking a different vhost_dev group index than
> the default. We can map vq to the correct vhost_dev group by checking
> if MQ is supported by guest and successfully negotiated. Since the
> MQ feature is only present along with CTRL_VQ, we make sure the index
> 2 is only meant for the control vq while MQ is not supported by guest.
>
> Be noted if QEMU or guest doesn't support control vq, there's no bother
> exposing vhost_dev and guest notifier for the control vq. Since
> vhost_net_start/stop implies DRIVER_OK is set in device status, feature
> negotiation should be completed when reaching virtio_net_vhost_status().
>
> Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/net/virtio-net.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 1067e72..484b215 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -245,7 +245,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
>      NetClientState *nc = qemu_get_queue(n->nic);
>      int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> -    int cvq = n->max_ncs - n->max_queue_pairs;
> +    int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> +              n->max_ncs - n->max_queue_pairs : 0;

Let's use a separate patch for this.

>
>      if (!get_vhost_net(nc->peer)) {
>          return;
> @@ -3170,8 +3171,14 @@ static NetClientInfo net_virtio_info = {
>  static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
>  {
>      VirtIONet *n = VIRTIO_NET(vdev);
> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
> +    NetClientState *nc;
>      assert(n->vhost_started);
> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));

This assert seems guest trigger-able. If yes, I would remove this or
replace it with log_guest_error.

> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
> +    } else {
> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
> +    }
>      return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
>  }
>
> @@ -3179,8 +3186,14 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
>                                             bool mask)
>  {
>      VirtIONet *n = VIRTIO_NET(vdev);
> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
> +    NetClientState *nc;
>      assert(n->vhost_started);
> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));

And this.

Thanks


> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
> +    } else {
> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
> +    }
>      vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
>                               vdev, idx, mask);
>  }
> --
> 1.8.3.1
>




* Re: [PATCH 2/7] virtio-net: Fix indentation
  2022-03-30  6:33 ` [PATCH 2/7] virtio-net: Fix indentation Si-Wei Liu
@ 2022-03-30  9:01   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-03-30  9:01 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> From: Eugenio Pérez <eperezma@redhat.com>
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>  hw/net/virtio-net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 484b215..ffaf481 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -3523,7 +3523,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>      nc = qemu_get_queue(n->nic);
>      nc->rxfilter_notify_enabled = 1;
>
> -   if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> +    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
>          struct virtio_net_config netcfg = {};
>          memcpy(&netcfg.mac, &n->nic_conf.macaddr, ETH_ALEN);
>          vhost_net_set_config(get_vhost_net(nc->peer),
> --
> 1.8.3.1
>




* Re: [PATCH 3/7] virtio-net: Only enable userland vq if using tap backend
  2022-03-30  6:33 ` [PATCH 3/7] virtio-net: Only enable userland vq if using tap backend Si-Wei Liu
@ 2022-03-30  9:07   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-03-30  9:07 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> From: Eugenio Pérez <eperezma@redhat.com>
>
> Qemu falls back on userland handlers even if vhost-user and vhost-vdpa
> cases. These assumes a tap device can handle the packets.
>
> If a vdpa device fail to start, it can trigger a sigsegv because of
> that. Do not resort on them unless actually possible.

What kind of sigsegv have you seen? If my memory is correct, we finally
chose to have a dummy receive() for vhost-vDPA in

commit 846a1e85da646c6006db429648389fc110f92d75
Author: Eugenio Pérez <eperezma@redhat.com>
Date:   Thu Nov 25 11:16:13 2021 +0100

    vdpa: Add dummy receive callback

    Qemu falls back on userland handlers even if vhost-user and vhost-vdpa
    cases. These assumes a tap device can handle the packets.

    If a vdpa device fail to start, it can trigger a sigsegv because of
    that. Add dummy receiver that returns no progress so it can keep
    running.

    Fixes: 1e0a84ea49 ("vhost-vdpa: introduce vhost-vdpa net client")
    Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
    Message-Id: <20211125101614.76927-2-eperezma@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>

(Technically, we can have a vhost-vDPA networking backend (not vhost backend))

>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/net/virtio-net.c        |  4 ++++
>  hw/virtio/virtio.c         | 21 +++++++++++++--------
>  include/hw/virtio/virtio.h |  2 ++
>  3 files changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index ffaf481..9cdf777 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -3523,6 +3523,10 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>      nc = qemu_get_queue(n->nic);
>      nc->rxfilter_notify_enabled = 1;
>
> +    if (!nc->peer || nc->peer->info->type != NET_CLIENT_DRIVER_TAP) {
> +        /* Only tap can use userspace networking */
> +        vdev->disable_ioeventfd_handler = true;
> +    }
>      if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
>          struct virtio_net_config netcfg = {};
>          memcpy(&netcfg.mac, &n->nic_conf.macaddr, ETH_ALEN);
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 9d637e0..806603b 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3708,17 +3708,22 @@ static int virtio_device_start_ioeventfd_impl(VirtIODevice *vdev)
>              err = r;
>              goto assign_error;
>          }
> -        event_notifier_set_handler(&vq->host_notifier,
> -                                   virtio_queue_host_notifier_read);
> +
> +        if (!vdev->disable_ioeventfd_handler) {
> +            event_notifier_set_handler(&vq->host_notifier,
> +                                       virtio_queue_host_notifier_read);
> +        }
>      }
>
> -    for (n = 0; n < VIRTIO_QUEUE_MAX; n++) {
> -        /* Kick right away to begin processing requests already in vring */
> -        VirtQueue *vq = &vdev->vq[n];
> -        if (!vq->vring.num) {
> -            continue;
> +    if (!vdev->disable_ioeventfd_handler) {
> +        for (n = 0; n < VIRTIO_QUEUE_MAX; n++) {
> +            /* Kick right away to begin processing requests already in vring */
> +            VirtQueue *vq = &vdev->vq[n];
> +            if (!vq->vring.num) {
> +                continue;
> +            }
> +            event_notifier_set(&vq->host_notifier);
>          }
> -        event_notifier_set(&vq->host_notifier);
>      }
>      memory_region_transaction_commit();
>      return 0;
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b31c450..b6ce5f0 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -105,6 +105,8 @@ struct VirtIODevice
>      VMChangeStateEntry *vmstate;
>      char *bus_name;
>      uint8_t device_endian;
> +    /* backend does not support userspace handler */
> +    bool disable_ioeventfd_handler;
>      bool use_guest_notifier_mask;
>      AddressSpace *dma_as;
>      QLIST_HEAD(, VirtQueue) *vector_queues;
> --
> 1.8.3.1
>




* Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-03-30  6:33 ` [PATCH 4/7] virtio: don't read pending event on host notifier if disabled Si-Wei Liu
@ 2022-03-30  9:14   ` Jason Wang
  2022-03-30 16:40     ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-03-30  9:14 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Previous commit prevents vhost-user and vhost-vdpa from using
> userland vq handler via disable_ioeventfd_handler. The same
> needs to be done for host notifier cleanup too, as the
> virtio_queue_host_notifier_read handler still tends to read
> pending event left behind on ioeventfd and attempts to handle
> outstanding kicks from QEMU userland vq.
>
> If vq handler is not disabled on cleanup, it may lead to sigsegv
> with recursive virtio_net_set_status call on the control vq:
>
> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557

I feel it's probably a bug elsewhere, e.g. when we fail to start
vhost-vDPA it's QEMU's job to poll the host notifier and we will fall
back to the userspace vq handler.

Thanks

> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>    at ../hw/virtio/virtio-pci.c:974
> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>    at ../hw/net/vhost_net.c:361
> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
>    at ../softmmu/memory.c:492
> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
>    at ../softmmu/memory.c:1504
> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at ../../../include/qemu/host-utils.h:165
> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>    at ../softmmu/physmem.c:2914
> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
>    attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>
> Fixes: 4023784 ("vhost-vdpa: multiqueue support")
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/virtio-bus.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index 0f69d1c..3159b58 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -311,7 +311,8 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
>      /* Test and clear notifier after disabling event,
>       * in case poll callback didn't have time to run.
>       */
> -    virtio_queue_host_notifier_read(notifier);
> +    if (!vdev->disable_ioeventfd_handler)
> +        virtio_queue_host_notifier_read(notifier);
>      event_notifier_cleanup(notifier);
>  }
>
> --
> 1.8.3.1
>




* Re: [PATCH 5/7] vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
  2022-03-30  6:33 ` [PATCH 5/7] vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa Si-Wei Liu
@ 2022-03-30  9:15   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-03-30  9:15 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> ... such that no memory leaks on dangling net clients in case of
> error.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>  net/vhost-vdpa.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 1e9fe47..df1e69e 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -306,7 +306,9 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>
>  err:
>      if (i) {
> -        qemu_del_net_client(ncs[0]);
> +        for (i--; i >= 0; i--) {
> +            qemu_del_net_client(ncs[i]);
> +        }
>      }
>      qemu_close(vdpa_device_fd);
>
> --
> 1.8.3.1
>




* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30  6:33 ` [PATCH 7/7] vhost-vdpa: backend feature should set only once Si-Wei Liu
@ 2022-03-30  9:28   ` Jason Wang
  2022-03-30 16:24   ` Stefano Garzarella
  2022-03-30 19:01   ` Eugenio Perez Martin
  2 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-03-30  9:28 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> The vhost_vdpa_one_time_request() branch in
> vhost_vdpa_set_backend_cap() incorrectly sends down
> iotls on vhost_dev with non-zero index. This may
> end up with multiple VHOST_SET_BACKEND_FEATURES
> ioctl calls sent down on the vhost-vdpa fd that is
> shared between all these vhost_dev's.
>
> To fix it, send down ioctl only once via the first
> vhost_dev with index 0. Toggle the polarity of the
> vhost_vdpa_one_time_request() test would do the trick.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>  hw/virtio/vhost-vdpa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index c5ed7a3..27ea706 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>
>      features &= f;
>
> -    if (vhost_vdpa_one_time_request(dev)) {
> +    if (!vhost_vdpa_one_time_request(dev)) {
>          r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>          if (r) {
>              return -EFAULT;
> --
> 1.8.3.1
>




* Re: [PATCH 6/7] vhost-net: fix improper cleanup in vhost_net_start
  2022-03-30  6:33 ` [PATCH 6/7] vhost-net: fix improper cleanup in vhost_net_start Si-Wei Liu
@ 2022-03-30  9:30   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-03-30  9:30 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> vhost_net_start() missed a corresponding stop_one() upon error from
> vhost_set_vring_enable(). While at it, make the error handling for
> err_start more robust. No real issue was found due to this though.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

>  hw/net/vhost_net.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 30379d2..d6d7c51 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -381,6 +381,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>              r = vhost_set_vring_enable(peer, peer->vring_enable);
>
>              if (r < 0) {
> +                vhost_net_stop_one(get_vhost_net(peer), dev);
>                  goto err_start;
>              }
>          }
> @@ -390,7 +391,8 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>
>  err_start:
>      while (--i >= 0) {
> -        peer = qemu_get_peer(ncs , i);
> +        peer = qemu_get_peer(ncs, i < data_queue_pairs ?
> +                                  i : n->max_queue_pairs);
>          vhost_net_stop_one(get_vhost_net(peer), dev);
>      }
>      e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
> --
> 1.8.3.1
>




* Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
  2022-03-30  9:00   ` Jason Wang
@ 2022-03-30 15:47     ` Si-Wei Liu
  2022-03-31  8:39       ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30 15:47 UTC (permalink / raw)
  To: Jason Wang; +Cc: eperezma, Eli Cohen, qemu-devel, mst



On 3/30/2022 2:00 AM, Jason Wang wrote:
> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> With MQ enabled vdpa device and non-MQ supporting guest e.g.
>> booting vdpa with mq=on over OVMF of single vqp, below assert
>> failure is seen:
>>
>> ../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
>>
>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>>     at ../hw/virtio/virtio-pci.c:974
>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
>> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>>     at ../hw/net/vhost_net.c:361
>> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
>> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
>> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
>> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
>>     at ../softmmu/memory.c:492
>> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
>> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
>>     at ../softmmu/memory.c:1504
>> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>>     at ../softmmu/physmem.c:2914
>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
>>     attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
>> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>>
>> The cause for the assert failure is due to that the vhost_dev index
>> for the ctrl vq was not aligned with actual one in use by the guest.
>> Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
>> if guest doesn't support multiqueue, the guest vq layout would shrink
>> to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
>> This results in ctrl_vq taking a different vhost_dev group index than
>> the default. We can map vq to the correct vhost_dev group by checking
>> if MQ is supported by guest and successfully negotiated. Since the
>> MQ feature is only present along with CTRL_VQ, we make sure the index
>> 2 is only meant for the control vq while MQ is not supported by guest.
>>
>> Be noted if QEMU or guest doesn't support control vq, there's no bother
>> exposing vhost_dev and guest notifier for the control vq. Since
>> vhost_net_start/stop implies DRIVER_OK is set in device status, feature
>> negotiation should be completed when reaching virtio_net_vhost_status().
>>
>> Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
>> Suggested-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>>   hw/net/virtio-net.c | 19 ++++++++++++++++---
>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 1067e72..484b215 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -245,7 +245,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>       VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>       NetClientState *nc = qemu_get_queue(n->nic);
>>       int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
>> -    int cvq = n->max_ncs - n->max_queue_pairs;
>> +    int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
>> +              n->max_ncs - n->max_queue_pairs : 0;
> Let's use a separate patch for this.
Yes, I can do that. Then the new patch will become a prerequisite for
this patch.

>
>>       if (!get_vhost_net(nc->peer)) {
>>           return;
>> @@ -3170,8 +3171,14 @@ static NetClientInfo net_virtio_info = {
>>   static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
>>   {
>>       VirtIONet *n = VIRTIO_NET(vdev);
>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
>> +    NetClientState *nc;
>>       assert(n->vhost_started);
>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
> This assert seems guest trigger-able. If yes, I would remove this or
> replace it with log_guest_error.
This assert is actually tied to the cvq change in
virtio_net_vhost_status(): since the same check on VIRTIO_NET_F_CTRL_VQ
has been done earlier, CTRL_VQ is guaranteed to have been negotiated by
the time we get here.
Note that vhost_started is asserted in the same function, which in turn
implies DRIVER_OK is set, meaning feature negotiation is complete. I
can't easily think of a scenario in which the guest could inadvertently
or purposely trigger the assert. Can you?

-Siwei

>
>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
>> +    } else {
>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
>> +    }
>>       return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
>>   }
>>
>> @@ -3179,8 +3186,14 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
>>                                              bool mask)
>>   {
>>       VirtIONet *n = VIRTIO_NET(vdev);
>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
>> +    NetClientState *nc;
>>       assert(n->vhost_started);
>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
> And this.
>
> Thanks
>
>
>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
>> +    } else {
>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
>> +    }
>>       vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
>>                                vdev, idx, mask);
>>   }
>> --
>> 1.8.3.1
>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30  6:33 ` [PATCH 7/7] vhost-vdpa: backend feature should set only once Si-Wei Liu
  2022-03-30  9:28   ` Jason Wang
@ 2022-03-30 16:24   ` Stefano Garzarella
  2022-03-30 17:12     ` Si-Wei Liu
  2022-03-30 19:01   ` Eugenio Perez Martin
  2 siblings, 1 reply; 50+ messages in thread
From: Stefano Garzarella @ 2022-03-30 16:24 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, jasowang, mst, qemu-devel, eli

On Tue, Mar 29, 2022 at 11:33:17PM -0700, Si-Wei Liu wrote:
>The vhost_vdpa_one_time_request() branch in
>vhost_vdpa_set_backend_cap() incorrectly sends down
>iotls on vhost_dev with non-zero index. This may

Little typo s/iotls/ioctls

>end up with multiple VHOST_SET_BACKEND_FEATURES
>ioctl calls sent down on the vhost-vdpa fd that is
>shared between all these vhost_dev's.
>
>To fix it, send down ioctl only once via the first
>vhost_dev with index 0. Toggle the polarity of the
>vhost_vdpa_one_time_request() test would do the trick.
>
>Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>---
> hw/virtio/vhost-vdpa.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>index c5ed7a3..27ea706 100644
>--- a/hw/virtio/vhost-vdpa.c
>+++ b/hw/virtio/vhost-vdpa.c
>@@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>
>     features &= f;
>
>-    if (vhost_vdpa_one_time_request(dev)) {
>+    if (!vhost_vdpa_one_time_request(dev)) {
>         r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>         if (r) {
>             return -EFAULT;
>-- 
>1.8.3.1
>
>

With that:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>



Unrelated to this patch, but the name vhost_vdpa_one_time_request() is 
confusing IMHO.

Not that I'm good with names, but how about we change it to 
vhost_vdpa_skip_one_time_request()?

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-03-30  9:14   ` Jason Wang
@ 2022-03-30 16:40     ` Si-Wei Liu
  2022-03-31  8:36       ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30 16:40 UTC (permalink / raw)
  To: Jason Wang; +Cc: eperezma, Eli Cohen, qemu-devel, mst



On 3/30/2022 2:14 AM, Jason Wang wrote:
> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> Previous commit prevents vhost-user and vhost-vdpa from using
>> userland vq handler via disable_ioeventfd_handler. The same
>> needs to be done for host notifier cleanup too, as the
>> virtio_queue_host_notifier_read handler still tends to read
>> pending event left behind on ioeventfd and attempts to handle
>> outstanding kicks from QEMU userland vq.
>>
>> If vq handler is not disabled on cleanup, it may lead to sigsegv
>> with recursive virtio_net_set_status call on the control vq:
>>
>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
> I feel it's probably a bug elsewhere e.g when we fail to start
> vhost-vDPA, it's the charge of the Qemu to poll host notifier and we
> will fallback to the userspace vq handler.
Apologies, an incorrect stack trace was pasted; it actually came from
patch #1. I will post a v2 with the correct trace, shown below:

0  0x000055f800df1780 in qdev_get_parent_bus (dev=0x0) at 
../hw/core/qdev.c:376
1  0x000055f800c68ad8 in virtio_bus_device_iommu_enabled 
(vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
2  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at 
../hw/virtio/vhost.c:318
3  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>, 
buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at 
../hw/virtio/vhost.c:336
4  0x000055f800d71867 in vhost_virtqueue_stop 
(dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590, 
vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
5  0x000055f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30, 
vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
6  0x000055f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30, 
dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
7  0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590, 
ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7, 
cvq=cvq@entry=1)
    at ../hw/net/vhost_net.c:423
8  0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>, 
n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
9  0x000055f800d4e628 in virtio_net_set_status 
(vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at 
../hw/net/virtio-net.c:370
10 0x000055f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized 
out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at 
../hw/net/virtio-net.c:1408
11 0x000055f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590, 
vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
12 0x000055f800d69f37 in virtio_queue_host_notifier_read 
(vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
13 0x000055f800d69f37 in virtio_queue_host_notifier_read 
(n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
14 0x000055f800c688e6 in virtio_bus_cleanup_host_notifier 
(bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
15 0x000055f800d73106 in vhost_dev_disable_notifiers 
(hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
    at ../../../include/hw/virtio/virtio-bus.h:35
16 0x000055f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0, 
dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
17 0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590, 
ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7, 
cvq=cvq@entry=1)
    at ../hw/net/vhost_net.c:423
18 0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>, 
n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
19 0x000055f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590, 
status=15 '\017') at ../hw/net/virtio-net.c:370
20 0x000055f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590, 
val=<optimized out>) at ../hw/virtio/virtio.c:1945
21 0x000055f800d11d9d in vm_state_notify (running=running@entry=false, 
state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
22 0x000055f800d04e7a in do_vm_stop 
(state=state@entry=RUN_STATE_SHUTDOWN, send_stop=send_stop@entry=false) 
at ../softmmu/cpus.c:262
23 0x000055f800d04e99 in vm_shutdown () at ../softmmu/cpus.c:280
24 0x000055f800d126af in qemu_cleanup () at ../softmmu/runstate.c:812
25 0x000055f800ad5b13 in main (argc=<optimized out>, argv=<optimized 
out>, envp=<optimized out>) at ../softmmu/main.c:51

From the trace, the pending read only occurs in the stop path. The
recursive virtio_net_set_status call from virtio_net_handle_ctrl doesn't
make sense to me.
I'm not sure I get why we need to handle the pending host notification
in the userland vq; can you elaborate?

Thanks,
-Siwei

>
> Thanks
>
>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>>     at ../hw/virtio/virtio-pci.c:974
>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
>> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>>     at ../hw/net/vhost_net.c:361
>> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
>> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
>> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
>> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
>>     at ../softmmu/memory.c:492
>> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
>> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
>>     at ../softmmu/memory.c:1504
>> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at ../../../include/qemu/host-utils.h:165
>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>>     at ../softmmu/physmem.c:2914
>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
>>     attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
>> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>>
>> Fixes: 4023784 ("vhost-vdpa: multiqueue support")
>> Cc: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>>   hw/virtio/virtio-bus.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
>> index 0f69d1c..3159b58 100644
>> --- a/hw/virtio/virtio-bus.c
>> +++ b/hw/virtio/virtio-bus.c
>> @@ -311,7 +311,8 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
>>       /* Test and clear notifier after disabling event,
>>        * in case poll callback didn't have time to run.
>>        */
>> -    virtio_queue_host_notifier_read(notifier);
>> +    if (!vdev->disable_ioeventfd_handler)
>> +        virtio_queue_host_notifier_read(notifier);
>>       event_notifier_cleanup(notifier);
>>   }
>>
>> --
>> 1.8.3.1
>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30 16:24   ` Stefano Garzarella
@ 2022-03-30 17:12     ` Si-Wei Liu
  2022-03-30 17:32       ` Stefano Garzarella
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30 17:12 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: eperezma, jasowang, mst, qemu-devel, eli



On 3/30/2022 9:24 AM, Stefano Garzarella wrote:
> On Tue, Mar 29, 2022 at 11:33:17PM -0700, Si-Wei Liu wrote:
>> The vhost_vdpa_one_time_request() branch in
>> vhost_vdpa_set_backend_cap() incorrectly sends down
>> iotls on vhost_dev with non-zero index. This may
>
> Little typo s/iotls/ioctls
Thanks! Will correct it in v2.

>
>> end up with multiple VHOST_SET_BACKEND_FEATURES
>> ioctl calls sent down on the vhost-vdpa fd that is
>> shared between all these vhost_dev's.
>>
>> To fix it, send down ioctl only once via the first
>> vhost_dev with index 0. Toggle the polarity of the
>> vhost_vdpa_one_time_request() test would do the trick.
>>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>> hw/virtio/vhost-vdpa.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index c5ed7a3..27ea706 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct 
>> vhost_dev *dev)
>>
>>     features &= f;
>>
>> -    if (vhost_vdpa_one_time_request(dev)) {
>> +    if (!vhost_vdpa_one_time_request(dev)) {
>>         r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>>         if (r) {
>>             return -EFAULT;
>> -- 
>> 1.8.3.1
>>
>>
>
> With that:
>
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
>
>
> Unrelated to this patch, but the name vhost_vdpa_one_time_request() is 
> confusing IMHO.
Coincidentally I got the same feeling and wanted to rename it to 
something else, too.

>
> Not that I'm good with names, but how about we change it to 
> vhost_vdpa_skip_one_time_request()?
How about vhost_vdpa_request_already_applied()? It seems more readable
in this context.

-Siwei

>
> Thanks,
> Stefano
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30 17:12     ` Si-Wei Liu
@ 2022-03-30 17:32       ` Stefano Garzarella
  2022-03-30 18:27         ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Stefano Garzarella @ 2022-03-30 17:32 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, jasowang, mst, qemu-devel, eli

On Wed, Mar 30, 2022 at 10:12:42AM -0700, Si-Wei Liu wrote:
>
>
>On 3/30/2022 9:24 AM, Stefano Garzarella wrote:
>>On Tue, Mar 29, 2022 at 11:33:17PM -0700, Si-Wei Liu wrote:
>>>The vhost_vdpa_one_time_request() branch in
>>>vhost_vdpa_set_backend_cap() incorrectly sends down
>>>iotls on vhost_dev with non-zero index. This may
>>
>>Little typo s/iotls/ioctls
>Thanks! Will correct it in v2.
>
>>
>>>end up with multiple VHOST_SET_BACKEND_FEATURES
>>>ioctl calls sent down on the vhost-vdpa fd that is
>>>shared between all these vhost_dev's.
>>>
>>>To fix it, send down ioctl only once via the first
>>>vhost_dev with index 0. Toggle the polarity of the
>>>vhost_vdpa_one_time_request() test would do the trick.
>>>
>>>Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>---
>>>hw/virtio/vhost-vdpa.c | 2 +-
>>>1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>>diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>index c5ed7a3..27ea706 100644
>>>--- a/hw/virtio/vhost-vdpa.c
>>>+++ b/hw/virtio/vhost-vdpa.c
>>>@@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct 
>>>vhost_dev *dev)
>>>
>>>    features &= f;
>>>
>>>-    if (vhost_vdpa_one_time_request(dev)) {
>>>+    if (!vhost_vdpa_one_time_request(dev)) {
>>>        r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>>>        if (r) {
>>>            return -EFAULT;
>>>-- 
>>>1.8.3.1
>>>
>>>
>>
>>With that:
>>
>>Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>>
>>
>>
>>Unrelated to this patch, but the name vhost_vdpa_one_time_request() 
>>is confusing IMHO.
>Coincidentally I got the same feeling and wanted to rename it to 
>something else, too.
>
>>
>>Not that I'm good with names, but how about we change it to 
>>vhost_vdpa_skip_one_time_request()?
>How about vhost_vdpa_request_already_applied()? seems to be more 
>readable in the context.

That's fine too, except you can't discern that it's a single request per 
device, so maybe I'd add "dev," but I don't know if it gets too long:

vhost_vdpa_dev_request_already_applied()

And I would also add a comment to the function to explain that we use 
that function for requests that only need to be applied once, regardless 
of the number of queues.

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30 17:32       ` Stefano Garzarella
@ 2022-03-30 18:27         ` Eugenio Perez Martin
  2022-03-30 22:44           ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-03-30 18:27 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Si-Wei Liu, Jason Wang, Michael Tsirkin, qemu-level, Eli Cohen

On Wed, Mar 30, 2022 at 7:32 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Mar 30, 2022 at 10:12:42AM -0700, Si-Wei Liu wrote:
> >
> >
> >On 3/30/2022 9:24 AM, Stefano Garzarella wrote:
> >>On Tue, Mar 29, 2022 at 11:33:17PM -0700, Si-Wei Liu wrote:
> >>>The vhost_vdpa_one_time_request() branch in
> >>>vhost_vdpa_set_backend_cap() incorrectly sends down
> >>>iotls on vhost_dev with non-zero index. This may
> >>
> >>Little typo s/iotls/ioctls
> >Thanks! Will correct it in v2.
> >
> >>
> >>>end up with multiple VHOST_SET_BACKEND_FEATURES
> >>>ioctl calls sent down on the vhost-vdpa fd that is
> >>>shared between all these vhost_dev's.
> >>>
> >>>To fix it, send down ioctl only once via the first
> >>>vhost_dev with index 0. Toggle the polarity of the
> >>>vhost_vdpa_one_time_request() test would do the trick.
> >>>
> >>>Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >>>---
> >>>hw/virtio/vhost-vdpa.c | 2 +-
> >>>1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>>diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>>index c5ed7a3..27ea706 100644
> >>>--- a/hw/virtio/vhost-vdpa.c
> >>>+++ b/hw/virtio/vhost-vdpa.c
> >>>@@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct
> >>>vhost_dev *dev)
> >>>
> >>>    features &= f;
> >>>
> >>>-    if (vhost_vdpa_one_time_request(dev)) {
> >>>+    if (!vhost_vdpa_one_time_request(dev)) {
> >>>        r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
> >>>        if (r) {
> >>>            return -EFAULT;
> >>>--
> >>>1.8.3.1
> >>>
> >>>
> >>
> >>With that:
> >>
> >>Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> >>
> >>
> >>
> >>Unrelated to this patch, but the name vhost_vdpa_one_time_request()
> >>is confusing IMHO.
> >Coincidentally I got the same feeling and wanted to rename it to
> >something else, too.
> >
> >>
> >>Not that I'm good with names, but how about we change it to
> >>vhost_vdpa_skip_one_time_request()?
> >How about vhost_vdpa_request_already_applied()? seems to be more
> >readable in the context.
>
> That's fine too, except you can't discern that it's a single request per
> device, so maybe I'd add "dev," but I don't know if it gets too long:
>
> vhost_vdpa_dev_request_already_applied()
>
> And I would also add a comment to the function to explain that we use
> that function for requests that only need to be applied once, regardless
> of the number of queues.
>

In my opinion it should express what it checks: vhost_vdpa_first, or
vhost_vdpa_first_dev, vhost_vdpa_first_queue... and add a comment like
the one you propose. I think the context of the caller gives enough
information.

I would add that the helper is for requests that only need / must be
applied once, *and before setting up the queues*, *at the beginning of
operation*, or similar, because we do a similar check against
dev->vq_index_end and the two are not interchangeable.
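For instance, a rough sketch of what I mean; the name and the comment
wording are only illustrative, not a concrete proposal:

    /*
     * The requests guarded by this check must be issued only once per
     * device and before any virtqueue is set up, since the vhost-vdpa
     * fd is shared by all the vhost_dev instances of the same device.
     */
    static bool vhost_vdpa_first_dev(struct vhost_dev *dev)
    {
        return dev->vq_index == 0;
    }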

Thanks!

> Thanks,
> Stefano
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30  6:33 ` [PATCH 7/7] vhost-vdpa: backend feature should set only once Si-Wei Liu
  2022-03-30  9:28   ` Jason Wang
  2022-03-30 16:24   ` Stefano Garzarella
@ 2022-03-30 19:01   ` Eugenio Perez Martin
  2022-03-30 23:03     ` Si-Wei Liu
  2 siblings, 1 reply; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-03-30 19:01 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: Jason Wang, Eli Cohen, qemu-level, Michael Tsirkin

On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> The vhost_vdpa_one_time_request() branch in
> vhost_vdpa_set_backend_cap() incorrectly sends down
> iotls on vhost_dev with non-zero index. This may
> end up with multiple VHOST_SET_BACKEND_FEATURES
> ioctl calls sent down on the vhost-vdpa fd that is
> shared between all these vhost_dev's.
>

Not only that. This means that qemu thinks the device supports iotlb
batching as long as the device does not have a cvq. If vdpa does not
support batching, it will return an error later with no possibility of
recovering. Some open questions:

Should we make the vdpa driver return an error as long as a feature is
used but not set by qemu, or leave it undefined? I guess we have to
keep accepting the batching hints without checking, at least, so the
kernel keeps supporting old versions of qemu.
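To make the first option concrete, the kind of check I have in mind on
the kernel side, in vhost_vdpa_process_iotlb_msg(), would be roughly
(untested sketch, written from memory):

    if ((msg->type == VHOST_IOTLB_BATCH_BEGIN ||
         msg->type == VHOST_IOTLB_BATCH_END) &&
        !vhost_backend_has_feature(dev->vqs[0],
                                   VHOST_BACKEND_F_IOTLB_BATCH))
        return -EINVAL;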

On the other hand, should we return an error if IOTLB_MSG_V2 is not
supported here? We're basically assuming it in other functions.

> To fix it, send down ioctl only once via the first
> vhost_dev with index 0. Toggle the polarity of the
> vhost_vdpa_one_time_request() test would do the trick.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  hw/virtio/vhost-vdpa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index c5ed7a3..27ea706 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>
>      features &= f;
>
> -    if (vhost_vdpa_one_time_request(dev)) {
> +    if (!vhost_vdpa_one_time_request(dev)) {
>          r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>          if (r) {
>              return -EFAULT;
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30 18:27         ` Eugenio Perez Martin
@ 2022-03-30 22:44           ` Si-Wei Liu
  0 siblings, 0 replies; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30 22:44 UTC (permalink / raw)
  To: Eugenio Perez Martin, Stefano Garzarella
  Cc: Jason Wang, Michael Tsirkin, qemu-level, Eli Cohen



On 3/30/2022 11:27 AM, Eugenio Perez Martin wrote:
> On Wed, Mar 30, 2022 at 7:32 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> On Wed, Mar 30, 2022 at 10:12:42AM -0700, Si-Wei Liu wrote:
>>>
>>> On 3/30/2022 9:24 AM, Stefano Garzarella wrote:
>>>> On Tue, Mar 29, 2022 at 11:33:17PM -0700, Si-Wei Liu wrote:
>>>>> The vhost_vdpa_one_time_request() branch in
>>>>> vhost_vdpa_set_backend_cap() incorrectly sends down
>>>>> iotls on vhost_dev with non-zero index. This may
>>>> Little typo s/iotls/ioctls
>>> Thanks! Will correct it in v2.
>>>
>>>>> end up with multiple VHOST_SET_BACKEND_FEATURES
>>>>> ioctl calls sent down on the vhost-vdpa fd that is
>>>>> shared between all these vhost_dev's.
>>>>>
>>>>> To fix it, send down ioctl only once via the first
>>>>> vhost_dev with index 0. Toggle the polarity of the
>>>>> vhost_vdpa_one_time_request() test would do the trick.
>>>>>
>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>> ---
>>>>> hw/virtio/vhost-vdpa.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>> index c5ed7a3..27ea706 100644
>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct
>>>>> vhost_dev *dev)
>>>>>
>>>>>     features &= f;
>>>>>
>>>>> -    if (vhost_vdpa_one_time_request(dev)) {
>>>>> +    if (!vhost_vdpa_one_time_request(dev)) {
>>>>>         r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>>>>>         if (r) {
>>>>>             return -EFAULT;
>>>>> --
>>>>> 1.8.3.1
>>>>>
>>>>>
>>>> With that:
>>>>
>>>> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>
>>>>
>>>>
>>>> Unrelated to this patch, but the name vhost_vdpa_one_time_request()
>>>> is confusing IMHO.
>>> Coincidentally I got the same feeling and wanted to rename it to
>>> something else, too.
>>>
>>>> Not that I'm good with names, but how about we change it to
>>>> vhost_vdpa_skip_one_time_request()?
>>> How about vhost_vdpa_request_already_applied()? seems to be more
>>> readable in the context.
>> That's fine too, except you can't discern that it's a single request per
>> device, so maybe I'd add "dev," but I don't know if it gets too long:
>>
>> vhost_vdpa_dev_request_already_applied()
>>
>> And I would also add a comment to the function to explain that we use
>> that function for requests that only need to be applied once, regardless
>> of the number of queues.
>>
> In my opinion it should express what it checks: vhost_vdpa_first, or
> vhost_vdpa_first_dev, vhost_vdpa_first_queue...
Indeed, the same kind of name came to my mind, i.e. one that reflects
what the function actually checks. It is just that having to flip the
polarity of the vhost_vdpa_one_time_request() test made me adopt another
name. What matches the current code best would be something similar to
vhost_vdpa_dev_other_than_the_first(), though whether to bother with a
helper at all rather than writing the check in place is another question.
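By "writing the check in place" I mean open-coding the test at the call
site, e.g. something like the below in vhost_vdpa_set_backend_cap()
(illustrative only; it is the same condition the helper evaluates today,
with the polarity fixed):

    if (dev->vq_index == 0) {
        r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
        if (r) {
            return -EFAULT;
        }
    }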

>   and to add a comment
> like the one you propose.
Will do.
>   I think the context of the caller gives
> enough information.
>
> I would add that the use is for requests that only need / must be
> applied once *and before setting up queues*, *at the beginning of
> operation*, or similar, because we do a similar check with
> dev->vq_index_end and these are not exchangeable.
Nods. Exactly.

-Siwei
>
> Thanks!
>
>> Thanks,
>> Stefano
>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30 19:01   ` Eugenio Perez Martin
@ 2022-03-30 23:03     ` Si-Wei Liu
  2022-03-31  8:02       ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-30 23:03 UTC (permalink / raw)
  To: Eugenio Perez Martin; +Cc: Jason Wang, Eli Cohen, qemu-level, Michael Tsirkin



On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
> On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> The vhost_vdpa_one_time_request() branch in
>> vhost_vdpa_set_backend_cap() incorrectly sends down
>> iotls on vhost_dev with non-zero index. This may
>> end up with multiple VHOST_SET_BACKEND_FEATURES
>> ioctl calls sent down on the vhost-vdpa fd that is
>> shared between all these vhost_dev's.
>>
> Not only that. This means that qemu thinks the device supports iotlb
> batching as long as the device does not have cvq. If vdpa does not
> support batching, it will return an error later with no possibility of
> doing it ok.
I think the implicit assumption here is that the caller should back off
to where it was if an error occurs, i.e. once the first
vhost_dev_set_features call gets an error, vhost_dev_start() will fail
straight away. Note that the VHOST_SET_BACKEND_FEATURES ioctl is not
per-vq, and it doesn't even need to be. There seems to me to be no
possibility for it to fail in the way assumed here. The catch is that
IOTLB batching is at least a vdpa device level backend feature, if not
per-kernel. Same as IOTLB_MSG_V2.
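Roughly, the flow I'm referring to is the following (simplified, from
memory, not a verbatim quote of hw/virtio/vhost.c):

    /* in vhost_dev_start() */
    r = vhost_dev_set_features(hdev, hdev->log_enabled);
    if (r < 0) {
        goto fail_features;    /* the start sequence aborts right away */
    }

and for vhost-vdpa, vhost_dev_set_features() is what ends up invoking
vhost_vdpa_set_backend_cap() through the backend ops.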

-Siwei

>   Some open questions:
>
> Should we make the vdpa driver return error as long as a feature is
> used but not set by qemu, or let it as undefined? I guess we have to
> keep the batching at least without checking so the kernel supports old
> versions of qemu.
>
> On the other hand, should we return an error if IOTLB_MSG_V2 is not
> supported here? We're basically assuming it in other functions.
>
>> To fix it, send down ioctl only once via the first
>> vhost_dev with index 0. Toggle the polarity of the
>> vhost_vdpa_one_time_request() test would do the trick.
>>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
>
>> ---
>>   hw/virtio/vhost-vdpa.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index c5ed7a3..27ea706 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>>
>>       features &= f;
>>
>> -    if (vhost_vdpa_one_time_request(dev)) {
>> +    if (!vhost_vdpa_one_time_request(dev)) {
>>           r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>>           if (r) {
>>               return -EFAULT;
>> --
>> 1.8.3.1
>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-30 23:03     ` Si-Wei Liu
@ 2022-03-31  8:02       ` Eugenio Perez Martin
  2022-03-31  8:54         ` Jason Wang
  2022-03-31 21:15         ` Si-Wei Liu
  0 siblings, 2 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-03-31  8:02 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: Jason Wang, Eli Cohen, qemu-level, Michael Tsirkin

On Thu, Mar 31, 2022 at 1:03 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
> > On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >> The vhost_vdpa_one_time_request() branch in
> >> vhost_vdpa_set_backend_cap() incorrectly sends down
> >> iotls on vhost_dev with non-zero index. This may
> >> end up with multiple VHOST_SET_BACKEND_FEATURES
> >> ioctl calls sent down on the vhost-vdpa fd that is
> >> shared between all these vhost_dev's.
> >>
> > Not only that. This means that qemu thinks the device supports iotlb
> > batching as long as the device does not have cvq. If vdpa does not
> > support batching, it will return an error later with no possibility of
> > doing it ok.
> I think the implicit assumption here is that the caller should back off
> to where it was if it comes to error i.e. once the first
> vhost_dev_set_features call gets an error, vhost_dev_start() will fail
> straight.

Sorry, I don't follow you here, and maybe my message was not clear enough.

What I meant is that your patch fixes another problem not stated in
the commit message: without it, it is not possible to initialize a net
vdpa device that does not have a cvq and does not support iotlb
batching. Qemu will assume that the device supports batching, so the
write of VHOST_IOTLB_BATCH_BEGIN will fail. I didn't test what happens
next, but it probably cannot continue.

In that regard, this commit needs to be marked as "Fixes: ...", either
("a5bd058 vhost-vdpa: batch updating IOTLB mappings") or maybe better
("4d191cf vhost-vdpa: classify one time request"). We have a
regression if we introduce both, or the second one and the support of
any other backend feature.

> Noted that the VHOST_SET_BACKEND_FEATURES ioctl is not per-vq
> and it doesn't even need to. There seems to me no possibility for it to
> fail in a way as thought here. The capture is that IOTLB batching is at
> least a vdpa device level backend feature, if not per-kernel. Same as
> IOTLB_MSG_V2.
>

At this moment it is per-kernel, yes. With your patch there is no need
to fail because of the lack of _F_IOTLB_BATCH; the code should handle
this case ok.

But if VHOST_GET_BACKEND_FEATURES returns no support for
VHOST_BACKEND_F_IOTLB_MSG_V2, the qemu code will happily send v2
messages anyway. This has nothing to do with the patch; I'm just
noting it here.

In that case, maybe it is better to return something like -ENOTSUP?
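To make the idea concrete, the kind of check I mean in
vhost_vdpa_set_backend_cap(), right after reading the backend features
(untested sketch, for illustration only):

    if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
        return -EFAULT;
    }

    /* Without IOTLB message v2 we cannot talk to this backend at all. */
    if (!(features & (0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2))) {
        return -ENOTSUP;
    }

    features &= f;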

Thanks!

> -Siwei
>
> >   Some open questions:
> >
> > Should we make the vdpa driver return error as long as a feature is
> > used but not set by qemu, or let it as undefined? I guess we have to
> > keep the batching at least without checking so the kernel supports old
> > versions of qemu.
> >
> > On the other hand, should we return an error if IOTLB_MSG_V2 is not
> > supported here? We're basically assuming it in other functions.
> >
> >> To fix it, send down ioctl only once via the first
> >> vhost_dev with index 0. Toggle the polarity of the
> >> vhost_vdpa_one_time_request() test would do the trick.
> >>
> >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > Acked-by: Eugenio Pérez <eperezma@redhat.com>
> >
> >> ---
> >>   hw/virtio/vhost-vdpa.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >> index c5ed7a3..27ea706 100644
> >> --- a/hw/virtio/vhost-vdpa.c
> >> +++ b/hw/virtio/vhost-vdpa.c
> >> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >>
> >>       features &= f;
> >>
> >> -    if (vhost_vdpa_one_time_request(dev)) {
> >> +    if (!vhost_vdpa_one_time_request(dev)) {
> >>           r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
> >>           if (r) {
> >>               return -EFAULT;
> >> --
> >> 1.8.3.1
> >>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-03-30 16:40     ` Si-Wei Liu
@ 2022-03-31  8:36       ` Jason Wang
  2022-04-01 20:37         ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-03-31  8:36 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/30/2022 2:14 AM, Jason Wang wrote:
> > On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >> Previous commit prevents vhost-user and vhost-vdpa from using
> >> userland vq handler via disable_ioeventfd_handler. The same
> >> needs to be done for host notifier cleanup too, as the
> >> virtio_queue_host_notifier_read handler still tends to read
> >> pending event left behind on ioeventfd and attempts to handle
> >> outstanding kicks from QEMU userland vq.
> >>
> >> If vq handler is not disabled on cleanup, it may lead to sigsegv
> >> with recursive virtio_net_set_status call on the control vq:
> >>
> >> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
> >> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
> >> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
> >> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
> >> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
> >> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
> >> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
> > I feel it's probably a bug elsewhere e.g when we fail to start
> > vhost-vDPA, it's the charge of the Qemu to poll host notifier and we
> > will fallback to the userspace vq handler.
> Apologies, an incorrect stack trace was pasted which actually came from
> patch #1. I will post a v2 with the corresponding one as below:
>
> 0  0x000055f800df1780 in qdev_get_parent_bus (dev=0x0) at
> ../hw/core/qdev.c:376
> 1  0x000055f800c68ad8 in virtio_bus_device_iommu_enabled
> (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
> 2  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at
> ../hw/virtio/vhost.c:318
> 3  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>,
> buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at
> ../hw/virtio/vhost.c:336
> 4  0x000055f800d71867 in vhost_virtqueue_stop
> (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590,
> vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
> 5  0x000055f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30,
> vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
> 6  0x000055f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30,
> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
> 7  0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
> cvq=cvq@entry=1)
>     at ../hw/net/vhost_net.c:423
> 8  0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>,
> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
> 9  0x000055f800d4e628 in virtio_net_set_status
> (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at
> ../hw/net/virtio-net.c:370

I don't understand why virtio_net_handle_ctrl() calls virtio_net_set_status()...

> 10 0x000055f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized
> out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at
> ../hw/net/virtio-net.c:1408
> 11 0x000055f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590,
> vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
> 12 0x000055f800d69f37 in virtio_queue_host_notifier_read
> (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
> 13 0x000055f800d69f37 in virtio_queue_host_notifier_read
> (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
> 14 0x000055f800c688e6 in virtio_bus_cleanup_host_notifier
> (bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
> 15 0x000055f800d73106 in vhost_dev_disable_notifiers
> (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
>     at ../../../include/hw/virtio/virtio-bus.h:35
> 16 0x000055f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0,
> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
> 17 0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
> cvq=cvq@entry=1)
>     at ../hw/net/vhost_net.c:423
> 18 0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>,
> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
> 19 0x000055f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590,
> status=15 '\017') at ../hw/net/virtio-net.c:370
> 20 0x000055f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590,
> val=<optimized out>) at ../hw/virtio/virtio.c:1945
> 21 0x000055f800d11d9d in vm_state_notify (running=running@entry=false,
> state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
> 22 0x000055f800d04e7a in do_vm_stop
> (state=state@entry=RUN_STATE_SHUTDOWN, send_stop=send_stop@entry=false)
> at ../softmmu/cpus.c:262
> 23 0x000055f800d04e99 in vm_shutdown () at ../softmmu/cpus.c:280
> 24 0x000055f800d126af in qemu_cleanup () at ../softmmu/runstate.c:812
> 25 0x000055f800ad5b13 in main (argc=<optimized out>, argv=<optimized
> out>, envp=<optimized out>) at ../softmmu/main.c:51
>
>  From the trace pending read only occurs in stop path. The recursive
> virtio_net_set_status from virtio_net_handle_ctrl doesn't make sense to me.

Yes, we need to figure this out to know the root cause.

The code should work for the case when vhost-vDPA fails to start.

> Not sure I got the reason why we need to handle pending host
> notification in userland vq, can you elaborate?

Because vhost-vDPA fails to start, we will "fall back" to a dummy userspace vq handler.

Thanks

>
> Thanks,
> -Siwei
>
> >
> > Thanks
> >
> >> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
> >>     at ../hw/virtio/virtio-pci.c:974
> >> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
> >> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
> >>     at ../hw/net/vhost_net.c:361
> >> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
> >> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
> >> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
> >> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
> >> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
> >>     at ../softmmu/memory.c:492
> >> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
> >> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
> >>     at ../softmmu/memory.c:1504
> >> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at ../../../include/qemu/host-utils.h:165
> >> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
> >> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
> >>     at ../softmmu/physmem.c:2914
> >> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
> >>     attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
> >> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
> >> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
> >> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
> >> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
> >> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
> >>
> >> Fixes: 4023784 ("vhost-vdpa: multiqueue support")
> >> Cc: Jason Wang <jasowang@redhat.com>
> >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >> ---
> >>   hw/virtio/virtio-bus.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> >> index 0f69d1c..3159b58 100644
> >> --- a/hw/virtio/virtio-bus.c
> >> +++ b/hw/virtio/virtio-bus.c
> >> @@ -311,7 +311,8 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
> >>       /* Test and clear notifier after disabling event,
> >>        * in case poll callback didn't have time to run.
> >>        */
> >> -    virtio_queue_host_notifier_read(notifier);
> >> +    if (!vdev->disable_ioeventfd_handler)
> >> +        virtio_queue_host_notifier_read(notifier);
> >>       event_notifier_cleanup(notifier);
> >>   }
> >>
> >> --
> >> 1.8.3.1
> >>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
  2022-03-30 15:47     ` Si-Wei Liu
@ 2022-03-31  8:39       ` Jason Wang
  2022-04-01 22:32         ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-03-31  8:39 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Mar 30, 2022 at 11:48 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/30/2022 2:00 AM, Jason Wang wrote:
> > On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >> With MQ enabled vdpa device and non-MQ supporting guest e.g.
> >> booting vdpa with mq=on over OVMF of single vqp, below assert
> >> failure is seen:
> >>
> >> ../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
> >>
> >> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
> >> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
> >> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
> >> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
> >> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
> >> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
> >> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
> >> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
> >>     at ../hw/virtio/virtio-pci.c:974
> >> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
> >> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
> >>     at ../hw/net/vhost_net.c:361
> >> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
> >> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
> >> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
> >> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
> >> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
> >>     at ../softmmu/memory.c:492
> >> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
> >> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
> >>     at ../softmmu/memory.c:1504
> >> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
> >> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
> >> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
> >>     at ../softmmu/physmem.c:2914
> >> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
> >>     attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
> >> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
> >> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
> >> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
> >> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
> >> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
> >>
> >> The cause for the assert failure is due to that the vhost_dev index
> >> for the ctrl vq was not aligned with actual one in use by the guest.
> >> Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
> >> if guest doesn't support multiqueue, the guest vq layout would shrink
> >> to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
> >> This results in ctrl_vq taking a different vhost_dev group index than
> >> the default. We can map vq to the correct vhost_dev group by checking
> >> if MQ is supported by guest and successfully negotiated. Since the
> >> MQ feature is only present along with CTRL_VQ, we make sure the index
> >> 2 is only meant for the control vq while MQ is not supported by guest.
> >>
> >> Be noted if QEMU or guest doesn't support control vq, there's no bother
> >> exposing vhost_dev and guest notifier for the control vq. Since
> >> vhost_net_start/stop implies DRIVER_OK is set in device status, feature
> >> negotiation should be completed when reaching virtio_net_vhost_status().
> >>
> >> Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
> >> Suggested-by: Jason Wang <jasowang@redhat.com>
> >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >> ---
> >>   hw/net/virtio-net.c | 19 ++++++++++++++++---
> >>   1 file changed, 16 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 1067e72..484b215 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -245,7 +245,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>       VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >>       NetClientState *nc = qemu_get_queue(n->nic);
> >>       int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> >> -    int cvq = n->max_ncs - n->max_queue_pairs;
> >> +    int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> >> +              n->max_ncs - n->max_queue_pairs : 0;
> > Let's use a separate patch for this.
> Yes, I can do that. Then the new patch will become a requisite for this
> patch.
>
> >
> >>       if (!get_vhost_net(nc->peer)) {
> >>           return;
> >> @@ -3170,8 +3171,14 @@ static NetClientInfo net_virtio_info = {
> >>   static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> >>   {
> >>       VirtIONet *n = VIRTIO_NET(vdev);
> >> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >> +    NetClientState *nc;
> >>       assert(n->vhost_started);
> >> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
> >> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
> > This assert seems guest trigger-able. If yes, I would remove this or
> > replace it with log_guest_error.
> This assert actually is relevant to the cvq change in
> virtio_net_vhost_status(). Since the same check on VIRTIO_NET_F_CTRL_VQ
> has been done earlier, it is assured that CTRL_VQ is negotiated when
> getting here.
> Noted the vhost_started is asserted in the same function, which in turn
> implies DRIVER_OK is set meaning feature negotiation is complete. I
> can't easily think of a scenario which guest may inadvertently or
> purposely trigger the assert?

So the code path can be triggered by, e.g., unmasking:

virtio_pci_vq_vector_unmask()
        k->guest_notifier_pending()

Thanks


>
> -Siwei
>
> >
> >> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
> >> +    } else {
> >> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >> +    }
> >>       return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
> >>   }
> >>
> >> @@ -3179,8 +3186,14 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
> >>                                              bool mask)
> >>   {
> >>       VirtIONet *n = VIRTIO_NET(vdev);
> >> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >> +    NetClientState *nc;
> >>       assert(n->vhost_started);
> >> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
> >> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
> > And this.
> >
> > Thanks
> >
> >
> >> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
> >> +    } else {
> >> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >> +    }
> >>       vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
> >>                                vdev, idx, mask);
> >>   }
> >> --
> >> 1.8.3.1
> >>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-31  8:02       ` Eugenio Perez Martin
@ 2022-03-31  8:54         ` Jason Wang
  2022-03-31  9:19           ` Eugenio Perez Martin
  2022-03-31 21:15         ` Si-Wei Liu
  1 sibling, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-03-31  8:54 UTC (permalink / raw)
  To: Eugenio Perez Martin, Si-Wei Liu; +Cc: Eli Cohen, qemu-level, Michael Tsirkin


On 2022/3/31 4:02 PM, Eugenio Perez Martin wrote:
> On Thu, Mar 31, 2022 at 1:03 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
>>> On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>> The vhost_vdpa_one_time_request() branch in
>>>> vhost_vdpa_set_backend_cap() incorrectly sends down
>>>> iotls on vhost_dev with non-zero index. This may
>>>> end up with multiple VHOST_SET_BACKEND_FEATURES
>>>> ioctl calls sent down on the vhost-vdpa fd that is
>>>> shared between all these vhost_dev's.
>>>>
>>> Not only that. This means that qemu thinks the device supports iotlb
>>> batching as long as the device does not have cvq. If vdpa does not
>>> support batching, it will return an error later with no possibility of
>>> doing it ok.
>> I think the implicit assumption here is that the caller should back off
>> to where it was if it comes to error i.e. once the first
>> vhost_dev_set_features call gets an error, vhost_dev_start() will fail
>> straight.
> Sorry, I don't follow you here, and maybe my message was not clear enough.
>
> What I meant is that your patch fixes another problem not stated in
> the message: it is not possible to initialize a net vdpa device that
> does not have cvq and does not support iotlb batches without it. Qemu
> will assume that the device supports batching, so the write of
> VHOST_IOTLB_BATCH_BEGIN will fail. I didn't test what happens next but
> it probably cannot continue.


So you mean we actually didn't call VHOST_SET_BACKEND_CAP in this case.
Fortunately, the kernel doesn't check the backend cap when accepting
batching hints.

We are probably fine?

Thanks


> In that regard, this commit needs to be marked as "Fixes: ...", either
> ("a5bd058 vhost-vdpa: batch updating IOTLB mappings") or maybe better
> ("4d191cf vhost-vdpa: classify one time request"). We have a
> regression if we introduce both, or the second one and the support of
> any other backend feature.
>
>> Noted that the VHOST_SET_BACKEND_FEATURES ioctl is not per-vq
>> and it doesn't even need to. There seems to me no possibility for it to
>> fail in a way as thought here. The capture is that IOTLB batching is at
>> least a vdpa device level backend feature, if not per-kernel. Same as
>> IOTLB_MSG_V2.
>>
> At this moment it is per-kernel, yes. With your patch there is no need
> to fail because of the lack of _F_IOTLB_BATCH, the code should handle
> this case ok.
>
> But if VHOST_GET_BACKEND_FEATURES returns no support for
> VHOST_BACKEND_F_IOTLB_MSG_V2, the qemu code will happily send v2
> messages anyway. This has nothing to do with the patch, I'm just
> noting it here.
>
> In that case, maybe it is better to return something like -ENOTSUP?
>
> Thanks!
>
>> -Siwei
>>
>>>    Some open questions:
>>>
>>> Should we make the vdpa driver return error as long as a feature is
>>> used but not set by qemu, or let it as undefined? I guess we have to
>>> keep the batching at least without checking so the kernel supports old
>>> versions of qemu.
>>>
>>> On the other hand, should we return an error if IOTLB_MSG_V2 is not
>>> supported here? We're basically assuming it in other functions.
>>>
>>>> To fix it, send down ioctl only once via the first
>>>> vhost_dev with index 0. Toggle the polarity of the
>>>> vhost_vdpa_one_time_request() test would do the trick.
>>>>
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>> Acked-by: Eugenio Pérez <eperezma@redhat.com>
>>>
>>>> ---
>>>>    hw/virtio/vhost-vdpa.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index c5ed7a3..27ea706 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>>>>
>>>>        features &= f;
>>>>
>>>> -    if (vhost_vdpa_one_time_request(dev)) {
>>>> +    if (!vhost_vdpa_one_time_request(dev)) {
>>>>            r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>>>>            if (r) {
>>>>                return -EFAULT;
>>>> --
>>>> 1.8.3.1
>>>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-31  8:54         ` Jason Wang
@ 2022-03-31  9:19           ` Eugenio Perez Martin
  2022-04-01  2:39             ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-03-31  9:19 UTC (permalink / raw)
  To: Jason Wang; +Cc: Si-Wei Liu, Eli Cohen, qemu-level, Michael Tsirkin

On Thu, Mar 31, 2022 at 10:54 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/3/31 at 4:02 PM, Eugenio Perez Martin wrote:
> > On Thu, Mar 31, 2022 at 1:03 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
> >>> On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>> The vhost_vdpa_one_time_request() branch in
> >>>> vhost_vdpa_set_backend_cap() incorrectly sends down
> > >>>> ioctls on vhost_dev with non-zero index. This may
> >>>> end up with multiple VHOST_SET_BACKEND_FEATURES
> >>>> ioctl calls sent down on the vhost-vdpa fd that is
> >>>> shared between all these vhost_dev's.
> >>>>
> >>> Not only that. This means that qemu thinks the device supports iotlb
> >>> batching as long as the device does not have cvq. If vdpa does not
> >>> support batching, it will return an error later with no possibility of
> >>> doing it ok.
> >> I think the implicit assumption here is that the caller should back off
> >> to where it was if it comes to error i.e. once the first
> >> vhost_dev_set_features call gets an error, vhost_dev_start() will fail
> >> straight.
> > Sorry, I don't follow you here, and maybe my message was not clear enough.
> >
> > What I meant is that your patch fixes another problem not stated in
> > the message: it is not possible to initialize a net vdpa device that
> > does not have cvq and does not support iotlb batches without it. Qemu
> > will assume that the device supports batching, so the write of
> > VHOST_IOTLB_BATCH_BEGIN will fail. I didn't test what happens next but
> > it probably cannot continue.
>
>
> So you mean we actually didn't call VHOST_SET_BACKEND_CAP in this case.
> Fortunately, kernel didn't check the backend cap when accepting batching
> hints.
>
> We are probably fine?
>

We're fine as long as the vdpa driver in the kernel effectively
supports batching. If not, qemu will try to batch, and it will fail.

Batching was introduced in kernel v5.9, so QEMU has effectively not
supported kernels older than 5.9 since we introduced multiqueue support
(I didn't test), unless we apply this patch. That's the reason it
should be marked as a fix and backported to stable, IMO.

Thanks!

> Thanks
>
>
> > In that regard, this commit needs to be marked as "Fixes: ...", either
> > ("a5bd058 vhost-vdpa: batch updating IOTLB mappings") or maybe better
> > ("4d191cf vhost-vdpa: classify one time request"). We have a
> > regression if we introduce both, or the second one and the support of
> > any other backend feature.
> >
> >> Noted that the VHOST_SET_BACKEND_FEATURES ioctl is not per-vq
> >> and it doesn't even need to. There seems to me no possibility for it to
> >> fail in a way as thought here. The capture is that IOTLB batching is at
> >> least a vdpa device level backend feature, if not per-kernel. Same as
> >> IOTLB_MSG_V2.
> >>
> > At this moment it is per-kernel, yes. With your patch there is no need
> > to fail because of the lack of _F_IOTLB_BATCH, the code should handle
> > this case ok.
> >
> > But if VHOST_GET_BACKEND_FEATURES returns no support for
> > VHOST_BACKEND_F_IOTLB_MSG_V2, the qemu code will happily send v2
> > messages anyway. This has nothing to do with the patch, I'm just
> > noting it here.
> >
> > In that case, maybe it is better to return something like -ENOTSUP?
> >
> > Thanks!
> >
> >> -Siwei
> >>
> >>>    Some open questions:
> >>>
> >>> Should we make the vdpa driver return error as long as a feature is
> >>> used but not set by qemu, or let it as undefined? I guess we have to
> >>> keep the batching at least without checking so the kernel supports old
> >>> versions of qemu.
> >>>
> >>> On the other hand, should we return an error if IOTLB_MSG_V2 is not
> >>> supported here? We're basically assuming it in other functions.
> >>>
> >>>> To fix it, send down ioctl only once via the first
> >>>> vhost_dev with index 0. Toggle the polarity of the
> >>>> vhost_vdpa_one_time_request() test would do the trick.
> >>>>
> >>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >>> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> >>>
> >>>> ---
> >>>>    hw/virtio/vhost-vdpa.c | 2 +-
> >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>>> index c5ed7a3..27ea706 100644
> >>>> --- a/hw/virtio/vhost-vdpa.c
> >>>> +++ b/hw/virtio/vhost-vdpa.c
> >>>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >>>>
> >>>>        features &= f;
> >>>>
> >>>> -    if (vhost_vdpa_one_time_request(dev)) {
> >>>> +    if (!vhost_vdpa_one_time_request(dev)) {
> >>>>            r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
> >>>>            if (r) {
> >>>>                return -EFAULT;
> >>>> --
> >>>> 1.8.3.1
> >>>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-31  8:02       ` Eugenio Perez Martin
  2022-03-31  8:54         ` Jason Wang
@ 2022-03-31 21:15         ` Si-Wei Liu
  2022-04-01  8:21           ` Eugenio Perez Martin
  1 sibling, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-03-31 21:15 UTC (permalink / raw)
  To: Eugenio Perez Martin; +Cc: Jason Wang, Eli Cohen, qemu-level, Michael Tsirkin



On 3/31/2022 1:02 AM, Eugenio Perez Martin wrote:
> On Thu, Mar 31, 2022 at 1:03 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
>>> On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>> The vhost_vdpa_one_time_request() branch in
>>>> vhost_vdpa_set_backend_cap() incorrectly sends down
> >>>> ioctls on vhost_dev with non-zero index. This may
>>>> end up with multiple VHOST_SET_BACKEND_FEATURES
>>>> ioctl calls sent down on the vhost-vdpa fd that is
>>>> shared between all these vhost_dev's.
>>>>
>>> Not only that. This means that qemu thinks the device supports iotlb
>>> batching as long as the device does not have cvq. If vdpa does not
>>> support batching, it will return an error later with no possibility of
>>> doing it ok.
>> I think the implicit assumption here is that the caller should back off
>> to where it was if it comes to error i.e. once the first
>> vhost_dev_set_features call gets an error, vhost_dev_start() will fail
>> straight.
> Sorry, I don't follow you here, and maybe my message was not clear enough.
>
> What I meant is that your patch fixes another problem not stated in
> the message: it is not possible to initialize a net vdpa device that
> does not have cvq and does not support iotlb batches without it. Qemu
> will assume that the device supports batching, so the write of
> VHOST_IOTLB_BATCH_BEGIN will fail.
This is not what I see in the code. For example,
vhost_vdpa_iotlb_batch_begin_once() has the following:

  140     if (v->dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH) &&
  141         !v->iotlb_batch_begin_sent) {
  142         vhost_vdpa_listener_begin_batch(v);
  143     }

If backend_cap doesn't contain the VHOST_BACKEND_F_IOTLB_BATCH bit, QEMU 
shouldn't send down VHOST_IOTLB_BATCH_BEGIN...

Note that in vhost_vdpa_set_backend_cap(), VHOST_GET_BACKEND_FEATURES is
supposed to get the backend capability from the kernel ahead of the
VHOST_SET_BACKEND_FEATURES call. In the case you are concerned about, at
least VHOST_BACKEND_F_IOTLB_MSG_V2 should be successfully returned and
stored in backend_cap, even if the VHOST_SET_BACKEND_FEATURES ioctl was
missed in between. Hence the resulting backend_cap shouldn't have the
VHOST_BACKEND_F_IOTLB_BATCH bit set. What am I missing here?

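To make the ordering concrete, below is a minimal sketch of the flow
described above. It is only an illustration of the GET-before-SET
ordering discussed in this thread: the function name
sketch_set_backend_cap() is made up and error handling is trimmed, so
this is not actual QEMU code.

static int sketch_set_backend_cap(struct vhost_dev *dev)
{
    /* Features QEMU would like to enable, before asking the kernel. */
    uint64_t features = (0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) |
                        (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH);
    uint64_t f;

    /* GET runs first: a kernel without batching never reports the bit. */
    if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &f)) {
        return -EFAULT;
    }
    features &= f;

    /* SET is sent only once, via the vhost_dev with index 0 (patch 7/7). */
    if (!vhost_vdpa_one_time_request(dev) &&
        vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features)) {
        return -EFAULT;
    }

    /* backend_cap is what vhost_vdpa_iotlb_batch_begin_once() tests, so
     * VHOST_IOTLB_BATCH_BEGIN is only sent when the BATCH bit survived
     * the GET step above. */
    dev->backend_cap = features;
    return 0;
}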

>   I didn't test what happens next but
> it probably cannot continue.
>
> In that regard, this commit needs to be marked as "Fixes: ...", either
> ("a5bd058 vhost-vdpa: batch updating IOTLB mappings") or maybe better
> ("4d191cf vhost-vdpa: classify one time request"). We have a
> regression if we introduce both, or the second one and the support of
> any other backend feature.
Sure, it's not that I am unwilling to add the "Fixes" tag, though I'd
like to first make sure the worry is real. Thanks for pointing it out
anyway.

Thanks,
-Siwei

>
>> Noted that the VHOST_SET_BACKEND_FEATURES ioctl is not per-vq
>> and it doesn't even need to. There seems to me no possibility for it to
>> fail in a way as thought here. The capture is that IOTLB batching is at
>> least a vdpa device level backend feature, if not per-kernel. Same as
>> IOTLB_MSG_V2.
>>
> At this moment it is per-kernel, yes. With your patch there is no need
> to fail because of the lack of _F_IOTLB_BATCH, the code should handle
> this case ok.
>
> But if VHOST_GET_BACKEND_FEATURES returns no support for
> VHOST_BACKEND_F_IOTLB_MSG_V2, the qemu code will happily send v2
> messages anyway. This has nothing to do with the patch, I'm just
> noting it here.
>
> In that case, maybe it is better to return something like -ENOTSUP?
>
> Thanks!
>
>> -Siwei
>>
>>>    Some open questions:
>>>
>>> Should we make the vdpa driver return error as long as a feature is
>>> used but not set by qemu, or let it as undefined? I guess we have to
>>> keep the batching at least without checking so the kernel supports old
>>> versions of qemu.
>>>
>>> On the other hand, should we return an error if IOTLB_MSG_V2 is not
>>> supported here? We're basically assuming it in other functions.
>>>
>>>> To fix it, send down ioctl only once via the first
>>>> vhost_dev with index 0. Toggle the polarity of the
>>>> vhost_vdpa_one_time_request() test would do the trick.
>>>>
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>> Acked-by: Eugenio Pérez <eperezma@redhat.com>
>>>
>>>> ---
>>>>    hw/virtio/vhost-vdpa.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index c5ed7a3..27ea706 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>>>>
>>>>        features &= f;
>>>>
>>>> -    if (vhost_vdpa_one_time_request(dev)) {
>>>> +    if (!vhost_vdpa_one_time_request(dev)) {
>>>>            r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>>>>            if (r) {
>>>>                return -EFAULT;
>>>> --
>>>> 1.8.3.1
>>>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-31  9:19           ` Eugenio Perez Martin
@ 2022-04-01  2:39             ` Jason Wang
  2022-04-01  4:18               ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-01  2:39 UTC (permalink / raw)
  To: Eugenio Perez Martin; +Cc: Si-Wei Liu, Eli Cohen, qemu-level, Michael Tsirkin

On Thu, Mar 31, 2022 at 5:20 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Mar 31, 2022 at 10:54 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2022/3/31 at 4:02 PM, Eugenio Perez Martin wrote:
> > > On Thu, Mar 31, 2022 at 1:03 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> > >>
> > >>
> > >> On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
> > >>> On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> > >>>> The vhost_vdpa_one_time_request() branch in
> > >>>> vhost_vdpa_set_backend_cap() incorrectly sends down
> > >>>> ioctls on vhost_dev with non-zero index. This may
> > >>>> end up with multiple VHOST_SET_BACKEND_FEATURES
> > >>>> ioctl calls sent down on the vhost-vdpa fd that is
> > >>>> shared between all these vhost_dev's.
> > >>>>
> > >>> Not only that. This means that qemu thinks the device supports iotlb
> > >>> batching as long as the device does not have cvq. If vdpa does not
> > >>> support batching, it will return an error later with no possibility of
> > >>> doing it ok.
> > >> I think the implicit assumption here is that the caller should back off
> > >> to where it was if it comes to error i.e. once the first
> > >> vhost_dev_set_features call gets an error, vhost_dev_start() will fail
> > >> straight.
> > > Sorry, I don't follow you here, and maybe my message was not clear enough.
> > >
> > > What I meant is that your patch fixes another problem not stated in
> > > the message: it is not possible to initialize a net vdpa device that
> > > does not have cvq and does not support iotlb batches without it. Qemu
> > > will assume that the device supports batching, so the write of
> > > VHOST_IOTLB_BATCH_BEGIN will fail. I didn't test what happens next but
> > > it probably cannot continue.
> >
> >
> > So you mean we actually didn't call VHOST_SET_BACKEND_CAP in this case.
> > Fortunately, kernel didn't check the backend cap when accepting batching
> > hints.
> >
> > We are probably fine?
> >
>
> We're fine as long as the vdpa driver in the kernel effectively
> supports batching. If not, qemu will try to batch, and it will fail.
>
> It was introduced in v5.9, so qemu has not supported kernel <5.9 since
> we introduced multiqueue support (I didn't test). Unless we apply this
> patch. That's the reason it should be marked as fixed and backported
> to stable IMO.

OK, so it looks to me like we have more issues.

In vhost_vdpa_set_backend_cap() we fail when VHOST_GET_BACKEND_FEATURES
fails. This breaks older kernels, since that ioctl was introduced in

653055b9acd4 ("vhost-vdpa: support get/set backend features")

We should:

1) make it work by not failing vhost_vdpa_set_backend_cap() and
assuming MSG_V2 (a rough sketch of this follows below), and
2) check the batching support in vhost_vdpa_listener_begin_batch()
instead of trying to send VHOST_IOTLB_BATCH_BEGIN unconditionally.
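
For illustration only, the early-return branch that (1) suggests could
sit at the top of a vhost_vdpa_set_backend_cap()-style function, in the
same simplified shape as the sketch_set_backend_cap() sketch earlier in
this thread; the exact fallback behaviour is still open for discussion
and this is not from any posted patch:

    if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &f)) {
        /* Kernel predates 653055b9acd4, i.e. it has no backend-features
         * ioctls at all. Instead of failing, assume plain IOTLB
         * message v2 and no batching support. */
        dev->backend_cap = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2;
        return 0;
    }
    features &= f;   /* otherwise continue as today */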

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > > In that regard, this commit needs to be marked as "Fixes: ...", either
> > > ("a5bd058 vhost-vdpa: batch updating IOTLB mappings") or maybe better
> > > ("4d191cf vhost-vdpa: classify one time request"). We have a
> > > regression if we introduce both, or the second one and the support of
> > > any other backend feature.
> > >
> > >> Noted that the VHOST_SET_BACKEND_FEATURES ioctl is not per-vq
> > >> and it doesn't even need to. There seems to me no possibility for it to
> > >> fail in a way as thought here. The capture is that IOTLB batching is at
> > >> least a vdpa device level backend feature, if not per-kernel. Same as
> > >> IOTLB_MSG_V2.
> > >>
> > > At this moment it is per-kernel, yes. With your patch there is no need
> > > to fail because of the lack of _F_IOTLB_BATCH, the code should handle
> > > this case ok.
> > >
> > > But if VHOST_GET_BACKEND_FEATURES returns no support for
> > > VHOST_BACKEND_F_IOTLB_MSG_V2, the qemu code will happily send v2
> > > messages anyway. This has nothing to do with the patch, I'm just
> > > noting it here.
> > >
> > > In that case, maybe it is better to return something like -ENOTSUP?
> > >
> > > Thanks!
> > >
> > >> -Siwei
> > >>
> > >>>    Some open questions:
> > >>>
> > >>> Should we make the vdpa driver return error as long as a feature is
> > >>> used but not set by qemu, or let it as undefined? I guess we have to
> > >>> keep the batching at least without checking so the kernel supports old
> > >>> versions of qemu.
> > >>>
> > >>> On the other hand, should we return an error if IOTLB_MSG_V2 is not
> > >>> supported here? We're basically assuming it in other functions.
> > >>>
> > >>>> To fix it, send down ioctl only once via the first
> > >>>> vhost_dev with index 0. Toggle the polarity of the
> > >>>> vhost_vdpa_one_time_request() test would do the trick.
> > >>>>
> > >>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > >>> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> > >>>
> > >>>> ---
> > >>>>    hw/virtio/vhost-vdpa.c | 2 +-
> > >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>>
> > >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>>> index c5ed7a3..27ea706 100644
> > >>>> --- a/hw/virtio/vhost-vdpa.c
> > >>>> +++ b/hw/virtio/vhost-vdpa.c
> > >>>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> > >>>>
> > >>>>        features &= f;
> > >>>>
> > >>>> -    if (vhost_vdpa_one_time_request(dev)) {
> > >>>> +    if (!vhost_vdpa_one_time_request(dev)) {
> > >>>>            r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
> > >>>>            if (r) {
> > >>>>                return -EFAULT;
> > >>>> --
> > >>>> 1.8.3.1
> > >>>>
> >
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-04-01  2:39             ` Jason Wang
@ 2022-04-01  4:18               ` Si-Wei Liu
  2022-04-02  1:33                 ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-04-01  4:18 UTC (permalink / raw)
  To: Jason Wang, Eugenio Perez Martin; +Cc: Eli Cohen, qemu-level, Michael Tsirkin



On 3/31/2022 7:39 PM, Jason Wang wrote:
> On Thu, Mar 31, 2022 at 5:20 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
>> On Thu, Mar 31, 2022 at 10:54 AM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> On 2022/3/31 at 4:02 PM, Eugenio Perez Martin wrote:
>>>> On Thu, Mar 31, 2022 at 1:03 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>
>>>>> On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
>>>>>> On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>>> The vhost_vdpa_one_time_request() branch in
>>>>>>> vhost_vdpa_set_backend_cap() incorrectly sends down
>>>>>>> ioctls on vhost_dev with non-zero index. This may
>>>>>>> end up with multiple VHOST_SET_BACKEND_FEATURES
>>>>>>> ioctl calls sent down on the vhost-vdpa fd that is
>>>>>>> shared between all these vhost_dev's.
>>>>>>>
>>>>>> Not only that. This means that qemu thinks the device supports iotlb
>>>>>> batching as long as the device does not have cvq. If vdpa does not
>>>>>> support batching, it will return an error later with no possibility of
>>>>>> doing it ok.
>>>>> I think the implicit assumption here is that the caller should back off
>>>>> to where it was if it comes to error i.e. once the first
>>>>> vhost_dev_set_features call gets an error, vhost_dev_start() will fail
>>>>> straight.
>>>> Sorry, I don't follow you here, and maybe my message was not clear enough.
>>>>
>>>> What I meant is that your patch fixes another problem not stated in
>>>> the message: it is not possible to initialize a net vdpa device that
>>>> does not have cvq and does not support iotlb batches without it. Qemu
>>>> will assume that the device supports batching, so the write of
>>>> VHOST_IOTLB_BATCH_BEGIN will fail. I didn't test what happens next but
>>>> it probably cannot continue.
>>>
>>> So you mean we actually didn't call VHOST_SET_BACKEND_CAP in this case.
>>> Fortunately, kernel didn't check the backend cap when accepting batching
>>> hints.
>>>
>>> We are probably fine?
>>>
>> We're fine as long as the vdpa driver in the kernel effectively
>> supports batching. If not, qemu will try to batch, and it will fail.
>>
>> It was introduced in v5.9, so qemu has not supported kernel <5.9 since
>> we introduced multiqueue support (I didn't test). Unless we apply this
>> patch. That's the reason it should be marked as fixed and backported
>> to stable IMO.
> Ok, so it looks to me we have more issues.
>
> In vhost_vdpa_set_backend_cap() we fail when
> VHOST_VDPA_GET_BACKEND_FEATURES fails. This breaks the older kernel
> since that ioctl is introduced in
>
> 653055b9acd4 ("vhost-vdpa: support get/set backend features")
Yep, the GET/SET_BACKEND ioctl pair got introduced together in this 
exact commit.
>
> We should:
>
> 1) make it work by not failing the vhost_vdpa_set_backend_cap() and
> assuming MSG_V2.
This issue is orthogonal to my fix; it pre-existed the multiqueue
support. I believe there should be a separate patch to fix QEMU for
pre-GET/SET_BACKEND kernels.

> 2) check the batching support in vhost_vdpa_listener_begin_batch()
> instead of trying to set VHOST_IOTLB_BATCH_BEGIN unconditionally
This is a non-issue since VHOST_BACKEND_F_IOTLB_BATCH is already
validated in the caller, vhost_vdpa_iotlb_batch_begin_once().

-Siwei
>
> Thanks
>
>> Thanks!
>>
>>> Thanks
>>>
>>>
>>>> In that regard, this commit needs to be marked as "Fixes: ...", either
>>>> ("a5bd058 vhost-vdpa: batch updating IOTLB mappings") or maybe better
>>>> ("4d191cf vhost-vdpa: classify one time request"). We have a
>>>> regression if we introduce both, or the second one and the support of
>>>> any other backend feature.
>>>>
>>>>> Noted that the VHOST_SET_BACKEND_FEATURES ioctl is not per-vq
>>>>> and it doesn't even need to. There seems to me no possibility for it to
>>>>> fail in a way as thought here. The capture is that IOTLB batching is at
>>>>> least a vdpa device level backend feature, if not per-kernel. Same as
>>>>> IOTLB_MSG_V2.
>>>>>
>>>> At this moment it is per-kernel, yes. With your patch there is no need
>>>> to fail because of the lack of _F_IOTLB_BATCH, the code should handle
>>>> this case ok.
>>>>
>>>> But if VHOST_GET_BACKEND_FEATURES returns no support for
>>>> VHOST_BACKEND_F_IOTLB_MSG_V2, the qemu code will happily send v2
>>>> messages anyway. This has nothing to do with the patch, I'm just
>>>> noting it here.
>>>>
>>>> In that case, maybe it is better to return something like -ENOTSUP?
>>>>
>>>> Thanks!
>>>>
>>>>> -Siwei
>>>>>
>>>>>>     Some open questions:
>>>>>>
>>>>>> Should we make the vdpa driver return error as long as a feature is
>>>>>> used but not set by qemu, or let it as undefined? I guess we have to
>>>>>> keep the batching at least without checking so the kernel supports old
>>>>>> versions of qemu.
>>>>>>
>>>>>> On the other hand, should we return an error if IOTLB_MSG_V2 is not
>>>>>> supported here? We're basically assuming it in other functions.
>>>>>>
>>>>>>> To fix it, send down ioctl only once via the first
>>>>>>> vhost_dev with index 0. Toggle the polarity of the
>>>>>>> vhost_vdpa_one_time_request() test would do the trick.
>>>>>>>
>>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>> Acked-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>>
>>>>>>> ---
>>>>>>>     hw/virtio/vhost-vdpa.c | 2 +-
>>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>>>> index c5ed7a3..27ea706 100644
>>>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>>>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>>>>>>>
>>>>>>>         features &= f;
>>>>>>>
>>>>>>> -    if (vhost_vdpa_one_time_request(dev)) {
>>>>>>> +    if (!vhost_vdpa_one_time_request(dev)) {
>>>>>>>             r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
>>>>>>>             if (r) {
>>>>>>>                 return -EFAULT;
>>>>>>> --
>>>>>>> 1.8.3.1
>>>>>>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-03-31 21:15         ` Si-Wei Liu
@ 2022-04-01  8:21           ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-04-01  8:21 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: Jason Wang, Eli Cohen, qemu-level, Michael Tsirkin

On Thu, Mar 31, 2022 at 11:15 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/31/2022 1:02 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 31, 2022 at 1:03 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
> >>> On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>> The vhost_vdpa_one_time_request() branch in
> >>>> vhost_vdpa_set_backend_cap() incorrectly sends down
> >>>> ioctls on vhost_dev with non-zero index. This may
> >>>> end up with multiple VHOST_SET_BACKEND_FEATURES
> >>>> ioctl calls sent down on the vhost-vdpa fd that is
> >>>> shared between all these vhost_dev's.
> >>>>
> >>> Not only that. This means that qemu thinks the device supports iotlb
> >>> batching as long as the device does not have cvq. If vdpa does not
> >>> support batching, it will return an error later with no possibility of
> >>> doing it ok.
> >> I think the implicit assumption here is that the caller should back off
> >> to where it was if it comes to error i.e. once the first
> >> vhost_dev_set_features call gets an error, vhost_dev_start() will fail
> >> straight.
> > Sorry, I don't follow you here, and maybe my message was not clear enough.
> >
> > What I meant is that your patch fixes another problem not stated in
> > the message: it is not possible to initialize a net vdpa device that
> > does not have cvq and does not support iotlb batches without it. Qemu
> > will assume that the device supports batching, so the write of
> > VHOST_IOTLB_BATCH_BEGIN will fail.
> This is not what I see from the code? For e.g.
> vhost_vdpa_iotlb_batch_begin_once() has the following:
>
>   140     if (v->dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH) &&
>   141         !v->iotlb_batch_begin_sent) {
>   142         vhost_vdpa_listener_begin_batch(v);
>   143     }
>
> If backend_cap doesn't contain the VHOST_BACKEND_F_IOTLB_BATCH bit, QEMU
> shouldn't send down VHOST_IOTLB_BATCH_BEGIN...
>
> Noted in vhost_vdpa_set_backend_cap(), VHOST_GET_BACKEND_FEATURES was
> supposed to get the backend capability from the kernel ahead of the
> VHOST_SET_BACKEND_FEATURES call. In which case of your concern, at least
> feature VHOST_BACKEND_F_IOTLB_MSG_V2 should be successfully returned and
> stored in the backend_cap, even if the VHOST_SET_BACKEND_FEATURES ioctl
> was missed in between. Hence the resulting backend_cap shouldn't have
> the VHOST_BACKEND_F_IOTLB_BATCH bit set. What am I missing here?
>

You're right, I missed that the GET is not skipped, thanks!

>
> >   I didn't test what happens next but
> > it probably cannot continue.
> >
> > In that regard, this commit needs to be marked as "Fixes: ...", either
> > ("a5bd058 vhost-vdpa: batch updating IOTLB mappings") or maybe better
> > ("4d191cf vhost-vdpa: classify one time request"). We have a
> > regression if we introduce both, or the second one and the support of
> > any other backend feature.
> Sure, it's not that I am unwilling to add the "Fixes" tag, though I'd
> like to make sure if the worry is real upfront. Thanks for pointing it
> out anyway.
>
> Thanks,
> -Siwei
>
> >
> >> Noted that the VHOST_SET_BACKEND_FEATURES ioctl is not per-vq
> >> and it doesn't even need to. There seems to me no possibility for it to
> >> fail in a way as thought here. The capture is that IOTLB batching is at
> >> least a vdpa device level backend feature, if not per-kernel. Same as
> >> IOTLB_MSG_V2.
> >>
> > At this moment it is per-kernel, yes. With your patch there is no need
> > to fail because of the lack of _F_IOTLB_BATCH, the code should handle
> > this case ok.
> >
> > But if VHOST_GET_BACKEND_FEATURES returns no support for
> > VHOST_BACKEND_F_IOTLB_MSG_V2, the qemu code will happily send v2
> > messages anyway. This has nothing to do with the patch, I'm just
> > noting it here.
> >
> > In that case, maybe it is better to return something like -ENOTSUP?
> >
> > Thanks!
> >
> >> -Siwei
> >>
> >>>    Some open questions:
> >>>
> >>> Should we make the vdpa driver return error as long as a feature is
> >>> used but not set by qemu, or let it as undefined? I guess we have to
> >>> keep the batching at least without checking so the kernel supports old
> >>> versions of qemu.
> >>>
> >>> On the other hand, should we return an error if IOTLB_MSG_V2 is not
> >>> supported here? We're basically assuming it in other functions.
> >>>
> >>>> To fix it, send down ioctl only once via the first
> >>>> vhost_dev with index 0. Toggle the polarity of the
> >>>> vhost_vdpa_one_time_request() test would do the trick.
> >>>>
> >>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >>> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> >>>
> >>>> ---
> >>>>    hw/virtio/vhost-vdpa.c | 2 +-
> >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>>> index c5ed7a3..27ea706 100644
> >>>> --- a/hw/virtio/vhost-vdpa.c
> >>>> +++ b/hw/virtio/vhost-vdpa.c
> >>>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >>>>
> >>>>        features &= f;
> >>>>
> >>>> -    if (vhost_vdpa_one_time_request(dev)) {
> >>>> +    if (!vhost_vdpa_one_time_request(dev)) {
> >>>>            r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
> >>>>            if (r) {
> >>>>                return -EFAULT;
> >>>> --
> >>>> 1.8.3.1
> >>>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-03-31  8:36       ` Jason Wang
@ 2022-04-01 20:37         ` Si-Wei Liu
  2022-04-02  2:00           ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-04-01 20:37 UTC (permalink / raw)
  To: Jason Wang; +Cc: eperezma, Eli Cohen, qemu-devel, mst



On 3/31/2022 1:36 AM, Jason Wang wrote:
> On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 3/30/2022 2:14 AM, Jason Wang wrote:
>>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>> Previous commit prevents vhost-user and vhost-vdpa from using
>>>> userland vq handler via disable_ioeventfd_handler. The same
>>>> needs to be done for host notifier cleanup too, as the
>>>> virtio_queue_host_notifier_read handler still tends to read
>>>> pending event left behind on ioeventfd and attempts to handle
>>>> outstanding kicks from QEMU userland vq.
>>>>
>>>> If vq handler is not disabled on cleanup, it may lead to sigsegv
>>>> with recursive virtio_net_set_status call on the control vq:
>>>>
>>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
>>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
>>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
>>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
>>> I feel it's probably a bug elsewhere e.g when we fail to start
>>> vhost-vDPA, it's the charge of the Qemu to poll host notifier and we
>>> will fallback to the userspace vq handler.
>> Apologies, an incorrect stack trace was pasted which actually came from
>> patch #1. I will post a v2 with the corresponding one as below:
>>
>> 0  0x000055f800df1780 in qdev_get_parent_bus (dev=0x0) at
>> ../hw/core/qdev.c:376
>> 1  0x000055f800c68ad8 in virtio_bus_device_iommu_enabled
>> (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
>> 2  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at
>> ../hw/virtio/vhost.c:318
>> 3  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>,
>> buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at
>> ../hw/virtio/vhost.c:336
>> 4  0x000055f800d71867 in vhost_virtqueue_stop
>> (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590,
>> vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
>> 5  0x000055f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30,
>> vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
>> 6  0x000055f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30,
>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
>> 7  0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
>> cvq=cvq@entry=1)
>>      at ../hw/net/vhost_net.c:423
>> 8  0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>,
>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
>> 9  0x000055f800d4e628 in virtio_net_set_status
>> (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at
>> ../hw/net/virtio-net.c:370
> I don't understand why virtio_net_handle_ctrl() calls virtio_net_set_status()...
The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ 
command, i.e. in virtio_net_handle_mq():

1413     n->curr_queue_pairs = queue_pairs;
1414     /* stop the backend before changing the number of queue_pairs to avoid handling a
1415      * disabled queue */
1416     virtio_net_set_status(vdev, vdev->status);
1417     virtio_net_set_queue_pairs(n);

Note that before the vdpa multiqueue support, there was never a
vhost_dev exposed for the ctrl_vq, i.e. there's no host notifier set up
for the ctrl_vq on vhost-kernel, as it is emulated in QEMU software.

>
>> 10 0x000055f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized
>> out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at
>> ../hw/net/virtio-net.c:1408
>> 11 0x000055f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590,
>> vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
>> 12 0x000055f800d69f37 in virtio_queue_host_notifier_read
>> (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
>> 13 0x000055f800d69f37 in virtio_queue_host_notifier_read
>> (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
>> 14 0x000055f800c688e6 in virtio_bus_cleanup_host_notifier
>> (bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
>> 15 0x000055f800d73106 in vhost_dev_disable_notifiers
>> (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
>>      at ../../../include/hw/virtio/virtio-bus.h:35
>> 16 0x000055f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0,
>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
>> 17 0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
>> cvq=cvq@entry=1)
>>      at ../hw/net/vhost_net.c:423
>> 18 0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>,
>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
>> 19 0x000055f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590,
>> status=15 '\017') at ../hw/net/virtio-net.c:370
>> 20 0x000055f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590,
>> val=<optimized out>) at ../hw/virtio/virtio.c:1945
>> 21 0x000055f800d11d9d in vm_state_notify (running=running@entry=false,
>> state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
>> 22 0x000055f800d04e7a in do_vm_stop
>> (state=state@entry=RUN_STATE_SHUTDOWN, send_stop=send_stop@entry=false)
>> at ../softmmu/cpus.c:262
>> 23 0x000055f800d04e99 in vm_shutdown () at ../softmmu/cpus.c:280
>> 24 0x000055f800d126af in qemu_cleanup () at ../softmmu/runstate.c:812
>> 25 0x000055f800ad5b13 in main (argc=<optimized out>, argv=<optimized
>> out>, envp=<optimized out>) at ../softmmu/main.c:51
>>
>>   From the trace pending read only occurs in stop path. The recursive
>> virtio_net_set_status from virtio_net_handle_ctrl doesn't make sense to me.
> Yes, we need to figure this out to know the root cause.
I think it has something to do with the virtqueue unready issue that
the vhost_reset_device() refactoring series attempts to fix. If that is
fixed, we should not see this sigsegv with mlx5_vdpa. However, I guess
we both agreed that vq_unready support would need a new uAPI (some
flag) to be defined, hence this fix applies to the situation where the
uAPI doesn't exist in the kernel or vq_unready is not supported by the
vdpa vendor driver.

>
> The code should work for the case when vhost-vdp fails to start.
Unlike the datapath queues for net vdpa, the events left behind in the
control queue can't be processed by userspace, since unlike
vhost-kernel we don't have a userspace fallback path. Ignoring the
pending event and letting vhost-vdpa process it on resume/start is
perhaps the best thing to do. This is even true for the datapath queues
of vdpa devices other than network.

>
>> Not sure I got the reason why we need to handle pending host
>> notification in userland vq, can you elaborate?
> Because vhost-vDPA fails to start, we will "fallback" to a dummy userspace.
Is the dummy userspace working or operational? What's the use case of
this "fallback" dummy if all the guest user gets is a busted NIC? I
think this is very different from the vhost-kernel case, in that once
vhost fails we can fall back to userspace and emulate the network
through the tap fd in a workable way. However, there's no equivalent
yet for vhost-vdpa...

Thanks,
-Siwei

>
> Thanks
>
>> Thanks,
>> -Siwei
>>
>>> Thanks
>>>
>>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>>>>      at ../hw/virtio/virtio-pci.c:974
>>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
>>>> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>>>>      at ../hw/net/vhost_net.c:361
>>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
>>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
>>>> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
>>>> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>>>> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
>>>>      at ../softmmu/memory.c:492
>>>> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
>>>> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
>>>>      at ../softmmu/memory.c:1504
>>>> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at ../../../include/qemu/host-utils.h:165
>>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
>>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>>>>      at ../softmmu/physmem.c:2914
>>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
>>>>      attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
>>>> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
>>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>>>>
>>>> Fixes: 4023784 ("vhost-vdpa: multiqueue support")
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> ---
>>>>    hw/virtio/virtio-bus.c | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
>>>> index 0f69d1c..3159b58 100644
>>>> --- a/hw/virtio/virtio-bus.c
>>>> +++ b/hw/virtio/virtio-bus.c
>>>> @@ -311,7 +311,8 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
>>>>        /* Test and clear notifier after disabling event,
>>>>         * in case poll callback didn't have time to run.
>>>>         */
>>>> -    virtio_queue_host_notifier_read(notifier);
>>>> +    if (!vdev->disable_ioeventfd_handler)
>>>> +        virtio_queue_host_notifier_read(notifier);
>>>>        event_notifier_cleanup(notifier);
>>>>    }
>>>>
>>>> --
>>>> 1.8.3.1
>>>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
  2022-03-31  8:39       ` Jason Wang
@ 2022-04-01 22:32         ` Si-Wei Liu
  2022-04-02  2:10           ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-04-01 22:32 UTC (permalink / raw)
  To: Jason Wang; +Cc: eperezma, Eli Cohen, qemu-devel, mst



On 3/31/2022 1:39 AM, Jason Wang wrote:
> On Wed, Mar 30, 2022 at 11:48 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 3/30/2022 2:00 AM, Jason Wang wrote:
>>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>> With MQ enabled vdpa device and non-MQ supporting guest e.g.
>>>> booting vdpa with mq=on over OVMF of single vqp, below assert
>>>> failure is seen:
>>>>
>>>> ../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
>>>>
>>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
>>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
>>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
>>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
>>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>>>>      at ../hw/virtio/virtio-pci.c:974
>>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
>>>> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>>>>      at ../hw/net/vhost_net.c:361
>>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
>>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
>>>> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
>>>> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>>>> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
>>>>      at ../softmmu/memory.c:492
>>>> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
>>>> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
>>>>      at ../softmmu/memory.c:1504
>>>> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
>>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
>>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>>>>      at ../softmmu/physmem.c:2914
>>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
>>>>      attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
>>>> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
>>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>>>>
>>>> The cause for the assert failure is due to that the vhost_dev index
>>>> for the ctrl vq was not aligned with actual one in use by the guest.
>>>> Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
>>>> if guest doesn't support multiqueue, the guest vq layout would shrink
>>>> to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
>>>> This results in ctrl_vq taking a different vhost_dev group index than
>>>> the default. We can map vq to the correct vhost_dev group by checking
>>>> if MQ is supported by guest and successfully negotiated. Since the
>>>> MQ feature is only present along with CTRL_VQ, we make sure the index
>>>> 2 is only meant for the control vq while MQ is not supported by guest.
>>>>
>>>> Be noted if QEMU or guest doesn't support control vq, there's no bother
>>>> exposing vhost_dev and guest notifier for the control vq. Since
>>>> vhost_net_start/stop implies DRIVER_OK is set in device status, feature
>>>> negotiation should be completed when reaching virtio_net_vhost_status().
>>>>
>>>> Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
>>>> Suggested-by: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> ---
>>>>    hw/net/virtio-net.c | 19 ++++++++++++++++---
>>>>    1 file changed, 16 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index 1067e72..484b215 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -245,7 +245,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>        VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>        NetClientState *nc = qemu_get_queue(n->nic);
>>>>        int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
>>>> -    int cvq = n->max_ncs - n->max_queue_pairs;
>>>> +    int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
>>>> +              n->max_ncs - n->max_queue_pairs : 0;
>>> Let's use a separate patch for this.
>> Yes, I can do that. Then the new patch will become a requisite for this
>> patch.
>>
>>>>        if (!get_vhost_net(nc->peer)) {
>>>>            return;
>>>> @@ -3170,8 +3171,14 @@ static NetClientInfo net_virtio_info = {
>>>>    static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
>>>>    {
>>>>        VirtIONet *n = VIRTIO_NET(vdev);
>>>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>>> +    NetClientState *nc;
>>>>        assert(n->vhost_started);
>>>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
>>>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
>>> This assert seems guest trigger-able. If yes, I would remove this or
>>> replace it with log_guest_error.
>> This assert actually is relevant to the cvq change in
>> virtio_net_vhost_status(). Since the same check on VIRTIO_NET_F_CTRL_VQ
>> has been done earlier, it is assured that CTRL_VQ is negotiated when
>> getting here.
>> Noted the vhost_started is asserted in the same function, which in turn
>> implies DRIVER_OK is set meaning feature negotiation is complete. I
>> can't easily think of a scenario which guest may inadvertently or
>> purposely trigger the assert?
> So the code can be triggered like e.g unmasking:
>
> virtio_pci_vq_vector_unmask()
>          k->guest_notifier_pending()
Hmmm, are you concerned more about idx being invalid, or 
VIRTIO_NET_F_CTRL_VQ getting cleared?

virtio_pci_vector_unmask() has validation through virtio_queue_get_num()
that ensures the vq index is valid. And it doesn't seem possible for
VIRTIO_NET_F_CTRL_VQ to be cleared without a device reset first, during
which the pending event left over on the guest notifier eventfd should
have been completed within virtio_pci_set_guest_notifiers(false) before
vhost_net_stop() returns. If I am not missing something here, I guess
we're probably fine?

-Siwei
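
To summarize the virtqueue index mapping this patch is about, the logic
of the two hunks can be read as a single helper. The helper below is
only an illustrative sketch (it does not exist in the patch; the names
vq2q(), qemu_get_subqueue(), max_queue_pairs and
virtio_vdev_has_feature() are the ones used in the hunks):

/* With a non-MQ guest, the device only exposes three virtqueues:
 *   idx 0 -> rx of queue pair 0
 *   idx 1 -> tx of queue pair 0
 *   idx 2 -> control vq (CTRL_VQ must have been negotiated)
 * so idx 2 has to be routed to the NetClientState that carries the cvq
 * vhost_dev rather than to vq2q(2), i.e. queue pair 1, which a non-MQ
 * guest never sets up. */
static NetClientState *vq_index_to_nc(VirtIONet *n, VirtIODevice *vdev,
                                      int idx)
{
    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
        /* non-MQ guest: idx 2 can only be the control virtqueue */
        return qemu_get_subqueue(n->nic, n->max_queue_pairs);
    }
    return qemu_get_subqueue(n->nic, vq2q(idx));
}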

>
> Thanks
>
>
>> -Siwei
>>
>>>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
>>>> +    } else {
>>>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>>> +    }
>>>>        return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
>>>>    }
>>>>
>>>> @@ -3179,8 +3186,14 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
>>>>                                               bool mask)
>>>>    {
>>>>        VirtIONet *n = VIRTIO_NET(vdev);
>>>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>>> +    NetClientState *nc;
>>>>        assert(n->vhost_started);
>>>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
>>>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
>>> And this.
>>>
>>> Thanks
>>>
>>>
>>>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
>>>> +    } else {
>>>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>>> +    }
>>>>        vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
>>>>                                 vdev, idx, mask);
>>>>    }
>>>> --
>>>> 1.8.3.1
>>>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once
  2022-04-01  4:18               ` Si-Wei Liu
@ 2022-04-02  1:33                 ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-02  1:33 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: Eugenio Perez Martin, Eli Cohen, qemu-level, Michael Tsirkin

On Fri, Apr 1, 2022 at 12:18 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/31/2022 7:39 PM, Jason Wang wrote:
> > On Thu, Mar 31, 2022 at 5:20 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> >> On Thu, Mar 31, 2022 at 10:54 AM Jason Wang <jasowang@redhat.com> wrote:
> >>>
> >>> On 2022/3/31 at 4:02 PM, Eugenio Perez Martin wrote:
> >>>> On Thu, Mar 31, 2022 at 1:03 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>>
> >>>>> On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
> >>>>>> On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>>>> The vhost_vdpa_one_time_request() branch in
> >>>>>>> vhost_vdpa_set_backend_cap() incorrectly sends down
> >>>>>>> ioctls on vhost_dev with non-zero index. This may
> >>>>>>> end up with multiple VHOST_SET_BACKEND_FEATURES
> >>>>>>> ioctl calls sent down on the vhost-vdpa fd that is
> >>>>>>> shared between all these vhost_dev's.
> >>>>>>>
> >>>>>> Not only that. This means that qemu thinks the device supports iotlb
> >>>>>> batching as long as the device does not have cvq. If vdpa does not
> >>>>>> support batching, it will return an error later with no possibility of
> >>>>>> doing it ok.
> >>>>> I think the implicit assumption here is that the caller should back off
> >>>>> to where it was if it comes to error i.e. once the first
> >>>>> vhost_dev_set_features call gets an error, vhost_dev_start() will fail
> >>>>> straight.
> >>>> Sorry, I don't follow you here, and maybe my message was not clear enough.
> >>>>
> >>>> What I meant is that your patch fixes another problem not stated in
> >>>> the message: it is not possible to initialize a net vdpa device that
> >>>> does not have cvq and does not support iotlb batches without it. Qemu
> >>>> will assume that the device supports batching, so the write of
> >>>> VHOST_IOTLB_BATCH_BEGIN will fail. I didn't test what happens next but
> >>>> it probably cannot continue.
> >>>
> >>> So you mean we actually didn't call VHOST_SET_BACKEND_CAP in this case.
> >>> Fortunately, kernel didn't check the backend cap when accepting batching
> >>> hints.
> >>>
> >>> We are probably fine?
> >>>
> >> We're fine as long as the vdpa driver in the kernel effectively
> >> supports batching. If not, qemu will try to batch, and it will fail.
> >>
> >> It was introduced in v5.9, so qemu has not supported kernel <5.9 since
> >> we introduced multiqueue support (I didn't test). Unless we apply this
> >> patch. That's the reason it should be marked as fixed and backported
> >> to stable IMO.
> > Ok, so it looks to me we have more issues.
> >
> > In vhost_vdpa_set_backend_cap() we fail when
> > VHOST_VDPA_GET_BACKEND_FEATURES fails. This breaks the older kernel
> > since that ioctl is introduced in
> >
> > 653055b9acd4 ("vhost-vdpa: support get/set backend features")
> Yep, the GET/SET_BACKEND ioctl pair got introduced together in this
> exact commit.
> >
> > We should:
> >
> > 1) make it work by not failing the vhost_vdpa_set_backend_cap() and
> > assuming MSG_V2.
> This issue is orthogonal with my fix, which was pre-existing before the
> multiqueue support. I believe there should be another separate patch to
> fix QEMU for pre-GET/SET_BACKEND kernel.

Right.

>
> > 2) check the batching support in vhost_vdpa_listener_begin_batch()
> > instead of trying to set VHOST_IOTLB_BATCH_BEGIN uncondtionally
> This is non-issue since VHOST_BACKEND_F_IOTLB_BATCH is already validated
> in the caller vhost_vdpa_iotlb_batch_begin_once().

Yes, I missed that optimization.
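
For reference, the caller in question looks roughly like this (quoting
from memory, a sketch only -- exact field names may differ from the
current tree):

static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
{
    /* Only send VHOST_IOTLB_BATCH_BEGIN when the backend advertised
     * VHOST_BACKEND_F_IOTLB_BATCH, and only once per batch. */
    if (v->dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH) &&
        !v->iotlb_batch_begin_sent) {
        vhost_vdpa_listener_begin_batch(v);
    }

    v->iotlb_batch_begin_sent = true;
}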

Thanks

>
> -Siwei
> >
> > Thanks
> >
> >> Thanks!
> >>
> >>> Thanks
> >>>
> >>>
> >>>> In that regard, this commit needs to be marked as "Fixes: ...", either
> >>>> ("a5bd058 vhost-vdpa: batch updating IOTLB mappings") or maybe better
> >>>> ("4d191cf vhost-vdpa: classify one time request"). We have a
> >>>> regression if we introduce both, or the second one and the support of
> >>>> any other backend feature.
> >>>>
> >>>>> Noted that the VHOST_SET_BACKEND_FEATURES ioctl is not per-vq
> >>>>> and it doesn't even need to. There seems to me no possibility for it to
> >>>>> fail in a way as thought here. The capture is that IOTLB batching is at
> >>>>> least a vdpa device level backend feature, if not per-kernel. Same as
> >>>>> IOTLB_MSG_V2.
> >>>>>
> >>>> At this moment it is per-kernel, yes. With your patch there is no need
> >>>> to fail because of the lack of _F_IOTLB_BATCH, the code should handle
> >>>> this case ok.
> >>>>
> >>>> But if VHOST_GET_BACKEND_FEATURES returns no support for
> >>>> VHOST_BACKEND_F_IOTLB_MSG_V2, the qemu code will happily send v2
> >>>> messages anyway. This has nothing to do with the patch, I'm just
> >>>> noting it here.
> >>>>
> >>>> In that case, maybe it is better to return something like -ENOTSUP?
> >>>>
> >>>> Thanks!
> >>>>
> >>>>> -Siwei
> >>>>>
> >>>>>>     Some open questions:
> >>>>>>
> >>>>>> Should we make the vdpa driver return error as long as a feature is
> >>>>>> used but not set by qemu, or let it as undefined? I guess we have to
> >>>>>> keep the batching at least without checking so the kernel supports old
> >>>>>> versions of qemu.
> >>>>>>
> >>>>>> On the other hand, should we return an error if IOTLB_MSG_V2 is not
> >>>>>> supported here? We're basically assuming it in other functions.
> >>>>>>
> >>>>>>> To fix it, send down ioctl only once via the first
> >>>>>>> vhost_dev with index 0. Toggle the polarity of the
> >>>>>>> vhost_vdpa_one_time_request() test would do the trick.
> >>>>>>>
> >>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >>>>>> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> >>>>>>
> >>>>>>> ---
> >>>>>>>     hw/virtio/vhost-vdpa.c | 2 +-
> >>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>>>>>> index c5ed7a3..27ea706 100644
> >>>>>>> --- a/hw/virtio/vhost-vdpa.c
> >>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> >>>>>>> @@ -665,7 +665,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >>>>>>>
> >>>>>>>         features &= f;
> >>>>>>>
> >>>>>>> -    if (vhost_vdpa_one_time_request(dev)) {
> >>>>>>> +    if (!vhost_vdpa_one_time_request(dev)) {
> >>>>>>>             r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
> >>>>>>>             if (r) {
> >>>>>>>                 return -EFAULT;
> >>>>>>> --
> >>>>>>> 1.8.3.1
> >>>>>>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-04-01 20:37         ` Si-Wei Liu
@ 2022-04-02  2:00           ` Jason Wang
  2022-04-05 19:18             ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-02  2:00 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Sat, Apr 2, 2022 at 4:37 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/31/2022 1:36 AM, Jason Wang wrote:
> > On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 3/30/2022 2:14 AM, Jason Wang wrote:
> >>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>> Previous commit prevents vhost-user and vhost-vdpa from using
> >>>> userland vq handler via disable_ioeventfd_handler. The same
> >>>> needs to be done for host notifier cleanup too, as the
> >>>> virtio_queue_host_notifier_read handler still tends to read
> >>>> pending event left behind on ioeventfd and attempts to handle
> >>>> outstanding kicks from QEMU userland vq.
> >>>>
> >>>> If vq handler is not disabled on cleanup, it may lead to sigsegv
> >>>> with recursive virtio_net_set_status call on the control vq:
> >>>>
> >>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
> >>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
> >>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
> >>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
> >>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
> >>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
> >>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
> >>> I feel it's probably a bug elsewhere e.g when we fail to start
> >>> vhost-vDPA, it's the charge of the Qemu to poll host notifier and we
> >>> will fallback to the userspace vq handler.
> >> Apologies, an incorrect stack trace was pasted which actually came from
> >> patch #1. I will post a v2 with the corresponding one as below:
> >>
> >> 0  0x000055f800df1780 in qdev_get_parent_bus (dev=0x0) at
> >> ../hw/core/qdev.c:376
> >> 1  0x000055f800c68ad8 in virtio_bus_device_iommu_enabled
> >> (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
> >> 2  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at
> >> ../hw/virtio/vhost.c:318
> >> 3  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>,
> >> buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at
> >> ../hw/virtio/vhost.c:336
> >> 4  0x000055f800d71867 in vhost_virtqueue_stop
> >> (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590,
> >> vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
> >> 5  0x000055f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30,
> >> vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
> >> 6  0x000055f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30,
> >> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
> >> 7  0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
> >> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
> >> cvq=cvq@entry=1)
> >>      at ../hw/net/vhost_net.c:423
> >> 8  0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>,
> >> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
> >> 9  0x000055f800d4e628 in virtio_net_set_status
> >> (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at
> >> ../hw/net/virtio-net.c:370
> > I don't understand why virtio_net_handle_ctrl() call virtio_net_set_stauts()...
> The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ
> command, i.e. in virtio_net_handle_mq():

Completely forgot that the code was actually written by me :\

>
> 1413     n->curr_queue_pairs = queue_pairs;
> 1414     /* stop the backend before changing the number of queue_pairs
> to avoid handling a
> 1415      * disabled queue */
> 1416     virtio_net_set_status(vdev, vdev->status);
> 1417     virtio_net_set_queue_pairs(n);
>
> Noted before the vdpa multiqueue support, there was never a vhost_dev
> for ctrl_vq exposed, i.e. there's no host notifier set up for the
> ctrl_vq on vhost_kernel as it is emulated in QEMU software.
>
> >
> >> 10 0x000055f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized
> >> out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at
> >> ../hw/net/virtio-net.c:1408
> >> 11 0x000055f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590,
> >> vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
> >> 12 0x000055f800d69f37 in virtio_queue_host_notifier_read
> >> (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
> >> 13 0x000055f800d69f37 in virtio_queue_host_notifier_read
> >> (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
> >> 14 0x000055f800c688e6 in virtio_bus_cleanup_host_notifier
> >> (bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
> >> 15 0x000055f800d73106 in vhost_dev_disable_notifiers
> >> (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
> >>      at ../../../include/hw/virtio/virtio-bus.h:35
> >> 16 0x000055f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0,
> >> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
> >> 17 0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
> >> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
> >> cvq=cvq@entry=1)
> >>      at ../hw/net/vhost_net.c:423
> >> 18 0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>,
> >> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
> >> 19 0x000055f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590,
> >> status=15 '\017') at ../hw/net/virtio-net.c:370
> >> 20 0x000055f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590,
> >> val=<optimized out>) at ../hw/virtio/virtio.c:1945
> >> 21 0x000055f800d11d9d in vm_state_notify (running=running@entry=false,
> >> state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
> >> 22 0x000055f800d04e7a in do_vm_stop
> >> (state=state@entry=RUN_STATE_SHUTDOWN, send_stop=send_stop@entry=false)
> >> at ../softmmu/cpus.c:262
> >> 23 0x000055f800d04e99 in vm_shutdown () at ../softmmu/cpus.c:280
> >> 24 0x000055f800d126af in qemu_cleanup () at ../softmmu/runstate.c:812
> >> 25 0x000055f800ad5b13 in main (argc=<optimized out>, argv=<optimized
> >> out>, envp=<optimized out>) at ../softmmu/main.c:51
> >>
> >>   From the trace pending read only occurs in stop path. The recursive
> >> virtio_net_set_status from virtio_net_handle_ctrl doesn't make sense to me.
> > Yes, we need to figure this out to know the root cause.
> I think it has something to do with the virtqueue unready issue that the
> vhost_reset_device() refactoring series attempt to fix. If that is fixed
> we should not see this sigsegv with mlx5_vdpa. However I guess we both
> agreed that the vq_unready support would need new uAPI (some flag) to
> define, hence this fix applies to the situation where uAPI doesn't exist
> on the kernel or the vq_unready is not supported by vdpa vendor driver.
>

Yes.

> >
> > The code should work for the case when vhost-vdp fails to start.
> Unlike the other datapath queues for net vdpa, the events left behind in
> the control queue can't be processed by the userspace, as unlike
> vhost-kernel we don't have a fallback path in the userspace.

So that's the question: we should have a safe fallback.

> To ignore
> the pending event and let vhost vdpa process it on resume/start is
> perhaps the best thing to do. This is even true for datapath queues for
> other vdpa devices than of network.
>
> >
> >> Not sure I got the reason why we need to handle pending host
> >> notification in userland vq, can you elaborate?
> > Because vhost-vDPA fails to start, we will "fallback" to a dummy userspace.
> Is the dummy userspace working or operational? What's the use case of
> this "fallback" dummy if what guest user can get is a busted NIC?

The problem is we can't do better in this case now. Such a fallback
(e.g. for vhost-user) has been used for years. Or do you have any better
ideas?

The two approaches don't differ too much:

1) a dummy fallback, which can even handle the cvq

and

2) disabling host notifiers

Especially considering that 2) requires more changes.

> I
> think this is very different from the vhost-kernel case in that once
> vhost fails, we can fallback to userspace to emulate the network through
> the tap fd in a good way. However, there's no equivalent yet for
> vhost-vdpa...
>

As said previously, technically we can have a vhost-vDPA network backend
as a fallback. (As was done for vhost-user).

Thanks

> Thanks,
> -Siwei
>
> >
> > Thanks
> >
> >> Thanks,
> >> -Siwei
> >>
> >>> Thanks
> >>>
> >>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
> >>>>      at ../hw/virtio/virtio-pci.c:974
> >>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
> >>>> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
> >>>>      at ../hw/net/vhost_net.c:361
> >>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
> >>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
> >>>> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
> >>>> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
> >>>> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
> >>>>      at ../softmmu/memory.c:492
> >>>> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
> >>>> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
> >>>>      at ../softmmu/memory.c:1504
> >>>> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at ../../../include/qemu/host-utils.h:165
> >>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
> >>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
> >>>>      at ../softmmu/physmem.c:2914
> >>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
> >>>>      attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
> >>>> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
> >>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
> >>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
> >>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
> >>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
> >>>>
> >>>> Fixes: 4023784 ("vhost-vdpa: multiqueue support")
> >>>> Cc: Jason Wang <jasowang@redhat.com>
> >>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >>>> ---
> >>>>    hw/virtio/virtio-bus.c | 3 ++-
> >>>>    1 file changed, 2 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> >>>> index 0f69d1c..3159b58 100644
> >>>> --- a/hw/virtio/virtio-bus.c
> >>>> +++ b/hw/virtio/virtio-bus.c
> >>>> @@ -311,7 +311,8 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
> >>>>        /* Test and clear notifier after disabling event,
> >>>>         * in case poll callback didn't have time to run.
> >>>>         */
> >>>> -    virtio_queue_host_notifier_read(notifier);
> >>>> +    if (!vdev->disable_ioeventfd_handler)
> >>>> +        virtio_queue_host_notifier_read(notifier);
> >>>>        event_notifier_cleanup(notifier);
> >>>>    }
> >>>>
> >>>> --
> >>>> 1.8.3.1
> >>>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
  2022-04-01 22:32         ` Si-Wei Liu
@ 2022-04-02  2:10           ` Jason Wang
  2022-04-05 23:26             ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-02  2:10 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Sat, Apr 2, 2022 at 6:32 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/31/2022 1:39 AM, Jason Wang wrote:
> > On Wed, Mar 30, 2022 at 11:48 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 3/30/2022 2:00 AM, Jason Wang wrote:
> >>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>> With MQ enabled vdpa device and non-MQ supporting guest e.g.
> >>>> booting vdpa with mq=on over OVMF of single vqp, below assert
> >>>> failure is seen:
> >>>>
> >>>> ../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
> >>>>
> >>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
> >>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
> >>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
> >>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
> >>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
> >>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
> >>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
> >>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
> >>>>      at ../hw/virtio/virtio-pci.c:974
> >>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
> >>>> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
> >>>>      at ../hw/net/vhost_net.c:361
> >>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
> >>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
> >>>> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
> >>>> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
> >>>> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
> >>>>      at ../softmmu/memory.c:492
> >>>> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
> >>>> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
> >>>>      at ../softmmu/memory.c:1504
> >>>> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
> >>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
> >>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
> >>>>      at ../softmmu/physmem.c:2914
> >>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
> >>>>      attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
> >>>> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
> >>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
> >>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
> >>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
> >>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
> >>>>
> >>>> The cause for the assert failure is due to that the vhost_dev index
> >>>> for the ctrl vq was not aligned with actual one in use by the guest.
> >>>> Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
> >>>> if guest doesn't support multiqueue, the guest vq layout would shrink
> >>>> to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
> >>>> This results in ctrl_vq taking a different vhost_dev group index than
> >>>> the default. We can map vq to the correct vhost_dev group by checking
> >>>> if MQ is supported by guest and successfully negotiated. Since the
> >>>> MQ feature is only present along with CTRL_VQ, we make sure the index
> >>>> 2 is only meant for the control vq while MQ is not supported by guest.
> >>>>
> >>>> Be noted if QEMU or guest doesn't support control vq, there's no bother
> >>>> exposing vhost_dev and guest notifier for the control vq. Since
> >>>> vhost_net_start/stop implies DRIVER_OK is set in device status, feature
> >>>> negotiation should be completed when reaching virtio_net_vhost_status().
> >>>>
> >>>> Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
> >>>> Suggested-by: Jason Wang <jasowang@redhat.com>
> >>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >>>> ---
> >>>>    hw/net/virtio-net.c | 19 ++++++++++++++++---
> >>>>    1 file changed, 16 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>> index 1067e72..484b215 100644
> >>>> --- a/hw/net/virtio-net.c
> >>>> +++ b/hw/net/virtio-net.c
> >>>> @@ -245,7 +245,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>>>        VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >>>>        NetClientState *nc = qemu_get_queue(n->nic);
> >>>>        int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> >>>> -    int cvq = n->max_ncs - n->max_queue_pairs;
> >>>> +    int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> >>>> +              n->max_ncs - n->max_queue_pairs : 0;
> >>> Let's use a separate patch for this.
> >> Yes, I can do that. Then the new patch will become a requisite for this
> >> patch.
> >>
> >>>>        if (!get_vhost_net(nc->peer)) {
> >>>>            return;
> >>>> @@ -3170,8 +3171,14 @@ static NetClientInfo net_virtio_info = {
> >>>>    static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> >>>>    {
> >>>>        VirtIONet *n = VIRTIO_NET(vdev);
> >>>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >>>> +    NetClientState *nc;
> >>>>        assert(n->vhost_started);
> >>>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
> >>>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
> >>> This assert seems guest trigger-able. If yes, I would remove this or
> >>> replace it with log_guest_error.
> >> This assert actually is relevant to the cvq change in
> >> virtio_net_vhost_status(). Since the same check on VIRTIO_NET_F_CTRL_VQ
> >> has been done earlier, it is assured that CTRL_VQ is negotiated when
> >> getting here.
> >> Noted the vhost_started is asserted in the same function, which in turn
> >> implies DRIVER_OK is set meaning feature negotiation is complete. I
> >> can't easily think of a scenario which guest may inadvertently or
> >> purposely trigger the assert?
> > So the code can be triggered like e.g unmasking:
> >
> > virtio_pci_vq_vector_unmask()
> >          k->guest_notifier_pending()
> Hmmm, are you concerned more about idx being invalid, or
> VIRTIO_NET_F_CTRL_VQ getting cleared?

Something like this; we can't let a buggy driver crash QEMU.

>
> virtio_pci_vector_unmask() has validation through virtio_queue_get_num()
> that ensures the vq index is valid.

Actually not, it just checks whether the vq size is set:

int virtio_queue_get_num(VirtIODevice *vdev, int n)
{
    return vdev->vq[n].vring.num;
}

> While it doesn't seem possible for
> VIRTIO_NET_F_CTRL_VQ to be cleared without device reset first,

Probably, since we had a check in virtio_set_features():

    /*
     * The driver must not attempt to set features after feature negotiation
     * has finished.
     */
    if (vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) {
        return -EINVAL;
    }

But another interesting part is that guest_features comes from the
migration stream as well:

static const VMStateDescription vmstate_virtio_64bit_features = {
    .name = "virtio/64bit_features",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = &virtio_64bit_features_needed,
    .fields = (VMStateField[]) {
        VMSTATE_UINT64(guest_features, VirtIODevice),
        VMSTATE_END_OF_LIST()
    }
};

We should also be prepared so that a buggy migration flow doesn't crash us.

>during
> which the pending event left over on guest notifier eventfd should have
> been completed within virtio_pci_set_guest_notifiers(false) before
> vhost_net_stop() returns. If I am not missing something here, I guess
> we're probably fine?

I'm not sure I got you here, but the mask/unmask is not necessarily
related to vhost stop. E.g. it can happen if the guest wants to change
IRQ affinity.

Thanks

>
> -Siwei
>
> >
> > Thanks
> >
> >
> >> -Siwei
> >>
> >>>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
> >>>> +    } else {
> >>>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >>>> +    }
> >>>>        return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
> >>>>    }
> >>>>
> >>>> @@ -3179,8 +3186,14 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
> >>>>                                               bool mask)
> >>>>    {
> >>>>        VirtIONet *n = VIRTIO_NET(vdev);
> >>>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >>>> +    NetClientState *nc;
> >>>>        assert(n->vhost_started);
> >>>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
> >>>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
> >>> And this.
> >>>
> >>> Thanks
> >>>
> >>>
> >>>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
> >>>> +    } else {
> >>>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
> >>>> +    }
> >>>>        vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
> >>>>                                 vdev, idx, mask);
> >>>>    }
> >>>> --
> >>>> 1.8.3.1
> >>>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-04-02  2:00           ` Jason Wang
@ 2022-04-05 19:18             ` Si-Wei Liu
  2022-04-07  7:05               ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-04-05 19:18 UTC (permalink / raw)
  To: Jason Wang; +Cc: eperezma, Eli Cohen, qemu-devel, mst



On 4/1/2022 7:00 PM, Jason Wang wrote:
> On Sat, Apr 2, 2022 at 4:37 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 3/31/2022 1:36 AM, Jason Wang wrote:
>>> On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 3/30/2022 2:14 AM, Jason Wang wrote:
>>>>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>> Previous commit prevents vhost-user and vhost-vdpa from using
>>>>>> userland vq handler via disable_ioeventfd_handler. The same
>>>>>> needs to be done for host notifier cleanup too, as the
>>>>>> virtio_queue_host_notifier_read handler still tends to read
>>>>>> pending event left behind on ioeventfd and attempts to handle
>>>>>> outstanding kicks from QEMU userland vq.
>>>>>>
>>>>>> If vq handler is not disabled on cleanup, it may lead to sigsegv
>>>>>> with recursive virtio_net_set_status call on the control vq:
>>>>>>
>>>>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
>>>>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
>>>>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>>>>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
>>>>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>>>>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>>>>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
>>>>> I feel it's probably a bug elsewhere e.g when we fail to start
>>>>> vhost-vDPA, it's the charge of the Qemu to poll host notifier and we
>>>>> will fallback to the userspace vq handler.
>>>> Apologies, an incorrect stack trace was pasted which actually came from
>>>> patch #1. I will post a v2 with the corresponding one as below:
>>>>
>>>> 0  0x000055f800df1780 in qdev_get_parent_bus (dev=0x0) at
>>>> ../hw/core/qdev.c:376
>>>> 1  0x000055f800c68ad8 in virtio_bus_device_iommu_enabled
>>>> (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
>>>> 2  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at
>>>> ../hw/virtio/vhost.c:318
>>>> 3  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>,
>>>> buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at
>>>> ../hw/virtio/vhost.c:336
>>>> 4  0x000055f800d71867 in vhost_virtqueue_stop
>>>> (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590,
>>>> vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
>>>> 5  0x000055f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30,
>>>> vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
>>>> 6  0x000055f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30,
>>>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
>>>> 7  0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
>>>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
>>>> cvq=cvq@entry=1)
>>>>       at ../hw/net/vhost_net.c:423
>>>> 8  0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>,
>>>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
>>>> 9  0x000055f800d4e628 in virtio_net_set_status
>>>> (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at
>>>> ../hw/net/virtio-net.c:370
>>> I don't understand why virtio_net_handle_ctrl() call virtio_net_set_stauts()...
>> The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ
>> command, i.e. in virtio_net_handle_mq():
> Completely forget that the code was actually written by me :\
>
>> 1413     n->curr_queue_pairs = queue_pairs;
>> 1414     /* stop the backend before changing the number of queue_pairs
>> to avoid handling a
>> 1415      * disabled queue */
>> 1416     virtio_net_set_status(vdev, vdev->status);
>> 1417     virtio_net_set_queue_pairs(n);
>>
>> Noted before the vdpa multiqueue support, there was never a vhost_dev
>> for ctrl_vq exposed, i.e. there's no host notifier set up for the
>> ctrl_vq on vhost_kernel as it is emulated in QEMU software.
>>
>>>> 10 0x000055f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized
>>>> out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at
>>>> ../hw/net/virtio-net.c:1408
>>>> 11 0x000055f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590,
>>>> vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
>>>> 12 0x000055f800d69f37 in virtio_queue_host_notifier_read
>>>> (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
>>>> 13 0x000055f800d69f37 in virtio_queue_host_notifier_read
>>>> (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
>>>> 14 0x000055f800c688e6 in virtio_bus_cleanup_host_notifier
>>>> (bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
>>>> 15 0x000055f800d73106 in vhost_dev_disable_notifiers
>>>> (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
>>>>       at ../../../include/hw/virtio/virtio-bus.h:35
>>>> 16 0x000055f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0,
>>>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
>>>> 17 0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
>>>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
>>>> cvq=cvq@entry=1)
>>>>       at ../hw/net/vhost_net.c:423
>>>> 18 0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>,
>>>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
>>>> 19 0x000055f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590,
>>>> status=15 '\017') at ../hw/net/virtio-net.c:370
>>>> 20 0x000055f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590,
>>>> val=<optimized out>) at ../hw/virtio/virtio.c:1945
>>>> 21 0x000055f800d11d9d in vm_state_notify (running=running@entry=false,
>>>> state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
>>>> 22 0x000055f800d04e7a in do_vm_stop
>>>> (state=state@entry=RUN_STATE_SHUTDOWN, send_stop=send_stop@entry=false)
>>>> at ../softmmu/cpus.c:262
>>>> 23 0x000055f800d04e99 in vm_shutdown () at ../softmmu/cpus.c:280
>>>> 24 0x000055f800d126af in qemu_cleanup () at ../softmmu/runstate.c:812
>>>> 25 0x000055f800ad5b13 in main (argc=<optimized out>, argv=<optimized
>>>> out>, envp=<optimized out>) at ../softmmu/main.c:51
>>>>
>>>>    From the trace pending read only occurs in stop path. The recursive
>>>> virtio_net_set_status from virtio_net_handle_ctrl doesn't make sense to me.
>>> Yes, we need to figure this out to know the root cause.
>> I think it has something to do with the virtqueue unready issue that the
>> vhost_reset_device() refactoring series attempt to fix. If that is fixed
>> we should not see this sigsegv with mlx5_vdpa. However I guess we both
>> agreed that the vq_unready support would need new uAPI (some flag) to
>> define, hence this fix applies to the situation where uAPI doesn't exist
>> on the kernel or the vq_unready is not supported by vdpa vendor driver.
>>
> Yes.
>
>>> The code should work for the case when vhost-vdp fails to start.
>> Unlike the other datapath queues for net vdpa, the events left behind in
>> the control queue can't be processed by the userspace, as unlike
>> vhost-kernel we don't have a fallback path in the userspace.
> So that's the question, we should have a safe fallback.
>
>> To ignore
>> the pending event and let vhost vdpa process it on resume/start is
>> perhaps the best thing to do. This is even true for datapath queues for
>> other vdpa devices than of network.
>>
>>>> Not sure I got the reason why we need to handle pending host
>>>> notification in userland vq, can you elaborate?
>>> Because vhost-vDPA fails to start, we will "fallback" to a dummy userspace.
>> Is the dummy userspace working or operational? What's the use case of
>> this "fallback" dummy if what guest user can get is a busted NIC?
> The problem is we can't do better in this case now. Such fallack (e.g
> for vhost-user) has been used for years. Or do you have any better
> ideas?
In my opinion, if vhost-vdpa or vhost-user fails to start, maybe we
should try to disable the device via virtio_error(), which would set
broken to true and set NEEDS_RESET in the VERSION_1 case. That way the
device won't make any further progress and the guest may get an
indication via the config interrupt that something has gone wrong
underneath. If device reset is well supported there, the guest driver
would retry. This can at least give the backend some chance to recover
if it runs into an intermittent error. The worst outcome would be the
device resetting repeatedly, for which we may introduce a tunable to
control the rate if resets occur too often. Did this ever get considered
before?
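
Roughly something like the below in virtio_net_vhost_status(), just to
illustrate the idea (an untested sketch; the error message and the exact
call site are placeholders):

    r = vhost_net_start(vdev, n->nic->ncs, queue_pairs, cvq);
    if (r < 0) {
        error_report("unable to start vhost net: %d", -r);
        /* Instead of silently falling back to the userland vq handler,
         * mark the device broken; the guest gets NEEDS_RESET through
         * the config interrupt and may recover via device reset. */
        virtio_error(vdev, "vhost backend failed to start");
        n->vhost_started = 0;
        return;
    }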

Note that the dummy userspace can't handle any control vq command
effectively once the vhost backend fails; e.g. how does it handle guest
offload, rx mode, MAC or VLAN filter changes without sending the request
down to the backend? This could easily leave the guest with inconsistent
state if somehow we are able to resume the virtqueue without a reset.
Even so, I suspect a device reset is eventually still needed on the
other side, depending on the specific failure. It looks to me that, at
least for vhost-vdpa, the safest fallback so far might be to ignore the
pending event on the ctrl_vq and to stop the device from moving forward
in case of a backend start failure.

>
> It doesn't differ too much of the two approaches:
>
> 1) dummy fallback which can do even cvq
>
> and
>
> 2) disable host notifiers
>
> Especially consider 2) requires more changes.
I'm not sure 2) really needs more changes, as it seems to me that it
would take more unwarranted changes to make the dummy fallback work on
the cvq. And supposing we can fall back to disabling the device via
virtio_error(), we don't even need to change any code on the cvq?

On the other hand, the specific code path this patch tries to fix is
not due to a failure to start the vhost-vdpa backend, but more of a
control flow flaw in the stop path due to the lack of a VQ stop uAPI.
Dummy fallback or host notifier aside, considering that currently it is
in the stop path followed by a reset, I feel it should be pretty safe to
just ignore the pending event on the control vq rather than process it
prematurely in userspace. What do you think? I can live without the host
notifier handler change, for sure.

>
>> I
>> think this is very different from the vhost-kernel case in that once
>> vhost fails, we can fallback to userspace to emulate the network through
>> the tap fd in a good way. However, there's no equivalent yet for
>> vhost-vdpa...
>>
> As said previously, technically we can have vhost-vDPA network backend
> as a fallback.
But this is not working as yet. And how do you envision the datapath
would work, given that we don't have a fallback tap fd?

-Siwei


>   (So did for vhost-user).
>
> Thanks
>
>> Thanks,
>> -Siwei
>>
>>> Thanks
>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>> Thanks
>>>>>
>>>>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>>>>>>       at ../hw/virtio/virtio-pci.c:974
>>>>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
>>>>>> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>>>>>>       at ../hw/net/vhost_net.c:361
>>>>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
>>>>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
>>>>>> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
>>>>>> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>>>>>> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
>>>>>>       at ../softmmu/memory.c:492
>>>>>> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
>>>>>> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
>>>>>>       at ../softmmu/memory.c:1504
>>>>>> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at ../../../include/qemu/host-utils.h:165
>>>>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
>>>>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>>>>>>       at ../softmmu/physmem.c:2914
>>>>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
>>>>>>       attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
>>>>>> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>>>>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>>>>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
>>>>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>>>>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>>>>>>
>>>>>> Fixes: 4023784 ("vhost-vdpa: multiqueue support")
>>>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>> ---
>>>>>>     hw/virtio/virtio-bus.c | 3 ++-
>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
>>>>>> index 0f69d1c..3159b58 100644
>>>>>> --- a/hw/virtio/virtio-bus.c
>>>>>> +++ b/hw/virtio/virtio-bus.c
>>>>>> @@ -311,7 +311,8 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
>>>>>>         /* Test and clear notifier after disabling event,
>>>>>>          * in case poll callback didn't have time to run.
>>>>>>          */
>>>>>> -    virtio_queue_host_notifier_read(notifier);
>>>>>> +    if (!vdev->disable_ioeventfd_handler)
>>>>>> +        virtio_queue_host_notifier_read(notifier);
>>>>>>         event_notifier_cleanup(notifier);
>>>>>>     }
>>>>>>
>>>>>> --
>>>>>> 1.8.3.1
>>>>>>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
  2022-04-02  2:10           ` Jason Wang
@ 2022-04-05 23:26             ` Si-Wei Liu
  0 siblings, 0 replies; 50+ messages in thread
From: Si-Wei Liu @ 2022-04-05 23:26 UTC (permalink / raw)
  To: Jason Wang; +Cc: eperezma, mst, qemu-devel, Eli Cohen



On 4/1/2022 7:10 PM, Jason Wang wrote:
> On Sat, Apr 2, 2022 at 6:32 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 3/31/2022 1:39 AM, Jason Wang wrote:
>>> On Wed, Mar 30, 2022 at 11:48 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 3/30/2022 2:00 AM, Jason Wang wrote:
>>>>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>> With MQ enabled vdpa device and non-MQ supporting guest e.g.
>>>>>> booting vdpa with mq=on over OVMF of single vqp, below assert
>>>>>> failure is seen:
>>>>>>
>>>>>> ../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
>>>>>>
>>>>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
>>>>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
>>>>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>>>>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
>>>>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>>>>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>>>>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
>>>>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>>>>>>       at ../hw/virtio/virtio-pci.c:974
>>>>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
>>>>>> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>>>>>>       at ../hw/net/vhost_net.c:361
>>>>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
>>>>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
>>>>>> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
>>>>>> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>>>>>> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
>>>>>>       at ../softmmu/memory.c:492
>>>>>> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
>>>>>> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
>>>>>>       at ../softmmu/memory.c:1504
>>>>>> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
>>>>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
>>>>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>>>>>>       at ../softmmu/physmem.c:2914
>>>>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
>>>>>>       attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
>>>>>> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>>>>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>>>>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
>>>>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>>>>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>>>>>>
>>>>>> The cause for the assert failure is due to that the vhost_dev index
>>>>>> for the ctrl vq was not aligned with actual one in use by the guest.
>>>>>> Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
>>>>>> if guest doesn't support multiqueue, the guest vq layout would shrink
>>>>>> to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
>>>>>> This results in ctrl_vq taking a different vhost_dev group index than
>>>>>> the default. We can map vq to the correct vhost_dev group by checking
>>>>>> if MQ is supported by guest and successfully negotiated. Since the
>>>>>> MQ feature is only present along with CTRL_VQ, we make sure the index
>>>>>> 2 is only meant for the control vq while MQ is not supported by guest.
>>>>>>
>>>>>> Be noted if QEMU or guest doesn't support control vq, there's no bother
>>>>>> exposing vhost_dev and guest notifier for the control vq. Since
>>>>>> vhost_net_start/stop implies DRIVER_OK is set in device status, feature
>>>>>> negotiation should be completed when reaching virtio_net_vhost_status().
>>>>>>
>>>>>> Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
>>>>>> Suggested-by: Jason Wang <jasowang@redhat.com>
>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>> ---
>>>>>>     hw/net/virtio-net.c | 19 ++++++++++++++++---
>>>>>>     1 file changed, 16 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>> index 1067e72..484b215 100644
>>>>>> --- a/hw/net/virtio-net.c
>>>>>> +++ b/hw/net/virtio-net.c
>>>>>> @@ -245,7 +245,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>>         VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>>         NetClientState *nc = qemu_get_queue(n->nic);
>>>>>>         int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
>>>>>> -    int cvq = n->max_ncs - n->max_queue_pairs;
>>>>>> +    int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
>>>>>> +              n->max_ncs - n->max_queue_pairs : 0;
>>>>> Let's use a separate patch for this.
>>>> Yes, I can do that. Then the new patch will become a requisite for this
>>>> patch.
>>>>
>>>>>>         if (!get_vhost_net(nc->peer)) {
>>>>>>             return;
>>>>>> @@ -3170,8 +3171,14 @@ static NetClientInfo net_virtio_info = {
>>>>>>     static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
>>>>>>     {
>>>>>>         VirtIONet *n = VIRTIO_NET(vdev);
>>>>>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>>>>> +    NetClientState *nc;
>>>>>>         assert(n->vhost_started);
>>>>>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
>>>>>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
>>>>> This assert seems guest trigger-able. If yes, I would remove this or
>>>>> replace it with log_guest_error.
>>>> This assert actually is relevant to the cvq change in
>>>> virtio_net_vhost_status(). Since the same check on VIRTIO_NET_F_CTRL_VQ
>>>> has been done earlier, it is assured that CTRL_VQ is negotiated when
>>>> getting here.
>>>> Noted the vhost_started is asserted in the same function, which in turn
>>>> implies DRIVER_OK is set meaning feature negotiation is complete. I
>>>> can't easily think of a scenario which guest may inadvertently or
>>>> purposely trigger the assert?
>>> So the code can be triggered like e.g unmasking:
>>>
>>> virtio_pci_vq_vector_unmask()
>>>           k->guest_notifier_pending()
>> Hmmm, are you concerned more about idx being invalid, or
>> VIRTIO_NET_F_CTRL_VQ getting cleared?
> Something like this, we can't let a buggy driver crash into Qemu.
>
>> virtio_pci_vector_unmask() has validation through virtio_queue_get_num()
>> that ensures the vq index is valid.
> Actually not, it just check whether the vq size is set:
>
> int virtio_queue_get_num(VirtIODevice *vdev, int n)
> {
>      return vdev->vq[n].vring.num;
> }
>
>> While it doesn't seem possible for
>> VIRTIO_NET_F_CTRL_VQ to be cleared without device reset first,
> Probably, since we had a check in virtio_set_features():
>
>      /*
>       * The driver must not attempt to set features after feature negotiation
>       * has finished.
>       */
>      if (vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) {
>          return -EINVAL;
>      }
>
> But another interesting part is that the guest_feautres come from the
> migration stream as well:
>
> static const VMStateDescription vmstate_virtio_64bit_features = {
>      .name = "virtio/64bit_features",
>      .version_id = 1,
>      .minimum_version_id = 1,
>      .needed = &virtio_64bit_features_needed,
>      .fields = (VMStateField[]) {
>          VMSTATE_UINT64(guest_features, VirtIODevice),
>          VMSTATE_END_OF_LIST()
>      }
> };
>
> We should also be ready to let the buggy migration flow to crash us.
Fair enough. Given the possibility of it being introduced through the
migration stream, I think it's now needed to convert the assert to an
error. Thanks for pointing it out.
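
E.g. something along these lines in virtio_net_guest_notifier_pending()
instead of the assert (a rough, untested sketch; the exact logging
helper and the return value on error are open):

    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
        if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
            /* Buggy driver or migration stream; don't abort QEMU. */
            qemu_log_mask(LOG_GUEST_ERROR,
                          "%s: bogus vq index %d without CTRL_VQ\n",
                          __func__, idx);
            return false;
        }
        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
    } else {
        nc = qemu_get_subqueue(n->nic, vq2q(idx));
    }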

Thanks,
-Siwei

>
>> during
>> which the pending event left over on guest notifier eventfd should have
>> been completed within virtio_pci_set_guest_notifiers(false) before
>> vhost_net_stop() returns. If I am not missing something here, I guess
>> we're probably fine?
> I'm not sure I got here, but the mask/unmask is not necessarily
> related to vhost stop. E.g it can happen if guest want to change IRQ
> affinity.
>
> Thanks
>
>> -Siwei
>>
>>> Thanks
>>>
>>>
>>>> -Siwei
>>>>
>>>>>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
>>>>>> +    } else {
>>>>>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>>>>> +    }
>>>>>>         return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
>>>>>>     }
>>>>>>
>>>>>> @@ -3179,8 +3186,14 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
>>>>>>                                                bool mask)
>>>>>>     {
>>>>>>         VirtIONet *n = VIRTIO_NET(vdev);
>>>>>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>>>>> +    NetClientState *nc;
>>>>>>         assert(n->vhost_started);
>>>>>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
>>>>>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
>>>>> And this.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
>>>>>> +    } else {
>>>>>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>>>>> +    }
>>>>>>         vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
>>>>>>                                  vdev, idx, mask);
>>>>>>     }
>>>>>> --
>>>>>> 1.8.3.1
>>>>>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-04-05 19:18             ` Si-Wei Liu
@ 2022-04-07  7:05               ` Jason Wang
  2022-04-08  1:02                 ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-07  7:05 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst


On 2022/4/6 3:18 AM, Si-Wei Liu wrote:
>
>
> On 4/1/2022 7:00 PM, Jason Wang wrote:
>> On Sat, Apr 2, 2022 at 4:37 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>
>>>
>>> On 3/31/2022 1:36 AM, Jason Wang wrote:
>>>> On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu <si-wei.liu@oracle.com> 
>>>> wrote:
>>>>>
>>>>> On 3/30/2022 2:14 AM, Jason Wang wrote:
>>>>>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu 
>>>>>> <si-wei.liu@oracle.com> wrote:
>>>>>>> Previous commit prevents vhost-user and vhost-vdpa from using
>>>>>>> userland vq handler via disable_ioeventfd_handler. The same
>>>>>>> needs to be done for host notifier cleanup too, as the
>>>>>>> virtio_queue_host_notifier_read handler still tends to read
>>>>>>> pending event left behind on ioeventfd and attempts to handle
>>>>>>> outstanding kicks from QEMU userland vq.
>>>>>>>
>>>>>>> If vq handler is not disabled on cleanup, it may lead to sigsegv
>>>>>>> with recursive virtio_net_set_status call on the control vq:
>>>>>>>
>>>>>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
>>>>>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
>>>>>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>>>>>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
>>>>>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized 
>>>>>>> out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>>>>>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized 
>>>>>>> out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>>>>>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask 
>>>>>>> (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized 
>>>>>>> out>) at ../hw/virtio/vhost.c:1557
>>>>>> I feel it's probably a bug elsewhere e.g when we fail to start
>>>>>> vhost-vDPA, it's the charge of the Qemu to poll host notifier and we
>>>>>> will fallback to the userspace vq handler.
>>>>> Apologies, an incorrect stack trace was pasted which actually came 
>>>>> from
>>>>> patch #1. I will post a v2 with the corresponding one as below:
>>>>>
>>>>> 0  0x000055f800df1780 in qdev_get_parent_bus (dev=0x0) at
>>>>> ../hw/core/qdev.c:376
>>>>> 1  0x000055f800c68ad8 in virtio_bus_device_iommu_enabled
>>>>> (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
>>>>> 2  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at
>>>>> ../hw/virtio/vhost.c:318
>>>>> 3  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>,
>>>>> buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at
>>>>> ../hw/virtio/vhost.c:336
>>>>> 4  0x000055f800d71867 in vhost_virtqueue_stop
>>>>> (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590,
>>>>> vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
>>>>> 5  0x000055f800d7406c in vhost_dev_stop 
>>>>> (hdev=hdev@entry=0x55f8037ccc30,
>>>>> vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
>>>>> 6  0x000055f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30,
>>>>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
>>>>> 7  0x000055f800bf0678 in vhost_net_stop 
>>>>> (dev=dev@entry=0x55f8044ec590,
>>>>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
>>>>> cvq=cvq@entry=1)
>>>>>       at ../hw/net/vhost_net.c:423
>>>>> 8  0x000055f800d4e628 in virtio_net_set_status (status=<optimized 
>>>>> out>,
>>>>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
>>>>> 9  0x000055f800d4e628 in virtio_net_set_status
>>>>> (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at
>>>>> ../hw/net/virtio-net.c:370
>>>> I don't understand why virtio_net_handle_ctrl() call 
>>>> virtio_net_set_stauts()...
>>> The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ
>>> command, i.e. in virtio_net_handle_mq():
>> Completely forget that the code was actually written by me :\
>>
>>> 1413     n->curr_queue_pairs = queue_pairs;
>>> 1414     /* stop the backend before changing the number of queue_pairs
>>> to avoid handling a
>>> 1415      * disabled queue */
>>> 1416     virtio_net_set_status(vdev, vdev->status);
>>> 1417     virtio_net_set_queue_pairs(n);
>>>
>>> Noted before the vdpa multiqueue support, there was never a vhost_dev
>>> for ctrl_vq exposed, i.e. there's no host notifier set up for the
>>> ctrl_vq on vhost_kernel as it is emulated in QEMU software.
>>>
>>>>> 10 0x000055f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized
>>>>> out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at
>>>>> ../hw/net/virtio-net.c:1408
>>>>> 11 0x000055f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590,
>>>>> vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
>>>>> 12 0x000055f800d69f37 in virtio_queue_host_notifier_read
>>>>> (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
>>>>> 13 0x000055f800d69f37 in virtio_queue_host_notifier_read
>>>>> (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
>>>>> 14 0x000055f800c688e6 in virtio_bus_cleanup_host_notifier
>>>>> (bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
>>>>> 15 0x000055f800d73106 in vhost_dev_disable_notifiers
>>>>> (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
>>>>>       at ../../../include/hw/virtio/virtio-bus.h:35
>>>>> 16 0x000055f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0,
>>>>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
>>>>> 17 0x000055f800bf0678 in vhost_net_stop 
>>>>> (dev=dev@entry=0x55f8044ec590,
>>>>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
>>>>> cvq=cvq@entry=1)
>>>>>       at ../hw/net/vhost_net.c:423
>>>>> 18 0x000055f800d4e628 in virtio_net_set_status (status=<optimized 
>>>>> out>,
>>>>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
>>>>> 19 0x000055f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590,
>>>>> status=15 '\017') at ../hw/net/virtio-net.c:370
>>>>> 20 0x000055f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590,
>>>>> val=<optimized out>) at ../hw/virtio/virtio.c:1945
>>>>> 21 0x000055f800d11d9d in vm_state_notify 
>>>>> (running=running@entry=false,
>>>>> state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
>>>>> 22 0x000055f800d04e7a in do_vm_stop
>>>>> (state=state@entry=RUN_STATE_SHUTDOWN, 
>>>>> send_stop=send_stop@entry=false)
>>>>> at ../softmmu/cpus.c:262
>>>>> 23 0x000055f800d04e99 in vm_shutdown () at ../softmmu/cpus.c:280
>>>>> 24 0x000055f800d126af in qemu_cleanup () at ../softmmu/runstate.c:812
>>>>> 25 0x000055f800ad5b13 in main (argc=<optimized out>, argv=<optimized
>>>>> out>, envp=<optimized out>) at ../softmmu/main.c:51
>>>>>
>>>>>    From the trace pending read only occurs in stop path. The 
>>>>> recursive
>>>>> virtio_net_set_status from virtio_net_handle_ctrl doesn't make 
>>>>> sense to me.
>>>> Yes, we need to figure this out to know the root cause.
>>> I think it has something to do with the virtqueue unready issue that 
>>> the
>>> vhost_reset_device() refactoring series attempt to fix. If that is 
>>> fixed
>>> we should not see this sigsegv with mlx5_vdpa. However I guess we both
>>> agreed that the vq_unready support would need new uAPI (some flag) to
>>> define, hence this fix applies to the situation where uAPI doesn't 
>>> exist
>>> on the kernel or the vq_unready is not supported by vdpa vendor driver.
>>>
>> Yes.
>>
>>>> The code should work for the case when vhost-vdp fails to start.
>>> Unlike the other datapath queues for net vdpa, the events left 
>>> behind in
>>> the control queue can't be processed by the userspace, as unlike
>>> vhost-kernel we don't have a fallback path in the userspace.
>> So that's the question, we should have a safe fallback.
>>
>>> To ignore
>>> the pending event and let vhost vdpa process it on resume/start is
>>> perhaps the best thing to do. This is even true for datapath queues for
>>> other vdpa devices than of network.
>>>
>>>>> Not sure I got the reason why we need to handle pending host
>>>>> notification in userland vq, can you elaborate?
>>>> Because vhost-vDPA fails to start, we will "fallback" to a dummy 
>>>> userspace.
>>> Is the dummy userspace working or operational? What's the use case of
>>> this "fallback" dummy if what guest user can get is a busted NIC?
>> The problem is we can't do better in this case now. Such fallack (e.g
>> for vhost-user) has been used for years. Or do you have any better
>> ideas?
> In my opinion if vhost-vdpa or vhost-user fails to start, maybe we 
> should try to disable the device via virtio_error(), which would set 
> broken to true and set NEEDS_RESET in case of VERSION_1. That way the 
> device won't move forward further and the guest may get the indication 
> via config interrupt that something had gone wrong underneath. If 
> device reset is well supported there the guest driver would retry.


Note that NEEDS_RESET is not implemented in the current Linux drivers.


> This can at least give the backend some chance to recover if running 
> into intermittent error. The worst result would be the device keeps 
> resetting repeatedly, for which we may introduce tunable to control 
> the rate if seeing reset occurs too often.. Did this ever get 
> considered before?


I don't know, but we have managed to survive with such a fallback for 
years. We can do this, but can virtio_error() fix the issue you describe here?


>
> Noted that the dummy userspace can't handle any control vq command 
> effectively once the vhost backend fails, for e.g. how does it handle 
> those guest offload, rx mode, MAC or VLAN filter changes without 
> sending the request down to the backend? 


It should be no different from real hardware: the device is simply 
malfunctioning. The driver can detect this in many ways. E.g. in the 
past I suggested implementing a device watchdog for virtio-net; the 
prototype was posted but for some reason the work stalled. Maybe we can 
consider continuing that work.


> This could easily get inconsistent state to the guest if somehow we 
> are able to resume the virtqueue without a reset. Even so, I suspect 
> the device reset eventually is still needed on the other part, which 
> is subject to the specific failure. It looks to me at least for 
> vhost-vdpa, it might be the safest fallback so far to ignore pending 
> event in ctrl_vq, and disable the device from moving forward in case 
> of backend start failure.


I don't get this part: if we fail to start vhost-vdpa, Qemu should do a 
safe rewind, otherwise it would be a bug.


>
>>
>> It doesn't differ too much of the two approaches:
>>
>> 1) dummy fallback which can do even cvq
>>
>> and
>>
>> 2) disable host notifiers
>>
>> Especially consider 2) requires more changes.
> I'm not clear if 2) really needs more changes, as it seems to me that 
> it would take more unwarranted changes to make dummy fallback to work 
> on cvq? And suppose we can fallback to disabling device via 
> virtio_error(), we don't even need to change any code on cvq?


So let me explain my points:

1) we use the dummy receive path as a fallback, as vhost-user does

2) the code should safely fall back to that, otherwise it could be a bug 
elsewhere

3) if we think the dummy fallback doesn't make sense, we can improve it, 
but we need to figure out why we can crash in 2), since the code could 
be used in other paths.


>
> On the other hand, for the specific code path this patch tries to fix, 
> it is not due to failure to start vhost-vdpa backend, but more of a 
> control flow flaw in the stop path due to lack of VQ stop uAPI. Let 
> alone dummy or host notifier, considering currently it's in the stop 
> path followed by a reset, I feel it should be pretty safe to just 
> ignore the pending event on the control vq rather than process it 
> prematurely in userspace. What do you think? I can leave without the 
> host notifier handler change for sure.


I wonder how vhost-user deals with this.


>
>>
>>> I
>>> think this is very different from the vhost-kernel case in that once
>>> vhost fails, we can fallback to userspace to emulate the network 
>>> through
>>> the tap fd in a good way. However, there's no equivalent yet for
>>> vhost-vdpa...
>>>
>> As said previously, technically we can have vhost-vDPA network backend
>> as a fallback.
> But this is working as yet. And how do you envision the datapath may 
> work given that we don't have a fallback tap fd?


I mean we can treat vhost-vdpa as a kind of general networking backend 
that could be used by any NIC model, like e1000e. Then we can use that 
as a fallback.

But I'm not sure it's worth the bother.

Thanks


>
> -Siwei
>
>
>>   (So did for vhost-user).
>>
>> Thanks
>>
>>> Thanks,
>>> -Siwei
>>>
>>>> Thanks
>>>>
>>>>> Thanks,
>>>>> -Siwei
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier 
>>>>>>> (d=d@entry=0x558f568f0f60, n=n@entry=2, 
>>>>>>> assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>>>>>>>       at ../hw/virtio/virtio-pci.c:974
>>>>>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers 
>>>>>>> (d=0x558f568f0f60, nvqs=3, assign=true) at 
>>>>>>> ../hw/virtio/virtio-pci.c:1019
>>>>>>> 9  0x0000558f52bf091d in vhost_net_start 
>>>>>>> (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, 
>>>>>>> data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>>>>>>>       at ../hw/net/vhost_net.c:361
>>>>>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status 
>>>>>>> (status=<optimized out>, n=0x558f568f91f0) at 
>>>>>>> ../hw/net/virtio-net.c:289
>>>>>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status 
>>>>>>> (vdev=0x558f568f91f0, status=15 '\017') at 
>>>>>>> ../hw/net/virtio-net.c:370
>>>>>>> 12 0x0000558f52d6c4b2 in virtio_set_status 
>>>>>>> (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at 
>>>>>>> ../hw/virtio/virtio.c:1945
>>>>>>> 13 0x0000558f52c69eff in virtio_pci_common_write 
>>>>>>> (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized 
>>>>>>> out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>>>>>>> 14 0x0000558f52d15d6e in memory_region_write_accessor 
>>>>>>> (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, 
>>>>>>> shift=<optimized out>, mask=<optimized out>, attrs=...)
>>>>>>>       at ../softmmu/memory.c:492
>>>>>>> 15 0x0000558f52d127de in access_with_adjusted_size 
>>>>>>> (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, 
>>>>>>> size=size@entry=1, access_size_min=<optimized out>, 
>>>>>>> access_size_max=<optimized out>, access_fn=0x558f52d15cf0 
>>>>>>> <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at 
>>>>>>> ../softmmu/memory.c:554
>>>>>>> 16 0x0000558f52d157ef in memory_region_dispatch_write 
>>>>>>> (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, 
>>>>>>> op=<optimized out>, attrs=attrs@entry=...)
>>>>>>>       at ../softmmu/memory.c:1504
>>>>>>> 17 0x0000558f52d078e7 in flatview_write_continue 
>>>>>>> (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, 
>>>>>>> attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, 
>>>>>>> addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at 
>>>>>>> ../../../include/qemu/host-utils.h:165
>>>>>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, 
>>>>>>> addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at 
>>>>>>> ../softmmu/physmem.c:2822
>>>>>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized 
>>>>>>> out>, addr=<optimized out>, attrs=..., 
>>>>>>> buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>>>>>>>       at ../softmmu/physmem.c:2914
>>>>>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, 
>>>>>>> addr=<optimized out>, attrs=...,
>>>>>>>       attrs@entry=..., buf=buf@entry=0x7f8ce6300028, 
>>>>>>> len=<optimized out>, is_write=<optimized out>) at 
>>>>>>> ../softmmu/physmem.c:2924
>>>>>>> 21 0x0000558f52dced09 in kvm_cpu_exec 
>>>>>>> (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>>>>>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn 
>>>>>>> (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>>>>>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized 
>>>>>>> out>) at ../util/qemu-thread-posix.c:556
>>>>>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>>>>>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>>>>>>>
>>>>>>> Fixes: 4023784 ("vhost-vdpa: multiqueue support")
>>>>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>>> ---
>>>>>>>     hw/virtio/virtio-bus.c | 3 ++-
>>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
>>>>>>> index 0f69d1c..3159b58 100644
>>>>>>> --- a/hw/virtio/virtio-bus.c
>>>>>>> +++ b/hw/virtio/virtio-bus.c
>>>>>>> @@ -311,7 +311,8 @@ void 
>>>>>>> virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
>>>>>>>         /* Test and clear notifier after disabling event,
>>>>>>>          * in case poll callback didn't have time to run.
>>>>>>>          */
>>>>>>> -    virtio_queue_host_notifier_read(notifier);
>>>>>>> +    if (!vdev->disable_ioeventfd_handler)
>>>>>>> +        virtio_queue_host_notifier_read(notifier);
>>>>>>>         event_notifier_cleanup(notifier);
>>>>>>>     }
>>>>>>>
>>>>>>> -- 
>>>>>>> 1.8.3.1
>>>>>>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-04-07  7:05               ` Jason Wang
@ 2022-04-08  1:02                 ` Si-Wei Liu
  2022-04-11  8:49                   ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-04-08  1:02 UTC (permalink / raw)
  To: Jason Wang; +Cc: eperezma, Eli Cohen, qemu-devel, mst



On 4/7/2022 12:05 AM, Jason Wang wrote:
>
> 在 2022/4/6 上午3:18, Si-Wei Liu 写道:
>>
>>
>> On 4/1/2022 7:00 PM, Jason Wang wrote:
>>> On Sat, Apr 2, 2022 at 4:37 AM Si-Wei Liu <si-wei.liu@oracle.com> 
>>> wrote:
>>>>
>>>>
>>>> On 3/31/2022 1:36 AM, Jason Wang wrote:
>>>>> On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu 
>>>>> <si-wei.liu@oracle.com> wrote:
>>>>>>
>>>>>> On 3/30/2022 2:14 AM, Jason Wang wrote:
>>>>>>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu 
>>>>>>> <si-wei.liu@oracle.com> wrote:
>>>>>>>> Previous commit prevents vhost-user and vhost-vdpa from using
>>>>>>>> userland vq handler via disable_ioeventfd_handler. The same
>>>>>>>> needs to be done for host notifier cleanup too, as the
>>>>>>>> virtio_queue_host_notifier_read handler still tends to read
>>>>>>>> pending event left behind on ioeventfd and attempts to handle
>>>>>>>> outstanding kicks from QEMU userland vq.
>>>>>>>>
>>>>>>>> If vq handler is not disabled on cleanup, it may lead to sigsegv
>>>>>>>> with recursive virtio_net_set_status call on the control vq:
>>>>>>>>
>>>>>>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
>>>>>>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
>>>>>>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>>>>>>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
>>>>>>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index 
>>>>>>>> (dev=<optimized out>, idx=<optimized out>) at 
>>>>>>>> ../hw/virtio/vhost-vdpa.c:563
>>>>>>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index 
>>>>>>>> (dev=<optimized out>, idx=<optimized out>) at 
>>>>>>>> ../hw/virtio/vhost-vdpa.c:558
>>>>>>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask 
>>>>>>>> (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized 
>>>>>>>> out>) at ../hw/virtio/vhost.c:1557
>>>>>>> I feel it's probably a bug elsewhere e.g when we fail to start
>>>>>>> vhost-vDPA, it's the charge of the Qemu to poll host notifier 
>>>>>>> and we
>>>>>>> will fallback to the userspace vq handler.
>>>>>> Apologies, an incorrect stack trace was pasted which actually 
>>>>>> came from
>>>>>> patch #1. I will post a v2 with the corresponding one as below:
>>>>>>
>>>>>> 0  0x000055f800df1780 in qdev_get_parent_bus (dev=0x0) at
>>>>>> ../hw/core/qdev.c:376
>>>>>> 1  0x000055f800c68ad8 in virtio_bus_device_iommu_enabled
>>>>>> (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
>>>>>> 2  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at
>>>>>> ../hw/virtio/vhost.c:318
>>>>>> 3  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>,
>>>>>> buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at
>>>>>> ../hw/virtio/vhost.c:336
>>>>>> 4  0x000055f800d71867 in vhost_virtqueue_stop
>>>>>> (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590,
>>>>>> vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
>>>>>> 5  0x000055f800d7406c in vhost_dev_stop 
>>>>>> (hdev=hdev@entry=0x55f8037ccc30,
>>>>>> vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
>>>>>> 6  0x000055f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30,
>>>>>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
>>>>>> 7  0x000055f800bf0678 in vhost_net_stop 
>>>>>> (dev=dev@entry=0x55f8044ec590,
>>>>>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
>>>>>> cvq=cvq@entry=1)
>>>>>>       at ../hw/net/vhost_net.c:423
>>>>>> 8  0x000055f800d4e628 in virtio_net_set_status (status=<optimized 
>>>>>> out>,
>>>>>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
>>>>>> 9  0x000055f800d4e628 in virtio_net_set_status
>>>>>> (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at
>>>>>> ../hw/net/virtio-net.c:370
>>>>> I don't understand why virtio_net_handle_ctrl() call 
>>>>> virtio_net_set_stauts()...
>>>> The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ
>>>> command, i.e. in virtio_net_handle_mq():
>>> Completely forget that the code was actually written by me :\
>>>
>>>> 1413     n->curr_queue_pairs = queue_pairs;
>>>> 1414     /* stop the backend before changing the number of queue_pairs
>>>> to avoid handling a
>>>> 1415      * disabled queue */
>>>> 1416     virtio_net_set_status(vdev, vdev->status);
>>>> 1417     virtio_net_set_queue_pairs(n);
>>>>
>>>> Noted before the vdpa multiqueue support, there was never a vhost_dev
>>>> for ctrl_vq exposed, i.e. there's no host notifier set up for the
>>>> ctrl_vq on vhost_kernel as it is emulated in QEMU software.
>>>>
>>>>>> 10 0x000055f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized
>>>>>> out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at
>>>>>> ../hw/net/virtio-net.c:1408
>>>>>> 11 0x000055f800d534d8 in virtio_net_handle_ctrl 
>>>>>> (vdev=0x55f8044ec590,
>>>>>> vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
>>>>>> 12 0x000055f800d69f37 in virtio_queue_host_notifier_read
>>>>>> (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
>>>>>> 13 0x000055f800d69f37 in virtio_queue_host_notifier_read
>>>>>> (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
>>>>>> 14 0x000055f800c688e6 in virtio_bus_cleanup_host_notifier
>>>>>> (bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
>>>>>> 15 0x000055f800d73106 in vhost_dev_disable_notifiers
>>>>>> (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
>>>>>>       at ../../../include/hw/virtio/virtio-bus.h:35
>>>>>> 16 0x000055f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0,
>>>>>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
>>>>>> 17 0x000055f800bf0678 in vhost_net_stop 
>>>>>> (dev=dev@entry=0x55f8044ec590,
>>>>>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
>>>>>> cvq=cvq@entry=1)
>>>>>>       at ../hw/net/vhost_net.c:423
>>>>>> 18 0x000055f800d4e628 in virtio_net_set_status (status=<optimized 
>>>>>> out>,
>>>>>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
>>>>>> 19 0x000055f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590,
>>>>>> status=15 '\017') at ../hw/net/virtio-net.c:370
>>>>>> 20 0x000055f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590,
>>>>>> val=<optimized out>) at ../hw/virtio/virtio.c:1945
>>>>>> 21 0x000055f800d11d9d in vm_state_notify 
>>>>>> (running=running@entry=false,
>>>>>> state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
>>>>>> 22 0x000055f800d04e7a in do_vm_stop
>>>>>> (state=state@entry=RUN_STATE_SHUTDOWN, 
>>>>>> send_stop=send_stop@entry=false)
>>>>>> at ../softmmu/cpus.c:262
>>>>>> 23 0x000055f800d04e99 in vm_shutdown () at ../softmmu/cpus.c:280
>>>>>> 24 0x000055f800d126af in qemu_cleanup () at 
>>>>>> ../softmmu/runstate.c:812
>>>>>> 25 0x000055f800ad5b13 in main (argc=<optimized out>, argv=<optimized
>>>>>> out>, envp=<optimized out>) at ../softmmu/main.c:51
>>>>>>
>>>>>>    From the trace pending read only occurs in stop path. The 
>>>>>> recursive
>>>>>> virtio_net_set_status from virtio_net_handle_ctrl doesn't make 
>>>>>> sense to me.
>>>>> Yes, we need to figure this out to know the root cause.
>>>> I think it has something to do with the virtqueue unready issue 
>>>> that the
>>>> vhost_reset_device() refactoring series attempt to fix. If that is 
>>>> fixed
>>>> we should not see this sigsegv with mlx5_vdpa. However I guess we both
>>>> agreed that the vq_unready support would need new uAPI (some flag) to
>>>> define, hence this fix applies to the situation where uAPI doesn't 
>>>> exist
>>>> on the kernel or the vq_unready is not supported by vdpa vendor 
>>>> driver.
>>>>
>>> Yes.
>>>
>>>>> The code should work for the case when vhost-vdp fails to start.
>>>> Unlike the other datapath queues for net vdpa, the events left 
>>>> behind in
>>>> the control queue can't be processed by the userspace, as unlike
>>>> vhost-kernel we don't have a fallback path in the userspace.
>>> So that's the question, we should have a safe fallback.
>>>
>>>> To ignore
>>>> the pending event and let vhost vdpa process it on resume/start is
>>>> perhaps the best thing to do. This is even true for datapath queues 
>>>> for
>>>> other vdpa devices than of network.
>>>>
>>>>>> Not sure I got the reason why we need to handle pending host
>>>>>> notification in userland vq, can you elaborate?
>>>>> Because vhost-vDPA fails to start, we will "fallback" to a dummy 
>>>>> userspace.
>>>> Is the dummy userspace working or operational? What's the use case of
>>>> this "fallback" dummy if what guest user can get is a busted NIC?
>>> The problem is we can't do better in this case now. Such fallack (e.g
>>> for vhost-user) has been used for years. Or do you have any better
>>> ideas?
>> In my opinion if vhost-vdpa or vhost-user fails to start, maybe we 
>> should try to disable the device via virtio_error(), which would set 
>> broken to true and set NEEDS_RESET in case of VERSION_1. That way the 
>> device won't move forward further and the guest may get the 
>> indication via config interrupt that something had gone wrong 
>> underneath. If device reset is well supported there the guest driver 
>> would retry.
>
>
> Note that the NEEDS_RESET is not implemented in the current Linux 
> drivers.
Yes, I am aware of that. I think the point of setting NEEDS_RESET is to 
stop the device from moving forward: when it comes to a start failure, 
the vhost backend is already bogged down or in a bogus state, unable to 
move further. And it's the standardized way to explicitly inform the 
guest of a failure on the device side, even though the corresponding 
NEEDS_RESET handling hasn't been implemented in any Linux driver yet. 
Of course, alternatively, the guest can figure it out implicitly itself 
via a watchdog timer, as you indicated below.

>
>
>> This can at least give the backend some chance to recover if running 
>> into intermittent error. The worst result would be the device keeps 
>> resetting repeatedly, for which we may introduce tunable to control 
>> the rate if seeing reset occurs too often.. Did this ever get 
>> considered before?
>
>
> I don't know, but we manage to survive with such fallback for years. 
I wonder how the vhost-user client may restart in this case, i.e. when 
running into a transient backend failure. I haven't checked the code 
yet; do you mean there's no error recovery (or at least error detection) 
implemented in the vhost-user client, e.g. for DPDK? Or does it just try 
to reconnect if the socket connection gets closed, but never care about 
a vhost-user backend start failure?

> We can do this, but can virtio_error() fix the issue you describe here?
It doesn't fix the sigsegv issue, for certain. Actually the issue I ran 
into has little to do with error handling. But assuming virtio_error() 
is called in the start error path, we can just live without falling 
back to the dummy userspace or handling any request (as all vqs are 
effectively stopped/disabled). That is exactly consistent with the 
handling in the stop (success) path: ignore the pending event on the 
host notifier. In other words, it doesn't necessarily have to assume 
the existence of a dummy userspace fallback, which IMHO does nothing 
more than marking NEEDS_RESET with virtio_error() does. On the 
contrary, if there were ever a good use case for the dummy userspace 
(which I might not be aware of), I'd think the fallback to userspace 
emulation would even be needed for the stop path. But I doubt the need 
for adding such complex code without seeing a convincing case.
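
Just to illustrate what I have in mind, a rough sketch only (untested; 
the exact variable names and error path in virtio_net_vhost_status() 
may differ, and the error string here is made up):

/* hw/net/virtio-net.c, start error path: instead of silently falling
 * back to the dummy userspace datapath, mark the device broken so a
 * VERSION_1 guest sees NEEDS_RESET plus a config interrupt.
 */
r = vhost_net_start(vdev, n->nic->ncs, queue_pairs, cvq);
if (r < 0) {
    error_report("unable to start vhost net: %d", -r);
    virtio_error(vdev, "vhost backend failed to start");
    n->vhost_started = 0;
    return;
}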

>
>
>>
>> Noted that the dummy userspace can't handle any control vq command 
>> effectively once the vhost backend fails, for e.g. how does it handle 
>> those guest offload, rx mode, MAC or VLAN filter changes without 
>> sending the request down to the backend? 
>
>
> It should be no difference compared to the real hardware. The device 
> is just malfunction. The driver can detect this in many ways. E.g in 
> the past I suggest to implement the device watchdog for virtio-net, 
> the prototype is posted but for some reason it was stalled. Maybe we 
> can consider to continue the work.
Would you mind pointing me to the thread? What was the blocker then?

I feel it might be nice to consider NEEDS_RESET handling for guest 
drivers as it is more relevant here.

>
>
>> This could easily get inconsistent state to the guest if somehow we 
>> are able to resume the virtqueue without a reset. Even so, I suspect 
>> the device reset eventually is still needed on the other part, which 
>> is subject to the specific failure. It looks to me at least for 
>> vhost-vdpa, it might be the safest fallback so far to ignore pending 
>> event in ctrl_vq, and disable the device from moving forward in case 
>> of backend start failure.
>
>
> I don't get here, if we fail to start vhost-vdpa, the Qemu should do a 
> safe rewind otherwise it would be a bug.
In an ideal world, yes, QEMU should back off to where it was. However, 
I worry that not all of the operations have a symmetric undo op: for 
example, there's no unready op for vq_ready(), and reset_owner() 
contains a device reset internally to undo what set_owner() did. It 
would be easier to just reset as a safe fallback in this case.
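
For reference, this is roughly what the start path does today in 
hw/virtio/vhost-vdpa.c (quoting from memory, so it may not match the 
exact tree): every ring is flipped to ready, and there is no symmetric 
"unready" for the stop path to call.

static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
{
    int i;

    for (i = 0; i < dev->nvqs; ++i) {
        /* enable only; nothing ever writes 0 back on stop */
        struct vhost_vring_state state = {
            .index = dev->vq_index + i,
            .num = 1,
        };
        vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
    }
    return 0;
}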

>
>
>>
>>>
>>> It doesn't differ too much of the two approaches:
>>>
>>> 1) dummy fallback which can do even cvq
>>>
>>> and
>>>
>>> 2) disable host notifiers
>>>
>>> Especially consider 2) requires more changes.
>> I'm not clear if 2) really needs more changes, as it seems to me that 
>> it would take more unwarranted changes to make dummy fallback to work 
>> on cvq? And suppose we can fallback to disabling device via 
>> virtio_error(), we don't even need to change any code on cvq?
>
>
> So let me explain my points:
>
> 1) we use dummy receive as a fallback as vhost-user
>
> 2) the code should safely fallback to that otherwise it could be a bug 
> elsewhere
>
> 3) if we think the dummy fallback doesn't make sense, we can improve, 
> but we need to figure out why we can crash for 2) since the code could 
> be used in other path.
I think we may either ignore the pending request left behind on the 
vhost host notifier, or even flush the queue in the stop path, since by 
the time we reach this point all of the data vqs have been effectively 
stopped via vhost_dev_stop() and vhost_dev_disable_notifiers(). It 
looks like that's what the dummy fallback does on the data vqs, too? 
i.e. receive_disabled is set until the queues for the dummy backend are 
eventually flushed when the device is fully stopped.

What "could be used in other path" means is the key question to answer, 
in my opinion. Without knowing the (potential) use cases, it'd be hard 
to tell what level of emulation needs to be done. I hope we don't have 
to bring in complex code changes to emulate changing the number of 
queues when it's known that all the heavy lifting done earlier will be 
effectively destroyed by the follow-up reset in the stop path.

As said, I'm fine with not touching the dummy fallback part, but at 
least we should figure out a simple way to fix the vhost-vdpa control vq 
issue.
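
To show what I mean by "ignore" above, a minimal sketch (untested) as 
an alternative to the posted patch; note it would change behavior for 
every virtio device going through this cleanup path, which is exactly 
the "other path" concern:

void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
{
    VirtIODevice *vdev = virtio_bus_get_device(bus);
    VirtQueue *vq = virtio_get_queue(vdev, n);
    EventNotifier *notifier = virtio_queue_get_host_notifier(vq);

    /* Drop whatever kick is pending instead of running the userspace
     * vq handler; the vhost backend picks it up on the next start. */
    event_notifier_test_and_clear(notifier);
    event_notifier_cleanup(notifier);
}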

>
>
>>
>> On the other hand, for the specific code path this patch tries to 
>> fix, it is not due to failure to start vhost-vdpa backend, but more 
>> of a control flow flaw in the stop path due to lack of VQ stop uAPI. 
>> Let alone dummy or host notifier, considering currently it's in the 
>> stop path followed by a reset, I feel it should be pretty safe to 
>> just ignore the pending event on the control vq rather than process 
>> it prematurely in userspace. What do you think? I can leave without 
>> the host notifier handler change for sure.
>
>
> I wonder how vhost-user deal with this.
vhost-user doesn't expose a host notifier for the control vq, so this 
path is not even involved. All requests on the control vq are handled 
by the emulated virtio_net_handle_ctrl handler in the QEMU process.
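
For reference (from memory), the control vq is created in 
virtio_net_device_realize() with a userspace handler, which is why for 
vhost-user every cvq request ends up served by QEMU directly:

/* hw/net/virtio-net.c */
n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);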

>
>
>>
>>>
>>>> I
>>>> think this is very different from the vhost-kernel case in that once
>>>> vhost fails, we can fallback to userspace to emulate the network 
>>>> through
>>>> the tap fd in a good way. However, there's no equivalent yet for
>>>> vhost-vdpa...
>>>>
>>> As said previously, technically we can have vhost-vDPA network backend
>>> as a fallback.
>> But this is working as yet. And how do you envision the datapath may 
>> work given that we don't have a fallback tap fd?
>
>
> I mean we can treat vhost-vdpa as a kind of general networking backend 
> that could be used by all NIC model like e1000e. Then we can use that 
> as a fallback.
>
> But I'm not sure it's worth to bother.
Well, perhaps that's another story. I think supporting that would need 
more code refactoring than just the ioeventfd handler change alone...

Thanks,
-Siwei

>
> Thanks
>
>
>>
>> -Siwei
>>
>>
>>>   (So did for vhost-user).
>>>
>>> Thanks
>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>> Thanks
>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier 
>>>>>>>> (d=d@entry=0x558f568f0f60, n=n@entry=2, 
>>>>>>>> assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>>>>>>>>       at ../hw/virtio/virtio-pci.c:974
>>>>>>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers 
>>>>>>>> (d=0x558f568f0f60, nvqs=3, assign=true) at 
>>>>>>>> ../hw/virtio/virtio-pci.c:1019
>>>>>>>> 9  0x0000558f52bf091d in vhost_net_start 
>>>>>>>> (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, 
>>>>>>>> data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>>>>>>>>       at ../hw/net/vhost_net.c:361
>>>>>>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status 
>>>>>>>> (status=<optimized out>, n=0x558f568f91f0) at 
>>>>>>>> ../hw/net/virtio-net.c:289
>>>>>>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status 
>>>>>>>> (vdev=0x558f568f91f0, status=15 '\017') at 
>>>>>>>> ../hw/net/virtio-net.c:370
>>>>>>>> 12 0x0000558f52d6c4b2 in virtio_set_status 
>>>>>>>> (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at 
>>>>>>>> ../hw/virtio/virtio.c:1945
>>>>>>>> 13 0x0000558f52c69eff in virtio_pci_common_write 
>>>>>>>> (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized 
>>>>>>>> out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>>>>>>>> 14 0x0000558f52d15d6e in memory_region_write_accessor 
>>>>>>>> (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, 
>>>>>>>> shift=<optimized out>, mask=<optimized out>, attrs=...)
>>>>>>>>       at ../softmmu/memory.c:492
>>>>>>>> 15 0x0000558f52d127de in access_with_adjusted_size 
>>>>>>>> (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, 
>>>>>>>> size=size@entry=1, access_size_min=<optimized out>, 
>>>>>>>> access_size_max=<optimized out>, access_fn=0x558f52d15cf0 
>>>>>>>> <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) 
>>>>>>>> at ../softmmu/memory.c:554
>>>>>>>> 16 0x0000558f52d157ef in memory_region_dispatch_write 
>>>>>>>> (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, 
>>>>>>>> op=<optimized out>, attrs=attrs@entry=...)
>>>>>>>>       at ../softmmu/memory.c:1504
>>>>>>>> 17 0x0000558f52d078e7 in flatview_write_continue 
>>>>>>>> (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, 
>>>>>>>> attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, 
>>>>>>>> addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at 
>>>>>>>> ../../../include/qemu/host-utils.h:165
>>>>>>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, 
>>>>>>>> addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at 
>>>>>>>> ../softmmu/physmem.c:2822
>>>>>>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized 
>>>>>>>> out>, addr=<optimized out>, attrs=..., 
>>>>>>>> buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>>>>>>>>       at ../softmmu/physmem.c:2914
>>>>>>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, 
>>>>>>>> addr=<optimized out>, attrs=...,
>>>>>>>>       attrs@entry=..., buf=buf@entry=0x7f8ce6300028, 
>>>>>>>> len=<optimized out>, is_write=<optimized out>) at 
>>>>>>>> ../softmmu/physmem.c:2924
>>>>>>>> 21 0x0000558f52dced09 in kvm_cpu_exec 
>>>>>>>> (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>>>>>>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn 
>>>>>>>> (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>>>>>>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized 
>>>>>>>> out>) at ../util/qemu-thread-posix.c:556
>>>>>>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>>>>>>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>>>>>>>>
>>>>>>>> Fixes: 4023784 ("vhost-vdpa: multiqueue support")
>>>>>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>>>> ---
>>>>>>>>     hw/virtio/virtio-bus.c | 3 ++-
>>>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
>>>>>>>> index 0f69d1c..3159b58 100644
>>>>>>>> --- a/hw/virtio/virtio-bus.c
>>>>>>>> +++ b/hw/virtio/virtio-bus.c
>>>>>>>> @@ -311,7 +311,8 @@ void 
>>>>>>>> virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
>>>>>>>>         /* Test and clear notifier after disabling event,
>>>>>>>>          * in case poll callback didn't have time to run.
>>>>>>>>          */
>>>>>>>> -    virtio_queue_host_notifier_read(notifier);
>>>>>>>> +    if (!vdev->disable_ioeventfd_handler)
>>>>>>>> +        virtio_queue_host_notifier_read(notifier);
>>>>>>>>         event_notifier_cleanup(notifier);
>>>>>>>>     }
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> 1.8.3.1
>>>>>>>>
>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
  2022-04-08  1:02                 ` Si-Wei Liu
@ 2022-04-11  8:49                   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-11  8:49 UTC (permalink / raw)
  To: Si-Wei Liu, Laurent Vivier; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Fri, Apr 8, 2022 at 9:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 4/7/2022 12:05 AM, Jason Wang wrote:
> >
> > 在 2022/4/6 上午3:18, Si-Wei Liu 写道:
> >>
> >>
> >> On 4/1/2022 7:00 PM, Jason Wang wrote:
> >>> On Sat, Apr 2, 2022 at 4:37 AM Si-Wei Liu <si-wei.liu@oracle.com>
> >>> wrote:
> >>>>
> >>>>
> >>>> On 3/31/2022 1:36 AM, Jason Wang wrote:
> >>>>> On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu
> >>>>> <si-wei.liu@oracle.com> wrote:
> >>>>>>
> >>>>>> On 3/30/2022 2:14 AM, Jason Wang wrote:
> >>>>>>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu
> >>>>>>> <si-wei.liu@oracle.com> wrote:
> >>>>>>>> Previous commit prevents vhost-user and vhost-vdpa from using
> >>>>>>>> userland vq handler via disable_ioeventfd_handler. The same
> >>>>>>>> needs to be done for host notifier cleanup too, as the
> >>>>>>>> virtio_queue_host_notifier_read handler still tends to read
> >>>>>>>> pending event left behind on ioeventfd and attempts to handle
> >>>>>>>> outstanding kicks from QEMU userland vq.
> >>>>>>>>
> >>>>>>>> If vq handler is not disabled on cleanup, it may lead to sigsegv
> >>>>>>>> with recursive virtio_net_set_status call on the control vq:
> >>>>>>>>
> >>>>>>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
> >>>>>>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
> >>>>>>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
> >>>>>>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
> >>>>>>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index
> >>>>>>>> (dev=<optimized out>, idx=<optimized out>) at
> >>>>>>>> ../hw/virtio/vhost-vdpa.c:563
> >>>>>>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index
> >>>>>>>> (dev=<optimized out>, idx=<optimized out>) at
> >>>>>>>> ../hw/virtio/vhost-vdpa.c:558
> >>>>>>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask
> >>>>>>>> (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized
> >>>>>>>> out>) at ../hw/virtio/vhost.c:1557
> >>>>>>> I feel it's probably a bug elsewhere e.g when we fail to start
> >>>>>>> vhost-vDPA, it's the charge of the Qemu to poll host notifier
> >>>>>>> and we
> >>>>>>> will fallback to the userspace vq handler.
> >>>>>> Apologies, an incorrect stack trace was pasted which actually
> >>>>>> came from
> >>>>>> patch #1. I will post a v2 with the corresponding one as below:
> >>>>>>
> >>>>>> 0  0x000055f800df1780 in qdev_get_parent_bus (dev=0x0) at
> >>>>>> ../hw/core/qdev.c:376
> >>>>>> 1  0x000055f800c68ad8 in virtio_bus_device_iommu_enabled
> >>>>>> (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
> >>>>>> 2  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at
> >>>>>> ../hw/virtio/vhost.c:318
> >>>>>> 3  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>,
> >>>>>> buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at
> >>>>>> ../hw/virtio/vhost.c:336
> >>>>>> 4  0x000055f800d71867 in vhost_virtqueue_stop
> >>>>>> (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590,
> >>>>>> vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
> >>>>>> 5  0x000055f800d7406c in vhost_dev_stop
> >>>>>> (hdev=hdev@entry=0x55f8037ccc30,
> >>>>>> vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
> >>>>>> 6  0x000055f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30,
> >>>>>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
> >>>>>> 7  0x000055f800bf0678 in vhost_net_stop
> >>>>>> (dev=dev@entry=0x55f8044ec590,
> >>>>>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
> >>>>>> cvq=cvq@entry=1)
> >>>>>>       at ../hw/net/vhost_net.c:423
> >>>>>> 8  0x000055f800d4e628 in virtio_net_set_status (status=<optimized
> >>>>>> out>,
> >>>>>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
> >>>>>> 9  0x000055f800d4e628 in virtio_net_set_status
> >>>>>> (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at
> >>>>>> ../hw/net/virtio-net.c:370
> >>>>> I don't understand why virtio_net_handle_ctrl() call
> >>>>> virtio_net_set_stauts()...
> >>>> The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ
> >>>> command, i.e. in virtio_net_handle_mq():
> >>> Completely forget that the code was actually written by me :\
> >>>
> >>>> 1413     n->curr_queue_pairs = queue_pairs;
> >>>> 1414     /* stop the backend before changing the number of queue_pairs
> >>>> to avoid handling a
> >>>> 1415      * disabled queue */
> >>>> 1416     virtio_net_set_status(vdev, vdev->status);
> >>>> 1417     virtio_net_set_queue_pairs(n);
> >>>>
> >>>> Noted before the vdpa multiqueue support, there was never a vhost_dev
> >>>> for ctrl_vq exposed, i.e. there's no host notifier set up for the
> >>>> ctrl_vq on vhost_kernel as it is emulated in QEMU software.
> >>>>
> >>>>>> 10 0x000055f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized
> >>>>>> out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at
> >>>>>> ../hw/net/virtio-net.c:1408
> >>>>>> 11 0x000055f800d534d8 in virtio_net_handle_ctrl
> >>>>>> (vdev=0x55f8044ec590,
> >>>>>> vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
> >>>>>> 12 0x000055f800d69f37 in virtio_queue_host_notifier_read
> >>>>>> (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
> >>>>>> 13 0x000055f800d69f37 in virtio_queue_host_notifier_read
> >>>>>> (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
> >>>>>> 14 0x000055f800c688e6 in virtio_bus_cleanup_host_notifier
> >>>>>> (bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
> >>>>>> 15 0x000055f800d73106 in vhost_dev_disable_notifiers
> >>>>>> (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
> >>>>>>       at ../../../include/hw/virtio/virtio-bus.h:35
> >>>>>> 16 0x000055f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0,
> >>>>>> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
> >>>>>> 17 0x000055f800bf0678 in vhost_net_stop
> >>>>>> (dev=dev@entry=0x55f8044ec590,
> >>>>>> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
> >>>>>> cvq=cvq@entry=1)
> >>>>>>       at ../hw/net/vhost_net.c:423
> >>>>>> 18 0x000055f800d4e628 in virtio_net_set_status (status=<optimized
> >>>>>> out>,
> >>>>>> n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
> >>>>>> 19 0x000055f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590,
> >>>>>> status=15 '\017') at ../hw/net/virtio-net.c:370
> >>>>>> 20 0x000055f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590,
> >>>>>> val=<optimized out>) at ../hw/virtio/virtio.c:1945
> >>>>>> 21 0x000055f800d11d9d in vm_state_notify
> >>>>>> (running=running@entry=false,
> >>>>>> state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
> >>>>>> 22 0x000055f800d04e7a in do_vm_stop
> >>>>>> (state=state@entry=RUN_STATE_SHUTDOWN,
> >>>>>> send_stop=send_stop@entry=false)
> >>>>>> at ../softmmu/cpus.c:262
> >>>>>> 23 0x000055f800d04e99 in vm_shutdown () at ../softmmu/cpus.c:280
> >>>>>> 24 0x000055f800d126af in qemu_cleanup () at
> >>>>>> ../softmmu/runstate.c:812
> >>>>>> 25 0x000055f800ad5b13 in main (argc=<optimized out>, argv=<optimized
> >>>>>> out>, envp=<optimized out>) at ../softmmu/main.c:51
> >>>>>>
> >>>>>>    From the trace pending read only occurs in stop path. The
> >>>>>> recursive
> >>>>>> virtio_net_set_status from virtio_net_handle_ctrl doesn't make
> >>>>>> sense to me.
> >>>>> Yes, we need to figure this out to know the root cause.
> >>>> I think it has something to do with the virtqueue unready issue
> >>>> that the
> >>>> vhost_reset_device() refactoring series attempt to fix. If that is
> >>>> fixed
> >>>> we should not see this sigsegv with mlx5_vdpa. However I guess we both
> >>>> agreed that the vq_unready support would need new uAPI (some flag) to
> >>>> define, hence this fix applies to the situation where uAPI doesn't
> >>>> exist
> >>>> on the kernel or the vq_unready is not supported by vdpa vendor
> >>>> driver.
> >>>>
> >>> Yes.
> >>>
> >>>>> The code should work for the case when vhost-vdp fails to start.
> >>>> Unlike the other datapath queues for net vdpa, the events left
> >>>> behind in
> >>>> the control queue can't be processed by the userspace, as unlike
> >>>> vhost-kernel we don't have a fallback path in the userspace.
> >>> So that's the question, we should have a safe fallback.
> >>>
> >>>> To ignore
> >>>> the pending event and let vhost vdpa process it on resume/start is
> >>>> perhaps the best thing to do. This is even true for datapath queues
> >>>> for
> >>>> other vdpa devices than of network.
> >>>>
> >>>>>> Not sure I got the reason why we need to handle pending host
> >>>>>> notification in userland vq, can you elaborate?
> >>>>> Because vhost-vDPA fails to start, we will "fallback" to a dummy
> >>>>> userspace.
> >>>> Is the dummy userspace working or operational? What's the use case of
> >>>> this "fallback" dummy if what guest user can get is a busted NIC?
> >>> The problem is we can't do better in this case now. Such fallack (e.g
> >>> for vhost-user) has been used for years. Or do you have any better
> >>> ideas?
> >> In my opinion if vhost-vdpa or vhost-user fails to start, maybe we
> >> should try to disable the device via virtio_error(), which would set
> >> broken to true and set NEEDS_RESET in case of VERSION_1. That way the
> >> device won't move forward further and the guest may get the
> >> indication via config interrupt that something had gone wrong
> >> underneath. If device reset is well supported there the guest driver
> >> would retry.
> >
> >
> > Note that the NEEDS_RESET is not implemented in the current Linux
> > drivers.
> Yes, I am aware of that. I think the point to set NEEDS_RESET is to stop
> the device from moving forward, as when it comes to start failure, the
> vhost backend is already bogged down or in a bogus state unable to move
> further. And it's the standardized way to explicitly inform guest of
> failure on the device side, although the corresponding NEEDS_RESET
> handling hadn't been implemented in any Linux driver yet. Of coz
> alternatively, guest can figure it out itself implicitly via watchdog
> timer, as you indicated below.

Right, but I think the guest stuff is something that is nice to have
but not a must, since we don't always trust the device.

>
> >
> >
> >> This can at least give the backend some chance to recover if running
> >> into intermittent error. The worst result would be the device keeps
> >> resetting repeatedly, for which we may introduce tunable to control
> >> the rate if seeing reset occurs too often.. Did this ever get
> >> considered before?
> >
> >
> > I don't know, but we manage to survive with such fallback for years.
> I wonder how vhost-user client may restart in this case i.e. when
> running into transient backend failure. Haven't yet checked the code, do
> you mean there's never error recovery (or error detection at least)
> implemented in the vhost-user client for e.g. DPDK? Or it just tries to
> reconnect if the socket connection gets closed, but never cares about
> any vhost-user backend start failure?

I think we may just get "fallback to userspace" every time we want to fall back.

>
> > We can do this, but can virtio_error() fix the issue you describe here?
> It doesn't fix the sigsegv issue for certain. Actually the issue I ran
> into has little to do with error handling, but thinking with the
> assumption of virtio_error() in the start error path we can just live
> without falling back to the dummy userspace or handling any request (as
> all vqs are effectively stopped/disabled). Which is exactly consistent
> with the handling in the stop (success) path: ignore pending event on
> the host notifier. In other word, it doesn't necessarily have to assume
> the existence of dummy userspace fallback, which IMHO does nothing more
> compared to marking NEEDS_RESET with virtio_error(). While on the
> contrary, if there's ever a good use case for the dummy userspace (which
> I might not be aware), I thought the fallback to userspace emulation
> would be even needed for the stop path. But I doubted the need for
> adding such complex code without seeing a convincing case.

Ok, I get you. I think virtio_error() could be done in another series.
We can fix the crash issue first and then do virtio_error().

>
> >
> >
> >>
> >> Noted that the dummy userspace can't handle any control vq command
> >> effectively once the vhost backend fails, for e.g. how does it handle
> >> those guest offload, rx mode, MAC or VLAN filter changes without
> >> sending the request down to the backend?
> >
> >
> > It should be no difference compared to the real hardware. The device
> > is just malfunction. The driver can detect this in many ways. E.g in
> > the past I suggest to implement the device watchdog for virtio-net,
> > the prototype is posted but for some reason it was stalled. Maybe we
> > can consider to continue the work.
> Would you mind pointing me to the thread? What was the blocker then?

Something like

https://lore.kernel.org/netdev/20191122013636.1041-1-jcfaracco@gmail.com/

I don't think it had any blocker; it's probably just that nobody kept
working on it. And a drawback is that it only covers net, and only the
TX path. To have more general detection of a buggy device, we need a
lot of work in other places, e.g. to detect a stall in virtio_cread():

https://patchwork.kernel.org/project/qemu-devel/patch/20190611172032.19143-1-lvivier@redhat.com/
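
Just to recap the idea, a minimal guest-side sketch (this is not the 
actual code from the link above, and reset_work is a made-up field): 
the net core's TX watchdog gives us stall detection once the driver 
wires up ndo_tx_timeout.

/* drivers/net/virtio_net.c, sketch only */
static void virtnet_tx_timeout(struct net_device *dev, unsigned int txqueue)
{
    struct virtnet_info *vi = netdev_priv(dev);

    netdev_err(dev, "TX timeout on queue %u\n", txqueue);
    /* schedule reset/recovery from process context; reset_work is
     * hypothetical here */
    schedule_work(&vi->reset_work);
}

/* hooked up via .ndo_tx_timeout in virtnet_netdev, together with
 * dev->watchdog_timeo = 5 * HZ in probe */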

>
> I feel it might be nice to consider NEEDS_RESET handling for guest
> drivers as it is more relevant here.

Yes, but notice that they're a little bit different:

1) NEEDS_RESET: the device knows something is wrong and it can't be
recovered without a reset
2) watchdog: the device doesn't know there is a bug inside itself

2) looks more robust at first glance.

>
> >
> >
> >> This could easily get inconsistent state to the guest if somehow we
> >> are able to resume the virtqueue without a reset. Even so, I suspect
> >> the device reset eventually is still needed on the other part, which
> >> is subject to the specific failure. It looks to me at least for
> >> vhost-vdpa, it might be the safest fallback so far to ignore pending
> >> event in ctrl_vq, and disable the device from moving forward in case
> >> of backend start failure.
> >
> >
> > I don't get here, if we fail to start vhost-vdpa, the Qemu should do a
> > safe rewind otherwise it would be a bug.
> In the ideal world, yes QEMU should back off to where it was. However, I
> worried that not all of the operations has a corresponding undo op
> symmetrically, for e.g. there's no unready op for vq_ready(),
> reset_owner() contains device reset internally to undo what set_owner()
> effects.

Note that as we discussed in another thread, the reset_owner() is
really confusing and may lead to a buggy device.

Actually, I don't see any real issues caused by the above rewind you
mentioned if we fall back to a device that doesn't send or receive
anything.

> It would be easier to just reset as a safe fallback in this case.

The problem is that we need a way that works for old guests. A reset
here would be noticed by the guest and may break existing virtio
drivers.

>
> >
> >
> >>
> >>>
> >>> It doesn't differ too much of the two approaches:
> >>>
> >>> 1) dummy fallback which can do even cvq
> >>>
> >>> and
> >>>
> >>> 2) disable host notifiers
> >>>
> >>> Especially consider 2) requires more changes.
> >> I'm not clear if 2) really needs more changes, as it seems to me that
> >> it would take more unwarranted changes to make dummy fallback to work
> >> on cvq? And suppose we can fallback to disabling device via
> >> virtio_error(), we don't even need to change any code on cvq?
> >
> >
> > So let me explain my points:
> >
> > 1) we use dummy receive as a fallback as vhost-user
> >
> > 2) the code should safely fallback to that otherwise it could be a bug
> > elsewhere
> >
> > 3) if we think the dummy fallback doesn't make sense, we can improve,
> > but we need to figure out why we can crash for 2) since the code could
> > be used in other path.
> I think we may either ignore pending request left behind on the vhost
> host notifier or even flush the queue in the stop path, since when
> reaching to this point all of the data vqs have been effectively stopped
> via vhost_dev_stop() and vhost_dev_disable_notifiers(). It looks that's
> what the dummy fallback did on the data vqs, too? i.e. receive_disabled
> is set until queues for the dummy backend are eventually flushed when
> device is fully stopped.
>
> What "could be used in other path" is the key question to answer in my
> opinion. Without knowing the (potential) use cases, it'd be hard to
> imagine what level of emulation needs to be done. I hope we don't have
> to involve complex code change to emulate changing the no. of queues
> when it's known all the heavy lifting done earlier will be effectively
> destroyed with a follow-up reset in the stop path.

That's my feeling as well.

>
> As said, I'm fine with not touching the dummy fallback part, but at
> least we should figure out a simple way to fix the vhost-vdpa control vq
> issue.

Right, that's my point, we can do other things on top (since we need a
fix for -stable, which should not be intrusive). For the case of the
control vq, I think we should make it as simple as falling back to
letting Qemu poll the ioeventfd, so it can safely fall back to the
userspace virtio_net_handle_ctrl()?

>
> >
> >
> >>
> >> On the other hand, for the specific code path this patch tries to
> >> fix, it is not due to failure to start vhost-vdpa backend, but more
> >> of a control flow flaw in the stop path due to lack of VQ stop uAPI.
> >> Let alone dummy or host notifier, considering currently it's in the
> >> stop path followed by a reset, I feel it should be pretty safe to
> >> just ignore the pending event on the control vq rather than process
> >> it prematurely in userspace. What do you think? I can leave without
> >> the host notifier handler change for sure.
> >
> >
> > I wonder how vhost-user deal with this.
> vhost-user doesn't expose host notifier for control vq. This path is not
> even involved. All requests on the control vq are handled by the
> emulated virtio_net_handle_ctrl handler in the QEMU process.

Right, but what I meant is that the cvq should not differ from the data
vqs in this case (letting qemu poll the ioeventfd and use
virtio_net_handle_cvq()).

>
> >
> >
> >>
> >>>
> >>>> I
> >>>> think this is very different from the vhost-kernel case in that once
> >>>> vhost fails, we can fallback to userspace to emulate the network
> >>>> through
> >>>> the tap fd in a good way. However, there's no equivalent yet for
> >>>> vhost-vdpa...
> >>>>
> >>> As said previously, technically we can have vhost-vDPA network backend
> >>> as a fallback.
> >> But this is working as yet. And how do you envision the datapath may
> >> work given that we don't have a fallback tap fd?
> >
> >
> > I mean we can treat vhost-vdpa as a kind of general networking backend
> > that could be used by all NIC model like e1000e. Then we can use that
> > as a fallback.
> >
> > But I'm not sure it's worth to bother.
> Well, perhaps that's another story. I think to support that it would
> need more code refactoring than just the ioeventfd handler change alone...

Right.

Thanks

>
> Thanks,
> -Siwei
>
> >
> > Thanks
> >
> >
> >>
> >> -Siwei
> >>
> >>
> >>>   (So did for vhost-user).
> >>>
> >>> Thanks
> >>>
> >>>> Thanks,
> >>>> -Siwei
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier
> >>>>>>>> (d=d@entry=0x558f568f0f60, n=n@entry=2,
> >>>>>>>> assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
> >>>>>>>>       at ../hw/virtio/virtio-pci.c:974
> >>>>>>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers
> >>>>>>>> (d=0x558f568f0f60, nvqs=3, assign=true) at
> >>>>>>>> ../hw/virtio/virtio-pci.c:1019
> >>>>>>>> 9  0x0000558f52bf091d in vhost_net_start
> >>>>>>>> (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0,
> >>>>>>>> data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
> >>>>>>>>       at ../hw/net/vhost_net.c:361
> >>>>>>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status
> >>>>>>>> (status=<optimized out>, n=0x558f568f91f0) at
> >>>>>>>> ../hw/net/virtio-net.c:289
> >>>>>>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status
> >>>>>>>> (vdev=0x558f568f91f0, status=15 '\017') at
> >>>>>>>> ../hw/net/virtio-net.c:370
> >>>>>>>> 12 0x0000558f52d6c4b2 in virtio_set_status
> >>>>>>>> (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at
> >>>>>>>> ../hw/virtio/virtio.c:1945
> >>>>>>>> 13 0x0000558f52c69eff in virtio_pci_common_write
> >>>>>>>> (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized
> >>>>>>>> out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
> >>>>>>>> 14 0x0000558f52d15d6e in memory_region_write_accessor
> >>>>>>>> (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1,
> >>>>>>>> shift=<optimized out>, mask=<optimized out>, attrs=...)
> >>>>>>>>       at ../softmmu/memory.c:492
> >>>>>>>> 15 0x0000558f52d127de in access_with_adjusted_size
> >>>>>>>> (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748,
> >>>>>>>> size=size@entry=1, access_size_min=<optimized out>,
> >>>>>>>> access_size_max=<optimized out>, access_fn=0x558f52d15cf0
> >>>>>>>> <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...)
> >>>>>>>> at ../softmmu/memory.c:554
> >>>>>>>> 16 0x0000558f52d157ef in memory_region_dispatch_write
> >>>>>>>> (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>,
> >>>>>>>> op=<optimized out>, attrs=attrs@entry=...)
> >>>>>>>>       at ../softmmu/memory.c:1504
> >>>>>>>> 17 0x0000558f52d078e7 in flatview_write_continue
> >>>>>>>> (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124,
> >>>>>>>> attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1,
> >>>>>>>> addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at
> >>>>>>>> ../../../include/qemu/host-utils.h:165
> >>>>>>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90,
> >>>>>>>> addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at
> >>>>>>>> ../softmmu/physmem.c:2822
> >>>>>>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized
> >>>>>>>> out>, addr=<optimized out>, attrs=...,
> >>>>>>>> buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
> >>>>>>>>       at ../softmmu/physmem.c:2914
> >>>>>>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>,
> >>>>>>>> addr=<optimized out>, attrs=...,
> >>>>>>>>       attrs@entry=..., buf=buf@entry=0x7f8ce6300028,
> >>>>>>>> len=<optimized out>, is_write=<optimized out>) at
> >>>>>>>> ../softmmu/physmem.c:2924
> >>>>>>>> 21 0x0000558f52dced09 in kvm_cpu_exec
> >>>>>>>> (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
> >>>>>>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn
> >>>>>>>> (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
> >>>>>>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized
> >>>>>>>> out>) at ../util/qemu-thread-posix.c:556
> >>>>>>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
> >>>>>>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
> >>>>>>>>
> >>>>>>>> Fixes: 4023784 ("vhost-vdpa: multiqueue support")
> >>>>>>>> Cc: Jason Wang <jasowang@redhat.com>
> >>>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >>>>>>>> ---
> >>>>>>>>     hw/virtio/virtio-bus.c | 3 ++-
> >>>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> >>>>>>>> index 0f69d1c..3159b58 100644
> >>>>>>>> --- a/hw/virtio/virtio-bus.c
> >>>>>>>> +++ b/hw/virtio/virtio-bus.c
> >>>>>>>> @@ -311,7 +311,8 @@ void
> >>>>>>>> virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
> >>>>>>>>         /* Test and clear notifier after disabling event,
> >>>>>>>>          * in case poll callback didn't have time to run.
> >>>>>>>>          */
> >>>>>>>> -    virtio_queue_host_notifier_read(notifier);
> >>>>>>>> +    if (!vdev->disable_ioeventfd_handler)
> >>>>>>>> +        virtio_queue_host_notifier_read(notifier);
> >>>>>>>>         event_notifier_cleanup(notifier);
> >>>>>>>>     }
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> 1.8.3.1
> >>>>>>>>
> >>
> >
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/7] vhost-vdpa multiqueue fixes
  2022-03-30  6:33 [PATCH 0/7] vhost-vdpa multiqueue fixes Si-Wei Liu
                   ` (6 preceding siblings ...)
  2022-03-30  6:33 ` [PATCH 7/7] vhost-vdpa: backend feature should set only once Si-Wei Liu
@ 2022-04-27  4:28 ` Jason Wang
  2022-04-27  8:29   ` Si-Wei Liu
  7 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-27  4:28 UTC (permalink / raw)
  To: Si-Wei Liu, qemu-devel; +Cc: eperezma, eli, mst


On 2022/3/30 14:33, Si-Wei Liu wrote:
> Hi,
>
> This patch series attempt to fix a few issues in vhost-vdpa multiqueue functionality.
>
> Patch #1 is the formal submission for RFC patch in:
> https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/
>
> Patch #2 and #3 were taken from a previous patchset posted on qemu-devel:
> https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/
>
> albeit abandoned, two patches in that set turn out to be useful for patch #4, which is to fix a QEMU crash due to race condition.
>
> Patch #5 through #7 are obviously small bug fixes. Please find the description of each in the commit log.
>
> Thanks,
> -Siwei


Hi Si Wei:

I think we need another version of this series?

Thanks


>
> ---
>
> Eugenio Pérez (2):
>    virtio-net: Fix indentation
>    virtio-net: Only enable userland vq if using tap backend
>
> Si-Wei Liu (5):
>    virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
>    virtio: don't read pending event on host notifier if disabled
>    vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
>    vhost-net: fix improper cleanup in vhost_net_start
>    vhost-vdpa: backend feature should set only once
>
>   hw/net/vhost_net.c         |  4 +++-
>   hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
>   hw/virtio/vhost-vdpa.c     |  2 +-
>   hw/virtio/virtio-bus.c     |  3 ++-
>   hw/virtio/virtio.c         | 21 +++++++++++++--------
>   include/hw/virtio/virtio.h |  2 ++
>   net/vhost-vdpa.c           |  4 +++-
>   7 files changed, 45 insertions(+), 16 deletions(-)
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/7] vhost-vdpa multiqueue fixes
  2022-04-27  4:28 ` [PATCH 0/7] vhost-vdpa multiqueue fixes Jason Wang
@ 2022-04-27  8:29   ` Si-Wei Liu
  2022-04-27  8:38     ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-04-27  8:29 UTC (permalink / raw)
  To: Jason Wang, qemu-devel; +Cc: eperezma, eli, mst



On 4/26/2022 9:28 PM, Jason Wang wrote:
>
> On 2022/3/30 14:33, Si-Wei Liu wrote:
>> Hi,
>>
>> This patch series attempt to fix a few issues in vhost-vdpa 
>> multiqueue functionality.
>>
>> Patch #1 is the formal submission for RFC patch in:
>> https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/
>>
>> Patch #2 and #3 were taken from a previous patchset posted on 
>> qemu-devel:
>> https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/
>>
>> albeit abandoned, two patches in that set turn out to be useful for 
>> patch #4, which is to fix a QEMU crash due to race condition.
>>
>> Patch #5 through #7 are obviously small bug fixes. Please find the 
>> description of each in the commit log.
>>
>> Thanks,
>> -Siwei
>
>
> Hi Si Wei:
>
> I think we need another version of this series?
Hi Jason,

Apologies for the long delay. I was in the middle of reworking the patch
"virtio: don't read pending event on host notifier if disabled", but
found out that it would need quite a bit of code change for the
userspace fallback handler to work properly (specifically for the case
where the number of queues changes). I'll have to drop it from the
series and post it later when it's ready. I will post a v2 with the
relevant patches removed.

Regards,
-Siwei

>
> Thanks
>
>
>>
>> ---
>>
>> Eugenio Pérez (2):
>>    virtio-net: Fix indentation
>>    virtio-net: Only enable userland vq if using tap backend
>>
>> Si-Wei Liu (5):
>>    virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
>>    virtio: don't read pending event on host notifier if disabled
>>    vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
>>    vhost-net: fix improper cleanup in vhost_net_start
>>    vhost-vdpa: backend feature should set only once
>>
>>   hw/net/vhost_net.c         |  4 +++-
>>   hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
>>   hw/virtio/vhost-vdpa.c     |  2 +-
>>   hw/virtio/virtio-bus.c     |  3 ++-
>>   hw/virtio/virtio.c         | 21 +++++++++++++--------
>>   include/hw/virtio/virtio.h |  2 ++
>>   net/vhost-vdpa.c           |  4 +++-
>>   7 files changed, 45 insertions(+), 16 deletions(-)
>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/7] vhost-vdpa multiqueue fixes
  2022-04-27  8:29   ` Si-Wei Liu
@ 2022-04-27  8:38     ` Jason Wang
  2022-04-27  9:09       ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-27  8:38 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 4/26/2022 9:28 PM, Jason Wang wrote:
> >
> > On 2022/3/30 14:33, Si-Wei Liu wrote:
> >> Hi,
> >>
> >> This patch series attempt to fix a few issues in vhost-vdpa
> >> multiqueue functionality.
> >>
> >> Patch #1 is the formal submission for RFC patch in:
> >> https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/
> >>
> >> Patch #2 and #3 were taken from a previous patchset posted on
> >> qemu-devel:
> >> https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/
> >>
> >> albeit abandoned, two patches in that set turn out to be useful for
> >> patch #4, which is to fix a QEMU crash due to race condition.
> >>
> >> Patch #5 through #7 are obviously small bug fixes. Please find the
> >> description of each in the commit log.
> >>
> >> Thanks,
> >> -Siwei
> >
> >
> > Hi Si Wei:
> >
> > I think we need another version of this series?
> Hi Jason,
>
> Apologies for the long delay. I was in the middle of reworking the patch
> "virtio: don't read pending event on host notifier if disabled", but
> found out that it would need quite some code change for the userspace
> fallback handler to work properly (for the queue no. change case
> specifically).

We probably need this fix for -stable, so I wonder if we can have a
workaround first and do refactoring on top?

> I have to drop it from the series and posted it later on
> when ready. Will post a v2 with relevant patches removed.

Thanks

>
> Regards,
> -Siwei
>
> >
> > Thanks
> >
> >
> >>
> >> ---
> >>
> >> Eugenio Pérez (2):
> >>    virtio-net: Fix indentation
> >>    virtio-net: Only enable userland vq if using tap backend
> >>
> >> Si-Wei Liu (5):
> >>    virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
> >>    virtio: don't read pending event on host notifier if disabled
> >>    vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
> >>    vhost-net: fix improper cleanup in vhost_net_start
> >>    vhost-vdpa: backend feature should set only once
> >>
> >>   hw/net/vhost_net.c         |  4 +++-
> >>   hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
> >>   hw/virtio/vhost-vdpa.c     |  2 +-
> >>   hw/virtio/virtio-bus.c     |  3 ++-
> >>   hw/virtio/virtio.c         | 21 +++++++++++++--------
> >>   include/hw/virtio/virtio.h |  2 ++
> >>   net/vhost-vdpa.c           |  4 +++-
> >>   7 files changed, 45 insertions(+), 16 deletions(-)
> >>
> >
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/7] vhost-vdpa multiqueue fixes
  2022-04-27  8:38     ` Jason Wang
@ 2022-04-27  9:09       ` Si-Wei Liu
  2022-04-29  2:30         ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-04-27  9:09 UTC (permalink / raw)
  To: Jason Wang; +Cc: eperezma, Eli Cohen, qemu-devel, mst



On 4/27/2022 1:38 AM, Jason Wang wrote:
> On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 4/26/2022 9:28 PM, Jason Wang wrote:
>>> On 2022/3/30 14:33, Si-Wei Liu wrote:
>>>> Hi,
>>>>
>>>> This patch series attempt to fix a few issues in vhost-vdpa
>>>> multiqueue functionality.
>>>>
>>>> Patch #1 is the formal submission for RFC patch in:
> >>>> https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/
>>>>
>>>> Patch #2 and #3 were taken from a previous patchset posted on
>>>> qemu-devel:
> >>>> https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/
>>>>
>>>> albeit abandoned, two patches in that set turn out to be useful for
>>>> patch #4, which is to fix a QEMU crash due to race condition.
>>>>
>>>> Patch #5 through #7 are obviously small bug fixes. Please find the
>>>> description of each in the commit log.
>>>>
>>>> Thanks,
>>>> -Siwei
>>>
>>> Hi Si Wei:
>>>
>>> I think we need another version of this series?
>> Hi Jason,
>>
>> Apologies for the long delay. I was in the middle of reworking the patch
>> "virtio: don't read pending event on host notifier if disabled", but
>> found out that it would need quite some code change for the userspace
>> fallback handler to work properly (for the queue no. change case
>> specifically).
> We probably need this fix for -stable, so I wonder if we can have a
> workaround first and do refactoring on top?
Hmmm, a nasty fix that may well address the segfault would be something like this:

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 8ca0b80..3ac93a4 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -368,6 +368,10 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
      int i;
      uint8_t queue_status;

+    if (n->status_pending)
+        return;
+
+    n->status_pending = true;
      virtio_net_vnet_endian_status(n, status);
      virtio_net_vhost_status(n, status);

@@ -416,6 +420,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
              }
          }
      }
+    n->status_pending = false;
  }

  static void virtio_net_set_link_status(NetClientState *nc)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index eb87032..95efea8 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -216,6 +216,7 @@ struct VirtIONet {
      VirtioNetRssData rss_data;
      struct NetRxPkt *rx_pkt;
      struct EBPFRSSContext ebpf_rss;
+    bool status_pending;
  };

  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,

To be honest, I am not sure if this is worth a full-blown fix to make it
completely work. Without applying the vq suspend patch (the one I posted
in
https://lore.kernel.org/qemu-devel/df7c9a87-b2bd-7758-a6b6-bd834a7336fe@oracle.com/),
it's very hard for me to effectively verify my code change - it's very
easy for the guest vq index to get out of sync if the vq isn't stopped
once vhost is up and running (I tested it by repeatedly toggling
set_link off and on). I am not sure if there's a real chance we can run
into issues in practice due to the incomplete fix, if we don't fix the
vq stop/suspend issue first. Anyway, I will try, as other use cases,
e.g. live migration, are likely to stumble on it, too.

-Siwei


>
>> I have to drop it from the series and posted it later on
>> when ready. Will post a v2 with relevant patches removed.
> Thanks
>
>> Regards,
>> -Siwei
>>
>>> Thanks
>>>
>>>
>>>> ---
>>>>
>>>> Eugenio Pérez (2):
>>>>     virtio-net: Fix indentation
>>>>     virtio-net: Only enable userland vq if using tap backend
>>>>
>>>> Si-Wei Liu (5):
>>>>     virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
>>>>     virtio: don't read pending event on host notifier if disabled
>>>>     vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
>>>>     vhost-net: fix improper cleanup in vhost_net_start
>>>>     vhost-vdpa: backend feature should set only once
>>>>
>>>>    hw/net/vhost_net.c         |  4 +++-
>>>>    hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
>>>>    hw/virtio/vhost-vdpa.c     |  2 +-
>>>>    hw/virtio/virtio-bus.c     |  3 ++-
>>>>    hw/virtio/virtio.c         | 21 +++++++++++++--------
>>>>    include/hw/virtio/virtio.h |  2 ++
>>>>    net/vhost-vdpa.c           |  4 +++-
>>>>    7 files changed, 45 insertions(+), 16 deletions(-)
>>>>



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/7] vhost-vdpa multiqueue fixes
  2022-04-27  9:09       ` Si-Wei Liu
@ 2022-04-29  2:30         ` Jason Wang
  2022-04-30  2:07           ` Si-Wei Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-29  2:30 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: eperezma, Eli Cohen, qemu-devel, mst

On Wed, Apr 27, 2022 at 5:09 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 4/27/2022 1:38 AM, Jason Wang wrote:
> > On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 4/26/2022 9:28 PM, Jason Wang wrote:
> >>> On 2022/3/30 14:33, Si-Wei Liu wrote:
> >>>> Hi,
> >>>>
> >>>> This patch series attempt to fix a few issues in vhost-vdpa
> >>>> multiqueue functionality.
> >>>>
> >>>> Patch #1 is the formal submission for RFC patch in:
> >>>> https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/
> >>>>
> >>>> Patch #2 and #3 were taken from a previous patchset posted on
> >>>> qemu-devel:
> >>>> https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/
> >>>>
> >>>> albeit abandoned, two patches in that set turn out to be useful for
> >>>> patch #4, which is to fix a QEMU crash due to race condition.
> >>>>
> >>>> Patch #5 through #7 are obviously small bug fixes. Please find the
> >>>> description of each in the commit log.
> >>>>
> >>>> Thanks,
> >>>> -Siwei
> >>>
> >>> Hi Si Wei:
> >>>
> >>> I think we need another version of this series?
> >> Hi Jason,
> >>
> >> Apologies for the long delay. I was in the middle of reworking the patch
> >> "virtio: don't read pending event on host notifier if disabled", but
> >> found out that it would need quite some code change for the userspace
> >> fallback handler to work properly (for the queue no. change case
> >> specifically).
> > We probably need this fix for -stable, so I wonder if we can have a
> > workaround first and do refactoring on top?
> Hmmm, a nasty fix but may well address the segfault is something like this:
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 8ca0b80..3ac93a4 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -368,6 +368,10 @@ static void virtio_net_set_status(struct
> VirtIODevice *vdev, uint8_t status)
>       int i;
>       uint8_t queue_status;
>
> +    if (n->status_pending)
> +        return;
> +
> +    n->status_pending = true;
>       virtio_net_vnet_endian_status(n, status);
>       virtio_net_vhost_status(n, status);
>
> @@ -416,6 +420,7 @@ static void virtio_net_set_status(struct
> VirtIODevice *vdev, uint8_t status)
>               }
>           }
>       }
> +    n->status_pending = false;
>   }
>
>   static void virtio_net_set_link_status(NetClientState *nc)
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index eb87032..95efea8 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -216,6 +216,7 @@ struct VirtIONet {
>       VirtioNetRssData rss_data;
>       struct NetRxPkt *rx_pkt;
>       struct EBPFRSSContext ebpf_rss;
> +    bool status_pending;
>   };
>
>   void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>
> To be honest, I am not sure if this is worth a full blown fix to make it
> completely work. Without applying vq suspend patch (the one I posted in
> https://lore.kernel.org/qemu-devel/df7c9a87-b2bd-7758-a6b6-bd834a7336fe@oracle.com/),
> it's very hard for me to effectively verify my code change - it's very
> easy for the guest vq index to be out of sync if not stopping the vq
> once the vhost is up and running (I tested it with repeatedly set_link
> off and on).

Can we test via vmstop?

> I am not sure if there's real chance we can run into issue
> in practice due to the incomplete fix, if we don't fix the vq
> stop/suspend issue first. Anyway I will try, as other use case e.g, live
> migration is likely to get stumbled on it, too.

Ok, so I think we probably don't need the "nasty" fix above. Let's fix
it together with the stop/resume issue.

Thanks

>
> -Siwei
>
>
> >
> >> I have to drop it from the series and posted it later on
> >> when ready. Will post a v2 with relevant patches removed.
> > Thanks
> >
> >> Regards,
> >> -Siwei
> >>
> >>> Thanks
> >>>
> >>>
> >>>> ---
> >>>>
> >>>> Eugenio Pérez (2):
> >>>>     virtio-net: Fix indentation
> >>>>     virtio-net: Only enable userland vq if using tap backend
> >>>>
> >>>> Si-Wei Liu (5):
> >>>>     virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
> >>>>     virtio: don't read pending event on host notifier if disabled
> >>>>     vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
> >>>>     vhost-net: fix improper cleanup in vhost_net_start
> >>>>     vhost-vdpa: backend feature should set only once
> >>>>
> >>>>    hw/net/vhost_net.c         |  4 +++-
> >>>>    hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
> >>>>    hw/virtio/vhost-vdpa.c     |  2 +-
> >>>>    hw/virtio/virtio-bus.c     |  3 ++-
> >>>>    hw/virtio/virtio.c         | 21 +++++++++++++--------
> >>>>    include/hw/virtio/virtio.h |  2 ++
> >>>>    net/vhost-vdpa.c           |  4 +++-
> >>>>    7 files changed, 45 insertions(+), 16 deletions(-)
> >>>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/7] vhost-vdpa multiqueue fixes
  2022-04-29  2:30         ` Jason Wang
@ 2022-04-30  2:07           ` Si-Wei Liu
  2022-05-05  8:40             ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Si-Wei Liu @ 2022-04-30  2:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: eperezma, Eli Cohen, qemu-devel, mst



On 4/28/2022 7:30 PM, Jason Wang wrote:
> On Wed, Apr 27, 2022 at 5:09 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 4/27/2022 1:38 AM, Jason Wang wrote:
>>> On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 4/26/2022 9:28 PM, Jason Wang wrote:
>>>>> On 2022/3/30 14:33, Si-Wei Liu wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This patch series attempt to fix a few issues in vhost-vdpa
>>>>>> multiqueue functionality.
>>>>>>
>>>>>> Patch #1 is the formal submission for RFC patch in:
>>>>>> https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/
>>>>>>
>>>>>> Patch #2 and #3 were taken from a previous patchset posted on
>>>>>> qemu-devel:
>>>>>> https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/
>>>>>>
>>>>>> albeit abandoned, two patches in that set turn out to be useful for
>>>>>> patch #4, which is to fix a QEMU crash due to race condition.
>>>>>>
>>>>>> Patch #5 through #7 are obviously small bug fixes. Please find the
>>>>>> description of each in the commit log.
>>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>> Hi Si Wei:
>>>>>
>>>>> I think we need another version of this series?
>>>> Hi Jason,
>>>>
>>>> Apologies for the long delay. I was in the middle of reworking the patch
>>>> "virtio: don't read pending event on host notifier if disabled", but
>>>> found out that it would need quite some code change for the userspace
>>>> fallback handler to work properly (for the queue no. change case
>>>> specifically).
>>> We probably need this fix for -stable, so I wonder if we can have a
>>> workaround first and do refactoring on top?
>> Hmmm, a nasty fix but may well address the segfault is something like this:
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 8ca0b80..3ac93a4 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -368,6 +368,10 @@ static void virtio_net_set_status(struct
>> VirtIODevice *vdev, uint8_t status)
>>        int i;
>>        uint8_t queue_status;
>>
>> +    if (n->status_pending)
>> +        return;
>> +
>> +    n->status_pending = true;
>>        virtio_net_vnet_endian_status(n, status);
>>        virtio_net_vhost_status(n, status);
>>
>> @@ -416,6 +420,7 @@ static void virtio_net_set_status(struct
>> VirtIODevice *vdev, uint8_t status)
>>                }
>>            }
>>        }
>> +    n->status_pending = false;
>>    }
>>
>>    static void virtio_net_set_link_status(NetClientState *nc)
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index eb87032..95efea8 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -216,6 +216,7 @@ struct VirtIONet {
>>        VirtioNetRssData rss_data;
>>        struct NetRxPkt *rx_pkt;
>>        struct EBPFRSSContext ebpf_rss;
>> +    bool status_pending;
>>    };
>>
>>    void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>>
>> To be honest, I am not sure if this is worth a full blown fix to make it
>> completely work. Without applying vq suspend patch (the one I posted in
>> https://lore.kernel.org/qemu-devel/df7c9a87-b2bd-7758-a6b6-bd834a7336fe@oracle.com/),
>> it's very hard for me to effectively verify my code change - it's very
>> easy for the guest vq index to be out of sync if not stopping the vq
>> once the vhost is up and running (I tested it with repeatedly set_link
>> off and on).
> Can we test via vmstop?
Yes, of course, that's where the segfault happened. The tight loop of
set_link on/off doesn't even work for the single queue case, which is
why I doubted it ever worked for vhost-vdpa.

>
>> I am not sure if there's real chance we can run into issue
>> in practice due to the incomplete fix, if we don't fix the vq
>> stop/suspend issue first. Anyway I will try, as other use case e.g, live
>> migration is likely to get stumbled on it, too.
> Ok, so I think we probably don't need the "nasty" fix above. Let's fix
> it with the issue of stop/resume.
Ok, then does the tentative code change below suffice? I.e., it would
fail the request to change the number of queue pairs when the
vhost-vdpa backend falls back to the userspace handler, but it's
probably the easiest way to fix the vmstop segfault.

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ed231f9..8ba9f09 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1177,6 +1177,7 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
      struct virtio_net_ctrl_mq mq;
      size_t s;
      uint16_t queue_pairs;
+    NetClientState *nc = qemu_get_queue(n->nic);

      s = iov_to_buf(iov, iov_cnt, 0, &mq, sizeof(mq));
      if (s != sizeof(mq)) {
@@ -1196,6 +1197,13 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
          return VIRTIO_NET_ERR;
      }

+    /* avoid changing the number of queue_pairs for vdpa device in
+     * userspace handler.
+     * TODO: get userspace fallback to work with future patch */
+    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+        return VIRTIO_NET_ERR;
+    }
+
      n->curr_queue_pairs = queue_pairs;
      /* stop the backend before changing the number of queue_pairs to avoid handling a
       * disabled queue */

Thanks,
-Siwei
>
> Thanks
>
>> -Siwei
>>
>>
>>>> I have to drop it from the series and posted it later on
>>>> when ready. Will post a v2 with relevant patches removed.
>>> Thanks
>>>
>>>> Regards,
>>>> -Siwei
>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> Eugenio Pérez (2):
>>>>>>      virtio-net: Fix indentation
>>>>>>      virtio-net: Only enable userland vq if using tap backend
>>>>>>
>>>>>> Si-Wei Liu (5):
>>>>>>      virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
>>>>>>      virtio: don't read pending event on host notifier if disabled
>>>>>>      vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
>>>>>>      vhost-net: fix improper cleanup in vhost_net_start
>>>>>>      vhost-vdpa: backend feature should set only once
>>>>>>
>>>>>>     hw/net/vhost_net.c         |  4 +++-
>>>>>>     hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
>>>>>>     hw/virtio/vhost-vdpa.c     |  2 +-
>>>>>>     hw/virtio/virtio-bus.c     |  3 ++-
>>>>>>     hw/virtio/virtio.c         | 21 +++++++++++++--------
>>>>>>     include/hw/virtio/virtio.h |  2 ++
>>>>>>     net/vhost-vdpa.c           |  4 +++-
>>>>>>     7 files changed, 45 insertions(+), 16 deletions(-)
>>>>>>



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/7] vhost-vdpa multiqueue fixes
  2022-04-30  2:07           ` Si-Wei Liu
@ 2022-05-05  8:40             ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-05-05  8:40 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: qemu-devel, mst, eperezma, Eli Cohen

On Sat, Apr 30, 2022 at 10:07 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 4/28/2022 7:30 PM, Jason Wang wrote:
> > On Wed, Apr 27, 2022 at 5:09 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 4/27/2022 1:38 AM, Jason Wang wrote:
> >>> On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>
> >>>> On 4/26/2022 9:28 PM, Jason Wang wrote:
> >>>>> On 2022/3/30 14:33, Si-Wei Liu wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> This patch series attempt to fix a few issues in vhost-vdpa
> >>>>>> multiqueue functionality.
> >>>>>>
> >>>>>> Patch #1 is the formal submission for RFC patch in:
> >>>>>> https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/
> >>>>>>
> >>>>>> Patch #2 and #3 were taken from a previous patchset posted on
> >>>>>> qemu-devel:
> >>>>>> https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/
> >>>>>>
> >>>>>> albeit abandoned, two patches in that set turn out to be useful for
> >>>>>> patch #4, which is to fix a QEMU crash due to race condition.
> >>>>>>
> >>>>>> Patch #5 through #7 are obviously small bug fixes. Please find the
> >>>>>> description of each in the commit log.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>> Hi Si Wei:
> >>>>>
> >>>>> I think we need another version of this series?
> >>>> Hi Jason,
> >>>>
> >>>> Apologies for the long delay. I was in the middle of reworking the patch
> >>>> "virtio: don't read pending event on host notifier if disabled", but
> >>>> found out that it would need quite some code change for the userspace
> >>>> fallback handler to work properly (for the queue no. change case
> >>>> specifically).
> >>> We probably need this fix for -stable, so I wonder if we can have a
> >>> workaround first and do refactoring on top?
> >> Hmmm, a nasty fix but may well address the segfault is something like this:
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 8ca0b80..3ac93a4 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -368,6 +368,10 @@ static void virtio_net_set_status(struct
> >> VirtIODevice *vdev, uint8_t status)
> >>        int i;
> >>        uint8_t queue_status;
> >>
> >> +    if (n->status_pending)
> >> +        return;
> >> +
> >> +    n->status_pending = true;
> >>        virtio_net_vnet_endian_status(n, status);
> >>        virtio_net_vhost_status(n, status);
> >>
> >> @@ -416,6 +420,7 @@ static void virtio_net_set_status(struct
> >> VirtIODevice *vdev, uint8_t status)
> >>                }
> >>            }
> >>        }
> >> +    n->status_pending = false;
> >>    }
> >>
> >>    static void virtio_net_set_link_status(NetClientState *nc)
> >> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> >> index eb87032..95efea8 100644
> >> --- a/include/hw/virtio/virtio-net.h
> >> +++ b/include/hw/virtio/virtio-net.h
> >> @@ -216,6 +216,7 @@ struct VirtIONet {
> >>        VirtioNetRssData rss_data;
> >>        struct NetRxPkt *rx_pkt;
> >>        struct EBPFRSSContext ebpf_rss;
> >> +    bool status_pending;
> >>    };
> >>
> >>    void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> >>
> >> To be honest, I am not sure if this is worth a full blown fix to make it
> >> completely work. Without applying vq suspend patch (the one I posted in
> >> https://lore.kernel.org/qemu-devel/df7c9a87-b2bd-7758-a6b6-bd834a7336fe@oracle.com/),
> >> it's very hard for me to effectively verify my code change - it's very
> >> easy for the guest vq index to be out of sync if not stopping the vq
> >> once the vhost is up and running (I tested it with repeatedly set_link
> >> off and on).
> > Can we test via vmstop?
> Yes, of coz, that's where the segfault happened. The tight loop of
> set_link on/off doesn't even work for the single queue case, hence
> that's why I doubted it ever worked for vhost-vdpa.

Right, this is something we need to check. set_link should stop the
vhost device anyhow; otherwise it's a bug.
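
For reference, here is a condensed paraphrase (from memory, so please
double-check against the tree) of virtio_net_set_link_status() in
hw/net/virtio-net.c, which shows why set_link is expected to stop and
restart the vhost device; the config-change notification is omitted for
brevity:

static void virtio_net_set_link_status(NetClientState *nc)
{
    VirtIONet *n = qemu_get_nic_opaque(nc);
    VirtIODevice *vdev = VIRTIO_DEVICE(n);

    if (nc->link_down) {
        n->status &= ~VIRTIO_NET_S_LINK_UP;   /* set_link <id> off */
    } else {
        n->status |= VIRTIO_NET_S_LINK_UP;    /* set_link <id> on */
    }

    /*
     * virtio_net_set_status() -> virtio_net_vhost_status() starts vhost
     * only when DRIVER_OK and LINK_UP are both set (and the VM is
     * running), and stops it otherwise -- so a set_link off/on cycle
     * exercises the same stop/start path as vmstop does.
     */
    virtio_net_set_status(vdev, vdev->status);
}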

>
> >
> >> I am not sure if there's real chance we can run into issue
> >> in practice due to the incomplete fix, if we don't fix the vq
> >> stop/suspend issue first. Anyway I will try, as other use case e.g, live
> >> migration is likely to get stumbled on it, too.
> > Ok, so I think we probably don't need the "nasty" fix above. Let's fix
> > it with the issue of stop/resume.
> Ok, then does below tentative code change suffice the need? i.e. it
> would fail the request of changing queue pairs when the vhost-vdpa
> backend falls back to the userspace handler, but it's probably the
> easiest way to fix the vmstop segfault.

Probably, let's see.

Thanks

>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index ed231f9..8ba9f09 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1177,6 +1177,7 @@ static int virtio_net_handle_mq(VirtIONet *n,
> uint8_t cmd,
>       struct virtio_net_ctrl_mq mq;
>       size_t s;
>       uint16_t queue_pairs;
> +    NetClientState *nc = qemu_get_queue(n->nic);
>
>       s = iov_to_buf(iov, iov_cnt, 0, &mq, sizeof(mq));
>       if (s != sizeof(mq)) {
> @@ -1196,6 +1197,13 @@ static int virtio_net_handle_mq(VirtIONet *n,
> uint8_t cmd,
>           return VIRTIO_NET_ERR;
>       }
>
> +    /* avoid changing the number of queue_pairs for vdpa device in
> +     * userspace handler.
> +     * TODO: get userspace fallback to work with future patch */
> +    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> +        return VIRTIO_NET_ERR;
> +    }
> +
>       n->curr_queue_pairs = queue_pairs;
>       /* stop the backend before changing the number of queue_pairs to
> avoid handling a
>        * disabled queue */
>
> Thanks,
> -Siwei
> >
> > Thanks
> >
> >> -Siwei
> >>
> >>
> >>>> I have to drop it from the series and posted it later on
> >>>> when ready. Will post a v2 with relevant patches removed.
> >>> Thanks
> >>>
> >>>> Regards,
> >>>> -Siwei
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> Eugenio Pérez (2):
> >>>>>>      virtio-net: Fix indentation
> >>>>>>      virtio-net: Only enable userland vq if using tap backend
> >>>>>>
> >>>>>> Si-Wei Liu (5):
> >>>>>>      virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
> >>>>>>      virtio: don't read pending event on host notifier if disabled
> >>>>>>      vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
> >>>>>>      vhost-net: fix improper cleanup in vhost_net_start
> >>>>>>      vhost-vdpa: backend feature should set only once
> >>>>>>
> >>>>>>     hw/net/vhost_net.c         |  4 +++-
> >>>>>>     hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
> >>>>>>     hw/virtio/vhost-vdpa.c     |  2 +-
> >>>>>>     hw/virtio/virtio-bus.c     |  3 ++-
> >>>>>>     hw/virtio/virtio.c         | 21 +++++++++++++--------
> >>>>>>     include/hw/virtio/virtio.h |  2 ++
> >>>>>>     net/vhost-vdpa.c           |  4 +++-
> >>>>>>     7 files changed, 45 insertions(+), 16 deletions(-)
> >>>>>>
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2022-05-05  9:47 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-30  6:33 [PATCH 0/7] vhost-vdpa multiqueue fixes Si-Wei Liu
2022-03-30  6:33 ` [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa Si-Wei Liu
2022-03-30  9:00   ` Jason Wang
2022-03-30 15:47     ` Si-Wei Liu
2022-03-31  8:39       ` Jason Wang
2022-04-01 22:32         ` Si-Wei Liu
2022-04-02  2:10           ` Jason Wang
2022-04-05 23:26             ` Si-Wei Liu
2022-03-30  6:33 ` [PATCH 2/7] virtio-net: Fix indentation Si-Wei Liu
2022-03-30  9:01   ` Jason Wang
2022-03-30  6:33 ` [PATCH 3/7] virtio-net: Only enable userland vq if using tap backend Si-Wei Liu
2022-03-30  9:07   ` Jason Wang
2022-03-30  6:33 ` [PATCH 4/7] virtio: don't read pending event on host notifier if disabled Si-Wei Liu
2022-03-30  9:14   ` Jason Wang
2022-03-30 16:40     ` Si-Wei Liu
2022-03-31  8:36       ` Jason Wang
2022-04-01 20:37         ` Si-Wei Liu
2022-04-02  2:00           ` Jason Wang
2022-04-05 19:18             ` Si-Wei Liu
2022-04-07  7:05               ` Jason Wang
2022-04-08  1:02                 ` Si-Wei Liu
2022-04-11  8:49                   ` Jason Wang
2022-03-30  6:33 ` [PATCH 5/7] vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa Si-Wei Liu
2022-03-30  9:15   ` Jason Wang
2022-03-30  6:33 ` [PATCH 6/7] vhost-net: fix improper cleanup in vhost_net_start Si-Wei Liu
2022-03-30  9:30   ` Jason Wang
2022-03-30  6:33 ` [PATCH 7/7] vhost-vdpa: backend feature should set only once Si-Wei Liu
2022-03-30  9:28   ` Jason Wang
2022-03-30 16:24   ` Stefano Garzarella
2022-03-30 17:12     ` Si-Wei Liu
2022-03-30 17:32       ` Stefano Garzarella
2022-03-30 18:27         ` Eugenio Perez Martin
2022-03-30 22:44           ` Si-Wei Liu
2022-03-30 19:01   ` Eugenio Perez Martin
2022-03-30 23:03     ` Si-Wei Liu
2022-03-31  8:02       ` Eugenio Perez Martin
2022-03-31  8:54         ` Jason Wang
2022-03-31  9:19           ` Eugenio Perez Martin
2022-04-01  2:39             ` Jason Wang
2022-04-01  4:18               ` Si-Wei Liu
2022-04-02  1:33                 ` Jason Wang
2022-03-31 21:15         ` Si-Wei Liu
2022-04-01  8:21           ` Eugenio Perez Martin
2022-04-27  4:28 ` [PATCH 0/7] vhost-vdpa multiqueue fixes Jason Wang
2022-04-27  8:29   ` Si-Wei Liu
2022-04-27  8:38     ` Jason Wang
2022-04-27  9:09       ` Si-Wei Liu
2022-04-29  2:30         ` Jason Wang
2022-04-30  2:07           ` Si-Wei Liu
2022-05-05  8:40             ` Jason Wang
