* [PATCH] vdpa: Avoid reset when stop device
@ 2022-03-23  8:42 08005325
  2022-03-23  9:20 ` Jason Wang
  2022-03-30 10:02 ` [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device 08005325
  0 siblings, 2 replies; 47+ messages in thread
From: 08005325 @ 2022-03-23  8:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: jasowang, Michael Qiu, lulu

From: Michael Qiu <qiudayu@archeros.com>

Currently, when the VM is powered off, the vdpa device (such as a
Mellanox BlueField-2 VF) gets reset twice, which leads to the following
error:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This happens because in vhost_dev_stop() QEMU first stops the device and
then stops each queue via vhost_virtqueue_stop(). Stopping the device
resets it, which clears some flags in the low-level driver, so the
driver then finds the VQ invalid. This is the root cause.

Note that device reset is called anyway from release() later on.

To solve the issue, vdpa should set the vrings unready and remove the
reset from the device-stop path: vhost_dev_start(hdev, false).

Signed-off-by: Michael Qiu <qiudayu@archeros.com>
---
 hw/virtio/vhost-vdpa.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c5ed7a3..d858b4f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
     return idx;
 }
 
-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
 {
     int i;
     trace_vhost_vdpa_set_vring_ready(dev);
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_vring_state state = {
             .index = dev->vq_index + i,
-            .num = 1,
+            .num = ready,
         };
         vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
     }
@@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(!ok)) {
             return -1;
         }
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
     } else {
+        vhost_vdpa_set_vring_ready(dev, 0);
         ok = vhost_vdpa_svqs_stop(dev);
         if (unlikely(!ok)) {
             return -1;
@@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         memory_listener_register(&v->listener, &address_space_memory);
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
     } else {
-        vhost_vdpa_reset_device(dev);
         vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                    VIRTIO_CONFIG_S_DRIVER);
         memory_listener_unregister(&v->listener);
-- 
1.8.3.1




* Re: [PATCH] vdpa: Avoid reset when stop device
  2022-03-23  8:42 [PATCH] vdpa: Avoid reset when stop device 08005325
@ 2022-03-23  9:20 ` Jason Wang
  2022-03-25  6:32   ` Si-Wei Liu
  2022-03-30 10:02 ` [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device 08005325
  1 sibling, 1 reply; 47+ messages in thread
From: Jason Wang @ 2022-03-23  9:20 UTC (permalink / raw)
  To: 08005325
  Cc: Zhu Lingshan, Eugenio Perez Martin, Michael Qiu, qemu-devel, Cindy Lu

Adding Eugenio and Ling Shan.

On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>
> From: Michael Qiu <qiudayu@archeros.com>
>
> Currently, when the VM is powered off, the vdpa device (such as a
> Mellanox BlueField-2 VF) gets reset twice, which leads to the following
> error:
>
> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>
> This happens because in vhost_dev_stop() QEMU first stops the device and
> then stops each queue via vhost_virtqueue_stop(). Stopping the device
> resets it, which clears some flags in the low-level driver, so the
> driver then finds the VQ invalid. This is the root cause.
>
> Note that device reset is called anyway from release() later on.
>
> To solve the issue, vdpa should set the vrings unready and remove the
> reset from the device-stop path: vhost_dev_start(hdev, false).

This is an interesting issue. Do you see any real issue other than the
above warning?

The reason we "abuse" reset is that we don't have a stop uAPI for
vhost. We plan to add a status bit to stop the whole device in the
virtio spec, but considering it may take a while, maybe we can first
introduce a new uAPI/ioctl for that.

Note that the stop doesn't just apply to the virtqueues but to other
parts as well, e.g. the config space. But considering we don't have
config interrupt support right now, we're probably fine.
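
To illustrate the difference between a stop/suspend and a reset, here is
a toy C model (all names are illustrative and do not correspond to the
real vdpa uAPI): a suspend disables processing but keeps ring state such
as last_avail_idx readable, while a reset clears it, which is exactly why
the subsequent ring restore fails with -EINVAL in the reported log.

```c
#include <stdint.h>

/* Toy model of one virtqueue in a vdpa backend; illustrative only. */
struct toy_vq {
    int enabled;              /* mirrors VHOST_VDPA_SET_VRING_ENABLE */
    int ring_valid;           /* whether ring state is still readable */
    uint16_t last_avail_idx;  /* what GET_VRING_BASE needs to return */
};

/* A stop/suspend-style uAPI would disable processing but keep the
 * ring state intact, so it can still be read back afterwards. */
static void toy_suspend(struct toy_vq *vq)
{
    vq->enabled = 0;
}

/* A reset clears everything, so reading the ring base afterwards
 * fails, as in the "-22: Invalid argument" message above. */
static void toy_reset(struct toy_vq *vq)
{
    vq->enabled = 0;
    vq->ring_valid = 0;
    vq->last_avail_idx = 0;
}

static int toy_get_vring_base(const struct toy_vq *vq)
{
    return vq->ring_valid ? vq->last_avail_idx : -22; /* -EINVAL */
}
```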

Checking the drivers, it looks to me that only IFCVF's set_vq_ready()
is problematic; Ling Shan, please have a check. And we probably need a
workaround for vp_vdpa as well.

Anyhow, this seems to be better than reset. So for 7.1:

Acked-by: Jason Wang <jasowang@redhat.com>

>
> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
> ---
>  hw/virtio/vhost-vdpa.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index c5ed7a3..d858b4f 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>      return idx;
>  }
>
> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>  {
>      int i;
>      trace_vhost_vdpa_set_vring_ready(dev);
>      for (i = 0; i < dev->nvqs; ++i) {
>          struct vhost_vring_state state = {
>              .index = dev->vq_index + i,
> -            .num = 1,
> +            .num = ready,
>          };
>          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>      }
> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          if (unlikely(!ok)) {
>              return -1;
>          }
> -        vhost_vdpa_set_vring_ready(dev);
> +        vhost_vdpa_set_vring_ready(dev, 1);
>      } else {
> +        vhost_vdpa_set_vring_ready(dev, 0);
>          ok = vhost_vdpa_svqs_stop(dev);
>          if (unlikely(!ok)) {
>              return -1;
> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          memory_listener_register(&v->listener, &address_space_memory);
>          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>      } else {
> -        vhost_vdpa_reset_device(dev);
>          vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>                                     VIRTIO_CONFIG_S_DRIVER);
>          memory_listener_unregister(&v->listener);
> --
> 1.8.3.1
>




* Re: [PATCH] vdpa: Avoid reset when stop device
  2022-03-23  9:20 ` Jason Wang
@ 2022-03-25  6:32   ` Si-Wei Liu
       [not found]     ` <fe13304f-0a18-639e-580d-ce6eb7daecab@archeros.com>
       [not found]     ` <6fbf82a9-39ce-f179-5e4b-384123ca542c@archeros.com>
  0 siblings, 2 replies; 47+ messages in thread
From: Si-Wei Liu @ 2022-03-25  6:32 UTC (permalink / raw)
  To: Jason Wang, 08005325, Eugenio Perez Martin
  Cc: Zhu Lingshan, Michael Qiu, Cindy Lu, qemu-devel



On 3/23/2022 2:20 AM, Jason Wang wrote:
> Adding Eugenio,  and Ling Shan.
>
> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>> From: Michael Qiu <qiudayu@archeros.com>
>>
>> Currently, when the VM is powered off, the vdpa device (such as a
>> Mellanox BlueField-2 VF) gets reset twice, which leads to the following
>> error:
>>
>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>
>> This happens because in vhost_dev_stop() QEMU first stops the device and
>> then stops each queue via vhost_virtqueue_stop(). Stopping the device
>> resets it, which clears some flags in the low-level driver, so the
>> driver then finds the VQ invalid. This is the root cause.
>>
>> Note that device reset is called anyway from release() later on.
>>
>> To solve the issue, vdpa should set the vrings unready and remove the
>> reset from the device-stop path: vhost_dev_start(hdev, false).
> This is an interesting issue. Do you see a real issue except for the
> above warnings.
>
> The reason we "abuse" reset is that we don't have a stop uAPI for
> vhost. We plan to add a status bit to stop the whole device in the
> virtio spec, but considering it may take a while maybe we can first
> introduce a new uAPI/ioctl for that.
Yep. What was missing here is a vdpa-specific uAPI for per-virtqueue
stop/suspend rather than a spec-level amendment to stop the whole device
(including both vq and config space). For now we can have vDPA-specific
means to control the vq, something vDPA hardware vendors must support for
live migration, e.g. datapath switching to shadow vq. I believe a spec
amendment may follow to define a bit for virtio feature negotiation
later on if needed (FWIW, virtio-vdpa already does set_vq_ready(..., 0)
to stop the vq).

However, there's a flaw in this patch, see below.
>
> Note that the stop doesn't just work for virtqueue but others like,
> e.g config space. But considering we don't have config interrupt
> support right now, we're probably fine.
>
> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
> problematic, Ling Shan, please have a check. And we probably need a
> workaround for vp_vdpa as well.
>
> Anyhow, this seems to be better than reset. So for 7.1:
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>> ---
>>   hw/virtio/vhost-vdpa.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index c5ed7a3..d858b4f 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>       return idx;
>>   }
>>
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = ready,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>           if (unlikely(!ok)) {
>>               return -1;
>>           }
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>           ok = vhost_vdpa_svqs_stop(dev);
>>           if (unlikely(!ok)) {
>>               return -1;
>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>           memory_listener_register(&v->listener, &address_space_memory);
>>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>       } else {
>> -        vhost_vdpa_reset_device(dev);
Unfortunately, the reset can't be removed from here, as this code path
usually involves a virtio reset or status change, e.g. when invoked via
virtio_net_set_status(..., 0). Ideally we should use
VhostOps.vhost_reset_device() to reset the vhost-vdpa device where a
status change is involved after vhost_dev_stop() is done, but this
distinction does not exist yet as of today in any of the virtio devices
except vhost_user_scsi.

Alternatively, we may be able to do something like below: stop the
virtqueue in vhost_vdpa_get_vring_base(), in the vhost_virtqueue_stop()
context. Only after the hardware vq is stopped can the svq stop and
unmap, and then vhost-vdpa would reset the device status. It kinda
works, but not in a perfect way...

--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
      return idx;
  }

-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int enable)
  {
      int i;
      trace_vhost_vdpa_set_vring_ready(dev);
      for (i = 0; i < dev->nvqs; ++i) {
          struct vhost_vring_state state = {
              .index = dev->vq_index + i,
-            .num = 1,
+            .num = enable,
          };
          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
      }
@@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)

      if (started) {
          vhost_vdpa_host_notifiers_init(dev);
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
      } else {
          vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
      }
@@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
  {
      int ret;

+    /* Deactivate the queue (best effort) */
+    vhost_vdpa_set_vring_ready(dev, 0);
+
      ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
      trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
      return ret;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 437347a..2e917d8 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
      /* should only be called after backend is connected */
      assert(hdev->vhost_ops);

-    if (hdev->vhost_ops->vhost_dev_start) {
-        hdev->vhost_ops->vhost_dev_start(hdev, false);
-    }
      for (i = 0; i < hdev->nvqs; ++i) {
          vhost_virtqueue_stop(hdev,
                               vdev,
                               hdev->vqs + i,
                               hdev->vq_index + i);
      }
+    if (hdev->vhost_ops->vhost_dev_start) {
+        hdev->vhost_ops->vhost_dev_start(hdev, false);
+    }

      if (vhost_dev_has_iommu(hdev)) {
          if (hdev->vhost_ops->vhost_set_iotlb_callback) {

Regards,
-Siwei

>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>                                      VIRTIO_CONFIG_S_DRIVER);
>>           memory_listener_unregister(&v->listener);
>> --
>> 1.8.3.1
>>
>




* Re: [PATCH] vdpa: Avoid reset when stop device
       [not found]     ` <fe13304f-0a18-639e-580d-ce6eb7daecab@archeros.com>
@ 2022-03-25 19:19       ` Si-Wei Liu
  0 siblings, 0 replies; 47+ messages in thread
From: Si-Wei Liu @ 2022-03-25 19:19 UTC (permalink / raw)
  To: Michael Qiu, Jason Wang, 08005325, Eugenio Perez Martin
  Cc: Zhu Lingshan, qemu-devel, Cindy Lu



On 3/25/2022 2:00 AM, Michael Qiu wrote:
>
>
> On 2022/3/25 14:32, Si-Wei Liu wrote:
>>
>>
>> On 3/23/2022 2:20 AM, Jason Wang wrote:
>>> Adding Eugenio,  and Ling Shan.
>>>
>>> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>>>> From: Michael Qiu <qiudayu@archeros.com>
>>>>
>>>> Currently, when the VM is powered off, the vdpa device (such as a
>>>> Mellanox BlueField-2 VF) gets reset twice, which leads to the following
>>>> error:
>>>>
>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>
>>>> This happens because in vhost_dev_stop() QEMU first stops the device and
>>>> then stops each queue via vhost_virtqueue_stop(). Stopping the device
>>>> resets it, which clears some flags in the low-level driver, so the
>>>> driver then finds the VQ invalid. This is the root cause.
>>>>
>>>> Note that device reset is called anyway from release() later on.
>>>>
>>>> To solve the issue, vdpa should set the vrings unready and remove the
>>>> reset from the device-stop path: vhost_dev_start(hdev, false).
>>> This is an interesting issue. Do you see a real issue except for the
>>> above warnings.
>>>
>>> The reason we "abuse" reset is that we don't have a stop uAPI for
>>> vhost. We plan to add a status bit to stop the whole device in the
>>> virtio spec, but considering it may take a while maybe we can first
>>> introduce a new uAPI/ioctl for that.
>> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue 
>> stop/suspend rather than spec level amendment to stop the whole 
>> device (including both vq and config space). For now we can have vDPA 
>> specific means to control the vq, something vDPA hardware vendor must 
>> support for live migration, e.g. datapath switching to shadow vq. I 
>> believe the spec amendment may follow to define a bit for virtio 
>> feature negotiation later on if needed (FWIW virtio-vdpa already does 
>> set_vq_ready(..., 0) to stop the vq).
>>
>> However, there's a flaw in this patch, see below.
>>>
>>> Note that the stop doesn't just work for virtqueue but others like,
>>> e.g config space. But considering we don't have config interrupt
>>> support right now, we're probably fine.
>>>
>>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
>>> problematic, Ling Shan, please have a check. And we probably need a
>>> workaround for vp_vdpa as well.
>>>
>>> Anyhow, this seems to be better than reset. So for 7.1:
>>>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>
>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>> ---
>>>>   hw/virtio/vhost-vdpa.c | 8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index c5ed7a3..d858b4f 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>>>       return idx;
>>>>   }
>>>>
>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>>>   {
>>>>       int i;
>>>>       trace_vhost_vdpa_set_vring_ready(dev);
>>>>       for (i = 0; i < dev->nvqs; ++i) {
>>>>           struct vhost_vring_state state = {
>>>>               .index = dev->vq_index + i,
>>>> -            .num = 1,
>>>> +            .num = ready,
>>>>           };
>>>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>       }
>>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>           if (unlikely(!ok)) {
>>>>               return -1;
>>>>           }
>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>       } else {
>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>           ok = vhost_vdpa_svqs_stop(dev);
>>>>           if (unlikely(!ok)) {
>>>>               return -1;
>>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>           memory_listener_register(&v->listener, &address_space_memory);
>>>>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>>>       } else {
>>>> -        vhost_vdpa_reset_device(dev);
>> Unfortunately, the reset can't be removed from here, as this code
>> path usually involves a virtio reset or status change, e.g. when
>> invoked via virtio_net_set_status(..., 0). Ideally we should use
>> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where a
>> status change is involved after vhost_dev_stop() is done, but this
>> distinction does not exist yet as of today in any of the virtio
>> devices except vhost_user_scsi.
>>
>
> Actually, we may not care about virtio_net_set_status(..., 0),
> because virtio_net_device_unrealize() will finally call qemu_del_nic(),
The reset is needed because the guest can write 0 to the device status
register to initiate a device reset while the VM is running; that's a
very common scenario where virtio_net_set_status(..., 0) has to be
invoked. Quoting the spec:

-----------------%<-----------------

2.1.2 Device Requirements: Device Status Field
The device MUST initialize device status to 0 upon reset.
...
device_status
The driver writes the device status here (see 2.1). Writing 0 into this 
field resets the device.

-----------------%<-----------------
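
As a toy sketch of the quoted requirement (purely illustrative, not QEMU
or kernel code; the struct and field names are made up): writing 0 into
the status field is itself the reset trigger, which is why the
status-change path cannot simply skip the backend reset.

```c
#include <stdint.h>

/* Minimal model of the device_status semantics quoted above. */
struct toy_virtio_dev {
    uint8_t device_status;
    int vq_enabled;  /* stands in for all other device state */
};

/* Per the spec text: writing 0 into device_status resets the device,
 * and the device initializes its status to 0 upon reset. */
static void toy_write_status(struct toy_virtio_dev *d, uint8_t val)
{
    if (val == 0) {
        d->vq_enabled = 0;    /* device state is cleared by reset */
        d->device_status = 0;
    } else {
        d->device_status |= val;  /* status bits are accumulated */
    }
}
```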

That being said, removing vhost_vdpa_reset_device() would introduce a
severe regression in vdpa functionality; e.g. you may see weird errors
or a panic once the guest is rebooted, as the device state may have been
messed up. As indicated earlier, fixing it in a clean way would involve
serious code refactoring of all callers of vhost_dev_stop(), converting
those which require a device reset to explicitly call
VhostOps.vhost_reset_device().

> see below:
>
> qemu_del_nic()
>     -->qemu_cleanup_net_client()
>         -->cleanup/vhost_vdpa_cleanup()
>             -->qemu_close(s->vhost_vdpa.device_fd)
>
> In kernel space, the close() action triggers release():
> release()/vhost_vdpa_release()
>     --> vhost_vdpa_reset()
>
> So it will finally do the vdpa reset; that's why I said reset will be
> called twice in the current QEMU code.

That's a minor problem, as nobody cares about the extra reset while the
guest is being shut off.


Regards,
-Siwei
>
> Thanks,
> Michael
>
>> Alternatively, we may be able to do something like below: stop the
>> virtqueue in vhost_vdpa_get_vring_base(), in the
>> vhost_virtqueue_stop() context. Only after the hardware vq is stopped
>> can the svq stop and unmap, and then vhost-vdpa would reset the
>> device status. It kinda works, but not in a perfect way...
>>
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>       return idx;
>>   }
>>
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int enable)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = enable,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>
>>       if (started) {
>>           vhost_vdpa_host_notifiers_init(dev);
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>       }
>> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>>   {
>>       int ret;
>>
>> +    /* Deactivate the queue (best effort) */
>> +    vhost_vdpa_set_vring_ready(dev, 0);
>> +
>>       ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
>>       trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
>>       return ret;
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 437347a..2e917d8 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>       /* should only be called after backend is connected */
>>       assert(hdev->vhost_ops);
>>
>> -    if (hdev->vhost_ops->vhost_dev_start) {
>> -        hdev->vhost_ops->vhost_dev_start(hdev, false);
>> -    }
>>       for (i = 0; i < hdev->nvqs; ++i) {
>>           vhost_virtqueue_stop(hdev,
>>                                vdev,
>>                                hdev->vqs + i,
>>                                hdev->vq_index + i);
>>       }
>> +    if (hdev->vhost_ops->vhost_dev_start) {
>> +        hdev->vhost_ops->vhost_dev_start(hdev, false);
>> +    }
>>
>>       if (vhost_dev_has_iommu(hdev)) {
>>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
>>
>> Regards,
>> -Siwei
>>
>>>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> VIRTIO_CONFIG_S_DRIVER);
>>>>           memory_listener_unregister(&v->listener);
>>>> -- 
>>>> 1.8.3.1
>>>>
>>>
>>
>>
>




* Re: [PATCH] vdpa: Avoid reset when stop device
       [not found]     ` <6fbf82a9-39ce-f179-5e4b-384123ca542c@archeros.com>
@ 2022-03-25 19:59       ` Si-Wei Liu
  2022-03-30  8:52         ` Jason Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Si-Wei Liu @ 2022-03-25 19:59 UTC (permalink / raw)
  To: Michael Qiu, Jason Wang, 08005325, Eugenio Perez Martin
  Cc: Zhu Lingshan, qemu-devel, Cindy Lu



On 3/25/2022 12:18 AM, Michael Qiu wrote:
>
>
> On 2022/3/25 14:32, Si-Wei Liu Wrote:
>>
>>
>> On 3/23/2022 2:20 AM, Jason Wang wrote:
>>> Adding Eugenio,  and Ling Shan.
>>>
>>> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>>>> From: Michael Qiu <qiudayu@archeros.com>
>>>>
>>>> Currently, when the VM is powered off, the vdpa device (such as a
>>>> Mellanox BlueField-2 VF) gets reset twice, which leads to the following
>>>> error:
>>>>
>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>
>>>> This happens because in vhost_dev_stop() QEMU first stops the device and
>>>> then stops each queue via vhost_virtqueue_stop(). Stopping the device
>>>> resets it, which clears some flags in the low-level driver, so the
>>>> driver then finds the VQ invalid. This is the root cause.
>>>>
>>>> Note that device reset is called anyway from release() later on.
>>>>
>>>> To solve the issue, vdpa should set the vrings unready and remove the
>>>> reset from the device-stop path: vhost_dev_start(hdev, false).
>>> This is an interesting issue. Do you see a real issue except for the
>>> above warnings.
>>>
>>> The reason we "abuse" reset is that we don't have a stop uAPI for
>>> vhost. We plan to add a status bit to stop the whole device in the
>>> virtio spec, but considering it may take a while maybe we can first
>>> introduce a new uAPI/ioctl for that.
>> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue 
>> stop/suspend rather than spec level amendment to stop the whole 
>> device (including both vq and config space). For now we can have vDPA 
>> specific means to control the vq, something vDPA hardware vendor must 
>> support for live migration, e.g. datapath switching to shadow vq. I 
>> believe the spec amendment may follow to define a bit for virtio 
>> feature negotiation later on if needed (FWIW virtio-vdpa already does 
>> set_vq_ready(..., 0) to stop the vq).
>>
>> However, there's a flaw in this patch, see below.
>>>
>>> Note that the stop doesn't just work for virtqueue but others like,
>>> e.g config space. But considering we don't have config interrupt
>>> support right now, we're probably fine.
>>>
>>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
>>> problematic, Ling Shan, please have a check. And we probably need a
>>> workaround for vp_vdpa as well.
>>>
>>> Anyhow, this seems to be better than reset. So for 7.1:
>>>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>
>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>> ---
>>>>   hw/virtio/vhost-vdpa.c | 8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index c5ed7a3..d858b4f 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>>>       return idx;
>>>>   }
>>>>
>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>>>   {
>>>>       int i;
>>>>       trace_vhost_vdpa_set_vring_ready(dev);
>>>>       for (i = 0; i < dev->nvqs; ++i) {
>>>>           struct vhost_vring_state state = {
>>>>               .index = dev->vq_index + i,
>>>> -            .num = 1,
>>>> +            .num = ready,
>>>>           };
>>>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>       }
>>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>           if (unlikely(!ok)) {
>>>>               return -1;
>>>>           }
>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>       } else {
>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>           ok = vhost_vdpa_svqs_stop(dev);
>>>>           if (unlikely(!ok)) {
>>>>               return -1;
>>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>           memory_listener_register(&v->listener, &address_space_memory);
>>>>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>>>       } else {
>>>> -        vhost_vdpa_reset_device(dev);
>> Unfortunately, the reset can't be removed from here, as this code
>> path usually involves a virtio reset or status change, e.g. when
>> invoked via virtio_net_set_status(..., 0). Ideally we should use
>> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where a
>> status change is involved after vhost_dev_stop() is done, but this
>> distinction does not exist yet as of today in any of the virtio
>> devices except vhost_user_scsi.
>>
>> Alternatively, we may be able to do something like below: stop the
>> virtqueue in vhost_vdpa_get_vring_base(), in the
>> vhost_virtqueue_stop() context. Only after the hardware vq is stopped
>> can the svq stop and unmap, and then vhost-vdpa would reset the
>> device status. It kinda works, but not in a perfect way...
As I indicated above, this is a less ideal way to address the issue you
came across, without losing functionality or introducing regression in
the code. Ideally it'd be best to get it fixed in a clean way, though
that would take a little more effort in code refactoring. Personally I
feel that the error message you saw is somewhat benign, and I don't
think it causes any real problem. Do you see any trouble living with the
bogus error message for the moment?

>>
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>       return idx;
>>   }
>>
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int enable)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = enable,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>
>>       if (started) {
>>           vhost_vdpa_host_notifiers_init(dev);
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>       }
>> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>>   {
>>       int ret;
>>
>> +    /* Deactivate the queue (best effort) */
>> +    vhost_vdpa_set_vring_ready(dev, 0);
>> +
>
> I don't think it's a good idea to change the state in a "get" function;
>
> "get" means "read", not "write".
Well, if you look at the context of vhost_vdpa_get_vring_base(), the
only caller is vhost_virtqueue_stop(). Without stopping the hardware
first, it doesn't make sense to take a used_index snapshot for
resuming/restarting the vq. It might be more obvious and sensible to
introduce another Vhost op to suspend the vq right before the
get_vring_base() call, though I wouldn't bother doing it.

>
>>       ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
>>       trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
>>       return ret;
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 437347a..2e917d8 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev, 
>> VirtIODevice *vdev)
>>       /* should only be called after backend is connected */
>>       assert(hdev->vhost_ops);
>>
>> -    if (hdev->vhost_ops->vhost_dev_start) {
>> -        hdev->vhost_ops->vhost_dev_start(hdev, false);
>> -    }
>>       for (i = 0; i < hdev->nvqs; ++i) {
>>           vhost_virtqueue_stop(hdev,
>>                                vdev,
>>                                hdev->vqs + i,
>>                                hdev->vq_index + i);
>>       }
>> +    if (hdev->vhost_ops->vhost_dev_start) {
>> +        hdev->vhost_ops->vhost_dev_start(hdev, false);
>> +    }
>>
>
> The first idea that came to me was just like this, but in the end I 
> didn't choose this solution.
>
> When we start a device, first we start the virtqueue, then 
> vhost_ops->vhost_dev_start.
>
> So in the stop stage, in my opinion, we should just do the opposite, as 
> the original code does. Changing the sequence is a dangerous action.
I don't see any danger yet; would you please elaborate on the specific 
problem you see? I think this sequence is as expected:

1. suspend each individual vq, i.e. stop processing upcoming requests and 
possibly complete inflight requests  -> get_vring_base()
2. tear down virtio resources from the VMM, e.g. unmap guest memory 
mappings, remove host notifiers, and so on
3. reset the device -> vhost_vdpa_reset_device()
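
That three-step sequence can be sketched as a toy model (invented names;
the real steps are spread across vhost_virtqueue_stop(), vhost_dev_stop()
and the vhost-vdpa backend ops, not collected in one function like this):

```c
#include <assert.h>

/* Record the order of the teardown stages; all names are
 * illustrative, not the real QEMU call graph. */
enum stage { SUSPEND_VQ, TEARDOWN_RESOURCES, RESET_DEVICE };

#define MAX_LOG 16
static enum stage stage_log[MAX_LOG];
static int log_len;

static void run_stage(enum stage s)
{
    stage_log[log_len++] = s;
}

static void toy_vhost_dev_stop(int nvqs)
{
    int i;
    /* 1. suspend each vq and snapshot its base (get_vring_base) */
    for (i = 0; i < nvqs; i++) {
        run_stage(SUSPEND_VQ);
    }
    /* 2. unmap guest memory, remove host notifiers, ... */
    run_stage(TEARDOWN_RESOURCES);
    /* 3. only then reset the device */
    run_stage(RESET_DEVICE);
}
```

The point of the ordering is that the reset comes strictly after every vq
has been suspended and had its base read back.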

Regards,
-Siwei

>
> Thanks,
> Michael
>>       if (vhost_dev_has_iommu(hdev)) {
>>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
>>
>> Regards,
>> -Siwei
>>
>>>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> VIRTIO_CONFIG_S_DRIVER);
>>>>           memory_listener_unregister(&v->listener);
>>>> -- 
>>>> 1.8.3.1
>>>>
>>>
>>
>>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] vdpa: Avoid reset when stop device
  2022-03-25 19:59       ` Si-Wei Liu
@ 2022-03-30  8:52         ` Jason Wang
  2022-03-30  9:53           ` Michael Qiu
  0 siblings, 1 reply; 47+ messages in thread
From: Jason Wang @ 2022-03-30  8:52 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Cindy Lu, qemu-devel, Eugenio Perez Martin, Michael Qiu,
	Zhu Lingshan, 08005325

On Sat, Mar 26, 2022 at 3:59 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/25/2022 12:18 AM, Michael Qiu wrote:
> >
> >
> > On 2022/3/25 14:32, Si-Wei Liu Wrote:
> >>
> >>
> >> On 3/23/2022 2:20 AM, Jason Wang wrote:
> >>> Adding Eugenio,  and Ling Shan.
> >>>
> >>> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
> >>>> From: Michael Qiu <qiudayu@archeros.com>
> >>>>
> >>>> Currently, when VM poweroff, it will trigger vdpa
> >>>> device(such as mlx bluefield2 VF) reset twice, this leads
> >>>> to below issue:
> >>>>
> >>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
> >>>>
> >>>> This because in vhost_dev_stop(), qemu tries to stop the device,
> >>>> then stop the queue: vhost_virtqueue_stop().
> >>>> In vhost_dev_stop(), it resets the device, which clear some flags
> >>>> in low level driver, and the driver finds
> >>>> that the VQ is invalied, this is the root cause.
> >>>>
> >>>> Actually, device reset will be called within func release()
> >>>>
> >>>> To solve the issue, vdpa should set vring unready, and
> >>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
> >>> This is an interesting issue. Do you see a real issue except for the
> >>> above warnings.
> >>>
> >>> The reason we "abuse" reset is that we don't have a stop uAPI for
> >>> vhost. We plan to add a status bit to stop the whole device in the
> >>> virtio spec, but considering it may take a while maybe we can first
> >>> introduce a new uAPI/ioctl for that.
> >> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue
> >> stop/suspend rather than spec level amendment to stop the whole
> >> device (including both vq and config space). For now we can have vDPA
> >> specific means to control the vq, something vDPA hardware vendor must
> >> support for live migration, e.g. datapath switching to shadow vq. I
> >> believe the spec amendment may follow to define a bit for virtio
> >> feature negotiation later on if needed (FWIW virtio-vdpa already does
> >> set_vq_ready(..., 0) to stop the vq).
> >>
> >> However, there's a flaw in this patch, see below.
> >>>
> >>> Note that the stop doesn't just work for virtqueue but others like,
> >>> e.g config space. But considering we don't have config interrupt
> >>> support right now, we're probably fine.
> >>>
> >>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
> >>> problematic, Ling Shan, please have a check. And we probably need a
> >>> workaround for vp_vdpa as well.
> >>>
> >>> Anyhow, this seems to be better than reset. So for 7.1:
> >>>
> >>> Acked-by: Jason Wang <jasowang@redhat.com>
> >>>
> >>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
> >>>> ---
> >>>>   hw/virtio/vhost-vdpa.c | 8 ++++----
> >>>>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>>> index c5ed7a3..d858b4f 100644
> >>>> --- a/hw/virtio/vhost-vdpa.c
> >>>> +++ b/hw/virtio/vhost-vdpa.c
> >>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct
> >>>> vhost_dev *dev, int idx)
> >>>>       return idx;
> >>>>   }
> >>>>
> >>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev,
> >>>> unsigned int ready)
> >>>>   {
> >>>>       int i;
> >>>>       trace_vhost_vdpa_set_vring_ready(dev);
> >>>>       for (i = 0; i < dev->nvqs; ++i) {
> >>>>           struct vhost_vring_state state = {
> >>>>               .index = dev->vq_index + i,
> >>>> -            .num = 1,
> >>>> +            .num = ready,
> >>>>           };
> >>>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> >>>>       }
> >>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct
> >>>> vhost_dev *dev, bool started)
> >>>>           if (unlikely(!ok)) {
> >>>>               return -1;
> >>>>           }
> >>>> -        vhost_vdpa_set_vring_ready(dev);
> >>>> +        vhost_vdpa_set_vring_ready(dev, 1);
> >>>>       } else {
> >>>> +        vhost_vdpa_set_vring_ready(dev, 0);
> >>>>           ok = vhost_vdpa_svqs_stop(dev);
> >>>>           if (unlikely(!ok)) {
> >>>>               return -1;
> >>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct
> >>>> vhost_dev *dev, bool started)
> >>>>           memory_listener_register(&v->listener,
> >>>> &address_space_memory);
> >>>>           return vhost_vdpa_add_status(dev,
> >>>> VIRTIO_CONFIG_S_DRIVER_OK);
> >>>>       } else {
> >>>> -        vhost_vdpa_reset_device(dev);
> >> Unfortunately, the reset can't be be removed from here as this code
> >> path usually involves virtio reset or status change for e.g. invoked
> >> via virtio_net_set_status(... , 0). Ideally we should use the
> >> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where
> >> status change is involved after vhost_dev_stop() is done, but this
> >> distinction is not there yet as of today in all of the virtio devices
> >> except vhost_user_scsi.
> >>
> >> Alternatively we may be able to do something like below, stop the
> >> virtqueue in vhost_vdpa_get_vring_base() in the
> >> vhost_virtqueue_stop() context. Only until the hardware vq is
> >> stopped, svq can stop and unmap then vhost-vdpa would reset the
> >> device status. It kinda works, but not in a perfect way...
> As I indicated above, this is a less ideal way to address the issue you
> came across, without losing functionality or introducing
> regression to the code. Ideally it'd be best to get this fixed in a clean
> way, though that would take a little more effort in code refactoring.
> Personally I feel that the error message you saw is somewhat benign and
> don't think it caused any real problem. Did you see any actual trouble
> when living with the bogus error message for the moment?

Should be fine for networking devices at least, since we don't care
about duplicated packets (restoring last_avail_idx as used_idx).

But it might be problematic for other types of devices.
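
Jason's point can be made concrete with a toy calculation (invented
indices; this is not the actual vhost restore logic, just the arithmetic
of the in-flight window between the avail and used indices):

```c
#include <assert.h>

/* Invented scenario: the device had fetched avail descriptors up to
 * last_avail_idx but had completed (used) only used_idx when the vq
 * was torn down. If the ring position is restored from the used side,
 * the gap is fetched again, i.e. those buffers are replayed. */
static unsigned replayed_on_restore(unsigned last_avail_idx,
                                    unsigned used_idx)
{
    return last_avail_idx - used_idx;
}
```

Replaying that window means duplicated packets, which a NIC tolerates;
for a device where replaying a request has side effects (e.g. storage
writes), the same behavior could be a real problem.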

Thanks


>
> >>
> >> --- a/hw/virtio/vhost-vdpa.c
> >> +++ b/hw/virtio/vhost-vdpa.c
> >> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct
> >> vhost_dev *dev, int idx)
> >>       return idx;
> >>   }
> >>
> >> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int
> >> enable)
> >>   {
> >>       int i;
> >>       trace_vhost_vdpa_set_vring_ready(dev);
> >>       for (i = 0; i < dev->nvqs; ++i) {
> >>           struct vhost_vring_state state = {
> >>               .index = dev->vq_index + i,
> >> -            .num = 1,
> >> +            .num = enable,
> >>           };
> >>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> >>       }
> >> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev
> >> *dev, bool started)
> >>
> >>       if (started) {
> >>           vhost_vdpa_host_notifiers_init(dev);
> >> -        vhost_vdpa_set_vring_ready(dev);
> >> +        vhost_vdpa_set_vring_ready(dev, 1);
> >>       } else {
> >>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >>       }
> >> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct
> >> vhost_dev *dev,
> >>   {
> >>       int ret;
> >>
> >> +    /* Deactivate the queue (best effort) */
> >> +    vhost_vdpa_set_vring_ready(dev, 0);
> >> +
> >
> > I don't think it's a good idea to change the state in "get" function,
> >
> > get means "read" not "write".
> Well, if you look at the context of vhost_vdpa_get_vring_base(), the
> only caller is vhost_virtqueue_stop(). Without stopping the hardware
> ahead, it doesn't make sense to effectively get a used_index snapshot
> for resuming/restarting the vq. It might be more obvious and sensible to
> see that were to introduce another Vhost op to suspend the vq right
> before the get_vring_base() call, though I wouldn't bother doing it.
>
> >
> >>       ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
> >>       trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
> >>       return ret;
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index 437347a..2e917d8 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev,
> >> VirtIODevice *vdev)
> >>       /* should only be called after backend is connected */
> >>       assert(hdev->vhost_ops);
> >>
> >> -    if (hdev->vhost_ops->vhost_dev_start) {
> >> -        hdev->vhost_ops->vhost_dev_start(hdev, false);
> >> -    }
> >>       for (i = 0; i < hdev->nvqs; ++i) {
> >>           vhost_virtqueue_stop(hdev,
> >>                                vdev,
> >>                                hdev->vqs + i,
> >>                                hdev->vq_index + i);
> >>       }
> >> +    if (hdev->vhost_ops->vhost_dev_start) {
> >> +        hdev->vhost_ops->vhost_dev_start(hdev, false);
> >> +    }
> >>
> >
> > This first idea comes to me is just like this, but at last I don't
> > choose this solution.
> >
> > When we start a device, first we start the virtqueue then
> > vhost_ops->vhost_dev_start.
> >
> > So in stop stage, in my opinion, we should just do the opposite, do as
> > the orignal code do. Change the sequential is a dangerous action.
> I don't see any danger yet, would you please elaborate the specific
> problem you see? I think this sequence is as expected:
>
> 1. suspend each individual vq i.e. stop processing upcoming request, and
> possibly complete inflight requests  -> get_vring_base()
> 2. tear down virtio resources from VMM for e.g. unmap guest memory
> mappings, remove host notifiers, and et al
> 3. reset device -> vhost_vdpa_reset_device()
>
> Regards,
> -Siwei
>
> >
> > Thanks,
> > Michael
> >>       if (vhost_dev_has_iommu(hdev)) {
> >>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
> >>
> >> Regards,
> >> -Siwei
> >>
> >>>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >>>> VIRTIO_CONFIG_S_DRIVER);
> >>>>           memory_listener_unregister(&v->listener);
> >>>> --
> >>>> 1.8.3.1
> >>>>
> >>>
> >>
> >>
>




* Re: [PATCH] vdpa: Avoid reset when stop device
  2022-03-30  8:52         ` Jason Wang
@ 2022-03-30  9:53           ` Michael Qiu
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-03-30  9:53 UTC (permalink / raw)
  To: Jason Wang, Si-Wei Liu
  Cc: Eugenio Perez Martin, Zhu Lingshan, Cindy Lu, qemu-devel, 08005325



On 2022/3/30 16:52, Jason Wang wrote:
> On Sat, Mar 26, 2022 at 3:59 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>>
>> On 3/25/2022 12:18 AM, Michael Qiu wrote:
>>>
>>>
>>> On 2022/3/25 14:32, Si-Wei Liu Wrote:
>>>>
>>>>
>>>> On 3/23/2022 2:20 AM, Jason Wang wrote:
>>>>> Adding Eugenio,  and Ling Shan.
>>>>>
>>>>> On Wed, Mar 23, 2022 at 4:58 PM <08005325@163.com> wrote:
>>>>>> From: Michael Qiu <qiudayu@archeros.com>
>>>>>>
>>>>>> Currently, when VM poweroff, it will trigger vdpa
>>>>>> device(such as mlx bluefield2 VF) reset twice, this leads
>>>>>> to below issue:
>>>>>>
>>>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>>>
>>>>>> This because in vhost_dev_stop(), qemu tries to stop the device,
>>>>>> then stop the queue: vhost_virtqueue_stop().
>>>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>>>> in low level driver, and the driver finds
>>>>>> that the VQ is invalied, this is the root cause.
>>>>>>
>>>>>> Actually, device reset will be called within func release()
>>>>>>
>>>>>> To solve the issue, vdpa should set vring unready, and
>>>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>>>> This is an interesting issue. Do you see a real issue except for the
>>>>> above warnings.
>>>>>
>>>>> The reason we "abuse" reset is that we don't have a stop uAPI for
>>>>> vhost. We plan to add a status bit to stop the whole device in the
>>>>> virtio spec, but considering it may take a while maybe we can first
>>>>> introduce a new uAPI/ioctl for that.
>>>> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue
>>>> stop/suspend rather than spec level amendment to stop the whole
>>>> device (including both vq and config space). For now we can have vDPA
>>>> specific means to control the vq, something vDPA hardware vendor must
>>>> support for live migration, e.g. datapath switching to shadow vq. I
>>>> believe the spec amendment may follow to define a bit for virtio
>>>> feature negotiation later on if needed (FWIW virtio-vdpa already does
>>>> set_vq_ready(..., 0) to stop the vq).
>>>>
>>>> However, there's a flaw in this patch, see below.
>>>>>
>>>>> Note that the stop doesn't just work for virtqueue but others like,
>>>>> e.g config space. But considering we don't have config interrupt
>>>>> support right now, we're probably fine.
>>>>>
>>>>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
>>>>> problematic, Ling Shan, please have a check. And we probably need a
>>>>> workaround for vp_vdpa as well.
>>>>>
>>>>> Anyhow, this seems to be better than reset. So for 7.1:
>>>>>
>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>>
>>>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>>>> ---
>>>>>>    hw/virtio/vhost-vdpa.c | 8 ++++----
>>>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>>> index c5ed7a3..d858b4f 100644
>>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct
>>>>>> vhost_dev *dev, int idx)
>>>>>>        return idx;
>>>>>>    }
>>>>>>
>>>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev,
>>>>>> unsigned int ready)
>>>>>>    {
>>>>>>        int i;
>>>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>>>            struct vhost_vring_state state = {
>>>>>>                .index = dev->vq_index + i,
>>>>>> -            .num = 1,
>>>>>> +            .num = ready,
>>>>>>            };
>>>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>>>        }
>>>>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct
>>>>>> vhost_dev *dev, bool started)
>>>>>>            if (unlikely(!ok)) {
>>>>>>                return -1;
>>>>>>            }
>>>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>>>        } else {
>>>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>>>            ok = vhost_vdpa_svqs_stop(dev);
>>>>>>            if (unlikely(!ok)) {
>>>>>>                return -1;
>>>>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct
>>>>>> vhost_dev *dev, bool started)
>>>>>>            memory_listener_register(&v->listener,
>>>>>> &address_space_memory);
>>>>>>            return vhost_vdpa_add_status(dev,
>>>>>> VIRTIO_CONFIG_S_DRIVER_OK);
>>>>>>        } else {
>>>>>> -        vhost_vdpa_reset_device(dev);
>>>> Unfortunately, the reset can't be be removed from here as this code
>>>> path usually involves virtio reset or status change for e.g. invoked
>>>> via virtio_net_set_status(... , 0). Ideally we should use the
>>>> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where
>>>> status change is involved after vhost_dev_stop() is done, but this
>>>> distinction is not there yet as of today in all of the virtio devices
>>>> except vhost_user_scsi.
>>>>
>>>> Alternatively we may be able to do something like below, stop the
>>>> virtqueue in vhost_vdpa_get_vring_base() in the
>>>> vhost_virtqueue_stop() context. Only until the hardware vq is
>>>> stopped, svq can stop and unmap then vhost-vdpa would reset the
>>>> device status. It kinda works, but not in a perfect way...
>> As I indicated above, this is an less ideal way to address the issue you
>> came across about, without losing functionality or introducing
>> regression to the code. Ideally it'd be best to get fixed in a clean
>> way, though that would a little more effort in code refactoring.
>> Personally I feel that the error message you saw is somewhat benign and
>> don't think it caused any real problem. Did you see trouble if living
>> with the bogus error message for the moment?
> 
> Should be fine for networking devices at least since we don't care
> about duplicated packets (restore last_avail_idx as used_idx).
> 
> But it might be problematic for types of devices.
> 
> Thanks
> 

I find that the second reset is not triggered by qemu_del_nic(), although 
that path also triggers a device reset. The root cause is that we stop 
one vhost device for each queue pair plus one for the control queue, and 
all of these vhost devices share a single kernel vdpa device.

In my case, only 1 datapath queue pair and one control queue are enabled.

The first vhost_net_stop_one() call is for the datapath queue pair, and 
at that point the backend kernel vdpa device has already been reset; in 
the second loop iteration, when stopping the control queue's backend 
vhost device, the queue has already disappeared.

I will send a v2 to fix the issue: reset the device only at the stage of 
stopping the last vhost device.
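
The shape of that fix can be sketched as a simplified model (invented
names; the real change lands in vhost_net_stop() and
vhost_net_stop_one(), as the v2 patch below shows): several logical vhost
devices share one backend, so the backend reset must be deferred until
the last one stops.

```c
#include <assert.h>
#include <stdbool.h>

/* One kernel vdpa backend shared by several logical vhost devices
 * (one per queue pair plus one for the control queue). Names are
 * illustrative, not the real QEMU structures. */
struct toy_backend {
    bool reset_done;
    int resets;
};

static void toy_vhost_net_stop_one(struct toy_backend *be, bool last)
{
    /* Stopping this vhost device's queues only works while the
     * shared backend has not been reset yet. */
    assert(!be->reset_done && "vq already gone: restore fails -EINVAL");
    if (last) {
        /* Only the last vhost device stop resets the backend. */
        be->reset_done = true;
        be->resets++;
    }
}

static void toy_vhost_net_stop(struct toy_backend *be, int nvhosts)
{
    for (int i = 0; i < nvhosts; i++) {
        toy_vhost_net_stop_one(be, i == nvhosts - 1);
    }
}
```

With 1 queue pair plus a control queue (two vhost devices, three in the
multiqueue case), the backend is reset exactly once, after every vhost
device has stopped its queues.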

> 
>>
>>>>
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct
>>>> vhost_dev *dev, int idx)
>>>>        return idx;
>>>>    }
>>>>
>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int
>>>> enable)
>>>>    {
>>>>        int i;
>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>            struct vhost_vring_state state = {
>>>>                .index = dev->vq_index + i,
>>>> -            .num = 1,
>>>> +            .num = enable,
>>>>            };
>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>        }
>>>> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev
>>>> *dev, bool started)
>>>>
>>>>        if (started) {
>>>>            vhost_vdpa_host_notifiers_init(dev);
>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>        } else {
>>>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>>>        }
>>>> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct
>>>> vhost_dev *dev,
>>>>    {
>>>>        int ret;
>>>>
>>>> +    /* Deactivate the queue (best effort) */
>>>> +    vhost_vdpa_set_vring_ready(dev, 0);
>>>> +
>>>
>>> I don't think it's a good idea to change the state in "get" function,
>>>
>>> get means "read" not "write".
>> Well, if you look at the context of vhost_vdpa_get_vring_base(), the
>> only caller is vhost_virtqueue_stop(). Without stopping the hardware
>> ahead, it doesn't make sense to effectively get a used_index snapshot
>> for resuming/restarting the vq. It might be more obvious and sensible to
>> see that were to introduce another Vhost op to suspend the vq right
>> before the get_vring_base() call, though I wouldn't bother doing it.
>>
>>>
>>>>        ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
>>>>        trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
>>>>        return ret;
>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>> index 437347a..2e917d8 100644
>>>> --- a/hw/virtio/vhost.c
>>>> +++ b/hw/virtio/vhost.c
>>>> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev,
>>>> VirtIODevice *vdev)
>>>>        /* should only be called after backend is connected */
>>>>        assert(hdev->vhost_ops);
>>>>
>>>> -    if (hdev->vhost_ops->vhost_dev_start) {
>>>> -        hdev->vhost_ops->vhost_dev_start(hdev, false);
>>>> -    }
>>>>        for (i = 0; i < hdev->nvqs; ++i) {
>>>>            vhost_virtqueue_stop(hdev,
>>>>                                 vdev,
>>>>                                 hdev->vqs + i,
>>>>                                 hdev->vq_index + i);
>>>>        }
>>>> +    if (hdev->vhost_ops->vhost_dev_start) {
>>>> +        hdev->vhost_ops->vhost_dev_start(hdev, false);
>>>> +    }
>>>>
>>>
>>> This first idea comes to me is just like this, but at last I don't
>>> choose this solution.
>>>
>>> When we start a device, first we start the virtqueue then
>>> vhost_ops->vhost_dev_start.
>>>
>>> So in stop stage, in my opinion, we should just do the opposite, do as
>>> the orignal code do. Change the sequential is a dangerous action.
>> I don't see any danger yet, would you please elaborate the specific
>> problem you see? I think this sequence is as expected:
>>
>> 1. suspend each individual vq i.e. stop processing upcoming request, and
>> possibly complete inflight requests  -> get_vring_base()
>> 2. tear down virtio resources from VMM for e.g. unmap guest memory
>> mappings, remove host notifiers, and et al
>> 3. reset device -> vhost_vdpa_reset_device()
>>
>> Regards,
>> -Siwei
>>
>>>
>>> Thanks,
>>> Michael
>>>>        if (vhost_dev_has_iommu(hdev)) {
>>>>            if (hdev->vhost_ops->vhost_set_iotlb_callback) {
>>>>
>>>> Regards,
>>>> -Siwei
>>>>
>>>>>>            vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>>> VIRTIO_CONFIG_S_DRIVER);
>>>>>>            memory_listener_unregister(&v->listener);
>>>>>> --
>>>>>> 1.8.3.1
>>>>>>
>>>>>
>>>>
>>>>
>>
> 
> 



* [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device
  2022-03-23  8:42 [PATCH] vdpa: Avoid reset when stop device 08005325
  2022-03-23  9:20 ` Jason Wang
@ 2022-03-30 10:02 ` 08005325
  2022-03-30 10:52   ` Michael S. Tsirkin
                     ` (3 more replies)
  1 sibling, 4 replies; 47+ messages in thread
From: 08005325 @ 2022-03-30 10:02 UTC (permalink / raw)
  To: jasowang, mst
  Cc: lulu, qemu-devel, eperezma, Michael Qiu, si-wei.liu, lingshan.zhu

From: Michael Qiu <qiudayu@archeros.com>

Currently, when a VM powers off, it triggers the vdpa
device (such as a mlx bluefield2 VF) to reset many times (with 1
datapath queue pair and one control queue, it is triggered 3 times),
which leads to the issue below:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This is because vhost_net_stop() stops every vhost device bound to
this virtio device, and in vhost_dev_stop() qemu tries to stop the
device and then stop the queue: vhost_virtqueue_stop().

In vhost_dev_stop(), it resets the device, which clears some flags
in the low-level driver; in the next loop iteration (stopping the
other vhost backends), when qemu tries to stop the queue corresponding
to that vhost backend, the driver finds that the VQ is invalid. This
is the root cause.

To solve the issue, vdpa should set the vring unready and remove the
reset from device stop: vhost_dev_start(hdev, false).

Also implement a new function, vhost_dev_reset(), which resets the
backend device only when the last vhost device stop is triggered.

Signed-off-by: Michael Qiu<qiudayu@archeros.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
v1 --> v2:
   implement a new function vhost_dev_reset();
   reset the backend kernel device at the last stage.

---
 hw/net/vhost_net.c        | 22 +++++++++++++++++++---
 hw/virtio/vhost-vdpa.c    |  8 ++++----
 hw/virtio/vhost.c         | 16 +++++++++++++++-
 include/hw/virtio/vhost.h |  1 +
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..3cdf6a4 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -299,7 +299,7 @@ fail_notifiers:
 }
 
 static void vhost_net_stop_one(struct vhost_net *net,
-                               VirtIODevice *dev)
+                               VirtIODevice *dev, bool reset)
 {
     struct vhost_vring_file file = { .fd = -1 };
 
@@ -313,6 +313,11 @@ static void vhost_net_stop_one(struct vhost_net *net,
         net->nc->info->poll(net->nc, true);
     }
     vhost_dev_stop(&net->dev, dev);
+
+    if (reset) {
+        vhost_dev_reset(&net->dev);
+    }
+
     vhost_dev_disable_notifiers(&net->dev, dev);
 }
 
@@ -391,7 +396,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 err_start:
     while (--i >= 0) {
         peer = qemu_get_peer(ncs , i);
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        if (i == 0) {
+            vhost_net_stop_one(get_vhost_net(peer), dev, true);
+        } else {
+            vhost_net_stop_one(get_vhost_net(peer), dev, false);
+        }
     }
     e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
     if (e < 0) {
@@ -420,7 +430,13 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
         } else {
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        /* We only reset backend device during the last vhost */
+        if (i == nvhosts - 1) {
+            vhost_net_stop_one(get_vhost_net(peer), dev, true);
+        } else {
+            vhost_net_stop_one(get_vhost_net(peer), dev, false);
+        }
     }
 
     r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c5ed7a3..d858b4f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
     return idx;
 }
 
-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
 {
     int i;
     trace_vhost_vdpa_set_vring_ready(dev);
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_vring_state state = {
             .index = dev->vq_index + i,
-            .num = 1,
+            .num = ready,
         };
         vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
     }
@@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(!ok)) {
             return -1;
         }
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
     } else {
+        vhost_vdpa_set_vring_ready(dev, 0);
         ok = vhost_vdpa_svqs_stop(dev);
         if (unlikely(!ok)) {
             return -1;
@@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         memory_listener_register(&v->listener, &address_space_memory);
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
     } else {
-        vhost_vdpa_reset_device(dev);
         vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                    VIRTIO_CONFIG_S_DRIVER);
         memory_listener_unregister(&v->listener);
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b643f42..6d9b4a3 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1820,7 +1820,7 @@ fail_features:
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
     int i;
-
+    printf("vhost_dev_stop test\n");
     /* should only be called after backend is connected */
     assert(hdev->vhost_ops);
 
@@ -1854,3 +1854,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
 
     return -ENOSYS;
 }
+
+int vhost_dev_reset(struct vhost_dev *hdev)
+{
+    int ret = 0;
+
+    /* should only be called after backend is connected */
+    assert(hdev->vhost_ops);
+
+    if (hdev->vhost_ops->vhost_reset_device) {
+        ret = hdev->vhost_ops->vhost_reset_device(hdev);
+    }
+
+    return ret;
+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 58a73e7..b8b7c20 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_reset(struct vhost_dev *hdev);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
-- 
1.8.3.1




* Re: [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device
  2022-03-30 10:02 ` [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device 08005325
@ 2022-03-30 10:52   ` Michael S. Tsirkin
  2022-03-31  1:39     ` Michael Qiu
  2022-03-31  0:15   ` Si-Wei Liu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2022-03-30 10:52 UTC (permalink / raw)
  To: 08005325
  Cc: lulu, jasowang, qemu-devel, eperezma, Michael Qiu, si-wei.liu,
	lingshan.zhu

On Wed, Mar 30, 2022 at 06:02:41AM -0400, 08005325@163.com wrote:

It's an empty patch.

-- 
MST




* Re: [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device
  2022-03-30 10:02 ` [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device 08005325
  2022-03-30 10:52   ` Michael S. Tsirkin
@ 2022-03-31  0:15   ` Si-Wei Liu
  2022-03-31  4:01     ` Michael Qiu
  2022-03-31  4:02     ` Michael Qiu
  2022-03-31  5:19   ` [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop() 08005325
  2022-03-31  9:25   ` [PATCH RESEND " qiudayu
  3 siblings, 2 replies; 47+ messages in thread
From: Si-Wei Liu @ 2022-03-31  0:15 UTC (permalink / raw)
  To: 08005325, jasowang, mst
  Cc: lingshan.zhu, eperezma, Michael Qiu, qemu-devel, lulu



On 3/30/2022 3:02 AM, 08005325@163.com wrote:
> From: Michael Qiu <qiudayu@archeros.com>
>
> Currently, when a VM powers off, it triggers a vdpa
> device (such as a mlx bluefield2 VF) reset many times (with 1 datapath
> queue pair and one control queue, it is triggered 3 times), which
> leads to the issue below:
>
> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>
> This is because vhost_net_stop() stops all vhost devices bound to
> this virtio device, and in vhost_dev_stop(), qemu tries to stop the
> device, then stop the queue: vhost_virtqueue_stop().
>
> In vhost_dev_stop(), it resets the device, which clears some flags
> in the low level driver, and in the next loop (stopping the other vhost
> backends), qemu tries to stop the queue corresponding to the vhost
> backend; the driver finds that the VQ is invalid. This is the root cause.
>
> To solve the issue, vdpa should set the vring unready, and
> remove the reset ops in device stop: vhost_dev_start(hdev, false).
>
> Also implement a new function, vhost_dev_reset(), to only reset the
> backend device when the last vhost stop is triggered.
>
> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> ---
> v2 --> v1:
>     implement a new function vhost_dev_reset,
>     reset the backend kernel device at last.
>
> ---
>   hw/net/vhost_net.c        | 22 +++++++++++++++++++---
>   hw/virtio/vhost-vdpa.c    |  8 ++++----
>   hw/virtio/vhost.c         | 16 +++++++++++++++-
>   include/hw/virtio/vhost.h |  1 +
>   4 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 30379d2..3cdf6a4 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -299,7 +299,7 @@ fail_notifiers:
>   }
>   
>   static void vhost_net_stop_one(struct vhost_net *net,
> -                               VirtIODevice *dev)
> +                               VirtIODevice *dev, bool reset)
>   {
>       struct vhost_vring_file file = { .fd = -1 };
>   
> @@ -313,6 +313,11 @@ static void vhost_net_stop_one(struct vhost_net *net,
>           net->nc->info->poll(net->nc, true);
>       }
>       vhost_dev_stop(&net->dev, dev);
> +
> +    if (reset) {
> +        vhost_dev_reset(&net->dev);
> +    }
> +
>       vhost_dev_disable_notifiers(&net->dev, dev);
>   }
>   
> @@ -391,7 +396,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>   err_start:
>       while (--i >= 0) {
>           peer = qemu_get_peer(ncs , i);
> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> +
> +        if (i == 0) {
> +            vhost_net_stop_one(get_vhost_net(peer), dev, true);
> +        } else {
> +            vhost_net_stop_one(get_vhost_net(peer), dev, false);
> +        }
>       }
>       e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>       if (e < 0) {
> @@ -420,7 +430,13 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>           } else {
>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
>           }
> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> +
> +        /* We only reset backend device during the last vhost */
> +        if (i == nvhosts - 1) {
I wonder if there's any specific reason to position the device reset in
the for loop, given that there's no virtqueue-level reset? Wouldn't it be
cleaner to reset the device at the end of vhost_net_stop() before
returning? You could use qemu_get_peer(ncs, 0) without hassle. Note that
the vhost_ops->vhost_reset_device op is per device rather than per vq.

> +            vhost_net_stop_one(get_vhost_net(peer), dev, true);
> +        } else {
> +            vhost_net_stop_one(get_vhost_net(peer), dev, false);
> +        }
>       }
>   
>       r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index c5ed7a3..d858b4f 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>       return idx;
>   }
>   
> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>   {
>       int i;
>       trace_vhost_vdpa_set_vring_ready(dev);
>       for (i = 0; i < dev->nvqs; ++i) {
>           struct vhost_vring_state state = {
>               .index = dev->vq_index + i,
> -            .num = 1,
> +            .num = ready,
>           };
>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>       }
> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           if (unlikely(!ok)) {
>               return -1;
>           }
> -        vhost_vdpa_set_vring_ready(dev);
> +        vhost_vdpa_set_vring_ready(dev, 1);
>       } else {
> +        vhost_vdpa_set_vring_ready(dev, 0);
>           ok = vhost_vdpa_svqs_stop(dev);
>           if (unlikely(!ok)) {
>               return -1;
> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           memory_listener_register(&v->listener, &address_space_memory);
>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>       } else {
> -        vhost_vdpa_reset_device(dev);
>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>                                      VIRTIO_CONFIG_S_DRIVER);
Here's another issue (regression) that needs to be addressed: the added
S_ACKNOWLEDGE | S_DRIVER bits will be cleared right away by the
follow-up reset in vhost_net_stop_one(... , true), which in turn will
cause virtio to fail to initialize, e.g. vhost_vdpa_set_features() will
fail to set VIRTIO_CONFIG_S_FEATURES_OK.

Ideally the status bit should be set whenever the corresponding status
bit is set by virtio_net from virtio_net_vhost_status(), or practically
it can be done at the very beginning of vhost_dev_start(), e.g. in the
first call before vhost_dev_set_features(). For this purpose, you may
consider adding another vhost_init_device op, which is symmetric to
vhost_ops->vhost_reset_device in the vhost_net_stop() path.

Thanks,
-Siwei

>           memory_listener_unregister(&v->listener);
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index b643f42..6d9b4a3 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1820,7 +1820,7 @@ fail_features:
>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>   {
>       int i;
> -
> +    printf("vhost_dev_stop test\n");
>       /* should only be called after backend is connected */
>       assert(hdev->vhost_ops);
>   
> @@ -1854,3 +1854,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>   
>       return -ENOSYS;
>   }
> +
> +int vhost_dev_reset(struct vhost_dev *hdev)
> +{
> +    int ret = 0;
> +
> +    /* should only be called after backend is connected */
> +    assert(hdev->vhost_ops);
> +
> +    if (hdev->vhost_ops->vhost_reset_device) {
> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
> +    }
> +
> +    return ret;
> +}
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 58a73e7..b8b7c20 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>   void vhost_dev_cleanup(struct vhost_dev *hdev);
>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> +int vhost_dev_reset(struct vhost_dev *hdev);
>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>   



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device
  2022-03-30 10:52   ` Michael S. Tsirkin
@ 2022-03-31  1:39     ` Michael Qiu
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-03-31  1:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, 08005325
  Cc: lulu, jasowang, qemu-devel, eperezma, si-wei.liu, lingshan.zhu

Michael,

Others have already received the patch; I don't know why. Anyway, I will
repost another version (v3).

Here is the V2 patch, see below:

From: Michael Qiu <qiudayu@archeros.com>

Currently, when a VM powers off, it triggers a vdpa
device (such as a mlx bluefield2 VF) reset many times (with 1 datapath
queue pair and one control queue, it is triggered 3 times), which
leads to the issue below:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This is because vhost_net_stop() stops all vhost devices bound to
this virtio device, and in vhost_dev_stop(), qemu tries to stop the
device, then stop the queue: vhost_virtqueue_stop().

In vhost_dev_stop(), it resets the device, which clears some flags
in the low level driver, and in the next loop (stopping the other vhost
backends), qemu tries to stop the queue corresponding to the vhost
backend; the driver finds that the VQ is invalid. This is the root cause.

To solve the issue, vdpa should set the vring unready, and
remove the reset ops in device stop: vhost_dev_start(hdev, false).

Also implement a new function, vhost_dev_reset(), to only reset the
backend device when the last vhost stop is triggered.

Signed-off-by: Michael Qiu<qiudayu@archeros.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
v2 --> v1:
    implement a new function vhost_dev_reset,
    reset the backend kernel device at last.

---
  hw/net/vhost_net.c        | 22 +++++++++++++++++++---
  hw/virtio/vhost-vdpa.c    |  8 ++++----
  hw/virtio/vhost.c         | 16 +++++++++++++++-
  include/hw/virtio/vhost.h |  1 +
  4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..3cdf6a4 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -299,7 +299,7 @@ fail_notifiers:
  }

  static void vhost_net_stop_one(struct vhost_net *net,
-                               VirtIODevice *dev)
+                               VirtIODevice *dev, bool reset)
  {
      struct vhost_vring_file file = { .fd = -1 };

@@ -313,6 +313,11 @@ static void vhost_net_stop_one(struct vhost_net *net,
          net->nc->info->poll(net->nc, true);
      }
      vhost_dev_stop(&net->dev, dev);
+
+    if (reset) {
+        vhost_dev_reset(&net->dev);
+    }
+
      vhost_dev_disable_notifiers(&net->dev, dev);
  }

@@ -391,7 +396,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
  err_start:
      while (--i >= 0) {
          peer = qemu_get_peer(ncs , i);
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        if (i == 0) {
+            vhost_net_stop_one(get_vhost_net(peer), dev, true);
+        } else {
+            vhost_net_stop_one(get_vhost_net(peer), dev, false);
+        }
      }
      e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
      if (e < 0) {
@@ -420,7 +430,13 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
          } else {
              peer = qemu_get_peer(ncs, n->max_queue_pairs);
          }
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        /* We only reset backend device during the last vhost */
+        if (i == nvhosts - 1) {
+            vhost_net_stop_one(get_vhost_net(peer), dev, true);
+        } else {
+            vhost_net_stop_one(get_vhost_net(peer), dev, false);
+        }
      }

      r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c5ed7a3..d858b4f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
      return idx;
  }

-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
  {
      int i;
      trace_vhost_vdpa_set_vring_ready(dev);
      for (i = 0; i < dev->nvqs; ++i) {
          struct vhost_vring_state state = {
              .index = dev->vq_index + i,
-            .num = 1,
+            .num = ready,
          };
          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
      }
@@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
          if (unlikely(!ok)) {
              return -1;
          }
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
      } else {
+        vhost_vdpa_set_vring_ready(dev, 0);
          ok = vhost_vdpa_svqs_stop(dev);
          if (unlikely(!ok)) {
              return -1;
@@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
          memory_listener_register(&v->listener, &address_space_memory);
          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
      } else {
-        vhost_vdpa_reset_device(dev);
          vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                     VIRTIO_CONFIG_S_DRIVER);
          memory_listener_unregister(&v->listener);
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b643f42..6d9b4a3 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1820,7 +1820,7 @@ fail_features:
  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
  {
      int i;
-
+    printf("vhost_dev_stop test\n");
      /* should only be called after backend is connected */
      assert(hdev->vhost_ops);

@@ -1854,3 +1854,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,

      return -ENOSYS;
  }
+
+int vhost_dev_reset(struct vhost_dev *hdev)
+{
+    int ret = 0;
+
+    /* should only be called after backend is connected */
+    assert(hdev->vhost_ops);
+
+    if (hdev->vhost_ops->vhost_reset_device) {
+        ret = hdev->vhost_ops->vhost_reset_device(hdev);
+    }
+
+    return ret;
+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 58a73e7..b8b7c20 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
  void vhost_dev_cleanup(struct vhost_dev *hdev);
  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_reset(struct vhost_dev *hdev);
  int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
  void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);

-- 
1.8.3.1




On 2022/3/30 18:52, Michael S. Tsirkin wrote:
> On Wed, Mar 30, 2022 at 06:02:41AM -0400, 08005325@163.com wrote:
> 
> It's an empty patch.
> 


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device
  2022-03-31  0:15   ` Si-Wei Liu
@ 2022-03-31  4:01     ` Michael Qiu
  2022-03-31  4:02     ` Michael Qiu
  1 sibling, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-03-31  4:01 UTC (permalink / raw)
  To: Si-Wei Liu, 08005325, jasowang, mst
  Cc: eperezma, lingshan.zhu, qemu-devel, lulu



On 2022/3/31 8:15, Si-Wei Liu wrote:
> 
> 
> On 3/30/2022 3:02 AM, 08005325@163.com wrote:
>> From: Michael Qiu <qiudayu@archeros.com>
>>
>> Currently, when a VM powers off, it triggers a vdpa
>> device (such as a mlx bluefield2 VF) reset many times (with 1 datapath
>> queue pair and one control queue, it is triggered 3 times), which
>> leads to the issue below:
>>
>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>
>> This is because vhost_net_stop() stops all vhost devices bound to
>> this virtio device, and in vhost_dev_stop(), qemu tries to stop the
>> device, then stop the queue: vhost_virtqueue_stop().
>>
>> In vhost_dev_stop(), it resets the device, which clears some flags
>> in the low level driver, and in the next loop (stopping the other vhost
>> backends), qemu tries to stop the queue corresponding to the vhost
>> backend; the driver finds that the VQ is invalid. This is the root cause.
>>
>> To solve the issue, vdpa should set the vring unready, and
>> remove the reset ops in device stop: vhost_dev_start(hdev, false).
>>
>> Also implement a new function, vhost_dev_reset(), to only reset the
>> backend device when the last vhost stop is triggered.
>>
>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>> Acked-by: Jason Wang <jasowang@redhat.com>
>> ---
>> v2 --> v1:
>>     implement a new function vhost_dev_reset,
>>     reset the backend kernel device at last.
>>
>> ---
>>   hw/net/vhost_net.c        | 22 +++++++++++++++++++---
>>   hw/virtio/vhost-vdpa.c    |  8 ++++----
>>   hw/virtio/vhost.c         | 16 +++++++++++++++-
>>   include/hw/virtio/vhost.h |  1 +
>>   4 files changed, 39 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> index 30379d2..3cdf6a4 100644
>> --- a/hw/net/vhost_net.c
>> +++ b/hw/net/vhost_net.c
>> @@ -299,7 +299,7 @@ fail_notifiers:
>>   }
>>   static void vhost_net_stop_one(struct vhost_net *net,
>> -                               VirtIODevice *dev)
>> +                               VirtIODevice *dev, bool reset)
>>   {
>>       struct vhost_vring_file file = { .fd = -1 };
>> @@ -313,6 +313,11 @@ static void vhost_net_stop_one(struct vhost_net 
>> *net,
>>           net->nc->info->poll(net->nc, true);
>>       }
>>       vhost_dev_stop(&net->dev, dev);
>> +
>> +    if (reset) {
>> +        vhost_dev_reset(&net->dev);
>> +    }
>> +
>>       vhost_dev_disable_notifiers(&net->dev, dev);
>>   }
>> @@ -391,7 +396,12 @@ int vhost_net_start(VirtIODevice *dev, 
>> NetClientState *ncs,
>>   err_start:
>>       while (--i >= 0) {
>>           peer = qemu_get_peer(ncs , i);
>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>> +
>> +        if (i == 0) {
>> +            vhost_net_stop_one(get_vhost_net(peer), dev, true);
>> +        } else {
>> +            vhost_net_stop_one(get_vhost_net(peer), dev, false);
>> +        }
>>       }
>>       e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>       if (e < 0) {
>> @@ -420,7 +430,13 @@ void vhost_net_stop(VirtIODevice *dev, 
>> NetClientState *ncs,
>>           } else {
>>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>           }
>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>> +
>> +        /* We only reset backend device during the last vhost */
>> +        if (i == nvhosts - 1) {
> I wonder if there's any specific reason to position the device reset in
> the for loop, given that there's no virtqueue-level reset? Wouldn't it be
> cleaner to reset the device at the end of vhost_net_stop() before
> returning? You could use qemu_get_peer(ncs, 0) without hassle. Note that
> the vhost_ops->vhost_reset_device op is per device rather than per vq.

OK, it's a good idea to do the reset at the end of vhost_net_stop(); I
will change it in the next version.

> 
>> +            vhost_net_stop_one(get_vhost_net(peer), dev, true);
>> +        } else {
>> +            vhost_net_stop_one(get_vhost_net(peer), dev, false);
>> +        }
>>       }
>>       r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index c5ed7a3..d858b4f 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct 
>> vhost_dev *dev, int idx)
>>       return idx;
>>   }
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned 
>> int ready)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = ready,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
>> *dev, bool started)
>>           if (unlikely(!ok)) {
>>               return -1;
>>           }
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>           ok = vhost_vdpa_svqs_stop(dev);
>>           if (unlikely(!ok)) {
>>               return -1;
>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
>> *dev, bool started)
>>           memory_listener_register(&v->listener, &address_space_memory);
>>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>       } else {
>> -        vhost_vdpa_reset_device(dev);
>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>                                      VIRTIO_CONFIG_S_DRIVER);
> Here's another issue (regression) that needs to be addressed: the added
> S_ACKNOWLEDGE | S_DRIVER bits will be cleared right away by the
> follow-up reset in vhost_net_stop_one(... , true), which in turn will
> cause virtio to fail to initialize, e.g. vhost_vdpa_set_features() will
> fail to set VIRTIO_CONFIG_S_FEATURES_OK.
> 
> Ideally the status bit should be set whenever the corresponding status
> bit is set by virtio_net from virtio_net_vhost_status(), or practically
> it can be done at the very beginning of vhost_dev_start(), e.g. in the
> first call before vhost_dev_set_features(). For this purpose, you may
> consider adding another vhost_init_device op, which is symmetric to
> vhost_ops->vhost_reset_device in the vhost_net_stop() path.
> 

It seems only the vdpa device needs a reset after stop. Although the
virtio spec says a reset is needed, the kernel backend doesn't reset,
and if it did, there would be an issue reprobing virtio-net in the
guest. So we should probably only do the reset for vDPA devices; for
the kernel and other datapaths we keep the same behavior as before.

Thanks,
Michael

> Thanks,
> -Siwei
> 
>>           memory_listener_unregister(&v->listener);
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index b643f42..6d9b4a3 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1820,7 +1820,7 @@ fail_features:
>>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>   {
>>       int i;
>> -
>> +    printf("vhost_dev_stop test\n");
>>       /* should only be called after backend is connected */
>>       assert(hdev->vhost_ops);
>> @@ -1854,3 +1854,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>>       return -ENOSYS;
>>   }
>> +
>> +int vhost_dev_reset(struct vhost_dev *hdev)
>> +{
>> +    int ret = 0;
>> +
>> +    /* should only be called after backend is connected */
>> +    assert(hdev->vhost_ops);
>> +
>> +    if (hdev->vhost_ops->vhost_reset_device) {
>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>> +    }
>> +
>> +    return ret;
>> +}
>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>> index 58a73e7..b8b7c20 100644
>> --- a/include/hw/virtio/vhost.h
>> +++ b/include/hw/virtio/vhost.h
>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void 
>> *opaque,
>>   void vhost_dev_cleanup(struct vhost_dev *hdev);
>>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice 
>> *vdev);
>>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
>> VirtIODevice *vdev);
> 
> 



^ permalink raw reply	[flat|nested] 47+ messages in thread


* [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-03-30 10:02 ` [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device 08005325
  2022-03-30 10:52   ` Michael S. Tsirkin
  2022-03-31  0:15   ` Si-Wei Liu
@ 2022-03-31  5:19   ` 08005325
  2022-03-31  8:55     ` Jason Wang
  2022-03-31  9:25   ` [PATCH RESEND " qiudayu
  3 siblings, 1 reply; 47+ messages in thread
From: 08005325 @ 2022-03-31  5:19 UTC (permalink / raw)
  To: jasowang, mst, si-wei.liu
  Cc: Michael Qiu, eperezma, lingshan.zhu, qemu-devel, lulu

From: Michael Qiu <qiudayu@archeros.com>

Currently, when the VM powers off, it triggers a vdpa
device (such as a Mellanox BlueField-2 VF) reset many times (with 1
datapath queue pair and one control queue, it is triggered 3 times),
which leads to the issue below:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This is because vhost_net_stop() stops all vhost devices bound to
this virtio device, and in vhost_dev_stop() qemu tries to stop the
device, then stop the queue: vhost_virtqueue_stop().

In vhost_dev_stop(), it resets the device, which clears some flags
in the low-level driver; in the next loop (stopping the other vhost
backends), qemu tries to stop the queue corresponding to that vhost
backend, and the driver finds that the VQ is invalid. This is the
root cause.

To solve the issue, vdpa should set the vring unready and remove
the reset from device stop: vhost_dev_start(hdev, false).

Also, implement a new function, vhost_dev_reset(), which resets the
backend device only after all (per-queue) vhost devices have stopped.

Signed-off-by: Michael Qiu <qiudayu@archeros.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
v3 --> v2:
    Call vhost_dev_reset() at the end of vhost_net_stop().

    Since the vDPA device needs to re-add the status bits
    VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
    simply add them inside vhost_vdpa_reset_device(); and since
    the only caller of vhost_vdpa_reset_device() is
    vhost_net_stop(), this keeps the same behavior as before.

v2 --> v1:
   Implement a new function, vhost_dev_reset(),
   which resets the backend device last.
---
 hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
 hw/virtio/vhost-vdpa.c    | 15 +++++++++------
 hw/virtio/vhost.c         | 15 ++++++++++++++-
 include/hw/virtio/vhost.h |  1 +
 4 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..422c9bf 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     int total_notifiers = data_queue_pairs * 2 + cvq;
     VirtIONet *n = VIRTIO_NET(dev);
     int nvhosts = data_queue_pairs + cvq;
-    struct vhost_net *net;
+    struct vhost_net *net = NULL;
     int r, e, i, index_end = data_queue_pairs * 2;
     NetClientState *peer;
 
@@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 err_start:
     while (--i >= 0) {
         peer = qemu_get_peer(ncs , i);
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        net = get_vhost_net(peer);
+
+        vhost_net_stop_one(net, dev);
     }
+
+    /* We only reset backend vdpa device */
+    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
+        vhost_dev_reset(&net->dev);
+    }
+
     e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
     if (e < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
@@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     VirtIONet *n = VIRTIO_NET(dev);
     NetClientState *peer;
+    struct vhost_net *net = NULL;
     int total_notifiers = data_queue_pairs * 2 + cvq;
     int nvhosts = data_queue_pairs + cvq;
     int i, r;
@@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
         } else {
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        net = get_vhost_net(peer);
+
+        vhost_net_stop_one(net, dev);
+    }
+
+    /* We only reset backend vdpa device */
+    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
+        vhost_dev_reset(&net->dev);
     }
 
     r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c5ed7a3..3ef0199 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
 
     ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
     trace_vhost_vdpa_reset_device(dev, status);
+
+    /* Add back this status, so that the device could work next time*/
+    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                               VIRTIO_CONFIG_S_DRIVER);
+
     return ret;
 }
 
@@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
     return idx;
 }
 
-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
 {
     int i;
     trace_vhost_vdpa_set_vring_ready(dev);
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_vring_state state = {
             .index = dev->vq_index + i,
-            .num = 1,
+            .num = ready,
         };
         vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
     }
@@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(!ok)) {
             return -1;
         }
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
     } else {
+        vhost_vdpa_set_vring_ready(dev, 0);
         ok = vhost_vdpa_svqs_stop(dev);
         if (unlikely(!ok)) {
             return -1;
@@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         memory_listener_register(&v->listener, &address_space_memory);
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
     } else {
-        vhost_vdpa_reset_device(dev);
-        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
-                                   VIRTIO_CONFIG_S_DRIVER);
         memory_listener_unregister(&v->listener);
 
         return 0;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b643f42..7e0cdb6 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1820,7 +1820,6 @@ fail_features:
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
     int i;
-
     /* should only be called after backend is connected */
     assert(hdev->vhost_ops);
 
@@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
 
     return -ENOSYS;
 }
+
+int vhost_dev_reset(struct vhost_dev *hdev)
+{
+    int ret = 0;
+
+    /* should only be called after backend is connected */
+    assert(hdev->vhost_ops);
+
+    if (hdev->vhost_ops->vhost_reset_device) {
+        ret = hdev->vhost_ops->vhost_reset_device(hdev);
+    }
+
+    return ret;
+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 58a73e7..b8b7c20 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_reset(struct vhost_dev *hdev);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-03-31  5:19   ` [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop() 08005325
@ 2022-03-31  8:55     ` Jason Wang
  2022-03-31  9:12       ` Maxime Coquelin
  0 siblings, 1 reply; 47+ messages in thread
From: Jason Wang @ 2022-03-31  8:55 UTC (permalink / raw)
  To: 08005325
  Cc: Cindy Lu, mst, qemu-devel, eperezma, Michael Qiu, Si-Wei Liu,
	Zhu Lingshan

On Thu, Mar 31, 2022 at 1:20 PM <08005325@163.com> wrote:

Hi:

For some reason, I see the patch as an attachment.

Thanks



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-03-31  8:55     ` Jason Wang
@ 2022-03-31  9:12       ` Maxime Coquelin
  2022-03-31  9:22         ` Michael Qiu
  2022-04-01  2:55         ` Jason Wang
  0 siblings, 2 replies; 47+ messages in thread
From: Maxime Coquelin @ 2022-03-31  9:12 UTC (permalink / raw)
  To: Jason Wang, 08005325
  Cc: Cindy Lu, mst, qemu-devel, eperezma, Michael Qiu, Si-Wei Liu,
	Zhu Lingshan

Hi,

On 3/31/22 10:55, Jason Wang wrote:
> On Thu, Mar 31, 2022 at 1:20 PM <08005325@163.com> wrote:
> 
> Hi:
> 
> For some reason, I see the patch as an attachment.

We are starting to see this more and more since yesterday on DPDK
mailing list. It seems like an issue with mimecast, when the From: tag
is different from the sender.

Maxime

> Thanks
> 
> 



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-03-31  9:12       ` Maxime Coquelin
@ 2022-03-31  9:22         ` Michael Qiu
  2022-04-01  2:55         ` Jason Wang
  1 sibling, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-03-31  9:22 UTC (permalink / raw)
  To: Maxime Coquelin, Jason Wang, 08005325
  Cc: Cindy Lu, mst, qemu-devel, eperezma, Si-Wei Liu, Zhu Lingshan

Hi, all

To avoid triggering the mimecast issue, I will re-post the v3 patch 
using the mail address "Michael Qiu <qiudayu@archeros.com>" as a 
workaround.

Thanks,
Michael
On 2022/3/31 17:12, Maxime Coquelin wrote:
> Hi,
> 
> On 3/31/22 10:55, Jason Wang wrote:
>> On Thu, Mar 31, 2022 at 1:20 PM <08005325@163.com> wrote:
>>
>> Hi:
>>
>> For some reason, I see the patch as an attachment.
> 
> We are starting to see this more and more since yesterday on DPDK
> mailing list. It seems like an issue with mimecast, when the From: tag
> is different from the sender.
> 
> Maxime
> 
>> Thanks
>>
>>
> 
> 




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH RESEND v3] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-03-30 10:02 ` [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device 08005325
                     ` (2 preceding siblings ...)
  2022-03-31  5:19   ` [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop() 08005325
@ 2022-03-31  9:25   ` qiudayu
  2022-03-31 10:19     ` Michael Qiu
                       ` (3 more replies)
  3 siblings, 4 replies; 47+ messages in thread
From: qiudayu @ 2022-03-31  9:25 UTC (permalink / raw)
  To: jasowang, mst, si-wei.liu
  Cc: Michael Qiu, eperezma, lingshan.zhu, qemu-devel, lulu

From: Michael Qiu <qiudayu@archeros.com>

Currently, when the VM powers off, it triggers a vdpa
device (such as a Mellanox BlueField-2 VF) reset many times (with 1
datapath queue pair and one control queue, it is triggered 3 times),
which leads to the issue below:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This is because vhost_net_stop() stops all vhost devices bound to
this virtio device, and in vhost_dev_stop() qemu tries to stop the
device, then stop the queue: vhost_virtqueue_stop().

In vhost_dev_stop(), it resets the device, which clears some flags
in the low-level driver; in the next loop (stopping the other vhost
backends), qemu tries to stop the queue corresponding to that vhost
backend, and the driver finds that the VQ is invalid. This is the
root cause.

To solve the issue, vdpa should set the vring unready and remove
the reset from device stop: vhost_dev_start(hdev, false).

Also, implement a new function, vhost_dev_reset(), which resets the
backend device only after all (per-queue) vhost devices have stopped.

Signed-off-by: Michael Qiu <qiudayu@archeros.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
v3 --> v2:
    Call vhost_dev_reset() at the end of vhost_net_stop().

    Since the vDPA device needs to re-add the status bits
    VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
    simply add them inside vhost_vdpa_reset_device(); and since
    the only caller of vhost_vdpa_reset_device() is
    vhost_net_stop(), this keeps the same behavior as before.

v2 --> v1:
   Implement a new function, vhost_dev_reset(),
   which resets the backend device last.
---
 hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
 hw/virtio/vhost-vdpa.c    | 15 +++++++++------
 hw/virtio/vhost.c         | 15 ++++++++++++++-
 include/hw/virtio/vhost.h |  1 +
 4 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..422c9bf 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     int total_notifiers = data_queue_pairs * 2 + cvq;
     VirtIONet *n = VIRTIO_NET(dev);
     int nvhosts = data_queue_pairs + cvq;
-    struct vhost_net *net;
+    struct vhost_net *net = NULL;
     int r, e, i, index_end = data_queue_pairs * 2;
     NetClientState *peer;
 
@@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 err_start:
     while (--i >= 0) {
         peer = qemu_get_peer(ncs , i);
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        net = get_vhost_net(peer);
+
+        vhost_net_stop_one(net, dev);
     }
+
+    /* We only reset backend vdpa device */
+    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
+        vhost_dev_reset(&net->dev);
+    }
+
     e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
     if (e < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
@@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     VirtIONet *n = VIRTIO_NET(dev);
     NetClientState *peer;
+    struct vhost_net *net = NULL;
     int total_notifiers = data_queue_pairs * 2 + cvq;
     int nvhosts = data_queue_pairs + cvq;
     int i, r;
@@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
         } else {
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        net = get_vhost_net(peer);
+
+        vhost_net_stop_one(net, dev);
+    }
+
+    /* We only reset backend vdpa device */
+    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
+        vhost_dev_reset(&net->dev);
     }
 
     r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c5ed7a3..3ef0199 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
 
     ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
     trace_vhost_vdpa_reset_device(dev, status);
+
+    /* Add back this status, so that the device could work next time*/
+    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                               VIRTIO_CONFIG_S_DRIVER);
+
     return ret;
 }
 
@@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
     return idx;
 }
 
-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
 {
     int i;
     trace_vhost_vdpa_set_vring_ready(dev);
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_vring_state state = {
             .index = dev->vq_index + i,
-            .num = 1,
+            .num = ready,
         };
         vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
     }
@@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(!ok)) {
             return -1;
         }
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
     } else {
+        vhost_vdpa_set_vring_ready(dev, 0);
         ok = vhost_vdpa_svqs_stop(dev);
         if (unlikely(!ok)) {
             return -1;
@@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         memory_listener_register(&v->listener, &address_space_memory);
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
     } else {
-        vhost_vdpa_reset_device(dev);
-        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
-                                   VIRTIO_CONFIG_S_DRIVER);
         memory_listener_unregister(&v->listener);
 
         return 0;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b643f42..7e0cdb6 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1820,7 +1820,6 @@ fail_features:
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
     int i;
-
     /* should only be called after backend is connected */
     assert(hdev->vhost_ops);
 
@@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
 
     return -ENOSYS;
 }
+
+int vhost_dev_reset(struct vhost_dev *hdev)
+{
+    int ret = 0;
+
+    /* should only be called after backend is connected */
+    assert(hdev->vhost_ops);
+
+    if (hdev->vhost_ops->vhost_reset_device) {
+        ret = hdev->vhost_ops->vhost_reset_device(hdev);
+    }
+
+    return ret;
+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 58a73e7..b8b7c20 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_reset(struct vhost_dev *hdev);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
-- 
1.8.3.1





^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v3] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-03-31  9:25   ` [PATCH RESEND " qiudayu
@ 2022-03-31 10:19     ` Michael Qiu
       [not found]     ` <6245804d.1c69fb81.3c35c.d7efSMTPIN_ADDED_BROKEN@mx.google.com>
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-03-31 10:19 UTC (permalink / raw)
  To: jasowang; +Cc: lulu, mst, qemu-devel, eperezma, Si-Wei Liu, lingshan.zhu

Hi, Jason

Does it work this time?

On 2022/3/31 17:25, qiudayu@archeros.com wrote:
> From: Michael Qiu <qiudayu@archeros.com>
> 
> Currently, when the VM powers off, it triggers a vdpa
> device (such as a Mellanox BlueField-2 VF) reset many times (with 1
> datapath queue pair and one control queue, it is triggered 3 times),
> which leads to the issue below:
> 
> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
> 
> This is because vhost_net_stop() stops all vhost devices bound to
> this virtio device, and in vhost_dev_stop() qemu tries to stop the
> device, then stop the queue: vhost_virtqueue_stop().
> 
> In vhost_dev_stop(), it resets the device, which clears some flags
> in the low-level driver; in the next loop (stopping the other vhost
> backends), qemu tries to stop the queue corresponding to that vhost
> backend, and the driver finds that the VQ is invalid. This is the
> root cause.
> 
> To solve the issue, vdpa should set the vring unready and remove
> the reset from device stop: vhost_dev_start(hdev, false).
> 
> Also, implement a new function, vhost_dev_reset(), which resets the
> backend device only after all (per-queue) vhost devices have stopped.
> 
> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> ---
> v3 --> v2:
>      Call vhost_dev_reset() at the end of vhost_net_stop().
> 
>      Since the vDPA device needs to re-add the status bits
>      VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>      simply add them inside vhost_vdpa_reset_device(); and since
>      the only caller of vhost_vdpa_reset_device() is
>      vhost_net_stop(), this keeps the same behavior as before.
> 
> v2 --> v1:
>     Implement a new function, vhost_dev_reset(),
>     which resets the backend device last.
> ---
>   hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>   hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>   hw/virtio/vhost.c         | 15 ++++++++++++++-
>   include/hw/virtio/vhost.h |  1 +
>   4 files changed, 45 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 30379d2..422c9bf 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>       int total_notifiers = data_queue_pairs * 2 + cvq;
>       VirtIONet *n = VIRTIO_NET(dev);
>       int nvhosts = data_queue_pairs + cvq;
> -    struct vhost_net *net;
> +    struct vhost_net *net = NULL;
>       int r, e, i, index_end = data_queue_pairs * 2;
>       NetClientState *peer;
>   
> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>   err_start:
>       while (--i >= 0) {
>           peer = qemu_get_peer(ncs , i);
> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> +
> +        net = get_vhost_net(peer);
> +
> +        vhost_net_stop_one(net, dev);
>       }
> +
> +    /* We only reset backend vdpa device */
> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> +        vhost_dev_reset(&net->dev);
> +    }
> +
>       e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>       if (e < 0) {
>           fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>       VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>       VirtIONet *n = VIRTIO_NET(dev);
>       NetClientState *peer;
> +    struct vhost_net *net = NULL;
>       int total_notifiers = data_queue_pairs * 2 + cvq;
>       int nvhosts = data_queue_pairs + cvq;
>       int i, r;
> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>           } else {
>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
>           }
> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> +
> +        net = get_vhost_net(peer);
> +
> +        vhost_net_stop_one(net, dev);
> +    }
> +
> +    /* We only reset backend vdpa device */
> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> +        vhost_dev_reset(&net->dev);
>       }
>   
>       r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index c5ed7a3..3ef0199 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>   
>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>       trace_vhost_vdpa_reset_device(dev, status);
> +
> +    /* Add back this status, so that the device could work next time*/
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                               VIRTIO_CONFIG_S_DRIVER);
> +
>       return ret;
>   }
>   
> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>       return idx;
>   }
>   
> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>   {
>       int i;
>       trace_vhost_vdpa_set_vring_ready(dev);
>       for (i = 0; i < dev->nvqs; ++i) {
>           struct vhost_vring_state state = {
>               .index = dev->vq_index + i,
> -            .num = 1,
> +            .num = ready,
>           };
>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>       }
> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           if (unlikely(!ok)) {
>               return -1;
>           }
> -        vhost_vdpa_set_vring_ready(dev);
> +        vhost_vdpa_set_vring_ready(dev, 1);
>       } else {
> +        vhost_vdpa_set_vring_ready(dev, 0);
>           ok = vhost_vdpa_svqs_stop(dev);
>           if (unlikely(!ok)) {
>               return -1;
> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           memory_listener_register(&v->listener, &address_space_memory);
>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>       } else {
> -        vhost_vdpa_reset_device(dev);
> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                                   VIRTIO_CONFIG_S_DRIVER);
>           memory_listener_unregister(&v->listener);
>   
>           return 0;
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index b643f42..7e0cdb6 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1820,7 +1820,6 @@ fail_features:
>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>   {
>       int i;
> -
>       /* should only be called after backend is connected */
>       assert(hdev->vhost_ops);
>   
> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>   
>       return -ENOSYS;
>   }
> +
> +int vhost_dev_reset(struct vhost_dev *hdev)
> +{
> +    int ret = 0;
> +
> +    /* should only be called after backend is connected */
> +    assert(hdev->vhost_ops);
> +
> +    if (hdev->vhost_ops->vhost_reset_device) {
> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
> +    }
> +
> +    return ret;
> +}
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 58a73e7..b8b7c20 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>   void vhost_dev_cleanup(struct vhost_dev *hdev);
>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> +int vhost_dev_reset(struct vhost_dev *hdev);
>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>   





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v3] vdpa: reset the backend device in the end of vhost_net_stop()
       [not found]     ` <6245804d.1c69fb81.3c35c.d7efSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2022-03-31 20:32       ` Michael S. Tsirkin
  0 siblings, 0 replies; 47+ messages in thread
From: Michael S. Tsirkin @ 2022-03-31 20:32 UTC (permalink / raw)
  To: Michael Qiu
  Cc: lulu, jasowang, qemu-devel, eperezma, Si-Wei Liu, lingshan.zhu

On Thu, Mar 31, 2022 at 06:19:37PM +0800, Michael Qiu wrote:
> Hi, Jason
> 
> Does it work this time?

Nope. Just use git-send-email.

-- 
MST



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v3] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-03-31  9:25   ` [PATCH RESEND " qiudayu
  2022-03-31 10:19     ` Michael Qiu
       [not found]     ` <6245804d.1c69fb81.3c35c.d7efSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2022-04-01  1:12     ` Si-Wei Liu
  2022-04-01  1:45       ` Michael Qiu
  2022-04-01  1:31     ` [PATCH v4] " Michael Qiu
  3 siblings, 1 reply; 47+ messages in thread
From: Si-Wei Liu @ 2022-04-01  1:12 UTC (permalink / raw)
  To: qiudayu, jasowang, mst; +Cc: eperezma, lingshan.zhu, qemu-devel, lulu



On 3/31/2022 2:25 AM, qiudayu@archeros.com wrote:
> From: Michael Qiu <qiudayu@archeros.com>
>
> Currently, when the VM powers off, it triggers a vdpa
> device (such as a Mellanox BlueField-2 VF) reset many times (with 1
> datapath queue pair and one control queue, it is triggered 3 times),
> which leads to the issue below:
>
> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>
> This is because vhost_net_stop() stops all vhost devices bound to
> this virtio device, and in vhost_dev_stop() qemu tries to stop the
> device, then stop the queue: vhost_virtqueue_stop().
>
> In vhost_dev_stop(), it resets the device, which clears some flags
> in the low-level driver; in the next loop (stopping the other vhost
> backends), qemu tries to stop the queue corresponding to that vhost
> backend, and the driver finds that the VQ is invalid. This is the
> root cause.
>
> To solve the issue, vdpa should set the vring unready and remove
> the reset from device stop: vhost_dev_start(hdev, false).
>
> Also, implement a new function, vhost_dev_reset(), which resets the
> backend device only after all (per-queue) vhost devices have stopped.
>
> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> ---
> v3 --> v2:
>      Call vhost_dev_reset() at the end of vhost_net_stop().
>
>      Since the vDPA device needs to re-add the status bits
>      VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>      simply add them inside vhost_vdpa_reset_device(); and since
>      the only caller of vhost_vdpa_reset_device() is
>      vhost_net_stop(), this keeps the same behavior as before.
>
> v2 --> v1:
>     Implement a new function, vhost_dev_reset(),
>     which resets the backend device last.
> ---
>   hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>   hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>   hw/virtio/vhost.c         | 15 ++++++++++++++-
>   include/hw/virtio/vhost.h |  1 +
>   4 files changed, 45 insertions(+), 10 deletions(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 30379d2..422c9bf 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>       int total_notifiers = data_queue_pairs * 2 + cvq;
>       VirtIONet *n = VIRTIO_NET(dev);
>       int nvhosts = data_queue_pairs + cvq;
> -    struct vhost_net *net;
> +    struct vhost_net *net = NULL;
>       int r, e, i, index_end = data_queue_pairs * 2;
>       NetClientState *peer;
>   
> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>   err_start:
>       while (--i >= 0) {
>           peer = qemu_get_peer(ncs , i);
> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> +
> +        net = get_vhost_net(peer);
> +
> +        vhost_net_stop_one(net, dev);
>       }
> +
> +    /* We only reset backend vdpa device */
> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
I would reset the device anyway, regardless of the first vhost_dev. 
Some ioctl calls in vhost_dev_start() may well have changed device 
state that there is no way to recover from other than a reset.

> +        vhost_dev_reset(&net->dev);
I would move this to the end, as it's more sensible to reset the 
device after the guest notifiers are disabled.
> +    }
> +
>       e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>       if (e < 0) {
>           fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>       VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>       VirtIONet *n = VIRTIO_NET(dev);
>       NetClientState *peer;
> +    struct vhost_net *net = NULL;
>       int total_notifiers = data_queue_pairs * 2 + cvq;
>       int nvhosts = data_queue_pairs + cvq;
>       int i, r;
> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>           } else {
>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
>           }
> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> +
> +        net = get_vhost_net(peer);
> +
> +        vhost_net_stop_one(net, dev);
> +    }
> +
> +    /* We only reset backend vdpa device */
> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
Yikes, I think it needs some code refactoring here without having to 
check VHOST_BACKEND_TYPE_VDPA explicitly. Historically the 
.vhost_reset_device() op was misnamed: it was initially meant for 
RESET_OWNER but never got used. Could you add a new .vhost_reset_owner() 
op to VhostOps (via another patch) and rename properly, e.g. from 
vhost_kernel_reset_device() to vhost_kernel_reset_owner()? For 
vhost_user_reset_device(), you can safely factor out the 
VHOST_USER_RESET_OWNER case to a new vhost_user_reset_owner() function, 
and only reset the device in vhost_user_reset_device() depending on the 
VHOST_USER_PROTOCOL_F_RESET_DEVICE protocol feature.

With this change, vhost_reset_device() will effectively be a no-op on 
vhost-kernel (NULL) and on vhost-user (only applicable to the 
vhost-user-scsi backend, which supports VHOST_USER_PROTOCOL_F_RESET_DEVICE).
> +        vhost_dev_reset(&net->dev);
I would move this to the end as it's more sensible to reset the device 
after guest notifier is disabled.

>       }
>   
>       r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index c5ed7a3..3ef0199 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>   
>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>       trace_vhost_vdpa_reset_device(dev, status);
> +
> +    /* Add back this status, so that the device could work next time*/
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                               VIRTIO_CONFIG_S_DRIVER);
> +
Hmmm, this might not be the ideal place, but I'm fine with leaving it 
as-is. It will need some more future refactoring work for, e.g., live 
migration and error recovery.

Thanks,
-Siwei

>       return ret;
>   }
>   
> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>       return idx;
>   }
>   
> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>   {
>       int i;
>       trace_vhost_vdpa_set_vring_ready(dev);
>       for (i = 0; i < dev->nvqs; ++i) {
>           struct vhost_vring_state state = {
>               .index = dev->vq_index + i,
> -            .num = 1,
> +            .num = ready,
>           };
>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>       }
> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           if (unlikely(!ok)) {
>               return -1;
>           }
> -        vhost_vdpa_set_vring_ready(dev);
> +        vhost_vdpa_set_vring_ready(dev, 1);
>       } else {
> +        vhost_vdpa_set_vring_ready(dev, 0);
>           ok = vhost_vdpa_svqs_stop(dev);
>           if (unlikely(!ok)) {
>               return -1;
> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           memory_listener_register(&v->listener, &address_space_memory);
>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>       } else {
> -        vhost_vdpa_reset_device(dev);
> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                                   VIRTIO_CONFIG_S_DRIVER);
>           memory_listener_unregister(&v->listener);
>   
>           return 0;
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index b643f42..7e0cdb6 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1820,7 +1820,6 @@ fail_features:
>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>   {
>       int i;
> -
>       /* should only be called after backend is connected */
>       assert(hdev->vhost_ops);
>   
> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>   
>       return -ENOSYS;
>   }
> +
> +int vhost_dev_reset(struct vhost_dev *hdev)
> +{
> +    int ret = 0;
> +
> +    /* should only be called after backend is connected */
> +    assert(hdev->vhost_ops);
> +
> +    if (hdev->vhost_ops->vhost_reset_device) {
> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
> +    }
> +
> +    return ret;
> +}
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 58a73e7..b8b7c20 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>   void vhost_dev_cleanup(struct vhost_dev *hdev);
>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> +int vhost_dev_reset(struct vhost_dev *hdev);
>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>   



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-03-31  9:25   ` [PATCH RESEND " qiudayu
                       ` (2 preceding siblings ...)
  2022-04-01  1:12     ` Si-Wei Liu
@ 2022-04-01  1:31     ` Michael Qiu
  2022-04-01  2:53       ` Jason Wang
  2022-04-01 11:06       ` [PATCH 0/3] Refactor vhost device reset Michael Qiu
  3 siblings, 2 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-01  1:31 UTC (permalink / raw)
  To: jasowang, mst, si-wei.liu
  Cc: Michael Qiu, eperezma, lingshan.zhu, qemu-devel, lulu

Currently, when the VM powers off, it triggers a vdpa
device (such as a Mellanox BlueField-2 VF) reset many times (with
one datapath queue pair and one control queue it is triggered three
times), which leads to the issue below:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This is because vhost_net_stop() stops every vhost device bound to
the virtio device, and in vhost_dev_stop() qemu tries to stop the
device and then stop the queues via vhost_virtqueue_stop().

vhost_dev_stop() also resets the device, which clears some flags in
the low-level driver; on the next loop iteration (stopping the other
vhost backends), qemu tries to stop the queues of that vhost backend
and the driver finds that the VQ is invalid. This is the root cause.

To solve the issue, vdpa should set the vrings unready and drop the
device reset from the stop path: vhost_dev_start(hdev, false).

Also implement a new function, vhost_dev_reset(), which resets the
backend device only after all (per-queue) vhost devices have stopped.

Signed-off-by: Michael Qiu <qiudayu@archeros.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
v4 --> v3:
    Nothing changed. Because of an issue with mimecast,
    when the From: tag differs from the sender, some
    mail clients treat the patch as an attachment;
    RESEND v3 did not work, so resend the patch as v4.

v3 --> v2:
    Call vhost_dev_reset() at the end of vhost_net_stop().

    Since the vDPA device needs the status bits
    VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER
    re-added, simply add them inside vhost_vdpa_reset_device();
    the only caller of vhost_vdpa_reset_device() is
    vhost_net_stop(), so it keeps the same behavior as before.

v2 --> v1:
   Implement a new function, vhost_dev_reset(), which
   resets the backend kernel device last.
---
 hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
 hw/virtio/vhost-vdpa.c    | 15 +++++++++------
 hw/virtio/vhost.c         | 15 ++++++++++++++-
 include/hw/virtio/vhost.h |  1 +
 4 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..422c9bf 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     int total_notifiers = data_queue_pairs * 2 + cvq;
     VirtIONet *n = VIRTIO_NET(dev);
     int nvhosts = data_queue_pairs + cvq;
-    struct vhost_net *net;
+    struct vhost_net *net = NULL;
     int r, e, i, index_end = data_queue_pairs * 2;
     NetClientState *peer;
 
@@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 err_start:
     while (--i >= 0) {
         peer = qemu_get_peer(ncs , i);
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        net = get_vhost_net(peer);
+
+        vhost_net_stop_one(net, dev);
     }
+
+    /* We only reset backend vdpa device */
+    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
+        vhost_dev_reset(&net->dev);
+    }
+
     e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
     if (e < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
@@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     VirtIONet *n = VIRTIO_NET(dev);
     NetClientState *peer;
+    struct vhost_net *net = NULL;
     int total_notifiers = data_queue_pairs * 2 + cvq;
     int nvhosts = data_queue_pairs + cvq;
     int i, r;
@@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
         } else {
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        net = get_vhost_net(peer);
+
+        vhost_net_stop_one(net, dev);
+    }
+
+    /* We only reset backend vdpa device */
+    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
+        vhost_dev_reset(&net->dev);
     }
 
     r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c5ed7a3..3ef0199 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
 
     ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
     trace_vhost_vdpa_reset_device(dev, status);
+
+    /* Add back these status bits so that the device can work next time */
+    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                               VIRTIO_CONFIG_S_DRIVER);
+
     return ret;
 }
 
@@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
     return idx;
 }
 
-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
 {
     int i;
     trace_vhost_vdpa_set_vring_ready(dev);
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_vring_state state = {
             .index = dev->vq_index + i,
-            .num = 1,
+            .num = ready,
         };
         vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
     }
@@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(!ok)) {
             return -1;
         }
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
     } else {
+        vhost_vdpa_set_vring_ready(dev, 0);
         ok = vhost_vdpa_svqs_stop(dev);
         if (unlikely(!ok)) {
             return -1;
@@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         memory_listener_register(&v->listener, &address_space_memory);
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
     } else {
-        vhost_vdpa_reset_device(dev);
-        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
-                                   VIRTIO_CONFIG_S_DRIVER);
         memory_listener_unregister(&v->listener);
 
         return 0;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b643f42..7e0cdb6 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1820,7 +1820,6 @@ fail_features:
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
     int i;
-
     /* should only be called after backend is connected */
     assert(hdev->vhost_ops);
 
@@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
 
     return -ENOSYS;
 }
+
+int vhost_dev_reset(struct vhost_dev *hdev)
+{
+    int ret = 0;
+
+    /* should only be called after backend is connected */
+    assert(hdev->vhost_ops);
+
+    if (hdev->vhost_ops->vhost_reset_device) {
+        ret = hdev->vhost_ops->vhost_reset_device(hdev);
+    }
+
+    return ret;
+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 58a73e7..b8b7c20 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_reset(struct vhost_dev *hdev);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
-- 
1.8.3.1






* Re: [PATCH RESEND v3] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-01  1:12     ` Si-Wei Liu
@ 2022-04-01  1:45       ` Michael Qiu
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-01  1:45 UTC (permalink / raw)
  To: Si-Wei Liu, jasowang, mst; +Cc: eperezma, lingshan.zhu, qemu-devel, lulu



On 2022/4/1 9:12, Si-Wei Liu wrote:
> 
> 
> On 3/31/2022 2:25 AM, qiudayu@archeros.com wrote:
>> From: Michael Qiu <qiudayu@archeros.com>
>>
>> Currently, when VM poweroff, it will trigger vdpa
>> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
>> queue pair and one control queue, triggered 3 times), this
>> leads to below issue:
>>
>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>
>> This because in vhost_net_stop(), it will stop all vhost device bind to
>> this virtio device, and in vhost_dev_stop(), qemu tries to stop the 
>> device
>> , then stop the queue: vhost_virtqueue_stop().
>>
>> In vhost_dev_stop(), it resets the device, which clear some flags
>> in low level driver, and in next loop(stop other vhost backends),
>> qemu try to stop the queue corresponding to the vhost backend,
>>   the driver finds that the VQ is invalid, this is the root cause.
>>
>> To solve the issue, vdpa should set vring unready, and
>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>
>> and implement a new function vhost_dev_reset, only reset backend
>> device after all vhost(per-queue) stopped.
>>
>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>> Acked-by: Jason Wang <jasowang@redhat.com>
>> ---
>> v3 --> v2:
>>      Call vhost_dev_reset() at the end of vhost_net_stop().
>>
>>      Since the vDPA device need re-add the status bit
>>      VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>>      simply, add them inside vhost_vdpa_reset_device, and
>>      the only way calling vhost_vdpa_reset_device is in
>>      vhost_net_stop(), so it keeps the same behavior as before.
>>
>> v2 --> v1:
>>     Implement a new function vhost_dev_reset,
>>     reset the backend kernel device at last.
>> ---
>>   hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>>   hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>>   hw/virtio/vhost.c         | 15 ++++++++++++++-
>>   include/hw/virtio/vhost.h |  1 +
>>   4 files changed, 45 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> index 30379d2..422c9bf 100644
>> --- a/hw/net/vhost_net.c
>> +++ b/hw/net/vhost_net.c
>> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, 
>> NetClientState *ncs,
>>       int total_notifiers = data_queue_pairs * 2 + cvq;
>>       VirtIONet *n = VIRTIO_NET(dev);
>>       int nvhosts = data_queue_pairs + cvq;
>> -    struct vhost_net *net;
>> +    struct vhost_net *net = NULL;
>>       int r, e, i, index_end = data_queue_pairs * 2;
>>       NetClientState *peer;
>> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, 
>> NetClientState *ncs,
>>   err_start:
>>       while (--i >= 0) {
>>           peer = qemu_get_peer(ncs , i);
>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>> +
>> +        net = get_vhost_net(peer);
>> +
>> +        vhost_net_stop_one(net, dev);
>>       }
>> +
>> +    /* We only reset backend vdpa device */
>> +    if (net && net->dev.vhost_ops->backend_type == 
>> VHOST_BACKEND_TYPE_VDPA) {
> I would reset the device anyway regardless the first vhost_dev. Some 
> ioctl calls may have well changed device state in vhost_dev_start() that 
> has no way to get back than reset.
> 
Here I just use the first vhost_dev, as nothing differs between the 
vhost_devs: reset just writes status 0 to the vhost-vdpa FD, and the FD 
is the same for every vhost_dev.

>> +        vhost_dev_reset(&net->dev);
> I would move this to the end as it's more sensible to reset the device 
> after guest notifier is disabled.

I will move it in the next patch.

>> +    }
>> +
>>       e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>       if (e < 0) {
>>           fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", 
>> e);
>> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, 
>> NetClientState *ncs,
>>       VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>       VirtIONet *n = VIRTIO_NET(dev);
>>       NetClientState *peer;
>> +    struct vhost_net *net = NULL;
>>       int total_notifiers = data_queue_pairs * 2 + cvq;
>>       int nvhosts = data_queue_pairs + cvq;
>>       int i, r;
>> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, 
>> NetClientState *ncs,
>>           } else {
>>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>           }
>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>> +
>> +        net = get_vhost_net(peer);
>> +
>> +        vhost_net_stop_one(net, dev);
>> +    }
>> +
>> +    /* We only reset backend vdpa device */
>> +    if (net && net->dev.vhost_ops->backend_type == 
>> VHOST_BACKEND_TYPE_VDPA) {
> Yikes, I think it needs some code refactoring here without having to 
> check VHOST_BACKEND_TYPE_VDPA explicitly. Historically the 
> .vhost_reset_device() op was misnamed: it was initially meant for 
> RESET_OWNER but never got used. Could you add a new .vhost_reset_owner() 
> op to VhostOps (via another patch) and rename properly, e.g. from 
> vhost_kernel_reset_device() to vhost_kernel_reset_owner()? For 
> vhost_user_reset_device(), you can safely factor out the 
> VHOST_USER_RESET_OWNER case to a new vhost_user_reset_owner() function, 
> and only reset the device in vhost_user_reset_device() depending on the 
> VHOST_USER_PROTOCOL_F_RESET_DEVICE protocol feature.
> 
> With this change, vhost_reset_device will be effectively a no-op on 
> vhost_kernel (NULL) and vhost_user (only applicable to vhost-user-scsi 
> backend which supports VHOST_USER_PROTOCOL_F_RESET_DEVICE).


OK, I will make a patch to do that.

>> +        vhost_dev_reset(&net->dev);
> I would move this to the end as it's more sensible to reset the device 
> after guest notifier is disabled.
> 
>>       }
>>       r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index c5ed7a3..3ef0199 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct 
>> vhost_dev *dev)
>>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>       trace_vhost_vdpa_reset_device(dev, status);
>> +
>> +    /* Add back this status, so that the device could work next time*/
>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>> +                               VIRTIO_CONFIG_S_DRIVER);
>> +
> Hmmm, this might not be the ideal place, but I'm fine to leave it as-is. 
> It would need some more future work in code refactoring for e.g. live 
> migration and error recovery.
> 
> Thanks,
> -Siwei
> 
>>       return ret;
>>   }
>> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct 
>> vhost_dev *dev, int idx)
>>       return idx;
>>   }
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned 
>> int ready)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = ready,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
>> *dev, bool started)
>>           if (unlikely(!ok)) {
>>               return -1;
>>           }
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>           ok = vhost_vdpa_svqs_stop(dev);
>>           if (unlikely(!ok)) {
>>               return -1;
>> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
>> *dev, bool started)
>>           memory_listener_register(&v->listener, &address_space_memory);
>>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>       } else {
>> -        vhost_vdpa_reset_device(dev);
>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>           memory_listener_unregister(&v->listener);
>>           return 0;
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index b643f42..7e0cdb6 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1820,7 +1820,6 @@ fail_features:
>>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>   {
>>       int i;
>> -
>>       /* should only be called after backend is connected */
>>       assert(hdev->vhost_ops);
>> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>>       return -ENOSYS;
>>   }
>> +
>> +int vhost_dev_reset(struct vhost_dev *hdev)
>> +{
>> +    int ret = 0;
>> +
>> +    /* should only be called after backend is connected */
>> +    assert(hdev->vhost_ops);
>> +
>> +    if (hdev->vhost_ops->vhost_reset_device) {
>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>> +    }
>> +
>> +    return ret;
>> +}
>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>> index 58a73e7..b8b7c20 100644
>> --- a/include/hw/virtio/vhost.h
>> +++ b/include/hw/virtio/vhost.h
>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void 
>> *opaque,
>>   void vhost_dev_cleanup(struct vhost_dev *hdev);
>>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice 
>> *vdev);
>>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
>> VirtIODevice *vdev);
> 
> 





* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-01  1:31     ` [PATCH v4] " Michael Qiu
@ 2022-04-01  2:53       ` Jason Wang
  2022-04-01  3:20         ` Michael Qiu
                           ` (2 more replies)
  2022-04-01 11:06       ` [PATCH 0/3] Refactor vhost device reset Michael Qiu
  1 sibling, 3 replies; 47+ messages in thread
From: Jason Wang @ 2022-04-01  2:53 UTC (permalink / raw)
  To: Michael Qiu; +Cc: Cindy Lu, mst, qemu-devel, eperezma, Si-Wei Liu, Zhu Lingshan

On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
>
> Currently, when VM poweroff, it will trigger vdpa
> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
> queue pair and one control queue, triggered 3 times), this
> leads to below issue:
>
> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>
> This because in vhost_net_stop(), it will stop all vhost device bind to
> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device
> , then stop the queue: vhost_virtqueue_stop().
>
> In vhost_dev_stop(), it resets the device, which clear some flags
> in low level driver, and in next loop(stop other vhost backends),
> qemu try to stop the queue corresponding to the vhost backend,
>  the driver finds that the VQ is invalid, this is the root cause.
>
> To solve the issue, vdpa should set vring unready, and
> remove reset ops in device stop: vhost_dev_start(hdev, false).
>
> and implement a new function vhost_dev_reset, only reset backend
> device after all vhost(per-queue) stopped.

Typo.

>
> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
> Acked-by: Jason Wang <jasowang@redhat.com>

Rethinking this patch: considering there are devices that don't
support set_vq_ready(), I wonder if we need

1) a uAPI to tell userspace whether or not the device supports
set_vq_ready()
2) userspace calling SET_VRING_ENABLE() when the device supports it,
and RESET otherwise.

And for safety, I suggest tagging this as 7.1.

> ---
> v4 --> v3
>     Nothing changed. Because of an issue with mimecast,
>     when the From: tag is different from the sender,
>     the some mail client will take the patch as an
>     attachment, RESEND v3 does not work, So resend
>     the patch as v4
>
> v3 --> v2:
>     Call vhost_dev_reset() at the end of vhost_net_stop().
>
>     Since the vDPA device need re-add the status bit
>     VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>     simply, add them inside vhost_vdpa_reset_device, and
>     the only way calling vhost_vdpa_reset_device is in
>     vhost_net_stop(), so it keeps the same behavior as before.
>
> v2 --> v1:
>    Implement a new function vhost_dev_reset,
>    reset the backend kernel device at last.
> ---
>  hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>  hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>  hw/virtio/vhost.c         | 15 ++++++++++++++-
>  include/hw/virtio/vhost.h |  1 +
>  4 files changed, 45 insertions(+), 10 deletions(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 30379d2..422c9bf 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>      int total_notifiers = data_queue_pairs * 2 + cvq;
>      VirtIONet *n = VIRTIO_NET(dev);
>      int nvhosts = data_queue_pairs + cvq;
> -    struct vhost_net *net;
> +    struct vhost_net *net = NULL;
>      int r, e, i, index_end = data_queue_pairs * 2;
>      NetClientState *peer;
>
> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>  err_start:
>      while (--i >= 0) {
>          peer = qemu_get_peer(ncs , i);
> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> +
> +        net = get_vhost_net(peer);
> +
> +        vhost_net_stop_one(net, dev);
>      }
> +
> +    /* We only reset backend vdpa device */
> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> +        vhost_dev_reset(&net->dev);
> +    }
> +
>      e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>      if (e < 0) {
>          fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>      VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>      VirtIONet *n = VIRTIO_NET(dev);
>      NetClientState *peer;
> +    struct vhost_net *net = NULL;
>      int total_notifiers = data_queue_pairs * 2 + cvq;
>      int nvhosts = data_queue_pairs + cvq;
>      int i, r;
> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>          } else {
>              peer = qemu_get_peer(ncs, n->max_queue_pairs);
>          }
> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> +
> +        net = get_vhost_net(peer);
> +
> +        vhost_net_stop_one(net, dev);
> +    }
> +
> +    /* We only reset backend vdpa device */
> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> +        vhost_dev_reset(&net->dev);
>      }

So we've already reset the device in vhost_vdpa_dev_start(); is there
any reason we need to do it again here?

>
>      r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index c5ed7a3..3ef0199 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>
>      ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>      trace_vhost_vdpa_reset_device(dev, status);
> +
> +    /* Add back this status, so that the device could work next time*/
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                               VIRTIO_CONFIG_S_DRIVER);

This seems to contradict the semantics of reset.

> +
>      return ret;
>  }
>
> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>      return idx;
>  }
>
> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>  {
>      int i;
>      trace_vhost_vdpa_set_vring_ready(dev);
>      for (i = 0; i < dev->nvqs; ++i) {
>          struct vhost_vring_state state = {
>              .index = dev->vq_index + i,
> -            .num = 1,
> +            .num = ready,
>          };
>          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>      }
> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          if (unlikely(!ok)) {
>              return -1;
>          }
> -        vhost_vdpa_set_vring_ready(dev);
> +        vhost_vdpa_set_vring_ready(dev, 1);
>      } else {
> +        vhost_vdpa_set_vring_ready(dev, 0);
>          ok = vhost_vdpa_svqs_stop(dev);
>          if (unlikely(!ok)) {
>              return -1;
> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          memory_listener_register(&v->listener, &address_space_memory);
>          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>      } else {
> -        vhost_vdpa_reset_device(dev);
> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                                   VIRTIO_CONFIG_S_DRIVER);
>          memory_listener_unregister(&v->listener);
>
>          return 0;
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index b643f42..7e0cdb6 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1820,7 +1820,6 @@ fail_features:
>  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>  {
>      int i;
> -

Unnecessary changes.

>      /* should only be called after backend is connected */
>      assert(hdev->vhost_ops);
>
> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>
>      return -ENOSYS;
>  }
> +
> +int vhost_dev_reset(struct vhost_dev *hdev)
> +{

Let's use a separate patch for this.

Thanks

> +    int ret = 0;
> +
> +    /* should only be called after backend is connected */
> +    assert(hdev->vhost_ops);
> +
> +    if (hdev->vhost_ops->vhost_reset_device) {
> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
> +    }
> +
> +    return ret;
> +}
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 58a73e7..b8b7c20 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>  void vhost_dev_cleanup(struct vhost_dev *hdev);
>  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> +int vhost_dev_reset(struct vhost_dev *hdev);
>  int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>  void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>
> --
> 1.8.3.1
>
>
>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-03-31  9:12       ` Maxime Coquelin
  2022-03-31  9:22         ` Michael Qiu
@ 2022-04-01  2:55         ` Jason Wang
  1 sibling, 0 replies; 47+ messages in thread
From: Jason Wang @ 2022-04-01  2:55 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Cindy Lu, mst, qemu-devel, eperezma, Michael Qiu, Si-Wei Liu,
	Zhu Lingshan, 08005325

On Thu, Mar 31, 2022 at 5:12 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> Hi,
>
> On 3/31/22 10:55, Jason Wang wrote:
> > On Thu, Mar 31, 2022 at 1:20 PM <08005325@163.com> wrote:
> >
> > Hi:
> >
> > For some reason, I see the patch as an attachment.
>
> We are starting to see this more and more since yesterday on DPDK
> mailing list. It seems like an issue with mimecast, when the From: tag
> is different from the sender.
>
> Maxime

I see. Thanks

>
> > Thanks
> >
> >
>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-01  2:53       ` Jason Wang
@ 2022-04-01  3:20         ` Michael Qiu
  2022-04-01 23:07         ` Si-Wei Liu
       [not found]         ` <62466fff.1c69fb81.8817a.d813SMTPIN_ADDED_BROKEN@mx.google.com>
  2 siblings, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-01  3:20 UTC (permalink / raw)
  To: Jason Wang; +Cc: Cindy Lu, mst, qemu-devel, eperezma, Si-Wei Liu, Zhu Lingshan



On 2022/4/1 10:53, Jason Wang wrote:
> On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
>>
>> Currently, when VM poweroff, it will trigger vdpa
>> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
>> queue pair and one control queue, triggered 3 times), this
>> leads to below issue:
>>
>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>
>> This because in vhost_net_stop(), it will stop all vhost device bind to
>> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device
>> , then stop the queue: vhost_virtqueue_stop().
>>
>> In vhost_dev_stop(), it resets the device, which clear some flags
>> in low level driver, and in next loop(stop other vhost backends),
>> qemu try to stop the queue corresponding to the vhost backend,
>>   the driver finds that the VQ is invalied, this is the root cause.
>>
>> To solve the issue, vdpa should set vring unready, and
>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>
>> and implement a new function vhost_dev_reset, only reset backend
>> device after all vhost(per-queue) stoped.
> 
> Typo.
> 
>>
>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>> Acked-by: Jason Wang <jasowang@redhat.com>
> 
> Rethink this patch, consider there're devices that don't support
> set_vq_ready(). I wonder if we need
> 
> 1) uAPI to tell the user space whether or not it supports set_vq_ready()
> 2) userspace will call SET_VRING_ENABLE() when the device supports
> otherwise it will use RESET.

If the device does not support set_vq_ready() in the kernel, it will
trigger a kernel oops; at least the current kernel does not check
whether set_vq_ready has been implemented.

And I checked all vdpa drivers in the kernel: all of them implement
this op.

So I think it is OK to call set_vq_ready() without a check.
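As a toy model of the concern above, the dispatch could be guarded before the call; the struct and function names below are illustrative only, not the real kernel API, and the guarded dispatcher reports an error where an unguarded kernel call would oops:

```c
#include <stddef.h>

/* Miniature model of the kernel-side ops dispatch discussed above.
 * Hypothetical names; in the real kernel an unimplemented
 * set_vq_ready would oops, here the guard just reports an error. */
struct fake_vdpa_ops {
    void (*set_vq_ready)(int qid, int ready);
};

static int g_last_qid = -1;
static int g_last_ready = -1;

static void fake_set_vq_ready(int qid, int ready)
{
    g_last_qid = qid;
    g_last_ready = ready;
}

/* Guarded dispatch: check that the op exists before calling it. */
static int dispatch_set_vq_ready(const struct fake_vdpa_ops *ops,
                                 int qid, int ready)
{
    if (ops == NULL || ops->set_vq_ready == NULL) {
        return -1; /* an unguarded caller would crash here instead */
    }
    ops->set_vq_ready(qid, ready);
    return 0;
}
```

The guard costs one NULL check per call, which is why adding it kernel-side would be cheap insurance even if every in-tree driver implements the op today.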

> 
> And for safety, I suggest tagging this as 7.1.
> 
>> ---
>> v4 --> v3
>>      Nothing changed, becasue of issue with mimecast,
>>      when the From: tag is different from the sender,
>>      the some mail client will take the patch as an
>>      attachment, RESEND v3 does not work, So resend
>>      the patch as v4
>>
>> v3 --> v2:
>>      Call vhost_dev_reset() at the end of vhost_net_stop().
>>
>>      Since the vDPA device need re-add the status bit
>>      VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>>      simply, add them inside vhost_vdpa_reset_device, and
>>      the only way calling vhost_vdpa_reset_device is in
>>      vhost_net_stop(), so it keeps the same behavior as before.
>>
>> v2 --> v1:
>>     Implement a new function vhost_dev_reset,
>>     reset the backend kernel device at last.
>> ---
>>   hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>>   hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>>   hw/virtio/vhost.c         | 15 ++++++++++++++-
>>   include/hw/virtio/vhost.h |  1 +
>>   4 files changed, 45 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> index 30379d2..422c9bf 100644
>> --- a/hw/net/vhost_net.c
>> +++ b/hw/net/vhost_net.c
>> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>       int total_notifiers = data_queue_pairs * 2 + cvq;
>>       VirtIONet *n = VIRTIO_NET(dev);
>>       int nvhosts = data_queue_pairs + cvq;
>> -    struct vhost_net *net;
>> +    struct vhost_net *net = NULL;
>>       int r, e, i, index_end = data_queue_pairs * 2;
>>       NetClientState *peer;
>>
>> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>   err_start:
>>       while (--i >= 0) {
>>           peer = qemu_get_peer(ncs , i);
>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>> +
>> +        net = get_vhost_net(peer);
>> +
>> +        vhost_net_stop_one(net, dev);
>>       }
>> +
>> +    /* We only reset backend vdpa device */
>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>> +        vhost_dev_reset(&net->dev);
>> +    }
>> +
>>       e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>       if (e < 0) {
>>           fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
>> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>       VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>       VirtIONet *n = VIRTIO_NET(dev);
>>       NetClientState *peer;
>> +    struct vhost_net *net = NULL;
>>       int total_notifiers = data_queue_pairs * 2 + cvq;
>>       int nvhosts = data_queue_pairs + cvq;
>>       int i, r;
>> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>           } else {
>>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>           }
>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>> +
>> +        net = get_vhost_net(peer);
>> +
>> +        vhost_net_stop_one(net, dev);
>> +    }
>> +
>> +    /* We only reset backend vdpa device */
>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>> +        vhost_dev_reset(&net->dev);
>>       }
> 
> So we've already reset the device in vhost_vdpa_dev_start(), any
> reason we need to do it again here?

We reset the device in vhost_vdpa_dev_start() only if there is some
error during start.

> 
>>
>>       r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index c5ed7a3..3ef0199 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>>
>>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>       trace_vhost_vdpa_reset_device(dev, status);
>> +
>> +    /* Add back this status, so that the device could work next time*/
>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>> +                               VIRTIO_CONFIG_S_DRIVER);
> 
> This seems to contradict the semantic of reset

Yes, but it's hard to put it anywhere else; it seems only vhost-vdpa
needs it. For VM shutdown, qemu_del_nic() does cleanup such as closing
the vhost fds, which triggers a reset in kernel space without setting
those status bits.

So in the end I put it here, with no other impact.
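The sequence described above (reset to status 0, then immediately re-add ACKNOWLEDGE and DRIVER so the device can be driven again later) can be sketched as a toy model; the bit values are the standard virtio ones, while the helpers are illustrative and not QEMU's API:

```c
/* Toy model of the virtio status handshake around a reset: writing 0
 * resets the device, and ACKNOWLEDGE/DRIVER must be re-added before
 * DRIVER_OK can be set again. Bit values are from the virtio spec;
 * the helper functions themselves are illustrative. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1u
#define VIRTIO_CONFIG_S_DRIVER      2u
#define VIRTIO_CONFIG_S_DRIVER_OK   4u

static unsigned int dev_status;

static void model_reset_device(void)
{
    dev_status = 0; /* models VHOST_VDPA_SET_STATUS with status == 0 */
    /* what the patch adds: re-acknowledge right after the reset */
    dev_status |= VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
}

static unsigned int model_start_device(void)
{
    dev_status |= VIRTIO_CONFIG_S_DRIVER_OK;
    return dev_status;
}
```

Without the re-add inside the reset helper, the next start would set DRIVER_OK on a status of 0, which a conforming device may reject.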

Thanks,
Michael
> 
>> +
>>       return ret;
>>   }
>>
>> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>       return idx;
>>   }
>>
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = ready,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>           if (unlikely(!ok)) {
>>               return -1;
>>           }
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>           ok = vhost_vdpa_svqs_stop(dev);
>>           if (unlikely(!ok)) {
>>               return -1;
>> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>           memory_listener_register(&v->listener, &address_space_memory);
>>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>       } else {
>> -        vhost_vdpa_reset_device(dev);
>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>           memory_listener_unregister(&v->listener);
>>
>>           return 0;
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index b643f42..7e0cdb6 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1820,7 +1820,6 @@ fail_features:
>>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>   {
>>       int i;
>> -
> 
> Unnecessary changes.
> 
>>       /* should only be called after backend is connected */
>>       assert(hdev->vhost_ops);
>>
>> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>>
>>       return -ENOSYS;
>>   }
>> +
>> +int vhost_dev_reset(struct vhost_dev *hdev)
>> +{
> 
> Let's use a separate patch for this.
> 
> Thanks
> 
>> +    int ret = 0;
>> +
>> +    /* should only be called after backend is connected */
>> +    assert(hdev->vhost_ops);
>> +
>> +    if (hdev->vhost_ops->vhost_reset_device) {
>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>> +    }
>> +
>> +    return ret;
>> +}
>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>> index 58a73e7..b8b7c20 100644
>> --- a/include/hw/virtio/vhost.h
>> +++ b/include/hw/virtio/vhost.h
>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>>   void vhost_dev_cleanup(struct vhost_dev *hdev);
>>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>
>> --
>> 1.8.3.1
>>
>>
>>
> 
> 
> 




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 0/3] Refactor vhost device reset
  2022-04-01  1:31     ` [PATCH v4] " Michael Qiu
  2022-04-01  2:53       ` Jason Wang
@ 2022-04-01 11:06       ` Michael Qiu
  2022-04-01 11:06         ` [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps Michael Qiu
                           ` (2 more replies)
  1 sibling, 3 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-01 11:06 UTC (permalink / raw)
  To: jasowang, mst, si-wei.liu
  Cc: Michael Qiu, eperezma, lingshan.zhu, qemu-devel, lulu

Nowadays, the vhost framework misnames vhost_reset_device(); we
actually need separate vhost_reset_device() and vhost_reset_owner()
functions. This patchset refactors this and makes each backend call
the right function.

Based on that work, it fixes an issue where the vdpa device is reset
several times.

Tested with kernel vhost, vhost-vdpa, and DPDK vhost-user (vdpa), with
shutdown, reboot, and load/unload of the virtio_net driver in the guest.

Michael Qiu (3):
  vhost: Refactor vhost_reset_device() in VhostOps
  vhost: add vhost_dev_reset()
  vdpa: reset the backend device in the end of vhost_net_stop()

 hw/net/vhost_net.c                | 22 +++++++++++++++++++---
 hw/scsi/vhost-user-scsi.c         |  6 +++++-
 hw/virtio/vhost-backend.c         |  4 ++--
 hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
 hw/virtio/vhost-vdpa.c            | 15 +++++++++------
 hw/virtio/vhost.c                 | 14 ++++++++++++++
 include/hw/virtio/vhost-backend.h |  2 ++
 include/hw/virtio/vhost.h         |  1 +
 8 files changed, 70 insertions(+), 16 deletions(-)

-- 
1.8.3.1




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps
  2022-04-01 11:06       ` [PATCH 0/3] Refactor vhost device reset Michael Qiu
@ 2022-04-01 11:06         ` Michael Qiu
  2022-04-02  0:44           ` Si-Wei Liu
  2022-04-02  2:38           ` Jason Wang
  2022-04-01 11:06         ` [PATCH 2/3] vhost: add vhost_dev_reset() Michael Qiu
  2022-04-01 11:06         ` [PATCH 3/3 v5] vdpa: reset the backend device in the end of vhost_net_stop() Michael Qiu
  2 siblings, 2 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-01 11:06 UTC (permalink / raw)
  To: jasowang, mst, si-wei.liu
  Cc: Michael Qiu, eperezma, lingshan.zhu, qemu-devel, lulu

Currently in the vhost framework, vhost_reset_device() is misnamed.
Actually, it should be vhost_reset_owner().

In vhost-user, it was made compatible with the reset device op, but
the vhost kernel backend is not compatible with it, and vhost-vdpa
only implements the reset device action.

So we need to separate the function into vhost_reset_owner() and
vhost_reset_device(), so that each backend can use the correct
function.

Signed-off-by: Michael Qiu <qiudayu@archeros.com>
---
 hw/scsi/vhost-user-scsi.c         |  6 +++++-
 hw/virtio/vhost-backend.c         |  4 ++--
 hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
 include/hw/virtio/vhost-backend.h |  2 ++
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 1b2f7ee..f179626 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
         return;
     }
 
-    if (dev->vhost_ops->vhost_reset_device) {
+    if (virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
+                           dev->vhost_ops->vhost_reset_device) {
         dev->vhost_ops->vhost_reset_device(dev);
+    } else if (dev->vhost_ops->vhost_reset_owner) {
+        dev->vhost_ops->vhost_reset_owner(dev);
     }
 }
 
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index e409a86..abbaa8b 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
     return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
 }
 
-static int vhost_kernel_reset_device(struct vhost_dev *dev)
+static int vhost_kernel_reset_owner(struct vhost_dev *dev)
 {
     return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
 }
@@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
         .vhost_get_features = vhost_kernel_get_features,
         .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
         .vhost_set_owner = vhost_kernel_set_owner,
-        .vhost_reset_device = vhost_kernel_reset_device,
+        .vhost_reset_owner = vhost_kernel_reset_owner,
         .vhost_get_vq_index = vhost_kernel_get_vq_index,
 #ifdef CONFIG_VHOST_VSOCK
         .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 6abbc9d..4412008 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct vhost_dev *dev,
     return 0;
 }
 
+static int vhost_user_reset_owner(struct vhost_dev *dev)
+{
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_RESET_OWNER,
+        .hdr.flags = VHOST_USER_VERSION,
+    };
+
+    return vhost_user_write(dev, &msg, NULL, 0);
+}
+
 static int vhost_user_reset_device(struct vhost_dev *dev)
 {
     VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_RESET_DEVICE,
         .hdr.flags = VHOST_USER_VERSION,
     };
 
-    msg.hdr.request = virtio_has_feature(dev->protocol_features,
-                                         VHOST_USER_PROTOCOL_F_RESET_DEVICE)
-        ? VHOST_USER_RESET_DEVICE
-        : VHOST_USER_RESET_OWNER;
+    /* Caller must ensure the backend has VHOST_USER_PROTOCOL_F_RESET_DEVICE
+     * support */
+    if (!virtio_has_feature(dev->protocol_features,
+                       VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
+        return -EPERM;
+    }
 
     return vhost_user_write(dev, &msg, NULL, 0);
 }
@@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
         .vhost_set_features = vhost_user_set_features,
         .vhost_get_features = vhost_user_get_features,
         .vhost_set_owner = vhost_user_set_owner,
+        .vhost_reset_owner = vhost_user_reset_owner,
         .vhost_reset_device = vhost_user_reset_device,
         .vhost_get_vq_index = vhost_user_get_vq_index,
         .vhost_set_vring_enable = vhost_user_set_vring_enable,
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 81bf310..affeeb0 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
                                      uint64_t *features);
 typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
 typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
+typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);
 typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
 typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
 typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
@@ -150,6 +151,7 @@ typedef struct VhostOps {
     vhost_get_features_op vhost_get_features;
     vhost_set_backend_cap_op vhost_set_backend_cap;
     vhost_set_owner_op vhost_set_owner;
+    vhost_reset_owner_op vhost_reset_owner;
     vhost_reset_device_op vhost_reset_device;
     vhost_get_vq_index_op vhost_get_vq_index;
     vhost_set_vring_enable_op vhost_set_vring_enable;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 2/3] vhost: add vhost_dev_reset()
  2022-04-01 11:06       ` [PATCH 0/3] Refactor vhost device reset Michael Qiu
  2022-04-01 11:06         ` [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps Michael Qiu
@ 2022-04-01 11:06         ` Michael Qiu
  2022-04-02  0:48           ` Si-Wei Liu
  2022-04-01 11:06         ` [PATCH 3/3 v5] vdpa: reset the backend device in the end of vhost_net_stop() Michael Qiu
  2 siblings, 1 reply; 47+ messages in thread
From: Michael Qiu @ 2022-04-01 11:06 UTC (permalink / raw)
  To: jasowang, mst, si-wei.liu
  Cc: Michael Qiu, eperezma, lingshan.zhu, qemu-devel, lulu

Not all vhost-user backends support ops->vhost_reset_device(). Instead
of adding a backend check and calling the backend ops directly, it's
better to implement a function in the vhost framework so that it hides
the vhost_ops details.

Signed-off-by: Michael Qiu <qiudayu@archeros.com>
---
 hw/virtio/vhost.c         | 14 ++++++++++++++
 include/hw/virtio/vhost.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b643f42..26667ae 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1854,3 +1854,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
 
     return -ENOSYS;
 }
+
+int vhost_dev_reset(struct vhost_dev *hdev)
+{
+    int ret = 0;
+
+    /* should only be called after backend is connected */
+    assert(hdev->vhost_ops);
+
+    if (hdev->vhost_ops->vhost_reset_device) {
+        ret = hdev->vhost_ops->vhost_reset_device(hdev);
+    }
+
+    return ret;
+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 58a73e7..b8b7c20 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_reset(struct vhost_dev *hdev);
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
-- 
1.8.3.1




^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 3/3 v5] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-01 11:06       ` [PATCH 0/3] Refactor vhost device reset Michael Qiu
  2022-04-01 11:06         ` [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps Michael Qiu
  2022-04-01 11:06         ` [PATCH 2/3] vhost: add vhost_dev_reset() Michael Qiu
@ 2022-04-01 11:06         ` Michael Qiu
  2 siblings, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-01 11:06 UTC (permalink / raw)
  To: jasowang, mst, si-wei.liu
  Cc: Michael Qiu, eperezma, lingshan.zhu, qemu-devel, lulu

Currently, when the VM powers off, it triggers the vdpa
device (such as a mlx bluefield2 VF) to reset many times (with 1
datapath queue pair and one control queue, it is triggered 3 times),
which leads to the issue below:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This is because vhost_net_stop() stops every vhost device bound to
this virtio device, and in vhost_dev_stop() qemu tries to stop the
device, then stop the queue: vhost_virtqueue_stop().

In vhost_dev_stop(), it resets the device, which clears some flags
in the low-level driver; in the next loop (stopping the other vhost
backends), qemu tries to stop the queue corresponding to that vhost
backend, and the driver finds the VQ invalid. This is the root cause.

To solve the issue, vdpa should set the vring unready and
remove the reset ops from device stop: vhost_dev_start(hdev, false),

and implement a new function, vhost_dev_reset(), which resets the
backend device only after all vhost (per-queue) devices have stopped.

Signed-off-by: Michael Qiu<qiudayu@archeros.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
v5 --> v4:
    move vhost_dev_reset() call after set_guest_notifiers

    remove implementation of vhost_dev_reset()

    remove backend check for VHOST_BACKEND_TYPE_VDPA
    
v4 --> v3:
    Nothing changed. Because of an issue with mimecast,
    when the From: tag is different from the sender,
    some mail clients will take the patch as an
    attachment; RESEND v3 did not work, so resend
    the patch as v4

v3 --> v2:
    Call vhost_dev_reset() at the end of vhost_net_stop().

    Since the vDPA device needs to re-add the status bits
    VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
    simply add them inside vhost_vdpa_reset_device(); the
    only caller of vhost_vdpa_reset_device() is
    vhost_net_stop(), so it keeps the same behavior as before.

v2 --> v1:
   Implement a new function vhost_dev_reset,
   reset the backend kernel device at last.

---
 hw/net/vhost_net.c     | 22 +++++++++++++++++++---
 hw/virtio/vhost-vdpa.c | 15 +++++++++------
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..30c76ca 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     int total_notifiers = data_queue_pairs * 2 + cvq;
     VirtIONet *n = VIRTIO_NET(dev);
     int nvhosts = data_queue_pairs + cvq;
-    struct vhost_net *net;
+    struct vhost_net *net = NULL;
     int r, e, i, index_end = data_queue_pairs * 2;
     NetClientState *peer;
 
@@ -391,13 +391,21 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 err_start:
     while (--i >= 0) {
         peer = qemu_get_peer(ncs , i);
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        net = get_vhost_net(peer);
+
+        vhost_net_stop_one(net, dev);
     }
+
     e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
     if (e < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
         fflush(stderr);
     }
+
+    if (net) {
+        vhost_dev_reset(&net->dev);
+    }
 err:
     return r;
 }
@@ -410,6 +418,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     VirtIONet *n = VIRTIO_NET(dev);
     NetClientState *peer;
+    struct vhost_net *net = NULL;
     int total_notifiers = data_queue_pairs * 2 + cvq;
     int nvhosts = data_queue_pairs + cvq;
     int i, r;
@@ -420,7 +429,10 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
         } else {
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
-        vhost_net_stop_one(get_vhost_net(peer), dev);
+
+        net = get_vhost_net(peer);
+
+        vhost_net_stop_one(net, dev);
     }
 
     r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
@@ -429,6 +441,10 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
         fflush(stderr);
     }
     assert(r >= 0);
+
+    if (net) {
+        vhost_dev_reset(&net->dev);
+    }
 }
 
 void vhost_net_cleanup(struct vhost_net *net)
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c5ed7a3..3ef0199 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
 
     ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
     trace_vhost_vdpa_reset_device(dev, status);
+
+    /* Add back this status, so that the device could work next time*/
+    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                               VIRTIO_CONFIG_S_DRIVER);
+
     return ret;
 }
 
@@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
     return idx;
 }
 
-static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
 {
     int i;
     trace_vhost_vdpa_set_vring_ready(dev);
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_vring_state state = {
             .index = dev->vq_index + i,
-            .num = 1,
+            .num = ready,
         };
         vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
     }
@@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(!ok)) {
             return -1;
         }
-        vhost_vdpa_set_vring_ready(dev);
+        vhost_vdpa_set_vring_ready(dev, 1);
     } else {
+        vhost_vdpa_set_vring_ready(dev, 0);
         ok = vhost_vdpa_svqs_stop(dev);
         if (unlikely(!ok)) {
             return -1;
@@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         memory_listener_register(&v->listener, &address_space_memory);
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
     } else {
-        vhost_vdpa_reset_device(dev);
-        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
-                                   VIRTIO_CONFIG_S_DRIVER);
         memory_listener_unregister(&v->listener);
 
         return 0;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-01  2:53       ` Jason Wang
  2022-04-01  3:20         ` Michael Qiu
@ 2022-04-01 23:07         ` Si-Wei Liu
  2022-04-02  2:20           ` Jason Wang
       [not found]         ` <62466fff.1c69fb81.8817a.d813SMTPIN_ADDED_BROKEN@mx.google.com>
  2 siblings, 1 reply; 47+ messages in thread
From: Si-Wei Liu @ 2022-04-01 23:07 UTC (permalink / raw)
  To: Jason Wang, Michael Qiu; +Cc: eperezma, Zhu Lingshan, qemu-devel, Cindy Lu, mst



On 3/31/2022 7:53 PM, Jason Wang wrote:
> On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
>> Currently, when VM poweroff, it will trigger vdpa
>> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
>> queue pair and one control queue, triggered 3 times), this
>> leads to below issue:
>>
>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>
>> This because in vhost_net_stop(), it will stop all vhost device bind to
>> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device
>> , then stop the queue: vhost_virtqueue_stop().
>>
>> In vhost_dev_stop(), it resets the device, which clear some flags
>> in low level driver, and in next loop(stop other vhost backends),
>> qemu try to stop the queue corresponding to the vhost backend,
>>   the driver finds that the VQ is invalied, this is the root cause.
>>
>> To solve the issue, vdpa should set vring unready, and
>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>
>> and implement a new function vhost_dev_reset, only reset backend
>> device after all vhost(per-queue) stoped.
> Typo.
>
>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>> Acked-by: Jason Wang <jasowang@redhat.com>
> Rethink this patch, consider there're devices that don't support
> set_vq_ready(). I wonder if we need
>
> 1) uAPI to tell the user space whether or not it supports set_vq_ready()
I guess what's more relevant here is to define the uAPI semantics for
unready, i.e. set_vq_ready(0), for resuming/stopping virtqueue
processing, as starting a vq is comparatively less ambiguous.
Considering the likelihood that this interface may be used for live
migration, it would be nice to come up with variants such as 1) discard
in-flight requests, vs. 2) wait for in-flight processing to be done,
and 3) time out while waiting.
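The three variants above could be sketched as a toy model; the enum, the helper, and the cost parameters are all hypothetical, not a proposed uAPI:

```c
/* Toy model of three possible "unready" semantics: given a number of
 * in-flight requests, a per-request completion cost, and a timeout,
 * return how many requests get discarded on suspend. Illustrative
 * only; no such uAPI exists today. */
enum vq_suspend_mode {
    VQ_SUSPEND_DISCARD_INFLIGHT,  /* 1) drop in-flight requests */
    VQ_SUSPEND_DRAIN_INFLIGHT,    /* 2) wait for them to finish */
    VQ_SUSPEND_DRAIN_TIMEOUT,     /* 3) wait, but with a bound */
};

static int model_suspend_vq(enum vq_suspend_mode mode, int inflight,
                            int per_req_ms, int timeout_ms)
{
    switch (mode) {
    case VQ_SUSPEND_DISCARD_INFLIGHT:
        return inflight;                   /* everything is dropped */
    case VQ_SUSPEND_DRAIN_INFLIGHT:
        return 0;                          /* wait indefinitely */
    case VQ_SUSPEND_DRAIN_TIMEOUT: {
        /* only the requests that fit in the timeout complete */
        int completed = per_req_ms > 0 ? timeout_ms / per_req_ms
                                       : inflight;
        return inflight > completed ? inflight - completed : 0;
    }
    }
    return -1;
}
```

The point of the model is that only variant 2 guarantees no loss, which matters for live migration, while variant 3 trades that guarantee for a bounded stop time.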

> 2) userspace will call SET_VRING_ENABLE() when the device supports
> otherwise it will use RESET.
Are you looking to make the virtqueue resumable through the new
SET_VRING_ENABLE() uAPI?

I think RESET is inevitable in some cases, i.e. when the guest
initiates a device reset by writing 0 to the status register. For the
suspend/resume and live migration use cases, RESET can indeed be
substituted with SET_VRING_ENABLE. Again, it'd need quite some code
refactoring to accommodate this change. Although I'm all for it, it'd
be best to lay out the plan in multiple phases rather than overload
this single patch too much. You can count on my time for this endeavor
if you don't mind. :)
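The policy Jason proposed earlier in the thread (use per-vring enable/disable when the backend advertises it, fall back to a full device RESET otherwise) could be sketched like this; the capability flag and helper names are hypothetical:

```c
#include <string.h>

/* Sketch of the proposed userspace policy: stop rings via
 * SET_VRING_ENABLE(0) when the backend advertises the capability,
 * fall back to a full device RESET otherwise. The capability flag
 * and helper names are hypothetical, not an existing uAPI. */
static const char *model_stop_path(int has_set_vring_enable)
{
    return has_set_vring_enable ? "SET_VRING_ENABLE" : "RESET";
}

/* RESET loses device state; SET_VRING_ENABLE(0) keeps it, which is
 * what would make the vq resumable for suspend/resume and live
 * migration. */
static int model_state_preserved(int has_set_vring_enable)
{
    return has_set_vring_enable; /* 1: state kept, 0: state lost */
}
```

This is why a capability query matters: userspace can only pick the state-preserving path if it can first discover that the backend supports it.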

>
> And for safety, I suggest tagging this as 7.1.
+1

Regards,
-Siwei

>
>> ---
>> v4 --> v3
>>      Nothing changed, becasue of issue with mimecast,
>>      when the From: tag is different from the sender,
>>      the some mail client will take the patch as an
>>      attachment, RESEND v3 does not work, So resend
>>      the patch as v4
>>
>> v3 --> v2:
>>      Call vhost_dev_reset() at the end of vhost_net_stop().
>>
>>      Since the vDPA device need re-add the status bit
>>      VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>>      simply, add them inside vhost_vdpa_reset_device, and
>>      the only way calling vhost_vdpa_reset_device is in
>>      vhost_net_stop(), so it keeps the same behavior as before.
>>
>> v2 --> v1:
>>     Implement a new function vhost_dev_reset,
>>     reset the backend kernel device at last.
>> ---
>>   hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>>   hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>>   hw/virtio/vhost.c         | 15 ++++++++++++++-
>>   include/hw/virtio/vhost.h |  1 +
>>   4 files changed, 45 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> index 30379d2..422c9bf 100644
>> --- a/hw/net/vhost_net.c
>> +++ b/hw/net/vhost_net.c
>> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>       int total_notifiers = data_queue_pairs * 2 + cvq;
>>       VirtIONet *n = VIRTIO_NET(dev);
>>       int nvhosts = data_queue_pairs + cvq;
>> -    struct vhost_net *net;
>> +    struct vhost_net *net = NULL;
>>       int r, e, i, index_end = data_queue_pairs * 2;
>>       NetClientState *peer;
>>
>> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>   err_start:
>>       while (--i >= 0) {
>>           peer = qemu_get_peer(ncs , i);
>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>> +
>> +        net = get_vhost_net(peer);
>> +
>> +        vhost_net_stop_one(net, dev);
>>       }
>> +
>> +    /* We only reset backend vdpa device */
>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>> +        vhost_dev_reset(&net->dev);
>> +    }
>> +
>>       e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>       if (e < 0) {
>>           fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
>> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>       VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>       VirtIONet *n = VIRTIO_NET(dev);
>>       NetClientState *peer;
>> +    struct vhost_net *net = NULL;
>>       int total_notifiers = data_queue_pairs * 2 + cvq;
>>       int nvhosts = data_queue_pairs + cvq;
>>       int i, r;
>> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>           } else {
>>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>           }
>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>> +
>> +        net = get_vhost_net(peer);
>> +
>> +        vhost_net_stop_one(net, dev);
>> +    }
>> +
>> +    /* We only reset backend vdpa device */
>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>> +        vhost_dev_reset(&net->dev);
>>       }
> So we've already reset the device in vhost_vdpa_dev_start(), any
> reason we need to do it again here?
>
>>       r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index c5ed7a3..3ef0199 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>>
>>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>       trace_vhost_vdpa_reset_device(dev, status);
>> +
>> +    /* Add back this status, so that the device could work next time*/
>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>> +                               VIRTIO_CONFIG_S_DRIVER);
> This seems to contradict the semantic of reset.
>
>> +
>>       return ret;
>>   }
>>
>> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>       return idx;
>>   }
>>
>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>   {
>>       int i;
>>       trace_vhost_vdpa_set_vring_ready(dev);
>>       for (i = 0; i < dev->nvqs; ++i) {
>>           struct vhost_vring_state state = {
>>               .index = dev->vq_index + i,
>> -            .num = 1,
>> +            .num = ready,
>>           };
>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>       }
>> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>           if (unlikely(!ok)) {
>>               return -1;
>>           }
>> -        vhost_vdpa_set_vring_ready(dev);
>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>       } else {
>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>           ok = vhost_vdpa_svqs_stop(dev);
>>           if (unlikely(!ok)) {
>>               return -1;
>> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>           memory_listener_register(&v->listener, &address_space_memory);
>>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>       } else {
>> -        vhost_vdpa_reset_device(dev);
>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>           memory_listener_unregister(&v->listener);
>>
>>           return 0;
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index b643f42..7e0cdb6 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1820,7 +1820,6 @@ fail_features:
>>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>   {
>>       int i;
>> -
> Unnecessary changes.
>
>>       /* should only be called after backend is connected */
>>       assert(hdev->vhost_ops);
>>
>> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>>
>>       return -ENOSYS;
>>   }
>> +
>> +int vhost_dev_reset(struct vhost_dev *hdev)
>> +{
> Let's use a separate patch for this.
>
> Thanks
>
>> +    int ret = 0;
>> +
>> +    /* should only be called after backend is connected */
>> +    assert(hdev->vhost_ops);
>> +
>> +    if (hdev->vhost_ops->vhost_reset_device) {
>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>> +    }
>> +
>> +    return ret;
>> +}
>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>> index 58a73e7..b8b7c20 100644
>> --- a/include/hw/virtio/vhost.h
>> +++ b/include/hw/virtio/vhost.h
>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>>   void vhost_dev_cleanup(struct vhost_dev *hdev);
>>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>
>> --
>> 1.8.3.1
>>
>>
>>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps
  2022-04-01 11:06         ` [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps Michael Qiu
@ 2022-04-02  0:44           ` Si-Wei Liu
  2022-04-02  2:08             ` Michael Qiu
  2022-04-02  2:38           ` Jason Wang
  1 sibling, 1 reply; 47+ messages in thread
From: Si-Wei Liu @ 2022-04-02  0:44 UTC (permalink / raw)
  To: Michael Qiu, jasowang, mst; +Cc: eperezma, lingshan.zhu, qemu-devel, lulu



On 4/1/2022 4:06 AM, Michael Qiu wrote:
> Currently in the vhost framework, vhost_reset_device() is misnamed.
> Actually, it should be vhost_reset_owner().
>
> In vhost-user it is kept compatible with the reset device op, but
> the vhost kernel backend is not compatible with it; for vhost-vdpa, it
> only implements the reset device action.
>
> So we need to separate the function into vhost_reset_owner() and
> vhost_reset_device(), so that different backends can use the
> correct function.
>
> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
> ---
>   hw/scsi/vhost-user-scsi.c         |  6 +++++-
>   hw/virtio/vhost-backend.c         |  4 ++--
>   hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
>   include/hw/virtio/vhost-backend.h |  2 ++
>   4 files changed, 27 insertions(+), 7 deletions(-)
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index 1b2f7ee..f179626 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
>           return;
>       }
>   
> -    if (dev->vhost_ops->vhost_reset_device) {
> +    if (virtio_has_feature(dev->protocol_features,
> +                           VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
This line change is not needed. VHOST_USER_PROTOCOL_F_RESET_DEVICE is 
guaranteed to be available if we get here.
> +                           dev->vhost_ops->vhost_reset_device) {
>           dev->vhost_ops->vhost_reset_device(dev);
> +    } else if (dev->vhost_ops->vhost_reset_owner) {
> +        dev->vhost_ops->vhost_reset_owner(dev);
Nope, drop these two lines. The caller of vhost_user_scsi_reset() 
doesn't expect vhost_reset_owner to be called in case vhost_reset_device 
is not implemented.

>       }
>   }
>   
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index e409a86..abbaa8b 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
>       return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
>   }
>   
> -static int vhost_kernel_reset_device(struct vhost_dev *dev)
> +static int vhost_kernel_reset_owner(struct vhost_dev *dev)
>   {
>       return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
>   }
> @@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
>           .vhost_get_features = vhost_kernel_get_features,
>           .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
>           .vhost_set_owner = vhost_kernel_set_owner,
> -        .vhost_reset_device = vhost_kernel_reset_device,
> +        .vhost_reset_owner = vhost_kernel_reset_owner,
>           .vhost_get_vq_index = vhost_kernel_get_vq_index,
>   #ifdef CONFIG_VHOST_VSOCK
>           .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 6abbc9d..4412008 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct vhost_dev *dev,
>       return 0;
>   }
>   
> +static int vhost_user_reset_owner(struct vhost_dev *dev)
> +{
> +    VhostUserMsg msg = {
> +        .hdr.request = VHOST_USER_RESET_OWNER,
> +        .hdr.flags = VHOST_USER_VERSION,
> +    };
> +
> +    return vhost_user_write(dev, &msg, NULL, 0);
> +}
> +
>   static int vhost_user_reset_device(struct vhost_dev *dev)
>   {
>       VhostUserMsg msg = {
> +        .hdr.request = VHOST_USER_RESET_DEVICE,
>           .hdr.flags = VHOST_USER_VERSION,
>       };
>   
> -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
> -                                         VHOST_USER_PROTOCOL_F_RESET_DEVICE)
> -        ? VHOST_USER_RESET_DEVICE
> -        : VHOST_USER_RESET_OWNER;
> +    /* Caller must ensure the backend has VHOST_USER_PROTOCOL_F_RESET_DEVICE
> +     * support */
> +    if (!virtio_has_feature(dev->protocol_features,
> +                       VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
> +        return -EPERM;
> +    }
I think we can safely remove this check, since the caller already 
guarantees that VHOST_USER_PROTOCOL_F_RESET_DEVICE is present, as your 
comment mentions.

The previous branch condition is to reuse the vhost_reset_device op for 
two different ends, but there's no actual user for 
VHOST_USER_RESET_OWNER historically.

Thanks,
-Siwei

>   
>       return vhost_user_write(dev, &msg, NULL, 0);
>   }
> @@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
>           .vhost_set_features = vhost_user_set_features,
>           .vhost_get_features = vhost_user_get_features,
>           .vhost_set_owner = vhost_user_set_owner,
> +        .vhost_reset_owner = vhost_user_reset_owner,
>           .vhost_reset_device = vhost_user_reset_device,
>           .vhost_get_vq_index = vhost_user_get_vq_index,
>           .vhost_set_vring_enable = vhost_user_set_vring_enable,
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 81bf310..affeeb0 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
>                                        uint64_t *features);
>   typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
>   typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
> +typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);
>   typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
> @@ -150,6 +151,7 @@ typedef struct VhostOps {
>       vhost_get_features_op vhost_get_features;
>       vhost_set_backend_cap_op vhost_set_backend_cap;
>       vhost_set_owner_op vhost_set_owner;
> +    vhost_reset_owner_op vhost_reset_owner;
>       vhost_reset_device_op vhost_reset_device;
>       vhost_get_vq_index_op vhost_get_vq_index;
>       vhost_set_vring_enable_op vhost_set_vring_enable;




* Re: [PATCH 2/3] vhost: add vhost_dev_reset()
  2022-04-01 11:06         ` [PATCH 2/3] vhost: add vhost_dev_reset() Michael Qiu
@ 2022-04-02  0:48           ` Si-Wei Liu
  0 siblings, 0 replies; 47+ messages in thread
From: Si-Wei Liu @ 2022-04-02  0:48 UTC (permalink / raw)
  To: Michael Qiu, jasowang, mst; +Cc: eperezma, lingshan.zhu, qemu-devel, lulu



On 4/1/2022 4:06 AM, Michael Qiu wrote:
> Not all vhost-user backends support ops->vhost_reset_device(). Instead
> of adding a backend check and calling the backend ops directly, it's
> better to implement a function in the vhost framework, so that it can
> hide the vhost_ops details.
>
> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
> ---
>   hw/virtio/vhost.c         | 14 ++++++++++++++
>   include/hw/virtio/vhost.h |  1 +
>   2 files changed, 15 insertions(+)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index b643f42..26667ae 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1854,3 +1854,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>   
>       return -ENOSYS;
>   }
> +
> +int vhost_dev_reset(struct vhost_dev *hdev)
Maybe vhost_user_scsi_reset() can call this function instead?

-Siwei
> +{
> +    int ret = 0;
> +
> +    /* should only be called after backend is connected */
> +    assert(hdev->vhost_ops);
> +
> +    if (hdev->vhost_ops->vhost_reset_device) {
> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
> +    }
> +
> +    return ret;
> +}
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 58a73e7..b8b7c20 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>   void vhost_dev_cleanup(struct vhost_dev *hdev);
>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> +int vhost_dev_reset(struct vhost_dev *hdev);
>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>   




* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
       [not found]         ` <62466fff.1c69fb81.8817a.d813SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2022-04-02  1:48           ` Jason Wang
  2022-04-02  3:43             ` Michael Qiu
  0 siblings, 1 reply; 47+ messages in thread
From: Jason Wang @ 2022-04-02  1:48 UTC (permalink / raw)
  To: Michael Qiu; +Cc: Cindy Lu, mst, qemu-devel, eperezma, Si-Wei Liu, Zhu Lingshan

On Fri, Apr 1, 2022 at 11:22 AM Michael Qiu <qiudayu@archeros.com> wrote:
>
>
>
> On 2022/4/1 10:53, Jason Wang wrote:
> > On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
> >>
> >> Currently, when VM poweroff, it will trigger vdpa
> >> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
> >> queue pair and one control queue, triggered 3 times), this
> >> leads to below issue:
> >>
> >> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
> >>
> >> This because in vhost_net_stop(), it will stop all vhost device bind to
> >> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device
> >> , then stop the queue: vhost_virtqueue_stop().
> >>
> >> In vhost_dev_stop(), it resets the device, which clear some flags
> >> in low level driver, and in next loop(stop other vhost backends),
> >> qemu try to stop the queue corresponding to the vhost backend,
> >>   the driver finds that the VQ is invalied, this is the root cause.
> >>
> >> To solve the issue, vdpa should set vring unready, and
> >> remove reset ops in device stop: vhost_dev_start(hdev, false).
> >>
> >> and implement a new function vhost_dev_reset, only reset backend
> >> device after all vhost(per-queue) stoped.
> >
> > Typo.
> >
> >>
> >> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
> >> Acked-by: Jason Wang <jasowang@redhat.com>
> >
> > Rethink this patch, consider there're devices that don't support
> > set_vq_ready(). I wonder if we need
> >
> > 1) uAPI to tell the user space whether or not it supports set_vq_ready()
> > 2) userspace will call SET_VRING_ENABLE() when the device supports
> > otherwise it will use RESET.
>
> if the device does not support set_vq_ready() in the kernel, it will
> trigger a kernel oops; at least the current kernel does not check
> whether set_vq_ready has been implemented.
>
> And I checked all the vdpa drivers in the kernel; all of them have
> implemented this op.

Actually, it's not about whether or not set_vq_ready() is
implemented. It's about whether the parent supports it correctly:

The ability to suspend and resume a virtqueue is currently beyond the
ability of some transports (e.g. PCI).

For IFCVF:

static void ifcvf_vdpa_set_vq_ready(struct vdpa_device *vdpa_dev,
                                    u16 qid, bool ready)
{
        struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);

        vf->vring[qid].ready = ready;
}

It seems to follow the PCI transport, so if you just set it to zero,
it simply doesn't work at all. I can see some tricks that are used in
the DPDK driver, maybe we can use the same to "fix" this.

For VDUSE, we are basically the same:

static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa,
                                        u16 idx, bool ready)
{
        struct vduse_dev *dev = vdpa_to_vduse(vdpa);
        struct vduse_virtqueue *vq = &dev->vqs[idx];

        vq->ready = ready;
}

It can't be stopped correctly if we just set it to zero.

For vp_vdpa, it basically wants to abuse the queue_enable, which may
result in a warning in Qemu (and the device isn't stopped).

static void vp_vdpa_set_vq_ready(struct vdpa_device *vdpa,
                                 u16 qid, bool ready)
{
        struct virtio_pci_modern_device *mdev = vdpa_to_mdev(vdpa);

        vp_modern_set_queue_enable(mdev, qid, ready);
}

ENI does a trick of writing 0 to the virtqueue address, so it works for
stop but not for start.

static void eni_vdpa_set_vq_ready(struct vdpa_device *vdpa, u16 qid,
                                  bool ready)
{
        struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);

        /* ENI is a legacy virtio-pci device. This is not supported
         * by specification. But we can disable virtqueue by setting
         * address to 0.
         */
        if (!ready)
                vp_legacy_set_queue_address(ldev, qid, 0);
}

mlx5 call suspend_vq() which should be fine.

Simulator is probably fine.

So I worry that if we use set_vq_ready(0) it won't work correctly and
will have other issues. The idea is:

- advertise the suspend/resume capability via uAPI, then mlx5_vdpa and
simulator can go with set_vq_ready()
- others can still go with reset(), and we can try to fix them
gradually (and introduce this in the virtio spec).

>
> So I think it is OK to call set_vq_ready without a check.
>
> >
> > And for safety, I suggest tagging this as 7.1.
> >
> >> ---
> >> v4 --> v3
> >>      Nothing changed, becasue of issue with mimecast,
> >>      when the From: tag is different from the sender,
> >>      the some mail client will take the patch as an
> >>      attachment, RESEND v3 does not work, So resend
> >>      the patch as v4
> >>
> >> v3 --> v2:
> >>      Call vhost_dev_reset() at the end of vhost_net_stop().
> >>
> >>      Since the vDPA device need re-add the status bit
> >>      VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
> >>      simply, add them inside vhost_vdpa_reset_device, and
> >>      the only way calling vhost_vdpa_reset_device is in
> >>      vhost_net_stop(), so it keeps the same behavior as before.
> >>
> >> v2 --> v1:
> >>     Implement a new function vhost_dev_reset,
> >>     reset the backend kernel device at last.
> >> ---
> >>   hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
> >>   hw/virtio/vhost-vdpa.c    | 15 +++++++++------
> >>   hw/virtio/vhost.c         | 15 ++++++++++++++-
> >>   include/hw/virtio/vhost.h |  1 +
> >>   4 files changed, 45 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >> index 30379d2..422c9bf 100644
> >> --- a/hw/net/vhost_net.c
> >> +++ b/hw/net/vhost_net.c
> >> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>       int total_notifiers = data_queue_pairs * 2 + cvq;
> >>       VirtIONet *n = VIRTIO_NET(dev);
> >>       int nvhosts = data_queue_pairs + cvq;
> >> -    struct vhost_net *net;
> >> +    struct vhost_net *net = NULL;
> >>       int r, e, i, index_end = data_queue_pairs * 2;
> >>       NetClientState *peer;
> >>
> >> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>   err_start:
> >>       while (--i >= 0) {
> >>           peer = qemu_get_peer(ncs , i);
> >> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> >> +
> >> +        net = get_vhost_net(peer);
> >> +
> >> +        vhost_net_stop_one(net, dev);
> >>       }
> >> +
> >> +    /* We only reset backend vdpa device */
> >> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> >> +        vhost_dev_reset(&net->dev);
> >> +    }
> >> +
> >>       e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
> >>       if (e < 0) {
> >>           fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
> >> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
> >>       VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> >>       VirtIONet *n = VIRTIO_NET(dev);
> >>       NetClientState *peer;
> >> +    struct vhost_net *net = NULL;
> >>       int total_notifiers = data_queue_pairs * 2 + cvq;
> >>       int nvhosts = data_queue_pairs + cvq;
> >>       int i, r;
> >> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
> >>           } else {
> >>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
> >>           }
> >> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> >> +
> >> +        net = get_vhost_net(peer);
> >> +
> >> +        vhost_net_stop_one(net, dev);
> >> +    }
> >> +
> >> +    /* We only reset backend vdpa device */
> >> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> >> +        vhost_dev_reset(&net->dev);
> >>       }
> >
> > So we've already reset the device in vhost_vdpa_dev_start(), any
> > reason we need to do it again here?
>
> We reset the device in vhost_vdpa_dev_start() if there is some error during start.

The rest should have been done in vhost_net_stop_one()?

>
>
> >
> >>
> >>       r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
> >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >> index c5ed7a3..3ef0199 100644
> >> --- a/hw/virtio/vhost-vdpa.c
> >> +++ b/hw/virtio/vhost-vdpa.c
> >> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> >>
> >>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
> >>       trace_vhost_vdpa_reset_device(dev, status);
> >> +
> >> +    /* Add back this status, so that the device could work next time*/
> >> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >> +                               VIRTIO_CONFIG_S_DRIVER);
> >
> > This seems to contradict the semantic of reset
>
> Yes, but it's hard to put it in another place; it seems only vhost-vdpa
> needs it, and on VM shutdown, qemu_del_nic() will do cleanup, such as
> closing the vhost fds, which will call reset in kernel space without
> setting those status bits.
>
> So in the end I put it here, with no other impact.

Can we move this to the suitable caller of this function?

Thanks

>
> Thanks,
> Michael
> >
> >> +
> >>       return ret;
> >>   }
> >>
> >> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
> >>       return idx;
> >>   }
> >>
> >> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
> >>   {
> >>       int i;
> >>       trace_vhost_vdpa_set_vring_ready(dev);
> >>       for (i = 0; i < dev->nvqs; ++i) {
> >>           struct vhost_vring_state state = {
> >>               .index = dev->vq_index + i,
> >> -            .num = 1,
> >> +            .num = ready,
> >>           };
> >>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> >>       }
> >> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >>           if (unlikely(!ok)) {
> >>               return -1;
> >>           }
> >> -        vhost_vdpa_set_vring_ready(dev);
> >> +        vhost_vdpa_set_vring_ready(dev, 1);
> >>       } else {
> >> +        vhost_vdpa_set_vring_ready(dev, 0);
> >>           ok = vhost_vdpa_svqs_stop(dev);
> >>           if (unlikely(!ok)) {
> >>               return -1;
> >> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >>           memory_listener_register(&v->listener, &address_space_memory);
> >>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> >>       } else {
> >> -        vhost_vdpa_reset_device(dev);
> >> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >> -                                   VIRTIO_CONFIG_S_DRIVER);
> >>           memory_listener_unregister(&v->listener);
> >>
> >>           return 0;
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index b643f42..7e0cdb6 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -1820,7 +1820,6 @@ fail_features:
> >>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
> >>   {
> >>       int i;
> >> -
> >
> > Unnecessary changes.
> >
> >>       /* should only be called after backend is connected */
> >>       assert(hdev->vhost_ops);
> >>
> >> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
> >>
> >>       return -ENOSYS;
> >>   }
> >> +
> >> +int vhost_dev_reset(struct vhost_dev *hdev)
> >> +{
> >
> > Let's use a separate patch for this.
> >
> > Thanks
> >
> >> +    int ret = 0;
> >> +
> >> +    /* should only be called after backend is connected */
> >> +    assert(hdev->vhost_ops);
> >> +
> >> +    if (hdev->vhost_ops->vhost_reset_device) {
> >> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
> >> +    }
> >> +
> >> +    return ret;
> >> +}
> >> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> >> index 58a73e7..b8b7c20 100644
> >> --- a/include/hw/virtio/vhost.h
> >> +++ b/include/hw/virtio/vhost.h
> >> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> >>   void vhost_dev_cleanup(struct vhost_dev *hdev);
> >>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
> >>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> >> +int vhost_dev_reset(struct vhost_dev *hdev);
> >>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
> >>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
> >>
> >> --
> >> 1.8.3.1
> >>
> >>
> >>
> >
> >
> >
>
>
>




* Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps
  2022-04-02  0:44           ` Si-Wei Liu
@ 2022-04-02  2:08             ` Michael Qiu
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-02  2:08 UTC (permalink / raw)
  To: Si-Wei Liu, jasowang, mst; +Cc: eperezma, lingshan.zhu, qemu-devel, lulu



On 2022/4/2 8:44, Si-Wei Liu wrote:
> 
> 
> On 4/1/2022 4:06 AM, Michael Qiu wrote:
>> Currently in the vhost framework, vhost_reset_device() is misnamed.
>> Actually, it should be vhost_reset_owner().
>>
>> In vhost-user it is kept compatible with the reset device op, but
>> the vhost kernel backend is not compatible with it; for vhost-vdpa, it
>> only implements the reset device action.
>>
>> So we need to separate the function into vhost_reset_owner() and
>> vhost_reset_device(), so that different backends can use the
>> correct function.
>>
>> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
>> ---
>>   hw/scsi/vhost-user-scsi.c         |  6 +++++-
>>   hw/virtio/vhost-backend.c         |  4 ++--
>>   hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
>>   include/hw/virtio/vhost-backend.h |  2 ++
>>   4 files changed, 27 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
>> index 1b2f7ee..f179626 100644
>> --- a/hw/scsi/vhost-user-scsi.c
>> +++ b/hw/scsi/vhost-user-scsi.c
>> @@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
>>           return;
>>       }
>> -    if (dev->vhost_ops->vhost_reset_device) {
>> +    if (virtio_has_feature(dev->protocol_features,
>> +                           VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
> This line change is not needed. VHOST_USER_PROTOCOL_F_RESET_DEVICE is 
> guaranteed to be available if we get here.
>> +                           dev->vhost_ops->vhost_reset_device) {
>>           dev->vhost_ops->vhost_reset_device(dev);
>> +    } else if (dev->vhost_ops->vhost_reset_owner) {
>> +        dev->vhost_ops->vhost_reset_owner(dev);
> Nope, drop these two lines. The caller of vhost_user_scsi_reset() 
> doesn't expect vhost_reset_owner to be called in case vhost_reset_device 
> is not implemented.
> 

You are right, I will drop these two lines and remove the 
VHOST_USER_PROTOCOL_F_RESET_DEVICE check.


>>       }
>>   }
>> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
>> index e409a86..abbaa8b 100644
>> --- a/hw/virtio/vhost-backend.c
>> +++ b/hw/virtio/vhost-backend.c
>> @@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev 
>> *dev)
>>       return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
>>   }
>> -static int vhost_kernel_reset_device(struct vhost_dev *dev)
>> +static int vhost_kernel_reset_owner(struct vhost_dev *dev)
>>   {
>>       return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
>>   }
>> @@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
>>           .vhost_get_features = vhost_kernel_get_features,
>>           .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
>>           .vhost_set_owner = vhost_kernel_set_owner,
>> -        .vhost_reset_device = vhost_kernel_reset_device,
>> +        .vhost_reset_owner = vhost_kernel_reset_owner,
>>           .vhost_get_vq_index = vhost_kernel_get_vq_index,
>>   #ifdef CONFIG_VHOST_VSOCK
>>           .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> index 6abbc9d..4412008 100644
>> --- a/hw/virtio/vhost-user.c
>> +++ b/hw/virtio/vhost-user.c
>> @@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct 
>> vhost_dev *dev,
>>       return 0;
>>   }
>> +static int vhost_user_reset_owner(struct vhost_dev *dev)
>> +{
>> +    VhostUserMsg msg = {
>> +        .hdr.request = VHOST_USER_RESET_OWNER,
>> +        .hdr.flags = VHOST_USER_VERSION,
>> +    };
>> +
>> +    return vhost_user_write(dev, &msg, NULL, 0);
>> +}
>> +
>>   static int vhost_user_reset_device(struct vhost_dev *dev)
>>   {
>>       VhostUserMsg msg = {
>> +        .hdr.request = VHOST_USER_RESET_DEVICE,
>>           .hdr.flags = VHOST_USER_VERSION,
>>       };
>> -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
>> -                                         
>> VHOST_USER_PROTOCOL_F_RESET_DEVICE)
>> -        ? VHOST_USER_RESET_DEVICE
>> -        : VHOST_USER_RESET_OWNER;
>> +    /* Caller must ensure the backend has 
>> VHOST_USER_PROTOCOL_F_RESET_DEVICE
>> +     * support */
>> +    if (!virtio_has_feature(dev->protocol_features,
>> +                       VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
>> +        return -EPERM;
>> +    }
> I think we can safely remove this check, since the caller already 
> guarantees VHOST_USER_PROTOCOL_F_RESET_DEVICE is around as what your 
> comment mentions.
> 

I think it is probably worth checking, because vhost_net_stop() does 
not check this flag; otherwise we would have to check whether the backend 
is vhost-user with this flag enabled.

> The previous branch condition is to reuse the vhost_reset_device op for 
> two different ends, but there's no actual user for 
> VHOST_USER_RESET_OWNER historically.
> 
> Thanks,
> -Siwei
> 
>>       return vhost_user_write(dev, &msg, NULL, 0);
>>   }
>> @@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
>>           .vhost_set_features = vhost_user_set_features,
>>           .vhost_get_features = vhost_user_get_features,
>>           .vhost_set_owner = vhost_user_set_owner,
>> +        .vhost_reset_owner = vhost_user_reset_owner,
>>           .vhost_reset_device = vhost_user_reset_device,
>>           .vhost_get_vq_index = vhost_user_get_vq_index,
>>           .vhost_set_vring_enable = vhost_user_set_vring_enable,
>> diff --git a/include/hw/virtio/vhost-backend.h 
>> b/include/hw/virtio/vhost-backend.h
>> index 81bf310..affeeb0 100644
>> --- a/include/hw/virtio/vhost-backend.h
>> +++ b/include/hw/virtio/vhost-backend.h
>> @@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct 
>> vhost_dev *dev,
>>                                        uint64_t *features);
>>   typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
>>   typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
>> +typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);
>>   typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
>>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
>>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
>> @@ -150,6 +151,7 @@ typedef struct VhostOps {
>>       vhost_get_features_op vhost_get_features;
>>       vhost_set_backend_cap_op vhost_set_backend_cap;
>>       vhost_set_owner_op vhost_set_owner;
>> +    vhost_reset_owner_op vhost_reset_owner;
>>       vhost_reset_device_op vhost_reset_device;
>>       vhost_get_vq_index_op vhost_get_vq_index;
>>       vhost_set_vring_enable_op vhost_set_vring_enable;
> 
> 



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-01 23:07         ` Si-Wei Liu
@ 2022-04-02  2:20           ` Jason Wang
  2022-04-02  3:53             ` Michael Qiu
                               ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Jason Wang @ 2022-04-02  2:20 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: Cindy Lu, mst, qemu-devel, eperezma, Michael Qiu, Zhu Lingshan

Adding Michael.

On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 3/31/2022 7:53 PM, Jason Wang wrote:
> > On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
> >> Currently, when VM poweroff, it will trigger vdpa
> >> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
> >> queue pair and one control queue, triggered 3 times), this
> >> leads to below issue:
> >>
> >> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
> >>
> >> This because in vhost_net_stop(), it will stop all vhost device bind to
> >> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device
> >> , then stop the queue: vhost_virtqueue_stop().
> >>
> >> In vhost_dev_stop(), it resets the device, which clear some flags
> >> in low level driver, and in next loop(stop other vhost backends),
> >> qemu try to stop the queue corresponding to the vhost backend,
> >>   the driver finds that the VQ is invalied, this is the root cause.
> >>
> >> To solve the issue, vdpa should set vring unready, and
> >> remove reset ops in device stop: vhost_dev_start(hdev, false).
> >>
> >> and implement a new function vhost_dev_reset, only reset backend
> >> device after all vhost(per-queue) stoped.
> > Typo.
> >
> >> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
> >> Acked-by: Jason Wang <jasowang@redhat.com>
> > Rethink this patch, consider there're devices that don't support
> > set_vq_ready(). I wonder if we need
> >
> > 1) uAPI to tell the user space whether or not it supports set_vq_ready()
> I guess what's more relevant here is to define the uAPI semantics for
> unready i.e. set_vq_ready(0) for resuming/stopping virtqueue processing,
> as starting vq is comparatively less ambiguous.

Yes.

> Considering the
> likelihood that this interface may be used for live migration, it would
> be nice to come up with variants such as 1) discard inflight request
> v.s. 2) waiting for inflight processing to be done,

Or inflight descriptor reporting (which seems to be tricky). But we
can start with net, where simply discarding inflight requests may just work.

>and 3) timeout in
> waiting.

Actually, that's the plan and Eugenio is proposing something like this
via virtio spec:

https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html

>
> > 2) userspace will call SET_VRING_ENABLE() when the device supports
> > otherwise it will use RESET.
> Are you looking to making virtqueue resume-able through the new
> SET_VRING_ENABLE() uAPI?
>
> I think RESET is inevitable in some case, i.e. when guest initiates
> device reset by writing 0 to the status register.

Yes, that's all my plan.

> For suspend/resume and
> live migration use cases, indeed RESET can be substituted with
> SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
> accommodate this change. Although I'm all for it, it'd be the best to
> lay out the plan for multiple phases rather than overload this single
> patch too much. You can count my time on this endeavor if you don't mind. :)

You're welcome, I agree we should choose a way to go first:

1) manage to use SET_VRING_ENABLE (more like a workaround anyway)
2) go with virtio-spec (may take a while)
3) don't wait for the spec, have a vDPA specific uAPI first. Note that
I've chatted with most of the vendors and they seem to be fine with
the _S_STOP. If we go this way, we can still provide the forward
compatibility of _S_STOP
4) or do them all (in parallel)

Any thoughts?

Thanks

>
> >
> > And for safety, I suggest tagging this as 7.1.
> +1
>
> Regards,
> -Siwei
>
> >
> >> ---
> >> v4 --> v3
> >>      Nothing changed, becasue of issue with mimecast,
> >>      when the From: tag is different from the sender,
> >>      the some mail client will take the patch as an
> >>      attachment, RESEND v3 does not work, So resend
> >>      the patch as v4
> >>
> >> v3 --> v2:
> >>      Call vhost_dev_reset() at the end of vhost_net_stop().
> >>
> >>      Since the vDPA device need re-add the status bit
> >>      VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
> >>      simply, add them inside vhost_vdpa_reset_device, and
> >>      the only way calling vhost_vdpa_reset_device is in
> >>      vhost_net_stop(), so it keeps the same behavior as before.
> >>
> >> v2 --> v1:
> >>     Implement a new function vhost_dev_reset,
> >>     reset the backend kernel device at last.
> >> ---
> >>   hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
> >>   hw/virtio/vhost-vdpa.c    | 15 +++++++++------
> >>   hw/virtio/vhost.c         | 15 ++++++++++++++-
> >>   include/hw/virtio/vhost.h |  1 +
> >>   4 files changed, 45 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >> index 30379d2..422c9bf 100644
> >> --- a/hw/net/vhost_net.c
> >> +++ b/hw/net/vhost_net.c
> >> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>       int total_notifiers = data_queue_pairs * 2 + cvq;
> >>       VirtIONet *n = VIRTIO_NET(dev);
> >>       int nvhosts = data_queue_pairs + cvq;
> >> -    struct vhost_net *net;
> >> +    struct vhost_net *net = NULL;
> >>       int r, e, i, index_end = data_queue_pairs * 2;
> >>       NetClientState *peer;
> >>
> >> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>   err_start:
> >>       while (--i >= 0) {
> >>           peer = qemu_get_peer(ncs , i);
> >> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> >> +
> >> +        net = get_vhost_net(peer);
> >> +
> >> +        vhost_net_stop_one(net, dev);
> >>       }
> >> +
> >> +    /* We only reset backend vdpa device */
> >> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> >> +        vhost_dev_reset(&net->dev);
> >> +    }
> >> +
> >>       e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
> >>       if (e < 0) {
> >>           fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
> >> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
> >>       VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> >>       VirtIONet *n = VIRTIO_NET(dev);
> >>       NetClientState *peer;
> >> +    struct vhost_net *net = NULL;
> >>       int total_notifiers = data_queue_pairs * 2 + cvq;
> >>       int nvhosts = data_queue_pairs + cvq;
> >>       int i, r;
> >> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
> >>           } else {
> >>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
> >>           }
> >> -        vhost_net_stop_one(get_vhost_net(peer), dev);
> >> +
> >> +        net = get_vhost_net(peer);
> >> +
> >> +        vhost_net_stop_one(net, dev);
> >> +    }
> >> +
> >> +    /* We only reset backend vdpa device */
> >> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> >> +        vhost_dev_reset(&net->dev);
> >>       }
> > So we've already reset the device in vhost_vdpa_dev_start(), any
> > reason we need to do it again here?
> >
> >>       r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
> >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >> index c5ed7a3..3ef0199 100644
> >> --- a/hw/virtio/vhost-vdpa.c
> >> +++ b/hw/virtio/vhost-vdpa.c
> >> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> >>
> >>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
> >>       trace_vhost_vdpa_reset_device(dev, status);
> >> +
> >> +    /* Add back this status, so that the device could work next time*/
> >> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >> +                               VIRTIO_CONFIG_S_DRIVER);
> > This seems to contradict the semantic of reset.
> >
> >> +
> >>       return ret;
> >>   }
> >>
> >> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
> >>       return idx;
> >>   }
> >>
> >> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
> >>   {
> >>       int i;
> >>       trace_vhost_vdpa_set_vring_ready(dev);
> >>       for (i = 0; i < dev->nvqs; ++i) {
> >>           struct vhost_vring_state state = {
> >>               .index = dev->vq_index + i,
> >> -            .num = 1,
> >> +            .num = ready,
> >>           };
> >>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> >>       }
> >> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >>           if (unlikely(!ok)) {
> >>               return -1;
> >>           }
> >> -        vhost_vdpa_set_vring_ready(dev);
> >> +        vhost_vdpa_set_vring_ready(dev, 1);
> >>       } else {
> >> +        vhost_vdpa_set_vring_ready(dev, 0);
> >>           ok = vhost_vdpa_svqs_stop(dev);
> >>           if (unlikely(!ok)) {
> >>               return -1;
> >> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >>           memory_listener_register(&v->listener, &address_space_memory);
> >>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> >>       } else {
> >> -        vhost_vdpa_reset_device(dev);
> >> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >> -                                   VIRTIO_CONFIG_S_DRIVER);
> >>           memory_listener_unregister(&v->listener);
> >>
> >>           return 0;
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index b643f42..7e0cdb6 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -1820,7 +1820,6 @@ fail_features:
> >>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
> >>   {
> >>       int i;
> >> -
> > Unnecessary changes.
> >
> >>       /* should only be called after backend is connected */
> >>       assert(hdev->vhost_ops);
> >>
> >> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
> >>
> >>       return -ENOSYS;
> >>   }
> >> +
> >> +int vhost_dev_reset(struct vhost_dev *hdev)
> >> +{
> > Let's use a separate patch for this.
> >
> > Thanks
> >
> >> +    int ret = 0;
> >> +
> >> +    /* should only be called after backend is connected */
> >> +    assert(hdev->vhost_ops);
> >> +
> >> +    if (hdev->vhost_ops->vhost_reset_device) {
> >> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
> >> +    }
> >> +
> >> +    return ret;
> >> +}
> >> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> >> index 58a73e7..b8b7c20 100644
> >> --- a/include/hw/virtio/vhost.h
> >> +++ b/include/hw/virtio/vhost.h
> >> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> >>   void vhost_dev_cleanup(struct vhost_dev *hdev);
> >>   int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
> >>   void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> >> +int vhost_dev_reset(struct vhost_dev *hdev);
> >>   int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
> >>   void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
> >>
> >> --
> >> 1.8.3.1
> >>
> >>
> >>
>




* Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps
  2022-04-01 11:06         ` [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps Michael Qiu
  2022-04-02  0:44           ` Si-Wei Liu
@ 2022-04-02  2:38           ` Jason Wang
  2022-04-02  5:14             ` Michael Qiu
       [not found]             ` <6247dc22.1c69fb81.4244.a88bSMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 2 replies; 47+ messages in thread
From: Jason Wang @ 2022-04-02  2:38 UTC (permalink / raw)
  To: Michael Qiu, mst, si-wei.liu; +Cc: eperezma, lingshan.zhu, qemu-devel, lulu


On 2022/4/1 7:06 PM, Michael Qiu wrote:
> Currently in vhost framwork, vhost_reset_device() is misnamed.
> Actually, it should be vhost_reset_owner().
>
> In vhost user, it make compatible with reset device ops, but
> vhost kernel does not compatible with it, for vhost vdpa, it
> only implement reset device action.
>
> So we need seperate the function into vhost_reset_owner() and
> vhost_reset_device(). So that different backend could use the
> correct function.


I see no reason why RESET_OWNER needs to be done for the kernel backend.

And if I understand the code correctly, vhost-user "abuse" RESET_OWNER 
for reset. So the current code looks fine?


>
> Signde-off-by: Michael Qiu <qiudayu@archeros.com>
> ---
>   hw/scsi/vhost-user-scsi.c         |  6 +++++-
>   hw/virtio/vhost-backend.c         |  4 ++--
>   hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
>   include/hw/virtio/vhost-backend.h |  2 ++
>   4 files changed, 27 insertions(+), 7 deletions(-)
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index 1b2f7ee..f179626 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
>           return;
>       }
>   
> -    if (dev->vhost_ops->vhost_reset_device) {
> +    if (virtio_has_feature(dev->protocol_features,
> +                           VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
> +                           dev->vhost_ops->vhost_reset_device) {
>           dev->vhost_ops->vhost_reset_device(dev);
> +    } else if (dev->vhost_ops->vhost_reset_owner) {
> +        dev->vhost_ops->vhost_reset_owner(dev);


Actually, I fail to understand why we need an indirection via vhost_ops. 
It's guaranteed to be vhost_user_ops.


>       }
>   }
>   
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index e409a86..abbaa8b 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
>       return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
>   }
>   
> -static int vhost_kernel_reset_device(struct vhost_dev *dev)
> +static int vhost_kernel_reset_owner(struct vhost_dev *dev)
>   {
>       return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
>   }
> @@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
>           .vhost_get_features = vhost_kernel_get_features,
>           .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
>           .vhost_set_owner = vhost_kernel_set_owner,
> -        .vhost_reset_device = vhost_kernel_reset_device,
> +        .vhost_reset_owner = vhost_kernel_reset_owner,


I think we can delete the current vhost_reset_device() since it is not used 
in any code path.

Thanks


>           .vhost_get_vq_index = vhost_kernel_get_vq_index,
>   #ifdef CONFIG_VHOST_VSOCK
>           .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 6abbc9d..4412008 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct vhost_dev *dev,
>       return 0;
>   }
>   
> +static int vhost_user_reset_owner(struct vhost_dev *dev)
> +{
> +    VhostUserMsg msg = {
> +        .hdr.request = VHOST_USER_RESET_OWNER,
> +        .hdr.flags = VHOST_USER_VERSION,
> +    };
> +
> +    return vhost_user_write(dev, &msg, NULL, 0);
> +}
> +
>   static int vhost_user_reset_device(struct vhost_dev *dev)
>   {
>       VhostUserMsg msg = {
> +        .hdr.request = VHOST_USER_RESET_DEVICE,
>           .hdr.flags = VHOST_USER_VERSION,
>       };
>   
> -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
> -                                         VHOST_USER_PROTOCOL_F_RESET_DEVICE)
> -        ? VHOST_USER_RESET_DEVICE
> -        : VHOST_USER_RESET_OWNER;
> +    /* Caller must ensure the backend has VHOST_USER_PROTOCOL_F_RESET_DEVICE
> +     * support */
> +    if (!virtio_has_feature(dev->protocol_features,
> +                       VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
> +        return -EPERM;
> +    }
>   
>       return vhost_user_write(dev, &msg, NULL, 0);
>   }
> @@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
>           .vhost_set_features = vhost_user_set_features,
>           .vhost_get_features = vhost_user_get_features,
>           .vhost_set_owner = vhost_user_set_owner,
> +        .vhost_reset_owner = vhost_user_reset_owner,
>           .vhost_reset_device = vhost_user_reset_device,
>           .vhost_get_vq_index = vhost_user_get_vq_index,
>           .vhost_set_vring_enable = vhost_user_set_vring_enable,
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 81bf310..affeeb0 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
>                                        uint64_t *features);
>   typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
>   typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
> +typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);
>   typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
> @@ -150,6 +151,7 @@ typedef struct VhostOps {
>       vhost_get_features_op vhost_get_features;
>       vhost_set_backend_cap_op vhost_set_backend_cap;
>       vhost_set_owner_op vhost_set_owner;
> +    vhost_reset_owner_op vhost_reset_owner;
>       vhost_reset_device_op vhost_reset_device;
>       vhost_get_vq_index_op vhost_get_vq_index;
>       vhost_set_vring_enable_op vhost_set_vring_enable;




* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-02  1:48           ` Jason Wang
@ 2022-04-02  3:43             ` Michael Qiu
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-02  3:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: Cindy Lu, mst, qemu-devel, eperezma, Si-Wei Liu, Zhu Lingshan



On 2022/4/2 9:48, Jason Wang wrote:
> On Fri, Apr 1, 2022 at 11:22 AM Michael Qiu <qiudayu@archeros.com> wrote:
>>
>>
>>
>> On 2022/4/1 10:53, Jason Wang wrote:
>>> On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
>>>>
>>>> Currently, when VM poweroff, it will trigger vdpa
>>>> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
>>>> queue pair and one control queue, triggered 3 times), this
>>>> leads to below issue:
>>>>
>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>
>>>> This because in vhost_net_stop(), it will stop all vhost device bind to
>>>> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device
>>>> , then stop the queue: vhost_virtqueue_stop().
>>>>
>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>> in low level driver, and in next loop(stop other vhost backends),
>>>> qemu try to stop the queue corresponding to the vhost backend,
>>>>    the driver finds that the VQ is invalied, this is the root cause.
>>>>
>>>> To solve the issue, vdpa should set vring unready, and
>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>>>
>>>> and implement a new function vhost_dev_reset, only reset backend
>>>> device after all vhost(per-queue) stoped.
>>>
>>> Typo.
>>>
>>>>
>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>
>>> Rethink this patch, consider there're devices that don't support
>>> set_vq_ready(). I wonder if we need
>>>
>>> 1) uAPI to tell the user space whether or not it supports set_vq_ready()
>>> 2) userspace will call SET_VRING_ENABLE() when the device supports
>>> otherwise it will use RESET.
>>
>> if the device does not support set_vq_ready() in kernel, it will trigger
>> kernel oops, at least in current kernel, it does not check where
>> set_vq_ready has been implemented.
>>
>> And I checked all vdpa driver in kernel, all drivers has implemented
>> this ops.
> 
> Actually, it's not about whether or not the set_vq_ready() is
> implemented. It's about whether the parent supports it correctly:
> 
> The ability to suspend and resume a virtqueue is currently beyond the
ability of some transports (e.g. PCI).
> 

OK, got it.

> For IFCVF:
> 
> static void ifcvf_vdpa_set_vq_ready(struct vdpa_device *vdpa_dev,
>                                      u16 qid, bool ready)
> {
>          struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
> 
>          vf->vring[qid].ready = ready;
> }
> 
> It seems to follow the PCI transport, so if you just set it to zero,
> it simply doesn't work at all. I can see some tricks that are used in
> the DPDK driver, maybe we can use the same to "fix" this.
> 
> For VDUSE, we are basically the same:
> 
> static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa,
>                                          u16 idx, bool ready)
> {
>          struct vduse_dev *dev = vdpa_to_vduse(vdpa);
>          struct vduse_virtqueue *vq = &dev->vqs[idx];
> 
>          vq->ready = ready;
> }
> 
> It can't be stopped correctly if we just set it to zero.
> 
> For vp_vdpa, it basically wants to abuse the queue_enable, which may
> result in a warning in Qemu (and the device isn't stopped).
> 
> static void vp_vdpa_set_vq_ready(struct vdpa_device *vdpa,
>                                   u16 qid, bool ready)
> {
>          struct virtio_pci_modern_device *mdev = vdpa_to_mdev(vdpa);
> 
>          vp_modern_set_queue_enable(mdev, qid, ready);
> }
> 
> ENI did a trick in writing 0 to virtqueue address, so it works for
> stop but not the start.
> 
> static void eni_vdpa_set_vq_ready(struct vdpa_device *vdpa, u16 qid,
>                                    bool ready)
> {
>          struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
> 
>          /* ENI is a legacy virtio-pci device. This is not supported
>           * by specification. But we can disable virtqueue by setting
>           * address to 0.
>           */
>          if (!ready)
>                  vp_legacy_set_queue_address(ldev, qid, 0);
> }
> 
> mlx5 call suspend_vq() which should be fine.
> 
> Simulator is probably fine.
> 
> So I worry if we use the set_vq_ready(0) it won't work correctly and
> will have other issues. The idea is:
> 
> - advertise the suspend/resume capability via uAPI, then mlx5_vdpa and
> simulator can go with set_vq_ready()
> - others can still go with reset(), and we can try to fix them
> gradually (and introduce this in the virtio spec).
> 

Totally agree.

>>
>> So I think it is OK to call set_vq_ready without check.
>>
>>>
>>> And for safety, I suggest tagging this as 7.1.
>>>
>>>> ---
>>>> v4 --> v3
>>>>       Nothing changed, becasue of issue with mimecast,
>>>>       when the From: tag is different from the sender,
>>>>       the some mail client will take the patch as an
>>>>       attachment, RESEND v3 does not work, So resend
>>>>       the patch as v4
>>>>
>>>> v3 --> v2:
>>>>       Call vhost_dev_reset() at the end of vhost_net_stop().
>>>>
>>>>       Since the vDPA device need re-add the status bit
>>>>       VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>>>>       simply, add them inside vhost_vdpa_reset_device, and
>>>>       the only way calling vhost_vdpa_reset_device is in
>>>>       vhost_net_stop(), so it keeps the same behavior as before.
>>>>
>>>> v2 --> v1:
>>>>      Implement a new function vhost_dev_reset,
>>>>      reset the backend kernel device at last.
>>>> ---
>>>>    hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>>>>    hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>>>>    hw/virtio/vhost.c         | 15 ++++++++++++++-
>>>>    include/hw/virtio/vhost.h |  1 +
>>>>    4 files changed, 45 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>> index 30379d2..422c9bf 100644
>>>> --- a/hw/net/vhost_net.c
>>>> +++ b/hw/net/vhost_net.c
>>>> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>        int nvhosts = data_queue_pairs + cvq;
>>>> -    struct vhost_net *net;
>>>> +    struct vhost_net *net = NULL;
>>>>        int r, e, i, index_end = data_queue_pairs * 2;
>>>>        NetClientState *peer;
>>>>
>>>> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>    err_start:
>>>>        while (--i >= 0) {
>>>>            peer = qemu_get_peer(ncs , i);
>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>> +
>>>> +        net = get_vhost_net(peer);
>>>> +
>>>> +        vhost_net_stop_one(net, dev);
>>>>        }
>>>> +
>>>> +    /* We only reset backend vdpa device */
>>>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>>>> +        vhost_dev_reset(&net->dev);
>>>> +    }
>>>> +
>>>>        e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>>>        if (e < 0) {
>>>>            fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
>>>> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>        NetClientState *peer;
>>>> +    struct vhost_net *net = NULL;
>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>        int nvhosts = data_queue_pairs + cvq;
>>>>        int i, r;
>>>> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>            } else {
>>>>                peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>>            }
>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>> +
>>>> +        net = get_vhost_net(peer);
>>>> +
>>>> +        vhost_net_stop_one(net, dev);
>>>> +    }
>>>> +
>>>> +    /* We only reset backend vdpa device */
>>>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>>>> +        vhost_dev_reset(&net->dev);
>>>>        }
>>>
>>> So we've already reset the device in vhost_vdpa_dev_start(), any
>>> reason we need to do it again here?
>>
>> reset device in vhost_vdpa_dev_start if there is some error with start.
> 
> The rest should have been done in vhost_net_stop_one()?
> 
>>
>>
>>>
>>>>
>>>>        r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index c5ed7a3..3ef0199 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>>>>
>>>>        ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>>>        trace_vhost_vdpa_reset_device(dev, status);
>>>> +
>>>> +    /* Add back this status, so that the device could work next time*/
>>>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> +                               VIRTIO_CONFIG_S_DRIVER);
>>>
>>> This seems to contradict the semantic of reset
>>
>> Yes, but it's hard to put it in other place, seems only vhost-vdpa need
>> it, and for VM shutdown, qemu_del_nic() will do cleanup this like close
>> vhost fds, which will call reset in kernel space without set those features.
>>
>> So at last I put it here with no other inpact.
> 
> Can we move this to the suitable caller of this function?
> 

This is a vhost-vdpa backend-specific requirement; if we move it to the 
caller, we need a backend check after each vhost_dev_reset() call.

Otherwise we need a new vhost API, vhost_add_status(); that is a bit
complex because we should consider live migration and error recovery, as 
Si-Wei mentioned.

Could we just leave it here and move it to the right place next time?

Thanks,
Michael
> Thanks
> 
>>
>> Thanks,
>> Michael
>>>
>>>> +
>>>>        return ret;
>>>>    }
>>>>
>>>> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>>>        return idx;
>>>>    }
>>>>
>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>>>    {
>>>>        int i;
>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>            struct vhost_vring_state state = {
>>>>                .index = dev->vq_index + i,
>>>> -            .num = 1,
>>>> +            .num = ready,
>>>>            };
>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>        }
>>>> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>            if (unlikely(!ok)) {
>>>>                return -1;
>>>>            }
>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>        } else {
>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>            ok = vhost_vdpa_svqs_stop(dev);
>>>>            if (unlikely(!ok)) {
>>>>                return -1;
>>>> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>            memory_listener_register(&v->listener, &address_space_memory);
>>>>            return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>>>        } else {
>>>> -        vhost_vdpa_reset_device(dev);
>>>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>>>            memory_listener_unregister(&v->listener);
>>>>
>>>>            return 0;
>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>> index b643f42..7e0cdb6 100644
>>>> --- a/hw/virtio/vhost.c
>>>> +++ b/hw/virtio/vhost.c
>>>> @@ -1820,7 +1820,6 @@ fail_features:
>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>>>    {
>>>>        int i;
>>>> -
>>>
>>> Unnecessary changes.
>>>
>>>>        /* should only be called after backend is connected */
>>>>        assert(hdev->vhost_ops);
>>>>
>>>> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>>>>
>>>>        return -ENOSYS;
>>>>    }
>>>> +
>>>> +int vhost_dev_reset(struct vhost_dev *hdev)
>>>> +{
>>>
>>> Let's use a separate patch for this.
>>>
>>> Thanks
>>>
>>>> +    int ret = 0;
>>>> +
>>>> +    /* should only be called after backend is connected */
>>>> +    assert(hdev->vhost_ops);
>>>> +
>>>> +    if (hdev->vhost_ops->vhost_reset_device) {
>>>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>>>> +    }
>>>> +
>>>> +    return ret;
>>>> +}
>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>>> index 58a73e7..b8b7c20 100644
>>>> --- a/include/hw/virtio/vhost.h
>>>> +++ b/include/hw/virtio/vhost.h
>>>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>>>>    void vhost_dev_cleanup(struct vhost_dev *hdev);
>>>>    int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>>>    int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>    void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread
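The failure mode discussed above (device reset clearing low-level driver state before all per-queue stops have run) can be illustrated with a small standalone model. This is a hypothetical sketch, not QEMU or kernel code; all names (`toy_vdpa`, `toy_stop_vq`, etc.) are invented for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NVQS 3

/* Toy model of a vDPA backend: a device-level "configured" flag that a
 * device reset clears, plus per-queue ready flags. */
struct toy_vdpa {
    bool configured;         /* cleared by toy_reset() */
    bool vq_ready[NVQS];
};

static void toy_start(struct toy_vdpa *d)
{
    d->configured = true;
    for (int i = 0; i < NVQS; i++) {
        d->vq_ready[i] = true;
    }
}

static void toy_reset(struct toy_vdpa *d)
{
    d->configured = false;   /* low-level driver state is gone */
}

/* Stopping (restoring) a queue only works while the device is still
 * configured; otherwise the driver sees an invalid VQ, mirroring
 * "vhost VQ 2 ring restore failed: -22: Invalid argument". */
static int toy_stop_vq(struct toy_vdpa *d, int i)
{
    if (!d->configured) {
        return -EINVAL;
    }
    d->vq_ready[i] = false;
    return 0;
}

/* Buggy sequence: reset happens inside the per-queue stop loop. */
static int stop_with_early_reset(struct toy_vdpa *d)
{
    toy_stop_vq(d, 0);
    toy_reset(d);              /* first vhost_dev_stop() resets */
    return toy_stop_vq(d, 1);  /* stopping the next queue now fails */
}

/* Fixed sequence: set all vrings unready first, reset only at the end. */
static int stop_with_late_reset(struct toy_vdpa *d)
{
    int r = 0;
    for (int i = 0; i < NVQS && r == 0; i++) {
        r = toy_stop_vq(d, i);
    }
    toy_reset(d);
    return r;
}
```

Under this simplified model, resetting early makes every later queue stop fail with -EINVAL, while deferring the reset until after all queues are stopped succeeds, which is the ordering the patch moves toward.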

* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-02  2:20           ` Jason Wang
@ 2022-04-02  3:53             ` Michael Qiu
  2022-04-06  0:56             ` Si-Wei Liu
       [not found]             ` <6247c8f5.1c69fb81.848e0.8b49SMTPIN_ADDED_BROKEN@mx.google.com>
  2 siblings, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-02  3:53 UTC (permalink / raw)
  To: Jason Wang, Si-Wei Liu; +Cc: eperezma, Zhu Lingshan, qemu-devel, Cindy Lu, mst



On 2022/4/2 10:20, Jason Wang wrote:
> Adding Michael.
> 
> On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>>
>> On 3/31/2022 7:53 PM, Jason Wang wrote:
>>> On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
>>>> Currently, when VM poweroff, it will trigger vdpa
>>>> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
>>>> queue pair and one control queue, triggered 3 times), this
>>>> leads to below issue:
>>>>
>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>
>>>> This because in vhost_net_stop(), it will stop all vhost device bind to
>>>> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device
>>>> , then stop the queue: vhost_virtqueue_stop().
>>>>
>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>> in low level driver, and in next loop(stop other vhost backends),
>>>> qemu try to stop the queue corresponding to the vhost backend,
>>>>    the driver finds that the VQ is invalied, this is the root cause.
>>>>
>>>> To solve the issue, vdpa should set vring unready, and
>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>>>
>>>> and implement a new function vhost_dev_reset, only reset backend
>>>> device after all vhost(per-queue) stoped.
>>> Typo.
>>>
>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>> Rethink this patch, consider there're devices that don't support
>>> set_vq_ready(). I wonder if we need
>>>
>>> 1) uAPI to tell the user space whether or not it supports set_vq_ready()
>> I guess what's more relevant here is to define the uAPI semantics for
>> unready i.e. set_vq_ready(0) for resuming/stopping virtqueue processing,
>> as starting vq is comparatively less ambiguous.
> 
> Yes.
> 
>> Considering the
>> likelihood that this interface may be used for live migration, it would
>> be nice to come up with variants such as 1) discard inflight request
>> v.s. 2) waiting for inflight processing to be done,
> 
> Or inflight descriptor reporting (which seems to be tricky). But we
> can start from net, where discarding may just work.
> 
>> and 3) timeout in
>> waiting.
> 
> Actually, that's the plan and Eugenio is proposing something like this
> via virtio spec:
> 
> https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> 
>>
>>> 2) userspace will call SET_VRING_ENABLE() when the device supports
>>> otherwise it will use RESET.
>> Are you looking to making virtqueue resume-able through the new
>> SET_VRING_ENABLE() uAPI?
>>
>> I think RESET is inevitable in some case, i.e. when guest initiates
>> device reset by writing 0 to the status register.
> 
> Yes, that's all my plan.
> 
>> For suspend/resume and
>> live migration use cases, indeed RESET can be substituted with
>> SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
>> accommodate this change. Although I'm all for it, it'd be the best to
>> lay out the plan for multiple phases rather than overload this single
>> patch too much. You can count my time on this endeavor if you don't mind. :)
> 
> You're welcome, I agree we should choose a way to go first:
> 
> 1) manage to use SET_VRING_ENABLE (more like a workaround anyway)
> 2) go with virtio-spec (may take a while)
> 3) don't wait for the spec, have a vDPA specific uAPI first. Note that
> I've chatted with most of the vendors and they seem to be fine with
> the _S_STOP. If we go this way, we can still provide the forward
> compatibility of _S_STOP
> 4) or do them all (in parallel)
> 
> Any thoughts?
> 

virtio-spec should be long-term, not only because the spec process moves
very slowly, but also because hardware upgrades would be a problem.

For the short term, is it better to take the first option?

Thanks,
Michael
> Thanks
> 
>>
>>>
>>> And for safety, I suggest tagging this as 7.1.
>> +1
>>
>> Regards,
>> -Siwei
>>
>>>
>>>> ---
>>>> v4 --> v3
>>>>       Nothing changed, because of an issue with mimecast:
>>>>       when the From: tag is different from the sender,
>>>>       some mail clients will take the patch as an
>>>>       attachment. RESEND v3 did not work, so resend
>>>>       the patch as v4
>>>>
>>>> v3 --> v2:
>>>>       Call vhost_dev_reset() at the end of vhost_net_stop().
>>>>
>>>>       Since the vDPA device need re-add the status bit
>>>>       VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>>>>       simply, add them inside vhost_vdpa_reset_device, and
>>>>       the only way calling vhost_vdpa_reset_device is in
>>>>       vhost_net_stop(), so it keeps the same behavior as before.
>>>>
>>>> v2 --> v1:
>>>>      Implement a new function vhost_dev_reset,
>>>>      reset the backend kernel device at last.
>>>> ---
>>>>    hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>>>>    hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>>>>    hw/virtio/vhost.c         | 15 ++++++++++++++-
>>>>    include/hw/virtio/vhost.h |  1 +
>>>>    4 files changed, 45 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>> index 30379d2..422c9bf 100644
>>>> --- a/hw/net/vhost_net.c
>>>> +++ b/hw/net/vhost_net.c
>>>> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>        int nvhosts = data_queue_pairs + cvq;
>>>> -    struct vhost_net *net;
>>>> +    struct vhost_net *net = NULL;
>>>>        int r, e, i, index_end = data_queue_pairs * 2;
>>>>        NetClientState *peer;
>>>>
>>>> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>    err_start:
>>>>        while (--i >= 0) {
>>>>            peer = qemu_get_peer(ncs , i);
>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>> +
>>>> +        net = get_vhost_net(peer);
>>>> +
>>>> +        vhost_net_stop_one(net, dev);
>>>>        }
>>>> +
>>>> +    /* We only reset backend vdpa device */
>>>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>>>> +        vhost_dev_reset(&net->dev);
>>>> +    }
>>>> +
>>>>        e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>>>        if (e < 0) {
>>>>            fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
>>>> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>        NetClientState *peer;
>>>> +    struct vhost_net *net = NULL;
>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>        int nvhosts = data_queue_pairs + cvq;
>>>>        int i, r;
>>>> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>            } else {
>>>>                peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>>            }
>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>> +
>>>> +        net = get_vhost_net(peer);
>>>> +
>>>> +        vhost_net_stop_one(net, dev);
>>>> +    }
>>>> +
>>>> +    /* We only reset backend vdpa device */
>>>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>>>> +        vhost_dev_reset(&net->dev);
>>>>        }
>>> So we've already reset the device in vhost_vdpa_dev_start(), any
>>> reason we need to do it again here?
>>>
>>>>        r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index c5ed7a3..3ef0199 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>>>>
>>>>        ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>>>        trace_vhost_vdpa_reset_device(dev, status);
>>>> +
>>>> +    /* Add back this status, so that the device could work next time*/
>>>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> +                               VIRTIO_CONFIG_S_DRIVER);
>>> This seems to contradict the semantic of reset.
>>>
>>>> +
>>>>        return ret;
>>>>    }
>>>>
>>>> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>>>        return idx;
>>>>    }
>>>>
>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>>>    {
>>>>        int i;
>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>            struct vhost_vring_state state = {
>>>>                .index = dev->vq_index + i,
>>>> -            .num = 1,
>>>> +            .num = ready,
>>>>            };
>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>        }
>>>> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>            if (unlikely(!ok)) {
>>>>                return -1;
>>>>            }
>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>        } else {
>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>            ok = vhost_vdpa_svqs_stop(dev);
>>>>            if (unlikely(!ok)) {
>>>>                return -1;
>>>> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>            memory_listener_register(&v->listener, &address_space_memory);
>>>>            return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>>>        } else {
>>>> -        vhost_vdpa_reset_device(dev);
>>>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>>>            memory_listener_unregister(&v->listener);
>>>>
>>>>            return 0;
>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>> index b643f42..7e0cdb6 100644
>>>> --- a/hw/virtio/vhost.c
>>>> +++ b/hw/virtio/vhost.c
>>>> @@ -1820,7 +1820,6 @@ fail_features:
>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>>>    {
>>>>        int i;
>>>> -
>>> Unnecessary changes.
>>>
>>>>        /* should only be called after backend is connected */
>>>>        assert(hdev->vhost_ops);
>>>>
>>>> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>>>>
>>>>        return -ENOSYS;
>>>>    }
>>>> +
>>>> +int vhost_dev_reset(struct vhost_dev *hdev)
>>>> +{
>>> Let's use a separate patch for this.
>>>
>>> Thanks
>>>
>>>> +    int ret = 0;
>>>> +
>>>> +    /* should only be called after backend is connected */
>>>> +    assert(hdev->vhost_ops);
>>>> +
>>>> +    if (hdev->vhost_ops->vhost_reset_device) {
>>>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>>>> +    }
>>>> +
>>>> +    return ret;
>>>> +}
>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>>> index 58a73e7..b8b7c20 100644
>>>> --- a/include/hw/virtio/vhost.h
>>>> +++ b/include/hw/virtio/vhost.h
>>>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>>>>    void vhost_dev_cleanup(struct vhost_dev *hdev);
>>>>    int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>>>    int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>    void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>>>
>>
> 
> 



^ permalink raw reply	[flat|nested] 47+ messages in thread
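The set_vring_ready() change in the patch above can be sketched in isolation: the loop is unchanged, but `ready` replaces the hard-coded `.num = 1`, so the caller can enable all vrings on start and disable them on stop. The structures and the `toy_set_vring_enable()` helper below are hypothetical stand-ins for `struct vhost_dev` and the `vhost_vdpa_call(..., VHOST_VDPA_SET_VRING_ENABLE, ...)` ioctl path:

```c
#include <assert.h>

#define MAX_VQS 8

/* Hypothetical stand-in for the real vhost_dev / backend state. */
struct toy_dev {
    int vq_index;                  /* first vq owned by this vhost_dev */
    int nvqs;
    unsigned int enabled[MAX_VQS]; /* backend-side view of VRING_ENABLE */
};

/* Stand-in for vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state). */
static void toy_set_vring_enable(struct toy_dev *dev, int index,
                                 unsigned int num)
{
    dev->enabled[index] = num;
}

/* Mirrors the patched vhost_vdpa_set_vring_ready(): same per-queue loop,
 * but 'ready' is a parameter instead of a hard-coded 1. */
static int toy_set_vring_ready(struct toy_dev *dev, unsigned int ready)
{
    for (int i = 0; i < dev->nvqs; i++) {
        toy_set_vring_enable(dev, dev->vq_index + i, ready);
    }
    return 0;
}
```

With this shape, the start path calls `toy_set_vring_ready(dev, 1)` and the stop path calls `toy_set_vring_ready(dev, 0)` before stopping the shadow virtqueues, matching the two call sites the diff adds.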

* Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps
  2022-04-02  2:38           ` Jason Wang
@ 2022-04-02  5:14             ` Michael Qiu
       [not found]             ` <6247dc22.1c69fb81.4244.a88bSMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 47+ messages in thread
From: Michael Qiu @ 2022-04-02  5:14 UTC (permalink / raw)
  To: Jason Wang, mst, si-wei.liu; +Cc: eperezma, lingshan.zhu, qemu-devel, lulu



On 2022/4/2 10:38, Jason Wang wrote:
> 
> 在 2022/4/1 下午7:06, Michael Qiu 写道:
>> Currently in the vhost framework, vhost_reset_device() is misnamed.
>> Actually, it should be vhost_reset_owner().
>>
>> In vhost-user it is kept compatible with the reset device op, but
>> vhost kernel is not compatible with it, and vhost-vdpa only
>> implements the reset device action.
>>
>> So we need to separate the function into vhost_reset_owner() and
>> vhost_reset_device(), so that different backends can use the
>> correct function.
> 
> 
> I see no reason when RESET_OWNER needs to be done for kernel backend.
> 

In kernel vhost, RESET_OWNER indeed does a vhost device-level reset:
vhost_net_reset_owner()

static long vhost_net_reset_owner(struct vhost_net *n)
{
[...]
         err = vhost_dev_check_owner(&n->dev);
         if (err)
                 goto done;
         umem = vhost_dev_reset_owner_prepare();
         if (!umem) {
                 err = -ENOMEM;
                 goto done;
         }
         vhost_net_stop(n, &tx_sock, &rx_sock);
         vhost_net_flush(n);
         vhost_dev_stop(&n->dev);
         vhost_dev_reset_owner(&n->dev, umem);
         vhost_net_vq_reset(n);
[...]

}

In the history of QEMU, there is a commit:
commit d1f8b30ec8dde0318fd1b98d24a64926feae9625
Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Date:   Wed Sep 23 12:19:57 2015 +0800

     vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE

     Quote from Michael:

         We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE.

but finally, it has been reverted by the author:
commit 60915dc4691768c4dc62458bb3e16c843fab091d
Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Date:   Wed Nov 11 21:24:37 2015 +0800

     vhost: rename RESET_DEVICE backto RESET_OWNER

     This patch basically reverts commit d1f8b30e.

     It turned out that it breaks stuff, so revert it:
 
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html

It seems the kernel takes RESET_OWNER as reset, but QEMU never calls this
function to do a reset.

> And if I understand the code correctly, vhost-user "abuses" RESET_OWNER 
> for reset. So the current code looks fine?
> 
> 
>>
>> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
>> ---
>>   hw/scsi/vhost-user-scsi.c         |  6 +++++-
>>   hw/virtio/vhost-backend.c         |  4 ++--
>>   hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
>>   include/hw/virtio/vhost-backend.h |  2 ++
>>   4 files changed, 27 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
>> index 1b2f7ee..f179626 100644
>> --- a/hw/scsi/vhost-user-scsi.c
>> +++ b/hw/scsi/vhost-user-scsi.c
>> @@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
>>           return;
>>       }
>> -    if (dev->vhost_ops->vhost_reset_device) {
>> +    if (virtio_has_feature(dev->protocol_features,
>> +                           VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
>> +                           dev->vhost_ops->vhost_reset_device) {
>>           dev->vhost_ops->vhost_reset_device(dev);
>> +    } else if (dev->vhost_ops->vhost_reset_owner) {
>> +        dev->vhost_ops->vhost_reset_owner(dev);
> 
> 
> Actually, I fail to understand why we need an indirection via vhost_ops. 
> It's guaranteed to be vhost_user_ops.
> 
> 
>>       }
>>   }
>> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
>> index e409a86..abbaa8b 100644
>> --- a/hw/virtio/vhost-backend.c
>> +++ b/hw/virtio/vhost-backend.c
>> @@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev 
>> *dev)
>>       return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
>>   }
>> -static int vhost_kernel_reset_device(struct vhost_dev *dev)
>> +static int vhost_kernel_reset_owner(struct vhost_dev *dev)
>>   {
>>       return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
>>   }
>> @@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
>>           .vhost_get_features = vhost_kernel_get_features,
>>           .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
>>           .vhost_set_owner = vhost_kernel_set_owner,
>> -        .vhost_reset_device = vhost_kernel_reset_device,
>> +        .vhost_reset_owner = vhost_kernel_reset_owner,
> 
> 
> I think we can delete the current vhost_reset_device() since it is not 
> used in any code path.
> 

I planned to use it for vDPA reset, and vhost-user-scsi also uses device
reset.

Thanks,
Michael

> Thanks
> 
> 
>>           .vhost_get_vq_index = vhost_kernel_get_vq_index,
>>   #ifdef CONFIG_VHOST_VSOCK
>>           .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> index 6abbc9d..4412008 100644
>> --- a/hw/virtio/vhost-user.c
>> +++ b/hw/virtio/vhost-user.c
>> @@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct 
>> vhost_dev *dev,
>>       return 0;
>>   }
>> +static int vhost_user_reset_owner(struct vhost_dev *dev)
>> +{
>> +    VhostUserMsg msg = {
>> +        .hdr.request = VHOST_USER_RESET_OWNER,
>> +        .hdr.flags = VHOST_USER_VERSION,
>> +    };
>> +
>> +    return vhost_user_write(dev, &msg, NULL, 0);
>> +}
>> +
>>   static int vhost_user_reset_device(struct vhost_dev *dev)
>>   {
>>       VhostUserMsg msg = {
>> +        .hdr.request = VHOST_USER_RESET_DEVICE,
>>           .hdr.flags = VHOST_USER_VERSION,
>>       };
>> -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
>> -                                         
>> VHOST_USER_PROTOCOL_F_RESET_DEVICE)
>> -        ? VHOST_USER_RESET_DEVICE
>> -        : VHOST_USER_RESET_OWNER;
>> +    /* Caller must ensure the backend has 
>> VHOST_USER_PROTOCOL_F_RESET_DEVICE
>> +     * support */
>> +    if (!virtio_has_feature(dev->protocol_features,
>> +                       VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
>> +        return -EPERM;
>> +    }
>>       return vhost_user_write(dev, &msg, NULL, 0);
>>   }
>> @@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
>>           .vhost_set_features = vhost_user_set_features,
>>           .vhost_get_features = vhost_user_get_features,
>>           .vhost_set_owner = vhost_user_set_owner,
>> +        .vhost_reset_owner = vhost_user_reset_owner,
>>           .vhost_reset_device = vhost_user_reset_device,
>>           .vhost_get_vq_index = vhost_user_get_vq_index,
>>           .vhost_set_vring_enable = vhost_user_set_vring_enable,
>> diff --git a/include/hw/virtio/vhost-backend.h 
>> b/include/hw/virtio/vhost-backend.h
>> index 81bf310..affeeb0 100644
>> --- a/include/hw/virtio/vhost-backend.h
>> +++ b/include/hw/virtio/vhost-backend.h
>> @@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct 
>> vhost_dev *dev,
>>                                        uint64_t *features);
>>   typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
>>   typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
>> +typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);
>>   typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
>>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
>>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
>> @@ -150,6 +151,7 @@ typedef struct VhostOps {
>>       vhost_get_features_op vhost_get_features;
>>       vhost_set_backend_cap_op vhost_set_backend_cap;
>>       vhost_set_owner_op vhost_set_owner;
>> +    vhost_reset_owner_op vhost_reset_owner;
>>       vhost_reset_device_op vhost_reset_device;
>>       vhost_get_vq_index_op vhost_get_vq_index;
>>       vhost_set_vring_enable_op vhost_set_vring_enable;
> 
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread
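The backend dispatch this refactor argues over can be sketched as a small model: prefer the device-level reset only when the backend advertises the reset-device protocol feature and implements the op, otherwise fall back to the owner reset. This is a hypothetical illustration of the `vhost_user_scsi_reset()` logic in the patch, not QEMU code; the bit position and all `toy_*` names are invented:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit for VHOST_USER_PROTOCOL_F_RESET_DEVICE; the value is
 * an assumption for this sketch, not taken from the real header. */
#define TOY_F_RESET_DEVICE 13

enum toy_action { TOY_NONE, TOY_RESET_DEVICE, TOY_RESET_OWNER };

/* Toy view of a vhost_dev: negotiated protocol features plus which
 * VhostOps callbacks the backend provides. */
struct toy_dev {
    uint64_t protocol_features;
    int has_reset_device_op;   /* vhost_ops->vhost_reset_device != NULL */
    int has_reset_owner_op;    /* vhost_ops->vhost_reset_owner  != NULL */
};

static int toy_has_feature(uint64_t features, unsigned bit)
{
    return (features & (1ULL << bit)) != 0;
}

/* Mirrors the patched vhost_user_scsi_reset(): device reset only when
 * the protocol feature was negotiated, owner reset as the fallback. */
static enum toy_action toy_scsi_reset(const struct toy_dev *dev)
{
    if (toy_has_feature(dev->protocol_features, TOY_F_RESET_DEVICE) &&
        dev->has_reset_device_op) {
        return TOY_RESET_DEVICE;
    } else if (dev->has_reset_owner_op) {
        return TOY_RESET_OWNER;
    }
    return TOY_NONE;
}
```

This also shows why the patch makes vhost_user_reset_device() refuse to fall back silently: once the caller checks the feature bit, sending RESET_DEVICE to a backend that never negotiated it would be a protocol violation.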

* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-02  2:20           ` Jason Wang
  2022-04-02  3:53             ` Michael Qiu
@ 2022-04-06  0:56             ` Si-Wei Liu
  2022-04-07  7:50               ` Jason Wang
       [not found]             ` <6247c8f5.1c69fb81.848e0.8b49SMTPIN_ADDED_BROKEN@mx.google.com>
  2 siblings, 1 reply; 47+ messages in thread
From: Si-Wei Liu @ 2022-04-06  0:56 UTC (permalink / raw)
  To: Jason Wang; +Cc: Cindy Lu, mst, qemu-devel, eperezma, Michael Qiu, Zhu Lingshan



On 4/1/2022 7:20 PM, Jason Wang wrote:
> Adding Michael.
>
> On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 3/31/2022 7:53 PM, Jason Wang wrote:
>>> On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
>>>> Currently, when VM poweroff, it will trigger vdpa
>>>> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
>>>> queue pair and one control queue, triggered 3 times), this
>>>> leads to below issue:
>>>>
>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>
>>>> This because in vhost_net_stop(), it will stop all vhost device bind to
>>>> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device
>>>> , then stop the queue: vhost_virtqueue_stop().
>>>>
>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>> in low level driver, and in next loop(stop other vhost backends),
>>>> qemu try to stop the queue corresponding to the vhost backend,
>>>>    the driver finds that the VQ is invalied, this is the root cause.
>>>>
>>>> To solve the issue, vdpa should set vring unready, and
>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>>>
>>>> and implement a new function vhost_dev_reset, only reset backend
>>>> device after all vhost(per-queue) stoped.
>>> Typo.
>>>
>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>> Rethink this patch, consider there're devices that don't support
>>> set_vq_ready(). I wonder if we need
>>>
>>> 1) uAPI to tell the user space whether or not it supports set_vq_ready()
>> I guess what's more relevant here is to define the uAPI semantics for
>> unready i.e. set_vq_ready(0) for resuming/stopping virtqueue processing,
>> as starting vq is comparatively less ambiguous.
> Yes.
>
>> Considering the
>> likelihood that this interface may be used for live migration, it would
>> be nice to come up with variants such as 1) discard inflight request
>> v.s. 2) waiting for inflight processing to be done,
> Or inflight descriptor reporting (which seems to be tricky). But we
> can start from net, where discarding may just work.
>
>> and 3) timeout in
>> waiting.
> Actually, that's the plan and Eugenio is proposing something like this
> via virtio spec:
>
> https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
Thanks for the pointer, I seem to recall I saw it some time back, though
I wonder if there's a follow-up to the v3? My impression was that this is
still a work-in-progress spec proposal, while the semantics of the various
F_STOP scenarios are still unclear and not all of the requirements (ex:
STOP_FAILED, rewind & !IN_ORDER) for live migration seem to be
accommodated?

>
>>> 2) userspace will call SET_VRING_ENABLE() when the device supports
>>> otherwise it will use RESET.
>> Are you looking to making virtqueue resume-able through the new
>> SET_VRING_ENABLE() uAPI?
>>
>> I think RESET is inevitable in some case, i.e. when guest initiates
>> device reset by writing 0 to the status register.
> Yes, that's all my plan.
>
>> For suspend/resume and
>> live migration use cases, indeed RESET can be substituted with
>> SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
>> accommodate this change. Although I'm all for it, it'd be the best to
>> lay out the plan for multiple phases rather than overload this single
>> patch too much. You can count my time on this endeavor if you don't mind. :)
> You're welcome, I agree we should choose a way to go first:
>
> 1) manage to use SET_VRING_ENABLE (more like a workaround anyway)
For networking devices and the vq suspend/resume and live migration use
cases to be supported, I thought it might suffice? We may drop inflight or
unused descriptors for Ethernet... What other part do you think may limit
its extension to become a general uAPI, or require a new uAPI to address a
similar VQ stop requirement if need be? Or we might well define a
subsystem-specific uAPI to stop the virtqueue, for vDPA devices
specifically. I think the point here is that, given we would like to avoid
guest-side modification to support live migration, we can define a
specific uAPI for each specific live migration requirement without having
to involve guest driver changes. It'd be easy to get started this way and
generalize it all into a full-blown _S_STOP when things eventually settle.

> 2) go with virtio-spec (may take a while)
I feel it might still be quite early to get to a full-blown _S_STOP
spec-level amendment that works for all types of virtio (vendor) devices.
Generally there can be very subsystem-dependent ways to stop each type of
virtio device that satisfy live migration of virtio subsystem devices. For
now the discussion mostly concerns virtio-level things such as vq index
rewind, inflight handling, notification interrupts and configuration
space, but the real device backend has implications on other parts, such
as the order of IO/DMA quiescing and interrupt masking. If the subsystem
virtio guest drivers today somehow don't support any of those new _S_STOP
behaviors, I guess there is little point in introducing the same or
similar _S_STOP functionality to the guest driver to effectively support
live migration.


Thanks,
-Siwei
> 3) don't wait for the spec, have a vDPA specific uAPI first. Note that
> I've chatted with most of the vendors and they seem to be fine with
> the _S_STOP. If we go this way, we can still provide the forward
> compatibility of _S_STOP
> 4) or do them all (in parallel)
>
> Any thoughts?
>
> Thanks
>
>>> And for safety, I suggest tagging this as 7.1.
>> +1
>>
>> Regards,
>> -Siwei
>>
>>>> ---
>>>> v4 --> v3
>>>>       Nothing changed, because of an issue with mimecast:
>>>>       when the From: tag is different from the sender,
>>>>       some mail clients will take the patch as an
>>>>       attachment. RESEND v3 did not work, so resend
>>>>       the patch as v4
>>>>
>>>> v3 --> v2:
>>>>       Call vhost_dev_reset() at the end of vhost_net_stop().
>>>>
>>>>       Since the vDPA device need re-add the status bit
>>>>       VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>>>>       simply, add them inside vhost_vdpa_reset_device, and
>>>>       the only way calling vhost_vdpa_reset_device is in
>>>>       vhost_net_stop(), so it keeps the same behavior as before.
>>>>
>>>> v2 --> v1:
>>>>      Implement a new function vhost_dev_reset,
>>>>      reset the backend kernel device at last.
>>>> ---
>>>>    hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>>>>    hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>>>>    hw/virtio/vhost.c         | 15 ++++++++++++++-
>>>>    include/hw/virtio/vhost.h |  1 +
>>>>    4 files changed, 45 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>> index 30379d2..422c9bf 100644
>>>> --- a/hw/net/vhost_net.c
>>>> +++ b/hw/net/vhost_net.c
>>>> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>        int nvhosts = data_queue_pairs + cvq;
>>>> -    struct vhost_net *net;
>>>> +    struct vhost_net *net = NULL;
>>>>        int r, e, i, index_end = data_queue_pairs * 2;
>>>>        NetClientState *peer;
>>>>
>>>> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>    err_start:
>>>>        while (--i >= 0) {
>>>>            peer = qemu_get_peer(ncs , i);
>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>> +
>>>> +        net = get_vhost_net(peer);
>>>> +
>>>> +        vhost_net_stop_one(net, dev);
>>>>        }
>>>> +
>>>> +    /* We only reset backend vdpa device */
>>>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>>>> +        vhost_dev_reset(&net->dev);
>>>> +    }
>>>> +
>>>>        e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>>>        if (e < 0) {
>>>>            fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
>>>> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>        NetClientState *peer;
>>>> +    struct vhost_net *net = NULL;
>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>        int nvhosts = data_queue_pairs + cvq;
>>>>        int i, r;
>>>> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>            } else {
>>>>                peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>>            }
>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>> +
>>>> +        net = get_vhost_net(peer);
>>>> +
>>>> +        vhost_net_stop_one(net, dev);
>>>> +    }
>>>> +
>>>> +    /* We only reset backend vdpa device */
>>>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>>>> +        vhost_dev_reset(&net->dev);
>>>>        }
>>> So we've already reset the device in vhost_vdpa_dev_start(), any
>>> reason we need to do it again here?
>>>
>>>>        r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index c5ed7a3..3ef0199 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>>>>
>>>>        ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>>>        trace_vhost_vdpa_reset_device(dev, status);
>>>> +
>>>> +    /* Add back this status, so that the device could work next time*/
>>>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> +                               VIRTIO_CONFIG_S_DRIVER);
>>> This seems to contradict the semantic of reset.
>>>
>>>> +
>>>>        return ret;
>>>>    }
>>>>
>>>> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>>>        return idx;
>>>>    }
>>>>
>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>>>    {
>>>>        int i;
>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>            struct vhost_vring_state state = {
>>>>                .index = dev->vq_index + i,
>>>> -            .num = 1,
>>>> +            .num = ready,
>>>>            };
>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>        }
>>>> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>            if (unlikely(!ok)) {
>>>>                return -1;
>>>>            }
>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>        } else {
>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>            ok = vhost_vdpa_svqs_stop(dev);
>>>>            if (unlikely(!ok)) {
>>>>                return -1;
>>>> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>            memory_listener_register(&v->listener, &address_space_memory);
>>>>            return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>>>        } else {
>>>> -        vhost_vdpa_reset_device(dev);
>>>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>>>            memory_listener_unregister(&v->listener);
>>>>
>>>>            return 0;
>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>> index b643f42..7e0cdb6 100644
>>>> --- a/hw/virtio/vhost.c
>>>> +++ b/hw/virtio/vhost.c
>>>> @@ -1820,7 +1820,6 @@ fail_features:
>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>>>    {
>>>>        int i;
>>>> -
>>> Unnecessary changes.
>>>
>>>>        /* should only be called after backend is connected */
>>>>        assert(hdev->vhost_ops);
>>>>
>>>> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>>>>
>>>>        return -ENOSYS;
>>>>    }
>>>> +
>>>> +int vhost_dev_reset(struct vhost_dev *hdev)
>>>> +{
>>> Let's use a separate patch for this.
>>>
>>> Thanks
>>>
>>>> +    int ret = 0;
>>>> +
>>>> +    /* should only be called after backend is connected */
>>>> +    assert(hdev->vhost_ops);
>>>> +
>>>> +    if (hdev->vhost_ops->vhost_reset_device) {
>>>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>>>> +    }
>>>> +
>>>> +    return ret;
>>>> +}
>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>>> index 58a73e7..b8b7c20 100644
>>>> --- a/include/hw/virtio/vhost.h
>>>> +++ b/include/hw/virtio/vhost.h
>>>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>>>>    void vhost_dev_cleanup(struct vhost_dev *hdev);
>>>>    int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>>>    int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>    void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>>>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps
       [not found]             ` <6247dc22.1c69fb81.4244.a88bSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2022-04-07  7:35               ` Jason Wang
  2022-04-08  8:38                 ` Michael Qiu
  0 siblings, 1 reply; 47+ messages in thread
From: Jason Wang @ 2022-04-07  7:35 UTC (permalink / raw)
  To: qemu-devel, Michael Qiu
  Cc: Si-Wei Liu, Eugenio Perez Martin, Zhu, Lingshan, Cindy Lu


On 2022/4/2 1:14 PM, Michael Qiu wrote:
>
>
> On 2022/4/2 10:38, Jason Wang wrote:
>>
>> 在 2022/4/1 下午7:06, Michael Qiu 写道:
>>> Currently in the vhost framework, vhost_reset_device() is misnamed.
>>> Actually, it should be vhost_reset_owner().
>>>
>>> In vhost-user it stays compatible with the reset device ops, but
>>> the vhost kernel backend is not compatible with it; for vhost-vdpa,
>>> it only implements the reset device action.
>>>
>>> So we need to separate the function into vhost_reset_owner() and
>>> vhost_reset_device(), so that each backend can use the
>>> correct function.
>>
>>
>> I see no reason when RESET_OWNER needs to be done for kernel backend.
>>
>
> In kernel vhost, RESET_OWNER indeed does a vhost device-level reset: 
> vhost_net_reset_owner()
>
> static long vhost_net_reset_owner(struct vhost_net *n)
> {
> [...]
>         err = vhost_dev_check_owner(&n->dev);
>         if (err)
>                 goto done;
>         umem = vhost_dev_reset_owner_prepare();
>         if (!umem) {
>                 err = -ENOMEM;
>                 goto done;
>         }
>         vhost_net_stop(n, &tx_sock, &rx_sock);
>         vhost_net_flush(n);
>         vhost_dev_stop(&n->dev);
>         vhost_dev_reset_owner(&n->dev, umem);
>         vhost_net_vq_reset(n);
> [...]
>
> }
>
> In the history of QEMU, There is a commit:
> commit d1f8b30ec8dde0318fd1b98d24a64926feae9625
> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Date:   Wed Sep 23 12:19:57 2015 +0800
>
>     vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE
>
>     Quote from Michael:
>
>         We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE.
>
> but finally, it has been reverted by the author:
> commit 60915dc4691768c4dc62458bb3e16c843fab091d
> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Date:   Wed Nov 11 21:24:37 2015 +0800
>
>     vhost: rename RESET_DEVICE backto RESET_OWNER
>
>     This patch basically reverts commit d1f8b30e.
>
>     It turned out that it breaks stuff, so revert it:
>
> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html
>
> It seems the kernel takes RESET_OWNER as a reset, but QEMU never calls 
> this function to do a reset.


The question is, we have managed to survive without using RESET_OWNER for 
the past 10 years. Any reason we want to use it now?

Note that RESET_OWNER is only useful when the process wants to drop its 
mm refcnt from vhost; it doesn't reset the device (e.g. it does not 
even call vhost_vq_reset()).

(Especially, it was deprecated by the vhost-user protocol since its 
semantics are ambiguous.)


>
>> And if I understand the code correctly, vhost-user "abuse" 
>> RESET_OWNER for reset. So the current code looks fine?
>>
>>
>>>
>>> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
>>> ---
>>>   hw/scsi/vhost-user-scsi.c         |  6 +++++-
>>>   hw/virtio/vhost-backend.c         |  4 ++--
>>>   hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
>>>   include/hw/virtio/vhost-backend.h |  2 ++
>>>   4 files changed, 27 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
>>> index 1b2f7ee..f179626 100644
>>> --- a/hw/scsi/vhost-user-scsi.c
>>> +++ b/hw/scsi/vhost-user-scsi.c
>>> @@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
>>>           return;
>>>       }
>>> -    if (dev->vhost_ops->vhost_reset_device) {
>>> +    if (virtio_has_feature(dev->protocol_features,
>>> +                           VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
>>> +        dev->vhost_ops->vhost_reset_device) {
>>>           dev->vhost_ops->vhost_reset_device(dev);
>>> +    } else if (dev->vhost_ops->vhost_reset_owner) {
>>> +        dev->vhost_ops->vhost_reset_owner(dev);
>>
>>
>> Actually, I fail to understand why we need an indirection via 
>> vhost_ops. It's guaranteed to be vhost_user_ops.
>>
>>
>>>       }
>>>   }
>>> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
>>> index e409a86..abbaa8b 100644
>>> --- a/hw/virtio/vhost-backend.c
>>> +++ b/hw/virtio/vhost-backend.c
>>> @@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
>>>       return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
>>>   }
>>> -static int vhost_kernel_reset_device(struct vhost_dev *dev)
>>> +static int vhost_kernel_reset_owner(struct vhost_dev *dev)
>>>   {
>>>       return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
>>>   }
>>> @@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
>>>           .vhost_get_features = vhost_kernel_get_features,
>>>           .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
>>>           .vhost_set_owner = vhost_kernel_set_owner,
>>> -        .vhost_reset_device = vhost_kernel_reset_device,
>>> +        .vhost_reset_owner = vhost_kernel_reset_owner,
>>
>>
>> I think we can delete the current vhost_reset_device() since it is not
>> used in any code path.
>>
>
> I planned to use it for vDPA reset, 


For vhost-vDPA it can call vhost_vdpa_reset_device() directly.

As I mentioned before, the only user of the vhost_reset_device config op is 
vhost-user-scsi, but it should call vhost_user_reset_device() directly.

Thanks


> and vhost-user-scsi also use device reset.
>
> Thanks,
> Michael
>
>> Thanks
>>
>>
>>>           .vhost_get_vq_index = vhost_kernel_get_vq_index,
>>>   #ifdef CONFIG_VHOST_VSOCK
>>>         .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>> index 6abbc9d..4412008 100644
>>> --- a/hw/virtio/vhost-user.c
>>> +++ b/hw/virtio/vhost-user.c
>>> @@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct vhost_dev *dev,
>>>       return 0;
>>>   }
>>> +static int vhost_user_reset_owner(struct vhost_dev *dev)
>>> +{
>>> +    VhostUserMsg msg = {
>>> +        .hdr.request = VHOST_USER_RESET_OWNER,
>>> +        .hdr.flags = VHOST_USER_VERSION,
>>> +    };
>>> +
>>> +    return vhost_user_write(dev, &msg, NULL, 0);
>>> +}
>>> +
>>>   static int vhost_user_reset_device(struct vhost_dev *dev)
>>>   {
>>>       VhostUserMsg msg = {
>>> +        .hdr.request = VHOST_USER_RESET_DEVICE,
>>>           .hdr.flags = VHOST_USER_VERSION,
>>>       };
>>> -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
>>> -                                         VHOST_USER_PROTOCOL_F_RESET_DEVICE)
>>> -        ? VHOST_USER_RESET_DEVICE
>>> -        : VHOST_USER_RESET_OWNER;
>>> +    /* Caller must ensure the backend has
>>> +     * VHOST_USER_PROTOCOL_F_RESET_DEVICE support */
>>> +    if (!virtio_has_feature(dev->protocol_features,
>>> +                       VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
>>> +        return -EPERM;
>>> +    }
>>>       return vhost_user_write(dev, &msg, NULL, 0);
>>>   }
>>> @@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
>>>           .vhost_set_features = vhost_user_set_features,
>>>           .vhost_get_features = vhost_user_get_features,
>>>           .vhost_set_owner = vhost_user_set_owner,
>>> +        .vhost_reset_owner = vhost_user_reset_owner,
>>>           .vhost_reset_device = vhost_user_reset_device,
>>>           .vhost_get_vq_index = vhost_user_get_vq_index,
>>>           .vhost_set_vring_enable = vhost_user_set_vring_enable,
>>> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
>>> index 81bf310..affeeb0 100644
>>> --- a/include/hw/virtio/vhost-backend.h
>>> +++ b/include/hw/virtio/vhost-backend.h
>>> @@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
>>>                                        uint64_t *features);
>>>   typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
>>>   typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
>>> +typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);
>>>   typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
>>>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
>>>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
>>> @@ -150,6 +151,7 @@ typedef struct VhostOps {
>>>       vhost_get_features_op vhost_get_features;
>>>       vhost_set_backend_cap_op vhost_set_backend_cap;
>>>       vhost_set_owner_op vhost_set_owner;
>>> +    vhost_reset_owner_op vhost_reset_owner;
>>>       vhost_reset_device_op vhost_reset_device;
>>>       vhost_get_vq_index_op vhost_get_vq_index;
>>>       vhost_set_vring_enable_op vhost_set_vring_enable;
>>
>>
>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
  2022-04-06  0:56             ` Si-Wei Liu
@ 2022-04-07  7:50               ` Jason Wang
  0 siblings, 0 replies; 47+ messages in thread
From: Jason Wang @ 2022-04-07  7:50 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: Cindy Lu, mst, qemu-devel, eperezma, Michael Qiu, Zhu Lingshan


On 2022/4/6 8:56 AM, Si-Wei Liu wrote:
>
>
> On 4/1/2022 7:20 PM, Jason Wang wrote:
>> Adding Michael.
>>
>> On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>
>>>
>>> On 3/31/2022 7:53 PM, Jason Wang wrote:
>>>> On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
>>>>> Currently, when VM poweroff, it will trigger vdpa
>>>>> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
>>>>> queue pair and one control queue, triggered 3 times), this
>>>>> leads to below issue:
>>>>>
>>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>>
>>>>> This because in vhost_net_stop(), it will stop all vhost device bind to
>>>>> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device,
>>>>> then stop the queue: vhost_virtqueue_stop().
>>>>>
>>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>>> in low level driver, and in next loop(stop other vhost backends),
>>>>> qemu try to stop the queue corresponding to the vhost backend,
>>>>>    the driver finds that the VQ is invalied, this is the root cause.
>>>>>
>>>>> To solve the issue, vdpa should set vring unready, and
>>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>>>>
>>>>> and implement a new function vhost_dev_reset, only reset backend
>>>>> device after all vhost(per-queue) stoped.
>>>> Typo.
>>>>
>>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>> Rethink this patch, consider there're devices that don't support
>>>> set_vq_ready(). I wonder if we need
>>>>
>>>> 1) uAPI to tell the user space whether or not it supports 
>>>> set_vq_ready()
>>> I guess what's more relevant here is to define the uAPI semantics for
>>> unready i.e. set_vq_ready(0) for resuming/stopping virtqueue 
>>> processing,
>>> as starting vq is comparatively less ambiguous.
>> Yes.
>>
>>> Considering the
>>> likelihood that this interface may be used for live migration, it would
>>> be nice to come up with variants such as 1) discard inflight request
>>> v.s. 2) waiting for inflight processing to be done,
>> Or inflight descriptor reporting (which seems to be tricky). But we
>> can start from net that a discarding may just work.
>>
>>> and 3) timeout in
>>> waiting.
>> Actually, that's the plan and Eugenio is proposing something like this
>> via virtio spec:
>>
>> https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
>>
> Thanks for the pointer, I seem to recall I saw it some time back 
> though I wonder if there's follow-up for the v3? My impression was 
> that this is still a work-in-progress spec proposal, while the 
> semantics of various F_STOP scenario is unclear yet and not all of the 
> requirements (ex: STOP_FAILED, rewind & !IN_ORDER) for live migration 
> do seem to get accommodated?


My understanding is that the reason for STOP_FAILED and IN_ORDER is 
that we don't have a way to report inflight descriptors. We will try 
to overcome this by allowing the device to report inflight descriptors 
in the next version.


>
>>
>>>> 2) userspace will call SET_VRING_ENABLE() when the device supports
>>>> otherwise it will use RESET.
>>> Are you looking to making virtqueue resume-able through the new
>>> SET_VRING_ENABLE() uAPI?
>>>
>>> I think RESET is inevitable in some case, i.e. when guest initiates
>>> device reset by writing 0 to the status register.
>> Yes, that's all my plan.
>>
>>> For suspend/resume and
>>> live migration use cases, indeed RESET can be substituted with
>>> SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
>>> accommodate this change. Although I'm all for it, it'd be the best to
>>> lay out the plan for multiple phases rather than overload this single
>>> patch too much. You can count my time on this endeavor if you don't 
>>> mind. :)
>> You're welcome, I agree we should choose a way to go first:
>>
>> 1) manage to use SET_VRING_ENABLE (more like a workaround anyway)
> For networking device and the vq suspend/resume and live migration use 
> cases to support, I thought it might suffice?


Without a config space change it would be sufficient. And anyhow the vDPA 
parent can prevent the config change if all the virtqueues are disabled.


> We may drop inflight or unused ones for Ethernet...


Yes.


> What other part do you think may limit its extension to become a 
> general uAPI or add new uAPI to address similar VQ stop requirement if 
> need be? 


For networking, we don't need other.


> Or we might well define subsystem specific uAPI to stop the virtqueue, 
> for vdpa device specifically?


Anyhow we need a uAPI, considering we have some parents that don't support 
virtqueue stop. So this could be another way to go.

But if we decide to bother with a new uAPI, I would rather go with one for 
stopping the device. It can help with the config space change as well.


> I think the point here is given that we would like to avoid guest side 
> modification to support live migration, we can define specific uAPI 
> for specific live migration requirement without having to involve 
> guest driver change. It'd be easy to get started this way and 
> generalize them all to a full blown _S_STOP when things are eventually 
> settled.


Yes, note that the status seen by the guest is mediated by the hypervisor. 
So the hypervisor can choose not to expose the _S_STOP to the guest, which 
keeps the migration working without modifications in the guest driver.


>
>> 2) go with virtio-spec (may take a while)
> I feel it might be still quite early for now to get to a full blown 
> _S_STOP spec level amendment that works for all types of virtio 
> (vendor) devices. Generally there can be very specific 
> subsystem-dependent ways to stop each type of virtio devices that 
> satisfies the live migration of virtio subsystem devices.


Yes.


> For now the discussion mostly concerns with vq index rewind, inflight 
> handling, notification interrupt and configuration space such kind of 
> virtio level things, but real device backend has implication on the 
> other parts such as the order of IO/DMA quiescing and interrupt masking.


It's the responsibility of the vDPA parent to perform any necessary 
quiescing to satisfy the semantics of _S_STOP; that's an implementation 
detail which is out of the scope of the spec.


> If the subsystem virtio guest drivers today somehow don't support any 
> of those _S_STOP new behaviors, I guess it's with little point to 
> introduce the same or similar _S_STOP functionality to the guest 
> driver to effectively support live migration.


See above, the live migration is transparent to the guest. For a 
driver that doesn't support _S_STOP, we can still live migrate it. The 
only interesting part is nesting: if we want to live migrate a nested 
guest, the guest driver must support _S_STOP.

Thanks


>
>
> Thanks,
> -Siwei
>> 3) don't wait for the spec, have a vDPA specific uAPI first. Note that
>> I've chatted with most of the vendors and they seem to be fine with
>> the _S_STOP. If we go this way, we can still provide the forward
>> compatibility of _S_STOP
>> 4) or do them all (in parallel)
>>
>> Any thoughts?
>>
>> Thanks
>>
>>>> And for safety, I suggest tagging this as 7.1.
>>> +1
>>>
>>> Regards,
>>> -Siwei
>>>
>>>>> ---
>>>>> v4 --> v3
>>>>>       Nothing changed, because of an issue with mimecast,
>>>>>       when the From: tag is different from the sender,
>>>>>       the some mail client will take the patch as an
>>>>>       attachment, RESEND v3 does not work, So resend
>>>>>       the patch as v4
>>>>>
>>>>> v3 --> v2:
>>>>>       Call vhost_dev_reset() at the end of vhost_net_stop().
>>>>>
>>>>>       Since the vDPA device need re-add the status bit
>>>>>       VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>>>>>       simply, add them inside vhost_vdpa_reset_device, and
>>>>>       the only way calling vhost_vdpa_reset_device is in
>>>>>       vhost_net_stop(), so it keeps the same behavior as before.
>>>>>
>>>>> v2 --> v1:
>>>>>      Implement a new function vhost_dev_reset,
>>>>>      reset the backend kernel device at last.
>>>>> ---
>>>>>    hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>>>>>    hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>>>>>    hw/virtio/vhost.c         | 15 ++++++++++++++-
>>>>>    include/hw/virtio/vhost.h |  1 +
>>>>>    4 files changed, 45 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>> index 30379d2..422c9bf 100644
>>>>> --- a/hw/net/vhost_net.c
>>>>> +++ b/hw/net/vhost_net.c
>>>>> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>>        int nvhosts = data_queue_pairs + cvq;
>>>>> -    struct vhost_net *net;
>>>>> +    struct vhost_net *net = NULL;
>>>>>        int r, e, i, index_end = data_queue_pairs * 2;
>>>>>        NetClientState *peer;
>>>>>
>>>>> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>>    err_start:
>>>>>        while (--i >= 0) {
>>>>>            peer = qemu_get_peer(ncs , i);
>>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>>> +
>>>>> +        net = get_vhost_net(peer);
>>>>> +
>>>>> +        vhost_net_stop_one(net, dev);
>>>>>        }
>>>>> +
>>>>> +    /* We only reset backend vdpa device */
>>>>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>>>>> +        vhost_dev_reset(&net->dev);
>>>>> +    }
>>>>> +
>>>>>        e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>>>>        if (e < 0) {
>>>>>            fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
>>>>> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>>        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>>        NetClientState *peer;
>>>>> +    struct vhost_net *net = NULL;
>>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>>        int nvhosts = data_queue_pairs + cvq;
>>>>>        int i, r;
>>>>> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>>            } else {
>>>>>                peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>>>            }
>>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>>> +
>>>>> +        net = get_vhost_net(peer);
>>>>> +
>>>>> +        vhost_net_stop_one(net, dev);
>>>>> +    }
>>>>> +
>>>>> +    /* We only reset backend vdpa device */
>>>>> +    if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
>>>>> +        vhost_dev_reset(&net->dev);
>>>>>        }
>>>> So we've already reset the device in vhost_vdpa_dev_start(), any
>>>> reason we need to do it again here?
>>>>
>>>>>        r = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>> index c5ed7a3..3ef0199 100644
>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>>>>>
>>>>>        ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>>>>        trace_vhost_vdpa_reset_device(dev, status);
>>>>> +
>>>>> +    /* Add back this status, so that the device could work next time */
>>>>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>> +                               VIRTIO_CONFIG_S_DRIVER);
>>>> This seems to contradict the semantic of reset.
>>>>
>>>>> +
>>>>>        return ret;
>>>>>    }
>>>>>
>>>>> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>>>>>        return idx;
>>>>>    }
>>>>>
>>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, unsigned int ready)
>>>>>    {
>>>>>        int i;
>>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>>            struct vhost_vring_state state = {
>>>>>                .index = dev->vq_index + i,
>>>>> -            .num = 1,
>>>>> +            .num = ready,
>>>>>            };
>>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>>        }
>>>>> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>>            if (unlikely(!ok)) {
>>>>>                return -1;
>>>>>            }
>>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>>        } else {
>>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>>            ok = vhost_vdpa_svqs_stop(dev);
>>>>>            if (unlikely(!ok)) {
>>>>>                return -1;
>>>>> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>>            memory_listener_register(&v->listener, &address_space_memory);
>>>>>            return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>>>>        } else {
>>>>> -        vhost_vdpa_reset_device(dev);
>>>>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>>>>          memory_listener_unregister(&v->listener);
>>>>>
>>>>>            return 0;
>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>> index b643f42..7e0cdb6 100644
>>>>> --- a/hw/virtio/vhost.c
>>>>> +++ b/hw/virtio/vhost.c
>>>>> @@ -1820,7 +1820,6 @@ fail_features:
>>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>>>>    {
>>>>>        int i;
>>>>> -
>>>> Unnecessary changes.
>>>>
>>>>>        /* should only be called after backend is connected */
>>>>>        assert(hdev->vhost_ops);
>>>>>
>>>>> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>>>>>
>>>>>        return -ENOSYS;
>>>>>    }
>>>>> +
>>>>> +int vhost_dev_reset(struct vhost_dev *hdev)
>>>>> +{
>>>> Let's use a separate patch for this.
>>>>
>>>> Thanks
>>>>
>>>>> +    int ret = 0;
>>>>> +
>>>>> +    /* should only be called after backend is connected */
>>>>> +    assert(hdev->vhost_ops);
>>>>> +
>>>>> +    if (hdev->vhost_ops->vhost_reset_device) {
>>>>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>>>>> +    }
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>>>> index 58a73e7..b8b7c20 100644
>>>>> --- a/include/hw/virtio/vhost.h
>>>>> +++ b/include/hw/virtio/vhost.h
>>>>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>>>>>    void vhost_dev_cleanup(struct vhost_dev *hdev);
>>>>>    int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>>>>    int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>>    void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>>
>>>>> -- 
>>>>> 1.8.3.1
>>>>>
>>>>>
>>>>>
>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
       [not found]             ` <6247c8f5.1c69fb81.848e0.8b49SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2022-04-07  7:52               ` Jason Wang
  0 siblings, 0 replies; 47+ messages in thread
From: Jason Wang @ 2022-04-07  7:52 UTC (permalink / raw)
  To: qemu-devel


On 2022/4/2 11:53 AM, Michael Qiu wrote:
>
>
> On 2022/4/2 10:20, Jason Wang wrote:
>> Adding Michael.
>>
>> On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>
>>>
>>>
>>> On 3/31/2022 7:53 PM, Jason Wang wrote:
>>>> On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> wrote:
>>>>> Currently, when VM poweroff, it will trigger vdpa
>>>>> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
>>>>> queue pair and one control queue, triggered 3 times), this
>>>>> leads to below issue:
>>>>>
>>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>>
>>>>> This because in vhost_net_stop(), it will stop all vhost device bind to
>>>>> this virtio device, and in vhost_dev_stop(), qemu tries to stop the device,
>>>>> then stop the queue: vhost_virtqueue_stop().
>>>>>
>>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>>> in low level driver, and in next loop(stop other vhost backends),
>>>>> qemu try to stop the queue corresponding to the vhost backend,
>>>>>    the driver finds that the VQ is invalied, this is the root cause.
>>>>>
>>>>> To solve the issue, vdpa should set vring unready, and
>>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>>>>
>>>>> and implement a new function vhost_dev_reset, only reset backend
>>>>> device after all vhost(per-queue) stoped.
>>>> Typo.
>>>>
>>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>> Rethink this patch, consider there're devices that don't support
>>>> set_vq_ready(). I wonder if we need
>>>>
>>>> 1) uAPI to tell the user space whether or not it supports 
>>>> set_vq_ready()
>>> I guess what's more relevant here is to define the uAPI semantics for
>>> unready i.e. set_vq_ready(0) for resuming/stopping virtqueue 
>>> processing,
>>> as starting vq is comparatively less ambiguous.
>>
>> Yes.
>>
>>> Considering the
>>> likelihood that this interface may be used for live migration, it would
>>> be nice to come up with variants such as 1) discard inflight request
>>> v.s. 2) waiting for inflight processing to be done,
>>
>> Or inflight descriptor reporting (which seems to be tricky). But we
>> can start with net, where discarding may just work.
>>
>>> and 3) timeout in
>>> waiting.
>>
>> Actually, that's the plan and Eugenio is proposing something like this
>> via virtio spec:
>>
>> https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
>>
>>>
>>>> 2) userspace will call SET_VRING_ENABLE() when the device supports
>>>> otherwise it will use RESET.
>>> Are you looking to making virtqueue resume-able through the new
>>> SET_VRING_ENABLE() uAPI?
>>>
>>> I think RESET is inevitable in some cases, i.e. when the guest initiates
>>> a device reset by writing 0 to the status register.
>>
>> Yes, that's all my plan.
>>
>>> For suspend/resume and
>>> live migration use cases, indeed RESET can be substituted with
>>> SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
>>> accommodate this change. Although I'm all for it, it'd be the best to
>>> lay out the plan for multiple phases rather than overload this single
>>> patch too much. You can count my time on this endeavor if you don't 
>>> mind. :)
>>
>> You're welcome, I agree we should choose a way to go first:
>>
>> 1) manage to use SET_VRING_ENABLE (more like a workaround anyway)
>> 2) go with virtio-spec (may take a while)
>> 3) don't wait for the spec, have a vDPA specific uAPI first. Note that
>> I've chatted with most of the vendors and they seem to be fine with
>> the _S_STOP. If we go this way, we can still provide the forward
>> compatibility of _S_STOP
>> 4) or do them all (in parallel)
>>
>> Any thoughts?
>>
>
> The virtio-spec route should be long-term, not only because the spec 
> moves very slowly, but also because hardware upgrades would be a problem.
>
> For the short term, is it better to take the first option?


Considering we need a new uAPI anyhow, I prefer 2), but you can try 1) 
and see what people think.

Thanks


>
> Thanks,
> Michael
>> Thanks
>>
>>>
>>>>
>>>> And for safety, I suggest tagging this as 7.1.
>>> +1
>>>
>>> Regards,
>>> -Siwei
>>>
>>>>
>>>>> ---
>>>>> v4 --> v3
>>>>>       Nothing changed. Because of an issue with mimecast,
>>>>>       when the From: tag differs from the sender,
>>>>>       some mail clients treat the patch as an
>>>>>       attachment; RESEND v3 did not work, so resend
>>>>>       the patch as v4
>>>>>
>>>>> v3 --> v2:
>>>>>       Call vhost_dev_reset() at the end of vhost_net_stop().
>>>>>
>>>>>       Since the vDPA device needs to re-add the status bits
>>>>>       VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>>>>>       simply add them inside vhost_vdpa_reset_device(); the
>>>>>       only caller of vhost_vdpa_reset_device() is
>>>>>       vhost_net_stop(), so it keeps the same behavior as before.
>>>>>
>>>>> v2 --> v1:
>>>>>      Implement a new function vhost_dev_reset,
>>>>>      reset the backend kernel device at last.
>>>>> ---
>>>>>    hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>>>>>    hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>>>>>    hw/virtio/vhost.c         | 15 ++++++++++++++-
>>>>>    include/hw/virtio/vhost.h |  1 +
>>>>>    4 files changed, 45 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>> index 30379d2..422c9bf 100644
>>>>> --- a/hw/net/vhost_net.c
>>>>> +++ b/hw/net/vhost_net.c
>>>>> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, 
>>>>> NetClientState *ncs,
>>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>>        int nvhosts = data_queue_pairs + cvq;
>>>>> -    struct vhost_net *net;
>>>>> +    struct vhost_net *net = NULL;
>>>>>        int r, e, i, index_end = data_queue_pairs * 2;
>>>>>        NetClientState *peer;
>>>>>
>>>>> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, 
>>>>> NetClientState *ncs,
>>>>>    err_start:
>>>>>        while (--i >= 0) {
>>>>>            peer = qemu_get_peer(ncs , i);
>>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>>> +
>>>>> +        net = get_vhost_net(peer);
>>>>> +
>>>>> +        vhost_net_stop_one(net, dev);
>>>>>        }
>>>>> +
>>>>> +    /* We only reset backend vdpa device */
>>>>> +    if (net && net->dev.vhost_ops->backend_type == 
>>>>> VHOST_BACKEND_TYPE_VDPA) {
>>>>> +        vhost_dev_reset(&net->dev);
>>>>> +    }
>>>>> +
>>>>>        e = k->set_guest_notifiers(qbus->parent, total_notifiers, 
>>>>> false);
>>>>>        if (e < 0) {
>>>>>            fprintf(stderr, "vhost guest notifier cleanup failed: 
>>>>> %d\n", e);
>>>>> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, 
>>>>> NetClientState *ncs,
>>>>>        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>>        NetClientState *peer;
>>>>> +    struct vhost_net *net = NULL;
>>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>>        int nvhosts = data_queue_pairs + cvq;
>>>>>        int i, r;
>>>>> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, 
>>>>> NetClientState *ncs,
>>>>>            } else {
>>>>>                peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>>>            }
>>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>>> +
>>>>> +        net = get_vhost_net(peer);
>>>>> +
>>>>> +        vhost_net_stop_one(net, dev);
>>>>> +    }
>>>>> +
>>>>> +    /* We only reset backend vdpa device */
>>>>> +    if (net && net->dev.vhost_ops->backend_type == 
>>>>> VHOST_BACKEND_TYPE_VDPA) {
>>>>> +        vhost_dev_reset(&net->dev);
>>>>>        }
>>>> So we've already reset the device in vhost_vdpa_dev_start(), any
>>>> reason we need to do it again here?
>>>>
>>>>>        r = k->set_guest_notifiers(qbus->parent, total_notifiers, 
>>>>> false);
>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>> index c5ed7a3..3ef0199 100644
>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct 
>>>>> vhost_dev *dev)
>>>>>
>>>>>        ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>>>>        trace_vhost_vdpa_reset_device(dev, status);
>>>>> +
>>>>> +    /* Add back this status, so that the device could work next 
>>>>> time*/
>>>>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>> +                               VIRTIO_CONFIG_S_DRIVER);
>>>> This seems to contradict the semantic of reset.
>>>>
>>>>> +
>>>>>        return ret;
>>>>>    }
>>>>>
>>>>> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct 
>>>>> vhost_dev *dev, int idx)
>>>>>        return idx;
>>>>>    }
>>>>>
>>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, 
>>>>> unsigned int ready)
>>>>>    {
>>>>>        int i;
>>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>>            struct vhost_vring_state state = {
>>>>>                .index = dev->vq_index + i,
>>>>> -            .num = 1,
>>>>> +            .num = ready,
>>>>>            };
>>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>>        }
>>>>> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct 
>>>>> vhost_dev *dev, bool started)
>>>>>            if (unlikely(!ok)) {
>>>>>                return -1;
>>>>>            }
>>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>>        } else {
>>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>>            ok = vhost_vdpa_svqs_stop(dev);
>>>>>            if (unlikely(!ok)) {
>>>>>                return -1;
>>>>> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct 
>>>>> vhost_dev *dev, bool started)
>>>>>            memory_listener_register(&v->listener, 
>>>>> &address_space_memory);
>>>>>            return vhost_vdpa_add_status(dev, 
>>>>> VIRTIO_CONFIG_S_DRIVER_OK);
>>>>>        } else {
>>>>> -        vhost_vdpa_reset_device(dev);
>>>>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>> - VIRTIO_CONFIG_S_DRIVER);
>>>>> memory_listener_unregister(&v->listener);
>>>>>
>>>>>            return 0;
>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>> index b643f42..7e0cdb6 100644
>>>>> --- a/hw/virtio/vhost.c
>>>>> +++ b/hw/virtio/vhost.c
>>>>> @@ -1820,7 +1820,6 @@ fail_features:
>>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>>>>    {
>>>>>        int i;
>>>>> -
>>>> Unnecessary changes.
>>>>
>>>>>        /* should only be called after backend is connected */
>>>>>        assert(hdev->vhost_ops);
>>>>>
>>>>> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev 
>>>>> *hdev,
>>>>>
>>>>>        return -ENOSYS;
>>>>>    }
>>>>> +
>>>>> +int vhost_dev_reset(struct vhost_dev *hdev)
>>>>> +{
>>>> Let's use a separate patch for this.
>>>>
>>>> Thanks
>>>>
>>>>> +    int ret = 0;
>>>>> +
>>>>> +    /* should only be called after backend is connected */
>>>>> +    assert(hdev->vhost_ops);
>>>>> +
>>>>> +    if (hdev->vhost_ops->vhost_reset_device) {
>>>>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>>>>> +    }
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>>>> index 58a73e7..b8b7c20 100644
>>>>> --- a/include/hw/virtio/vhost.h
>>>>> +++ b/include/hw/virtio/vhost.h
>>>>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, 
>>>>> void *opaque,
>>>>>    void vhost_dev_cleanup(struct vhost_dev *hdev);
>>>>>    int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>>>>    int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
>>>>> VirtIODevice *vdev);
>>>>>    void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
>>>>> VirtIODevice *vdev);
>>>>>
>>>>> -- 
>>>>> 1.8.3.1
>>>>>
>>>>>
>>>>>
>>>
>>
>>
>
>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps
  2022-04-07  7:35               ` Jason Wang
@ 2022-04-08  8:38                 ` Michael Qiu
  2022-04-08 17:17                   ` Si-Wei Liu
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Qiu @ 2022-04-08  8:38 UTC (permalink / raw)
  To: Jason Wang, qemu-devel
  Cc: Si-Wei Liu, Eugenio Perez Martin, Zhu, Lingshan, Cindy Lu



On 2022/4/7 15:35, Jason Wang wrote:
> 
> On 2022/4/2 1:14 PM, Michael Qiu wrote:
>>
>>
>> On 2022/4/2 10:38, Jason Wang wrote:
>>>
>>> On 2022/4/1 7:06 PM, Michael Qiu wrote:
>>>> Currently in the vhost framework, vhost_reset_device() is misnamed.
>>>> Actually, it should be vhost_reset_owner().
>>>>
>>>> In vhost-user it is made compatible with the reset-device op, but
>>>> the vhost kernel backend is not compatible with it; for vhost-vdpa,
>>>> it only implements the reset-device action.
>>>>
>>>> So we need to separate the function into vhost_reset_owner() and
>>>> vhost_reset_device(), so that different backends can use the
>>>> correct function.
>>>
>>>
>>> I see no reason why RESET_OWNER needs to be done for the kernel backend.
>>>
>>
>> In kernel vhost, RESET_OWNER indeed does a vhost device-level reset: 
>> vhost_net_reset_owner()
>>
>> static long vhost_net_reset_owner(struct vhost_net *n)
>> {
>> [...]
>>         err = vhost_dev_check_owner(&n->dev);
>>         if (err)
>>                 goto done;
>>         umem = vhost_dev_reset_owner_prepare();
>>         if (!umem) {
>>                 err = -ENOMEM;
>>                 goto done;
>>         }
>>         vhost_net_stop(n, &tx_sock, &rx_sock);
>>         vhost_net_flush(n);
>>         vhost_dev_stop(&n->dev);
>>         vhost_dev_reset_owner(&n->dev, umem);
>>         vhost_net_vq_reset(n);
>> [...]
>>
>> }
>>
>> In the history of QEMU, there is a commit:
>> commit d1f8b30ec8dde0318fd1b98d24a64926feae9625
>> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> Date:   Wed Sep 23 12:19:57 2015 +0800
>>
>>     vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE
>>
>>     Quote from Michael:
>>
>>         We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE.
>>
>> but finally, it has been reverted by the author:
>> commit 60915dc4691768c4dc62458bb3e16c843fab091d
>> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> Date:   Wed Nov 11 21:24:37 2015 +0800
>>
>>     vhost: rename RESET_DEVICE backto RESET_OWNER
>>
>>     This patch basically reverts commit d1f8b30e.
>>
>>     It turned out that it breaks stuff, so revert it:
>>
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html
>>
>> It seems the kernel takes RESET_OWNER as a reset, but QEMU never calls 
>> this function to do a reset.
> 
> 
> The question is, we have managed to survive without using RESET_OWNER 
> for the past 10 years. Any reason that we want to use it now?
> 
> Note that RESET_OWNER is only useful when the process wants to drop its 
> mm refcnt from vhost; it doesn't reset the device (e.g. it does not 
> even call vhost_vq_reset()).
> 
> (Especially since it was deprecated by the vhost-user protocol because 
> its semantics are ambiguous.)
> 
> 

So, you prefer to directly remove RESET_OWNER support now?

>>
>>> And if I understand the code correctly, vhost-user "abuses" 
>>> RESET_OWNER for reset. So the current code looks fine?
>>>
>>>
>>>>
>>>> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
>>>> ---
>>>>   hw/scsi/vhost-user-scsi.c         |  6 +++++-
>>>>   hw/virtio/vhost-backend.c         |  4 ++--
>>>>   hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
>>>>   include/hw/virtio/vhost-backend.h |  2 ++
>>>>   4 files changed, 27 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
>>>> index 1b2f7ee..f179626 100644
>>>> --- a/hw/scsi/vhost-user-scsi.c
>>>> +++ b/hw/scsi/vhost-user-scsi.c
>>>> @@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice 
>>>> *vdev)
>>>>           return;
>>>>       }
>>>> -    if (dev->vhost_ops->vhost_reset_device) {
>>>> +    if (virtio_has_feature(dev->protocol_features,
>>>> + VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
>>>> + dev->vhost_ops->vhost_reset_device) {
>>>>           dev->vhost_ops->vhost_reset_device(dev);
>>>> +    } else if (dev->vhost_ops->vhost_reset_owner) {
>>>> +        dev->vhost_ops->vhost_reset_owner(dev);
>>>
>>>
>>> Actually, I fail to understand why we need an indirection via 
>>> vhost_ops. It's guaranteed to be vhost_user_ops.
>>>
>>>
>>>>       }
>>>>   }
>>>> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
>>>> index e409a86..abbaa8b 100644
>>>> --- a/hw/virtio/vhost-backend.c
>>>> +++ b/hw/virtio/vhost-backend.c
>>>> @@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct 
>>>> vhost_dev *dev)
>>>>       return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
>>>>   }
>>>> -static int vhost_kernel_reset_device(struct vhost_dev *dev)
>>>> +static int vhost_kernel_reset_owner(struct vhost_dev *dev)
>>>>   {
>>>>       return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
>>>>   }
>>>> @@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
>>>>           .vhost_get_features = vhost_kernel_get_features,
>>>>           .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
>>>>           .vhost_set_owner = vhost_kernel_set_owner,
>>>> -        .vhost_reset_device = vhost_kernel_reset_device,
>>>> +        .vhost_reset_owner = vhost_kernel_reset_owner,
>>>
>>>
>>>> I think we can delete the current vhost_reset_device() since it is not 
>>>> used in any code path.
>>>
>>
>> I planned to use it for vDPA reset, 
> 
> 
> For vhost-vDPA it can call vhost_vdpa_reset_device() directly.
> 
> As I mentioned before, the only user of vhost_reset_device config ops is 
> vhost-user-scsi but it should directly call the vhost_user_reset_device().
> 


Yes, but in the next patch I reuse it to reset the backend device in vhost_net.


> Thanks
> 
> 
>>> and vhost-user-scsi also uses device reset.
>>
>> Thanks,
>> Michael
>>
>>> Thanks
>>>
>>>
>>>>           .vhost_get_vq_index = vhost_kernel_get_vq_index,
>>>>   #ifdef CONFIG_VHOST_VSOCK
>>>>           .vhost_vsock_set_guest_cid = 
>>>> vhost_kernel_vsock_set_guest_cid,
>>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>>> index 6abbc9d..4412008 100644
>>>> --- a/hw/virtio/vhost-user.c
>>>> +++ b/hw/virtio/vhost-user.c
>>>> @@ -1475,16 +1475,29 @@ static int 
>>>> vhost_user_get_max_memslots(struct vhost_dev *dev,
>>>>       return 0;
>>>>   }
>>>> +static int vhost_user_reset_owner(struct vhost_dev *dev)
>>>> +{
>>>> +    VhostUserMsg msg = {
>>>> +        .hdr.request = VHOST_USER_RESET_OWNER,
>>>> +        .hdr.flags = VHOST_USER_VERSION,
>>>> +    };
>>>> +
>>>> +    return vhost_user_write(dev, &msg, NULL, 0);
>>>> +}
>>>> +
>>>>   static int vhost_user_reset_device(struct vhost_dev *dev)
>>>>   {
>>>>       VhostUserMsg msg = {
>>>> +        .hdr.request = VHOST_USER_RESET_DEVICE,
>>>>           .hdr.flags = VHOST_USER_VERSION,
>>>>       };
>>>> -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
>>>> - VHOST_USER_PROTOCOL_F_RESET_DEVICE)
>>>> -        ? VHOST_USER_RESET_DEVICE
>>>> -        : VHOST_USER_RESET_OWNER;
>>>> +    /* Caller must ensure the backend has 
>>>> VHOST_USER_PROTOCOL_F_RESET_DEVICE
>>>> +     * support */
>>>> +    if (!virtio_has_feature(dev->protocol_features,
>>>> +                       VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
>>>> +        return -EPERM;
>>>> +    }
>>>>       return vhost_user_write(dev, &msg, NULL, 0);
>>>>   }
>>>> @@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
>>>>           .vhost_set_features = vhost_user_set_features,
>>>>           .vhost_get_features = vhost_user_get_features,
>>>>           .vhost_set_owner = vhost_user_set_owner,
>>>> +        .vhost_reset_owner = vhost_user_reset_owner,
>>>>           .vhost_reset_device = vhost_user_reset_device,
>>>>           .vhost_get_vq_index = vhost_user_get_vq_index,
>>>>           .vhost_set_vring_enable = vhost_user_set_vring_enable,
>>>> diff --git a/include/hw/virtio/vhost-backend.h 
>>>> b/include/hw/virtio/vhost-backend.h
>>>> index 81bf310..affeeb0 100644
>>>> --- a/include/hw/virtio/vhost-backend.h
>>>> +++ b/include/hw/virtio/vhost-backend.h
>>>> @@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct 
>>>> vhost_dev *dev,
>>>>                                        uint64_t *features);
>>>>   typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
>>>>   typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
>>>> +typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);
>>>>   typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
>>>>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
>>>>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
>>>> @@ -150,6 +151,7 @@ typedef struct VhostOps {
>>>>       vhost_get_features_op vhost_get_features;
>>>>       vhost_set_backend_cap_op vhost_set_backend_cap;
>>>>       vhost_set_owner_op vhost_set_owner;
>>>> +    vhost_reset_owner_op vhost_reset_owner;
>>>>       vhost_reset_device_op vhost_reset_device;
>>>>       vhost_get_vq_index_op vhost_get_vq_index;
>>>>       vhost_set_vring_enable_op vhost_set_vring_enable;
>>>
>>>
>>
> 
> 
> 



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps
  2022-04-08  8:38                 ` Michael Qiu
@ 2022-04-08 17:17                   ` Si-Wei Liu
  2022-04-11  8:51                     ` Jason Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Si-Wei Liu @ 2022-04-08 17:17 UTC (permalink / raw)
  To: Michael Qiu, Jason Wang, qemu-devel
  Cc: Eugenio Perez Martin, Zhu, Lingshan, Cindy Lu



On 4/8/2022 1:38 AM, Michael Qiu wrote:
>
>
> On 2022/4/7 15:35, Jason Wang wrote:
>>
>> On 2022/4/2 1:14 PM, Michael Qiu wrote:
>>>
>>>
>>> On 2022/4/2 10:38, Jason Wang wrote:
>>>>
>>>> On 2022/4/1 7:06 PM, Michael Qiu wrote:
>>>>> Currently in the vhost framework, vhost_reset_device() is misnamed.
>>>>> Actually, it should be vhost_reset_owner().
>>>>>
>>>>> In vhost-user it is made compatible with the reset-device op, but
>>>>> the vhost kernel backend is not compatible with it; for vhost-vdpa,
>>>>> it only implements the reset-device action.
>>>>>
>>>>> So we need to separate the function into vhost_reset_owner() and
>>>>> vhost_reset_device(), so that different backends can use the
>>>>> correct function.
>>>>
>>>>
>>>> I see no reason why RESET_OWNER needs to be done for the kernel backend.
>>>>
>>>
>>> In kernel vhost, RESET_OWNER indeed does a vhost device-level reset: 
>>> vhost_net_reset_owner()
>>>
>>> static long vhost_net_reset_owner(struct vhost_net *n)
>>> {
>>> [...]
>>>         err = vhost_dev_check_owner(&n->dev);
>>>         if (err)
>>>                 goto done;
>>>         umem = vhost_dev_reset_owner_prepare();
>>>         if (!umem) {
>>>                 err = -ENOMEM;
>>>                 goto done;
>>>         }
>>>         vhost_net_stop(n, &tx_sock, &rx_sock);
>>>         vhost_net_flush(n);
>>>         vhost_dev_stop(&n->dev);
>>>         vhost_dev_reset_owner(&n->dev, umem);
>>>         vhost_net_vq_reset(n);
>>> [...]
>>>
>>> }
>>>
>>> In the history of QEMU, there is a commit:
>>> commit d1f8b30ec8dde0318fd1b98d24a64926feae9625
>>> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>> Date:   Wed Sep 23 12:19:57 2015 +0800
>>>
>>>     vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE
>>>
>>>     Quote from Michael:
>>>
>>>         We really should rename VHOST_RESET_OWNER to 
>>> VHOST_RESET_DEVICE.
>>>
>>> but finally, it has been reverted by the author:
>>> commit 60915dc4691768c4dc62458bb3e16c843fab091d
>>> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>> Date:   Wed Nov 11 21:24:37 2015 +0800
>>>
>>>     vhost: rename RESET_DEVICE backto RESET_OWNER
>>>
>>>     This patch basically reverts commit d1f8b30e.
>>>
>>>     It turned out that it breaks stuff, so revert it:
>>>
>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html
>>>
>>> It seems the kernel takes RESET_OWNER as a reset, but QEMU never calls 
>>> this function to do a reset.
>>
>>
>> The question is, we have managed to survive without using RESET_OWNER 
>> for the past 10 years. Any reason that we want to use it now?
>>
>> Note that RESET_OWNER is only useful when the process wants to drop its 
>> mm refcnt from vhost; it doesn't reset the device (e.g. it does 
>> not even call vhost_vq_reset()).
>>
>> (Especially since it was deprecated by the vhost-user protocol because 
>> its semantics are ambiguous.)
>>
>>
>
> So, you prefer to directly remove RESET_OWNER support now?
>
>>>
>>>> And if I understand the code correctly, vhost-user "abuses" 
>>>> RESET_OWNER for reset. So the current code looks fine?
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
>>>>> ---
>>>>>   hw/scsi/vhost-user-scsi.c         |  6 +++++-
>>>>>   hw/virtio/vhost-backend.c         |  4 ++--
>>>>>   hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
>>>>>   include/hw/virtio/vhost-backend.h |  2 ++
>>>>>   4 files changed, 27 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
>>>>> index 1b2f7ee..f179626 100644
>>>>> --- a/hw/scsi/vhost-user-scsi.c
>>>>> +++ b/hw/scsi/vhost-user-scsi.c
>>>>> @@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice 
>>>>> *vdev)
>>>>>           return;
>>>>>       }
>>>>> -    if (dev->vhost_ops->vhost_reset_device) {
>>>>> +    if (virtio_has_feature(dev->protocol_features,
>>>>> + VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
>>>>> + dev->vhost_ops->vhost_reset_device) {
>>>>>           dev->vhost_ops->vhost_reset_device(dev);
>>>>> +    } else if (dev->vhost_ops->vhost_reset_owner) {
>>>>> +        dev->vhost_ops->vhost_reset_owner(dev);
>>>>
>>>>
>>>> Actually, I fail to understand why we need an indirection via 
>>>> vhost_ops. It's guaranteed to be vhost_user_ops.
>>>>
>>>>
>>>>>       }
>>>>>   }
>>>>> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
>>>>> index e409a86..abbaa8b 100644
>>>>> --- a/hw/virtio/vhost-backend.c
>>>>> +++ b/hw/virtio/vhost-backend.c
>>>>> @@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct 
>>>>> vhost_dev *dev)
>>>>>       return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
>>>>>   }
>>>>> -static int vhost_kernel_reset_device(struct vhost_dev *dev)
>>>>> +static int vhost_kernel_reset_owner(struct vhost_dev *dev)
>>>>>   {
>>>>>       return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
>>>>>   }
>>>>> @@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
>>>>>           .vhost_get_features = vhost_kernel_get_features,
>>>>>           .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
>>>>>           .vhost_set_owner = vhost_kernel_set_owner,
>>>>> -        .vhost_reset_device = vhost_kernel_reset_device,
>>>>> +        .vhost_reset_owner = vhost_kernel_reset_owner,
>>>>
>>>>
>>>> I think we can delete the current vhost_reset_device() since it is not 
>>>> used in any code path.
>>>>
>>>
>>> I planned to use it for vDPA reset, 
>>
>>
>> For vhost-vDPA it can call vhost_vdpa_reset_device() directly.
>>
>> As I mentioned before, the only user of vhost_reset_device config ops 
>> is vhost-user-scsi but it should directly call the 
>> vhost_user_reset_device().
>>
>
>
> Yes, but in the next patch I reuse it to reset the backend device in 
> vhost_net.
I recall vhost-user has a different way to reset the net backend, so 
probably we can leave out implementing the .vhost_reset_device() op for 
vhost-user as Jason suggested. In that case vhost-user-scsi will call 
into vhost_user_reset_device() directly without using the 
.vhost_reset_device() op.

-Siwei

>
>
>> Thanks
>>
>>
>>> and vhost-user-scsi also use device reset.
>>>
>>> Thanks,
>>> Michael
>>>
>>>> Thanks
>>>>
>>>>
>>>>>           .vhost_get_vq_index = vhost_kernel_get_vq_index,
>>>>>   #ifdef CONFIG_VHOST_VSOCK
>>>>>           .vhost_vsock_set_guest_cid = 
>>>>> vhost_kernel_vsock_set_guest_cid,
>>>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>>>> index 6abbc9d..4412008 100644
>>>>> --- a/hw/virtio/vhost-user.c
>>>>> +++ b/hw/virtio/vhost-user.c
>>>>> @@ -1475,16 +1475,29 @@ static int 
>>>>> vhost_user_get_max_memslots(struct vhost_dev *dev,
>>>>>       return 0;
>>>>>   }
>>>>> +static int vhost_user_reset_owner(struct vhost_dev *dev)
>>>>> +{
>>>>> +    VhostUserMsg msg = {
>>>>> +        .hdr.request = VHOST_USER_RESET_OWNER,
>>>>> +        .hdr.flags = VHOST_USER_VERSION,
>>>>> +    };
>>>>> +
>>>>> +    return vhost_user_write(dev, &msg, NULL, 0);
>>>>> +}
>>>>> +
>>>>>   static int vhost_user_reset_device(struct vhost_dev *dev)
>>>>>   {
>>>>>       VhostUserMsg msg = {
>>>>> +        .hdr.request = VHOST_USER_RESET_DEVICE,
>>>>>           .hdr.flags = VHOST_USER_VERSION,
>>>>>       };
>>>>> -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
>>>>> - VHOST_USER_PROTOCOL_F_RESET_DEVICE)
>>>>> -        ? VHOST_USER_RESET_DEVICE
>>>>> -        : VHOST_USER_RESET_OWNER;
>>>>> +    /* Caller must ensure the backend has 
>>>>> VHOST_USER_PROTOCOL_F_RESET_DEVICE
>>>>> +     * support */
>>>>> +    if (!virtio_has_feature(dev->protocol_features,
>>>>> + VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
>>>>> +        return -EPERM;
>>>>> +    }
>>>>>       return vhost_user_write(dev, &msg, NULL, 0);
>>>>>   }
>>>>> @@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
>>>>>           .vhost_set_features = vhost_user_set_features,
>>>>>           .vhost_get_features = vhost_user_get_features,
>>>>>           .vhost_set_owner = vhost_user_set_owner,
>>>>> +        .vhost_reset_owner = vhost_user_reset_owner,
>>>>>           .vhost_reset_device = vhost_user_reset_device,
>>>>>           .vhost_get_vq_index = vhost_user_get_vq_index,
>>>>>           .vhost_set_vring_enable = vhost_user_set_vring_enable,
>>>>> diff --git a/include/hw/virtio/vhost-backend.h 
>>>>> b/include/hw/virtio/vhost-backend.h
>>>>> index 81bf310..affeeb0 100644
>>>>> --- a/include/hw/virtio/vhost-backend.h
>>>>> +++ b/include/hw/virtio/vhost-backend.h
>>>>> @@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct 
>>>>> vhost_dev *dev,
>>>>>                                        uint64_t *features);
>>>>>   typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
>>>>>   typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
>>>>> +typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);
>>>>>   typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
>>>>>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int 
>>>>> idx);
>>>>>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
>>>>> @@ -150,6 +151,7 @@ typedef struct VhostOps {
>>>>>       vhost_get_features_op vhost_get_features;
>>>>>       vhost_set_backend_cap_op vhost_set_backend_cap;
>>>>>       vhost_set_owner_op vhost_set_owner;
>>>>> +    vhost_reset_owner_op vhost_reset_owner;
>>>>>       vhost_reset_device_op vhost_reset_device;
>>>>>       vhost_get_vq_index_op vhost_get_vq_index;
>>>>>       vhost_set_vring_enable_op vhost_set_vring_enable;
>>>>
>>>>
>>>
>>
>>
>>
>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps
  2022-04-08 17:17                   ` Si-Wei Liu
@ 2022-04-11  8:51                     ` Jason Wang
  0 siblings, 0 replies; 47+ messages in thread
From: Jason Wang @ 2022-04-11  8:51 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Eugenio Perez Martin, Michael Qiu, Cindy Lu, qemu-devel, Zhu, Lingshan

On Sat, Apr 9, 2022 at 1:17 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 4/8/2022 1:38 AM, Michael Qiu wrote:
> >
> >
> > On 2022/4/7 15:35, Jason Wang wrote:
> >>
> >> On 2022/4/2 1:14 PM, Michael Qiu wrote:
> >>>
> >>>
> >>> On 2022/4/2 10:38, Jason Wang wrote:
> >>>>
> >>>> On 2022/4/1 7:06 PM, Michael Qiu wrote:
> >>>>> Currently in the vhost framework, vhost_reset_device() is misnamed.
> >>>>> Actually, it should be vhost_reset_owner().
> >>>>>
> >>>>> In vhost-user it is made compatible with the reset-device op, but
> >>>>> the vhost kernel backend is not compatible with it; for vhost-vdpa,
> >>>>> it only implements the reset-device action.
> >>>>>
> >>>>> So we need to separate the function into vhost_reset_owner() and
> >>>>> vhost_reset_device(), so that different backends can use the
> >>>>> correct function.
> >>>>
> >>>>
> >>>> I see no reason why RESET_OWNER needs to be done for the kernel backend.
> >>>>
> >>>
> >>> In kernel vhost, RESET_OWNER indeed does a vhost device-level reset:
> >>> vhost_net_reset_owner()
> >>>
> >>> static long vhost_net_reset_owner(struct vhost_net *n)
> >>> {
> >>> [...]
> >>>         err = vhost_dev_check_owner(&n->dev);
> >>>         if (err)
> >>>                 goto done;
> >>>         umem = vhost_dev_reset_owner_prepare();
> >>>         if (!umem) {
> >>>                 err = -ENOMEM;
> >>>                 goto done;
> >>>         }
> >>>         vhost_net_stop(n, &tx_sock, &rx_sock);
> >>>         vhost_net_flush(n);
> >>>         vhost_dev_stop(&n->dev);
> >>>         vhost_dev_reset_owner(&n->dev, umem);
> >>>         vhost_net_vq_reset(n);
> >>> [...]
> >>>
> >>> }
> >>>
> >>> In the history of QEMU, there is a commit:
> >>> commit d1f8b30ec8dde0318fd1b98d24a64926feae9625
> >>> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>> Date:   Wed Sep 23 12:19:57 2015 +0800
> >>>
> >>>     vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE
> >>>
> >>>     Quote from Michael:
> >>>
> >>>         We really should rename VHOST_RESET_OWNER to
> >>> VHOST_RESET_DEVICE.
> >>>
> >>> but finally, it has been reverted by the author:
> >>> commit 60915dc4691768c4dc62458bb3e16c843fab091d
> >>> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>> Date:   Wed Nov 11 21:24:37 2015 +0800
> >>>
> >>>     vhost: rename RESET_DEVICE backto RESET_OWNER
> >>>
> >>>     This patch basically reverts commit d1f8b30e.
> >>>
> >>>     It turned out that it breaks stuff, so revert it:
> >>>
> >>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html
> >>>
> >>> It seems the kernel treats RESET_OWNER as a reset, but QEMU never
> >>> calls this function to do a reset.
> >>
> >>
> >> The question is, we have managed to survive without using RESET_OWNER
> >> for the past 10 years. Any reason to start using it now?
> >>
> >> Note that RESET_OWNER is only useful when the process wants to drop
> >> its mm refcnt from vhost; it doesn't reset the device (e.g. it does
> >> not even call vhost_vq_reset()).
> >>
> >> (Especially, it was deprecated by the vhost-user protocol, since
> >> its semantics are ambiguous.)
> >>
> >>
> >
> > So, you prefer to directly remove RESET_OWNER support now?
> >
> >>>
> >>>> And if I understand the code correctly, vhost-user "abuses"
> >>>> RESET_OWNER for reset. So the current code looks fine?
> >>>>
> >>>>
> >>>>>
> >>>>> Signed-off-by: Michael Qiu <qiudayu@archeros.com>
> >>>>> ---
> >>>>>   hw/scsi/vhost-user-scsi.c         |  6 +++++-
> >>>>>   hw/virtio/vhost-backend.c         |  4 ++--
> >>>>>   hw/virtio/vhost-user.c            | 22 ++++++++++++++++++----
> >>>>>   include/hw/virtio/vhost-backend.h |  2 ++
> >>>>>   4 files changed, 27 insertions(+), 7 deletions(-)
> >>>>>
> >>>>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> >>>>> index 1b2f7ee..f179626 100644
> >>>>> --- a/hw/scsi/vhost-user-scsi.c
> >>>>> +++ b/hw/scsi/vhost-user-scsi.c
> >>>>> @@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
> >>>>>           return;
> >>>>>       }
> >>>>> -    if (dev->vhost_ops->vhost_reset_device) {
> >>>>> +    if (virtio_has_feature(dev->protocol_features,
> >>>>> +                           VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
> >>>>> +        dev->vhost_ops->vhost_reset_device) {
> >>>>>           dev->vhost_ops->vhost_reset_device(dev);
> >>>>> +    } else if (dev->vhost_ops->vhost_reset_owner) {
> >>>>> +        dev->vhost_ops->vhost_reset_owner(dev);
> >>>>
> >>>>
> >>>> Actually, I fail to understand why we need an indirection via
> >>>> vhost_ops. It's guaranteed to be vhost_user_ops.
> >>>>
> >>>>
> >>>>>       }
> >>>>>   }
> >>>>> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> >>>>> index e409a86..abbaa8b 100644
> >>>>> --- a/hw/virtio/vhost-backend.c
> >>>>> +++ b/hw/virtio/vhost-backend.c
> >>>>> @@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
> >>>>>       return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
> >>>>>   }
> >>>>> -static int vhost_kernel_reset_device(struct vhost_dev *dev)
> >>>>> +static int vhost_kernel_reset_owner(struct vhost_dev *dev)
> >>>>>   {
> >>>>>       return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
> >>>>>   }
> >>>>> @@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
> >>>>>           .vhost_get_features = vhost_kernel_get_features,
> >>>>>           .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
> >>>>>           .vhost_set_owner = vhost_kernel_set_owner,
> >>>>> -        .vhost_reset_device = vhost_kernel_reset_device,
> >>>>> +        .vhost_reset_owner = vhost_kernel_reset_owner,
> >>>>
> >>>>
> >>>> I think we can delete the current vhost_reset_device() since it is
> >>>> not used in any code path.
> >>>>
> >>>
> >>> I planned to use it for vDPA reset,
> >>
> >>
> >> For vhost-vDPA it can call vhost_vdpa_reset_device() directly.
> >>
> >> As I mentioned before, the only user of the vhost_reset_device config
> >> op is vhost-user-scsi, but it should call vhost_user_reset_device()
> >> directly.
> >>
> >
> >
> > Yes, but in the next patch I reuse it to reset the backend device in
> > vhost_net.
> I recall vhost-user has a different way to reset the net backend,

Yes, it has VHOST_USER_RESET_DEVICE.

Thanks

> so
> probably we can leave out implementing the .vhost_reset_device() op for
> vhost-user as Jason suggested. In that case vhost-user-scsi will call
> into vhost_user_reset_device() directly without using the
> .vhost_reset_device() op.
>
> -Siwei
>
> >
> >
> >> Thanks
> >>
> >>
> >>> and vhost-user-scsi also use device reset.
> >>>
> >>> Thanks,
> >>> Michael
> >>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>>           .vhost_get_vq_index = vhost_kernel_get_vq_index,
> >>>>>   #ifdef CONFIG_VHOST_VSOCK
> >>>>>           .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
> >>>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> >>>>> index 6abbc9d..4412008 100644
> >>>>> --- a/hw/virtio/vhost-user.c
> >>>>> +++ b/hw/virtio/vhost-user.c
> >>>>> @@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct vhost_dev *dev,
> >>>>>       return 0;
> >>>>>   }
> >>>>> +static int vhost_user_reset_owner(struct vhost_dev *dev)
> >>>>> +{
> >>>>> +    VhostUserMsg msg = {
> >>>>> +        .hdr.request = VHOST_USER_RESET_OWNER,
> >>>>> +        .hdr.flags = VHOST_USER_VERSION,
> >>>>> +    };
> >>>>> +
> >>>>> +    return vhost_user_write(dev, &msg, NULL, 0);
> >>>>> +}
> >>>>> +
> >>>>>   static int vhost_user_reset_device(struct vhost_dev *dev)
> >>>>>   {
> >>>>>       VhostUserMsg msg = {
> >>>>> +        .hdr.request = VHOST_USER_RESET_DEVICE,
> >>>>>           .hdr.flags = VHOST_USER_VERSION,
> >>>>>       };
> >>>>> -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
> >>>>> -                                         VHOST_USER_PROTOCOL_F_RESET_DEVICE)
> >>>>> -        ? VHOST_USER_RESET_DEVICE
> >>>>> -        : VHOST_USER_RESET_OWNER;
> >>>>> +    /* Caller must ensure the backend has
> >>>>> +     * VHOST_USER_PROTOCOL_F_RESET_DEVICE support */
> >>>>> +    if (!virtio_has_feature(dev->protocol_features,
> >>>>> +                            VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
> >>>>> +        return -EPERM;
> >>>>> +    }
> >>>>>       return vhost_user_write(dev, &msg, NULL, 0);
> >>>>>   }
> >>>>> @@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
> >>>>>           .vhost_set_features = vhost_user_set_features,
> >>>>>           .vhost_get_features = vhost_user_get_features,
> >>>>>           .vhost_set_owner = vhost_user_set_owner,
> >>>>> +        .vhost_reset_owner = vhost_user_reset_owner,
> >>>>>           .vhost_reset_device = vhost_user_reset_device,
> >>>>>           .vhost_get_vq_index = vhost_user_get_vq_index,
> >>>>>           .vhost_set_vring_enable = vhost_user_set_vring_enable,
> >>>>> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> >>>>> index 81bf310..affeeb0 100644
> >>>>> --- a/include/hw/virtio/vhost-backend.h
> >>>>> +++ b/include/hw/virtio/vhost-backend.h
> >>>>> @@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
> >>>>>                                        uint64_t *features);
> >>>>>   typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
> >>>>>   typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
> >>>>> +typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);
> >>>>>   typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
> >>>>>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
> >>>>>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
> >>>>> @@ -150,6 +151,7 @@ typedef struct VhostOps {
> >>>>>       vhost_get_features_op vhost_get_features;
> >>>>>       vhost_set_backend_cap_op vhost_set_backend_cap;
> >>>>>       vhost_set_owner_op vhost_set_owner;
> >>>>> +    vhost_reset_owner_op vhost_reset_owner;
> >>>>>       vhost_reset_device_op vhost_reset_device;
> >>>>>       vhost_get_vq_index_op vhost_get_vq_index;
> >>>>>       vhost_set_vring_enable_op vhost_set_vring_enable;
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >
>
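[Editor's note: the fallback dispatch the thread debates — the pre-patch vhost_user_reset_device() choosing between RESET_DEVICE and RESET_OWNER based on a negotiated protocol feature — can be sketched as a standalone C fragment. The function name pick_reset_request() is illustrative, not QEMU's; the numeric values mirror the vhost-user specification (request codes 4 and 34, protocol feature bit 13).]

```c
#include <stdint.h>

/* Values mirroring the vhost-user specification. */
enum {
    VHOST_USER_RESET_OWNER  = 4,
    VHOST_USER_RESET_DEVICE = 34,
};
#define VHOST_USER_PROTOCOL_F_RESET_DEVICE 13

/* Sketch of the pre-patch fallback: send RESET_DEVICE only when the
 * backend negotiated the protocol feature, otherwise fall back to the
 * semantically ambiguous RESET_OWNER.  The patch under discussion
 * removes this fallback and makes an unsupported RESET_DEVICE an error. */
int pick_reset_request(uint64_t protocol_features)
{
    if (protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
        return VHOST_USER_RESET_DEVICE;
    }
    return VHOST_USER_RESET_OWNER;
}
```

With the split into separate .vhost_reset_owner and .vhost_reset_device ops, each caller picks one op explicitly instead of relying on this silent fallback.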



^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2022-04-11  8:53 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-23  8:42 [PATCH] vdpa: Avoid reset when stop device 08005325
2022-03-23  9:20 ` Jason Wang
2022-03-25  6:32   ` Si-Wei Liu
     [not found]     ` <fe13304f-0a18-639e-580d-ce6eb7daecab@archeros.com>
2022-03-25 19:19       ` Si-Wei Liu
     [not found]     ` <6fbf82a9-39ce-f179-5e4b-384123ca542c@archeros.com>
2022-03-25 19:59       ` Si-Wei Liu
2022-03-30  8:52         ` Jason Wang
2022-03-30  9:53           ` Michael Qiu
2022-03-30 10:02 ` [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device 08005325
2022-03-30 10:52   ` Michael S. Tsirkin
2022-03-31  1:39     ` Michael Qiu
2022-03-31  0:15   ` Si-Wei Liu
2022-03-31  4:01     ` Michael Qiu
2022-03-31  4:02     ` Michael Qiu
2022-03-31  5:19   ` [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop() 08005325
2022-03-31  8:55     ` Jason Wang
2022-03-31  9:12       ` Maxime Coquelin
2022-03-31  9:22         ` Michael Qiu
2022-04-01  2:55         ` Jason Wang
2022-03-31  9:25   ` [PATCH RESEND " qiudayu
2022-03-31 10:19     ` Michael Qiu
     [not found]     ` <6245804d.1c69fb81.3c35c.d7efSMTPIN_ADDED_BROKEN@mx.google.com>
2022-03-31 20:32       ` Michael S. Tsirkin
2022-04-01  1:12     ` Si-Wei Liu
2022-04-01  1:45       ` Michael Qiu
2022-04-01  1:31     ` [PATCH v4] " Michael Qiu
2022-04-01  2:53       ` Jason Wang
2022-04-01  3:20         ` Michael Qiu
2022-04-01 23:07         ` Si-Wei Liu
2022-04-02  2:20           ` Jason Wang
2022-04-02  3:53             ` Michael Qiu
2022-04-06  0:56             ` Si-Wei Liu
2022-04-07  7:50               ` Jason Wang
     [not found]             ` <6247c8f5.1c69fb81.848e0.8b49SMTPIN_ADDED_BROKEN@mx.google.com>
2022-04-07  7:52               ` Jason Wang
     [not found]         ` <62466fff.1c69fb81.8817a.d813SMTPIN_ADDED_BROKEN@mx.google.com>
2022-04-02  1:48           ` Jason Wang
2022-04-02  3:43             ` Michael Qiu
2022-04-01 11:06       ` [PATCH 0/3] Refactor vhost device reset Michael Qiu
2022-04-01 11:06         ` [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps Michael Qiu
2022-04-02  0:44           ` Si-Wei Liu
2022-04-02  2:08             ` Michael Qiu
2022-04-02  2:38           ` Jason Wang
2022-04-02  5:14             ` Michael Qiu
     [not found]             ` <6247dc22.1c69fb81.4244.a88bSMTPIN_ADDED_BROKEN@mx.google.com>
2022-04-07  7:35               ` Jason Wang
2022-04-08  8:38                 ` Michael Qiu
2022-04-08 17:17                   ` Si-Wei Liu
2022-04-11  8:51                     ` Jason Wang
2022-04-01 11:06         ` [PATCH 2/3] vhost: add vhost_dev_reset() Michael Qiu
2022-04-02  0:48           ` Si-Wei Liu
2022-04-01 11:06         ` [PATCH 3/3 v5] vdpa: reset the backend device in the end of vhost_net_stop() Michael Qiu
