All of lore.kernel.org
 help / color / mirror / Atom feed
* virtio: why no full reset on virtio_set_status 0 ?
@ 2022-07-27 10:51 Claudio Fontana
  2022-07-27 15:32 ` Michael S. Tsirkin
  2022-07-27 16:17 ` Michael S. Tsirkin
  0 siblings, 2 replies; 18+ messages in thread
From: Claudio Fontana @ 2022-07-27 10:51 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, Alex Bennée

Hi Michael and all,

I have started researching a qemu / ovs / dpdk bug:

https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/

that seems to be affecting multiple parties in the telco space,

and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
in virtio_set_status, when receiving a status value of 0.

It seems it has always been this way, so I am clearly missing / forgetting something basic,

I checked the virtio spec at https://docs.oasis-open.org/

and from:

"
4.1.4.3 Common configuration structure layout

device_status
The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.

"

and

"
2.4.1 Device Requirements: Device Reset
A device MUST reinitialize device status to 0 after receiving a reset.
"

I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.

Instead, we have just the check:

    if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
        (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
        virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
    }

which just sets the started field,

and then we have the call to the virtio device class set_status (virtio_net...),
but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:

"
    vdev->start_on_kick = false;
    vdev->started = false;
    vdev->broken = false;
    vdev->guest_features = 0;
    vdev->queue_sel = 0;
    vdev->status = 0;
    vdev->disabled = false;
    qatomic_set(&vdev->isr, 0);
    vdev->config_vector = VIRTIO_NO_VECTOR;
    virtio_notify_vector(vdev, vdev->config_vector);

    for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
        ... initialize vdev->vq[i] ...
    }
"

Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
but what am I missing here?

Thanks,

Claudio

-- 
Claudio Fontana
Engineering Manager Virtualization, SUSE Labs Core

SUSE Software Solutions Italy Srl


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-27 10:51 virtio: why no full reset on virtio_set_status 0 ? Claudio Fontana
@ 2022-07-27 15:32 ` Michael S. Tsirkin
  2022-07-28  1:27   ` Jason Wang
  2022-07-27 16:17 ` Michael S. Tsirkin
  1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2022-07-27 15:32 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: qemu-devel, Alex Bennée, Marcel Apfelbaum, Jason Wang

On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
> Hi Michael and all,
> 
> I have started researching a qemu / ovs / dpdk bug:
> 
> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
> 
> that seems to be affecting multiple parties in the telco space,
> 
> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
> in virtio_set_status, when receiving a status value of 0.
> 
> It seems it has always been this way, so I am clearly missing / forgetting something basic,
> 
> I checked the virtio spec at https://docs.oasis-open.org/
> 
> and from:
> 
> "
> 4.1.4.3 Common configuration structure layout
> 
> device_status
> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
> 
> "
> 
> and
> 
> "
> 2.4.1 Device Requirements: Device Reset
> A device MUST reinitialize device status to 0 after receiving a reset.
> "
> 
> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
> 
> Instead, we have just the check:
> 
>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
>     }
> 
> which just sets the started field,
> 
> and then we have the call to the virtio device class set_status (virtio_net...),
> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
> 
> "
>     vdev->start_on_kick = false;
>     vdev->started = false;
>     vdev->broken = false;
>     vdev->guest_features = 0;
>     vdev->queue_sel = 0;
>     vdev->status = 0;
>     vdev->disabled = false;
>     qatomic_set(&vdev->isr, 0);
>     vdev->config_vector = VIRTIO_NO_VECTOR;
>     virtio_notify_vector(vdev, vdev->config_vector);
> 
>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>         ... initialize vdev->vq[i] ...
>     }
> "
> 
> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
> but what am I missing here?
> 
> Thanks,
> 
> Claudio
> 
> -- 
> Claudio Fontana
> Engineering Manager Virtualization, SUSE Labs Core
> 
> SUSE Software Solutions Italy Srl


So for example for pci:

    case VIRTIO_PCI_STATUS:


	....

        if (vdev->status == 0) {
            virtio_pci_reset(DEVICE(proxy));
        }

which I suspect is a bug because:

static void virtio_pci_reset(DeviceState *qdev)
{
    VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
    PCIDevice *dev = PCI_DEVICE(qdev);
    int i;

    virtio_bus_reset(bus);
    msix_unuse_all_vectors(&proxy->pci_dev);

    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
        proxy->vqs[i].enabled = 0;
        proxy->vqs[i].num = 0;
        proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
        proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
        proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
    }


so far so good

    if (pci_is_express(dev)) {
        pcie_cap_deverr_reset(dev);
        pcie_cap_lnkctl_reset(dev);

        pci_set_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL, 0);
    }

this part is wrong I think, it got here by mistake since the same
function is used for bus level reset.

Jason, Marcel, any input?

-- 
MST



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-27 10:51 virtio: why no full reset on virtio_set_status 0 ? Claudio Fontana
  2022-07-27 15:32 ` Michael S. Tsirkin
@ 2022-07-27 16:17 ` Michael S. Tsirkin
  1 sibling, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2022-07-27 16:17 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: qemu-devel, Alex Bennée

On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
> Hi Michael and all,
> 
> I have started researching a qemu / ovs / dpdk bug:
> 
> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
> 
> that seems to be affecting multiple parties in the telco space,
> 
> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
> in virtio_set_status, when receiving a status value of 0.
> 
> It seems it has always been this way, so I am clearly missing / forgetting something basic,
> 
> I checked the virtio spec at https://docs.oasis-open.org/
> 
> and from:
> 
> "
> 4.1.4.3 Common configuration structure layout
> 
> device_status
> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
> 
> "
> 
> and
> 
> "
> 2.4.1 Device Requirements: Device Reset
> A device MUST reinitialize device status to 0 after receiving a reset.
> "
> 
> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
> 
> Instead, we have just the check:
> 
>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
>     }
> 
> which just sets the started field,
> 
> and then we have the call to the virtio device class set_status (virtio_net...),
> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
> 
> "
>     vdev->start_on_kick = false;
>     vdev->started = false;
>     vdev->broken = false;
>     vdev->guest_features = 0;
>     vdev->queue_sel = 0;
>     vdev->status = 0;
>     vdev->disabled = false;
>     qatomic_set(&vdev->isr, 0);
>     vdev->config_vector = VIRTIO_NO_VECTOR;
>     virtio_notify_vector(vdev, vdev->config_vector);
> 
>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>         ... initialize vdev->vq[i] ...
>     }
> "
> 
> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
> but what am I missing here?
> 
> Thanks,
> 
> Claudio

I went looking at this code and found a bug, so
I just sent a patch fixing it:
https://lore.kernel.org/r/20220727161445.21428-1-mst%40redhat.com

no idea whether it's related but can't hurt to try.


> -- 
> Claudio Fontana
> Engineering Manager Virtualization, SUSE Labs Core
> 
> SUSE Software Solutions Italy Srl



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-27 15:32 ` Michael S. Tsirkin
@ 2022-07-28  1:27   ` Jason Wang
  2022-07-28  7:16     ` Claudio Fontana
  2022-07-28  7:43     ` Claudio Fontana
  0 siblings, 2 replies; 18+ messages in thread
From: Jason Wang @ 2022-07-28  1:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Claudio Fontana, qemu-devel, Alex Bennée, Marcel Apfelbaum

On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
> > Hi Michael and all,
> >
> > I have started researching a qemu / ovs / dpdk bug:
> >
> > https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
> >
> > that seems to be affecting multiple parties in the telco space,
> >
> > and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
> > in virtio_set_status, when receiving a status value of 0.
> >
> > It seems it has always been this way, so I am clearly missing / forgetting something basic,
> >
> > I checked the virtio spec at https://docs.oasis-open.org/
> >
> > and from:
> >
> > "
> > 4.1.4.3 Common configuration structure layout
> >
> > device_status
> > The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
> >
> > "
> >
> > and
> >
> > "
> > 2.4.1 Device Requirements: Device Reset
> > A device MUST reinitialize device status to 0 after receiving a reset.
> > "
> >
> > I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
> >
> > Instead, we have just the check:
> >
> >     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
> >         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> >         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
> >     }
> >
> > which just sets the started field,
> >
> > and then we have the call to the virtio device class set_status (virtio_net...),
> > but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
> >
> > "
> >     vdev->start_on_kick = false;
> >     vdev->started = false;
> >     vdev->broken = false;
> >     vdev->guest_features = 0;
> >     vdev->queue_sel = 0;
> >     vdev->status = 0;
> >     vdev->disabled = false;
> >     qatomic_set(&vdev->isr, 0);
> >     vdev->config_vector = VIRTIO_NO_VECTOR;
> >     virtio_notify_vector(vdev, vdev->config_vector);
> >
> >     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> >         ... initialize vdev->vq[i] ...
> >     }
> > "
> >
> > Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
> > but what am I missing here?
> >
> > Thanks,
> >
> > Claudio
> >
> > --
> > Claudio Fontana
> > Engineering Manager Virtualization, SUSE Labs Core
> >
> > SUSE Software Solutions Italy Srl
>
>
> So for example for pci:
>
>     case VIRTIO_PCI_STATUS:
>
>
>         ....
>
>         if (vdev->status == 0) {
>             virtio_pci_reset(DEVICE(proxy));
>         }
>
> which I suspect is a bug because:
>
> static void virtio_pci_reset(DeviceState *qdev)
> {
>     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>     VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>     PCIDevice *dev = PCI_DEVICE(qdev);
>     int i;
>
>     virtio_bus_reset(bus);

Note that we do virtio_reset() here.

>     msix_unuse_all_vectors(&proxy->pci_dev);
>
>     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>         proxy->vqs[i].enabled = 0;
>         proxy->vqs[i].num = 0;
>         proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
>         proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
>         proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
>     }
>
>
> so far so good
>
>     if (pci_is_express(dev)) {
>         pcie_cap_deverr_reset(dev);
>         pcie_cap_lnkctl_reset(dev);
>
>         pci_set_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL, 0);
>     }
>
> this part is wrong I think, it got here by mistake since the same
> function is used for bus level reset.
>
> Jason, Marcel, any input?

Yes, I think we don't need PCI stuff here. We do virtio reset not pci.

Thanks

>
> --
> MST
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-28  1:27   ` Jason Wang
@ 2022-07-28  7:16     ` Claudio Fontana
  2022-07-28  7:43     ` Claudio Fontana
  1 sibling, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2022-07-28  7:16 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: qemu-devel, Alex Bennée, Marcel Apfelbaum

On 7/28/22 03:27, Jason Wang wrote:
> On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
>>> Hi Michael and all,
>>>
>>> I have started researching a qemu / ovs / dpdk bug:
>>>
>>> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
>>>
>>> that seems to be affecting multiple parties in the telco space,
>>>
>>> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
>>> in virtio_set_status, when receiving a status value of 0.
>>>
>>> It seems it has always been this way, so I am clearly missing / forgetting something basic,
>>>
>>> I checked the virtio spec at https://docs.oasis-open.org/
>>>
>>> and from:
>>>
>>> "
>>> 4.1.4.3 Common configuration structure layout
>>>
>>> device_status
>>> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
>>>
>>> "
>>>
>>> and
>>>
>>> "
>>> 2.4.1 Device Requirements: Device Reset
>>> A device MUST reinitialize device status to 0 after receiving a reset.
>>> "
>>>
>>> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
>>>
>>> Instead, we have just the check:
>>>
>>>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
>>>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
>>>     }
>>>
>>> which just sets the started field,
>>>
>>> and then we have the call to the virtio device class set_status (virtio_net...),
>>> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
>>>
>>> "
>>>     vdev->start_on_kick = false;
>>>     vdev->started = false;
>>>     vdev->broken = false;
>>>     vdev->guest_features = 0;
>>>     vdev->queue_sel = 0;
>>>     vdev->status = 0;
>>>     vdev->disabled = false;
>>>     qatomic_set(&vdev->isr, 0);
>>>     vdev->config_vector = VIRTIO_NO_VECTOR;
>>>     virtio_notify_vector(vdev, vdev->config_vector);
>>>
>>>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>>>         ... initialize vdev->vq[i] ...
>>>     }
>>> "
>>>
>>> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
>>> but what am I missing here?
>>>
>>> Thanks,
>>>
>>> Claudio
>>>
>>> --
>>> Claudio Fontana
>>> Engineering Manager Virtualization, SUSE Labs Core
>>>
>>> SUSE Software Solutions Italy Srl
>>
>>
>> So for example for pci:
>>
>>     case VIRTIO_PCI_STATUS:
>>
>>
>>         ....
>>
>>         if (vdev->status == 0) {
>>             virtio_pci_reset(DEVICE(proxy));
>>         }
>>
>> which I suspect is a bug because:
>>
>> static void virtio_pci_reset(DeviceState *qdev)
>> {
>>     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>>     VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>>     PCIDevice *dev = PCI_DEVICE(qdev);
>>     int i;
>>
>>     virtio_bus_reset(bus);
> 
> Note that we do virtio_reset() here.


Indeed, thanks I completely overlooked it.

However we end up with multiple calls to k->set_status,
one from the virtio_set_status call, and one from the virtio_reset() call, from a single 0 write.

Also I wonder about the ordering. Starting from the guest driver writing 0,
we end up (for virtio-net), calling virtio_net_set_status() first,
which lands us in virtio_net_vhost_status(),
which in the case of an already started vhost lands us in vhost_net_stop().

If we do virtio_reset immediately on seeing the guest writing 0,
we instead land first in virtio_net_reset().

Which one is the right behavior? I am having more success with the second ordering currently.
I can describe the scenario in more detail if needed in this context.

All in all, I think that the meaning of virtio_set_status as a function is unclear;
if is it supposed to match the virtio standard meaning of be virtio status field, then I'd expect we should do the virtio reset right there,
like we process the other bits of the status field, and not do it again from virtio_pci...

I wonder also about all the callers of virtio_set_status, what the assumptions of each caller are...

Thanks!

Claudio


> 
>>     msix_unuse_all_vectors(&proxy->pci_dev);
>>
>>     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>>         proxy->vqs[i].enabled = 0;
>>         proxy->vqs[i].num = 0;
>>         proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
>>         proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
>>         proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
>>     }
>>
>>
>> so far so good
>>
>>     if (pci_is_express(dev)) {
>>         pcie_cap_deverr_reset(dev);
>>         pcie_cap_lnkctl_reset(dev);
>>
>>         pci_set_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL, 0);
>>     }
>>
>> this part is wrong I think, it got here by mistake since the same
>> function is used for bus level reset.
>>
>> Jason, Marcel, any input?
> 
> Yes, I think we don't need PCI stuff here. We do virtio reset not pci.
> 
> Thanks
> 
>>
>> --
>> MST
>>
> 
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-28  1:27   ` Jason Wang
  2022-07-28  7:16     ` Claudio Fontana
@ 2022-07-28  7:43     ` Claudio Fontana
  2022-07-28  9:09       ` Claudio Fontana
  2022-07-28 11:41       ` Michael S. Tsirkin
  1 sibling, 2 replies; 18+ messages in thread
From: Claudio Fontana @ 2022-07-28  7:43 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: qemu-devel, Alex Bennée, Marcel Apfelbaum

On 7/28/22 03:27, Jason Wang wrote:
> On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
>>> Hi Michael and all,
>>>
>>> I have started researching a qemu / ovs / dpdk bug:
>>>
>>> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
>>>
>>> that seems to be affecting multiple parties in the telco space,
>>>
>>> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
>>> in virtio_set_status, when receiving a status value of 0.
>>>
>>> It seems it has always been this way, so I am clearly missing / forgetting something basic,
>>>
>>> I checked the virtio spec at https://docs.oasis-open.org/
>>>
>>> and from:
>>>
>>> "
>>> 4.1.4.3 Common configuration structure layout
>>>
>>> device_status
>>> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
>>>
>>> "
>>>
>>> and
>>>
>>> "
>>> 2.4.1 Device Requirements: Device Reset
>>> A device MUST reinitialize device status to 0 after receiving a reset.
>>> "
>>>
>>> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
>>>
>>> Instead, we have just the check:
>>>
>>>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
>>>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
>>>     }
>>>
>>> which just sets the started field,
>>>
>>> and then we have the call to the virtio device class set_status (virtio_net...),
>>> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
>>>
>>> "
>>>     vdev->start_on_kick = false;
>>>     vdev->started = false;
>>>     vdev->broken = false;
>>>     vdev->guest_features = 0;
>>>     vdev->queue_sel = 0;
>>>     vdev->status = 0;
>>>     vdev->disabled = false;
>>>     qatomic_set(&vdev->isr, 0);
>>>     vdev->config_vector = VIRTIO_NO_VECTOR;
>>>     virtio_notify_vector(vdev, vdev->config_vector);
>>>
>>>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>>>         ... initialize vdev->vq[i] ...
>>>     }
>>> "
>>>
>>> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
>>> but what am I missing here?
>>>
>>> Thanks,
>>>
>>> Claudio
>>>
>>> --
>>> Claudio Fontana
>>> Engineering Manager Virtualization, SUSE Labs Core
>>>
>>> SUSE Software Solutions Italy Srl
>>
>>
>> So for example for pci:
>>
>>     case VIRTIO_PCI_STATUS:
>>
>>
>>         ....
>>
>>         if (vdev->status == 0) {
>>             virtio_pci_reset(DEVICE(proxy));
>>         }
>>
>> which I suspect is a bug because:
>>
>> static void virtio_pci_reset(DeviceState *qdev)
>> {
>>     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>>     VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>>     PCIDevice *dev = PCI_DEVICE(qdev);
>>     int i;
>>
>>     virtio_bus_reset(bus);
> 
> Note that we do virtio_reset() here.


Yes, thank you, I completely overlooked it, I noticed this in Michael's response as well.

However we end up with multiple calls to k->set_status, one from the virtio_set_status call,
and one from the virtio_bus_reset(), which is probably something we don't want.

All in all it is not clear what the meaning of virtio_set_status is supposed to be I think,
and I wonder what the assumptions are among all the callers.
If it is supposed to be an implementation of the virtio standard field as described, I think we should do the reset right then and there,
but maybe the true meaning of the function is another one I couldn't understand, since _some_ of the cases are processes there.

And there is a question about ordering:

in virtio_pci we end up calling virtio_set_status(0), which gets us k->set_status(vdev, 0), which lands in virtio_net_set_status(0) and virtio_net_vhost_status,
which causes a vhost_net_stop().

Should we instead land in virtio_net_reset() first, by doing a virtio reset earlier when detecting a 0 value from the driver?

in the scenario I am looking at (with vhost-user, ovs/dpdk, and a guest testpmd application),
the guest application goes away without any chance to signal (kill -9), then gets immediately restarted and does a write of 0 to status, while qemu and ovs still hold the state for the device.

As QEMU lands in vhost_net_stop(), it seems to cause a chain of events that crash ovs which is trying to read an rx burst from the queue,
while QEMU is left hanging waiting forever for a response to VHOST_USER_GET_VRING_BASE issued as a result of vhost_net_stop.

Just saying, I am having more success with the second ordering, but I am still studying, don't have the full picture yet.

Thanks,

Claudio

> 
>>     msix_unuse_all_vectors(&proxy->pci_dev);
>>
>>     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>>         proxy->vqs[i].enabled = 0;
>>         proxy->vqs[i].num = 0;
>>         proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
>>         proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
>>         proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
>>     }
>>
>>
>> so far so good
>>
>>     if (pci_is_express(dev)) {
>>         pcie_cap_deverr_reset(dev);
>>         pcie_cap_lnkctl_reset(dev);
>>
>>         pci_set_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL, 0);
>>     }
>>
>> this part is wrong I think, it got here by mistake since the same
>> function is used for bus level reset.
>>
>> Jason, Marcel, any input?
> 
> Yes, I think we don't need PCI stuff here. We do virtio reset not pci.
> 
> Thanks
> 
>>
>> --
>> MST
>>
> 
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-28  7:43     ` Claudio Fontana
@ 2022-07-28  9:09       ` Claudio Fontana
  2022-07-28 10:24         ` Cornelia Huck
  2022-07-28 13:39         ` Michael S. Tsirkin
  2022-07-28 11:41       ` Michael S. Tsirkin
  1 sibling, 2 replies; 18+ messages in thread
From: Claudio Fontana @ 2022-07-28  9:09 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: qemu-devel, Alex Bennée, Marcel Apfelbaum

On 7/28/22 09:43, Claudio Fontana wrote:
> On 7/28/22 03:27, Jason Wang wrote:
>> On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>
>>> On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
>>>> Hi Michael and all,
>>>>
>>>> I have started researching a qemu / ovs / dpdk bug:
>>>>
>>>> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
>>>>
>>>> that seems to be affecting multiple parties in the telco space,
>>>>
>>>> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
>>>> in virtio_set_status, when receiving a status value of 0.
>>>>
>>>> It seems it has always been this way, so I am clearly missing / forgetting something basic,
>>>>
>>>> I checked the virtio spec at https://docs.oasis-open.org/
>>>>
>>>> and from:
>>>>
>>>> "
>>>> 4.1.4.3 Common configuration structure layout
>>>>
>>>> device_status
>>>> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
>>>>
>>>> "
>>>>
>>>> and
>>>>
>>>> "
>>>> 2.4.1 Device Requirements: Device Reset
>>>> A device MUST reinitialize device status to 0 after receiving a reset.
>>>> "
>>>>
>>>> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
>>>>
>>>> Instead, we have just the check:
>>>>
>>>>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
>>>>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>>>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
>>>>     }
>>>>
>>>> which just sets the started field,
>>>>
>>>> and then we have the call to the virtio device class set_status (virtio_net...),
>>>> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
>>>>
>>>> "
>>>>     vdev->start_on_kick = false;
>>>>     vdev->started = false;
>>>>     vdev->broken = false;
>>>>     vdev->guest_features = 0;
>>>>     vdev->queue_sel = 0;
>>>>     vdev->status = 0;
>>>>     vdev->disabled = false;
>>>>     qatomic_set(&vdev->isr, 0);
>>>>     vdev->config_vector = VIRTIO_NO_VECTOR;
>>>>     virtio_notify_vector(vdev, vdev->config_vector);
>>>>
>>>>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>>>>         ... initialize vdev->vq[i] ...
>>>>     }
>>>> "
>>>>
>>>> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
>>>> but what am I missing here?
>>>>
>>>> Thanks,
>>>>
>>>> Claudio
>>>>
>>>> --
>>>> Claudio Fontana
>>>> Engineering Manager Virtualization, SUSE Labs Core
>>>>
>>>> SUSE Software Solutions Italy Srl
>>>
>>>
>>> So for example for pci:
>>>
>>>     case VIRTIO_PCI_STATUS:
>>>
>>>
>>>         ....
>>>
>>>         if (vdev->status == 0) {
>>>             virtio_pci_reset(DEVICE(proxy));
>>>         }
>>>
>>> which I suspect is a bug because:
>>>
>>> static void virtio_pci_reset(DeviceState *qdev)
>>> {
>>>     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>>>     VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>>>     PCIDevice *dev = PCI_DEVICE(qdev);
>>>     int i;
>>>
>>>     virtio_bus_reset(bus);
>>
>> Note that we do virtio_reset() here.
> 
> 
> Yes, thank you, I completely overlooked it, I noticed this in Michael's response as well.
> 
> However we end up with multiple calls to k->set_status, one from the virtio_set_status call,
> and one from the virtio_bus_reset(), which is probably something we don't want.
> 
> All in all it is not clear what the meaning of virtio_set_status is supposed to be I think,
> and I wonder what the assumptions are among all the callers.
> If it is supposed to be an implementation of the virtio standard field as described, I think we should do the reset right then and there,
> but maybe the true meaning of the function is another one I couldn't understand, since _some_ of the cases are processes there.
> 
> And there is a question about ordering:
> 
> in virtio_pci we end up calling virtio_set_status(0), which gets us k->set_status(vdev, 0), which lands in virtio_net_set_status(0) and virtio_net_vhost_status,
> which causes a vhost_net_stop().
> 
> Should we instead land in virtio_net_reset() first, by doing a virtio reset earlier when detecting a 0 value from the driver?
> 
> in the scenario I am looking at (with vhost-user, ovs/dpdk, and a guest testpmd application),
> the guest application goes away without any chance to signal (kill -9), then gets immediately restarted and does a write of 0 to status, while qemu and ovs still hold the state for the device.
> 
> As QEMU lands in vhost_net_stop(), it seems to cause a chain of events that crash ovs which is trying to read an rx burst from the queue,
> while QEMU is left hanging waiting forever for a response to VHOST_USER_GET_VRING_BASE issued as a result of vhost_net_stop.
> 
> Just saying, I am having more success with the second ordering, but I am still studying, don't have the full picture yet.


Currently I'm doing (on top of Michael's patch) the following which seems to be working
(but of course this does not even being to look at the other transports, architectures etc),
just an idea to share:

---
 hw/virtio/virtio-pci.c | 7 ++++---
 hw/virtio/virtio.c     | 7 ++++++-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 3189ec014d..3cbfa3ce3a 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -312,6 +312,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
     case VIRTIO_PCI_QUEUE_PFN:
         pa = (hwaddr)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
         if (pa == 0) {
+            virtio_bus_reset(&proxy->bus);
             virtio_pci_reset(DEVICE(proxy));
         }
         else
@@ -1941,11 +1942,8 @@ static void virtio_pci_exit(PCIDevice *pci_dev)
 static void virtio_pci_reset(DeviceState *qdev)
 {
     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
-    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
-    PCIDevice *dev = PCI_DEVICE(qdev);
     int i;
 
-    virtio_bus_reset(bus);
     msix_unuse_all_vectors(&proxy->pci_dev);
 
     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
@@ -1960,7 +1958,10 @@ static void virtio_pci_reset(DeviceState *qdev)
 static void virtio_pci_bus_reset(DeviceState *qdev)
 {
     PCIDevice *dev = PCI_DEVICE(qdev);
+    VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
+    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
 
+    virtio_bus_reset(bus);
     virtio_pci_reset(qdev);
 
     if (pci_is_express(dev)) {
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..da58ca6f86 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1977,6 +1977,12 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
     VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
     trace_virtio_set_status(vdev, val);
 
+    if (val == 0) {
+        VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
+        virtio_bus_reset(bus);
+        return 0;
+    }
+
     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
         if (!(vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) &&
             val & VIRTIO_CONFIG_S_FEATURES_OK) {
@@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
     VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
     int i;
 
-    virtio_set_status(vdev, 0);
     if (current_cpu) {
         /* Guest initiated reset */
         vdev->device_endian = virtio_current_cpu_endian();
-- 
2.26.2






> 
> Thanks,
> 
> Claudio
> 
>>
>>>     msix_unuse_all_vectors(&proxy->pci_dev);
>>>
>>>     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>>>         proxy->vqs[i].enabled = 0;
>>>         proxy->vqs[i].num = 0;
>>>         proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
>>>         proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
>>>         proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
>>>     }
>>>
>>>
>>> so far so good
>>>
>>>     if (pci_is_express(dev)) {
>>>         pcie_cap_deverr_reset(dev);
>>>         pcie_cap_lnkctl_reset(dev);
>>>
>>>         pci_set_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL, 0);
>>>     }
>>>
>>> this part is wrong I think, it got here by mistake since the same
>>> function is used for bus level reset.
>>>
>>> Jason, Marcel, any input?
>>
>> Yes, I think we don't need PCI stuff here. We do virtio reset not pci.
>>
>> Thanks
>>
>>>
>>> --
>>> MST
>>>
>>
>>
> 
> 



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-28  9:09       ` Claudio Fontana
@ 2022-07-28 10:24         ` Cornelia Huck
  2022-07-31 20:38           ` Claudio Fontana
  2022-07-28 13:39         ` Michael S. Tsirkin
  1 sibling, 1 reply; 18+ messages in thread
From: Cornelia Huck @ 2022-07-28 10:24 UTC (permalink / raw)
  To: Claudio Fontana, Jason Wang, Michael S. Tsirkin
  Cc: qemu-devel, Alex Bennée, Marcel Apfelbaum, Halil Pasic, Eric Farman

On Thu, Jul 28 2022, Claudio Fontana <cfontana@suse.de> wrote:

> On 7/28/22 09:43, Claudio Fontana wrote:
>> On 7/28/22 03:27, Jason Wang wrote:
>>> On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>
>>>> On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
>>>>> Hi Michael and all,
>>>>>
>>>>> I have started researching a qemu / ovs / dpdk bug:
>>>>>
>>>>> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
>>>>>
>>>>> that seems to be affecting multiple parties in the telco space,
>>>>>
>>>>> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
>>>>> in virtio_set_status, when receiving a status value of 0.
>>>>>
>>>>> It seems it has always been this way, so I am clearly missing / forgetting something basic,
>>>>>
>>>>> I checked the virtio spec at https://docs.oasis-open.org/
>>>>>
>>>>> and from:
>>>>>
>>>>> "
>>>>> 4.1.4.3 Common configuration structure layout
>>>>>
>>>>> device_status
>>>>> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
>>>>>
>>>>> "
>>>>>
>>>>> and
>>>>>
>>>>> "
>>>>> 2.4.1 Device Requirements: Device Reset
>>>>> A device MUST reinitialize device status to 0 after receiving a reset.
>>>>> "

Side note: We can also have a reset without writing 0 to the device
status (RESET ccw on the virtio-ccw transport).

>>>>>
>>>>> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
>>>>>
>>>>> Instead, we have just the check:
>>>>>
>>>>>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
>>>>>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>>>>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
>>>>>     }
>>>>>
>>>>> which just sets the started field,
>>>>>
>>>>> and then we have the call to the virtio device class set_status (virtio_net...),
>>>>> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
>>>>>
>>>>> "
>>>>>     vdev->start_on_kick = false;
>>>>>     vdev->started = false;
>>>>>     vdev->broken = false;
>>>>>     vdev->guest_features = 0;
>>>>>     vdev->queue_sel = 0;
>>>>>     vdev->status = 0;
>>>>>     vdev->disabled = false;
>>>>>     qatomic_set(&vdev->isr, 0);
>>>>>     vdev->config_vector = VIRTIO_NO_VECTOR;
>>>>>     virtio_notify_vector(vdev, vdev->config_vector);
>>>>>
>>>>>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>>>>>         ... initialize vdev->vq[i] ...
>>>>>     }
>>>>> "
>>>>>
>>>>> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
>>>>> but what am I missing here?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Claudio
>>>>>
>>>>> --
>>>>> Claudio Fontana
>>>>> Engineering Manager Virtualization, SUSE Labs Core
>>>>>
>>>>> SUSE Software Solutions Italy Srl
>>>>
>>>>
>>>> So for example for pci:
>>>>
>>>>     case VIRTIO_PCI_STATUS:
>>>>
>>>>
>>>>         ....
>>>>
>>>>         if (vdev->status == 0) {
>>>>             virtio_pci_reset(DEVICE(proxy));
>>>>         }

FWIW, ccw ends up calling virtio_ccw_reset_virtio() when the driver
issues a reset command, or when it issues a write status 0 command, or
when the generic reset function is invoked.

>>>>
>>>> which I suspect is a bug because:
>>>>
>>>> static void virtio_pci_reset(DeviceState *qdev)
>>>> {
>>>>     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>>>>     VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>>>>     PCIDevice *dev = PCI_DEVICE(qdev);
>>>>     int i;
>>>>
>>>>     virtio_bus_reset(bus);
>>>
>>> Note that we do virtio_reset() here.
>> 
>> 
>> Yes, thank you, I completely overlooked it, I noticed this in Michael's response as well.
>> 
>> However we end up with multiple calls to k->set_status, one from the virtio_set_status call,
>> and one from the virtio_bus_reset(), which is probably something we don't want.
>> 
>> All in all it is not clear what the meaning of virtio_set_status is supposed to be I think,
>> and I wonder what the assumptions are among all the callers.
>> If it is supposed to be an implementation of the virtio standard field as described, I think we should do the reset right then and there,
>> but maybe the true meaning of the function is another one I couldn't understand, since _some_ of the cases are processes there.

Hm. Maybe there needs to be a distinction between "we're forwarding the
status setting by the driver to the core, take any appropriate action"
and "we've just reset the device, now we just need to zero out the
status field"?

>> 
>> And there is a question about ordering:
>> 
>> in virtio_pci we end up calling virtio_set_status(0), which gets us k->set_status(vdev, 0), which lands in virtio_net_set_status(0) and virtio_net_vhost_status,
>> which causes a vhost_net_stop().
>> 
>> Should we instead land in virtio_net_reset() first, by doing a virtio reset earlier when detecting a 0 value from the driver?
>> 
>> in the scenario I am looking at (with vhost-user, ovs/dpdk, and a guest testpmd application),
>> the guest application goes away without any chance to signal (kill -9), then gets immediately restarted and does a write of 0 to status, while qemu and ovs still hold the state for the device.
>> 
>> As QEMU lands in vhost_net_stop(), it seems to cause a chain of events that crash ovs which is trying to read an rx burst from the queue,
>> while QEMU is left hanging waiting forever for a response to VHOST_USER_GET_VRING_BASE issued as a result of vhost_net_stop.
>> 
>> Just saying, I am having more success with the second ordering, but I am still studying, don't have the full picture yet.
>
>
> Currently I'm doing (on top of Michael's patch) the following which seems to be working
> (but of course this does not even being to look at the other transports, architectures etc),
> just an idea to share:
>
> ---
>  hw/virtio/virtio-pci.c | 7 ++++---
>  hw/virtio/virtio.c     | 7 ++++++-
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 3189ec014d..3cbfa3ce3a 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -312,6 +312,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>      case VIRTIO_PCI_QUEUE_PFN:
>          pa = (hwaddr)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
>          if (pa == 0) {
> +            virtio_bus_reset(&proxy->bus);
>              virtio_pci_reset(DEVICE(proxy));
>          }
>          else
> @@ -1941,11 +1942,8 @@ static void virtio_pci_exit(PCIDevice *pci_dev)
>  static void virtio_pci_reset(DeviceState *qdev)
>  {
>      VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> -    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
> -    PCIDevice *dev = PCI_DEVICE(qdev);
>      int i;
>  
> -    virtio_bus_reset(bus);
>      msix_unuse_all_vectors(&proxy->pci_dev);
>  
>      for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> @@ -1960,7 +1958,10 @@ static void virtio_pci_reset(DeviceState *qdev)
>  static void virtio_pci_bus_reset(DeviceState *qdev)
>  {
>      PCIDevice *dev = PCI_DEVICE(qdev);
> +    VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> +    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>  
> +    virtio_bus_reset(bus);
>      virtio_pci_reset(qdev);
>  
>      if (pci_is_express(dev)) {
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 5d607aeaa0..da58ca6f86 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1977,6 +1977,12 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>      trace_virtio_set_status(vdev, val);
>  
> +    if (val == 0) {
> +        VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
> +        virtio_bus_reset(bus);
> +        return 0;
> +    }
> +
>      if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>          if (!(vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) &&
>              val & VIRTIO_CONFIG_S_FEATURES_OK) {
> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>      int i;
>  
> -    virtio_set_status(vdev, 0);

Doesn't that break virtio-ccw for resets triggered via the RESET ccw
(see above?)

>      if (current_cpu) {
>          /* Guest initiated reset */
>          vdev->device_endian = virtio_current_cpu_endian();



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-28  7:43     ` Claudio Fontana
  2022-07-28  9:09       ` Claudio Fontana
@ 2022-07-28 11:41       ` Michael S. Tsirkin
  1 sibling, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2022-07-28 11:41 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Jason Wang, qemu-devel, Alex Bennée, Marcel Apfelbaum

On Thu, Jul 28, 2022 at 09:43:56AM +0200, Claudio Fontana wrote:
> On 7/28/22 03:27, Jason Wang wrote:
> > On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>
> >> On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
> >>> Hi Michael and all,
> >>>
> >>> I have started researching a qemu / ovs / dpdk bug:
> >>>
> >>> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
> >>>
> >>> that seems to be affecting multiple parties in the telco space,
> >>>
> >>> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
> >>> in virtio_set_status, when receiving a status value of 0.
> >>>
> >>> It seems it has always been this way, so I am clearly missing / forgetting something basic,
> >>>
> >>> I checked the virtio spec at https://docs.oasis-open.org/
> >>>
> >>> and from:
> >>>
> >>> "
> >>> 4.1.4.3 Common configuration structure layout
> >>>
> >>> device_status
> >>> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
> >>>
> >>> "
> >>>
> >>> and
> >>>
> >>> "
> >>> 2.4.1 Device Requirements: Device Reset
> >>> A device MUST reinitialize device status to 0 after receiving a reset.
> >>> "
> >>>
> >>> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
> >>>
> >>> Instead, we have just the check:
> >>>
> >>>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
> >>>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> >>>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
> >>>     }
> >>>
> >>> which just sets the started field,
> >>>
> >>> and then we have the call to the virtio device class set_status (virtio_net...),
> >>> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
> >>>
> >>> "
> >>>     vdev->start_on_kick = false;
> >>>     vdev->started = false;
> >>>     vdev->broken = false;
> >>>     vdev->guest_features = 0;
> >>>     vdev->queue_sel = 0;
> >>>     vdev->status = 0;
> >>>     vdev->disabled = false;
> >>>     qatomic_set(&vdev->isr, 0);
> >>>     vdev->config_vector = VIRTIO_NO_VECTOR;
> >>>     virtio_notify_vector(vdev, vdev->config_vector);
> >>>
> >>>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> >>>         ... initialize vdev->vq[i] ...
> >>>     }
> >>> "
> >>>
> >>> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
> >>> but what am I missing here?
> >>>
> >>> Thanks,
> >>>
> >>> Claudio
> >>>
> >>> --
> >>> Claudio Fontana
> >>> Engineering Manager Virtualization, SUSE Labs Core
> >>>
> >>> SUSE Software Solutions Italy Srl
> >>
> >>
> >> So for example for pci:
> >>
> >>     case VIRTIO_PCI_STATUS:
> >>
> >>
> >>         ....
> >>
> >>         if (vdev->status == 0) {
> >>             virtio_pci_reset(DEVICE(proxy));
> >>         }
> >>
> >> which I suspect is a bug because:
> >>
> >> static void virtio_pci_reset(DeviceState *qdev)
> >> {
> >>     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> >>     VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
> >>     PCIDevice *dev = PCI_DEVICE(qdev);
> >>     int i;
> >>
> >>     virtio_bus_reset(bus);
> > 
> > Note that we do virtio_reset() here.
> 
> 
> Yes, thank you, I completely overlooked it, I noticed this in Michael's response as well.
> 
> However we end up with multiple calls to k->set_status, one from the virtio_set_status call,
> and one from the virtio_bus_reset(), which is probably something we don't want.
> 
> All in all it is not clear what the meaning of virtio_set_status is supposed to be I think,
> and I wonder what the assumptions are among all the callers.
> If it is supposed to be an implementation of the virtio standard field as described, I think we should do the reset right then and there,
> but maybe the true meaning of the function is another one I couldn't understand, since _some_ of the cases are processes there.
> 
> And there is a question about ordering:
> 
> in virtio_pci we end up calling virtio_set_status(0), which gets us k->set_status(vdev, 0), which lands in virtio_net_set_status(0) and virtio_net_vhost_status,
> which causes a vhost_net_stop().
> Should we instead land in virtio_net_reset() first, by doing a virtio reset earlier when detecting a 0 value from the driver?


Well we want to first stop the backend and only then reset our
local state. Seems to make sense ...


> in the scenario I am looking at (with vhost-user, ovs/dpdk, and a guest testpmd application),
> the guest application goes away without any chance to signal (kill -9), then gets immediately restarted and does a write of 0 to status, while qemu and ovs still hold the state for the device.
> 
> As QEMU lands in vhost_net_stop(), it seems to cause a chain of events that crash ovs which is trying to read an rx burst from the queue,

Not sure I got this part.

> while QEMU is left hanging waiting forever for a response to VHOST_USER_GET_VRING_BASE issued as a result of vhost_net_stop.

Ineteresting why doesn't socket close after ovs crash cause the read to fail.

> Just saying, I am having more success with the second ordering, but I am still studying, don't have the full picture yet.
> 
> Thanks,
> 
> Claudio
> 
> > 
> >>     msix_unuse_all_vectors(&proxy->pci_dev);
> >>
> >>     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> >>         proxy->vqs[i].enabled = 0;
> >>         proxy->vqs[i].num = 0;
> >>         proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
> >>         proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
> >>         proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
> >>     }
> >>
> >>
> >> so far so good
> >>
> >>     if (pci_is_express(dev)) {
> >>         pcie_cap_deverr_reset(dev);
> >>         pcie_cap_lnkctl_reset(dev);
> >>
> >>         pci_set_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL, 0);
> >>     }
> >>
> >> this part is wrong I think, it got here by mistake since the same
> >> function is used for bus level reset.
> >>
> >> Jason, Marcel, any input?
> > 
> > Yes, I think we don't need PCI stuff here. We do virtio reset not pci.
> > 
> > Thanks
> > 
> >>
> >> --
> >> MST
> >>
> > 
> > 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-28  9:09       ` Claudio Fontana
  2022-07-28 10:24         ` Cornelia Huck
@ 2022-07-28 13:39         ` Michael S. Tsirkin
  2022-07-29  9:46           ` Claudio Fontana
  1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2022-07-28 13:39 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Jason Wang, qemu-devel, Alex Bennée, Marcel Apfelbaum

On Thu, Jul 28, 2022 at 11:09:15AM +0200, Claudio Fontana wrote:
> On 7/28/22 09:43, Claudio Fontana wrote:
> > On 7/28/22 03:27, Jason Wang wrote:
> >> On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>
> >>> On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
> >>>> Hi Michael and all,
> >>>>
> >>>> I have started researching a qemu / ovs / dpdk bug:
> >>>>
> >>>> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
> >>>>
> >>>> that seems to be affecting multiple parties in the telco space,
> >>>>
> >>>> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
> >>>> in virtio_set_status, when receiving a status value of 0.
> >>>>
> >>>> It seems it has always been this way, so I am clearly missing / forgetting something basic,
> >>>>
> >>>> I checked the virtio spec at https://docs.oasis-open.org/
> >>>>
> >>>> and from:
> >>>>
> >>>> "
> >>>> 4.1.4.3 Common configuration structure layout
> >>>>
> >>>> device_status
> >>>> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
> >>>>
> >>>> "
> >>>>
> >>>> and
> >>>>
> >>>> "
> >>>> 2.4.1 Device Requirements: Device Reset
> >>>> A device MUST reinitialize device status to 0 after receiving a reset.
> >>>> "
> >>>>
> >>>> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
> >>>>
> >>>> Instead, we have just the check:
> >>>>
> >>>>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
> >>>>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> >>>>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
> >>>>     }
> >>>>
> >>>> which just sets the started field,
> >>>>
> >>>> and then we have the call to the virtio device class set_status (virtio_net...),
> >>>> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
> >>>>
> >>>> "
> >>>>     vdev->start_on_kick = false;
> >>>>     vdev->started = false;
> >>>>     vdev->broken = false;
> >>>>     vdev->guest_features = 0;
> >>>>     vdev->queue_sel = 0;
> >>>>     vdev->status = 0;
> >>>>     vdev->disabled = false;
> >>>>     qatomic_set(&vdev->isr, 0);
> >>>>     vdev->config_vector = VIRTIO_NO_VECTOR;
> >>>>     virtio_notify_vector(vdev, vdev->config_vector);
> >>>>
> >>>>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> >>>>         ... initialize vdev->vq[i] ...
> >>>>     }
> >>>> "
> >>>>
> >>>> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
> >>>> but what am I missing here?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Claudio
> >>>>
> >>>> --
> >>>> Claudio Fontana
> >>>> Engineering Manager Virtualization, SUSE Labs Core
> >>>>
> >>>> SUSE Software Solutions Italy Srl
> >>>
> >>>
> >>> So for example for pci:
> >>>
> >>>     case VIRTIO_PCI_STATUS:
> >>>
> >>>
> >>>         ....
> >>>
> >>>         if (vdev->status == 0) {
> >>>             virtio_pci_reset(DEVICE(proxy));
> >>>         }
> >>>
> >>> which I suspect is a bug because:
> >>>
> >>> static void virtio_pci_reset(DeviceState *qdev)
> >>> {
> >>>     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> >>>     VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
> >>>     PCIDevice *dev = PCI_DEVICE(qdev);
> >>>     int i;
> >>>
> >>>     virtio_bus_reset(bus);
> >>
> >> Note that we do virtio_reset() here.
> > 
> > 
> > Yes, thank you, I completely overlooked it, I noticed this in Michael's response as well.
> > 
> > However we end up with multiple calls to k->set_status, one from the virtio_set_status call,
> > and one from the virtio_bus_reset(), which is probably something we don't want.
> > 
> > All in all it is not clear what the meaning of virtio_set_status is supposed to be I think,
> > and I wonder what the assumptions are among all the callers.
> > If it is supposed to be an implementation of the virtio standard field as described, I think we should do the reset right then and there,
> > but maybe the true meaning of the function is another one I couldn't understand, since _some_ of the cases are processes there.
> > 
> > And there is a question about ordering:
> > 
> > in virtio_pci we end up calling virtio_set_status(0), which gets us k->set_status(vdev, 0), which lands in virtio_net_set_status(0) and virtio_net_vhost_status,
> > which causes a vhost_net_stop().
> > 
> > Should we instead land in virtio_net_reset() first, by doing a virtio reset earlier when detecting a 0 value from the driver?
> > 
> > in the scenario I am looking at (with vhost-user, ovs/dpdk, and a guest testpmd application),
> > the guest application goes away without any chance to signal (kill -9), then gets immediately restarted and does a write of 0 to status, while qemu and ovs still hold the state for the device.
> > 
> > As QEMU lands in vhost_net_stop(), it seems to cause a chain of events that crash ovs which is trying to read an rx burst from the queue,
> > while QEMU is left hanging waiting forever for a response to VHOST_USER_GET_VRING_BASE issued as a result of vhost_net_stop.
> > 
> > Just saying, I am having more success with the second ordering, but I am still studying, don't have the full picture yet.
> 
> 
> Currently I'm doing (on top of Michael's patch) the following which seems to be working
> (but of course this does not even being to look at the other transports, architectures etc),
> just an idea to share:
> 
> ---
>  hw/virtio/virtio-pci.c | 7 ++++---
>  hw/virtio/virtio.c     | 7 ++++++-
>  2 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 3189ec014d..3cbfa3ce3a 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -312,6 +312,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>      case VIRTIO_PCI_QUEUE_PFN:
>          pa = (hwaddr)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
>          if (pa == 0) {
> +            virtio_bus_reset(&proxy->bus);
>              virtio_pci_reset(DEVICE(proxy));
>          }
>          else
> @@ -1941,11 +1942,8 @@ static void virtio_pci_exit(PCIDevice *pci_dev)
>  static void virtio_pci_reset(DeviceState *qdev)
>  {
>      VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> -    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
> -    PCIDevice *dev = PCI_DEVICE(qdev);
>      int i;
>  
> -    virtio_bus_reset(bus);
>      msix_unuse_all_vectors(&proxy->pci_dev);
>  
>      for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> @@ -1960,7 +1958,10 @@ static void virtio_pci_reset(DeviceState *qdev)
>  static void virtio_pci_bus_reset(DeviceState *qdev)
>  {
>      PCIDevice *dev = PCI_DEVICE(qdev);
> +    VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> +    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>  
> +    virtio_bus_reset(bus);
>      virtio_pci_reset(qdev);
>  
>      if (pci_is_express(dev)) {
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 5d607aeaa0..da58ca6f86 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1977,6 +1977,12 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>      trace_virtio_set_status(vdev, val);
>  
> +    if (val == 0) {
> +        VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
> +        virtio_bus_reset(bus);
> +        return 0;
> +    }
> +
>      if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>          if (!(vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) &&
>              val & VIRTIO_CONFIG_S_FEATURES_OK) {
> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>      int i;
>  
> -    virtio_set_status(vdev, 0);
>      if (current_cpu) {
>          /* Guest initiated reset */
>          vdev->device_endian = virtio_current_cpu_endian();
> -- 
> 2.26.2

As you say this is incomplete ... bout could you share a bit more
of what issue does this address?

-- 
MST



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-28 13:39         ` Michael S. Tsirkin
@ 2022-07-29  9:46           ` Claudio Fontana
  2022-07-29 10:13             ` Michael S. Tsirkin
  0 siblings, 1 reply; 18+ messages in thread
From: Claudio Fontana @ 2022-07-29  9:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, Alex Bennée, Marcel Apfelbaum

On 7/28/22 15:39, Michael S. Tsirkin wrote:
> On Thu, Jul 28, 2022 at 11:09:15AM +0200, Claudio Fontana wrote:
>> On 7/28/22 09:43, Claudio Fontana wrote:
>>> On 7/28/22 03:27, Jason Wang wrote:
>>>> On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>
>>>>> On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
>>>>>> Hi Michael and all,
>>>>>>
>>>>>> I have started researching a qemu / ovs / dpdk bug:
>>>>>>
>>>>>> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
>>>>>>
>>>>>> that seems to be affecting multiple parties in the telco space,
>>>>>>
>>>>>> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
>>>>>> in virtio_set_status, when receiving a status value of 0.
>>>>>>
>>>>>> It seems it has always been this way, so I am clearly missing / forgetting something basic,
>>>>>>
>>>>>> I checked the virtio spec at https://docs.oasis-open.org/
>>>>>>
>>>>>> and from:
>>>>>>
>>>>>> "
>>>>>> 4.1.4.3 Common configuration structure layout
>>>>>>
>>>>>> device_status
>>>>>> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
>>>>>>
>>>>>> "
>>>>>>
>>>>>> and
>>>>>>
>>>>>> "
>>>>>> 2.4.1 Device Requirements: Device Reset
>>>>>> A device MUST reinitialize device status to 0 after receiving a reset.
>>>>>> "
>>>>>>
>>>>>> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
>>>>>>
>>>>>> Instead, we have just the check:
>>>>>>
>>>>>>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
>>>>>>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>>>>>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
>>>>>>     }
>>>>>>
>>>>>> which just sets the started field,
>>>>>>
>>>>>> and then we have the call to the virtio device class set_status (virtio_net...),
>>>>>> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
>>>>>>
>>>>>> "
>>>>>>     vdev->start_on_kick = false;
>>>>>>     vdev->started = false;
>>>>>>     vdev->broken = false;
>>>>>>     vdev->guest_features = 0;
>>>>>>     vdev->queue_sel = 0;
>>>>>>     vdev->status = 0;
>>>>>>     vdev->disabled = false;
>>>>>>     qatomic_set(&vdev->isr, 0);
>>>>>>     vdev->config_vector = VIRTIO_NO_VECTOR;
>>>>>>     virtio_notify_vector(vdev, vdev->config_vector);
>>>>>>
>>>>>>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>>>>>>         ... initialize vdev->vq[i] ...
>>>>>>     }
>>>>>> "
>>>>>>
>>>>>> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
>>>>>> but what am I missing here?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Claudio
>>>>>>
>>>>>> --
>>>>>> Claudio Fontana
>>>>>> Engineering Manager Virtualization, SUSE Labs Core
>>>>>>
>>>>>> SUSE Software Solutions Italy Srl
>>>>>
>>>>>
>>>>> So for example for pci:
>>>>>
>>>>>     case VIRTIO_PCI_STATUS:
>>>>>
>>>>>
>>>>>         ....
>>>>>
>>>>>         if (vdev->status == 0) {
>>>>>             virtio_pci_reset(DEVICE(proxy));
>>>>>         }
>>>>>
>>>>> which I suspect is a bug because:
>>>>>
>>>>> static void virtio_pci_reset(DeviceState *qdev)
>>>>> {
>>>>>     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>>>>>     VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>>>>>     PCIDevice *dev = PCI_DEVICE(qdev);
>>>>>     int i;
>>>>>
>>>>>     virtio_bus_reset(bus);
>>>>
>>>> Note that we do virtio_reset() here.
>>>
>>>
>>> Yes, thank you, I completely overlooked it, I noticed this in Michael's response as well.
>>>
>>> However we end up with multiple calls to k->set_status, one from the virtio_set_status call,
>>> and one from the virtio_bus_reset(), which is probably something we don't want.
>>>
>>> All in all it is not clear what the meaning of virtio_set_status is supposed to be I think,
>>> and I wonder what the assumptions are among all the callers.
>>> If it is supposed to be an implementation of the virtio standard field as described, I think we should do the reset right then and there,
>>> but maybe the true meaning of the function is another one I couldn't understand, since _some_ of the cases are processes there.
>>>
>>> And there is a question about ordering:
>>>
>>> in virtio_pci we end up calling virtio_set_status(0), which gets us k->set_status(vdev, 0), which lands in virtio_net_set_status(0) and virtio_net_vhost_status,
>>> which causes a vhost_net_stop().
>>>
>>> Should we instead land in virtio_net_reset() first, by doing a virtio reset earlier when detecting a 0 value from the driver?
>>>
>>> in the scenario I am looking at (with vhost-user, ovs/dpdk, and a guest testpmd application),
>>> the guest application goes away without any chance to signal (kill -9), then gets immediately restarted and does a write of 0 to status, while qemu and ovs still hold the state for the device.
>>>
>>> As QEMU lands in vhost_net_stop(), it seems to cause a chain of events that crash ovs which is trying to read an rx burst from the queue,
>>> while QEMU is left hanging waiting forever for a response to VHOST_USER_GET_VRING_BASE issued as a result of vhost_net_stop.
>>>
>>> Just saying, I am having more success with the second ordering, but I am still studying, don't have the full picture yet.
>>
>>
>> Currently I'm doing (on top of Michael's patch) the following which seems to be working
>> (but of course this does not even being to look at the other transports, architectures etc),
>> just an idea to share:
>>
>> ---
>>  hw/virtio/virtio-pci.c | 7 ++++---
>>  hw/virtio/virtio.c     | 7 ++++++-
>>  2 files changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>> index 3189ec014d..3cbfa3ce3a 100644
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -312,6 +312,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>>      case VIRTIO_PCI_QUEUE_PFN:
>>          pa = (hwaddr)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
>>          if (pa == 0) {
>> +            virtio_bus_reset(&proxy->bus);
>>              virtio_pci_reset(DEVICE(proxy));
>>          }
>>          else
>> @@ -1941,11 +1942,8 @@ static void virtio_pci_exit(PCIDevice *pci_dev)
>>  static void virtio_pci_reset(DeviceState *qdev)
>>  {
>>      VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>> -    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>> -    PCIDevice *dev = PCI_DEVICE(qdev);
>>      int i;
>>  
>> -    virtio_bus_reset(bus);
>>      msix_unuse_all_vectors(&proxy->pci_dev);
>>  
>>      for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>> @@ -1960,7 +1958,10 @@ static void virtio_pci_reset(DeviceState *qdev)
>>  static void virtio_pci_bus_reset(DeviceState *qdev)
>>  {
>>      PCIDevice *dev = PCI_DEVICE(qdev);
>> +    VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>> +    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>>  
>> +    virtio_bus_reset(bus);
>>      virtio_pci_reset(qdev);
>>  
>>      if (pci_is_express(dev)) {
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index 5d607aeaa0..da58ca6f86 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -1977,6 +1977,12 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
>>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>      trace_virtio_set_status(vdev, val);
>>  
>> +    if (val == 0) {
>> +        VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
>> +        virtio_bus_reset(bus);
>> +        return 0;
>> +    }
>> +
>>      if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>>          if (!(vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) &&
>>              val & VIRTIO_CONFIG_S_FEATURES_OK) {
>> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
>>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>      int i;
>>  
>> -    virtio_set_status(vdev, 0);
>>      if (current_cpu) {
>>          /* Guest initiated reset */
>>          vdev->device_endian = virtio_current_cpu_endian();
>> -- 
>> 2.26.2
> 
> As you say this is incomplete ... bout could you share a bit more
> of what issue does this address?
> 

Hi, the problem I am trying to address is a segfault in OVS/dpdk that looks like this:

Thread 153 "pmd-c88/id:150" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f64e5e6b700 (LWP 141373)]
rte_mov128blocks (n=2048, src=0xc <error: Cannot access memory at address 0xc>, dst=0x150da4480 "h\005\312❇\377\377\377\377\377\377\b") at ../lib/eal/x86/include/rte_memcpy.h:384
384	../lib/eal/x86/include/rte_memcpy.h: No such file or directory.
(gdb) bt
#0  rte_mov128blocks (n=2048, src=0xc <error: Cannot access memory at address 0xc>, 
    dst=0x150da4480 "h\005\312❇\377\377\377\377\377\377\b") at ../lib/eal/x86/include/rte_memcpy.h:384
#1  rte_memcpy_generic (n=2048, src=0xc, dst=0x150da4480) at ../lib/eal/x86/include/rte_memcpy.h:484
#2  rte_memcpy (n=2048, src=0xc, dst=<optimized out>) at ../lib/eal/x86/include/rte_memcpy.h:851
#3  sync_fill_seg (to_desc=false, cpy_len=2048, buf_iova=<optimized out>, buf_addr=12, mbuf_offset=0, m=0x150da4140, 
    vq=0x2200400680, dev=0x2200d3d740) at ../lib/vhost/virtio_net.c:1119
#4  desc_to_mbuf (is_async=false, slot_idx=0, legacy_ol_flags=true, mbuf_pool=0x17fe7df00, m=0x150da4140, nr_vec=<optimized out>, 
    buf_vec=0x7f64e5e67ca0, vq=0x2200400680, dev=0x2200d3d740) at ../lib/vhost/virtio_net.c:2747
#5  virtio_dev_tx_split (legacy_ol_flags=true, count=<optimized out>, count@entry=0, pkts=pkts@entry=0x0, 
    mbuf_pool=mbuf_pool@entry=0x150da4140, vq=vq@entry=0xe5e67d34, dev=dev@entry=0x7f64e5e694d0) at ../lib/vhost/virtio_net.c:2943
#6  virtio_dev_tx_split_legacy (dev=dev@entry=0x2200d3d740, vq=vq@entry=0x2200400680, mbuf_pool=mbuf_pool@entry=0x17fe7df00, 
    pkts=pkts@entry=0x7f64e5e69600, count=count@entry=32) at ../lib/vhost/virtio_net.c:2979
#7  0x00007f676fea0fef in rte_vhost_dequeue_burst (vid=vid@entry=0, queue_id=queue_id@entry=1, mbuf_pool=0x17fe7df00, 
    pkts=pkts@entry=0x7f64e5e69600, count=count@entry=32) at ../lib/vhost/virtio_net.c:3331
#8  0x00007f6772005a62 in netdev_dpdk_vhost_rxq_recv (rxq=<optimized out>, batch=0x7f64e5e695f0, qfill=0x0)
    at ../lib/netdev-dpdk.c:2393
#9  0x00007f6771f38116 in netdev_rxq_recv (rx=<optimized out>, batch=batch@entry=0x7f64e5e695f0, qfill=<optimized out>)
    at ../lib/netdev.c:727
#10 0x00007f6771f03d96 in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7f64e5e6c010, rxq=0x254d730, port_no=2)
    at ../lib/dpif-netdev.c:5317
#11 0x00007f6771f04239 in pmd_thread_main (f_=<optimized out>) at ../lib/dpif-netdev.c:6945
#12 0x00007f6771f92aff in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:422
#13 0x00007f6771c1b6ea in start_thread () from /lib64/libpthread.so.0
#14 0x00007f6771933a8f in clone () from /lib64/libc.so.6

---

Where as you can see ovs/dpdk is trying to access rx memory at an invalid src address.

The setup includes the latest dpdk library, the latest openvswitch (ovs-vswitchd), the latest qemu on the host,
and old dpdk and test-pmd dpdk app in the guest (tested with 16- and 19-).  The guest should work even with older versions without crashing the host processes.

The setup is the most basic qemu, vhost-user, ovs, dpdk setup, with a traffic generator sending stuff to forward for ovs.

It is sufficient in the guest to bind the device,

modprobe uio_pci_generic
dpdk-stable-16.11.11/tools/dpdk-devbind.py -b uio_pci_generic 0000:07:00.0

Then start the testpmd application in a loop:

while true ; do 
    /home/zhl/dpdk-stable-16.11.11/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd --log-level=8 -c 0x1e -n 4 --socket-mem 512 -- -i --nb-cores=3 --port-topology=chained --disable-hw-vlan --forward-mode=macswap --auto-start --rxq=4 --txq=4 --rxd=512 --txd=512 --burst=32
done

and in parallel from another shell in the guest f.e. kill -9 the testpmd application as such:

while true ; do
    sleep 11
    kill -9 `pgrep -x testpmd`
done

What ends up happening is that at the time of reinitialization of testpmd after it was SIGKILLed,
the reset causes havok as ovs tries to read from the queues. May take anything from a few seconds to a few minutes to reproduce.

Still investigating, I noticed the strange qemu behavior on reset, however likely there is much more to study.

If anyone is trying to reproduce this, let me know if anything is needed to help reproduce.

I can share more logs if needed, the challenge with logging is,
if we log too much things slow down, and the issue becomes harder and harder to reproduce.
Optimization -O1 is still ok, -O0 also makes it much harder to hit.

Thanks,

Claudio


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-29  9:46           ` Claudio Fontana
@ 2022-07-29 10:13             ` Michael S. Tsirkin
  2022-07-29 10:19               ` Claudio Fontana
  0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2022-07-29 10:13 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Jason Wang, qemu-devel, Alex Bennée, Marcel Apfelbaum

On Fri, Jul 29, 2022 at 11:46:05AM +0200, Claudio Fontana wrote:
> >> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
> >>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> >>      int i;
> >>  
> >> -    virtio_set_status(vdev, 0);
> >>      if (current_cpu) {
> >>          /* Guest initiated reset */
> >>          vdev->device_endian = virtio_current_cpu_endian();
> >> -- 
> >> 2.26.2
> > 
> > As you say this is incomplete ... bout could you share a bit more
> > of what issue does this address?
> > 
> 
> Hi, the problem I am trying to address is a segfault in OVS/dpdk that looks like this:

Sorry I was not clear. What I mean is, you don't yet know why does removing
virtio_set_status call here prevent the crash in ovs, do you?

-- 
MST



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-29 10:13             ` Michael S. Tsirkin
@ 2022-07-29 10:19               ` Claudio Fontana
  2022-07-29 13:21                 ` Alex Bennée
  0 siblings, 1 reply; 18+ messages in thread
From: Claudio Fontana @ 2022-07-29 10:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, Alex Bennée, Marcel Apfelbaum

On 7/29/22 12:13, Michael S. Tsirkin wrote:
> On Fri, Jul 29, 2022 at 11:46:05AM +0200, Claudio Fontana wrote:
>>>> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
>>>>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>>>      int i;
>>>>  
>>>> -    virtio_set_status(vdev, 0);
>>>>      if (current_cpu) {
>>>>          /* Guest initiated reset */
>>>>          vdev->device_endian = virtio_current_cpu_endian();
>>>> -- 
>>>> 2.26.2
>>>
>>> As you say this is incomplete ... bout could you share a bit more
>>> of what issue does this address?
>>>
>>
>> Hi, the problem I am trying to address is a segfault in OVS/dpdk that looks like this:
> 
> Sorry I was not clear. What I mean is, you don't yet know why does removing
> virtio_set_status call here prevent the crash in ovs, do you?
> 

I have no idea. Trying to collect logs to figure things out, but as mentioned the logs easily hide the issue.
Likely there is just more to study here.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-29 10:19               ` Claudio Fontana
@ 2022-07-29 13:21                 ` Alex Bennée
  2022-07-29 14:00                   ` Claudio Fontana
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Bennée @ 2022-07-29 13:21 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Michael S. Tsirkin, Jason Wang, qemu-devel, Marcel Apfelbaum


Claudio Fontana <cfontana@suse.de> writes:

> On 7/29/22 12:13, Michael S. Tsirkin wrote:
>> On Fri, Jul 29, 2022 at 11:46:05AM +0200, Claudio Fontana wrote:
>>>>> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
>>>>>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>>>>      int i;
>>>>>  
>>>>> -    virtio_set_status(vdev, 0);
>>>>>      if (current_cpu) {
>>>>>          /* Guest initiated reset */
>>>>>          vdev->device_endian = virtio_current_cpu_endian();
>>>>> -- 
>>>>> 2.26.2
>>>>
>>>> As you say this is incomplete ... bout could you share a bit more
>>>> of what issue does this address?
>>>>
>>>
>>> Hi, the problem I am trying to address is a segfault in OVS/dpdk that looks like this:
>> 
>> Sorry I was not clear. What I mean is, you don't yet know why does removing
>> virtio_set_status call here prevent the crash in ovs, do you?
>> 
>
> I have no idea. Trying to collect logs to figure things out, but as
> mentioned the logs easily hide the issue.
> Likely there is just more to study here.

Given the OVS is going off on a NULL ptr deref could it just be it's not
handling the disabling/reenabling of the virtqueues as you pause and
restart properly? I could certainly imagine a backend jumping the gun to
read a queue going very wrong if the current queue state is disabled.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-29 13:21                 ` Alex Bennée
@ 2022-07-29 14:00                   ` Claudio Fontana
  2022-07-31 20:42                     ` Claudio Fontana
  0 siblings, 1 reply; 18+ messages in thread
From: Claudio Fontana @ 2022-07-29 14:00 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Michael S. Tsirkin, Jason Wang, qemu-devel, Marcel Apfelbaum

On 7/29/22 15:21, Alex Bennée wrote:
> 
> Claudio Fontana <cfontana@suse.de> writes:
> 
>> On 7/29/22 12:13, Michael S. Tsirkin wrote:
>>> On Fri, Jul 29, 2022 at 11:46:05AM +0200, Claudio Fontana wrote:
>>>>>> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
>>>>>>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>>>>>      int i;
>>>>>>  
>>>>>> -    virtio_set_status(vdev, 0);
>>>>>>      if (current_cpu) {
>>>>>>          /* Guest initiated reset */
>>>>>>          vdev->device_endian = virtio_current_cpu_endian();
>>>>>> -- 
>>>>>> 2.26.2
>>>>>
>>>>> As you say this is incomplete ... bout could you share a bit more
>>>>> of what issue does this address?
>>>>>
>>>>
>>>> Hi, the problem I am trying to address is a segfault in OVS/dpdk that looks like this:
>>>
>>> Sorry I was not clear. What I mean is, you don't yet know why does removing
>>> virtio_set_status call here prevent the crash in ovs, do you?
>>>
>>
>> I have no idea. Trying to collect logs to figure things out, but as
>> mentioned the logs easily hide the issue.
>> Likely there is just more to study here.
> 
> Given the OVS is going off on a NULL ptr deref could it just be it's not
> handling the disabling/reenabling of the virtqueues as you pause and
> restart properly? I could certainly imagine a backend jumping the gun to
> read a queue going very wrong if the current queue state is disabled.
> 

In this case both the ovs buf_addr and buf_iova are NULL, which is a nice case as they are more detectable,
however I also have segfaults where the addresses are just garbage.

I wonder whether it's possible that given the fact that the guest is going away without notification (SIGKILL),
as the guest driver resets the device and communicates with QEMU, QEMU adapts the state without notifying ovs,
so ovs happily tries to dequeue data from memory that isn't there. But I am just guessing.

I am still studying the qemu vhost user side and ovs/dpdk side to try to understand how this whole thing works.

Thanks,

CLaudio




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-28 10:24         ` Cornelia Huck
@ 2022-07-31 20:38           ` Claudio Fontana
  0 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2022-07-31 20:38 UTC (permalink / raw)
  To: Cornelia Huck, Jason Wang, Michael S. Tsirkin
  Cc: qemu-devel, Alex Bennée, Marcel Apfelbaum, Halil Pasic, Eric Farman

On 7/28/22 12:24, Cornelia Huck wrote:
> On Thu, Jul 28 2022, Claudio Fontana <cfontana@suse.de> wrote:
> 
>> On 7/28/22 09:43, Claudio Fontana wrote:
>>> On 7/28/22 03:27, Jason Wang wrote:
>>>> On Wed, Jul 27, 2022 at 11:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>
>>>>> On Wed, Jul 27, 2022 at 12:51:31PM +0200, Claudio Fontana wrote:
>>>>>> Hi Michael and all,
>>>>>>
>>>>>> I have started researching a qemu / ovs / dpdk bug:
>>>>>>
>>>>>> https://inbox.dpdk.org/dev/322122fb-619d-96f6-5c3e-9eabdbf3819a@redhat.com/T/
>>>>>>
>>>>>> that seems to be affecting multiple parties in the telco space,
>>>>>>
>>>>>> and during this process I noticed that qemu/hw/virtio/virtio.c does not do a full virtio reset
>>>>>> in virtio_set_status, when receiving a status value of 0.
>>>>>>
>>>>>> It seems it has always been this way, so I am clearly missing / forgetting something basic,
>>>>>>
>>>>>> I checked the virtio spec at https://docs.oasis-open.org/
>>>>>>
>>>>>> and from:
>>>>>>
>>>>>> "
>>>>>> 4.1.4.3 Common configuration structure layout
>>>>>>
>>>>>> device_status
>>>>>> The driver writes the device status here (see 2.1). Writing 0 into this field resets the device.
>>>>>>
>>>>>> "
>>>>>>
>>>>>> and
>>>>>>
>>>>>> "
>>>>>> 2.4.1 Device Requirements: Device Reset
>>>>>> A device MUST reinitialize device status to 0 after receiving a reset.
>>>>>> "
> 
> Side note: We can also have a reset without writing 0 to the device
> status (RESET ccw on the virtio-ccw transport).
> 
>>>>>>
>>>>>> I would conclude that in virtio.c::virtio_set_status we should unconditionally do a full virtio_reset.
>>>>>>
>>>>>> Instead, we have just the check:
>>>>>>
>>>>>>     if ((vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) !=
>>>>>>         (val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>>>>>         virtio_set_started(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
>>>>>>     }
>>>>>>
>>>>>> which just sets the started field,
>>>>>>
>>>>>> and then we have the call to the virtio device class set_status (virtio_net...),
>>>>>> but the VirtioDevice is not fully reset, as per the virtio_reset() call we are missing:
>>>>>>
>>>>>> "
>>>>>>     vdev->start_on_kick = false;
>>>>>>     vdev->started = false;
>>>>>>     vdev->broken = false;
>>>>>>     vdev->guest_features = 0;
>>>>>>     vdev->queue_sel = 0;
>>>>>>     vdev->status = 0;
>>>>>>     vdev->disabled = false;
>>>>>>     qatomic_set(&vdev->isr, 0);
>>>>>>     vdev->config_vector = VIRTIO_NO_VECTOR;
>>>>>>     virtio_notify_vector(vdev, vdev->config_vector);
>>>>>>
>>>>>>     for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>>>>>>         ... initialize vdev->vq[i] ...
>>>>>>     }
>>>>>> "
>>>>>>
>>>>>> Doing a full reset seems to fix the problem for me, so I can send tentative patches if necessary,
>>>>>> but what am I missing here?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Claudio
>>>>>>
>>>>>> --
>>>>>> Claudio Fontana
>>>>>> Engineering Manager Virtualization, SUSE Labs Core
>>>>>>
>>>>>> SUSE Software Solutions Italy Srl
>>>>>
>>>>>
>>>>> So for example for pci:
>>>>>
>>>>>     case VIRTIO_PCI_STATUS:
>>>>>
>>>>>
>>>>>         ....
>>>>>
>>>>>         if (vdev->status == 0) {
>>>>>             virtio_pci_reset(DEVICE(proxy));
>>>>>         }
> 
> FWIW, ccw ends up calling virtio_ccw_reset_virtio() when the driver
> issues a reset command, or when it issues a write status 0 command, or
> when the generic reset function is invoked.
> 
>>>>>
>>>>> which I suspect is a bug because:
>>>>>
>>>>> static void virtio_pci_reset(DeviceState *qdev)
>>>>> {
>>>>>     VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>>>>>     VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>>>>>     PCIDevice *dev = PCI_DEVICE(qdev);
>>>>>     int i;
>>>>>
>>>>>     virtio_bus_reset(bus);
>>>>
>>>> Note that we do virtio_reset() here.
>>>
>>>
>>> Yes, thank you, I completely overlooked it, I noticed this in Michael's response as well.
>>>
>>> However we end up with multiple calls to k->set_status, one from the virtio_set_status call,
>>> and one from the virtio_bus_reset(), which is probably something we don't want.
>>>
>>> All in all it is not clear what the meaning of virtio_set_status is supposed to be I think,
>>> and I wonder what the assumptions are among all the callers.
>>> If it is supposed to be an implementation of the virtio standard field as described, I think we should do the reset right then and there,
>>> but maybe the true meaning of the function is another one I couldn't understand, since _some_ of the cases are processes there.
> 
> Hm. Maybe there needs to be a distinction between "we're forwarding the
> status setting by the driver to the core, take any appropriate action"
> and "we've just reset the device, now we just need to zero out the
> status field"?

Right, and the reset function of virtio already sets ->status to 0 manually as part of the reset.

Fundamentally, the only issue I am seeing in qemu is this semantic thing,

and the fact that the virtio device class set_status functions are called twice, which seems asking for trouble.

The actual segfault in OVS I am pursuing as a problem in DPDK itself.

Thanks,

Claudio
> 
>>>
>>> And there is a question about ordering:
>>>
>>> in virtio_pci we end up calling virtio_set_status(0), which gets us k->set_status(vdev, 0), which lands in virtio_net_set_status(0) and virtio_net_vhost_status,
>>> which causes a vhost_net_stop().
>>>
>>> Should we instead land in virtio_net_reset() first, by doing a virtio reset earlier when detecting a 0 value from the driver?
>>>
>>> in the scenario I am looking at (with vhost-user, ovs/dpdk, and a guest testpmd application),
>>> the guest application goes away without any chance to signal (kill -9), then gets immediately restarted and does a write of 0 to status, while qemu and ovs still hold the state for the device.
>>>
>>> As QEMU lands in vhost_net_stop(), it seems to cause a chain of events that crash ovs which is trying to read an rx burst from the queue,
>>> while QEMU is left hanging waiting forever for a response to VHOST_USER_GET_VRING_BASE issued as a result of vhost_net_stop.
>>>
>>> Just saying, I am having more success with the second ordering, but I am still studying, don't have the full picture yet.
>>
>>
>> Currently I'm doing (on top of Michael's patch) the following which seems to be working
>> (but of course this does not even being to look at the other transports, architectures etc),
>> just an idea to share:
>>
>> ---
>>  hw/virtio/virtio-pci.c | 7 ++++---
>>  hw/virtio/virtio.c     | 7 ++++++-
>>  2 files changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>> index 3189ec014d..3cbfa3ce3a 100644
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -312,6 +312,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>>      case VIRTIO_PCI_QUEUE_PFN:
>>          pa = (hwaddr)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
>>          if (pa == 0) {
>> +            virtio_bus_reset(&proxy->bus);
>>              virtio_pci_reset(DEVICE(proxy));
>>          }
>>          else
>> @@ -1941,11 +1942,8 @@ static void virtio_pci_exit(PCIDevice *pci_dev)
>>  static void virtio_pci_reset(DeviceState *qdev)
>>  {
>>      VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>> -    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>> -    PCIDevice *dev = PCI_DEVICE(qdev);
>>      int i;
>>  
>> -    virtio_bus_reset(bus);
>>      msix_unuse_all_vectors(&proxy->pci_dev);
>>  
>>      for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
>> @@ -1960,7 +1958,10 @@ static void virtio_pci_reset(DeviceState *qdev)
>>  static void virtio_pci_bus_reset(DeviceState *qdev)
>>  {
>>      PCIDevice *dev = PCI_DEVICE(qdev);
>> +    VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>> +    VirtioBusState *bus = VIRTIO_BUS(&proxy->bus);
>>  
>> +    virtio_bus_reset(bus);
>>      virtio_pci_reset(qdev);
>>  
>>      if (pci_is_express(dev)) {
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index 5d607aeaa0..da58ca6f86 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -1977,6 +1977,12 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
>>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>      trace_virtio_set_status(vdev, val);
>>  
>> +    if (val == 0) {
>> +        VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
>> +        virtio_bus_reset(bus);
>> +        return 0;
>> +    }
>> +
>>      if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>>          if (!(vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) &&
>>              val & VIRTIO_CONFIG_S_FEATURES_OK) {
>> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
>>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>      int i;
>>  
>> -    virtio_set_status(vdev, 0);
> 
> Doesn't that break virtio-ccw for resets triggered via the RESET ccw
> (see above?)
> 
>>      if (current_cpu) {
>>          /* Guest initiated reset */
>>          vdev->device_endian = virtio_current_cpu_endian();
> 
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-29 14:00                   ` Claudio Fontana
@ 2022-07-31 20:42                     ` Claudio Fontana
  2022-08-01  8:44                       ` Alex Bennée
  0 siblings, 1 reply; 18+ messages in thread
From: Claudio Fontana @ 2022-07-31 20:42 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Michael S. Tsirkin, Jason Wang, qemu-devel, Marcel Apfelbaum

On 7/29/22 16:00, Claudio Fontana wrote:
> On 7/29/22 15:21, Alex Bennée wrote:
>>
>> Claudio Fontana <cfontana@suse.de> writes:
>>
>>> On 7/29/22 12:13, Michael S. Tsirkin wrote:
>>>> On Fri, Jul 29, 2022 at 11:46:05AM +0200, Claudio Fontana wrote:
>>>>>>> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
>>>>>>>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>>>>>>      int i;
>>>>>>>  
>>>>>>> -    virtio_set_status(vdev, 0);
>>>>>>>      if (current_cpu) {
>>>>>>>          /* Guest initiated reset */
>>>>>>>          vdev->device_endian = virtio_current_cpu_endian();
>>>>>>> -- 
>>>>>>> 2.26.2
>>>>>>
>>>>>> As you say this is incomplete ... bout could you share a bit more
>>>>>> of what issue does this address?
>>>>>>
>>>>>
>>>>> Hi, the problem I am trying to address is a segfault in OVS/dpdk that looks like this:
>>>>
>>>> Sorry I was not clear. What I mean is, you don't yet know why does removing
>>>> virtio_set_status call here prevent the crash in ovs, do you?
>>>>
>>>
>>> I have no idea. Trying to collect logs to figure things out, but as
>>> mentioned the logs easily hide the issue.
>>> Likely there is just more to study here.
>>
>> Given the OVS is going off on a NULL ptr deref could it just be it's not
>> handling the disabling/reenabling of the virtqueues as you pause and
>> restart properly? I could certainly imagine a backend jumping the gun to
>> read a queue going very wrong if the current queue state is disabled.
>>
> 
> In this case both the ovs buf_addr and buf_iova are NULL, which is a nice case as they are more detectable,
> however I also have segfaults where the addresses are just garbage.
> 
> I wonder whether it's possible that given the fact that the guest is going away without notification (SIGKILL),
> as the guest driver resets the device and communicates with QEMU, QEMU adapts the state without notifying ovs,
> so ovs happily tries to dequeue data from memory that isn't there. But I am just guessing.
> 
> I am still studying the qemu vhost user side and ovs/dpdk side to try to understand how this whole thing works.
> 
> Thanks,
> 
> CLaudio
> 

I am pursuing this as a DPDK library issue.

It would be cool to have ovs, dpdk and vhost-user with the default test-pmd application somehow hooked up in a basic test
in one of these projects..

Thanks,

Claudio




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: virtio: why no full reset on virtio_set_status 0 ?
  2022-07-31 20:42                     ` Claudio Fontana
@ 2022-08-01  8:44                       ` Alex Bennée
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2022-08-01  8:44 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Michael S. Tsirkin, Jason Wang, qemu-devel, Marcel Apfelbaum


Claudio Fontana <cfontana@suse.de> writes:

> On 7/29/22 16:00, Claudio Fontana wrote:
>> On 7/29/22 15:21, Alex Bennée wrote:
>>>
>>> Claudio Fontana <cfontana@suse.de> writes:
>>>
>>>> On 7/29/22 12:13, Michael S. Tsirkin wrote:
>>>>> On Fri, Jul 29, 2022 at 11:46:05AM +0200, Claudio Fontana wrote:
>>>>>>>> @@ -2025,7 +2031,6 @@ void virtio_reset(void *opaque)
>>>>>>>>      VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>>>>>>>      int i;
>>>>>>>>  
>>>>>>>> -    virtio_set_status(vdev, 0);
>>>>>>>>      if (current_cpu) {
>>>>>>>>          /* Guest initiated reset */
>>>>>>>>          vdev->device_endian = virtio_current_cpu_endian();
>>>>>>>> -- 
>>>>>>>> 2.26.2
>>>>>>>
>>>>>>> As you say this is incomplete ... bout could you share a bit more
>>>>>>> of what issue does this address?
>>>>>>>
>>>>>>
>>>>>> Hi, the problem I am trying to address is a segfault in OVS/dpdk that looks like this:
>>>>>
>>>>> Sorry I was not clear. What I mean is, you don't yet know why does removing
>>>>> virtio_set_status call here prevent the crash in ovs, do you?
>>>>>
>>>>
>>>> I have no idea. Trying to collect logs to figure things out, but as
>>>> mentioned the logs easily hide the issue.
>>>> Likely there is just more to study here.
>>>
>>> Given the OVS is going off on a NULL ptr deref could it just be it's not
>>> handling the disabling/reenabling of the virtqueues as you pause and
>>> restart properly? I could certainly imagine a backend jumping the gun to
>>> read a queue going very wrong if the current queue state is disabled.
>>>
>> 
>> In this case both the ovs buf_addr and buf_iova are NULL, which is a
>> nice case as they are more detectable,
>> however I also have segfaults where the addresses are just garbage.
>> 
>> I wonder whether it's possible that given the fact that the guest is
>> going away without notification (SIGKILL),
>> as the guest driver resets the device and communicates with QEMU,
>> QEMU adapts the state without notifying ovs,
>> so ovs happily tries to dequeue data from memory that isn't there. But I am just guessing.
>> 
>> I am still studying the qemu vhost user side and ovs/dpdk side to
>> try to understand how this whole thing works.
>> 
>> Thanks,
>> 
>> CLaudio
>> 
>
> I am pursuing this as a DPDK library issue.
>
> It would be cool to have ovs, dpdk and vhost-user with the default
> test-pmd application somehow hooked up in a basic test
> in one of these projects..

I agree although it's hard to marshal multiple projects into a known
working state that isn't too brittle for CI purposes. The existing
qos-test testing doesn't really exercise any more than the initial setup
and register reading of the VirtIO device.

For example we have a number of non network standalone vhost-user
backends in rust-vmm which would be nice to plumb in somehow.

>
> Thanks,
>
> Claudio


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-08-01  8:48 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-27 10:51 virtio: why no full reset on virtio_set_status 0 ? Claudio Fontana
2022-07-27 15:32 ` Michael S. Tsirkin
2022-07-28  1:27   ` Jason Wang
2022-07-28  7:16     ` Claudio Fontana
2022-07-28  7:43     ` Claudio Fontana
2022-07-28  9:09       ` Claudio Fontana
2022-07-28 10:24         ` Cornelia Huck
2022-07-31 20:38           ` Claudio Fontana
2022-07-28 13:39         ` Michael S. Tsirkin
2022-07-29  9:46           ` Claudio Fontana
2022-07-29 10:13             ` Michael S. Tsirkin
2022-07-29 10:19               ` Claudio Fontana
2022-07-29 13:21                 ` Alex Bennée
2022-07-29 14:00                   ` Claudio Fontana
2022-07-31 20:42                     ` Claudio Fontana
2022-08-01  8:44                       ` Alex Bennée
2022-07-28 11:41       ` Michael S. Tsirkin
2022-07-27 16:17 ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.