* [PATCH] failover: fix unplug pending detection
@ 2021-09-30  8:20 Laurent Vivier
  2021-09-30  9:24 ` Ani Sinha
  0 siblings, 1 reply; 5+ messages in thread
From: Laurent Vivier @ 2021-09-30  8:20 UTC
  To: qemu-devel
  Cc: Ani Sinha, Igor Mammedov, Juan Quintela, Jens Freimann,
	Michael S. Tsirkin

Failover needs to detect the end of the PCI unplug to start migration
after the VFIO card has been unplugged.

To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
pcie_unplug_device().

But since
    17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
we have switched to ACPI unplug, so these functions are no longer called
and the flag is never set. Failover migration therefore cannot detect whether
the card is really unplugged and acts as if the unplug were done as soon as
it has started, so it doesn't wait for the end of the unplug before starting
the migration. We don't see any problem in testing because ACPI unplug is
faster than PCIe native hotplug, and by the time the migration really starts
the unplug operation is already done.

See c000a9bd06ea ("pci: mark device having guest unplug request pending")
    a99c4da9fc2a ("pci: mark devices partially unplugged")

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/acpi/pcihp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index f610a25d2ef9..a2d27a3c4763 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -366,6 +366,11 @@ void acpi_pcihp_device_unplug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
     trace_acpi_pci_unplug(PCI_SLOT(pdev->devfn),
                           acpi_pcihp_get_bsel(pci_get_bus(pdev)));
 
+    if (pdev->partially_hotplugged) {
+        pdev->qdev.pending_deleted_event = false;
+        return;
+    }
+
     /*
      * clean up acpi-index so it could reused by another device
      */
@@ -396,6 +401,7 @@ void acpi_pcihp_device_unplug_request_cb(HotplugHandler *hotplug_dev,
         return;
     }
 
+    pdev->qdev.pending_deleted_event = true;
     s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
     acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
 }
-- 
2.31.1
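
The flag protocol described in the commit message can be modelled roughly
as follows. This is only a minimal sketch under simplifying assumptions:
FlagDevice and the flag_* helpers are hypothetical stand-ins invented for
illustration, not QEMU types or functions. Only the field name
pending_deleted_event is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state; NOT QEMU's DeviceState. */
typedef struct FlagDevice {
    /* True while a guest unplug request is outstanding. */
    bool pending_deleted_event;
} FlagDevice;

/* Models an unplug *request* callback: it only marks the request pending. */
static void flag_unplug_request(FlagDevice *dev)
{
    assert(dev != NULL);
    dev->pending_deleted_event = true;
}

/* Models the point where the guest has completed the unplug. */
static void flag_unplug_done(FlagDevice *dev)
{
    assert(dev != NULL);
    dev->pending_deleted_event = false;
}

/* Models the question migration asks before it is allowed to proceed. */
static bool flag_unplug_pending(const FlagDevice *dev)
{
    assert(dev != NULL);
    return dev->pending_deleted_event;
}
```

If the request path never sets the flag, flag_unplug_pending() reports
false immediately and the caller cannot tell "not started" from "finished",
which is the situation the patch addresses for the ACPI path.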




* Re: [PATCH] failover: fix unplug pending detection
  2021-09-30  8:20 [PATCH] failover: fix unplug pending detection Laurent Vivier
@ 2021-09-30  9:24 ` Ani Sinha
  2021-09-30  9:48   ` Laurent Vivier
  0 siblings, 1 reply; 5+ messages in thread
From: Ani Sinha @ 2021-09-30  9:24 UTC
  To: Laurent Vivier
  Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel, Ani Sinha,
	Igor Mammedov, Jens Freimann



On Thu, 30 Sep 2021, Laurent Vivier wrote:

> Failover needs to detect the end of the PCI unplug to start migration
> after the VFIO card has been unplugged.
>
> To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
> pcie_unplug_device().
>
> But since
>     17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
> we have switched to ACPI unplug, so these functions are no longer called
> and the flag is never set. Failover migration therefore cannot detect whether
> the card is really unplugged and acts as if the unplug were done as soon as
> it has started, so it doesn't wait for the end of the unplug before starting
> the migration. We don't see any problem in testing because ACPI unplug is
> faster than PCIe native hotplug, and by the time the migration really starts
> the unplug operation is already done.
>
> See c000a9bd06ea ("pci: mark device having guest unplug request pending")
>     a99c4da9fc2a ("pci: mark devices partially unplugged")

Ok so I have a basic question about partially_hotplugged flag in the
device struct (there were no comments added in a99c4da9fc2a39847
explaining it). It seems we return early from pcie_unplug_device() when
this flag is set from failover_unplug_primary() in virtio-net. What is the
purpose of this flag? It seems we are not doing a full unplug of the
primary device?

>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  hw/acpi/pcihp.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index f610a25d2ef9..a2d27a3c4763 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -366,6 +366,11 @@ void acpi_pcihp_device_unplug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
>      trace_acpi_pci_unplug(PCI_SLOT(pdev->devfn),
>                            acpi_pcihp_get_bsel(pci_get_bus(pdev)));
>
> +    if (pdev->partially_hotplugged) {
> +        pdev->qdev.pending_deleted_event = false;
> +        return;
> +    }
> +
>      /*
>       * clean up acpi-index so it could reused by another device
>       */
> @@ -396,6 +401,7 @@ void acpi_pcihp_device_unplug_request_cb(HotplugHandler *hotplug_dev,
>          return;
>      }
>
> +    pdev->qdev.pending_deleted_event = true;
>      s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
>      acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
>  }
> --
> 2.31.1
>
>



* Re: [PATCH] failover: fix unplug pending detection
  2021-09-30  9:24 ` Ani Sinha
@ 2021-09-30  9:48   ` Laurent Vivier
  2021-10-01  5:19     ` Ani Sinha
  0 siblings, 1 reply; 5+ messages in thread
From: Laurent Vivier @ 2021-09-30  9:48 UTC
  To: Ani Sinha
  Cc: Jens Freimann, Igor Mammedov, Juan Quintela, qemu-devel,
	Michael S. Tsirkin

On 30/09/2021 11:24, Ani Sinha wrote:
> 
> 
> On Thu, 30 Sep 2021, Laurent Vivier wrote:
> 
>> Failover needs to detect the end of the PCI unplug to start migration
>> after the VFIO card has been unplugged.
>>
>> To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
>> pcie_unplug_device().
>>
>> But since
>>      17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
>> we have switched to ACPI unplug, so these functions are no longer called
>> and the flag is never set. Failover migration therefore cannot detect whether
>> the card is really unplugged and acts as if the unplug were done as soon as
>> it has started, so it doesn't wait for the end of the unplug before starting
>> the migration. We don't see any problem in testing because ACPI unplug is
>> faster than PCIe native hotplug, and by the time the migration really starts
>> the unplug operation is already done.
>>
>> See c000a9bd06ea ("pci: mark device having guest unplug request pending")
>>      a99c4da9fc2a ("pci: mark devices partially unplugged")
> 
> Ok so I have a basic question about partially_hotplugged flag in the
> device struct (there were no comments added in a99c4da9fc2a39847
> explaining it). It seems we return early from pcie_unplug_device() when
> this flag is set from failover_unplug_primary() in virtio-net. What is the
> purpose of this flag? It seems we are not doing a full unplug of the
> primary device?

Yes, to be able to plug it back in case of migration failure we must keep
all the data structures.

But reading the code again, it seems this check should be in
acpi_pcihp_eject_slot() rather than in acpi_pcihp_device_unplug_cb(), so
that it prevents the hotplug_handler_unplug()/object_unparent() calls
rather than just the qdev_unrealize() (to be like in pcie.c).
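
A minimal sketch of the ordering discussed here, assuming a heavily
simplified model: EjectDevice and eject_slot_model() are hypothetical names
invented for illustration, and the two boolean "ran" fields merely stand in
for hotplug_handler_unplug() and object_unparent(). It is not the actual
acpi_pcihp_eject_slot() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PCI device; NOT QEMU's PCIDevice. */
typedef struct EjectDevice {
    bool partially_hotplugged;   /* failover wants the structures kept */
    bool pending_deleted_event;  /* guest unplug request outstanding */
    bool unplug_handler_ran;     /* stands in for hotplug_handler_unplug() */
    bool unparented;             /* stands in for object_unparent() */
} EjectDevice;

/*
 * Models the eject path with the check placed first: a partially
 * hotplugged device only has its pending flag cleared, and neither the
 * unplug handler nor the unparent step runs, so it can be plugged back.
 */
static void eject_slot_model(EjectDevice *dev)
{
    assert(dev != NULL);

    if (dev->partially_hotplugged) {
        dev->pending_deleted_event = false;
        return;
    }

    /* Full unplug: in the real code the object would be destroyed here. */
    dev->unplug_handler_ran = true;
    dev->unparented = true;
}
```

With the check this early, a device marked partially_hotplugged keeps all
of its state, which is what makes re-plugging after a failed migration
possible.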

Thanks,
Laurent




* Re: [PATCH] failover: fix unplug pending detection
  2021-09-30  9:48   ` Laurent Vivier
@ 2021-10-01  5:19     ` Ani Sinha
  2021-10-01  6:32       ` Laurent Vivier
  0 siblings, 1 reply; 5+ messages in thread
From: Ani Sinha @ 2021-10-01  5:19 UTC
  To: Laurent Vivier
  Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel, Ani Sinha,
	Igor Mammedov, Jens Freimann



On Thu, 30 Sep 2021, Laurent Vivier wrote:

> On 30/09/2021 11:24, Ani Sinha wrote:
> >
> >
> > On Thu, 30 Sep 2021, Laurent Vivier wrote:
> >
> > > Failover needs to detect the end of the PCI unplug to start migration
> > > after the VFIO card has been unplugged.
> > >
> > > To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset
> > > in
> > > pcie_unplug_device().
> > >
> > > But since
> > >      17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
> > > Q35")
> > > > we have switched to ACPI unplug, so these functions are no longer called
> > > > and the flag is never set. Failover migration therefore cannot detect whether
> > > > the card is really unplugged and acts as if the unplug were done as soon as
> > > > it has started, so it doesn't wait for the end of the unplug before starting
> > > > the migration. We don't see any problem in testing because ACPI unplug is
> > > > faster than PCIe native hotplug, and by the time the migration really starts
> > > > the unplug operation is already done.
> > >
> > > See c000a9bd06ea ("pci: mark device having guest unplug request pending")
> > >      a99c4da9fc2a ("pci: mark devices partially unplugged")
> >
> > Ok so I have a basic question about partially_hotplugged flag in the
> > device struct (there were no comments added in a99c4da9fc2a39847
> > explaining it). It seems we return early from pcie_unplug_device() when
> > this flag is set from failover_unplug_primary() in virtio-net. What is the
> > purpose of this flag? It seems we are not doing a full unplug of the
> > primary device?
>
> Yes, to be able to plug it back in case of migration failure we must keep all
> the data structures.

Ok so two things here:
(a) could you please add a comment to PCIDevice struct in pci.h to clarify
what the flag actually means, why it is there and what it is supposed to
do.

(b) the naming of the variable could be something like do_partial_unplug
or some such. This could be a separate patch.

>
> But reading the code again it seems this part should be in
> acpi_pcihp_eject_slot() rather than in acpi_pcihp_device_unplug_cb() to
> prevent the hotplug_handler_unplug()/object_unparent()  rather than the
> qdev_unrealize() (to be like in pcie.c).

Correct. You need to place the check earlier so as to be equivalent to
what the native hotplug code does.



* Re: [PATCH] failover: fix unplug pending detection
  2021-10-01  5:19     ` Ani Sinha
@ 2021-10-01  6:32       ` Laurent Vivier
  0 siblings, 0 replies; 5+ messages in thread
From: Laurent Vivier @ 2021-10-01  6:32 UTC
  To: Ani Sinha
  Cc: Jens Freimann, Igor Mammedov, Juan Quintela, qemu-devel,
	Michael S. Tsirkin

On 01/10/2021 07:19, Ani Sinha wrote:
> 
> 
> On Thu, 30 Sep 2021, Laurent Vivier wrote:
> 
>> On 30/09/2021 11:24, Ani Sinha wrote:
>>>
>>>
>>> On Thu, 30 Sep 2021, Laurent Vivier wrote:
>>>
>>>> Failover needs to detect the end of the PCI unplug to start migration
>>>> after the VFIO card has been unplugged.
>>>>
>>>> To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset
>>>> in
>>>> pcie_unplug_device().
>>>>
>>>> But since
>>>>       17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
>>>> Q35")
>>>> we have switched to ACPI unplug, so these functions are no longer called
>>>> and the flag is never set. Failover migration therefore cannot detect whether
>>>> the card is really unplugged and acts as if the unplug were done as soon as
>>>> it has started, so it doesn't wait for the end of the unplug before starting
>>>> the migration. We don't see any problem in testing because ACPI unplug is
>>>> faster than PCIe native hotplug, and by the time the migration really starts
>>>> the unplug operation is already done.
>>>>
>>>> See c000a9bd06ea ("pci: mark device having guest unplug request pending")
>>>>       a99c4da9fc2a ("pci: mark devices partially unplugged")
>>>
>>> Ok so I have a basic question about partially_hotplugged flag in the
>>> device struct (there were no comments added in a99c4da9fc2a39847
>>> explaining it). It seems we return early from pcie_unplug_device() when
>>> this flag is set from failover_unplug_primary() in virtio-net. What is the
>>> purpose of this flag? It seems we are not doing a full unplug of the
>>> primary device?
>>
>> Yes, to be able to plug it back in case of migration failure we must keep all
>> the data structures.
> 
> Ok so two things here:
> (a) could you please add a comment to PCIDevice struct in pci.h to clarify
> what the flag actually means, why it is there and what it is supposed to
> do.

Will be in v3.

> (b) the naming of the variable could be something like do_partial_unplug
> or some such. This could be a separate patch.

OK, I'll do that in a separate patch: I'm already working on a patch series
that moves most of the failover code to PCI files (hotplug is a PCI feature,
not a virtio one).

https://patchew.org/QEMU/20210820142002.152994-1-lvivier@redhat.com/

>>
>> But reading the code again it seems this part should be in
>> acpi_pcihp_eject_slot() rather than in acpi_pcihp_device_unplug_cb() to
>> prevent the hotplug_handler_unplug()/object_unparent()  rather than the
>> qdev_unrealize() (to be like in pcie.c).
> 
> Correct. You need to place the check earlier so as to be equivalent to
> what the native hotplug code does.
> 

Thanks,
Laurent


