[Qemu-devel] [PATCH v3 0/2] s390x/pci: hotplug handler fixes and reworks

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v3 0/2] s390x/pci: hotplug handler fixes and reworks
@ 2019-01-21 13:42 David Hildenbrand
  2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler David Hildenbrand
  2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset David Hildenbrand
  0 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-01-21 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Collin Walling, Thomas Huth, Christian Borntraeger,
	Cornelia Huck, Richard Henderson, David Hildenbrand

These are the remaining two patches, rebased to current upstream.

Patch 1: Rework unplug handler (introduce unplug_request handler) which
         also fixes some unplug scenarios
Patch 2: Handle leftover unplug requests on reset

v2 -> v3:
- Rebased

v1 -> v2:
- Some rewordings in patch descriptions
- "s390x/pci: Introduce unplug requests and split unplug handler"
-- Some simplifications regarding s390_pci_perform_unplug()

David Hildenbrand (2):
  s390x/pci: Introduce unplug requests and split unplug handler
  s390x/pci: Unplug remaining devices on pcihost reset

 hw/s390x/s390-pci-bus.c | 174 +++++++++++++++++++++++++++-------------
 hw/s390x/s390-pci-bus.h |   1 +
 2 files changed, 121 insertions(+), 54 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-21 13:42 [Qemu-devel] [PATCH v3 0/2] s390x/pci: hotplug handler fixes and reworks David Hildenbrand
@ 2019-01-21 13:42 ` David Hildenbrand
  2019-01-23 11:03   ` Cornelia Huck
                     ` (2 more replies)
  2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset David Hildenbrand
  1 sibling, 3 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-01-21 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Collin Walling, Thomas Huth, Christian Borntraeger,
	Cornelia Huck, Richard Henderson, David Hildenbrand

PCI on s390x is really weird and how it was modeled in QEMU might not have
been the right choice. Anyhow, right now it is the case that:
- Hotplugging a PCI device will silently create a zPCI device
  (if none is provided)
- Hotunplugging a zPCI device will unplug the PCI device (if any)
- Hotunplugging a PCI device will unplug also the zPCI device
As far as I can see, we can no longer change this behavior. But we
should fix it.

Both device types are handled via a single hotplug handler call. This
is problematic for various reasons:
1. Unplugging via the zPCI device allows to unplug PCI bridges as
   checks are not performed - bad.
2. Unplugging via the zPCI device allows to unplug devices that are not
   hot removable. (check performed in qdev_unplug()) - bad.
3. Hotplug handler chains are not possible for the unplug case. In the
   future, the machine might want to override hotplug handlers, to
   process device specific stuff and to then branch off to the actual
   hotplug handler. We need separate hotplug handler calls for both the
   PCI and zPCI device to make this work reliably. All other PCI
   implementations are already prepared to handle this correctly, only
   s390x is missing.

Therefore, introduce the unplug_request handler and properly perform
unplug checks by redirecting to the separate unplug_request handlers.
When finally unplugging, perform two separate hotplug_handler_unplug()
calls, first for the PCI device, followed by the zPCI device. This now
nicely splits unplugging paths for both devices.

The redirect part is a little hairy, as the user is allowed to trigger
unplug either via the PCI or the zPCI device. So redirect always to the
PCI unplug request handler first and remember if that check has been
performed in the zPCI device. Redirect then to the zPCI device unplug
request handler to perform the magic. Remembering that we already
checked the PCI device breaks the redirect loop.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-pci-bus.c | 166 +++++++++++++++++++++++++++-------------
 hw/s390x/s390-pci-bus.h |   1 +
 2 files changed, 113 insertions(+), 54 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index f017c1ded0..bc17a8cf65 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -148,6 +148,22 @@ out:
     psccb->header.response_code = cpu_to_be16(rc);
 }
 
+static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
+{
+    HotplugHandler *hotplug_ctrl;
+
+    /* Unplug the PCI device */
+    if (pbdev->pdev) {
+        hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(pbdev->pdev));
+        hotplug_handler_unplug(hotplug_ctrl, DEVICE(pbdev->pdev),
+                               &error_abort);
+    }
+
+    /* Unplug the zPCI device */
+    hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(pbdev));
+    hotplug_handler_unplug(hotplug_ctrl, DEVICE(pbdev), &error_abort);
+}
+
 void s390_pci_sclp_deconfigure(SCCB *sccb)
 {
     IoaCfgSccb *psccb = (IoaCfgSccb *)sccb;
@@ -179,7 +195,7 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
         rc = SCLP_RC_NORMAL_COMPLETION;
 
         if (pbdev->release_timer) {
-            qdev_unplug(DEVICE(pbdev->pdev), NULL);
+            s390_pci_perform_unplug(pbdev);
         }
     }
 out:
@@ -217,6 +233,24 @@ S390PCIBusDevice *s390_pci_find_dev_by_target(S390pciState *s,
     return NULL;
 }
 
+static S390PCIBusDevice *s390_pci_find_dev_by_pci(S390pciState *s,
+                                                  PCIDevice *pci_dev)
+{
+    S390PCIBusDevice *pbdev;
+
+    if (!pci_dev) {
+        return NULL;
+    }
+
+    QTAILQ_FOREACH(pbdev, &s->zpci_devs, link) {
+        if (pbdev->pdev == pci_dev) {
+            return pbdev;
+        }
+    }
+
+    return NULL;
+}
+
 S390PCIBusDevice *s390_pci_find_dev_by_idx(S390pciState *s, uint32_t idx)
 {
     return g_hash_table_lookup(s->zpci_table, &idx);
@@ -939,77 +973,100 @@ static void s390_pcihost_timer_cb(void *opaque)
     pbdev->state = ZPCI_FS_STANDBY;
     s390_pci_generate_plug_event(HP_EVENT_CONFIGURED_TO_STBRES,
                                  pbdev->fh, pbdev->fid);
-    qdev_unplug(DEVICE(pbdev), NULL);
+    s390_pci_perform_unplug(pbdev);
 }
 
 static void s390_pcihost_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
                                 Error **errp)
 {
     S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
-    PCIDevice *pci_dev = NULL;
-    PCIBus *bus;
-    int32_t devfn;
     S390PCIBusDevice *pbdev = NULL;
 
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        PCIDevice *pci_dev = PCI_DEVICE(dev);
+        PCIBus *bus;
+        int32_t devfn;
+
+        pbdev = s390_pci_find_dev_by_pci(s, PCI_DEVICE(dev));
+        g_assert(pbdev);
+
+        s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
+                                     pbdev->fh, pbdev->fid);
+        bus = pci_get_bus(pci_dev);
+        devfn = pci_dev->devfn;
+        object_unparent(OBJECT(pci_dev));
+
+        s390_pci_msix_free(pbdev);
+        s390_pci_iommu_free(s, bus, devfn);
+        pbdev->pdev = NULL;
+        pbdev->state = ZPCI_FS_RESERVED;
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
+        pbdev = S390_PCI_DEVICE(dev);
+
+        if (pbdev->release_timer) {
+            timer_del(pbdev->release_timer);
+            timer_free(pbdev->release_timer);
+            pbdev->release_timer = NULL;
+        }
+        pbdev->fid = 0;
+        QTAILQ_REMOVE(&s->zpci_devs, pbdev, link);
+        g_hash_table_remove(s->zpci_table, &pbdev->idx);
+        object_unparent(OBJECT(pbdev));
+    }
+}
+
+static void s390_pcihost_unplug_request(HotplugHandler *hotplug_dev,
+                                        DeviceState *dev,
+                                        Error **errp)
+{
+    S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
+    S390PCIBusDevice *pbdev;
+
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
         error_setg(errp, "PCI bridge hot unplug currently not supported");
-        return;
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
-        pci_dev = PCI_DEVICE(dev);
-
-        QTAILQ_FOREACH(pbdev, &s->zpci_devs, link) {
-            if (pbdev->pdev == pci_dev) {
-                break;
-            }
-        }
-        assert(pbdev != NULL);
+        /*
+         * Redirect the unplug request to the zPCI device and remember that
+         * we've checked the PCI device already (to prevent endless recursion).
+         */
+        pbdev = s390_pci_find_dev_by_pci(s, PCI_DEVICE(dev));
+        g_assert(pbdev);
+        pbdev->pci_unplug_request_processed = true;
+        qdev_unplug(DEVICE(pbdev), errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
         pbdev = S390_PCI_DEVICE(dev);
-        pci_dev = pbdev->pdev;
-    } else {
-        g_assert_not_reached();
-    }
 
-    switch (pbdev->state) {
-    case ZPCI_FS_RESERVED:
-        goto out;
-    case ZPCI_FS_STANDBY:
-        break;
-    default:
-        if (pbdev->release_timer) {
+        /*
+         * If unplug was initially requested for the zPCI device, we
+         * first have to redirect to the PCI device, which will in return
+         * redirect back to us after performing its checks (if the request
+         * is not blocked, e.g. because it's a PCI bridge).
+         */
+        if (pbdev->pdev && !pbdev->pci_unplug_request_processed) {
+            qdev_unplug(DEVICE(pbdev->pdev), errp);
             return;
         }
-        s390_pci_generate_plug_event(HP_EVENT_DECONFIGURE_REQUEST,
-                                     pbdev->fh, pbdev->fid);
-        pbdev->release_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
-                                            s390_pcihost_timer_cb,
-                                            pbdev);
-        timer_mod(pbdev->release_timer,
-                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + HOT_UNPLUG_TIMEOUT);
-        return;
-    }
+        pbdev->pci_unplug_request_processed = false;
 
-    if (pbdev->release_timer) {
-        timer_del(pbdev->release_timer);
-        timer_free(pbdev->release_timer);
-        pbdev->release_timer = NULL;
+        switch (pbdev->state) {
+        case ZPCI_FS_STANDBY:
+        case ZPCI_FS_RESERVED:
+            s390_pci_perform_unplug(pbdev);
+            break;
+        default:
+            if (pbdev->release_timer) {
+                return;
+            }
+            s390_pci_generate_plug_event(HP_EVENT_DECONFIGURE_REQUEST,
+                                         pbdev->fh, pbdev->fid);
+            pbdev->release_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                                s390_pcihost_timer_cb, pbdev);
+            timer_mod(pbdev->release_timer,
+                    qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + HOT_UNPLUG_TIMEOUT);
+        }
+    } else {
+        g_assert_not_reached();
     }
-
-    s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
-                                 pbdev->fh, pbdev->fid);
-    bus = pci_get_bus(pci_dev);
-    devfn = pci_dev->devfn;
-    object_unparent(OBJECT(pci_dev));
-    fmb_timer_free(pbdev);
-    s390_pci_msix_free(pbdev);
-    s390_pci_iommu_free(s, bus, devfn);
-    pbdev->pdev = NULL;
-    pbdev->state = ZPCI_FS_RESERVED;
-out:
-    pbdev->fid = 0;
-    QTAILQ_REMOVE(&s->zpci_devs, pbdev, link);
-    g_hash_table_remove(s->zpci_table, &pbdev->idx);
-    object_unparent(OBJECT(pbdev));
 }
 
 static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
@@ -1059,6 +1116,7 @@ static void s390_pcihost_class_init(ObjectClass *klass, void *data)
     dc->realize = s390_pcihost_realize;
     hc->pre_plug = s390_pcihost_pre_plug;
     hc->plug = s390_pcihost_plug;
+    hc->unplug_request = s390_pcihost_unplug_request;
     hc->unplug = s390_pcihost_unplug;
     msi_nonbroken = true;
 }
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index dadad1f758..b1a6bb8296 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -336,6 +336,7 @@ struct S390PCIBusDevice {
     IndAddr *summary_ind;
     IndAddr *indicator;
     QEMUTimer *release_timer;
+    bool pci_unplug_request_processed;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-21 13:42 [Qemu-devel] [PATCH v3 0/2] s390x/pci: hotplug handler fixes and reworks David Hildenbrand
  2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler David Hildenbrand
@ 2019-01-21 13:42 ` David Hildenbrand
  2019-01-23 11:05   ` Cornelia Huck
  1 sibling, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2019-01-21 13:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Collin Walling, Thomas Huth, Christian Borntraeger,
	Cornelia Huck, Richard Henderson, David Hildenbrand

When resetting the guest we should unplug and remove all devices that
are still pending. Otherwise the fresh guest will see devices that will
suddenly vanish.

Can be triggered e.g. via
(hmp) device_add virtio-mouse-pci,id=test
(hmp) stop
(hmp) device_del test
(hmp) system_reset
(hmp) c

The device will vanish after roughly 5 minutes. With this patch, the
device will vanish on reboot (S390_RESET_EXTERNAL and S390_RESET_REIPL,
which reset the pcihost bridge via qemu_devices_reset()). If we want
these devices to vanish directly on any reset (S390_RESET_MODIFIED_CLEAR
and S390_RESET_LOAD_NORMAL), we have to modify s390_machine_reset(). But
I have the feeling that this should not be done for all reset types.

This approach is similar to what's done for acpi PCI hotplug in
acpi_pcihp_reset() -> acpi_pcihp_update() ->
acpi_pcihp_update_hotplug_bus() -> acpi_pcihp_eject_slot().

s390_pci_generate_plug_event()'s will still be generated, I guess this
is not an issue (same thing could happen right now if the timer expires
just after reset).

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-pci-bus.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index bc17a8cf65..b70ae25533 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1102,6 +1102,14 @@ static void s390_pcihost_reset(DeviceState *dev)
 {
     S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
     PCIBus *bus = s->parent_obj.bus;
+    S390PCIBusDevice *pbdev, *next;
+
+    /* Unplug all pending devices that were requested to be released */
+    QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
+        if (pbdev->release_timer) {
+            s390_pcihost_timer_cb(pbdev);
+        }
+    }

     s->bus_no = 0;
     pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler David Hildenbrand
@ 2019-01-23 11:03   ` Cornelia Huck
  2019-01-23 11:08     ` David Hildenbrand
  2019-01-29 13:31   ` Pierre Morel
  2019-01-30 19:52   ` [Qemu-devel] [qemu-s390x] " Collin Walling
  2 siblings, 1 reply; 22+ messages in thread
From: Cornelia Huck @ 2019-01-23 11:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Collin Walling, Thomas Huth,
	Christian Borntraeger, Richard Henderson

On Mon, 21 Jan 2019 14:42:48 +0100
David Hildenbrand <david@redhat.com> wrote:

> PCI on s390x is really weird and how it was modeled in QEMU might not have
> been the right choice. Anyhow, right now it is the case that:
> - Hotplugging a PCI device will silently create a zPCI device
>   (if none is provided)
> - Hotunplugging a zPCI device will unplug the PCI device (if any)
> - Hotunplugging a PCI device will unplug also the zPCI device
> As far as I can see, we can no longer change this behavior. But we
> should fix it.
> 
> Both device types are handled via a single hotplug handler call. This
> is problematic for various reasons:
> 1. Unplugging via the zPCI device allows to unplug PCI bridges as
>    checks are not performed - bad.

Maybe I'm confused here, but how can a zPCI device couple with a bridge?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset David Hildenbrand
@ 2019-01-23 11:05   ` Cornelia Huck
  2019-01-28 11:28     ` Cornelia Huck
  0 siblings, 1 reply; 22+ messages in thread
From: Cornelia Huck @ 2019-01-23 11:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Collin Walling, Thomas Huth,
	Christian Borntraeger, Richard Henderson

On Mon, 21 Jan 2019 14:42:49 +0100
David Hildenbrand <david@redhat.com> wrote:

> When resetting the guest we should unplug and remove all devices that
> are still pending. Otherwise the fresh guest will see devices that will
> suddenly vanish.
> 
> Can be triggered e.g. via
> (hmp) device_add virtio-mouse-pci,id=test
> (hmp) stop
> (hmp) device_del test
> (hmp) system_reset
> (hmp) c
> 
> The device will vanish after roughly 5 minutes. With this patch, the
> device will vanish on reboot (S390_RESET_EXTERNAL and S390_RESET_REIPL,
> which reset the pcihost bridge via qemu_devices_reset()). If we want
> these devices to vanish directly on any reset (S390_RESET_MODIFIED_CLEAR
> and S390_RESET_LOAD_NORMAL), we have to modify s390_machine_reset(). But
> I have the feeling that this should not be done for all reset types.
> 
> This approach is similar to what's done for acpi PCI hotplug in
> acpi_pcihp_reset() -> acpi_pcihp_update() ->
> acpi_pcihp_update_hotplug_bus() -> acpi_pcihp_eject_slot().
> 
> s390_pci_generate_plug_event()'s will still be generated, I guess this
> is not an issue (same thing could happen right now if the timer expires
> just after reset).

I'm wondering what the architecture says regarding those events -- can
someone with access to the documentation comment?

> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/s390-pci-bus.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index bc17a8cf65..b70ae25533 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -1102,6 +1102,14 @@ static void s390_pcihost_reset(DeviceState *dev)
>  {
>      S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
>      PCIBus *bus = s->parent_obj.bus;
> +    S390PCIBusDevice *pbdev, *next;
> +
> +    /* Unplug all pending devices that were requested to be released */
> +    QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
> +        if (pbdev->release_timer) {
> +            s390_pcihost_timer_cb(pbdev);
> +        }
> +    }
>  
>      s->bus_no = 0;
>      pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-23 11:03   ` Cornelia Huck
@ 2019-01-23 11:08     ` David Hildenbrand
  2019-01-28 11:27       ` Cornelia Huck
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2019-01-23 11:08 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: qemu-devel, qemu-s390x, Collin Walling, Thomas Huth,
	Christian Borntraeger, Richard Henderson

On 23.01.19 12:03, Cornelia Huck wrote:
> On Mon, 21 Jan 2019 14:42:48 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> PCI on s390x is really weird and how it was modeled in QEMU might not have
>> been the right choice. Anyhow, right now it is the case that:
>> - Hotplugging a PCI device will silently create a zPCI device
>>   (if none is provided)
>> - Hotunplugging a zPCI device will unplug the PCI device (if any)
>> - Hotunplugging a PCI device will unplug also the zPCI device
>> As far as I can see, we can no longer change this behavior. But we
>> should fix it.
>>
>> Both device types are handled via a single hotplug handler call. This
>> is problematic for various reasons:
>> 1. Unplugging via the zPCI device allows to unplug PCI bridges as
>>    checks are not performed - bad.
> 
> Maybe I'm confused here, but how can a zPCI device couple with a bridge?
> 

I was confused, bridges don't attach to a zPCI device. So this remark is
invalid. Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-23 11:08     ` David Hildenbrand
@ 2019-01-28 11:27       ` Cornelia Huck
  0 siblings, 0 replies; 22+ messages in thread
From: Cornelia Huck @ 2019-01-28 11:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Collin Walling, Thomas Huth,
	Christian Borntraeger, Richard Henderson

On Wed, 23 Jan 2019 12:08:37 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 23.01.19 12:03, Cornelia Huck wrote:
> > On Mon, 21 Jan 2019 14:42:48 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> PCI on s390x is really weird and how it was modeled in QEMU might not have
> >> been the right choice. Anyhow, right now it is the case that:
> >> - Hotplugging a PCI device will silently create a zPCI device
> >>   (if none is provided)
> >> - Hotunplugging a zPCI device will unplug the PCI device (if any)
> >> - Hotunplugging a PCI device will unplug also the zPCI device
> >> As far as I can see, we can no longer change this behavior. But we
> >> should fix it.
> >>
> >> Both device types are handled via a single hotplug handler call. This
> >> is problematic for various reasons:
> >> 1. Unplugging via the zPCI device allows to unplug PCI bridges as
> >>    checks are not performed - bad.  
> > 
> > Maybe I'm confused here, but how can a zPCI device couple with a bridge?
> >   
> 
> I was confused, bridges don't attach to a zPCI device. So this remark is
> invalid. Thanks!
> 

Ok, I can remove point 1 when applying.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-23 11:05   ` Cornelia Huck
@ 2019-01-28 11:28     ` Cornelia Huck
  2019-01-29  0:09       ` [Qemu-devel] [qemu-s390x] " Collin Walling
  0 siblings, 1 reply; 22+ messages in thread
From: Cornelia Huck @ 2019-01-28 11:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Collin Walling, Thomas Huth,
	Christian Borntraeger, Richard Henderson

On Wed, 23 Jan 2019 12:05:39 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 21 Jan 2019 14:42:49 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
> > When resetting the guest we should unplug and remove all devices that
> > are still pending. Otherwise the fresh guest will see devices that will
> > suddenly vanish.
> > 
> > Can be triggered e.g. via
> > (hmp) device_add virtio-mouse-pci,id=test
> > (hmp) stop
> > (hmp) device_del test
> > (hmp) system_reset
> > (hmp) c
> > 
> > The device will vanish after roughly 5 minutes. With this patch, the
> > device will vanish on reboot (S390_RESET_EXTERNAL and S390_RESET_REIPL,
> > which reset the pcihost bridge via qemu_devices_reset()). If we want
> > these devices to vanish directly on any reset (S390_RESET_MODIFIED_CLEAR
> > and S390_RESET_LOAD_NORMAL), we have to modify s390_machine_reset(). But
> > I have the feeling that this should not be done for all reset types.
> > 
> > This approach is similar to what's done for acpi PCI hotplug in
> > acpi_pcihp_reset() -> acpi_pcihp_update() ->
> > acpi_pcihp_update_hotplug_bus() -> acpi_pcihp_eject_slot().
> > 
> > s390_pci_generate_plug_event()'s will still be generated, I guess this
> > is not an issue (same thing could happen right now if the timer expires
> > just after reset).  
> 
> I'm wondering what the architecture says regarding those events -- can
> someone with access to the documentation comment?

Ping. Any comments from the IBM folks?

> 
> > 
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > ---
> >  hw/s390x/s390-pci-bus.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> > index bc17a8cf65..b70ae25533 100644
> > --- a/hw/s390x/s390-pci-bus.c
> > +++ b/hw/s390x/s390-pci-bus.c
> > @@ -1102,6 +1102,14 @@ static void s390_pcihost_reset(DeviceState *dev)
> >  {
> >      S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
> >      PCIBus *bus = s->parent_obj.bus;
> > +    S390PCIBusDevice *pbdev, *next;
> > +
> > +    /* Unplug all pending devices that were requested to be released */
> > +    QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
> > +        if (pbdev->release_timer) {
> > +            s390_pcihost_timer_cb(pbdev);
> > +        }
> > +    }
> >  
> >      s->bus_no = 0;
> >      pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);  
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-28 11:28     ` Cornelia Huck
@ 2019-01-29  0:09       ` Collin Walling
  2019-01-29 10:24         ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Collin Walling @ 2019-01-29  0:09 UTC (permalink / raw)
  To: Cornelia Huck, David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Thomas Huth, Christian Borntraeger,
	Richard Henderson

On 1/28/19 6:28 AM, Cornelia Huck wrote:
> On Wed, 23 Jan 2019 12:05:39 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Mon, 21 Jan 2019 14:42:49 +0100
>> David Hildenbrand <david@redhat.com> wrote:
>>
>>> When resetting the guest we should unplug and remove all devices that
>>> are still pending. Otherwise the fresh guest will see devices that will
>>> suddenly vanish.
>>>
>>> Can be triggered e.g. via
>>> (hmp) device_add virtio-mouse-pci,id=test
>>> (hmp) stop
>>> (hmp) device_del test
>>> (hmp) system_reset
>>> (hmp) c
>>>
>>> The device will vanish after roughly 5 minutes. With this patch, the
>>> device will vanish on reboot (S390_RESET_EXTERNAL and S390_RESET_REIPL,
>>> which reset the pcihost bridge via qemu_devices_reset()). If we want
>>> these devices to vanish directly on any reset (S390_RESET_MODIFIED_CLEAR
>>> and S390_RESET_LOAD_NORMAL), we have to modify s390_machine_reset(). But
>>> I have the feeling that this should not be done for all reset types.
>>>
>>> This approach is similar to what's done for acpi PCI hotplug in
>>> acpi_pcihp_reset() -> acpi_pcihp_update() ->
>>> acpi_pcihp_update_hotplug_bus() -> acpi_pcihp_eject_slot().
>>>
>>> s390_pci_generate_plug_event()'s will still be generated, I guess this
>>> is not an issue (same thing could happen right now if the timer expires
>>> just after reset).
>>
>> I'm wondering what the architecture says regarding those events -- can
>> someone with access to the documentation comment?
> 
> Ping. Any comments from the IBM folks?
> 

So the idea here is that if we have a PCI device that is the process of 
being deconfigured and we are also in the middle of a reset, then let's 
accelerate deconfiguring of the PCI device during the reset. Makes sense.

Note:

The callback function will deconfigure the the device and put it into 
standby mode. However, a PCI device should only go into standby from the 
*disabled state* (which it could already be in due to the unplug 
sequence), or from a *permanent error state* (something we should 
hopefully never see -- this means something went seriously wrong with 
the device).

Two things I'm concerned about:

1)

What I would suggest is adding a check for the pbdev->state for 
ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these 
states, then we're safe to deconfigure and put into standby. If the 
device is still in another state (such as enabled or blocked, etc) then 
we should allow the timer to resume and give the device some more time 
before forcing an unplug. It's also probably not a good idea to try and 
deconfigure a device that might already be deconfigured (e.g. if it's 
already in standby or reserved state). That might not happen though, but 
it's good to cover our bases.

A side thought: In addition to checking the states, what would happen if 
you forced the timer to 0? Would the callback get called? Would that 
just accelerate the already-in-progress unplug sequence?

and 2)

I worry that the sclp might try to deconfigure a PCI device at the same 
time we force the callback in this patch. I noticed that the 
sclp_deconfigure function also checks on the release timer before 
unplugging. I think we should make sure the timer is properly stopped or 
canceled before the sclp tries to deconfigure and before this patch 
forces the callback, that way we hopefully won't try to do both at the 
same time.

something like

if (release_timer) {
	stop timer
	unplug
}

Your adjustments to the pcihost_unplug function in patch #1 would of 
course handle freeing the release timer later on.

>>
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>   hw/s390x/s390-pci-bus.c | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>> index bc17a8cf65..b70ae25533 100644
>>> --- a/hw/s390x/s390-pci-bus.c
>>> +++ b/hw/s390x/s390-pci-bus.c
>>> @@ -1102,6 +1102,14 @@ static void s390_pcihost_reset(DeviceState *dev)
>>>   {
>>>       S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
>>>       PCIBus *bus = s->parent_obj.bus;
>>> +    S390PCIBusDevice *pbdev, *next;
>>> +
>>> +    /* Unplug all pending devices that were requested to be released */
>>> +    QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
>>> +        if (pbdev->release_timer) {
>>> +            s390_pcihost_timer_cb(pbdev);
>>> +        }
>>> +    }
>>>   
>>>       s->bus_no = 0;
>>>       pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);
>>
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-29  0:09       ` [Qemu-devel] [qemu-s390x] " Collin Walling
@ 2019-01-29 10:24         ` David Hildenbrand
  2019-01-29 13:50           ` Pierre Morel
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2019-01-29 10:24 UTC (permalink / raw)
  To: Collin Walling, Cornelia Huck, Christian Borntraeger
  Cc: qemu-devel, qemu-s390x, Thomas Huth, Richard Henderson

>>> I'm wondering what the architecture says regarding those events -- can
>>> someone with access to the documentation comment?
>>
>> Ping. Any comments from the IBM folks?
>>
> 
> 
> So the idea here is that if we have a PCI device that is the process of 
> being deconfigured and we are also in the middle of a reset, then let's 
> accelerate deconfiguring of the PCI device during the reset. Makes sense.
> 
> Note:
> 
> The callback function will deconfigure the the device and put it into 
> standby mode. However, a PCI device should only go into standby from the 
> *disabled state* (which it could already be in due to the unplug 
> sequence), or from a *permanent error state* (something we should 
> hopefully never see -- this means something went seriously wrong with 
> the device).

Right, this should already have been checked before setting up the timer.

> 
> Two things I'm concerned about:
> 
> 1)
> 
> What I would suggest is adding a check for the pbdev->state for 
> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these 
> states, then we're safe to deconfigure and put into standby. If the 

We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.

So for
- ZPCI_FS_DISABLED
- ZPCI_FS_ENABLED
- ZPCI_FS_BLOCKED
- ZPCI_FS_ERROR
- ZPCI_FS_PERMANENT_ERROR

We setup a timer and simply go ahead and unplug the device when the
timer expires ("forced unplug").

Changing that behavior might be more invasive. Simply not unplugging in
s390_pcihost_timer_cb() on some of these states would mean that somebody
issued a request and that requests is simply lost/ignored. Not sure if
that is what we want. I think we need separate patches to change
something like that. Especially

1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
the unplug request and moves the device to ZPCI_FS_ENABLED before the
timer expires? These are corner cases to consider.

2. Do we need a timer at all? Now that Patch #1 introduces
unplug_requests, we are free to ignore requests from the user if the
guest is not reacting. I would really favor getting rid of the timer
completely. Was there a special reason why this was introduced?

No other architecture (e.g. ACPI) uses such a timer. They use a simple
flag to remember if a request is pending. I would really favor going
into that direction.

@Christian do you think we need this "force remove after 30 seconds"
timer thingy?

> device is still in another state (such as enabled or blocked, etc) then 
> we should allow the timer to resume and give the device some more time 
> before forcing an unplug. It's also probably not a good idea to try and 
> deconfigure a device that might already be deconfigured (e.g. if it's 
> already in standby or reserved state). That might not happen though, but 
> it's good to cover our bases.
> 
> A side thought: In addition to checking the states, what would happen if 
> you forced the timer to 0? Would the callback get called? Would that 
> just accelerate the already-in-progress unplug sequence?

I guess so, but then action will be called asynchronously from the main
loop when timers are run. Calling it synchronously during the reset
makes in my point of view more sense.

> 
> and 2)
> 
> I worry that the sclp might try to deconfigure a PCI device at the same 
> time we force the callback in this patch. I noticed that the 
> sclp_deconfigure function also checks on the release timer before 
> unplugging. I think we should make sure the timer is properly stopped or 
> canceled before the sclp tries to deconfigure and before this patch 
> forces the callback, that way we hopefully won't try to do both at the 
> same time.

Not necessary. Timers are ran from the main thread, just as we (system
reset) are. We don't have any race conditions here. This would only be
possible when being triggered by a VCPU without holding the iothread
mutex. But this is never the case. All VCPUs and even timers (due to the
virtual clock being stopped) are stopped during system reset.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler David Hildenbrand
  2019-01-23 11:03   ` Cornelia Huck
@ 2019-01-29 13:31   ` Pierre Morel
  2019-01-29 15:14     ` David Hildenbrand
  2019-01-30 19:52   ` [Qemu-devel] [qemu-s390x] " Collin Walling
  2 siblings, 1 reply; 22+ messages in thread
From: Pierre Morel @ 2019-01-29 13:31 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Thomas Huth, Cornelia Huck, Collin Walling,
	Christian Borntraeger, qemu-s390x, Richard Henderson

On 21/01/2019 14:42, David Hildenbrand wrote:
> PCI on s390x is really weird and how it was modeled in QEMU might not have
> been the right choice. Anyhow, right now it is the case that:
> - Hotplugging a PCI device will silently create a zPCI device
>    (if none is provided)
> - Hotunplugging a zPCI device will unplug the PCI device (if any)
> - Hotunplugging a PCI device will unplug also the zPCI device
> As far as I can see, we can no longer change this behavior. But we
> should fix it.

hum, is it really a problem per se?

> 
> Both device types are handled via a single hotplug handler call. This
> is problematic for various reasons:
> 1. Unplugging via the zPCI device allows to unplug PCI bridges as
>     checks are not performed - bad.

bad

> 2. Unplugging via the zPCI device allows to unplug devices that are not
>     hot removable. (check performed in qdev_unplug()) - bad.

bad

> 3. Hotplug handler chains are not possible for the unplug case. In the
>     future, the machine might want to override hotplug handlers, to
>     process device specific stuff and to then branch off to the actual
>     hotplug handler. We need separate hotplug handler calls for both the
>     PCI and zPCI device to make this work reliably. All other PCI
>     implementations are already prepared to handle this correctly, only
>     s390x is missing.

ok

> 
> Therefore, introduce the unplug_request handler and properly perform
> unplug checks by redirecting to the separate unplug_request handlers.
> When finally unplugging, perform two separate hotplug_handler_unplug()
> calls, first for the PCI device, followed by the zPCI device. This now
> nicely splits unplugging paths for both devices.

hum, PCI device handle the backend, host side, while zPCI handle the 
front end, guest side.
So unplugging PCI first will deny the guest any possibility to smoothly 
relinquish a device.


Is it possible the other way around?

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-29 10:24         ` David Hildenbrand
@ 2019-01-29 13:50           ` Pierre Morel
  2019-01-29 15:11             ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Pierre Morel @ 2019-01-29 13:50 UTC (permalink / raw)
  To: David Hildenbrand, Collin Walling, Cornelia Huck, Christian Borntraeger
  Cc: Thomas Huth, qemu-s390x, qemu-devel, Richard Henderson

On 29/01/2019 11:24, David Hildenbrand wrote:
>>>> I'm wondering what the architecture says regarding those events -- can
>>>> someone with access to the documentation comment?
>>>
>>> Ping. Any comments from the IBM folks?

Hi,

Sorry to have wait so long.
At least Collin was faster.

>>>
>>
>>
>> So the idea here is that if we have a PCI device that is the process of
>> being deconfigured and we are also in the middle of a reset, then let's
>> accelerate deconfiguring of the PCI device during the reset. Makes sense.

to me too.
However, how do we ensure that the guest got time to respond to the 
first deconfigure request?

>>
>> Note:
>>
>> The callback function will deconfigure the the device and put it into
>> standby mode. However, a PCI device should only go into standby from the
>> *disabled state* (which it could already be in due to the unplug
>> sequence), or from a *permanent error state* (something we should
>> hopefully never see -- this means something went seriously wrong with
>> the device).

Not completely exact, the CHSC event 0x304, on the initiative of the 
host, force the "deconfigure state" from "configure state" generally, 
whatever sub-state it has (enabled/disabled/error...).

> 
> Right, this should already have been checked before setting up the timer.

Apropos timer, do we need a timer or wouldn't it be better to use a 
delay / a timer + condition?

AFAIU we get out of the unplug without waiting for any answer from the 
guest and we surely get the timer triggering after the reset has been done.
That seems bad.


> 
>>
>> Two things I'm concerned about:
>>
>> 1)
>>
>> What I would suggest is adding a check for the pbdev->state for
>> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these
>> states, then we're safe to deconfigure and put into standby. If the
> 
> We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.
> 
> So for
> - ZPCI_FS_DISABLED
> - ZPCI_FS_ENABLED
> - ZPCI_FS_BLOCKED
> - ZPCI_FS_ERROR
> - ZPCI_FS_PERMANENT_ERROR
> 
> We setup a timer and simply go ahead and unplug the device when the
> timer expires ("forced unplug").

I agree only for ZPCI_FS_ENABLED why do we need to be smooth for other 
states?
ZPCI_FS_DISABLED may be a candidate even I do not see the interrest but 
other states of the device should issue a HP_EVENT_DECONFIGURE_REQUEST 
and we do not need a timer (or any delay)

> 
> Changing that behavior might be more invasive. Simply not unplugging in
> s390_pcihost_timer_cb() on some of these states would mean that somebody
> issued a request and that requests is simply lost/ignored. Not sure if
> that is what we want. I think we need separate patches to change
> something like that. Especially
> 
> 1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
> the unplug request and moves the device to ZPCI_FS_ENABLED before the
> timer expires? These are corner cases to consider.

+1, we must ensure to do the work inside the unplug CB.

> 
> 2. Do we need a timer at all? Now that Patch #1 introduces
> unplug_requests, we are free to ignore requests from the user if the
> guest is not reacting. I would really favor getting rid of the timer
> completely. Was there a special reason why this was introduced?

Yes, to let a chance to the guest to smoothly relinquish the device.
(for example sync/clean the disk)
However I do not think it is right implemented.

> 
> No other architecture (e.g. ACPI) uses such a timer. They use a simple
> flag to remember if a request is pending. I would really favor going
> into that direction.

I am not sure that the Intel architecture is a good example. :)

AFAIU they do not wait for the guest to have relinquish the device.
Or do they?
How long do they wait?

> 
> @Christian do you think we need this "force remove after 30 seconds"
> timer thingy?
> 
>> device is still in another state (such as enabled or blocked, etc) then
>> we should allow the timer to resume and give the device some more time
>> before forcing an unplug. It's also probably not a good idea to try and
>> deconfigure a device that might already be deconfigured (e.g. if it's
>> already in standby or reserved state). That might not happen though, but
>> it's good to cover our bases.
>>
>> A side thought: In addition to checking the states, what would happen if
>> you forced the timer to 0? Would the callback get called? Would that
>> just accelerate the already-in-progress unplug sequence?
> 
> I guess so, but then action will be called asynchronously from the main
> loop when timers are run. Calling it synchronously during the reset
> makes in my point of view more sense.

To me too, (and during hot unplug too) but it has a sense only if we 
wait some time between the demand to relinquish the device and the rip 
off of the device.

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-29 13:50           ` Pierre Morel
@ 2019-01-29 15:11             ` David Hildenbrand
  2019-01-29 16:50               ` Pierre Morel
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2019-01-29 15:11 UTC (permalink / raw)
  To: pmorel, Collin Walling, Cornelia Huck, Christian Borntraeger
  Cc: Thomas Huth, qemu-s390x, qemu-devel, Richard Henderson

On 29.01.19 14:50, Pierre Morel wrote:
> On 29/01/2019 11:24, David Hildenbrand wrote:
>>>>> I'm wondering what the architecture says regarding those events -- can
>>>>> someone with access to the documentation comment?
>>>>
>>>> Ping. Any comments from the IBM folks?
> 
> Hi,
> 
> Sorry to have wait so long.
> At least Collin was faster.
> 
>>>>
>>>
>>>
>>> So the idea here is that if we have a PCI device that is the process of
>>> being deconfigured and we are also in the middle of a reset, then let's
>>> accelerate deconfiguring of the PCI device during the reset. Makes sense.
> 
> to me too.
> However, how do we ensure that the guest got time to respond to the 
> first deconfigure request?

30 seconds, then the reboot. On a reboot, I don't see why we should give
a guest more time. "It's dead", rip out the card as the guest refused to
hand it back. (maybe it crashed! but after a reboot the guest state is
reset and baught back to life)

> 
>>>
>>> Note:
>>>
>>> The callback function will deconfigure the the device and put it into
>>> standby mode. However, a PCI device should only go into standby from the
>>> *disabled state* (which it could already be in due to the unplug
>>> sequence), or from a *permanent error state* (something we should
>>> hopefully never see -- this means something went seriously wrong with
>>> the device).
> 
> Not completely exact, the CHSC event 0x304, on the initiative of the 
> host, force the "deconfigure state" from "configure state" generally, 
> whatever sub-state it has (enabled/disabled/error...).
> 
>>
>> Right, this should already have been checked before setting up the timer.
> 
> Apropos timer, do we need a timer or wouldn't it be better to use a 
> delay / a timer + condition?

I don't think we need a timer at all.

> 
> AFAIU we get out of the unplug without waiting for any answer from the 
> guest and we surely get the timer triggering after the reset has been done.
> That seems bad.

This is the case right now, correct.

> 
> 
>>
>>>
>>> Two things I'm concerned about:
>>>
>>> 1)
>>>
>>> What I would suggest is adding a check for the pbdev->state for
>>> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these
>>> states, then we're safe to deconfigure and put into standby. If the
>>
>> We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.
>>
>> So for
>> - ZPCI_FS_DISABLED
>> - ZPCI_FS_ENABLED
>> - ZPCI_FS_BLOCKED
>> - ZPCI_FS_ERROR
>> - ZPCI_FS_PERMANENT_ERROR
>>
>> We setup a timer and simply go ahead and unplug the device when the
>> timer expires ("forced unplug").
> 
> I agree only for ZPCI_FS_ENABLED why do we need to be smooth for other 
> states?
> ZPCI_FS_DISABLED may be a candidate even I do not see the interrest but 
> other states of the device should issue a HP_EVENT_DECONFIGURE_REQUEST 
> and we do not need a timer (or any delay)

You can always expect that your guest driver is dead.

> 
>>
>> Changing that behavior might be more invasive. Simply not unplugging in
>> s390_pcihost_timer_cb() on some of these states would mean that somebody
>> issued a request and that requests is simply lost/ignored. Not sure if
>> that is what we want. I think we need separate patches to change
>> something like that. Especially
>>
>> 1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
>> the unplug request and moves the device to ZPCI_FS_ENABLED before the
>> timer expires? These are corner cases to consider.
> 
> +1, we must ensure to do the work inside the unplug CB.
> 
>>
>> 2. Do we need a timer at all? Now that Patch #1 introduces
>> unplug_requests, we are free to ignore requests from the user if the
>> guest is not reacting. I would really favor getting rid of the timer
>> completely. Was there a special reason why this was introduced?
> 
> Yes, to let a chance to the guest to smoothly relinquish the device.
> (for example sync/clean the disk)
> However I do not think it is right implemented.
> 
>>
>> No other architecture (e.g. ACPI) uses such a timer. They use a simple
>> flag to remember if a request is pending. I would really favor going
>> into that direction.
> 
> I am not sure that the Intel architecture is a good example. :)

Right, we all learned that zPCI did it better. (sorry ;) )

> 
> AFAIU they do not wait for the guest to have relinquish the device.
> Or do they?
> How long do they wait?

They wait for ever. And I guess we should do the same thing. If the
guest driver is broken (and this is really a rare scenario!) we would
not get the device back. Which is perfectly fine in my point of view. In
all other scenarios I guess the guest will simply respond in a timely
manner. And ripping out stuff from the guest always feels wrong. (yes
the channel subsystem works like that, but here we actually have a choice)

If we reboot, we can unplug the device. Otherwise, let's keep it simply
and don't use a timer.

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-29 13:31   ` Pierre Morel
@ 2019-01-29 15:14     ` David Hildenbrand
  2019-01-29 16:54       ` Pierre Morel
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2019-01-29 15:14 UTC (permalink / raw)
  To: pmorel, qemu-devel
  Cc: Thomas Huth, Cornelia Huck, Collin Walling,
	Christian Borntraeger, qemu-s390x, Richard Henderson

On 29.01.19 14:31, Pierre Morel wrote:
> On 21/01/2019 14:42, David Hildenbrand wrote:
>> PCI on s390x is really weird and how it was modeled in QEMU might not have
>> been the right choice. Anyhow, right now it is the case that:
>> - Hotplugging a PCI device will silently create a zPCI device
>>    (if none is provided)
>> - Hotunplugging a zPCI device will unplug the PCI device (if any)
>> - Hotunplugging a PCI device will unplug also the zPCI device
>> As far as I can see, we can no longer change this behavior. But we
>> should fix it.
> 
> hum, is it really a problem per se?

Yes, because this is not the way it is usually done in QEMU :/

> 
>>
>> Both device types are handled via a single hotplug handler call. This
>> is problematic for various reasons:
>> 1. Unplugging via the zPCI device allows to unplug PCI bridges as
>>     checks are not performed - bad.
> 
> bad
> 
>> 2. Unplugging via the zPCI device allows to unplug devices that are not
>>     hot removable. (check performed in qdev_unplug()) - bad.
> 
> bad
> 
>> 3. Hotplug handler chains are not possible for the unplug case. In the
>>     future, the machine might want to override hotplug handlers, to
>>     process device specific stuff and to then branch off to the actual
>>     hotplug handler. We need separate hotplug handler calls for both the
>>     PCI and zPCI device to make this work reliably. All other PCI
>>     implementations are already prepared to handle this correctly, only
>>     s390x is missing.
> 
> ok
> 
>>
>> Therefore, introduce the unplug_request handler and properly perform
>> unplug checks by redirecting to the separate unplug_request handlers.
>> When finally unplugging, perform two separate hotplug_handler_unplug()
>> calls, first for the PCI device, followed by the zPCI device. This now
>> nicely splits unplugging paths for both devices.
> 
> hum, PCI device handle the backend, host side, while zPCI handle the 
> front end, guest side.
> So unplugging PCI first will deny the guest any possibility to smoothly 
> relinquish a device.
> 
> 
> Is it possible the other way around?

Maybe, but it does not really matter. We unplug both devices
synchronously, without the guest recognizing the order. We always have
the unplug request first that notifies the guest. When we get an ACK
from the guest, we can unpplug both devices in any order.

(and if we want to change the order, we should do it in a separate
patch, this patch does not change the order, just refactors the code)

Thanks!

> 
> Regards,
> Pierre
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-29 15:11             ` David Hildenbrand
@ 2019-01-29 16:50               ` Pierre Morel
  2019-01-29 18:20                 ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Pierre Morel @ 2019-01-29 16:50 UTC (permalink / raw)
  To: David Hildenbrand, Collin Walling, Cornelia Huck, Christian Borntraeger
  Cc: Thomas Huth, qemu-s390x, qemu-devel, Richard Henderson

On 29/01/2019 16:11, David Hildenbrand wrote:
> On 29.01.19 14:50, Pierre Morel wrote:
>> On 29/01/2019 11:24, David Hildenbrand wrote:
>>>>>> I'm wondering what the architecture says regarding those events -- can
>>>>>> someone with access to the documentation comment?
>>>>>
>>>>> Ping. Any comments from the IBM folks?
>>
>> Hi,
>>
>> Sorry to have wait so long.
>> At least Collin was faster.
>>
>>>>>
>>>>
>>>>
>>>> So the idea here is that if we have a PCI device that is the process of
>>>> being deconfigured and we are also in the middle of a reset, then let's
>>>> accelerate deconfiguring of the PCI device during the reset. Makes sense.
>>
>> to me too.
>> However, how do we ensure that the guest got time to respond to the
>> first deconfigure request?
> 
> 30 seconds, then the reboot. On a reboot, I don't see why we should give
> a guest more time. "It's dead", rip out the card as the guest refused to
> hand it back. (maybe it crashed! but after a reboot the guest state is
> reset and baught back to life)

I agree that 30 seconds is way enough,
also agree that in most cases the guest is dead.


> 
>>
>>>>
>>>> Note:
>>>>
>>>> The callback function will deconfigure the the device and put it into
>>>> standby mode. However, a PCI device should only go into standby from the
>>>> *disabled state* (which it could already be in due to the unplug
>>>> sequence), or from a *permanent error state* (something we should
>>>> hopefully never see -- this means something went seriously wrong with
>>>> the device).
>>
>> Not completely exact, the CHSC event 0x304, on the initiative of the
>> host, force the "deconfigure state" from "configure state" generally,
>> whatever sub-state it has (enabled/disabled/error...).
>>
>>>
>>> Right, this should already have been checked before setting up the timer.
>>
>> Apropos timer, do we need a timer or wouldn't it be better to use a
>> delay / a timer + condition?
> 
> I don't think we need a timer at all.

Yes, if it is possible to wait synchronously 30s it seems good to me.

> 
>>
>> AFAIU we get out of the unplug without waiting for any answer from the
>> guest and we surely get the timer triggering after the reset has been done.
>> That seems bad.
> 
> This is the case right now, correct.
> 
>>
>>
>>>
>>>>
>>>> Two things I'm concerned about:
>>>>
>>>> 1)
>>>>
>>>> What I would suggest is adding a check for the pbdev->state for
>>>> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these
>>>> states, then we're safe to deconfigure and put into standby. If the
>>>
>>> We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.
>>>
>>> So for
>>> - ZPCI_FS_DISABLED
>>> - ZPCI_FS_ENABLED
>>> - ZPCI_FS_BLOCKED
>>> - ZPCI_FS_ERROR
>>> - ZPCI_FS_PERMANENT_ERROR
>>>
>>> We setup a timer and simply go ahead and unplug the device when the
>>> timer expires ("forced unplug").
>>
>> I agree only for ZPCI_FS_ENABLED why do we need to be smooth for other
>> states?
>> ZPCI_FS_DISABLED may be a candidate even I do not see the interrest but
>> other states of the device should issue a HP_EVENT_DECONFIGURE_REQUEST
>> and we do not need a timer (or any delay)
> 
> You can always expect that your guest driver is dead.

hum, the device is dead.
May be the guest got hit too if the driver is not right written.

> 
>>
>>>
>>> Changing that behavior might be more invasive. Simply not unplugging in
>>> s390_pcihost_timer_cb() on some of these states would mean that somebody
>>> issued a request and that requests is simply lost/ignored. Not sure if
>>> that is what we want. I think we need separate patches to change
>>> something like that. Especially
>>>
>>> 1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
>>> the unplug request and moves the device to ZPCI_FS_ENABLED before the
>>> timer expires? These are corner cases to consider.
>>
>> +1, we must ensure to do the work inside the unplug CB.
>>
>>>
>>> 2. Do we need a timer at all? Now that Patch #1 introduces
>>> unplug_requests, we are free to ignore requests from the user if the
>>> guest is not reacting. I would really favor getting rid of the timer
>>> completely. Was there a special reason why this was introduced?
>>
>> Yes, to let a chance to the guest to smoothly relinquish the device.
>> (for example sync/clean the disk)
>> However I do not think it is right implemented.
>>
>>>
>>> No other architecture (e.g. ACPI) uses such a timer. They use a simple
>>> flag to remember if a request is pending. I would really favor going
>>> into that direction.
>>
>> I am not sure that the Intel architecture is a good example. :)
> 
> Right, we all learned that zPCI did it better. (sorry ;) )

Well I really think so.
It is designed with several guest in parallel and shared devices.

In such an architecture, ripping of a device from a guest may have interest.
One good thing would be that the software of the guest handle it :)

> 
>>
>> AFAIU they do not wait for the guest to have relinquish the device.
>> Or do they?
>> How long do they wait?
> 
> They wait for ever. And I guess we should do the same thing. If the
> guest driver is broken (and this is really a rare scenario!) we would
> not get the device back. Which is perfectly fine in my point of view. In
> all other scenarios I guess the guest will simply respond in a timely
> manner. And ripping out stuff from the guest always feels wrong. (yes
> the channel subsystem works like that, but here we actually have a choice)
> 
> If we reboot, we can unplug the device. Otherwise, let's keep it simply
> and don't use a timer.
> 
> Thanks!
> 

AFAIK We have no plan to operate on pools of PCI devices so for me I 
have no objection to keep it simple:

Regards,
Pierre



-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-29 15:14     ` David Hildenbrand
@ 2019-01-29 16:54       ` Pierre Morel
  2019-01-29 20:27         ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Pierre Morel @ 2019-01-29 16:54 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Thomas Huth, Cornelia Huck, Collin Walling,
	Christian Borntraeger, qemu-s390x, Richard Henderson

On 29/01/2019 16:14, David Hildenbrand wrote:
> On 29.01.19 14:31, Pierre Morel wrote:
>> On 21/01/2019 14:42, David Hildenbrand wrote:
>>> PCI on s390x is really weird and how it was modeled in QEMU might not have
>>> been the right choice. Anyhow, right now it is the case that:
>>> - Hotplugging a PCI device will silently create a zPCI device
>>>     (if none is provided)
>>> - Hotunplugging a zPCI device will unplug the PCI device (if any)
>>> - Hotunplugging a PCI device will unplug also the zPCI device
>>> As far as I can see, we can no longer change this behavior. But we
>>> should fix it.


>> So unplugging PCI first will deny the guest any possibility to smoothly
>> relinquish a device.
>>
>>
>> Is it possible the other way around?
> 
> Maybe, but it does not really matter. We unplug both devices
> synchronously, without the guest recognizing the order. We always have
> the unplug request first that notifies the guest. When we get an ACK
> from the guest, we can unpplug both devices in any order.
> 
> (and if we want to change the order, we should do it in a separate
> patch, this patch does not change the order, just refactors the code)


If it is done atomically, then I have no objection.

Regards,
Pierre



-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-29 16:50               ` Pierre Morel
@ 2019-01-29 18:20                 ` David Hildenbrand
  2019-01-29 18:37                   ` Cornelia Huck
  2019-01-29 18:42                   ` Collin Walling
  0 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-01-29 18:20 UTC (permalink / raw)
  To: pmorel, Collin Walling, Cornelia Huck, Christian Borntraeger
  Cc: Thomas Huth, qemu-s390x, qemu-devel, Richard Henderson

On 29.01.19 17:50, Pierre Morel wrote:
> On 29/01/2019 16:11, David Hildenbrand wrote:
>> On 29.01.19 14:50, Pierre Morel wrote:
>>> On 29/01/2019 11:24, David Hildenbrand wrote:
>>>>>>> I'm wondering what the architecture says regarding those events -- can
>>>>>>> someone with access to the documentation comment?
>>>>>>
>>>>>> Ping. Any comments from the IBM folks?
>>>
>>> Hi,
>>>
>>> Sorry to have wait so long.
>>> At least Collin was faster.
>>>
>>>>>>
>>>>>
>>>>>
>>>>> So the idea here is that if we have a PCI device that is the process of
>>>>> being deconfigured and we are also in the middle of a reset, then let's
>>>>> accelerate deconfiguring of the PCI device during the reset. Makes sense.
>>>
>>> to me too.
>>> However, how do we ensure that the guest got time to respond to the
>>> first deconfigure request?
>>
>> 30 seconds, then the reboot. On a reboot, I don't see why we should give
>> a guest more time. "It's dead", rip out the card as the guest refused to
>> hand it back. (maybe it crashed! but after a reboot the guest state is
>> reset and baught back to life)
> 
> I agree that 30 seconds is way enough,
> also agree that in most cases the guest is dead.
> 
> 
>>
>>>
>>>>>
>>>>> Note:
>>>>>
>>>>> The callback function will deconfigure the the device and put it into
>>>>> standby mode. However, a PCI device should only go into standby from the
>>>>> *disabled state* (which it could already be in due to the unplug
>>>>> sequence), or from a *permanent error state* (something we should
>>>>> hopefully never see -- this means something went seriously wrong with
>>>>> the device).
>>>
>>> Not completely exact, the CHSC event 0x304, on the initiative of the
>>> host, force the "deconfigure state" from "configure state" generally,
>>> whatever sub-state it has (enabled/disabled/error...).
>>>
>>>>
>>>> Right, this should already have been checked before setting up the timer.
>>>
>>> Apropos timer, do we need a timer or wouldn't it be better to use a
>>> delay / a timer + condition?
>>
>> I don't think we need a timer at all.
> 
> Yes, if it is possible to wait synchronously 30s it seems good to me.

I mean, we don't have to wait 30 seconds at all.

1. We send a request to the guest
2. It responds (after some seconds), letting go of the zPCI device
3. We unplug the device

1. We send a request to the guest
2. Guest does not respond, request keeps pending forever
3. On reboot, unplug the device

This is how x86/ACPI handles it.

> 
>>
>>>
>>> AFAIU we get out of the unplug without waiting for any answer from the
>>> guest and we surely get the timer triggering after the reset has been done.
>>> That seems bad.
>>
>> This is the case right now, correct.
>>
>>>
>>>
>>>>
>>>>>
>>>>> Two things I'm concerned about:
>>>>>
>>>>> 1)
>>>>>
>>>>> What I would suggest is adding a check for the pbdev->state for
>>>>> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these
>>>>> states, then we're safe to deconfigure and put into standby. If the
>>>>
>>>> We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.
>>>>
>>>> So for
>>>> - ZPCI_FS_DISABLED
>>>> - ZPCI_FS_ENABLED
>>>> - ZPCI_FS_BLOCKED
>>>> - ZPCI_FS_ERROR
>>>> - ZPCI_FS_PERMANENT_ERROR
>>>>
>>>> We setup a timer and simply go ahead and unplug the device when the
>>>> timer expires ("forced unplug").
>>>
>>> I agree only for ZPCI_FS_ENABLED why do we need to be smooth for other
>>> states?
>>> ZPCI_FS_DISABLED may be a candidate even I do not see the interrest but
>>> other states of the device should issue a HP_EVENT_DECONFIGURE_REQUEST
>>> and we do not need a timer (or any delay)
>>
>> You can always expect that your guest driver is dead.
> 
> hum, the device is dead.
> May be the guest got hit too if the driver is not right written.
> 
>>
>>>
>>>>
>>>> Changing that behavior might be more invasive. Simply not unplugging in
>>>> s390_pcihost_timer_cb() on some of these states would mean that somebody
>>>> issued a request and that requests is simply lost/ignored. Not sure if
>>>> that is what we want. I think we need separate patches to change
>>>> something like that. Especially
>>>>
>>>> 1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
>>>> the unplug request and moves the device to ZPCI_FS_ENABLED before the
>>>> timer expires? These are corner cases to consider.
>>>
>>> +1, we must ensure to do the work inside the unplug CB.
>>>
>>>>
>>>> 2. Do we need a timer at all? Now that Patch #1 introduces
>>>> unplug_requests, we are free to ignore requests from the user if the
>>>> guest is not reacting. I would really favor getting rid of the timer
>>>> completely. Was there a special reason why this was introduced?
>>>
>>> Yes, to let a chance to the guest to smoothly relinquish the device.
>>> (for example sync/clean the disk)
>>> However I do not think it is right implemented.
>>>
>>>>
>>>> No other architecture (e.g. ACPI) uses such a timer. They use a simple
>>>> flag to remember if a request is pending. I would really favor going
>>>> into that direction.
>>>
>>> I am not sure that the Intel architecture is a good example. :)
>>
>> Right, we all learned that zPCI did it better. (sorry ;) )
> 
> Well I really think so.
> It is designed with several guest in parallel and shared devices.
> 
> In such an architecture, ripping of a device from a guest may have interest.
> One good thing would be that the software of the guest handle it :)

That is indeed true, but I think such a forced removal also works on
x86. Theoretically. ("physically rip out the card"). See below.

> 
>>
>>>
>>> AFAIU they do not wait for the guest to have relinquish the device.
>>> Or do they?
>>> How long do they wait?
>>
>> They wait for ever. And I guess we should do the same thing. If the
>> guest driver is broken (and this is really a rare scenario!) we would
>> not get the device back. Which is perfectly fine in my point of view. In
>> all other scenarios I guess the guest will simply respond in a timely
>> manner. And ripping out stuff from the guest always feels wrong. (yes
>> the channel subsystem works like that, but here we actually have a choice)
>>
>> If we reboot, we can unplug the device. Otherwise, let's keep it simply
>> and don't use a timer.
>>
>> Thanks!
>>
> 
> AFAIK We have no plan to operate on pools of PCI devices so for me I 
> have no objection to keep it simple:

Especially one note:

There seems to be demand for a so called "forced PCI removal" also on
other architectures. However, this would than rather most probably be
modeled on top of what we have right now.

E.g. instead of "device_del XXX" which would request the guest to let go
of the device, there could be something like "device_del XXX,forced=true".

E.g. ask the guest. If it does not respond after some time, force remove
it. This is basically the timer, but managed by a different level, of
software. And you can than actually decide if you want to do eventually
harm to the guest OS.

Are there any other objects of getting rid of the timer?

Conny could pick up patch #1 once you get an ACK. I would send more
patches to drop the timer and rework this patch.

Thanks Pierre!

> 
> Regards,
> Pierre
> 
> 
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-29 18:20                 ` David Hildenbrand
@ 2019-01-29 18:37                   ` Cornelia Huck
  2019-01-29 18:42                   ` Collin Walling
  1 sibling, 0 replies; 22+ messages in thread
From: Cornelia Huck @ 2019-01-29 18:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: pmorel, Collin Walling, Christian Borntraeger, Thomas Huth,
	qemu-s390x, qemu-devel, Richard Henderson

On Tue, 29 Jan 2019 19:20:55 +0100
David Hildenbrand <david@redhat.com> wrote:

> Conny could pick up patch #1 once you get an ACK. I would send more
> patches to drop the timer and rework this patch.

Sure, will do.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
  2019-01-29 18:20                 ` David Hildenbrand
  2019-01-29 18:37                   ` Cornelia Huck
@ 2019-01-29 18:42                   ` Collin Walling
  1 sibling, 0 replies; 22+ messages in thread
From: Collin Walling @ 2019-01-29 18:42 UTC (permalink / raw)
  To: David Hildenbrand, pmorel, Cornelia Huck, Christian Borntraeger
  Cc: Thomas Huth, qemu-s390x, qemu-devel, Richard Henderson

On 1/29/19 1:20 PM, David Hildenbrand wrote:
> On 29.01.19 17:50, Pierre Morel wrote:
>> On 29/01/2019 16:11, David Hildenbrand wrote:
>>> On 29.01.19 14:50, Pierre Morel wrote:
>>>> On 29/01/2019 11:24, David Hildenbrand wrote:
>>>>>>>> I'm wondering what the architecture says regarding those events -- can
>>>>>>>> someone with access to the documentation comment?
>>>>>>>
>>>>>>> Ping. Any comments from the IBM folks?
>>>>
>>>> Hi,
>>>>
>>>> Sorry to have wait so long.
>>>> At least Collin was faster.
>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> So the idea here is that if we have a PCI device that is the process of
>>>>>> being deconfigured and we are also in the middle of a reset, then let's
>>>>>> accelerate deconfiguring of the PCI device during the reset. Makes sense.
>>>>
>>>> to me too.
>>>> However, how do we ensure that the guest got time to respond to the
>>>> first deconfigure request?
>>>
>>> 30 seconds, then the reboot. On a reboot, I don't see why we should give
>>> a guest more time. "It's dead", rip out the card as the guest refused to
>>> hand it back. (maybe it crashed! but after a reboot the guest state is
>>> reset and baught back to life)
>>
>> I agree that 30 seconds is way enough,
>> also agree that in most cases the guest is dead.
>>
>>
>>>
>>>>
>>>>>>
>>>>>> Note:
>>>>>>
>>>>>> The callback function will deconfigure the the device and put it into
>>>>>> standby mode. However, a PCI device should only go into standby from the
>>>>>> *disabled state* (which it could already be in due to the unplug
>>>>>> sequence), or from a *permanent error state* (something we should
>>>>>> hopefully never see -- this means something went seriously wrong with
>>>>>> the device).
>>>>
>>>> Not completely exact, the CHSC event 0x304, on the initiative of the
>>>> host, force the "deconfigure state" from "configure state" generally,
>>>> whatever sub-state it has (enabled/disabled/error...).
>>>>
>>>>>
>>>>> Right, this should already have been checked before setting up the timer.
>>>>
>>>> Apropos timer, do we need a timer or wouldn't it be better to use a
>>>> delay / a timer + condition?
>>>
>>> I don't think we need a timer at all.
>>
>> Yes, if it is possible to wait synchronously 30s it seems good to me.
> 
> I mean, we don't have to wait 30 seconds at all.
> 
> 1. We send a request to the guest
> 2. It responds (after some seconds), letting go of the zPCI device
> 3. We unplug the device
> 
> 1. We send a request to the guest
> 2. Guest does not respond, request keeps pending forever
> 3. On reboot, unplug the device
> 
> This is how x86/ACPI handles it.
> 
>>
>>>
>>>>
>>>> AFAIU we get out of the unplug without waiting for any answer from the
>>>> guest and we surely get the timer triggering after the reset has been done.
>>>> That seems bad.
>>>
>>> This is the case right now, correct.
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Two things I'm concerned about:
>>>>>>
>>>>>> 1)
>>>>>>
>>>>>> What I would suggest is adding a check for the pbdev->state for
>>>>>> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these
>>>>>> states, then we're safe to deconfigure and put into standby. If the
>>>>>
>>>>> We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.
>>>>>
>>>>> So for
>>>>> - ZPCI_FS_DISABLED
>>>>> - ZPCI_FS_ENABLED
>>>>> - ZPCI_FS_BLOCKED
>>>>> - ZPCI_FS_ERROR
>>>>> - ZPCI_FS_PERMANENT_ERROR
>>>>>
>>>>> We setup a timer and simply go ahead and unplug the device when the
>>>>> timer expires ("forced unplug").
>>>>
>>>> I agree only for ZPCI_FS_ENABLED why do we need to be smooth for other
>>>> states?
>>>> ZPCI_FS_DISABLED may be a candidate even I do not see the interrest but
>>>> other states of the device should issue a HP_EVENT_DECONFIGURE_REQUEST
>>>> and we do not need a timer (or any delay)
>>>
>>> You can always expect that your guest driver is dead.
>>
>> hum, the device is dead.
>> May be the guest got hit too if the driver is not right written.
>>
>>>
>>>>
>>>>>
>>>>> Changing that behavior might be more invasive. Simply not unplugging in
>>>>> s390_pcihost_timer_cb() on some of these states would mean that somebody
>>>>> issued a request and that requests is simply lost/ignored. Not sure if
>>>>> that is what we want. I think we need separate patches to change
>>>>> something like that. Especially
>>>>>
>>>>> 1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
>>>>> the unplug request and moves the device to ZPCI_FS_ENABLED before the
>>>>> timer expires? These are corner cases to consider.
>>>>
>>>> +1, we must ensure to do the work inside the unplug CB.
>>>>
>>>>>
>>>>> 2. Do we need a timer at all? Now that Patch #1 introduces
>>>>> unplug_requests, we are free to ignore requests from the user if the
>>>>> guest is not reacting. I would really favor getting rid of the timer
>>>>> completely. Was there a special reason why this was introduced?
>>>>
>>>> Yes, to let a chance to the guest to smoothly relinquish the device.
>>>> (for example sync/clean the disk)
>>>> However I do not think it is right implemented.
>>>>
>>>>>
>>>>> No other architecture (e.g. ACPI) uses such a timer. They use a simple
>>>>> flag to remember if a request is pending. I would really favor going
>>>>> into that direction.
>>>>
>>>> I am not sure that the Intel architecture is a good example. :)
>>>
>>> Right, we all learned that zPCI did it better. (sorry ;) )
>>
>> Well I really think so.
>> It is designed with several guest in parallel and shared devices.
>>
>> In such an architecture, ripping of a device from a guest may have interest.
>> One good thing would be that the software of the guest handle it :)
> 
> That is indeed true, but I think such a forced removal also works on
> x86. Theoretically. ("physically rip out the card"). See below.
> 
>>
>>>
>>>>
>>>> AFAIU they do not wait for the guest to have relinquish the device.
>>>> Or do they?
>>>> How long do they wait?
>>>
>>> They wait for ever. And I guess we should do the same thing. If the
>>> guest driver is broken (and this is really a rare scenario!) we would
>>> not get the device back. Which is perfectly fine in my point of view. In
>>> all other scenarios I guess the guest will simply respond in a timely
>>> manner. And ripping out stuff from the guest always feels wrong. (yes
>>> the channel subsystem works like that, but here we actually have a choice)
>>>
>>> If we reboot, we can unplug the device. Otherwise, let's keep it simply
>>> and don't use a timer.
>>>
>>> Thanks!
>>>

This makes more sense to me. If something goes wrong with unplugging a device,
and we use a timeout for forcefully unplug the device... it will never be
apparent to the guest or user that something went wrong in the first place!

>>
>> AFAIK We have no plan to operate on pools of PCI devices so for me I 
>> have no objection to keep it simple:
> 
> Especially one note:
> 
> There seems to be demand for a so called "forced PCI removal" also on
> other architectures. However, this would than rather most probably be
> modeled on top of what we have right now.
> 
> E.g. instead of "device_del XXX" which would request the guest to let go
> of the device, there could be something like "device_del XXX,forced=true".
> 
> E.g. ask the guest. If it does not respond after some time, force remove
> it. This is basically the timer, but managed by a different level, of
> software. And you can than actually decide if you want to do eventually
> harm to the guest OS.
> 
> Are there any other objects of getting rid of the timer?
> 
> Conny could pick up patch #1 once you get an ACK. I would send more
> patches to drop the timer and rework this patch.
> 
> Thanks Pierre!
> 
>>
>> Regards,
>> Pierre
>>
>>
>>
> 
> 

David, Pierre: good discussion.

I'm in favor of dropping the timer, especially with the notes and examples 
above. I think the PCI code will look a lot cleaner / easier-to-maintain 
without it as well.

-- 
Respectfully,
- Collin Walling

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-29 16:54       ` Pierre Morel
@ 2019-01-29 20:27         ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-01-29 20:27 UTC (permalink / raw)
  To: pmorel, qemu-devel
  Cc: Thomas Huth, Cornelia Huck, Collin Walling,
	Christian Borntraeger, qemu-s390x, Richard Henderson

On 29.01.19 17:54, Pierre Morel wrote:
> On 29/01/2019 16:14, David Hildenbrand wrote:
>> On 29.01.19 14:31, Pierre Morel wrote:
>>> On 21/01/2019 14:42, David Hildenbrand wrote:
>>>> PCI on s390x is really weird and how it was modeled in QEMU might not have
>>>> been the right choice. Anyhow, right now it is the case that:
>>>> - Hotplugging a PCI device will silently create a zPCI device
>>>>     (if none is provided)
>>>> - Hotunplugging a zPCI device will unplug the PCI device (if any)
>>>> - Hotunplugging a PCI device will unplug also the zPCI device
>>>> As far as I can see, we can no longer change this behavior. But we
>>>> should fix it.
> 
> 
>>> So unplugging PCI first will deny the guest any possibility to smoothly
>>> relinquish a device.
>>>
>>>
>>> Is it possible the other way around?
>>
>> Maybe, but it does not really matter. We unplug both devices
>> synchronously, without the guest recognizing the order. We always have
>> the unplug request first that notifies the guest. When we get an ACK
>> from the guest, we can unpplug both devices in any order.
>>
>> (and if we want to change the order, we should do it in a separate
>> patch, this patch does not change the order, just refactors the code)
> 
> 
> If it is done atomically, then I have no objection.

Yes, this is atomically. It is two separate steps, but only logically.
The guest cannot observe it.

Thanks!

> 
> Regards,
> Pierre
> 
> 
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler David Hildenbrand
  2019-01-23 11:03   ` Cornelia Huck
  2019-01-29 13:31   ` Pierre Morel
@ 2019-01-30 19:52   ` Collin Walling
  2019-01-31  9:31     ` David Hildenbrand
  2 siblings, 1 reply; 22+ messages in thread
From: Collin Walling @ 2019-01-30 19:52 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Thomas Huth, Cornelia Huck, Christian Borntraeger, qemu-s390x,
	Richard Henderson

On 1/21/19 8:42 AM, David Hildenbrand wrote:
> PCI on s390x is really weird and how it was modeled in QEMU might not have
> been the right choice. Anyhow, right now it is the case that:
> - Hotplugging a PCI device will silently create a zPCI device
>   (if none is provided)
> - Hotunplugging a zPCI device will unplug the PCI device (if any)
> - Hotunplugging a PCI device will unplug also the zPCI device
> As far as I can see, we can no longer change this behavior. But we
> should fix it.
> 
> Both device types are handled via a single hotplug handler call. This
> is problematic for various reasons:
> 1. Unplugging via the zPCI device allows to unplug PCI bridges as
>    checks are not performed - bad.

Very bad...

Solving this issue is outside the scope of this patchset, correct? Or
am I missing something (nothing stood out to me as I reviewed the code)

> 2. Unplugging via the zPCI device allows to unplug devices that are not
>    hot removable. (check performed in qdev_unplug()) - bad.

VERY bad.

> 3. Hotplug handler chains are not possible for the unplug case. In the
>    future, the machine might want to override hotplug handlers, to
>    process device specific stuff and to then branch off to the actual
>    hotplug handler. We need separate hotplug handler calls for both the
>    PCI and zPCI device to make this work reliably. All other PCI
>    implementations are already prepared to handle this correctly, only
>    s390x is missing.
> 
> Therefore, introduce the unplug_request handler and properly perform
> unplug checks by redirecting to the separate unplug_request handlers.
> When finally unplugging, perform two separate hotplug_handler_unplug()
> calls, first for the PCI device, followed by the zPCI device. This now
> nicely splits unplugging paths for both devices.
> 
> The redirect part is a little hairy, as the user is allowed to trigger
> unplug either via the PCI or the zPCI device. So redirect always to the
> PCI unplug request handler first and remember if that check has been
> performed in the zPCI device. Redirect then to the zPCI device unplug
> request handler to perform the magic. Remembering that we already
> checked the PCI device breaks the redirect loop.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/s390-pci-bus.c | 166 +++++++++++++++++++++++++++-------------
>  hw/s390x/s390-pci-bus.h |   1 +
>  2 files changed, 113 insertions(+), 54 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index f017c1ded0..bc17a8cf65 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -148,6 +148,22 @@ out:
>      psccb->header.response_code = cpu_to_be16(rc);
>  }
>  
> +static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
> +{
> +    HotplugHandler *hotplug_ctrl;
> +
> +    /* Unplug the PCI device */
> +    if (pbdev->pdev) {
> +        hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(pbdev->pdev));
> +        hotplug_handler_unplug(hotplug_ctrl, DEVICE(pbdev->pdev),
> +                               &error_abort);
> +    }
> +
> +    /* Unplug the zPCI device */
> +    hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(pbdev));
> +    hotplug_handler_unplug(hotplug_ctrl, DEVICE(pbdev), &error_abort);
> +}

Please help me understand this just a little bit better. Most of my 
understanding of the s390 PCI architecture stems from the lower-level 
instruction / DMA stuff and some of the pass-through stuff. I'm just a 
little shaky on the PCI topology with respect to how it is portrayed in 
the QEMU code.

My confusion: when we unplug a zPCI device, we are unplugging a *bus* 
device? Won't that cause an issue if other PCI devices are sharing that
bus? Or am I misunderstanding the QEMU code here and a PCIBusDevice is 
more-or-less a path from the bridge to the PCI device (and not the 
entire bus itself)? Or...?

Sorry for the technical question here that I should probably already 
know the answer to. Hopefully someone can help accelerate my
understanding.

> +
>  void s390_pci_sclp_deconfigure(SCCB *sccb)
>  {
>      IoaCfgSccb *psccb = (IoaCfgSccb *)sccb;
> @@ -179,7 +195,7 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>          rc = SCLP_RC_NORMAL_COMPLETION;
>  
>          if (pbdev->release_timer) {
> -            qdev_unplug(DEVICE(pbdev->pdev), NULL);
> +            s390_pci_perform_unplug(pbdev);
>          }
>      }
>  out:
> @@ -217,6 +233,24 @@ S390PCIBusDevice *s390_pci_find_dev_by_target(S390pciState *s,
>      return NULL;
>  }
>  
> +static S390PCIBusDevice *s390_pci_find_dev_by_pci(S390pciState *s,
> +                                                  PCIDevice *pci_dev)
> +{
> +    S390PCIBusDevice *pbdev;
> +
> +    if (!pci_dev) {
> +        return NULL;
> +    }
> +
> +    QTAILQ_FOREACH(pbdev, &s->zpci_devs, link) {
> +        if (pbdev->pdev == pci_dev) {
> +            return pbdev;
> +        }
> +    }
> +
> +    return NULL;
> +}

Maybe call this "s390_pci_find_*bus*_dev_by_pci" to make it a little 
more clear that this finds and returns a bus device? Better yet, maybe
"s390_pci_find_bus" to be similar to the generic "pci_get_bus" function.

> +
>  S390PCIBusDevice *s390_pci_find_dev_by_idx(S390pciState *s, uint32_t idx)
>  {
>      return g_hash_table_lookup(s->zpci_table, &idx);
> @@ -939,77 +973,100 @@ static void s390_pcihost_timer_cb(void *opaque)
>      pbdev->state = ZPCI_FS_STANDBY;
>      s390_pci_generate_plug_event(HP_EVENT_CONFIGURED_TO_STBRES,
>                                   pbdev->fh, pbdev->fid);
> -    qdev_unplug(DEVICE(pbdev), NULL);
> +    s390_pci_perform_unplug(pbdev);
>  }
>  
>  static void s390_pcihost_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                                  Error **errp)
>  {
>      S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
> -    PCIDevice *pci_dev = NULL;
> -    PCIBus *bus;
> -    int32_t devfn;
>      S390PCIBusDevice *pbdev = NULL;
>  
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
> +        PCIBus *bus;
> +        int32_t devfn;
> +
> +        pbdev = s390_pci_find_dev_by_pci(s, PCI_DEVICE(dev));
> +        g_assert(pbdev);
> +
> +        s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
> +                                     pbdev->fh, pbdev->fid);
> +        bus = pci_get_bus(pci_dev);
> +        devfn = pci_dev->devfn;
> +        object_unparent(OBJECT(pci_dev));
> +
> +        s390_pci_msix_free(pbdev);
> +        s390_pci_iommu_free(s, bus, devfn);
> +        pbdev->pdev = NULL;
> +        pbdev->state = ZPCI_FS_RESERVED;
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {

"S390_PCI_DEVICE" is synonymous with "zPCI device", right? Just making 
sure...

> +        pbdev = S390_PCI_DEVICE(dev);
> +
> +        if (pbdev->release_timer) {
> +            timer_del(pbdev->release_timer);
> +            timer_free(pbdev->release_timer);
> +            pbdev->release_timer = NULL;
> +        }
> +        pbdev->fid = 0;
> +        QTAILQ_REMOVE(&s->zpci_devs, pbdev, link);
> +        g_hash_table_remove(s->zpci_table, &pbdev->idx);
> +        object_unparent(OBJECT(pbdev));
> +    }
> +}
> +
> +static void s390_pcihost_unplug_request(HotplugHandler *hotplug_dev,
> +                                        DeviceState *dev,
> +                                        Error **errp)
> +{
> +    S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
> +    S390PCIBusDevice *pbdev;
> +
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
>          error_setg(errp, "PCI bridge hot unplug currently not supported");
> -        return;
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> -        pci_dev = PCI_DEVICE(dev);
> -
> -        QTAILQ_FOREACH(pbdev, &s->zpci_devs, link) {
> -            if (pbdev->pdev == pci_dev) {
> -                break;
> -            }
> -        }
> -        assert(pbdev != NULL);
> +        /*
> +         * Redirect the unplug request to the zPCI device and remember that
> +         * we've checked the PCI device already (to prevent endless recursion).
> +         */
> +        pbdev = s390_pci_find_dev_by_pci(s, PCI_DEVICE(dev));
> +        g_assert(pbdev);
> +        pbdev->pci_unplug_request_processed = true;
> +        qdev_unplug(DEVICE(pbdev), errp);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
>          pbdev = S390_PCI_DEVICE(dev);
> -        pci_dev = pbdev->pdev;
> -    } else {
> -        g_assert_not_reached();
> -    }
>  
> -    switch (pbdev->state) {
> -    case ZPCI_FS_RESERVED:
> -        goto out;
> -    case ZPCI_FS_STANDBY:
> -        break;
> -    default:
> -        if (pbdev->release_timer) {
> +        /*
> +         * If unplug was initially requested for the zPCI device, we
> +         * first have to redirect to the PCI device, which will in return
> +         * redirect back to us after performing its checks (if the request
> +         * is not blocked, e.g. because it's a PCI bridge).
> +         */
> +        if (pbdev->pdev && !pbdev->pci_unplug_request_processed) {
> +            qdev_unplug(DEVICE(pbdev->pdev), errp);
>              return;
>          }
> -        s390_pci_generate_plug_event(HP_EVENT_DECONFIGURE_REQUEST,
> -                                     pbdev->fh, pbdev->fid);
> -        pbdev->release_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
> -                                            s390_pcihost_timer_cb,
> -                                            pbdev);
> -        timer_mod(pbdev->release_timer,
> -                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + HOT_UNPLUG_TIMEOUT);
> -        return;
> -    }
> +        pbdev->pci_unplug_request_processed = false;
>  
> -    if (pbdev->release_timer) {
> -        timer_del(pbdev->release_timer);
> -        timer_free(pbdev->release_timer);
> -        pbdev->release_timer = NULL;
> +        switch (pbdev->state) {
> +        case ZPCI_FS_STANDBY:
> +        case ZPCI_FS_RESERVED:
> +            s390_pci_perform_unplug(pbdev);
> +            break;
> +        default:
> +            if (pbdev->release_timer) {
> +                return;
> +            }
> +            s390_pci_generate_plug_event(HP_EVENT_DECONFIGURE_REQUEST,
> +                                         pbdev->fh, pbdev->fid);
> +            pbdev->release_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
> +                                                s390_pcihost_timer_cb, pbdev);
> +            timer_mod(pbdev->release_timer,
> +                    qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + HOT_UNPLUG_TIMEOUT);
> +        }
> +    } else {
> +        g_assert_not_reached();
>      }
> -
> -    s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
> -                                 pbdev->fh, pbdev->fid);
> -    bus = pci_get_bus(pci_dev);
> -    devfn = pci_dev->devfn;
> -    object_unparent(OBJECT(pci_dev));
> -    fmb_timer_free(pbdev);
> -    s390_pci_msix_free(pbdev);
> -    s390_pci_iommu_free(s, bus, devfn);
> -    pbdev->pdev = NULL;
> -    pbdev->state = ZPCI_FS_RESERVED;
> -out:
> -    pbdev->fid = 0;
> -    QTAILQ_REMOVE(&s->zpci_devs, pbdev, link);
> -    g_hash_table_remove(s->zpci_table, &pbdev->idx);
> -    object_unparent(OBJECT(pbdev));
>  }

So from what I understand: _unplug_request will deconfigure the device 
(if !standby and !reserved) and _unplug will remove the device from the 
guest?

Looks good.

>  
>  static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
> @@ -1059,6 +1116,7 @@ static void s390_pcihost_class_init(ObjectClass *klass, void *data)
>      dc->realize = s390_pcihost_realize;
>      hc->pre_plug = s390_pcihost_pre_plug;
>      hc->plug = s390_pcihost_plug;
> +    hc->unplug_request = s390_pcihost_unplug_request;
>      hc->unplug = s390_pcihost_unplug;
>      msi_nonbroken = true;
>  }
> diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
> index dadad1f758..b1a6bb8296 100644
> --- a/hw/s390x/s390-pci-bus.h
> +++ b/hw/s390x/s390-pci-bus.h
> @@ -336,6 +336,7 @@ struct S390PCIBusDevice {
>      IndAddr *summary_ind;
>      IndAddr *indicator;
>      QEMUTimer *release_timer;
> +    bool pci_unplug_request_processed;
>      QTAILQ_ENTRY(S390PCIBusDevice) link;
>  };
>  
> 

Patch looks good so far. I only have a few questions above.

-- 
Respectfully,
- Collin Walling

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler
  2019-01-30 19:52   ` [Qemu-devel] [qemu-s390x] " Collin Walling
@ 2019-01-31  9:31     ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-01-31  9:31 UTC (permalink / raw)
  To: Collin Walling, qemu-devel
  Cc: Thomas Huth, Cornelia Huck, Christian Borntraeger, qemu-s390x,
	Richard Henderson

On 30.01.19 20:52, Collin Walling wrote:
> On 1/21/19 8:42 AM, David Hildenbrand wrote:
>> PCI on s390x is really weird and how it was modeled in QEMU might not have
>> been the right choice. Anyhow, right now it is the case that:
>> - Hotplugging a PCI device will silently create a zPCI device
>>   (if none is provided)
>> - Hotunplugging a zPCI device will unplug the PCI device (if any)
>> - Hotunplugging a PCI device will unplug also the zPCI device
>> As far as I can see, we can no longer change this behavior. But we
>> should fix it.
>>
>> Both device types are handled via a single hotplug handler call. This
>> is problematic for various reasons:
>> 1. Unplugging via the zPCI device allows to unplug PCI bridges as
>>    checks are not performed - bad.
> 
> Very bad...
> 
> Solving this issue is outside the scope of this patchset, correct? Or
> am I missing something (nothing stood out to me as I reviewed the code)

As Conny mentioned, 1. does not hold (it resulted from a lack of
understanding on my side). So 1. is not possible.

> 
>> 2. Unplugging via the zPCI device allows to unplug devices that are not
>>    hot removable. (check performed in qdev_unplug()) - bad.
> 
> VERY bad.

This one however holds.

> 
>> 3. Hotplug handler chains are not possible for the unplug case. In the
>>    future, the machine might want to override hotplug handlers, to
>>    process device specific stuff and to then branch off to the actual
>>    hotplug handler. We need separate hotplug handler calls for both the
>>    PCI and zPCI device to make this work reliably. All other PCI
>>    implementations are already prepared to handle this correctly, only
>>    s390x is missing.
>>
>> Therefore, introduce the unplug_request handler and properly perform
>> unplug checks by redirecting to the separate unplug_request handlers.
>> When finally unplugging, perform two separate hotplug_handler_unplug()
>> calls, first for the PCI device, followed by the zPCI device. This now
>> nicely splits unplugging paths for both devices.
>>
>> The redirect part is a little hairy, as the user is allowed to trigger
>> unplug either via the PCI or the zPCI device. So redirect always to the
>> PCI unplug request handler first and remember if that check has been
>> performed in the zPCI device. Redirect then to the zPCI device unplug
>> request handler to perform the magic. Remembering that we already
>> checked the PCI device breaks the redirect loop.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/s390x/s390-pci-bus.c | 166 +++++++++++++++++++++++++++-------------
>>  hw/s390x/s390-pci-bus.h |   1 +
>>  2 files changed, 113 insertions(+), 54 deletions(-)
>>
>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>> index f017c1ded0..bc17a8cf65 100644
>> --- a/hw/s390x/s390-pci-bus.c
>> +++ b/hw/s390x/s390-pci-bus.c
>> @@ -148,6 +148,22 @@ out:
>>      psccb->header.response_code = cpu_to_be16(rc);
>>  }
>>  
>> +static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
>> +{
>> +    HotplugHandler *hotplug_ctrl;
>> +
>> +    /* Unplug the PCI device */
>> +    if (pbdev->pdev) {
>> +        hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(pbdev->pdev));
>> +        hotplug_handler_unplug(hotplug_ctrl, DEVICE(pbdev->pdev),
>> +                               &error_abort);
>> +    }
>> +
>> +    /* Unplug the zPCI device */
>> +    hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(pbdev));
>> +    hotplug_handler_unplug(hotplug_ctrl, DEVICE(pbdev), &error_abort);
>> +}
> 
> Please help me understand this just a little bit better. Most of my 
> understanding of the s390 PCI architecture stems from the lower-level 
> instruction / DMA stuff and some of the pass-through stuff. I'm just a 
> little shaky on the PCI topology with respect to how it is portrayed in 
> the QEMU code.
> 
> My confusion: when we unplug a zPCI device, we are unplugging a *bus* 
> device? Won't that cause an issue if other PCI devices are sharing that
> bus? Or am I misunderstanding the QEMU code here and a PCIBusDevice is 
> more-or-less a path from the bridge to the PCI device (and not the 
> entire bus itself)? Or...?

Both zPCI as well as PCI devices are actually bus devices ("devices
attached to a bus" - TYPE_DEVICE with dc->bus_type configured). (And the
registered bus hotplug handler handles hotplug/unplug on that bus).

There should not be any problem as long as we are not unplugging a
bridge, which has it's own bus with potentially other devices
registered. But we already shield unplugging bridges, so that is not an
issue.

S390PCIBusDevice (a.k.a. zPCI devices) are just a mean to attach further
architecture information to a PCI devices. These devices are all
attached to the TYPE_S390_PCI_BUS (and there is only one for the whole
system, part of the PCI host bridge)!

In contrast, PCI devices are attached to whatever bus is defined (it
could be the bus of the host bridge or the bus of a created bridge).

So unplugging an ordinary PCI device / S390PCIBusDevice does not affect
other devices, as long as we don't unplug bridges with devices attached.
But we already disallow that.

(I *think* we could even allow unplugging of PCI bridges if there are no
other devices attached, but that is future work)


> 
> Sorry for the technical question here that I should probably already 
> know the answer to. Hopefully someone can help accelerate my
> understanding.
> 
>> +
>>  void s390_pci_sclp_deconfigure(SCCB *sccb)
>>  {
>>      IoaCfgSccb *psccb = (IoaCfgSccb *)sccb;
>> @@ -179,7 +195,7 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>>          rc = SCLP_RC_NORMAL_COMPLETION;
>>  
>>          if (pbdev->release_timer) {
>> -            qdev_unplug(DEVICE(pbdev->pdev), NULL);
>> +            s390_pci_perform_unplug(pbdev);
>>          }
>>      }
>>  out:
>> @@ -217,6 +233,24 @@ S390PCIBusDevice *s390_pci_find_dev_by_target(S390pciState *s,
>>      return NULL;
>>  }
>>  
>> +static S390PCIBusDevice *s390_pci_find_dev_by_pci(S390pciState *s,
>> +                                                  PCIDevice *pci_dev)
>> +{
>> +    S390PCIBusDevice *pbdev;
>> +
>> +    if (!pci_dev) {
>> +        return NULL;
>> +    }
>> +
>> +    QTAILQ_FOREACH(pbdev, &s->zpci_devs, link) {
>> +        if (pbdev->pdev == pci_dev) {
>> +            return pbdev;
>> +        }
>> +    }
>> +
>> +    return NULL;
>> +}
> 
> Maybe call this "s390_pci_find_*bus*_dev_by_pci" to make it a little 
> more clear that this finds and returns a bus device? Better yet, maybe
> "s390_pci_find_bus" to be similar to the generic "pci_get_bus" function.
> 

I chose this naming because it matches the other functions that already
exist

s390_pci_find_next_avail_dev()
s390_pci_find_dev_by_fid()
s390_pci_find_dev_by_uid()
...

>> +
>>  S390PCIBusDevice *s390_pci_find_dev_by_idx(S390pciState *s, uint32_t idx)
>>  {
>>      return g_hash_table_lookup(s->zpci_table, &idx);
>> @@ -939,77 +973,100 @@ static void s390_pcihost_timer_cb(void *opaque)
>>      pbdev->state = ZPCI_FS_STANDBY;
>>      s390_pci_generate_plug_event(HP_EVENT_CONFIGURED_TO_STBRES,
>>                                   pbdev->fh, pbdev->fid);
>> -    qdev_unplug(DEVICE(pbdev), NULL);
>> +    s390_pci_perform_unplug(pbdev);
>>  }
>>  
>>  static void s390_pcihost_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>                                  Error **errp)
>>  {
>>      S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
>> -    PCIDevice *pci_dev = NULL;
>> -    PCIBus *bus;
>> -    int32_t devfn;
>>      S390PCIBusDevice *pbdev = NULL;
>>  
>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>> +        PCIBus *bus;
>> +        int32_t devfn;
>> +
>> +        pbdev = s390_pci_find_dev_by_pci(s, PCI_DEVICE(dev));
>> +        g_assert(pbdev);
>> +
>> +        s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
>> +                                     pbdev->fh, pbdev->fid);
>> +        bus = pci_get_bus(pci_dev);
>> +        devfn = pci_dev->devfn;
>> +        object_unparent(OBJECT(pci_dev));
>> +
>> +        s390_pci_msix_free(pbdev);
>> +        s390_pci_iommu_free(s, bus, devfn);
>> +        pbdev->pdev = NULL;
>> +        pbdev->state = ZPCI_FS_RESERVED;
>> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
> 
> "S390_PCI_DEVICE" is synonymous with "zPCI device", right? Just making 
> sure...

Yes, exactly.

> 
>> +        pbdev = S390_PCI_DEVICE(dev);
>> +
>> +        if (pbdev->release_timer) {
>> +            timer_del(pbdev->release_timer);
>> +            timer_free(pbdev->release_timer);
>> +            pbdev->release_timer = NULL;
>> +        }
>> +        pbdev->fid = 0;
>> +        QTAILQ_REMOVE(&s->zpci_devs, pbdev, link);
>> +        g_hash_table_remove(s->zpci_table, &pbdev->idx);
>> +        object_unparent(OBJECT(pbdev));
>> +    }
>> +}
>> +
>> +static void s390_pcihost_unplug_request(HotplugHandler *hotplug_dev,
>> +                                        DeviceState *dev,
>> +                                        Error **errp)
>> +{
>> +    S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
>> +    S390PCIBusDevice *pbdev;
>> +
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
>>          error_setg(errp, "PCI bridge hot unplug currently not supported");
>> -        return;
>>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>> -        pci_dev = PCI_DEVICE(dev);
>> -
>> -        QTAILQ_FOREACH(pbdev, &s->zpci_devs, link) {
>> -            if (pbdev->pdev == pci_dev) {
>> -                break;
>> -            }
>> -        }
>> -        assert(pbdev != NULL);
>> +        /*
>> +         * Redirect the unplug request to the zPCI device and remember that
>> +         * we've checked the PCI device already (to prevent endless recursion).
>> +         */
>> +        pbdev = s390_pci_find_dev_by_pci(s, PCI_DEVICE(dev));
>> +        g_assert(pbdev);
>> +        pbdev->pci_unplug_request_processed = true;
>> +        qdev_unplug(DEVICE(pbdev), errp);
>>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
>>          pbdev = S390_PCI_DEVICE(dev);
>> -        pci_dev = pbdev->pdev;
>> -    } else {
>> -        g_assert_not_reached();
>> -    }
>>  
>> -    switch (pbdev->state) {
>> -    case ZPCI_FS_RESERVED:
>> -        goto out;
>> -    case ZPCI_FS_STANDBY:
>> -        break;
>> -    default:
>> -        if (pbdev->release_timer) {
>> +        /*
>> +         * If unplug was initially requested for the zPCI device, we
>> +         * first have to redirect to the PCI device, which will in return
>> +         * redirect back to us after performing its checks (if the request
>> +         * is not blocked, e.g. because it's a PCI bridge).
>> +         */
>> +        if (pbdev->pdev && !pbdev->pci_unplug_request_processed) {
>> +            qdev_unplug(DEVICE(pbdev->pdev), errp);
>>              return;
>>          }
>> -        s390_pci_generate_plug_event(HP_EVENT_DECONFIGURE_REQUEST,
>> -                                     pbdev->fh, pbdev->fid);
>> -        pbdev->release_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>> -                                            s390_pcihost_timer_cb,
>> -                                            pbdev);
>> -        timer_mod(pbdev->release_timer,
>> -                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + HOT_UNPLUG_TIMEOUT);
>> -        return;
>> -    }
>> +        pbdev->pci_unplug_request_processed = false;
>>  
>> -    if (pbdev->release_timer) {
>> -        timer_del(pbdev->release_timer);
>> -        timer_free(pbdev->release_timer);
>> -        pbdev->release_timer = NULL;
>> +        switch (pbdev->state) {
>> +        case ZPCI_FS_STANDBY:
>> +        case ZPCI_FS_RESERVED:
>> +            s390_pci_perform_unplug(pbdev);
>> +            break;
>> +        default:
>> +            if (pbdev->release_timer) {
>> +                return;
>> +            }
>> +            s390_pci_generate_plug_event(HP_EVENT_DECONFIGURE_REQUEST,
>> +                                         pbdev->fh, pbdev->fid);
>> +            pbdev->release_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>> +                                                s390_pcihost_timer_cb, pbdev);
>> +            timer_mod(pbdev->release_timer,
>> +                    qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + HOT_UNPLUG_TIMEOUT);
>> +        }
>> +    } else {
>> +        g_assert_not_reached();
>>      }
>> -
>> -    s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
>> -                                 pbdev->fh, pbdev->fid);
>> -    bus = pci_get_bus(pci_dev);
>> -    devfn = pci_dev->devfn;
>> -    object_unparent(OBJECT(pci_dev));
>> -    fmb_timer_free(pbdev);
>> -    s390_pci_msix_free(pbdev);
>> -    s390_pci_iommu_free(s, bus, devfn);
>> -    pbdev->pdev = NULL;
>> -    pbdev->state = ZPCI_FS_RESERVED;
>> -out:
>> -    pbdev->fid = 0;
>> -    QTAILQ_REMOVE(&s->zpci_devs, pbdev, link);
>> -    g_hash_table_remove(s->zpci_table, &pbdev->idx);
>> -    object_unparent(OBJECT(pbdev));
>>  }
> 
> So from what I understand: _unplug_request will deconfigure the device 
> (if !standby and !reserved) and _unplug will remove the device from the 
> guest?

You can think of unplug_request() as something like

"If the guest does not use this device, perform the unplug. Otherwise,
ask the guest ("request") to release the device so we can unplug it".

> 
> Looks good.
> 
>>  
>>  static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
>> @@ -1059,6 +1116,7 @@ static void s390_pcihost_class_init(ObjectClass *klass, void *data)
>>      dc->realize = s390_pcihost_realize;
>>      hc->pre_plug = s390_pcihost_pre_plug;
>>      hc->plug = s390_pcihost_plug;
>> +    hc->unplug_request = s390_pcihost_unplug_request;
>>      hc->unplug = s390_pcihost_unplug;
>>      msi_nonbroken = true;
>>  }
>> diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
>> index dadad1f758..b1a6bb8296 100644
>> --- a/hw/s390x/s390-pci-bus.h
>> +++ b/hw/s390x/s390-pci-bus.h
>> @@ -336,6 +336,7 @@ struct S390PCIBusDevice {
>>      IndAddr *summary_ind;
>>      IndAddr *indicator;
>>      QEMUTimer *release_timer;
>> +    bool pci_unplug_request_processed;
>>      QTAILQ_ENTRY(S390PCIBusDevice) link;
>>  };
>>  
>>
> 
> Patch looks good so far. I only have a few questions above.
> 

Sure, thanks for having a look! BTW, I resend all s390x/pci patches as a
single series yesterday to give a better overview of what I am changing.

Thanks!


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-01-31  9:31 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-21 13:42 [Qemu-devel] [PATCH v3 0/2] s390x/pci: hotplug handler fixes and reworks David Hildenbrand
2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler David Hildenbrand
2019-01-23 11:03   ` Cornelia Huck
2019-01-23 11:08     ` David Hildenbrand
2019-01-28 11:27       ` Cornelia Huck
2019-01-29 13:31   ` Pierre Morel
2019-01-29 15:14     ` David Hildenbrand
2019-01-29 16:54       ` Pierre Morel
2019-01-29 20:27         ` David Hildenbrand
2019-01-30 19:52   ` [Qemu-devel] [qemu-s390x] " Collin Walling
2019-01-31  9:31     ` David Hildenbrand
2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset David Hildenbrand
2019-01-23 11:05   ` Cornelia Huck
2019-01-28 11:28     ` Cornelia Huck
2019-01-29  0:09       ` [Qemu-devel] [qemu-s390x] " Collin Walling
2019-01-29 10:24         ` David Hildenbrand
2019-01-29 13:50           ` Pierre Morel
2019-01-29 15:11             ` David Hildenbrand
2019-01-29 16:50               ` Pierre Morel
2019-01-29 18:20                 ` David Hildenbrand
2019-01-29 18:37                   ` Cornelia Huck
2019-01-29 18:42                   ` Collin Walling

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.