QEMU-Devel Archive on lore.kernel.org
* [PATCH v2 0/2] spapr: Fix device unplug vs CAS or migration
@ 2020-02-14 15:01 Greg Kurz
  2020-02-14 15:01 ` [PATCH v2 1/2] spapr: Don't use spapr_drc_needed() in CAS code Greg Kurz
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Greg Kurz @ 2020-02-14 15:01 UTC
  To: David Gibson; +Cc: Laurent Vivier, Alexey Kardashevskiy, qemu-ppc, qemu-devel

While working on getting rid of CAS reboot, I realized that we currently
don't handle device hot unplug properly in the following situations:

1) if the device is unplugged between boot and CAS, SLOF doesn't handle
   the event, which is a known limitation. The device hence stays around
   until some other event is emitted and the guest eventually completes
   the unplug, or until a reboot. Until we can teach SLOF to correctly
   process the full FDT at CAS, we should trigger a CAS reboot, like we
   already do for hotplug.

2) if the guest is migrated after the event was emitted but before the
   guest could process it, the destination is unaware of the pending
   unplug operation and doesn't remove the device when the guest
   releases it. The 'unplug_requested' field of the DRC is actually state
   that should be migrated.
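
To make situation 2 concrete, here is a minimal standalone sketch. It is
not QEMU code: struct toy_drc, toy_migrate() and toy_guest_release() are
hypothetical names invented for this illustration, and the model assumes
that releasing a device only removes it when an unplug request is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a DRC; not QEMU's SpaprDrc. */
struct toy_drc {
    bool dev_present;      /* a device is attached to the connector */
    bool unplug_requested; /* unplug event sent, guest has not released yet */
};

/*
 * Copy the DRC to the "destination". When migrate_unplug is false the
 * pending request is dropped on the way, which models the bug.
 */
static struct toy_drc toy_migrate(const struct toy_drc *src,
                                  bool migrate_unplug)
{
    struct toy_drc dst;

    assert(src != NULL);
    dst.dev_present = src->dev_present;
    dst.unplug_requested = migrate_unplug ? src->unplug_requested : false;
    return dst;
}

/* The guest releases the device; it only goes away if an unplug is pending. */
static void toy_guest_release(struct toy_drc *drc)
{
    assert(drc != NULL);
    if (drc->unplug_requested) {
        drc->dev_present = false;
        drc->unplug_requested = false;
    }
}
```

In this model, migrating a DRC with a pending unplug while dropping the
flag leaves dev_present set after toy_guest_release(), whereas carrying
the flag across lets the release complete.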

Changes since v1:
   - new spapr_drc_transient() helper that covers pending plug and unplug
     situations for both CAS and migration
   - as a mechanical consequence, fix unplug for CAS and migration in the
     same patch

--
Greg

---

Greg Kurz (2):
      spapr: Don't use spapr_drc_needed() in CAS code
      spapr: Fix handling of unplugged devices during CAS and migration


 hw/ppc/spapr_drc.c         |   43 ++++++++++++++++++++++++++++++++++++-------
 hw/ppc/spapr_hcall.c       |   14 +++++++++-----
 include/hw/ppc/spapr_drc.h |    4 +++-
 3 files changed, 48 insertions(+), 13 deletions(-)




* [PATCH v2 1/2] spapr: Don't use spapr_drc_needed() in CAS code
  2020-02-14 15:01 [PATCH v2 0/2] spapr: Fix device unplug vs CAS or migration Greg Kurz
@ 2020-02-14 15:01 ` Greg Kurz
  2020-02-14 15:01 ` [PATCH v2 2/2] spapr: Fix handling of unplugged devices during CAS and migration Greg Kurz
  2020-02-17  0:31 ` [PATCH v2 0/2] spapr: Fix device unplug vs CAS or migration David Gibson
  2 siblings, 0 replies; 4+ messages in thread
From: Greg Kurz @ 2020-02-14 15:01 UTC
  To: David Gibson; +Cc: Laurent Vivier, Alexey Kardashevskiy, qemu-ppc, qemu-devel

We currently don't support hotplug of devices between boot and CAS. If
this happens, a CAS reboot is triggered. We detect this during CAS using
the spapr_drc_needed() function, which is essentially a VMStateDescription
.needed callback. Even if the condition for CAS reboot happens to be the
same as for DRC migration, it looks wrong to piggyback on a migration
helper for this.

Introduce a helper with a slightly more explicit name and use it in both CAS
and DRC migration code. Since a subsequent patch will enhance this helper
to cover the case of hot unplug, let's go for spapr_drc_transient(). While
here, convert spapr_hotplugged_dev_before_cas() to the "transient" wording as
well.

This doesn't change any behaviour.
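
The shape of the refactoring can be sketched outside of QEMU as follows.
This is only an illustration under stated assumptions: struct demo_drc and
both function names are hypothetical, and ready_state is stored in the
structure here for brevity, whereas the patch reads it from the DRC class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DRC; field names are illustrative only. */
struct demo_drc {
    bool has_dev;         /* a device is attached */
    unsigned state;       /* current connector state */
    unsigned ready_state; /* expected long-term state */
};

/* Typed predicate, usable from any caller that already holds a DRC. */
static bool demo_drc_transient(const struct demo_drc *drc)
{
    assert(drc != NULL);
    if (!drc->has_dev) {
        return false; /* nothing attached, nothing in flight */
    }
    return drc->state != drc->ready_state;
}

/* Thin adapter with the untyped signature a ".needed"-style hook takes. */
static bool demo_drc_needed(void *opaque)
{
    return demo_drc_transient(opaque);
}
```

The typed function carries the logic and the void * adapter stays private
to the migration code, so other callers never go through an untyped
pointer.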

Signed-off-by: Greg Kurz <groug@kaod.org>
---
v2: - spapr_drc_transient() helper
---
 hw/ppc/spapr_drc.c         |   20 ++++++++++++++------
 hw/ppc/spapr_hcall.c       |   14 +++++++++-----
 include/hw/ppc/spapr_drc.h |    4 +++-
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index fc62e049010f..4c35ce7c5c37 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -456,23 +456,31 @@ void spapr_drc_reset(SpaprDrc *drc)
     }
 }
 
-bool spapr_drc_needed(void *opaque)
+bool spapr_drc_transient(SpaprDrc *drc)
 {
-    SpaprDrc *drc = (SpaprDrc *)opaque;
     SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
-    /* If no dev is plugged in there is no need to migrate the DRC state */
+    /*
+     * If no dev is plugged in there is no need to migrate the DRC state
+     * nor to reset the DRC at CAS.
+     */
     if (!drc->dev) {
         return false;
     }
 
     /*
-     * We need to migrate the state if it's not equal to the expected
-     * long-term state, which is the same as the coldplugged initial
-     * state */
+     * We need to reset the DRC at CAS or to migrate the DRC state if it's
+     * not equal to the expected long-term state, which is the same as the
+     * coldplugged initial state.
+     */
     return (drc->state != drck->ready_state);
 }
 
+static bool spapr_drc_needed(void *opaque)
+{
+    return spapr_drc_transient(opaque);
+}
+
 static const VMStateDescription vmstate_spapr_drc = {
     .name = "spapr_drc",
     .version_id = 1,
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index b8bb66b5c0d4..6db3dbde9c92 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1640,20 +1640,24 @@ static uint32_t cas_check_pvr(SpaprMachineState *spapr, PowerPCCPU *cpu,
     return best_compat;
 }
 
-static bool spapr_hotplugged_dev_before_cas(void)
+static bool spapr_transient_dev_before_cas(void)
 {
-    Object *drc_container, *obj;
+    Object *drc_container;
     ObjectProperty *prop;
     ObjectPropertyIterator iter;
 
     drc_container = container_get(object_get_root(), "/dr-connector");
     object_property_iter_init(&iter, drc_container);
     while ((prop = object_property_iter_next(&iter))) {
+        SpaprDrc *drc;
+
         if (!strstart(prop->type, "link<", NULL)) {
             continue;
         }
-        obj = object_property_get_link(drc_container, prop->name, NULL);
-        if (spapr_drc_needed(obj)) {
+        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
+                                                          prop->name, NULL));
+
+        if (spapr_drc_transient(drc)) {
             return true;
         }
     }
@@ -1830,7 +1834,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 
     spapr_irq_update_active_intc(spapr);
 
-    if (spapr_hotplugged_dev_before_cas()) {
+    if (spapr_transient_dev_before_cas()) {
         spapr->cas_reboot = true;
     }
 
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index df3d958a66a2..21af8deac13f 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -278,7 +278,9 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
 
 void spapr_drc_attach(SpaprDrc *drc, DeviceState *d, Error **errp);
 void spapr_drc_detach(SpaprDrc *drc);
-bool spapr_drc_needed(void *opaque);
+
+/* Returns true if a hot plug/unplug request is pending */
+bool spapr_drc_transient(SpaprDrc *drc);
 
 static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
 {




* [PATCH v2 2/2] spapr: Fix handling of unplugged devices during CAS and migration
  2020-02-14 15:01 [PATCH v2 0/2] spapr: Fix device unplug vs CAS or migration Greg Kurz
  2020-02-14 15:01 ` [PATCH v2 1/2] spapr: Don't use spapr_drc_needed() in CAS code Greg Kurz
@ 2020-02-14 15:01 ` Greg Kurz
  2020-02-17  0:31 ` [PATCH v2 0/2] spapr: Fix device unplug vs CAS or migration David Gibson
  2 siblings, 0 replies; 4+ messages in thread
From: Greg Kurz @ 2020-02-14 15:01 UTC
  To: David Gibson; +Cc: Laurent Vivier, Alexey Kardashevskiy, qemu-ppc, qemu-devel

We already detect if a device is being hot plugged before CAS to trigger
a CAS reboot and during migration to migrate the state of the associated
DRC. But hot unplugging a device is also an asynchronous operation that
requires the guest to take action. This means that if the guest is migrated
after the hot unplug event was sent but before it could release the device
with RTAS, the destination QEMU doesn't know about the pending unplug
operation and doesn't actually remove the device when the guest finally
releases it.

Similarly, if the unplug request is fired before CAS, the guest isn't
notified of the change, just like with hotplug. It ends up booting with
the device still present in the DT and configures it, just as if it had
never been removed. Even weirder, since the event is still queued, it will
eventually be processed when some other unrelated event is posted to
the guest.
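
As a rough picture of the check CAS needs once pending unplugs count as
well, consider the standalone sketch below. It is not the QEMU
implementation: struct scan_drc and both functions are hypothetical, the
connectors are held in a plain array rather than looked up as QOM links,
and plug_pending stands in for "state differs from the ready state".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DRC; not QEMU's SpaprDrc. */
struct scan_drc {
    bool has_dev;          /* a device is attached */
    bool plug_pending;     /* hotplug not yet acknowledged by the guest */
    bool unplug_requested; /* unplug event sent, device not released yet */
};

/* A connector is transient if a plug or an unplug is still in flight. */
static bool scan_drc_transient(const struct scan_drc *drc)
{
    assert(drc != NULL);
    if (!drc->has_dev) {
        return false;
    }
    return drc->plug_pending || drc->unplug_requested;
}

/* One transient connector is enough to ask for a reboot at CAS. */
static bool scan_needs_cas_reboot(const struct scan_drc *drcs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (scan_drc_transient(&drcs[i])) {
            return true;
        }
    }
    return false;
}
```

With only plug_pending in the predicate, a connector whose device has a
pending unplug would go unnoticed by the scan, which is the CAS half of
the problem described above.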

Enhance spapr_drc_transient() to also return true if an unplug request is
pending. This fixes the issue at CAS with a CAS reboot request and
causes the DRC state to be migrated. Some extra care is still needed to
inform the destination that an unplug request is pending: migrate the
unplug_requested field of the DRC in an optional subsection. This might
break backwards migration, but this is still better than ending up with
an inconsistent guest.
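
The idea of an optional subsection can be illustrated with a toy
serializer. This is a sketch under loud assumptions: the byte layout, the
MINI_SUBSECTION_TAG marker and the mini_* names are all invented for this
example and bear no relation to QEMU's actual migration stream format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker announcing the optional subsection. */
enum { MINI_SUBSECTION_TAG = 0x55 };

/* Hypothetical stand-in for the migrated part of a DRC. */
struct mini_drc {
    uint8_t state;
    bool unplug_requested;
};

/* Write the main field, then the subsection only if it is needed. */
static size_t mini_save(const struct mini_drc *drc, uint8_t *buf, size_t cap)
{
    size_t n = 0;

    assert(drc != NULL && buf != NULL && cap >= 3);
    buf[n++] = drc->state;
    if (drc->unplug_requested) { /* plays the role of a ".needed" hook */
        buf[n++] = MINI_SUBSECTION_TAG;
        buf[n++] = 1;
    }
    return n; /* number of bytes written */
}

/* Read the main field; an absent subsection leaves the default (false). */
static struct mini_drc mini_load(const uint8_t *buf, size_t len)
{
    struct mini_drc drc = { .state = 0, .unplug_requested = false };

    assert(buf != NULL && len >= 1);
    drc.state = buf[0];
    if (len >= 3 && buf[1] == MINI_SUBSECTION_TAG) {
        drc.unplug_requested = buf[2] != 0;
    }
    return drc;
}
```

In this toy format the stream is unchanged whenever no unplug is pending,
so only streams that actually carry the extra subsection would confuse a
reader that does not know about it.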

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr_drc.c |   25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 4c35ce7c5c37..e373d342eb84 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -456,6 +456,22 @@ void spapr_drc_reset(SpaprDrc *drc)
     }
 }
 
+static bool spapr_drc_unplug_requested_needed(void *opaque)
+{
+    return spapr_drc_unplug_requested(opaque);
+}
+
+static const VMStateDescription vmstate_spapr_drc_unplug_requested = {
+    .name = "spapr_drc/unplug_requested",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_drc_unplug_requested_needed,
+    .fields  = (VMStateField []) {
+        VMSTATE_BOOL(unplug_requested, SpaprDrc),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 bool spapr_drc_transient(SpaprDrc *drc)
 {
     SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
@@ -471,9 +487,10 @@ bool spapr_drc_transient(SpaprDrc *drc)
     /*
      * We need to reset the DRC at CAS or to migrate the DRC state if it's
      * not equal to the expected long-term state, which is the same as the
-     * coldplugged initial state.
+     * coldplugged initial state, or if an unplug request is pending.
      */
-    return (drc->state != drck->ready_state);
+    return drc->state != drck->ready_state ||
+        spapr_drc_unplug_requested(drc);
 }
 
 static bool spapr_drc_needed(void *opaque)
@@ -489,6 +506,10 @@ static const VMStateDescription vmstate_spapr_drc = {
     .fields  = (VMStateField []) {
         VMSTATE_UINT32(state, SpaprDrc),
         VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &vmstate_spapr_drc_unplug_requested,
+        NULL
     }
 };
 




* Re: [PATCH v2 0/2] spapr: Fix device unplug vs CAS or migration
  2020-02-14 15:01 [PATCH v2 0/2] spapr: Fix device unplug vs CAS or migration Greg Kurz
  2020-02-14 15:01 ` [PATCH v2 1/2] spapr: Don't use spapr_drc_needed() in CAS code Greg Kurz
  2020-02-14 15:01 ` [PATCH v2 2/2] spapr: Fix handling of unplugged devices during CAS and migration Greg Kurz
@ 2020-02-17  0:31 ` David Gibson
  2 siblings, 0 replies; 4+ messages in thread
From: David Gibson @ 2020-02-17  0:31 UTC
  To: Greg Kurz; +Cc: Laurent Vivier, Alexey Kardashevskiy, qemu-ppc, qemu-devel


On Fri, Feb 14, 2020 at 04:01:16PM +0100, Greg Kurz wrote:
> While working on getting rid of CAS reboot, I realized that we currently
> don't handle device hot unplug properly in the following situations:
> 
> 1) if the device is unplugged between boot and CAS, SLOF doesn't handle
>    the event, which is a known limitation. The device hence stays around
>    until some other event is emitted and the guest eventually completes
>    the unplug, or until a reboot. Until we can teach SLOF to correctly
>    process the full FDT at CAS, we should trigger a CAS reboot, like we
>    already do for hotplug.
> 
> 2) if the guest is migrated after the event was emitted but before the
>    guest could process it, the destination is unaware of the pending
>    unplug operation and doesn't remove the device when the guest
>    releases it. The 'unplug_requested' field of the DRC is actually state
>    that should be migrated.
> 
> Changes since v1:
>    - new spapr_drc_transient() helper that covers pending plug and unplug
>      situations for both CAS and migration
>    - as a mechanical consequence, fix unplug for CAS and migration in the
>      same patch

Applied to ppc-for-5.0, thanks.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

