* [PATCH] spapr: Migrate CAS reboot flag
@ 2020-01-15 17:48 Greg Kurz
  2020-01-15 18:10 ` Laurent Vivier
  2020-01-15 18:10 ` Cédric Le Goater
  0 siblings, 2 replies; 24+ messages in thread
From: Greg Kurz @ 2020-01-15 17:48 UTC
  To: David Gibson
  Cc: Laurent Vivier, Lukas Doktor, qemu-ppc, Cédric Le Goater,
	qemu-devel

Migration can potentially race with CAS reboot. If the migration thread
completes migration after CAS has set spapr->cas_reboot but before the
mainloop can pick up the reset request and reset the machine, the
guest is migrated without being rebooted, and the destination doesn't
reboot it either because it isn't aware that a CAS reboot was needed
(e.g., because a device was added before CAS). This likely results in
a broken or hung guest.
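
A minimal C sketch of this race, under a deliberately simplified model:
ModelMachine, model_cas(), model_migrate(), model_mainloop_reset() and
model_race() are hypothetical names invented for illustration, not QEMU
code. It only shows that a flag set on the source but not transferred
leaves the destination with no reason to reset the guest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified machine state. The cas_reboot field mirrors
 * the patch; reset_done records whether the guest was actually reset. */
typedef struct {
    bool cas_reboot;
    bool reset_done;
} ModelMachine;

/* CAS decides a reboot is required but only records the request. */
static void model_cas(ModelMachine *m, bool needs_reboot)
{
    if (needs_reboot) {
        m->cas_reboot = true;
    }
}

/* Later, the main loop honours a pending request and resets the
 * machine, which also clears the flag. */
static void model_mainloop_reset(ModelMachine *m)
{
    if (m->cas_reboot) {
        m->reset_done = true;
        m->cas_reboot = false;
    }
}

/* Migration builds a fresh destination. With migrate_flag == false
 * (behaviour before the patch) cas_reboot is not transferred. */
static ModelMachine model_migrate(const ModelMachine *src, bool migrate_flag)
{
    ModelMachine dst = { .cas_reboot = false, .reset_done = false };
    if (migrate_flag) {
        dst.cas_reboot = src->cas_reboot;
    }
    return dst;
}

/* Migration completes in the window between CAS and the main loop's
 * reset. Returns true if the guest still gets the reboot CAS asked for. */
static bool model_race(bool migrate_flag)
{
    ModelMachine src = { false, false };
    model_cas(&src, true);
    assert(src.cas_reboot);              /* request pending on the source */
    ModelMachine dst = model_migrate(&src, migrate_flag);
    model_mainloop_reset(&dst);          /* destination acts on what it got */
    return dst.reset_done;
}
```

In this model, model_race(false) returns false because the pending
request is lost, while model_race(true) returns true because the
destination still sees the flag after migration.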

Even though it is small, the window between CAS and CAS reboot is enough
to re-qualify spapr->cas_reboot as state that we should migrate. Add a
new subsection for that and always send it when a CAS reboot is pending.
This may cause migration to older QEMUs to fail, but that is still
better than ending up with a broken guest.
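
The trade-off with older QEMUs can be sketched in C. The types and
functions below (ModelSubsection, model_migrate_to() and helpers) are
hypothetical and only assume the semantics described above: the
subsection goes on the wire only when its needed() callback returns
true, and a destination that doesn't know the subsection name rejects
the incoming stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical optional subsection: sent only if needed() says so. */
typedef struct {
    const char *name;
    bool (*needed)(const void *state);
} ModelSubsection;

typedef struct {
    bool cas_reboot;
} ModelSpapr;

/* Same condition as spapr_cas_reboot_needed() in the patch. */
static bool model_cas_reboot_needed(const void *opaque)
{
    const ModelSpapr *spapr = opaque;
    return spapr->cas_reboot;
}

static const ModelSubsection model_cas_reboot_sub = {
    .name = "spapr_cas_reboot",
    .needed = model_cas_reboot_needed,
};

/* Receiver side: accept a subsection only if its name is known.
 * known_names is a NULL-terminated list. */
static bool model_receive_subsection(const char *name,
                                     const char *const *known_names)
{
    for (size_t i = 0; known_names[i] != NULL; i++) {
        if (strcmp(known_names[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Whole exchange: returns false only when the subsection is sent to a
 * destination that doesn't know it, i.e. migration fails. */
static bool model_migrate_to(const ModelSpapr *src,
                             const char *const *known_names)
{
    if (!model_cas_reboot_sub.needed(src)) {
        return true; /* nothing extra on the wire: any destination copes */
    }
    return model_receive_subsection(model_cas_reboot_sub.name, known_names);
}
```

Because the subsection is only emitted while a CAS reboot is pending,
migration to an older destination keeps working in the common case and
fails only in the rare racy case, where failing is preferable to
silently losing the reboot.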

The destination cannot honour the CAS reboot request from a post-load
handler because the reset must only happen after the guest is fully
restored. It is thus requested from a VM change state handler.
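
The ordering argument can be illustrated with a small C model of a
change-state notifier. Everything here (model_add_change_handler(),
model_vm_state_notify(), model_reset_requests) is a hypothetical
stand-in for QEMU's own machinery; it only assumes that registered
handlers run each time the VM stops or starts, and that "start" happens
after the incoming state has been loaded. The handler makes the same
decision as spapr_change_state_handler() in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical change-state notifier: handlers run on every stop/start. */
typedef void (*ModelChangeHandler)(void *opaque, bool running);

#define MODEL_MAX_HANDLERS 8

static struct {
    ModelChangeHandler cb;
    void *opaque;
} model_handlers[MODEL_MAX_HANDLERS];
static size_t model_nr_handlers;

/* Stands in for a queued system reset request. */
static int model_reset_requests;

static void model_add_change_handler(ModelChangeHandler cb, void *opaque)
{
    assert(model_nr_handlers < MODEL_MAX_HANDLERS);
    model_handlers[model_nr_handlers].cb = cb;
    model_handlers[model_nr_handlers].opaque = opaque;
    model_nr_handlers++;
}

/* Notify all handlers that the VM is now running (or stopped). */
static void model_vm_state_notify(bool running)
{
    for (size_t i = 0; i < model_nr_handlers; i++) {
        model_handlers[i].cb(model_handlers[i].opaque, running);
    }
}

typedef struct {
    bool cas_reboot;
} ModelSpapr;

/* Request a reset only when resuming with a CAS reboot still pending. */
static void model_spapr_change_state(void *opaque, bool running)
{
    ModelSpapr *spapr = opaque;
    if (running && spapr->cas_reboot) {
        model_reset_requests++;
    }
}
```

A stop notification leaves the counter unchanged; only resuming with the
flag set queues a reset, which matches the "resuming from migration"
case the patch targets.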

Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
---

This patch is supposed to fix the interrupt controller mode inconsistency
between QEMU and the guest reported in this BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1781315 (requires auth)

Even though interrupt controller selection no longer involves a CAS
reboot, other conditions still require one.
---
 hw/ppc/spapr.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 30a5fbd3bea6..bf2763aa16e5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1959,6 +1959,31 @@ static const VMStateDescription vmstate_spapr_dtb = {
     },
 };
 
+static bool spapr_cas_reboot_needed(void *opaque)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(opaque);
+
+    /*
+     * This causes the "spapr_cas_reboot" subsection to always be
+     * sent if migration raced with CAS. This causes older QEMUs
+     * that don't know about the subsection to fail migration but
+     * it is still better than end up with a broken guest on the
+     * destination.
+     */
+    return spapr->cas_reboot;
+}
+
+static const VMStateDescription vmstate_spapr_cas_reboot = {
+    .name = "spapr_cas_reboot",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_cas_reboot_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_BOOL(cas_reboot, SpaprMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -1992,6 +2017,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_dtb,
         &vmstate_spapr_cap_large_decr,
         &vmstate_spapr_cap_ccf_assist,
+        &vmstate_spapr_cas_reboot,
         NULL
     }
 };
@@ -2577,6 +2603,21 @@ static PCIHostState *spapr_create_default_phb(void)
     return PCI_HOST_BRIDGE(dev);
 }
 
+static void spapr_change_state_handler(void *opaque, int running,
+                                       RunState state)
+{
+    SpaprMachineState *spapr = opaque;
+
+    if (running && spapr->cas_reboot) {
+        /*
+         * This happens when resuming from migration if the source
+         * processed a CAS but didn't have time to trigger the CAS
+         * reboot. Do it now.
+         */
+        qemu_system_reset_request(SHUTDOWN_CAUSE_SUBSYSTEM_RESET);
+    }
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void spapr_machine_init(MachineState *machine)
 {
@@ -2970,6 +3011,8 @@ static void spapr_machine_init(MachineState *machine)
 
         kvmppc_spapr_enable_inkernel_multitce();
     }
+
+    qemu_add_vm_change_state_handler(spapr_change_state_handler, spapr);
 }
 
 static int spapr_kvm_type(MachineState *machine, const char *vm_type)





Thread overview: 24+ messages
2020-01-15 17:48 [PATCH] spapr: Migrate CAS reboot flag Greg Kurz
2020-01-15 18:10 ` Laurent Vivier
2020-01-15 18:26   ` Laurent Vivier
2020-01-17 11:49     ` Greg Kurz
2020-01-17 12:10       ` Laurent Vivier
2020-01-17 15:49         ` Greg Kurz
2020-01-16  8:48   ` Greg Kurz
2020-01-16 10:37     ` Laurent Vivier
2020-01-16 12:14       ` Greg Kurz
2020-01-16 18:29         ` Greg Kurz
2020-01-17  9:16           ` David Gibson
2020-01-17 15:44             ` Greg Kurz
2020-01-20  8:04               ` Greg Kurz
2020-01-21  3:43                 ` David Gibson
2020-01-21  9:32                   ` Greg Kurz
2020-01-22  6:50                     ` David Gibson
2020-01-22 10:06                       ` Greg Kurz
2020-01-23  5:08                         ` David Gibson
2020-01-15 18:10 ` Cédric Le Goater
2020-01-21  3:41   ` David Gibson
2020-01-21  6:57     ` Cédric Le Goater
2020-01-21  7:38     ` Greg Kurz
2020-01-22 12:47   ` Greg Kurz
2020-01-22 14:08     ` Cédric Le Goater
