From: "Cédric Le Goater" <clg@kaod.org>
To: qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	David Gibson <david@gibson.dropbear.id.au>,
	Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Cédric Le Goater" <clg@kaod.org>
Subject: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
Date: Mon,  9 Oct 2017 17:49:28 +0200
Message-ID: <20171009154930.29095-3-clg@kaod.org>
In-Reply-To: <20171009154930.29095-1-clg@kaod.org>

When a CPU is stopped with the 'stop-self' RTAS call, its 'halted'
state is set to 1 and, from then on, the cpu_has_work() routine no
longer takes the MSR into account. Only the pending hardware interrupts
are checked, each against its LPCR:PECE* enablement bit.

If the DECR timer fires after 'stop-self' is called and before the CPU
reaches the 'stop' state, the nearly-dead CPU has work to do and the
guest crashes. This happens very frequently with the not-yet-upstream
P9 XIVE exploitation mode. In XICS mode, the DECR occasionally fires,
but only after the 'stop' state is reached, so there is no work to do
and the guest survives.

I suspect there is a race between the QEMU mainloop triggering the
timers and the TCG CPU thread, but I could not quite identify the root
cause. To be safe, let's disable the decrementer interrupt in the LPCR
when the CPU is halted and re-enable it when the CPU is restarted.
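
The wake-up check described above can be sketched roughly as follows.
This is a simplified model, not the QEMU code: every sketch_* name and
every bit value is hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not QEMU's definitions. */
#define SKETCH_MSR_EE     (1u << 0)  /* external interrupts enabled */
#define SKETCH_PECE_DECR  (1u << 0)  /* LPCR: decrementer may wake the CPU */
#define SKETCH_PECE_EXT   (1u << 1)  /* LPCR: external irq may wake the CPU */
#define SKETCH_IRQ_DECR   (1u << 0)  /* pending decrementer interrupt */
#define SKETCH_IRQ_EXT    (1u << 1)  /* pending external interrupt */

struct sketch_cpu {
    bool halted;
    uint32_t msr;
    uint32_t lpcr;
    uint32_t pending;
};

/* Simplified stand-in for cpu_has_work(): a halted CPU ignores the MSR
 * and only checks pending interrupts against their LPCR enablement bit.
 */
static bool sketch_cpu_has_work(const struct sketch_cpu *c)
{
    if (c->halted) {
        if ((c->pending & SKETCH_IRQ_DECR) && (c->lpcr & SKETCH_PECE_DECR)) {
            return true;
        }
        if ((c->pending & SKETCH_IRQ_EXT) && (c->lpcr & SKETCH_PECE_EXT)) {
            return true;
        }
        return false;
    }
    return (c->msr & SKETCH_MSR_EE) && c->pending != 0;
}

/* Model of the proposed 'stop-self': also mask the decrementer wake-up. */
static void sketch_stop_self(struct sketch_cpu *c)
{
    c->msr = 0;
    c->lpcr &= ~SKETCH_PECE_DECR;
    c->halted = true;
}

/* Model of the proposed 'start-cpu': unmask the decrementer wake-up. */
static void sketch_start_cpu(struct sketch_cpu *c)
{
    c->lpcr |= SKETCH_PECE_DECR;
    c->halted = false;
}
```

In this model, a halted CPU whose decrementer bit is still set reports
work as soon as a DECR interrupt is pending, whatever the MSR says.
After sketch_stop_self(), the same pending interrupt no longer counts
as work.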

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

Changes in v2:

 - used a new routine, ppc_cpu_pvr_match(), to discriminate CPU versions
 - removed the LPCR:PECE* decrementer enablement bit from the initial
   LPCR value when the CPU being initialized is a secondary
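
The two v2 changes amount to the selection logic sketched below. This
is an illustrative model only: the sketch_* names and the bit positions
are hypothetical, and the real code uses ppc_cpu_pvr_match() and the
LPCR_DEE / LPCR_P8_PECE3 definitions instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU generations; P7 and P8 share the same DECR bit. */
enum sketch_cpu_gen { SKETCH_GEN_P7_P8, SKETCH_GEN_P9 };

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_P9_DEE    (UINT64_C(1) << 0)
#define SKETCH_P8_PECE3  (UINT64_C(1) << 1)

/* Pick the decrementer wake-up bit for the given CPU generation. */
static uint64_t sketch_decr_wakeup_bit(enum sketch_cpu_gen gen)
{
    return gen == SKETCH_GEN_P9 ? SKETCH_P9_DEE : SKETCH_P8_PECE3;
}

/* Default LPCR: only the boot CPU starts with the decrementer wake-up
 * enabled; secondaries get it later from 'start-cpu'.
 */
static uint64_t sketch_default_lpcr(uint64_t other_bits,
                                    enum sketch_cpu_gen gen,
                                    bool is_boot_cpu)
{
    uint64_t lpcr = other_bits & ~(SKETCH_P9_DEE | SKETCH_P8_PECE3);

    if (is_boot_cpu) {
        lpcr |= sketch_decr_wakeup_bit(gen);
    }
    return lpcr;
}
```

With this model, a secondary CPU comes out of initialization with the
decrementer wake-up masked, and the boot CPU keeps it set.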

 hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
 target/ppc/translate_init.c | 19 +++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index cdf0b607a0a0..dfdbf1e2c6f8 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -46,6 +46,7 @@
 #include "qemu/cutils.h"
 #include "trace.h"
 #include "hw/ppc/fdt.h"
+#include "target/ppc/cpu-models.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
         kvm_cpu_synchronize_state(cs);
 
         env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
+
+        /* Enable DECR interrupt */
+        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
+            env->spr[SPR_LPCR] |= LPCR_DEE;
+        } else {
+            /* P7 and P8 both have the same bit for DECR */
+            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
+        }
+
         env->nip = start;
         env->gpr[3] = r3;
         cs->halted = 0;
@@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
      * no need to bother with specific bits, we just clear it.
      */
     env->msr = 0;
+
+    /* Don't let the decrementer run on a CPU being stopped. This could
+     * deliver an interrupt on a dying CPU and crash the guest.
+     */
+    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
+        env->spr[SPR_LPCR] &= ~LPCR_DEE;
+    } else {
+        /* P7 and P8 both have the same bit for DECR */
+        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
+    }
 }
 
 static inline int sysparm_st(target_ulong addr, target_ulong len,
diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 0d6379fcc5b4..1a62159843e7 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
     CPUPPCState *env = &cpu->env;
     ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
     ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
+    CPUState *cs = CPU(cpu);
 
     cpu->vhyp = vhyp;
 
@@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
         } else {
             lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
         }
-        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
+        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
                                LPCR_OEE;
+
+        /* Only let the decrementer wake up the boot CPU. The RTAS
+         * command start-cpu will enable it on secondaries.
+         */
+        if (cs == first_cpu) {
+            lpcr->default_value |= LPCR_DEE;
+        }
         break;
     default:
         /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
@@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
          * will work as expected for both implementations
          */
         lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
-                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
+                               LPCR_P8_PECE4;
+
+        /* Only let the decrementer wake up the boot CPU. The RTAS
+         * command start-cpu will enable it on secondaries.
+         */
+        if (cs == first_cpu) {
+            lpcr->default_value |= LPCR_P8_PECE3;
+        }
     }
 
     /* We should be followed by a CPU reset but update the active value
-- 
2.13.6
