* [Qemu-devel] [PATCH v2 0/4] disable the decrementer interrupt when a CPU is unplugged
@ 2017-10-09 15:49 Cédric Le Goater
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 1/4] target/ppc: export ppc_cpu_pvr_match() helper Cédric Le Goater
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Cédric Le Goater @ 2017-10-09 15:49 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Nikunj A Dadhania,
	Benjamin Herrenschmidt
  Cc: Cédric Le Goater

Hello,

When a CPU is stopped with the 'stop-self' RTAS call, its state
'halted' is switched to 1 and, in this case, the MSR is not taken into
account anymore in the cpu_has_work() routine. Only the pending
hardware interrupts are checked with their LPCR:PECE* enablement bit.

If the DECR timer fires after 'stop-self' is called and before the CPU
'stop' state is reached, the nearly-dead CPU will have some work to do
and the guest will crash. This case happens very frequently with the
not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
occasionally fired but after 'stop' state, so no work is to be done
and the guest survives.

I suspect there is a race between the QEMU mainloop triggering the
timers and the TCG CPU thread but I could not quite identify the root
cause. To be safe, let's disable the decrementer interrupt in the LPCR
when the CPU is halted and reenable it when the CPU is restarted.
Resetting the MSR is now pointless, so remove this dubious workaround.
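
To make the wake-up condition described above concrete, here is a
minimal standalone sketch. It is not the QEMU code: the SketchCPU type,
the sketch_* functions and the bit values are invented for this
example. Only the gating logic follows the description: a halted CPU
ignores the MSR and looks at pending interrupts against the LPCR
exit-enable bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only; not QEMU's definitions. */
#define SKETCH_LPCR_DEE  (1ull << 0)  /* decrementer wake-up enable */
#define SKETCH_LPCR_EEE  (1ull << 1)  /* external interrupt wake-up enable */
#define SKETCH_IRQ_DECR  (1u << 0)
#define SKETCH_IRQ_EXT   (1u << 1)

typedef struct SketchCPU {
    bool halted;       /* set by stop-self, cleared by start-cpu */
    bool msr_ee;       /* stand-in for the MSR interrupt enable bit */
    uint64_t lpcr;     /* holds the wake-up enable bits */
    uint32_t pending;  /* pending hardware interrupts */
} SketchCPU;

static bool sketch_cpu_has_work(const SketchCPU *cpu)
{
    assert(cpu != NULL);
    if (cpu->halted) {
        /* The MSR is ignored here: only the LPCR enable bits gate wake-up. */
        if ((cpu->pending & SKETCH_IRQ_DECR) && (cpu->lpcr & SKETCH_LPCR_DEE)) {
            return true;
        }
        if ((cpu->pending & SKETCH_IRQ_EXT) && (cpu->lpcr & SKETCH_LPCR_EEE)) {
            return true;
        }
        return false;
    }
    /* Simplified running case: any pending interrupt, if the MSR allows it. */
    return cpu->msr_ee && cpu->pending != 0;
}

/* stop-self: halt the CPU and stop the decrementer from waking it up. */
static void sketch_stop_self(SketchCPU *cpu)
{
    cpu->halted = true;
    cpu->lpcr &= ~SKETCH_LPCR_DEE;
}

/* start-cpu: re-enable decrementer wake-up before letting the CPU run. */
static void sketch_start_cpu(SketchCPU *cpu)
{
    cpu->lpcr |= SKETCH_LPCR_DEE;
    cpu->halted = false;
}
```

In this model, a decrementer interrupt that becomes pending after
sketch_stop_self() no longer gives the halted CPU any work, whatever
the MSR contains.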

Thanks,

C.

Changes in v2:

 - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
 - removed the LPCR:PECE* enablement bit when the CPU is initialized
   if it is a secondary
 - included Nikunj's fix to reboot SMP TCG guests

Cédric Le Goater (4):
  target/ppc: export ppc_cpu_pvr_match() helper
  spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  spapr/rtas: fix reboot of a SMP TCG guest
  spapr/rtas: do not reset the MSR in stop-self command

 hw/ppc/spapr_cpu_core.c     | 12 ++++++++++++
 hw/ppc/spapr_rtas.c         | 28 +++++++++++++++++++---------
 include/hw/ppc/ppc.h        |  1 +
 target/ppc/machine.c        |  5 +++--
 target/ppc/translate_init.c | 19 +++++++++++++++++--
 5 files changed, 52 insertions(+), 13 deletions(-)

-- 
2.13.6

* [Qemu-devel] [PATCH v2 1/4] target/ppc: export ppc_cpu_pvr_match() helper
  2017-10-09 15:49 [Qemu-devel] [PATCH v2 0/4] disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
@ 2017-10-09 15:49 ` Cédric Le Goater
  2017-10-11  6:41   ` David Gibson
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2017-10-09 15:49 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Nikunj A Dadhania,
	Benjamin Herrenschmidt
  Cc: Cédric Le Goater

It will be used to test the family of a CPU.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/ppc.h | 1 +
 target/ppc/machine.c | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index 4e7fe110d67b..01e32fa03d2e 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -107,4 +107,5 @@ enum {
 void ppc_booke_timers_init(PowerPCCPU *cpu, uint32_t freq, uint32_t flags);
 
 void ppc_cpu_parse_features(const char *cpu_model);
+bool ppc_cpu_pvr_match(PowerPCCPU *cpu, uint32_t pvr);
 #endif
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 384caee800e6..6085a8c25fd3 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -4,6 +4,7 @@
 #include "exec/exec-all.h"
 #include "hw/hw.h"
 #include "hw/boards.h"
+#include "hw/ppc/ppc.h"
 #include "sysemu/kvm.h"
 #include "helper_regs.h"
 #include "mmu-hash64.h"
@@ -210,7 +211,7 @@ static int cpu_pre_save(void *opaque)
  * between sufficiently similar PVRs, as determined by the CPU class's
  * pvr_match() hook.
  */
-static bool pvr_match(PowerPCCPU *cpu, uint32_t pvr)
+bool ppc_cpu_pvr_match(PowerPCCPU *cpu, uint32_t pvr)
 {
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 
@@ -247,7 +248,7 @@ static int cpu_post_load(void *opaque, int version_id)
     } else
 #endif
     {
-        if (!pvr_match(cpu, env->spr[SPR_PVR])) {
+        if (!ppc_cpu_pvr_match(cpu, env->spr[SPR_PVR])) {
             return -1;
         }
     }
-- 
2.13.6

* [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  2017-10-09 15:49 [Qemu-devel] [PATCH v2 0/4] disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 1/4] target/ppc: export ppc_cpu_pvr_match() helper Cédric Le Goater
@ 2017-10-09 15:49 ` Cédric Le Goater
  2017-10-10  8:08   ` Benjamin Herrenschmidt
  2017-10-11  6:45   ` David Gibson
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 3/4] spapr/rtas: fix reboot of a SMP TCG guest Cédric Le Goater
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 4/4] spapr/rtas: do not reset the MSR in stop-self command Cédric Le Goater
  3 siblings, 2 replies; 14+ messages in thread
From: Cédric Le Goater @ 2017-10-09 15:49 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Nikunj A Dadhania,
	Benjamin Herrenschmidt
  Cc: Cédric Le Goater

When a CPU is stopped with the 'stop-self' RTAS call, its state
'halted' is switched to 1 and, in this case, the MSR is not taken into
account anymore in the cpu_has_work() routine. Only the pending
hardware interrupts are checked with their LPCR:PECE* enablement bit.

If the DECR timer fires after 'stop-self' is called and before the CPU
'stop' state is reached, the nearly-dead CPU will have some work to do
and the guest will crash. This case happens very frequently with the
not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
occasionally fired but after 'stop' state, so no work is to be done
and the guest survives.

I suspect there is a race between the QEMU mainloop triggering the
timers and the TCG CPU thread but I could not quite identify the root
cause. To be safe, let's disable the decrementer interrupt in the LPCR
when the CPU is halted and reenable it when the CPU is restarted.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

Changes in v2:

 - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
 - removed the LPCR:PECE* enablement bit when the CPU is initialized
   if it is a secondary

 hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
 target/ppc/translate_init.c | 19 +++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index cdf0b607a0a0..dfdbf1e2c6f8 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -46,6 +46,7 @@
 #include "qemu/cutils.h"
 #include "trace.h"
 #include "hw/ppc/fdt.h"
+#include "target/ppc/cpu-models.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
         kvm_cpu_synchronize_state(cs);
 
         env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
+
+        /* Enable DECR interrupt */
+        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
+            env->spr[SPR_LPCR] |= LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
+        }
+
         env->nip = start;
         env->gpr[3] = r3;
         cs->halted = 0;
@@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
      * no need to bother with specific bits, we just clear it.
      */
     env->msr = 0;
+
+    /* Don't let the decremeter run on a CPU being stopped. This could
+     * deliver an interrupt on a dying CPU and crash the guest.
+     */
+    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
+        env->spr[SPR_LPCR] &= ~LPCR_DEE;
+    } else {
+        /* P7 and P8 both have same bit for DECR */
+        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
+    }
 }
 
 static inline int sysparm_st(target_ulong addr, target_ulong len,
diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 0d6379fcc5b4..1a62159843e7 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
     CPUPPCState *env = &cpu->env;
     ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
     ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
+    CPUState *cs = CPU(cpu);
 
     cpu->vhyp = vhyp;
 
@@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
         } else {
             lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
         }
-        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
+        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
                                LPCR_OEE;
+
+        /* Only let the decremeter wake up the boot CPU. The RTAS
+         * command start-cpu will enable it on secondaries.
+         */
+        if (cs == first_cpu) {
+            lpcr->default_value |= LPCR_DEE;
+        }
         break;
     default:
         /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
@@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
          * will work as expected for both implementations
          */
         lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
-                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
+                               LPCR_P8_PECE4;
+
+        /* Only let the decremeter wake up the boot CPU. The RTAS
+         * command start-cpu will enable it on secondaries.
+         */
+        if (cs == first_cpu) {
+            lpcr->default_value |= LPCR_P8_PECE3;
+        }
     }
 
     /* We should be followed by a CPU reset but update the active value
-- 
2.13.6

* [Qemu-devel] [PATCH v2 3/4] spapr/rtas: fix reboot of a SMP TCG guest
  2017-10-09 15:49 [Qemu-devel] [PATCH v2 0/4] disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 1/4] target/ppc: export ppc_cpu_pvr_match() helper Cédric Le Goater
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
@ 2017-10-09 15:49 ` Cédric Le Goater
  2017-10-12  4:34   ` Nikunj A Dadhania
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 4/4] spapr/rtas: do not reset the MSR in stop-self command Cédric Le Goater
  3 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2017-10-09 15:49 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Nikunj A Dadhania,
	Benjamin Herrenschmidt
  Cc: Cédric Le Goater

Just like for hot-unplugged CPUs, when a guest is rebooted, the
secondary CPUs can be awakened by the decrementer and start entering
SLOF at the same time as the boot CPU.

To be safe, let's disable the decrementer interrupt in the LPCR for
the secondaries.

Based on previous work from Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr_cpu_core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 37beb56e8b18..112868dc39d5 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -20,6 +20,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/hw_accel.h"
 #include "qemu/error-report.h"
+#include "target/ppc/cpu-models.h"
 
 void spapr_cpu_parse_features(sPAPRMachineState *spapr)
 {
@@ -86,6 +87,17 @@ static void spapr_cpu_reset(void *opaque)
     cs->halted = 1;
 
     env->spr[SPR_HIOR] = 0;
+
+    /* Don't let the decremeter wake up CPUs other than the boot
+     * CPUs. this can cause issues when rebooting the guest */
+    if (cs != first_cpu) {
+        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
+            env->spr[SPR_LPCR] &= ~LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
+        }
+    }
 }
 
 static void spapr_cpu_destroy(PowerPCCPU *cpu)
-- 
2.13.6

* [Qemu-devel] [PATCH v2 4/4] spapr/rtas: do not reset the MSR in stop-self command
  2017-10-09 15:49 [Qemu-devel] [PATCH v2 0/4] disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
                   ` (2 preceding siblings ...)
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 3/4] spapr/rtas: fix reboot of a SMP TCG guest Cédric Le Goater
@ 2017-10-09 15:49 ` Cédric Le Goater
  3 siblings, 0 replies; 14+ messages in thread
From: Cédric Le Goater @ 2017-10-09 15:49 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Nikunj A Dadhania,
	Benjamin Herrenschmidt
  Cc: Cédric Le Goater

When a CPU is stopped with the 'stop-self' RTAS call, its state
'halted' is switched to 1 and, in this case, the MSR is not taken into
account anymore in the cpu_has_work() routine. Only the pending
hardware interrupts are checked with their LPCR:PECE* enablement bit.

The CPU is now also protected from the decrementer interrupt by the
LPCR:PECE* bits which are disabled in the 'stop-self' RTAS
call. Resetting the MSR is pointless.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_rtas.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index dfdbf1e2c6f8..dc825bc58263 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -210,16 +210,6 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 
     cs->halted = 1;
     qemu_cpu_kick(cs);
-    /*
-     * While stopping a CPU, the guest calls H_CPPR which
-     * effectively disables interrupts on XICS level.
-     * However decrementer interrupts in TCG can still
-     * wake the CPU up so here we disable interrupts in MSR
-     * as well.
-     * As rtas_start_cpu() resets the whole MSR anyway, there is
-     * no need to bother with specific bits, we just clear it.
-     */
-    env->msr = 0;
 
     /* Don't let the decremeter run on a CPU being stopped. This could
      * deliver an interrupt on a dying CPU and crash the guest.
-- 
2.13.6

* Re: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
@ 2017-10-10  8:08   ` Benjamin Herrenschmidt
  2017-10-10 15:56     ` Cédric Le Goater
  2017-10-11  6:45   ` David Gibson
  1 sibling, 1 reply; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2017-10-10  8:08 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc, qemu-devel, David Gibson,
	Nikunj A Dadhania

On Mon, 2017-10-09 at 17:49 +0200, Cédric Le Goater wrote:
> When a CPU is stopped with the 'stop-self' RTAS call, its state
> 'halted' is switched to 1 and, in this case, the MSR is not taken into
> account anymore in the cpu_has_work() routine. Only the pending
> hardware interrupts are checked with their LPCR:PECE* enablement bit.
> 
> If the DECR timer fires after 'stop-self' is called and before the CPU
> 'stop' state is reached, the nearly-dead CPU will have some work to do
> and the guest will crash. This case happens very frequently with the
> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
> occasionally fired but after 'stop' state, so no work is to be done
> and the guest survives.
> 
> I suspect there is a race between the QEMU mainloop triggering the
> timers and the TCG CPU thread but I could not quite identify the root
> cause. To be safe, let's disable the decrementer interrupt in the LPCR
> when the CPU is halted and reenable it when the CPU is restarted.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

We should disable external interrupts and doorbells too, no? I.e., we
could in fact clear all of the PECE bits.
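
A rough sketch of what clearing every wake-up enable bit could look
like, assuming a single mask covering all of them. The SKETCH_PECE_*
names and values are placeholders made up for this example and do not
match the real LPCR layout of any CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder wake-up enable bits; not QEMU's LPCR definitions. */
#define SKETCH_PECE_DOORBELL  (1ull << 0)
#define SKETCH_PECE_EXTERNAL  (1ull << 1)
#define SKETCH_PECE_DECR      (1ull << 2)
#define SKETCH_PECE_ALL \
    (SKETCH_PECE_DOORBELL | SKETCH_PECE_EXTERNAL | SKETCH_PECE_DECR)

/* On stop-self: drop every wake-up source, not only the decrementer. */
static uint64_t sketch_lpcr_on_stop(uint64_t lpcr)
{
    uint64_t result = lpcr & ~SKETCH_PECE_ALL;

    assert((result & SKETCH_PECE_ALL) == 0);
    return result;
}

/* On start-cpu: restore every wake-up source for the restarted CPU. */
static uint64_t sketch_lpcr_on_start(uint64_t lpcr)
{
    return lpcr | SKETCH_PECE_ALL;
}
```

Bits outside the mask are left untouched in both directions, so the
rest of the LPCR keeps its value across a stop/start cycle.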

> ---
> 
> Changes in v2:
> 
>  - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
>  - removed the LPCR:PECE* enablement bit when the CPU is initialized
>    if it is a secondary
> 
>  hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
>  target/ppc/translate_init.c | 19 +++++++++++++++++--
>  2 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index cdf0b607a0a0..dfdbf1e2c6f8 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -46,6 +46,7 @@
>  #include "qemu/cutils.h"
>  #include "trace.h"
>  #include "hw/ppc/fdt.h"
> +#include "target/ppc/cpu-models.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>          kvm_cpu_synchronize_state(cs);
>  
>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
> +
> +        /* Enable DECR interrupt */
> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> +            env->spr[SPR_LPCR] |= LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
> +        }
> +
>          env->nip = start;
>          env->gpr[3] = r3;
>          cs->halted = 0;
> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>       * no need to bother with specific bits, we just clear it.
>       */
>      env->msr = 0;
> +
> +    /* Don't let the decremeter run on a CPU being stopped. This could
> +     * deliver an interrupt on a dying CPU and crash the guest.
> +     */
> +    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> +        env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +    } else {
> +        /* P7 and P8 both have same bit for DECR */
> +        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +    }
>  }
>  
>  static inline int sysparm_st(target_ulong addr, target_ulong len,
> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> index 0d6379fcc5b4..1a62159843e7 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>      CPUPPCState *env = &cpu->env;
>      ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
>      ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
> +    CPUState *cs = CPU(cpu);
>  
>      cpu->vhyp = vhyp;
>  
> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>          } else {
>              lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
>          }
> -        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
> +        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
>                                 LPCR_OEE;
> +
> +        /* Only let the decremeter wake up the boot CPU. The RTAS
> +         * command start-cpu will enable it on secondaries.
> +         */
> +        if (cs == first_cpu) {
> +            lpcr->default_value |= LPCR_DEE;
> +        }
>          break;
>      default:
>          /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>           * will work as expected for both implementations
>           */
>          lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
> -                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
> +                               LPCR_P8_PECE4;
> +
> +        /* Only let the decremeter wake up the boot CPU. The RTAS
> +         * command start-cpu will enable it on secondaries.
> +         */
> +        if (cs == first_cpu) {
> +            lpcr->default_value |= LPCR_P8_PECE3;
> +        }
>      }
>  
>      /* We should be followed by a CPU reset but update the active value

* Re: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  2017-10-10  8:08   ` Benjamin Herrenschmidt
@ 2017-10-10 15:56     ` Cédric Le Goater
  0 siblings, 0 replies; 14+ messages in thread
From: Cédric Le Goater @ 2017-10-10 15:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, qemu-ppc, qemu-devel, David Gibson,
	Nikunj A Dadhania

On 10/10/2017 10:08 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-10-09 at 17:49 +0200, Cédric Le Goater wrote:
>> When a CPU is stopped with the 'stop-self' RTAS call, its state
>> 'halted' is switched to 1 and, in this case, the MSR is not taken into
>> account anymore in the cpu_has_work() routine. Only the pending
>> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>>
>> If the DECR timer fires after 'stop-self' is called and before the CPU
>> 'stop' state is reached, the nearly-dead CPU will have some work to do
>> and the guest will crash. This case happens very frequently with the
>> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>> occasionally fired but after 'stop' state, so no work is to be done
>> and the guest survives.
>>
>> I suspect there is a race between the QEMU mainloop triggering the
>> timers and the TCG CPU thread but I could not quite identify the root
>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>> when the CPU is halted and reenable it when the CPU is restarted.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> We should disable external interrupts and doorbells too, no? I.e., we
> could in fact clear all of the PECE bits.

And enable them all in 'start-cpu' for the secondaries, then?

C. 


> 
>> ---
>>
>> Changes in v2:
>>
>>  - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
>>  - removed the LPCR:PECE* enablement bit when the CPU is initialized
>>    if it is a secondary
>>
>>  hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
>>  target/ppc/translate_init.c | 19 +++++++++++++++++--
>>  2 files changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index cdf0b607a0a0..dfdbf1e2c6f8 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -46,6 +46,7 @@
>>  #include "qemu/cutils.h"
>>  #include "trace.h"
>>  #include "hw/ppc/fdt.h"
>> +#include "target/ppc/cpu-models.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>>          kvm_cpu_synchronize_state(cs);
>>  
>>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
>> +
>> +        /* Enable DECR interrupt */
>> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>> +            env->spr[SPR_LPCR] |= LPCR_DEE;
>> +        } else {
>> +            /* P7 and P8 both have same bit for DECR */
>> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
>> +        }
>> +
>>          env->nip = start;
>>          env->gpr[3] = r3;
>>          cs->halted = 0;
>> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>       * no need to bother with specific bits, we just clear it.
>>       */
>>      env->msr = 0;
>> +
>> +    /* Don't let the decremeter run on a CPU being stopped. This could
>> +     * deliver an interrupt on a dying CPU and crash the guest.
>> +     */
>> +    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>> +        env->spr[SPR_LPCR] &= ~LPCR_DEE;
>> +    } else {
>> +        /* P7 and P8 both have same bit for DECR */
>> +        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
>> +    }
>>  }
>>  
>>  static inline int sysparm_st(target_ulong addr, target_ulong len,
>> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
>> index 0d6379fcc5b4..1a62159843e7 100644
>> --- a/target/ppc/translate_init.c
>> +++ b/target/ppc/translate_init.c
>> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>      CPUPPCState *env = &cpu->env;
>>      ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
>>      ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
>> +    CPUState *cs = CPU(cpu);
>>  
>>      cpu->vhyp = vhyp;
>>  
>> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>          } else {
>>              lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
>>          }
>> -        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
>> +        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
>>                                 LPCR_OEE;
>> +
>> +        /* Only let the decremeter wake up the boot CPU. The RTAS
>> +         * command start-cpu will enable it on secondaries.
>> +         */
>> +        if (cs == first_cpu) {
>> +            lpcr->default_value |= LPCR_DEE;
>> +        }
>>          break;
>>      default:
>>          /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
>> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>           * will work as expected for both implementations
>>           */
>>          lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
>> -                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
>> +                               LPCR_P8_PECE4;
>> +
>> +        /* Only let the decremeter wake up the boot CPU. The RTAS
>> +         * command start-cpu will enable it on secondaries.
>> +         */
>> +        if (cs == first_cpu) {
>> +            lpcr->default_value |= LPCR_P8_PECE3;
>> +        }
>>      }
>>  
>>      /* We should be followed by a CPU reset but update the active value

* Re: [Qemu-devel] [PATCH v2 1/4] target/ppc: export ppc_cpu_pvr_match() helper
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 1/4] target/ppc: export ppc_cpu_pvr_match() helper Cédric Le Goater
@ 2017-10-11  6:41   ` David Gibson
  0 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2017-10-11  6:41 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Nikunj A Dadhania, Benjamin Herrenschmidt

On Mon, Oct 09, 2017 at 05:49:27PM +0200, Cédric Le Goater wrote:
> It will be used to test the family of a CPU.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

I don't think this is useful, see comments on later patches.

> ---
>  include/hw/ppc/ppc.h | 1 +
>  target/ppc/machine.c | 5 +++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
> index 4e7fe110d67b..01e32fa03d2e 100644
> --- a/include/hw/ppc/ppc.h
> +++ b/include/hw/ppc/ppc.h
> @@ -107,4 +107,5 @@ enum {
>  void ppc_booke_timers_init(PowerPCCPU *cpu, uint32_t freq, uint32_t flags);
>  
>  void ppc_cpu_parse_features(const char *cpu_model);
> +bool ppc_cpu_pvr_match(PowerPCCPU *cpu, uint32_t pvr);
>  #endif
> diff --git a/target/ppc/machine.c b/target/ppc/machine.c
> index 384caee800e6..6085a8c25fd3 100644
> --- a/target/ppc/machine.c
> +++ b/target/ppc/machine.c
> @@ -4,6 +4,7 @@
>  #include "exec/exec-all.h"
>  #include "hw/hw.h"
>  #include "hw/boards.h"
> +#include "hw/ppc/ppc.h"
>  #include "sysemu/kvm.h"
>  #include "helper_regs.h"
>  #include "mmu-hash64.h"
> @@ -210,7 +211,7 @@ static int cpu_pre_save(void *opaque)
>   * between sufficiently similar PVRs, as determined by the CPU class's
>   * pvr_match() hook.
>   */
> -static bool pvr_match(PowerPCCPU *cpu, uint32_t pvr)
> +bool ppc_cpu_pvr_match(PowerPCCPU *cpu, uint32_t pvr)
>  {
>      PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>  
> @@ -247,7 +248,7 @@ static int cpu_post_load(void *opaque, int version_id)
>      } else
>  #endif
>      {
> -        if (!pvr_match(cpu, env->spr[SPR_PVR])) {
> +        if (!ppc_cpu_pvr_match(cpu, env->spr[SPR_PVR])) {
>              return -1;
>          }
>      }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
  2017-10-10  8:08   ` Benjamin Herrenschmidt
@ 2017-10-11  6:45   ` David Gibson
  2017-10-11 11:55     ` Cédric Le Goater
  1 sibling, 1 reply; 14+ messages in thread
From: David Gibson @ 2017-10-11  6:45 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Nikunj A Dadhania, Benjamin Herrenschmidt

On Mon, Oct 09, 2017 at 05:49:28PM +0200, Cédric Le Goater wrote:
> When a CPU is stopped with the 'stop-self' RTAS call, its state
> 'halted' is switched to 1 and, in this case, the MSR is not taken into
> account anymore in the cpu_has_work() routine. Only the pending
> hardware interrupts are checked with their LPCR:PECE* enablement bit.
> 
> If the DECR timer fires after 'stop-self' is called and before the CPU
> 'stop' state is reached, the nearly-dead CPU will have some work to do
> and the guest will crash. This case happens very frequently with the
> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
> occasionally fired but after 'stop' state, so no work is to be done
> and the guest survives.
> 
> I suspect there is a race between the QEMU mainloop triggering the
> timers and the TCG CPU thread but I could not quite identify the root
> cause. To be safe, let's disable the decrementer interrupt in the LPCR
> when the CPU is halted and reenable it when the CPU is restarted.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
> 
> Changes in v2:
> 
>  - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
>  - removed the LPCR:PECE* enablement bit when the CPU is initialized
>    if it is a secondary
> 
>  hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
>  target/ppc/translate_init.c | 19 +++++++++++++++++--
>  2 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index cdf0b607a0a0..dfdbf1e2c6f8 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -46,6 +46,7 @@
>  #include "qemu/cutils.h"
>  #include "trace.h"
>  #include "hw/ppc/fdt.h"
> +#include "target/ppc/cpu-models.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>          kvm_cpu_synchronize_state(cs);
>  
>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
> +
> +        /* Enable DECR interrupt */
> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {

Sorry, I didn't reply to your earlier mail in time.  Going via the PVR
in this way seems bonkers to me - I like it even less than checking
the mmu type.  After all, classifying a bunch of precise models (PVRs)
together by behaviour is kind of exactly what the CPU classes are for,
so using object_dynamic_cast() (==instance_of) is a better idea here.
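
As a rough illustration of what an instance_of style test buys over
comparing PVRs, here is a toy model of a class hierarchy walk. This is
not QEMU's QOM code; the sketch_type structure and the type names
("p9-family", "p9-dd1", ...) are hypothetical and exist only for the
example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy type descriptor: each type names its parent (NULL at the root). */
struct sketch_type {
    const char *name;
    const struct sketch_type *parent;
};

/* True if 'type' is 'wanted' or derives from it (an instance_of test). */
static bool sketch_is_a(const struct sketch_type *type, const char *wanted)
{
    for (; type != NULL; type = type->parent) {
        if (strcmp(type->name, wanted) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical hierarchy: a precise model sits under its family class. */
static const struct sketch_type sketch_cpu_base = { "cpu", NULL };
static const struct sketch_type sketch_p9_family = { "p9-family", &sketch_cpu_base };
static const struct sketch_type sketch_p9_dd1 = { "p9-dd1", &sketch_p9_family };
static const struct sketch_type sketch_p8_family = { "p8-family", &sketch_cpu_base };
```

The caller asks one question about the family and never has to list
the individual models that belong to it.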

> +            env->spr[SPR_LPCR] |= LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
> +        }
> +
>          env->nip = start;
>          env->gpr[3] = r3;
>          cs->halted = 0;

The other option I'm wondering about here is to actually add a
"shutdown" (or something) method to the cpu class, which does whatever
is necessary to put the vcpu into a quiescent state that won't be
woken up unless it's specifically requested.
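
Something along these lines, as a very rough sketch. None of these
names exist in the tree; sketch_cpu_class, quiesce/wake and the
SKETCH_DECR_WAKE bit value are placeholders. The point is only that
each CPU family would supply its own hooks, so the RTAS code would not
need to know which LPCR bit to flip.

```c
#include <stdint.h>

struct sketch_vcpu;

/* Per-family hooks, filled in once by each CPU class. */
struct sketch_cpu_class {
    void (*quiesce)(struct sketch_vcpu *cpu);  /* stop-self path */
    void (*wake)(struct sketch_vcpu *cpu);     /* start-cpu path */
};

struct sketch_vcpu {
    const struct sketch_cpu_class *cls;
    uint64_t lpcr;
    int halted;
};

/* Placeholder bit value, assumed for the example only. */
#define SKETCH_DECR_WAKE (1ULL << 13)

static void sketch_p9_quiesce(struct sketch_vcpu *cpu)
{
    cpu->lpcr &= ~SKETCH_DECR_WAKE;  /* decrementer can no longer wake it */
    cpu->halted = 1;
}

static void sketch_p9_wake(struct sketch_vcpu *cpu)
{
    cpu->lpcr |= SKETCH_DECR_WAKE;   /* allow decrementer wake-ups again */
    cpu->halted = 0;
}

static const struct sketch_cpu_class sketch_p9_class = {
    .quiesce = sketch_p9_quiesce,
    .wake = sketch_p9_wake,
};
```

rtas_stop_self() would then just call the quiesce hook, and
rtas_start_cpu() the wake hook, whatever the CPU family is.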

> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>       * no need to bother with specific bits, we just clear it.
>       */
>      env->msr = 0;
> +
> +    /* Don't let the decremeter run on a CPU being stopped. This could
> +     * deliver an interrupt on a dying CPU and crash the guest.
> +     */
> +    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> +        env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +    } else {
> +        /* P7 and P8 both have same bit for DECR */
> +        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +    }
>  }
>  
>  static inline int sysparm_st(target_ulong addr, target_ulong len,
> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> index 0d6379fcc5b4..1a62159843e7 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>      CPUPPCState *env = &cpu->env;
>      ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
>      ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
> +    CPUState *cs = CPU(cpu);
>  
>      cpu->vhyp = vhyp;
>  
> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>          } else {
>              lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
>          }
> -        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
> +        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
>                                 LPCR_OEE;

But I guess we'd also need a "set_papr" method to go with that.

> +
> +        /* Only let the decremeter wake up the boot CPU. The RTAS
> +         * command start-cpu will enable it on secondaries.
> +         */
> +        if (cs == first_cpu) {
> +            lpcr->default_value |= LPCR_DEE;
> +        }
>          break;
>      default:
>          /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>           * will work as expected for both implementations
>           */
>          lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
> -                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
> +                               LPCR_P8_PECE4;
> +
> +        /* Only let the decremeter wake up the boot CPU. The RTAS
> +         * command start-cpu will enable it on secondaries.
> +         */
> +        if (cs == first_cpu) {
> +            lpcr->default_value |= LPCR_P8_PECE3;
> +        }
>      }
>  
>      /* We should be followed by a CPU reset but update the active value

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  2017-10-11  6:45   ` David Gibson
@ 2017-10-11 11:55     ` Cédric Le Goater
  2017-10-11 22:46       ` David Gibson
  0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2017-10-11 11:55 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Nikunj A Dadhania, Benjamin Herrenschmidt

On 10/11/2017 08:45 AM, David Gibson wrote:
> On Mon, Oct 09, 2017 at 05:49:28PM +0200, Cédric Le Goater wrote:
>> When a CPU is stopped with the 'stop-self' RTAS call, its state
>> 'halted' is switched to 1 and, in this case, the MSR is not taken into
>> account anymore in the cpu_has_work() routine. Only the pending
>> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>>
>> If the DECR timer fires after 'stop-self' is called and before the CPU
>> 'stop' state is reached, the nearly-dead CPU will have some work to do
>> and the guest will crash. This case happens very frequently with the
>> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>> occasionally fired but after 'stop' state, so no work is to be done
>> and the guest survives.
>>
>> I suspect there is a race between the QEMU mainloop triggering the
>> timers and the TCG CPU thread but I could not quite identify the root
>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>> when the CPU is halted and reenable it when the CPU is restarted.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>
>> Changes in v2:
>>
>>  - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
>>  - removed the LPCR:PECE* enablement bit when the CPU is initialized
>>    if it is a secondary
>>
>>  hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
>>  target/ppc/translate_init.c | 19 +++++++++++++++++--
>>  2 files changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index cdf0b607a0a0..dfdbf1e2c6f8 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -46,6 +46,7 @@
>>  #include "qemu/cutils.h"
>>  #include "trace.h"
>>  #include "hw/ppc/fdt.h"
>> +#include "target/ppc/cpu-models.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>>          kvm_cpu_synchronize_state(cs);
>>  
>>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
>> +
>> +        /* Enable DECR interrupt */
>> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> 
> Sorry, I didn't reply to your earlier mail in time.  Going via the PVR
> in this way seems bonkers to me - I like it even less than checking
> the mmu type.  After all, classifying a bunch of precise models (PVRs)
> together by behaviour is kind of exactly what the CPU classes are for,
> so using object_dynamic_cast() (==instance_of) is a better idea here.

Hmm, which type should I use? I don't think we have any TYPE_POWER9*
that we could use for an object_dynamic_cast(). I could probably use
the name and strcmp("power9"), but that looks ugly.

The only thing we have is "CPU_POWERPC_POWER9_BASE", and it only
applies to the PVR.

Maybe I don't understand your idea.
 
>> +            env->spr[SPR_LPCR] |= LPCR_DEE;
>> +        } else {
>> +            /* P7 and P8 both have same bit for DECR */
>> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
>> +        }
>> +
>>          env->nip = start;
>>          env->gpr[3] = r3;
>>          cs->halted = 0;
> 
> The other option I'm wondering about here is to actually add a
> "shutdown" (or something) method to the cpu class, which does whatever
> is necessary to put the vcpu into a quiescent state that won't be
> woken up unless it's specifically requested.

yes. That is a good idea. 

Thanks,

C. 


>> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>       * no need to bother with specific bits, we just clear it.
>>       */
>>      env->msr = 0;
>> +
>> +    /* Don't let the decremeter run on a CPU being stopped. This could
>> +     * deliver an interrupt on a dying CPU and crash the guest.
>> +     */
>> +    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>> +        env->spr[SPR_LPCR] &= ~LPCR_DEE;
>> +    } else {
>> +        /* P7 and P8 both have same bit for DECR */
>> +        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
>> +    }
>>  }
>>  
>>  static inline int sysparm_st(target_ulong addr, target_ulong len,
>> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
>> index 0d6379fcc5b4..1a62159843e7 100644
>> --- a/target/ppc/translate_init.c
>> +++ b/target/ppc/translate_init.c
>> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>      CPUPPCState *env = &cpu->env;
>>      ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
>>      ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
>> +    CPUState *cs = CPU(cpu);
>>  
>>      cpu->vhyp = vhyp;
>>  
>> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>          } else {
>>              lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
>>          }
>> -        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
>> +        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
>>                                 LPCR_OEE;
> 
> But I guess we'd also need a "set_papr" method to go with that.
> 
>> +
>> +        /* Only let the decremeter wake up the boot CPU. The RTAS
>> +         * command start-cpu will enable it on secondaries.
>> +         */
>> +        if (cs == first_cpu) {
>> +            lpcr->default_value |= LPCR_DEE;
>> +        }
>>          break;
>>      default:
>>          /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
>> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>           * will work as expected for both implementations
>>           */
>>          lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
>> -                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
>> +                               LPCR_P8_PECE4;
>> +
>> +        /* Only let the decremeter wake up the boot CPU. The RTAS
>> +         * command start-cpu will enable it on secondaries.
>> +         */
>> +        if (cs == first_cpu) {
>> +            lpcr->default_value |= LPCR_P8_PECE3;
>> +        }
>>      }
>>  
>>      /* We should be followed by a CPU reset but update the active value
> 

* Re: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  2017-10-11 11:55     ` Cédric Le Goater
@ 2017-10-11 22:46       ` David Gibson
  2017-10-12  9:25         ` Cédric Le Goater
  0 siblings, 1 reply; 14+ messages in thread
From: David Gibson @ 2017-10-11 22:46 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Nikunj A Dadhania, Benjamin Herrenschmidt

On Wed, Oct 11, 2017 at 01:55:20PM +0200, Cédric Le Goater wrote:
> On 10/11/2017 08:45 AM, David Gibson wrote:
> > On Mon, Oct 09, 2017 at 05:49:28PM +0200, Cédric Le Goater wrote:
> >> When a CPU is stopped with the 'stop-self' RTAS call, its state
> >> 'halted' is switched to 1 and, in this case, the MSR is not taken into
> >> account anymore in the cpu_has_work() routine. Only the pending
> >> hardware interrupts are checked with their LPCR:PECE* enablement bit.
> >>
> >> If the DECR timer fires after 'stop-self' is called and before the CPU
> >> 'stop' state is reached, the nearly-dead CPU will have some work to do
> >> and the guest will crash. This case happens very frequently with the
> >> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
> >> occasionally fired but after 'stop' state, so no work is to be done
> >> and the guest survives.
> >>
> >> I suspect there is a race between the QEMU mainloop triggering the
> >> timers and the TCG CPU thread but I could not quite identify the root
> >> cause. To be safe, let's disable the decrementer interrupt in the LPCR
> >> when the CPU is halted and reenable it when the CPU is restarted.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>
> >> Changes in v2:
> >>
> >>  - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
> >>  - removed the LPCR:PECE* enablement bit when the CPU is initialized
> >>    if it is a secondary
> >>
> >>  hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
> >>  target/ppc/translate_init.c | 19 +++++++++++++++++--
> >>  2 files changed, 37 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index cdf0b607a0a0..dfdbf1e2c6f8 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -46,6 +46,7 @@
> >>  #include "qemu/cutils.h"
> >>  #include "trace.h"
> >>  #include "hw/ppc/fdt.h"
> >> +#include "target/ppc/cpu-models.h"
> >>  
> >>  static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> >>                                     uint32_t token, uint32_t nargs,
> >> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
> >>          kvm_cpu_synchronize_state(cs);
> >>  
> >>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
> >> +
> >> +        /* Enable DECR interrupt */
> >> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> > 
> > Sorry, I didn't reply to your earlier mail in time.  Going via the PVR
> > in this way seems bonkers to me - I like it even less than checking
> > the mmu type.  After all, classifying a bunch of precise models (PVRs)
> > together by behaviour is kind of exactly what the CPU classes are for,
> > so using object_dynamic_cast() (==instance_of) is a better idea here.
> 
> Hmm, which type should I use? I don't think we have any TYPE_POWER9*
> that we could use for an object_dynamic_cast(). I could probably use
> the name and strcmp("power9"), but that looks ugly.

Actually there is, but, yeah, it's a lot less obvious than I thought.
It's constructed by the POWERPC_FAMILY macro and will be
"POWER9-family-powerpc64-cpu".

> The only thing we have is "CPU_POWERPC_POWER9_BASE", and it only
> applies to the PVR.
> 
> Maybe I don't understand your idea.

Urgh, sorry.  This got much muckier than I thought it would be.  I
think maybe it's best to go back to the mmu type test, and later on we
can fix up both the previously existing test like that, and the new
one to something better.

> >> +            env->spr[SPR_LPCR] |= LPCR_DEE;
> >> +        } else {
> >> +            /* P7 and P8 both have same bit for DECR */
> >> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
> >> +        }
> >> +
> >>          env->nip = start;
> >>          env->gpr[3] = r3;
> >>          cs->halted = 0;
> > 
> > The other option I'm wondering about here is to actually add a
> > "shutdown" (or something) method to the cpu class, which does whatever
> > is necessary to put the vcpu into a quiescent state that won't be
> > woken up unless it's specifically requested.
> 
> yes. That is a good idea. 
> 
> Thanks,
> 
> C. 
> 
> 
> >> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> >>       * no need to bother with specific bits, we just clear it.
> >>       */
> >>      env->msr = 0;
> >> +
> >> +    /* Don't let the decremeter run on a CPU being stopped. This could
> >> +     * deliver an interrupt on a dying CPU and crash the guest.
> >> +     */
> >> +    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> >> +        env->spr[SPR_LPCR] &= ~LPCR_DEE;
> >> +    } else {
> >> +        /* P7 and P8 both have same bit for DECR */
> >> +        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> >> +    }
> >>  }
> >>  
> >>  static inline int sysparm_st(target_ulong addr, target_ulong len,
> >> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> >> index 0d6379fcc5b4..1a62159843e7 100644
> >> --- a/target/ppc/translate_init.c
> >> +++ b/target/ppc/translate_init.c
> >> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
> >>      CPUPPCState *env = &cpu->env;
> >>      ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
> >>      ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
> >> +    CPUState *cs = CPU(cpu);
> >>  
> >>      cpu->vhyp = vhyp;
> >>  
> >> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
> >>          } else {
> >>              lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
> >>          }
> >> -        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
> >> +        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
> >>                                 LPCR_OEE;
> > 
> > But I guess we'd also need a "set_papr" method to go with that.
> > 
> >> +
> >> +        /* Only let the decremeter wake up the boot CPU. The RTAS
> >> +         * command start-cpu will enable it on secondaries.
> >> +         */
> >> +        if (cs == first_cpu) {
> >> +            lpcr->default_value |= LPCR_DEE;
> >> +        }
> >>          break;
> >>      default:
> >>          /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
> >> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
> >>           * will work as expected for both implementations
> >>           */
> >>          lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
> >> -                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
> >> +                               LPCR_P8_PECE4;
> >> +
> >> +        /* Only let the decremeter wake up the boot CPU. The RTAS
> >> +         * command start-cpu will enable it on secondaries.
> >> +         */
> >> +        if (cs == first_cpu) {
> >> +            lpcr->default_value |= LPCR_P8_PECE3;
> >> +        }
> >>      }
> >>  
> >>      /* We should be followed by a CPU reset but update the active value
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v2 3/4] spapr/rtas: fix reboot of a SMP TCG guest
  2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 3/4] spapr/rtas: fix reboot of a SMP TCG guest Cédric Le Goater
@ 2017-10-12  4:34   ` Nikunj A Dadhania
  0 siblings, 0 replies; 14+ messages in thread
From: Nikunj A Dadhania @ 2017-10-12  4:34 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc, qemu-devel, David Gibson,
	Benjamin Herrenschmidt

Cédric Le Goater <clg@kaod.org> writes:

> Just like for hot unplugged CPUs, when a guest is rebooted, the
> secondary CPUs can be awaken by the decrementer and start entering
> SLOF at the same time the boot CPU is.
>
> To be safe, let's disable the decrementer interrupt in the LPCR for
> the secondaries.
>
> Based on previous work from Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

> ---
>  hw/ppc/spapr_cpu_core.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 37beb56e8b18..112868dc39d5 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -20,6 +20,7 @@
>  #include "sysemu/numa.h"
>  #include "sysemu/hw_accel.h"
>  #include "qemu/error-report.h"
> +#include "target/ppc/cpu-models.h"
>
>  void spapr_cpu_parse_features(sPAPRMachineState *spapr)
>  {
> @@ -86,6 +87,17 @@ static void spapr_cpu_reset(void *opaque)
>      cs->halted = 1;
>
>      env->spr[SPR_HIOR] = 0;
> +
> +    /* Don't let the decremeter wake up CPUs other than the boot
> +     * CPUs. this can cause issues when rebooting the guest */
> +    if (cs != first_cpu) {
> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>  }
>
>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
> -- 
> 2.13.6

* Re: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  2017-10-11 22:46       ` David Gibson
@ 2017-10-12  9:25         ` Cédric Le Goater
  2017-10-12  9:29           ` Cédric Le Goater
  0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2017-10-12  9:25 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Nikunj A Dadhania, Benjamin Herrenschmidt

On 10/12/2017 12:46 AM, David Gibson wrote:
> On Wed, Oct 11, 2017 at 01:55:20PM +0200, Cédric Le Goater wrote:
>> On 10/11/2017 08:45 AM, David Gibson wrote:
>>> On Mon, Oct 09, 2017 at 05:49:28PM +0200, Cédric Le Goater wrote:
>>>> When a CPU is stopped with the 'stop-self' RTAS call, its state
>>>> 'halted' is switched to 1 and, in this case, the MSR is not taken into
>>>> account anymore in the cpu_has_work() routine. Only the pending
>>>> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>>>>
>>>> If the DECR timer fires after 'stop-self' is called and before the CPU
>>>> 'stop' state is reached, the nearly-dead CPU will have some work to do
>>>> and the guest will crash. This case happens very frequently with the
>>>> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>>>> occasionally fired but after 'stop' state, so no work is to be done
>>>> and the guest survives.
>>>>
>>>> I suspect there is a race between the QEMU mainloop triggering the
>>>> timers and the TCG CPU thread but I could not quite identify the root
>>>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>>>> when the CPU is halted and reenable it when the CPU is restarted.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>
>>>> Changes in v2:
>>>>
>>>>  - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
>>>>  - removed the LPCR:PECE* enablement bit when the CPU is initialized
>>>>    if it is a secondary
>>>>
>>>>  hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
>>>>  target/ppc/translate_init.c | 19 +++++++++++++++++--
>>>>  2 files changed, 37 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index cdf0b607a0a0..dfdbf1e2c6f8 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -46,6 +46,7 @@
>>>>  #include "qemu/cutils.h"
>>>>  #include "trace.h"
>>>>  #include "hw/ppc/fdt.h"
>>>> +#include "target/ppc/cpu-models.h"
>>>>  
>>>>  static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>>>                                     uint32_t token, uint32_t nargs,
>>>> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>>>>          kvm_cpu_synchronize_state(cs);
>>>>  
>>>>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
>>>> +
>>>> +        /* Enable DECR interrupt */
>>>> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>>>
>>> Sorry, I didn't reply to your earlier mail in time.  Going via the PVR
>>> in this way seems bonkers to me - I like it even less than checking
>>> the mmu type.  After all, classifying a bunch of precise models (PVRs)
>>> together by behaviour is kind of exactly what the CPU classes are for,
>>> so using object_dynamic_cast() (==instance_of) is a better idea here.
>>
>> Hmm, which type should I use? I don't think we have any TYPE_POWER9*
>> that we could use for an object_dynamic_cast(). I could probably use
>> the name and strcmp("power9"), but that looks ugly.
> 
> Actually there is, but, yeah, it's a lot less obvious than I thought.
> It's constructed by the POWERPC_FAMILY macro and will be
> "POWER9-family-powerpc64-cpu".
> 
>> The only thing we have is "CPU_POWERPC_POWER9_BASE", and it only
>> applies to the PVR.
>>
>> Maybe I don't understand your idea.
> 
> Urgh, sorry.  This got much muckier than I thought it would be.  I
> think maybe it's best to go back to the mmu type test, and later on we
> can fix up both the previously existing test like that, and the new
> one to something better.

Given that the bits are the same on all processors, why not just use:

    env->spr[SPR_LPCR] |= LPCR_PECE_L_MASK;

and

    env->spr[SPR_LPCR] &= ~LPCR_PECE_L_MASK;

Thanks,

C. 


>>>> +            env->spr[SPR_LPCR] |= LPCR_DEE;
>>>> +        } else {
>>>> +            /* P7 and P8 both have same bit for DECR */
>>>> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
>>>> +        }
>>>> +
>>>>          env->nip = start;
>>>>          env->gpr[3] = r3;
>>>>          cs->halted = 0;
>>>
>>> The other option I'm wondering about here is to actually add a
>>> "shutdown" (or something) method to the cpu class, which does whatever
>>> is necessary to put the vcpu into a quiescent state that won't be
>>> woken up unless it's specifically requested.
>>
>> yes. That is a good idea. 
>>
>> Thanks,
>>
>> C. 
>>
>>
>>>> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>>>       * no need to bother with specific bits, we just clear it.
>>>>       */
>>>>      env->msr = 0;
>>>> +
>>>> +    /* Don't let the decremeter run on a CPU being stopped. This could
>>>> +     * deliver an interrupt on a dying CPU and crash the guest.
>>>> +     */
>>>> +    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>>>> +        env->spr[SPR_LPCR] &= ~LPCR_DEE;
>>>> +    } else {
>>>> +        /* P7 and P8 both have same bit for DECR */
>>>> +        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
>>>> +    }
>>>>  }
>>>>  
>>>>  static inline int sysparm_st(target_ulong addr, target_ulong len,
>>>> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
>>>> index 0d6379fcc5b4..1a62159843e7 100644
>>>> --- a/target/ppc/translate_init.c
>>>> +++ b/target/ppc/translate_init.c
>>>> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>>>      CPUPPCState *env = &cpu->env;
>>>>      ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
>>>>      ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
>>>> +    CPUState *cs = CPU(cpu);
>>>>  
>>>>      cpu->vhyp = vhyp;
>>>>  
>>>> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>>>          } else {
>>>>              lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
>>>>          }
>>>> -        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
>>>> +        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
>>>>                                 LPCR_OEE;
>>>
>>> But I guess we'd also need a "set_papr" method to go with that.
>>>
>>>> +
>>>> +        /* Only let the decremeter wake up the boot CPU. The RTAS
>>>> +         * command start-cpu will enable it on secondaries.
>>>> +         */
>>>> +        if (cs == first_cpu) {
>>>> +            lpcr->default_value |= LPCR_DEE;
>>>> +        }
>>>>          break;
>>>>      default:
>>>>          /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
>>>> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>>>           * will work as expected for both implementations
>>>>           */
>>>>          lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
>>>> -                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
>>>> +                               LPCR_P8_PECE4;
>>>> +
>>>> +        /* Only let the decremeter wake up the boot CPU. The RTAS
>>>> +         * command start-cpu will enable it on secondaries.
>>>> +         */
>>>> +        if (cs == first_cpu) {
>>>> +            lpcr->default_value |= LPCR_P8_PECE3;
>>>> +        }
>>>>      }
>>>>  
>>>>      /* We should be followed by a CPU reset but update the active value
>>>
>>
> 

* Re: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  2017-10-12  9:25         ` Cédric Le Goater
@ 2017-10-12  9:29           ` Cédric Le Goater
  0 siblings, 0 replies; 14+ messages in thread
From: Cédric Le Goater @ 2017-10-12  9:29 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Nikunj A Dadhania, Benjamin Herrenschmidt

On 10/12/2017 11:25 AM, Cédric Le Goater wrote:
> On 10/12/2017 12:46 AM, David Gibson wrote:
>> On Wed, Oct 11, 2017 at 01:55:20PM +0200, Cédric Le Goater wrote:
>>> On 10/11/2017 08:45 AM, David Gibson wrote:
>>>> On Mon, Oct 09, 2017 at 05:49:28PM +0200, Cédric Le Goater wrote:
>>>>> When a CPU is stopped with the 'stop-self' RTAS call, its state
>>>>> 'halted' is switched to 1 and, in this case, the MSR is not taken into
>>>>> account anymore in the cpu_has_work() routine. Only the pending
>>>>> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>>>>>
>>>>> If the DECR timer fires after 'stop-self' is called and before the CPU
>>>>> 'stop' state is reached, the nearly-dead CPU will have some work to do
>>>>> and the guest will crash. This case happens very frequently with the
>>>>> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>>>>> occasionally fired but after 'stop' state, so no work is to be done
>>>>> and the guest survives.
>>>>>
>>>>> I suspect there is a race between the QEMU mainloop triggering the
>>>>> timers and the TCG CPU thread but I could not quite identify the root
>>>>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>>>>> when the CPU is halted and reenable it when the CPU is restarted.
>>>>>
>>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>>> ---
>>>>>
>>>>> Changes in v2:
>>>>>
>>>>>  - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
>>>>>  - removed the LPCR:PECE* enablement bit when the CPU is initialized
>>>>>    if it is a secondary
>>>>>
>>>>>  hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
>>>>>  target/ppc/translate_init.c | 19 +++++++++++++++++--
>>>>>  2 files changed, 37 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>>> index cdf0b607a0a0..dfdbf1e2c6f8 100644
>>>>> --- a/hw/ppc/spapr_rtas.c
>>>>> +++ b/hw/ppc/spapr_rtas.c
>>>>> @@ -46,6 +46,7 @@
>>>>>  #include "qemu/cutils.h"
>>>>>  #include "trace.h"
>>>>>  #include "hw/ppc/fdt.h"
>>>>> +#include "target/ppc/cpu-models.h"
>>>>>  
>>>>>  static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>>>>                                     uint32_t token, uint32_t nargs,
>>>>> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>>>>>          kvm_cpu_synchronize_state(cs);
>>>>>  
>>>>>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
>>>>> +
>>>>> +        /* Enable DECR interrupt */
>>>>> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>>>>
>>>> Sorry, I didn't reply to your earlier mail in time.  Going via the PVR
>>>> in this way seems bonkers to me - I like it even less than checking
>>>> the mmu type.  After all, classifying a bunch of precise models (PVRs)
>>>> together by behaviour is kind of exactly what the CPU classes are for,
>>>> so using object_dynamic_cast() (==instance_of) is a better idea here.
>>>
>>> Hmm, which type should I use? I don't think we have any TYPE_POWER9*
>>> that we could use for an object_dynamic_cast(). I could probably use
>>> the name and strcmp("power9"), but that looks ugly.
>>
>> Actually there is, but, yeah, it's a lot less obvious than I thought.
>> It's constructed by the POWERPC_FAMILY macro and will be
>> "POWER9-family-powerpc64-cpu".
>>
>>> The only thing we have is "CPU_POWERPC_POWER9_BASE" and it only
>>> applies to the PVR.
>>>
>>> Maybe I don't understand your idea.
>>
>> Urgh, sorry.  This got much muckier than I thought it would be.  I
>> think maybe it's best to go back to the mmu type test, and later on we
>> can fix up both the previously existing test like that, and the new
>> one to something better.
> 
> Given that the bits are the same on all processors, why not just use:

Grummf, P7 reserves bits 47 and 48, so a single mask will not do.

C.


>     env->spr[SPR_LPCR] |= LPCR_PECE_L_MASK;
> 
> and
> 
>     env->spr[SPR_LPCR] &= ~LPCR_PECE_L_MASK;
> 
> Thanks,
> 
> C. 
> 
> 
>>>>> +            env->spr[SPR_LPCR] |= LPCR_DEE;
>>>>> +        } else {
>>>>> +            /* P7 and P8 both have same bit for DECR */
>>>>> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
>>>>> +        }
>>>>> +
>>>>>          env->nip = start;
>>>>>          env->gpr[3] = r3;
>>>>>          cs->halted = 0;
>>>>
>>>> The other option I'm wondering about here is to actually add a
>>>> "shutdown" (or something) method to the cpu class, which does whatever
>>>> is necessary to put the vcpu into a quiescent state that won't be
>>>> woken up unless it's specifically requested.
>>>
>>> Yes, that is a good idea.
>>>
>>> Thanks,
>>>
>>> C. 
>>>
>>>
>>>>> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>>>>       * no need to bother with specific bits, we just clear it.
>>>>>       */
>>>>>      env->msr = 0;
>>>>> +
>>>>> +    /* Don't let the decrementer run on a CPU being stopped. This could
>>>>> +     * deliver an interrupt on a dying CPU and crash the guest.
>>>>> +     */
>>>>> +    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>>>>> +        env->spr[SPR_LPCR] &= ~LPCR_DEE;
>>>>> +    } else {
>>>>> +        /* P7 and P8 both have same bit for DECR */
>>>>> +        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
>>>>> +    }
>>>>>  }
>>>>>  
>>>>>  static inline int sysparm_st(target_ulong addr, target_ulong len,
>>>>> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
>>>>> index 0d6379fcc5b4..1a62159843e7 100644
>>>>> --- a/target/ppc/translate_init.c
>>>>> +++ b/target/ppc/translate_init.c
>>>>> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>>>>      CPUPPCState *env = &cpu->env;
>>>>>      ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
>>>>>      ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
>>>>> +    CPUState *cs = CPU(cpu);
>>>>>  
>>>>>      cpu->vhyp = vhyp;
>>>>>  
>>>>> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>>>>          } else {
>>>>>              lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
>>>>>          }
>>>>> -        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
>>>>> +        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
>>>>>                                 LPCR_OEE;
>>>>
>>>> But I guess we'd also need a "set_papr" method to go with that.
>>>>
>>>>> +
>>>>> +        /* Only let the decrementer wake up the boot CPU. The RTAS
>>>>> +         * command start-cpu will enable it on secondaries.
>>>>> +         */
>>>>> +        if (cs == first_cpu) {
>>>>> +            lpcr->default_value |= LPCR_DEE;
>>>>> +        }
>>>>>          break;
>>>>>      default:
>>>>>          /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
>>>>> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>>>>           * will work as expected for both implementations
>>>>>           */
>>>>>          lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
>>>>> -                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
>>>>> +                               LPCR_P8_PECE4;
>>>>> +
>>>>> +        /* Only let the decrementer wake up the boot CPU. The RTAS
>>>>> +         * command start-cpu will enable it on secondaries.
>>>>> +         */
>>>>> +        if (cs == first_cpu) {
>>>>> +            lpcr->default_value |= LPCR_P8_PECE3;
>>>>> +        }
>>>>>      }
>>>>>  
>>>>>      /* We should be followed by a CPU reset but update the active value
>>>>
>>>
>>
> 

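The "shutdown" method idea raised in the thread can be sketched as a small,
self-contained toy model. This is not QEMU code: ToyCPU, ToyCPUClass,
toy_quiesce(), toy_wake() and toy_has_work() are hypothetical names, and the
bit layout (wake-up enables in LPCR bits 47..51 in IBM numbering, decrementer
at bit 50, P7 reserving bits 47 and 48) is an assumption inferred from the
discussion above rather than copied from the QEMU headers. For simplicity the
model also pretends that a pending interrupt uses the same bit position as its
wake-up enable.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word. */
#define PPC_BIT(n) (1ULL << (63 - (n)))

/* Assumed layout (see lead-in): decrementer wake-up enable at bit 50. */
#define WAKE_DECR    PPC_BIT(50)
/* Assumed: P8/P9 implement five wake-up enables, bits 47..51. */
#define WAKE_MASK_P8 (PPC_BIT(47) | PPC_BIT(48) | PPC_BIT(49) | \
                      PPC_BIT(50) | PPC_BIT(51))
/* Assumed: P7 reserves bits 47 and 48, leaving bits 49..51. */
#define WAKE_MASK_P7 (PPC_BIT(49) | PPC_BIT(50) | PPC_BIT(51))

typedef struct ToyCPU ToyCPU;

/* Hypothetical per-family class carrying the quiesce/wake hooks. */
typedef struct ToyCPUClass {
    const char *name;
    uint64_t wake_mask;            /* wake-up bits this family implements */
    void (*quiesce)(ToyCPU *cpu);  /* what stop-self would call */
    void (*wake)(ToyCPU *cpu);     /* what start-cpu would call */
} ToyCPUClass;

struct ToyCPU {
    const ToyCPUClass *cls;
    uint64_t lpcr;
    int halted;
};

/* Halt the CPU and stop the decrementer from waking it up again. */
static void toy_quiesce(ToyCPU *cpu)
{
    cpu->lpcr &= ~(WAKE_DECR & cpu->cls->wake_mask);
    cpu->halted = 1;
}

/* Re-enable the decrementer wake-up and restart the CPU. */
static void toy_wake(ToyCPU *cpu)
{
    cpu->lpcr |= WAKE_DECR & cpu->cls->wake_mask;
    cpu->halted = 0;
}

/* A halted CPU only has work if a pending source is enabled in the LPCR. */
static int toy_has_work(const ToyCPU *cpu, uint64_t pending)
{
    if (!cpu->halted) {
        return pending != 0;
    }
    return (pending & cpu->lpcr & cpu->cls->wake_mask) != 0;
}

static const ToyCPUClass toy_p7_class = {
    "toy-p7", WAKE_MASK_P7, toy_quiesce, toy_wake
};
static const ToyCPUClass toy_p8_class = {
    "toy-p8", WAKE_MASK_P8, toy_quiesce, toy_wake
};
```

With hooks like these, the RTAS code would call cpu->cls->quiesce() and
cpu->cls->wake() without testing the PVR or the MMU model, and each family
would only touch the bits it implements. Under the assumed layout, the
five-bit mask covers two bits that the P7 class does not implement, which is
the objection raised against using one shared mask.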

Thread overview: 14+ messages
2017-10-09 15:49 [Qemu-devel] [PATCH v2 0/4] disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 1/4] target/ppc: export ppc_cpu_pvr_match() helper Cédric Le Goater
2017-10-11  6:41   ` David Gibson
2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged Cédric Le Goater
2017-10-10  8:08   ` Benjamin Herrenschmidt
2017-10-10 15:56     ` Cédric Le Goater
2017-10-11  6:45   ` David Gibson
2017-10-11 11:55     ` Cédric Le Goater
2017-10-11 22:46       ` David Gibson
2017-10-12  9:25         ` Cédric Le Goater
2017-10-12  9:29           ` Cédric Le Goater
2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 3/4] spapr/rtas: fix reboot of a SMP TCG guest Cédric Le Goater
2017-10-12  4:34   ` Nikunj A Dadhania
2017-10-09 15:49 ` [Qemu-devel] [PATCH v2 4/4] spapr/rtas: do not reset the MSR in stop-self command Cédric Le Goater