qemu-devel.nongnu.org archive mirror
* [PATCH v17 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests
@ 2019-10-24  7:43 Ganesh Goudar
  2019-10-24  7:43 ` [PATCH v17 1/7] Wrapper function to wait on condition for the main loop mutex Ganesh Goudar
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Ganesh Goudar @ 2019-10-24  7:43 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, Ganesh Goudar, arawinda.p, groug

This patch set adds support for FWNMI in PowerKVM guests.

System errors such as SLB multihit and memory errors
that cannot be corrected by hardware are passed on to
the kernel for handling by raising a machine check
exception (an NMI). Upon such a machine check exception,
if the address in error belongs to the guest, KVM
invokes the guest's 0x200 interrupt vector when the
guest is not FWNMI capable. For an FWNMI-capable guest,
KVM passes control to QEMU by exiting the guest.

This patch series adds functionality to QEMU to pass
on such machine check exceptions to the FWNMI-capable
guest kernel by building an error log and invoking
the guest-registered machine check handling routine.

The KVM changes are now part of the upstream kernel
(commit e20bbd3d). This series contains the QEMU changes.
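
The behaviour is controlled through an spapr capability (patch 2/7)
exposed as a machine property; as a usage sketch:

    qemu-system-ppc64 -M pseries,cap-fwnmi-mce=on ...   # FWNMI machine check delivery
    qemu-system-ppc64 -M pseries,cap-fwnmi-mce=off ...  # fall back to the 0x200 vector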

Change Log v17:
  - Add fwnmi cap to migration state
  - Rephrase the commit message in patch 2/7

Change Log v16:
  - Fixed coding style problems

Change Log v15:
  - Removed cap_ppc_fwnmi
  - Moved fwnmi registration to .apply hook
  - Assume SLOF has allocated enough room for rtas error log
  - Using ARRAY_SIZE to end the loop
  - Do not set FWNMI cap in post_load, now it's done in .apply hook

Change Log v14:
  - Feature activation moved to a separate patch
  - Fixed issues with migration blocker

Change Log v13:
  - Minor fixes (mostly nits)
  - Moved FWNMI guest registration check from patch 4 to 3.

Change Log v12:
  - Rebased to latest ppc-for-4.2 (SHA b1e8156743)

Change Log v11:
  - Moved FWNMI SPAPR cap defaults to 4.2 class option
  - Fixed issues with handling fwnmi KVM capability

Change Log v10:
  - Reshuffled the patch sequence + minor fixes

Change Log v9:
  - Fixed kvm cap and spapr cap issues

Change Log v8:
  - Added functionality to check FWNMI capability during
    VM migration

---

Aravinda Prasad (7):
  Wrapper function to wait on condition for the main loop mutex
  ppc: spapr: Introduce FWNMI capability
  target/ppc: Handle NMI guest exit
  target/ppc: Build rtas error log upon an MCE
  ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS
    calls
  migration: Include migration support for machine check handling
  ppc: spapr: Activate the FWNMI functionality

 cpus.c                   |   5 +
 hw/ppc/spapr.c           |  52 ++++++++
 hw/ppc/spapr_caps.c      |  34 +++++
 hw/ppc/spapr_events.c    | 269 +++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c      |  85 +++++++++++++
 include/hw/ppc/spapr.h   |  26 +++-
 include/qemu/main-loop.h |   8 ++
 target/ppc/kvm.c         |  24 ++++
 target/ppc/kvm_ppc.h     |   8 ++
 target/ppc/trace-events  |   1 +
 10 files changed, 510 insertions(+), 2 deletions(-)

-- 
2.17.2



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v17 1/7] Wrapper function to wait on condition for the main loop mutex
  2019-10-24  7:43 [PATCH v17 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Ganesh Goudar
@ 2019-10-24  7:43 ` Ganesh Goudar
  2019-10-24  7:43 ` [PATCH v17 2/7] ppc: spapr: Introduce FWNMI capability Ganesh Goudar
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ganesh Goudar @ 2019-10-24  7:43 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, arawinda.p, groug

From: Aravinda Prasad <arawinda.p@gmail.com>

Introduce a wrapper function to wait on condition for
the main loop mutex. This function atomically releases
the main loop mutex and causes the calling thread to
block on the condition. This wrapper is required because
qemu_global_mutex is a static variable.
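
A minimal caller sketch (the predicate and condition variable below
are placeholders, not part of this patch); the caller must hold the
main loop mutex:

    /* called with qemu_global_mutex (the BQL) held */
    while (!done) {
        /* atomically drops the mutex, sleeps, re-takes it on wakeup */
        qemu_cond_wait_iothread(&done_cond);
    }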

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 cpus.c                   | 5 +++++
 include/qemu/main-loop.h | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/cpus.c b/cpus.c
index fabbeca6f4..3c8e423f74 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1897,6 +1897,11 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
+void qemu_cond_wait_iothread(QemuCond *cond)
+{
+    qemu_cond_wait(cond, &qemu_global_mutex);
+}
+
 static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index f6ba78ea73..a6d20b0719 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -295,6 +295,14 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
  */
 void qemu_mutex_unlock_iothread(void);
 
+/*
+ * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ *
+ * This function atomically releases the main loop mutex and causes
+ * the calling thread to block on the condition.
+ */
+void qemu_cond_wait_iothread(QemuCond *cond);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v17 2/7] ppc: spapr: Introduce FWNMI capability
  2019-10-24  7:43 [PATCH v17 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Ganesh Goudar
  2019-10-24  7:43 ` [PATCH v17 1/7] Wrapper function to wait on condition for the main loop mutex Ganesh Goudar
@ 2019-10-24  7:43 ` Ganesh Goudar
  2019-10-24  7:43 ` [PATCH v17 3/7] target/ppc: Handle NMI guest exit Ganesh Goudar
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ganesh Goudar @ 2019-10-24  7:43 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, Ganesh Goudar, arawinda.p, groug

From: Aravinda Prasad <arawinda.p@gmail.com>

Introduce fwnmi as an spapr capability and validate it against
the kernel's capability by trying to enable it.

[eliminate cap_ppc_fwnmi, add fwnmi cap to migration state
 and rephrase the commit message]
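
In sketch form, the validation performed when the capability is
enabled:

    cap-fwnmi-mce=on
      -> cap_fwnmi_mce_apply()
           -> kvmppc_set_fwnmi()
                -> kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0)
    (a non-zero return fails the apply and suggests cap-fwnmi-mce=off;
     under TCG only a warning is printed)
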
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
---
 hw/ppc/spapr.c         |  2 ++
 hw/ppc/spapr_caps.c    | 29 +++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |  5 ++++-
 target/ppc/kvm.c       |  8 ++++++++
 target/ppc/kvm_ppc.h   |  6 ++++++
 5 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4eb97d3a9b..31a3fb72d4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2068,6 +2068,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_dtb,
         &vmstate_spapr_cap_large_decr,
         &vmstate_spapr_cap_ccf_assist,
+        &vmstate_spapr_cap_fwnmi,
         NULL
     }
 };
@@ -4434,6 +4435,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
     smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
     smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
     spapr_caps_add_properties(smc, &error_abort);
     smc->irq = &spapr_irq_dual;
     smc->dr_phb_enabled = true;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 481dfd2a27..976d709210 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -496,6 +496,25 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
     }
 }
 
+static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
+                                Error **errp)
+{
+    if (!val) {
+        return; /* Disabled by default */
+    }
+
+    if (tcg_enabled()) {
+        /*
+         * TCG support may not be correct in some conditions (e.g., in case
+         * of software injected faults like duplicate SLBs).
+         */
+        warn_report("Firmware Assisted Non-Maskable Interrupts not supported in TCG");
+    } else if (kvm_enabled() && (kvmppc_set_fwnmi() != 0)) {
+        error_setg(errp,
+"Firmware Assisted Non-Maskable Interrupts not supported by KVM, try cap-fwnmi-mce=off");
+    }
+}
+
 SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
     [SPAPR_CAP_HTM] = {
         .name = "htm",
@@ -595,6 +614,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
         .type = "bool",
         .apply = cap_ccf_assist_apply,
     },
+    [SPAPR_CAP_FWNMI_MCE] = {
+        .name = "fwnmi-mce",
+        .description = "Handle fwnmi machine check exceptions",
+        .index = SPAPR_CAP_FWNMI_MCE,
+        .get = spapr_cap_get_bool,
+        .set = spapr_cap_set_bool,
+        .type = "bool",
+        .apply = cap_fwnmi_mce_apply,
+    },
 };
 
 static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
@@ -734,6 +762,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
 SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
 SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
+SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
 
 void spapr_caps_init(SpaprMachineState *spapr)
 {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index cbd1a4c9f3..af9a9ce6f2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -79,8 +79,10 @@ typedef enum {
 #define SPAPR_CAP_LARGE_DECREMENTER     0x08
 /* Count Cache Flush Assist HW Instruction */
 #define SPAPR_CAP_CCF_ASSIST            0x09
+/* FWNMI machine check handling */
+#define SPAPR_CAP_FWNMI_MCE             0x0A
 /* Num Caps */
-#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
+#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
 
 /*
  * Capability Values
@@ -868,6 +870,7 @@ extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
 extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
 extern const VMStateDescription vmstate_spapr_cap_large_decr;
 extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
+extern const VMStateDescription vmstate_spapr_cap_fwnmi;
 
 static inline uint8_t spapr_get_cap(SpaprMachineState *spapr, int cap)
 {
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 820724cc7d..d56f11a883 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2060,6 +2060,14 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
     }
 }
 
+int kvmppc_set_fwnmi(void)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+    CPUState *cs = CPU(cpu);
+
+    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
+}
+
 int kvmppc_smt_threads(void)
 {
     return cap_ppc_smt ? cap_ppc_smt : 1;
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 98bd7d5da6..5727a5025f 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -27,6 +27,7 @@ void kvmppc_enable_h_page_init(void);
 void kvmppc_set_papr(PowerPCCPU *cpu);
 int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
 void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
+int kvmppc_set_fwnmi(void);
 int kvmppc_smt_threads(void);
 void kvmppc_hint_smt_possible(Error **errp);
 int kvmppc_set_smt_threads(int smt);
@@ -159,6 +160,11 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
 {
 }
 
+static inline int kvmppc_set_fwnmi(void)
+{
+    return -1;
+}
+
 static inline int kvmppc_smt_threads(void)
 {
     return 1;
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v17 3/7] target/ppc: Handle NMI guest exit
  2019-10-24  7:43 [PATCH v17 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Ganesh Goudar
  2019-10-24  7:43 ` [PATCH v17 1/7] Wrapper function to wait on condition for the main loop mutex Ganesh Goudar
  2019-10-24  7:43 ` [PATCH v17 2/7] ppc: spapr: Introduce FWNMI capability Ganesh Goudar
@ 2019-10-24  7:43 ` Ganesh Goudar
  2019-10-24  7:43 ` [PATCH v17 4/7] target/ppc: Build rtas error log upon an MCE Ganesh Goudar
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ganesh Goudar @ 2019-10-24  7:43 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, arawinda.p, groug

From: Aravinda Prasad <arawinda.p@gmail.com>

Memory errors such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to the guest, then
the guest kernel is responsible for taking suitable action.
Patch [1] enhances KVM to exit guest with exit reason
set to KVM_EXIT_NMI in such cases. This patch handles
KVM_EXIT_NMI exit.

[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
    (e20bbd3d and related commits)
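
In sketch form, the path added by this patch (see the diff below for
the full code):

    KVM_EXIT_NMI
      -> kvm_arch_handle_exit()
           -> kvm_handle_nmi()
                -> spapr_mce_req_event()
                     - "ibm,nmi-register" not yet called: deliver the
                       usual 0x200 machine check interrupt
                     - another vCPU is still handling an MCE: wait on
                       mc_delivery_cond
                     - the same vCPU hits a second MCE before calling
                       "ibm,nmi-interlock": qemu_system_guest_panicked()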

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr.c          |  8 ++++++++
 hw/ppc/spapr_events.c   | 37 +++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h  | 10 ++++++++++
 target/ppc/kvm.c        | 14 ++++++++++++++
 target/ppc/kvm_ppc.h    |  2 ++
 target/ppc/trace-events |  1 +
 6 files changed, 72 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 31a3fb72d4..346ec5ba6c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1745,6 +1745,12 @@ static void spapr_machine_reset(MachineState *machine)
     first_ppc_cpu->env.gpr[5] = 0;
 
     spapr->cas_reboot = false;
+
+    spapr->mc_status = -1;
+    spapr->guest_machine_check_addr = -1;
+
+    /* Signal all vCPUs waiting on this condition */
+    qemu_cond_broadcast(&spapr->mc_delivery_cond);
 }
 
 static void spapr_create_nvram(SpaprMachineState *spapr)
@@ -3044,6 +3050,8 @@ static void spapr_machine_init(MachineState *machine)
 
         kvmppc_spapr_enable_inkernel_multitce();
     }
+
+    qemu_cond_init(&spapr->mc_delivery_cond);
 }
 
 static int spapr_kvm_type(MachineState *machine, const char *vm_type)
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 0e4c19523a..0ce96b86be 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -40,6 +40,7 @@
 #include "hw/ppc/spapr_drc.h"
 #include "qemu/help_option.h"
 #include "qemu/bcd.h"
+#include "qemu/main-loop.h"
 #include "hw/ppc/spapr_ovec.h"
 #include <libfdt.h>
 
@@ -621,6 +622,42 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
+void spapr_mce_req_event(PowerPCCPU *cpu)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    CPUState *cs = CPU(cpu);
+
+    if (spapr->guest_machine_check_addr == -1) {
+        /*
+         * This implies that we have hit a machine check either when the
+         * guest has not registered FWNMI (i.e., "ibm,nmi-register" not
+         * called) or between system reset and "ibm,nmi-register".
+         * Fall back to the old machine check behavior in such cases.
+         */
+        cs->exception_index = POWERPC_EXCP_MCHECK;
+        ppc_cpu_do_interrupt(cs);
+        return;
+    }
+
+    while (spapr->mc_status != -1) {
+        /*
+         * Check whether the same CPU got machine check error
+         * while still handling the mc error (i.e., before
+         * that CPU called "ibm,nmi-interlock")
+         */
+        if (spapr->mc_status == cpu->vcpu_id) {
+            qemu_system_guest_panicked(NULL);
+            return;
+        }
+        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
+        /* Meanwhile if the system is reset, then just return */
+        if (spapr->guest_machine_check_addr == -1) {
+            return;
+        }
+    }
+    spapr->mc_status = cpu->vcpu_id;
+}
+
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
                             uint32_t token, uint32_t nargs,
                             target_ulong args,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index af9a9ce6f2..1f5eb8c856 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -190,6 +190,15 @@ struct SpaprMachineState {
      * occurs during the unplug process. */
     QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
 
+    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
+    target_ulong guest_machine_check_addr;
+    /*
+     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
+     * handling the mc.
+     */
+    int mc_status;
+    QemuCond mc_delivery_cond;
+
     /*< public >*/
     char *kvm_type;
     char *host_model;
@@ -803,6 +812,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
+void spapr_mce_req_event(PowerPCCPU *cpu);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index d56f11a883..2d8db6d832 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -1708,6 +1708,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ret = 0;
         break;
 
+    case KVM_EXIT_NMI:
+        trace_kvm_handle_nmi_exception();
+        ret = kvm_handle_nmi(cpu, run);
+        break;
+
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
@@ -2798,6 +2803,15 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
     return data & 0xffff;
 }
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
+{
+    cpu_synchronize_state(CPU(cpu));
+
+    spapr_mce_req_event(cpu);
+
+    return 0;
+}
+
 int kvmppc_enable_hwrng(void)
 {
     if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 5727a5025f..e88ee786a3 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -83,6 +83,8 @@ void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
 void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
 void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset);
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
+
 #else
 
 static inline uint32_t kvmppc_get_tbfreq(void)
diff --git a/target/ppc/trace-events b/target/ppc/trace-events
index 3dc6740706..6d15aa90b4 100644
--- a/target/ppc/trace-events
+++ b/target/ppc/trace-events
@@ -28,3 +28,4 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
 kvm_handle_epr(void) "handle epr"
 kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
 kvm_handle_debug_exception(void) "handle debug exception"
+kvm_handle_nmi_exception(void) "handle NMI exception"
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v17 4/7] target/ppc: Build rtas error log upon an MCE
  2019-10-24  7:43 [PATCH v17 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Ganesh Goudar
                   ` (2 preceding siblings ...)
  2019-10-24  7:43 ` [PATCH v17 3/7] target/ppc: Handle NMI guest exit Ganesh Goudar
@ 2019-10-24  7:43 ` Ganesh Goudar
  2019-11-04 16:10   ` David Gibson
  2019-10-24  7:43 ` [PATCH v17 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls Ganesh Goudar
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Ganesh Goudar @ 2019-10-24  7:43 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, Ganesh Goudar, arawinda.p, groug

From: Aravinda Prasad <arawinda.p@gmail.com>

Upon a machine check exception (MCE) in a guest address space,
KVM causes a guest exit to enable QEMU to build and pass the
error to the guest in the PAPR defined rtas error log format.

This patch builds the rtas error log, copies it to the rtas_addr
and then invokes the guest registered machine check handler. The
handler in the guest takes suitable action(s) depending on the type
and criticality of the error. For example, if an error is
unrecoverable memory corruption in an application inside the
guest, then the guest kernel sends a SIGBUS to the application.
For recoverable errors, the guest performs recovery actions and
logs the error.
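
The resulting layout in guest memory, as written by this patch
(shown here only as an illustration of the code below):

    rtas_addr + RTAS_ERROR_LOG_OFFSET (0x30):
        saved r3 of the interrupted context   (8 bytes, stq_be_phys)
        struct rtas_error_log                 (summary, extended_length)
        struct mc_extended_log                (v6 header + MC section)

    On entry to the registered handler, r3 points at
    rtas_addr + RTAS_ERROR_LOG_OFFSET and MSR[ME] is set (plus
    MSR[SF]/MSR[LE] to match the guest's mode).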

[Assume SLOF has allocated enough room for rtas error log]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
---
 hw/ppc/spapr_events.c  | 220 ++++++++++++++++++++++++++++++++++++++++-
 hw/ppc/spapr_rtas.c    |  26 +++++
 include/hw/ppc/spapr.h |   6 +-
 target/ppc/kvm.c       |   4 +-
 4 files changed, 253 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 0ce96b86be..db44e09154 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -214,6 +214,104 @@ struct hp_extended_log {
     struct rtas_event_log_v6_hp hp;
 } QEMU_PACKED;
 
+struct rtas_event_log_v6_mc {
+#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
+    struct rtas_event_log_v6_section_header hdr;
+    uint32_t fru_id;
+    uint32_t proc_id;
+    uint8_t error_type;
+#define RTAS_LOG_V6_MC_TYPE_UE                           0
+#define RTAS_LOG_V6_MC_TYPE_SLB                          1
+#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
+#define RTAS_LOG_V6_MC_TYPE_TLB                          4
+#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
+#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
+    uint8_t sub_err_type;
+#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
+#define RTAS_LOG_V6_MC_UE_IFETCH                         1
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
+#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
+#define RTAS_LOG_V6_MC_SLB_PARITY                        0
+#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
+#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
+#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
+#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
+#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
+#define RTAS_LOG_V6_MC_TLB_PARITY                        1
+#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
+#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
+    uint8_t reserved_1[6];
+    uint64_t effective_address;
+    uint64_t logical_address;
+} QEMU_PACKED;
+
+struct mc_extended_log {
+    struct rtas_event_log_v6 v6hdr;
+    struct rtas_event_log_v6_mc mc;
+} QEMU_PACKED;
+
+struct MC_ierror_table {
+    unsigned long srr1_mask;
+    unsigned long srr1_value;
+    bool nip_valid; /* nip is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_ierror_table mc_ierror_table[] = {
+{ 0x00000000081c0000, 0x0000000000040000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000080000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x00000000000c0000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000100000, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000140000, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000180000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
+
+struct MC_derror_table {
+    unsigned long dsisr_value;
+    bool dar_valid; /* dar is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_derror_table mc_derror_table[] = {
+{ 0x00008000, false,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00004000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000800, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000400, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000080, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000100, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
+
+#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
+
 typedef enum EventClass {
     EVENT_CLASS_INTERNAL_ERRORS     = 0,
     EVENT_CLASS_EPOW                = 1,
@@ -622,7 +720,125 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
-void spapr_mce_req_event(PowerPCCPU *cpu)
+static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
+                                        struct mc_extended_log *ext_elog)
+{
+    int i;
+    CPUPPCState *env = &cpu->env;
+    uint32_t summary;
+    uint64_t dsisr = env->spr[SPR_DSISR];
+
+    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
+    if (recovered) {
+        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
+    } else {
+        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
+    }
+
+    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
+        for (i = 0; i < ARRAY_SIZE(mc_derror_table); i++) {
+            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_derror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
+            if (mc_derror_table[i].dar_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
+            }
+
+            summary |= mc_derror_table[i].initiator
+                        | mc_derror_table[i].severity;
+
+            return summary;
+        }
+    } else {
+        for (i = 0; i < ARRAY_SIZE(mc_ierror_table); i++) {
+            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
+                    mc_ierror_table[i].srr1_value) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
+            if (mc_ierror_table[i].nip_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
+            }
+
+            summary |= mc_ierror_table[i].initiator
+                        | mc_ierror_table[i].severity;
+
+            return summary;
+        }
+    }
+
+    summary |= RTAS_LOG_INITIATOR_CPU;
+    return summary;
+}
+
+static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    CPUState *cs = CPU(cpu);
+    uint64_t rtas_addr;
+    CPUPPCState *env = &cpu->env;
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+    target_ulong msr = 0;
+    struct rtas_error_log log;
+    struct mc_extended_log *ext_elog;
+    uint32_t summary;
+
+    /*
+     * Properly set bits in MSR before we invoke the handler.
+     * SRR0/1, DAR and DSISR are properly set by KVM
+     */
+    if (!(*pcc->interrupts_big_endian)(cpu)) {
+        msr |= (1ULL << MSR_LE);
+    }
+
+    if (env->msr & (1ULL << MSR_SF)) {
+        msr |= (1ULL << MSR_SF);
+    }
+
+    msr |= (1ULL << MSR_ME);
+
+    ext_elog = g_malloc0(sizeof(*ext_elog));
+    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
+
+    log.summary = cpu_to_be32(summary);
+    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
+
+    spapr_init_v6hdr(&ext_elog->v6hdr);
+    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
+    ext_elog->mc.hdr.section_length =
+                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
+    ext_elog->mc.hdr.section_version = 1;
+
+    /* get rtas addr from fdt */
+    rtas_addr = spapr_get_rtas_addr();
+    if (!rtas_addr) {
+        /* Unable to fetch rtas_addr. Hence reset the guest */
+        ppc_cpu_do_system_reset(cs);
+        g_free(ext_elog);
+        return;
+    }
+
+    stq_be_phys(&address_space_memory, rtas_addr + RTAS_ERROR_LOG_OFFSET,
+                env->gpr[3]);
+    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
+                              sizeof(env->gpr[3]), &log, sizeof(log));
+    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
+                              sizeof(env->gpr[3]) + sizeof(log), ext_elog,
+                              sizeof(*ext_elog));
+
+    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
+    env->msr = msr;
+    env->nip = spapr->guest_machine_check_addr;
+
+    g_free(ext_elog);
+}
+
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     CPUState *cs = CPU(cpu);
@@ -656,6 +872,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu)
         }
     }
     spapr->mc_status = cpu->vcpu_id;
+
+    spapr_mce_dispatch_elog(cpu, recovered);
 }
 
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 8d8d8cdfcb..2c066a372d 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -477,6 +477,32 @@ void spapr_dt_rtas_tokens(void *fdt, int rtas)
     }
 }
 
+hwaddr spapr_get_rtas_addr(void)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    int rtas_node;
+    const fdt32_t *rtas_data;
+    void *fdt = spapr->fdt_blob;
+
+    /* fetch rtas addr from fdt */
+    rtas_node = fdt_path_offset(fdt, "/rtas");
+    if (rtas_node < 0) {
+        return 0;
+    }
+
+    rtas_data = fdt_getprop(fdt, rtas_node, "linux,rtas-base", NULL);
+    if (!rtas_data) {
+        return 0;
+    }
+
+    /*
+     * We assume that the OS called RTAS instantiate-rtas, but some other
+     * OS might call RTAS instantiate-rtas-64 instead. This is fine for now
+     * as SLOF only supports the 32-bit variant.
+     */
+    return (hwaddr)fdt32_to_cpu(*rtas_data);
+}
+
 static void core_rtas_register_types(void)
 {
     spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 1f5eb8c856..4afa8d4d09 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -724,6 +724,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
 
 #define RTAS_ERROR_LOG_MAX      2048
 
+/* Offset from rtas-base where error log is placed */
+#define RTAS_ERROR_LOG_OFFSET       0x30
+
 #define RTAS_EVENT_SCAN_RATE    1
 
 /* This helper should be used to encode interrupt specifiers when the related
@@ -812,7 +815,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
-void spapr_mce_req_event(PowerPCCPU *cpu);
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);
@@ -903,4 +906,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
+hwaddr spapr_get_rtas_addr(void);
 #endif /* HW_SPAPR_H */
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 2d8db6d832..9a902c1064 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2805,9 +2805,11 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 
 int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
 {
+    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
+
     cpu_synchronize_state(CPU(cpu));
 
-    spapr_mce_req_event(cpu);
+    spapr_mce_req_event(cpu, recovered);
 
     return 0;
 }
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v17 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-10-24  7:43 [PATCH v17 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Ganesh Goudar
                   ` (3 preceding siblings ...)
  2019-10-24  7:43 ` [PATCH v17 4/7] target/ppc: Build rtas error log upon an MCE Ganesh Goudar
@ 2019-10-24  7:43 ` Ganesh Goudar
  2019-11-19  2:39   ` [PATCH v17 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" " David Gibson
  2019-10-24  7:43 ` [PATCH v17 6/7] migration: Include migration support for machine check handling Ganesh Goudar
  2019-10-24  7:43 ` [PATCH v17 7/7] ppc: spapr: Activate the FWNMI functionality Ganesh Goudar
  6 siblings, 1 reply; 19+ messages in thread
From: Ganesh Goudar @ 2019-10-24  7:43 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, Ganesh Goudar, arawinda.p, groug

From: Aravinda Prasad <arawinda.p@gmail.com>

This patch adds support in QEMU to handle "ibm,nmi-register"
and "ibm,nmi-interlock" RTAS calls.

The machine check notification address is saved when the
OS issues the "ibm,nmi-register" RTAS call.

This patch also handles the case when multiple processors
experience a machine check at about the same time by
handling the "ibm,nmi-interlock" call. In such cases, as per
PAPR, subsequent processors serialize waiting for the first
processor to issue the "ibm,nmi-interlock" call. A second
processor that also receives a machine check error waits
until the first processor is done reading the error log.
The first processor issues the "ibm,nmi-interlock" call
when the error log has been consumed.
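
As an illustration of the serialization (vCPU numbers are arbitrary):

    vCPU 0 takes an MCE        -> mc_status = 0, error log delivered,
                                  vCPU 0 enters the registered handler
    vCPU 1 takes an MCE        -> sees mc_status != -1, waits on
                                  mc_delivery_cond
    vCPU 0 "ibm,nmi-interlock" -> mc_status = -1, condition signalled
    vCPU 1 wakes up            -> mc_status = 1, its error log is delivered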

[Move fwnmi registration to .apply hook]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
---
 hw/ppc/spapr_caps.c    |  9 +++++--
 hw/ppc/spapr_rtas.c    | 57 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |  5 +++-
 3 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 976d709210..1675ebd45e 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -509,9 +509,14 @@ static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
          * of software injected faults like duplicate SLBs).
          */
         warn_report("Firmware Assisted Non-Maskable Interrupts not supported in TCG");
-    } else if (kvm_enabled() && (kvmppc_set_fwnmi() != 0)) {
-        error_setg(errp,
+    } else if (kvm_enabled()) {
+        if (!kvmppc_set_fwnmi()) {
+            /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
+            spapr_fwnmi_register();
+        } else {
+            error_setg(errp,
 "Firmware Assisted Non-Maskable Interrupts not supported by KVM, try cap-fwnmi-mce=off");
+        }
     }
 }
 
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 2c066a372d..0328b1f341 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -400,6 +400,55 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
     rtas_st(rets, 1, 100);
 }
 
+static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
+                                  SpaprMachineState *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args,
+                                  uint32_t nret, target_ulong rets)
+{
+    hwaddr rtas_addr = spapr_get_rtas_addr();
+
+    if (!rtas_addr) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    spapr->guest_machine_check_addr = rtas_ld(args, 1);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
+static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
+                                   SpaprMachineState *spapr,
+                                   uint32_t token, uint32_t nargs,
+                                   target_ulong args,
+                                   uint32_t nret, target_ulong rets)
+{
+    if (spapr->guest_machine_check_addr == -1) {
+        /* NMI register not called */
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    if (spapr->mc_status != cpu->vcpu_id) {
+        /* The vCPU that hit the NMI should invoke "ibm,nmi-interlock" */
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    /*
+     * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
+     * hence unset mc_status.
+     */
+    spapr->mc_status = -1;
+    qemu_cond_signal(&spapr->mc_delivery_cond);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -503,6 +552,14 @@ hwaddr spapr_get_rtas_addr(void)
     return (hwaddr)fdt32_to_cpu(*rtas_data);
 }
 
+void spapr_fwnmi_register(void)
+{
+    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
+                        rtas_ibm_nmi_register);
+    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
+                        rtas_ibm_nmi_interlock);
+}
+
 static void core_rtas_register_types(void)
 {
     spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4afa8d4d09..86f0fc8fdd 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -653,8 +653,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
 #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
 #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
 #define RTAS_IBM_SUSPEND_ME                     (RTAS_TOKEN_BASE + 0x2A)
+#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2B)
+#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2C)
 
-#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2B)
+#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2D)
 
 /* RTAS ibm,get-system-parameter token values */
 #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
@@ -907,4 +909,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
 hwaddr spapr_get_rtas_addr(void);
+void spapr_fwnmi_register(void);
 #endif /* HW_SPAPR_H */
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v17 6/7] migration: Include migration support for machine check handling
  2019-10-24  7:43 [PATCH v17 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Ganesh Goudar
                   ` (4 preceding siblings ...)
  2019-10-24  7:43 ` [PATCH v17 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls Ganesh Goudar
@ 2019-10-24  7:43 ` Ganesh Goudar
  2019-11-19  2:45   ` David Gibson
  2019-10-24  7:43 ` [PATCH v17 7/7] ppc: spapr: Activate the FWNMI functionality Ganesh Goudar
  6 siblings, 1 reply; 19+ messages in thread
From: Ganesh Goudar @ 2019-10-24  7:43 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, Ganesh Goudar, arawinda.p, groug

From: Aravinda Prasad <arawinda.p@gmail.com>

This patch includes migration support for machine check
handling. In particular, it blocks VM migration
requests until machine check error handling is
complete, as these errors are specific to the source
hardware and are irrelevant on the target hardware.
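
Roughly, the blocker lifecycle added by this patch:

    MCE delivered to the guest        -> migrate_add_blocker(fwnmi_migration_blocker)
    guest reads/handles the error log -> migration stays blocked
    "ibm,nmi-interlock" or machine reset
                                      -> migrate_del_blocker(fwnmi_migration_blocker)

With -only-migratable, migrate_add_blocker() returns -EBUSY; that is
downgraded to a warning here, and the pre_save hook also warns if a
machine check is still being handled at migration time.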

[Do not set FWNMI cap in post_load, now it's done in .apply hook]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
---
 hw/ppc/spapr.c         | 41 +++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_events.c  | 16 +++++++++++++++-
 hw/ppc/spapr_rtas.c    |  2 ++
 include/hw/ppc/spapr.h |  2 ++
 4 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 346ec5ba6c..e0d0f95ec0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -46,6 +46,7 @@
 #include "migration/qemu-file-types.h"
 #include "migration/global_state.h"
 #include "migration/register.h"
+#include "migration/blocker.h"
 #include "mmu-hash64.h"
 #include "mmu-book3s-v3.h"
 #include "cpu-models.h"
@@ -1751,6 +1752,8 @@ static void spapr_machine_reset(MachineState *machine)
 
     /* Signal all vCPUs waiting on this condition */
     qemu_cond_broadcast(&spapr->mc_delivery_cond);
+
+    migrate_del_blocker(spapr->fwnmi_migration_blocker);
 }
 
 static void spapr_create_nvram(SpaprMachineState *spapr)
@@ -2041,6 +2044,43 @@ static const VMStateDescription vmstate_spapr_dtb = {
     },
 };
 
+static bool spapr_fwnmi_needed(void *opaque)
+{
+    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
+
+    return spapr->guest_machine_check_addr != -1;
+}
+
+static int spapr_fwnmi_pre_save(void *opaque)
+{
+    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
+
+    /*
+     * With -only-migratable QEMU option, we cannot block migration.
+     * Hence check if machine check handling is in progress and print
+     * a warning message.
+     */
+    if (spapr->mc_status != -1) {
+        warn_report("A machine check is being handled during migration. The "
+                    "handler may run and log a hardware error on the destination");
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_spapr_machine_check = {
+    .name = "spapr_machine_check",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_fwnmi_needed,
+    .pre_save = spapr_fwnmi_pre_save,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
+        VMSTATE_INT32(mc_status, SpaprMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -2075,6 +2115,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_cap_large_decr,
         &vmstate_spapr_cap_ccf_assist,
         &vmstate_spapr_cap_fwnmi,
+        &vmstate_spapr_machine_check,
         NULL
     }
 };
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index db44e09154..30d9371c88 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -43,6 +43,7 @@
 #include "qemu/main-loop.h"
 #include "hw/ppc/spapr_ovec.h"
 #include <libfdt.h>
+#include "migration/blocker.h"
 
 #define RTAS_LOG_VERSION_MASK                   0xff000000
 #define   RTAS_LOG_VERSION_6                    0x06000000
@@ -842,6 +843,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     CPUState *cs = CPU(cpu);
+    int ret;
+    Error *local_err = NULL;
 
     if (spapr->guest_machine_check_addr == -1) {
         /*
@@ -871,8 +874,19 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
             return;
         }
     }
-    spapr->mc_status = cpu->vcpu_id;
 
+    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
+    if (ret == -EBUSY) {
+        /*
+         * We don't want to abort, so we let the migration continue.
+         * In a rare case, the machine check handler will run on the target.
+         * Though this is not preferable, it is better than aborting
+         * the migration or killing the VM.
+         */
+        warn_report_err(local_err);
+    }
+
+    spapr->mc_status = cpu->vcpu_id;
     spapr_mce_dispatch_elog(cpu, recovered);
 }
 
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 0328b1f341..c78d96ee7e 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -50,6 +50,7 @@
 #include "hw/ppc/fdt.h"
 #include "target/ppc/mmu-hash64.h"
 #include "target/ppc/mmu-book3s-v3.h"
+#include "migration/blocker.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -446,6 +447,7 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
      */
     spapr->mc_status = -1;
     qemu_cond_signal(&spapr->mc_delivery_cond);
+    migrate_del_blocker(spapr->fwnmi_migration_blocker);
     rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 86f0fc8fdd..290abd6328 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -215,6 +215,8 @@ struct SpaprMachineState {
 
     unsigned gpu_numa_id;
     SpaprTpmProxy *tpm_proxy;
+
+    Error *fwnmi_migration_blocker;
 };
 
 #define H_SUCCESS         0
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v17 7/7] ppc: spapr: Activate the FWNMI functionality
  2019-10-24  7:43 [PATCH v17 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Ganesh Goudar
                   ` (5 preceding siblings ...)
  2019-10-24  7:43 ` [PATCH v17 6/7] migration: Include migration support for machine check handling Ganesh Goudar
@ 2019-10-24  7:43 ` Ganesh Goudar
  6 siblings, 0 replies; 19+ messages in thread
From: Ganesh Goudar @ 2019-10-24  7:43 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, arawinda.p, groug

From: Aravinda Prasad <arawinda.p@gmail.com>

This patch sets the default value of SPAPR_CAP_FWNMI_MCE
to SPAPR_CAP_ON for machine type 4.2.
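
As a usage note, older machine types keep the capability off by
default, but it can still be enabled explicitly there, e.g.:

    qemu-system-ppc64 -M pseries-4.1,cap-fwnmi-mce=on ...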

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
---
 hw/ppc/spapr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e0d0f95ec0..4e7def1bdb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4484,7 +4484,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
     smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
     smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
-    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
     spapr_caps_add_properties(smc, &error_abort);
     smc->irq = &spapr_irq_dual;
     smc->dr_phb_enabled = true;
@@ -4558,6 +4558,7 @@ static void spapr_machine_4_1_class_options(MachineClass *mc)
     smc->linux_pci_probe = false;
     compat_props_add(mc->compat_props, hw_compat_4_1, hw_compat_4_1_len);
     compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
+    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
 }
 
 DEFINE_SPAPR_MACHINE(4_1, "4.1", false);
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v17 4/7] target/ppc: Build rtas error log upon an MCE
  2019-10-24  7:43 ` [PATCH v17 4/7] target/ppc: Build rtas error log upon an MCE Ganesh Goudar
@ 2019-11-04 16:10   ` David Gibson
  2019-11-06 11:07     ` Ganesh
  0 siblings, 1 reply; 19+ messages in thread
From: David Gibson @ 2019-11-04 16:10 UTC (permalink / raw)
  To: Ganesh Goudar; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc


On Thu, Oct 24, 2019 at 01:13:04PM +0530, Ganesh Goudar wrote:
> From: Aravinda Prasad <arawinda.p@gmail.com>
> 
> Upon a machine check exception (MCE) in a guest address space,
> KVM causes a guest exit to enable QEMU to build and pass the
> error to the guest in the PAPR defined rtas error log format.
> 
> This patch builds the rtas error log, copies it to the rtas_addr
> and then invokes the guest registered machine check handler. The
> handler in the guest takes suitable action(s) depending on the type
> and criticality of the error. For example, if an error is
> unrecoverable memory corruption in an application inside the
> guest, then the guest kernel sends a SIGBUS to the application.
> For recoverable errors, the guest performs recovery actions and
> logs the error.
> 
> [Assume SLOF has allocated enough room for rtas error log]

Is that correct with the SLOF image currently included in qemu?

Apart from that detail,

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>



> ---
>  hw/ppc/spapr_events.c  | 220 ++++++++++++++++++++++++++++++++++++++++-
>  hw/ppc/spapr_rtas.c    |  26 +++++
>  include/hw/ppc/spapr.h |   6 +-
>  target/ppc/kvm.c       |   4 +-
>  4 files changed, 253 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 0ce96b86be..db44e09154 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -214,6 +214,104 @@ struct hp_extended_log {
>      struct rtas_event_log_v6_hp hp;
>  } QEMU_PACKED;
>  
> +struct rtas_event_log_v6_mc {
> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> +    struct rtas_event_log_v6_section_header hdr;
> +    uint32_t fru_id;
> +    uint32_t proc_id;
> +    uint8_t error_type;
> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> +    uint8_t sub_err_type;
> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> +    uint8_t reserved_1[6];
> +    uint64_t effective_address;
> +    uint64_t logical_address;
> +} QEMU_PACKED;
> +
> +struct mc_extended_log {
> +    struct rtas_event_log_v6 v6hdr;
> +    struct rtas_event_log_v6_mc mc;
> +} QEMU_PACKED;
> +
> +struct MC_ierror_table {
> +    unsigned long srr1_mask;
> +    unsigned long srr1_value;
> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_ierror_table mc_ierror_table[] = {
> +{ 0x00000000081c0000, 0x0000000000040000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000080000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000100000, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000140000, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000180000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
> +
> +struct MC_derror_table {
> +    unsigned long dsisr_value;
> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_derror_table mc_derror_table[] = {
> +{ 0x00008000, false,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00004000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000800, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000400, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000080, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000100, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
> +
> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> +
>  typedef enum EventClass {
>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>      EVENT_CLASS_EPOW                = 1,
> @@ -622,7 +720,125 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> -void spapr_mce_req_event(PowerPCCPU *cpu)
> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> +                                        struct mc_extended_log *ext_elog)
> +{
> +    int i;
> +    CPUPPCState *env = &cpu->env;
> +    uint32_t summary;
> +    uint64_t dsisr = env->spr[SPR_DSISR];
> +
> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> +    if (recovered) {
> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> +    } else {
> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> +    }
> +
> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> +        for (i = 0; i < ARRAY_SIZE(mc_derror_table); i++) {
> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> +            if (mc_derror_table[i].dar_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> +            }
> +
> +            summary |= mc_derror_table[i].initiator
> +                        | mc_derror_table[i].severity;
> +
> +            return summary;
> +        }
> +    } else {
> +        for (i = 0; i < ARRAY_SIZE(mc_ierror_table); i++) {
> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> +                    mc_ierror_table[i].srr1_value) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> +            if (mc_ierror_table[i].nip_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> +            }
> +
> +            summary |= mc_ierror_table[i].initiator
> +                        | mc_ierror_table[i].severity;
> +
> +            return summary;
> +        }
> +    }
> +
> +    summary |= RTAS_LOG_INITIATOR_CPU;
> +    return summary;
> +}
> +
> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    CPUState *cs = CPU(cpu);
> +    uint64_t rtas_addr;
> +    CPUPPCState *env = &cpu->env;
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +    target_ulong msr = 0;
> +    struct rtas_error_log log;
> +    struct mc_extended_log *ext_elog;
> +    uint32_t summary;
> +
> +    /*
> +     * Properly set bits in MSR before we invoke the handler.
> +     * SRR0/1, DAR and DSISR are properly set by KVM
> +     */
> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> +        msr |= (1ULL << MSR_LE);
> +    }
> +
> +    if (env->msr & (1ULL << MSR_SF)) {
> +        msr |= (1ULL << MSR_SF);
> +    }
> +
> +    msr |= (1ULL << MSR_ME);
> +
> +    ext_elog = g_malloc0(sizeof(*ext_elog));
> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> +
> +    log.summary = cpu_to_be32(summary);
> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
> +
> +    spapr_init_v6hdr(&ext_elog->v6hdr);
> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> +    ext_elog->mc.hdr.section_length =
> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> +    ext_elog->mc.hdr.section_version = 1;
> +
> +    /* get rtas addr from fdt */
> +    rtas_addr = spapr_get_rtas_addr();
> +    if (!rtas_addr) {
> +        /* Unable to fetch rtas_addr. Hence reset the guest */
> +        ppc_cpu_do_system_reset(cs);
> +        g_free(ext_elog);
> +        return;
> +    }
> +
> +    stq_be_phys(&address_space_memory, rtas_addr + RTAS_ERROR_LOG_OFFSET,
> +                env->gpr[3]);
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
> +                              sizeof(env->gpr[3]), &log, sizeof(log));
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
> +                              sizeof(env->gpr[3]) + sizeof(log), ext_elog,
> +                              sizeof(*ext_elog));
> +
> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
> +    env->msr = msr;
> +    env->nip = spapr->guest_machine_check_addr;
> +
> +    g_free(ext_elog);
> +}
> +
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>      CPUState *cs = CPU(cpu);
> @@ -656,6 +872,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu)
>          }
>      }
>      spapr->mc_status = cpu->vcpu_id;
> +
> +    spapr_mce_dispatch_elog(cpu, recovered);
>  }
>  
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 8d8d8cdfcb..2c066a372d 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -477,6 +477,32 @@ void spapr_dt_rtas_tokens(void *fdt, int rtas)
>      }
>  }
>  
> +hwaddr spapr_get_rtas_addr(void)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int rtas_node;
> +    const fdt32_t *rtas_data;
> +    void *fdt = spapr->fdt_blob;
> +
> +    /* fetch rtas addr from fdt */
> +    rtas_node = fdt_path_offset(fdt, "/rtas");
> +    if (rtas_node < 0) {
> +        return 0;
> +    }
> +
> +    rtas_data = fdt_getprop(fdt, rtas_node, "linux,rtas-base", NULL);
> +    if (!rtas_data) {
> +        return 0;
> +    }
> +
> +    /*
> +     * We assume that the OS called RTAS instantiate-rtas, but some other
> +     * OS might call RTAS instantiate-rtas-64 instead. This is fine for now,
> +     * as SLOF only supports the 32-bit variant.
> +     */
> +    return (hwaddr)fdt32_to_cpu(*rtas_data);
> +}
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 1f5eb8c856..4afa8d4d09 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -724,6 +724,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
>  
>  #define RTAS_ERROR_LOG_MAX      2048
>  
> +/* Offset from rtas-base where error log is placed */
> +#define RTAS_ERROR_LOG_OFFSET       0x30
> +
>  #define RTAS_EVENT_SCAN_RATE    1
>  
>  /* This helper should be used to encode interrupt specifiers when the related
> @@ -812,7 +815,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
> -void spapr_mce_req_event(PowerPCCPU *cpu);
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> @@ -903,4 +906,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> +hwaddr spapr_get_rtas_addr(void);
>  #endif /* HW_SPAPR_H */
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 2d8db6d832..9a902c1064 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2805,9 +2805,11 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  
>  int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>  {
> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> +
>      cpu_synchronize_state(CPU(cpu));
>  
> -    spapr_mce_req_event(cpu);
> +    spapr_mce_req_event(cpu, recovered);
>  
>      return 0;
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH v17 4/7] target/ppc: Build rtas error log upon an MCE
  2019-11-04 16:10   ` David Gibson
@ 2019-11-06 11:07     ` Ganesh
  2019-11-18 11:09       ` Ganesh
  0 siblings, 1 reply; 19+ messages in thread
From: Ganesh @ 2019-11-06 11:07 UTC (permalink / raw)
  To: David Gibson; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc


On 11/4/19 9:40 PM, David Gibson wrote:
> On Thu, Oct 24, 2019 at 01:13:04PM +0530, Ganesh Goudar wrote:
>> From: Aravinda Prasad <arawinda.p@gmail.com>
>>
>> Upon a machine check exception (MCE) in a guest address space,
>> KVM causes a guest exit to enable QEMU to build and pass the
>> error to the guest in the PAPR defined rtas error log format.
>>
>> This patch builds the rtas error log, copies it to the rtas_addr
>> and then invokes the guest registered machine check handler. The
>> handler in the guest takes suitable action(s) depending on the type
>> and criticality of the error. For example, if an error is
>> unrecoverable memory corruption in an application inside the
>> guest, then the guest kernel sends a SIGBUS to the application.
>> For recoverable errors, the guest performs recovery actions and
>> logs the error.
>>
>> [Assume SLOF has allocated enough room for rtas error log]
> Is that correct with the SLOF image currently included in qemu?
Yes
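
For a rough sense of the space needed, the implied check is something like
the following (an illustrative sketch only, nothing of the sort is in the
series; RTAS_AREA_RESERVED is a made-up stand-in for whatever SLOF actually
reserves at rtas-base):

    /*
     * Everything QEMU writes for FWNMI lands at
     * rtas-base + RTAS_ERROR_LOG_OFFSET: the saved r3, the rtas_error_log
     * summary header, then the mc_extended_log section.
     */
    #define RTAS_AREA_RESERVED   0x800  /* assumed value, illustration only */

    QEMU_BUILD_BUG_ON(RTAS_ERROR_LOG_OFFSET
                      + sizeof(uint64_t)                 /* saved r3 */
                      + sizeof(struct rtas_error_log)    /* summary header */
                      + sizeof(struct mc_extended_log)   /* v6 hdr + MC section */
                      > RTAS_AREA_RESERVED);
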
>
> Apart from that detail,
>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
>
>
>> ---
>>   hw/ppc/spapr_events.c  | 220 ++++++++++++++++++++++++++++++++++++++++-
>>   hw/ppc/spapr_rtas.c    |  26 +++++
>>   include/hw/ppc/spapr.h |   6 +-
>>   target/ppc/kvm.c       |   4 +-
>>   4 files changed, 253 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 0ce96b86be..db44e09154 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -214,6 +214,104 @@ struct hp_extended_log {
>>       struct rtas_event_log_v6_hp hp;
>>   } QEMU_PACKED;
>>   
>> +struct rtas_event_log_v6_mc {
>> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
>> +    struct rtas_event_log_v6_section_header hdr;
>> +    uint32_t fru_id;
>> +    uint32_t proc_id;
>> +    uint8_t error_type;
>> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
>> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
>> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
>> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
>> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
>> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
>> +    uint8_t sub_err_type;
>> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
>> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
>> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
>> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
>> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
>> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
>> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
>> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
>> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
>> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
>> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
>> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
>> +    uint8_t reserved_1[6];
>> +    uint64_t effective_address;
>> +    uint64_t logical_address;
>> +} QEMU_PACKED;
>> +
>> +struct mc_extended_log {
>> +    struct rtas_event_log_v6 v6hdr;
>> +    struct rtas_event_log_v6_mc mc;
>> +} QEMU_PACKED;
>> +
>> +struct MC_ierror_table {
>> +    unsigned long srr1_mask;
>> +    unsigned long srr1_value;
>> +    bool nip_valid; /* nip is a valid indicator of faulting address */
>> +    uint8_t error_type;
>> +    uint8_t error_subtype;
>> +    unsigned int initiator;
>> +    unsigned int severity;
>> +};
>> +
>> +static const struct MC_ierror_table mc_ierror_table[] = {
>> +{ 0x00000000081c0000, 0x0000000000040000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000080000, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x00000000000c0000, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000100000, true,
>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000140000, true,
>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000180000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
>> +
>> +struct MC_derror_table {
>> +    unsigned long dsisr_value;
>> +    bool dar_valid; /* dar is a valid indicator of faulting address */
>> +    uint8_t error_type;
>> +    uint8_t error_subtype;
>> +    unsigned int initiator;
>> +    unsigned int severity;
>> +};
>> +
>> +static const struct MC_derror_table mc_derror_table[] = {
>> +{ 0x00008000, false,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00004000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000800, true,
>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000400, true,
>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000080, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000100, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
>> +
>> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
>> +
>>   typedef enum EventClass {
>>       EVENT_CLASS_INTERNAL_ERRORS     = 0,
>>       EVENT_CLASS_EPOW                = 1,
>> @@ -622,7 +720,125 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>                               RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>   }
>>   
>> -void spapr_mce_req_event(PowerPCCPU *cpu)
>> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>> +                                        struct mc_extended_log *ext_elog)
>> +{
>> +    int i;
>> +    CPUPPCState *env = &cpu->env;
>> +    uint32_t summary;
>> +    uint64_t dsisr = env->spr[SPR_DSISR];
>> +
>> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
>> +    if (recovered) {
>> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
>> +    } else {
>> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
>> +    }
>> +
>> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
>> +        for (i = 0; i < ARRAY_SIZE(mc_derror_table); i++) {
>> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
>> +                continue;
>> +            }
>> +
>> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
>> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
>> +            if (mc_derror_table[i].dar_valid) {
>> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
>> +            }
>> +
>> +            summary |= mc_derror_table[i].initiator
>> +                        | mc_derror_table[i].severity;
>> +
>> +            return summary;
>> +        }
>> +    } else {
>> +        for (i = 0; i < ARRAY_SIZE(mc_ierror_table); i++) {
>> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
>> +                    mc_ierror_table[i].srr1_value) {
>> +                continue;
>> +            }
>> +
>> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
>> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
>> +            if (mc_ierror_table[i].nip_valid) {
>> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
>> +            }
>> +
>> +            summary |= mc_ierror_table[i].initiator
>> +                        | mc_ierror_table[i].severity;
>> +
>> +            return summary;
>> +        }
>> +    }
>> +
>> +    summary |= RTAS_LOG_INITIATOR_CPU;
>> +    return summary;
>> +}
>> +
>> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    CPUState *cs = CPU(cpu);
>> +    uint64_t rtas_addr;
>> +    CPUPPCState *env = &cpu->env;
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> +    target_ulong msr = 0;
>> +    struct rtas_error_log log;
>> +    struct mc_extended_log *ext_elog;
>> +    uint32_t summary;
>> +
>> +    /*
>> +     * Properly set bits in MSR before we invoke the handler.
>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>> +     */
>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>> +        msr |= (1ULL << MSR_LE);
>> +    }
>> +
>> +    if (env->msr & (1ULL << MSR_SF)) {
>> +        msr |= (1ULL << MSR_SF);
>> +    }
>> +
>> +    msr |= (1ULL << MSR_ME);
>> +
>> +    ext_elog = g_malloc0(sizeof(*ext_elog));
>> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
>> +
>> +    log.summary = cpu_to_be32(summary);
>> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
>> +
>> +    spapr_init_v6hdr(&ext_elog->v6hdr);
>> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
>> +    ext_elog->mc.hdr.section_length =
>> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
>> +    ext_elog->mc.hdr.section_version = 1;
>> +
>> +    /* get rtas addr from fdt */
>> +    rtas_addr = spapr_get_rtas_addr();
>> +    if (!rtas_addr) {
>> +        /* Unable to fetch rtas_addr. Hence reset the guest */
>> +        ppc_cpu_do_system_reset(cs);
>> +        g_free(ext_elog);
>> +        return;
>> +    }
>> +
>> +    stq_be_phys(&address_space_memory, rtas_addr + RTAS_ERROR_LOG_OFFSET,
>> +                env->gpr[3]);
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
>> +                              sizeof(env->gpr[3]), &log, sizeof(log));
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
>> +                              sizeof(env->gpr[3]) + sizeof(log), ext_elog,
>> +                              sizeof(*ext_elog));
>> +
>> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
>> +    env->msr = msr;
>> +    env->nip = spapr->guest_machine_check_addr;
>> +
>> +    g_free(ext_elog);
>> +}
>> +
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>   {
>>       SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>       CPUState *cs = CPU(cpu);
>> @@ -656,6 +872,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu)
>>           }
>>       }
>>       spapr->mc_status = cpu->vcpu_id;
>> +
>> +    spapr_mce_dispatch_elog(cpu, recovered);
>>   }
>>   
>>   static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 8d8d8cdfcb..2c066a372d 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -477,6 +477,32 @@ void spapr_dt_rtas_tokens(void *fdt, int rtas)
>>       }
>>   }
>>   
>> +hwaddr spapr_get_rtas_addr(void)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int rtas_node;
>> +    const fdt32_t *rtas_data;
>> +    void *fdt = spapr->fdt_blob;
>> +
>> +    /* fetch rtas addr from fdt */
>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>> +    if (rtas_node < 0) {
>> +        return 0;
>> +    }
>> +
>> +    rtas_data = fdt_getprop(fdt, rtas_node, "linux,rtas-base", NULL);
>> +    if (!rtas_data) {
>> +        return 0;
>> +    }
>> +
>> +    /*
>> +     * We assume that the OS called RTAS instantiate-rtas, but some other
>> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
>> +     * as SLOF only supports 32-bit variant.
>> +     */
>> +    return (hwaddr)fdt32_to_cpu(*rtas_data);
>> +}
>> +
>>   static void core_rtas_register_types(void)
>>   {
>>       spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 1f5eb8c856..4afa8d4d09 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -724,6 +724,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
>>   
>>   #define RTAS_ERROR_LOG_MAX      2048
>>   
>> +/* Offset from rtas-base where error log is placed */
>> +#define RTAS_ERROR_LOG_OFFSET       0x30
>> +
>>   #define RTAS_EVENT_SCAN_RATE    1
>>   
>>   /* This helper should be used to encode interrupt specifiers when the related
>> @@ -812,7 +815,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>>   int spapr_max_server_number(SpaprMachineState *spapr);
>>   void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                         uint64_t pte0, uint64_t pte1);
>> -void spapr_mce_req_event(PowerPCCPU *cpu);
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>   
>>   /* DRC callbacks. */
>>   void spapr_core_release(DeviceState *dev);
>> @@ -903,4 +906,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>   #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>   
>>   void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>> +hwaddr spapr_get_rtas_addr(void);
>>   #endif /* HW_SPAPR_H */
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 2d8db6d832..9a902c1064 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -2805,9 +2805,11 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>   
>>   int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>>   {
>> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
>> +
>>       cpu_synchronize_state(CPU(cpu));
>>   
>> -    spapr_mce_req_event(cpu);
>> +    spapr_mce_req_event(cpu, recovered);
>>   
>>       return 0;
>>   }




* Re: [PATCH v17 4/7] target/ppc: Build rtas error log upon an MCE
  2019-11-06 11:07     ` Ganesh
@ 2019-11-18 11:09       ` Ganesh
  2019-11-19  1:18         ` David Gibson
  0 siblings, 1 reply; 19+ messages in thread
From: Ganesh @ 2019-11-18 11:09 UTC (permalink / raw)
  To: David Gibson; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc


On 11/6/19 4:37 PM, Ganesh wrote:
>
> On 11/4/19 9:40 PM, David Gibson wrote:
>> On Thu, Oct 24, 2019 at 01:13:04PM +0530, Ganesh Goudar wrote:
>>> From: Aravinda Prasad <arawinda.p@gmail.com>
>>>
>>> Upon a machine check exception (MCE) in a guest address space,
>>> KVM causes a guest exit to enable QEMU to build and pass the
>>> error to the guest in the PAPR defined rtas error log format.
>>>
>>> This patch builds the rtas error log, copies it to the rtas_addr
>>> and then invokes the guest registered machine check handler. The
>>> handler in the guest takes suitable action(s) depending on the type
>>> and criticality of the error. For example, if an error is
>>> unrecoverable memory corruption in an application inside the
>>> guest, then the guest kernel sends a SIGBUS to the application.
>>> For recoverable errors, the guest performs recovery actions and
>>> logs the error.
>>>
>>> [Assume SLOF has allocated enough room for rtas error log]
>> Is that correct with the SLOF image currently included in qemu?
> Yes
>>
>> Apart from that detail,
>>
>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Hi David, please see if this patch series can be merged into 4.2.
>>
>>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>>> Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
>>
>>
>>> ---
>>>   hw/ppc/spapr_events.c  | 220 
>>> ++++++++++++++++++++++++++++++++++++++++-
>>>   hw/ppc/spapr_rtas.c    |  26 +++++
>>>   include/hw/ppc/spapr.h |   6 +-
>>>   target/ppc/kvm.c       |   4 +-
>>>   4 files changed, 253 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>> index 0ce96b86be..db44e09154 100644
>>> --- a/hw/ppc/spapr_events.c
>>> +++ b/hw/ppc/spapr_events.c
>>> @@ -214,6 +214,104 @@ struct hp_extended_log {
>>>       struct rtas_event_log_v6_hp hp;
>>>   } QEMU_PACKED;
>>>   +struct rtas_event_log_v6_mc {
>>> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
>>> +    struct rtas_event_log_v6_section_header hdr;
>>> +    uint32_t fru_id;
>>> +    uint32_t proc_id;
>>> +    uint8_t error_type;
>>> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
>>> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
>>> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
>>> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
>>> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
>>> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
>>> +    uint8_t sub_err_type;
>>> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
>>> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
>>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
>>> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
>>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
>>> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
>>> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
>>> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
>>> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
>>> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
>>> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
>>> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
>>> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
>>> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
>>> +    uint8_t reserved_1[6];
>>> +    uint64_t effective_address;
>>> +    uint64_t logical_address;
>>> +} QEMU_PACKED;
>>> +
>>> +struct mc_extended_log {
>>> +    struct rtas_event_log_v6 v6hdr;
>>> +    struct rtas_event_log_v6_mc mc;
>>> +} QEMU_PACKED;
>>> +
>>> +struct MC_ierror_table {
>>> +    unsigned long srr1_mask;
>>> +    unsigned long srr1_value;
>>> +    bool nip_valid; /* nip is a valid indicator of faulting address */
>>> +    uint8_t error_type;
>>> +    uint8_t error_subtype;
>>> +    unsigned int initiator;
>>> +    unsigned int severity;
>>> +};
>>> +
>>> +static const struct MC_ierror_table mc_ierror_table[] = {
>>> +{ 0x00000000081c0000, 0x0000000000040000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x0000000000080000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x00000000000c0000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x0000000000100000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x0000000000140000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x0000000000180000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
>>> +
>>> +struct MC_derror_table {
>>> +    unsigned long dsisr_value;
>>> +    bool dar_valid; /* dar is a valid indicator of faulting address */
>>> +    uint8_t error_type;
>>> +    uint8_t error_subtype;
>>> +    unsigned int initiator;
>>> +    unsigned int severity;
>>> +};
>>> +
>>> +static const struct MC_derror_table mc_derror_table[] = {
>>> +{ 0x00008000, false,
>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00004000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_UE, 
>>> RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000800, true,
>>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000400, true,
>>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000080, true,
>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before 
>>> PARITY */
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000100, true,
>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
>>> +
>>> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
>>> +
>>>   typedef enum EventClass {
>>>       EVENT_CLASS_INTERNAL_ERRORS     = 0,
>>>       EVENT_CLASS_EPOW                = 1,
>>> @@ -622,7 +720,125 @@ void 
>>> spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>>                               RTAS_LOG_V6_HP_ACTION_REMOVE, 
>>> drc_type, &drc_id);
>>>   }
>>>   -void spapr_mce_req_event(PowerPCCPU *cpu)
>>> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool 
>>> recovered,
>>> +                                        struct mc_extended_log 
>>> *ext_elog)
>>> +{
>>> +    int i;
>>> +    CPUPPCState *env = &cpu->env;
>>> +    uint32_t summary;
>>> +    uint64_t dsisr = env->spr[SPR_DSISR];
>>> +
>>> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
>>> +    if (recovered) {
>>> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
>>> +    } else {
>>> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
>>> +    }
>>> +
>>> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
>>> +        for (i = 0; i < ARRAY_SIZE(mc_derror_table); i++) {
>>> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
>>> +                continue;
>>> +            }
>>> +
>>> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
>>> +            ext_elog->mc.sub_err_type = 
>>> mc_derror_table[i].error_subtype;
>>> +            if (mc_derror_table[i].dar_valid) {
>>> +                ext_elog->mc.effective_address = 
>>> cpu_to_be64(env->spr[SPR_DAR]);
>>> +            }
>>> +
>>> +            summary |= mc_derror_table[i].initiator
>>> +                        | mc_derror_table[i].severity;
>>> +
>>> +            return summary;
>>> +        }
>>> +    } else {
>>> +        for (i = 0; i < ARRAY_SIZE(mc_ierror_table); i++) {
>>> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
>>> +                    mc_ierror_table[i].srr1_value) {
>>> +                continue;
>>> +            }
>>> +
>>> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
>>> +            ext_elog->mc.sub_err_type = 
>>> mc_ierror_table[i].error_subtype;
>>> +            if (mc_ierror_table[i].nip_valid) {
>>> +                ext_elog->mc.effective_address = 
>>> cpu_to_be64(env->nip);
>>> +            }
>>> +
>>> +            summary |= mc_ierror_table[i].initiator
>>> +                        | mc_ierror_table[i].severity;
>>> +
>>> +            return summary;
>>> +        }
>>> +    }
>>> +
>>> +    summary |= RTAS_LOG_INITIATOR_CPU;
>>> +    return summary;
>>> +}
>>> +
>>> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>> +{
>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>> +    CPUState *cs = CPU(cpu);
>>> +    uint64_t rtas_addr;
>>> +    CPUPPCState *env = &cpu->env;
>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>> +    target_ulong msr = 0;
>>> +    struct rtas_error_log log;
>>> +    struct mc_extended_log *ext_elog;
>>> +    uint32_t summary;
>>> +
>>> +    /*
>>> +     * Properly set bits in MSR before we invoke the handler.
>>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>>> +     */
>>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>>> +        msr |= (1ULL << MSR_LE);
>>> +    }
>>> +
>>> +    if (env->msr & (1ULL << MSR_SF)) {
>>> +        msr |= (1ULL << MSR_SF);
>>> +    }
>>> +
>>> +    msr |= (1ULL << MSR_ME);
>>> +
>>> +    ext_elog = g_malloc0(sizeof(*ext_elog));
>>> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
>>> +
>>> +    log.summary = cpu_to_be32(summary);
>>> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
>>> +
>>> +    spapr_init_v6hdr(&ext_elog->v6hdr);
>>> +    ext_elog->mc.hdr.section_id = 
>>> cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
>>> +    ext_elog->mc.hdr.section_length =
>>> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
>>> +    ext_elog->mc.hdr.section_version = 1;
>>> +
>>> +    /* get rtas addr from fdt */
>>> +    rtas_addr = spapr_get_rtas_addr();
>>> +    if (!rtas_addr) {
>>> +        /* Unable to fetch rtas_addr. Hence reset the guest */
>>> +        ppc_cpu_do_system_reset(cs);
>>> +        g_free(ext_elog);
>>> +        return;
>>> +    }
>>> +
>>> +    stq_be_phys(&address_space_memory, rtas_addr + 
>>> RTAS_ERROR_LOG_OFFSET,
>>> +                env->gpr[3]);
>>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
>>> +                              sizeof(env->gpr[3]), &log, sizeof(log));
>>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
>>> +                              sizeof(env->gpr[3]) + sizeof(log), 
>>> ext_elog,
>>> +                              sizeof(*ext_elog));
>>> +
>>> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
>>> +    env->msr = msr;
>>> +    env->nip = spapr->guest_machine_check_addr;
>>> +
>>> +    g_free(ext_elog);
>>> +}
>>> +
>>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>   {
>>>       SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>       CPUState *cs = CPU(cpu);
>>> @@ -656,6 +872,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu)
>>>           }
>>>       }
>>>       spapr->mc_status = cpu->vcpu_id;
>>> +
>>> +    spapr_mce_dispatch_elog(cpu, recovered);
>>>   }
>>>     static void check_exception(PowerPCCPU *cpu, SpaprMachineState 
>>> *spapr,
>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>> index 8d8d8cdfcb..2c066a372d 100644
>>> --- a/hw/ppc/spapr_rtas.c
>>> +++ b/hw/ppc/spapr_rtas.c
>>> @@ -477,6 +477,32 @@ void spapr_dt_rtas_tokens(void *fdt, int rtas)
>>>       }
>>>   }
>>>   +hwaddr spapr_get_rtas_addr(void)
>>> +{
>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>> +    int rtas_node;
>>> +    const fdt32_t *rtas_data;
>>> +    void *fdt = spapr->fdt_blob;
>>> +
>>> +    /* fetch rtas addr from fdt */
>>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>>> +    if (rtas_node < 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    rtas_data = fdt_getprop(fdt, rtas_node, "linux,rtas-base", NULL);
>>> +    if (!rtas_data) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    /*
>>> +     * We assume that the OS called RTAS instantiate-rtas, but some 
>>> other
>>> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as 
>>> of now
>>> +     * as SLOF only supports 32-bit variant.
>>> +     */
>>> +    return (hwaddr)fdt32_to_cpu(*rtas_data);
>>> +}
>>> +
>>>   static void core_rtas_register_types(void)
>>>   {
>>>       spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index 1f5eb8c856..4afa8d4d09 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -724,6 +724,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, 
>>> void *fdt, hwaddr addr);
>>>     #define RTAS_ERROR_LOG_MAX      2048
>>>   +/* Offset from rtas-base where error log is placed */
>>> +#define RTAS_ERROR_LOG_OFFSET       0x30
>>> +
>>>   #define RTAS_EVENT_SCAN_RATE    1
>>>     /* This helper should be used to encode interrupt specifiers 
>>> when the related
>>> @@ -812,7 +815,7 @@ void 
>>> spapr_clear_pending_events(SpaprMachineState *spapr);
>>>   int spapr_max_server_number(SpaprMachineState *spapr);
>>>   void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>>                         uint64_t pte0, uint64_t pte1);
>>> -void spapr_mce_req_event(PowerPCCPU *cpu);
>>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>>     /* DRC callbacks. */
>>>   void spapr_core_release(DeviceState *dev);
>>> @@ -903,4 +906,5 @@ void spapr_check_pagesize(SpaprMachineState 
>>> *spapr, hwaddr pagesize,
>>>   #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the 
>>> platform */
>>>     void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>> +hwaddr spapr_get_rtas_addr(void);
>>>   #endif /* HW_SPAPR_H */
>>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>>> index 2d8db6d832..9a902c1064 100644
>>> --- a/target/ppc/kvm.c
>>> +++ b/target/ppc/kvm.c
>>> @@ -2805,9 +2805,11 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>>     int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>>>   {
>>> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
>>> +
>>>       cpu_synchronize_state(CPU(cpu));
>>>   -    spapr_mce_req_event(cpu);
>>> +    spapr_mce_req_event(cpu, recovered);
>>>         return 0;
>>>   }




* Re: [PATCH v17 4/7] target/ppc: Build rtas error log upon an MCE
  2019-11-18 11:09       ` Ganesh
@ 2019-11-19  1:18         ` David Gibson
  0 siblings, 0 replies; 19+ messages in thread
From: David Gibson @ 2019-11-19  1:18 UTC (permalink / raw)
  To: Ganesh; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 17161 bytes --]

On Mon, Nov 18, 2019 at 04:39:16PM +0530, Ganesh wrote:
> 
> On 11/6/19 4:37 PM, Ganesh wrote:
> > 
> > On 11/4/19 9:40 PM, David Gibson wrote:
> > > On Thu, Oct 24, 2019 at 01:13:04PM +0530, Ganesh Goudar wrote:
> > > > From: Aravinda Prasad <arawinda.p@gmail.com>
> > > > 
> > > > Upon a machine check exception (MCE) in a guest address space,
> > > > KVM causes a guest exit to enable QEMU to build and pass the
> > > > error to the guest in the PAPR defined rtas error log format.
> > > > 
> > > > This patch builds the rtas error log, copies it to the rtas_addr
> > > > and then invokes the guest registered machine check handler. The
> > > > handler in the guest takes suitable action(s) depending on the type
> > > > and criticality of the error. For example, if an error is
> > > > unrecoverable memory corruption in an application inside the
> > > > guest, then the guest kernel sends a SIGBUS to the application.
> > > > For recoverable errors, the guest performs recovery actions and
> > > > logs the error.
> > > > 
> > > > [Assume SLOF has allocated enough room for rtas error log]
> > > Is that correct with the SLOF image currently included in qemu?
> > Yes
> > > 
> > > Apart from that detail,
> > > 
> > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Hi David, Please see if this patch series can be merged to 4.2.

Sorry, we're much too late for 4.2 now.  I am hoping to merge it for
5.0, the next release.

> > > 
> > > > Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> > > > Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
> > > 
> > > 
> > > > ---
> > > >   hw/ppc/spapr_events.c  | 220
> > > > ++++++++++++++++++++++++++++++++++++++++-
> > > >   hw/ppc/spapr_rtas.c    |  26 +++++
> > > >   include/hw/ppc/spapr.h |   6 +-
> > > >   target/ppc/kvm.c       |   4 +-
> > > >   4 files changed, 253 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > > > index 0ce96b86be..db44e09154 100644
> > > > --- a/hw/ppc/spapr_events.c
> > > > +++ b/hw/ppc/spapr_events.c
> > > > @@ -214,6 +214,104 @@ struct hp_extended_log {
> > > >       struct rtas_event_log_v6_hp hp;
> > > >   } QEMU_PACKED;
> > > >   +struct rtas_event_log_v6_mc {
> > > > +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> > > > +    struct rtas_event_log_v6_section_header hdr;
> > > > +    uint32_t fru_id;
> > > > +    uint32_t proc_id;
> > > > +    uint8_t error_type;
> > > > +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> > > > +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> > > > +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> > > > +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> > > > +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> > > > +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> > > > +    uint8_t sub_err_type;
> > > > +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> > > > +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> > > > +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> > > > +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> > > > +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> > > > +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> > > > +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> > > > +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> > > > +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> > > > +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> > > > +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> > > > +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> > > > +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> > > > +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> > > > +    uint8_t reserved_1[6];
> > > > +    uint64_t effective_address;
> > > > +    uint64_t logical_address;
> > > > +} QEMU_PACKED;
> > > > +
> > > > +struct mc_extended_log {
> > > > +    struct rtas_event_log_v6 v6hdr;
> > > > +    struct rtas_event_log_v6_mc mc;
> > > > +} QEMU_PACKED;
> > > > +
> > > > +struct MC_ierror_table {
> > > > +    unsigned long srr1_mask;
> > > > +    unsigned long srr1_value;
> > > > +    bool nip_valid; /* nip is a valid indicator of faulting address */
> > > > +    uint8_t error_type;
> > > > +    uint8_t error_subtype;
> > > > +    unsigned int initiator;
> > > > +    unsigned int severity;
> > > > +};
> > > > +
> > > > +static const struct MC_ierror_table mc_ierror_table[] = {
> > > > +{ 0x00000000081c0000, 0x0000000000040000, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00000000081c0000, 0x0000000000080000, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00000000081c0000, 0x00000000000c0000, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00000000081c0000, 0x0000000000100000, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00000000081c0000, 0x0000000000140000, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00000000081c0000, 0x0000000000180000, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
> > > > +
> > > > +struct MC_derror_table {
> > > > +    unsigned long dsisr_value;
> > > > +    bool dar_valid; /* dar is a valid indicator of faulting address */
> > > > +    uint8_t error_type;
> > > > +    uint8_t error_subtype;
> > > > +    unsigned int initiator;
> > > > +    unsigned int severity;
> > > > +};
> > > > +
> > > > +static const struct MC_derror_table mc_derror_table[] = {
> > > > +{ 0x00008000, false,
> > > > +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00004000, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_UE,
> > > > RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00000800, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00000400, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00000080, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /*
> > > > Before PARITY */
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > > > +{ 0x00000100, true,
> > > > +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> > > > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, } };
> > > > +
> > > > +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> > > > +
> > > >   typedef enum EventClass {
> > > >       EVENT_CLASS_INTERNAL_ERRORS     = 0,
> > > >       EVENT_CLASS_EPOW                = 1,
> > > > @@ -622,7 +720,125 @@ void
> > > > spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> > > >                               RTAS_LOG_V6_HP_ACTION_REMOVE,
> > > > drc_type, &drc_id);
> > > >   }
> > > >   -void spapr_mce_req_event(PowerPCCPU *cpu)
> > > > +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool
> > > > recovered,
> > > > +                                        struct mc_extended_log
> > > > *ext_elog)
> > > > +{
> > > > +    int i;
> > > > +    CPUPPCState *env = &cpu->env;
> > > > +    uint32_t summary;
> > > > +    uint64_t dsisr = env->spr[SPR_DSISR];
> > > > +
> > > > +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> > > > +    if (recovered) {
> > > > +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> > > > +    } else {
> > > > +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> > > > +    }
> > > > +
> > > > +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> > > > +        for (i = 0; i < ARRAY_SIZE(mc_derror_table); i++) {
> > > > +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> > > > +                continue;
> > > > +            }
> > > > +
> > > > +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> > > > +            ext_elog->mc.sub_err_type =
> > > > mc_derror_table[i].error_subtype;
> > > > +            if (mc_derror_table[i].dar_valid) {
> > > > +                ext_elog->mc.effective_address =
> > > > cpu_to_be64(env->spr[SPR_DAR]);
> > > > +            }
> > > > +
> > > > +            summary |= mc_derror_table[i].initiator
> > > > +                        | mc_derror_table[i].severity;
> > > > +
> > > > +            return summary;
> > > > +        }
> > > > +    } else {
> > > > +        for (i = 0; i < ARRAY_SIZE(mc_ierror_table); i++) {
> > > > +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> > > > +                    mc_ierror_table[i].srr1_value) {
> > > > +                continue;
> > > > +            }
> > > > +
> > > > +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> > > > +            ext_elog->mc.sub_err_type =
> > > > mc_ierror_table[i].error_subtype;
> > > > +            if (mc_ierror_table[i].nip_valid) {
> > > > +                ext_elog->mc.effective_address =
> > > > cpu_to_be64(env->nip);
> > > > +            }
> > > > +
> > > > +            summary |= mc_ierror_table[i].initiator
> > > > +                        | mc_ierror_table[i].severity;
> > > > +
> > > > +            return summary;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    summary |= RTAS_LOG_INITIATOR_CPU;
> > > > +    return summary;
> > > > +}
> > > > +
> > > > +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> > > > +{
> > > > +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > > > +    CPUState *cs = CPU(cpu);
> > > > +    uint64_t rtas_addr;
> > > > +    CPUPPCState *env = &cpu->env;
> > > > +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> > > > +    target_ulong msr = 0;
> > > > +    struct rtas_error_log log;
> > > > +    struct mc_extended_log *ext_elog;
> > > > +    uint32_t summary;
> > > > +
> > > > +    /*
> > > > +     * Properly set bits in MSR before we invoke the handler.
> > > > +     * SRR0/1, DAR and DSISR are properly set by KVM
> > > > +     */
> > > > +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> > > > +        msr |= (1ULL << MSR_LE);
> > > > +    }
> > > > +
> > > > +    if (env->msr & (1ULL << MSR_SF)) {
> > > > +        msr |= (1ULL << MSR_SF);
> > > > +    }
> > > > +
> > > > +    msr |= (1ULL << MSR_ME);
> > > > +
> > > > +    ext_elog = g_malloc0(sizeof(*ext_elog));
> > > > +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> > > > +
> > > > +    log.summary = cpu_to_be32(summary);
> > > > +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
> > > > +
> > > > +    spapr_init_v6hdr(&ext_elog->v6hdr);
> > > > +    ext_elog->mc.hdr.section_id =
> > > > cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> > > > +    ext_elog->mc.hdr.section_length =
> > > > +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> > > > +    ext_elog->mc.hdr.section_version = 1;
> > > > +
> > > > +    /* get rtas addr from fdt */
> > > > +    rtas_addr = spapr_get_rtas_addr();
> > > > +    if (!rtas_addr) {
> > > > +        /* Unable to fetch rtas_addr. Hence reset the guest */
> > > > +        ppc_cpu_do_system_reset(cs);
> > > > +        g_free(ext_elog);
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    stq_be_phys(&address_space_memory, rtas_addr +
> > > > RTAS_ERROR_LOG_OFFSET,
> > > > +                env->gpr[3]);
> > > > +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
> > > > +                              sizeof(env->gpr[3]), &log, sizeof(log));
> > > > +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET +
> > > > +                              sizeof(env->gpr[3]) +
> > > > sizeof(log), ext_elog,
> > > > +                              sizeof(*ext_elog));
> > > > +
> > > > +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
> > > > +    env->msr = msr;
> > > > +    env->nip = spapr->guest_machine_check_addr;
> > > > +
> > > > +    g_free(ext_elog);
> > > > +}
> > > > +
> > > > +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> > > >   {
> > > >       SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > > >       CPUState *cs = CPU(cpu);
> > > > @@ -656,6 +872,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu)
> > > >           }
> > > >       }
> > > >       spapr->mc_status = cpu->vcpu_id;
> > > > +
> > > > +    spapr_mce_dispatch_elog(cpu, recovered);
> > > >   }
> > > >     static void check_exception(PowerPCCPU *cpu,
> > > > SpaprMachineState *spapr,
> > > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > > > index 8d8d8cdfcb..2c066a372d 100644
> > > > --- a/hw/ppc/spapr_rtas.c
> > > > +++ b/hw/ppc/spapr_rtas.c
> > > > @@ -477,6 +477,32 @@ void spapr_dt_rtas_tokens(void *fdt, int rtas)
> > > >       }
> > > >   }
> > > >   +hwaddr spapr_get_rtas_addr(void)
> > > > +{
> > > > +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > > > +    int rtas_node;
> > > > +    const fdt32_t *rtas_data;
> > > > +    void *fdt = spapr->fdt_blob;
> > > > +
> > > > +    /* fetch rtas addr from fdt */
> > > > +    rtas_node = fdt_path_offset(fdt, "/rtas");
> > > > +    if (rtas_node < 0) {
> > > > +        return 0;
> > > > +    }
> > > > +
> > > > +    rtas_data = fdt_getprop(fdt, rtas_node, "linux,rtas-base", NULL);
> > > > +    if (!rtas_data) {
> > > > +        return 0;
> > > > +    }
> > > > +
> > > > +    /*
> > > > +     * We assume that the OS called RTAS instantiate-rtas, but
> > > > some other
> > > > +     * OS might call RTAS instantiate-rtas-64 instead. This
> > > > fine as of now
> > > > +     * as SLOF only supports 32-bit variant.
> > > > +     */
> > > > +    return (hwaddr)fdt32_to_cpu(*rtas_data);
> > > > +}
> > > > +
> > > >   static void core_rtas_register_types(void)
> > > >   {
> > > >       spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> > > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > > index 1f5eb8c856..4afa8d4d09 100644
> > > > --- a/include/hw/ppc/spapr.h
> > > > +++ b/include/hw/ppc/spapr.h
> > > > @@ -724,6 +724,9 @@ void spapr_load_rtas(SpaprMachineState
> > > > *spapr, void *fdt, hwaddr addr);
> > > >     #define RTAS_ERROR_LOG_MAX      2048
> > > >   +/* Offset from rtas-base where error log is placed */
> > > > +#define RTAS_ERROR_LOG_OFFSET       0x30
> > > > +
> > > >   #define RTAS_EVENT_SCAN_RATE    1
> > > >     /* This helper should be used to encode interrupt specifiers
> > > > when the related
> > > > @@ -812,7 +815,7 @@ void
> > > > spapr_clear_pending_events(SpaprMachineState *spapr);
> > > >   int spapr_max_server_number(SpaprMachineState *spapr);
> > > >   void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> > > >                         uint64_t pte0, uint64_t pte1);
> > > > -void spapr_mce_req_event(PowerPCCPU *cpu);
> > > > +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> > > >     /* DRC callbacks. */
> > > >   void spapr_core_release(DeviceState *dev);
> > > > @@ -903,4 +906,5 @@ void spapr_check_pagesize(SpaprMachineState
> > > > *spapr, hwaddr pagesize,
> > > >   #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on
> > > > the platform */
> > > >     void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> > > > +hwaddr spapr_get_rtas_addr(void);
> > > >   #endif /* HW_SPAPR_H */
> > > > diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> > > > index 2d8db6d832..9a902c1064 100644
> > > > --- a/target/ppc/kvm.c
> > > > +++ b/target/ppc/kvm.c
> > > > @@ -2805,9 +2805,11 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> > > >     int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
> > > >   {
> > > > +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> > > > +
> > > >       cpu_synchronize_state(CPU(cpu));
> > > >   -    spapr_mce_req_event(cpu);
> > > > +    spapr_mce_req_event(cpu, recovered);
> > > >         return 0;
> > > >   }
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH v17 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-10-24  7:43 ` [PATCH v17 5/7] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Ganesh Goudar
@ 2019-11-19  2:39   ` David Gibson
  2019-12-05  4:55     ` Ganesh
  0 siblings, 1 reply; 19+ messages in thread
From: David Gibson @ 2019-11-19  2:39 UTC (permalink / raw)
  To: Ganesh Goudar; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 7130 bytes --]

On Thu, Oct 24, 2019 at 01:13:05PM +0530, Ganesh Goudar wrote:
> From: Aravinda Prasad <arawinda.p@gmail.com>
> 
> This patch adds support in QEMU to handle "ibm,nmi-register"
> and "ibm,nmi-interlock" RTAS calls.
> 
> The machine check notification address is saved when the
> OS issues "ibm,nmi-register" RTAS call.
> 
> This patch also handles the case when multiple processors
> experience machine check at or about the same time by
> handling "ibm,nmi-interlock" call. In such cases, as per
> PAPR, subsequent processors serialize waiting for the first
> processor to issue the "ibm,nmi-interlock" call. The second
> processor that also received a machine check error waits
> till the first processor is done reading the error log.
> The first processor issues "ibm,nmi-interlock" call
> when the error log is consumed.
> 
> [Move fwnmi registeration to .apply hook]

s/registeration/registration/
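
On the substance, the serialization described in the commit message boils
down to roughly the following (a sketch, not the patch itself; the wait
helper is the wrapper patch 1/7 adds, written here as
qemu_cond_wait_iothread(), and mc_status holds the vcpu_id of the CPU whose
error log the guest is still reading, or -1):

    /* Delivery side (QEMU, main loop mutex held), per machine check: */
    while (spapr->mc_status != -1) {
        /* Another vCPU's MCE is still outstanding: wait for its interlock */
        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
    }
    spapr->mc_status = cpu->vcpu_id;        /* claim the interlock */
    spapr_mce_dispatch_elog(cpu, recovered);

    /* "ibm,nmi-interlock", once the guest has consumed the error log: */
    spapr->mc_status = -1;
    qemu_cond_signal(&spapr->mc_delivery_cond);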

> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
> ---
>  hw/ppc/spapr_caps.c    |  9 +++++--
>  hw/ppc/spapr_rtas.c    | 57 ++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |  5 +++-
>  3 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 976d709210..1675ebd45e 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -509,9 +509,14 @@ static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>           * of software injected faults like duplicate SLBs).
>           */
>          warn_report("Firmware Assisted Non-Maskable Interrupts not supported in TCG");

This logic still isn't quite right.  To start with, the warn_report()
above possibly wants to be more weakly worded.  With TCG, FWNMI won't
generally *do* anything, and there are some edge cases where the
behaviour is arguably incorrect.  However, there's no reason we can't
make the RTAS calls work basically as expected, and in almost all cases
things will behave correctly - at least in the case where no fwnmi
events are delivered...

> -    } else if (kvm_enabled() && (kvmppc_set_fwnmi() != 0)) {
> -        error_setg(errp,
> +    } else if (kvm_enabled()) {
> +        if (!kvmppc_set_fwnmi()) {
> +            /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> +            spapr_fwnmi_register();

..but here you only register the RTAS calls in the KVM case, which
breaks that.  If there really is a strong reason to do this, then the
warn_report() above should be error_setg() and fail the apply.
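
That is, something along these lines (an untested sketch of the suggested
shape, not the code as posted, reusing the helpers this series introduces):

    static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
                                    Error **errp)
    {
        if (!val) {
            return; /* Disabled by the user */
        }

        if (tcg_enabled()) {
            /* FWNMI is mostly a no-op under TCG, but the RTAS calls can
             * still behave as expected, so only warn (weaker wording). */
            warn_report("Firmware Assisted Non-Maskable Interrupts may not "
                        "be fully functional in TCG");
        } else if (kvm_enabled() && kvmppc_set_fwnmi() != 0) {
            error_setg(errp, "Firmware Assisted Non-Maskable Interrupts "
                       "not supported by KVM, try cap-fwnmi-mce=off");
            return;
        }

        /* Register ibm,nmi-register / ibm,nmi-interlock in all cases */
        spapr_fwnmi_register();
    }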

> +        } else {
> +            error_setg(errp,
>  "Firmware Assisted Non-Maskable Interrupts not supported by KVM, try cap-fwnmi-mce=off");
> +        }
>      }
>  }
>  
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 2c066a372d..0328b1f341 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -400,6 +400,55 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      rtas_st(rets, 1, 100);
>  }
>  
> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> +                                  SpaprMachineState *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args,
> +                                  uint32_t nret, target_ulong rets)
> +{
> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> +
> +    if (!rtas_addr) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);

Actually, since you explicitly test for the cap being enabled here,
there's no reason not to *always* register this RTAS call.  Also this
test for the feature flag should go first, before delving into the
device tree for the RTAS address.
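
Concretely, the reordering being suggested could look something like this
(same handler as quoted, with the cap check hoisted above the device-tree
lookup):

    static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
                                      SpaprMachineState *spapr,
                                      uint32_t token, uint32_t nargs,
                                      target_ulong args,
                                      uint32_t nret, target_ulong rets)
    {
        hwaddr rtas_addr;

        /* Check the feature flag first... */
        if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
            rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
            return;
        }

        /* ...and only then go digging in the device tree for RTAS */
        rtas_addr = spapr_get_rtas_addr();
        if (!rtas_addr) {
            rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
            return;
        }

        spapr->guest_machine_check_addr = rtas_ld(args, 1);
        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
    }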

> +        return;
> +    }
> +
> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    if (spapr->guest_machine_check_addr == -1) {
> +        /* NMI register not called */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +        return;
> +    }
> +
> +    if (spapr->mc_status != cpu->vcpu_id) {
> +        /* The vCPU that hit the NMI should invoke "ibm,nmi-interlock" */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +        return;
> +    }
> +
> +    /*
> +     * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> +     * hence unset mc_status.
> +     */
> +    spapr->mc_status = -1;
> +    qemu_cond_signal(&spapr->mc_delivery_cond);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -503,6 +552,14 @@ hwaddr spapr_get_rtas_addr(void)
>      return (hwaddr)fdt32_to_cpu(*rtas_data);
>  }
>  
> +void spapr_fwnmi_register(void)
> +{
> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> +                        rtas_ibm_nmi_register);
> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> +                        rtas_ibm_nmi_interlock);
> +}
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 4afa8d4d09..86f0fc8fdd 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -653,8 +653,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>  #define RTAS_IBM_SUSPEND_ME                     (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2B)
> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2C)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2B)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2D)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> @@ -907,4 +909,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>  hwaddr spapr_get_rtas_addr(void);
> +void spapr_fwnmi_register(void);
>  #endif /* HW_SPAPR_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v17 6/7] migration: Include migration support for machine check handling
  2019-10-24  7:43 ` [PATCH v17 6/7] migration: Include migration support for machine check handling Ganesh Goudar
@ 2019-11-19  2:45   ` David Gibson
  2019-12-05  5:09     ` Ganesh
  0 siblings, 1 reply; 19+ messages in thread
From: David Gibson @ 2019-11-19  2:45 UTC (permalink / raw)
  To: Ganesh Goudar; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc

On Thu, Oct 24, 2019 at 01:13:06PM +0530, Ganesh Goudar wrote:
> From: Aravinda Prasad <arawinda.p@gmail.com>
> 
> This patch includes migration support for machine check
> handling. Especially this patch blocks VM migration
> requests until the machine check error handling is
> complete as these errors are specific to the source
> hardware and is irrelevant on the target hardware.
> 
> [Do not set FWNMI cap in post_load, now its done in .apply hook]
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
> ---
>  hw/ppc/spapr.c         | 41 +++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_events.c  | 16 +++++++++++++++-
>  hw/ppc/spapr_rtas.c    |  2 ++
>  include/hw/ppc/spapr.h |  2 ++
>  4 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 346ec5ba6c..e0d0f95ec0 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -46,6 +46,7 @@
>  #include "migration/qemu-file-types.h"
>  #include "migration/global_state.h"
>  #include "migration/register.h"
> +#include "migration/blocker.h"
>  #include "mmu-hash64.h"
>  #include "mmu-book3s-v3.h"
>  #include "cpu-models.h"
> @@ -1751,6 +1752,8 @@ static void spapr_machine_reset(MachineState *machine)
>  
>      /* Signal all vCPUs waiting on this condition */
>      qemu_cond_broadcast(&spapr->mc_delivery_cond);
> +
> +    migrate_del_blocker(spapr->fwnmi_migration_blocker);
>  }
>  
>  static void spapr_create_nvram(SpaprMachineState *spapr)
> @@ -2041,6 +2044,43 @@ static const VMStateDescription vmstate_spapr_dtb = {
>      },
>  };
>  
> +static bool spapr_fwnmi_needed(void *opaque)
> +{
> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> +
> +    return spapr->guest_machine_check_addr != -1;
> +}
> +
> +static int spapr_fwnmi_pre_save(void *opaque)
> +{
> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> +
> +    /*
> +     * With -only-migratable QEMU option, we cannot block migration.
> +     * Hence check if machine check handling is in progress and print
> +     * a warning message.
> +     */

IIUC the situation warned about below could also occur when the fwnmi
event arrives after a migration has started, but before it completes,
not just with -only-migratable.  Is that correct?

> +    if (spapr->mc_status != -1) {
> +        warn_report("A machine check is being handled during migration. The"
> +                "handler may run and log hardware error on the destination");
> +    }
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_spapr_machine_check = {
> +    .name = "spapr_machine_check",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_fwnmi_needed,
> +    .pre_save = spapr_fwnmi_pre_save,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
> @@ -2075,6 +2115,7 @@ static const VMStateDescription vmstate_spapr = {
>          &vmstate_spapr_cap_large_decr,
>          &vmstate_spapr_cap_ccf_assist,
>          &vmstate_spapr_cap_fwnmi,
> +        &vmstate_spapr_machine_check,
>          NULL
>      }
>  };
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index db44e09154..30d9371c88 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -43,6 +43,7 @@
>  #include "qemu/main-loop.h"
>  #include "hw/ppc/spapr_ovec.h"
>  #include <libfdt.h>
> +#include "migration/blocker.h"
>  
>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>  #define   RTAS_LOG_VERSION_6                    0x06000000
> @@ -842,6 +843,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>      CPUState *cs = CPU(cpu);
> +    int ret;
> +    Error *local_err = NULL;
>  
>      if (spapr->guest_machine_check_addr == -1) {
>          /*
> @@ -871,8 +874,19 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>              return;
>          }
>      }
> -    spapr->mc_status = cpu->vcpu_id;
>  
> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> +    if (ret == -EBUSY) {
> +        /*
> +         * We don't want to abort so we let the migration to continue.
> +         * In a rare case, the machine check handler will run on the target.
> +         * Though this is not preferable, it is better than aborting
> +         * the migration or killing the VM.
> +         */
> +        warn_report_err(local_err);

I suspect the error message in local_err won't be particularly
meaningful on its own.  Perhaps you need to add a prefix to clarify
that the problem is you've received a fwnmi after migration has
commenced?
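
One way to do that, sketched here on top of the quoted hunk
(error_prepend() is assumed to be available, as in other QEMU error
paths, and the wording is only a suggestion):

    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
    if (ret == -EBUSY) {
        /*
         * The blocker cannot be added because a migration is already in
         * progress; make clear this is an fwnmi racing with migration,
         * not a generic blocker failure, before printing the warning.
         */
        error_prepend(&local_err,
                      "Received a fwnmi while migration was in progress; "
                      "the machine check handler may run on the destination: ");
        warn_report_err(local_err);
    }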

> +    }
> +
> +    spapr->mc_status = cpu->vcpu_id;
>      spapr_mce_dispatch_elog(cpu, recovered);
>  }
>  
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 0328b1f341..c78d96ee7e 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -50,6 +50,7 @@
>  #include "hw/ppc/fdt.h"
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
> +#include "migration/blocker.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -446,6 +447,7 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>       */
>      spapr->mc_status = -1;
>      qemu_cond_signal(&spapr->mc_delivery_cond);
> +    migrate_del_blocker(spapr->fwnmi_migration_blocker);

Oh... damn.  I suggested using a static fwnmi_migration_blocker
instance, but I just realized there's a problem with it.

If we do receive multiple fwnmi events on different cpus at roughly
the same time, this will break: I think we'll try to add the same
migration blocker instance multiple times, which won't be good.  Even
if that doesn't do anything *too* bad, then we'll unblock the
migration on the first interlock, rather than waiting for all pending
fwnmi events to complete.

>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 86f0fc8fdd..290abd6328 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -215,6 +215,8 @@ struct SpaprMachineState {
>  
>      unsigned gpu_numa_id;
>      SpaprTpmProxy *tpm_proxy;
> +
> +    Error *fwnmi_migration_blocker;
>  };
>  
>  #define H_SUCCESS         0

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v17 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-11-19  2:39   ` [PATCH v17 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" " David Gibson
@ 2019-12-05  4:55     ` Ganesh
  2019-12-11 13:50       ` Ganesh
  0 siblings, 1 reply; 19+ messages in thread
From: Ganesh @ 2019-12-05  4:55 UTC (permalink / raw)
  To: David Gibson; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc


On 11/19/19 8:09 AM, David Gibson wrote:
> On Thu, Oct 24, 2019 at 01:13:05PM +0530, Ganesh Goudar wrote:
>> From: Aravinda Prasad <arawinda.p@gmail.com>
>>
>> This patch adds support in QEMU to handle "ibm,nmi-register"
>> and "ibm,nmi-interlock" RTAS calls.
>>
>> The machine check notification address is saved when the
>> OS issues "ibm,nmi-register" RTAS call.
>>
>> This patch also handles the case when multiple processors
>> experience machine check at or about the same time by
>> handling "ibm,nmi-interlock" call. In such cases, as per
>> PAPR, subsequent processors serialize waiting for the first
>> processor to issue the "ibm,nmi-interlock" call. The second
>> processor that also received a machine check error waits
>> till the first processor is done reading the error log.
>> The first processor issues "ibm,nmi-interlock" call
>> when the error log is consumed.
>>
>> [Move fwnmi registeration to .apply hook]
> s/registeration/registration/
Thanks
>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
>> ---
>>   hw/ppc/spapr_caps.c    |  9 +++++--
>>   hw/ppc/spapr_rtas.c    | 57 ++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr.h |  5 +++-
>>   3 files changed, 68 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index 976d709210..1675ebd45e 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -509,9 +509,14 @@ static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>            * of software injected faults like duplicate SLBs).
>>            */
>>           warn_report("Firmware Assisted Non-Maskable Interrupts not supported in TCG");
> This logic still isn't quite right.  To start with the warn_report()
> above possible wants to be more weakly worded.  With TCG, FWNMI won't
> generally *do* anything, and there are some edge cases where the
> behaviour is arguably incorrect.  However there's no reason we can't
> make the RTAS calls work basically as expected and in almost all cases
> things will behave correctly - at least according to the case where no
> fwnmi events are delivered...
ok
>
>> -    } else if (kvm_enabled() && (kvmppc_set_fwnmi() != 0)) {
>> -        error_setg(errp,
>> +    } else if (kvm_enabled()) {
>> +        if (!kvmppc_set_fwnmi()) {
>> +            /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>> +            spapr_fwnmi_register();
> ..but here you only register the RTAS calls in the KVM case, which
> breaks that.  If there really is a strong reason to do this, then the
> warn_report() above should be error_setg() and fail the apply.
>
>> +        } else {
>> +            error_setg(errp,
>>   "Firmware Assisted Non-Maskable Interrupts not supported by KVM, try cap-fwnmi-mce=off");
>> +        }
>>       }
>>   }
>>   
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 2c066a372d..0328b1f341 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -400,6 +400,55 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>       rtas_st(rets, 1, 100);
>>   }
>>   
>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>> +                                  SpaprMachineState *spapr,
>> +                                  uint32_t token, uint32_t nargs,
>> +                                  target_ulong args,
>> +                                  uint32_t nret, target_ulong rets)
>> +{
>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>> +
>> +    if (!rtas_addr) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> Actually, since you explicitly test for the cap being enabled here,
> there's no reason not to *always* register this RTAS call.  Also this
> test for the feature flag should go first, before delving into the
> device tree for the RTAS address.
Sure, will do
>
>> +        return;
>> +    }
>> +
>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
>> +    if (spapr->guest_machine_check_addr == -1) {
>> +        /* NMI register not called */
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +        return;
>> +    }
>> +
>> +    if (spapr->mc_status != cpu->vcpu_id) {
>> +        /* The vCPU that hit the NMI should invoke "ibm,nmi-interlock" */
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>> +     * hence unset mc_status.
>> +     */
>> +    spapr->mc_status = -1;
>> +    qemu_cond_signal(&spapr->mc_delivery_cond);
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>>   static struct rtas_call {
>>       const char *name;
>>       spapr_rtas_fn fn;
>> @@ -503,6 +552,14 @@ hwaddr spapr_get_rtas_addr(void)
>>       return (hwaddr)fdt32_to_cpu(*rtas_data);
>>   }
>>   
>> +void spapr_fwnmi_register(void)
>> +{
>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>> +                        rtas_ibm_nmi_register);
>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>> +                        rtas_ibm_nmi_interlock);
>> +}
>> +
>>   static void core_rtas_register_types(void)
>>   {
>>       spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 4afa8d4d09..86f0fc8fdd 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -653,8 +653,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>   #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>   #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>>   #define RTAS_IBM_SUSPEND_ME                     (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2B)
>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2C)
>>   
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2B)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2D)
>>   
>>   /* RTAS ibm,get-system-parameter token values */
>>   #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>> @@ -907,4 +909,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>   
>>   void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>   hwaddr spapr_get_rtas_addr(void);
>> +void spapr_fwnmi_register(void);
>>   #endif /* HW_SPAPR_H */



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v17 6/7] migration: Include migration support for machine check handling
  2019-11-19  2:45   ` David Gibson
@ 2019-12-05  5:09     ` Ganesh
  2019-12-06  1:22       ` David Gibson
  0 siblings, 1 reply; 19+ messages in thread
From: Ganesh @ 2019-12-05  5:09 UTC (permalink / raw)
  To: David Gibson; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc


On 11/19/19 8:15 AM, David Gibson wrote:
> On Thu, Oct 24, 2019 at 01:13:06PM +0530, Ganesh Goudar wrote:
>> From: Aravinda Prasad <arawinda.p@gmail.com>
>>
>> This patch includes migration support for machine check
>> handling. Especially this patch blocks VM migration
>> requests until the machine check error handling is
>> complete as these errors are specific to the source
>> hardware and is irrelevant on the target hardware.
>>
>> [Do not set FWNMI cap in post_load, now its done in .apply hook]
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
>> ---
>>   hw/ppc/spapr.c         | 41 +++++++++++++++++++++++++++++++++++++++++
>>   hw/ppc/spapr_events.c  | 16 +++++++++++++++-
>>   hw/ppc/spapr_rtas.c    |  2 ++
>>   include/hw/ppc/spapr.h |  2 ++
>>   4 files changed, 60 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 346ec5ba6c..e0d0f95ec0 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -46,6 +46,7 @@
>>   #include "migration/qemu-file-types.h"
>>   #include "migration/global_state.h"
>>   #include "migration/register.h"
>> +#include "migration/blocker.h"
>>   #include "mmu-hash64.h"
>>   #include "mmu-book3s-v3.h"
>>   #include "cpu-models.h"
>> @@ -1751,6 +1752,8 @@ static void spapr_machine_reset(MachineState *machine)
>>   
>>       /* Signal all vCPUs waiting on this condition */
>>       qemu_cond_broadcast(&spapr->mc_delivery_cond);
>> +
>> +    migrate_del_blocker(spapr->fwnmi_migration_blocker);
>>   }
>>   
>>   static void spapr_create_nvram(SpaprMachineState *spapr)
>> @@ -2041,6 +2044,43 @@ static const VMStateDescription vmstate_spapr_dtb = {
>>       },
>>   };
>>   
>> +static bool spapr_fwnmi_needed(void *opaque)
>> +{
>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
>> +
>> +    return spapr->guest_machine_check_addr != -1;
>> +}
>> +
>> +static int spapr_fwnmi_pre_save(void *opaque)
>> +{
>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
>> +
>> +    /*
>> +     * With -only-migratable QEMU option, we cannot block migration.
>> +     * Hence check if machine check handling is in progress and print
>> +     * a warning message.
>> +     */
> IIUC the logic below this could also occur in the case where the fwnmi
> event occurs after a migration has started, but before it completes,
> not just with -only-migratable.  Is that correct?
Yes
>
>> +    if (spapr->mc_status != -1) {
>> +        warn_report("A machine check is being handled during migration. The"
>> +                "handler may run and log hardware error on the destination");
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static const VMStateDescription vmstate_spapr_machine_check = {
>> +    .name = "spapr_machine_check",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = spapr_fwnmi_needed,
>> +    .pre_save = spapr_fwnmi_pre_save,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>   static const VMStateDescription vmstate_spapr = {
>>       .name = "spapr",
>>       .version_id = 3,
>> @@ -2075,6 +2115,7 @@ static const VMStateDescription vmstate_spapr = {
>>           &vmstate_spapr_cap_large_decr,
>>           &vmstate_spapr_cap_ccf_assist,
>>           &vmstate_spapr_cap_fwnmi,
>> +        &vmstate_spapr_machine_check,
>>           NULL
>>       }
>>   };
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index db44e09154..30d9371c88 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -43,6 +43,7 @@
>>   #include "qemu/main-loop.h"
>>   #include "hw/ppc/spapr_ovec.h"
>>   #include <libfdt.h>
>> +#include "migration/blocker.h"
>>   
>>   #define RTAS_LOG_VERSION_MASK                   0xff000000
>>   #define   RTAS_LOG_VERSION_6                    0x06000000
>> @@ -842,6 +843,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>   {
>>       SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>       CPUState *cs = CPU(cpu);
>> +    int ret;
>> +    Error *local_err = NULL;
>>   
>>       if (spapr->guest_machine_check_addr == -1) {
>>           /*
>> @@ -871,8 +874,19 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>               return;
>>           }
>>       }
>> -    spapr->mc_status = cpu->vcpu_id;
>>   
>> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
>> +    if (ret == -EBUSY) {
>> +        /*
>> +         * We don't want to abort so we let the migration to continue.
>> +         * In a rare case, the machine check handler will run on the target.
>> +         * Though this is not preferable, it is better than aborting
>> +         * the migration or killing the VM.
>> +         */
>> +        warn_report_err(local_err);
> I suspect the error message in local_err won't be particularly
> meaningful on its own.  Perhaps you need to add a prefix to clarify
> that the problem is you've received a fwnmi after migration has
> commenced?
ok
>
>> +    }
>> +
>> +    spapr->mc_status = cpu->vcpu_id;
>>       spapr_mce_dispatch_elog(cpu, recovered);
>>   }
>>   
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 0328b1f341..c78d96ee7e 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -50,6 +50,7 @@
>>   #include "hw/ppc/fdt.h"
>>   #include "target/ppc/mmu-hash64.h"
>>   #include "target/ppc/mmu-book3s-v3.h"
>> +#include "migration/blocker.h"
>>   
>>   static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                      uint32_t token, uint32_t nargs,
>> @@ -446,6 +447,7 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>        */
>>       spapr->mc_status = -1;
>>       qemu_cond_signal(&spapr->mc_delivery_cond);
>> +    migrate_del_blocker(spapr->fwnmi_migration_blocker);
> Oh... damn.  I suggested using a static fwnmi_migration_blocker
> instance, but I just realized there's a problem with it.
>
> If we do receive multiple fwnmi events on different cpus at roughly
> the same time, this will break: I think we'll try to add the same
> migration blocker instance multiple times, which won't be good.  Even
> if that doesn't do anything *too* bad, then we'll unblock the
> migration on the first interlock, rather than waiting for all pending
> fwnmi events to complete.
Ok, I'm not sure how to handle this; I'll look into it.
>
>>       rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>   }
>>   
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 86f0fc8fdd..290abd6328 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -215,6 +215,8 @@ struct SpaprMachineState {
>>   
>>       unsigned gpu_numa_id;
>>       SpaprTpmProxy *tpm_proxy;
>> +
>> +    Error *fwnmi_migration_blocker;
>>   };
>>   
>>   #define H_SUCCESS         0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v17 6/7] migration: Include migration support for machine check handling
  2019-12-05  5:09     ` Ganesh
@ 2019-12-06  1:22       ` David Gibson
  2019-12-06  5:45         ` Ganesh
  0 siblings, 1 reply; 19+ messages in thread
From: David Gibson @ 2019-12-06  1:22 UTC (permalink / raw)
  To: Ganesh; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc

On Thu, Dec 05, 2019 at 10:39:29AM +0530, Ganesh wrote:
> 
> On 11/19/19 8:15 AM, David Gibson wrote:
> > On Thu, Oct 24, 2019 at 01:13:06PM +0530, Ganesh Goudar wrote:
> > > From: Aravinda Prasad <arawinda.p@gmail.com>
> > > 
> > > This patch includes migration support for machine check
> > > handling. Especially this patch blocks VM migration
> > > requests until the machine check error handling is
> > > complete as these errors are specific to the source
> > > hardware and is irrelevant on the target hardware.
> > > 
> > > [Do not set FWNMI cap in post_load, now its done in .apply hook]
> > > Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> > > Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
> > > ---
> > >   hw/ppc/spapr.c         | 41 +++++++++++++++++++++++++++++++++++++++++
> > >   hw/ppc/spapr_events.c  | 16 +++++++++++++++-
> > >   hw/ppc/spapr_rtas.c    |  2 ++
> > >   include/hw/ppc/spapr.h |  2 ++
> > >   4 files changed, 60 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 346ec5ba6c..e0d0f95ec0 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -46,6 +46,7 @@
> > >   #include "migration/qemu-file-types.h"
> > >   #include "migration/global_state.h"
> > >   #include "migration/register.h"
> > > +#include "migration/blocker.h"
> > >   #include "mmu-hash64.h"
> > >   #include "mmu-book3s-v3.h"
> > >   #include "cpu-models.h"
> > > @@ -1751,6 +1752,8 @@ static void spapr_machine_reset(MachineState *machine)
> > >       /* Signal all vCPUs waiting on this condition */
> > >       qemu_cond_broadcast(&spapr->mc_delivery_cond);
> > > +
> > > +    migrate_del_blocker(spapr->fwnmi_migration_blocker);
> > >   }
> > >   static void spapr_create_nvram(SpaprMachineState *spapr)
> > > @@ -2041,6 +2044,43 @@ static const VMStateDescription vmstate_spapr_dtb = {
> > >       },
> > >   };
> > > +static bool spapr_fwnmi_needed(void *opaque)
> > > +{
> > > +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> > > +
> > > +    return spapr->guest_machine_check_addr != -1;
> > > +}
> > > +
> > > +static int spapr_fwnmi_pre_save(void *opaque)
> > > +{
> > > +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> > > +
> > > +    /*
> > > +     * With -only-migratable QEMU option, we cannot block migration.
> > > +     * Hence check if machine check handling is in progress and print
> > > +     * a warning message.
> > > +     */
> > IIUC the logic below this could also occur in the case where the fwnmi
> > event occurs after a migration has started, but before it completes,
> > not just with -only-migratable.  Is that correct?
> Yes

Ok, please update the comment accordingly.
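
Something along these lines, perhaps (wording is only a suggestion):

    /*
     * With the -only-migratable option, or if an fwnmi machine check
     * arrives while a migration is already in progress, migration
     * cannot be blocked.  Check whether machine check handling is in
     * progress and warn that the handler may end up running (and
     * logging the hardware error) on the destination.
     */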

> > 
> > > +    if (spapr->mc_status != -1) {
> > > +        warn_report("A machine check is being handled during migration. The"
> > > +                "handler may run and log hardware error on the destination");
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static const VMStateDescription vmstate_spapr_machine_check = {
> > > +    .name = "spapr_machine_check",
> > > +    .version_id = 1,
> > > +    .minimum_version_id = 1,
> > > +    .needed = spapr_fwnmi_needed,
> > > +    .pre_save = spapr_fwnmi_pre_save,
> > > +    .fields = (VMStateField[]) {
> > > +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> > > +        VMSTATE_INT32(mc_status, SpaprMachineState),
> > > +        VMSTATE_END_OF_LIST()
> > > +    },
> > > +};
> > > +
> > >   static const VMStateDescription vmstate_spapr = {
> > >       .name = "spapr",
> > >       .version_id = 3,
> > > @@ -2075,6 +2115,7 @@ static const VMStateDescription vmstate_spapr = {
> > >           &vmstate_spapr_cap_large_decr,
> > >           &vmstate_spapr_cap_ccf_assist,
> > >           &vmstate_spapr_cap_fwnmi,
> > > +        &vmstate_spapr_machine_check,
> > >           NULL
> > >       }
> > >   };
> > > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > > index db44e09154..30d9371c88 100644
> > > --- a/hw/ppc/spapr_events.c
> > > +++ b/hw/ppc/spapr_events.c
> > > @@ -43,6 +43,7 @@
> > >   #include "qemu/main-loop.h"
> > >   #include "hw/ppc/spapr_ovec.h"
> > >   #include <libfdt.h>
> > > +#include "migration/blocker.h"
> > >   #define RTAS_LOG_VERSION_MASK                   0xff000000
> > >   #define   RTAS_LOG_VERSION_6                    0x06000000
> > > @@ -842,6 +843,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> > >   {
> > >       SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > >       CPUState *cs = CPU(cpu);
> > > +    int ret;
> > > +    Error *local_err = NULL;
> > >       if (spapr->guest_machine_check_addr == -1) {
> > >           /*
> > > @@ -871,8 +874,19 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> > >               return;
> > >           }
> > >       }
> > > -    spapr->mc_status = cpu->vcpu_id;
> > > +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> > > +    if (ret == -EBUSY) {
> > > +        /*
> > > +         * We don't want to abort so we let the migration to continue.
> > > +         * In a rare case, the machine check handler will run on the target.
> > > +         * Though this is not preferable, it is better than aborting
> > > +         * the migration or killing the VM.
> > > +         */
> > > +        warn_report_err(local_err);
> > I suspect the error message in local_err won't be particularly
> > meaningful on its own.  Perhaps you need to add a prefix to clarify
> > that the problem is you've received a fwnmi after migration has
> > commenced?
> ok
> > 
> > > +    }
> > > +
> > > +    spapr->mc_status = cpu->vcpu_id;
> > >       spapr_mce_dispatch_elog(cpu, recovered);
> > >   }
> > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > > index 0328b1f341..c78d96ee7e 100644
> > > --- a/hw/ppc/spapr_rtas.c
> > > +++ b/hw/ppc/spapr_rtas.c
> > > @@ -50,6 +50,7 @@
> > >   #include "hw/ppc/fdt.h"
> > >   #include "target/ppc/mmu-hash64.h"
> > >   #include "target/ppc/mmu-book3s-v3.h"
> > > +#include "migration/blocker.h"
> > >   static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> > >                                      uint32_t token, uint32_t nargs,
> > > @@ -446,6 +447,7 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> > >        */
> > >       spapr->mc_status = -1;
> > >       qemu_cond_signal(&spapr->mc_delivery_cond);
> > > +    migrate_del_blocker(spapr->fwnmi_migration_blocker);
> > Oh... damn.  I suggested using a static fwnmi_migration_blocker
> > instance, but I just realized there's a problem with it.
> > 
> > If we do receive multiple fwnmi events on different cpus at roughly
> > the same time, this will break: I think we'll try to add the same
> > migration blocker instance multiple times, which won't be good.  Even
> > if that doesn't do anything *too* bad, then we'll unblock the
> > migration on the first interlock, rather than waiting for all pending
> > fwnmi events to complete.
> Ok, not sure how to handle this, ill look into it.

Well, you can do it by reverting to the old approach of dynamically
creating the Error object before adding it as blocker (and freeing it
when the blocker is removed).  It's a bit ugly, but it will suffice.
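
A rough sketch of that approach, under the assumption that machine check
delivery stays serialised (the next event waits on mc_delivery_cond before
getting this far), so a single pointer in SpaprMachineState is enough to
track the live blocker:

    /* In spapr_mce_req_event(), just before dispatching the error log: */
    Error *blocker = NULL;

    error_setg(&blocker, "Machine check handling in progress, "
               "migration is temporarily blocked");
    if (migrate_add_blocker(blocker, &local_err) < 0) {
        /* A migration is already underway; warn and continue unblocked */
        warn_report_err(local_err);
        error_free(blocker);
        blocker = NULL;
    }
    spapr->fwnmi_migration_blocker = blocker;

    /*
     * ...and in rtas_ibm_nmi_interlock(), once the guest has consumed
     * the error log:
     */
    if (spapr->fwnmi_migration_blocker) {
        migrate_del_blocker(spapr->fwnmi_migration_blocker);
        error_free(spapr->fwnmi_migration_blocker);
        spapr->fwnmi_migration_blocker = NULL;
    }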

> > 
> > >       rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > >   }
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index 86f0fc8fdd..290abd6328 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -215,6 +215,8 @@ struct SpaprMachineState {
> > >       unsigned gpu_numa_id;
> > >       SpaprTpmProxy *tpm_proxy;
> > > +
> > > +    Error *fwnmi_migration_blocker;
> > >   };
> > >   #define H_SUCCESS         0
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v17 6/7] migration: Include migration support for machine check handling
  2019-12-06  1:22       ` David Gibson
@ 2019-12-06  5:45         ` Ganesh
  0 siblings, 0 replies; 19+ messages in thread
From: Ganesh @ 2019-12-06  5:45 UTC (permalink / raw)
  To: David Gibson; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc


On 12/6/19 6:52 AM, David Gibson wrote:
> On Thu, Dec 05, 2019 at 10:39:29AM +0530, Ganesh wrote:
>> On 11/19/19 8:15 AM, David Gibson wrote:
>>> On Thu, Oct 24, 2019 at 01:13:06PM +0530, Ganesh Goudar wrote:
>>>> From: Aravinda Prasad <arawinda.p@gmail.com>
>>>>
>>>> This patch includes migration support for machine check
>>>> handling. Especially this patch blocks VM migration
>>>> requests until the machine check error handling is
>>>> complete as these errors are specific to the source
>>>> hardware and is irrelevant on the target hardware.
>>>>
>>>> [Do not set FWNMI cap in post_load, now its done in .apply hook]
>>>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>>>> Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
>>>> ---
>>>>    hw/ppc/spapr.c         | 41 +++++++++++++++++++++++++++++++++++++++++
>>>>    hw/ppc/spapr_events.c  | 16 +++++++++++++++-
>>>>    hw/ppc/spapr_rtas.c    |  2 ++
>>>>    include/hw/ppc/spapr.h |  2 ++
>>>>    4 files changed, 60 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 346ec5ba6c..e0d0f95ec0 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -46,6 +46,7 @@
>>>>    #include "migration/qemu-file-types.h"
>>>>    #include "migration/global_state.h"
>>>>    #include "migration/register.h"
>>>> +#include "migration/blocker.h"
>>>>    #include "mmu-hash64.h"
>>>>    #include "mmu-book3s-v3.h"
>>>>    #include "cpu-models.h"
>>>> @@ -1751,6 +1752,8 @@ static void spapr_machine_reset(MachineState *machine)
>>>>        /* Signal all vCPUs waiting on this condition */
>>>>        qemu_cond_broadcast(&spapr->mc_delivery_cond);
>>>> +
>>>> +    migrate_del_blocker(spapr->fwnmi_migration_blocker);
>>>>    }
>>>>    static void spapr_create_nvram(SpaprMachineState *spapr)
>>>> @@ -2041,6 +2044,43 @@ static const VMStateDescription vmstate_spapr_dtb = {
>>>>        },
>>>>    };
>>>> +static bool spapr_fwnmi_needed(void *opaque)
>>>> +{
>>>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
>>>> +
>>>> +    return spapr->guest_machine_check_addr != -1;
>>>> +}
>>>> +
>>>> +static int spapr_fwnmi_pre_save(void *opaque)
>>>> +{
>>>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
>>>> +
>>>> +    /*
>>>> +     * With -only-migratable QEMU option, we cannot block migration.
>>>> +     * Hence check if machine check handling is in progress and print
>>>> +     * a warning message.
>>>> +     */
>>> IIUC the logic below this could also occur in the case where the fwnmi
>>> event occurs after a migration has started, but before it completes,
>>> not just with -only-migratable.  Is that correct?
>> Yes
> Ok, please update the comment accordingly.
sure
>
>>>> +    if (spapr->mc_status != -1) {
>>>> +        warn_report("A machine check is being handled during migration. The"
>>>> +                "handler may run and log hardware error on the destination");
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static const VMStateDescription vmstate_spapr_machine_check = {
>>>> +    .name = "spapr_machine_check",
>>>> +    .version_id = 1,
>>>> +    .minimum_version_id = 1,
>>>> +    .needed = spapr_fwnmi_needed,
>>>> +    .pre_save = spapr_fwnmi_pre_save,
>>>> +    .fields = (VMStateField[]) {
>>>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>>>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    },
>>>> +};
>>>> +
>>>>    static const VMStateDescription vmstate_spapr = {
>>>>        .name = "spapr",
>>>>        .version_id = 3,
>>>> @@ -2075,6 +2115,7 @@ static const VMStateDescription vmstate_spapr = {
>>>>            &vmstate_spapr_cap_large_decr,
>>>>            &vmstate_spapr_cap_ccf_assist,
>>>>            &vmstate_spapr_cap_fwnmi,
>>>> +        &vmstate_spapr_machine_check,
>>>>            NULL
>>>>        }
>>>>    };
>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>> index db44e09154..30d9371c88 100644
>>>> --- a/hw/ppc/spapr_events.c
>>>> +++ b/hw/ppc/spapr_events.c
>>>> @@ -43,6 +43,7 @@
>>>>    #include "qemu/main-loop.h"
>>>>    #include "hw/ppc/spapr_ovec.h"
>>>>    #include <libfdt.h>
>>>> +#include "migration/blocker.h"
>>>>    #define RTAS_LOG_VERSION_MASK                   0xff000000
>>>>    #define   RTAS_LOG_VERSION_6                    0x06000000
>>>> @@ -842,6 +843,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>>    {
>>>>        SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>>        CPUState *cs = CPU(cpu);
>>>> +    int ret;
>>>> +    Error *local_err = NULL;
>>>>        if (spapr->guest_machine_check_addr == -1) {
>>>>            /*
>>>> @@ -871,8 +874,19 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>>                return;
>>>>            }
>>>>        }
>>>> -    spapr->mc_status = cpu->vcpu_id;
>>>> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
>>>> +    if (ret == -EBUSY) {
>>>> +        /*
>>>> +         * We don't want to abort so we let the migration to continue.
>>>> +         * In a rare case, the machine check handler will run on the target.
>>>> +         * Though this is not preferable, it is better than aborting
>>>> +         * the migration or killing the VM.
>>>> +         */
>>>> +        warn_report_err(local_err);
>>> I suspect the error message in local_err won't be particularly
>>> meaningful on its own.  Perhaps you need to add a prefix to clarify
>>> that the problem is you've received a fwnmi after migration has
>>> commenced?
>> ok
>>>> +    }
>>>> +
>>>> +    spapr->mc_status = cpu->vcpu_id;
>>>>        spapr_mce_dispatch_elog(cpu, recovered);
>>>>    }
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index 0328b1f341..c78d96ee7e 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -50,6 +50,7 @@
>>>>    #include "hw/ppc/fdt.h"
>>>>    #include "target/ppc/mmu-hash64.h"
>>>>    #include "target/ppc/mmu-book3s-v3.h"
>>>> +#include "migration/blocker.h"
>>>>    static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>                                       uint32_t token, uint32_t nargs,
>>>> @@ -446,6 +447,7 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>>         */
>>>>        spapr->mc_status = -1;
>>>>        qemu_cond_signal(&spapr->mc_delivery_cond);
>>>> +    migrate_del_blocker(spapr->fwnmi_migration_blocker);
>>> Oh... damn.  I suggested using a static fwnmi_migration_blocker
>>> instance, but I just realized there's a problem with it.
>>>
>>> If we do receive multiple fwnmi events on different cpus at roughly
>>> the same time, this will break: I think we'll try to add the same
>>> migration blocker instance multiple times, which won't be good.  Even
>>> if that doesn't do anything *too* bad, then we'll unblock the
>>> migration on the first interlock, rather than waiting for all pending
>>> fwnmi events to complete.
>> Ok, not sure how to handle this, ill look into it.
> Well, you can do it by reverting to the old approach of dynamically
> creating the Error object before adding it as blocker (and freeing it
> when the blocker is removed).  It's a bit ugly, but it will suffice.
Sure, Thanks
>
>>>>        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>    }
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index 86f0fc8fdd..290abd6328 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -215,6 +215,8 @@ struct SpaprMachineState {
>>>>        unsigned gpu_numa_id;
>>>>        SpaprTpmProxy *tpm_proxy;
>>>> +
>>>> +    Error *fwnmi_migration_blocker;
>>>>    };
>>>>    #define H_SUCCESS         0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v17 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-12-05  4:55     ` Ganesh
@ 2019-12-11 13:50       ` Ganesh
  0 siblings, 0 replies; 19+ messages in thread
From: Ganesh @ 2019-12-11 13:50 UTC (permalink / raw)
  To: David Gibson; +Cc: arawinda.p, aik, qemu-devel, groug, paulus, qemu-ppc


On 12/5/19 10:25 AM, Ganesh wrote:
>
> On 11/19/19 8:09 AM, David Gibson wrote:
>> On Thu, Oct 24, 2019 at 01:13:05PM +0530, Ganesh Goudar wrote:
>>> From: Aravinda Prasad <arawinda.p@gmail.com>
>>>
>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>> and "ibm,nmi-interlock" RTAS calls.
>>>
>>> The machine check notification address is saved when the
>>> OS issues "ibm,nmi-register" RTAS call.
>>>
>>> This patch also handles the case when multiple processors
>>> experience machine check at or about the same time by
>>> handling "ibm,nmi-interlock" call. In such cases, as per
>>> PAPR, subsequent processors serialize waiting for the first
>>> processor to issue the "ibm,nmi-interlock" call. The second
>>> processor that also received a machine check error waits
>>> till the first processor is done reading the error log.
>>> The first processor issues "ibm,nmi-interlock" call
>>> when the error log is consumed.
>>>
>>> [Move fwnmi registeration to .apply hook]
>> s/registeration/registration/
> Thanks
>>
>>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>>> Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
>>> ---
>>>   hw/ppc/spapr_caps.c    |  9 +++++--
>>>   hw/ppc/spapr_rtas.c    | 57 
>>> ++++++++++++++++++++++++++++++++++++++++++
>>>   include/hw/ppc/spapr.h |  5 +++-
>>>   3 files changed, 68 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>> index 976d709210..1675ebd45e 100644
>>> --- a/hw/ppc/spapr_caps.c
>>> +++ b/hw/ppc/spapr_caps.c
>>> @@ -509,9 +509,14 @@ static void 
>>> cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>            * of software injected faults like duplicate SLBs).
>>>            */
>>>           warn_report("Firmware Assisted Non-Maskable Interrupts not 
>>> supported in TCG");
>> This logic still isn't quite right.  To start with the warn_report()
>> above possible wants to be more weakly worded.  With TCG, FWNMI won't
>> generally *do* anything, and there are some edge cases where the
>> behaviour is arguably incorrect.  However there's no reason we can't
>> make the RTAS calls work basically as expected and in almost all cases
>> things will behave correctly - at least according to the case where no
>> fwnmi events are delivered...
> ok
>>
>>> -    } else if (kvm_enabled() && (kvmppc_set_fwnmi() != 0)) {
>>> -        error_setg(errp,
>>> +    } else if (kvm_enabled()) {
>>> +        if (!kvmppc_set_fwnmi()) {
>>> +            /* Register ibm,nmi-register and ibm,nmi-interlock RTAS 
>>> calls */
>>> +            spapr_fwnmi_register();
>> ..but here you only register the RTAS calls in the KVM case, which
>> breaks that.  If there really is a strong reason to do this, then the
>> warn_report() above should be error_setg() and fail the apply.

Also, I found a side effect of moving this fwnmi registration to the
.apply hook: I see the following assertion failure when I reboot the
guest. Maybe I need a member to indicate whether the registration has
already been done.

Requesting system reboot
[  186.368745] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  186.370222] reboot: Restarting system
#######MACHINE RESET######
#####SPAPR_FWNMI_REGISTER######
qemu-system-ppc64: /home/jupit/qemu/hw/ppc/spapr_rtas.c:510: 
spapr_rtas_register: Assertion `!name || !rtas_table[token].name' failed.
Aborted
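
For what it's worth, the guard being described might look like the
snippet below; fwnmi_registered is a hypothetical bool added to
SpaprMachineState, not something in the posted patch:

    /* In cap_fwnmi_mce_apply(), where the RTAS calls are registered: */
    if (!spapr->fwnmi_registered) {
        /*
         * .apply runs again on every machine reset; only register the
         * ibm,nmi-register / ibm,nmi-interlock RTAS calls the first
         * time through, so the assertion in spapr_rtas_register()
         * isn't hit.
         */
        spapr_fwnmi_register();
        spapr->fwnmi_registered = true;
    }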

>>
>>> +        } else {
>>> +            error_setg(errp,
>>>   "Firmware Assisted Non-Maskable Interrupts not supported by KVM, 
>>> try cap-fwnmi-mce=off");
>>> +        }
>>>       }
>>>   }
>>>   diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>> index 2c066a372d..0328b1f341 100644
>>> --- a/hw/ppc/spapr_rtas.c
>>> +++ b/hw/ppc/spapr_rtas.c
>>> @@ -400,6 +400,55 @@ static void rtas_get_power_level(PowerPCCPU 
>>> *cpu, SpaprMachineState *spapr,
>>>       rtas_st(rets, 1, 100);
>>>   }
>>>   +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>> +                                  SpaprMachineState *spapr,
>>> +                                  uint32_t token, uint32_t nargs,
>>> +                                  target_ulong args,
>>> +                                  uint32_t nret, target_ulong rets)
>>> +{
>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>>> +
>>> +    if (!rtas_addr) {
>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>> +        return;
>>> +    }
>>> +
>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> Actually, since you explicitly test for the cap being enabled here,
>> there's no reason not to *always* register this RTAS call.  Also this
>> test for the feature flag should go first, before delving into the
>> device tree for the RTAS address.
> Sure, will do
>>
>>> +        return;
>>> +    }
>>> +
>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +}
>>> +
>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>> +                                   SpaprMachineState *spapr,
>>> +                                   uint32_t token, uint32_t nargs,
>>> +                                   target_ulong args,
>>> +                                   uint32_t nret, target_ulong rets)
>>> +{
>>> +    if (spapr->guest_machine_check_addr == -1) {
>>> +        /* NMI register not called */
>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>> +        return;
>>> +    }
>>> +
>>> +    if (spapr->mc_status != cpu->vcpu_id) {
>>> +        /* The vCPU that hit the NMI should invoke 
>>> "ibm,nmi-interlock" */
>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>> +        return;
>>> +    }
>>> +
>>> +    /*
>>> +     * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>>> +     * hence unset mc_status.
>>> +     */
>>> +    spapr->mc_status = -1;
>>> +    qemu_cond_signal(&spapr->mc_delivery_cond);
>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +}
>>> +
>>>   static struct rtas_call {
>>>       const char *name;
>>>       spapr_rtas_fn fn;
>>> @@ -503,6 +552,14 @@ hwaddr spapr_get_rtas_addr(void)
>>>       return (hwaddr)fdt32_to_cpu(*rtas_data);
>>>   }
>>>   +void spapr_fwnmi_register(void)
>>> +{
>>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>>> +                        rtas_ibm_nmi_register);
>>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>>> +                        rtas_ibm_nmi_interlock);
>>> +}
>>> +
>>>   static void core_rtas_register_types(void)
>>>   {
>>>       spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index 4afa8d4d09..86f0fc8fdd 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -653,8 +653,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, 
>>> target_ulong opcode,
>>>   #define RTAS_IBM_REMOVE_PE_DMA_WINDOW (RTAS_TOKEN_BASE + 0x28)
>>>   #define RTAS_IBM_RESET_PE_DMA_WINDOW (RTAS_TOKEN_BASE + 0x29)
>>>   #define RTAS_IBM_SUSPEND_ME (RTAS_TOKEN_BASE + 0x2A)
>>> +#define RTAS_IBM_NMI_REGISTER (RTAS_TOKEN_BASE + 0x2B)
>>> +#define RTAS_IBM_NMI_INTERLOCK (RTAS_TOKEN_BASE + 0x2C)
>>>   -#define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x2B)
>>> +#define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x2D)
>>>     /* RTAS ibm,get-system-parameter token values */
>>>   #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>>> @@ -907,4 +909,5 @@ void spapr_check_pagesize(SpaprMachineState 
>>> *spapr, hwaddr pagesize,
>>>     void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>>   hwaddr spapr_get_rtas_addr(void);
>>> +void spapr_fwnmi_register(void);
>>>   #endif /* HW_SPAPR_H */



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-12-11 13:56 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-24  7:43 [PATCH v17 0/7] target-ppc/spapr: Add FWNMI support in QEMU for PowerKM guests Ganesh Goudar
2019-10-24  7:43 ` [PATCH v17 1/7] Wrapper function to wait on condition for the main loop mutex Ganesh Goudar
2019-10-24  7:43 ` [PATCH v17 2/7] ppc: spapr: Introduce FWNMI capability Ganesh Goudar
2019-10-24  7:43 ` [PATCH v17 3/7] target/ppc: Handle NMI guest exit Ganesh Goudar
2019-10-24  7:43 ` [PATCH v17 4/7] target/ppc: Build rtas error log upon an MCE Ganesh Goudar
2019-11-04 16:10   ` David Gibson
2019-11-06 11:07     ` Ganesh
2019-11-18 11:09       ` Ganesh
2019-11-19  1:18         ` David Gibson
2019-10-24  7:43 ` [PATCH v17 5/7] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Ganesh Goudar
2019-11-19  2:39   ` [PATCH v17 5/7] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" " David Gibson
2019-12-05  4:55     ` Ganesh
2019-12-11 13:50       ` Ganesh
2019-10-24  7:43 ` [PATCH v17 6/7] migration: Include migration support for machine check handling Ganesh Goudar
2019-11-19  2:45   ` David Gibson
2019-12-05  5:09     ` Ganesh
2019-12-06  1:22       ` David Gibson
2019-12-06  5:45         ` Ganesh
2019-10-24  7:43 ` [PATCH v17 7/7] ppc: spapr: Activate the FWNMI functionality Ganesh Goudar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).