* [Qemu-devel] [PATCH v10 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests
@ 2019-06-12  9:20 Aravinda Prasad
  2019-06-12  9:20 ` [Qemu-devel] [PATCH v10 1/6] Wrapper function to wait on condition for the main loop mutex Aravinda Prasad
                   ` (5 more replies)
  0 siblings, 6 replies; 37+ messages in thread
From: Aravinda Prasad @ 2019-06-12  9:20 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

This patch set adds support for FWNMI in PowerKVM guests.

System errors such as SLB multihit and memory errors
that cannot be corrected by hardware are passed on to
the kernel for handling by raising a machine check
exception (an NMI). Upon such a machine check exception,
if the address in error belongs to the guest and the
guest is not FWNMI capable, KVM invokes the guest's
0x200 interrupt vector. For an FWNMI-capable guest,
KVM passes control to QEMU by exiting the guest.

This patch series adds functionality to QEMU to pass
such machine check exceptions on to an FWNMI-capable
guest kernel by building an error log and invoking
the guest-registered machine check handling routine.

The KVM changes are now part of the upstream kernel
(commit e20bbd3d). This series contains the QEMU changes.
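
For orientation, here is a rough sketch of the flow this
series implements on the QEMU side (the function and RTAS
call names are the ones used in the patches below):

    /*
     * KVM exits the guest with KVM_EXIT_NMI on an MCE in guest memory
     *   -> kvm_handle_nmi()                     [patch 3]
     *        -> spapr_mce_req_event()           serialize MCEs across vCPUs
     *             -> spapr_mce_dispatch_elog()  [patch 4]
     *                  build the rtas error log at rtas_addr + RTAS_ERROR_LOG_OFFSET
     *                  branch the guest to its "ibm,nmi-register" handler
     * The guest handler consumes the log and calls "ibm,nmi-interlock"
     *   -> rtas_ibm_nmi_interlock()             [patch 6] wakes the next waiting vCPU
     */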

Change Log v10:
  - Reshuffled the patch sequence + minor fixes

Change Log v9:
  - Fixed kvm cap and spapr cap issues

Change Log v8:
  - Added functionality to check FWNMI capability during
    VM migration
---

Aravinda Prasad (6):
      Wrapper function to wait on condition for the main loop mutex
      ppc: spapr: Introduce FWNMI capability
      target/ppc: Handle NMI guest exit
      target/ppc: Build rtas error log upon an MCE
      migration: Include migration support for machine check handling
      ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls


 cpus.c                   |    5 +
 hw/ppc/spapr.c           |   53 +++++++++
 hw/ppc/spapr_caps.c      |   26 ++++
 hw/ppc/spapr_events.c    |  273 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c      |   89 +++++++++++++++
 include/hw/ppc/spapr.h   |   25 ++++
 include/qemu/main-loop.h |    8 +
 target/ppc/kvm.c         |   35 ++++++
 target/ppc/kvm_ppc.h     |   14 ++
 target/ppc/trace-events  |    1 
 10 files changed, 527 insertions(+), 2 deletions(-)

--
Aravinda Prasad




* [Qemu-devel] [PATCH v10 1/6] Wrapper function to wait on condition for the main loop mutex
  2019-06-12  9:20 [Qemu-devel] [PATCH v10 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
@ 2019-06-12  9:20 ` Aravinda Prasad
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability Aravinda Prasad
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Aravinda Prasad @ 2019-06-12  9:20 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

Introduce a wrapper function to wait on a condition
for the main loop mutex. This function atomically
releases the main loop mutex and causes the calling
thread to block on the condition. The wrapper is
required because qemu_global_mutex is a static variable.
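
As a minimal usage sketch (the caller must hold the main
loop mutex; the spapr fields below are introduced by later
patches in this series):

    /* vCPU thread, with the main loop mutex (BQL) held: */
    while (spapr->mc_status != -1) {
        /* atomically drops the BQL and re-acquires it on wakeup */
        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
    }

    /* elsewhere, e.g. on machine reset, also under the BQL: */
    qemu_cond_broadcast(&spapr->mc_delivery_cond);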

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 cpus.c                   |    5 +++++
 include/qemu/main-loop.h |    8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/cpus.c b/cpus.c
index ffc5711..14cb24b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1867,6 +1867,11 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
+void qemu_cond_wait_iothread(QemuCond *cond)
+{
+    qemu_cond_wait(cond, &qemu_global_mutex);
+}
+
 static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index f6ba78e..a6d20b0 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -295,6 +295,14 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
  */
 void qemu_mutex_unlock_iothread(void);
 
+/*
+ * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ *
+ * This function atomically releases the main loop mutex and causes
+ * the calling thread to block on the condition.
+ */
+void qemu_cond_wait_iothread(QemuCond *cond);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);




* [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-06-12  9:20 [Qemu-devel] [PATCH v10 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
  2019-06-12  9:20 ` [Qemu-devel] [PATCH v10 1/6] Wrapper function to wait on condition for the main loop mutex Aravinda Prasad
@ 2019-06-12  9:21 ` Aravinda Prasad
  2019-07-02  3:51   ` David Gibson
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-06-12  9:21 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
KVM causes a guest exit with NMI as the exit reason
when it encounters a machine check exception on an
address belonging to the guest. Without this capability
enabled, KVM redirects machine check exceptions to the
guest's 0x200 vector.

This patch also introduces the fwnmi-mce capability to
deal with the case when a guest with the
KVM_CAP_PPC_FWNMI capability enabled is migrated to a
host that does not support this capability.
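
For reference, like the other spapr caps the new capability
is surfaced as a machine property, so it can be toggled on
the command line (a sketch; the property name matches the
error messages in the patch):

    qemu-system-ppc64 -machine pseries,accel=kvm,cap-fwnmi-mce=on ...
    qemu-system-ppc64 -machine pseries,cap-fwnmi-mce=off ...   # e.g. under TCG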

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    1 +
 hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    4 +++-
 target/ppc/kvm.c       |   19 +++++++++++++++++++
 target/ppc/kvm_ppc.h   |   12 ++++++++++++
 5 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6dd8aaa..2ef86aa 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
     smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
     smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
     spapr_caps_add_properties(smc, &error_abort);
     smc->irq = &spapr_irq_dual;
     smc->dr_phb_enabled = true;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 31b4661..2e92eb6 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
     }
 }
 
+static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
+                                Error **errp)
+{
+    if (!val) {
+        return; /* Disabled by default */
+    }
+
+    if (tcg_enabled()) {
+        error_setg(errp,
+"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
+    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
+        error_setg(errp,
+"Firmware Assisted Non-Maskable Interrupts not supported by KVM, try cap-fwnmi-mce=off");
+    }
+}
+
 SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
     [SPAPR_CAP_HTM] = {
         .name = "htm",
@@ -578,6 +594,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
         .type = "bool",
         .apply = cap_ccf_assist_apply,
     },
+    [SPAPR_CAP_FWNMI_MCE] = {
+        .name = "fwnmi-mce",
+        .description = "Handle fwnmi machine check exceptions",
+        .index = SPAPR_CAP_FWNMI_MCE,
+        .get = spapr_cap_get_bool,
+        .set = spapr_cap_set_bool,
+        .type = "bool",
+        .apply = cap_fwnmi_mce_apply,
+    },
 };
 
 static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
@@ -717,6 +742,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
 SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
 SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
+SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
 
 void spapr_caps_init(SpaprMachineState *spapr)
 {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4f5becf..f891f8f 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -78,8 +78,10 @@ typedef enum {
 #define SPAPR_CAP_LARGE_DECREMENTER     0x08
 /* Count Cache Flush Assist HW Instruction */
 #define SPAPR_CAP_CCF_ASSIST            0x09
+/* FWNMI machine check handling */
+#define SPAPR_CAP_FWNMI_MCE             0x0A
 /* Num Caps */
-#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
+#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
 
 /*
  * Capability Values
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 3bf0a46..afef4cd 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
 static int cap_ppc_count_cache_flush_assist;
 static int cap_ppc_nested_kvm_hv;
 static int cap_large_decr;
+static int cap_ppc_fwnmi;
 
 static uint32_t debug_inst_opcode;
 
@@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     kvmppc_get_cpu_characteristics(s);
     cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
     cap_large_decr = kvmppc_get_dec_bits();
+    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
     /*
      * Note: setting it to false because there is not such capability
      * in KVM at this moment.
@@ -2114,6 +2116,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
     }
 }
 
+int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
+{
+    CPUState *cs = CPU(cpu);
+
+    if (!cap_ppc_fwnmi) {
+        return 1;
+    }
+
+    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
+}
+
+
 int kvmppc_smt_threads(void)
 {
     return cap_ppc_smt ? cap_ppc_smt : 1;
@@ -2414,6 +2428,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
     return cap_mmu_hash_v3;
 }
 
+bool kvmppc_has_cap_ppc_fwnmi(void)
+{
+    return cap_ppc_fwnmi;
+}
+
 static bool kvmppc_power8_host(void)
 {
     bool ret = false;
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 45776ca..880cee9 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
 void kvmppc_set_papr(PowerPCCPU *cpu);
 int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
 void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
+int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
+bool kvmppc_has_cap_ppc_fwnmi(void);
 int kvmppc_smt_threads(void);
 void kvmppc_hint_smt_possible(Error **errp);
 int kvmppc_set_smt_threads(int smt);
@@ -158,6 +160,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
 {
 }
 
+static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
+{
+    return 1;
+}
+
+static inline bool kvmppc_has_cap_ppc_fwnmi(void)
+{
+    return false;
+}
+
 static inline int kvmppc_smt_threads(void)
 {
     return 1;




* [Qemu-devel] [PATCH v10 3/6] target/ppc: Handle NMI guest exit
  2019-06-12  9:20 [Qemu-devel] [PATCH v10 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
  2019-06-12  9:20 ` [Qemu-devel] [PATCH v10 1/6] Wrapper function to wait on condition for the main loop mutex Aravinda Prasad
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability Aravinda Prasad
@ 2019-06-12  9:21 ` Aravinda Prasad
  2019-07-02  3:52   ` David Gibson
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 4/6] target/ppc: Build rtas error log upon an MCE Aravinda Prasad
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-06-12  9:21 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

Memory errors such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to the guest,
the guest kernel is responsible for taking suitable
action. Patch [1] enhances KVM to exit the guest with
the exit reason set to KVM_EXIT_NMI in such cases.
This patch handles the KVM_EXIT_NMI exit.

[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
    (e20bbd3d and related commits)

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c          |    8 ++++++++
 hw/ppc/spapr_events.c   |   23 +++++++++++++++++++++++
 include/hw/ppc/spapr.h  |   10 ++++++++++
 target/ppc/kvm.c        |   14 ++++++++++++++
 target/ppc/kvm_ppc.h    |    2 ++
 target/ppc/trace-events |    1 +
 6 files changed, 58 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2ef86aa..6cc2c3b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1806,6 +1806,12 @@ static void spapr_machine_reset(void)
     first_ppc_cpu->env.gpr[5] = 0;
 
     spapr->cas_reboot = false;
+
+    spapr->mc_status = -1;
+    spapr->guest_machine_check_addr = -1;
+
+    /* Signal all vCPUs waiting on this condition */
+    qemu_cond_broadcast(&spapr->mc_delivery_cond);
 }
 
 static void spapr_create_nvram(SpaprMachineState *spapr)
@@ -3070,6 +3076,8 @@ static void spapr_machine_init(MachineState *machine)
 
         kvmppc_spapr_enable_inkernel_multitce();
     }
+
+    qemu_cond_init(&spapr->mc_delivery_cond);
 }
 
 static int spapr_kvm_type(MachineState *machine, const char *vm_type)
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index ae0f093..a0c66d7 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -620,6 +620,29 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
+void spapr_mce_req_event(PowerPCCPU *cpu)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+
+    while (spapr->mc_status != -1) {
+        /*
+         * Check whether the same CPU got machine check error
+         * while still handling the mc error (i.e., before
+         * that CPU called "ibm,nmi-interlock")
+         */
+        if (spapr->mc_status == cpu->vcpu_id) {
+            qemu_system_guest_panicked(NULL);
+            return;
+        }
+        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
+        /* Meanwhile if the system is reset, then just return */
+        if (spapr->guest_machine_check_addr == -1) {
+            return;
+        }
+    }
+    spapr->mc_status = cpu->vcpu_id;
+}
+
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
                             uint32_t token, uint32_t nargs,
                             target_ulong args,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index f891f8f..f34c79f 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -190,6 +190,15 @@ struct SpaprMachineState {
      * occurs during the unplug process. */
     QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
 
+    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
+    target_ulong guest_machine_check_addr;
+    /*
+     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
+     * handling the mc.
+     */
+    int mc_status;
+    QemuCond mc_delivery_cond;
+
     /*< public >*/
     char *kvm_type;
     char *host_model;
@@ -789,6 +798,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
+void spapr_mce_req_event(PowerPCCPU *cpu);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index afef4cd..99f33fe 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -1763,6 +1763,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ret = 0;
         break;
 
+    case KVM_EXIT_NMI:
+        trace_kvm_handle_nmi_exception();
+        ret = kvm_handle_nmi(cpu, run);
+        break;
+
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
@@ -2863,6 +2868,15 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
     return data & 0xffff;
 }
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
+{
+    cpu_synchronize_state(CPU(cpu));
+
+    spapr_mce_req_event(cpu);
+
+    return 0;
+}
+
 int kvmppc_enable_hwrng(void)
 {
     if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 880cee9..3d9f0b4 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -83,6 +83,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
 void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
 void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
+
 #else
 
 static inline uint32_t kvmppc_get_tbfreq(void)
diff --git a/target/ppc/trace-events b/target/ppc/trace-events
index 3dc6740..6d15aa9 100644
--- a/target/ppc/trace-events
+++ b/target/ppc/trace-events
@@ -28,3 +28,4 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
 kvm_handle_epr(void) "handle epr"
 kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
 kvm_handle_debug_exception(void) "handle debug exception"
+kvm_handle_nmi_exception(void) "handle NMI exception"




* [Qemu-devel] [PATCH v10 4/6] target/ppc: Build rtas error log upon an MCE
  2019-06-12  9:20 [Qemu-devel] [PATCH v10 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
                   ` (2 preceding siblings ...)
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
@ 2019-06-12  9:21 ` Aravinda Prasad
  2019-07-02  4:03   ` David Gibson
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 5/6] migration: Include migration support for machine check handling Aravinda Prasad
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls Aravinda Prasad
  5 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-06-12  9:21 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

Upon a machine check exception (MCE) in a guest address space,
KVM causes a guest exit so that QEMU can build and pass the
error to the guest in the PAPR-defined rtas error log format.

This patch builds the rtas error log, copies it to rtas_addr
and then invokes the guest-registered machine check handler.
The handler in the guest takes suitable action(s) depending
on the type and criticality of the error. For example, if the
error is unrecoverable memory corruption in an application
inside the guest, then the guest kernel sends a SIGBUS to the
application. For recoverable errors, the guest performs
recovery actions and logs the error.
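
As a sketch, the layout written into guest memory by this
patch (it follows the three cpu_physical_memory_write()
calls in the diff below):

    /*
     * At rtas_addr + RTAS_ERROR_LOG_OFFSET the guest finds, in order:
     *
     *   [ saved r3 (big-endian)  ]
     *   [ struct rtas_error_log  ]  summary word + extended log length
     *   [ struct mc_extended_log ]  v6 header + machine check section
     *
     * r3 is then pointed at this area and nip is set to the handler
     * address the guest registered via "ibm,nmi-register".
     */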

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |   13 +++
 hw/ppc/spapr_events.c  |  238 ++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c    |   26 +++++
 include/hw/ppc/spapr.h |    6 +
 target/ppc/kvm.c       |    4 +
 5 files changed, 284 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6cc2c3b..d61905b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2908,6 +2908,19 @@ static void spapr_machine_init(MachineState *machine)
         error_report("Could not get size of LPAR rtas '%s'", filename);
         exit(1);
     }
+
+    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
+        /*
+         * Ensure that the rtas image size is less than RTAS_ERROR_LOG_OFFSET
+         * or else the rtas image will be overwritten with the rtas error log
+         * when a machine check exception is encountered.
+         */
+        g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
+
+        /* Resize rtas blob to accommodate error log */
+        spapr->rtas_size = RTAS_ERROR_LOG_MAX;
+    }
+
     spapr->rtas_blob = g_malloc(spapr->rtas_size);
     if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
         error_report("Could not load LPAR rtas '%s'", filename);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index a0c66d7..51c052e 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -212,6 +212,106 @@ struct hp_extended_log {
     struct rtas_event_log_v6_hp hp;
 } QEMU_PACKED;
 
+struct rtas_event_log_v6_mc {
+#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
+    struct rtas_event_log_v6_section_header hdr;
+    uint32_t fru_id;
+    uint32_t proc_id;
+    uint8_t error_type;
+#define RTAS_LOG_V6_MC_TYPE_UE                           0
+#define RTAS_LOG_V6_MC_TYPE_SLB                          1
+#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
+#define RTAS_LOG_V6_MC_TYPE_TLB                          4
+#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
+#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
+    uint8_t sub_err_type;
+#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
+#define RTAS_LOG_V6_MC_UE_IFETCH                         1
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
+#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
+#define RTAS_LOG_V6_MC_SLB_PARITY                        0
+#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
+#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
+#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
+#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
+#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
+#define RTAS_LOG_V6_MC_TLB_PARITY                        1
+#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
+#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
+    uint8_t reserved_1[6];
+    uint64_t effective_address;
+    uint64_t logical_address;
+} QEMU_PACKED;
+
+struct mc_extended_log {
+    struct rtas_event_log_v6 v6hdr;
+    struct rtas_event_log_v6_mc mc;
+} QEMU_PACKED;
+
+struct MC_ierror_table {
+    unsigned long srr1_mask;
+    unsigned long srr1_value;
+    bool nip_valid; /* nip is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_ierror_table mc_ierror_table[] = {
+{ 0x00000000081c0000, 0x0000000000040000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000080000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x00000000000c0000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000100000, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000140000, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000180000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0, 0, 0, 0, 0, 0 } };
+
+struct MC_derror_table {
+    unsigned long dsisr_value;
+    bool dar_valid; /* dar is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_derror_table mc_derror_table[] = {
+{ 0x00008000, false,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00004000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000800, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000400, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000080, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000100, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0, false, 0, 0, 0, 0 } };
+
+#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
+
 typedef enum EventClass {
     EVENT_CLASS_INTERNAL_ERRORS     = 0,
     EVENT_CLASS_EPOW                = 1,
@@ -620,7 +720,141 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
-void spapr_mce_req_event(PowerPCCPU *cpu)
+static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
+                                        struct mc_extended_log *ext_elog)
+{
+    int i;
+    CPUPPCState *env = &cpu->env;
+    uint32_t summary;
+    uint64_t dsisr = env->spr[SPR_DSISR];
+
+    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
+    if (recovered) {
+        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
+    } else {
+        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
+    }
+
+    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
+        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
+            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_derror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
+            if (mc_derror_table[i].dar_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
+            }
+
+            summary |= mc_derror_table[i].initiator
+                        | mc_derror_table[i].severity;
+
+            return summary;
+        }
+    } else {
+        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
+            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
+                    mc_ierror_table[i].srr1_value) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
+            if (mc_ierror_table[i].nip_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
+            }
+
+            summary |= mc_ierror_table[i].initiator
+                        | mc_ierror_table[i].severity;
+
+            return summary;
+        }
+    }
+
+    summary |= RTAS_LOG_INITIATOR_CPU;
+    return summary;
+}
+
+static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    CPUState *cs = CPU(cpu);
+    uint64_t rtas_addr;
+    CPUPPCState *env = &cpu->env;
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+    target_ulong r3, msr = 0;
+    struct rtas_error_log log;
+    struct mc_extended_log *ext_elog;
+    uint32_t summary;
+
+    /*
+     * Properly set bits in MSR before we invoke the handler.
+     * SRR0/1, DAR and DSISR are properly set by KVM
+     */
+    if (!(*pcc->interrupts_big_endian)(cpu)) {
+        msr |= (1ULL << MSR_LE);
+    }
+
+    if (env->msr & (1ULL << MSR_SF)) {
+        msr |= (1ULL << MSR_SF);
+    }
+
+    msr |= (1ULL << MSR_ME);
+
+    if (spapr->guest_machine_check_addr == -1) {
+        /*
+         * This implies that we have hit a machine check between system
+         * reset and "ibm,nmi-register". Fall back to the old machine
+         * check behavior in such cases.
+         */
+        env->spr[SPR_SRR0] = env->nip;
+        env->spr[SPR_SRR1] = env->msr;
+        env->msr = msr;
+        env->nip = 0x200;
+        return;
+    }
+
+    ext_elog = g_malloc0(sizeof(*ext_elog));
+    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
+
+    log.summary = cpu_to_be32(summary);
+    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
+
+    /* r3 should be in BE always */
+    r3 = cpu_to_be64(env->gpr[3]);
+    env->msr = msr;
+
+    spapr_init_v6hdr(&ext_elog->v6hdr);
+    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
+    ext_elog->mc.hdr.section_length =
+                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
+    ext_elog->mc.hdr.section_version = 1;
+
+    /* get rtas addr from fdt */
+    rtas_addr = spapr_get_rtas_addr();
+    if (!rtas_addr) {
+        /* Unable to fetch rtas_addr. Hence reset the guest */
+        ppc_cpu_do_system_reset(cs);
+        g_free(ext_elog);
+        return;
+    }
+
+    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
+                              sizeof(r3));
+    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
+                              &log, sizeof(log));
+    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
+                              sizeof(log), ext_elog,
+                              sizeof(*ext_elog));
+
+    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
+    env->nip = spapr->guest_machine_check_addr;
+
+    g_free(ext_elog);
+}
+
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
@@ -641,6 +875,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu)
         }
     }
     spapr->mc_status = cpu->vcpu_id;
+
+    spapr_mce_dispatch_elog(cpu, recovered);
 }
 
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 5bc1a93..a015a80 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -470,6 +470,32 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
     }
 }
 
+hwaddr spapr_get_rtas_addr(void)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    int rtas_node;
+    const fdt32_t *rtas_data;
+    void *fdt = spapr->fdt_blob;
+
+    /* fetch rtas addr from fdt */
+    rtas_node = fdt_path_offset(fdt, "/rtas");
+    if (rtas_node < 0) {
+        return 0;
+    }
+
+    rtas_data = fdt_getprop(fdt, rtas_node, "linux,rtas-base", NULL);
+    if (!rtas_data) {
+        return 0;
+    }
+
+    /*
+     * We assume that the OS called RTAS instantiate-rtas, but some other
+     * OS might call RTAS instantiate-rtas-64 instead. This is fine for now
+     * as SLOF only supports the 32-bit variant.
+     */
+    return (hwaddr)fdt32_to_cpu(*rtas_data);
+}
+
 static void core_rtas_register_types(void)
 {
     spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index f34c79f..debb57b 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
 
 #define RTAS_ERROR_LOG_MAX      2048
 
+/* Offset from rtas-base where error log is placed */
+#define RTAS_ERROR_LOG_OFFSET       0x30
+
 #define RTAS_EVENT_SCAN_RATE    1
 
 /* This helper should be used to encode interrupt specifiers when the related
@@ -798,7 +801,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
-void spapr_mce_req_event(PowerPCCPU *cpu);
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);
@@ -888,4 +891,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
+hwaddr spapr_get_rtas_addr(void);
 #endif /* HW_SPAPR_H */
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 99f33fe..368ec6e 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2870,9 +2870,11 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 
 int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
 {
+    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
+
     cpu_synchronize_state(CPU(cpu));
 
-    spapr_mce_req_event(cpu);
+    spapr_mce_req_event(cpu, recovered);
 
     return 0;
 }




* [Qemu-devel] [PATCH v10 5/6] migration: Include migration support for machine check handling
  2019-06-12  9:20 [Qemu-devel] [PATCH v10 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
                   ` (3 preceding siblings ...)
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 4/6] target/ppc: Build rtas error log upon an MCE Aravinda Prasad
@ 2019-06-12  9:21 ` Aravinda Prasad
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls Aravinda Prasad
  5 siblings, 0 replies; 37+ messages in thread
From: Aravinda Prasad @ 2019-06-12  9:21 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

This patch adds migration support for machine check
handling. In particular, it blocks VM migration
requests until the machine check error handling is
complete, because (i) these errors are specific to the
source hardware and are irrelevant on the target
hardware, and (ii) these errors cause data corruption
and should be handled before migration.
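
A sketch of the blocker lifecycle as wired up by this patch
and the next one (error handling elided):

    /* on KVM_EXIT_NMI, before dispatching the error to the guest: */
    migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);

    /*
     * removed again once the guest finishes handling the error
     * ("ibm,nmi-interlock", next patch) or the machine is reset:
     */
    migrate_del_blocker(spapr->fwnmi_migration_blocker);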

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |   27 +++++++++++++++++++++++++++
 hw/ppc/spapr_events.c  |   14 ++++++++++++++
 include/hw/ppc/spapr.h |    2 ++
 3 files changed, 43 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d61905b..3d6d139 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -42,6 +42,7 @@
 #include "migration/misc.h"
 #include "migration/global_state.h"
 #include "migration/register.h"
+#include "migration/blocker.h"
 #include "mmu-hash64.h"
 #include "mmu-book3s-v3.h"
 #include "cpu-models.h"
@@ -1812,6 +1813,8 @@ static void spapr_machine_reset(void)
 
     /* Signal all vCPUs waiting on this condition */
     qemu_cond_broadcast(&spapr->mc_delivery_cond);
+
+    migrate_del_blocker(spapr->fwnmi_migration_blocker);
 }
 
 static void spapr_create_nvram(SpaprMachineState *spapr)
@@ -2102,6 +2105,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
     },
 };
 
+static bool spapr_fwnmi_needed(void *opaque)
+{
+    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
+
+    return spapr->guest_machine_check_addr != -1;
+}
+
+static const VMStateDescription vmstate_spapr_machine_check = {
+    .name = "spapr_machine_check",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_fwnmi_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
+        VMSTATE_INT32(mc_status, SpaprMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -2135,6 +2157,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_dtb,
         &vmstate_spapr_cap_large_decr,
         &vmstate_spapr_cap_ccf_assist,
+        &vmstate_spapr_machine_check,
         NULL
     }
 };
@@ -2919,6 +2942,10 @@ static void spapr_machine_init(MachineState *machine)
 
         /* Resize rtas blob to accommodate error log */
         spapr->rtas_size = RTAS_ERROR_LOG_MAX;
+
+        /* Create the error string for live migration blocker */
+        error_setg(&spapr->fwnmi_migration_blocker,
+                "Live migration not supported during machine check handling");
     }
 
     spapr->rtas_blob = g_malloc(spapr->rtas_size);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 51c052e..f8ce7f0 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -41,6 +41,7 @@
 #include "qemu/bcd.h"
 #include "hw/ppc/spapr_ovec.h"
 #include <libfdt.h>
+#include "migration/blocker.h"
 
 #define RTAS_LOG_VERSION_MASK                   0xff000000
 #define   RTAS_LOG_VERSION_6                    0x06000000
@@ -857,6 +858,19 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    int ret;
+    Error *local_err = NULL;
+
+    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
+    if (ret < 0) {
+        /*
+         * We don't want to abort, so let the migration continue. In a
+         * rare case, the machine check handler will run on the target
+         * hardware. Though this is not preferable, it is better than aborting
+         * the migration or killing the VM.
+         */
+        warn_report_err(local_err);
+    }
 
     while (spapr->mc_status != -1) {
         /*
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index debb57b..0dedf0a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -214,6 +214,8 @@ struct SpaprMachineState {
     SpaprCapabilities def, eff, mig;
 
     unsigned gpu_numa_id;
+
+    Error *fwnmi_migration_blocker;
 };
 
 #define H_SUCCESS         0




* [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-06-12  9:20 [Qemu-devel] [PATCH v10 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
                   ` (4 preceding siblings ...)
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 5/6] migration: Include migration support for machine check handling Aravinda Prasad
@ 2019-06-12  9:21 ` Aravinda Prasad
  2019-06-24 14:29   ` Greg Kurz
  2019-07-02  4:11   ` [Qemu-devel] " David Gibson
  5 siblings, 2 replies; 37+ messages in thread
From: Aravinda Prasad @ 2019-06-12  9:21 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

This patch adds support in QEMU to handle "ibm,nmi-register"
and "ibm,nmi-interlock" RTAS calls and sets the default
value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
type 4.0.

The machine check notification address is saved when the
OS issues the "ibm,nmi-register" RTAS call.

This patch also handles the case when multiple processors
experience machine checks at or about the same time by
handling the "ibm,nmi-interlock" call. In such cases, as per
PAPR, subsequent processors serialize waiting for the first
processor to issue the "ibm,nmi-interlock" call. A second
processor that also received a machine check error waits
until the first processor is done reading the error log.
The first processor issues the "ibm,nmi-interlock" call
when the error log is consumed.
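
A rough timeline for two vCPUs hitting machine checks back
to back, as handled by this patch together with patch 3
(sketch):

    /*
     * vCPU A: MCE -> spapr_mce_req_event(): mc_status = A, error log is
     *         delivered, guest jumps to the "ibm,nmi-register" handler
     * vCPU B: MCE -> spapr_mce_req_event(): mc_status != -1, so it waits
     *         on mc_delivery_cond (BQL dropped via qemu_cond_wait_iothread)
     * vCPU A: guest handler reads the log and calls "ibm,nmi-interlock"
     *         -> mc_status = -1, qemu_cond_signal(&spapr->mc_delivery_cond)
     * vCPU B: wakes up, sets mc_status = B, its error log is delivered next
     */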

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    6 ++++-
 hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    5 +++-
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3d6d139..213d493 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
         /* Create the error string for live migration blocker */
         error_setg(&spapr->fwnmi_migration_blocker,
                 "Live migration not supported during machine check handling");
+
+        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
+        spapr_fwnmi_register();
     }
 
     spapr->rtas_blob = g_malloc(spapr->rtas_size);
@@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
     smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
     smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
-    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
     spapr_caps_add_properties(smc, &error_abort);
     smc->irq = &spapr_irq_dual;
     smc->dr_phb_enabled = true;
@@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
     smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
     smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
     smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
 }
 
 DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index a015a80..e010cb2 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -49,6 +49,7 @@
 #include "hw/ppc/fdt.h"
 #include "target/ppc/mmu-hash64.h"
 #include "target/ppc/mmu-book3s-v3.h"
+#include "migration/blocker.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
     rtas_st(rets, 1, 100);
 }
 
+static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
+                                  SpaprMachineState *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args,
+                                  uint32_t nret, target_ulong rets)
+{
+    int ret;
+    hwaddr rtas_addr = spapr_get_rtas_addr();
+
+    if (!rtas_addr) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    ret = kvmppc_fwnmi_enable(cpu);
+    if (ret == 1) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    } else if (ret < 0) {
+        error_report("Couldn't enable KVM FWNMI capability");
+        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+        return;
+    }
+
+    spapr->guest_machine_check_addr = rtas_ld(args, 1);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
+static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
+                                   SpaprMachineState *spapr,
+                                   uint32_t token, uint32_t nargs,
+                                   target_ulong args,
+                                   uint32_t nret, target_ulong rets)
+{
+    if (spapr->guest_machine_check_addr == -1) {
+        /* NMI register not called */
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+    } else {
+        /*
+         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
+         * hence unset mc_status.
+         */
+        spapr->mc_status = -1;
+        qemu_cond_signal(&spapr->mc_delivery_cond);
+        migrate_del_blocker(spapr->fwnmi_migration_blocker);
+        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    }
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -496,6 +551,14 @@ hwaddr spapr_get_rtas_addr(void)
     return (hwaddr)fdt32_to_cpu(*rtas_data);
 }
 
+void spapr_fwnmi_register(void)
+{
+    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
+                        rtas_ibm_nmi_register);
+    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
+                        rtas_ibm_nmi_interlock);
+}
+
 static void core_rtas_register_types(void)
 {
     spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 0dedf0a..7ae53e2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -637,8 +637,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
 #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
 #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
 #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
+#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
+#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
 
-#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
+#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
 
 /* RTAS ibm,get-system-parameter token values */
 #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
@@ -894,4 +896,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
 hwaddr spapr_get_rtas_addr(void);
+void spapr_fwnmi_register(void);
 #endif /* HW_SPAPR_H */




* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls Aravinda Prasad
@ 2019-06-24 14:29   ` Greg Kurz
  2019-06-25  6:16     ` Aravinda Prasad
  2019-07-02  4:11   ` [Qemu-devel] " David Gibson
  1 sibling, 1 reply; 37+ messages in thread
From: Greg Kurz @ 2019-06-24 14:29 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Wed, 12 Jun 2019 14:51:38 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> This patch adds support in QEMU to handle "ibm,nmi-register"
> and "ibm,nmi-interlock" RTAS calls and sets the default
> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> type 4.0.
> 

Next machine type is 4.1.

> The machine check notification address is saved when the
> OS issues "ibm,nmi-register" RTAS call.
> 
> This patch also handles the case when multiple processors
> experience machine check at or about the same time by
> handling "ibm,nmi-interlock" call. In such cases, as per
> PAPR, subsequent processors serialize waiting for the first
> processor to issue the "ibm,nmi-interlock" call. The second
> processor that also received a machine check error waits
> till the first processor is done reading the error log.
> The first processor issues "ibm,nmi-interlock" call
> when the error log is consumed.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    6 ++++-
>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    5 +++-
>  3 files changed, 72 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3d6d139..213d493 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
>          /* Create the error string for live migration blocker */
>          error_setg(&spapr->fwnmi_migration_blocker,
>                  "Live migration not supported during machine check handling");
> +
> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> +        spapr_fwnmi_register();

IIRC this was supposed to depend on SPAPR_CAP_FWNMI_MCE being ON.

>      }
>  
>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>      spapr_caps_add_properties(smc, &error_abort);
>      smc->irq = &spapr_irq_dual;
>      smc->dr_phb_enabled = true;
> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;

This should have been put into spapr_machine_4_0_class_options().

But unless you manage to get this merged before soft-freeze (2019-07-02),
I'm afraid this will be a 4.2 feature.
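
(Illustratively, the suggestion amounts to flipping the
default in the 4.0 compat options rather than 3.1, roughly
as sketched below; the existing contents of
spapr_machine_4_0_class_options() are elided here.)

    static void spapr_machine_4_0_class_options(MachineClass *mc)
    {
        SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);

        /* ... existing 4.0 compat settings ... */
        smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
    }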

>  }
>  
>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index a015a80..e010cb2 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -49,6 +49,7 @@
>  #include "hw/ppc/fdt.h"
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
> +#include "migration/blocker.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      rtas_st(rets, 1, 100);
>  }
>  
> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> +                                  SpaprMachineState *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args,
> +                                  uint32_t nret, target_ulong rets)
> +{
> +    int ret;
> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> +
> +    if (!rtas_addr) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    ret = kvmppc_fwnmi_enable(cpu);
> +    if (ret == 1) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    } else if (ret < 0) {
> +        error_report("Couldn't enable KVM FWNMI capability");
> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +        return;
> +    }
> +
> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    if (spapr->guest_machine_check_addr == -1) {
> +        /* NMI register not called */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +    } else {
> +        /*
> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> +         * hence unset mc_status.
> +         */
> +        spapr->mc_status = -1;
> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    }
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -496,6 +551,14 @@ hwaddr spapr_get_rtas_addr(void)
>      return (hwaddr)fdt32_to_cpu(*rtas_data);
>  }
>  
> +void spapr_fwnmi_register(void)
> +{
> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> +                        rtas_ibm_nmi_register);
> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> +                        rtas_ibm_nmi_interlock);
> +}
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 0dedf0a..7ae53e2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -637,8 +637,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> @@ -894,4 +896,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>  hwaddr spapr_get_rtas_addr(void);
> +void spapr_fwnmi_register(void);
>  #endif /* HW_SPAPR_H */
> 




* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-06-24 14:29   ` Greg Kurz
@ 2019-06-25  6:16     ` Aravinda Prasad
  2019-06-25  7:00       ` Greg Kurz
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-06-25  6:16 UTC (permalink / raw)
  To: Greg Kurz, david; +Cc: paulus, qemu-ppc, aik, qemu-devel



On Monday 24 June 2019 07:59 PM, Greg Kurz wrote:
> On Wed, 12 Jun 2019 14:51:38 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> This patch adds support in QEMU to handle "ibm,nmi-register"
>> and "ibm,nmi-interlock" RTAS calls and sets the default
>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
>> type 4.0.
>>
> 
> Next machine type is 4.1.

ok.

> 
>> The machine check notification address is saved when the
>> OS issues "ibm,nmi-register" RTAS call.
>>
>> This patch also handles the case when multiple processors
>> experience machine check at or about the same time by
>> handling "ibm,nmi-interlock" call. In such cases, as per
>> PAPR, subsequent processors serialize waiting for the first
>> processor to issue the "ibm,nmi-interlock" call. The second
>> processor that also received a machine check error waits
>> till the first processor is done reading the error log.
>> The first processor issues "ibm,nmi-interlock" call
>> when the error log is consumed.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |    6 ++++-
>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |    5 +++-
>>  3 files changed, 72 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 3d6d139..213d493 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
>>          /* Create the error string for live migration blocker */
>>          error_setg(&spapr->fwnmi_migration_blocker,
>>                  "Live migration not supported during machine check handling");
>> +
>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>> +        spapr_fwnmi_register();
> 
> IIRC this was supposed to depend on SPAPR_CAP_FWNMI_MCE being ON.

Yes, this is inside the SPAPR_CAP_FWNMI_MCE check:

if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
    /*
     * Ensure that the rtas image size is less than RTAS_ERROR_LOG_OFFSET
     * or else the rtas image will be overwritten with the rtas error log
     * when a machine check exception is encountered.
     */
    g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);

    /* Resize rtas blob to accommodate error log */
    spapr->rtas_size = RTAS_ERROR_LOG_MAX;

    /* Create the error string for live migration blocker */
    error_setg(&spapr->fwnmi_migration_blocker,
            "Live migration not supported during machine check handling");

    /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
    spapr_fwnmi_register();
}


> 
>>      }
>>  
>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>>      spapr_caps_add_properties(smc, &error_abort);
>>      smc->irq = &spapr_irq_dual;
>>      smc->dr_phb_enabled = true;
>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> 
> This should have been put into spapr_machine_4_0_class_options().

ok. I will change it.

> 
> But unless you manage to get this merged before soft-freeze (2019-07-02),
> I'm afraid this will be a 4.2 feature.

If there are no other comments, can this be merged to 4.1? I will send a
revised version with the above changes.

Regards,
Aravinda

> 
>>  }
>>  
>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index a015a80..e010cb2 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -49,6 +49,7 @@
>>  #include "hw/ppc/fdt.h"
>>  #include "target/ppc/mmu-hash64.h"
>>  #include "target/ppc/mmu-book3s-v3.h"
>> +#include "migration/blocker.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>      rtas_st(rets, 1, 100);
>>  }
>>  
>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>> +                                  SpaprMachineState *spapr,
>> +                                  uint32_t token, uint32_t nargs,
>> +                                  target_ulong args,
>> +                                  uint32_t nret, target_ulong rets)
>> +{
>> +    int ret;
>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>> +
>> +    if (!rtas_addr) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    ret = kvmppc_fwnmi_enable(cpu);
>> +    if (ret == 1) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    } else if (ret < 0) {
>> +        error_report("Couldn't enable KVM FWNMI capability");
>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> +        return;
>> +    }
>> +
>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
>> +    if (spapr->guest_machine_check_addr == -1) {
>> +        /* NMI register not called */
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +    } else {
>> +        /*
>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>> +         * hence unset mc_status.
>> +         */
>> +        spapr->mc_status = -1;
>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +    }
>> +}
>> +
>>  static struct rtas_call {
>>      const char *name;
>>      spapr_rtas_fn fn;
>> @@ -496,6 +551,14 @@ hwaddr spapr_get_rtas_addr(void)
>>      return (hwaddr)fdt32_to_cpu(*rtas_data);
>>  }
>>  
>> +void spapr_fwnmi_register(void)
>> +{
>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>> +                        rtas_ibm_nmi_register);
>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>> +                        rtas_ibm_nmi_interlock);
>> +}
>> +
>>  static void core_rtas_register_types(void)
>>  {
>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 0dedf0a..7ae53e2 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -637,8 +637,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>  
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>  
>>  /* RTAS ibm,get-system-parameter token values */
>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>> @@ -894,4 +896,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>  
>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>  hwaddr spapr_get_rtas_addr(void);
>> +void spapr_fwnmi_register(void);
>>  #endif /* HW_SPAPR_H */
>>
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-06-25  6:16     ` Aravinda Prasad
@ 2019-06-25  7:00       ` Greg Kurz
  2019-06-26  5:13         ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: Greg Kurz @ 2019-06-25  7:00 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Tue, 25 Jun 2019 11:46:06 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> On Monday 24 June 2019 07:59 PM, Greg Kurz wrote:
> > On Wed, 12 Jun 2019 14:51:38 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> >   
> >> This patch adds support in QEMU to handle "ibm,nmi-register"
> >> and "ibm,nmi-interlock" RTAS calls and sets the default
> >> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >> type 4.0.
> >>  
> > 
> > Next machine type is 4.1.  
> 
> ok.
> 
> >   
> >> The machine check notification address is saved when the
> >> OS issues the "ibm,nmi-register" RTAS call.
> >>
> >> This patch also handles the case when multiple processors
> >> experience a machine check at or about the same time by
> >> handling the "ibm,nmi-interlock" call. In such cases, as per
> >> PAPR, subsequent processors serialize waiting for the first
> >> processor to issue the "ibm,nmi-interlock" call. The second
> >> processor that also received a machine check error waits
> >> until the first processor is done reading the error log.
> >> The first processor issues the "ibm,nmi-interlock" call
> >> once the error log is consumed.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |    6 ++++-
> >>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr.h |    5 +++-
> >>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 3d6d139..213d493 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
> >>          /* Create the error string for live migration blocker */
> >>          error_setg(&spapr->fwnmi_migration_blocker,
> >>                  "Live migration not supported during machine check handling");
> >> +
> >> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> >> +        spapr_fwnmi_register();  
> > 
> > IIRC this was supposed to depend on SPAPR_CAP_FWNMI_MCE being ON.  
> 
> Yes this is inside SPAPR_CAP_FWNMI_MCE check:
> 
> if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
>     /*
>      * Ensure that the rtas image size is less than RTAS_ERROR_LOG_OFFSET
>      * or else the rtas image will be overwritten with the rtas error log
>      * when a machine check exception is encountered.
>      */
>     g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
> 
>     /* Resize rtas blob to accommodate error log */
>     spapr->rtas_size = RTAS_ERROR_LOG_MAX;
> 
>     /* Create the error string for live migration blocker */
>     error_setg(&spapr->fwnmi_migration_blocker,
>             "Live migration not supported during machine check handling");
> 
>     /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>     spapr_fwnmi_register();
> }
> 

Oops my bad... sorry for the noise.

> 
> >   
> >>      }
> >>  
> >>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>      spapr_caps_add_properties(smc, &error_abort);
> >>      smc->irq = &spapr_irq_dual;
> >>      smc->dr_phb_enabled = true;
> >> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
> >>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
> >> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;  
> > 
> > This should have been put into spapr_machine_4_0_class_options().  
> 
> ok. I will change it.
> 
> > 
> > But unless you manage to get this merged before soft-freeze (2019-07-02),
> > I'm afraid this will be a 4.2 feature.  
> 
> If there are no other comments, can this be merged to 4.1? I will send a
> revised version with the above changes.
> 

This is David's call.

> Regards,
> Aravinda
> 
> >   
> >>  }
> >>  
> >>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index a015a80..e010cb2 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -49,6 +49,7 @@
> >>  #include "hw/ppc/fdt.h"
> >>  #include "target/ppc/mmu-hash64.h"
> >>  #include "target/ppc/mmu-book3s-v3.h"
> >> +#include "migration/blocker.h"
> >>  
> >>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>                                     uint32_t token, uint32_t nargs,
> >> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>      rtas_st(rets, 1, 100);
> >>  }
> >>  
> >> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >> +                                  SpaprMachineState *spapr,
> >> +                                  uint32_t token, uint32_t nargs,
> >> +                                  target_ulong args,
> >> +                                  uint32_t nret, target_ulong rets)
> >> +{
> >> +    int ret;
> >> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> >> +
> >> +    if (!rtas_addr) {
> >> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >> +        return;
> >> +    }
> >> +
> >> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> >> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >> +        return;
> >> +    }
> >> +
> >> +    ret = kvmppc_fwnmi_enable(cpu);
> >> +    if (ret == 1) {
> >> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >> +        return;
> >> +    } else if (ret < 0) {
> >> +        error_report("Couldn't enable KVM FWNMI capability");
> >> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >> +        return;
> >> +    }
> >> +
> >> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >> +}
> >> +
> >> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >> +                                   SpaprMachineState *spapr,
> >> +                                   uint32_t token, uint32_t nargs,
> >> +                                   target_ulong args,
> >> +                                   uint32_t nret, target_ulong rets)
> >> +{
> >> +    if (spapr->guest_machine_check_addr == -1) {
> >> +        /* NMI register not called */
> >> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >> +    } else {
> >> +        /*
> >> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> >> +         * hence unset mc_status.
> >> +         */
> >> +        spapr->mc_status = -1;
> >> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> >> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >> +    }
> >> +}
> >> +
> >>  static struct rtas_call {
> >>      const char *name;
> >>      spapr_rtas_fn fn;
> >> @@ -496,6 +551,14 @@ hwaddr spapr_get_rtas_addr(void)
> >>      return (hwaddr)fdt32_to_cpu(*rtas_data);
> >>  }
> >>  
> >> +void spapr_fwnmi_register(void)
> >> +{
> >> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> >> +                        rtas_ibm_nmi_register);
> >> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> >> +                        rtas_ibm_nmi_interlock);
> >> +}
> >> +
> >>  static void core_rtas_register_types(void)
> >>  {
> >>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index 0dedf0a..7ae53e2 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -637,8 +637,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
> >>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
> >>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
> >>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> >> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> >> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
> >>  
> >> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> >> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
> >>  
> >>  /* RTAS ibm,get-system-parameter token values */
> >>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> >> @@ -894,4 +896,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> >>  
> >>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> >>  hwaddr spapr_get_rtas_addr(void);
> >> +void spapr_fwnmi_register(void);
> >>  #endif /* HW_SPAPR_H */
> >>  
> >   
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-06-25  7:00       ` Greg Kurz
@ 2019-06-26  5:13         ` Aravinda Prasad
  2019-07-02  4:12           ` David Gibson
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-06-26  5:13 UTC (permalink / raw)
  To: david; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc



On Tuesday 25 June 2019 12:30 PM, Greg Kurz wrote:
> On Tue, 25 Jun 2019 11:46:06 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> On Monday 24 June 2019 07:59 PM, Greg Kurz wrote:
>>> On Wed, 12 Jun 2019 14:51:38 +0530
>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>>   
>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
>>>> type 4.0.
>>>>  
>>>
>>> Next machine type is 4.1.  
>>
>> ok.
>>
>>>   
>>>> The machine check notification address is saved when the
>>>> OS issues the "ibm,nmi-register" RTAS call.
>>>>
>>>> This patch also handles the case when multiple processors
>>>> experience a machine check at or about the same time by
>>>> handling the "ibm,nmi-interlock" call. In such cases, as per
>>>> PAPR, subsequent processors serialize waiting for the first
>>>> processor to issue the "ibm,nmi-interlock" call. The second
>>>> processor that also received a machine check error waits
>>>> until the first processor is done reading the error log.
>>>> The first processor issues the "ibm,nmi-interlock" call
>>>> once the error log is consumed.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/ppc/spapr.c         |    6 ++++-
>>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/ppc/spapr.h |    5 +++-
>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 3d6d139..213d493 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
>>>>          /* Create the error string for live migration blocker */
>>>>          error_setg(&spapr->fwnmi_migration_blocker,
>>>>                  "Live migration not supported during machine check handling");
>>>> +
>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>>>> +        spapr_fwnmi_register();  
>>>
>>> IIRC this was supposed to depend on SPAPR_CAP_FWNMI_MCE being ON.  
>>
>> Yes this is inside SPAPR_CAP_FWNMI_MCE check:
>>
>> if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
>>     /*
>>      * Ensure that the rtas image size is less than RTAS_ERROR_LOG_OFFSET
>>      * or else the rtas image will be overwritten with the rtas error log
>>      * when a machine check exception is encountered.
>>      */
>>     g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
>>
>>     /* Resize rtas blob to accommodate error log */
>>     spapr->rtas_size = RTAS_ERROR_LOG_MAX;
>>
>>     /* Create the error string for live migration blocker */
>>     error_setg(&spapr->fwnmi_migration_blocker,
>>             "Live migration not supported during machine check handling");
>>
>>     /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>>     spapr_fwnmi_register();
>> }
>>
> 
> Oops my bad... sorry for the noise.
> 
>>
>>>   
>>>>      }
>>>>  
>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>      smc->irq = &spapr_irq_dual;
>>>>      smc->dr_phb_enabled = true;
>>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;  
>>>
>>> This should have been put into spapr_machine_4_0_class_options().  
>>
>> ok. I will change it.
>>
>>>
>>> But unless you manage to get this merged before soft-freeze (2019-07-02),
>>> I'm afraid this will be a 4.2 feature.  
>>
>> If there are no other comments, can this be merged to 4.1? I will send a
>> revised version with the above changes.
>>
> 
> This is David's call.

David, can you let me know if this can be merged to 4.1 with the above
minor changes?

> 
>> Regards,
>> Aravinda
>>
>>>   
>>>>  }
>>>>  
>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index a015a80..e010cb2 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -49,6 +49,7 @@
>>>>  #include "hw/ppc/fdt.h"
>>>>  #include "target/ppc/mmu-hash64.h"
>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>> +#include "migration/blocker.h"
>>>>  
>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>                                     uint32_t token, uint32_t nargs,
>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>      rtas_st(rets, 1, 100);
>>>>  }
>>>>  
>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>> +                                  SpaprMachineState *spapr,
>>>> +                                  uint32_t token, uint32_t nargs,
>>>> +                                  target_ulong args,
>>>> +                                  uint32_t nret, target_ulong rets)
>>>> +{
>>>> +    int ret;
>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>>>> +
>>>> +    if (!rtas_addr) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>> +    if (ret == 1) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    } else if (ret < 0) {
>>>> +        error_report("Couldn't enable KVM FWNMI capability");
>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>> +}
>>>> +
>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>> +                                   SpaprMachineState *spapr,
>>>> +                                   uint32_t token, uint32_t nargs,
>>>> +                                   target_ulong args,
>>>> +                                   uint32_t nret, target_ulong rets)
>>>> +{
>>>> +    if (spapr->guest_machine_check_addr == -1) {
>>>> +        /* NMI register not called */
>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>> +    } else {
>>>> +        /*
>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>>>> +         * hence unset mc_status.
>>>> +         */
>>>> +        spapr->mc_status = -1;
>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
>>>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>> +    }
>>>> +}
>>>> +
>>>>  static struct rtas_call {
>>>>      const char *name;
>>>>      spapr_rtas_fn fn;
>>>> @@ -496,6 +551,14 @@ hwaddr spapr_get_rtas_addr(void)
>>>>      return (hwaddr)fdt32_to_cpu(*rtas_data);
>>>>  }
>>>>  
>>>> +void spapr_fwnmi_register(void)
>>>> +{
>>>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>>>> +                        rtas_ibm_nmi_register);
>>>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>>>> +                        rtas_ibm_nmi_interlock);
>>>> +}
>>>> +
>>>>  static void core_rtas_register_types(void)
>>>>  {
>>>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index 0dedf0a..7ae53e2 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -637,8 +637,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>>>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>>>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>>>  
>>>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>>>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>>>  
>>>>  /* RTAS ibm,get-system-parameter token values */
>>>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>>>> @@ -894,4 +896,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>>>  
>>>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>>>  hwaddr spapr_get_rtas_addr(void);
>>>> +void spapr_fwnmi_register(void);
>>>>  #endif /* HW_SPAPR_H */
>>>>  
>>>   
>>
> 
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability Aravinda Prasad
@ 2019-07-02  3:51   ` David Gibson
  2019-07-02  6:24     ` Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: David Gibson @ 2019-07-02  3:51 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc


On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
> KVM causes a guest exit with NMI as the exit reason
> when it encounters a machine check exception on an
> address belonging to a guest. Without this capability
> enabled, KVM redirects machine check exceptions to the
> guest's 0x200 vector.
> 
> This patch also introduces the fwnmi-mce capability to
> deal with the case where a guest with the
> KVM_CAP_PPC_FWNMI capability enabled is migrated to a
> host that does not support this capability.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    1 +
>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    4 +++-
>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>  5 files changed, 61 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6dd8aaa..2ef86aa 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>      spapr_caps_add_properties(smc, &error_abort);
>      smc->irq = &spapr_irq_dual;
>      smc->dr_phb_enabled = true;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 31b4661..2e92eb6 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>      }
>  }
>  
> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> +                                Error **errp)
> +{
> +    if (!val) {
> +        return; /* Disabled by default */
> +    }
> +
> +    if (tcg_enabled()) {
> +        error_setg(errp,
> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");

Not allowing this for TCG creates an awkward incompatibility between
KVM and TCG guests.  I can't actually see any reason to ban it for TCG
- with the current code TCG won't ever generate NMIs, but I don't see
that anything will actually break.

In fact, we do have an nmi monitor command, currently wired to the
spapr_nmi() function which resets each cpu, but it probably makes
sense to wire it up to the fwnmi stuff when present.
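
For illustration, a relaxed cap_fwnmi_mce_apply() along those lines might
look roughly like this (untested sketch reusing the helpers already in the
patch: it only rejects the cap when KVM itself lacks KVM_CAP_PPC_FWNMI and
simply accepts it under TCG, where no NMI is ever generated):

static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
                                Error **errp)
{
    if (!val) {
        return; /* Disabled by default */
    }

    if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
        error_setg(errp,
"Firmware Assisted Non-Maskable Interrupts not supported by KVM, try cap-fwnmi-mce=off");
    }
    /* Nothing to do for TCG: accept the cap, machine checks just never occur. */
}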

> +    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
> +        error_setg(errp,
> +"Firmware Assisted Non-Maskable Interrupts not supported by KVM, try cap-fwnmi-mce=off");
> +    }
> +}
> +
>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>      [SPAPR_CAP_HTM] = {
>          .name = "htm",
> @@ -578,6 +594,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>          .type = "bool",
>          .apply = cap_ccf_assist_apply,
>      },
> +    [SPAPR_CAP_FWNMI_MCE] = {
> +        .name = "fwnmi-mce",
> +        .description = "Handle fwnmi machine check exceptions",
> +        .index = SPAPR_CAP_FWNMI_MCE,
> +        .get = spapr_cap_get_bool,
> +        .set = spapr_cap_set_bool,
> +        .type = "bool",
> +        .apply = cap_fwnmi_mce_apply,
> +    },
>  };
>  
>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> @@ -717,6 +742,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>  
>  void spapr_caps_init(SpaprMachineState *spapr)
>  {
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 4f5becf..f891f8f 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -78,8 +78,10 @@ typedef enum {
>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>  /* Count Cache Flush Assist HW Instruction */
>  #define SPAPR_CAP_CCF_ASSIST            0x09
> +/* FWNMI machine check handling */
> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>  /* Num Caps */
> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>  
>  /*
>   * Capability Values
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 3bf0a46..afef4cd 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
>  static int cap_ppc_count_cache_flush_assist;
>  static int cap_ppc_nested_kvm_hv;
>  static int cap_large_decr;
> +static int cap_ppc_fwnmi;
>  
>  static uint32_t debug_inst_opcode;
>  
> @@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>      kvmppc_get_cpu_characteristics(s);
>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>      cap_large_decr = kvmppc_get_dec_bits();
> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>      /*
>       * Note: setting it to false because there is not such capability
>       * in KVM at this moment.
> @@ -2114,6 +2116,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>      }
>  }
>  
> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> +{
> +    CPUState *cs = CPU(cpu);
> +
> +    if (!cap_ppc_fwnmi) {
> +        return 1;
> +    }
> +
> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
> +}
> +
> +
>  int kvmppc_smt_threads(void)
>  {
>      return cap_ppc_smt ? cap_ppc_smt : 1;
> @@ -2414,6 +2428,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>      return cap_mmu_hash_v3;
>  }
>  
> +bool kvmppc_has_cap_ppc_fwnmi(void)
> +{
> +    return cap_ppc_fwnmi;
> +}
> +
>  static bool kvmppc_power8_host(void)
>  {
>      bool ret = false;
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 45776ca..880cee9 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
>  void kvmppc_set_papr(PowerPCCPU *cpu);
>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
> +bool kvmppc_has_cap_ppc_fwnmi(void);
>  int kvmppc_smt_threads(void);
>  void kvmppc_hint_smt_possible(Error **errp);
>  int kvmppc_set_smt_threads(int smt);
> @@ -158,6 +160,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>  {
>  }
>  
> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> +{
> +    return 1;
> +}
> +
> +static inline bool kvmppc_has_cap_ppc_fwnmi(void)
> +{
> +    return false;
> +}
> +
>  static inline int kvmppc_smt_threads(void)
>  {
>      return 1;
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 3/6] target/ppc: Handle NMI guest exit
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
@ 2019-07-02  3:52   ` David Gibson
  0 siblings, 0 replies; 37+ messages in thread
From: David Gibson @ 2019-07-02  3:52 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc


On Wed, Jun 12, 2019 at 02:51:13PM +0530, Aravinda Prasad wrote:
> Memory errors such as bit flips that cannot be corrected
> by hardware are passed on to the kernel for handling.
> If the memory address in error belongs to the guest, then
> the guest kernel is responsible for taking suitable action.
> Patch [1] enhances KVM to exit the guest with the exit
> reason set to KVM_EXIT_NMI in such cases. This patch
> handles the KVM_EXIT_NMI exit.
> 
> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>     (e20bbd3d and related commits)
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr.c          |    8 ++++++++
>  hw/ppc/spapr_events.c   |   23 +++++++++++++++++++++++
>  include/hw/ppc/spapr.h  |   10 ++++++++++
>  target/ppc/kvm.c        |   14 ++++++++++++++
>  target/ppc/kvm_ppc.h    |    2 ++
>  target/ppc/trace-events |    1 +
>  6 files changed, 58 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2ef86aa..6cc2c3b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1806,6 +1806,12 @@ static void spapr_machine_reset(void)
>      first_ppc_cpu->env.gpr[5] = 0;
>  
>      spapr->cas_reboot = false;
> +
> +    spapr->mc_status = -1;
> +    spapr->guest_machine_check_addr = -1;
> +
> +    /* Signal all vCPUs waiting on this condition */
> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>  }
>  
>  static void spapr_create_nvram(SpaprMachineState *spapr)
> @@ -3070,6 +3076,8 @@ static void spapr_machine_init(MachineState *machine)
>  
>          kvmppc_spapr_enable_inkernel_multitce();
>      }
> +
> +    qemu_cond_init(&spapr->mc_delivery_cond);
>  }
>  
>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index ae0f093..a0c66d7 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -620,6 +620,29 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +void spapr_mce_req_event(PowerPCCPU *cpu)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +
> +    while (spapr->mc_status != -1) {
> +        /*
> +         * Check whether the same CPU got machine check error
> +         * while still handling the mc error (i.e., before
> +         * that CPU called "ibm,nmi-interlock")
> +         */
> +        if (spapr->mc_status == cpu->vcpu_id) {
> +            qemu_system_guest_panicked(NULL);
> +            return;
> +        }
> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> +        /* Meanwhile if the system is reset, then just return */
> +        if (spapr->guest_machine_check_addr == -1) {
> +            return;
> +        }
> +    }
> +    spapr->mc_status = cpu->vcpu_id;
> +}
> +
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                              uint32_t token, uint32_t nargs,
>                              target_ulong args,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index f891f8f..f34c79f 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -190,6 +190,15 @@ struct SpaprMachineState {
>       * occurs during the unplug process. */
>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>  
> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> +    target_ulong guest_machine_check_addr;
> +    /*
> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
> +     * handling the mc.
> +     */
> +    int mc_status;
> +    QemuCond mc_delivery_cond;
> +
>      /*< public >*/
>      char *kvm_type;
>      char *host_model;
> @@ -789,6 +798,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
> +void spapr_mce_req_event(PowerPCCPU *cpu);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index afef4cd..99f33fe 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1763,6 +1763,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ret = 0;
>          break;
>  
> +    case KVM_EXIT_NMI:
> +        trace_kvm_handle_nmi_exception();
> +        ret = kvm_handle_nmi(cpu, run);
> +        break;
> +
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> @@ -2863,6 +2868,15 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>      return data & 0xffff;
>  }
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
> +{
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    spapr_mce_req_event(cpu);
> +
> +    return 0;
> +}
> +
>  int kvmppc_enable_hwrng(void)
>  {
>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 880cee9..3d9f0b4 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -83,6 +83,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
> +
>  #else
>  
>  static inline uint32_t kvmppc_get_tbfreq(void)
> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
> index 3dc6740..6d15aa9 100644
> --- a/target/ppc/trace-events
> +++ b/target/ppc/trace-events
> @@ -28,3 +28,4 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>  kvm_handle_epr(void) "handle epr"
>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>  kvm_handle_debug_exception(void) "handle debug exception"
> +kvm_handle_nmi_exception(void) "handle NMI exception"
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 4/6] target/ppc: Build rtas error log upon an MCE
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 4/6] target/ppc: Build rtas error log upon an MCE Aravinda Prasad
@ 2019-07-02  4:03   ` David Gibson
  2019-07-02  9:49     ` Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: David Gibson @ 2019-07-02  4:03 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc


On Wed, Jun 12, 2019 at 02:51:21PM +0530, Aravinda Prasad wrote:
> Upon a machine check exception (MCE) in a guest address space,
> KVM causes a guest exit to enable QEMU to build and pass the
> error to the guest in the PAPR-defined rtas error log format.
> 
> This patch builds the rtas error log, copies it to the rtas_addr
> and then invokes the guest-registered machine check handler. The
> handler in the guest takes suitable action(s) depending on the type
> and criticality of the error. For example, if an error is
> unrecoverable memory corruption in an application inside the
> guest, then the guest kernel sends a SIGBUS to the application.
> For recoverable errors, the guest performs recovery actions and
> logs the error.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |   13 +++
>  hw/ppc/spapr_events.c  |  238 ++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_rtas.c    |   26 +++++
>  include/hw/ppc/spapr.h |    6 +
>  target/ppc/kvm.c       |    4 +
>  5 files changed, 284 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6cc2c3b..d61905b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2908,6 +2908,19 @@ static void spapr_machine_init(MachineState *machine)
>          error_report("Could not get size of LPAR rtas '%s'", filename);
>          exit(1);
>      }
> +
> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
> +        /*
> +         * Ensure that the rtas image size is less than RTAS_ERROR_LOG_OFFSET
> +         * or else the rtas image will be overwritten with the rtas error log
> +         * when a machine check exception is encountered.
> +         */
> +        g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
> +
> +        /* Resize rtas blob to accommodate error log */
> +        spapr->rtas_size = RTAS_ERROR_LOG_MAX;
> +    }
> +
>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>          error_report("Could not load LPAR rtas '%s'", filename);
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index a0c66d7..51c052e 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -212,6 +212,106 @@ struct hp_extended_log {
>      struct rtas_event_log_v6_hp hp;
>  } QEMU_PACKED;
>  
> +struct rtas_event_log_v6_mc {
> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> +    struct rtas_event_log_v6_section_header hdr;
> +    uint32_t fru_id;
> +    uint32_t proc_id;
> +    uint8_t error_type;
> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> +    uint8_t sub_err_type;
> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> +    uint8_t reserved_1[6];
> +    uint64_t effective_address;
> +    uint64_t logical_address;
> +} QEMU_PACKED;
> +
> +struct mc_extended_log {
> +    struct rtas_event_log_v6 v6hdr;
> +    struct rtas_event_log_v6_mc mc;
> +} QEMU_PACKED;
> +
> +struct MC_ierror_table {
> +    unsigned long srr1_mask;
> +    unsigned long srr1_value;
> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_ierror_table mc_ierror_table[] = {
> +{ 0x00000000081c0000, 0x0000000000040000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000080000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000100000, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000140000, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000180000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0, 0, 0, 0, 0, 0 } };
> +
> +struct MC_derror_table {
> +    unsigned long dsisr_value;
> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_derror_table mc_derror_table[] = {
> +{ 0x00008000, false,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00004000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000800, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000400, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000080, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000100, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0, false, 0, 0, 0, 0 } };
> +
> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> +
>  typedef enum EventClass {
>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>      EVENT_CLASS_EPOW                = 1,
> @@ -620,7 +720,141 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> -void spapr_mce_req_event(PowerPCCPU *cpu)
> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> +                                        struct mc_extended_log *ext_elog)
> +{
> +    int i;
> +    CPUPPCState *env = &cpu->env;
> +    uint32_t summary;
> +    uint64_t dsisr = env->spr[SPR_DSISR];
> +
> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> +    if (recovered) {
> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> +    } else {
> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> +    }
> +
> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> +            if (mc_derror_table[i].dar_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> +            }
> +
> +            summary |= mc_derror_table[i].initiator
> +                        | mc_derror_table[i].severity;
> +
> +            return summary;
> +        }
> +    } else {
> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> +                    mc_ierror_table[i].srr1_value) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> +            if (mc_ierror_table[i].nip_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> +            }
> +
> +            summary |= mc_ierror_table[i].initiator
> +                        | mc_ierror_table[i].severity;
> +
> +            return summary;
> +        }
> +    }
> +
> +    summary |= RTAS_LOG_INITIATOR_CPU;
> +    return summary;
> +}
> +
> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    CPUState *cs = CPU(cpu);
> +    uint64_t rtas_addr;
> +    CPUPPCState *env = &cpu->env;
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +    target_ulong r3, msr = 0;
> +    struct rtas_error_log log;
> +    struct mc_extended_log *ext_elog;
> +    uint32_t summary;
> +
> +    /*
> +     * Properly set bits in MSR before we invoke the handler.
> +     * SRR0/1, DAR and DSISR are properly set by KVM
> +     */
> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> +        msr |= (1ULL << MSR_LE);
> +    }
> +
> +    if (env->msr & (1ULL << MSR_SF)) {
> +        msr |= (1ULL << MSR_SF);
> +    }
> +
> +    msr |= (1ULL << MSR_ME);
> +
> +    if (spapr->guest_machine_check_addr == -1) {
> +        /*
> +         * This implies that we have hit a machine check between system
> +         * reset and "ibm,nmi-register". Fall back to the old machine
> +         * check behavior in such cases.
> +         */
> +        env->spr[SPR_SRR0] = env->nip;
> +        env->spr[SPR_SRR1] = env->msr;
> +        env->msr = msr;
> +        env->nip = 0x200;
> +        return;

Hm, does this differ from what ppc_cpu_do_system_reset() will do?

> +    }
> +
> +    ext_elog = g_malloc0(sizeof(*ext_elog));
> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> +
> +    log.summary = cpu_to_be32(summary);
> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
> +
> +    /* r3 should be in BE always */
> +    r3 = cpu_to_be64(env->gpr[3]);

I don't love bare integer variables being stored in a particular
"endianness" because logically they're endianness-free values.  They
only get an endianness when they are placed into byte-ordered memory.

> +    env->msr = msr;
> +
> +    spapr_init_v6hdr(&ext_elog->v6hdr);
> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> +    ext_elog->mc.hdr.section_length =
> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> +    ext_elog->mc.hdr.section_version = 1;
> +
> +    /* get rtas addr from fdt */
> +    rtas_addr = spapr_get_rtas_addr();
> +    if (!rtas_addr) {
> +        /* Unable to fetch rtas_addr. Hence reset the guest */
> +        ppc_cpu_do_system_reset(cs);
> +        g_free(ext_elog);
> +        return;
> +    }
> +
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
> +                              sizeof(r3));

So, I'd prefer to see the endianness conversion done as part of the
memory store.  You can do that using the stl_be_phys() helper.
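
Something like this, for instance (untested sketch; since r3 holds a 64-bit
value here, the 64-bit stq_be_phys() variant looks like the closer fit, with
address_space_memory as the address space):

    /* Store the guest's r3 big-endian directly into the error log area,
     * letting the helper do the byte swap instead of pre-swapping r3. */
    stq_be_phys(&address_space_memory, rtas_addr + RTAS_ERROR_LOG_OFFSET,
                env->gpr[3]);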

> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
> +                              &log, sizeof(log));
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
> +                              sizeof(log), ext_elog,
> +                              sizeof(*ext_elog));
> +
> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
> +    env->nip = spapr->guest_machine_check_addr;
> +
> +    g_free(ext_elog);
> +}
> +
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  
> @@ -641,6 +875,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu)
>          }
>      }
>      spapr->mc_status = cpu->vcpu_id;
> +
> +    spapr_mce_dispatch_elog(cpu, recovered);
>  }
>  
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 5bc1a93..a015a80 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -470,6 +470,32 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>      }
>  }
>  
> +hwaddr spapr_get_rtas_addr(void)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int rtas_node;
> +    const fdt32_t *rtas_data;
> +    void *fdt = spapr->fdt_blob;
> +
> +    /* fetch rtas addr from fdt */
> +    rtas_node = fdt_path_offset(fdt, "/rtas");
> +    if (rtas_node < 0) {
> +        return 0;
> +    }
> +
> +    rtas_data = fdt_getprop(fdt, rtas_node, "linux,rtas-base", NULL);
> +    if (!rtas_data) {
> +        return 0;
> +    }
> +
> +    /*
> +     * We assume that the OS called RTAS instantiate-rtas, but some other
> +     * OS might call RTAS instantiate-rtas-64 instead. This is fine for now,
> +     * as SLOF only supports the 32-bit variant.
> +     */
> +    return (hwaddr)fdt32_to_cpu(*rtas_data);
> +}
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index f34c79f..debb57b 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
>  
>  #define RTAS_ERROR_LOG_MAX      2048
>  
> +/* Offset from rtas-base where error log is placed */
> +#define RTAS_ERROR_LOG_OFFSET       0x30
> +
>  #define RTAS_EVENT_SCAN_RATE    1
>  
>  /* This helper should be used to encode interrupt specifiers when the related
> @@ -798,7 +801,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
> -void spapr_mce_req_event(PowerPCCPU *cpu);
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> @@ -888,4 +891,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> +hwaddr spapr_get_rtas_addr(void);
>  #endif /* HW_SPAPR_H */
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 99f33fe..368ec6e 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2870,9 +2870,11 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  
>  int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>  {
> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> +
>      cpu_synchronize_state(CPU(cpu));
>  
> -    spapr_mce_req_event(cpu);
> +    spapr_mce_req_event(cpu, recovered);
>  
>      return 0;
>  }
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls Aravinda Prasad
  2019-06-24 14:29   ` Greg Kurz
@ 2019-07-02  4:11   ` David Gibson
  2019-07-02 10:40     ` Aravinda Prasad
  1 sibling, 1 reply; 37+ messages in thread
From: David Gibson @ 2019-07-02  4:11 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc


On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
> This patch adds support in QEMU to handle "ibm,nmi-register"
> and "ibm,nmi-interlock" RTAS calls and sets the default
> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> type 4.0.
> 
> The machine check notification address is saved when the
> OS issues "ibm,nmi-register" RTAS call.
> 
> This patch also handles the case when multiple processors
> experience machine check at or about the same time by
> handling "ibm,nmi-interlock" call. In such cases, as per
> PAPR, subsequent processors serialize waiting for the first
> processor to issue the "ibm,nmi-interlock" call. The second
> processor that also received a machine check error waits
> till the first processor is done reading the error log.
> The first processor issues "ibm,nmi-interlock" call
> when the error log is consumed.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    6 ++++-
>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    5 +++-
>  3 files changed, 72 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3d6d139..213d493 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
>          /* Create the error string for live migration blocker */
>          error_setg(&spapr->fwnmi_migration_blocker,
>                  "Live migration not supported during machine check handling");
> +
> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> +        spapr_fwnmi_register();
>      }
>  
>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;

Turning this on by default really isn't ok if it stops you running TCG
guests at all.

>      spapr_caps_add_properties(smc, &error_abort);
>      smc->irq = &spapr_irq_dual;
>      smc->dr_phb_enabled = true;
> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;

We're now well past 4.0, and in fact we're about to go into soft
freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
will need to retain the old default.
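
i.e. the usual compat-default pattern: leave the new default ON only in the
newest machine class and turn it back OFF in the previous machine type's
class_options.  Roughly (a sketch only -- the exact function name depends on
which release this finally lands in):

static void spapr_machine_4_1_class_options(MachineClass *mc)
{
    SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);

    /* ... existing 4.1 compat settings ... */
    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
}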

>  }
>  
>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index a015a80..e010cb2 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -49,6 +49,7 @@
>  #include "hw/ppc/fdt.h"
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
> +#include "migration/blocker.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      rtas_st(rets, 1, 100);
>  }
>  
> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> +                                  SpaprMachineState *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args,
> +                                  uint32_t nret, target_ulong rets)
> +{
> +    int ret;
> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> +
> +    if (!rtas_addr) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    ret = kvmppc_fwnmi_enable(cpu);
> +    if (ret == 1) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);

I don't understand this case separate from the others.  We've already
set the cap, so fwnmi support should be checked and available.

> +        return;
> +    } else if (ret < 0) {
> +        error_report("Couldn't enable KVM FWNMI capability");
> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +        return;
> +    }
> +
> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    if (spapr->guest_machine_check_addr == -1) {
> +        /* NMI register not called */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +    } else {
> +        /*
> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> +         * hence unset mc_status.
> +         */
> +        spapr->mc_status = -1;
> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);

Hrm.  We add the blocker at the mce request point.  First, that's in
another patch, which isn't great.  Second, does that mean we could add
multiple times if we get an MCE on multiple CPUs?  Will that work and
correctly match adds and removes properly?

> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    }
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -496,6 +551,14 @@ hwaddr spapr_get_rtas_addr(void)
>      return (hwaddr)fdt32_to_cpu(*rtas_data);
>  }
>  
> +void spapr_fwnmi_register(void)
> +{
> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> +                        rtas_ibm_nmi_register);
> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> +                        rtas_ibm_nmi_interlock);
> +}
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 0dedf0a..7ae53e2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -637,8 +637,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> @@ -894,4 +896,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>  hwaddr spapr_get_rtas_addr(void);
> +void spapr_fwnmi_register(void);
>  #endif /* HW_SPAPR_H */
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-06-26  5:13         ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
@ 2019-07-02  4:12           ` David Gibson
  0 siblings, 0 replies; 37+ messages in thread
From: David Gibson @ 2019-07-02  4:12 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 5236 bytes --]

On Wed, Jun 26, 2019 at 10:43:33AM +0530, Aravinda Prasad wrote:
> 
> 
> On Tuesday 25 June 2019 12:30 PM, Greg Kurz wrote:
> > On Tue, 25 Jun 2019 11:46:06 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> > 
> >> On Monday 24 June 2019 07:59 PM, Greg Kurz wrote:
> >>> On Wed, 12 Jun 2019 14:51:38 +0530
> >>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> >>>   
> >>>> This patch adds support in QEMU to handle "ibm,nmi-register"
> >>>> and "ibm,nmi-interlock" RTAS calls and sets the default
> >>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >>>> type 4.0.
> >>>>  
> >>>
> >>> Next machine type is 4.1.  
> >>
> >> ok.
> >>
> >>>   
> >>>> The machine check notification address is saved when the
> >>>> OS issues "ibm,nmi-register" RTAS call.
> >>>>
> >>>> This patch also handles the case when multiple processors
> >>>> experience machine check at or about the same time by
> >>>> handling "ibm,nmi-interlock" call. In such cases, as per
> >>>> PAPR, subsequent processors serialize waiting for the first
> >>>> processor to issue the "ibm,nmi-interlock" call. The second
> >>>> processor that also received a machine check error waits
> >>>> till the first processor is done reading the error log.
> >>>> The first processor issues "ibm,nmi-interlock" call
> >>>> when the error log is consumed.
> >>>>
> >>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>> ---
> >>>>  hw/ppc/spapr.c         |    6 ++++-
> >>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  include/hw/ppc/spapr.h |    5 +++-
> >>>>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>> index 3d6d139..213d493 100644
> >>>> --- a/hw/ppc/spapr.c
> >>>> +++ b/hw/ppc/spapr.c
> >>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
> >>>>          /* Create the error string for live migration blocker */
> >>>>          error_setg(&spapr->fwnmi_migration_blocker,
> >>>>                  "Live migration not supported during machine check handling");
> >>>> +
> >>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> >>>> +        spapr_fwnmi_register();  
> >>>
> >>> IIRC this was supposed to depend on SPAPR_CAP_FWNMI_MCE being ON.  
> >>
> >> Yes this is inside SPAPR_CAP_FWNMI_MCE check:
> >>
> >> if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
> >>     /*
> >>      * Ensure that the rtas image size is less than RTAS_ERROR_LOG_OFFSET
> >>      * or else the rtas image will be overwritten with the rtas error log
> >>      * when a machine check exception is encountered.
> >>      */
> >>     g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
> >>
> >>     /* Resize rtas blob to accommodate error log */
> >>     spapr->rtas_size = RTAS_ERROR_LOG_MAX;
> >>
> >>     /* Create the error string for live migration blocker */
> >>     error_setg(&spapr->fwnmi_migration_blocker,
> >>             "Live migration not supported during machine check handling");
> >>
> >>     /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> >>     spapr_fwnmi_register();
> >> }
> >>
> > 
> > Oops my bad... sorry for the noise.
> > 
> >>
> >>>   
> >>>>      }
> >>>>  
> >>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>      smc->irq = &spapr_irq_dual;
> >>>>      smc->dr_phb_enabled = true;
> >>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
> >>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
> >>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;  
> >>>
> >>> This should have been put into spapr_machine_4_0_class_options().  
> >>
> >> ok. I will change it.
> >>
> >>>
> >>> But unless you manage to get this merged before soft-freeze (2019-07-02),
> >>> I'm afraid this will be a 4.2 feature.  
> >>
> >> If there are no other comments, can this be merged to 4.1? I will send a
> >> revised version with the above changes.
> >>
> > 
> > This is David's call.
> 
> David, can you let me know if this can be merged to 4.1 with the above
> minor changes?

Nope, sorry.  I've been away, but also this still needs more polish.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-07-02  3:51   ` David Gibson
@ 2019-07-02  6:24     ` Aravinda Prasad
  2019-07-03  3:03       ` David Gibson
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-02  6:24 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Tuesday 02 July 2019 09:21 AM, David Gibson wrote:
> On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
>> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
>> the KVM causes guest exit with NMI as exit reason
>> when it encounters a machine check exception on the
>> address belonging to a guest. Without this capability
>> enabled, KVM redirects machine check exceptions to
>> guest's 0x200 vector.
>>
>> This patch also introduces fwnmi-mce capability to
>> deal with the case when a guest with the
>> KVM_CAP_PPC_FWNMI capability enabled is attempted
>> to migrate to a host that does not support this
>> capability.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |    1 +
>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |    4 +++-
>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>>  5 files changed, 61 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 6dd8aaa..2ef86aa 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>      spapr_caps_add_properties(smc, &error_abort);
>>      smc->irq = &spapr_irq_dual;
>>      smc->dr_phb_enabled = true;
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index 31b4661..2e92eb6 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>      }
>>  }
>>  
>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>> +                                Error **errp)
>> +{
>> +    if (!val) {
>> +        return; /* Disabled by default */
>> +    }
>> +
>> +    if (tcg_enabled()) {
>> +        error_setg(errp,
>> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
> 
> Not allowing this for TCG creates an awkward incompatibility between
> KVM and TCG guests.  I can't actually see any reason to ban it for TCG
> - with the current code TCG won't ever generate NMIs, but I don't see
> that anything will actually break.
> 
> In fact, we do have an nmi monitor command, currently wired to the
> spapr_nmi() function which resets each cpu, but it probably makes
> sense to wire it up to the fwnmi stuff when present.

Yes, but that nmi support is not enough to inject a synchronous error
into the guest kernel. For example, we should provide the faulty address
along with other information such as the type of error (slb multi-hit,
memory error, TLB multi-hit) and when the error occurred (load/store)
and whether the error was completely recovered or not. Without such
information we cannot build the error log and pass it on to the guest
kernel. Right now the nmi monitor command takes the cpu number as its only argument.

So I think TCG support should be a separate patch by itself.
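
Concretely, for each event we need to be able to fill at least the following
(constants and fields are the ones introduced in patch 4/6; faulting_addr is
just a placeholder here), and none of it can be derived from a bare cpu number:

ext_elog->mc.error_type        = RTAS_LOG_V6_MC_TYPE_SLB;      /* for example */
ext_elog->mc.sub_err_type      = RTAS_LOG_V6_MC_SLB_MULTIHIT;  /* for example */
ext_elog->mc.effective_address = cpu_to_be64(faulting_addr);
summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED; /* or NOT_RECOVERED */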

Regards,
Aravinda

> 
>> +    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
>> +        error_setg(errp,
>> +"Firmware Assisted Non-Maskable Interrupts not supported by KVM, try cap-fwnmi-mce=off");
>> +    }
>> +}
>> +
>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>      [SPAPR_CAP_HTM] = {
>>          .name = "htm",
>> @@ -578,6 +594,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>          .type = "bool",
>>          .apply = cap_ccf_assist_apply,
>>      },
>> +    [SPAPR_CAP_FWNMI_MCE] = {
>> +        .name = "fwnmi-mce",
>> +        .description = "Handle fwnmi machine check exceptions",
>> +        .index = SPAPR_CAP_FWNMI_MCE,
>> +        .get = spapr_cap_get_bool,
>> +        .set = spapr_cap_set_bool,
>> +        .type = "bool",
>> +        .apply = cap_fwnmi_mce_apply,
>> +    },
>>  };
>>  
>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
>> @@ -717,6 +742,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>>  
>>  void spapr_caps_init(SpaprMachineState *spapr)
>>  {
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 4f5becf..f891f8f 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -78,8 +78,10 @@ typedef enum {
>>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>>  /* Count Cache Flush Assist HW Instruction */
>>  #define SPAPR_CAP_CCF_ASSIST            0x09
>> +/* FWNMI machine check handling */
>> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>>  /* Num Caps */
>> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
>> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>>  
>>  /*
>>   * Capability Values
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 3bf0a46..afef4cd 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
>>  static int cap_ppc_count_cache_flush_assist;
>>  static int cap_ppc_nested_kvm_hv;
>>  static int cap_large_decr;
>> +static int cap_ppc_fwnmi;
>>  
>>  static uint32_t debug_inst_opcode;
>>  
>> @@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>      kvmppc_get_cpu_characteristics(s);
>>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>>      cap_large_decr = kvmppc_get_dec_bits();
>> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>>      /*
>>       * Note: setting it to false because there is not such capability
>>       * in KVM at this moment.
>> @@ -2114,6 +2116,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>      }
>>  }
>>  
>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>> +{
>> +    CPUState *cs = CPU(cpu);
>> +
>> +    if (!cap_ppc_fwnmi) {
>> +        return 1;
>> +    }
>> +
>> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
>> +}
>> +
>> +
>>  int kvmppc_smt_threads(void)
>>  {
>>      return cap_ppc_smt ? cap_ppc_smt : 1;
>> @@ -2414,6 +2428,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>>      return cap_mmu_hash_v3;
>>  }
>>  
>> +bool kvmppc_has_cap_ppc_fwnmi(void)
>> +{
>> +    return cap_ppc_fwnmi;
>> +}
>> +
>>  static bool kvmppc_power8_host(void)
>>  {
>>      bool ret = false;
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index 45776ca..880cee9 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
>> @@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
>>  void kvmppc_set_papr(PowerPCCPU *cpu);
>>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
>> +bool kvmppc_has_cap_ppc_fwnmi(void);
>>  int kvmppc_smt_threads(void);
>>  void kvmppc_hint_smt_possible(Error **errp);
>>  int kvmppc_set_smt_threads(int smt);
>> @@ -158,6 +160,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>  {
>>  }
>>  
>> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>> +{
>> +    return 1;
>> +}
>> +
>> +static inline bool kvmppc_has_cap_ppc_fwnmi(void)
>> +{
>> +    return false;
>> +}
>> +
>>  static inline int kvmppc_smt_threads(void)
>>  {
>>      return 1;
>>
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 4/6] target/ppc: Build rtas error log upon an MCE
  2019-07-02  4:03   ` David Gibson
@ 2019-07-02  9:49     ` Aravinda Prasad
  2019-07-03  3:07       ` David Gibson
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-02  9:49 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Tuesday 02 July 2019 09:33 AM, David Gibson wrote:
> On Wed, Jun 12, 2019 at 02:51:21PM +0530, Aravinda Prasad wrote:
>> Upon a machine check exception (MCE) in a guest address space,
>> KVM causes a guest exit to enable QEMU to build and pass the
>> error to the guest in the PAPR defined rtas error log format.
>>
>> This patch builds the rtas error log, copies it to the rtas_addr
>> and then invokes the guest registered machine check handler. The
>> handler in the guest takes suitable action(s) depending on the type
>> and criticality of the error. For example, if an error is
>> unrecoverable memory corruption in an application inside the
>> guest, then the guest kernel sends a SIGBUS to the application.
>> For recoverable errors, the guest performs recovery actions and
>> logs the error.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |   13 +++
>>  hw/ppc/spapr_events.c  |  238 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  hw/ppc/spapr_rtas.c    |   26 +++++
>>  include/hw/ppc/spapr.h |    6 +
>>  target/ppc/kvm.c       |    4 +
>>  5 files changed, 284 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 6cc2c3b..d61905b 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2908,6 +2908,19 @@ static void spapr_machine_init(MachineState *machine)
>>          error_report("Could not get size of LPAR rtas '%s'", filename);
>>          exit(1);
>>      }
>> +
>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
>> +        /*
>> +         * Ensure that the rtas image size is less than RTAS_ERROR_LOG_OFFSET
>> +         * or else the rtas image will be overwritten with the rtas error log
>> +         * when a machine check exception is encountered.
>> +         */
>> +        g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
>> +
>> +        /* Resize rtas blob to accommodate error log */
>> +        spapr->rtas_size = RTAS_ERROR_LOG_MAX;
>> +    }
>> +
>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>>          error_report("Could not load LPAR rtas '%s'", filename);
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index a0c66d7..51c052e 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -212,6 +212,106 @@ struct hp_extended_log {
>>      struct rtas_event_log_v6_hp hp;
>>  } QEMU_PACKED;
>>  
>> +struct rtas_event_log_v6_mc {
>> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
>> +    struct rtas_event_log_v6_section_header hdr;
>> +    uint32_t fru_id;
>> +    uint32_t proc_id;
>> +    uint8_t error_type;
>> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
>> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
>> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
>> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
>> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
>> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
>> +    uint8_t sub_err_type;
>> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
>> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
>> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
>> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
>> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
>> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
>> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
>> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
>> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
>> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
>> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
>> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
>> +    uint8_t reserved_1[6];
>> +    uint64_t effective_address;
>> +    uint64_t logical_address;
>> +} QEMU_PACKED;
>> +
>> +struct mc_extended_log {
>> +    struct rtas_event_log_v6 v6hdr;
>> +    struct rtas_event_log_v6_mc mc;
>> +} QEMU_PACKED;
>> +
>> +struct MC_ierror_table {
>> +    unsigned long srr1_mask;
>> +    unsigned long srr1_value;
>> +    bool nip_valid; /* nip is a valid indicator of faulting address */
>> +    uint8_t error_type;
>> +    uint8_t error_subtype;
>> +    unsigned int initiator;
>> +    unsigned int severity;
>> +};
>> +
>> +static const struct MC_ierror_table mc_ierror_table[] = {
>> +{ 0x00000000081c0000, 0x0000000000040000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000080000, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x00000000000c0000, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000100000, true,
>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000140000, true,
>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000180000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0, 0, 0, 0, 0, 0 } };
>> +
>> +struct MC_derror_table {
>> +    unsigned long dsisr_value;
>> +    bool dar_valid; /* dar is a valid indicator of faulting address */
>> +    uint8_t error_type;
>> +    uint8_t error_subtype;
>> +    unsigned int initiator;
>> +    unsigned int severity;
>> +};
>> +
>> +static const struct MC_derror_table mc_derror_table[] = {
>> +{ 0x00008000, false,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00004000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000800, true,
>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000400, true,
>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000080, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000100, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0, false, 0, 0, 0, 0 } };
>> +
>> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
>> +
>>  typedef enum EventClass {
>>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>>      EVENT_CLASS_EPOW                = 1,
>> @@ -620,7 +720,141 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>  }
>>  
>> -void spapr_mce_req_event(PowerPCCPU *cpu)
>> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>> +                                        struct mc_extended_log *ext_elog)
>> +{
>> +    int i;
>> +    CPUPPCState *env = &cpu->env;
>> +    uint32_t summary;
>> +    uint64_t dsisr = env->spr[SPR_DSISR];
>> +
>> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
>> +    if (recovered) {
>> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
>> +    } else {
>> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
>> +    }
>> +
>> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
>> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
>> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
>> +                continue;
>> +            }
>> +
>> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
>> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
>> +            if (mc_derror_table[i].dar_valid) {
>> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
>> +            }
>> +
>> +            summary |= mc_derror_table[i].initiator
>> +                        | mc_derror_table[i].severity;
>> +
>> +            return summary;
>> +        }
>> +    } else {
>> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
>> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
>> +                    mc_ierror_table[i].srr1_value) {
>> +                continue;
>> +            }
>> +
>> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
>> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
>> +            if (mc_ierror_table[i].nip_valid) {
>> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
>> +            }
>> +
>> +            summary |= mc_ierror_table[i].initiator
>> +                        | mc_ierror_table[i].severity;
>> +
>> +            return summary;
>> +        }
>> +    }
>> +
>> +    summary |= RTAS_LOG_INITIATOR_CPU;
>> +    return summary;
>> +}
>> +
>> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    CPUState *cs = CPU(cpu);
>> +    uint64_t rtas_addr;
>> +    CPUPPCState *env = &cpu->env;
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> +    target_ulong r3, msr = 0;
>> +    struct rtas_error_log log;
>> +    struct mc_extended_log *ext_elog;
>> +    uint32_t summary;
>> +
>> +    /*
>> +     * Properly set bits in MSR before we invoke the handler.
>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>> +     */
>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>> +        msr |= (1ULL << MSR_LE);
>> +    }
>> +
>> +    if (env->msr & (1ULL << MSR_SF)) {
>> +        msr |= (1ULL << MSR_SF);
>> +    }
>> +
>> +    msr |= (1ULL << MSR_ME);
>> +
>> +    if (spapr->guest_machine_check_addr == -1) {
>> +        /*
>> +         * This implies that we have hit a machine check between system
>> +         * reset and "ibm,nmi-register". Fall back to the old machine
>> +         * check behavior in such cases.
>> +         */
>> +        env->spr[SPR_SRR0] = env->nip;
>> +        env->spr[SPR_SRR1] = env->msr;
>> +        env->msr = msr;
>> +        env->nip = 0x200;
>> +        return;
> 
> Hm, does this differ from what ppc_cpu_do_system_reset() will do?

Not much, except that we branch to 0x200 instead of 0x100. I think calling

powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_MCHECK)

should also work. Let me know if you prefer using powerpc_excp().
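
If we go that way, a small helper mirroring ppc_cpu_do_system_reset() would
keep powerpc_excp() private to excp_helper.c.  Something like this sketch
(the helper name is hypothetical):

/* target/ppc/excp_helper.c */
void ppc_cpu_do_machine_check(CPUState *cs)
{
    PowerPCCPU *cpu = POWERPC_CPU(cs);
    CPUPPCState *env = &cpu->env;

    powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_MCHECK);
}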

> 
>> +    }
>> +
>> +    ext_elog = g_malloc0(sizeof(*ext_elog));
>> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
>> +
>> +    log.summary = cpu_to_be32(summary);
>> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
>> +
>> +    /* r3 should be in BE always */
>> +    r3 = cpu_to_be64(env->gpr[3]);
> 
> I don't love bare integer variables being stored in a particular
> "endianness" because logically they're endianness-free values.  They
> only get an endianness when they are placed into byte-ordered memory.

ok.

> 
>> +    env->msr = msr;
>> +
>> +    spapr_init_v6hdr(&ext_elog->v6hdr);
>> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
>> +    ext_elog->mc.hdr.section_length =
>> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
>> +    ext_elog->mc.hdr.section_version = 1;
>> +
>> +    /* get rtas addr from fdt */
>> +    rtas_addr = spapr_get_rtas_addr();
>> +    if (!rtas_addr) {
>> +        /* Unable to fetch rtas_addr. Hence reset the guest */
>> +        ppc_cpu_do_system_reset(cs);
>> +        g_free(ext_elog);
>> +        return;
>> +    }
>> +
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
>> +                              sizeof(r3));
> 
> So, I'd prefer to see the endianness conversion done as part of the
> memory store.  You can do that using the stl_be_phys() helper.

ok.
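
Something like the following, I assume (using stq_be_phys() rather than
stl_be_phys() since r3 is a 64-bit value here, and assuming
address_space_memory) -- just a sketch:

/* Do the byte swap as part of the store instead of pre-swapping r3 */
stq_be_phys(&address_space_memory, rtas_addr + RTAS_ERROR_LOG_OFFSET,
            env->gpr[3]);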

Regards,
Aravinda

> 
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
>> +                              &log, sizeof(log));
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
>> +                              sizeof(log), ext_elog,
>> +                              sizeof(*ext_elog));
>> +
>> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
>> +    env->nip = spapr->guest_machine_check_addr;
>> +
>> +    g_free(ext_elog);
>> +}
>> +
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>  {
>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>  
>> @@ -641,6 +875,8 @@ void spapr_mce_req_event(PowerPCCPU *cpu)
>>          }
>>      }
>>      spapr->mc_status = cpu->vcpu_id;
>> +
>> +    spapr_mce_dispatch_elog(cpu, recovered);
>>  }
>>  
>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 5bc1a93..a015a80 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -470,6 +470,32 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>>      }
>>  }
>>  
>> +hwaddr spapr_get_rtas_addr(void)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int rtas_node;
>> +    const fdt32_t *rtas_data;
>> +    void *fdt = spapr->fdt_blob;
>> +
>> +    /* fetch rtas addr from fdt */
>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>> +    if (rtas_node < 0) {
>> +        return 0;
>> +    }
>> +
>> +    rtas_data = fdt_getprop(fdt, rtas_node, "linux,rtas-base", NULL);
>> +    if (!rtas_data) {
>> +        return 0;
>> +    }
>> +
>> +    /*
>> +     * We assume that the OS called RTAS instantiate-rtas, but some other
>> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
>> +     * as SLOF only supports 32-bit variant.
>> +     */
>> +    return (hwaddr)fdt32_to_cpu(*rtas_data);
>> +}
>> +
>>  static void core_rtas_register_types(void)
>>  {
>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index f34c79f..debb57b 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
>>  
>>  #define RTAS_ERROR_LOG_MAX      2048
>>  
>> +/* Offset from rtas-base where error log is placed */
>> +#define RTAS_ERROR_LOG_OFFSET       0x30
>> +
>>  #define RTAS_EVENT_SCAN_RATE    1
>>  
>>  /* This helper should be used to encode interrupt specifiers when the related
>> @@ -798,7 +801,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>>  int spapr_max_server_number(SpaprMachineState *spapr);
>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                        uint64_t pte0, uint64_t pte1);
>> -void spapr_mce_req_event(PowerPCCPU *cpu);
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>  
>>  /* DRC callbacks. */
>>  void spapr_core_release(DeviceState *dev);
>> @@ -888,4 +891,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>  
>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>> +hwaddr spapr_get_rtas_addr(void);
>>  #endif /* HW_SPAPR_H */
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 99f33fe..368ec6e 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -2870,9 +2870,11 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>  
>>  int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>>  {
>> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
>> +
>>      cpu_synchronize_state(CPU(cpu));
>>  
>> -    spapr_mce_req_event(cpu);
>> +    spapr_mce_req_event(cpu, recovered);
>>  
>>      return 0;
>>  }
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-07-02  4:11   ` [Qemu-devel] " David Gibson
@ 2019-07-02 10:40     ` Aravinda Prasad
  2019-07-03  3:20       ` David Gibson
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-02 10:40 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
>> This patch adds support in QEMU to handle "ibm,nmi-register"
>> and "ibm,nmi-interlock" RTAS calls and sets the default
>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
>> type 4.0.
>>
>> The machine check notification address is saved when the
>> OS issues "ibm,nmi-register" RTAS call.
>>
>> This patch also handles the case when multiple processors
>> experience machine check at or about the same time by
>> handling "ibm,nmi-interlock" call. In such cases, as per
>> PAPR, subsequent processors serialize waiting for the first
>> processor to issue the "ibm,nmi-interlock" call. The second
>> processor that also received a machine check error waits
>> till the first processor is done reading the error log.
>> The first processor issues "ibm,nmi-interlock" call
>> when the error log is consumed.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |    6 ++++-
>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |    5 +++-
>>  3 files changed, 72 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 3d6d139..213d493 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
>>          /* Create the error string for live migration blocker */
>>          error_setg(&spapr->fwnmi_migration_blocker,
>>                  "Live migration not supported during machine check handling");
>> +
>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>> +        spapr_fwnmi_register();
>>      }
>>  
>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> 
> Turning this on by default really isn't ok if it stops you running TCG
> guests at all.

If so, this can be "off" by default until TCG is supported.

> 
>>      spapr_caps_add_properties(smc, &error_abort);
>>      smc->irq = &spapr_irq_dual;
>>      smc->dr_phb_enabled = true;
>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> 
> We're now well past 4.0, and in fact we're about to go into soft
> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
> will need to retain the old default.

ok.

> 
>>  }
>>  
>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index a015a80..e010cb2 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -49,6 +49,7 @@
>>  #include "hw/ppc/fdt.h"
>>  #include "target/ppc/mmu-hash64.h"
>>  #include "target/ppc/mmu-book3s-v3.h"
>> +#include "migration/blocker.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>      rtas_st(rets, 1, 100);
>>  }
>>  
>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>> +                                  SpaprMachineState *spapr,
>> +                                  uint32_t token, uint32_t nargs,
>> +                                  target_ulong args,
>> +                                  uint32_t nret, target_ulong rets)
>> +{
>> +    int ret;
>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>> +
>> +    if (!rtas_addr) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    ret = kvmppc_fwnmi_enable(cpu);
>> +    if (ret == 1) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> 
> I don't understand this case separate from the others.  We've already
> set the cap, so fwnmi support should be checked and available.

But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
cap_ppc_fwnmi is not available in KVM.
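
That is, the return values map to RTAS statuses as follows (same logic as in
the patch, just rearranged here for clarity):

switch (kvmppc_fwnmi_enable(cpu)) {
case 0:  /* KVM_CAP_PPC_FWNMI enabled on this vCPU */
    spapr->guest_machine_check_addr = rtas_ld(args, 1);
    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
    break;
case 1:  /* capability not present in the host kernel */
    rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
    break;
default: /* enabling the capability failed */
    rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
}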

> 
>> +        return;
>> +    } else if (ret < 0) {
>> +        error_report("Couldn't enable KVM FWNMI capability");
>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> +        return;
>> +    }
>> +
>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
>> +    if (spapr->guest_machine_check_addr == -1) {
>> +        /* NMI register not called */
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +    } else {
>> +        /*
>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>> +         * hence unset mc_status.
>> +         */
>> +        spapr->mc_status = -1;
>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> 
> Hrm.  We add the blocker at the mce request point.  First, that's in
> another patch, which isn't great.  Second, does that mean we could add
> multiple times if we get an MCE on multiple CPUs?  Will that work and
> correctly match adds and removes properly?

If it is fine to move the migration patch to the end of the series, then
the add and the delete of the blocker will be in the same patch.

And yes, we could add the blocker multiple times if multiple CPUs hit an
MCE; since each of those CPUs eventually issues "ibm,nmi-interlock", there
will be a matching number of delete calls.
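
For reference, the balanced pattern being discussed is roughly the following
(a sketch; the actual call sites are spread across the other patches of the
series, and error handling is simplified):

/* MCE request path, executed once per faulting vCPU */
Error *local_err = NULL;
if (migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err) < 0) {
    /* If adding the blocker fails, migration simply remains possible */
    error_report_err(local_err);
}

/* "ibm,nmi-interlock", once per faulting vCPU, removes one instance */
migrate_del_blocker(spapr->fwnmi_migration_blocker);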

Regards,
Aravinda

> 
>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +    }
>> +}
>> +
>>  static struct rtas_call {
>>      const char *name;
>>      spapr_rtas_fn fn;
>> @@ -496,6 +551,14 @@ hwaddr spapr_get_rtas_addr(void)
>>      return (hwaddr)fdt32_to_cpu(*rtas_data);
>>  }
>>  
>> +void spapr_fwnmi_register(void)
>> +{
>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>> +                        rtas_ibm_nmi_register);
>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>> +                        rtas_ibm_nmi_interlock);
>> +}
>> +
>>  static void core_rtas_register_types(void)
>>  {
>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 0dedf0a..7ae53e2 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -637,8 +637,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>  
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>  
>>  /* RTAS ibm,get-system-parameter token values */
>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>> @@ -894,4 +896,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>  
>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>  hwaddr spapr_get_rtas_addr(void);
>> +void spapr_fwnmi_register(void);
>>  #endif /* HW_SPAPR_H */
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-07-02  6:24     ` Aravinda Prasad
@ 2019-07-03  3:03       ` David Gibson
  2019-07-03  9:28         ` Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: David Gibson @ 2019-07-03  3:03 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 4224 bytes --]

On Tue, Jul 02, 2019 at 11:54:26AM +0530, Aravinda Prasad wrote:
> 
> 
> On Tuesday 02 July 2019 09:21 AM, David Gibson wrote:
> > On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
> >> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
> >> the KVM causes guest exit with NMI as exit reason
> >> when it encounters a machine check exception on the
> >> address belonging to a guest. Without this capability
> >> enabled, KVM redirects machine check exceptions to
> >> guest's 0x200 vector.
> >>
> >> This patch also introduces fwnmi-mce capability to
> >> deal with the case when a guest with the
> >> KVM_CAP_PPC_FWNMI capability enabled is attempted
> >> to migrate to a host that does not support this
> >> capability.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |    1 +
> >>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr.h |    4 +++-
> >>  target/ppc/kvm.c       |   19 +++++++++++++++++++
> >>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
> >>  5 files changed, 61 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 6dd8aaa..2ef86aa 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>      spapr_caps_add_properties(smc, &error_abort);
> >>      smc->irq = &spapr_irq_dual;
> >>      smc->dr_phb_enabled = true;
> >> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >> index 31b4661..2e92eb6 100644
> >> --- a/hw/ppc/spapr_caps.c
> >> +++ b/hw/ppc/spapr_caps.c
> >> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
> >>      }
> >>  }
> >>  
> >> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> >> +                                Error **errp)
> >> +{
> >> +    if (!val) {
> >> +        return; /* Disabled by default */
> >> +    }
> >> +
> >> +    if (tcg_enabled()) {
> >> +        error_setg(errp,
> >> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
> > 
> > Not allowing this for TCG creates an awkward incompatibility between
> > KVM and TCG guests.  I can't actually see any reason to ban it for TCG
> > - with the current code TCG won't ever generate NMIs, but I don't see
> > that anything will actually break.
> > 
> > In fact, we do have an nmi monitor command, currently wired to the
> > spapr_nmi() function which resets each cpu, but it probably makes
> > sense to wire it up to the fwnmi stuff when present.
> 
> Yes, but that nmi support is not enough to inject a synchronous error
> into the guest kernel. For example, we should provide the faulty address
> along with other information such as the type of error (slb multi-hit,
> memory error, TLB multi-hit) and when the error occurred (load/store)
> and whether the error was completely recovered or not. Without such
> information we cannot build the error log and pass it on to the guest
> kernel. Right now the nmi monitor command takes the cpu number as its only argument.

Obviously we can't inject an arbitrary MCE event with that monitor
command.  But isn't there some sort of catch-all / unknown type of MCE
event which we could inject?

It seems very confusing to me to have 2 totally separate "nmi"
mechanisms.
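
(For what it's worth, the log format from patch 4/6 already has an
"indeterminate" encoding that such a catch-all injection could use -- a
sketch, not a proposal for this series:)

/* Operator-injected NMI: no faulting address or precise cause is known */
ext_elog->mc.error_type   = RTAS_LOG_V6_MC_TYPE_UE;
ext_elog->mc.sub_err_type = RTAS_LOG_V6_MC_UE_INDETERMINATE;
/* effective_address left as zero; mark the event as not recovered */
summary |= RTAS_LOG_INITIATOR_CPU | RTAS_LOG_SEVERITY_ERROR_SYNC;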

> So I think TCG support should be a separate patch by itself.

Even if we don't wire up the monitor command, I still don't see
anything that this patch breaks - we can support the nmi-register and
nmi-interlock calls without ever actually creating MCE events.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 4/6] target/ppc: Build rtas error log upon an MCE
  2019-07-02  9:49     ` Aravinda Prasad
@ 2019-07-03  3:07       ` David Gibson
  2019-07-03  7:16         ` Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: David Gibson @ 2019-07-03  3:07 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 12200 bytes --]

On Tue, Jul 02, 2019 at 03:19:24PM +0530, Aravinda Prasad wrote:
> 
> 
> On Tuesday 02 July 2019 09:33 AM, David Gibson wrote:
> > On Wed, Jun 12, 2019 at 02:51:21PM +0530, Aravinda Prasad wrote:
> >> Upon a machine check exception (MCE) in a guest address space,
> >> KVM causes a guest exit to enable QEMU to build and pass the
> >> error to the guest in the PAPR defined rtas error log format.
> >>
> >> This patch builds the rtas error log, copies it to the rtas_addr
> >> and then invokes the guest registered machine check handler. The
> >> handler in the guest takes suitable action(s) depending on the type
> >> and criticality of the error. For example, if an error is
> >> unrecoverable memory corruption in an application inside the
> >> guest, then the guest kernel sends a SIGBUS to the application.
> >> For recoverable errors, the guest performs recovery actions and
> >> logs the error.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |   13 +++
> >>  hw/ppc/spapr_events.c  |  238 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  hw/ppc/spapr_rtas.c    |   26 +++++
> >>  include/hw/ppc/spapr.h |    6 +
> >>  target/ppc/kvm.c       |    4 +
> >>  5 files changed, 284 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 6cc2c3b..d61905b 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2908,6 +2908,19 @@ static void spapr_machine_init(MachineState *machine)
> >>          error_report("Could not get size of LPAR rtas '%s'", filename);
> >>          exit(1);
> >>      }
> >> +
> >> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
> >> +        /*
> >> +         * Ensure that the rtas image size is less than RTAS_ERROR_LOG_OFFSET
> >> +         * or else the rtas image will be overwritten with the rtas error log
> >> +         * when a machine check exception is encountered.
> >> +         */
> >> +        g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
> >> +
> >> +        /* Resize rtas blob to accommodate error log */
> >> +        spapr->rtas_size = RTAS_ERROR_LOG_MAX;
> >> +    }
> >> +
> >>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
> >>          error_report("Could not load LPAR rtas '%s'", filename);
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index a0c66d7..51c052e 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -212,6 +212,106 @@ struct hp_extended_log {
> >>      struct rtas_event_log_v6_hp hp;
> >>  } QEMU_PACKED;
> >>  
> >> +struct rtas_event_log_v6_mc {
> >> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> >> +    struct rtas_event_log_v6_section_header hdr;
> >> +    uint32_t fru_id;
> >> +    uint32_t proc_id;
> >> +    uint8_t error_type;
> >> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> >> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> >> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> >> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> >> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> >> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> >> +    uint8_t sub_err_type;
> >> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> >> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> >> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> >> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> >> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> >> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> >> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> >> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> >> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> >> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> >> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> >> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> >> +    uint8_t reserved_1[6];
> >> +    uint64_t effective_address;
> >> +    uint64_t logical_address;
> >> +} QEMU_PACKED;
> >> +
> >> +struct mc_extended_log {
> >> +    struct rtas_event_log_v6 v6hdr;
> >> +    struct rtas_event_log_v6_mc mc;
> >> +} QEMU_PACKED;
> >> +
> >> +struct MC_ierror_table {
> >> +    unsigned long srr1_mask;
> >> +    unsigned long srr1_value;
> >> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> >> +    uint8_t error_type;
> >> +    uint8_t error_subtype;
> >> +    unsigned int initiator;
> >> +    unsigned int severity;
> >> +};
> >> +
> >> +static const struct MC_ierror_table mc_ierror_table[] = {
> >> +{ 0x00000000081c0000, 0x0000000000040000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000080000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000100000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000140000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000180000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0, 0, 0, 0, 0, 0 } };
> >> +
> >> +struct MC_derror_table {
> >> +    unsigned long dsisr_value;
> >> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> >> +    uint8_t error_type;
> >> +    uint8_t error_subtype;
> >> +    unsigned int initiator;
> >> +    unsigned int severity;
> >> +};
> >> +
> >> +static const struct MC_derror_table mc_derror_table[] = {
> >> +{ 0x00008000, false,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00004000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000800, true,
> >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000400, true,
> >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000080, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000100, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0, false, 0, 0, 0, 0 } };
> >> +
> >> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> >> +
> >>  typedef enum EventClass {
> >>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
> >>      EVENT_CLASS_EPOW                = 1,
> >> @@ -620,7 +720,141 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> >>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> >>  }
> >>  
> >> -void spapr_mce_req_event(PowerPCCPU *cpu)
> >> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> >> +                                        struct mc_extended_log *ext_elog)
> >> +{
> >> +    int i;
> >> +    CPUPPCState *env = &cpu->env;
> >> +    uint32_t summary;
> >> +    uint64_t dsisr = env->spr[SPR_DSISR];
> >> +
> >> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> >> +    if (recovered) {
> >> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> >> +    } else {
> >> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> >> +    }
> >> +
> >> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> >> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> >> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> >> +                continue;
> >> +            }
> >> +
> >> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> >> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> >> +            if (mc_derror_table[i].dar_valid) {
> >> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> >> +            }
> >> +
> >> +            summary |= mc_derror_table[i].initiator
> >> +                        | mc_derror_table[i].severity;
> >> +
> >> +            return summary;
> >> +        }
> >> +    } else {
> >> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> >> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> >> +                    mc_ierror_table[i].srr1_value) {
> >> +                continue;
> >> +            }
> >> +
> >> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> >> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> >> +            if (mc_ierror_table[i].nip_valid) {
> >> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> >> +            }
> >> +
> >> +            summary |= mc_ierror_table[i].initiator
> >> +                        | mc_ierror_table[i].severity;
> >> +
> >> +            return summary;
> >> +        }
> >> +    }
> >> +
> >> +    summary |= RTAS_LOG_INITIATOR_CPU;
> >> +    return summary;
> >> +}
> >> +
> >> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >> +{
> >> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +    CPUState *cs = CPU(cpu);
> >> +    uint64_t rtas_addr;
> >> +    CPUPPCState *env = &cpu->env;
> >> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> >> +    target_ulong r3, msr = 0;
> >> +    struct rtas_error_log log;
> >> +    struct mc_extended_log *ext_elog;
> >> +    uint32_t summary;
> >> +
> >> +    /*
> >> +     * Properly set bits in MSR before we invoke the handler.
> >> +     * SRR0/1, DAR and DSISR are properly set by KVM
> >> +     */
> >> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> >> +        msr |= (1ULL << MSR_LE);
> >> +    }
> >> +
> >> +    if (env->msr & (1ULL << MSR_SF)) {
> >> +        msr |= (1ULL << MSR_SF);
> >> +    }
> >> +
> >> +    msr |= (1ULL << MSR_ME);
> >> +
> >> +    if (spapr->guest_machine_check_addr == -1) {
> >> +        /*
> >> +         * This implies that we have hit a machine check between system
> >> +         * reset and "ibm,nmi-register". Fall back to the old machine
> >> +         * check behavior in such cases.
> >> +         */
> >> +        env->spr[SPR_SRR0] = env->nip;
> >> +        env->spr[SPR_SRR1] = env->msr;
> >> +        env->msr = msr;
> >> +        env->nip = 0x200;
> >> +        return;
> > 
> > Hm, does this differ from what ppc_cpu_do_system_reset() will do?
> 
> Not much, but we branch to 0x200 instead of 0x100. But I think calling
> 
> powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_MCHECK)
> 
> should also work. Let me know if you prefer using powerpc_excp().

Yes, I'd prefer to share code with existing exception paths where
possible.
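
For reference, one possible shape for that shared fallback, assuming the generic exception entry point (cs->exception_index plus ppc_cpu_do_interrupt()) can replace the hand-rolled SRR0/SRR1/nip updates in spapr_mce_dispatch_elog() -- a sketch only, not the posted patch:

    if (spapr->guest_machine_check_addr == -1) {
        /*
         * Machine check before "ibm,nmi-register": deliver it like an
         * architected 0x200 interrupt and let the common exception code
         * set SRR0/SRR1, MSR and NIP for us.
         */
        cs->exception_index = POWERPC_EXCP_MCHECK;
        ppc_cpu_do_interrupt(cs);
        return;
    }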

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-07-02 10:40     ` Aravinda Prasad
@ 2019-07-03  3:20       ` David Gibson
  2019-07-03  9:00         ` Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: David Gibson @ 2019-07-03  3:20 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 7592 bytes --]

On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
> 
> 
> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
> > On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
> >> This patch adds support in QEMU to handle "ibm,nmi-register"
> >> and "ibm,nmi-interlock" RTAS calls and sets the default
> >> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >> type 4.0.
> >>
> >> The machine check notification address is saved when the
> >> OS issues "ibm,nmi-register" RTAS call.
> >>
> >> This patch also handles the case when multiple processors
> >> experience machine check at or about the same time by
> >> handling "ibm,nmi-interlock" call. In such cases, as per
> >> PAPR, subsequent processors serialize waiting for the first
> >> processor to issue the "ibm,nmi-interlock" call. The second
> >> processor that also received a machine check error waits
> >> till the first processor is done reading the error log.
> >> The first processor issues "ibm,nmi-interlock" call
> >> when the error log is consumed.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |    6 ++++-
> >>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr.h |    5 +++-
> >>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 3d6d139..213d493 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
> >>          /* Create the error string for live migration blocker */
> >>          error_setg(&spapr->fwnmi_migration_blocker,
> >>                  "Live migration not supported during machine check handling");
> >> +
> >> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> >> +        spapr_fwnmi_register();
> >>      }
> >>  
> >>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> > 
> > Turning this on by default really isn't ok if it stops you running TCG
> > guests at all.
> 
> If so this can be "off" by default until TCG is supported.
> 
> > 
> >>      spapr_caps_add_properties(smc, &error_abort);
> >>      smc->irq = &spapr_irq_dual;
> >>      smc->dr_phb_enabled = true;
> >> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
> >>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
> >> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> > 
> > We're now well past 4.0, and in fact we're about to go into soft
> > freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
> > will need to retain the old default.
> 
> ok.
> 
> > 
> >>  }
> >>  
> >>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index a015a80..e010cb2 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -49,6 +49,7 @@
> >>  #include "hw/ppc/fdt.h"
> >>  #include "target/ppc/mmu-hash64.h"
> >>  #include "target/ppc/mmu-book3s-v3.h"
> >> +#include "migration/blocker.h"
> >>  
> >>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>                                     uint32_t token, uint32_t nargs,
> >> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>      rtas_st(rets, 1, 100);
> >>  }
> >>  
> >> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >> +                                  SpaprMachineState *spapr,
> >> +                                  uint32_t token, uint32_t nargs,
> >> +                                  target_ulong args,
> >> +                                  uint32_t nret, target_ulong rets)
> >> +{
> >> +    int ret;
> >> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> >> +
> >> +    if (!rtas_addr) {
> >> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >> +        return;
> >> +    }
> >> +
> >> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> >> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >> +        return;
> >> +    }
> >> +
> >> +    ret = kvmppc_fwnmi_enable(cpu);
> >> +    if (ret == 1) {
> >> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> > 
> > I don't understand this case separate from the others.  We've already
> > set the cap, so fwnmi support should be checked and available.
> 
> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
> cap_ppc_fwnmi is not available in KVM.

But you've checked for the presence of the extension, yes?  So a
failure to enable the cap would be unexpected.  In which case how does
this case differ from.. 

> 
> > 
> >> +        return;
> >> +    } else if (ret < 0) {
> >> +        error_report("Couldn't enable KVM FWNMI capability");
> >> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >> +        return;

..this case.

> >> +    }
> >> +
> >> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >> +}
> >> +
> >> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >> +                                   SpaprMachineState *spapr,
> >> +                                   uint32_t token, uint32_t nargs,
> >> +                                   target_ulong args,
> >> +                                   uint32_t nret, target_ulong rets)
> >> +{
> >> +    if (spapr->guest_machine_check_addr == -1) {
> >> +        /* NMI register not called */
> >> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >> +    } else {
> >> +        /*
> >> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> >> +         * hence unset mc_status.
> >> +         */
> >> +        spapr->mc_status = -1;
> >> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> >> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> > 
> > Hrm.  We add the blocker at the mce request point.  First, that's in
> > another patch, which isn't great.  Second, does that mean we could add
> > multiple times if we get an MCE on multiple CPUs?  Will that work and
> > correctly match adds and removes properly?
> 
> If it is fine to move the migration patch as the last patch in the
> sequence, then we will have add and del blocker in the same patch.
> 
> And yes we could add multiple times if we get MCE on multiple CPUs and
> as all those cpus call interlock there should be matching number of
> delete blockers.

Ok, and I think adding the same pointer to the list multiple times
will work ok.

Btw, add_blocker() can fail - have you handled failure conditions?
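
For illustration, a minimal sketch of consuming a migrate_add_blocker() failure at the MCE request point, assuming the blocker is added there before dispatching the error log; warn-and-continue is only one possible policy:

    Error *local_err = NULL;

    if (migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err) < 0) {
        /*
         * Adding the blocker can fail (e.g. with --only-migratable);
         * report it and keep delivering the machine check rather than
         * aborting the guest.
         */
        warn_report_err(local_err);
    }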

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 4/6] target/ppc: Build rtas error log upon an MCE
  2019-07-03  3:07       ` David Gibson
@ 2019-07-03  7:16         ` Aravinda Prasad
  0 siblings, 0 replies; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-03  7:16 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Wednesday 03 July 2019 08:37 AM, David Gibson wrote:
> On Tue, Jul 02, 2019 at 03:19:24PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Tuesday 02 July 2019 09:33 AM, David Gibson wrote:
>>> On Wed, Jun 12, 2019 at 02:51:21PM +0530, Aravinda Prasad wrote:
>>>> Upon a machine check exception (MCE) in a guest address space,
>>>> KVM causes a guest exit to enable QEMU to build and pass the
>>>> error to the guest in the PAPR defined rtas error log format.
>>>>
>>>> This patch builds the rtas error log, copies it to the rtas_addr
>>>> and then invokes the guest registered machine check handler. The
>>>> handler in the guest takes suitable action(s) depending on the type
>>>> and criticality of the error. For example, if an error is
>>>> unrecoverable memory corruption in an application inside the
>>>> guest, then the guest kernel sends a SIGBUS to the application.
>>>> For recoverable errors, the guest performs recovery actions and
>>>> logs the error.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/ppc/spapr.c         |   13 +++
>>>>  hw/ppc/spapr_events.c  |  238 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  hw/ppc/spapr_rtas.c    |   26 +++++
>>>>  include/hw/ppc/spapr.h |    6 +
>>>>  target/ppc/kvm.c       |    4 +
>>>>  5 files changed, 284 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 6cc2c3b..d61905b 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -2908,6 +2908,19 @@ static void spapr_machine_init(MachineState *machine)
>>>>          error_report("Could not get size of LPAR rtas '%s'", filename);
>>>>          exit(1);
>>>>      }
>>>> +
>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
>>>> +        /*
>>>> +         * Ensure that the rtas image size is less than RTAS_ERROR_LOG_OFFSET
>>>> +         * or else the rtas image will be overwritten with the rtas error log
>>>> +         * when a machine check exception is encountered.
>>>> +         */
>>>> +        g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
>>>> +
>>>> +        /* Resize rtas blob to accommodate error log */
>>>> +        spapr->rtas_size = RTAS_ERROR_LOG_MAX;
>>>> +    }
>>>> +
>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>>>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>>>>          error_report("Could not load LPAR rtas '%s'", filename);
>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>> index a0c66d7..51c052e 100644
>>>> --- a/hw/ppc/spapr_events.c
>>>> +++ b/hw/ppc/spapr_events.c
>>>> @@ -212,6 +212,106 @@ struct hp_extended_log {
>>>>      struct rtas_event_log_v6_hp hp;
>>>>  } QEMU_PACKED;
>>>>  
>>>> +struct rtas_event_log_v6_mc {
>>>> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
>>>> +    struct rtas_event_log_v6_section_header hdr;
>>>> +    uint32_t fru_id;
>>>> +    uint32_t proc_id;
>>>> +    uint8_t error_type;
>>>> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
>>>> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
>>>> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
>>>> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
>>>> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
>>>> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
>>>> +    uint8_t sub_err_type;
>>>> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
>>>> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
>>>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
>>>> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
>>>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
>>>> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
>>>> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
>>>> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
>>>> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
>>>> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
>>>> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
>>>> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
>>>> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
>>>> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
>>>> +    uint8_t reserved_1[6];
>>>> +    uint64_t effective_address;
>>>> +    uint64_t logical_address;
>>>> +} QEMU_PACKED;
>>>> +
>>>> +struct mc_extended_log {
>>>> +    struct rtas_event_log_v6 v6hdr;
>>>> +    struct rtas_event_log_v6_mc mc;
>>>> +} QEMU_PACKED;
>>>> +
>>>> +struct MC_ierror_table {
>>>> +    unsigned long srr1_mask;
>>>> +    unsigned long srr1_value;
>>>> +    bool nip_valid; /* nip is a valid indicator of faulting address */
>>>> +    uint8_t error_type;
>>>> +    uint8_t error_subtype;
>>>> +    unsigned int initiator;
>>>> +    unsigned int severity;
>>>> +};
>>>> +
>>>> +static const struct MC_ierror_table mc_ierror_table[] = {
>>>> +{ 0x00000000081c0000, 0x0000000000040000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x0000000000080000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x00000000000c0000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x0000000000100000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x0000000000140000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x0000000000180000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0, 0, 0, 0, 0, 0 } };
>>>> +
>>>> +struct MC_derror_table {
>>>> +    unsigned long dsisr_value;
>>>> +    bool dar_valid; /* dar is a valid indicator of faulting address */
>>>> +    uint8_t error_type;
>>>> +    uint8_t error_subtype;
>>>> +    unsigned int initiator;
>>>> +    unsigned int severity;
>>>> +};
>>>> +
>>>> +static const struct MC_derror_table mc_derror_table[] = {
>>>> +{ 0x00008000, false,
>>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00004000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000800, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000400, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000080, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000100, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0, false, 0, 0, 0, 0 } };
>>>> +
>>>> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
>>>> +
>>>>  typedef enum EventClass {
>>>>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>>>>      EVENT_CLASS_EPOW                = 1,
>>>> @@ -620,7 +720,141 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>>>  }
>>>>  
>>>> -void spapr_mce_req_event(PowerPCCPU *cpu)
>>>> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>>>> +                                        struct mc_extended_log *ext_elog)
>>>> +{
>>>> +    int i;
>>>> +    CPUPPCState *env = &cpu->env;
>>>> +    uint32_t summary;
>>>> +    uint64_t dsisr = env->spr[SPR_DSISR];
>>>> +
>>>> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
>>>> +    if (recovered) {
>>>> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
>>>> +    } else {
>>>> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
>>>> +    }
>>>> +
>>>> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
>>>> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
>>>> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
>>>> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
>>>> +            if (mc_derror_table[i].dar_valid) {
>>>> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
>>>> +            }
>>>> +
>>>> +            summary |= mc_derror_table[i].initiator
>>>> +                        | mc_derror_table[i].severity;
>>>> +
>>>> +            return summary;
>>>> +        }
>>>> +    } else {
>>>> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
>>>> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
>>>> +                    mc_ierror_table[i].srr1_value) {
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
>>>> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
>>>> +            if (mc_ierror_table[i].nip_valid) {
>>>> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
>>>> +            }
>>>> +
>>>> +            summary |= mc_ierror_table[i].initiator
>>>> +                        | mc_ierror_table[i].severity;
>>>> +
>>>> +            return summary;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    summary |= RTAS_LOG_INITIATOR_CPU;
>>>> +    return summary;
>>>> +}
>>>> +
>>>> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>>> +{
>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>> +    CPUState *cs = CPU(cpu);
>>>> +    uint64_t rtas_addr;
>>>> +    CPUPPCState *env = &cpu->env;
>>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>>> +    target_ulong r3, msr = 0;
>>>> +    struct rtas_error_log log;
>>>> +    struct mc_extended_log *ext_elog;
>>>> +    uint32_t summary;
>>>> +
>>>> +    /*
>>>> +     * Properly set bits in MSR before we invoke the handler.
>>>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>>>> +     */
>>>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>>>> +        msr |= (1ULL << MSR_LE);
>>>> +    }
>>>> +
>>>> +    if (env->msr & (1ULL << MSR_SF)) {
>>>> +        msr |= (1ULL << MSR_SF);
>>>> +    }
>>>> +
>>>> +    msr |= (1ULL << MSR_ME);
>>>> +
>>>> +    if (spapr->guest_machine_check_addr == -1) {
>>>> +        /*
>>>> +         * This implies that we have hit a machine check between system
>>>> +         * reset and "ibm,nmi-register". Fall back to the old machine
>>>> +         * check behavior in such cases.
>>>> +         */
>>>> +        env->spr[SPR_SRR0] = env->nip;
>>>> +        env->spr[SPR_SRR1] = env->msr;
>>>> +        env->msr = msr;
>>>> +        env->nip = 0x200;
>>>> +        return;
>>>
>>> Hm, does this differ from what ppc_cpu_do_system_reset() will do?
>>
>> Not much, but we branch to 0x200 instead of 0x100. But I think calling
>>
>> powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_MCHECK)
>>
>> should also work. Let me know if you prefer using powerpc_excp().
> 
> Yes, I'd prefer to share code with existing exception paths where
> possible.

sure.

> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-07-03  3:20       ` David Gibson
@ 2019-07-03  9:00         ` Aravinda Prasad
  2019-07-04  1:12           ` David Gibson
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-03  9:00 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
>>>> type 4.0.
>>>>
>>>> The machine check notification address is saved when the
>>>> OS issues "ibm,nmi-register" RTAS call.
>>>>
>>>> This patch also handles the case when multiple processors
>>>> experience machine check at or about the same time by
>>>> handling "ibm,nmi-interlock" call. In such cases, as per
>>>> PAPR, subsequent processors serialize waiting for the first
>>>> processor to issue the "ibm,nmi-interlock" call. The second
>>>> processor that also received a machine check error waits
>>>> till the first processor is done reading the error log.
>>>> The first processor issues "ibm,nmi-interlock" call
>>>> when the error log is consumed.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/ppc/spapr.c         |    6 ++++-
>>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/ppc/spapr.h |    5 +++-
>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 3d6d139..213d493 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
>>>>          /* Create the error string for live migration blocker */
>>>>          error_setg(&spapr->fwnmi_migration_blocker,
>>>>                  "Live migration not supported during machine check handling");
>>>> +
>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>>>> +        spapr_fwnmi_register();
>>>>      }
>>>>  
>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>>>
>>> Turning this on by default really isn't ok if it stops you running TCG
>>> guests at all.
>>
>> If so this can be "off" by default until TCG is supported.
>>
>>>
>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>      smc->irq = &spapr_irq_dual;
>>>>      smc->dr_phb_enabled = true;
>>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>
>>> We're now well past 4.0, and in fact we're about to go into soft
>>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
>>> will need to retain the old default.
>>
>> ok.
>>
>>>
>>>>  }
>>>>  
>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index a015a80..e010cb2 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -49,6 +49,7 @@
>>>>  #include "hw/ppc/fdt.h"
>>>>  #include "target/ppc/mmu-hash64.h"
>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>> +#include "migration/blocker.h"
>>>>  
>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>                                     uint32_t token, uint32_t nargs,
>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>      rtas_st(rets, 1, 100);
>>>>  }
>>>>  
>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>> +                                  SpaprMachineState *spapr,
>>>> +                                  uint32_t token, uint32_t nargs,
>>>> +                                  target_ulong args,
>>>> +                                  uint32_t nret, target_ulong rets)
>>>> +{
>>>> +    int ret;
>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>>>> +
>>>> +    if (!rtas_addr) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>> +    if (ret == 1) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>
>>> I don't understand this case separate from the others.  We've already
>>> set the cap, so fwnmi support should be checked and available.
>>
>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
>> cap_ppc_fwnmi is not available in KVM.
> 
> But you've checked for the presence of the extension, yes?  So a
> failure to enable the cap would be unexpected.  In which case how does
> this case differ from.. 

No, this is the function where I check for the presence of the
extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
support is available, but don't take any action if unavailable.

So this case is when we are running an old version of KVM with no
cap_ppc_fwnmi support.

> 
>>
>>>
>>>> +        return;
>>>> +    } else if (ret < 0) {
>>>> +        error_report("Couldn't enable KVM FWNMI capability");
>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>> +        return;
> 
> ..this case.

And this is when we have the KVM support but due to some problem with
either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
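
For context, a rough sketch of how the two return values discussed above could arise, assuming cap_ppc_fwnmi is probed once in kvm_arch_init() via kvm_vm_check_extension(s, KVM_CAP_PPC_FWNMI):

    int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
    {
        CPUState *cs = CPU(cpu);

        if (!cap_ppc_fwnmi) {
            /* Old KVM: KVM_CAP_PPC_FWNMI not advertised at all */
            return 1;
        }

        /* KVM advertises the cap, but enabling it may still fail (< 0) */
        return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
    }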

> 
>>>> +    }
>>>> +
>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>> +}
>>>> +
>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>> +                                   SpaprMachineState *spapr,
>>>> +                                   uint32_t token, uint32_t nargs,
>>>> +                                   target_ulong args,
>>>> +                                   uint32_t nret, target_ulong rets)
>>>> +{
>>>> +    if (spapr->guest_machine_check_addr == -1) {
>>>> +        /* NMI register not called */
>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>> +    } else {
>>>> +        /*
>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>>>> +         * hence unset mc_status.
>>>> +         */
>>>> +        spapr->mc_status = -1;
>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
>>>
>>> Hrm.  We add the blocker at the mce request point.  First, that's in
>>> another patch, which isn't great.  Second, does that mean we could add
>>> multiple times if we get an MCE on multiple CPUs?  Will that work and
>>> correctly match adds and removes properly?
>>
>> If it is fine to move the migration patch as the last patch in the
>> sequence, then we will have add and del blocker in the same patch.
>>
>> And yes we could add multiple times if we get MCE on multiple CPUs and
>> as all those cpus call interlock there should be matching number of
>> delete blockers.
> 
> Ok, and I think adding the same pointer to the list multiple times
> will work ok.

I think so

> 
> Btw, add_blocker() can fail - have you handled failure conditions?

yes, I am handling it.

> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-07-03  3:03       ` David Gibson
@ 2019-07-03  9:28         ` Aravinda Prasad
  2019-07-04  1:07           ` David Gibson
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-03  9:28 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Wednesday 03 July 2019 08:33 AM, David Gibson wrote:
> On Tue, Jul 02, 2019 at 11:54:26AM +0530, Aravinda Prasad wrote:
>>
>>
>> On Tuesday 02 July 2019 09:21 AM, David Gibson wrote:
>>> On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
>>>> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
>>>> the KVM causes guest exit with NMI as exit reason
>>>> when it encounters a machine check exception on the
>>>> address belonging to a guest. Without this capability
>>>> enabled, KVM redirects machine check exceptions to
>>>> guest's 0x200 vector.
>>>>
>>>> This patch also introduces fwnmi-mce capability to
>>>> deal with the case when a guest with the
>>>> KVM_CAP_PPC_FWNMI capability enabled is attempted
>>>> to migrate to a host that does not support this
>>>> capability.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/ppc/spapr.c         |    1 +
>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>>>>  include/hw/ppc/spapr.h |    4 +++-
>>>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>>>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>>>>  5 files changed, 61 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 6dd8aaa..2ef86aa 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>      smc->irq = &spapr_irq_dual;
>>>>      smc->dr_phb_enabled = true;
>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>> index 31b4661..2e92eb6 100644
>>>> --- a/hw/ppc/spapr_caps.c
>>>> +++ b/hw/ppc/spapr_caps.c
>>>> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>>>      }
>>>>  }
>>>>  
>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>> +                                Error **errp)
>>>> +{
>>>> +    if (!val) {
>>>> +        return; /* Disabled by default */
>>>> +    }
>>>> +
>>>> +    if (tcg_enabled()) {
>>>> +        error_setg(errp,
>>>> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
>>>
>>> Not allowing this for TCG creates an awkward incompatibility between
>>> KVM and TCG guests.  I can't actually see any reason to ban it for TCG
>>> - with the current code TCG won't ever generate NMIs, but I don't see
>>> that anything will actually break.
>>>
>>> In fact, we do have an nmi monitor command, currently wired to the
>>> spapr_nmi() function which resets each cpu, but it probably makes
>>> sense to wire it up to the fwnmi stuff when present.
>>
>> Yes, but that nmi support is not enough to inject a synchronous error
>> into the guest kernel. For example, we should provide the faulty address
>> along with other information such as the type of error (slb multi-hit,
>> memory error, TLB multi-hit) and when the error occurred (load/store)
>> and whether the error was completely recovered or not. Without such
>> information we cannot build the error log and pass it on to the guest
>> kernel. Right now nmi monitor command takes cpu number as the only argument.
> 
> Obviously we can't inject an arbitrary MCE event with that monitor
> command.  But isn't there some sort of catch-all / unknown type of MCE
> event which we could inject?

We have "unknown" type of error, but we should also pass an address in
the MCE event log. Strictly speaking this address should be a valid
address in the current CPU context as MCEs are synchronous errors
triggered when we touch a bad address.

We can pass a default address with every nmi, but I am not sure whether
that will be practically helpful.
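
To make the idea concrete, a hypothetical wiring of the existing "nmi" monitor hook to the FWNMI path, assuming the series' spapr_mce_req_event(cpu, recovered) entry point; with no specific SRR1/DSISR error bits set, the decode tables fall through to the catch-all summary and a zero effective address, i.e. an "unknown", unrecovered error:

    static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
    {
        PowerPCCPU *cpu = POWERPC_CPU(qemu_get_cpu(cpu_index));

        if (cpu) {
            /* Report an indeterminate, unrecovered machine check (sketch) */
            spapr_mce_req_event(cpu, false);
        }
    }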

> 
> It seems very confusing to me to have 2 totally separate "nmi"
> mechanisms.
> 
>> So I think TCG support should be a separate patch by itself.
> 
> Even if we don't wire up the monitor command, I still don't see
> anything that this patch breaks - we can support the nmi-register and
> nmi-interlock calls without ever actually creating MCE events.

If we support the nmi-register and nmi-interlock calls without the monitor
command wire-up, then we will be falsely claiming nmi support to the
guest while it is not actually supported.

Regards,
Aravinda



> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-07-03  9:28         ` Aravinda Prasad
@ 2019-07-04  1:07           ` David Gibson
  2019-07-04  5:03             ` Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: David Gibson @ 2019-07-04  1:07 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 5470 bytes --]

On Wed, Jul 03, 2019 at 02:58:24PM +0530, Aravinda Prasad wrote:
> 
> 
> On Wednesday 03 July 2019 08:33 AM, David Gibson wrote:
> > On Tue, Jul 02, 2019 at 11:54:26AM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Tuesday 02 July 2019 09:21 AM, David Gibson wrote:
> >>> On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
> >>>> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
> >>>> the KVM causes guest exit with NMI as exit reason
> >>>> when it encounters a machine check exception on the
> >>>> address belonging to a guest. Without this capability
> >>>> enabled, KVM redirects machine check exceptions to
> >>>> guest's 0x200 vector.
> >>>>
> >>>> This patch also introduces fwnmi-mce capability to
> >>>> deal with the case when a guest with the
> >>>> KVM_CAP_PPC_FWNMI capability enabled is attempted
> >>>> to migrate to a host that does not support this
> >>>> capability.
> >>>>
> >>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>> ---
> >>>>  hw/ppc/spapr.c         |    1 +
> >>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
> >>>>  include/hw/ppc/spapr.h |    4 +++-
> >>>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
> >>>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
> >>>>  5 files changed, 61 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>> index 6dd8aaa..2ef86aa 100644
> >>>> --- a/hw/ppc/spapr.c
> >>>> +++ b/hw/ppc/spapr.c
> >>>> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>      smc->irq = &spapr_irq_dual;
> >>>>      smc->dr_phb_enabled = true;
> >>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >>>> index 31b4661..2e92eb6 100644
> >>>> --- a/hw/ppc/spapr_caps.c
> >>>> +++ b/hw/ppc/spapr_caps.c
> >>>> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
> >>>>      }
> >>>>  }
> >>>>  
> >>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> >>>> +                                Error **errp)
> >>>> +{
> >>>> +    if (!val) {
> >>>> +        return; /* Disabled by default */
> >>>> +    }
> >>>> +
> >>>> +    if (tcg_enabled()) {
> >>>> +        error_setg(errp,
> >>>> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
> >>>
> >>> Not allowing this for TCG creates an awkward incompatibility between
> >>> KVM and TCG guests.  I can't actually see any reason to ban it for TCG
> >>> - with the current code TCG won't ever generate NMIs, but I don't see
> >>> that anything will actually break.
> >>>
> >>> In fact, we do have an nmi monitor command, currently wired to the
> >>> spapr_nmi() function which resets each cpu, but it probably makes
> >>> sense to wire it up to the fwnmi stuff when present.
> >>
> >> Yes, but that nmi support is not enough to inject a synchronous error
> >> into the guest kernel. For example, we should provide the faulty address
> >> along with other information such as the type of error (slb multi-hit,
> >> memory error, TLB multi-hit) and when the error occurred (load/store)
> >> and whether the error was completely recovered or not. Without such
> >> information we cannot build the error log and pass it on to the guest
> >> kernel. Right now nmi monitor command takes cpu number as the only argument.
> > 
> > Obviously we can't inject an arbitrary MCE event with that monitor
> > command.  But isn't there some sort of catch-all / unknown type of MCE
> > event which we could inject?
> 
> We have an "unknown" type of error, but we should also pass an address in
> the MCE event log. Strictly speaking, this address should be a valid
> address in the current CPU context, as MCEs are synchronous errors
> triggered when we touch a bad address.

Well, some of them are.  At least historically both synchronous and
asynchronous MCEs were possible.  Are there really no versions where
you can report an MCE with an unknown address?

> We can pass a default address with every nmi, but I am not sure whether
> that will be practically helpful.
> 
> > It seems very confusing to me to have 2 totally separate "nmi"
> > mechanisms.
> > 
> >> So I think TCG support should be a separate patch by itself.
> > 
> > Even if we don't wire up the monitor command, I still don't see
> > anything that this patch breaks - we can support the nmi-register and
> > nmi-interlock calls without ever actually creating MCE events.
> 
> If we support the nmi-register and nmi-interlock calls without the monitor
> command wire-up, then we will be falsely claiming nmi support to the
> guest while it is not actually supported.

How so?  AFAICT, from the point of view of the guest this is not
observably different from supporting the NMI mechanism with NMIs simply
never occurring.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-07-03  9:00         ` Aravinda Prasad
@ 2019-07-04  1:12           ` David Gibson
  2019-07-04  5:19             ` Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: David Gibson @ 2019-07-04  1:12 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 9143 bytes --]

On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
> 
> 
> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
> > On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
> >>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
> >>>> This patch adds support in QEMU to handle "ibm,nmi-register"
> >>>> and "ibm,nmi-interlock" RTAS calls and sets the default
> >>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >>>> type 4.0.
> >>>>
> >>>> The machine check notification address is saved when the
> >>>> OS issues "ibm,nmi-register" RTAS call.
> >>>>
> >>>> This patch also handles the case when multiple processors
> >>>> experience machine check at or about the same time by
> >>>> handling "ibm,nmi-interlock" call. In such cases, as per
> >>>> PAPR, subsequent processors serialize waiting for the first
> >>>> processor to issue the "ibm,nmi-interlock" call. The second
> >>>> processor that also received a machine check error waits
> >>>> till the first processor is done reading the error log.
> >>>> The first processor issues "ibm,nmi-interlock" call
> >>>> when the error log is consumed.
> >>>>
> >>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>> ---
> >>>>  hw/ppc/spapr.c         |    6 ++++-
> >>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  include/hw/ppc/spapr.h |    5 +++-
> >>>>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>> index 3d6d139..213d493 100644
> >>>> --- a/hw/ppc/spapr.c
> >>>> +++ b/hw/ppc/spapr.c
> >>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
> >>>>          /* Create the error string for live migration blocker */
> >>>>          error_setg(&spapr->fwnmi_migration_blocker,
> >>>>                  "Live migration not supported during machine check handling");
> >>>> +
> >>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> >>>> +        spapr_fwnmi_register();
> >>>>      }
> >>>>  
> >>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>>
> >>> Turning this on by default really isn't ok if it stops you running TCG
> >>> guests at all.
> >>
> >> If so this can be "off" by default until TCG is supported.
> >>
> >>>
> >>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>      smc->irq = &spapr_irq_dual;
> >>>>      smc->dr_phb_enabled = true;
> >>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
> >>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
> >>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>
> >>> We're now well past 4.0, and in fact we're about to go into soft
> >>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
> >>> will need to retain the old default.
> >>
> >> ok.
> >>
> >>>
> >>>>  }
> >>>>  
> >>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>> index a015a80..e010cb2 100644
> >>>> --- a/hw/ppc/spapr_rtas.c
> >>>> +++ b/hw/ppc/spapr_rtas.c
> >>>> @@ -49,6 +49,7 @@
> >>>>  #include "hw/ppc/fdt.h"
> >>>>  #include "target/ppc/mmu-hash64.h"
> >>>>  #include "target/ppc/mmu-book3s-v3.h"
> >>>> +#include "migration/blocker.h"
> >>>>  
> >>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>                                     uint32_t token, uint32_t nargs,
> >>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>      rtas_st(rets, 1, 100);
> >>>>  }
> >>>>  
> >>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>> +                                  SpaprMachineState *spapr,
> >>>> +                                  uint32_t token, uint32_t nargs,
> >>>> +                                  target_ulong args,
> >>>> +                                  uint32_t nret, target_ulong rets)
> >>>> +{
> >>>> +    int ret;
> >>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> >>>> +
> >>>> +    if (!rtas_addr) {
> >>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> >>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    ret = kvmppc_fwnmi_enable(cpu);
> >>>> +    if (ret == 1) {
> >>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>
> >>> I don't understand this case separate from the others.  We've already
> >>> set the cap, so fwnmi support should be checked and available.
> >>
> >> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
> >> cap_ppc_fwnmi is not available in KVM.
> > 
> > But you've checked for the presence of the extension, yes?  So a
> > failure to enable the cap would be unexpected.  In which case how does
> > this case differ from.. 
> 
> No, this is the function where I check for the presence of the
> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
> support is available, but don't take any action if unavailable.

Yeah, that's not ok.  You should be checking for the presence of the
extension in the .apply() function.  If you start up with the spapr
cap selected then failing at nmi-register time means something has
gone badly wrong.

This is necessary for migration: if you start on a system with nmi
support and the guest registers for it, you can't then migrate safely
to a system that doesn't have nmi support.  The way to handle that
case is to have qemu fail to even start up on a destination without
the support.
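
A sketch of what moving that check into the cap's .apply() hook might look like, assuming a helper (here called kvmppc_has_cap_fwnmi()) that returns the cached KVM_CAP_PPC_FWNMI probe result:

    static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
                                    Error **errp)
    {
        if (!val) {
            return; /* Disabled by default */
        }

        if (kvm_enabled() && !kvmppc_has_cap_fwnmi()) {
            /* Fail at startup (and thus on the migration destination) */
            error_setg(errp,
                       "Firmware Assisted NMI not supported by this KVM, "
                       "try cap-fwnmi-mce=off");
        }
    }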

> So this case is when we are running an old version of KVM with no
> cap_ppc_fwnmi support.
> 
> > 
> >>
> >>>
> >>>> +        return;
> >>>> +    } else if (ret < 0) {
> >>>> +        error_report("Couldn't enable KVM FWNMI capability");
> >>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >>>> +        return;
> > 
> > ..this case.
> 
> And this is when we have the KVM support but due to some problem with
> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
> 
> > 
> >>>> +    }
> >>>> +
> >>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>> +}
> >>>> +
> >>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>>> +                                   SpaprMachineState *spapr,
> >>>> +                                   uint32_t token, uint32_t nargs,
> >>>> +                                   target_ulong args,
> >>>> +                                   uint32_t nret, target_ulong rets)
> >>>> +{
> >>>> +    if (spapr->guest_machine_check_addr == -1) {
> >>>> +        /* NMI register not called */
> >>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>>> +    } else {
> >>>> +        /*
> >>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> >>>> +         * hence unset mc_status.
> >>>> +         */
> >>>> +        spapr->mc_status = -1;
> >>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> >>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >>>
> >>> Hrm.  We add the blocker at the mce request point.  First, that's in
> >>> another patch, which isn't great.  Second, does that mean we could add
> >>> multiple times if we get an MCE on multiple CPUs?  Will that work and
> >>> correctly match adds and removes properly?
> >>
> >> If it is fine to move the migration patch as the last patch in the
> >> sequence, then we will have add and del blocker in the same patch.
> >>
> >> And yes we could add multiple times if we get MCE on multiple CPUs and
> >> as all those cpus call interlock there should be matching number of
> >> delete blockers.
> > 
> > Ok, and I think adding the same pointer to the list multiple times
> > will work ok.
> 
> I think so
> 
> > 
> > Btw, add_blocker() can fail - have you handled failure conditions?
> 
> yes, I am handling it.
> 
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-07-04  1:07           ` David Gibson
@ 2019-07-04  5:03             ` Aravinda Prasad
  2019-07-05  1:07               ` David Gibson
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-04  5:03 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Thursday 04 July 2019 06:37 AM, David Gibson wrote:
> On Wed, Jul 03, 2019 at 02:58:24PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Wednesday 03 July 2019 08:33 AM, David Gibson wrote:
>>> On Tue, Jul 02, 2019 at 11:54:26AM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Tuesday 02 July 2019 09:21 AM, David Gibson wrote:
>>>>> On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
>>>>>> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
>>>>>> the KVM causes guest exit with NMI as exit reason
>>>>>> when it encounters a machine check exception on the
>>>>>> address belonging to a guest. Without this capability
>>>>>> enabled, KVM redirects machine check exceptions to
>>>>>> guest's 0x200 vector.
>>>>>>
>>>>>> This patch also introduces fwnmi-mce capability to
>>>>>> deal with the case when a guest with the
>>>>>> KVM_CAP_PPC_FWNMI capability enabled is attempted
>>>>>> to migrate to a host that does not support this
>>>>>> capability.
>>>>>>
>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  hw/ppc/spapr.c         |    1 +
>>>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>>>>>>  include/hw/ppc/spapr.h |    4 +++-
>>>>>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>>>>>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>>>>>>  5 files changed, 61 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>> index 6dd8aaa..2ef86aa 100644
>>>>>> --- a/hw/ppc/spapr.c
>>>>>> +++ b/hw/ppc/spapr.c
>>>>>> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>>>      smc->irq = &spapr_irq_dual;
>>>>>>      smc->dr_phb_enabled = true;
>>>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>>>> index 31b4661..2e92eb6 100644
>>>>>> --- a/hw/ppc/spapr_caps.c
>>>>>> +++ b/hw/ppc/spapr_caps.c
>>>>>> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>>      }
>>>>>>  }
>>>>>>  
>>>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>> +                                Error **errp)
>>>>>> +{
>>>>>> +    if (!val) {
>>>>>> +        return; /* Disabled by default */
>>>>>> +    }
>>>>>> +
>>>>>> +    if (tcg_enabled()) {
>>>>>> +        error_setg(errp,
>>>>>> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
>>>>>
>>>>> Not allowing this for TCG creates an awkward incompatibility between
>>>>> KVM and TCG guests.  I can't actually see any reason to ban it for TCG
>>>>> - with the current code TCG won't ever generate NMIs, but I don't see
>>>>> that anything will actually break.
>>>>>
>>>>> In fact, we do have an nmi monitor command, currently wired to the
>>>>> spapr_nmi() function which resets each cpu, but it probably makes
>>>>> sense to wire it up to the fwnmi stuff when present.
>>>>
>>>> Yes, but that nmi support is not enough to inject a synchronous error
>>>> into the guest kernel. For example, we should provide the faulty address
>>>> along with other information such as the type of error (slb multi-hit,
>>>> memory error, TLB multi-hit) and when the error occurred (load/store)
>>>> and whether the error was completely recovered or not. Without such
>>>> information we cannot build the error log and pass it on to the guest
>>>> kernel. Right now nmi monitor command takes cpu number as the only argument.
>>>
>>> Obviously we can't inject an arbitrary MCE event with that monitor
>>> command.  But isn't there some sort of catch-all / unknown type of MCE
>>> event which we could inject?
>>
>> We have "unknown" type of error, but we should also pass an address in
>> the MCE event log. Strictly speaking this address should be a valid
>> address in the current CPU context as MCEs are synchronous errors
>> triggered when we touch a bad address.
> 
> Well, some of them are.  At least historically both synchronous and
> asnchronous MCEs were possible.  Are there really no versions where
> you can report an MCE with unknown address?

I am not aware of any such versions. Will cross check.

> 
>> We can pass a default address with every nmi, but I am not sure whether
>> that will be practically helpful.
>>
>>> It seems very confusing to me to have 2 totally separate "nmi"
>>> mechanisms.
>>>
>>>> So I think TCG support should be a separate patch by itself.
>>>
>>> Even if we don't wire up the monitor command, I still don't see
>>> anything that this patch breaks - we can support the nmi-register and
>>> nmi-interlock calls without ever actually creating MCE events.
>>
>> If we support nmi-register and nmi-interlock calls without the monitor
>> command wire-up then we will be falsely claiming the nmi support to the
>> guest while it is not actually supported.
> 
> How so?  AFAICT, from the point of view of the guest this is not
> observably different from supporting the NMI mechanism but NMIs never
> occurring.

A guest inserting a duplicate SLB will expect the machine check
exception to be delivered to the handler registered via ibm,nmi-register.
But we don't actually do that in TCG.

> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-07-04  1:12           ` David Gibson
@ 2019-07-04  5:19             ` Aravinda Prasad
  2019-07-04 18:37               ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2019-07-09  6:41               ` [Qemu-devel] " David Gibson
  0 siblings, 2 replies; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-04  5:19 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Thursday 04 July 2019 06:42 AM, David Gibson wrote:
> On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
>>> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
>>>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
>>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
>>>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
>>>>>> type 4.0.
>>>>>>
>>>>>> The machine check notification address is saved when the
>>>>>> OS issues "ibm,nmi-register" RTAS call.
>>>>>>
>>>>>> This patch also handles the case when multiple processors
>>>>>> experience machine check at or about the same time by
>>>>>> handling "ibm,nmi-interlock" call. In such cases, as per
>>>>>> PAPR, subsequent processors serialize waiting for the first
>>>>>> processor to issue the "ibm,nmi-interlock" call. The second
>>>>>> processor that also received a machine check error waits
>>>>>> till the first processor is done reading the error log.
>>>>>> The first processor issues "ibm,nmi-interlock" call
>>>>>> when the error log is consumed.
>>>>>>
>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  hw/ppc/spapr.c         |    6 ++++-
>>>>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  include/hw/ppc/spapr.h |    5 +++-
>>>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>> index 3d6d139..213d493 100644
>>>>>> --- a/hw/ppc/spapr.c
>>>>>> +++ b/hw/ppc/spapr.c
>>>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
>>>>>>          /* Create the error string for live migration blocker */
>>>>>>          error_setg(&spapr->fwnmi_migration_blocker,
>>>>>>                  "Live migration not supported during machine check handling");
>>>>>> +
>>>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>>>>>> +        spapr_fwnmi_register();
>>>>>>      }
>>>>>>  
>>>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>>>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>>>>>
>>>>> Turning this on by default really isn't ok if it stops you running TCG
>>>>> guests at all.
>>>>
>>>> If so this can be "off" by default until TCG is supported.
>>>>
>>>>>
>>>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>>>      smc->irq = &spapr_irq_dual;
>>>>>>      smc->dr_phb_enabled = true;
>>>>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
>>>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
>>>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>
>>>>> We're now well past 4.0, and in fact we're about to go into soft
>>>>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
>>>>> will need to retain the old default.
>>>>
>>>> ok.
>>>>
>>>>>
>>>>>>  }
>>>>>>  
>>>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>>>> index a015a80..e010cb2 100644
>>>>>> --- a/hw/ppc/spapr_rtas.c
>>>>>> +++ b/hw/ppc/spapr_rtas.c
>>>>>> @@ -49,6 +49,7 @@
>>>>>>  #include "hw/ppc/fdt.h"
>>>>>>  #include "target/ppc/mmu-hash64.h"
>>>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>>>> +#include "migration/blocker.h"
>>>>>>  
>>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>>>                                     uint32_t token, uint32_t nargs,
>>>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>>>      rtas_st(rets, 1, 100);
>>>>>>  }
>>>>>>  
>>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>>> +                                  SpaprMachineState *spapr,
>>>>>> +                                  uint32_t token, uint32_t nargs,
>>>>>> +                                  target_ulong args,
>>>>>> +                                  uint32_t nret, target_ulong rets)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>>>>>> +
>>>>>> +    if (!rtas_addr) {
>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>>>> +    if (ret == 1) {
>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>
>>>>> I don't understand this case separate from the others.  We've already
>>>>> set the cap, so fwnmi support should be checked and available.
>>>>
>>>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
>>>> cap_ppc_fwnmi is not available in KVM.
>>>
>>> But you've checked for the presence of the extension, yes?  So a
>>> failure to enable the cap would be unexpected.  In which case how does
>>> this case differ from.. 
>>
>> No, this is the function where I check for the presence of the
>> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
>> support is available, but don't take any action if unavailable.
> 
> Yeah, that's not ok.  You should be checking for the presence of the
> extension in the .apply() function.  If you start up with the spapr
> cap selected then failing at nmi-register time means something has
> gone badly wrong.

So, I should check for two things in the .apply() function: first,
whether cap_ppc_fwnmi is supported, and second, whether it can actually
be enabled in KVM.

In that case kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0) should be
called during spapr_machine_init().

So, we will fail to boot (when SPAPR_CAP_FWNMI_MCE=ON) if cap_ppc_fwnmi
can't be enabled irrespective of whether a guest issues nmi,register or not.
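
Roughly along these lines in spapr_machine_init() (sketch only, error
handling simplified; whether first_cpu is the right vCPU to use here is
an open question):

    /* Once the vCPUs have been created */
    if (kvm_enabled() &&
        spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
        /* kvmppc_fwnmi_enable() is the per-vCPU enable from this series */
        if (kvmppc_fwnmi_enable(POWERPC_CPU(first_cpu)) != 0) {
            error_report("Could not enable KVM FWNMI capability");
            exit(1);
        }
    }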

> 
> This is necessary for migration: if you start on a system with nmi
> support and the guest registers for it, you can't then migrate safely
> to a system that doesn't have nmi support.  The way to handle that
> case is to have qemu fail to even start up on a destination without
> the support.
> 
>> So this case is when we are running an old version of KVM with no
>> cap_ppc_fwnmi support.
>>
>>>
>>>>
>>>>>
>>>>>> +        return;
>>>>>> +    } else if (ret < 0) {
>>>>>> +        error_report("Couldn't enable KVM FWNMI capability");
>>>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>>>> +        return;
>>>
>>> ..this case.
>>
>> And this is when we have the KVM support but due to some problem with
>> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
>>
>>>
>>>>>> +    }
>>>>>> +
>>>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>>> +}
>>>>>> +
>>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>>>> +                                   SpaprMachineState *spapr,
>>>>>> +                                   uint32_t token, uint32_t nargs,
>>>>>> +                                   target_ulong args,
>>>>>> +                                   uint32_t nret, target_ulong rets)
>>>>>> +{
>>>>>> +    if (spapr->guest_machine_check_addr == -1) {
>>>>>> +        /* NMI register not called */
>>>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>>>> +    } else {
>>>>>> +        /*
>>>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>>>>>> +         * hence unset mc_status.
>>>>>> +         */
>>>>>> +        spapr->mc_status = -1;
>>>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>>>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
>>>>>
>>>>> Hrm.  We add the blocker at the mce request point.  First, that's in
>>>>> another patch, which isn't great.  Second, does that mean we could add
>>>>> multiple times if we get an MCE on multiple CPUs?  Will that work and
>>>>> correctly match adds and removes properly?
>>>>
>>>> If it is fine to move the migration patch as the last patch in the
>>>> sequence, then we will have add and del blocker in the same patch.
>>>>
>>>> And yes we could add multiple times if we get MCE on multiple CPUs and
>>>> as all those cpus call interlock there should be matching number of
>>>> delete blockers.
>>>
>>> Ok, and I think adding the same pointer to the list multiple times
>>> will work ok.
>>
>> I think so
>>
>>>
>>> Btw, add_blocker() can fail - have you handled failure conditions?
>>
>> yes, I am handling it.
>>
>>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-07-04  5:19             ` Aravinda Prasad
@ 2019-07-04 18:37               ` Greg Kurz
  2019-07-05 11:11                 ` Aravinda Prasad
  2019-07-09  6:41               ` [Qemu-devel] " David Gibson
  1 sibling, 1 reply; 37+ messages in thread
From: Greg Kurz @ 2019-07-04 18:37 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, David Gibson

On Thu, 4 Jul 2019 10:49:05 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> 
> 
> On Thursday 04 July 2019 06:42 AM, David Gibson wrote:
> > On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
> >>> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
> >>>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
> >>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
> >>>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
> >>>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >>>>>> type 4.0.
> >>>>>>
> >>>>>> The machine check notification address is saved when the
> >>>>>> OS issues "ibm,nmi-register" RTAS call.
> >>>>>>
> >>>>>> This patch also handles the case when multiple processors
> >>>>>> experience machine check at or about the same time by
> >>>>>> handling "ibm,nmi-interlock" call. In such cases, as per
> >>>>>> PAPR, subsequent processors serialize waiting for the first
> >>>>>> processor to issue the "ibm,nmi-interlock" call. The second
> >>>>>> processor that also received a machine check error waits
> >>>>>> till the first processor is done reading the error log.
> >>>>>> The first processor issues "ibm,nmi-interlock" call
> >>>>>> when the error log is consumed.
> >>>>>>
> >>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>>>> ---
> >>>>>>  hw/ppc/spapr.c         |    6 ++++-
> >>>>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>  include/hw/ppc/spapr.h |    5 +++-
> >>>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>>> index 3d6d139..213d493 100644
> >>>>>> --- a/hw/ppc/spapr.c
> >>>>>> +++ b/hw/ppc/spapr.c
> >>>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
> >>>>>>          /* Create the error string for live migration blocker */
> >>>>>>          error_setg(&spapr->fwnmi_migration_blocker,
> >>>>>>                  "Live migration not supported during machine check handling");
> >>>>>> +
> >>>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> >>>>>> +        spapr_fwnmi_register();
> >>>>>>      }
> >>>>>>  
> >>>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>>>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>>>>
> >>>>> Turning this on by default really isn't ok if it stops you running TCG
> >>>>> guests at all.
> >>>>
> >>>> If so this can be "off" by default until TCG is supported.
> >>>>
> >>>>>
> >>>>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>>>      smc->irq = &spapr_irq_dual;
> >>>>>>      smc->dr_phb_enabled = true;
> >>>>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
> >>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>
> >>>>> We're now well past 4.0, and in fact we're about to go into soft
> >>>>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
> >>>>> will need to retain the old default.
> >>>>
> >>>> ok.
> >>>>
> >>>>>
> >>>>>>  }
> >>>>>>  
> >>>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>>>> index a015a80..e010cb2 100644
> >>>>>> --- a/hw/ppc/spapr_rtas.c
> >>>>>> +++ b/hw/ppc/spapr_rtas.c
> >>>>>> @@ -49,6 +49,7 @@
> >>>>>>  #include "hw/ppc/fdt.h"
> >>>>>>  #include "target/ppc/mmu-hash64.h"
> >>>>>>  #include "target/ppc/mmu-book3s-v3.h"
> >>>>>> +#include "migration/blocker.h"
> >>>>>>  
> >>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>>>                                     uint32_t token, uint32_t nargs,
> >>>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>>>      rtas_st(rets, 1, 100);
> >>>>>>  }
> >>>>>>  
> >>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>>> +                                  SpaprMachineState *spapr,
> >>>>>> +                                  uint32_t token, uint32_t nargs,
> >>>>>> +                                  target_ulong args,
> >>>>>> +                                  uint32_t nret, target_ulong rets)
> >>>>>> +{
> >>>>>> +    int ret;
> >>>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> >>>>>> +
> >>>>>> +    if (!rtas_addr) {
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>> +        return;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>> +        return;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
> >>>>>> +    if (ret == 1) {
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>
> >>>>> I don't understand this case separate from the others.  We've already
> >>>>> set the cap, so fwnmi support should be checked and available.
> >>>>
> >>>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
> >>>> cap_ppc_fwnmi is not available in KVM.
> >>>
> >>> But you've checked for the presence of the extension, yes?  So a
> >>> failure to enable the cap would be unexpected.  In which case how does
> >>> this case differ from.. 
> >>
> >> No, this is the function where I check for the presence of the
> >> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
> >> support is available, but don't take any action if unavailable.
> > 
> > Yeah, that's not ok.  You should be checking for the presence of the
> > extension in the .apply() function.  If you start up with the spapr
> > cap selected then failing at nmi-register time means something has
> > gone badly wrong.
> 
> So, I should check for two things in the .apply() function: first if
> cap_ppc_fwnmi is supported and second if cap_ppc_fwnmi is enabled in KVM.
> 
> In that case kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0) should be
> called during spapr_machine_init().
> 
> So, we will fail to boot (when SPAPR_CAP_FWNMI_MCE=ON) if cap_ppc_fwnmi
> can't be enabled irrespective of whether a guest issues nmi,register or not.
> 

Yes. The idea is that we don't want to expose a feature to the guest
if we already know it cannot work. The same goes for migration: if
we know the destination host doesn't support the feature, we don't want
to confuse the guest by having the feature disappear unexpectedly. The
correct way to address that is to check that the feature is supported
in QEMU and/or KVM before we start the guest or before we accept a
migration stream, and to fail early if we can't provide the feature.

Speaking of migration, kvmppc_fwnmi_enable() should be called in a
post load callback... and the problem I see is that this is a vCPU
ioctl. So either we add a state "I should enable FWNMI on post load"
to the vCPU that called nmi,register, or maybe first_cpu can do the
trick in a post load callback of vmstate_spapr_machine_check.
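
Something like the following (untested sketch, assuming the
vmstate_spapr_machine_check section from the migration patch grows a
post-load hook):

static int spapr_machine_check_post_load(void *opaque, int version_id)
{
    SpaprMachineState *spapr = opaque;

    if (spapr->guest_machine_check_addr == -1) {
        /* The guest never called ibm,nmi-register, nothing to re-enable */
        return 0;
    }

    /*
     * Re-enable FWNMI in KVM on the destination; returning non-zero
     * here fails the incoming migration.
     */
    return kvmppc_fwnmi_enable(POWERPC_CPU(first_cpu));
}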

BTW, since FWNMI is a platform-wide feature (i.e. you don't need to
enable it on a per-CPU basis), why is KVM_CAP_PPC_FWNMI a vCPU ioctl
and not a VM ioctl?

> > 
> > This is necessary for migration: if you start on a system with nmi
> > support and the guest registers for it, you can't then migrate safely
> > to a system that doesn't have nmi support.  The way to handle that
> > case is to have qemu fail to even start up on a destination without
> > the support.
> > 
> >> So this case is when we are running an old version of KVM with no
> >> cap_ppc_fwnmi support.
> >>
> >>>
> >>>>
> >>>>>
> >>>>>> +        return;
> >>>>>> +    } else if (ret < 0) {
> >>>>>> +        error_report("Couldn't enable KVM FWNMI capability");
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >>>>>> +        return;
> >>>
> >>> ..this case.
> >>
> >> And this is when we have the KVM support but due to some problem with
> >> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
> >>
> >>>
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>>> +}
> >>>>>> +
> >>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>>>>> +                                   SpaprMachineState *spapr,
> >>>>>> +                                   uint32_t token, uint32_t nargs,
> >>>>>> +                                   target_ulong args,
> >>>>>> +                                   uint32_t nret, target_ulong rets)
> >>>>>> +{
> >>>>>> +    if (spapr->guest_machine_check_addr == -1) {
> >>>>>> +        /* NMI register not called */
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>>>>> +    } else {
> >>>>>> +        /*
> >>>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> >>>>>> +         * hence unset mc_status.
> >>>>>> +         */
> >>>>>> +        spapr->mc_status = -1;
> >>>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> >>>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >>>>>
> >>>>> Hrm.  We add the blocker at the mce request point.  First, that's in
> >>>>> another patch, which isn't great.  Second, does that mean we could add
> >>>>> multiple times if we get an MCE on multiple CPUs?  Will that work and
> >>>>> correctly match adds and removes properly?
> >>>>
> >>>> If it is fine to move the migration patch as the last patch in the
> >>>> sequence, then we will have add and del blocker in the same patch.
> >>>>
> >>>> And yes we could add multiple times if we get MCE on multiple CPUs and
> >>>> as all those cpus call interlock there should be matching number of
> >>>> delete blockers.
> >>>
> >>> Ok, and I think adding the same pointer to the list multiple times
> >>> will work ok.
> >>
> >> I think so
> >>
> >>>
> >>> Btw, add_blocker() can fail - have you handled failure conditions?
> >>
> >> yes, I am handling it.
> >>
> >>>
> >>
> > 
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-07-04  5:03             ` Aravinda Prasad
@ 2019-07-05  1:07               ` David Gibson
  2019-07-05 11:19                 ` Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: David Gibson @ 2019-07-05  1:07 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 6766 bytes --]

On Thu, Jul 04, 2019 at 10:33:11AM +0530, Aravinda Prasad wrote:
> 
> 
> On Thursday 04 July 2019 06:37 AM, David Gibson wrote:
> > On Wed, Jul 03, 2019 at 02:58:24PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Wednesday 03 July 2019 08:33 AM, David Gibson wrote:
> >>> On Tue, Jul 02, 2019 at 11:54:26AM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Tuesday 02 July 2019 09:21 AM, David Gibson wrote:
> >>>>> On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
> >>>>>> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
> >>>>>> the KVM causes guest exit with NMI as exit reason
> >>>>>> when it encounters a machine check exception on the
> >>>>>> address belonging to a guest. Without this capability
> >>>>>> enabled, KVM redirects machine check exceptions to
> >>>>>> guest's 0x200 vector.
> >>>>>>
> >>>>>> This patch also introduces fwnmi-mce capability to
> >>>>>> deal with the case when a guest with the
> >>>>>> KVM_CAP_PPC_FWNMI capability enabled is attempted
> >>>>>> to migrate to a host that does not support this
> >>>>>> capability.
> >>>>>>
> >>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>>>> ---
> >>>>>>  hw/ppc/spapr.c         |    1 +
> >>>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
> >>>>>>  include/hw/ppc/spapr.h |    4 +++-
> >>>>>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
> >>>>>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
> >>>>>>  5 files changed, 61 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>>> index 6dd8aaa..2ef86aa 100644
> >>>>>> --- a/hw/ppc/spapr.c
> >>>>>> +++ b/hw/ppc/spapr.c
> >>>>>> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>>>      smc->irq = &spapr_irq_dual;
> >>>>>>      smc->dr_phb_enabled = true;
> >>>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >>>>>> index 31b4661..2e92eb6 100644
> >>>>>> --- a/hw/ppc/spapr_caps.c
> >>>>>> +++ b/hw/ppc/spapr_caps.c
> >>>>>> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
> >>>>>>      }
> >>>>>>  }
> >>>>>>  
> >>>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> >>>>>> +                                Error **errp)
> >>>>>> +{
> >>>>>> +    if (!val) {
> >>>>>> +        return; /* Disabled by default */
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    if (tcg_enabled()) {
> >>>>>> +        error_setg(errp,
> >>>>>> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
> >>>>>
> >>>>> Not allowing this for TCG creates an awkward incompatibility between
> >>>>> KVM and TCG guests.  I can't actually see any reason to ban it for TCG
> >>>>> - with the current code TCG won't ever generate NMIs, but I don't see
> >>>>> that anything will actually break.
> >>>>>
> >>>>> In fact, we do have an nmi monitor command, currently wired to the
> >>>>> spapr_nmi() function which resets each cpu, but it probably makes
> >>>>> sense to wire it up to the fwnmi stuff when present.
> >>>>
> >>>> Yes, but that nmi support is not enough to inject a synchronous error
> >>>> into the guest kernel. For example, we should provide the faulty address
> >>>> along with other information such as the type of error (slb multi-hit,
> >>>> memory error, TLB multi-hit) and when the error occurred (load/store)
> >>>> and whether the error was completely recovered or not. Without such
> >>>> information we cannot build the error log and pass it on to the guest
> >>>> kernel. Right now nmi monitor command takes cpu number as the only argument.
> >>>
> >>> Obviously we can't inject an arbitrary MCE event with that monitor
> >>> command.  But isn't there some sort of catch-all / unknown type of MCE
> >>> event which we could inject?
> >>
> >> We have "unknown" type of error, but we should also pass an address in
> >> the MCE event log. Strictly speaking this address should be a valid
> >> address in the current CPU context as MCEs are synchronous errors
> >> triggered when we touch a bad address.
> > 
> > Well, some of them are.  At least historically both synchronous and
> > asnchronous MCEs were possible.  Are there really no versions where
> > you can report an MCE with unknown address?
> 
> I am not aware of any such versions. Will cross check.
> 
> > 
> >> We can pass a default address with every nmi, but I am not sure whether
> >> that will be practically helpful.
> >>
> >>> It seems very confusing to me to have 2 totally separate "nmi"
> >>> mechanisms.
> >>>
> >>>> So I think TCG support should be a separate patch by itself.
> >>>
> >>> Even if we don't wire up the monitor command, I still don't see
> >>> anything that this patch breaks - we can support the nmi-register and
> >>> nmi-interlock calls without ever actually creating MCE events.
> >>
> >> If we support nmi-register and nmi-interlock calls without the monitor
> >> command wire-up then we will be falsely claiming the nmi support to the
> >> guest while it is not actually supported.
> > 
> > How so?  AFAICT, from the point of view of the guest this is not
> > observably different from supporting the NMI mechanism but NMIs never
> > occurring.
> 
> A guest inserting a duplicate SLB will expect the machine check
> exception delivered to the handler registered via nmi,register.
> But we actually don't do that in TCG.

Ah, true, I was thinking of external hardware fault triggered MCEs
rather than software error ones like duplicate SLB.

That said, I strongly suspect TCG is buggy enough at present that
exact behaviour in rare error conditions like duplicate SLB is not
really a big problem in the scheme of things.

I really don't think we can enable this by default until we allow it
for TCG - we don't want starting a TCG guest to involve manually
switching other options.

We could consider allowing it for TCG but just printing a warning that
the behaviour may not be correct in some conditions - we do something
similar for some of the Spectre workarounds already.
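
i.e. something like the sketch below in cap_fwnmi_mce_apply(), mirroring
how the caps code already warns for partially implemented mitigations
(exact wording and behaviour to be decided):

    if (tcg_enabled()) {
        /*
         * Allow the cap under TCG, but be upfront that TCG will not
         * actually deliver machine check errors via the FWNMI path.
         */
        warn_report("Firmware Assisted NMI (FWNMI) accepted under TCG, "
                    "but machine check errors will not reach the guest");
        return;
    }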

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-07-04 18:37               ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-07-05 11:11                 ` Aravinda Prasad
  2019-07-05 12:35                   ` Greg Kurz
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-05 11:11 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, David Gibson



On Friday 05 July 2019 12:07 AM, Greg Kurz wrote:
> On Thu, 4 Jul 2019 10:49:05 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>>
>>
>> On Thursday 04 July 2019 06:42 AM, David Gibson wrote:
>>> On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
>>>>> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
>>>>>>
>>>>>>
>>>>>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
>>>>>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
>>>>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>>>>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
>>>>>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
>>>>>>>> type 4.0.
>>>>>>>>
>>>>>>>> The machine check notification address is saved when the
>>>>>>>> OS issues "ibm,nmi-register" RTAS call.
>>>>>>>>
>>>>>>>> This patch also handles the case when multiple processors
>>>>>>>> experience machine check at or about the same time by
>>>>>>>> handling "ibm,nmi-interlock" call. In such cases, as per
>>>>>>>> PAPR, subsequent processors serialize waiting for the first
>>>>>>>> processor to issue the "ibm,nmi-interlock" call. The second
>>>>>>>> processor that also received a machine check error waits
>>>>>>>> till the first processor is done reading the error log.
>>>>>>>> The first processor issues "ibm,nmi-interlock" call
>>>>>>>> when the error log is consumed.
>>>>>>>>
>>>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>>>>> ---
>>>>>>>>  hw/ppc/spapr.c         |    6 ++++-
>>>>>>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  include/hw/ppc/spapr.h |    5 +++-
>>>>>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>>>> index 3d6d139..213d493 100644
>>>>>>>> --- a/hw/ppc/spapr.c
>>>>>>>> +++ b/hw/ppc/spapr.c
>>>>>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
>>>>>>>>          /* Create the error string for live migration blocker */
>>>>>>>>          error_setg(&spapr->fwnmi_migration_blocker,
>>>>>>>>                  "Live migration not supported during machine check handling");
>>>>>>>> +
>>>>>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>>>>>>>> +        spapr_fwnmi_register();
>>>>>>>>      }
>>>>>>>>  
>>>>>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>>>>>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>>>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>>>>>>>
>>>>>>> Turning this on by default really isn't ok if it stops you running TCG
>>>>>>> guests at all.
>>>>>>
>>>>>> If so this can be "off" by default until TCG is supported.
>>>>>>
>>>>>>>
>>>>>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>>>>>      smc->irq = &spapr_irq_dual;
>>>>>>>>      smc->dr_phb_enabled = true;
>>>>>>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
>>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>>>
>>>>>>> We're now well past 4.0, and in fact we're about to go into soft
>>>>>>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
>>>>>>> will need to retain the old default.
>>>>>>
>>>>>> ok.
>>>>>>
>>>>>>>
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>>>>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>>>>>> index a015a80..e010cb2 100644
>>>>>>>> --- a/hw/ppc/spapr_rtas.c
>>>>>>>> +++ b/hw/ppc/spapr_rtas.c
>>>>>>>> @@ -49,6 +49,7 @@
>>>>>>>>  #include "hw/ppc/fdt.h"
>>>>>>>>  #include "target/ppc/mmu-hash64.h"
>>>>>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>>>>>> +#include "migration/blocker.h"
>>>>>>>>  
>>>>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>>>>>                                     uint32_t token, uint32_t nargs,
>>>>>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>>>>>      rtas_st(rets, 1, 100);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>>>>> +                                  SpaprMachineState *spapr,
>>>>>>>> +                                  uint32_t token, uint32_t nargs,
>>>>>>>> +                                  target_ulong args,
>>>>>>>> +                                  uint32_t nret, target_ulong rets)
>>>>>>>> +{
>>>>>>>> +    int ret;
>>>>>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>>>>>>>> +
>>>>>>>> +    if (!rtas_addr) {
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>>>>>> +    if (ret == 1) {
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>>>
>>>>>>> I don't understand this case separate from the others.  We've already
>>>>>>> set the cap, so fwnmi support should be checked and available.
>>>>>>
>>>>>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
>>>>>> cap_ppc_fwnmi is not available in KVM.
>>>>>
>>>>> But you've checked for the presence of the extension, yes?  So a
>>>>> failure to enable the cap would be unexpected.  In which case how does
>>>>> this case differ from.. 
>>>>
>>>> No, this is the function where I check for the presence of the
>>>> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
>>>> support is available, but don't take any action if unavailable.
>>>
>>> Yeah, that's not ok.  You should be checking for the presence of the
>>> extension in the .apply() function.  If you start up with the spapr
>>> cap selected then failing at nmi-register time means something has
>>> gone badly wrong.
>>
>> So, I should check for two things in the .apply() function: first if
>> cap_ppc_fwnmi is supported and second if cap_ppc_fwnmi is enabled in KVM.
>>
>> In that case kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0) should be
>> called during spapr_machine_init().
>>
>> So, we will fail to boot (when SPAPR_CAP_FWNMI_MCE=ON) if cap_ppc_fwnmi
>> can't be enabled irrespective of whether a guest issues nmi,register or not.
>>
> 
> Yes. The idea is that we don't want to expose some feature to the guest
> if we already know it cannot work. The same stands for migration, if
> we know the destination host doesn't support the feature, we don't want
> to confuse the guest because the feature disappeared unexpectedly. The
> correct way to address that is to check the feature is supported in QEMU
> and/or KVM before we start the guest or before we accept a migration
> stream and fail early if we can't provide the feature.

ok.

> 
> Speaking of migration, kvmppc_fwnmi_enable() should be called in a
> post load callback... and the problem I see is that this is a vCPU
> ioctl. So either we add a state "I should enable FWNMI on post load"
> to the vCPU that called nmi,register, or maybe first_cpu can do the
> trick in a post load callback of vmstate_spapr_machine_check.

To summarize the changes I am planning:

If SPAPR_CAP_FWNMI_MCE=ON, then enable cap_ppc_fwnmi in KVM. If
cap_ppc_fwnmi can't be enabled, fail early. This can be done in the
spapr_machine_init() function.

During migration, if SPAPR_CAP_FWNMI_MCE=ON then enable cap_ppc_fwnmi by
having a .post_load callback which calls kvmppc_fwnmi_enable(). Fail the
migration if we can't enable cap_ppc_fwnmi on the destination host.

> 
> BTW, since FWNMI is a platform wide feature, ie. you don't need to
> enable it on a per-CPU basis), why is KVM_CAP_PPC_FWNMI a vCPU ioctl
> and not a VM ioctl ?

I think it should be a VM ioctl; let me check.

Regards,
Aravinda

> 
>>>
>>> This is necessary for migration: if you start on a system with nmi
>>> support and the guest registers for it, you can't then migrate safely
>>> to a system that doesn't have nmi support.  The way to handle that
>>> case is to have qemu fail to even start up on a destination without
>>> the support.
>>>
>>>> So this case is when we are running an old version of KVM with no
>>>> cap_ppc_fwnmi support.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> +        return;
>>>>>>>> +    } else if (ret < 0) {
>>>>>>>> +        error_report("Couldn't enable KVM FWNMI capability");
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>>>>>> +        return;
>>>>>
>>>>> ..this case.
>>>>
>>>> And this is when we have the KVM support but due to some problem with
>>>> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
>>>>
>>>>>
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>>>>>> +                                   SpaprMachineState *spapr,
>>>>>>>> +                                   uint32_t token, uint32_t nargs,
>>>>>>>> +                                   target_ulong args,
>>>>>>>> +                                   uint32_t nret, target_ulong rets)
>>>>>>>> +{
>>>>>>>> +    if (spapr->guest_machine_check_addr == -1) {
>>>>>>>> +        /* NMI register not called */
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>>>>>> +    } else {
>>>>>>>> +        /*
>>>>>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>>>>>>>> +         * hence unset mc_status.
>>>>>>>> +         */
>>>>>>>> +        spapr->mc_status = -1;
>>>>>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>>>>>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
>>>>>>>
>>>>>>> Hrm.  We add the blocker at the mce request point.  First, that's in
>>>>>>> another patch, which isn't great.  Second, does that mean we could add
>>>>>>> multiple times if we get an MCE on multiple CPUs?  Will that work and
>>>>>>> correctly match adds and removes properly?
>>>>>>
>>>>>> If it is fine to move the migration patch as the last patch in the
>>>>>> sequence, then we will have add and del blocker in the same patch.
>>>>>>
>>>>>> And yes we could add multiple times if we get MCE on multiple CPUs and
>>>>>> as all those cpus call interlock there should be matching number of
>>>>>> delete blockers.
>>>>>
>>>>> Ok, and I think adding the same pointer to the list multiple times
>>>>> will work ok.
>>>>
>>>> I think so
>>>>
>>>>>
>>>>> Btw, add_blocker() can fail - have you handled failure conditions?
>>>>
>>>> yes, I am handling it.
>>>>
>>>>>
>>>>
>>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-07-05  1:07               ` David Gibson
@ 2019-07-05 11:19                 ` Aravinda Prasad
  2019-07-05 13:23                   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  0 siblings, 1 reply; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-05 11:19 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Friday 05 July 2019 06:37 AM, David Gibson wrote:
> On Thu, Jul 04, 2019 at 10:33:11AM +0530, Aravinda Prasad wrote:
>>
>>
>> On Thursday 04 July 2019 06:37 AM, David Gibson wrote:
>>> On Wed, Jul 03, 2019 at 02:58:24PM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Wednesday 03 July 2019 08:33 AM, David Gibson wrote:
>>>>> On Tue, Jul 02, 2019 at 11:54:26AM +0530, Aravinda Prasad wrote:
>>>>>>
>>>>>>
>>>>>> On Tuesday 02 July 2019 09:21 AM, David Gibson wrote:
>>>>>>> On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
>>>>>>>> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
>>>>>>>> the KVM causes guest exit with NMI as exit reason
>>>>>>>> when it encounters a machine check exception on the
>>>>>>>> address belonging to a guest. Without this capability
>>>>>>>> enabled, KVM redirects machine check exceptions to
>>>>>>>> guest's 0x200 vector.
>>>>>>>>
>>>>>>>> This patch also introduces fwnmi-mce capability to
>>>>>>>> deal with the case when a guest with the
>>>>>>>> KVM_CAP_PPC_FWNMI capability enabled is attempted
>>>>>>>> to migrate to a host that does not support this
>>>>>>>> capability.
>>>>>>>>
>>>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>>>>> ---
>>>>>>>>  hw/ppc/spapr.c         |    1 +
>>>>>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>>>>>>>>  include/hw/ppc/spapr.h |    4 +++-
>>>>>>>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>>>>>>>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>>>>>>>>  5 files changed, 61 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>>>> index 6dd8aaa..2ef86aa 100644
>>>>>>>> --- a/hw/ppc/spapr.c
>>>>>>>> +++ b/hw/ppc/spapr.c
>>>>>>>> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>>>>>      smc->irq = &spapr_irq_dual;
>>>>>>>>      smc->dr_phb_enabled = true;
>>>>>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>>>>>> index 31b4661..2e92eb6 100644
>>>>>>>> --- a/hw/ppc/spapr_caps.c
>>>>>>>> +++ b/hw/ppc/spapr_caps.c
>>>>>>>> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>>>> +                                Error **errp)
>>>>>>>> +{
>>>>>>>> +    if (!val) {
>>>>>>>> +        return; /* Disabled by default */
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (tcg_enabled()) {
>>>>>>>> +        error_setg(errp,
>>>>>>>> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
>>>>>>>
>>>>>>> Not allowing this for TCG creates an awkward incompatibility between
>>>>>>> KVM and TCG guests.  I can't actually see any reason to ban it for TCG
>>>>>>> - with the current code TCG won't ever generate NMIs, but I don't see
>>>>>>> that anything will actually break.
>>>>>>>
>>>>>>> In fact, we do have an nmi monitor command, currently wired to the
>>>>>>> spapr_nmi() function which resets each cpu, but it probably makes
>>>>>>> sense to wire it up to the fwnmi stuff when present.
>>>>>>
>>>>>> Yes, but that nmi support is not enough to inject a synchronous error
>>>>>> into the guest kernel. For example, we should provide the faulty address
>>>>>> along with other information such as the type of error (slb multi-hit,
>>>>>> memory error, TLB multi-hit) and when the error occurred (load/store)
>>>>>> and whether the error was completely recovered or not. Without such
>>>>>> information we cannot build the error log and pass it on to the guest
>>>>>> kernel. Right now nmi monitor command takes cpu number as the only argument.
>>>>>
>>>>> Obviously we can't inject an arbitrary MCE event with that monitor
>>>>> command.  But isn't there some sort of catch-all / unknown type of MCE
>>>>> event which we could inject?
>>>>
>>>> We have "unknown" type of error, but we should also pass an address in
>>>> the MCE event log. Strictly speaking this address should be a valid
>>>> address in the current CPU context as MCEs are synchronous errors
>>>> triggered when we touch a bad address.
>>>
>>> Well, some of them are.  At least historically both synchronous and
>>> asnchronous MCEs were possible.  Are there really no versions where
>>> you can report an MCE with unknown address?
>>
>> I am not aware of any such versions. Will cross check.
>>
>>>
>>>> We can pass a default address with every nmi, but I am not sure whether
>>>> that will be practically helpful.
>>>>
>>>>> It seems very confusing to me to have 2 totally separate "nmi"
>>>>> mechanisms.
>>>>>
>>>>>> So I think TCG support should be a separate patch by itself.
>>>>>
>>>>> Even if we don't wire up the monitor command, I still don't see
>>>>> anything that this patch breaks - we can support the nmi-register and
>>>>> nmi-interlock calls without ever actually creating MCE events.
>>>>
>>>> If we support nmi-register and nmi-interlock calls without the monitor
>>>> command wire-up then we will be falsely claiming the nmi support to the
>>>> guest while it is not actually supported.
>>>
>>> How so?  AFAICT, from the point of view of the guest this is not
>>> observably different from supporting the NMI mechanism but NMIs never
>>> occurring.
>>
>> A guest inserting a duplicate SLB will expect the machine check
>> exception delivered to the handler registered via nmi,register.
>> But we actually don't do that in TCG.
> 
> Ah, true, I was thinking of external hardware fault triggered MCEs
> rather than software error ones like duplicate SLB.
> 
> That said, I strongly suspect TCG is buggy enough at present that
> exact behaviour in rare error conditions like duplicate SLB is not
> really a big problem in the scheme of things.
> 
> I really don't think we can enable this by default until we allow it
> for TCG - we don't want starting a TCG guest to involve manually
> switching other options.
> 
> We could consider allowing it for TCG but just printing a warning that
> the behaviour may not be correct in some conditions - we do something
> similar for some of the Spectre workarounds already.

I think we'd better not enable this by default until we enhance TCG to
support fwnmi.

> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-07-05 11:11                 ` Aravinda Prasad
@ 2019-07-05 12:35                   ` Greg Kurz
  0 siblings, 0 replies; 37+ messages in thread
From: Greg Kurz @ 2019-07-05 12:35 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, David Gibson

On Fri, 5 Jul 2019 16:41:25 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> 
> 
> On Friday 05 July 2019 12:07 AM, Greg Kurz wrote:
> > On Thu, 4 Jul 2019 10:49:05 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> > 
> >>
> >>
> >> On Thursday 04 July 2019 06:42 AM, David Gibson wrote:
> >>> On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
> >>>>> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
> >>>>>>
> >>>>>>
> >>>>>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
> >>>>>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
> >>>>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
> >>>>>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
> >>>>>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >>>>>>>> type 4.0.
> >>>>>>>>
> >>>>>>>> The machine check notification address is saved when the
> >>>>>>>> OS issues "ibm,nmi-register" RTAS call.
> >>>>>>>>
> >>>>>>>> This patch also handles the case when multiple processors
> >>>>>>>> experience machine check at or about the same time by
> >>>>>>>> handling "ibm,nmi-interlock" call. In such cases, as per
> >>>>>>>> PAPR, subsequent processors serialize waiting for the first
> >>>>>>>> processor to issue the "ibm,nmi-interlock" call. The second
> >>>>>>>> processor that also received a machine check error waits
> >>>>>>>> till the first processor is done reading the error log.
> >>>>>>>> The first processor issues "ibm,nmi-interlock" call
> >>>>>>>> when the error log is consumed.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>>>>>> ---
> >>>>>>>>  hw/ppc/spapr.c         |    6 ++++-
> >>>>>>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>  include/hw/ppc/spapr.h |    5 +++-
> >>>>>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>>>>> index 3d6d139..213d493 100644
> >>>>>>>> --- a/hw/ppc/spapr.c
> >>>>>>>> +++ b/hw/ppc/spapr.c
> >>>>>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
> >>>>>>>>          /* Create the error string for live migration blocker */
> >>>>>>>>          error_setg(&spapr->fwnmi_migration_blocker,
> >>>>>>>>                  "Live migration not supported during machine check handling");
> >>>>>>>> +
> >>>>>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> >>>>>>>> +        spapr_fwnmi_register();
> >>>>>>>>      }
> >>>>>>>>  
> >>>>>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>>>>>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>>>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>>>>>>
> >>>>>>> Turning this on by default really isn't ok if it stops you running TCG
> >>>>>>> guests at all.
> >>>>>>
> >>>>>> If so this can be "off" by default until TCG is supported.
> >>>>>>
> >>>>>>>
> >>>>>>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>>>>>      smc->irq = &spapr_irq_dual;
> >>>>>>>>      smc->dr_phb_enabled = true;
> >>>>>>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
> >>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>>>
> >>>>>>> We're now well past 4.0, and in fact we're about to go into soft
> >>>>>>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
> >>>>>>> will need to retain the old default.
> >>>>>>
> >>>>>> ok.
> >>>>>>
> >>>>>>>
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >>>>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>>>>>> index a015a80..e010cb2 100644
> >>>>>>>> --- a/hw/ppc/spapr_rtas.c
> >>>>>>>> +++ b/hw/ppc/spapr_rtas.c
> >>>>>>>> @@ -49,6 +49,7 @@
> >>>>>>>>  #include "hw/ppc/fdt.h"
> >>>>>>>>  #include "target/ppc/mmu-hash64.h"
> >>>>>>>>  #include "target/ppc/mmu-book3s-v3.h"
> >>>>>>>> +#include "migration/blocker.h"
> >>>>>>>>  
> >>>>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>>>>>                                     uint32_t token, uint32_t nargs,
> >>>>>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>>>>>      rtas_st(rets, 1, 100);
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>>>>> +                                  SpaprMachineState *spapr,
> >>>>>>>> +                                  uint32_t token, uint32_t nargs,
> >>>>>>>> +                                  target_ulong args,
> >>>>>>>> +                                  uint32_t nret, target_ulong rets)
> >>>>>>>> +{
> >>>>>>>> +    int ret;
> >>>>>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> >>>>>>>> +
> >>>>>>>> +    if (!rtas_addr) {
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>>>> +        return;
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>>>> +        return;
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
> >>>>>>>> +    if (ret == 1) {
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>>>
> >>>>>>> I don't understand this case separate from the others.  We've already
> >>>>>>> set the cap, so fwnmi support should be checked and available.
> >>>>>>
> >>>>>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
> >>>>>> cap_ppc_fwnmi is not available in KVM.
> >>>>>
> >>>>> But you've checked for the presence of the extension, yes?  So a
> >>>>> failure to enable the cap would be unexpected.  In which case how does
> >>>>> this case differ from.. 
> >>>>
> >>>> No, this is the function where I check for the presence of the
> >>>> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
> >>>> support is available, but don't take any action if unavailable.
> >>>
> >>> Yeah, that's not ok.  You should be checking for the presence of the
> >>> extension in the .apply() function.  If you start up with the spapr
> >>> cap selected then failing at nmi-register time means something has
> >>> gone badly wrong.
> >>
> >> So, I should check for two things in the .apply() function: first if
> >> cap_ppc_fwnmi is supported and second if cap_ppc_fwnmi is enabled in KVM.
> >>
> >> In that case kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0) should be
> >> called during spapr_machine_init().
> >>
> >> So, we will fail to boot (when SPAPR_CAP_FWNMI_MCE=ON) if cap_ppc_fwnmi
> >> can't be enabled irrespective of whether a guest issues nmi,register or not.
> >>
> > 
> > Yes. The idea is that we don't want to expose some feature to the guest
> > if we already know it cannot work. The same stands for migration, if
> > we know the destination host doesn't support the feature, we don't want
> > to confuse the guest because the feature disappeared unexpectedly. The
> > correct way to address that is to check the feature is supported in QEMU
> > and/or KVM before we start the guest or before we accept a migration
> > stream and fail early if we can't provide the feature.
> 
> ok.
> 
> > 
> > Speaking of migration, kvmppc_fwnmi_enable() should be called in a
> > post load callback... and the problem I see is that this is a vCPU
> > ioctl. So either we add a state "I should enable FWNMI on post load"
> > to the vCPU that called nmi,register, or maybe first_cpu can do the
> > trick in a post load callback of vmstate_spapr_machine_check.
> 
> To summarize the changes I am planning:
> 
> If SPAPR_CAP_FWNMI_MCE=ON, then enable cap_ppc_fwnmi in KVM. If
> cap_ppc_fwnmi can't be enabled then fail early. This can be included in
> spapr_machine_init() function.
> 
> During migration, if SPAPR_CAP_FWNMI_MCE=ON then enable cap_ppc_fwnmi by
> having a .post_load callback which calls kvmppc_fwnmi_enable(). Fail the
> migration if we can't enable cap_ppc_fwnmi on the destination host.
> 

LGTM
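
FWIW, a minimal sketch of that plan, in case it helps. Only
kvmppc_fwnmi_enable(), spapr_get_cap(), SPAPR_CAP_FWNMI_MCE and the
spapr fields come from this series; the function names and the exact
hook placement are invented for the example, and none of it is tested:

/* Fail early (at machine init / cap apply time) if the cap is ON but
 * the host KVM doesn't offer KVM_CAP_PPC_FWNMI at all. */
static void spapr_fwnmi_check(SpaprMachineState *spapr, Error **errp)
{
    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) != SPAPR_CAP_ON) {
        return;
    }
    if (kvm_enabled() &&
        !kvm_vm_check_extension(kvm_state, KVM_CAP_PPC_FWNMI)) {
        error_setg(errp, "Firmware Assisted NMI not supported by this "
                   "KVM, try cap-fwnmi-mce=off");
    }
}

/* Post-load hook (e.g. of vmstate_spapr_machine_check): re-enable the
 * KVM capability on the destination, failing the incoming migration
 * if the destination host can't provide it. */
static int fwnmi_mce_post_load(void *opaque, int version_id)
{
    SpaprMachineState *spapr = opaque;

    if (spapr->guest_machine_check_addr != -1 &&
        kvmppc_fwnmi_enable(POWERPC_CPU(first_cpu))) {
        return -EINVAL;
    }
    return 0;
}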

> > 
> > BTW, since FWNMI is a platform-wide feature (i.e. you don't need to
> > enable it on a per-CPU basis), why is KVM_CAP_PPC_FWNMI a vCPU ioctl
> > and not a VM ioctl?
> 
> I think it should be VM ioctl, let me check.
> 

It is indeed described as a VM ioctl in Documentation/virtual/kvm/api.txt
but...

static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
				     struct kvm_enable_cap *cap)
{
[...]
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
	case KVM_CAP_PPC_FWNMI:
		r = -EINVAL;
		if (!is_kvmppc_hv_enabled(vcpu->kvm))
			break;
		r = 0;
		vcpu->kvm->arch.fwnmi_enabled = true;
		break;
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
[...]
}

I guess it should rather have been added to kvm_vm_ioctl_enable_cap(),
but since this was released in 4.13, I'm afraid we'll need to support
this variant in QEMU anyway.
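
For reference, the QEMU side of that vCPU-ioctl variant could look
roughly like the sketch below. This is only an illustration of what a
helper such as kvmppc_fwnmi_enable() might do, not the actual patch,
and it is untested:

int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
{
    CPUState *cs = CPU(cpu);

    /* capability not offered by this kernel */
    if (!kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_FWNMI)) {
        return 1;
    }

    /* vCPU enable-cap path, even though the API documents a VM cap */
    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
}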

> Regards,
> Aravinda
> 
> > 
> >>>
> >>> This is necessary for migration: if you start on a system with nmi
> >>> support and the guest registers for it, you can't then migrate safely
> >>> to a system that doesn't have nmi support.  The way to handle that
> >>> case is to have qemu fail to even start up on a destination without
> >>> the support.
> >>>
> >>>> So this case is when we are running an old version of KVM with no
> >>>> cap_ppc_fwnmi support.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>> +        return;
> >>>>>>>> +    } else if (ret < 0) {
> >>>>>>>> +        error_report("Couldn't enable KVM FWNMI capability");
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >>>>>>>> +        return;
> >>>>>
> >>>>> ..this case.
> >>>>
> >>>> And this is when we have the KVM support but due to some problem with
> >>>> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
> >>>>
> >>>>>
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>>>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>>>>>>> +                                   SpaprMachineState *spapr,
> >>>>>>>> +                                   uint32_t token, uint32_t nargs,
> >>>>>>>> +                                   target_ulong args,
> >>>>>>>> +                                   uint32_t nret, target_ulong rets)
> >>>>>>>> +{
> >>>>>>>> +    if (spapr->guest_machine_check_addr == -1) {
> >>>>>>>> +        /* NMI register not called */
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>>>>>>> +    } else {
> >>>>>>>> +        /*
> >>>>>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> >>>>>>>> +         * hence unset mc_status.
> >>>>>>>> +         */
> >>>>>>>> +        spapr->mc_status = -1;
> >>>>>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> >>>>>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >>>>>>>
> >>>>>>> Hrm.  We add the blocker at the mce request point.  First, that's in
> >>>>>>> another patch, which isn't great.  Second, does that mean we could add
> >>>>>>> multiple times if we get an MCE on multiple CPUs?  Will that work and
> >>>>>>> correctly match adds and removes properly?
> >>>>>>
> >>>>>> If it is fine to move the migration patch as the last patch in the
> >>>>>> sequence, then we will have add and del blocker in the same patch.
> >>>>>>
> >>>>>> And yes we could add multiple times if we get MCE on multiple CPUs and
> >>>>>> as all those cpus call interlock there should be matching number of
> >>>>>> delete blockers.
> >>>>>
> >>>>> Ok, and I think adding the same pointer to the list multiple times
> >>>>> will work ok.
> >>>>
> >>>> I think so
> >>>>
> >>>>>
> >>>>> Btw, add_blocker() can fail - have you handled failure conditions?
> >>>>
> >>>> yes, I am handling it.
> >>>>
> >>>>>
> >>>>
> >>>
> >>
> > 
> 




* Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-07-05 11:19                 ` Aravinda Prasad
@ 2019-07-05 13:23                   ` Greg Kurz
  2019-07-08 10:26                     ` Aravinda Prasad
  0 siblings, 1 reply; 37+ messages in thread
From: Greg Kurz @ 2019-07-05 13:23 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, David Gibson

On Fri, 5 Jul 2019 16:49:17 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> 
> 
> On Friday 05 July 2019 06:37 AM, David Gibson wrote:
> > On Thu, Jul 04, 2019 at 10:33:11AM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Thursday 04 July 2019 06:37 AM, David Gibson wrote:
> >>> On Wed, Jul 03, 2019 at 02:58:24PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Wednesday 03 July 2019 08:33 AM, David Gibson wrote:
> >>>>> On Tue, Jul 02, 2019 at 11:54:26AM +0530, Aravinda Prasad wrote:
> >>>>>>
> >>>>>>
> >>>>>> On Tuesday 02 July 2019 09:21 AM, David Gibson wrote:
> >>>>>>> On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
> >>>>>>>> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
> >>>>>>>> the KVM causes guest exit with NMI as exit reason
> >>>>>>>> when it encounters a machine check exception on the
> >>>>>>>> address belonging to a guest. Without this capability
> >>>>>>>> enabled, KVM redirects machine check exceptions to
> >>>>>>>> guest's 0x200 vector.
> >>>>>>>>
> >>>>>>>> This patch also introduces fwnmi-mce capability to
> >>>>>>>> deal with the case when a guest with the
> >>>>>>>> KVM_CAP_PPC_FWNMI capability enabled is attempted
> >>>>>>>> to migrate to a host that does not support this
> >>>>>>>> capability.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>>>>>> ---
> >>>>>>>>  hw/ppc/spapr.c         |    1 +
> >>>>>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
> >>>>>>>>  include/hw/ppc/spapr.h |    4 +++-
> >>>>>>>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
> >>>>>>>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
> >>>>>>>>  5 files changed, 61 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>>>>> index 6dd8aaa..2ef86aa 100644
> >>>>>>>> --- a/hw/ppc/spapr.c
> >>>>>>>> +++ b/hw/ppc/spapr.c
> >>>>>>>> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>>>>>      smc->irq = &spapr_irq_dual;
> >>>>>>>>      smc->dr_phb_enabled = true;
> >>>>>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >>>>>>>> index 31b4661..2e92eb6 100644
> >>>>>>>> --- a/hw/ppc/spapr_caps.c
> >>>>>>>> +++ b/hw/ppc/spapr_caps.c
> >>>>>>>> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
> >>>>>>>>      }
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> >>>>>>>> +                                Error **errp)
> >>>>>>>> +{
> >>>>>>>> +    if (!val) {
> >>>>>>>> +        return; /* Disabled by default */
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +    if (tcg_enabled()) {
> >>>>>>>> +        error_setg(errp,
> >>>>>>>> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
> >>>>>>>
> >>>>>>> Not allowing this for TCG creates an awkward incompatibility between
> >>>>>>> KVM and TCG guests.  I can't actually see any reason to ban it for TCG
> >>>>>>> - with the current code TCG won't ever generate NMIs, but I don't see
> >>>>>>> that anything will actually break.
> >>>>>>>
> >>>>>>> In fact, we do have an nmi monitor command, currently wired to the
> >>>>>>> spapr_nmi() function which resets each cpu, but it probably makes
> >>>>>>> sense to wire it up to the fwnmi stuff when present.
> >>>>>>
> >>>>>> Yes, but that nmi support is not enough to inject a synchronous error
> >>>>>> into the guest kernel. For example, we should provide the faulty address
> >>>>>> along with other information such as the type of error (slb multi-hit,
> >>>>>> memory error, TLB multi-hit) and when the error occurred (load/store)
> >>>>>> and whether the error was completely recovered or not. Without such
> >>>>>> information we cannot build the error log and pass it on to the guest
> >>>>>> kernel. Right now nmi monitor command takes cpu number as the only argument.
> >>>>>
> >>>>> Obviously we can't inject an arbitrary MCE event with that monitor
> >>>>> command.  But isn't there some sort of catch-all / unknown type of MCE
> >>>>> event which we could inject?
> >>>>
> >>>> We have "unknown" type of error, but we should also pass an address in
> >>>> the MCE event log. Strictly speaking this address should be a valid
> >>>> address in the current CPU context as MCEs are synchronous errors
> >>>> triggered when we touch a bad address.
> >>>
> >>> Well, some of them are.  At least historically both synchronous and
> >>> asynchronous MCEs were possible.  Are there really no versions where
> >>> you can report an MCE with unknown address?
> >>
> >> I am not aware of any such versions. Will cross check.
> >>
> >>>
> >>>> We can pass a default address with every nmi, but I am not sure whether
> >>>> that will be practically helpful.
> >>>>
> >>>>> It seems very confusing to me to have 2 totally separate "nmi"
> >>>>> mechanisms.
> >>>>>
> >>>>>> So I think TCG support should be a separate patch by itself.
> >>>>>
> >>>>> Even if we don't wire up the monitor command, I still don't see
> >>>>> anything that this patch breaks - we can support the nmi-register and
> >>>>> nmi-interlock calls without ever actually creating MCE events.
> >>>>
> >>>> If we support nmi-register and nmi-interlock calls without the monitor
> >>>> command wire-up then we will be falsely claiming the nmi support to the
> >>>> guest while it is not actually supported.
> >>>
> >>> How so?  AFAICT, from the point of view of the guest this is not
> >>> observably different from supporting the NMI mechanism but NMIs never
> >>> occurring.
> >>
> >> A guest inserting a duplicate SLB will expect the machine check
> >> exception to be delivered to the handler registered via nmi,register.
> >> But we actually don't do that in TCG.
> > 
> > Ah, true, I was thinking of external hardware fault triggered MCEs
> > rather than software error ones like duplicate SLB.
> > 
> > That said, I strongly suspect TCG is buggy enough at present that
> > exact behaviour in rare error conditions like duplicate SLB is not
> > really a big problem in the scheme of things.
> > 
> > I really don't think we can enable this by default until we allow it
> > for TCG - we don't want starting a TCG guest to involve manually
> > switching other options.
> > 
> > We could consider allowing it for TCG but just printing a warning that
> > the behaviour may not be correct in some conditions - we do something
> > similar for some of the Spectre workarounds already.
> 
> I think we better not enable this by default until we enhance TCG to
> support fwnmi.
> 

If we ever enhance TCG... until this gets done, I concur with David's
idea of just printing a warning. System emulation+TCG is more a CI
or developer thing: we just want FWNMI not to break anything, even
if it doesn't work. KVM is the real life scenario we want to support.
If the feature is valuable, and I think it is, it should be the
default otherwise fewer people will have a chance to take benefit
from it.
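
Something along those lines in cap_fwnmi_mce_apply() would probably be
enough. Sketch only, untested, loosely following the warn-instead-of-fail
approach David mentions for the Spectre caps:

static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
                                Error **errp)
{
    if (!val) {
        return; /* Disabled by default */
    }

    if (tcg_enabled()) {
        /* Accept the cap, but be honest about what TCG delivers. */
        warn_report("Firmware Assisted NMI is not fully emulated under "
                    "TCG; machine check errors will not reach the "
                    "registered handler");
    } else if (kvm_enabled() &&
               !kvm_vm_check_extension(kvm_state, KVM_CAP_PPC_FWNMI)) {
        error_setg(errp, "Firmware Assisted NMI not supported by this "
                   "KVM, try cap-fwnmi-mce=off");
    }
}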

> > 
> 




* Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability
  2019-07-05 13:23                   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-07-08 10:26                     ` Aravinda Prasad
  0 siblings, 0 replies; 37+ messages in thread
From: Aravinda Prasad @ 2019-07-08 10:26 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, David Gibson



On Friday 05 July 2019 06:53 PM, Greg Kurz wrote:
> On Fri, 5 Jul 2019 16:49:17 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>>
>>
>> On Friday 05 July 2019 06:37 AM, David Gibson wrote:
>>> On Thu, Jul 04, 2019 at 10:33:11AM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Thursday 04 July 2019 06:37 AM, David Gibson wrote:
>>>>> On Wed, Jul 03, 2019 at 02:58:24PM +0530, Aravinda Prasad wrote:
>>>>>>
>>>>>>
>>>>>> On Wednesday 03 July 2019 08:33 AM, David Gibson wrote:
>>>>>>> On Tue, Jul 02, 2019 at 11:54:26AM +0530, Aravinda Prasad wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tuesday 02 July 2019 09:21 AM, David Gibson wrote:
>>>>>>>>> On Wed, Jun 12, 2019 at 02:51:04PM +0530, Aravinda Prasad wrote:
>>>>>>>>>> Introduce the KVM capability KVM_CAP_PPC_FWNMI so that
>>>>>>>>>> the KVM causes guest exit with NMI as exit reason
>>>>>>>>>> when it encounters a machine check exception on the
>>>>>>>>>> address belonging to a guest. Without this capability
>>>>>>>>>> enabled, KVM redirects machine check exceptions to
>>>>>>>>>> guest's 0x200 vector.
>>>>>>>>>>
>>>>>>>>>> This patch also introduces fwnmi-mce capability to
>>>>>>>>>> deal with the case when a guest with the
>>>>>>>>>> KVM_CAP_PPC_FWNMI capability enabled is attempted
>>>>>>>>>> to migrate to a host that does not support this
>>>>>>>>>> capability.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>>>>>>> ---
>>>>>>>>>>  hw/ppc/spapr.c         |    1 +
>>>>>>>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>>>>>>>>>>  include/hw/ppc/spapr.h |    4 +++-
>>>>>>>>>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>>>>>>>>>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>>>>>>>>>>  5 files changed, 61 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>>>>>> index 6dd8aaa..2ef86aa 100644
>>>>>>>>>> --- a/hw/ppc/spapr.c
>>>>>>>>>> +++ b/hw/ppc/spapr.c
>>>>>>>>>> @@ -4360,6 +4360,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>>>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>>>>>>>      smc->irq = &spapr_irq_dual;
>>>>>>>>>>      smc->dr_phb_enabled = true;
>>>>>>>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>>>>>>>> index 31b4661..2e92eb6 100644
>>>>>>>>>> --- a/hw/ppc/spapr_caps.c
>>>>>>>>>> +++ b/hw/ppc/spapr_caps.c
>>>>>>>>>> @@ -479,6 +479,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>>>>>>      }
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>>>>>> +                                Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    if (!val) {
>>>>>>>>>> +        return; /* Disabled by default */
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    if (tcg_enabled()) {
>>>>>>>>>> +        error_setg(errp,
>>>>>>>>>> +"No Firmware Assisted Non-Maskable Interrupts support in TCG, try cap-fwnmi-mce=off");
>>>>>>>>>
>>>>>>>>> Not allowing this for TCG creates an awkward incompatibility between
>>>>>>>>> KVM and TCG guests.  I can't actually see any reason to ban it for TCG
>>>>>>>>> - with the current code TCG won't ever generate NMIs, but I don't see
>>>>>>>>> that anything will actually break.
>>>>>>>>>
>>>>>>>>> In fact, we do have an nmi monitor command, currently wired to the
>>>>>>>>> spapr_nmi() function which resets each cpu, but it probably makes
>>>>>>>>> sense to wire it up to the fwnmi stuff when present.
>>>>>>>>
>>>>>>>> Yes, but that nmi support is not enough to inject a synchronous error
>>>>>>>> into the guest kernel. For example, we should provide the faulty address
>>>>>>>> along with other information such as the type of error (slb multi-hit,
>>>>>>>> memory error, TLB multi-hit) and when the error occurred (load/store)
>>>>>>>> and whether the error was completely recovered or not. Without such
>>>>>>>> information we cannot build the error log and pass it on to the guest
>>>>>>>> kernel. Right now nmi monitor command takes cpu number as the only argument.
>>>>>>>
>>>>>>> Obviously we can't inject an arbitrary MCE event with that monitor
>>>>>>> command.  But isn't there some sort of catch-all / unknown type of MCE
>>>>>>> event which we could inject?
>>>>>>
>>>>>> We have "unknown" type of error, but we should also pass an address in
>>>>>> the MCE event log. Strictly speaking this address should be a valid
>>>>>> address in the current CPU context as MCEs are synchronous errors
>>>>>> triggered when we touch a bad address.
>>>>>
>>>>> Well, some of them are.  At least historically both synchronous and
>>>>> asynchronous MCEs were possible.  Are there really no versions where
>>>>> you can report an MCE with unknown address?
>>>>
>>>> I am not aware of any such versions. Will cross check.
>>>>
>>>>>
>>>>>> We can pass a default address with every nmi, but I am not sure whether
>>>>>> that will be practically helpful.
>>>>>>
>>>>>>> It seems very confusing to me to have 2 totally separate "nmi"
>>>>>>> mechanisms.
>>>>>>>
>>>>>>>> So I think TCG support should be a separate patch by itself.
>>>>>>>
>>>>>>> Even if we don't wire up the monitor command, I still don't see
>>>>>>> anything that this patch breaks - we can support the nmi-register and
>>>>>>> nmi-interlock calls without ever actually creating MCE events.
>>>>>>
>>>>>> If we support nmi-register and nmi-interlock calls without the monitor
>>>>>> command wire-up then we will be falsely claiming the nmi support to the
>>>>>> guest while it is not actually supported.
>>>>>
>>>>> How so?  AFAICT, from the point of view of the guest this is not
>>>>> observably different from supporting the NMI mechanism but NMIs never
>>>>> occurring.
>>>>
>>>> A guest inserting a duplicate SLB will expect the machine check
>>>> exception to be delivered to the handler registered via nmi,register.
>>>> But we actually don't do that in TCG.
>>>
>>> Ah, true, I was thinking of external hardware fault triggered MCEs
>>> rather than software error ones like duplicate SLB.
>>>
>>> That said, I strongly suspect TCG is buggy enough at present that
>>> exact behaviour in rare error conditions like duplicate SLB is not
>>> really a big problem in the scheme of things.
>>>
>>> I really don't think we can enable this by default until we allow it
>>> for TCG - we don't want starting a TCG guest to involve manually
>>> switching other options.
>>>
>>> We could consider allowing it for TCG but just printing a warning that
>>> the behaviour may not be correct in some conditions - we do something
>>> similar for some of the Spectre workarounds already.
>>
>> I think we better not enable this by default until we enhance TCG to
>> support fwnmi.
>>
> 
> If we ever enhance TCG... until this gets done, I concur with David's
> idea of just printing a warning. System emulation+TCG is more a CI
> or developer thing: we just want FWNMI not to break anything, even
> if it doesn't work. KVM is the real life scenario we want to support.
> If the feature is valuable, and I think it is, it should be the
> default otherwise fewer people will have a chance to take benefit
> from it.

ok.

> 
>>>
>>
> 

-- 
Regards,
Aravinda




* Re: [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
  2019-07-04  5:19             ` Aravinda Prasad
  2019-07-04 18:37               ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-07-09  6:41               ` David Gibson
  1 sibling, 0 replies; 37+ messages in thread
From: David Gibson @ 2019-07-09  6:41 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

On Thu, Jul 04, 2019 at 10:49:05AM +0530, Aravinda Prasad wrote:
> 
> 
> On Thursday 04 July 2019 06:42 AM, David Gibson wrote:
> > On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
> >>> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
> >>>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
> >>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
> >>>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
> >>>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >>>>>> type 4.0.
> >>>>>>
> >>>>>> The machine check notification address is saved when the
> >>>>>> OS issues "ibm,nmi-register" RTAS call.
> >>>>>>
> >>>>>> This patch also handles the case when multiple processors
> >>>>>> experience machine check at or about the same time by
> >>>>>> handling "ibm,nmi-interlock" call. In such cases, as per
> >>>>>> PAPR, subsequent processors serialize waiting for the first
> >>>>>> processor to issue the "ibm,nmi-interlock" call. The second
> >>>>>> processor that also received a machine check error waits
> >>>>>> till the first processor is done reading the error log.
> >>>>>> The first processor issues "ibm,nmi-interlock" call
> >>>>>> when the error log is consumed.
> >>>>>>
> >>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>>>> ---
> >>>>>>  hw/ppc/spapr.c         |    6 ++++-
> >>>>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>  include/hw/ppc/spapr.h |    5 +++-
> >>>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>>> index 3d6d139..213d493 100644
> >>>>>> --- a/hw/ppc/spapr.c
> >>>>>> +++ b/hw/ppc/spapr.c
> >>>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
> >>>>>>          /* Create the error string for live migration blocker */
> >>>>>>          error_setg(&spapr->fwnmi_migration_blocker,
> >>>>>>                  "Live migration not supported during machine check handling");
> >>>>>> +
> >>>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
> >>>>>> +        spapr_fwnmi_register();
> >>>>>>      }
> >>>>>>  
> >>>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>>>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>>>>
> >>>>> Turning this on by default really isn't ok if it stops you running TCG
> >>>>> guests at all.
> >>>>
> >>>> If so this can be "off" by default until TCG is supported.
> >>>>
> >>>>>
> >>>>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>>>      smc->irq = &spapr_irq_dual;
> >>>>>>      smc->dr_phb_enabled = true;
> >>>>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
> >>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>
> >>>>> We're now well past 4.0, and in fact we're about to go into soft
> >>>>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
> >>>>> will need to retain the old default.
> >>>>
> >>>> ok.
> >>>>
> >>>>>
> >>>>>>  }
> >>>>>>  
> >>>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>>>> index a015a80..e010cb2 100644
> >>>>>> --- a/hw/ppc/spapr_rtas.c
> >>>>>> +++ b/hw/ppc/spapr_rtas.c
> >>>>>> @@ -49,6 +49,7 @@
> >>>>>>  #include "hw/ppc/fdt.h"
> >>>>>>  #include "target/ppc/mmu-hash64.h"
> >>>>>>  #include "target/ppc/mmu-book3s-v3.h"
> >>>>>> +#include "migration/blocker.h"
> >>>>>>  
> >>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>>>                                     uint32_t token, uint32_t nargs,
> >>>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>>>      rtas_st(rets, 1, 100);
> >>>>>>  }
> >>>>>>  
> >>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>>> +                                  SpaprMachineState *spapr,
> >>>>>> +                                  uint32_t token, uint32_t nargs,
> >>>>>> +                                  target_ulong args,
> >>>>>> +                                  uint32_t nret, target_ulong rets)
> >>>>>> +{
> >>>>>> +    int ret;
> >>>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> >>>>>> +
> >>>>>> +    if (!rtas_addr) {
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>> +        return;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>> +        return;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
> >>>>>> +    if (ret == 1) {
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>
> >>>>> I don't understand this case separate from the others.  We've already
> >>>>> set the cap, so fwnmi support should be checked and available.
> >>>>
> >>>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
> >>>> cap_ppc_fwnmi is not available in KVM.
> >>>
> >>> But you've checked for the presence of the extension, yes?  So a
> >>> failure to enable the cap would be unexpected.  In which case how does
> >>> this case differ from.. 
> >>
> >> No, this is the function where I check for the presence of the
> >> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
> >> support is available, but don't take any action if unavailable.
> > 
> > Yeah, that's not ok.  You should be checking for the presence of the
> > extension in the .apply() function.  If you start up with the spapr
> > cap selected then failing at nmi-register time means something has
> > gone badly wrong.
> 
> So, I should check for two things in the .apply() function: first if
> cap_ppc_fwnmi is supported and second if cap_ppc_fwnmi is enabled in KVM.

Not exactly.  Checking that the extension is supported means you *can*
enable it in KVM, but you should not do so at .apply() time (or NMI
behaviour won't be correct until nmi-register is called IIUC).  It
does mean that when you do enable the cap, a "not supported" failure
means something is wrong with the kernel.
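
Concretely, once the availability check lives in the cap's .apply()
hook, the RTAS handler posted in this patch could shrink to something
like the sketch below (untested, only meant to show the shape):

static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
                                  SpaprMachineState *spapr,
                                  uint32_t token, uint32_t nargs,
                                  target_ulong args,
                                  uint32_t nret, target_ulong rets)
{
    hwaddr rtas_addr = spapr_get_rtas_addr();

    if (!rtas_addr ||
        spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
        return;
    }

    if (kvm_enabled() && kvmppc_fwnmi_enable(cpu)) {
        /* The extension was already verified at cap-apply time, so a
         * failure here really is unexpected. */
        error_report("Couldn't enable KVM FWNMI capability");
        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
        return;
    }

    spapr->guest_machine_check_addr = rtas_ld(args, 1);
    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
}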

> In that case kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0) should be
> called during spapr_machine_init().
> 
> So, we will fail to boot (when SPAPR_CAP_FWNMI_MCE=ON) if cap_ppc_fwnmi
> can't be enabled irrespective of whether a guest issues nmi,register or not.
> 
> > 
> > This is necessary for migration: if you start on a system with nmi
> > support and the guest registers for it, you can't then migrate safely
> > to a system that doesn't have nmi support.  The way to handle that
> > case is to have qemu fail to even start up on a destination without
> > the support.
> > 
> >> So this case is when we are running an old version of KVM with no
> >> cap_ppc_fwnmi support.
> >>
> >>>
> >>>>
> >>>>>
> >>>>>> +        return;
> >>>>>> +    } else if (ret < 0) {
> >>>>>> +        error_report("Couldn't enable KVM FWNMI capability");
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >>>>>> +        return;
> >>>
> >>> ..this case.
> >>
> >> And this is when we have the KVM support but due to some problem with
> >> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
> >>
> >>>
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>>> +}
> >>>>>> +
> >>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>>>>> +                                   SpaprMachineState *spapr,
> >>>>>> +                                   uint32_t token, uint32_t nargs,
> >>>>>> +                                   target_ulong args,
> >>>>>> +                                   uint32_t nret, target_ulong rets)
> >>>>>> +{
> >>>>>> +    if (spapr->guest_machine_check_addr == -1) {
> >>>>>> +        /* NMI register not called */
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>>>>> +    } else {
> >>>>>> +        /*
> >>>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> >>>>>> +         * hence unset mc_status.
> >>>>>> +         */
> >>>>>> +        spapr->mc_status = -1;
> >>>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> >>>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >>>>>
> >>>>> Hrm.  We add the blocker at the mce request point.  First, that's in
> >>>>> another patch, which isn't great.  Second, does that mean we could add
> >>>>> multiple times if we get an MCE on multiple CPUs?  Will that work and
> >>>>> correctly match adds and removes properly?
> >>>>
> >>>> If it is fine to move the migration patch as the last patch in the
> >>>> sequence, then we will have add and del blocker in the same patch.
> >>>>
> >>>> And yes we could add multiple times if we get MCE on multiple CPUs and
> >>>> as all those cpus call interlock there should be matching number of
> >>>> delete blockers.
> >>>
> >>> Ok, and I think adding the same pointer to the list multiple times
> >>> will work ok.
> >>
> >> I think so
> >>
> >>>
> >>> Btw, add_blocker() can fail - have you handled failure conditions?
> >>
> >> yes, I am handling it.
> >>
> >>>
> >>
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


Thread overview: 37+ messages
2019-06-12  9:20 [Qemu-devel] [PATCH v10 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
2019-06-12  9:20 ` [Qemu-devel] [PATCH v10 1/6] Wrapper function to wait on condition for the main loop mutex Aravinda Prasad
2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 2/6] ppc: spapr: Introduce FWNMI capability Aravinda Prasad
2019-07-02  3:51   ` David Gibson
2019-07-02  6:24     ` Aravinda Prasad
2019-07-03  3:03       ` David Gibson
2019-07-03  9:28         ` Aravinda Prasad
2019-07-04  1:07           ` David Gibson
2019-07-04  5:03             ` Aravinda Prasad
2019-07-05  1:07               ` David Gibson
2019-07-05 11:19                 ` Aravinda Prasad
2019-07-05 13:23                   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-07-08 10:26                     ` Aravinda Prasad
2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
2019-07-02  3:52   ` David Gibson
2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 4/6] target/ppc: Build rtas error log upon an MCE Aravinda Prasad
2019-07-02  4:03   ` David Gibson
2019-07-02  9:49     ` Aravinda Prasad
2019-07-03  3:07       ` David Gibson
2019-07-03  7:16         ` Aravinda Prasad
2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 5/6] migration: Include migration support for machine check handling Aravinda Prasad
2019-06-12  9:21 ` [Qemu-devel] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls Aravinda Prasad
2019-06-24 14:29   ` Greg Kurz
2019-06-25  6:16     ` Aravinda Prasad
2019-06-25  7:00       ` Greg Kurz
2019-06-26  5:13         ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
2019-07-02  4:12           ` David Gibson
2019-07-02  4:11   ` [Qemu-devel] " David Gibson
2019-07-02 10:40     ` Aravinda Prasad
2019-07-03  3:20       ` David Gibson
2019-07-03  9:00         ` Aravinda Prasad
2019-07-04  1:12           ` David Gibson
2019-07-04  5:19             ` Aravinda Prasad
2019-07-04 18:37               ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-07-05 11:11                 ` Aravinda Prasad
2019-07-05 12:35                   ` Greg Kurz
2019-07-09  6:41               ` [Qemu-devel] " David Gibson
