[Qemu-devel] [PATCH v9 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v9 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests
@ 2019-05-29  5:40 Aravinda Prasad
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
                   ` (5 more replies)
  0 siblings, 6 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-05-29  5:40 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

This patch set adds support for FWNMI in PowerKVM guests.

System errors such as SLB multihit and memory errors
that cannot be corrected by hardware is passed on to
the kernel for handling by raising machine check
exception (an NMI). Upon such machine check exceptions,
if the address in error belongs to guest then KVM
invokes guests' 0x200 interrupt vector if the guest
is not FWNMI capable. For FWNMI capable guest
KVM passes the control to QEMU by exiting the guest.

This patch series adds functionality to QEMU to pass
on such machine check exceptions to the FWNMI capable
guest kernel by building an error log and invoking
the guest registered machine check handling routine.

The KVM changes are now part of the upstream kernel
(commit e20bbd3d). This series contain QEMU changes.

Change Log v9:
  - Fixed kvm cap and spapr cap issues

Change Log v8:
  - Added functionality to check FWNMI capability during
    VM migration

Change Log v7:
  - Rebased to 4.1

Change Log v6:
  - Fetches rtas_addr from fdt
  - Handles all error conditions (earlier it was only UEs)

Change Log v5:
  - Handled VM migrations by including rtas_addr in VMSTATE.
  - Migration is blocked while a machine check is in progress.

---

Aravinda Prasad (6):
      ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
      Wrapper function to wait on condition for the main loop mutex
      target/ppc: Handle NMI guest exit
      target/ppc: Build rtas error log upon an MCE
      ppc: spapr: Enable FWNMI capability
      migration: Include migration support for machine check handling


 cpus.c                   |    5 +
 hw/ppc/spapr.c           |   34 ++++++
 hw/ppc/spapr_caps.c      |   24 ++++
 hw/ppc/spapr_events.c    |  276 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c      |   92 +++++++++++++++
 include/hw/ppc/spapr.h   |   25 ++++
 include/qemu/main-loop.h |    8 +
 target/ppc/kvm.c         |   35 ++++++
 target/ppc/kvm_ppc.h     |   14 ++
 target/ppc/trace-events  |    1 
 10 files changed, 512 insertions(+), 2 deletions(-)

--
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-05-29  5:40 [Qemu-devel] [PATCH v9 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
@ 2019-05-29  5:40 ` Aravinda Prasad
  2019-06-03 10:12   ` Greg Kurz
  2019-06-06  1:34   ` [Qemu-devel] " David Gibson
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 2/6] Wrapper function to wait on condition for the main loop mutex Aravinda Prasad
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-05-29  5:40 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

This patch adds support in QEMU to handle "ibm,nmi-register"
and "ibm,nmi-interlock" RTAS calls.

The machine check notification address is saved when the
OS issues "ibm,nmi-register" RTAS call.

This patch also handles the case when multiple processors
experience machine check at or about the same time by
handling "ibm,nmi-interlock" call. In such cases, as per
PAPR, subsequent processors serialize waiting for the first
processor to issue the "ibm,nmi-interlock" call. The second
processor that also received a machine check error waits
till the first processor is done reading the error log.
The first processor issues "ibm,nmi-interlock" call
when the error log is consumed. This patch implements the
releasing part of the error-log while subsequent patch
(which builds error log) handles the locking part.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c         |    7 +++++
 hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    9 ++++++-
 3 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e2b33e5..fae28a9 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
     first_ppc_cpu->env.gpr[5] = 0;
 
     spapr->cas_reboot = false;
+
+    spapr->guest_machine_check_addr = -1;
+
+    /* Signal all vCPUs waiting on this condition */
+    qemu_cond_broadcast(&spapr->mc_delivery_cond);
 }
 
 static void spapr_create_nvram(SpaprMachineState *spapr)
@@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
 
         kvmppc_spapr_enable_inkernel_multitce();
     }
+
+    qemu_cond_init(&spapr->mc_delivery_cond);
 }
 
 static int spapr_kvm_type(MachineState *machine, const char *vm_type)
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 5bc1a93..e7509cf 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
     rtas_st(rets, 1, 100);
 }
 
+static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
+                                  SpaprMachineState *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args,
+                                  uint32_t nret, target_ulong rets)
+{
+    hwaddr rtas_addr = spapr_get_rtas_addr();
+
+    if (!rtas_addr) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    spapr->guest_machine_check_addr = rtas_ld(args, 1);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
+static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
+                                   SpaprMachineState *spapr,
+                                   uint32_t token, uint32_t nargs,
+                                   target_ulong args,
+                                   uint32_t nret, target_ulong rets)
+{
+    if (spapr->guest_machine_check_addr == -1) {
+        /* NMI register not called */
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+    } else {
+        qemu_cond_signal(&spapr->mc_delivery_cond);
+        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    }
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
     }
 }
 
+hwaddr spapr_get_rtas_addr(void)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    int rtas_node;
+    const struct fdt_property *rtas_addr_prop;
+    void *fdt = spapr->fdt_blob;
+    uint32_t rtas_addr;
+
+    /* fetch rtas addr from fdt */
+    rtas_node = fdt_path_offset(fdt, "/rtas");
+    if (rtas_node == 0) {
+        return 0;
+    }
+
+    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
+    if (!rtas_addr_prop) {
+        return 0;
+    }
+
+    /*
+     * We assume that the OS called RTAS instantiate-rtas, but some other
+     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
+     * as SLOF only supports 32-bit variant.
+     */
+    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
+    return (hwaddr)rtas_addr;
+}
+
+
 static void core_rtas_register_types(void)
 {
     spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
@@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
                         rtas_set_power_level);
     spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
                         rtas_get_power_level);
+    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
+                        rtas_ibm_nmi_register);
+    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
+                        rtas_ibm_nmi_interlock);
 }
 
 type_init(core_rtas_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4f5becf..9dc5e30 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -188,6 +188,10 @@ struct SpaprMachineState {
      * occurs during the unplug process. */
     QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
 
+    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
+    target_ulong guest_machine_check_addr;
+    QemuCond mc_delivery_cond;
+
     /*< public >*/
     char *kvm_type;
     char *host_model;
@@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
 #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
 #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
 #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
+#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
+#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
 
-#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
+#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
 
 /* RTAS ibm,get-system-parameter token values */
 #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
@@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
+uint64_t spapr_get_rtas_addr(void);
 #endif /* HW_SPAPR_H */



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH v9 2/6] Wrapper function to wait on condition for the main loop mutex
  2019-05-29  5:40 [Qemu-devel] [PATCH v9 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
@ 2019-05-29  5:40 ` Aravinda Prasad
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-05-29  5:40 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

Introduce a wrapper function to wait on condition for
the main loop mutex. This function atomically releases
the main loop mutex and causes the calling thread to
block on the condition. This wrapper is required because
qemu_global_mutex is a static variable.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 cpus.c                   |    5 +++++
 include/qemu/main-loop.h |    8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/cpus.c b/cpus.c
index ffc5711..14cb24b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1867,6 +1867,11 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
+void qemu_cond_wait_iothread(QemuCond *cond)
+{
+    qemu_cond_wait(cond, &qemu_global_mutex);
+}
+
 static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index f6ba78e..a6d20b0 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -295,6 +295,14 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
  */
 void qemu_mutex_unlock_iothread(void);
 
+/*
+ * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ *
+ * This function atomically releases the main loop mutex and causes
+ * the calling thread to block on the condition.
+ */
+void qemu_cond_wait_iothread(QemuCond *cond);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH v9 3/6] target/ppc: Handle NMI guest exit
  2019-05-29  5:40 [Qemu-devel] [PATCH v9 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 2/6] Wrapper function to wait on condition for the main loop mutex Aravinda Prasad
@ 2019-05-29  5:40 ` Aravinda Prasad
  2019-06-03 11:53   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2019-06-06  1:43   ` [Qemu-devel] " David Gibson
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE Aravinda Prasad
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-05-29  5:40 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

Memory error such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to guest then
the guest kernel is responsible for taking suitable action.
Patch [1] enhances KVM to exit guest with exit reason
set to KVM_EXIT_NMI in such cases. This patch handles
KVM_EXIT_NMI exit.

[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
    (e20bbd3d and related commits)

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c          |    1 +
 hw/ppc/spapr_events.c   |   23 +++++++++++++++++++++++
 hw/ppc/spapr_rtas.c     |    5 +++++
 include/hw/ppc/spapr.h  |    6 ++++++
 target/ppc/kvm.c        |   16 ++++++++++++++++
 target/ppc/kvm_ppc.h    |    2 ++
 target/ppc/trace-events |    1 +
 7 files changed, 54 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index fae28a9..6b6c962 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1809,6 +1809,7 @@ static void spapr_machine_reset(void)
 
     spapr->cas_reboot = false;
 
+    spapr->mc_status = -1;
     spapr->guest_machine_check_addr = -1;
 
     /* Signal all vCPUs waiting on this condition */
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index ae0f093..a18446b 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -620,6 +620,29 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+
+    while (spapr->mc_status != -1) {
+        /*
+         * Check whether the same CPU got machine check error
+         * while still handling the mc error (i.e., before
+         * that CPU called "ibm,nmi-interlock")
+         */
+        if (spapr->mc_status == cpu->vcpu_id) {
+            qemu_system_guest_panicked(NULL);
+            return;
+        }
+        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
+        /* Meanwhile if the system is reset, then just return */
+        if (spapr->guest_machine_check_addr == -1) {
+            return;
+        }
+    }
+    spapr->mc_status = cpu->vcpu_id;
+}
+
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
                             uint32_t token, uint32_t nargs,
                             target_ulong args,
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index e7509cf..e0bdfc8 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -379,6 +379,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
         /* NMI register not called */
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
     } else {
+        /*
+         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
+         * hence unset mc_status.
+         */
+        spapr->mc_status = -1;
         qemu_cond_signal(&spapr->mc_delivery_cond);
         rtas_st(rets, 0, RTAS_OUT_SUCCESS);
     }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9dc5e30..fc3a776 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -190,6 +190,11 @@ struct SpaprMachineState {
 
     /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
     target_ulong guest_machine_check_addr;
+    /*
+     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
+     * handling the mc.
+     */
+    int mc_status;
     QemuCond mc_delivery_cond;
 
     /*< public >*/
@@ -793,6 +798,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 3bf0a46..39f1a73 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -1761,6 +1761,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ret = 0;
         break;
 
+    case KVM_EXIT_NMI:
+        trace_kvm_handle_nmi_exception();
+        ret = kvm_handle_nmi(cpu, run);
+        break;
+
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
@@ -2844,6 +2849,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
     return data & 0xffff;
 }
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
+{
+    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
+
+    cpu_synchronize_state(CPU(cpu));
+
+    spapr_mce_req_event(cpu, recovered);
+
+    return 0;
+}
+
 int kvmppc_enable_hwrng(void)
 {
     if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 45776ca..18693f1 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -81,6 +81,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
 void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
 void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
+
 #else
 
 static inline uint32_t kvmppc_get_tbfreq(void)
diff --git a/target/ppc/trace-events b/target/ppc/trace-events
index 3dc6740..6d15aa9 100644
--- a/target/ppc/trace-events
+++ b/target/ppc/trace-events
@@ -28,3 +28,4 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
 kvm_handle_epr(void) "handle epr"
 kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
 kvm_handle_debug_exception(void) "handle debug exception"
+kvm_handle_nmi_exception(void) "handle NMI exception"



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-29  5:40 [Qemu-devel] [PATCH v9 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
                   ` (2 preceding siblings ...)
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
@ 2019-05-29  5:40 ` Aravinda Prasad
  2019-06-03 14:00   ` Greg Kurz
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability Aravinda Prasad
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling Aravinda Prasad
  5 siblings, 1 reply; 44+ messages in thread
From: Aravinda Prasad @ 2019-05-29  5:40 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

Upon a machine check exception (MCE) in a guest address space,
KVM causes a guest exit to enable QEMU to build and pass the
error to the guest in the PAPR defined rtas error log format.

This patch builds the rtas error log, copies it to the rtas_addr
and then invokes the guest registered machine check handler. The
handler in the guest takes suitable action(s) depending on the type
and criticality of the error. For example, if an error is
unrecoverable memory corruption in an application inside the
guest, then the guest kernel sends a SIGBUS to the application.
For recoverable errors, the guest performs recovery actions and
logs the error.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    5 +
 hw/ppc/spapr_events.c  |  236 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    4 +
 3 files changed, 245 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6b6c962..c97f6a6 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2910,6 +2910,11 @@ static void spapr_machine_init(MachineState *machine)
         error_report("Could not get size of LPAR rtas '%s'", filename);
         exit(1);
     }
+
+    /* Resize blob to accommodate error log. */
+    g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
+    spapr->rtas_size = RTAS_ERROR_LOG_MAX;
+
     spapr->rtas_blob = g_malloc(spapr->rtas_size);
     if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
         error_report("Could not load LPAR rtas '%s'", filename);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index a18446b..573c0b7 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -212,6 +212,106 @@ struct hp_extended_log {
     struct rtas_event_log_v6_hp hp;
 } QEMU_PACKED;
 
+struct rtas_event_log_v6_mc {
+#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
+    struct rtas_event_log_v6_section_header hdr;
+    uint32_t fru_id;
+    uint32_t proc_id;
+    uint8_t error_type;
+#define RTAS_LOG_V6_MC_TYPE_UE                           0
+#define RTAS_LOG_V6_MC_TYPE_SLB                          1
+#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
+#define RTAS_LOG_V6_MC_TYPE_TLB                          4
+#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
+#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
+    uint8_t sub_err_type;
+#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
+#define RTAS_LOG_V6_MC_UE_IFETCH                         1
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
+#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
+#define RTAS_LOG_V6_MC_SLB_PARITY                        0
+#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
+#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
+#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
+#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
+#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
+#define RTAS_LOG_V6_MC_TLB_PARITY                        1
+#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
+#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
+    uint8_t reserved_1[6];
+    uint64_t effective_address;
+    uint64_t logical_address;
+} QEMU_PACKED;
+
+struct mc_extended_log {
+    struct rtas_event_log_v6 v6hdr;
+    struct rtas_event_log_v6_mc mc;
+} QEMU_PACKED;
+
+struct MC_ierror_table {
+    unsigned long srr1_mask;
+    unsigned long srr1_value;
+    bool nip_valid; /* nip is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_ierror_table mc_ierror_table[] = {
+{ 0x00000000081c0000, 0x0000000000040000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000080000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x00000000000c0000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000100000, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000140000, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000180000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0, 0, 0, 0, 0, 0 } };
+
+struct MC_derror_table {
+    unsigned long dsisr_value;
+    bool dar_valid; /* dar is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_derror_table mc_derror_table[] = {
+{ 0x00008000, false,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00004000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000800, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000400, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000080, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000100, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0, false, 0, 0, 0, 0 } };
+
+#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
+
 typedef enum EventClass {
     EVENT_CLASS_INTERNAL_ERRORS     = 0,
     EVENT_CLASS_EPOW                = 1,
@@ -620,6 +720,138 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
+static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
+                                        struct mc_extended_log *ext_elog)
+{
+    int i;
+    CPUPPCState *env = &cpu->env;
+    uint32_t summary;
+    uint64_t dsisr = env->spr[SPR_DSISR];
+
+    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
+    if (recovered) {
+        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
+    } else {
+        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
+    }
+
+    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
+        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
+            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_derror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
+            if (mc_derror_table[i].dar_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
+            }
+
+            summary |= mc_derror_table[i].initiator
+                        | mc_derror_table[i].severity;
+
+            return summary;
+        }
+    } else {
+        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
+            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
+                    mc_ierror_table[i].srr1_value) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
+            if (mc_ierror_table[i].nip_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
+            }
+
+            summary |= mc_ierror_table[i].initiator
+                        | mc_ierror_table[i].severity;
+
+            return summary;
+        }
+    }
+
+    summary |= RTAS_LOG_INITIATOR_CPU;
+    return summary;
+}
+
+static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    CPUState *cs = CPU(cpu);
+    uint64_t rtas_addr;
+    CPUPPCState *env = &cpu->env;
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+    target_ulong r3, msr = 0;
+    struct rtas_error_log log;
+    struct mc_extended_log *ext_elog;
+    uint32_t summary;
+
+    /*
+     * Properly set bits in MSR before we invoke the handler.
+     * SRR0/1, DAR and DSISR are properly set by KVM
+     */
+    if (!(*pcc->interrupts_big_endian)(cpu)) {
+        msr |= (1ULL << MSR_LE);
+    }
+
+    if (env->msr & (1ULL << MSR_SF)) {
+        msr |= (1ULL << MSR_SF);
+    }
+
+    msr |= (1ULL << MSR_ME);
+
+    if (spapr->guest_machine_check_addr == -1) {
+        /*
+         * This implies that we have hit a machine check between system
+         * reset and "ibm,nmi-register". Fall back to the old machine
+         * check behavior in such cases.
+         */
+        env->spr[SPR_SRR0] = env->nip;
+        env->spr[SPR_SRR1] = env->msr;
+        env->msr = msr;
+        env->nip = 0x200;
+        return;
+    }
+
+    ext_elog = g_malloc0(sizeof(*ext_elog));
+    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
+
+    log.summary = cpu_to_be32(summary);
+    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
+
+    /* r3 should be in BE always */
+    r3 = cpu_to_be64(env->gpr[3]);
+    env->msr = msr;
+
+    spapr_init_v6hdr(&ext_elog->v6hdr);
+    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
+    ext_elog->mc.hdr.section_length =
+                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
+    ext_elog->mc.hdr.section_version = 1;
+
+    /* get rtas addr from fdt */
+    rtas_addr = spapr_get_rtas_addr();
+    if (!rtas_addr) {
+        /* Unable to fetch rtas_addr. Hence reset the guest */
+        ppc_cpu_do_system_reset(cs);
+    }
+
+    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
+                              sizeof(r3));
+    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
+                              &log, sizeof(log));
+    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
+                              sizeof(log), ext_elog,
+                              sizeof(*ext_elog));
+
+    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
+    env->nip = spapr->guest_machine_check_addr;
+
+    g_free(ext_elog);
+}
+
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
@@ -641,6 +873,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
         }
     }
     spapr->mc_status = cpu->vcpu_id;
+
+    spapr_mce_dispatch_elog(cpu, recovered);
+
+    return;
 }
 
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index fc3a776..c717ab2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
 
 #define RTAS_ERROR_LOG_MAX      2048
 
+/* Offset from rtas-base where error log is placed */
+#define RTAS_ERROR_LOG_OFFSET       0x30
+
 #define RTAS_EVENT_SCAN_RATE    1
 
 /* This helper should be used to encode interrupt specifiers when the related
@@ -799,6 +802,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
+ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-29  5:40 [Qemu-devel] [PATCH v9 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
                   ` (3 preceding siblings ...)
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE Aravinda Prasad
@ 2019-05-29  5:40 ` Aravinda Prasad
  2019-06-03 15:25   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2019-06-06  3:00   ` [Qemu-devel] " David Gibson
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling Aravinda Prasad
  5 siblings, 2 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-05-29  5:40 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

Enable the KVM capability KVM_CAP_PPC_FWNMI so that
the KVM causes guest exit with NMI as exit reason
when it encounters a machine check exception on the
address belonging to a guest. Without this capability
enabled, KVM redirects machine check exceptions to
guest's 0x200 vector.

This patch also deals with the case when a guest with
the KVM_CAP_PPC_FWNMI capability enabled is attempted
to migrate to a host that does not support this
capability.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    1 +
 hw/ppc/spapr_caps.c    |   24 ++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c    |   18 ++++++++++++++++++
 include/hw/ppc/spapr.h |    4 +++-
 target/ppc/kvm.c       |   19 +++++++++++++++++++
 target/ppc/kvm_ppc.h   |   12 ++++++++++++
 6 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c97f6a6..e8a77636 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4364,6 +4364,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
     smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
     smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
     spapr_caps_add_properties(smc, &error_abort);
     smc->irq = &spapr_irq_dual;
     smc->dr_phb_enabled = true;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 31b4661..ef9e612 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -479,6 +479,20 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
     }
 }
 
+static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
+                                Error **errp)
+{
+    if (!val) {
+        return; /* Disabled by default */
+    }
+
+    if (tcg_enabled()) {
+            error_setg(errp, "No fwnmi support in TCG, try cap-fwnmi-mce=off");
+    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
+            error_setg(errp, "Requested fwnmi capability not support by KVM");
+    }
+}
+
 SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
     [SPAPR_CAP_HTM] = {
         .name = "htm",
@@ -578,6 +592,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
         .type = "bool",
         .apply = cap_ccf_assist_apply,
     },
+    [SPAPR_CAP_FWNMI_MCE] = {
+        .name = "fwnmi-mce",
+        .description = "Handle fwnmi machine check exceptions",
+        .index = SPAPR_CAP_FWNMI_MCE,
+        .get = spapr_cap_get_bool,
+        .set = spapr_cap_set_bool,
+        .type = "bool",
+        .apply = cap_fwnmi_mce_apply,
+    },
 };
 
 static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
@@ -717,6 +740,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
 SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
 SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
+SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
 
 void spapr_caps_init(SpaprMachineState *spapr)
 {
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index e0bdfc8..91a7ab9 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -49,6 +49,7 @@
 #include "hw/ppc/fdt.h"
 #include "target/ppc/mmu-hash64.h"
 #include "target/ppc/mmu-book3s-v3.h"
+#include "kvm_ppc.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -358,6 +359,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
                                   target_ulong args,
                                   uint32_t nret, target_ulong rets)
 {
+    int ret;
     hwaddr rtas_addr = spapr_get_rtas_addr();
 
     if (!rtas_addr) {
@@ -365,6 +367,22 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
         return;
     }
 
+    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == 0) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    ret = kvmppc_fwnmi_enable(cpu);
+    if (ret == 1) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    if (ret < 0) {
+        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+        return;
+    }
+
     spapr->guest_machine_check_addr = rtas_ld(args, 1);
     rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index c717ab2..bd75d4b 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -78,8 +78,10 @@ typedef enum {
 #define SPAPR_CAP_LARGE_DECREMENTER     0x08
 /* Count Cache Flush Assist HW Instruction */
 #define SPAPR_CAP_CCF_ASSIST            0x09
+/* FWNMI machine check handling */
+#define SPAPR_CAP_FWNMI_MCE             0x0A
 /* Num Caps */
-#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
+#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
 
 /*
  * Capability Values
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 39f1a73..368ec6e 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
 static int cap_ppc_count_cache_flush_assist;
 static int cap_ppc_nested_kvm_hv;
 static int cap_large_decr;
+static int cap_ppc_fwnmi;
 
 static uint32_t debug_inst_opcode;
 
@@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     kvmppc_get_cpu_characteristics(s);
     cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
     cap_large_decr = kvmppc_get_dec_bits();
+    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
     /*
      * Note: setting it to false because there is not such capability
      * in KVM at this moment.
@@ -2119,6 +2121,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
     }
 }
 
+int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
+{
+    CPUState *cs = CPU(cpu);
+
+    if (!cap_ppc_fwnmi) {
+        return 1;
+    }
+
+    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
+}
+
+
 int kvmppc_smt_threads(void)
 {
     return cap_ppc_smt ? cap_ppc_smt : 1;
@@ -2419,6 +2433,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
     return cap_mmu_hash_v3;
 }
 
+bool kvmppc_has_cap_ppc_fwnmi(void)
+{
+    return cap_ppc_fwnmi;
+}
+
 static bool kvmppc_power8_host(void)
 {
     bool ret = false;
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 18693f1..3d9f0b4 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
 void kvmppc_set_papr(PowerPCCPU *cpu);
 int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
 void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
+int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
+bool kvmppc_has_cap_ppc_fwnmi(void);
 int kvmppc_smt_threads(void);
 void kvmppc_hint_smt_possible(Error **errp);
 int kvmppc_set_smt_threads(int smt);
@@ -160,6 +162,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
 {
 }
 
+static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
+{
+    return 1;
+}
+
+static inline bool kvmppc_has_cap_ppc_fwnmi(void)
+{
+    return false;
+}
+
 static inline int kvmppc_smt_threads(void)
 {
     return 1;



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-05-29  5:40 [Qemu-devel] [PATCH v9 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
                   ` (4 preceding siblings ...)
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability Aravinda Prasad
@ 2019-05-29  5:40 ` Aravinda Prasad
  2019-06-03 15:40   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2019-06-06  3:06   ` [Qemu-devel] " David Gibson
  5 siblings, 2 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-05-29  5:40 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, groug

This patch includes migration support for machine check
handling. Especially this patch blocks VM migration
requests until the machine check error handling is
complete as (i) these errors are specific to the source
hardware and is irrelevant on the target hardware,
(ii) these errors cause data corruption and should
be handled before migration.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |   20 ++++++++++++++++++++
 hw/ppc/spapr_events.c  |   17 +++++++++++++++++
 hw/ppc/spapr_rtas.c    |    4 ++++
 include/hw/ppc/spapr.h |    2 ++
 4 files changed, 43 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e8a77636..31c4850 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
     },
 };
 
+static bool spapr_fwnmi_needed(void *opaque)
+{
+    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
+
+    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;
+}
+
+static const VMStateDescription vmstate_spapr_machine_check = {
+    .name = "spapr_machine_check",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_fwnmi_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
+        VMSTATE_INT32(mc_status, SpaprMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_dtb,
         &vmstate_spapr_cap_large_decr,
         &vmstate_spapr_cap_ccf_assist,
+        &vmstate_spapr_machine_check,
         NULL
     }
 };
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 573c0b7..35e21e4 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -41,6 +41,7 @@
 #include "qemu/bcd.h"
 #include "hw/ppc/spapr_ovec.h"
 #include <libfdt.h>
+#include "migration/blocker.h"
 
 #define RTAS_LOG_VERSION_MASK                   0xff000000
 #define   RTAS_LOG_VERSION_6                    0x06000000
@@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    int ret;
+    Error *local_err = NULL;
+
+    error_setg(&spapr->fwnmi_migration_blocker,
+            "Live migration not supported during machine check handling");
+    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
+    if (ret < 0) {
+        /*
+         * We don't want to abort and let the migration to continue. In a
+         * rare case, the machine check handler will run on the target
+         * hardware. Though this is not preferable, it is better than aborting
+         * the migration or killing the VM.
+         */
+        error_free(spapr->fwnmi_migration_blocker);
+        warn_report_err(local_err);
+    }
 
     while (spapr->mc_status != -1) {
         /*
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 91a7ab9..c849223 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -50,6 +50,7 @@
 #include "target/ppc/mmu-hash64.h"
 #include "target/ppc/mmu-book3s-v3.h"
 #include "kvm_ppc.h"
+#include "migration/blocker.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
         spapr->mc_status = -1;
         qemu_cond_signal(&spapr->mc_delivery_cond);
         rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+        migrate_del_blocker(spapr->fwnmi_migration_blocker);
+        error_free(spapr->fwnmi_migration_blocker);
+        spapr->fwnmi_migration_blocker = NULL;
     }
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index bd75d4b..6c0cfd8 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -214,6 +214,8 @@ struct SpaprMachineState {
     SpaprCapabilities def, eff, mig;
 
     unsigned gpu_numa_id;
+
+    Error *fwnmi_migration_blocker;
 };
 
 #define H_SUCCESS         0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
@ 2019-06-03 10:12   ` Greg Kurz
  2019-06-03 11:17     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2019-06-06  1:34   ` [Qemu-devel] " David Gibson
  1 sibling, 1 reply; 44+ messages in thread
From: Greg Kurz @ 2019-06-03 10:12 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Wed, 29 May 2019 11:10:14 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> This patch adds support in QEMU to handle "ibm,nmi-register"
> and "ibm,nmi-interlock" RTAS calls.
> 
> The machine check notification address is saved when the
> OS issues "ibm,nmi-register" RTAS call.
> 
> This patch also handles the case when multiple processors
> experience machine check at or about the same time by
> handling "ibm,nmi-interlock" call. In such cases, as per
> PAPR, subsequent processors serialize waiting for the first
> processor to issue the "ibm,nmi-interlock" call. The second
> processor that also received a machine check error waits
> till the first processor is done reading the error log.
> The first processor issues "ibm,nmi-interlock" call
> when the error log is consumed. This patch implements the
> releasing part of the error-log while subsequent patch
> (which builds error log) handles the locking part.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---

The code looks okay but it still seems wrong to advertise the RTAS
calls to the guest that early in the series. The linux kernel in
the guest will assume FWNMI is functional, which isn't true until
patch 6 (yes, migration is part of the feature, it should be
supported upfront, not fixed afterwards).

It doesn't help much to introduce the RTAS calls early and to
modify them in the other patches. I'd rather see the rest of
the code first and a final patch that introduces the fully
functional RTAS calls and calls spapr_rtas_register().

>  hw/ppc/spapr.c         |    7 +++++
>  hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    9 ++++++-
>  3 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e2b33e5..fae28a9 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
>      first_ppc_cpu->env.gpr[5] = 0;
>  
>      spapr->cas_reboot = false;
> +
> +    spapr->guest_machine_check_addr = -1;
> +
> +    /* Signal all vCPUs waiting on this condition */
> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>  }
>  
>  static void spapr_create_nvram(SpaprMachineState *spapr)
> @@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
>  
>          kvmppc_spapr_enable_inkernel_multitce();
>      }
> +
> +    qemu_cond_init(&spapr->mc_delivery_cond);
>  }
>  
>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 5bc1a93..e7509cf 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      rtas_st(rets, 1, 100);
>  }
>  
> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> +                                  SpaprMachineState *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args,
> +                                  uint32_t nret, target_ulong rets)
> +{
> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> +
> +    if (!rtas_addr) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    if (spapr->guest_machine_check_addr == -1) {
> +        /* NMI register not called */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +    } else {
> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    }
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>      }
>  }
>  
> +hwaddr spapr_get_rtas_addr(void)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int rtas_node;
> +    const struct fdt_property *rtas_addr_prop;
> +    void *fdt = spapr->fdt_blob;
> +    uint32_t rtas_addr;
> +
> +    /* fetch rtas addr from fdt */
> +    rtas_node = fdt_path_offset(fdt, "/rtas");
> +    if (rtas_node == 0) {
> +        return 0;
> +    }
> +
> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> +    if (!rtas_addr_prop) {
> +        return 0;
> +    }
> +
> +    /*
> +     * We assume that the OS called RTAS instantiate-rtas, but some other
> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
> +     * as SLOF only supports 32-bit variant.
> +     */
> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
> +    return (hwaddr)rtas_addr;
> +}
> +
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> @@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
>                          rtas_set_power_level);
>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>                          rtas_get_power_level);
> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> +                        rtas_ibm_nmi_register);
> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> +                        rtas_ibm_nmi_interlock);
>  }
>  
>  type_init(core_rtas_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 4f5becf..9dc5e30 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -188,6 +188,10 @@ struct SpaprMachineState {
>       * occurs during the unplug process. */
>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>  
> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> +    target_ulong guest_machine_check_addr;
> +    QemuCond mc_delivery_cond;
> +
>      /*< public >*/
>      char *kvm_type;
>      char *host_model;
> @@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> @@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> +uint64_t spapr_get_rtas_addr(void);
>  #endif /* HW_SPAPR_H */
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-06-03 10:12   ` Greg Kurz
@ 2019-06-03 11:17     ` Greg Kurz
  2019-06-04  6:08       ` Aravinda Prasad
  2019-06-06  1:35       ` David Gibson
  0 siblings, 2 replies; 44+ messages in thread
From: Greg Kurz @ 2019-06-03 11:17 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Mon, 3 Jun 2019 12:12:43 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Wed, 29 May 2019 11:10:14 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
> > This patch adds support in QEMU to handle "ibm,nmi-register"
> > and "ibm,nmi-interlock" RTAS calls.
> > 
> > The machine check notification address is saved when the
> > OS issues "ibm,nmi-register" RTAS call.
> > 
> > This patch also handles the case when multiple processors
> > experience machine check at or about the same time by
> > handling "ibm,nmi-interlock" call. In such cases, as per
> > PAPR, subsequent processors serialize waiting for the first
> > processor to issue the "ibm,nmi-interlock" call. The second
> > processor that also received a machine check error waits
> > till the first processor is done reading the error log.
> > The first processor issues "ibm,nmi-interlock" call
> > when the error log is consumed. This patch implements the
> > releasing part of the error-log while subsequent patch
> > (which builds error log) handles the locking part.
> > 
> > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > ---  
> 
> The code looks okay but it still seems wrong to advertise the RTAS
> calls to the guest that early in the series. The linux kernel in
> the guest will assume FWNMI is functional, which isn't true until
> patch 6 (yes, migration is part of the feature, it should be
> supported upfront, not fixed afterwards).
> 
> It doesn't help much to introduce the RTAS calls early and to
> modify them in the other patches. I'd rather see the rest of
> the code first and a final patch that introduces the fully
> functional RTAS calls and calls spapr_rtas_register().
> 

Thinking again, you should introduce the "fwnmi-mce" spapr capability in
its own patch first, default to "off" and and have the last patch in the
series to switch the default to "on" for newer machine types only.

This patch should then only register the RTAS calls if "fwnmi-mcr" is set
to "on".

This should address the fact that we don't want to expose a partially
implemented FWNMI feature to the guest, and we don't want to support
FWNMI at all with older machine types for the sake of compatibility.

> >  hw/ppc/spapr.c         |    7 +++++
> >  hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr.h |    9 ++++++-
> >  3 files changed, 80 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index e2b33e5..fae28a9 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
> >      first_ppc_cpu->env.gpr[5] = 0;
> >  
> >      spapr->cas_reboot = false;
> > +
> > +    spapr->guest_machine_check_addr = -1;
> > +
> > +    /* Signal all vCPUs waiting on this condition */
> > +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
> >  }
> >  
> >  static void spapr_create_nvram(SpaprMachineState *spapr)
> > @@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
> >  
> >          kvmppc_spapr_enable_inkernel_multitce();
> >      }
> > +
> > +    qemu_cond_init(&spapr->mc_delivery_cond);
> >  }
> >  
> >  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > index 5bc1a93..e7509cf 100644
> > --- a/hw/ppc/spapr_rtas.c
> > +++ b/hw/ppc/spapr_rtas.c
> > @@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >      rtas_st(rets, 1, 100);
> >  }
> >  
> > +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> > +                                  SpaprMachineState *spapr,
> > +                                  uint32_t token, uint32_t nargs,
> > +                                  target_ulong args,
> > +                                  uint32_t nret, target_ulong rets)
> > +{
> > +    hwaddr rtas_addr = spapr_get_rtas_addr();
> > +
> > +    if (!rtas_addr) {
> > +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> > +        return;
> > +    }
> > +
> > +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +}
> > +
> > +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> > +                                   SpaprMachineState *spapr,
> > +                                   uint32_t token, uint32_t nargs,
> > +                                   target_ulong args,
> > +                                   uint32_t nret, target_ulong rets)
> > +{
> > +    if (spapr->guest_machine_check_addr == -1) {
> > +        /* NMI register not called */
> > +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > +    } else {
> > +        qemu_cond_signal(&spapr->mc_delivery_cond);
> > +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +    }
> > +}
> > +
> >  static struct rtas_call {
> >      const char *name;
> >      spapr_rtas_fn fn;
> > @@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
> >      }
> >  }
> >  
> > +hwaddr spapr_get_rtas_addr(void)
> > +{
> > +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > +    int rtas_node;
> > +    const struct fdt_property *rtas_addr_prop;
> > +    void *fdt = spapr->fdt_blob;
> > +    uint32_t rtas_addr;
> > +
> > +    /* fetch rtas addr from fdt */
> > +    rtas_node = fdt_path_offset(fdt, "/rtas");
> > +    if (rtas_node == 0) {
> > +        return 0;
> > +    }
> > +
> > +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> > +    if (!rtas_addr_prop) {
> > +        return 0;
> > +    }
> > +
> > +    /*
> > +     * We assume that the OS called RTAS instantiate-rtas, but some other
> > +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
> > +     * as SLOF only supports 32-bit variant.
> > +     */
> > +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
> > +    return (hwaddr)rtas_addr;
> > +}
> > +
> > +
> >  static void core_rtas_register_types(void)
> >  {
> >      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> > @@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
> >                          rtas_set_power_level);
> >      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
> >                          rtas_get_power_level);
> > +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> > +                        rtas_ibm_nmi_register);
> > +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> > +                        rtas_ibm_nmi_interlock);
> >  }
> >  
> >  type_init(core_rtas_register_types)
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 4f5becf..9dc5e30 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -188,6 +188,10 @@ struct SpaprMachineState {
> >       * occurs during the unplug process. */
> >      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
> >  
> > +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> > +    target_ulong guest_machine_check_addr;
> > +    QemuCond mc_delivery_cond;
> > +
> >      /*< public >*/
> >      char *kvm_type;
> >      char *host_model;
> > @@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
> >  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
> >  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
> >  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> > +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> > +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
> >  
> > -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> > +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
> >  
> >  /* RTAS ibm,get-system-parameter token values */
> >  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> > @@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> >  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
> >  
> >  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> > +uint64_t spapr_get_rtas_addr(void);
> >  #endif /* HW_SPAPR_H */
> >   
> 
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 3/6] target/ppc: Handle NMI guest exit
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
@ 2019-06-03 11:53   ` Greg Kurz
  2019-06-06  1:43   ` [Qemu-devel] " David Gibson
  1 sibling, 0 replies; 44+ messages in thread
From: Greg Kurz @ 2019-06-03 11:53 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Wed, 29 May 2019 11:10:32 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> Memory error such as bit flips that cannot be corrected
> by hardware are passed on to the kernel for handling.
> If the memory address in error belongs to guest then
> the guest kernel is responsible for taking suitable action.
> Patch [1] enhances KVM to exit guest with exit reason
> set to KVM_EXIT_NMI in such cases. This patch handles
> KVM_EXIT_NMI exit.
> 
> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>     (e20bbd3d and related commits)
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---

LGTM

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/ppc/spapr.c          |    1 +
>  hw/ppc/spapr_events.c   |   23 +++++++++++++++++++++++
>  hw/ppc/spapr_rtas.c     |    5 +++++
>  include/hw/ppc/spapr.h  |    6 ++++++
>  target/ppc/kvm.c        |   16 ++++++++++++++++
>  target/ppc/kvm_ppc.h    |    2 ++
>  target/ppc/trace-events |    1 +
>  7 files changed, 54 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index fae28a9..6b6c962 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1809,6 +1809,7 @@ static void spapr_machine_reset(void)
>  
>      spapr->cas_reboot = false;
>  
> +    spapr->mc_status = -1;
>      spapr->guest_machine_check_addr = -1;
>  
>      /* Signal all vCPUs waiting on this condition */
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index ae0f093..a18446b 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -620,6 +620,29 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +
> +    while (spapr->mc_status != -1) {
> +        /*
> +         * Check whether the same CPU got machine check error
> +         * while still handling the mc error (i.e., before
> +         * that CPU called "ibm,nmi-interlock")
> +         */
> +        if (spapr->mc_status == cpu->vcpu_id) {
> +            qemu_system_guest_panicked(NULL);
> +            return;
> +        }
> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> +        /* Meanwhile if the system is reset, then just return */
> +        if (spapr->guest_machine_check_addr == -1) {
> +            return;
> +        }
> +    }
> +    spapr->mc_status = cpu->vcpu_id;
> +}
> +
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                              uint32_t token, uint32_t nargs,
>                              target_ulong args,
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index e7509cf..e0bdfc8 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -379,6 +379,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>          /* NMI register not called */
>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>      } else {
> +        /*
> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> +         * hence unset mc_status.
> +         */
> +        spapr->mc_status = -1;
>          qemu_cond_signal(&spapr->mc_delivery_cond);
>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>      }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9dc5e30..fc3a776 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -190,6 +190,11 @@ struct SpaprMachineState {
>  
>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>      target_ulong guest_machine_check_addr;
> +    /*
> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
> +     * handling the mc.
> +     */
> +    int mc_status;
>      QemuCond mc_delivery_cond;
>  
>      /*< public >*/
> @@ -793,6 +798,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 3bf0a46..39f1a73 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1761,6 +1761,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ret = 0;
>          break;
>  
> +    case KVM_EXIT_NMI:
> +        trace_kvm_handle_nmi_exception();
> +        ret = kvm_handle_nmi(cpu, run);
> +        break;
> +
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> @@ -2844,6 +2849,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>      return data & 0xffff;
>  }
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
> +{
> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    spapr_mce_req_event(cpu, recovered);
> +
> +    return 0;
> +}
> +
>  int kvmppc_enable_hwrng(void)
>  {
>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 45776ca..18693f1 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -81,6 +81,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
> +
>  #else
>  
>  static inline uint32_t kvmppc_get_tbfreq(void)
> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
> index 3dc6740..6d15aa9 100644
> --- a/target/ppc/trace-events
> +++ b/target/ppc/trace-events
> @@ -28,3 +28,4 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>  kvm_handle_epr(void) "handle epr"
>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>  kvm_handle_debug_exception(void) "handle debug exception"
> +kvm_handle_nmi_exception(void) "handle NMI exception"
> 
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE Aravinda Prasad
@ 2019-06-03 14:00   ` Greg Kurz
  2019-06-04  6:29     ` Aravinda Prasad
  0 siblings, 1 reply; 44+ messages in thread
From: Greg Kurz @ 2019-06-03 14:00 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Wed, 29 May 2019 11:10:40 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> Upon a machine check exception (MCE) in a guest address space,
> KVM causes a guest exit to enable QEMU to build and pass the
> error to the guest in the PAPR defined rtas error log format.
> 
> This patch builds the rtas error log, copies it to the rtas_addr
> and then invokes the guest registered machine check handler. The
> handler in the guest takes suitable action(s) depending on the type
> and criticality of the error. For example, if an error is
> unrecoverable memory corruption in an application inside the
> guest, then the guest kernel sends a SIGBUS to the application.
> For recoverable errors, the guest performs recovery actions and
> logs the error.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    5 +
>  hw/ppc/spapr_events.c  |  236 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    4 +
>  3 files changed, 245 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6b6c962..c97f6a6 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2910,6 +2910,11 @@ static void spapr_machine_init(MachineState *machine)
>          error_report("Could not get size of LPAR rtas '%s'", filename);
>          exit(1);
>      }
> +
> +    /* Resize blob to accommodate error log. */
> +    g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);

I don't see the point of this assertion... especially with the assignment
below.

> +    spapr->rtas_size = RTAS_ERROR_LOG_MAX;

As requested by David, this should only be done when the spapr cap is set,
so that 4.0 machine types and older continue to use the current size.

> +
>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>          error_report("Could not load LPAR rtas '%s'", filename);
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index a18446b..573c0b7 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -212,6 +212,106 @@ struct hp_extended_log {
>      struct rtas_event_log_v6_hp hp;
>  } QEMU_PACKED;
>  
> +struct rtas_event_log_v6_mc {
> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> +    struct rtas_event_log_v6_section_header hdr;
> +    uint32_t fru_id;
> +    uint32_t proc_id;
> +    uint8_t error_type;
> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> +    uint8_t sub_err_type;
> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> +    uint8_t reserved_1[6];
> +    uint64_t effective_address;
> +    uint64_t logical_address;
> +} QEMU_PACKED;
> +
> +struct mc_extended_log {
> +    struct rtas_event_log_v6 v6hdr;
> +    struct rtas_event_log_v6_mc mc;
> +} QEMU_PACKED;
> +
> +struct MC_ierror_table {
> +    unsigned long srr1_mask;
> +    unsigned long srr1_value;
> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_ierror_table mc_ierror_table[] = {
> +{ 0x00000000081c0000, 0x0000000000040000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000080000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000100000, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000140000, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000180000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0, 0, 0, 0, 0, 0 } };
> +
> +struct MC_derror_table {
> +    unsigned long dsisr_value;
> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_derror_table mc_derror_table[] = {
> +{ 0x00008000, false,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00004000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000800, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000400, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000080, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000100, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0, false, 0, 0, 0, 0 } };
> +
> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> +
>  typedef enum EventClass {
>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>      EVENT_CLASS_EPOW                = 1,
> @@ -620,6 +720,138 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> +                                        struct mc_extended_log *ext_elog)
> +{
> +    int i;
> +    CPUPPCState *env = &cpu->env;
> +    uint32_t summary;
> +    uint64_t dsisr = env->spr[SPR_DSISR];
> +
> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> +    if (recovered) {
> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> +    } else {
> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> +    }
> +
> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> +            if (mc_derror_table[i].dar_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> +            }
> +
> +            summary |= mc_derror_table[i].initiator
> +                        | mc_derror_table[i].severity;
> +
> +            return summary;
> +        }
> +    } else {
> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> +                    mc_ierror_table[i].srr1_value) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> +            if (mc_ierror_table[i].nip_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> +            }
> +
> +            summary |= mc_ierror_table[i].initiator
> +                        | mc_ierror_table[i].severity;
> +
> +            return summary;
> +        }
> +    }
> +
> +    summary |= RTAS_LOG_INITIATOR_CPU;
> +    return summary;
> +}
> +
> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    CPUState *cs = CPU(cpu);
> +    uint64_t rtas_addr;
> +    CPUPPCState *env = &cpu->env;
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +    target_ulong r3, msr = 0;
> +    struct rtas_error_log log;
> +    struct mc_extended_log *ext_elog;
> +    uint32_t summary;
> +
> +    /*
> +     * Properly set bits in MSR before we invoke the handler.
> +     * SRR0/1, DAR and DSISR are properly set by KVM
> +     */
> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> +        msr |= (1ULL << MSR_LE);
> +    }
> +
> +    if (env->msr & (1ULL << MSR_SF)) {
> +        msr |= (1ULL << MSR_SF);
> +    }
> +
> +    msr |= (1ULL << MSR_ME);
> +
> +    if (spapr->guest_machine_check_addr == -1) {
> +        /*
> +         * This implies that we have hit a machine check between system
> +         * reset and "ibm,nmi-register". Fall back to the old machine
> +         * check behavior in such cases.
> +         */
> +        env->spr[SPR_SRR0] = env->nip;
> +        env->spr[SPR_SRR1] = env->msr;
> +        env->msr = msr;
> +        env->nip = 0x200;
> +        return;
> +    }
> +
> +    ext_elog = g_malloc0(sizeof(*ext_elog));
> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> +
> +    log.summary = cpu_to_be32(summary);
> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
> +
> +    /* r3 should be in BE always */
> +    r3 = cpu_to_be64(env->gpr[3]);
> +    env->msr = msr;
> +
> +    spapr_init_v6hdr(&ext_elog->v6hdr);
> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> +    ext_elog->mc.hdr.section_length =
> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> +    ext_elog->mc.hdr.section_version = 1;
> +
> +    /* get rtas addr from fdt */
> +    rtas_addr = spapr_get_rtas_addr();
> +    if (!rtas_addr) {
> +        /* Unable to fetch rtas_addr. Hence reset the guest */
> +        ppc_cpu_do_system_reset(cs);
> +    }
> +
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
> +                              sizeof(r3));
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
> +                              &log, sizeof(log));
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
> +                              sizeof(log), ext_elog,
> +                              sizeof(*ext_elog));
> +
> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
> +    env->nip = spapr->guest_machine_check_addr;
> +
> +    g_free(ext_elog);
> +}
> +
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> @@ -641,6 +873,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>          }
>      }
>      spapr->mc_status = cpu->vcpu_id;
> +
> +    spapr_mce_dispatch_elog(cpu, recovered);
> +
> +    return;

Drop the last two lines.

>  }
>  
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index fc3a776..c717ab2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
>  
>  #define RTAS_ERROR_LOG_MAX      2048
>  
> +/* Offset from rtas-base where error log is placed */
> +#define RTAS_ERROR_LOG_OFFSET       0x30
> +
>  #define RTAS_EVENT_SCAN_RATE    1
>  
>  /* This helper should be used to encode interrupt specifiers when the related
> @@ -799,6 +802,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
>  

Looks like a leftover.

>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability Aravinda Prasad
@ 2019-06-03 15:25   ` Greg Kurz
  2019-06-04  6:45     ` Aravinda Prasad
  2019-06-06  3:00   ` [Qemu-devel] " David Gibson
  1 sibling, 1 reply; 44+ messages in thread
From: Greg Kurz @ 2019-06-03 15:25 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Wed, 29 May 2019 11:10:49 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
> the KVM causes guest exit with NMI as exit reason
> when it encounters a machine check exception on the
> address belonging to a guest. Without this capability
> enabled, KVM redirects machine check exceptions to
> guest's 0x200 vector.
> 
> This patch also deals with the case when a guest with
> the KVM_CAP_PPC_FWNMI capability enabled is attempted
> to migrate to a host that does not support this
> capability.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---

As suggested in another mail, it may be worth introducing the sPAPR cap
in its own patch, earlier in the series.

Anyway, I have some comments below.

>  hw/ppc/spapr.c         |    1 +
>  hw/ppc/spapr_caps.c    |   24 ++++++++++++++++++++++++
>  hw/ppc/spapr_rtas.c    |   18 ++++++++++++++++++
>  include/hw/ppc/spapr.h |    4 +++-
>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>  6 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c97f6a6..e8a77636 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4364,6 +4364,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>      spapr_caps_add_properties(smc, &error_abort);
>      smc->irq = &spapr_irq_dual;
>      smc->dr_phb_enabled = true;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 31b4661..ef9e612 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -479,6 +479,20 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>      }
>  }
>  
> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> +                                Error **errp)
> +{
> +    if (!val) {
> +        return; /* Disabled by default */
> +    }
> +
> +    if (tcg_enabled()) {
> +            error_setg(errp, "No fwnmi support in TCG, try cap-fwnmi-mce=off");

Maybe expand "fwnmi" to "Firmware Assisted Non-Maskable Interrupts" ?

> +    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
> +            error_setg(errp, "Requested fwnmi capability not support by KVM");

Maybe reword and add a hint:

"KVM implementation does not support Firmware Assisted Non-Maskable Interrupts, try cap-fwnmi-mce=off"


> +    }
> +}
> +
>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>      [SPAPR_CAP_HTM] = {
>          .name = "htm",
> @@ -578,6 +592,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>          .type = "bool",
>          .apply = cap_ccf_assist_apply,
>      },
> +    [SPAPR_CAP_FWNMI_MCE] = {
> +        .name = "fwnmi-mce",
> +        .description = "Handle fwnmi machine check exceptions",
> +        .index = SPAPR_CAP_FWNMI_MCE,
> +        .get = spapr_cap_get_bool,
> +        .set = spapr_cap_set_bool,
> +        .type = "bool",
> +        .apply = cap_fwnmi_mce_apply,
> +    },
>  };
>  
>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> @@ -717,6 +740,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>  
>  void spapr_caps_init(SpaprMachineState *spapr)
>  {
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index e0bdfc8..91a7ab9 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -49,6 +49,7 @@
>  #include "hw/ppc/fdt.h"
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
> +#include "kvm_ppc.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -358,6 +359,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>                                    target_ulong args,
>                                    uint32_t nret, target_ulong rets)
>  {
> +    int ret;
>      hwaddr rtas_addr = spapr_get_rtas_addr();
>  
>      if (!rtas_addr) {
> @@ -365,6 +367,22 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>          return;
>      }
>  
> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == 0) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    ret = kvmppc_fwnmi_enable(cpu);
> +    if (ret == 1) {

I have the impression that this should really not happen,
otherwise something has gone terribly wrong in QEMU or
in KVM... this maybe deserves an error message as well ?
No big deal.

> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    if (ret < 0) {
> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +        return;
> +    }
> +
>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index c717ab2..bd75d4b 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -78,8 +78,10 @@ typedef enum {
>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>  /* Count Cache Flush Assist HW Instruction */
>  #define SPAPR_CAP_CCF_ASSIST            0x09
> +/* FWNMI machine check handling */
> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>  /* Num Caps */
> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>  
>  /*
>   * Capability Values
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 39f1a73..368ec6e 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
>  static int cap_ppc_count_cache_flush_assist;
>  static int cap_ppc_nested_kvm_hv;
>  static int cap_large_decr;
> +static int cap_ppc_fwnmi;
>  
>  static uint32_t debug_inst_opcode;
>  
> @@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>      kvmppc_get_cpu_characteristics(s);
>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>      cap_large_decr = kvmppc_get_dec_bits();
> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>      /*
>       * Note: setting it to false because there is not such capability
>       * in KVM at this moment.
> @@ -2119,6 +2121,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>      }
>  }
>  
> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> +{
> +    CPUState *cs = CPU(cpu);
> +
> +    if (!cap_ppc_fwnmi) {
> +        return 1;
> +    }
> +
> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
> +}
> +
> +
>  int kvmppc_smt_threads(void)
>  {
>      return cap_ppc_smt ? cap_ppc_smt : 1;
> @@ -2419,6 +2433,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>      return cap_mmu_hash_v3;
>  }
>  
> +bool kvmppc_has_cap_ppc_fwnmi(void)
> +{
> +    return cap_ppc_fwnmi;
> +}
> +
>  static bool kvmppc_power8_host(void)
>  {
>      bool ret = false;
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 18693f1..3d9f0b4 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
>  void kvmppc_set_papr(PowerPCCPU *cpu);
>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
> +bool kvmppc_has_cap_ppc_fwnmi(void);
>  int kvmppc_smt_threads(void);
>  void kvmppc_hint_smt_possible(Error **errp);
>  int kvmppc_set_smt_threads(int smt);
> @@ -160,6 +162,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>  {
>  }
>  
> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> +{
> +    return 1;
> +}
> +
> +static inline bool kvmppc_has_cap_ppc_fwnmi(void)
> +{
> +    return false;
> +}
> +
>  static inline int kvmppc_smt_threads(void)
>  {
>      return 1;
> 
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling Aravinda Prasad
@ 2019-06-03 15:40   ` Greg Kurz
  2019-06-04  7:04     ` Aravinda Prasad
  2019-06-06  3:06   ` [Qemu-devel] " David Gibson
  1 sibling, 1 reply; 44+ messages in thread
From: Greg Kurz @ 2019-06-03 15:40 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Wed, 29 May 2019 11:10:57 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> This patch includes migration support for machine check
> handling. Especially this patch blocks VM migration
> requests until the machine check error handling is
> complete as (i) these errors are specific to the source
> hardware and is irrelevant on the target hardware,
> (ii) these errors cause data corruption and should
> be handled before migration.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---

LGTM, just one issue: machine reset should del and free the blocker as well,
otherwise QEMU would crash if spapr_mce_req_event() is called again.

>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>  hw/ppc/spapr_rtas.c    |    4 ++++
>  include/hw/ppc/spapr.h |    2 ++
>  4 files changed, 43 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e8a77636..31c4850 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
>      },
>  };
>  
> +static bool spapr_fwnmi_needed(void *opaque)
> +{
> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> +
> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;
> +}
> +
> +static const VMStateDescription vmstate_spapr_machine_check = {
> +    .name = "spapr_machine_check",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_fwnmi_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
>          &vmstate_spapr_dtb,
>          &vmstate_spapr_cap_large_decr,
>          &vmstate_spapr_cap_ccf_assist,
> +        &vmstate_spapr_machine_check,
>          NULL
>      }
>  };
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 573c0b7..35e21e4 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -41,6 +41,7 @@
>  #include "qemu/bcd.h"
>  #include "hw/ppc/spapr_ovec.h"
>  #include <libfdt.h>
> +#include "migration/blocker.h"
>  
>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>  #define   RTAS_LOG_VERSION_6                    0x06000000
> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int ret;
> +    Error *local_err = NULL;
> +
> +    error_setg(&spapr->fwnmi_migration_blocker,
> +            "Live migration not supported during machine check handling");
> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> +    if (ret < 0) {
> +        /*
> +         * We don't want to abort and let the migration to continue. In a
> +         * rare case, the machine check handler will run on the target
> +         * hardware. Though this is not preferable, it is better than aborting
> +         * the migration or killing the VM.
> +         */
> +        error_free(spapr->fwnmi_migration_blocker);
> +        warn_report_err(local_err);
> +    }
>  
>      while (spapr->mc_status != -1) {
>          /*
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 91a7ab9..c849223 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -50,6 +50,7 @@
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
>  #include "kvm_ppc.h"
> +#include "migration/blocker.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>          spapr->mc_status = -1;
>          qemu_cond_signal(&spapr->mc_delivery_cond);
>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> +        error_free(spapr->fwnmi_migration_blocker);
> +        spapr->fwnmi_migration_blocker = NULL;
>      }
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index bd75d4b..6c0cfd8 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -214,6 +214,8 @@ struct SpaprMachineState {
>      SpaprCapabilities def, eff, mig;
>  
>      unsigned gpu_numa_id;
> +
> +    Error *fwnmi_migration_blocker;
>  };
>  
>  #define H_SUCCESS         0
> 
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-06-03 11:17     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-06-04  6:08       ` Aravinda Prasad
  2019-06-04 14:50         ` Greg Kurz
  2019-06-06  1:35       ` David Gibson
  1 sibling, 1 reply; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-04  6:08 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Monday 03 June 2019 04:47 PM, Greg Kurz wrote:
> On Mon, 3 Jun 2019 12:12:43 +0200
> Greg Kurz <groug@kaod.org> wrote:
> 
>> On Wed, 29 May 2019 11:10:14 +0530
>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>
>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>> and "ibm,nmi-interlock" RTAS calls.
>>>
>>> The machine check notification address is saved when the
>>> OS issues "ibm,nmi-register" RTAS call.
>>>
>>> This patch also handles the case when multiple processors
>>> experience machine check at or about the same time by
>>> handling "ibm,nmi-interlock" call. In such cases, as per
>>> PAPR, subsequent processors serialize waiting for the first
>>> processor to issue the "ibm,nmi-interlock" call. The second
>>> processor that also received a machine check error waits
>>> till the first processor is done reading the error log.
>>> The first processor issues "ibm,nmi-interlock" call
>>> when the error log is consumed. This patch implements the
>>> releasing part of the error-log while subsequent patch
>>> (which builds error log) handles the locking part.
>>>
>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---  
>>
>> The code looks okay but it still seems wrong to advertise the RTAS
>> calls to the guest that early in the series. The linux kernel in
>> the guest will assume FWNMI is functional, which isn't true until
>> patch 6 (yes, migration is part of the feature, it should be
>> supported upfront, not fixed afterwards).
>>
>> It doesn't help much to introduce the RTAS calls early and to
>> modify them in the other patches. I'd rather see the rest of
>> the code first and a final patch that introduces the fully
>> functional RTAS calls and calls spapr_rtas_register().
>>
> 
> Thinking again, you should introduce the "fwnmi-mce" spapr capability in
> its own patch first, default to "off" and and have the last patch in the
> series to switch the default to "on" for newer machine types only.
> 
> This patch should then only register the RTAS calls if "fwnmi-mcr" is set
> to "on".
> 
> This should address the fact that we don't want to expose a partially
> implemented FWNMI feature to the guest, and we don't want to support
> FWNMI at all with older machine types for the sake of compatibility.

When you say "expose a partially implemented FWNMI feature to the
guest", do you mean while debugging/bisect we may end up with exposing
the partially implemented FWNMI feature? Otherwise it is expected that
QEMU runs with all the 6 patches.

If that is the case, I will have the rtas nmi register functionality as
the last patch in the series. This way we don't have to have spapr cap
turned off first and later turned on. However, as mentioned earlier
(when David raised the same concern), use of guest_machine_check_addr
may look odd at other patches as it is set only during rtas nmi register.

Or else, as a workaround, I can return RTAS_OUT_NOT_SUPPORTED for rtas
nmi register till the entire functionality is implemented and only in
the last patch in the series I will return RTAS_OUT_SUCCESS. This will
ensure that we have a logical connection between the patches and the
partially implemented fwnmi is not exposed to the guest kernel.

Regards,
Aravinda




> 
>>>  hw/ppc/spapr.c         |    7 +++++
>>>  hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/hw/ppc/spapr.h |    9 ++++++-
>>>  3 files changed, 80 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index e2b33e5..fae28a9 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
>>>      first_ppc_cpu->env.gpr[5] = 0;
>>>  
>>>      spapr->cas_reboot = false;
>>> +
>>> +    spapr->guest_machine_check_addr = -1;
>>> +
>>> +    /* Signal all vCPUs waiting on this condition */
>>> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>>>  }
>>>  
>>>  static void spapr_create_nvram(SpaprMachineState *spapr)
>>> @@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
>>>  
>>>          kvmppc_spapr_enable_inkernel_multitce();
>>>      }
>>> +
>>> +    qemu_cond_init(&spapr->mc_delivery_cond);
>>>  }
>>>  
>>>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>> index 5bc1a93..e7509cf 100644
>>> --- a/hw/ppc/spapr_rtas.c
>>> +++ b/hw/ppc/spapr_rtas.c
>>> @@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>      rtas_st(rets, 1, 100);
>>>  }
>>>  
>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>> +                                  SpaprMachineState *spapr,
>>> +                                  uint32_t token, uint32_t nargs,
>>> +                                  target_ulong args,
>>> +                                  uint32_t nret, target_ulong rets)
>>> +{
>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>>> +
>>> +    if (!rtas_addr) {
>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>> +        return;
>>> +    }
>>> +
>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +}
>>> +
>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>> +                                   SpaprMachineState *spapr,
>>> +                                   uint32_t token, uint32_t nargs,
>>> +                                   target_ulong args,
>>> +                                   uint32_t nret, target_ulong rets)
>>> +{
>>> +    if (spapr->guest_machine_check_addr == -1) {
>>> +        /* NMI register not called */
>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>> +    } else {
>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +    }
>>> +}
>>> +
>>>  static struct rtas_call {
>>>      const char *name;
>>>      spapr_rtas_fn fn;
>>> @@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>>>      }
>>>  }
>>>  
>>> +hwaddr spapr_get_rtas_addr(void)
>>> +{
>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>> +    int rtas_node;
>>> +    const struct fdt_property *rtas_addr_prop;
>>> +    void *fdt = spapr->fdt_blob;
>>> +    uint32_t rtas_addr;
>>> +
>>> +    /* fetch rtas addr from fdt */
>>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>>> +    if (rtas_node == 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
>>> +    if (!rtas_addr_prop) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    /*
>>> +     * We assume that the OS called RTAS instantiate-rtas, but some other
>>> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
>>> +     * as SLOF only supports 32-bit variant.
>>> +     */
>>> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
>>> +    return (hwaddr)rtas_addr;
>>> +}
>>> +
>>> +
>>>  static void core_rtas_register_types(void)
>>>  {
>>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>>> @@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
>>>                          rtas_set_power_level);
>>>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>>                          rtas_get_power_level);
>>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>>> +                        rtas_ibm_nmi_register);
>>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>>> +                        rtas_ibm_nmi_interlock);
>>>  }
>>>  
>>>  type_init(core_rtas_register_types)
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index 4f5becf..9dc5e30 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -188,6 +188,10 @@ struct SpaprMachineState {
>>>       * occurs during the unplug process. */
>>>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>>>  
>>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>>> +    target_ulong guest_machine_check_addr;
>>> +    QemuCond mc_delivery_cond;
>>> +
>>>      /*< public >*/
>>>      char *kvm_type;
>>>      char *host_model;
>>> @@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>>  
>>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>>  
>>>  /* RTAS ibm,get-system-parameter token values */
>>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>>> @@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>>  
>>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>> +uint64_t spapr_get_rtas_addr(void);
>>>  #endif /* HW_SPAPR_H */
>>>   
>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE
  2019-06-03 14:00   ` Greg Kurz
@ 2019-06-04  6:29     ` Aravinda Prasad
  2019-06-04  9:01       ` Greg Kurz
  2019-06-06  2:57       ` David Gibson
  0 siblings, 2 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-04  6:29 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Monday 03 June 2019 07:30 PM, Greg Kurz wrote:
> On Wed, 29 May 2019 11:10:40 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> Upon a machine check exception (MCE) in a guest address space,
>> KVM causes a guest exit to enable QEMU to build and pass the
>> error to the guest in the PAPR defined rtas error log format.
>>
>> This patch builds the rtas error log, copies it to the rtas_addr
>> and then invokes the guest registered machine check handler. The
>> handler in the guest takes suitable action(s) depending on the type
>> and criticality of the error. For example, if an error is
>> unrecoverable memory corruption in an application inside the
>> guest, then the guest kernel sends a SIGBUS to the application.
>> For recoverable errors, the guest performs recovery actions and
>> logs the error.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |    5 +
>>  hw/ppc/spapr_events.c  |  236 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |    4 +
>>  3 files changed, 245 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 6b6c962..c97f6a6 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2910,6 +2910,11 @@ static void spapr_machine_init(MachineState *machine)
>>          error_report("Could not get size of LPAR rtas '%s'", filename);
>>          exit(1);
>>      }
>> +
>> +    /* Resize blob to accommodate error log. */
>> +    g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
> 
> I don't see the point of this assertion... especially with the assignment
> below.

It is required because we want to ensure that the rtas image size is
less than RTAS_ERROR_LOG_OFFSET, or else we will overwrite the rtas
image with rtas error when we hit machine check exception. But I think a
comment in the code will help. Will add it.


> 
>> +    spapr->rtas_size = RTAS_ERROR_LOG_MAX;
> 
> As requested by David, this should only be done when the spapr cap is set,
> so that 4.0 machine types and older continue to use the current size.

Due to other issue of re-allocating the blob and as this is not that
much space, we agreed to keep the size to RTAS_ERROR_LOG_MAX always.

Link to the discussion on this:
http://lists.nongnu.org/archive/html/qemu-ppc/2019-05/msg00275.html

> 
>> +
>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>>          error_report("Could not load LPAR rtas '%s'", filename);
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index a18446b..573c0b7 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -212,6 +212,106 @@ struct hp_extended_log {
>>      struct rtas_event_log_v6_hp hp;
>>  } QEMU_PACKED;
>>  
>> +struct rtas_event_log_v6_mc {
>> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
>> +    struct rtas_event_log_v6_section_header hdr;
>> +    uint32_t fru_id;
>> +    uint32_t proc_id;
>> +    uint8_t error_type;
>> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
>> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
>> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
>> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
>> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
>> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
>> +    uint8_t sub_err_type;
>> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
>> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
>> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
>> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
>> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
>> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
>> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
>> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
>> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
>> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
>> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
>> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
>> +    uint8_t reserved_1[6];
>> +    uint64_t effective_address;
>> +    uint64_t logical_address;
>> +} QEMU_PACKED;
>> +
>> +struct mc_extended_log {
>> +    struct rtas_event_log_v6 v6hdr;
>> +    struct rtas_event_log_v6_mc mc;
>> +} QEMU_PACKED;
>> +
>> +struct MC_ierror_table {
>> +    unsigned long srr1_mask;
>> +    unsigned long srr1_value;
>> +    bool nip_valid; /* nip is a valid indicator of faulting address */
>> +    uint8_t error_type;
>> +    uint8_t error_subtype;
>> +    unsigned int initiator;
>> +    unsigned int severity;
>> +};
>> +
>> +static const struct MC_ierror_table mc_ierror_table[] = {
>> +{ 0x00000000081c0000, 0x0000000000040000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000080000, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x00000000000c0000, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000100000, true,
>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000140000, true,
>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000180000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0, 0, 0, 0, 0, 0 } };
>> +
>> +struct MC_derror_table {
>> +    unsigned long dsisr_value;
>> +    bool dar_valid; /* dar is a valid indicator of faulting address */
>> +    uint8_t error_type;
>> +    uint8_t error_subtype;
>> +    unsigned int initiator;
>> +    unsigned int severity;
>> +};
>> +
>> +static const struct MC_derror_table mc_derror_table[] = {
>> +{ 0x00008000, false,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00004000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000800, true,
>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000400, true,
>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000080, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000100, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0, false, 0, 0, 0, 0 } };
>> +
>> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
>> +
>>  typedef enum EventClass {
>>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>>      EVENT_CLASS_EPOW                = 1,
>> @@ -620,6 +720,138 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>  }
>>  
>> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>> +                                        struct mc_extended_log *ext_elog)
>> +{
>> +    int i;
>> +    CPUPPCState *env = &cpu->env;
>> +    uint32_t summary;
>> +    uint64_t dsisr = env->spr[SPR_DSISR];
>> +
>> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
>> +    if (recovered) {
>> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
>> +    } else {
>> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
>> +    }
>> +
>> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
>> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
>> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
>> +                continue;
>> +            }
>> +
>> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
>> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
>> +            if (mc_derror_table[i].dar_valid) {
>> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
>> +            }
>> +
>> +            summary |= mc_derror_table[i].initiator
>> +                        | mc_derror_table[i].severity;
>> +
>> +            return summary;
>> +        }
>> +    } else {
>> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
>> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
>> +                    mc_ierror_table[i].srr1_value) {
>> +                continue;
>> +            }
>> +
>> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
>> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
>> +            if (mc_ierror_table[i].nip_valid) {
>> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
>> +            }
>> +
>> +            summary |= mc_ierror_table[i].initiator
>> +                        | mc_ierror_table[i].severity;
>> +
>> +            return summary;
>> +        }
>> +    }
>> +
>> +    summary |= RTAS_LOG_INITIATOR_CPU;
>> +    return summary;
>> +}
>> +
>> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    CPUState *cs = CPU(cpu);
>> +    uint64_t rtas_addr;
>> +    CPUPPCState *env = &cpu->env;
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> +    target_ulong r3, msr = 0;
>> +    struct rtas_error_log log;
>> +    struct mc_extended_log *ext_elog;
>> +    uint32_t summary;
>> +
>> +    /*
>> +     * Properly set bits in MSR before we invoke the handler.
>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>> +     */
>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>> +        msr |= (1ULL << MSR_LE);
>> +    }
>> +
>> +    if (env->msr & (1ULL << MSR_SF)) {
>> +        msr |= (1ULL << MSR_SF);
>> +    }
>> +
>> +    msr |= (1ULL << MSR_ME);
>> +
>> +    if (spapr->guest_machine_check_addr == -1) {
>> +        /*
>> +         * This implies that we have hit a machine check between system
>> +         * reset and "ibm,nmi-register". Fall back to the old machine
>> +         * check behavior in such cases.
>> +         */
>> +        env->spr[SPR_SRR0] = env->nip;
>> +        env->spr[SPR_SRR1] = env->msr;
>> +        env->msr = msr;
>> +        env->nip = 0x200;
>> +        return;
>> +    }
>> +
>> +    ext_elog = g_malloc0(sizeof(*ext_elog));
>> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
>> +
>> +    log.summary = cpu_to_be32(summary);
>> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
>> +
>> +    /* r3 should be in BE always */
>> +    r3 = cpu_to_be64(env->gpr[3]);
>> +    env->msr = msr;
>> +
>> +    spapr_init_v6hdr(&ext_elog->v6hdr);
>> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
>> +    ext_elog->mc.hdr.section_length =
>> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
>> +    ext_elog->mc.hdr.section_version = 1;
>> +
>> +    /* get rtas addr from fdt */
>> +    rtas_addr = spapr_get_rtas_addr();
>> +    if (!rtas_addr) {
>> +        /* Unable to fetch rtas_addr. Hence reset the guest */
>> +        ppc_cpu_do_system_reset(cs);
>> +    }
>> +
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
>> +                              sizeof(r3));
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
>> +                              &log, sizeof(log));
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
>> +                              sizeof(log), ext_elog,
>> +                              sizeof(*ext_elog));
>> +
>> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
>> +    env->nip = spapr->guest_machine_check_addr;
>> +
>> +    g_free(ext_elog);
>> +}
>> +
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>  {
>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> @@ -641,6 +873,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>          }
>>      }
>>      spapr->mc_status = cpu->vcpu_id;
>> +
>> +    spapr_mce_dispatch_elog(cpu, recovered);
>> +
>> +    return;
> 
> Drop the last two lines.

ok.

> 
>>  }
>>  
>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index fc3a776..c717ab2 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
>>  
>>  #define RTAS_ERROR_LOG_MAX      2048
>>  
>> +/* Offset from rtas-base where error log is placed */
>> +#define RTAS_ERROR_LOG_OFFSET       0x30
>> +
>>  #define RTAS_EVENT_SCAN_RATE    1
>>  
>>  /* This helper should be used to encode interrupt specifiers when the related
>> @@ -799,6 +802,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                        uint64_t pte0, uint64_t pte1);
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
>>  
> 
> Looks like a leftover.

ah.. yes.

> 
>>  /* DRC callbacks. */
>>  void spapr_core_release(DeviceState *dev);
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability
  2019-06-03 15:25   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-06-04  6:45     ` Aravinda Prasad
  2019-06-06  3:02       ` David Gibson
  0 siblings, 1 reply; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-04  6:45 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Monday 03 June 2019 08:55 PM, Greg Kurz wrote:
> On Wed, 29 May 2019 11:10:49 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
>> the KVM causes guest exit with NMI as exit reason
>> when it encounters a machine check exception on the
>> address belonging to a guest. Without this capability
>> enabled, KVM redirects machine check exceptions to
>> guest's 0x200 vector.
>>
>> This patch also deals with the case when a guest with
>> the KVM_CAP_PPC_FWNMI capability enabled is attempted
>> to migrate to a host that does not support this
>> capability.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
> 
> As suggested in another mail, it may be worth introducing the sPAPR cap
> in its own patch, earlier in the series.

Sure, also as a workaround mentioned in the reply to that mail, I am
thinking of returning RTAS_OUT_NOT_SUPPORTED to rtas nmi register call
until the entire functionality is implemented. This will help solve
spapr cap issue as well.

> 
> Anyway, I have some comments below.
> 
>>  hw/ppc/spapr.c         |    1 +
>>  hw/ppc/spapr_caps.c    |   24 ++++++++++++++++++++++++
>>  hw/ppc/spapr_rtas.c    |   18 ++++++++++++++++++
>>  include/hw/ppc/spapr.h |    4 +++-
>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>>  6 files changed, 77 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index c97f6a6..e8a77636 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -4364,6 +4364,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>>      spapr_caps_add_properties(smc, &error_abort);
>>      smc->irq = &spapr_irq_dual;
>>      smc->dr_phb_enabled = true;
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index 31b4661..ef9e612 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -479,6 +479,20 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>      }
>>  }
>>  
>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>> +                                Error **errp)
>> +{
>> +    if (!val) {
>> +        return; /* Disabled by default */
>> +    }
>> +
>> +    if (tcg_enabled()) {
>> +            error_setg(errp, "No fwnmi support in TCG, try cap-fwnmi-mce=off");
> 
> Maybe expand "fwnmi" to "Firmware Assisted Non-Maskable Interrupts" ?

sure..

> 
>> +    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
>> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
> 
> Maybe reword and add a hint:
> 
> "KVM implementation does not support Firmware Assisted Non-Maskable Interrupts, try cap-fwnmi-mce=off"

sure..

> 
> 
>> +    }
>> +}
>> +
>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>      [SPAPR_CAP_HTM] = {
>>          .name = "htm",
>> @@ -578,6 +592,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>          .type = "bool",
>>          .apply = cap_ccf_assist_apply,
>>      },
>> +    [SPAPR_CAP_FWNMI_MCE] = {
>> +        .name = "fwnmi-mce",
>> +        .description = "Handle fwnmi machine check exceptions",
>> +        .index = SPAPR_CAP_FWNMI_MCE,
>> +        .get = spapr_cap_get_bool,
>> +        .set = spapr_cap_set_bool,
>> +        .type = "bool",
>> +        .apply = cap_fwnmi_mce_apply,
>> +    },
>>  };
>>  
>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
>> @@ -717,6 +740,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>>  
>>  void spapr_caps_init(SpaprMachineState *spapr)
>>  {
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index e0bdfc8..91a7ab9 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -49,6 +49,7 @@
>>  #include "hw/ppc/fdt.h"
>>  #include "target/ppc/mmu-hash64.h"
>>  #include "target/ppc/mmu-book3s-v3.h"
>> +#include "kvm_ppc.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -358,6 +359,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>                                    target_ulong args,
>>                                    uint32_t nret, target_ulong rets)
>>  {
>> +    int ret;
>>      hwaddr rtas_addr = spapr_get_rtas_addr();
>>  
>>      if (!rtas_addr) {
>> @@ -365,6 +367,22 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>          return;
>>      }
>>  
>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == 0) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    ret = kvmppc_fwnmi_enable(cpu);
>> +    if (ret == 1) {
> 
> I have the impression that this should really not happen,
> otherwise something has gone terribly wrong in QEMU or
> in KVM... this maybe deserves an error message as well ?
> No big deal.

I think so..  will add an error message.

Also I should check for non zero return value, not just ret == 1, as
kvmppc_fwnmi_enable() returns the error value from ioctl().


> 
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    if (ret < 0) {
>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> +        return;
>> +    }
>> +
>>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>  }
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index c717ab2..bd75d4b 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -78,8 +78,10 @@ typedef enum {
>>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>>  /* Count Cache Flush Assist HW Instruction */
>>  #define SPAPR_CAP_CCF_ASSIST            0x09
>> +/* FWNMI machine check handling */
>> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>>  /* Num Caps */
>> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
>> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>>  
>>  /*
>>   * Capability Values
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 39f1a73..368ec6e 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
>>  static int cap_ppc_count_cache_flush_assist;
>>  static int cap_ppc_nested_kvm_hv;
>>  static int cap_large_decr;
>> +static int cap_ppc_fwnmi;
>>  
>>  static uint32_t debug_inst_opcode;
>>  
>> @@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>      kvmppc_get_cpu_characteristics(s);
>>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>>      cap_large_decr = kvmppc_get_dec_bits();
>> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>>      /*
>>       * Note: setting it to false because there is not such capability
>>       * in KVM at this moment.
>> @@ -2119,6 +2121,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>      }
>>  }
>>  
>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>> +{
>> +    CPUState *cs = CPU(cpu);
>> +
>> +    if (!cap_ppc_fwnmi) {
>> +        return 1;
>> +    }
>> +
>> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
>> +}
>> +
>> +
>>  int kvmppc_smt_threads(void)
>>  {
>>      return cap_ppc_smt ? cap_ppc_smt : 1;
>> @@ -2419,6 +2433,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>>      return cap_mmu_hash_v3;
>>  }
>>  
>> +bool kvmppc_has_cap_ppc_fwnmi(void)
>> +{
>> +    return cap_ppc_fwnmi;
>> +}
>> +
>>  static bool kvmppc_power8_host(void)
>>  {
>>      bool ret = false;
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index 18693f1..3d9f0b4 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
>> @@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
>>  void kvmppc_set_papr(PowerPCCPU *cpu);
>>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
>> +bool kvmppc_has_cap_ppc_fwnmi(void);
>>  int kvmppc_smt_threads(void);
>>  void kvmppc_hint_smt_possible(Error **errp);
>>  int kvmppc_set_smt_threads(int smt);
>> @@ -160,6 +162,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>  {
>>  }
>>  
>> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>> +{
>> +    return 1;
>> +}
>> +
>> +static inline bool kvmppc_has_cap_ppc_fwnmi(void)
>> +{
>> +    return false;
>> +}
>> +
>>  static inline int kvmppc_smt_threads(void)
>>  {
>>      return 1;
>>
>>
> 
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-03 15:40   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-06-04  7:04     ` Aravinda Prasad
  2019-06-04 20:04       ` Greg Kurz
  0 siblings, 1 reply; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-04  7:04 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Monday 03 June 2019 09:10 PM, Greg Kurz wrote:
> On Wed, 29 May 2019 11:10:57 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> This patch includes migration support for machine check
>> handling. Especially this patch blocks VM migration
>> requests until the machine check error handling is
>> complete as (i) these errors are specific to the source
>> hardware and is irrelevant on the target hardware,
>> (ii) these errors cause data corruption and should
>> be handled before migration.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
> 
> LGTM, just one issue: machine reset should del and free the blocker as well,
> otherwise QEMU would crash if spapr_mce_req_event() is called again.

Sure.


> 
>>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>>  hw/ppc/spapr_rtas.c    |    4 ++++
>>  include/hw/ppc/spapr.h |    2 ++
>>  4 files changed, 43 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index e8a77636..31c4850 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
>>      },
>>  };
>>  
>> +static bool spapr_fwnmi_needed(void *opaque)
>> +{
>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
>> +
>> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;
>> +}
>> +
>> +static const VMStateDescription vmstate_spapr_machine_check = {
>> +    .name = "spapr_machine_check",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = spapr_fwnmi_needed,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  static const VMStateDescription vmstate_spapr = {
>>      .name = "spapr",
>>      .version_id = 3,
>> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
>>          &vmstate_spapr_dtb,
>>          &vmstate_spapr_cap_large_decr,
>>          &vmstate_spapr_cap_ccf_assist,
>> +        &vmstate_spapr_machine_check,
>>          NULL
>>      }
>>  };
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 573c0b7..35e21e4 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -41,6 +41,7 @@
>>  #include "qemu/bcd.h"
>>  #include "hw/ppc/spapr_ovec.h"
>>  #include <libfdt.h>
>> +#include "migration/blocker.h"
>>  
>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>>  #define   RTAS_LOG_VERSION_6                    0x06000000
>> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>  {
>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int ret;
>> +    Error *local_err = NULL;
>> +
>> +    error_setg(&spapr->fwnmi_migration_blocker,
>> +            "Live migration not supported during machine check handling");
>> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
>> +    if (ret < 0) {
>> +        /*
>> +         * We don't want to abort and let the migration to continue. In a
>> +         * rare case, the machine check handler will run on the target
>> +         * hardware. Though this is not preferable, it is better than aborting
>> +         * the migration or killing the VM.
>> +         */
>> +        error_free(spapr->fwnmi_migration_blocker);
>> +        warn_report_err(local_err);
>> +    }
>>  
>>      while (spapr->mc_status != -1) {
>>          /*
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 91a7ab9..c849223 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -50,6 +50,7 @@
>>  #include "target/ppc/mmu-hash64.h"
>>  #include "target/ppc/mmu-book3s-v3.h"
>>  #include "kvm_ppc.h"
>> +#include "migration/blocker.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>          spapr->mc_status = -1;
>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
>> +        error_free(spapr->fwnmi_migration_blocker);
>> +        spapr->fwnmi_migration_blocker = NULL;
>>      }
>>  }
>>  
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index bd75d4b..6c0cfd8 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -214,6 +214,8 @@ struct SpaprMachineState {
>>      SpaprCapabilities def, eff, mig;
>>  
>>      unsigned gpu_numa_id;
>> +
>> +    Error *fwnmi_migration_blocker;
>>  };
>>  
>>  #define H_SUCCESS         0
>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE
  2019-06-04  6:29     ` Aravinda Prasad
@ 2019-06-04  9:01       ` Greg Kurz
  2019-06-04 10:10         ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
  2019-06-06  2:58         ` [Qemu-devel] " David Gibson
  2019-06-06  2:57       ` David Gibson
  1 sibling, 2 replies; 44+ messages in thread
From: Greg Kurz @ 2019-06-04  9:01 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Tue, 4 Jun 2019 11:59:13 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> On Monday 03 June 2019 07:30 PM, Greg Kurz wrote:
> > On Wed, 29 May 2019 11:10:40 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> >   
> >> Upon a machine check exception (MCE) in a guest address space,
> >> KVM causes a guest exit to enable QEMU to build and pass the
> >> error to the guest in the PAPR defined rtas error log format.
> >>
> >> This patch builds the rtas error log, copies it to the rtas_addr
> >> and then invokes the guest registered machine check handler. The
> >> handler in the guest takes suitable action(s) depending on the type
> >> and criticality of the error. For example, if an error is
> >> unrecoverable memory corruption in an application inside the
> >> guest, then the guest kernel sends a SIGBUS to the application.
> >> For recoverable errors, the guest performs recovery actions and
> >> logs the error.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |    5 +
> >>  hw/ppc/spapr_events.c  |  236 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr.h |    4 +
> >>  3 files changed, 245 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 6b6c962..c97f6a6 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2910,6 +2910,11 @@ static void spapr_machine_init(MachineState *machine)
> >>          error_report("Could not get size of LPAR rtas '%s'", filename);
> >>          exit(1);
> >>      }
> >> +
> >> +    /* Resize blob to accommodate error log. */
> >> +    g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);  
> > 
> > I don't see the point of this assertion... especially with the assignment
> > below.  
> 
> It is required because we want to ensure that the rtas image size is
> less than RTAS_ERROR_LOG_OFFSET, or else we will overwrite the rtas
> image with rtas error when we hit machine check exception. But I think a
> comment in the code will help. Will add it.
> 

I'd rather exit QEMU properly instead of aborting then. Also this is only
needed if the guest has a chance to use FWNMI, ie. the spapr cap is set.

> 
> >   
> >> +    spapr->rtas_size = RTAS_ERROR_LOG_MAX;  
> > 
> > As requested by David, this should only be done when the spapr cap is set,
> > so that 4.0 machine types and older continue to use the current size.  
> 
> Due to other issue of re-allocating the blob and as this is not that
> much space, we agreed to keep the size to RTAS_ERROR_LOG_MAX always.
> 
> Link to the discussion on this:
> http://lists.nongnu.org/archive/html/qemu-ppc/2019-05/msg00275.html
> 

Indeed, and in the next mail in that thread, David writes:

> No, that's not right.  It's impractical to change the allocation
> depending on whether fwnmi is currently active.  But you *can* (and
> should) base the allocation on whether fwnmi is *possible* - that is,
> the value of the spapr cap.

ie, allocate RTAS_ERROR_LOG_MAX when the spapr cap is set, allocate
the file size otherwise.

> >   
> >> +
> >>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
> >>          error_report("Could not load LPAR rtas '%s'", filename);
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index a18446b..573c0b7 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -212,6 +212,106 @@ struct hp_extended_log {
> >>      struct rtas_event_log_v6_hp hp;
> >>  } QEMU_PACKED;
> >>  
> >> +struct rtas_event_log_v6_mc {
> >> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> >> +    struct rtas_event_log_v6_section_header hdr;
> >> +    uint32_t fru_id;
> >> +    uint32_t proc_id;
> >> +    uint8_t error_type;
> >> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> >> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> >> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> >> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> >> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> >> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> >> +    uint8_t sub_err_type;
> >> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> >> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> >> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> >> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> >> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> >> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> >> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> >> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> >> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> >> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> >> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> >> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> >> +    uint8_t reserved_1[6];
> >> +    uint64_t effective_address;
> >> +    uint64_t logical_address;
> >> +} QEMU_PACKED;
> >> +
> >> +struct mc_extended_log {
> >> +    struct rtas_event_log_v6 v6hdr;
> >> +    struct rtas_event_log_v6_mc mc;
> >> +} QEMU_PACKED;
> >> +
> >> +struct MC_ierror_table {
> >> +    unsigned long srr1_mask;
> >> +    unsigned long srr1_value;
> >> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> >> +    uint8_t error_type;
> >> +    uint8_t error_subtype;
> >> +    unsigned int initiator;
> >> +    unsigned int severity;
> >> +};
> >> +
> >> +static const struct MC_ierror_table mc_ierror_table[] = {
> >> +{ 0x00000000081c0000, 0x0000000000040000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000080000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000100000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000140000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000180000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0, 0, 0, 0, 0, 0 } };
> >> +
> >> +struct MC_derror_table {
> >> +    unsigned long dsisr_value;
> >> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> >> +    uint8_t error_type;
> >> +    uint8_t error_subtype;
> >> +    unsigned int initiator;
> >> +    unsigned int severity;
> >> +};
> >> +
> >> +static const struct MC_derror_table mc_derror_table[] = {
> >> +{ 0x00008000, false,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00004000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000800, true,
> >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000400, true,
> >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000080, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000100, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0, false, 0, 0, 0, 0 } };
> >> +
> >> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> >> +
> >>  typedef enum EventClass {
> >>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
> >>      EVENT_CLASS_EPOW                = 1,
> >> @@ -620,6 +720,138 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> >>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> >>  }
> >>  
> >> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> >> +                                        struct mc_extended_log *ext_elog)
> >> +{
> >> +    int i;
> >> +    CPUPPCState *env = &cpu->env;
> >> +    uint32_t summary;
> >> +    uint64_t dsisr = env->spr[SPR_DSISR];
> >> +
> >> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> >> +    if (recovered) {
> >> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> >> +    } else {
> >> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> >> +    }
> >> +
> >> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> >> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> >> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> >> +                continue;
> >> +            }
> >> +
> >> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> >> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> >> +            if (mc_derror_table[i].dar_valid) {
> >> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> >> +            }
> >> +
> >> +            summary |= mc_derror_table[i].initiator
> >> +                        | mc_derror_table[i].severity;
> >> +
> >> +            return summary;
> >> +        }
> >> +    } else {
> >> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> >> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> >> +                    mc_ierror_table[i].srr1_value) {
> >> +                continue;
> >> +            }
> >> +
> >> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> >> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> >> +            if (mc_ierror_table[i].nip_valid) {
> >> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> >> +            }
> >> +
> >> +            summary |= mc_ierror_table[i].initiator
> >> +                        | mc_ierror_table[i].severity;
> >> +
> >> +            return summary;
> >> +        }
> >> +    }
> >> +
> >> +    summary |= RTAS_LOG_INITIATOR_CPU;
> >> +    return summary;
> >> +}
> >> +
> >> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >> +{
> >> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +    CPUState *cs = CPU(cpu);
> >> +    uint64_t rtas_addr;
> >> +    CPUPPCState *env = &cpu->env;
> >> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> >> +    target_ulong r3, msr = 0;
> >> +    struct rtas_error_log log;
> >> +    struct mc_extended_log *ext_elog;
> >> +    uint32_t summary;
> >> +
> >> +    /*
> >> +     * Properly set bits in MSR before we invoke the handler.
> >> +     * SRR0/1, DAR and DSISR are properly set by KVM
> >> +     */
> >> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> >> +        msr |= (1ULL << MSR_LE);
> >> +    }
> >> +
> >> +    if (env->msr & (1ULL << MSR_SF)) {
> >> +        msr |= (1ULL << MSR_SF);
> >> +    }
> >> +
> >> +    msr |= (1ULL << MSR_ME);
> >> +
> >> +    if (spapr->guest_machine_check_addr == -1) {
> >> +        /*
> >> +         * This implies that we have hit a machine check between system
> >> +         * reset and "ibm,nmi-register". Fall back to the old machine
> >> +         * check behavior in such cases.
> >> +         */
> >> +        env->spr[SPR_SRR0] = env->nip;
> >> +        env->spr[SPR_SRR1] = env->msr;
> >> +        env->msr = msr;
> >> +        env->nip = 0x200;
> >> +        return;
> >> +    }
> >> +
> >> +    ext_elog = g_malloc0(sizeof(*ext_elog));
> >> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> >> +
> >> +    log.summary = cpu_to_be32(summary);
> >> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
> >> +
> >> +    /* r3 should be in BE always */
> >> +    r3 = cpu_to_be64(env->gpr[3]);
> >> +    env->msr = msr;
> >> +
> >> +    spapr_init_v6hdr(&ext_elog->v6hdr);
> >> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> >> +    ext_elog->mc.hdr.section_length =
> >> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> >> +    ext_elog->mc.hdr.section_version = 1;
> >> +
> >> +    /* get rtas addr from fdt */
> >> +    rtas_addr = spapr_get_rtas_addr();
> >> +    if (!rtas_addr) {
> >> +        /* Unable to fetch rtas_addr. Hence reset the guest */
> >> +        ppc_cpu_do_system_reset(cs);
> >> +    }
> >> +
> >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
> >> +                              sizeof(r3));
> >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
> >> +                              &log, sizeof(log));
> >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
> >> +                              sizeof(log), ext_elog,
> >> +                              sizeof(*ext_elog));
> >> +
> >> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
> >> +    env->nip = spapr->guest_machine_check_addr;
> >> +
> >> +    g_free(ext_elog);
> >> +}
> >> +
> >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>  {
> >>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> @@ -641,6 +873,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>          }
> >>      }
> >>      spapr->mc_status = cpu->vcpu_id;
> >> +
> >> +    spapr_mce_dispatch_elog(cpu, recovered);
> >> +
> >> +    return;  
> > 
> > Drop the last two lines.  
> 
> ok.
> 
> >   
> >>  }
> >>  
> >>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index fc3a776..c717ab2 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
> >>  
> >>  #define RTAS_ERROR_LOG_MAX      2048
> >>  
> >> +/* Offset from rtas-base where error log is placed */
> >> +#define RTAS_ERROR_LOG_OFFSET       0x30
> >> +
> >>  #define RTAS_EVENT_SCAN_RATE    1
> >>  
> >>  /* This helper should be used to encode interrupt specifiers when the related
> >> @@ -799,6 +802,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
> >>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> >>                        uint64_t pte0, uint64_t pte1);
> >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> >> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
> >>    
> > 
> > Looks like a leftover.  
> 
> ah.. yes.
> 
> >   
> >>  /* DRC callbacks. */
> >>  void spapr_core_release(DeviceState *dev);
> >>  
> >   
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE
  2019-06-04  9:01       ` Greg Kurz
@ 2019-06-04 10:10         ` Aravinda Prasad
  2019-06-06  2:58         ` [Qemu-devel] " David Gibson
  1 sibling, 0 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-04 10:10 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Tuesday 04 June 2019 02:31 PM, Greg Kurz wrote:
> On Tue, 4 Jun 2019 11:59:13 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> On Monday 03 June 2019 07:30 PM, Greg Kurz wrote:
>>> On Wed, 29 May 2019 11:10:40 +0530
>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>>   
>>>> Upon a machine check exception (MCE) in a guest address space,
>>>> KVM causes a guest exit to enable QEMU to build and pass the
>>>> error to the guest in the PAPR defined rtas error log format.
>>>>
>>>> This patch builds the rtas error log, copies it to the rtas_addr
>>>> and then invokes the guest registered machine check handler. The
>>>> handler in the guest takes suitable action(s) depending on the type
>>>> and criticality of the error. For example, if an error is
>>>> unrecoverable memory corruption in an application inside the
>>>> guest, then the guest kernel sends a SIGBUS to the application.
>>>> For recoverable errors, the guest performs recovery actions and
>>>> logs the error.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/ppc/spapr.c         |    5 +
>>>>  hw/ppc/spapr_events.c  |  236 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/ppc/spapr.h |    4 +
>>>>  3 files changed, 245 insertions(+)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 6b6c962..c97f6a6 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -2910,6 +2910,11 @@ static void spapr_machine_init(MachineState *machine)
>>>>          error_report("Could not get size of LPAR rtas '%s'", filename);
>>>>          exit(1);
>>>>      }
>>>> +
>>>> +    /* Resize blob to accommodate error log. */
>>>> +    g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);  
>>>
>>> I don't see the point of this assertion... especially with the assignment
>>> below.  
>>
>> It is required because we want to ensure that the rtas image size is
>> less than RTAS_ERROR_LOG_OFFSET, or else we will overwrite the rtas
>> image with rtas error when we hit machine check exception. But I think a
>> comment in the code will help. Will add it.
>>
> 
> I'd rather exit QEMU properly instead of aborting then. Also this is only
> needed if the guest has a chance to use FWNMI, ie. the spapr cap is set.

ok..

> 
>>
>>>   
>>>> +    spapr->rtas_size = RTAS_ERROR_LOG_MAX;  
>>>
>>> As requested by David, this should only be done when the spapr cap is set,
>>> so that 4.0 machine types and older continue to use the current size.  
>>
>> Due to other issue of re-allocating the blob and as this is not that
>> much space, we agreed to keep the size to RTAS_ERROR_LOG_MAX always.
>>
>> Link to the discussion on this:
>> http://lists.nongnu.org/archive/html/qemu-ppc/2019-05/msg00275.html
>>
> 
> Indeed, and in the next mail in that thread, David writes:
> 
>> No, that's not right.  It's impractical to change the allocation
>> depending on whether fwnmi is currently active.  But you *can* (and
>> should) base the allocation on whether fwnmi is *possible* - that is,
>> the value of the spapr cap.
> 
> ie, allocate RTAS_ERROR_LOG_MAX when the spapr cap is set, allocate
> the file size otherwise.

Ah.. somehow this slipped off my mind...

Regards,
Aravinda

> 
>>>   
>>>> +
>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>>>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>>>>          error_report("Could not load LPAR rtas '%s'", filename);
>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>> index a18446b..573c0b7 100644
>>>> --- a/hw/ppc/spapr_events.c
>>>> +++ b/hw/ppc/spapr_events.c
>>>> @@ -212,6 +212,106 @@ struct hp_extended_log {
>>>>      struct rtas_event_log_v6_hp hp;
>>>>  } QEMU_PACKED;
>>>>  
>>>> +struct rtas_event_log_v6_mc {
>>>> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
>>>> +    struct rtas_event_log_v6_section_header hdr;
>>>> +    uint32_t fru_id;
>>>> +    uint32_t proc_id;
>>>> +    uint8_t error_type;
>>>> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
>>>> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
>>>> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
>>>> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
>>>> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
>>>> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
>>>> +    uint8_t sub_err_type;
>>>> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
>>>> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
>>>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
>>>> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
>>>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
>>>> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
>>>> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
>>>> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
>>>> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
>>>> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
>>>> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
>>>> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
>>>> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
>>>> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
>>>> +    uint8_t reserved_1[6];
>>>> +    uint64_t effective_address;
>>>> +    uint64_t logical_address;
>>>> +} QEMU_PACKED;
>>>> +
>>>> +struct mc_extended_log {
>>>> +    struct rtas_event_log_v6 v6hdr;
>>>> +    struct rtas_event_log_v6_mc mc;
>>>> +} QEMU_PACKED;
>>>> +
>>>> +struct MC_ierror_table {
>>>> +    unsigned long srr1_mask;
>>>> +    unsigned long srr1_value;
>>>> +    bool nip_valid; /* nip is a valid indicator of faulting address */
>>>> +    uint8_t error_type;
>>>> +    uint8_t error_subtype;
>>>> +    unsigned int initiator;
>>>> +    unsigned int severity;
>>>> +};
>>>> +
>>>> +static const struct MC_ierror_table mc_ierror_table[] = {
>>>> +{ 0x00000000081c0000, 0x0000000000040000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x0000000000080000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x00000000000c0000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x0000000000100000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x0000000000140000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000000081c0000, 0x0000000000180000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0, 0, 0, 0, 0, 0 } };
>>>> +
>>>> +struct MC_derror_table {
>>>> +    unsigned long dsisr_value;
>>>> +    bool dar_valid; /* dar is a valid indicator of faulting address */
>>>> +    uint8_t error_type;
>>>> +    uint8_t error_subtype;
>>>> +    unsigned int initiator;
>>>> +    unsigned int severity;
>>>> +};
>>>> +
>>>> +static const struct MC_derror_table mc_derror_table[] = {
>>>> +{ 0x00008000, false,
>>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00004000, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000800, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000400, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000080, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0x00000100, true,
>>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>>> +{ 0, false, 0, 0, 0, 0 } };
>>>> +
>>>> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
>>>> +
>>>>  typedef enum EventClass {
>>>>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>>>>      EVENT_CLASS_EPOW                = 1,
>>>> @@ -620,6 +720,138 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>>>  }
>>>>  
>>>> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>>>> +                                        struct mc_extended_log *ext_elog)
>>>> +{
>>>> +    int i;
>>>> +    CPUPPCState *env = &cpu->env;
>>>> +    uint32_t summary;
>>>> +    uint64_t dsisr = env->spr[SPR_DSISR];
>>>> +
>>>> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
>>>> +    if (recovered) {
>>>> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
>>>> +    } else {
>>>> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
>>>> +    }
>>>> +
>>>> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
>>>> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
>>>> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
>>>> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
>>>> +            if (mc_derror_table[i].dar_valid) {
>>>> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
>>>> +            }
>>>> +
>>>> +            summary |= mc_derror_table[i].initiator
>>>> +                        | mc_derror_table[i].severity;
>>>> +
>>>> +            return summary;
>>>> +        }
>>>> +    } else {
>>>> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
>>>> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
>>>> +                    mc_ierror_table[i].srr1_value) {
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
>>>> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
>>>> +            if (mc_ierror_table[i].nip_valid) {
>>>> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
>>>> +            }
>>>> +
>>>> +            summary |= mc_ierror_table[i].initiator
>>>> +                        | mc_ierror_table[i].severity;
>>>> +
>>>> +            return summary;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    summary |= RTAS_LOG_INITIATOR_CPU;
>>>> +    return summary;
>>>> +}
>>>> +
>>>> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>>> +{
>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>> +    CPUState *cs = CPU(cpu);
>>>> +    uint64_t rtas_addr;
>>>> +    CPUPPCState *env = &cpu->env;
>>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>>> +    target_ulong r3, msr = 0;
>>>> +    struct rtas_error_log log;
>>>> +    struct mc_extended_log *ext_elog;
>>>> +    uint32_t summary;
>>>> +
>>>> +    /*
>>>> +     * Properly set bits in MSR before we invoke the handler.
>>>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>>>> +     */
>>>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>>>> +        msr |= (1ULL << MSR_LE);
>>>> +    }
>>>> +
>>>> +    if (env->msr & (1ULL << MSR_SF)) {
>>>> +        msr |= (1ULL << MSR_SF);
>>>> +    }
>>>> +
>>>> +    msr |= (1ULL << MSR_ME);
>>>> +
>>>> +    if (spapr->guest_machine_check_addr == -1) {
>>>> +        /*
>>>> +         * This implies that we have hit a machine check between system
>>>> +         * reset and "ibm,nmi-register". Fall back to the old machine
>>>> +         * check behavior in such cases.
>>>> +         */
>>>> +        env->spr[SPR_SRR0] = env->nip;
>>>> +        env->spr[SPR_SRR1] = env->msr;
>>>> +        env->msr = msr;
>>>> +        env->nip = 0x200;
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    ext_elog = g_malloc0(sizeof(*ext_elog));
>>>> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
>>>> +
>>>> +    log.summary = cpu_to_be32(summary);
>>>> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
>>>> +
>>>> +    /* r3 should be in BE always */
>>>> +    r3 = cpu_to_be64(env->gpr[3]);
>>>> +    env->msr = msr;
>>>> +
>>>> +    spapr_init_v6hdr(&ext_elog->v6hdr);
>>>> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
>>>> +    ext_elog->mc.hdr.section_length =
>>>> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
>>>> +    ext_elog->mc.hdr.section_version = 1;
>>>> +
>>>> +    /* get rtas addr from fdt */
>>>> +    rtas_addr = spapr_get_rtas_addr();
>>>> +    if (!rtas_addr) {
>>>> +        /* Unable to fetch rtas_addr. Hence reset the guest */
>>>> +        ppc_cpu_do_system_reset(cs);
>>>> +    }
>>>> +
>>>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
>>>> +                              sizeof(r3));
>>>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
>>>> +                              &log, sizeof(log));
>>>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
>>>> +                              sizeof(log), ext_elog,
>>>> +                              sizeof(*ext_elog));
>>>> +
>>>> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
>>>> +    env->nip = spapr->guest_machine_check_addr;
>>>> +
>>>> +    g_free(ext_elog);
>>>> +}
>>>> +
>>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>>  {
>>>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>> @@ -641,6 +873,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>>          }
>>>>      }
>>>>      spapr->mc_status = cpu->vcpu_id;
>>>> +
>>>> +    spapr_mce_dispatch_elog(cpu, recovered);
>>>> +
>>>> +    return;  
>>>
>>> Drop the last two lines.  
>>
>> ok.
>>
>>>   
>>>>  }
>>>>  
>>>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index fc3a776..c717ab2 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
>>>>  
>>>>  #define RTAS_ERROR_LOG_MAX      2048
>>>>  
>>>> +/* Offset from rtas-base where error log is placed */
>>>> +#define RTAS_ERROR_LOG_OFFSET       0x30
>>>> +
>>>>  #define RTAS_EVENT_SCAN_RATE    1
>>>>  
>>>>  /* This helper should be used to encode interrupt specifiers when the related
>>>> @@ -799,6 +802,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
>>>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>>>                        uint64_t pte0, uint64_t pte1);
>>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>>> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
>>>>    
>>>
>>> Looks like a leftover.  
>>
>> ah.. yes.
>>
>>>   
>>>>  /* DRC callbacks. */
>>>>  void spapr_core_release(DeviceState *dev);
>>>>  
>>>   
>>
> 
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-06-04  6:08       ` Aravinda Prasad
@ 2019-06-04 14:50         ` Greg Kurz
  2019-06-06  5:17           ` Aravinda Prasad
  0 siblings, 1 reply; 44+ messages in thread
From: Greg Kurz @ 2019-06-04 14:50 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Tue, 4 Jun 2019 11:38:31 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> On Monday 03 June 2019 04:47 PM, Greg Kurz wrote:
> > On Mon, 3 Jun 2019 12:12:43 +0200
> > Greg Kurz <groug@kaod.org> wrote:
> >   
> >> On Wed, 29 May 2019 11:10:14 +0530
> >> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> >>  
> >>> This patch adds support in QEMU to handle "ibm,nmi-register"
> >>> and "ibm,nmi-interlock" RTAS calls.
> >>>
> >>> The machine check notification address is saved when the
> >>> OS issues "ibm,nmi-register" RTAS call.
> >>>
> >>> This patch also handles the case when multiple processors
> >>> experience machine check at or about the same time by
> >>> handling "ibm,nmi-interlock" call. In such cases, as per
> >>> PAPR, subsequent processors serialize waiting for the first
> >>> processor to issue the "ibm,nmi-interlock" call. The second
> >>> processor that also received a machine check error waits
> >>> till the first processor is done reading the error log.
> >>> The first processor issues "ibm,nmi-interlock" call
> >>> when the error log is consumed. This patch implements the
> >>> releasing part of the error-log while subsequent patch
> >>> (which builds error log) handles the locking part.
> >>>
> >>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> >>> ---    
> >>
> >> The code looks okay but it still seems wrong to advertise the RTAS
> >> calls to the guest that early in the series. The linux kernel in
> >> the guest will assume FWNMI is functional, which isn't true until
> >> patch 6 (yes, migration is part of the feature, it should be
> >> supported upfront, not fixed afterwards).
> >>
> >> It doesn't help much to introduce the RTAS calls early and to
> >> modify them in the other patches. I'd rather see the rest of
> >> the code first and a final patch that introduces the fully
> >> functional RTAS calls and calls spapr_rtas_register().
> >>  
> > 
> > Thinking again, you should introduce the "fwnmi-mce" spapr capability in
> > its own patch first, default to "off" and and have the last patch in the
> > series to switch the default to "on" for newer machine types only.
> > 
> > This patch should then only register the RTAS calls if "fwnmi-mcr" is set
> > to "on".
> > 
> > This should address the fact that we don't want to expose a partially
> > implemented FWNMI feature to the guest, and we don't want to support
> > FWNMI at all with older machine types for the sake of compatibility.  
> 
> When you say "expose a partially implemented FWNMI feature to the
> guest", do you mean while debugging/bisect we may end up with exposing
> the partially implemented FWNMI feature? Otherwise it is expected that

Yes, we don't want to break someone else's bisect.

> QEMU runs with all the 6 patches.
> 
> If that is the case, I will have the rtas nmi register functionality as
> the last patch in the series. This way we don't have to have spapr cap
> turned off first and later turned on. However, as mentioned earlier
> (when David raised the same concern), use of guest_machine_check_addr
> may look odd at other patches as it is set only during rtas nmi register.
> 

Why odd ?

> Or else, as a workaround, I can return RTAS_OUT_NOT_SUPPORTED for rtas
> nmi register till the entire functionality is implemented and only in
> the last patch in the series I will return RTAS_OUT_SUCCESS. This will
> ensure that we have a logical connection between the patches and the
> partially implemented fwnmi is not exposed to the guest kernel.
> 

Not exactly true. FWNMI would be exposed to the guest in the device tree
and the guest kernel would _just_ fail to set the fwnmi_active global:

	if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
				machine_check_addr))
		fwnmi_active = 1;

> Regards,
> Aravinda
> 
> 
> 
> 
> >   
> >>>  hw/ppc/spapr.c         |    7 +++++
> >>>  hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  include/hw/ppc/spapr.h |    9 ++++++-
> >>>  3 files changed, 80 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>> index e2b33e5..fae28a9 100644
> >>> --- a/hw/ppc/spapr.c
> >>> +++ b/hw/ppc/spapr.c
> >>> @@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
> >>>      first_ppc_cpu->env.gpr[5] = 0;
> >>>  
> >>>      spapr->cas_reboot = false;
> >>> +
> >>> +    spapr->guest_machine_check_addr = -1;
> >>> +
> >>> +    /* Signal all vCPUs waiting on this condition */
> >>> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
> >>>  }
> >>>  
> >>>  static void spapr_create_nvram(SpaprMachineState *spapr)
> >>> @@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
> >>>  
> >>>          kvmppc_spapr_enable_inkernel_multitce();
> >>>      }
> >>> +
> >>> +    qemu_cond_init(&spapr->mc_delivery_cond);
> >>>  }
> >>>  
> >>>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> >>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>> index 5bc1a93..e7509cf 100644
> >>> --- a/hw/ppc/spapr_rtas.c
> >>> +++ b/hw/ppc/spapr_rtas.c
> >>> @@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>      rtas_st(rets, 1, 100);
> >>>  }
> >>>  
> >>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>> +                                  SpaprMachineState *spapr,
> >>> +                                  uint32_t token, uint32_t nargs,
> >>> +                                  target_ulong args,
> >>> +                                  uint32_t nret, target_ulong rets)
> >>> +{
> >>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> >>> +
> >>> +    if (!rtas_addr) {
> >>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>> +}
> >>> +
> >>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>> +                                   SpaprMachineState *spapr,
> >>> +                                   uint32_t token, uint32_t nargs,
> >>> +                                   target_ulong args,
> >>> +                                   uint32_t nret, target_ulong rets)
> >>> +{
> >>> +    if (spapr->guest_machine_check_addr == -1) {
> >>> +        /* NMI register not called */
> >>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>> +    } else {
> >>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> >>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>> +    }
> >>> +}
> >>> +
> >>>  static struct rtas_call {
> >>>      const char *name;
> >>>      spapr_rtas_fn fn;
> >>> @@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
> >>>      }
> >>>  }
> >>>  
> >>> +hwaddr spapr_get_rtas_addr(void)
> >>> +{
> >>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >>> +    int rtas_node;
> >>> +    const struct fdt_property *rtas_addr_prop;
> >>> +    void *fdt = spapr->fdt_blob;
> >>> +    uint32_t rtas_addr;
> >>> +
> >>> +    /* fetch rtas addr from fdt */
> >>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
> >>> +    if (rtas_node == 0) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> >>> +    if (!rtas_addr_prop) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    /*
> >>> +     * We assume that the OS called RTAS instantiate-rtas, but some other
> >>> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
> >>> +     * as SLOF only supports 32-bit variant.
> >>> +     */
> >>> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
> >>> +    return (hwaddr)rtas_addr;
> >>> +}
> >>> +
> >>> +
> >>>  static void core_rtas_register_types(void)
> >>>  {
> >>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> >>> @@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
> >>>                          rtas_set_power_level);
> >>>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
> >>>                          rtas_get_power_level);
> >>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> >>> +                        rtas_ibm_nmi_register);
> >>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> >>> +                        rtas_ibm_nmi_interlock);
> >>>  }
> >>>  
> >>>  type_init(core_rtas_register_types)
> >>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >>> index 4f5becf..9dc5e30 100644
> >>> --- a/include/hw/ppc/spapr.h
> >>> +++ b/include/hw/ppc/spapr.h
> >>> @@ -188,6 +188,10 @@ struct SpaprMachineState {
> >>>       * occurs during the unplug process. */
> >>>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
> >>>  
> >>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> >>> +    target_ulong guest_machine_check_addr;
> >>> +    QemuCond mc_delivery_cond;
> >>> +
> >>>      /*< public >*/
> >>>      char *kvm_type;This means it isn't related to XIVE it to set
> >>>      char *host_model;
> >>> @@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
> >>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
> >>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
> >>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> >>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> >>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
> >>>  
> >>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> >>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
> >>>  
> >>>  /* RTAS ibm,get-system-parameter token values */
> >>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> >>> @@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> >>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
> >>>  
> >>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> >>> +uint64_t spapr_get_rtas_addr(void);
> >>>  #endif /* HW_SPAPR_H */
> >>>     
> >>
> >>  
> >   
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-04  7:04     ` Aravinda Prasad
@ 2019-06-04 20:04       ` Greg Kurz
  2019-06-04 20:11         ` Greg Kurz
  0 siblings, 1 reply; 44+ messages in thread
From: Greg Kurz @ 2019-06-04 20:04 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Tue, 4 Jun 2019 12:34:37 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> On Monday 03 June 2019 09:10 PM, Greg Kurz wrote:
> > On Wed, 29 May 2019 11:10:57 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> >   
> >> This patch includes migration support for machine check
> >> handling. Especially this patch blocks VM migration
> >> requests until the machine check error handling is
> >> complete as (i) these errors are specific to the source
> >> hardware and is irrelevant on the target hardware,
> >> (ii) these errors cause data corruption and should
> >> be handled before migration.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---  
> > 
> > LGTM, just one issue: machine reset should del and free the blocker as well,
> > otherwise QEMU would crash if spapr_mce_req_event() is called again.  
> 
> Sure.
> 
> 
> >   
> >>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
> >>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> >>  hw/ppc/spapr_rtas.c    |    4 ++++
> >>  include/hw/ppc/spapr.h |    2 ++
> >>  4 files changed, 43 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index e8a77636..31c4850 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
> >>      },
> >>  };
> >>  
> >> +static bool spapr_fwnmi_needed(void *opaque)
> >> +{
> >> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> >> +
> >> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;

And also you can drop the parens since == as precedence over ?:

> >> +}
> >> +
> >> +static const VMStateDescription vmstate_spapr_machine_check = {
> >> +    .name = "spapr_machine_check",
> >> +    .version_id = 1,
> >> +    .minimum_version_id = 1,
> >> +    .needed = spapr_fwnmi_needed,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> >> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +};
> >> +
> >>  static const VMStateDescription vmstate_spapr = {
> >>      .name = "spapr",
> >>      .version_id = 3,
> >> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
> >>          &vmstate_spapr_dtb,
> >>          &vmstate_spapr_cap_large_decr,
> >>          &vmstate_spapr_cap_ccf_assist,
> >> +        &vmstate_spapr_machine_check,
> >>          NULL
> >>      }
> >>  };
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index 573c0b7..35e21e4 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -41,6 +41,7 @@
> >>  #include "qemu/bcd.h"
> >>  #include "hw/ppc/spapr_ovec.h"
> >>  #include <libfdt.h>
> >> +#include "migration/blocker.h"
> >>  
> >>  #define RTAS_LOG_VERSION_MASK                   0xff000000
> >>  #define   RTAS_LOG_VERSION_6                    0x06000000
> >> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>  {
> >>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +    int ret;
> >> +    Error *local_err = NULL;
> >> +
> >> +    error_setg(&spapr->fwnmi_migration_blocker,
> >> +            "Live migration not supported during machine check handling");
> >> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> >> +    if (ret < 0) {
> >> +        /*
> >> +         * We don't want to abort and let the migration to continue. In a
> >> +         * rare case, the machine check handler will run on the target
> >> +         * hardware. Though this is not preferable, it is better than aborting
> >> +         * the migration or killing the VM.
> >> +         */
> >> +        error_free(spapr->fwnmi_migration_blocker);
> >> +        warn_report_err(local_err);
> >> +    }
> >>  
> >>      while (spapr->mc_status != -1) {
> >>          /*
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index 91a7ab9..c849223 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -50,6 +50,7 @@
> >>  #include "target/ppc/mmu-hash64.h"
> >>  #include "target/ppc/mmu-book3s-v3.h"
> >>  #include "kvm_ppc.h"
> >> +#include "migration/blocker.h"
> >>  
> >>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>                                     uint32_t token, uint32_t nargs,
> >> @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>          spapr->mc_status = -1;
> >>          qemu_cond_signal(&spapr->mc_delivery_cond);
> >>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >> +        error_free(spapr->fwnmi_migration_blocker);
> >> +        spapr->fwnmi_migration_blocker = NULL;
> >>      }
> >>  }
> >>  
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index bd75d4b..6c0cfd8 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -214,6 +214,8 @@ struct SpaprMachineState {
> >>      SpaprCapabilities def, eff, mig;
> >>  
> >>      unsigned gpu_numa_id;
> >> +
> >> +    Error *fwnmi_migration_blocker;
> >>  };
> >>  
> >>  #define H_SUCCESS         0
> >>
> >>  
> >   
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-04 20:04       ` Greg Kurz
@ 2019-06-04 20:11         ` Greg Kurz
  0 siblings, 0 replies; 44+ messages in thread
From: Greg Kurz @ 2019-06-04 20:11 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Tue, 4 Jun 2019 22:04:21 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Tue, 4 Jun 2019 12:34:37 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
> > On Monday 03 June 2019 09:10 PM, Greg Kurz wrote:  
> > > On Wed, 29 May 2019 11:10:57 +0530
> > > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> > >     
> > >> This patch includes migration support for machine check
> > >> handling. Especially this patch blocks VM migration
> > >> requests until the machine check error handling is
> > >> complete as (i) these errors are specific to the source
> > >> hardware and is irrelevant on the target hardware,
> > >> (ii) these errors cause data corruption and should
> > >> be handled before migration.
> > >>
> > >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > >> ---    
> > > 
> > > LGTM, just one issue: machine reset should del and free the blocker as well,
> > > otherwise QEMU would crash if spapr_mce_req_event() is called again.    
> > 
> > Sure.
> > 
> >   
> > >     
> > >>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
> > >>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> > >>  hw/ppc/spapr_rtas.c    |    4 ++++
> > >>  include/hw/ppc/spapr.h |    2 ++
> > >>  4 files changed, 43 insertions(+)
> > >>
> > >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > >> index e8a77636..31c4850 100644
> > >> --- a/hw/ppc/spapr.c
> > >> +++ b/hw/ppc/spapr.c
> > >> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
> > >>      },
> > >>  };
> > >>  
> > >> +static bool spapr_fwnmi_needed(void *opaque)
> > >> +{
> > >> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> > >> +
> > >> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;  
> 
> And also you can drop the parens since == as precedence over ?:
> 

... or even better make it spapr->guest_machine_check_addr != -1 :)

> > >> +}
> > >> +
> > >> +static const VMStateDescription vmstate_spapr_machine_check = {
> > >> +    .name = "spapr_machine_check",
> > >> +    .version_id = 1,
> > >> +    .minimum_version_id = 1,
> > >> +    .needed = spapr_fwnmi_needed,
> > >> +    .fields = (VMStateField[]) {
> > >> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> > >> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> > >> +        VMSTATE_END_OF_LIST()
> > >> +    },
> > >> +};
> > >> +
> > >>  static const VMStateDescription vmstate_spapr = {
> > >>      .name = "spapr",
> > >>      .version_id = 3,
> > >> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
> > >>          &vmstate_spapr_dtb,
> > >>          &vmstate_spapr_cap_large_decr,
> > >>          &vmstate_spapr_cap_ccf_assist,
> > >> +        &vmstate_spapr_machine_check,
> > >>          NULL
> > >>      }
> > >>  };
> > >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > >> index 573c0b7..35e21e4 100644
> > >> --- a/hw/ppc/spapr_events.c
> > >> +++ b/hw/ppc/spapr_events.c
> > >> @@ -41,6 +41,7 @@
> > >>  #include "qemu/bcd.h"
> > >>  #include "hw/ppc/spapr_ovec.h"
> > >>  #include <libfdt.h>
> > >> +#include "migration/blocker.h"
> > >>  
> > >>  #define RTAS_LOG_VERSION_MASK                   0xff000000
> > >>  #define   RTAS_LOG_VERSION_6                    0x06000000
> > >> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> > >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> > >>  {
> > >>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > >> +    int ret;
> > >> +    Error *local_err = NULL;
> > >> +
> > >> +    error_setg(&spapr->fwnmi_migration_blocker,
> > >> +            "Live migration not supported during machine check handling");
> > >> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> > >> +    if (ret < 0) {
> > >> +        /*
> > >> +         * We don't want to abort and let the migration to continue. In a
> > >> +         * rare case, the machine check handler will run on the target
> > >> +         * hardware. Though this is not preferable, it is better than aborting
> > >> +         * the migration or killing the VM.
> > >> +         */
> > >> +        error_free(spapr->fwnmi_migration_blocker);
> > >> +        warn_report_err(local_err);
> > >> +    }
> > >>  
> > >>      while (spapr->mc_status != -1) {
> > >>          /*
> > >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > >> index 91a7ab9..c849223 100644
> > >> --- a/hw/ppc/spapr_rtas.c
> > >> +++ b/hw/ppc/spapr_rtas.c
> > >> @@ -50,6 +50,7 @@
> > >>  #include "target/ppc/mmu-hash64.h"
> > >>  #include "target/ppc/mmu-book3s-v3.h"
> > >>  #include "kvm_ppc.h"
> > >> +#include "migration/blocker.h"
> > >>  
> > >>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> > >>                                     uint32_t token, uint32_t nargs,
> > >> @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> > >>          spapr->mc_status = -1;
> > >>          qemu_cond_signal(&spapr->mc_delivery_cond);
> > >>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > >> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> > >> +        error_free(spapr->fwnmi_migration_blocker);
> > >> +        spapr->fwnmi_migration_blocker = NULL;
> > >>      }
> > >>  }
> > >>  
> > >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > >> index bd75d4b..6c0cfd8 100644
> > >> --- a/include/hw/ppc/spapr.h
> > >> +++ b/include/hw/ppc/spapr.h
> > >> @@ -214,6 +214,8 @@ struct SpaprMachineState {
> > >>      SpaprCapabilities def, eff, mig;
> > >>  
> > >>      unsigned gpu_numa_id;
> > >> +
> > >> +    Error *fwnmi_migration_blocker;
> > >>  };
> > >>  
> > >>  #define H_SUCCESS         0
> > >>
> > >>    
> > >     
> >   
> 
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
  2019-06-03 10:12   ` Greg Kurz
@ 2019-06-06  1:34   ` David Gibson
  2019-06-06  5:26     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
  1 sibling, 1 reply; 44+ messages in thread
From: David Gibson @ 2019-06-06  1:34 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 7502 bytes --]

On Wed, May 29, 2019 at 11:10:14AM +0530, Aravinda Prasad wrote:
> This patch adds support in QEMU to handle "ibm,nmi-register"
> and "ibm,nmi-interlock" RTAS calls.
> 
> The machine check notification address is saved when the
> OS issues "ibm,nmi-register" RTAS call.
> 
> This patch also handles the case when multiple processors
> experience machine check at or about the same time by
> handling "ibm,nmi-interlock" call. In such cases, as per
> PAPR, subsequent processors serialize waiting for the first
> processor to issue the "ibm,nmi-interlock" call. The second
> processor that also received a machine check error waits
> till the first processor is done reading the error log.
> The first processor issues "ibm,nmi-interlock" call
> when the error log is consumed. This patch implements the
> releasing part of the error-log while subsequent patch
> (which builds error log) handles the locking part.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr.c         |    7 +++++
>  hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    9 ++++++-
>  3 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e2b33e5..fae28a9 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
>      first_ppc_cpu->env.gpr[5] = 0;
>  
>      spapr->cas_reboot = false;
> +
> +    spapr->guest_machine_check_addr = -1;
> +
> +    /* Signal all vCPUs waiting on this condition */
> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>  }
>  
>  static void spapr_create_nvram(SpaprMachineState *spapr)
> @@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
>  
>          kvmppc_spapr_enable_inkernel_multitce();
>      }
> +
> +    qemu_cond_init(&spapr->mc_delivery_cond);
>  }
>  
>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 5bc1a93..e7509cf 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      rtas_st(rets, 1, 100);
>  }
>  
> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> +                                  SpaprMachineState *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args,
> +                                  uint32_t nret, target_ulong rets)
> +{
> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> +
> +    if (!rtas_addr) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    if (spapr->guest_machine_check_addr == -1) {
> +        /* NMI register not called */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +    } else {
> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    }
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>      }
>  }
>  
> +hwaddr spapr_get_rtas_addr(void)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int rtas_node;
> +    const struct fdt_property *rtas_addr_prop;
> +    void *fdt = spapr->fdt_blob;
> +    uint32_t rtas_addr;
> +
> +    /* fetch rtas addr from fdt */
> +    rtas_node = fdt_path_offset(fdt, "/rtas");
> +    if (rtas_node == 0) {
> +        return 0;

This is incorrect, a return code < 0 indicates an error which you
should check for.  A return code of 0 indicates the root node, which
could only happen if libfdt was badly buggy.

> +    }
> +
> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);

fdt_get_property is generally only needed for certain edge cases.
fdt_getprop() is a better option.

> +    if (!rtas_addr_prop) {
> +        return 0;
> +    }
> +
> +    /*
> +     * We assume that the OS called RTAS instantiate-rtas, but some other
> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
> +     * as SLOF only supports 32-bit variant.
> +     */
> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
> +    return (hwaddr)rtas_addr;
> +}
> +
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> @@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
>                          rtas_set_power_level);
>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>                          rtas_get_power_level);
> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> +                        rtas_ibm_nmi_register);
> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> +                        rtas_ibm_nmi_interlock);
>  }
>  
>  type_init(core_rtas_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 4f5becf..9dc5e30 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -188,6 +188,10 @@ struct SpaprMachineState {
>       * occurs during the unplug process. */
>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>  
> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> +    target_ulong guest_machine_check_addr;
> +    QemuCond mc_delivery_cond;
> +
>      /*< public >*/
>      char *kvm_type;
>      char *host_model;
> @@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> @@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> +uint64_t spapr_get_rtas_addr(void);
>  #endif /* HW_SPAPR_H */
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-06-03 11:17     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2019-06-04  6:08       ` Aravinda Prasad
@ 2019-06-06  1:35       ` David Gibson
  2019-06-06  4:39         ` Aravinda Prasad
  1 sibling, 1 reply; 44+ messages in thread
From: David Gibson @ 2019-06-06  1:35 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, Aravinda Prasad

[-- Attachment #1: Type: text/plain, Size: 9289 bytes --]

On Mon, Jun 03, 2019 at 01:17:23PM +0200, Greg Kurz wrote:
> On Mon, 3 Jun 2019 12:12:43 +0200
> Greg Kurz <groug@kaod.org> wrote:
> 
> > On Wed, 29 May 2019 11:10:14 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> > 
> > > This patch adds support in QEMU to handle "ibm,nmi-register"
> > > and "ibm,nmi-interlock" RTAS calls.
> > > 
> > > The machine check notification address is saved when the
> > > OS issues "ibm,nmi-register" RTAS call.
> > > 
> > > This patch also handles the case when multiple processors
> > > experience machine check at or about the same time by
> > > handling "ibm,nmi-interlock" call. In such cases, as per
> > > PAPR, subsequent processors serialize waiting for the first
> > > processor to issue the "ibm,nmi-interlock" call. The second
> > > processor that also received a machine check error waits
> > > till the first processor is done reading the error log.
> > > The first processor issues "ibm,nmi-interlock" call
> > > when the error log is consumed. This patch implements the
> > > releasing part of the error-log while subsequent patch
> > > (which builds error log) handles the locking part.
> > > 
> > > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---  
> > 
> > The code looks okay but it still seems wrong to advertise the RTAS
> > calls to the guest that early in the series. The linux kernel in
> > the guest will assume FWNMI is functional, which isn't true until
> > patch 6 (yes, migration is part of the feature, it should be
> > supported upfront, not fixed afterwards).
> > 
> > It doesn't help much to introduce the RTAS calls early and to
> > modify them in the other patches. I'd rather see the rest of
> > the code first and a final patch that introduces the fully
> > functional RTAS calls and calls spapr_rtas_register().
> > 
> 
> Thinking again, you should introduce the "fwnmi-mce" spapr capability in
> its own patch first, default to "off" and and have the last patch in the
> series to switch the default to "on" for newer machine types only.
> 
> This patch should then only register the RTAS calls if "fwnmi-mcr" is set
> to "on".

Yes, I think this is a good approach.

> This should address the fact that we don't want to expose a partially
> implemented FWNMI feature to the guest, and we don't want to support
> FWNMI at all with older machine types for the sake of compatibility.
> 
> > >  hw/ppc/spapr.c         |    7 +++++
> > >  hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/ppc/spapr.h |    9 ++++++-
> > >  3 files changed, 80 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index e2b33e5..fae28a9 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
> > >      first_ppc_cpu->env.gpr[5] = 0;
> > >  
> > >      spapr->cas_reboot = false;
> > > +
> > > +    spapr->guest_machine_check_addr = -1;
> > > +
> > > +    /* Signal all vCPUs waiting on this condition */
> > > +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
> > >  }
> > >  
> > >  static void spapr_create_nvram(SpaprMachineState *spapr)
> > > @@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
> > >  
> > >          kvmppc_spapr_enable_inkernel_multitce();
> > >      }
> > > +
> > > +    qemu_cond_init(&spapr->mc_delivery_cond);
> > >  }
> > >  
> > >  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > > index 5bc1a93..e7509cf 100644
> > > --- a/hw/ppc/spapr_rtas.c
> > > +++ b/hw/ppc/spapr_rtas.c
> > > @@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> > >      rtas_st(rets, 1, 100);
> > >  }
> > >  
> > > +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> > > +                                  SpaprMachineState *spapr,
> > > +                                  uint32_t token, uint32_t nargs,
> > > +                                  target_ulong args,
> > > +                                  uint32_t nret, target_ulong rets)
> > > +{
> > > +    hwaddr rtas_addr = spapr_get_rtas_addr();
> > > +
> > > +    if (!rtas_addr) {
> > > +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> > > +        return;
> > > +    }
> > > +
> > > +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> > > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > +}
> > > +
> > > +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> > > +                                   SpaprMachineState *spapr,
> > > +                                   uint32_t token, uint32_t nargs,
> > > +                                   target_ulong args,
> > > +                                   uint32_t nret, target_ulong rets)
> > > +{
> > > +    if (spapr->guest_machine_check_addr == -1) {
> > > +        /* NMI register not called */
> > > +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > > +    } else {
> > > +        qemu_cond_signal(&spapr->mc_delivery_cond);
> > > +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > +    }
> > > +}
> > > +
> > >  static struct rtas_call {
> > >      const char *name;
> > >      spapr_rtas_fn fn;
> > > @@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
> > >      }
> > >  }
> > >  
> > > +hwaddr spapr_get_rtas_addr(void)
> > > +{
> > > +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > > +    int rtas_node;
> > > +    const struct fdt_property *rtas_addr_prop;
> > > +    void *fdt = spapr->fdt_blob;
> > > +    uint32_t rtas_addr;
> > > +
> > > +    /* fetch rtas addr from fdt */
> > > +    rtas_node = fdt_path_offset(fdt, "/rtas");
> > > +    if (rtas_node == 0) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> > > +    if (!rtas_addr_prop) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    /*
> > > +     * We assume that the OS called RTAS instantiate-rtas, but some other
> > > +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
> > > +     * as SLOF only supports 32-bit variant.
> > > +     */
> > > +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
> > > +    return (hwaddr)rtas_addr;
> > > +}
> > > +
> > > +
> > >  static void core_rtas_register_types(void)
> > >  {
> > >      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> > > @@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
> > >                          rtas_set_power_level);
> > >      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
> > >                          rtas_get_power_level);
> > > +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> > > +                        rtas_ibm_nmi_register);
> > > +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> > > +                        rtas_ibm_nmi_interlock);
> > >  }
> > >  
> > >  type_init(core_rtas_register_types)
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index 4f5becf..9dc5e30 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -188,6 +188,10 @@ struct SpaprMachineState {
> > >       * occurs during the unplug process. */
> > >      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
> > >  
> > > +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> > > +    target_ulong guest_machine_check_addr;
> > > +    QemuCond mc_delivery_cond;
> > > +
> > >      /*< public >*/
> > >      char *kvm_type;
> > >      char *host_model;
> > > @@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
> > >  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
> > >  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
> > >  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> > > +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> > > +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
> > >  
> > > -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> > > +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
> > >  
> > >  /* RTAS ibm,get-system-parameter token values */
> > >  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> > > @@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> > >  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
> > >  
> > >  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> > > +uint64_t spapr_get_rtas_addr(void);
> > >  #endif /* HW_SPAPR_H */
> > >   
> > 
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 3/6] target/ppc: Handle NMI guest exit
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
  2019-06-03 11:53   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-06-06  1:43   ` David Gibson
  2019-06-06  4:45     ` Aravinda Prasad
  1 sibling, 1 reply; 44+ messages in thread
From: David Gibson @ 2019-06-06  1:43 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 6834 bytes --]

On Wed, May 29, 2019 at 11:10:32AM +0530, Aravinda Prasad wrote:
> Memory error such as bit flips that cannot be corrected
> by hardware are passed on to the kernel for handling.
> If the memory address in error belongs to guest then
> the guest kernel is responsible for taking suitable action.
> Patch [1] enhances KVM to exit guest with exit reason
> set to KVM_EXIT_NMI in such cases. This patch handles
> KVM_EXIT_NMI exit.
> 
> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>     (e20bbd3d and related commits)
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c          |    1 +
>  hw/ppc/spapr_events.c   |   23 +++++++++++++++++++++++
>  hw/ppc/spapr_rtas.c     |    5 +++++
>  include/hw/ppc/spapr.h  |    6 ++++++
>  target/ppc/kvm.c        |   16 ++++++++++++++++
>  target/ppc/kvm_ppc.h    |    2 ++
>  target/ppc/trace-events |    1 +
>  7 files changed, 54 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index fae28a9..6b6c962 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1809,6 +1809,7 @@ static void spapr_machine_reset(void)
>  
>      spapr->cas_reboot = false;
>  
> +    spapr->mc_status = -1;
>      spapr->guest_machine_check_addr = -1;
>  
>      /* Signal all vCPUs waiting on this condition */
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index ae0f093..a18446b 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -620,6 +620,29 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());

You ignore the 'recovered' parameter, is that right?

> +    while (spapr->mc_status != -1) {
> +        /*
> +         * Check whether the same CPU got machine check error
> +         * while still handling the mc error (i.e., before
> +         * that CPU called "ibm,nmi-interlock")
> +         */
> +        if (spapr->mc_status == cpu->vcpu_id) {
> +            qemu_system_guest_panicked(NULL);
> +            return;
> +        }
> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> +        /* Meanwhile if the system is reset, then just return */
> +        if (spapr->guest_machine_check_addr == -1) {
> +            return;
> +        }
> +    }
> +    spapr->mc_status = cpu->vcpu_id;
> +}
> +
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                              uint32_t token, uint32_t nargs,
>                              target_ulong args,
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index e7509cf..e0bdfc8 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -379,6 +379,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>          /* NMI register not called */
>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>      } else {
> +        /*
> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> +         * hence unset mc_status.
> +         */
> +        spapr->mc_status = -1;
>          qemu_cond_signal(&spapr->mc_delivery_cond);
>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>      }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9dc5e30..fc3a776 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -190,6 +190,11 @@ struct SpaprMachineState {
>  
>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>      target_ulong guest_machine_check_addr;
> +    /*
> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
> +     * handling the mc.
> +     */
> +    int mc_status;
>      QemuCond mc_delivery_cond;
>  
>      /*< public >*/
> @@ -793,6 +798,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 3bf0a46..39f1a73 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1761,6 +1761,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ret = 0;
>          break;
>  
> +    case KVM_EXIT_NMI:
> +        trace_kvm_handle_nmi_exception();
> +        ret = kvm_handle_nmi(cpu, run);
> +        break;
> +
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> @@ -2844,6 +2849,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>      return data & 0xffff;
>  }
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
> +{
> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    spapr_mce_req_event(cpu, recovered);

Urgh.. KVM calling directly into spapr code isn't conceptually
correct, although it's usually kind of ok in practice.  I guess we
already do it for hypercalls, so this is no worse.  If we ever have
NMI events for KVM PR or BookE KVM which could happen for non-PAPR
guests, I guess we'll need to re-examine this.

> +    return 0;
> +}
> +
>  int kvmppc_enable_hwrng(void)
>  {
>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 45776ca..18693f1 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -81,6 +81,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
> +
>  #else
>  
>  static inline uint32_t kvmppc_get_tbfreq(void)
> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
> index 3dc6740..6d15aa9 100644
> --- a/target/ppc/trace-events
> +++ b/target/ppc/trace-events
> @@ -28,3 +28,4 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>  kvm_handle_epr(void) "handle epr"
>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>  kvm_handle_debug_exception(void) "handle debug exception"
> +kvm_handle_nmi_exception(void) "handle NMI exception"
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE
  2019-06-04  6:29     ` Aravinda Prasad
  2019-06-04  9:01       ` Greg Kurz
@ 2019-06-06  2:57       ` David Gibson
  1 sibling, 0 replies; 44+ messages in thread
From: David Gibson @ 2019-06-06  2:57 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 15594 bytes --]

On Tue, Jun 04, 2019 at 11:59:13AM +0530, Aravinda Prasad wrote:
> 
> 
> On Monday 03 June 2019 07:30 PM, Greg Kurz wrote:
> > On Wed, 29 May 2019 11:10:40 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> > 
> >> Upon a machine check exception (MCE) in a guest address space,
> >> KVM causes a guest exit to enable QEMU to build and pass the
> >> error to the guest in the PAPR defined rtas error log format.
> >>
> >> This patch builds the rtas error log, copies it to the rtas_addr
> >> and then invokes the guest registered machine check handler. The
> >> handler in the guest takes suitable action(s) depending on the type
> >> and criticality of the error. For example, if an error is
> >> unrecoverable memory corruption in an application inside the
> >> guest, then the guest kernel sends a SIGBUS to the application.
> >> For recoverable errors, the guest performs recovery actions and
> >> logs the error.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |    5 +
> >>  hw/ppc/spapr_events.c  |  236 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr.h |    4 +
> >>  3 files changed, 245 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 6b6c962..c97f6a6 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2910,6 +2910,11 @@ static void spapr_machine_init(MachineState *machine)
> >>          error_report("Could not get size of LPAR rtas '%s'", filename);
> >>          exit(1);
> >>      }
> >> +
> >> +    /* Resize blob to accommodate error log. */
> >> +    g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);
> > 
> > I don't see the point of this assertion... especially with the assignment
> > below.
> 
> It is required because we want to ensure that the rtas image size is
> less than RTAS_ERROR_LOG_OFFSET, or else we will overwrite the rtas
> image with rtas error when we hit machine check exception. But I think a
> comment in the code will help. Will add it.
> 
> 
> > 
> >> +    spapr->rtas_size = RTAS_ERROR_LOG_MAX;
> > 
> > As requested by David, this should only be done when the spapr cap is set,
> > so that 4.0 machine types and older continue to use the current size.
> 
> Due to other issue of re-allocating the blob and as this is not that
> much space, we agreed to keep the size to RTAS_ERROR_LOG_MAX always.
> 
> Link to the discussion on this:
> http://lists.nongnu.org/archive/html/qemu-ppc/2019-05/msg00275.html

Sorry, I wasn't clear in that discussion.  It is definitely *not* ok
to advertise the increased size to the guest for old machine types.
It *is* ok to waste some space inside qemu internal allocations if it
reduces conditionals.


> 
> > 
> >> +
> >>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
> >>          error_report("Could not load LPAR rtas '%s'", filename);
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index a18446b..573c0b7 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -212,6 +212,106 @@ struct hp_extended_log {
> >>      struct rtas_event_log_v6_hp hp;
> >>  } QEMU_PACKED;
> >>  
> >> +struct rtas_event_log_v6_mc {
> >> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> >> +    struct rtas_event_log_v6_section_header hdr;
> >> +    uint32_t fru_id;
> >> +    uint32_t proc_id;
> >> +    uint8_t error_type;
> >> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> >> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> >> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> >> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> >> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> >> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> >> +    uint8_t sub_err_type;
> >> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> >> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> >> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> >> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> >> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> >> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> >> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> >> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> >> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> >> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> >> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> >> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> >> +    uint8_t reserved_1[6];
> >> +    uint64_t effective_address;
> >> +    uint64_t logical_address;
> >> +} QEMU_PACKED;
> >> +
> >> +struct mc_extended_log {
> >> +    struct rtas_event_log_v6 v6hdr;
> >> +    struct rtas_event_log_v6_mc mc;
> >> +} QEMU_PACKED;
> >> +
> >> +struct MC_ierror_table {
> >> +    unsigned long srr1_mask;
> >> +    unsigned long srr1_value;
> >> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> >> +    uint8_t error_type;
> >> +    uint8_t error_subtype;
> >> +    unsigned int initiator;
> >> +    unsigned int severity;
> >> +};
> >> +
> >> +static const struct MC_ierror_table mc_ierror_table[] = {
> >> +{ 0x00000000081c0000, 0x0000000000040000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000080000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000100000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000140000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000180000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0, 0, 0, 0, 0, 0 } };
> >> +
> >> +struct MC_derror_table {
> >> +    unsigned long dsisr_value;
> >> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> >> +    uint8_t error_type;
> >> +    uint8_t error_subtype;
> >> +    unsigned int initiator;
> >> +    unsigned int severity;
> >> +};
> >> +
> >> +static const struct MC_derror_table mc_derror_table[] = {
> >> +{ 0x00008000, false,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00004000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000800, true,
> >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000400, true,
> >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000080, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000100, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0, false, 0, 0, 0, 0 } };
> >> +
> >> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> >> +
> >>  typedef enum EventClass {
> >>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
> >>      EVENT_CLASS_EPOW                = 1,
> >> @@ -620,6 +720,138 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> >>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> >>  }
> >>  
> >> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> >> +                                        struct mc_extended_log *ext_elog)
> >> +{
> >> +    int i;
> >> +    CPUPPCState *env = &cpu->env;
> >> +    uint32_t summary;
> >> +    uint64_t dsisr = env->spr[SPR_DSISR];
> >> +
> >> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> >> +    if (recovered) {
> >> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> >> +    } else {
> >> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> >> +    }
> >> +
> >> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> >> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> >> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> >> +                continue;
> >> +            }
> >> +
> >> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> >> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> >> +            if (mc_derror_table[i].dar_valid) {
> >> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> >> +            }
> >> +
> >> +            summary |= mc_derror_table[i].initiator
> >> +                        | mc_derror_table[i].severity;
> >> +
> >> +            return summary;
> >> +        }
> >> +    } else {
> >> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> >> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> >> +                    mc_ierror_table[i].srr1_value) {
> >> +                continue;
> >> +            }
> >> +
> >> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> >> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> >> +            if (mc_ierror_table[i].nip_valid) {
> >> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> >> +            }
> >> +
> >> +            summary |= mc_ierror_table[i].initiator
> >> +                        | mc_ierror_table[i].severity;
> >> +
> >> +            return summary;
> >> +        }
> >> +    }
> >> +
> >> +    summary |= RTAS_LOG_INITIATOR_CPU;
> >> +    return summary;
> >> +}
> >> +
> >> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >> +{
> >> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +    CPUState *cs = CPU(cpu);
> >> +    uint64_t rtas_addr;
> >> +    CPUPPCState *env = &cpu->env;
> >> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> >> +    target_ulong r3, msr = 0;
> >> +    struct rtas_error_log log;
> >> +    struct mc_extended_log *ext_elog;
> >> +    uint32_t summary;
> >> +
> >> +    /*
> >> +     * Properly set bits in MSR before we invoke the handler.
> >> +     * SRR0/1, DAR and DSISR are properly set by KVM
> >> +     */
> >> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> >> +        msr |= (1ULL << MSR_LE);
> >> +    }
> >> +
> >> +    if (env->msr & (1ULL << MSR_SF)) {
> >> +        msr |= (1ULL << MSR_SF);
> >> +    }
> >> +
> >> +    msr |= (1ULL << MSR_ME);
> >> +
> >> +    if (spapr->guest_machine_check_addr == -1) {
> >> +        /*
> >> +         * This implies that we have hit a machine check between system
> >> +         * reset and "ibm,nmi-register". Fall back to the old machine
> >> +         * check behavior in such cases.
> >> +         */
> >> +        env->spr[SPR_SRR0] = env->nip;
> >> +        env->spr[SPR_SRR1] = env->msr;
> >> +        env->msr = msr;
> >> +        env->nip = 0x200;
> >> +        return;
> >> +    }
> >> +
> >> +    ext_elog = g_malloc0(sizeof(*ext_elog));
> >> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> >> +
> >> +    log.summary = cpu_to_be32(summary);
> >> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
> >> +
> >> +    /* r3 should be in BE always */
> >> +    r3 = cpu_to_be64(env->gpr[3]);
> >> +    env->msr = msr;
> >> +
> >> +    spapr_init_v6hdr(&ext_elog->v6hdr);
> >> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> >> +    ext_elog->mc.hdr.section_length =
> >> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> >> +    ext_elog->mc.hdr.section_version = 1;
> >> +
> >> +    /* get rtas addr from fdt */
> >> +    rtas_addr = spapr_get_rtas_addr();
> >> +    if (!rtas_addr) {
> >> +        /* Unable to fetch rtas_addr. Hence reset the guest */
> >> +        ppc_cpu_do_system_reset(cs);
> >> +    }
> >> +
> >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
> >> +                              sizeof(r3));
> >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
> >> +                              &log, sizeof(log));
> >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
> >> +                              sizeof(log), ext_elog,
> >> +                              sizeof(*ext_elog));
> >> +
> >> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
> >> +    env->nip = spapr->guest_machine_check_addr;
> >> +
> >> +    g_free(ext_elog);
> >> +}
> >> +
> >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>  {
> >>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> @@ -641,6 +873,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>          }
> >>      }
> >>      spapr->mc_status = cpu->vcpu_id;
> >> +
> >> +    spapr_mce_dispatch_elog(cpu, recovered);
> >> +
> >> +    return;
> > 
> > Drop the last two lines.
> 
> ok.
> 
> > 
> >>  }
> >>  
> >>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index fc3a776..c717ab2 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
> >>  
> >>  #define RTAS_ERROR_LOG_MAX      2048
> >>  
> >> +/* Offset from rtas-base where error log is placed */
> >> +#define RTAS_ERROR_LOG_OFFSET       0x30
> >> +
> >>  #define RTAS_EVENT_SCAN_RATE    1
> >>  
> >>  /* This helper should be used to encode interrupt specifiers when the related
> >> @@ -799,6 +802,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
> >>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> >>                        uint64_t pte0, uint64_t pte1);
> >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> >> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
> >>  
> > 
> > Looks like a leftover.
> 
> ah.. yes.
> 
> > 
> >>  /* DRC callbacks. */
> >>  void spapr_core_release(DeviceState *dev);
> >>
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE
  2019-06-04  9:01       ` Greg Kurz
  2019-06-04 10:10         ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
@ 2019-06-06  2:58         ` David Gibson
  1 sibling, 0 replies; 44+ messages in thread
From: David Gibson @ 2019-06-06  2:58 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, Aravinda Prasad

[-- Attachment #1: Type: text/plain, Size: 16908 bytes --]

On Tue, Jun 04, 2019 at 11:01:19AM +0200, Greg Kurz wrote:
> On Tue, 4 Jun 2019 11:59:13 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
> > On Monday 03 June 2019 07:30 PM, Greg Kurz wrote:
> > > On Wed, 29 May 2019 11:10:40 +0530
> > > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> > >   
> > >> Upon a machine check exception (MCE) in a guest address space,
> > >> KVM causes a guest exit to enable QEMU to build and pass the
> > >> error to the guest in the PAPR defined rtas error log format.
> > >>
> > >> This patch builds the rtas error log, copies it to the rtas_addr
> > >> and then invokes the guest registered machine check handler. The
> > >> handler in the guest takes suitable action(s) depending on the type
> > >> and criticality of the error. For example, if an error is
> > >> unrecoverable memory corruption in an application inside the
> > >> guest, then the guest kernel sends a SIGBUS to the application.
> > >> For recoverable errors, the guest performs recovery actions and
> > >> logs the error.
> > >>
> > >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > >> ---
> > >>  hw/ppc/spapr.c         |    5 +
> > >>  hw/ppc/spapr_events.c  |  236 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >>  include/hw/ppc/spapr.h |    4 +
> > >>  3 files changed, 245 insertions(+)
> > >>
> > >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > >> index 6b6c962..c97f6a6 100644
> > >> --- a/hw/ppc/spapr.c
> > >> +++ b/hw/ppc/spapr.c
> > >> @@ -2910,6 +2910,11 @@ static void spapr_machine_init(MachineState *machine)
> > >>          error_report("Could not get size of LPAR rtas '%s'", filename);
> > >>          exit(1);
> > >>      }
> > >> +
> > >> +    /* Resize blob to accommodate error log. */
> > >> +    g_assert(spapr->rtas_size < RTAS_ERROR_LOG_OFFSET);  
> > > 
> > > I don't see the point of this assertion... especially with the assignment
> > > below.  
> > 
> > It is required because we want to ensure that the rtas image size is
> > less than RTAS_ERROR_LOG_OFFSET, or else we will overwrite the rtas
> > image with rtas error when we hit machine check exception. But I think a
> > comment in the code will help. Will add it.
> 
> I'd rather exit QEMU properly instead of aborting then. Also this is only
> needed if the guest has a chance to use FWNMI, ie. the spapr cap is
> set.

I think assert() is appropriate in this case.  If it fails it means
something is wrong in the code, not with configuration.

> 
> > 
> > >   
> > >> +    spapr->rtas_size = RTAS_ERROR_LOG_MAX;  
> > > 
> > > As requested by David, this should only be done when the spapr cap is set,
> > > so that 4.0 machine types and older continue to use the current size.  
> > 
> > Due to other issue of re-allocating the blob and as this is not that
> > much space, we agreed to keep the size to RTAS_ERROR_LOG_MAX always.
> > 
> > Link to the discussion on this:
> > http://lists.nongnu.org/archive/html/qemu-ppc/2019-05/msg00275.html
> > 
> 
> Indeed, and in the next mail in that thread, David writes:
> 
> > No, that's not right.  It's impractical to change the allocation
> > depending on whether fwnmi is currently active.  But you *can* (and
> > should) base the allocation on whether fwnmi is *possible* - that is,
> > the value of the spapr cap.
> 
> ie, allocate RTAS_ERROR_LOG_MAX when the spapr cap is set, allocate
> the file size otherwise.
> 
> > >   
> > >> +
> > >>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> > >>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
> > >>          error_report("Could not load LPAR rtas '%s'", filename);
> > >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > >> index a18446b..573c0b7 100644
> > >> --- a/hw/ppc/spapr_events.c
> > >> +++ b/hw/ppc/spapr_events.c
> > >> @@ -212,6 +212,106 @@ struct hp_extended_log {
> > >>      struct rtas_event_log_v6_hp hp;
> > >>  } QEMU_PACKED;
> > >>  
> > >> +struct rtas_event_log_v6_mc {
> > >> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> > >> +    struct rtas_event_log_v6_section_header hdr;
> > >> +    uint32_t fru_id;
> > >> +    uint32_t proc_id;
> > >> +    uint8_t error_type;
> > >> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> > >> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> > >> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> > >> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> > >> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> > >> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> > >> +    uint8_t sub_err_type;
> > >> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> > >> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> > >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> > >> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> > >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> > >> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> > >> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> > >> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> > >> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> > >> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> > >> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> > >> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> > >> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> > >> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> > >> +    uint8_t reserved_1[6];
> > >> +    uint64_t effective_address;
> > >> +    uint64_t logical_address;
> > >> +} QEMU_PACKED;
> > >> +
> > >> +struct mc_extended_log {
> > >> +    struct rtas_event_log_v6 v6hdr;
> > >> +    struct rtas_event_log_v6_mc mc;
> > >> +} QEMU_PACKED;
> > >> +
> > >> +struct MC_ierror_table {
> > >> +    unsigned long srr1_mask;
> > >> +    unsigned long srr1_value;
> > >> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> > >> +    uint8_t error_type;
> > >> +    uint8_t error_subtype;
> > >> +    unsigned int initiator;
> > >> +    unsigned int severity;
> > >> +};
> > >> +
> > >> +static const struct MC_ierror_table mc_ierror_table[] = {
> > >> +{ 0x00000000081c0000, 0x0000000000040000, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00000000081c0000, 0x0000000000080000, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00000000081c0000, 0x0000000000100000, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00000000081c0000, 0x0000000000140000, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00000000081c0000, 0x0000000000180000, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0, 0, 0, 0, 0, 0 } };
> > >> +
> > >> +struct MC_derror_table {
> > >> +    unsigned long dsisr_value;
> > >> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> > >> +    uint8_t error_type;
> > >> +    uint8_t error_subtype;
> > >> +    unsigned int initiator;
> > >> +    unsigned int severity;
> > >> +};
> > >> +
> > >> +static const struct MC_derror_table mc_derror_table[] = {
> > >> +{ 0x00008000, false,
> > >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00004000, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00000800, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00000400, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00000080, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0x00000100, true,
> > >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> > >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > >> +{ 0, false, 0, 0, 0, 0 } };
> > >> +
> > >> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> > >> +
> > >>  typedef enum EventClass {
> > >>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
> > >>      EVENT_CLASS_EPOW                = 1,
> > >> @@ -620,6 +720,138 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> > >>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> > >>  }
> > >>  
> > >> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> > >> +                                        struct mc_extended_log *ext_elog)
> > >> +{
> > >> +    int i;
> > >> +    CPUPPCState *env = &cpu->env;
> > >> +    uint32_t summary;
> > >> +    uint64_t dsisr = env->spr[SPR_DSISR];
> > >> +
> > >> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> > >> +    if (recovered) {
> > >> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> > >> +    } else {
> > >> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> > >> +    }
> > >> +
> > >> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> > >> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> > >> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> > >> +                continue;
> > >> +            }
> > >> +
> > >> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> > >> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> > >> +            if (mc_derror_table[i].dar_valid) {
> > >> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> > >> +            }
> > >> +
> > >> +            summary |= mc_derror_table[i].initiator
> > >> +                        | mc_derror_table[i].severity;
> > >> +
> > >> +            return summary;
> > >> +        }
> > >> +    } else {
> > >> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> > >> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> > >> +                    mc_ierror_table[i].srr1_value) {
> > >> +                continue;
> > >> +            }
> > >> +
> > >> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> > >> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> > >> +            if (mc_ierror_table[i].nip_valid) {
> > >> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> > >> +            }
> > >> +
> > >> +            summary |= mc_ierror_table[i].initiator
> > >> +                        | mc_ierror_table[i].severity;
> > >> +
> > >> +            return summary;
> > >> +        }
> > >> +    }
> > >> +
> > >> +    summary |= RTAS_LOG_INITIATOR_CPU;
> > >> +    return summary;
> > >> +}
> > >> +
> > >> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> > >> +{
> > >> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > >> +    CPUState *cs = CPU(cpu);
> > >> +    uint64_t rtas_addr;
> > >> +    CPUPPCState *env = &cpu->env;
> > >> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> > >> +    target_ulong r3, msr = 0;
> > >> +    struct rtas_error_log log;
> > >> +    struct mc_extended_log *ext_elog;
> > >> +    uint32_t summary;
> > >> +
> > >> +    /*
> > >> +     * Properly set bits in MSR before we invoke the handler.
> > >> +     * SRR0/1, DAR and DSISR are properly set by KVM
> > >> +     */
> > >> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> > >> +        msr |= (1ULL << MSR_LE);
> > >> +    }
> > >> +
> > >> +    if (env->msr & (1ULL << MSR_SF)) {
> > >> +        msr |= (1ULL << MSR_SF);
> > >> +    }
> > >> +
> > >> +    msr |= (1ULL << MSR_ME);
> > >> +
> > >> +    if (spapr->guest_machine_check_addr == -1) {
> > >> +        /*
> > >> +         * This implies that we have hit a machine check between system
> > >> +         * reset and "ibm,nmi-register". Fall back to the old machine
> > >> +         * check behavior in such cases.
> > >> +         */
> > >> +        env->spr[SPR_SRR0] = env->nip;
> > >> +        env->spr[SPR_SRR1] = env->msr;
> > >> +        env->msr = msr;
> > >> +        env->nip = 0x200;
> > >> +        return;
> > >> +    }
> > >> +
> > >> +    ext_elog = g_malloc0(sizeof(*ext_elog));
> > >> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> > >> +
> > >> +    log.summary = cpu_to_be32(summary);
> > >> +    log.extended_length = cpu_to_be32(sizeof(*ext_elog));
> > >> +
> > >> +    /* r3 should be in BE always */
> > >> +    r3 = cpu_to_be64(env->gpr[3]);
> > >> +    env->msr = msr;
> > >> +
> > >> +    spapr_init_v6hdr(&ext_elog->v6hdr);
> > >> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> > >> +    ext_elog->mc.hdr.section_length =
> > >> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> > >> +    ext_elog->mc.hdr.section_version = 1;
> > >> +
> > >> +    /* get rtas addr from fdt */
> > >> +    rtas_addr = spapr_get_rtas_addr();
> > >> +    if (!rtas_addr) {
> > >> +        /* Unable to fetch rtas_addr. Hence reset the guest */
> > >> +        ppc_cpu_do_system_reset(cs);
> > >> +    }
> > >> +
> > >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET, &r3,
> > >> +                              sizeof(r3));
> > >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3),
> > >> +                              &log, sizeof(log));
> > >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERROR_LOG_OFFSET + sizeof(r3) +
> > >> +                              sizeof(log), ext_elog,
> > >> +                              sizeof(*ext_elog));
> > >> +
> > >> +    env->gpr[3] = rtas_addr + RTAS_ERROR_LOG_OFFSET;
> > >> +    env->nip = spapr->guest_machine_check_addr;
> > >> +
> > >> +    g_free(ext_elog);
> > >> +}
> > >> +
> > >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> > >>  {
> > >>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > >> @@ -641,6 +873,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> > >>          }
> > >>      }
> > >>      spapr->mc_status = cpu->vcpu_id;
> > >> +
> > >> +    spapr_mce_dispatch_elog(cpu, recovered);
> > >> +
> > >> +    return;  
> > > 
> > > Drop the last two lines.  
> > 
> > ok.
> > 
> > >   
> > >>  }
> > >>  
> > >>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> > >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > >> index fc3a776..c717ab2 100644
> > >> --- a/include/hw/ppc/spapr.h
> > >> +++ b/include/hw/ppc/spapr.h
> > >> @@ -710,6 +710,9 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr);
> > >>  
> > >>  #define RTAS_ERROR_LOG_MAX      2048
> > >>  
> > >> +/* Offset from rtas-base where error log is placed */
> > >> +#define RTAS_ERROR_LOG_OFFSET       0x30
> > >> +
> > >>  #define RTAS_EVENT_SCAN_RATE    1
> > >>  
> > >>  /* This helper should be used to encode interrupt specifiers when the related
> > >> @@ -799,6 +802,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
> > >>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> > >>                        uint64_t pte0, uint64_t pte1);
> > >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> > >> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
> > >>    
> > > 
> > > Looks like a leftover.  
> > 
> > ah.. yes.
> > 
> > >   
> > >>  /* DRC callbacks. */
> > >>  void spapr_core_release(DeviceState *dev);
> > >>  
> > >   
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability Aravinda Prasad
  2019-06-03 15:25   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-06-06  3:00   ` David Gibson
  1 sibling, 0 replies; 44+ messages in thread
From: David Gibson @ 2019-06-06  3:00 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 8256 bytes --]

On Wed, May 29, 2019 at 11:10:49AM +0530, Aravinda Prasad wrote:
> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
> the KVM causes guest exit with NMI as exit reason
> when it encounters a machine check exception on the
> address belonging to a guest. Without this capability
> enabled, KVM redirects machine check exceptions to
> guest's 0x200 vector.
> 
> This patch also deals with the case when a guest with
> the KVM_CAP_PPC_FWNMI capability enabled is attempted
> to migrate to a host that does not support this
> capability.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    1 +
>  hw/ppc/spapr_caps.c    |   24 ++++++++++++++++++++++++
>  hw/ppc/spapr_rtas.c    |   18 ++++++++++++++++++
>  include/hw/ppc/spapr.h |    4 +++-
>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>  6 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c97f6a6..e8a77636 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4364,6 +4364,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;

You need to turn this back off by default for the older machine types.

>      spapr_caps_add_properties(smc, &error_abort);
>      smc->irq = &spapr_irq_dual;
>      smc->dr_phb_enabled = true;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 31b4661..ef9e612 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -479,6 +479,20 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>      }
>  }
>  
> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> +                                Error **errp)
> +{
> +    if (!val) {
> +        return; /* Disabled by default */
> +    }
> +
> +    if (tcg_enabled()) {
> +            error_setg(errp, "No fwnmi support in TCG, try cap-fwnmi-mce=off");
> +    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
> +    }
> +}
> +
>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>      [SPAPR_CAP_HTM] = {
>          .name = "htm",
> @@ -578,6 +592,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>          .type = "bool",
>          .apply = cap_ccf_assist_apply,
>      },
> +    [SPAPR_CAP_FWNMI_MCE] = {
> +        .name = "fwnmi-mce",
> +        .description = "Handle fwnmi machine check exceptions",
> +        .index = SPAPR_CAP_FWNMI_MCE,
> +        .get = spapr_cap_get_bool,
> +        .set = spapr_cap_set_bool,
> +        .type = "bool",
> +        .apply = cap_fwnmi_mce_apply,
> +    },
>  };
>  
>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> @@ -717,6 +740,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>  
>  void spapr_caps_init(SpaprMachineState *spapr)
>  {
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index e0bdfc8..91a7ab9 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -49,6 +49,7 @@
>  #include "hw/ppc/fdt.h"
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
> +#include "kvm_ppc.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -358,6 +359,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>                                    target_ulong args,
>                                    uint32_t nret, target_ulong rets)
>  {
> +    int ret;
>      hwaddr rtas_addr = spapr_get_rtas_addr();
>  
>      if (!rtas_addr) {
> @@ -365,6 +367,22 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>          return;
>      }
>  
> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == 0) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    ret = kvmppc_fwnmi_enable(cpu);
> +    if (ret == 1) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    if (ret < 0) {
> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +        return;
> +    }
> +
>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index c717ab2..bd75d4b 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -78,8 +78,10 @@ typedef enum {
>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>  /* Count Cache Flush Assist HW Instruction */
>  #define SPAPR_CAP_CCF_ASSIST            0x09
> +/* FWNMI machine check handling */
> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>  /* Num Caps */
> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>  
>  /*
>   * Capability Values
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 39f1a73..368ec6e 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
>  static int cap_ppc_count_cache_flush_assist;
>  static int cap_ppc_nested_kvm_hv;
>  static int cap_large_decr;
> +static int cap_ppc_fwnmi;
>  
>  static uint32_t debug_inst_opcode;
>  
> @@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>      kvmppc_get_cpu_characteristics(s);
>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>      cap_large_decr = kvmppc_get_dec_bits();
> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>      /*
>       * Note: setting it to false because there is not such capability
>       * in KVM at this moment.
> @@ -2119,6 +2121,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>      }
>  }
>  
> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> +{
> +    CPUState *cs = CPU(cpu);
> +
> +    if (!cap_ppc_fwnmi) {
> +        return 1;
> +    }
> +
> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
> +}
> +
> +
>  int kvmppc_smt_threads(void)
>  {
>      return cap_ppc_smt ? cap_ppc_smt : 1;
> @@ -2419,6 +2433,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>      return cap_mmu_hash_v3;
>  }
>  
> +bool kvmppc_has_cap_ppc_fwnmi(void)
> +{
> +    return cap_ppc_fwnmi;
> +}
> +
>  static bool kvmppc_power8_host(void)
>  {
>      bool ret = false;
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 18693f1..3d9f0b4 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
>  void kvmppc_set_papr(PowerPCCPU *cpu);
>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
> +bool kvmppc_has_cap_ppc_fwnmi(void);
>  int kvmppc_smt_threads(void);
>  void kvmppc_hint_smt_possible(Error **errp);
>  int kvmppc_set_smt_threads(int smt);
> @@ -160,6 +162,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>  {
>  }
>  
> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> +{
> +    return 1;
> +}
> +
> +static inline bool kvmppc_has_cap_ppc_fwnmi(void)
> +{
> +    return false;
> +}
> +
>  static inline int kvmppc_smt_threads(void)
>  {
>      return 1;
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability
  2019-06-04  6:45     ` Aravinda Prasad
@ 2019-06-06  3:02       ` David Gibson
  2019-06-06  4:50         ` Aravinda Prasad
  2019-06-06  7:51         ` Aravinda Prasad
  0 siblings, 2 replies; 44+ messages in thread
From: David Gibson @ 2019-06-06  3:02 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 10361 bytes --]

On Tue, Jun 04, 2019 at 12:15:26PM +0530, Aravinda Prasad wrote:
> 
> 
> On Monday 03 June 2019 08:55 PM, Greg Kurz wrote:
> > On Wed, 29 May 2019 11:10:49 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> > 
> >> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
> >> the KVM causes guest exit with NMI as exit reason
> >> when it encounters a machine check exception on the
> >> address belonging to a guest. Without this capability
> >> enabled, KVM redirects machine check exceptions to
> >> guest's 0x200 vector.
> >>
> >> This patch also deals with the case when a guest with
> >> the KVM_CAP_PPC_FWNMI capability enabled is attempted
> >> to migrate to a host that does not support this
> >> capability.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> > 
> > As suggested in another mail, it may be worth introducing the sPAPR cap
> > in its own patch, earlier in the series.
> 
> Sure, also as a workaround mentioned in the reply to that mail, I am
> thinking of returning RTAS_OUT_NOT_SUPPORTED to rtas nmi register call
> until the entire functionality is implemented. This will help solve
> spapr cap issue as well.

Not registering the RTAS call at all is the correct way to handle that
case.

> 
> > 
> > Anyway, I have some comments below.
> > 
> >>  hw/ppc/spapr.c         |    1 +
> >>  hw/ppc/spapr_caps.c    |   24 ++++++++++++++++++++++++
> >>  hw/ppc/spapr_rtas.c    |   18 ++++++++++++++++++
> >>  include/hw/ppc/spapr.h |    4 +++-
> >>  target/ppc/kvm.c       |   19 +++++++++++++++++++
> >>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
> >>  6 files changed, 77 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index c97f6a6..e8a77636 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -4364,6 +4364,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>      spapr_caps_add_properties(smc, &error_abort);
> >>      smc->irq = &spapr_irq_dual;
> >>      smc->dr_phb_enabled = true;
> >> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >> index 31b4661..ef9e612 100644
> >> --- a/hw/ppc/spapr_caps.c
> >> +++ b/hw/ppc/spapr_caps.c
> >> @@ -479,6 +479,20 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
> >>      }
> >>  }
> >>  
> >> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> >> +                                Error **errp)
> >> +{
> >> +    if (!val) {
> >> +        return; /* Disabled by default */
> >> +    }
> >> +
> >> +    if (tcg_enabled()) {
> >> +            error_setg(errp, "No fwnmi support in TCG, try cap-fwnmi-mce=off");
> > 
> > Maybe expand "fwnmi" to "Firmware Assisted Non-Maskable Interrupts" ?
> 
> sure..
> 
> > 
> >> +    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
> >> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
> > 
> > Maybe reword and add a hint:
> > 
> > "KVM implementation does not support Firmware Assisted Non-Maskable Interrupts, try cap-fwnmi-mce=off"
> 
> sure..
> 
> > 
> > 
> >> +    }
> >> +}
> >> +
> >>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >>      [SPAPR_CAP_HTM] = {
> >>          .name = "htm",
> >> @@ -578,6 +592,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >>          .type = "bool",
> >>          .apply = cap_ccf_assist_apply,
> >>      },
> >> +    [SPAPR_CAP_FWNMI_MCE] = {
> >> +        .name = "fwnmi-mce",
> >> +        .description = "Handle fwnmi machine check exceptions",
> >> +        .index = SPAPR_CAP_FWNMI_MCE,
> >> +        .get = spapr_cap_get_bool,
> >> +        .set = spapr_cap_set_bool,
> >> +        .type = "bool",
> >> +        .apply = cap_fwnmi_mce_apply,
> >> +    },
> >>  };
> >>  
> >>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> >> @@ -717,6 +740,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
> >>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
> >>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
> >>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
> >> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
> >>  
> >>  void spapr_caps_init(SpaprMachineState *spapr)
> >>  {
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index e0bdfc8..91a7ab9 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -49,6 +49,7 @@
> >>  #include "hw/ppc/fdt.h"
> >>  #include "target/ppc/mmu-hash64.h"
> >>  #include "target/ppc/mmu-book3s-v3.h"
> >> +#include "kvm_ppc.h"
> >>  
> >>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>                                     uint32_t token, uint32_t nargs,
> >> @@ -358,6 +359,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>                                    target_ulong args,
> >>                                    uint32_t nret, target_ulong rets)
> >>  {
> >> +    int ret;
> >>      hwaddr rtas_addr = spapr_get_rtas_addr();
> >>  
> >>      if (!rtas_addr) {
> >> @@ -365,6 +367,22 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>          return;
> >>      }
> >>  
> >> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == 0) {
> >> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >> +        return;
> >> +    }
> >> +
> >> +    ret = kvmppc_fwnmi_enable(cpu);
> >> +    if (ret == 1) {
> > 
> > I have the impression that this should really not happen,
> > otherwise something has gone terribly wrong in QEMU or
> > in KVM... this maybe deserves an error message as well ?
> > No big deal.
> 
> I think so..  will add an error message.

Right, and because this is essentially a host side error,
RTAS_OUT_HW_ERROR would be more appropriate.

> 
> Also I should check for non zero return value, not just ret == 1, as
> kvmppc_fwnmi_enable() returns the error value from ioctl().
> 
> 
> > 
> >> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >> +        return;
> >> +    }
> >> +
> >> +    if (ret < 0) {
> >> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >> +        return;
> >> +    }
> >> +
> >>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>  }
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index c717ab2..bd75d4b 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -78,8 +78,10 @@ typedef enum {
> >>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
> >>  /* Count Cache Flush Assist HW Instruction */
> >>  #define SPAPR_CAP_CCF_ASSIST            0x09
> >> +/* FWNMI machine check handling */
> >> +#define SPAPR_CAP_FWNMI_MCE             0x0A
> >>  /* Num Caps */
> >> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
> >> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
> >>  
> >>  /*
> >>   * Capability Values
> >> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> >> index 39f1a73..368ec6e 100644
> >> --- a/target/ppc/kvm.c
> >> +++ b/target/ppc/kvm.c
> >> @@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
> >>  static int cap_ppc_count_cache_flush_assist;
> >>  static int cap_ppc_nested_kvm_hv;
> >>  static int cap_large_decr;
> >> +static int cap_ppc_fwnmi;
> >>  
> >>  static uint32_t debug_inst_opcode;
> >>  
> >> @@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >>      kvmppc_get_cpu_characteristics(s);
> >>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
> >>      cap_large_decr = kvmppc_get_dec_bits();
> >> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
> >>      /*
> >>       * Note: setting it to false because there is not such capability
> >>       * in KVM at this moment.
> >> @@ -2119,6 +2121,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
> >>      }
> >>  }
> >>  
> >> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> >> +{
> >> +    CPUState *cs = CPU(cpu);
> >> +
> >> +    if (!cap_ppc_fwnmi) {
> >> +        return 1;
> >> +    }
> >> +
> >> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
> >> +}
> >> +
> >> +
> >>  int kvmppc_smt_threads(void)
> >>  {
> >>      return cap_ppc_smt ? cap_ppc_smt : 1;
> >> @@ -2419,6 +2433,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
> >>      return cap_mmu_hash_v3;
> >>  }
> >>  
> >> +bool kvmppc_has_cap_ppc_fwnmi(void)
> >> +{
> >> +    return cap_ppc_fwnmi;
> >> +}
> >> +
> >>  static bool kvmppc_power8_host(void)
> >>  {
> >>      bool ret = false;
> >> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> >> index 18693f1..3d9f0b4 100644
> >> --- a/target/ppc/kvm_ppc.h
> >> +++ b/target/ppc/kvm_ppc.h
> >> @@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
> >>  void kvmppc_set_papr(PowerPCCPU *cpu);
> >>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
> >>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
> >> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
> >> +bool kvmppc_has_cap_ppc_fwnmi(void);
> >>  int kvmppc_smt_threads(void);
> >>  void kvmppc_hint_smt_possible(Error **errp);
> >>  int kvmppc_set_smt_threads(int smt);
> >> @@ -160,6 +162,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
> >>  {
> >>  }
> >>  
> >> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> >> +{
> >> +    return 1;
> >> +}
> >> +
> >> +static inline bool kvmppc_has_cap_ppc_fwnmi(void)
> >> +{
> >> +    return false;
> >> +}
> >> +
> >>  static inline int kvmppc_smt_threads(void)
> >>  {
> >>      return 1;
> >>
> >>
> > 
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling Aravinda Prasad
  2019-06-03 15:40   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-06-06  3:06   ` David Gibson
  2019-06-06  6:06     ` Greg Kurz
  2019-06-06 11:25     ` Aravinda Prasad
  1 sibling, 2 replies; 44+ messages in thread
From: David Gibson @ 2019-06-06  3:06 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 5336 bytes --]

On Wed, May 29, 2019 at 11:10:57AM +0530, Aravinda Prasad wrote:
> This patch includes migration support for machine check
> handling. Especially this patch blocks VM migration
> requests until the machine check error handling is
> complete as (i) these errors are specific to the source
> hardware and is irrelevant on the target hardware,
> (ii) these errors cause data corruption and should
> be handled before migration.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>  hw/ppc/spapr_rtas.c    |    4 ++++
>  include/hw/ppc/spapr.h |    2 ++
>  4 files changed, 43 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e8a77636..31c4850 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
>      },
>  };
>  
> +static bool spapr_fwnmi_needed(void *opaque)
> +{
> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> +
> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;

Since we're introducing a PAPR capability to enable this, it would
actually be better to check that here, rather than the runtime state.
That leads to less cases and easier to understand semantics for the
migration stream.

> +}
> +
> +static const VMStateDescription vmstate_spapr_machine_check = {
> +    .name = "spapr_machine_check",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_fwnmi_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
>          &vmstate_spapr_dtb,
>          &vmstate_spapr_cap_large_decr,
>          &vmstate_spapr_cap_ccf_assist,
> +        &vmstate_spapr_machine_check,
>          NULL
>      }
>  };
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 573c0b7..35e21e4 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -41,6 +41,7 @@
>  #include "qemu/bcd.h"
>  #include "hw/ppc/spapr_ovec.h"
>  #include <libfdt.h>
> +#include "migration/blocker.h"
>  
>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>  #define   RTAS_LOG_VERSION_6                    0x06000000
> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int ret;
> +    Error *local_err = NULL;
> +
> +    error_setg(&spapr->fwnmi_migration_blocker,
> +            "Live migration not supported during machine check handling");
> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> +    if (ret < 0) {
> +        /*
> +         * We don't want to abort and let the migration to continue. In a
> +         * rare case, the machine check handler will run on the target
> +         * hardware. Though this is not preferable, it is better than aborting
> +         * the migration or killing the VM.
> +         */
> +        error_free(spapr->fwnmi_migration_blocker);

You should set fwnmi_migration_blocker to NULL here as well.

As mentioned on an earlier iteration, the migration blocker is the
same every time.  Couldn't you just create it once and free at final
teardown, rather than recreating it for every NMI?

> +        warn_report_err(local_err);
> +    }
>  
>      while (spapr->mc_status != -1) {
>          /*
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 91a7ab9..c849223 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -50,6 +50,7 @@
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
>  #include "kvm_ppc.h"
> +#include "migration/blocker.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>          spapr->mc_status = -1;
>          qemu_cond_signal(&spapr->mc_delivery_cond);
>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> +        error_free(spapr->fwnmi_migration_blocker);
> +        spapr->fwnmi_migration_blocker = NULL;
>      }
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index bd75d4b..6c0cfd8 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -214,6 +214,8 @@ struct SpaprMachineState {
>      SpaprCapabilities def, eff, mig;
>  
>      unsigned gpu_numa_id;
> +
> +    Error *fwnmi_migration_blocker;
>  };
>  
>  #define H_SUCCESS         0
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-06-06  1:35       ` David Gibson
@ 2019-06-06  4:39         ` Aravinda Prasad
  0 siblings, 0 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-06  4:39 UTC (permalink / raw)
  To: David Gibson, Greg Kurz; +Cc: paulus, qemu-ppc, aik, qemu-devel



On Thursday 06 June 2019 07:05 AM, David Gibson wrote:
> On Mon, Jun 03, 2019 at 01:17:23PM +0200, Greg Kurz wrote:
>> On Mon, 3 Jun 2019 12:12:43 +0200
>> Greg Kurz <groug@kaod.org> wrote:
>>
>>> On Wed, 29 May 2019 11:10:14 +0530
>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>>
>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>>> and "ibm,nmi-interlock" RTAS calls.
>>>>
>>>> The machine check notification address is saved when the
>>>> OS issues "ibm,nmi-register" RTAS call.
>>>>
>>>> This patch also handles the case when multiple processors
>>>> experience machine check at or about the same time by
>>>> handling "ibm,nmi-interlock" call. In such cases, as per
>>>> PAPR, subsequent processors serialize waiting for the first
>>>> processor to issue the "ibm,nmi-interlock" call. The second
>>>> processor that also received a machine check error waits
>>>> till the first processor is done reading the error log.
>>>> The first processor issues "ibm,nmi-interlock" call
>>>> when the error log is consumed. This patch implements the
>>>> releasing part of the error-log while subsequent patch
>>>> (which builds error log) handles the locking part.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>>>> ---  
>>>
>>> The code looks okay but it still seems wrong to advertise the RTAS
>>> calls to the guest that early in the series. The linux kernel in
>>> the guest will assume FWNMI is functional, which isn't true until
>>> patch 6 (yes, migration is part of the feature, it should be
>>> supported upfront, not fixed afterwards).
>>>
>>> It doesn't help much to introduce the RTAS calls early and to
>>> modify them in the other patches. I'd rather see the rest of
>>> the code first and a final patch that introduces the fully
>>> functional RTAS calls and calls spapr_rtas_register().
>>>
>>
>> Thinking again, you should introduce the "fwnmi-mce" spapr capability in
>> its own patch first, default to "off" and and have the last patch in the
>> series to switch the default to "on" for newer machine types only.
>>
>> This patch should then only register the RTAS calls if "fwnmi-mcr" is set
>> to "on".
> 
> Yes, I think this is a good approach.

ok

> 
>> This should address the fact that we don't want to expose a partially
>> implemented FWNMI feature to the guest, and we don't want to support
>> FWNMI at all with older machine types for the sake of compatibility.
>>
>>>>  hw/ppc/spapr.c         |    7 +++++
>>>>  hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/ppc/spapr.h |    9 ++++++-
>>>>  3 files changed, 80 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index e2b33e5..fae28a9 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
>>>>      first_ppc_cpu->env.gpr[5] = 0;
>>>>  
>>>>      spapr->cas_reboot = false;
>>>> +
>>>> +    spapr->guest_machine_check_addr = -1;
>>>> +
>>>> +    /* Signal all vCPUs waiting on this condition */
>>>> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>>>>  }
>>>>  
>>>>  static void spapr_create_nvram(SpaprMachineState *spapr)
>>>> @@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
>>>>  
>>>>          kvmppc_spapr_enable_inkernel_multitce();
>>>>      }
>>>> +
>>>> +    qemu_cond_init(&spapr->mc_delivery_cond);
>>>>  }
>>>>  
>>>>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index 5bc1a93..e7509cf 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>      rtas_st(rets, 1, 100);
>>>>  }
>>>>  
>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>> +                                  SpaprMachineState *spapr,
>>>> +                                  uint32_t token, uint32_t nargs,
>>>> +                                  target_ulong args,
>>>> +                                  uint32_t nret, target_ulong rets)
>>>> +{
>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>>>> +
>>>> +    if (!rtas_addr) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>> +}
>>>> +
>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>> +                                   SpaprMachineState *spapr,
>>>> +                                   uint32_t token, uint32_t nargs,
>>>> +                                   target_ulong args,
>>>> +                                   uint32_t nret, target_ulong rets)
>>>> +{
>>>> +    if (spapr->guest_machine_check_addr == -1) {
>>>> +        /* NMI register not called */
>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>> +    } else {
>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>>>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>> +    }
>>>> +}
>>>> +
>>>>  static struct rtas_call {
>>>>      const char *name;
>>>>      spapr_rtas_fn fn;
>>>> @@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>>>>      }
>>>>  }
>>>>  
>>>> +hwaddr spapr_get_rtas_addr(void)
>>>> +{
>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>> +    int rtas_node;
>>>> +    const struct fdt_property *rtas_addr_prop;
>>>> +    void *fdt = spapr->fdt_blob;
>>>> +    uint32_t rtas_addr;
>>>> +
>>>> +    /* fetch rtas addr from fdt */
>>>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>>>> +    if (rtas_node == 0) {
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
>>>> +    if (!rtas_addr_prop) {
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * We assume that the OS called RTAS instantiate-rtas, but some other
>>>> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
>>>> +     * as SLOF only supports 32-bit variant.
>>>> +     */
>>>> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
>>>> +    return (hwaddr)rtas_addr;
>>>> +}
>>>> +
>>>> +
>>>>  static void core_rtas_register_types(void)
>>>>  {
>>>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>>>> @@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
>>>>                          rtas_set_power_level);
>>>>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>>>                          rtas_get_power_level);
>>>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>>>> +                        rtas_ibm_nmi_register);
>>>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>>>> +                        rtas_ibm_nmi_interlock);
>>>>  }
>>>>  
>>>>  type_init(core_rtas_register_types)
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index 4f5becf..9dc5e30 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -188,6 +188,10 @@ struct SpaprMachineState {
>>>>       * occurs during the unplug process. */
>>>>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>>>>  
>>>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>>>> +    target_ulong guest_machine_check_addr;
>>>> +    QemuCond mc_delivery_cond;
>>>> +
>>>>      /*< public >*/
>>>>      char *kvm_type;
>>>>      char *host_model;
>>>> @@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>>>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>>>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>>>  
>>>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>>>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>>>  
>>>>  /* RTAS ibm,get-system-parameter token values */
>>>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>>>> @@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>>>  
>>>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>>> +uint64_t spapr_get_rtas_addr(void);
>>>>  #endif /* HW_SPAPR_H */
>>>>   
>>>
>>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 3/6] target/ppc: Handle NMI guest exit
  2019-06-06  1:43   ` [Qemu-devel] " David Gibson
@ 2019-06-06  4:45     ` Aravinda Prasad
  0 siblings, 0 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-06  4:45 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Thursday 06 June 2019 07:13 AM, David Gibson wrote:
> On Wed, May 29, 2019 at 11:10:32AM +0530, Aravinda Prasad wrote:
>> Memory error such as bit flips that cannot be corrected
>> by hardware are passed on to the kernel for handling.
>> If the memory address in error belongs to guest then
>> the guest kernel is responsible for taking suitable action.
>> Patch [1] enhances KVM to exit guest with exit reason
>> set to KVM_EXIT_NMI in such cases. This patch handles
>> KVM_EXIT_NMI exit.
>>
>> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>>     (e20bbd3d and related commits)
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c          |    1 +
>>  hw/ppc/spapr_events.c   |   23 +++++++++++++++++++++++
>>  hw/ppc/spapr_rtas.c     |    5 +++++
>>  include/hw/ppc/spapr.h  |    6 ++++++
>>  target/ppc/kvm.c        |   16 ++++++++++++++++
>>  target/ppc/kvm_ppc.h    |    2 ++
>>  target/ppc/trace-events |    1 +
>>  7 files changed, 54 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index fae28a9..6b6c962 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1809,6 +1809,7 @@ static void spapr_machine_reset(void)
>>  
>>      spapr->cas_reboot = false;
>>  
>> +    spapr->mc_status = -1;
>>      spapr->guest_machine_check_addr = -1;
>>  
>>      /* Signal all vCPUs waiting on this condition */
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index ae0f093..a18446b 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -620,6 +620,29 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>  }
>>  
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> 
> You ignore the 'recovered' parameter, is that right?

I use the "recovered" parameter, but in the next patch. This was left
out when the patch was split in one of the earlier versions. Will modify it.

> 
>> +    while (spapr->mc_status != -1) {
>> +        /*
>> +         * Check whether the same CPU got machine check error
>> +         * while still handling the mc error (i.e., before
>> +         * that CPU called "ibm,nmi-interlock")
>> +         */
>> +        if (spapr->mc_status == cpu->vcpu_id) {
>> +            qemu_system_guest_panicked(NULL);
>> +            return;
>> +        }
>> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
>> +        /* Meanwhile if the system is reset, then just return */
>> +        if (spapr->guest_machine_check_addr == -1) {
>> +            return;
>> +        }
>> +    }
>> +    spapr->mc_status = cpu->vcpu_id;
>> +}
>> +
>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                              uint32_t token, uint32_t nargs,
>>                              target_ulong args,
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index e7509cf..e0bdfc8 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -379,6 +379,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>          /* NMI register not called */
>>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>      } else {
>> +        /*
>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>> +         * hence unset mc_status.
>> +         */
>> +        spapr->mc_status = -1;
>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>      }
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 9dc5e30..fc3a776 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -190,6 +190,11 @@ struct SpaprMachineState {
>>  
>>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>>      target_ulong guest_machine_check_addr;
>> +    /*
>> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
>> +     * handling the mc.
>> +     */
>> +    int mc_status;
>>      QemuCond mc_delivery_cond;
>>  
>>      /*< public >*/
>> @@ -793,6 +798,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>>  int spapr_max_server_number(SpaprMachineState *spapr);
>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                        uint64_t pte0, uint64_t pte1);
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>  
>>  /* DRC callbacks. */
>>  void spapr_core_release(DeviceState *dev);
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 3bf0a46..39f1a73 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -1761,6 +1761,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>          ret = 0;
>>          break;
>>  
>> +    case KVM_EXIT_NMI:
>> +        trace_kvm_handle_nmi_exception();
>> +        ret = kvm_handle_nmi(cpu, run);
>> +        break;
>> +
>>      default:
>>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>>          ret = -1;
>> @@ -2844,6 +2849,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>      return data & 0xffff;
>>  }
>>  
>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>> +{
>> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
>> +
>> +    cpu_synchronize_state(CPU(cpu));
>> +
>> +    spapr_mce_req_event(cpu, recovered);
> 
> Urgh.. KVM calling directly into spapr code isn't conceptually
> correct, although it's usually kind of ok in practice.  I guess we
> already do it for hypercalls, so this is no worse.  If we ever have
> NMI events for KVM PR or BookE KVM which could happen for non-PAPR
> guests, I guess we'll need to re-examine this.

ok.

> 
>> +    return 0;
>> +}
>> +
>>  int kvmppc_enable_hwrng(void)
>>  {
>>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index 45776ca..18693f1 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
>> @@ -81,6 +81,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>>  
>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
>> +
>>  #else
>>  
>>  static inline uint32_t kvmppc_get_tbfreq(void)
>> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
>> index 3dc6740..6d15aa9 100644
>> --- a/target/ppc/trace-events
>> +++ b/target/ppc/trace-events
>> @@ -28,3 +28,4 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>>  kvm_handle_epr(void) "handle epr"
>>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>>  kvm_handle_debug_exception(void) "handle debug exception"
>> +kvm_handle_nmi_exception(void) "handle NMI exception"
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability
  2019-06-06  3:02       ` David Gibson
@ 2019-06-06  4:50         ` Aravinda Prasad
  2019-06-06  7:51         ` Aravinda Prasad
  1 sibling, 0 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-06  4:50 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc



On Thursday 06 June 2019 08:32 AM, David Gibson wrote:
> On Tue, Jun 04, 2019 at 12:15:26PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Monday 03 June 2019 08:55 PM, Greg Kurz wrote:
>>> On Wed, 29 May 2019 11:10:49 +0530
>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>>
>>>> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
>>>> the KVM causes guest exit with NMI as exit reason
>>>> when it encounters a machine check exception on the
>>>> address belonging to a guest. Without this capability
>>>> enabled, KVM redirects machine check exceptions to
>>>> guest's 0x200 vector.
>>>>
>>>> This patch also deals with the case when a guest with
>>>> the KVM_CAP_PPC_FWNMI capability enabled is attempted
>>>> to migrate to a host that does not support this
>>>> capability.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> ---
>>>
>>> As suggested in another mail, it may be worth introducing the sPAPR cap
>>> in its own patch, earlier in the series.
>>
>> Sure, also as a workaround mentioned in the reply to that mail, I am
>> thinking of returning RTAS_OUT_NOT_SUPPORTED to rtas nmi register call
>> until the entire functionality is implemented. This will help solve
>> spapr cap issue as well.
> 
> Not registering the RTAS call at all is the correct way to handle that
> case.

ok.

> 
>>
>>>
>>> Anyway, I have some comments below.
>>>
>>>>  hw/ppc/spapr.c         |    1 +
>>>>  hw/ppc/spapr_caps.c    |   24 ++++++++++++++++++++++++
>>>>  hw/ppc/spapr_rtas.c    |   18 ++++++++++++++++++
>>>>  include/hw/ppc/spapr.h |    4 +++-
>>>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>>>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>>>>  6 files changed, 77 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index c97f6a6..e8a77636 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -4364,6 +4364,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>      smc->irq = &spapr_irq_dual;
>>>>      smc->dr_phb_enabled = true;
>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>> index 31b4661..ef9e612 100644
>>>> --- a/hw/ppc/spapr_caps.c
>>>> +++ b/hw/ppc/spapr_caps.c
>>>> @@ -479,6 +479,20 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>>>      }
>>>>  }
>>>>  
>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>> +                                Error **errp)
>>>> +{
>>>> +    if (!val) {
>>>> +        return; /* Disabled by default */
>>>> +    }
>>>> +
>>>> +    if (tcg_enabled()) {
>>>> +            error_setg(errp, "No fwnmi support in TCG, try cap-fwnmi-mce=off");
>>>
>>> Maybe expand "fwnmi" to "Firmware Assisted Non-Maskable Interrupts" ?
>>
>> sure..
>>
>>>
>>>> +    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
>>>> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
>>>
>>> Maybe reword and add a hint:
>>>
>>> "KVM implementation does not support Firmware Assisted Non-Maskable Interrupts, try cap-fwnmi-mce=off"
>>
>> sure..
>>
>>>
>>>
>>>> +    }
>>>> +}
>>>> +
>>>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>      [SPAPR_CAP_HTM] = {
>>>>          .name = "htm",
>>>> @@ -578,6 +592,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>          .type = "bool",
>>>>          .apply = cap_ccf_assist_apply,
>>>>      },
>>>> +    [SPAPR_CAP_FWNMI_MCE] = {
>>>> +        .name = "fwnmi-mce",
>>>> +        .description = "Handle fwnmi machine check exceptions",
>>>> +        .index = SPAPR_CAP_FWNMI_MCE,
>>>> +        .get = spapr_cap_get_bool,
>>>> +        .set = spapr_cap_set_bool,
>>>> +        .type = "bool",
>>>> +        .apply = cap_fwnmi_mce_apply,
>>>> +    },
>>>>  };
>>>>  
>>>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
>>>> @@ -717,6 +740,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>>>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>>>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>>>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>>>>  
>>>>  void spapr_caps_init(SpaprMachineState *spapr)
>>>>  {
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index e0bdfc8..91a7ab9 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -49,6 +49,7 @@
>>>>  #include "hw/ppc/fdt.h"
>>>>  #include "target/ppc/mmu-hash64.h"
>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>> +#include "kvm_ppc.h"
>>>>  
>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>                                     uint32_t token, uint32_t nargs,
>>>> @@ -358,6 +359,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>                                    target_ulong args,
>>>>                                    uint32_t nret, target_ulong rets)
>>>>  {
>>>> +    int ret;
>>>>      hwaddr rtas_addr = spapr_get_rtas_addr();
>>>>  
>>>>      if (!rtas_addr) {
>>>> @@ -365,6 +367,22 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>          return;
>>>>      }
>>>>  
>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == 0) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>> +    if (ret == 1) {
>>>
>>> I have the impression that this should really not happen,
>>> otherwise something has gone terribly wrong in QEMU or
>>> in KVM... this maybe deserves an error message as well ?
>>> No big deal.
>>
>> I think so..  will add an error message.
> 
> Right, and because this is essentially a host side error,
> RTAS_OUT_HW_ERROR would be more appropriate.

sure

> 
>>
>> Also I should check for non zero return value, not just ret == 1, as
>> kvmppc_fwnmi_enable() returns the error value from ioctl().
>>
>>
>>>
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (ret < 0) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>> +        return;
>>>> +    }
>>>> +
>>>>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>  }
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index c717ab2..bd75d4b 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -78,8 +78,10 @@ typedef enum {
>>>>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>>>>  /* Count Cache Flush Assist HW Instruction */
>>>>  #define SPAPR_CAP_CCF_ASSIST            0x09
>>>> +/* FWNMI machine check handling */
>>>> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>>>>  /* Num Caps */
>>>> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
>>>> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>>>>  
>>>>  /*
>>>>   * Capability Values
>>>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>>>> index 39f1a73..368ec6e 100644
>>>> --- a/target/ppc/kvm.c
>>>> +++ b/target/ppc/kvm.c
>>>> @@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
>>>>  static int cap_ppc_count_cache_flush_assist;
>>>>  static int cap_ppc_nested_kvm_hv;
>>>>  static int cap_large_decr;
>>>> +static int cap_ppc_fwnmi;
>>>>  
>>>>  static uint32_t debug_inst_opcode;
>>>>  
>>>> @@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>      kvmppc_get_cpu_characteristics(s);
>>>>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>>>>      cap_large_decr = kvmppc_get_dec_bits();
>>>> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>>>>      /*
>>>>       * Note: setting it to false because there is not such capability
>>>>       * in KVM at this moment.
>>>> @@ -2119,6 +2121,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>>>      }
>>>>  }
>>>>  
>>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>>>> +{
>>>> +    CPUState *cs = CPU(cpu);
>>>> +
>>>> +    if (!cap_ppc_fwnmi) {
>>>> +        return 1;
>>>> +    }
>>>> +
>>>> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
>>>> +}
>>>> +
>>>> +
>>>>  int kvmppc_smt_threads(void)
>>>>  {
>>>>      return cap_ppc_smt ? cap_ppc_smt : 1;
>>>> @@ -2419,6 +2433,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>>>>      return cap_mmu_hash_v3;
>>>>  }
>>>>  
>>>> +bool kvmppc_has_cap_ppc_fwnmi(void)
>>>> +{
>>>> +    return cap_ppc_fwnmi;
>>>> +}
>>>> +
>>>>  static bool kvmppc_power8_host(void)
>>>>  {
>>>>      bool ret = false;
>>>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>>>> index 18693f1..3d9f0b4 100644
>>>> --- a/target/ppc/kvm_ppc.h
>>>> +++ b/target/ppc/kvm_ppc.h
>>>> @@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
>>>>  void kvmppc_set_papr(PowerPCCPU *cpu);
>>>>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>>>>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
>>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
>>>> +bool kvmppc_has_cap_ppc_fwnmi(void);
>>>>  int kvmppc_smt_threads(void);
>>>>  void kvmppc_hint_smt_possible(Error **errp);
>>>>  int kvmppc_set_smt_threads(int smt);
>>>> @@ -160,6 +162,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>>>  {
>>>>  }
>>>>  
>>>> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>>>> +{
>>>> +    return 1;
>>>> +}
>>>> +
>>>> +static inline bool kvmppc_has_cap_ppc_fwnmi(void)
>>>> +{
>>>> +    return false;
>>>> +}
>>>> +
>>>>  static inline int kvmppc_smt_threads(void)
>>>>  {
>>>>      return 1;
>>>>
>>>>
>>>
>>>
>>
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-06-04 14:50         ` Greg Kurz
@ 2019-06-06  5:17           ` Aravinda Prasad
  0 siblings, 0 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-06  5:17 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Tuesday 04 June 2019 08:20 PM, Greg Kurz wrote:
> On Tue, 4 Jun 2019 11:38:31 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> On Monday 03 June 2019 04:47 PM, Greg Kurz wrote:
>>> On Mon, 3 Jun 2019 12:12:43 +0200
>>> Greg Kurz <groug@kaod.org> wrote:
>>>   
>>>> On Wed, 29 May 2019 11:10:14 +0530
>>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>>>  
>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>>>> and "ibm,nmi-interlock" RTAS calls.
>>>>>
>>>>> The machine check notification address is saved when the
>>>>> OS issues "ibm,nmi-register" RTAS call.
>>>>>
>>>>> This patch also handles the case when multiple processors
>>>>> experience machine check at or about the same time by
>>>>> handling "ibm,nmi-interlock" call. In such cases, as per
>>>>> PAPR, subsequent processors serialize waiting for the first
>>>>> processor to issue the "ibm,nmi-interlock" call. The second
>>>>> processor that also received a machine check error waits
>>>>> till the first processor is done reading the error log.
>>>>> The first processor issues "ibm,nmi-interlock" call
>>>>> when the error log is consumed. This patch implements the
>>>>> releasing part of the error-log while subsequent patch
>>>>> (which builds error log) handles the locking part.
>>>>>
>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>>>>> ---    
>>>>
>>>> The code looks okay but it still seems wrong to advertise the RTAS
>>>> calls to the guest that early in the series. The linux kernel in
>>>> the guest will assume FWNMI is functional, which isn't true until
>>>> patch 6 (yes, migration is part of the feature, it should be
>>>> supported upfront, not fixed afterwards).
>>>>
>>>> It doesn't help much to introduce the RTAS calls early and to
>>>> modify them in the other patches. I'd rather see the rest of
>>>> the code first and a final patch that introduces the fully
>>>> functional RTAS calls and calls spapr_rtas_register().
>>>>  
>>>
>>> Thinking again, you should introduce the "fwnmi-mce" spapr capability in
>>> its own patch first, default to "off" and and have the last patch in the
>>> series to switch the default to "on" for newer machine types only.
>>>
>>> This patch should then only register the RTAS calls if "fwnmi-mcr" is set
>>> to "on".
>>>
>>> This should address the fact that we don't want to expose a partially
>>> implemented FWNMI feature to the guest, and we don't want to support
>>> FWNMI at all with older machine types for the sake of compatibility.  
>>
>> When you say "expose a partially implemented FWNMI feature to the
>> guest", do you mean while debugging/bisect we may end up with exposing
>> the partially implemented FWNMI feature? Otherwise it is expected that
> 
> Yes, we don't want to break someone else's bisect.

ok.

> 
>> QEMU runs with all the 6 patches.
>>
>> If that is the case, I will have the rtas nmi register functionality as
>> the last patch in the series. This way we don't have to have spapr cap
>> turned off first and later turned on. However, as mentioned earlier
>> (when David raised the same concern), use of guest_machine_check_addr
>> may look odd at other patches as it is set only during rtas nmi register.
>>
> 
> Why odd ?

see below

> 
>> Or else, as a workaround, I can return RTAS_OUT_NOT_SUPPORTED for rtas
>> nmi register till the entire functionality is implemented and only in
>> the last patch in the series I will return RTAS_OUT_SUCCESS. This will
>> ensure that we have a logical connection between the patches and the
>> partially implemented fwnmi is not exposed to the guest kernel.
>>
> 
> Not exactly true. FWNMI would be exposed to the guest in the device tree
> and the guest kernel would _just_ fail to set the fwnmi_active global:
> 
> 	if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
> 				machine_check_addr))
> 		fwnmi_active = 1;

Sorry for the confusion. I thought the suggestion was to introduce
rtas_ibm_nmi_register() call later in the series, but now I see that I
can still have rtas_ibm_nmi_register(), but have a final patch that
calls spapr_rtas_register().


> 
>> Regards,
>> Aravinda
>>
>>
>>
>>
>>>   
>>>>>  hw/ppc/spapr.c         |    7 +++++
>>>>>  hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  include/hw/ppc/spapr.h |    9 ++++++-
>>>>>  3 files changed, 80 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>> index e2b33e5..fae28a9 100644
>>>>> --- a/hw/ppc/spapr.c
>>>>> +++ b/hw/ppc/spapr.c
>>>>> @@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
>>>>>      first_ppc_cpu->env.gpr[5] = 0;
>>>>>  
>>>>>      spapr->cas_reboot = false;
>>>>> +
>>>>> +    spapr->guest_machine_check_addr = -1;
>>>>> +
>>>>> +    /* Signal all vCPUs waiting on this condition */
>>>>> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>>>>>  }
>>>>>  
>>>>>  static void spapr_create_nvram(SpaprMachineState *spapr)
>>>>> @@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
>>>>>  
>>>>>          kvmppc_spapr_enable_inkernel_multitce();
>>>>>      }
>>>>> +
>>>>> +    qemu_cond_init(&spapr->mc_delivery_cond);
>>>>>  }
>>>>>  
>>>>>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>>> index 5bc1a93..e7509cf 100644
>>>>> --- a/hw/ppc/spapr_rtas.c
>>>>> +++ b/hw/ppc/spapr_rtas.c
>>>>> @@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>>      rtas_st(rets, 1, 100);
>>>>>  }
>>>>>  
>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>> +                                  SpaprMachineState *spapr,
>>>>> +                                  uint32_t token, uint32_t nargs,
>>>>> +                                  target_ulong args,
>>>>> +                                  uint32_t nret, target_ulong rets)
>>>>> +{
>>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>>>>> +
>>>>> +    if (!rtas_addr) {
>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>> +}
>>>>> +
>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>>> +                                   SpaprMachineState *spapr,
>>>>> +                                   uint32_t token, uint32_t nargs,
>>>>> +                                   target_ulong args,
>>>>> +                                   uint32_t nret, target_ulong rets)
>>>>> +{
>>>>> +    if (spapr->guest_machine_check_addr == -1) {
>>>>> +        /* NMI register not called */
>>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>>> +    } else {
>>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>>>>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>  static struct rtas_call {
>>>>>      const char *name;
>>>>>      spapr_rtas_fn fn;
>>>>> @@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>>>>>      }
>>>>>  }
>>>>>  
>>>>> +hwaddr spapr_get_rtas_addr(void)
>>>>> +{
>>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>>> +    int rtas_node;
>>>>> +    const struct fdt_property *rtas_addr_prop;
>>>>> +    void *fdt = spapr->fdt_blob;
>>>>> +    uint32_t rtas_addr;
>>>>> +
>>>>> +    /* fetch rtas addr from fdt */
>>>>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>>>>> +    if (rtas_node == 0) {
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
>>>>> +    if (!rtas_addr_prop) {
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    /*
>>>>> +     * We assume that the OS called RTAS instantiate-rtas, but some other
>>>>> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
>>>>> +     * as SLOF only supports 32-bit variant.
>>>>> +     */
>>>>> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
>>>>> +    return (hwaddr)rtas_addr;
>>>>> +}
>>>>> +
>>>>> +
>>>>>  static void core_rtas_register_types(void)
>>>>>  {
>>>>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>>>>> @@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
>>>>>                          rtas_set_power_level);
>>>>>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>>>>                          rtas_get_power_level);
>>>>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>>>>> +                        rtas_ibm_nmi_register);
>>>>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>>>>> +                        rtas_ibm_nmi_interlock);
>>>>>  }
>>>>>  
>>>>>  type_init(core_rtas_register_types)
>>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>>> index 4f5becf..9dc5e30 100644
>>>>> --- a/include/hw/ppc/spapr.h
>>>>> +++ b/include/hw/ppc/spapr.h
>>>>> @@ -188,6 +188,10 @@ struct SpaprMachineState {
>>>>>       * occurs during the unplug process. */
>>>>>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>>>>>  
>>>>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>>>>> +    target_ulong guest_machine_check_addr;
>>>>> +    QemuCond mc_delivery_cond;
>>>>> +
>>>>>      /*< public >*/
>>>>>      char *kvm_type;This means it isn't related to XIVE it to set
>>>>>      char *host_model;
>>>>> @@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>>>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>>>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>>>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>>>>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>>>>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>>>>  
>>>>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>>>>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>>>>  
>>>>>  /* RTAS ibm,get-system-parameter token values */
>>>>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>>>>> @@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>>>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>>>>  
>>>>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>>>> +uint64_t spapr_get_rtas_addr(void);
>>>>>  #endif /* HW_SPAPR_H */
>>>>>     
>>>>
>>>>  
>>>   
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-06-06  1:34   ` [Qemu-devel] " David Gibson
@ 2019-06-06  5:26     ` Aravinda Prasad
  0 siblings, 0 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-06  5:26 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Thursday 06 June 2019 07:04 AM, David Gibson wrote:
> On Wed, May 29, 2019 at 11:10:14AM +0530, Aravinda Prasad wrote:
>> This patch adds support in QEMU to handle "ibm,nmi-register"
>> and "ibm,nmi-interlock" RTAS calls.
>>
>> The machine check notification address is saved when the
>> OS issues "ibm,nmi-register" RTAS call.
>>
>> This patch also handles the case when multiple processors
>> experience machine check at or about the same time by
>> handling "ibm,nmi-interlock" call. In such cases, as per
>> PAPR, subsequent processors serialize waiting for the first
>> processor to issue the "ibm,nmi-interlock" call. The second
>> processor that also received a machine check error waits
>> till the first processor is done reading the error log.
>> The first processor issues "ibm,nmi-interlock" call
>> when the error log is consumed. This patch implements the
>> releasing part of the error-log while subsequent patch
>> (which builds error log) handles the locking part.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>> ---
>>  hw/ppc/spapr.c         |    7 +++++
>>  hw/ppc/spapr_rtas.c    |   65 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |    9 ++++++-
>>  3 files changed, 80 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index e2b33e5..fae28a9 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1808,6 +1808,11 @@ static void spapr_machine_reset(void)
>>      first_ppc_cpu->env.gpr[5] = 0;
>>  
>>      spapr->cas_reboot = false;
>> +
>> +    spapr->guest_machine_check_addr = -1;
>> +
>> +    /* Signal all vCPUs waiting on this condition */
>> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>>  }
>>  
>>  static void spapr_create_nvram(SpaprMachineState *spapr)
>> @@ -3072,6 +3077,8 @@ static void spapr_machine_init(MachineState *machine)
>>  
>>          kvmppc_spapr_enable_inkernel_multitce();
>>      }
>> +
>> +    qemu_cond_init(&spapr->mc_delivery_cond);
>>  }
>>  
>>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 5bc1a93..e7509cf 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -352,6 +352,38 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>      rtas_st(rets, 1, 100);
>>  }
>>  
>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>> +                                  SpaprMachineState *spapr,
>> +                                  uint32_t token, uint32_t nargs,
>> +                                  target_ulong args,
>> +                                  uint32_t nret, target_ulong rets)
>> +{
>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>> +
>> +    if (!rtas_addr) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
>> +    if (spapr->guest_machine_check_addr == -1) {
>> +        /* NMI register not called */
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +    } else {
>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +    }
>> +}
>> +
>>  static struct rtas_call {
>>      const char *name;
>>      spapr_rtas_fn fn;
>> @@ -470,6 +502,35 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>>      }
>>  }
>>  
>> +hwaddr spapr_get_rtas_addr(void)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int rtas_node;
>> +    const struct fdt_property *rtas_addr_prop;
>> +    void *fdt = spapr->fdt_blob;
>> +    uint32_t rtas_addr;
>> +
>> +    /* fetch rtas addr from fdt */
>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>> +    if (rtas_node == 0) {
>> +        return 0;
> 
> This is incorrect, a return code < 0 indicates an error which you
> should check for.  A return code of 0 indicates the root node, which
> could only happen if libfdt was badly buggy.

ok

> 
>> +    }
>> +
>> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> 
> fdt_get_property is generally only needed for certain edge cases.
> fdt_getprop() is a better option.

ok

> 
>> +    if (!rtas_addr_prop) {
>> +        return 0;
>> +    }
>> +
>> +    /*
>> +     * We assume that the OS called RTAS instantiate-rtas, but some other
>> +     * OS might call RTAS instantiate-rtas-64 instead. This fine as of now
>> +     * as SLOF only supports 32-bit variant.
>> +     */
>> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
>> +    return (hwaddr)rtas_addr;
>> +}
>> +
>> +
>>  static void core_rtas_register_types(void)
>>  {
>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>> @@ -493,6 +554,10 @@ static void core_rtas_register_types(void)
>>                          rtas_set_power_level);
>>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>                          rtas_get_power_level);
>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>> +                        rtas_ibm_nmi_register);
>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>> +                        rtas_ibm_nmi_interlock);
>>  }
>>  
>>  type_init(core_rtas_register_types)
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 4f5becf..9dc5e30 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -188,6 +188,10 @@ struct SpaprMachineState {
>>       * occurs during the unplug process. */
>>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>>  
>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>> +    target_ulong guest_machine_check_addr;
>> +    QemuCond mc_delivery_cond;
>> +
>>      /*< public >*/
>>      char *kvm_type;
>>      char *host_model;
>> @@ -624,8 +628,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>  
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>  
>>  /* RTAS ibm,get-system-parameter token values */
>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>> @@ -876,4 +882,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>  
>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>> +uint64_t spapr_get_rtas_addr(void);
>>  #endif /* HW_SPAPR_H */
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-06  3:06   ` [Qemu-devel] " David Gibson
@ 2019-06-06  6:06     ` Greg Kurz
  2019-06-06 11:15       ` Aravinda Prasad
  2019-06-06 11:25     ` Aravinda Prasad
  1 sibling, 1 reply; 44+ messages in thread
From: Greg Kurz @ 2019-06-06  6:06 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, paulus, qemu-ppc, Aravinda Prasad

[-- Attachment #1: Type: text/plain, Size: 5713 bytes --]

On Thu, 6 Jun 2019 13:06:14 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, May 29, 2019 at 11:10:57AM +0530, Aravinda Prasad wrote:
> > This patch includes migration support for machine check
> > handling. Especially this patch blocks VM migration
> > requests until the machine check error handling is
> > complete as (i) these errors are specific to the source
> > hardware and is irrelevant on the target hardware,
> > (ii) these errors cause data corruption and should
> > be handled before migration.
> > 
> > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
> >  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> >  hw/ppc/spapr_rtas.c    |    4 ++++
> >  include/hw/ppc/spapr.h |    2 ++
> >  4 files changed, 43 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index e8a77636..31c4850 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
> >      },
> >  };
> >  
> > +static bool spapr_fwnmi_needed(void *opaque)
> > +{
> > +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> > +
> > +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;  
> 
> Since we're introducing a PAPR capability to enable this, it would
> actually be better to check that here, rather than the runtime state.
> That leads to less cases and easier to understand semantics for the
> migration stream.
> 

Hmmm... the purpose of needed() VMState callbacks is precisely about
runtime state: the subsection should only be migrated if an MCE is
pending, ie. spapr->guest_machine_check_addr != -1.

> > +}
> > +
> > +static const VMStateDescription vmstate_spapr_machine_check = {
> > +    .name = "spapr_machine_check",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .needed = spapr_fwnmi_needed,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> > +        VMSTATE_INT32(mc_status, SpaprMachineState),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> >  static const VMStateDescription vmstate_spapr = {
> >      .name = "spapr",
> >      .version_id = 3,
> > @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
> >          &vmstate_spapr_dtb,
> >          &vmstate_spapr_cap_large_decr,
> >          &vmstate_spapr_cap_ccf_assist,
> > +        &vmstate_spapr_machine_check,
> >          NULL
> >      }
> >  };
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index 573c0b7..35e21e4 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -41,6 +41,7 @@
> >  #include "qemu/bcd.h"
> >  #include "hw/ppc/spapr_ovec.h"
> >  #include <libfdt.h>
> > +#include "migration/blocker.h"
> >  
> >  #define RTAS_LOG_VERSION_MASK                   0xff000000
> >  #define   RTAS_LOG_VERSION_6                    0x06000000
> > @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >  {
> >      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > +    int ret;
> > +    Error *local_err = NULL;
> > +
> > +    error_setg(&spapr->fwnmi_migration_blocker,
> > +            "Live migration not supported during machine check handling");
> > +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> > +    if (ret < 0) {
> > +        /*
> > +         * We don't want to abort and let the migration to continue. In a
> > +         * rare case, the machine check handler will run on the target
> > +         * hardware. Though this is not preferable, it is better than aborting
> > +         * the migration or killing the VM.
> > +         */
> > +        error_free(spapr->fwnmi_migration_blocker);  
> 
> You should set fwnmi_migration_blocker to NULL here as well.
> 
> As mentioned on an earlier iteration, the migration blocker is the
> same every time.  Couldn't you just create it once and free at final
> teardown, rather than recreating it for every NMI?
> 
> > +        warn_report_err(local_err);
> > +    }
> >  
> >      while (spapr->mc_status != -1) {
> >          /*
> > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > index 91a7ab9..c849223 100644
> > --- a/hw/ppc/spapr_rtas.c
> > +++ b/hw/ppc/spapr_rtas.c
> > @@ -50,6 +50,7 @@
> >  #include "target/ppc/mmu-hash64.h"
> >  #include "target/ppc/mmu-book3s-v3.h"
> >  #include "kvm_ppc.h"
> > +#include "migration/blocker.h"
> >  
> >  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >                                     uint32_t token, uint32_t nargs,
> > @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >          spapr->mc_status = -1;
> >          qemu_cond_signal(&spapr->mc_delivery_cond);
> >          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> > +        error_free(spapr->fwnmi_migration_blocker);
> > +        spapr->fwnmi_migration_blocker = NULL;
> >      }
> >  }
> >  
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index bd75d4b..6c0cfd8 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -214,6 +214,8 @@ struct SpaprMachineState {
> >      SpaprCapabilities def, eff, mig;
> >  
> >      unsigned gpu_numa_id;
> > +
> > +    Error *fwnmi_migration_blocker;
> >  };
> >  
> >  #define H_SUCCESS         0
> >   
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability
  2019-06-06  3:02       ` David Gibson
  2019-06-06  4:50         ` Aravinda Prasad
@ 2019-06-06  7:51         ` Aravinda Prasad
  1 sibling, 0 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-06  7:51 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc



On Thursday 06 June 2019 08:32 AM, David Gibson wrote:
> On Tue, Jun 04, 2019 at 12:15:26PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Monday 03 June 2019 08:55 PM, Greg Kurz wrote:
>>> On Wed, 29 May 2019 11:10:49 +0530
>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>>
>>>> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
>>>> the KVM causes guest exit with NMI as exit reason
>>>> when it encounters a machine check exception on the
>>>> address belonging to a guest. Without this capability
>>>> enabled, KVM redirects machine check exceptions to
>>>> guest's 0x200 vector.
>>>>
>>>> This patch also deals with the case when a guest with
>>>> the KVM_CAP_PPC_FWNMI capability enabled is attempted
>>>> to migrate to a host that does not support this
>>>> capability.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> ---
>>>
>>> As suggested in another mail, it may be worth introducing the sPAPR cap
>>> in its own patch, earlier in the series.
>>
>> Sure, also as a workaround mentioned in the reply to that mail, I am
>> thinking of returning RTAS_OUT_NOT_SUPPORTED to rtas nmi register call
>> until the entire functionality is implemented. This will help solve
>> spapr cap issue as well.
> 
> Not registering the RTAS call at all is the correct way to handle that
> case.
> 
>>
>>>
>>> Anyway, I have some comments below.
>>>
>>>>  hw/ppc/spapr.c         |    1 +
>>>>  hw/ppc/spapr_caps.c    |   24 ++++++++++++++++++++++++
>>>>  hw/ppc/spapr_rtas.c    |   18 ++++++++++++++++++
>>>>  include/hw/ppc/spapr.h |    4 +++-
>>>>  target/ppc/kvm.c       |   19 +++++++++++++++++++
>>>>  target/ppc/kvm_ppc.h   |   12 ++++++++++++
>>>>  6 files changed, 77 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index c97f6a6..e8a77636 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -4364,6 +4364,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>      smc->irq = &spapr_irq_dual;
>>>>      smc->dr_phb_enabled = true;
>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>> index 31b4661..ef9e612 100644
>>>> --- a/hw/ppc/spapr_caps.c
>>>> +++ b/hw/ppc/spapr_caps.c
>>>> @@ -479,6 +479,20 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>>>      }
>>>>  }
>>>>  
>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>> +                                Error **errp)
>>>> +{
>>>> +    if (!val) {
>>>> +        return; /* Disabled by default */
>>>> +    }
>>>> +
>>>> +    if (tcg_enabled()) {
>>>> +            error_setg(errp, "No fwnmi support in TCG, try cap-fwnmi-mce=off");
>>>
>>> Maybe expand "fwnmi" to "Firmware Assisted Non-Maskable Interrupts" ?
>>
>> sure..
>>
>>>
>>>> +    } else if (kvm_enabled() && !kvmppc_has_cap_ppc_fwnmi()) {
>>>> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
>>>
>>> Maybe reword and add a hint:
>>>
>>> "KVM implementation does not support Firmware Assisted Non-Maskable Interrupts, try cap-fwnmi-mce=off"
>>
>> sure..
>>
>>>
>>>
>>>> +    }
>>>> +}
>>>> +
>>>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>      [SPAPR_CAP_HTM] = {
>>>>          .name = "htm",
>>>> @@ -578,6 +592,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>          .type = "bool",
>>>>          .apply = cap_ccf_assist_apply,
>>>>      },
>>>> +    [SPAPR_CAP_FWNMI_MCE] = {
>>>> +        .name = "fwnmi-mce",
>>>> +        .description = "Handle fwnmi machine check exceptions",
>>>> +        .index = SPAPR_CAP_FWNMI_MCE,
>>>> +        .get = spapr_cap_get_bool,
>>>> +        .set = spapr_cap_set_bool,
>>>> +        .type = "bool",
>>>> +        .apply = cap_fwnmi_mce_apply,
>>>> +    },
>>>>  };
>>>>  
>>>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
>>>> @@ -717,6 +740,7 @@ SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>>>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>>>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>>>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>>>>  
>>>>  void spapr_caps_init(SpaprMachineState *spapr)
>>>>  {
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index e0bdfc8..91a7ab9 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -49,6 +49,7 @@
>>>>  #include "hw/ppc/fdt.h"
>>>>  #include "target/ppc/mmu-hash64.h"
>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>> +#include "kvm_ppc.h"
>>>>  
>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>                                     uint32_t token, uint32_t nargs,
>>>> @@ -358,6 +359,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>                                    target_ulong args,
>>>>                                    uint32_t nret, target_ulong rets)
>>>>  {
>>>> +    int ret;
>>>>      hwaddr rtas_addr = spapr_get_rtas_addr();
>>>>  
>>>>      if (!rtas_addr) {
>>>> @@ -365,6 +367,22 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>          return;
>>>>      }
>>>>  
>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == 0) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>> +    if (ret == 1) {
>>>
>>> I have the impression that this should really not happen,
>>> otherwise something has gone terribly wrong in QEMU or
>>> in KVM... this maybe deserves an error message as well ?
>>> No big deal.
>>
>> I think so..  will add an error message.
> 
> Right, and because this is essentially a host side error,
> RTAS_OUT_HW_ERROR would be more appropriate.

In fact I am already doing that. If ret == 1 I set it to
RTAS_OUT_NOT_SUPPORTE else if ret < 0 than I set the error to
RTAS_OUT_HW_ERROR.

> 
>>
>> Also I should check for non zero return value, not just ret == 1, as
>> kvmppc_fwnmi_enable() returns the error value from ioctl().
>>
>>
>>>
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (ret < 0) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>> +        return;
>>>> +    }
>>>> +
>>>>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>  }
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index c717ab2..bd75d4b 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -78,8 +78,10 @@ typedef enum {
>>>>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>>>>  /* Count Cache Flush Assist HW Instruction */
>>>>  #define SPAPR_CAP_CCF_ASSIST            0x09
>>>> +/* FWNMI machine check handling */
>>>> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>>>>  /* Num Caps */
>>>> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
>>>> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>>>>  
>>>>  /*
>>>>   * Capability Values
>>>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>>>> index 39f1a73..368ec6e 100644
>>>> --- a/target/ppc/kvm.c
>>>> +++ b/target/ppc/kvm.c
>>>> @@ -84,6 +84,7 @@ static int cap_ppc_safe_indirect_branch;
>>>>  static int cap_ppc_count_cache_flush_assist;
>>>>  static int cap_ppc_nested_kvm_hv;
>>>>  static int cap_large_decr;
>>>> +static int cap_ppc_fwnmi;
>>>>  
>>>>  static uint32_t debug_inst_opcode;
>>>>  
>>>> @@ -152,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>      kvmppc_get_cpu_characteristics(s);
>>>>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>>>>      cap_large_decr = kvmppc_get_dec_bits();
>>>> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>>>>      /*
>>>>       * Note: setting it to false because there is not such capability
>>>>       * in KVM at this moment.
>>>> @@ -2119,6 +2121,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>>>      }
>>>>  }
>>>>  
>>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>>>> +{
>>>> +    CPUState *cs = CPU(cpu);
>>>> +
>>>> +    if (!cap_ppc_fwnmi) {
>>>> +        return 1;
>>>> +    }
>>>> +
>>>> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
>>>> +}
>>>> +
>>>> +
>>>>  int kvmppc_smt_threads(void)
>>>>  {
>>>>      return cap_ppc_smt ? cap_ppc_smt : 1;
>>>> @@ -2419,6 +2433,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>>>>      return cap_mmu_hash_v3;
>>>>  }
>>>>  
>>>> +bool kvmppc_has_cap_ppc_fwnmi(void)
>>>> +{
>>>> +    return cap_ppc_fwnmi;
>>>> +}
>>>> +
>>>>  static bool kvmppc_power8_host(void)
>>>>  {
>>>>      bool ret = false;
>>>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>>>> index 18693f1..3d9f0b4 100644
>>>> --- a/target/ppc/kvm_ppc.h
>>>> +++ b/target/ppc/kvm_ppc.h
>>>> @@ -27,6 +27,8 @@ void kvmppc_enable_h_page_init(void);
>>>>  void kvmppc_set_papr(PowerPCCPU *cpu);
>>>>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>>>>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
>>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
>>>> +bool kvmppc_has_cap_ppc_fwnmi(void);
>>>>  int kvmppc_smt_threads(void);
>>>>  void kvmppc_hint_smt_possible(Error **errp);
>>>>  int kvmppc_set_smt_threads(int smt);
>>>> @@ -160,6 +162,16 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>>>  {
>>>>  }
>>>>  
>>>> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>>>> +{
>>>> +    return 1;
>>>> +}
>>>> +
>>>> +static inline bool kvmppc_has_cap_ppc_fwnmi(void)
>>>> +{
>>>> +    return false;
>>>> +}
>>>> +
>>>>  static inline int kvmppc_smt_threads(void)
>>>>  {
>>>>      return 1;
>>>>
>>>>
>>>
>>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-06  6:06     ` Greg Kurz
@ 2019-06-06 11:15       ` Aravinda Prasad
  2019-06-06 12:10         ` Greg Kurz
  0 siblings, 1 reply; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-06 11:15 UTC (permalink / raw)
  To: Greg Kurz, David Gibson; +Cc: paulus, qemu-ppc, aik, qemu-devel



On Thursday 06 June 2019 11:36 AM, Greg Kurz wrote:
> On Thu, 6 Jun 2019 13:06:14 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
>> On Wed, May 29, 2019 at 11:10:57AM +0530, Aravinda Prasad wrote:
>>> This patch includes migration support for machine check
>>> handling. Especially this patch blocks VM migration
>>> requests until the machine check error handling is
>>> complete as (i) these errors are specific to the source
>>> hardware and is irrelevant on the target hardware,
>>> (ii) these errors cause data corruption and should
>>> be handled before migration.
>>>
>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>> ---
>>>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
>>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>>>  hw/ppc/spapr_rtas.c    |    4 ++++
>>>  include/hw/ppc/spapr.h |    2 ++
>>>  4 files changed, 43 insertions(+)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index e8a77636..31c4850 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
>>>      },
>>>  };
>>>  
>>> +static bool spapr_fwnmi_needed(void *opaque)
>>> +{
>>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
>>> +
>>> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;  
>>
>> Since we're introducing a PAPR capability to enable this, it would
>> actually be better to check that here, rather than the runtime state.
>> That leads to less cases and easier to understand semantics for the
>> migration stream.
>>
> 
> Hmmm... the purpose of needed() VMState callbacks is precisely about
> runtime state: the subsection should only be migrated if an MCE is
> pending, ie. spapr->guest_machine_check_addr != -1.

spapr->guest_machine_check_addr is set when fwnmi is registered. Hence
an MCE might not be pending if it is set.

I am fine with both the approaches (checking for
guest_machine_check_addr or for SPAPR_CAP_FWNMI_MCE).

Regards,
Aravinda

> 
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_spapr_machine_check = {
>>> +    .name = "spapr_machine_check",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .needed = spapr_fwnmi_needed,
>>> +    .fields = (VMStateField[]) {
>>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>>  static const VMStateDescription vmstate_spapr = {
>>>      .name = "spapr",
>>>      .version_id = 3,
>>> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
>>>          &vmstate_spapr_dtb,
>>>          &vmstate_spapr_cap_large_decr,
>>>          &vmstate_spapr_cap_ccf_assist,
>>> +        &vmstate_spapr_machine_check,
>>>          NULL
>>>      }
>>>  };
>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>> index 573c0b7..35e21e4 100644
>>> --- a/hw/ppc/spapr_events.c
>>> +++ b/hw/ppc/spapr_events.c
>>> @@ -41,6 +41,7 @@
>>>  #include "qemu/bcd.h"
>>>  #include "hw/ppc/spapr_ovec.h"
>>>  #include <libfdt.h>
>>> +#include "migration/blocker.h"
>>>  
>>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>>>  #define   RTAS_LOG_VERSION_6                    0x06000000
>>> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>  {
>>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>> +    int ret;
>>> +    Error *local_err = NULL;
>>> +
>>> +    error_setg(&spapr->fwnmi_migration_blocker,
>>> +            "Live migration not supported during machine check handling");
>>> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
>>> +    if (ret < 0) {
>>> +        /*
>>> +         * We don't want to abort and let the migration to continue. In a
>>> +         * rare case, the machine check handler will run on the target
>>> +         * hardware. Though this is not preferable, it is better than aborting
>>> +         * the migration or killing the VM.
>>> +         */
>>> +        error_free(spapr->fwnmi_migration_blocker);  
>>
>> You should set fwnmi_migration_blocker to NULL here as well.
>>
>> As mentioned on an earlier iteration, the migration blocker is the
>> same every time.  Couldn't you just create it once and free at final
>> teardown, rather than recreating it for every NMI?
>>
>>> +        warn_report_err(local_err);
>>> +    }
>>>  
>>>      while (spapr->mc_status != -1) {
>>>          /*
>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>> index 91a7ab9..c849223 100644
>>> --- a/hw/ppc/spapr_rtas.c
>>> +++ b/hw/ppc/spapr_rtas.c
>>> @@ -50,6 +50,7 @@
>>>  #include "target/ppc/mmu-hash64.h"
>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>  #include "kvm_ppc.h"
>>> +#include "migration/blocker.h"
>>>  
>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>                                     uint32_t token, uint32_t nargs,
>>> @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>          spapr->mc_status = -1;
>>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
>>> +        error_free(spapr->fwnmi_migration_blocker);
>>> +        spapr->fwnmi_migration_blocker = NULL;
>>>      }
>>>  }
>>>  
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index bd75d4b..6c0cfd8 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -214,6 +214,8 @@ struct SpaprMachineState {
>>>      SpaprCapabilities def, eff, mig;
>>>  
>>>      unsigned gpu_numa_id;
>>> +
>>> +    Error *fwnmi_migration_blocker;
>>>  };
>>>  
>>>  #define H_SUCCESS         0
>>>   
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-06  3:06   ` [Qemu-devel] " David Gibson
  2019-06-06  6:06     ` Greg Kurz
@ 2019-06-06 11:25     ` Aravinda Prasad
  2019-06-06 12:24       ` Greg Kurz
  2019-06-07  0:23       ` David Gibson
  1 sibling, 2 replies; 44+ messages in thread
From: Aravinda Prasad @ 2019-06-06 11:25 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc



On Thursday 06 June 2019 08:36 AM, David Gibson wrote:
> On Wed, May 29, 2019 at 11:10:57AM +0530, Aravinda Prasad wrote:
>> This patch includes migration support for machine check
>> handling. Especially this patch blocks VM migration
>> requests until the machine check error handling is
>> complete as (i) these errors are specific to the source
>> hardware and is irrelevant on the target hardware,
>> (ii) these errors cause data corruption and should
>> be handled before migration.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>>  hw/ppc/spapr_rtas.c    |    4 ++++
>>  include/hw/ppc/spapr.h |    2 ++
>>  4 files changed, 43 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index e8a77636..31c4850 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
>>      },
>>  };
>>  
>> +static bool spapr_fwnmi_needed(void *opaque)
>> +{
>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
>> +
>> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;
> 
> Since we're introducing a PAPR capability to enable this, it would
> actually be better to check that here, rather than the runtime state.
> That leads to less cases and easier to understand semantics for the
> migration stream.

I am fine with this approach as well.

> 
>> +}
>> +
>> +static const VMStateDescription vmstate_spapr_machine_check = {
>> +    .name = "spapr_machine_check",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = spapr_fwnmi_needed,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  static const VMStateDescription vmstate_spapr = {
>>      .name = "spapr",
>>      .version_id = 3,
>> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
>>          &vmstate_spapr_dtb,
>>          &vmstate_spapr_cap_large_decr,
>>          &vmstate_spapr_cap_ccf_assist,
>> +        &vmstate_spapr_machine_check,
>>          NULL
>>      }
>>  };
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 573c0b7..35e21e4 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -41,6 +41,7 @@
>>  #include "qemu/bcd.h"
>>  #include "hw/ppc/spapr_ovec.h"
>>  #include <libfdt.h>
>> +#include "migration/blocker.h"
>>  
>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>>  #define   RTAS_LOG_VERSION_6                    0x06000000
>> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>  {
>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int ret;
>> +    Error *local_err = NULL;
>> +
>> +    error_setg(&spapr->fwnmi_migration_blocker,
>> +            "Live migration not supported during machine check handling");
>> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
>> +    if (ret < 0) {
>> +        /*
>> +         * We don't want to abort and let the migration to continue. In a
>> +         * rare case, the machine check handler will run on the target
>> +         * hardware. Though this is not preferable, it is better than aborting
>> +         * the migration or killing the VM.
>> +         */
>> +        error_free(spapr->fwnmi_migration_blocker);
> 
> You should set fwnmi_migration_blocker to NULL here as well.

ok.

> 
> As mentioned on an earlier iteration, the migration blocker is the
> same every time.  Couldn't you just create it once and free at final
> teardown, rather than recreating it for every NMI?

That means, we create the error string at the time when ibm,nmi-register
is invoked and tear it down during machine reset?

Regards,
Aravinda

> 
>> +        warn_report_err(local_err);
>> +    }
>>  
>>      while (spapr->mc_status != -1) {
>>          /*
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 91a7ab9..c849223 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -50,6 +50,7 @@
>>  #include "target/ppc/mmu-hash64.h"
>>  #include "target/ppc/mmu-book3s-v3.h"
>>  #include "kvm_ppc.h"
>> +#include "migration/blocker.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>          spapr->mc_status = -1;
>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
>> +        error_free(spapr->fwnmi_migration_blocker);
>> +        spapr->fwnmi_migration_blocker = NULL;
>>      }
>>  }
>>  
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index bd75d4b..6c0cfd8 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -214,6 +214,8 @@ struct SpaprMachineState {
>>      SpaprCapabilities def, eff, mig;
>>  
>>      unsigned gpu_numa_id;
>> +
>> +    Error *fwnmi_migration_blocker;
>>  };
>>  
>>  #define H_SUCCESS         0
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-06 11:15       ` Aravinda Prasad
@ 2019-06-06 12:10         ` Greg Kurz
  2019-06-07  0:22           ` David Gibson
  0 siblings, 1 reply; 44+ messages in thread
From: Greg Kurz @ 2019-06-06 12:10 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, David Gibson

On Thu, 6 Jun 2019 16:45:30 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> On Thursday 06 June 2019 11:36 AM, Greg Kurz wrote:
> > On Thu, 6 Jun 2019 13:06:14 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> >> On Wed, May 29, 2019 at 11:10:57AM +0530, Aravinda Prasad wrote:  
> >>> This patch includes migration support for machine check
> >>> handling. Especially this patch blocks VM migration
> >>> requests until the machine check error handling is
> >>> complete as (i) these errors are specific to the source
> >>> hardware and is irrelevant on the target hardware,
> >>> (ii) these errors cause data corruption and should
> >>> be handled before migration.
> >>>
> >>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>> ---
> >>>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
> >>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> >>>  hw/ppc/spapr_rtas.c    |    4 ++++
> >>>  include/hw/ppc/spapr.h |    2 ++
> >>>  4 files changed, 43 insertions(+)
> >>>
> >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>> index e8a77636..31c4850 100644
> >>> --- a/hw/ppc/spapr.c
> >>> +++ b/hw/ppc/spapr.c
> >>> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
> >>>      },
> >>>  };
> >>>  
> >>> +static bool spapr_fwnmi_needed(void *opaque)
> >>> +{
> >>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> >>> +
> >>> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;    
> >>
> >> Since we're introducing a PAPR capability to enable this, it would
> >> actually be better to check that here, rather than the runtime state.
> >> That leads to less cases and easier to understand semantics for the
> >> migration stream.
> >>  
> > 
> > Hmmm... the purpose of needed() VMState callbacks is precisely about
> > runtime state: the subsection should only be migrated if an MCE is
> > pending, ie. spapr->guest_machine_check_addr != -1.  
> 
> spapr->guest_machine_check_addr is set when fwnmi is registered. Hence
> an MCE might not be pending if it is set.
> 

Oops sorry, got confused... should have written "if the guest has
registered FWNMI", but the idea is the same. We only need to migrate
the subsection if the state is different from reset. This is the way
needed() callbacks are generally implemented.

> I am fine with both the approaches (checking for
> guest_machine_check_addr or for SPAPR_CAP_FWNMI_MCE).
> 

I would find ackward to migrate this always for new machine types,
even if the guest doesn't register FWNMI...

> Regards,
> Aravinda
> 
> >   
> >>> +}
> >>> +
> >>> +static const VMStateDescription vmstate_spapr_machine_check = {
> >>> +    .name = "spapr_machine_check",
> >>> +    .version_id = 1,
> >>> +    .minimum_version_id = 1,
> >>> +    .needed = spapr_fwnmi_needed,
> >>> +    .fields = (VMStateField[]) {
> >>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> >>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> >>> +        VMSTATE_END_OF_LIST()
> >>> +    },
> >>> +};
> >>> +
> >>>  static const VMStateDescription vmstate_spapr = {
> >>>      .name = "spapr",
> >>>      .version_id = 3,
> >>> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
> >>>          &vmstate_spapr_dtb,
> >>>          &vmstate_spapr_cap_large_decr,
> >>>          &vmstate_spapr_cap_ccf_assist,
> >>> +        &vmstate_spapr_machine_check,
> >>>          NULL
> >>>      }
> >>>  };
> >>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >>> index 573c0b7..35e21e4 100644
> >>> --- a/hw/ppc/spapr_events.c
> >>> +++ b/hw/ppc/spapr_events.c
> >>> @@ -41,6 +41,7 @@
> >>>  #include "qemu/bcd.h"
> >>>  #include "hw/ppc/spapr_ovec.h"
> >>>  #include <libfdt.h>
> >>> +#include "migration/blocker.h"
> >>>  
> >>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
> >>>  #define   RTAS_LOG_VERSION_6                    0x06000000
> >>> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>>  {
> >>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >>> +    int ret;
> >>> +    Error *local_err = NULL;
> >>> +
> >>> +    error_setg(&spapr->fwnmi_migration_blocker,
> >>> +            "Live migration not supported during machine check handling");
> >>> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> >>> +    if (ret < 0) {
> >>> +        /*
> >>> +         * We don't want to abort and let the migration to continue. In a
> >>> +         * rare case, the machine check handler will run on the target
> >>> +         * hardware. Though this is not preferable, it is better than aborting
> >>> +         * the migration or killing the VM.
> >>> +         */
> >>> +        error_free(spapr->fwnmi_migration_blocker);    
> >>
> >> You should set fwnmi_migration_blocker to NULL here as well.
> >>
> >> As mentioned on an earlier iteration, the migration blocker is the
> >> same every time.  Couldn't you just create it once and free at final
> >> teardown, rather than recreating it for every NMI?
> >>  
> >>> +        warn_report_err(local_err);
> >>> +    }
> >>>  
> >>>      while (spapr->mc_status != -1) {
> >>>          /*
> >>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>> index 91a7ab9..c849223 100644
> >>> --- a/hw/ppc/spapr_rtas.c
> >>> +++ b/hw/ppc/spapr_rtas.c
> >>> @@ -50,6 +50,7 @@
> >>>  #include "target/ppc/mmu-hash64.h"
> >>>  #include "target/ppc/mmu-book3s-v3.h"
> >>>  #include "kvm_ppc.h"
> >>> +#include "migration/blocker.h"
> >>>  
> >>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>                                     uint32_t token, uint32_t nargs,
> >>> @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>>          spapr->mc_status = -1;
> >>>          qemu_cond_signal(&spapr->mc_delivery_cond);
> >>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >>> +        error_free(spapr->fwnmi_migration_blocker);
> >>> +        spapr->fwnmi_migration_blocker = NULL;
> >>>      }
> >>>  }
> >>>  
> >>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >>> index bd75d4b..6c0cfd8 100644
> >>> --- a/include/hw/ppc/spapr.h
> >>> +++ b/include/hw/ppc/spapr.h
> >>> @@ -214,6 +214,8 @@ struct SpaprMachineState {
> >>>      SpaprCapabilities def, eff, mig;
> >>>  
> >>>      unsigned gpu_numa_id;
> >>> +
> >>> +    Error *fwnmi_migration_blocker;
> >>>  };
> >>>  
> >>>  #define H_SUCCESS         0
> >>>     
> >>  
> >   
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-06 11:25     ` Aravinda Prasad
@ 2019-06-06 12:24       ` Greg Kurz
  2019-06-07  0:23       ` David Gibson
  1 sibling, 0 replies; 44+ messages in thread
From: Greg Kurz @ 2019-06-06 12:24 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, David Gibson

On Thu, 6 Jun 2019 16:55:18 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> On Thursday 06 June 2019 08:36 AM, David Gibson wrote:
> > On Wed, May 29, 2019 at 11:10:57AM +0530, Aravinda Prasad wrote:  
> >> This patch includes migration support for machine check
> >> handling. Especially this patch blocks VM migration
> >> requests until the machine check error handling is
> >> complete as (i) these errors are specific to the source
> >> hardware and is irrelevant on the target hardware,
> >> (ii) these errors cause data corruption and should
> >> be handled before migration.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
> >>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> >>  hw/ppc/spapr_rtas.c    |    4 ++++
> >>  include/hw/ppc/spapr.h |    2 ++
> >>  4 files changed, 43 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index e8a77636..31c4850 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
> >>      },
> >>  };
> >>  
> >> +static bool spapr_fwnmi_needed(void *opaque)
> >> +{
> >> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> >> +
> >> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;  
> > 
> > Since we're introducing a PAPR capability to enable this, it would
> > actually be better to check that here, rather than the runtime state.
> > That leads to less cases and easier to understand semantics for the
> > migration stream.  
> 
> I am fine with this approach as well.
> 
> >   
> >> +}
> >> +
> >> +static const VMStateDescription vmstate_spapr_machine_check = {
> >> +    .name = "spapr_machine_check",
> >> +    .version_id = 1,
> >> +    .minimum_version_id = 1,
> >> +    .needed = spapr_fwnmi_needed,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> >> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +};
> >> +
> >>  static const VMStateDescription vmstate_spapr = {
> >>      .name = "spapr",
> >>      .version_id = 3,
> >> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
> >>          &vmstate_spapr_dtb,
> >>          &vmstate_spapr_cap_large_decr,
> >>          &vmstate_spapr_cap_ccf_assist,
> >> +        &vmstate_spapr_machine_check,
> >>          NULL
> >>      }
> >>  };
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index 573c0b7..35e21e4 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -41,6 +41,7 @@
> >>  #include "qemu/bcd.h"
> >>  #include "hw/ppc/spapr_ovec.h"
> >>  #include <libfdt.h>
> >> +#include "migration/blocker.h"
> >>  
> >>  #define RTAS_LOG_VERSION_MASK                   0xff000000
> >>  #define   RTAS_LOG_VERSION_6                    0x06000000
> >> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>  {
> >>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +    int ret;
> >> +    Error *local_err = NULL;
> >> +
> >> +    error_setg(&spapr->fwnmi_migration_blocker,
> >> +            "Live migration not supported during machine check handling");
> >> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> >> +    if (ret < 0) {
> >> +        /*
> >> +         * We don't want to abort and let the migration to continue. In a
> >> +         * rare case, the machine check handler will run on the target
> >> +         * hardware. Though this is not preferable, it is better than aborting
> >> +         * the migration or killing the VM.
> >> +         */
> >> +        error_free(spapr->fwnmi_migration_blocker);  
> > 
> > You should set fwnmi_migration_blocker to NULL here as well.  
> 
> ok.
> 
> > 
> > As mentioned on an earlier iteration, the migration blocker is the
> > same every time.  Couldn't you just create it once and free at final
> > teardown, rather than recreating it for every NMI?  
> 
> That means, we create the error string at the time when ibm,nmi-register
> is invoked and tear it down during machine reset?
> 

No, I think David is asking to create the error string during machine init,
for all the machine lifetime. In which case, we don't even need to call
error_free() at all.

> Regards,
> Aravinda
> 
> >   
> >> +        warn_report_err(local_err);
> >> +    }
> >>  
> >>      while (spapr->mc_status != -1) {
> >>          /*
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index 91a7ab9..c849223 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -50,6 +50,7 @@
> >>  #include "target/ppc/mmu-hash64.h"
> >>  #include "target/ppc/mmu-book3s-v3.h"
> >>  #include "kvm_ppc.h"
> >> +#include "migration/blocker.h"
> >>  
> >>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>                                     uint32_t token, uint32_t nargs,
> >> @@ -404,6 +405,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>          spapr->mc_status = -1;
> >>          qemu_cond_signal(&spapr->mc_delivery_cond);
> >>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >> +        error_free(spapr->fwnmi_migration_blocker);
> >> +        spapr->fwnmi_migration_blocker = NULL;
> >>      }
> >>  }
> >>  
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index bd75d4b..6c0cfd8 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -214,6 +214,8 @@ struct SpaprMachineState {
> >>      SpaprCapabilities def, eff, mig;
> >>  
> >>      unsigned gpu_numa_id;
> >> +
> >> +    Error *fwnmi_migration_blocker;
> >>  };
> >>  
> >>  #define H_SUCCESS         0
> >>  
> >   
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-06 12:10         ` Greg Kurz
@ 2019-06-07  0:22           ` David Gibson
  2019-06-07 10:30             ` Greg Kurz
  0 siblings, 1 reply; 44+ messages in thread
From: David Gibson @ 2019-06-07  0:22 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, Aravinda Prasad

[-- Attachment #1: Type: text/plain, Size: 3276 bytes --]

On Thu, Jun 06, 2019 at 02:10:48PM +0200, Greg Kurz wrote:
> On Thu, 6 Jun 2019 16:45:30 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
> > On Thursday 06 June 2019 11:36 AM, Greg Kurz wrote:
> > > On Thu, 6 Jun 2019 13:06:14 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > >> On Wed, May 29, 2019 at 11:10:57AM +0530, Aravinda Prasad wrote:  
> > >>> This patch includes migration support for machine check
> > >>> handling. Especially this patch blocks VM migration
> > >>> requests until the machine check error handling is
> > >>> complete as (i) these errors are specific to the source
> > >>> hardware and is irrelevant on the target hardware,
> > >>> (ii) these errors cause data corruption and should
> > >>> be handled before migration.
> > >>>
> > >>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > >>> ---
> > >>>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
> > >>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> > >>>  hw/ppc/spapr_rtas.c    |    4 ++++
> > >>>  include/hw/ppc/spapr.h |    2 ++
> > >>>  4 files changed, 43 insertions(+)
> > >>>
> > >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > >>> index e8a77636..31c4850 100644
> > >>> --- a/hw/ppc/spapr.c
> > >>> +++ b/hw/ppc/spapr.c
> > >>> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
> > >>>      },
> > >>>  };
> > >>>  
> > >>> +static bool spapr_fwnmi_needed(void *opaque)
> > >>> +{
> > >>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> > >>> +
> > >>> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;    
> > >>
> > >> Since we're introducing a PAPR capability to enable this, it would
> > >> actually be better to check that here, rather than the runtime state.
> > >> That leads to less cases and easier to understand semantics for the
> > >> migration stream.
> > >>  
> > > 
> > > Hmmm... the purpose of needed() VMState callbacks is precisely about
> > > runtime state: the subsection should only be migrated if an MCE is
> > > pending, ie. spapr->guest_machine_check_addr != -1.  
> > 
> > spapr->guest_machine_check_addr is set when fwnmi is registered. Hence
> > an MCE might not be pending if it is set.
> > 
> 
> Oops sorry, got confused... should have written "if the guest has
> registered FWNMI", but the idea is the same. We only need to migrate
> the subsection if the state is different from reset. This is the way
> needed() callbacks are generally implemented.

Yes, but usually that's because we need to omit the section if it's
not actively in use in order to maintain backwards compatiblity with
old migration streams.  If the cap is enabled we already need
something that implements it at both ends to have a sane migration.

> > I am fine with both the approaches (checking for
> > guest_machine_check_addr or for SPAPR_CAP_FWNMI_MCE).
> > 
> 
> I would find ackward to migrate this always for new machine types,
> even if the guest doesn't register FWNMI...

How so?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-06 11:25     ` Aravinda Prasad
  2019-06-06 12:24       ` Greg Kurz
@ 2019-06-07  0:23       ` David Gibson
  1 sibling, 0 replies; 44+ messages in thread
From: David Gibson @ 2019-06-07  0:23 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, groug, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 4723 bytes --]

On Thu, Jun 06, 2019 at 04:55:18PM +0530, Aravinda Prasad wrote:
> 
> 
> On Thursday 06 June 2019 08:36 AM, David Gibson wrote:
> > On Wed, May 29, 2019 at 11:10:57AM +0530, Aravinda Prasad wrote:
> >> This patch includes migration support for machine check
> >> handling. Especially this patch blocks VM migration
> >> requests until the machine check error handling is
> >> complete as (i) these errors are specific to the source
> >> hardware and is irrelevant on the target hardware,
> >> (ii) these errors cause data corruption and should
> >> be handled before migration.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
> >>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> >>  hw/ppc/spapr_rtas.c    |    4 ++++
> >>  include/hw/ppc/spapr.h |    2 ++
> >>  4 files changed, 43 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index e8a77636..31c4850 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
> >>      },
> >>  };
> >>  
> >> +static bool spapr_fwnmi_needed(void *opaque)
> >> +{
> >> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> >> +
> >> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;
> > 
> > Since we're introducing a PAPR capability to enable this, it would
> > actually be better to check that here, rather than the runtime state.
> > That leads to less cases and easier to understand semantics for the
> > migration stream.
> 
> I am fine with this approach as well.
> 
> > 
> >> +}
> >> +
> >> +static const VMStateDescription vmstate_spapr_machine_check = {
> >> +    .name = "spapr_machine_check",
> >> +    .version_id = 1,
> >> +    .minimum_version_id = 1,
> >> +    .needed = spapr_fwnmi_needed,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> >> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +};
> >> +
> >>  static const VMStateDescription vmstate_spapr = {
> >>      .name = "spapr",
> >>      .version_id = 3,
> >> @@ -2137,6 +2156,7 @@ static const VMStateDescription vmstate_spapr = {
> >>          &vmstate_spapr_dtb,
> >>          &vmstate_spapr_cap_large_decr,
> >>          &vmstate_spapr_cap_ccf_assist,
> >> +        &vmstate_spapr_machine_check,
> >>          NULL
> >>      }
> >>  };
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index 573c0b7..35e21e4 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -41,6 +41,7 @@
> >>  #include "qemu/bcd.h"
> >>  #include "hw/ppc/spapr_ovec.h"
> >>  #include <libfdt.h>
> >> +#include "migration/blocker.h"
> >>  
> >>  #define RTAS_LOG_VERSION_MASK                   0xff000000
> >>  #define   RTAS_LOG_VERSION_6                    0x06000000
> >> @@ -855,6 +856,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>  {
> >>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +    int ret;
> >> +    Error *local_err = NULL;
> >> +
> >> +    error_setg(&spapr->fwnmi_migration_blocker,
> >> +            "Live migration not supported during machine check handling");
> >> +    ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> >> +    if (ret < 0) {
> >> +        /*
> >> +         * We don't want to abort and let the migration to continue. In a
> >> +         * rare case, the machine check handler will run on the target
> >> +         * hardware. Though this is not preferable, it is better than aborting
> >> +         * the migration or killing the VM.
> >> +         */
> >> +        error_free(spapr->fwnmi_migration_blocker);
> > 
> > You should set fwnmi_migration_blocker to NULL here as well.
> 
> ok.
> 
> > 
> > As mentioned on an earlier iteration, the migration blocker is the
> > same every time.  Couldn't you just create it once and free at final
> > teardown, rather than recreating it for every NMI?
> 
> That means, we create the error string at the time when ibm,nmi-register
> is invoked and tear it down during machine reset?

Or you could even just create it at machine_init time, and tear it
down never, just add/remove it from the blocker slot.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling
  2019-06-07  0:22           ` David Gibson
@ 2019-06-07 10:30             ` Greg Kurz
  0 siblings, 0 replies; 44+ messages in thread
From: Greg Kurz @ 2019-06-07 10:30 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, paulus, qemu-ppc, Aravinda Prasad

[-- Attachment #1: Type: text/plain, Size: 4045 bytes --]

On Fri, 7 Jun 2019 10:22:40 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Jun 06, 2019 at 02:10:48PM +0200, Greg Kurz wrote:
> > On Thu, 6 Jun 2019 16:45:30 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> >   
> > > On Thursday 06 June 2019 11:36 AM, Greg Kurz wrote:  
> > > > On Thu, 6 Jun 2019 13:06:14 +1000
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > >> On Wed, May 29, 2019 at 11:10:57AM +0530, Aravinda Prasad wrote:    
> > > >>> This patch includes migration support for machine check
> > > >>> handling. Especially this patch blocks VM migration
> > > >>> requests until the machine check error handling is
> > > >>> complete as (i) these errors are specific to the source
> > > >>> hardware and is irrelevant on the target hardware,
> > > >>> (ii) these errors cause data corruption and should
> > > >>> be handled before migration.
> > > >>>
> > > >>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > > >>> ---
> > > >>>  hw/ppc/spapr.c         |   20 ++++++++++++++++++++
> > > >>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> > > >>>  hw/ppc/spapr_rtas.c    |    4 ++++
> > > >>>  include/hw/ppc/spapr.h |    2 ++
> > > >>>  4 files changed, 43 insertions(+)
> > > >>>
> > > >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > >>> index e8a77636..31c4850 100644
> > > >>> --- a/hw/ppc/spapr.c
> > > >>> +++ b/hw/ppc/spapr.c
> > > >>> @@ -2104,6 +2104,25 @@ static const VMStateDescription vmstate_spapr_dtb = {
> > > >>>      },
> > > >>>  };
> > > >>>  
> > > >>> +static bool spapr_fwnmi_needed(void *opaque)
> > > >>> +{
> > > >>> +    SpaprMachineState *spapr = (SpaprMachineState *)opaque;
> > > >>> +
> > > >>> +    return (spapr->guest_machine_check_addr == -1) ? 0 : 1;      
> > > >>
> > > >> Since we're introducing a PAPR capability to enable this, it would
> > > >> actually be better to check that here, rather than the runtime state.
> > > >> That leads to less cases and easier to understand semantics for the
> > > >> migration stream.
> > > >>    
> > > > 
> > > > Hmmm... the purpose of needed() VMState callbacks is precisely about
> > > > runtime state: the subsection should only be migrated if an MCE is
> > > > pending, ie. spapr->guest_machine_check_addr != -1.    
> > > 
> > > spapr->guest_machine_check_addr is set when fwnmi is registered. Hence
> > > an MCE might not be pending if it is set.
> > >   
> > 
> > Oops sorry, got confused... should have written "if the guest has
> > registered FWNMI", but the idea is the same. We only need to migrate
> > the subsection if the state is different from reset. This is the way
> > needed() callbacks are generally implemented.  
> 
> Yes, but usually that's because we need to omit the section if it's
> not actively in use in order to maintain backwards compatiblity with
> old migration streams.  If the cap is enabled we already need
> something that implements it at both ends to have a sane migration.
> 

I see it the opposite way. A subsection is something that is optional
for the destination only. In the present case, an older QEMU wont send
the "spapr_machine_check" subsection, which is interpreted by a newer
QEMU as "the guest didn't register FWNMI".

On the source side, if some internal state got used it should always
be migrated. We maintain backwards compatibility to an older QEMU
by not using the new state at all, thanks to the versioned machine
property.

> > > I am fine with both the approaches (checking for
> > > guest_machine_check_addr or for SPAPR_CAP_FWNMI_MCE).
> > >   
> > 
> > I would find ackward to migrate this always for new machine types,
> > even if the guest doesn't register FWNMI...  
> 
> How so?
> 

Well, I just don't see the point in migrating something that
is not state since its value is the reset default that the
destination already knows.

This being said, I won't make more fuss about it, as long as
it works :)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2019-06-07 12:19 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-29  5:40 [Qemu-devel] [PATCH v9 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
2019-06-03 10:12   ` Greg Kurz
2019-06-03 11:17     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-06-04  6:08       ` Aravinda Prasad
2019-06-04 14:50         ` Greg Kurz
2019-06-06  5:17           ` Aravinda Prasad
2019-06-06  1:35       ` David Gibson
2019-06-06  4:39         ` Aravinda Prasad
2019-06-06  1:34   ` [Qemu-devel] " David Gibson
2019-06-06  5:26     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 2/6] Wrapper function to wait on condition for the main loop mutex Aravinda Prasad
2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
2019-06-03 11:53   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-06-06  1:43   ` [Qemu-devel] " David Gibson
2019-06-06  4:45     ` Aravinda Prasad
2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 4/6] target/ppc: Build rtas error log upon an MCE Aravinda Prasad
2019-06-03 14:00   ` Greg Kurz
2019-06-04  6:29     ` Aravinda Prasad
2019-06-04  9:01       ` Greg Kurz
2019-06-04 10:10         ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
2019-06-06  2:58         ` [Qemu-devel] " David Gibson
2019-06-06  2:57       ` David Gibson
2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 5/6] ppc: spapr: Enable FWNMI capability Aravinda Prasad
2019-06-03 15:25   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-06-04  6:45     ` Aravinda Prasad
2019-06-06  3:02       ` David Gibson
2019-06-06  4:50         ` Aravinda Prasad
2019-06-06  7:51         ` Aravinda Prasad
2019-06-06  3:00   ` [Qemu-devel] " David Gibson
2019-05-29  5:40 ` [Qemu-devel] [PATCH v9 6/6] migration: Include migration support for machine check handling Aravinda Prasad
2019-06-03 15:40   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-06-04  7:04     ` Aravinda Prasad
2019-06-04 20:04       ` Greg Kurz
2019-06-04 20:11         ` Greg Kurz
2019-06-06  3:06   ` [Qemu-devel] " David Gibson
2019-06-06  6:06     ` Greg Kurz
2019-06-06 11:15       ` Aravinda Prasad
2019-06-06 12:10         ` Greg Kurz
2019-06-07  0:22           ` David Gibson
2019-06-07 10:30             ` Greg Kurz
2019-06-06 11:25     ` Aravinda Prasad
2019-06-06 12:24       ` Greg Kurz
2019-06-07  0:23       ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.