[Qemu-devel] [PATCH v8 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v8 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests
@ 2019-04-22  7:02 ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:02 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, mahesh

This patch set adds support for FWNMI in PowerKVM guests.

System errors such as SLB multihit and memory errors
that cannot be corrected by hardware is passed on to
the kernel for handling by raising machine check
exception (an NMI). Upon such machine check exceptions,
if the address in error belongs to guest then KVM
invokes guests' 0x200 interrupt vector if the guest
is not FWNMI capable. For FWNMI capable guest
KVM passes the control to QEMU by exiting the guest.

This patch series adds functionality to QEMU to pass
on such machine check exceptions to the FWNMI capable
guest kernel by building an error log and invoking
the guest registered machine check handling routine.

The KVM changes are now part of the upstream kernel
(commit e20bbd3d). This series contain QEMU changes.

Change Log v8:
  - Added functionality to check FWNMI capability during
    VM migration

Change Log v7:
  - Rebased to 4.1

Change Log v6:
  - Fetches rtas_addr from fdt
  - Handles all error conditions (earlier it was only UEs)

Change Log v5:
  - Handled VM migrations by including rtas_addr in VMSTATE.
  - Migration is blocked while a machine check is in progress.

---

Aravinda Prasad (6):
      ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
      Wrapper function to wait on condition for the main loop mutex
      target/ppc: Handle NMI guest exit
      target/ppc: Build rtas error log upon an MCE
      ppc: spapr: Enable FWNMI capability
      migration: Block migration while handling machine check

 cpus.c                   |    5 +
 hw/ppc/spapr.c           |   26 ++++
 hw/ppc/spapr_caps.c      |   26 ++++
 hw/ppc/spapr_events.c    |  284 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c      |   84 ++++++++++++++
 include/hw/ppc/spapr.h   |   26 ++++
 include/qemu/main-loop.h |    8 +
 target/ppc/kvm.c         |   30 +++++
 target/ppc/kvm_ppc.h     |    8 +
 target/ppc/trace-events  |    2 
 10 files changed, 497 insertions(+), 2 deletions(-)

--
Aravinda

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests
@ 2019-04-22  7:02 ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:02 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda

This patch set adds support for FWNMI in PowerKVM guests.

System errors such as SLB multihit and memory errors
that cannot be corrected by hardware is passed on to
the kernel for handling by raising machine check
exception (an NMI). Upon such machine check exceptions,
if the address in error belongs to guest then KVM
invokes guests' 0x200 interrupt vector if the guest
is not FWNMI capable. For FWNMI capable guest
KVM passes the control to QEMU by exiting the guest.

This patch series adds functionality to QEMU to pass
on such machine check exceptions to the FWNMI capable
guest kernel by building an error log and invoking
the guest registered machine check handling routine.

The KVM changes are now part of the upstream kernel
(commit e20bbd3d). This series contain QEMU changes.

Change Log v8:
  - Added functionality to check FWNMI capability during
    VM migration

Change Log v7:
  - Rebased to 4.1

Change Log v6:
  - Fetches rtas_addr from fdt
  - Handles all error conditions (earlier it was only UEs)

Change Log v5:
  - Handled VM migrations by including rtas_addr in VMSTATE.
  - Migration is blocked while a machine check is in progress.

---

Aravinda Prasad (6):
      ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
      Wrapper function to wait on condition for the main loop mutex
      target/ppc: Handle NMI guest exit
      target/ppc: Build rtas error log upon an MCE
      ppc: spapr: Enable FWNMI capability
      migration: Block migration while handling machine check

 cpus.c                   |    5 +
 hw/ppc/spapr.c           |   26 ++++
 hw/ppc/spapr_caps.c      |   26 ++++
 hw/ppc/spapr_events.c    |  284 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c      |   84 ++++++++++++++
 include/hw/ppc/spapr.h   |   26 ++++
 include/qemu/main-loop.h |    8 +
 target/ppc/kvm.c         |   30 +++++
 target/ppc/kvm_ppc.h     |    8 +
 target/ppc/trace-events  |    2 
 10 files changed, 497 insertions(+), 2 deletions(-)

--
Aravinda

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
@ 2019-04-22  7:02   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:02 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, mahesh

This patch adds support in QEMU to handle "ibm,nmi-register"
and "ibm,nmi-interlock" RTAS calls.

The machine check notification address is saved when the
OS issues "ibm,nmi-register" RTAS call.

This patch also handles the case when multiple processors
experience machine check at or about the same time by
handling "ibm,nmi-interlock" call. In such cases, as per
PAPR, subsequent processors serialize waiting for the first
processor to issue the "ibm,nmi-interlock" call. The second
processor that also received a machine check error waits
till the first processor is done reading the error log.
The first processor issues "ibm,nmi-interlock" call
when the error log is consumed. This patch implements the
releasing part of the error-log while subsequent patch
(which builds error log) handles the locking part.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |   18 ++++++++++++++
 hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    9 ++++++-
 3 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c56939a..6642cb5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
     first_ppc_cpu->env.gpr[5] = 0;
 
     spapr->cas_reboot = false;
+
+    spapr->guest_machine_check_addr = -1;
+
+    /* Signal all vCPUs waiting on this condition */
+    qemu_cond_broadcast(&spapr->mc_delivery_cond);
 }
 
 static void spapr_create_nvram(SpaprMachineState *spapr)
@@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
     },
 };
 
+static const VMStateDescription vmstate_spapr_machine_check = {
+    .name = "spapr_machine_check",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_dtb,
         &vmstate_spapr_cap_large_decr,
         &vmstate_spapr_cap_ccf_assist,
+        &vmstate_spapr_machine_check,
         NULL
     }
 };
@@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
 
         kvmppc_spapr_enable_inkernel_multitce();
     }
+
+    qemu_cond_init(&spapr->mc_delivery_cond);
 }
 
 static int spapr_kvm_type(MachineState *machine, const char *vm_type)
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index ee24212..c2f3991 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
     rtas_st(rets, 1, 100);
 }
 
+static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
+                                  SpaprMachineState *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args,
+                                  uint32_t nret, target_ulong rets)
+{
+    uint64_t rtas_addr = spapr_get_rtas_addr();
+
+    if (!rtas_addr) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    spapr->guest_machine_check_addr = rtas_ld(args, 1);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
+static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
+                                   SpaprMachineState *spapr,
+                                   uint32_t token, uint32_t nargs,
+                                   target_ulong args,
+                                   uint32_t nret, target_ulong rets)
+{
+    if (!spapr->guest_machine_check_addr) {
+        /* NMI register not called */
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+    } else {
+        qemu_cond_signal(&spapr->mc_delivery_cond);
+        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    }
+}
+
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
     }
 }
 
+uint64_t spapr_get_rtas_addr(void)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    int rtas_node;
+    const struct fdt_property *rtas_addr_prop;
+    void *fdt = spapr->fdt_blob;
+    uint32_t rtas_addr;
+
+    /* fetch rtas addr from fdt */
+    rtas_node = fdt_path_offset(fdt, "/rtas");
+    if (rtas_node == 0) {
+        return 0;
+    }
+
+    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
+    if (!rtas_addr_prop) {
+        return 0;
+    }
+
+    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
+    return (uint64_t)rtas_addr;
+}
+
+
 static void core_rtas_register_types(void)
 {
     spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
@@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
                         rtas_set_power_level);
     spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
                         rtas_get_power_level);
+    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
+                        rtas_ibm_nmi_register);
+    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
+                        rtas_ibm_nmi_interlock);
 }
 
 type_init(core_rtas_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 7e32f30..ec6f33e 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -187,6 +187,10 @@ struct SpaprMachineState {
      * occurs during the unplug process. */
     QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
 
+    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
+    target_ulong guest_machine_check_addr;
+    QemuCond mc_delivery_cond;
+
     /*< public >*/
     char *kvm_type;
     char *host_model;
@@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
 #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
 #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
 #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
+#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
+#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
 
-#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
+#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
 
 /* RTAS ibm,get-system-parameter token values */
 #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
@@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
+uint64_t spapr_get_rtas_addr(void);
 #endif /* HW_SPAPR_H */

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
@ 2019-04-22  7:02   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:02 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda

This patch adds support in QEMU to handle "ibm,nmi-register"
and "ibm,nmi-interlock" RTAS calls.

The machine check notification address is saved when the
OS issues "ibm,nmi-register" RTAS call.

This patch also handles the case when multiple processors
experience machine check at or about the same time by
handling "ibm,nmi-interlock" call. In such cases, as per
PAPR, subsequent processors serialize waiting for the first
processor to issue the "ibm,nmi-interlock" call. The second
processor that also received a machine check error waits
till the first processor is done reading the error log.
The first processor issues "ibm,nmi-interlock" call
when the error log is consumed. This patch implements the
releasing part of the error-log while subsequent patch
(which builds error log) handles the locking part.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |   18 ++++++++++++++
 hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    9 ++++++-
 3 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c56939a..6642cb5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
     first_ppc_cpu->env.gpr[5] = 0;
 
     spapr->cas_reboot = false;
+
+    spapr->guest_machine_check_addr = -1;
+
+    /* Signal all vCPUs waiting on this condition */
+    qemu_cond_broadcast(&spapr->mc_delivery_cond);
 }
 
 static void spapr_create_nvram(SpaprMachineState *spapr)
@@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
     },
 };
 
+static const VMStateDescription vmstate_spapr_machine_check = {
+    .name = "spapr_machine_check",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
@@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_dtb,
         &vmstate_spapr_cap_large_decr,
         &vmstate_spapr_cap_ccf_assist,
+        &vmstate_spapr_machine_check,
         NULL
     }
 };
@@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
 
         kvmppc_spapr_enable_inkernel_multitce();
     }
+
+    qemu_cond_init(&spapr->mc_delivery_cond);
 }
 
 static int spapr_kvm_type(MachineState *machine, const char *vm_type)
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index ee24212..c2f3991 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
     rtas_st(rets, 1, 100);
 }
 
+static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
+                                  SpaprMachineState *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args,
+                                  uint32_t nret, target_ulong rets)
+{
+    uint64_t rtas_addr = spapr_get_rtas_addr();
+
+    if (!rtas_addr) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    spapr->guest_machine_check_addr = rtas_ld(args, 1);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
+static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
+                                   SpaprMachineState *spapr,
+                                   uint32_t token, uint32_t nargs,
+                                   target_ulong args,
+                                   uint32_t nret, target_ulong rets)
+{
+    if (!spapr->guest_machine_check_addr) {
+        /* NMI register not called */
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+    } else {
+        qemu_cond_signal(&spapr->mc_delivery_cond);
+        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    }
+}
+
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
     }
 }
 
+uint64_t spapr_get_rtas_addr(void)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    int rtas_node;
+    const struct fdt_property *rtas_addr_prop;
+    void *fdt = spapr->fdt_blob;
+    uint32_t rtas_addr;
+
+    /* fetch rtas addr from fdt */
+    rtas_node = fdt_path_offset(fdt, "/rtas");
+    if (rtas_node == 0) {
+        return 0;
+    }
+
+    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
+    if (!rtas_addr_prop) {
+        return 0;
+    }
+
+    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
+    return (uint64_t)rtas_addr;
+}
+
+
 static void core_rtas_register_types(void)
 {
     spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
@@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
                         rtas_set_power_level);
     spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
                         rtas_get_power_level);
+    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
+                        rtas_ibm_nmi_register);
+    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
+                        rtas_ibm_nmi_interlock);
 }
 
 type_init(core_rtas_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 7e32f30..ec6f33e 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -187,6 +187,10 @@ struct SpaprMachineState {
      * occurs during the unplug process. */
     QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
 
+    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
+    target_ulong guest_machine_check_addr;
+    QemuCond mc_delivery_cond;
+
     /*< public >*/
     char *kvm_type;
     char *host_model;
@@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
 #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
 #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
 #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
+#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
+#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
 
-#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
+#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
 
 /* RTAS ibm,get-system-parameter token values */
 #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
@@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
+uint64_t spapr_get_rtas_addr(void);
 #endif /* HW_SPAPR_H */



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 2/6] Wrapper function to wait on condition for the main loop mutex
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, mahesh

Introduce a wrapper function to wait on condition for
the main loop mutex. This function atomically releases
the main loop mutex and causes the calling thread to
block on the condition. This wrapper is required because
qemu_global_mutex is a static variable.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 cpus.c                   |    5 +++++
 include/qemu/main-loop.h |    8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/cpus.c b/cpus.c
index e83f72b..d9379e7 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1858,6 +1858,11 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
+void qemu_cond_wait_iothread(QemuCond *cond)
+{
+    qemu_cond_wait(cond, &qemu_global_mutex);
+}
+
 static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index f6ba78e..a6d20b0 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -295,6 +295,14 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
  */
 void qemu_mutex_unlock_iothread(void);
 
+/*
+ * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ *
+ * This function atomically releases the main loop mutex and causes
+ * the calling thread to block on the condition.
+ */
+void qemu_cond_wait_iothread(QemuCond *cond);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 2/6] Wrapper function to wait on condition for the main loop mutex
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda

Introduce a wrapper function to wait on condition for
the main loop mutex. This function atomically releases
the main loop mutex and causes the calling thread to
block on the condition. This wrapper is required because
qemu_global_mutex is a static variable.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 cpus.c                   |    5 +++++
 include/qemu/main-loop.h |    8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/cpus.c b/cpus.c
index e83f72b..d9379e7 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1858,6 +1858,11 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
+void qemu_cond_wait_iothread(QemuCond *cond)
+{
+    qemu_cond_wait(cond, &qemu_global_mutex);
+}
+
 static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index f6ba78e..a6d20b0 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -295,6 +295,14 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
  */
 void qemu_mutex_unlock_iothread(void);
 
+/*
+ * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ *
+ * This function atomically releases the main loop mutex and causes
+ * the calling thread to block on the condition.
+ */
+void qemu_cond_wait_iothread(QemuCond *cond);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, mahesh

Memory error such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to guest then
the guest kernel is responsible for taking suitable action.
Patch [1] enhances KVM to exit guest with exit reason
set to KVM_EXIT_NMI in such cases. This patch handles
KVM_EXIT_NMI exit.

[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
    (e20bbd3d and related commits)

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c          |    3 +++
 hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
 hw/ppc/spapr_rtas.c     |    5 +++++
 include/hw/ppc/spapr.h  |    6 ++++++
 target/ppc/kvm.c        |   16 ++++++++++++++++
 target/ppc/kvm_ppc.h    |    2 ++
 target/ppc/trace-events |    2 ++
 7 files changed, 56 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6642cb5..2779efe 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
 
     spapr->cas_reboot = false;
 
+    spapr->mc_status = -1;
     spapr->guest_machine_check_addr = -1;
 
     /* Signal all vCPUs waiting on this condition */
@@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
     .minimum_version_id = 1,
     .fields = (VMStateField[]) {
         VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
+        VMSTATE_INT32(mc_status, SpaprMachineState),
         VMSTATE_END_OF_LIST()
     },
 };
@@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
         kvmppc_spapr_enable_inkernel_multitce();
     }
 
+    spapr->mc_status = -1;
     qemu_cond_init(&spapr->mc_delivery_cond);
 }
 
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index ae0f093..9922a23 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+
+    while (spapr->mc_status != -1) {
+        /*
+         * Check whether the same CPU got machine check error
+         * while still handling the mc error (i.e., before
+         * that CPU called "ibm,nmi-interlock"
+         */
+        if (spapr->mc_status == cpu->vcpu_id) {
+            qemu_system_guest_panicked(NULL);
+        }
+        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
+        /* Meanwhile if the system is reset, then just return */
+        if (spapr->guest_machine_check_addr == -1) {
+            return;
+        }
+    }
+    spapr->mc_status = cpu->vcpu_id;
+}
+
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
                             uint32_t token, uint32_t nargs,
                             target_ulong args,
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index c2f3991..d3499f9 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
         /* NMI register not called */
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
     } else {
+        /*
+         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
+         * hence unset mc_status.
+         */
+        spapr->mc_status = -1;
         qemu_cond_signal(&spapr->mc_delivery_cond);
         rtas_st(rets, 0, RTAS_OUT_SUCCESS);
     }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index ec6f33e..f7204d0 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -189,6 +189,11 @@ struct SpaprMachineState {
 
     /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
     target_ulong guest_machine_check_addr;
+    /*
+     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
+     * handling the mc.
+     */
+    int mc_status;
     QemuCond mc_delivery_cond;
 
     /*< public >*/
@@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 9e86db0..5eedce8 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ret = 0;
         break;
 
+    case KVM_EXIT_NMI:
+        trace_kvm_handle_nmi_exception();
+        ret = kvm_handle_nmi(cpu, run);
+        break;
+
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
@@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
     return data & 0xffff;
 }
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
+{
+    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
+
+    cpu_synchronize_state(CPU(cpu));
+
+    spapr_mce_req_event(cpu, recovered);
+
+    return 0;
+}
+
 int kvmppc_enable_hwrng(void)
 {
     if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 2238513..6edc42f 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
 void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
 void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
+
 #else
 
 static inline uint32_t kvmppc_get_tbfreq(void)
diff --git a/target/ppc/trace-events b/target/ppc/trace-events
index 7b3cfe1..d5691d2 100644
--- a/target/ppc/trace-events
+++ b/target/ppc/trace-events
@@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
 kvm_handle_epr(void) "handle epr"
 kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
 kvm_handle_debug_exception(void) "handle debug exception"
+kvm_handle_nmi_exception(void) "handle NMI exception"
+

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda

Memory error such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to guest then
the guest kernel is responsible for taking suitable action.
Patch [1] enhances KVM to exit guest with exit reason
set to KVM_EXIT_NMI in such cases. This patch handles
KVM_EXIT_NMI exit.

[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
    (e20bbd3d and related commits)

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c          |    3 +++
 hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
 hw/ppc/spapr_rtas.c     |    5 +++++
 include/hw/ppc/spapr.h  |    6 ++++++
 target/ppc/kvm.c        |   16 ++++++++++++++++
 target/ppc/kvm_ppc.h    |    2 ++
 target/ppc/trace-events |    2 ++
 7 files changed, 56 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6642cb5..2779efe 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
 
     spapr->cas_reboot = false;
 
+    spapr->mc_status = -1;
     spapr->guest_machine_check_addr = -1;
 
     /* Signal all vCPUs waiting on this condition */
@@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
     .minimum_version_id = 1,
     .fields = (VMStateField[]) {
         VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
+        VMSTATE_INT32(mc_status, SpaprMachineState),
         VMSTATE_END_OF_LIST()
     },
 };
@@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
         kvmppc_spapr_enable_inkernel_multitce();
     }
 
+    spapr->mc_status = -1;
     qemu_cond_init(&spapr->mc_delivery_cond);
 }
 
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index ae0f093..9922a23 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+
+    while (spapr->mc_status != -1) {
+        /*
+         * Check whether the same CPU got machine check error
+         * while still handling the mc error (i.e., before
+         * that CPU called "ibm,nmi-interlock"
+         */
+        if (spapr->mc_status == cpu->vcpu_id) {
+            qemu_system_guest_panicked(NULL);
+        }
+        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
+        /* Meanwhile if the system is reset, then just return */
+        if (spapr->guest_machine_check_addr == -1) {
+            return;
+        }
+    }
+    spapr->mc_status = cpu->vcpu_id;
+}
+
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
                             uint32_t token, uint32_t nargs,
                             target_ulong args,
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index c2f3991..d3499f9 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
         /* NMI register not called */
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
     } else {
+        /*
+         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
+         * hence unset mc_status.
+         */
+        spapr->mc_status = -1;
         qemu_cond_signal(&spapr->mc_delivery_cond);
         rtas_st(rets, 0, RTAS_OUT_SUCCESS);
     }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index ec6f33e..f7204d0 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -189,6 +189,11 @@ struct SpaprMachineState {
 
     /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
     target_ulong guest_machine_check_addr;
+    /*
+     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
+     * handling the mc.
+     */
+    int mc_status;
     QemuCond mc_delivery_cond;
 
     /*< public >*/
@@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
+void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 9e86db0..5eedce8 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ret = 0;
         break;
 
+    case KVM_EXIT_NMI:
+        trace_kvm_handle_nmi_exception();
+        ret = kvm_handle_nmi(cpu, run);
+        break;
+
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
@@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
     return data & 0xffff;
 }
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
+{
+    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
+
+    cpu_synchronize_state(CPU(cpu));
+
+    spapr_mce_req_event(cpu, recovered);
+
+    return 0;
+}
+
 int kvmppc_enable_hwrng(void)
 {
     if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 2238513..6edc42f 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
 void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
 void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
 
+int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
+
 #else
 
 static inline uint32_t kvmppc_get_tbfreq(void)
diff --git a/target/ppc/trace-events b/target/ppc/trace-events
index 7b3cfe1..d5691d2 100644
--- a/target/ppc/trace-events
+++ b/target/ppc/trace-events
@@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
 kvm_handle_epr(void) "handle epr"
 kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
 kvm_handle_debug_exception(void) "handle debug exception"
+kvm_handle_nmi_exception(void) "handle NMI exception"
+



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, mahesh

Upon a machine check exception (MCE) in a guest address space,
KVM causes a guest exit to enable QEMU to build and pass the
error to the guest in the PAPR defined rtas error log format.

This patch builds the rtas error log, copies it to the rtas_addr
and then invokes the guest registered machine check handler. The
handler in the guest takes suitable action(s) depending on the type
and criticality of the error. For example, if an error is
unrecoverable memory corruption in an application inside the
guest, then the guest kernel sends a SIGBUS to the application.
For recoverable errors, the guest performs recovery actions and
logs the error.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    4 +
 hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    4 +
 3 files changed, 253 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2779efe..ffd1715 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
         error_report("Could not get size of LPAR rtas '%s'", filename);
         exit(1);
     }
+
+    /* Resize blob to accommodate error log. */
+    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
+
     spapr->rtas_blob = g_malloc(spapr->rtas_size);
     if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
         error_report("Could not load LPAR rtas '%s'", filename);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 9922a23..4032db0 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -212,6 +212,106 @@ struct hp_extended_log {
     struct rtas_event_log_v6_hp hp;
 } QEMU_PACKED;
 
+struct rtas_event_log_v6_mc {
+#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
+    struct rtas_event_log_v6_section_header hdr;
+    uint32_t fru_id;
+    uint32_t proc_id;
+    uint8_t error_type;
+#define RTAS_LOG_V6_MC_TYPE_UE                           0
+#define RTAS_LOG_V6_MC_TYPE_SLB                          1
+#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
+#define RTAS_LOG_V6_MC_TYPE_TLB                          4
+#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
+#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
+    uint8_t sub_err_type;
+#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
+#define RTAS_LOG_V6_MC_UE_IFETCH                         1
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
+#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
+#define RTAS_LOG_V6_MC_SLB_PARITY                        0
+#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
+#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
+#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
+#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
+#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
+#define RTAS_LOG_V6_MC_TLB_PARITY                        1
+#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
+#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
+    uint8_t reserved_1[6];
+    uint64_t effective_address;
+    uint64_t logical_address;
+} QEMU_PACKED;
+
+struct mc_extended_log {
+    struct rtas_event_log_v6 v6hdr;
+    struct rtas_event_log_v6_mc mc;
+} QEMU_PACKED;
+
+struct MC_ierror_table {
+    unsigned long srr1_mask;
+    unsigned long srr1_value;
+    bool nip_valid; /* nip is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_ierror_table mc_ierror_table[] = {
+{ 0x00000000081c0000, 0x0000000000040000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000080000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x00000000000c0000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000100000, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000140000, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000180000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0, 0, 0, 0, 0, 0 } };
+
+struct MC_derror_table {
+    unsigned long dsisr_value;
+    bool dar_valid; /* dar is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_derror_table mc_derror_table[] = {
+{ 0x00008000, false,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00004000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000800, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000400, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000080, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000100, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0, false, 0, 0, 0, 0 } };
+
+#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
+
 typedef enum EventClass {
     EVENT_CLASS_INTERNAL_ERRORS     = 0,
     EVENT_CLASS_EPOW                = 1,
@@ -620,6 +720,147 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
+ssize_t spapr_get_rtas_size(ssize_t old_rtas_size)
+{
+    g_assert(old_rtas_size < RTAS_ERRLOG_OFFSET);
+    return RTAS_ERROR_LOG_MAX;
+}
+
+static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
+                                        struct mc_extended_log *ext_elog)
+{
+    int i;
+    CPUPPCState *env = &cpu->env;
+    uint32_t summary;
+    uint64_t dsisr = env->spr[SPR_DSISR];
+
+    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
+    if (recovered) {
+        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
+    } else {
+        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
+    }
+
+    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
+        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
+            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_derror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
+            if (mc_derror_table[i].dar_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
+            }
+
+            summary |= mc_derror_table[i].initiator
+                        | mc_derror_table[i].severity;
+
+            return summary;
+        }
+    } else {
+        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
+            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
+                    mc_ierror_table[i].srr1_value) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
+            if (mc_ierror_table[i].nip_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
+            }
+
+            summary |= mc_ierror_table[i].initiator
+                        | mc_ierror_table[i].severity;
+
+            return summary;
+        }
+    }
+
+    summary |= RTAS_LOG_INITIATOR_CPU;
+    return summary;
+}
+
+static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    CPUState *cs = CPU(cpu);
+    uint64_t rtas_addr;
+    CPUPPCState *env = &cpu->env;
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+    target_ulong r3, msr = 0;
+    struct rtas_error_log log;
+    struct mc_extended_log *ext_elog;
+    uint32_t summary;
+
+    /*
+     * Properly set bits in MSR before we invoke the handler.
+     * SRR0/1, DAR and DSISR are properly set by KVM
+     */
+    if (!(*pcc->interrupts_big_endian)(cpu)) {
+        msr |= (1ULL << MSR_LE);
+    }
+
+    if (env->msr && (1ULL << MSR_SF)) {
+        msr |= (1ULL << MSR_SF);
+    }
+
+    msr |= (1ULL << MSR_ME);
+
+    if (spapr->guest_machine_check_addr == -1) {
+        /*
+         * This implies that we have hit a machine check between system
+         * reset and "ibm,nmi-register". Fall back to the old machine
+         * check behavior in such cases.
+         */
+        env->spr[SPR_SRR0] = env->nip;
+        env->spr[SPR_SRR1] = env->msr;
+        env->msr = msr;
+        env->nip = 0x200;
+        return;
+    }
+
+    ext_elog = g_malloc0(sizeof(struct mc_extended_log));
+    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
+
+    log.summary = cpu_to_be32(summary);
+    log.extended_length = cpu_to_be32(sizeof(struct mc_extended_log));
+
+    /* r3 should be in BE always */
+    r3 = cpu_to_be64(env->gpr[3]);
+    env->msr = msr;
+
+    spapr_init_v6hdr(&ext_elog->v6hdr);
+    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
+    ext_elog->mc.hdr.section_length =
+                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
+    ext_elog->mc.hdr.section_version = 1;
+
+    /* get rtas addr from fdt */
+    rtas_addr = spapr_get_rtas_addr();
+    if (!rtas_addr) {
+        /* Unable to fetch rtas_addr. Hence reset the guest */
+        ppc_cpu_do_system_reset(cs);
+    }
+
+    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET, &r3, sizeof(r3));
+    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3),
+                              &log, sizeof(log));
+    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3) +
+                              sizeof(log), ext_elog,
+                              sizeof(struct mc_extended_log));
+
+    /* Save gpr[3] in the guest endian mode */
+    if ((*pcc->interrupts_big_endian)(cpu)) {
+        env->gpr[3] = cpu_to_be64(rtas_addr + RTAS_ERRLOG_OFFSET);
+    } else {
+        env->gpr[3] = cpu_to_le64(rtas_addr + RTAS_ERRLOG_OFFSET);
+    }
+
+    env->nip = spapr->guest_machine_check_addr;
+}
+
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
@@ -640,6 +881,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
         }
     }
     spapr->mc_status = cpu->vcpu_id;
+
+    spapr_mce_dispatch_elog(cpu, recovered);
+
+    return;
 }
 
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index f7204d0..03f34bf 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -661,6 +661,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
 #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
 #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
 
+/* Offset from rtas-base where error log is placed */
+#define RTAS_ERRLOG_OFFSET       0x25
+
 static inline uint64_t ppc64_phys_to_real(uint64_t addr)
 {
     return addr & ~0xF000000000000000ULL;
@@ -798,6 +801,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
+ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda

Upon a machine check exception (MCE) in a guest address space,
KVM causes a guest exit to enable QEMU to build and pass the
error to the guest in the PAPR defined rtas error log format.

This patch builds the rtas error log, copies it to the rtas_addr
and then invokes the guest registered machine check handler. The
handler in the guest takes suitable action(s) depending on the type
and criticality of the error. For example, if an error is
unrecoverable memory corruption in an application inside the
guest, then the guest kernel sends a SIGBUS to the application.
For recoverable errors, the guest performs recovery actions and
logs the error.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    4 +
 hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    4 +
 3 files changed, 253 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2779efe..ffd1715 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
         error_report("Could not get size of LPAR rtas '%s'", filename);
         exit(1);
     }
+
+    /* Resize blob to accommodate error log. */
+    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
+
     spapr->rtas_blob = g_malloc(spapr->rtas_size);
     if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
         error_report("Could not load LPAR rtas '%s'", filename);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 9922a23..4032db0 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -212,6 +212,106 @@ struct hp_extended_log {
     struct rtas_event_log_v6_hp hp;
 } QEMU_PACKED;
 
+struct rtas_event_log_v6_mc {
+#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
+    struct rtas_event_log_v6_section_header hdr;
+    uint32_t fru_id;
+    uint32_t proc_id;
+    uint8_t error_type;
+#define RTAS_LOG_V6_MC_TYPE_UE                           0
+#define RTAS_LOG_V6_MC_TYPE_SLB                          1
+#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
+#define RTAS_LOG_V6_MC_TYPE_TLB                          4
+#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
+#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
+    uint8_t sub_err_type;
+#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
+#define RTAS_LOG_V6_MC_UE_IFETCH                         1
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
+#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
+#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
+#define RTAS_LOG_V6_MC_SLB_PARITY                        0
+#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
+#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
+#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
+#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
+#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
+#define RTAS_LOG_V6_MC_TLB_PARITY                        1
+#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
+#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
+    uint8_t reserved_1[6];
+    uint64_t effective_address;
+    uint64_t logical_address;
+} QEMU_PACKED;
+
+struct mc_extended_log {
+    struct rtas_event_log_v6 v6hdr;
+    struct rtas_event_log_v6_mc mc;
+} QEMU_PACKED;
+
+struct MC_ierror_table {
+    unsigned long srr1_mask;
+    unsigned long srr1_value;
+    bool nip_valid; /* nip is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_ierror_table mc_ierror_table[] = {
+{ 0x00000000081c0000, 0x0000000000040000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000080000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x00000000000c0000, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000100000, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000140000, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000000081c0000, 0x0000000000180000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0, 0, 0, 0, 0, 0 } };
+
+struct MC_derror_table {
+    unsigned long dsisr_value;
+    bool dar_valid; /* dar is a valid indicator of faulting address */
+    uint8_t error_type;
+    uint8_t error_subtype;
+    unsigned int initiator;
+    unsigned int severity;
+};
+
+static const struct MC_derror_table mc_derror_table[] = {
+{ 0x00008000, false,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00004000, true,
+  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000800, true,
+  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000400, true,
+  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000080, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0x00000100, true,
+  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
+  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
+{ 0, false, 0, 0, 0, 0 } };
+
+#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
+
 typedef enum EventClass {
     EVENT_CLASS_INTERNAL_ERRORS     = 0,
     EVENT_CLASS_EPOW                = 1,
@@ -620,6 +720,147 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                             RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
+ssize_t spapr_get_rtas_size(ssize_t old_rtas_size)
+{
+    g_assert(old_rtas_size < RTAS_ERRLOG_OFFSET);
+    return RTAS_ERROR_LOG_MAX;
+}
+
+static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
+                                        struct mc_extended_log *ext_elog)
+{
+    int i;
+    CPUPPCState *env = &cpu->env;
+    uint32_t summary;
+    uint64_t dsisr = env->spr[SPR_DSISR];
+
+    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
+    if (recovered) {
+        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
+    } else {
+        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
+    }
+
+    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
+        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
+            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_derror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
+            if (mc_derror_table[i].dar_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
+            }
+
+            summary |= mc_derror_table[i].initiator
+                        | mc_derror_table[i].severity;
+
+            return summary;
+        }
+    } else {
+        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
+            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
+                    mc_ierror_table[i].srr1_value) {
+                continue;
+            }
+
+            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
+            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
+            if (mc_ierror_table[i].nip_valid) {
+                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
+            }
+
+            summary |= mc_ierror_table[i].initiator
+                        | mc_ierror_table[i].severity;
+
+            return summary;
+        }
+    }
+
+    summary |= RTAS_LOG_INITIATOR_CPU;
+    return summary;
+}
+
+static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    CPUState *cs = CPU(cpu);
+    uint64_t rtas_addr;
+    CPUPPCState *env = &cpu->env;
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+    target_ulong r3, msr = 0;
+    struct rtas_error_log log;
+    struct mc_extended_log *ext_elog;
+    uint32_t summary;
+
+    /*
+     * Properly set bits in MSR before we invoke the handler.
+     * SRR0/1, DAR and DSISR are properly set by KVM
+     */
+    if (!(*pcc->interrupts_big_endian)(cpu)) {
+        msr |= (1ULL << MSR_LE);
+    }
+
+    if (env->msr && (1ULL << MSR_SF)) {
+        msr |= (1ULL << MSR_SF);
+    }
+
+    msr |= (1ULL << MSR_ME);
+
+    if (spapr->guest_machine_check_addr == -1) {
+        /*
+         * This implies that we have hit a machine check between system
+         * reset and "ibm,nmi-register". Fall back to the old machine
+         * check behavior in such cases.
+         */
+        env->spr[SPR_SRR0] = env->nip;
+        env->spr[SPR_SRR1] = env->msr;
+        env->msr = msr;
+        env->nip = 0x200;
+        return;
+    }
+
+    ext_elog = g_malloc0(sizeof(struct mc_extended_log));
+    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
+
+    log.summary = cpu_to_be32(summary);
+    log.extended_length = cpu_to_be32(sizeof(struct mc_extended_log));
+
+    /* r3 should be in BE always */
+    r3 = cpu_to_be64(env->gpr[3]);
+    env->msr = msr;
+
+    spapr_init_v6hdr(&ext_elog->v6hdr);
+    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
+    ext_elog->mc.hdr.section_length =
+                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
+    ext_elog->mc.hdr.section_version = 1;
+
+    /* get rtas addr from fdt */
+    rtas_addr = spapr_get_rtas_addr();
+    if (!rtas_addr) {
+        /* Unable to fetch rtas_addr. Hence reset the guest */
+        ppc_cpu_do_system_reset(cs);
+    }
+
+    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET, &r3, sizeof(r3));
+    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3),
+                              &log, sizeof(log));
+    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3) +
+                              sizeof(log), ext_elog,
+                              sizeof(struct mc_extended_log));
+
+    /* Save gpr[3] in the guest endian mode */
+    if ((*pcc->interrupts_big_endian)(cpu)) {
+        env->gpr[3] = cpu_to_be64(rtas_addr + RTAS_ERRLOG_OFFSET);
+    } else {
+        env->gpr[3] = cpu_to_le64(rtas_addr + RTAS_ERRLOG_OFFSET);
+    }
+
+    env->nip = spapr->guest_machine_check_addr;
+}
+
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
@@ -640,6 +881,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
         }
     }
     spapr->mc_status = cpu->vcpu_id;
+
+    spapr_mce_dispatch_elog(cpu, recovered);
+
+    return;
 }
 
 static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index f7204d0..03f34bf 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -661,6 +661,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
 #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
 #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
 
+/* Offset from rtas-base where error log is placed */
+#define RTAS_ERRLOG_OFFSET       0x25
+
 static inline uint64_t ppc64_phys_to_real(uint64_t addr)
 {
     return addr & ~0xF000000000000000ULL;
@@ -798,6 +801,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
+ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, mahesh

Enable the KVM capability KVM_CAP_PPC_FWNMI so that
the KVM causes guest exit with NMI as exit reason
when it encounters a machine check exception on the
address belonging to a guest. Without this capability
enabled, KVM redirects machine check exceptions to
guest's 0x200 vector.

This patch also deals with the case when a guest with
the KVM_CAP_PPC_FWNMI capability enabled is attempted
to migrate to a host that does not support this
capability.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    1 +
 hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
 include/hw/ppc/spapr.h |    4 +++-
 target/ppc/kvm.c       |   14 ++++++++++++++
 target/ppc/kvm_ppc.h   |    6 ++++++
 6 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ffd1715..44e09bb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
     smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
     smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
     spapr_caps_add_properties(smc, &error_abort);
     smc->irq = &spapr_irq_xics;
     smc->dr_phb_enabled = true;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index edc5ed0..5b3af04 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
     }
 }
 
+static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
+                                Error **errp)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+
+    if (!val) {
+        return; /* Disabled by default */
+    }
+
+    if (kvm_enabled()) {
+        if (kvmppc_fwnmi_enable(cpu)) {
+            error_setg(errp, "Requested fwnmi capability not support by KVM");
+        }
+    }
+}
+
 SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
     [SPAPR_CAP_HTM] = {
         .name = "htm",
@@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
         .type = "bool",
         .apply = cap_ccf_assist_apply,
     },
+    [SPAPR_CAP_FWNMI_MCE] = {
+        .name = "fwnmi-mce",
+        .description = "Handle fwnmi machine check exceptions",
+        .index = SPAPR_CAP_FWNMI_MCE,
+        .get = spapr_cap_get_bool,
+        .set = spapr_cap_set_bool,
+        .type = "bool",
+        .apply = cap_fwnmi_mce_apply,
+    },
 };
 
 static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
@@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
 SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
 SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
+SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
 
 void spapr_caps_init(SpaprMachineState *spapr)
 {
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index d3499f9..997cf19 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -49,6 +49,7 @@
 #include "hw/ppc/fdt.h"
 #include "target/ppc/mmu-hash64.h"
 #include "target/ppc/mmu-book3s-v3.h"
+#include "kvm_ppc.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
                                   target_ulong args,
                                   uint32_t nret, target_ulong rets)
 {
+    int ret;
     uint64_t rtas_addr = spapr_get_rtas_addr();
 
     if (!rtas_addr) {
@@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
         return;
     }
 
+    ret = kvmppc_fwnmi_enable(cpu);
+
+    if (ret == 1) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    if (ret < 0) {
+        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+        return;
+    }
+
     spapr->guest_machine_check_addr = rtas_ld(args, 1);
     rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 03f34bf..9d16ad1 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -78,8 +78,10 @@ typedef enum {
 #define SPAPR_CAP_LARGE_DECREMENTER     0x08
 /* Count Cache Flush Assist HW Instruction */
 #define SPAPR_CAP_CCF_ASSIST            0x09
+/* FWNMI machine check handling */
+#define SPAPR_CAP_FWNMI_MCE             0x0A
 /* Num Caps */
-#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
+#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
 
 /*
  * Capability Values
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 5eedce8..9c7b71d 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -83,6 +83,7 @@ static int cap_ppc_safe_indirect_branch;
 static int cap_ppc_count_cache_flush_assist;
 static int cap_ppc_nested_kvm_hv;
 static int cap_large_decr;
+static int cap_ppc_fwnmi;
 
 static uint32_t debug_inst_opcode;
 
@@ -150,6 +151,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     kvmppc_get_cpu_characteristics(s);
     cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
     cap_large_decr = kvmppc_get_dec_bits();
+    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
     /*
      * Note: setting it to false because there is not such capability
      * in KVM at this moment.
@@ -2117,6 +2119,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
     }
 }
 
+int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
+{
+    CPUState *cs = CPU(cpu);
+
+    if (!cap_ppc_fwnmi) {
+        return 1;
+    }
+
+    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
+}
+
+
 int kvmppc_smt_threads(void)
 {
     return cap_ppc_smt ? cap_ppc_smt : 1;
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 6edc42f..28919d3 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -27,6 +27,7 @@ void kvmppc_enable_h_page_init(void);
 void kvmppc_set_papr(PowerPCCPU *cpu);
 int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
 void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
+int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
 int kvmppc_smt_threads(void);
 void kvmppc_hint_smt_possible(Error **errp);
 int kvmppc_set_smt_threads(int smt);
@@ -159,6 +160,11 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
 {
 }
 
+static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
+{
+    return 1;
+}
+
 static inline int kvmppc_smt_threads(void)
 {
     return 1;

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda

Enable the KVM capability KVM_CAP_PPC_FWNMI so that
the KVM causes guest exit with NMI as exit reason
when it encounters a machine check exception on the
address belonging to a guest. Without this capability
enabled, KVM redirects machine check exceptions to
guest's 0x200 vector.

This patch also deals with the case when a guest with
the KVM_CAP_PPC_FWNMI capability enabled is attempted
to migrate to a host that does not support this
capability.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    1 +
 hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
 include/hw/ppc/spapr.h |    4 +++-
 target/ppc/kvm.c       |   14 ++++++++++++++
 target/ppc/kvm_ppc.h   |    6 ++++++
 6 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ffd1715..44e09bb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
     smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
     smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
     spapr_caps_add_properties(smc, &error_abort);
     smc->irq = &spapr_irq_xics;
     smc->dr_phb_enabled = true;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index edc5ed0..5b3af04 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
     }
 }
 
+static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
+                                Error **errp)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+
+    if (!val) {
+        return; /* Disabled by default */
+    }
+
+    if (kvm_enabled()) {
+        if (kvmppc_fwnmi_enable(cpu)) {
+            error_setg(errp, "Requested fwnmi capability not support by KVM");
+        }
+    }
+}
+
 SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
     [SPAPR_CAP_HTM] = {
         .name = "htm",
@@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
         .type = "bool",
         .apply = cap_ccf_assist_apply,
     },
+    [SPAPR_CAP_FWNMI_MCE] = {
+        .name = "fwnmi-mce",
+        .description = "Handle fwnmi machine check exceptions",
+        .index = SPAPR_CAP_FWNMI_MCE,
+        .get = spapr_cap_get_bool,
+        .set = spapr_cap_set_bool,
+        .type = "bool",
+        .apply = cap_fwnmi_mce_apply,
+    },
 };
 
 static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
@@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
 SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
 SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
+SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
 
 void spapr_caps_init(SpaprMachineState *spapr)
 {
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index d3499f9..997cf19 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -49,6 +49,7 @@
 #include "hw/ppc/fdt.h"
 #include "target/ppc/mmu-hash64.h"
 #include "target/ppc/mmu-book3s-v3.h"
+#include "kvm_ppc.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
                                   target_ulong args,
                                   uint32_t nret, target_ulong rets)
 {
+    int ret;
     uint64_t rtas_addr = spapr_get_rtas_addr();
 
     if (!rtas_addr) {
@@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
         return;
     }
 
+    ret = kvmppc_fwnmi_enable(cpu);
+
+    if (ret == 1) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    if (ret < 0) {
+        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+        return;
+    }
+
     spapr->guest_machine_check_addr = rtas_ld(args, 1);
     rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 03f34bf..9d16ad1 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -78,8 +78,10 @@ typedef enum {
 #define SPAPR_CAP_LARGE_DECREMENTER     0x08
 /* Count Cache Flush Assist HW Instruction */
 #define SPAPR_CAP_CCF_ASSIST            0x09
+/* FWNMI machine check handling */
+#define SPAPR_CAP_FWNMI_MCE             0x0A
 /* Num Caps */
-#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
+#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
 
 /*
  * Capability Values
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 5eedce8..9c7b71d 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -83,6 +83,7 @@ static int cap_ppc_safe_indirect_branch;
 static int cap_ppc_count_cache_flush_assist;
 static int cap_ppc_nested_kvm_hv;
 static int cap_large_decr;
+static int cap_ppc_fwnmi;
 
 static uint32_t debug_inst_opcode;
 
@@ -150,6 +151,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     kvmppc_get_cpu_characteristics(s);
     cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
     cap_large_decr = kvmppc_get_dec_bits();
+    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
     /*
      * Note: setting it to false because there is not such capability
      * in KVM at this moment.
@@ -2117,6 +2119,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
     }
 }
 
+int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
+{
+    CPUState *cs = CPU(cpu);
+
+    if (!cap_ppc_fwnmi) {
+        return 1;
+    }
+
+    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
+}
+
+
 int kvmppc_smt_threads(void)
 {
     return cap_ppc_smt ? cap_ppc_smt : 1;
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 6edc42f..28919d3 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -27,6 +27,7 @@ void kvmppc_enable_h_page_init(void);
 void kvmppc_set_papr(PowerPCCPU *cpu);
 int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
 void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
+int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
 int kvmppc_smt_threads(void);
 void kvmppc_hint_smt_possible(Error **errp);
 int kvmppc_set_smt_threads(int smt);
@@ -159,6 +160,11 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
 {
 }
 
+static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
+{
+    return 1;
+}
+
 static inline int kvmppc_smt_threads(void)
 {
     return 1;



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 6/6] migration: Block migration while handling machine check
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda, mahesh

Block VM migration requests until the machine check
error handling is complete as (i) these errors are
specific to the source hardware and is irrelevant on
the target hardware, (ii) these errors cause data
corruption and should be handled before migration.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr_events.c  |   17 +++++++++++++++++
 hw/ppc/spapr_rtas.c    |    4 ++++
 include/hw/ppc/spapr.h |    3 +++
 3 files changed, 24 insertions(+)

diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 4032db0..45b990c 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -41,6 +41,7 @@
 #include "qemu/bcd.h"
 #include "hw/ppc/spapr_ovec.h"
 #include <libfdt.h>
+#include "migration/blocker.h"
 
 #define RTAS_LOG_VERSION_MASK                   0xff000000
 #define   RTAS_LOG_VERSION_6                    0x06000000
@@ -864,6 +865,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    int ret;
+    Error *local_err = NULL;
+
+    error_setg(&spapr->migration_blocker,
+            "Live migration not supported during machine check handling");
+    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
+    if (ret < 0) {
+        /*
+         * We don't want to abort and let the migration to continue. In a
+         * rare case, the machine check handler will run on the target
+         * hardware. Though this is not preferable, it is better than aborting
+         * the migration or killing the VM.
+         */
+        error_free(spapr->migration_blocker);
+        fprintf(stderr, "Warning: Machine check during VM migration\n");
+    }
 
     while (spapr->mc_status != -1) {
         /*
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 997cf19..1229a0e 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -50,6 +50,7 @@
 #include "target/ppc/mmu-hash64.h"
 #include "target/ppc/mmu-book3s-v3.h"
 #include "kvm_ppc.h"
+#include "migration/blocker.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -396,6 +397,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
         spapr->mc_status = -1;
         qemu_cond_signal(&spapr->mc_delivery_cond);
         rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+        migrate_del_blocker(spapr->migration_blocker);
+        error_free(spapr->migration_blocker);
+        spapr->migration_blocker = NULL;
     }
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9d16ad1..dda5fd2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -10,6 +10,7 @@
 #include "hw/ppc/spapr_irq.h"
 #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
 #include "hw/ppc/xics.h"        /* For ICSState */
+#include "qapi/error.h"
 
 struct SpaprVioBus;
 struct SpaprPhbState;
@@ -213,6 +214,8 @@ struct SpaprMachineState {
     SpaprCapabilities def, eff, mig;
 
     unsigned gpu_numa_id;
+
+    Error *migration_blocker;
 };
 
 #define H_SUCCESS         0

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH v8 6/6] migration: Block migration while handling machine check
@ 2019-04-22  7:03   ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-22  7:03 UTC (permalink / raw)
  To: aik, qemu-ppc, qemu-devel, david; +Cc: paulus, aravinda

Block VM migration requests until the machine check
error handling is complete as (i) these errors are
specific to the source hardware and is irrelevant on
the target hardware, (ii) these errors cause data
corruption and should be handled before migration.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr_events.c  |   17 +++++++++++++++++
 hw/ppc/spapr_rtas.c    |    4 ++++
 include/hw/ppc/spapr.h |    3 +++
 3 files changed, 24 insertions(+)

diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 4032db0..45b990c 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -41,6 +41,7 @@
 #include "qemu/bcd.h"
 #include "hw/ppc/spapr_ovec.h"
 #include <libfdt.h>
+#include "migration/blocker.h"
 
 #define RTAS_LOG_VERSION_MASK                   0xff000000
 #define   RTAS_LOG_VERSION_6                    0x06000000
@@ -864,6 +865,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    int ret;
+    Error *local_err = NULL;
+
+    error_setg(&spapr->migration_blocker,
+            "Live migration not supported during machine check handling");
+    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
+    if (ret < 0) {
+        /*
+         * We don't want to abort and let the migration to continue. In a
+         * rare case, the machine check handler will run on the target
+         * hardware. Though this is not preferable, it is better than aborting
+         * the migration or killing the VM.
+         */
+        error_free(spapr->migration_blocker);
+        fprintf(stderr, "Warning: Machine check during VM migration\n");
+    }
 
     while (spapr->mc_status != -1) {
         /*
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 997cf19..1229a0e 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -50,6 +50,7 @@
 #include "target/ppc/mmu-hash64.h"
 #include "target/ppc/mmu-book3s-v3.h"
 #include "kvm_ppc.h"
+#include "migration/blocker.h"
 
 static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -396,6 +397,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
         spapr->mc_status = -1;
         qemu_cond_signal(&spapr->mc_delivery_cond);
         rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+        migrate_del_blocker(spapr->migration_blocker);
+        error_free(spapr->migration_blocker);
+        spapr->migration_blocker = NULL;
     }
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9d16ad1..dda5fd2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -10,6 +10,7 @@
 #include "hw/ppc/spapr_irq.h"
 #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
 #include "hw/ppc/xics.h"        /* For ICSState */
+#include "qapi/error.h"
 
 struct SpaprVioBus;
 struct SpaprPhbState;
@@ -213,6 +214,8 @@ struct SpaprMachineState {
     SpaprCapabilities def, eff, mig;
 
     unsigned gpu_numa_id;
+
+    Error *migration_blocker;
 };
 
 #define H_SUCCESS         0



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
@ 2019-04-23  6:45     ` David Gibson
  0 siblings, 0 replies; 65+ messages in thread
From: David Gibson @ 2019-04-23  6:45 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-ppc, qemu-devel, paulus, mahesh

[-- Attachment #1: Type: text/plain, Size: 8012 bytes --]

On Mon, Apr 22, 2019 at 12:32:58PM +0530, Aravinda Prasad wrote:
> This patch adds support in QEMU to handle "ibm,nmi-register"
> and "ibm,nmi-interlock" RTAS calls.
> 
> The machine check notification address is saved when the
> OS issues "ibm,nmi-register" RTAS call.
> 
> This patch also handles the case when multiple processors
> experience machine check at or about the same time by
> handling "ibm,nmi-interlock" call. In such cases, as per
> PAPR, subsequent processors serialize waiting for the first
> processor to issue the "ibm,nmi-interlock" call. The second
> processor that also received a machine check error waits
> till the first processor is done reading the error log.
> The first processor issues "ibm,nmi-interlock" call
> when the error log is consumed. This patch implements the
> releasing part of the error-log while subsequent patch
> (which builds error log) handles the locking part.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Although I wonder if it needs to be moved later in the series to avoid
advertising the availability of the RTAS calls to the guest before all
the prereq patches are in place to make them work properly.

> ---
>  hw/ppc/spapr.c         |   18 ++++++++++++++
>  hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    9 ++++++-
>  3 files changed, 87 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c56939a..6642cb5 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
>      first_ppc_cpu->env.gpr[5] = 0;
>  
>      spapr->cas_reboot = false;
> +
> +    spapr->guest_machine_check_addr = -1;
> +
> +    /* Signal all vCPUs waiting on this condition */
> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>  }
>  
>  static void spapr_create_nvram(SpaprMachineState *spapr)
> @@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
>      },
>  };
>  
> +static const VMStateDescription vmstate_spapr_machine_check = {
> +    .name = "spapr_machine_check",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
> @@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
>          &vmstate_spapr_dtb,
>          &vmstate_spapr_cap_large_decr,
>          &vmstate_spapr_cap_ccf_assist,
> +        &vmstate_spapr_machine_check,
>          NULL
>      }
>  };
> @@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
>  
>          kvmppc_spapr_enable_inkernel_multitce();
>      }
> +
> +    qemu_cond_init(&spapr->mc_delivery_cond);
>  }
>  
>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index ee24212..c2f3991 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      rtas_st(rets, 1, 100);
>  }
>  
> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> +                                  SpaprMachineState *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args,
> +                                  uint32_t nret, target_ulong rets)
> +{
> +    uint64_t rtas_addr = spapr_get_rtas_addr();
> +
> +    if (!rtas_addr) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    if (!spapr->guest_machine_check_addr) {
> +        /* NMI register not called */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +    } else {
> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    }
> +}
> +
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>      }
>  }
>  
> +uint64_t spapr_get_rtas_addr(void)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int rtas_node;
> +    const struct fdt_property *rtas_addr_prop;
> +    void *fdt = spapr->fdt_blob;
> +    uint32_t rtas_addr;
> +
> +    /* fetch rtas addr from fdt */
> +    rtas_node = fdt_path_offset(fdt, "/rtas");
> +    if (rtas_node == 0) {
> +        return 0;
> +    }
> +
> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> +    if (!rtas_addr_prop) {
> +        return 0;
> +    }
> +
> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
> +    return (uint64_t)rtas_addr;
> +}
> +
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> @@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
>                          rtas_set_power_level);
>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>                          rtas_get_power_level);
> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> +                        rtas_ibm_nmi_register);
> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> +                        rtas_ibm_nmi_interlock);
>  }
>  
>  type_init(core_rtas_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 7e32f30..ec6f33e 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -187,6 +187,10 @@ struct SpaprMachineState {
>       * occurs during the unplug process. */
>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>  
> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> +    target_ulong guest_machine_check_addr;
> +    QemuCond mc_delivery_cond;
> +
>      /*< public >*/
>      char *kvm_type;
>      char *host_model;
> @@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> @@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> +uint64_t spapr_get_rtas_addr(void);
>  #endif /* HW_SPAPR_H */
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
@ 2019-04-23  6:45     ` David Gibson
  0 siblings, 0 replies; 65+ messages in thread
From: David Gibson @ 2019-04-23  6:45 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, qemu-ppc, aik, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 8012 bytes --]

On Mon, Apr 22, 2019 at 12:32:58PM +0530, Aravinda Prasad wrote:
> This patch adds support in QEMU to handle "ibm,nmi-register"
> and "ibm,nmi-interlock" RTAS calls.
> 
> The machine check notification address is saved when the
> OS issues "ibm,nmi-register" RTAS call.
> 
> This patch also handles the case when multiple processors
> experience machine check at or about the same time by
> handling "ibm,nmi-interlock" call. In such cases, as per
> PAPR, subsequent processors serialize waiting for the first
> processor to issue the "ibm,nmi-interlock" call. The second
> processor that also received a machine check error waits
> till the first processor is done reading the error log.
> The first processor issues "ibm,nmi-interlock" call
> when the error log is consumed. This patch implements the
> releasing part of the error-log while subsequent patch
> (which builds error log) handles the locking part.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Although I wonder if it needs to be moved later in the series to avoid
advertising the availability of the RTAS calls to the guest before all
the prereq patches are in place to make them work properly.

> ---
>  hw/ppc/spapr.c         |   18 ++++++++++++++
>  hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    9 ++++++-
>  3 files changed, 87 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c56939a..6642cb5 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
>      first_ppc_cpu->env.gpr[5] = 0;
>  
>      spapr->cas_reboot = false;
> +
> +    spapr->guest_machine_check_addr = -1;
> +
> +    /* Signal all vCPUs waiting on this condition */
> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>  }
>  
>  static void spapr_create_nvram(SpaprMachineState *spapr)
> @@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
>      },
>  };
>  
> +static const VMStateDescription vmstate_spapr_machine_check = {
> +    .name = "spapr_machine_check",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
> @@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
>          &vmstate_spapr_dtb,
>          &vmstate_spapr_cap_large_decr,
>          &vmstate_spapr_cap_ccf_assist,
> +        &vmstate_spapr_machine_check,
>          NULL
>      }
>  };
> @@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
>  
>          kvmppc_spapr_enable_inkernel_multitce();
>      }
> +
> +    qemu_cond_init(&spapr->mc_delivery_cond);
>  }
>  
>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index ee24212..c2f3991 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      rtas_st(rets, 1, 100);
>  }
>  
> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> +                                  SpaprMachineState *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args,
> +                                  uint32_t nret, target_ulong rets)
> +{
> +    uint64_t rtas_addr = spapr_get_rtas_addr();
> +
> +    if (!rtas_addr) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    if (!spapr->guest_machine_check_addr) {
> +        /* NMI register not called */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +    } else {
> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    }
> +}
> +
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>      }
>  }
>  
> +uint64_t spapr_get_rtas_addr(void)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int rtas_node;
> +    const struct fdt_property *rtas_addr_prop;
> +    void *fdt = spapr->fdt_blob;
> +    uint32_t rtas_addr;
> +
> +    /* fetch rtas addr from fdt */
> +    rtas_node = fdt_path_offset(fdt, "/rtas");
> +    if (rtas_node == 0) {
> +        return 0;
> +    }
> +
> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> +    if (!rtas_addr_prop) {
> +        return 0;
> +    }
> +
> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
> +    return (uint64_t)rtas_addr;
> +}
> +
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> @@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
>                          rtas_set_power_level);
>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>                          rtas_get_power_level);
> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> +                        rtas_ibm_nmi_register);
> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> +                        rtas_ibm_nmi_interlock);
>  }
>  
>  type_init(core_rtas_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 7e32f30..ec6f33e 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -187,6 +187,10 @@ struct SpaprMachineState {
>       * occurs during the unplug process. */
>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>  
> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> +    target_ulong guest_machine_check_addr;
> +    QemuCond mc_delivery_cond;
> +
>      /*< public >*/
>      char *kvm_type;
>      char *host_model;
> @@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> @@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> +uint64_t spapr_get_rtas_addr(void);
>  #endif /* HW_SPAPR_H */
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 2/6] Wrapper function to wait on condition for the main loop mutex
@ 2019-04-23  6:47     ` David Gibson
  0 siblings, 0 replies; 65+ messages in thread
From: David Gibson @ 2019-04-23  6:47 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-ppc, qemu-devel, paulus, mahesh

[-- Attachment #1: Type: text/plain, Size: 1867 bytes --]

On Mon, Apr 22, 2019 at 12:33:07PM +0530, Aravinda Prasad wrote:
> Introduce a wrapper function to wait on condition for
> the main loop mutex. This function atomically releases
> the main loop mutex and causes the calling thread to
> block on the condition. This wrapper is required because
> qemu_global_mutex is a static variable.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  cpus.c                   |    5 +++++
>  include/qemu/main-loop.h |    8 ++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/cpus.c b/cpus.c
> index e83f72b..d9379e7 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1858,6 +1858,11 @@ void qemu_mutex_unlock_iothread(void)
>      qemu_mutex_unlock(&qemu_global_mutex);
>  }
>  
> +void qemu_cond_wait_iothread(QemuCond *cond)
> +{
> +    qemu_cond_wait(cond, &qemu_global_mutex);
> +}
> +
>  static bool all_vcpus_paused(void)
>  {
>      CPUState *cpu;
> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> index f6ba78e..a6d20b0 100644
> --- a/include/qemu/main-loop.h
> +++ b/include/qemu/main-loop.h
> @@ -295,6 +295,14 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
>   */
>  void qemu_mutex_unlock_iothread(void);
>  
> +/*
> + * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
> + *
> + * This function atomically releases the main loop mutex and causes
> + * the calling thread to block on the condition.
> + */
> +void qemu_cond_wait_iothread(QemuCond *cond);
> +
>  /* internal interfaces */
>  
>  void qemu_fd_register(int fd);
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 2/6] Wrapper function to wait on condition for the main loop mutex
@ 2019-04-23  6:47     ` David Gibson
  0 siblings, 0 replies; 65+ messages in thread
From: David Gibson @ 2019-04-23  6:47 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, qemu-ppc, aik, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1867 bytes --]

On Mon, Apr 22, 2019 at 12:33:07PM +0530, Aravinda Prasad wrote:
> Introduce a wrapper function to wait on condition for
> the main loop mutex. This function atomically releases
> the main loop mutex and causes the calling thread to
> block on the condition. This wrapper is required because
> qemu_global_mutex is a static variable.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  cpus.c                   |    5 +++++
>  include/qemu/main-loop.h |    8 ++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/cpus.c b/cpus.c
> index e83f72b..d9379e7 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1858,6 +1858,11 @@ void qemu_mutex_unlock_iothread(void)
>      qemu_mutex_unlock(&qemu_global_mutex);
>  }
>  
> +void qemu_cond_wait_iothread(QemuCond *cond)
> +{
> +    qemu_cond_wait(cond, &qemu_global_mutex);
> +}
> +
>  static bool all_vcpus_paused(void)
>  {
>      CPUState *cpu;
> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> index f6ba78e..a6d20b0 100644
> --- a/include/qemu/main-loop.h
> +++ b/include/qemu/main-loop.h
> @@ -295,6 +295,14 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
>   */
>  void qemu_mutex_unlock_iothread(void);
>  
> +/*
> + * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
> + *
> + * This function atomically releases the main loop mutex and causes
> + * the calling thread to block on the condition.
> + */
> +void qemu_cond_wait_iothread(QemuCond *cond);
> +
>  /* internal interfaces */
>  
>  void qemu_fd_register(int fd);
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
@ 2019-04-23  6:53     ` David Gibson
  0 siblings, 0 replies; 65+ messages in thread
From: David Gibson @ 2019-04-23  6:53 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-ppc, qemu-devel, paulus, mahesh

[-- Attachment #1: Type: text/plain, Size: 7343 bytes --]

On Mon, Apr 22, 2019 at 12:33:16PM +0530, Aravinda Prasad wrote:
> Memory error such as bit flips that cannot be corrected
> by hardware are passed on to the kernel for handling.
> If the memory address in error belongs to guest then
> the guest kernel is responsible for taking suitable action.
> Patch [1] enhances KVM to exit guest with exit reason
> set to KVM_EXIT_NMI in such cases. This patch handles
> KVM_EXIT_NMI exit.
> 
> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>     (e20bbd3d and related commits)
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

LGTM, apart from one detail noted below.

> ---
>  hw/ppc/spapr.c          |    3 +++
>  hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
>  hw/ppc/spapr_rtas.c     |    5 +++++
>  include/hw/ppc/spapr.h  |    6 ++++++
>  target/ppc/kvm.c        |   16 ++++++++++++++++
>  target/ppc/kvm_ppc.h    |    2 ++
>  target/ppc/trace-events |    2 ++
>  7 files changed, 56 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6642cb5..2779efe 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
>  
>      spapr->cas_reboot = false;
>  
> +    spapr->mc_status = -1;
>      spapr->guest_machine_check_addr = -1;
>  
>      /* Signal all vCPUs waiting on this condition */
> @@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
>      .minimum_version_id = 1,
>      .fields = (VMStateField[]) {
>          VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> +        VMSTATE_INT32(mc_status, SpaprMachineState),

So, technically this is a breaking change to the migration stream.  If
this is applied immediately after the earlier patch introducing the
subsection it would be ok in practice, but it would still be
preferable to make all the migration stream changes together.

>          VMSTATE_END_OF_LIST()
>      },
>  };
> @@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
>          kvmppc_spapr_enable_inkernel_multitce();
>      }
>  
> +    spapr->mc_status = -1;
>      qemu_cond_init(&spapr->mc_delivery_cond);
>  }
>  
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index ae0f093..9922a23 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +
> +    while (spapr->mc_status != -1) {
> +        /*
> +         * Check whether the same CPU got machine check error
> +         * while still handling the mc error (i.e., before
> +         * that CPU called "ibm,nmi-interlock"
> +         */
> +        if (spapr->mc_status == cpu->vcpu_id) {
> +            qemu_system_guest_panicked(NULL);
> +        }
> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> +        /* Meanwhile if the system is reset, then just return */
> +        if (spapr->guest_machine_check_addr == -1) {
> +            return;
> +        }
> +    }
> +    spapr->mc_status = cpu->vcpu_id;
> +}
> +
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                              uint32_t token, uint32_t nargs,
>                              target_ulong args,
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index c2f3991..d3499f9 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>          /* NMI register not called */
>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>      } else {
> +        /*
> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> +         * hence unset mc_status.
> +         */
> +        spapr->mc_status = -1;
>          qemu_cond_signal(&spapr->mc_delivery_cond);
>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>      }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ec6f33e..f7204d0 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -189,6 +189,11 @@ struct SpaprMachineState {
>  
>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>      target_ulong guest_machine_check_addr;
> +    /*
> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
> +     * handling the mc.
> +     */
> +    int mc_status;
>      QemuCond mc_delivery_cond;
>  
>      /*< public >*/
> @@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 9e86db0..5eedce8 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ret = 0;
>          break;
>  
> +    case KVM_EXIT_NMI:
> +        trace_kvm_handle_nmi_exception();
> +        ret = kvm_handle_nmi(cpu, run);
> +        break;
> +
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> @@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>      return data & 0xffff;
>  }
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
> +{
> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    spapr_mce_req_event(cpu, recovered);
> +
> +    return 0;
> +}
> +
>  int kvmppc_enable_hwrng(void)
>  {
>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 2238513..6edc42f 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
> +
>  #else
>  
>  static inline uint32_t kvmppc_get_tbfreq(void)
> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
> index 7b3cfe1..d5691d2 100644
> --- a/target/ppc/trace-events
> +++ b/target/ppc/trace-events
> @@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>  kvm_handle_epr(void) "handle epr"
>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>  kvm_handle_debug_exception(void) "handle debug exception"
> +kvm_handle_nmi_exception(void) "handle NMI exception"
> +
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
@ 2019-04-23  6:53     ` David Gibson
  0 siblings, 0 replies; 65+ messages in thread
From: David Gibson @ 2019-04-23  6:53 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, qemu-ppc, aik, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 7343 bytes --]

On Mon, Apr 22, 2019 at 12:33:16PM +0530, Aravinda Prasad wrote:
> Memory error such as bit flips that cannot be corrected
> by hardware are passed on to the kernel for handling.
> If the memory address in error belongs to guest then
> the guest kernel is responsible for taking suitable action.
> Patch [1] enhances KVM to exit guest with exit reason
> set to KVM_EXIT_NMI in such cases. This patch handles
> KVM_EXIT_NMI exit.
> 
> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>     (e20bbd3d and related commits)
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

LGTM, apart from one detail noted below.

> ---
>  hw/ppc/spapr.c          |    3 +++
>  hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
>  hw/ppc/spapr_rtas.c     |    5 +++++
>  include/hw/ppc/spapr.h  |    6 ++++++
>  target/ppc/kvm.c        |   16 ++++++++++++++++
>  target/ppc/kvm_ppc.h    |    2 ++
>  target/ppc/trace-events |    2 ++
>  7 files changed, 56 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6642cb5..2779efe 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
>  
>      spapr->cas_reboot = false;
>  
> +    spapr->mc_status = -1;
>      spapr->guest_machine_check_addr = -1;
>  
>      /* Signal all vCPUs waiting on this condition */
> @@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
>      .minimum_version_id = 1,
>      .fields = (VMStateField[]) {
>          VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> +        VMSTATE_INT32(mc_status, SpaprMachineState),

So, technically this is a breaking change to the migration stream.  If
this is applied immediately after the earlier patch introducing the
subsection it would be ok in practice, but it would still be
preferable to make all the migration stream changes together.

>          VMSTATE_END_OF_LIST()
>      },
>  };
> @@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
>          kvmppc_spapr_enable_inkernel_multitce();
>      }
>  
> +    spapr->mc_status = -1;
>      qemu_cond_init(&spapr->mc_delivery_cond);
>  }
>  
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index ae0f093..9922a23 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +
> +    while (spapr->mc_status != -1) {
> +        /*
> +         * Check whether the same CPU got machine check error
> +         * while still handling the mc error (i.e., before
> +         * that CPU called "ibm,nmi-interlock"
> +         */
> +        if (spapr->mc_status == cpu->vcpu_id) {
> +            qemu_system_guest_panicked(NULL);
> +        }
> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> +        /* Meanwhile if the system is reset, then just return */
> +        if (spapr->guest_machine_check_addr == -1) {
> +            return;
> +        }
> +    }
> +    spapr->mc_status = cpu->vcpu_id;
> +}
> +
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                              uint32_t token, uint32_t nargs,
>                              target_ulong args,
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index c2f3991..d3499f9 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>          /* NMI register not called */
>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>      } else {
> +        /*
> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> +         * hence unset mc_status.
> +         */
> +        spapr->mc_status = -1;
>          qemu_cond_signal(&spapr->mc_delivery_cond);
>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>      }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ec6f33e..f7204d0 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -189,6 +189,11 @@ struct SpaprMachineState {
>  
>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>      target_ulong guest_machine_check_addr;
> +    /*
> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
> +     * handling the mc.
> +     */
> +    int mc_status;
>      QemuCond mc_delivery_cond;
>  
>      /*< public >*/
> @@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 9e86db0..5eedce8 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ret = 0;
>          break;
>  
> +    case KVM_EXIT_NMI:
> +        trace_kvm_handle_nmi_exception();
> +        ret = kvm_handle_nmi(cpu, run);
> +        break;
> +
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> @@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>      return data & 0xffff;
>  }
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
> +{
> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    spapr_mce_req_event(cpu, recovered);
> +
> +    return 0;
> +}
> +
>  int kvmppc_enable_hwrng(void)
>  {
>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 2238513..6edc42f 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
> +
>  #else
>  
>  static inline uint32_t kvmppc_get_tbfreq(void)
> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
> index 7b3cfe1..d5691d2 100644
> --- a/target/ppc/trace-events
> +++ b/target/ppc/trace-events
> @@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>  kvm_handle_epr(void) "handle epr"
>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>  kvm_handle_debug_exception(void) "handle debug exception"
> +kvm_handle_nmi_exception(void) "handle NMI exception"
> +
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
@ 2019-04-23 14:38     ` Fabiano Rosas
  0 siblings, 0 replies; 65+ messages in thread
From: Fabiano Rosas @ 2019-04-23 14:38 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, mahesh, aik, qemu-ppc, qemu-devel, david

Aravinda Prasad <aravinda@linux.vnet.ibm.com> writes:

> +    /*
> +     * Properly set bits in MSR before we invoke the handler.
> +     * SRR0/1, DAR and DSISR are properly set by KVM
> +     */
> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> +        msr |= (1ULL << MSR_LE);
> +    }
> +
> +    if (env->msr && (1ULL << MSR_SF)) {

Don't you mean & instead of &&?

> +        msr |= (1ULL << MSR_SF);
> +    }

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
@ 2019-04-23 14:38     ` Fabiano Rosas
  0 siblings, 0 replies; 65+ messages in thread
From: Fabiano Rosas @ 2019-04-23 14:38 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

Aravinda Prasad <aravinda@linux.vnet.ibm.com> writes:

> +    /*
> +     * Properly set bits in MSR before we invoke the handler.
> +     * SRR0/1, DAR and DSISR are properly set by KVM
> +     */
> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> +        msr |= (1ULL << MSR_LE);
> +    }
> +
> +    if (env->msr && (1ULL << MSR_SF)) {

Don't you mean & instead of &&?

> +        msr |= (1ULL << MSR_SF);
> +    }



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
@ 2019-04-24  4:50       ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-24  4:50 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-ppc, aik, qemu-devel, mahesh



On Tuesday 23 April 2019 12:23 PM, David Gibson wrote:
> On Mon, Apr 22, 2019 at 12:33:16PM +0530, Aravinda Prasad wrote:
>> Memory error such as bit flips that cannot be corrected
>> by hardware are passed on to the kernel for handling.
>> If the memory address in error belongs to guest then
>> the guest kernel is responsible for taking suitable action.
>> Patch [1] enhances KVM to exit guest with exit reason
>> set to KVM_EXIT_NMI in such cases. This patch handles
>> KVM_EXIT_NMI exit.
>>
>> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>>     (e20bbd3d and related commits)
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> 
> LGTM, apart from one detail noted below.
> 
>> ---
>>  hw/ppc/spapr.c          |    3 +++
>>  hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
>>  hw/ppc/spapr_rtas.c     |    5 +++++
>>  include/hw/ppc/spapr.h  |    6 ++++++
>>  target/ppc/kvm.c        |   16 ++++++++++++++++
>>  target/ppc/kvm_ppc.h    |    2 ++
>>  target/ppc/trace-events |    2 ++
>>  7 files changed, 56 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 6642cb5..2779efe 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
>>  
>>      spapr->cas_reboot = false;
>>  
>> +    spapr->mc_status = -1;
>>      spapr->guest_machine_check_addr = -1;
>>  
>>      /* Signal all vCPUs waiting on this condition */
>> @@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
>>      .minimum_version_id = 1,
>>      .fields = (VMStateField[]) {
>>          VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> 
> So, technically this is a breaking change to the migration stream.  If
> this is applied immediately after the earlier patch introducing the
> subsection it would be ok in practice, but it would still be
> preferable to make all the migration stream changes together.

Do you mean that all .fields entries to vmstate_spapr_machine_check
should be in a single patch?

Because this patch introduced the variable mc_status, I added it to
vmstate_spapr_machine_check.

Regards,
Aravinda

> 
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> @@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
>>          kvmppc_spapr_enable_inkernel_multitce();
>>      }
>>  
>> +    spapr->mc_status = -1;
>>      qemu_cond_init(&spapr->mc_delivery_cond);
>>  }
>>  
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index ae0f093..9922a23 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>  }
>>  
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +
>> +    while (spapr->mc_status != -1) {
>> +        /*
>> +         * Check whether the same CPU got machine check error
>> +         * while still handling the mc error (i.e., before
>> +         * that CPU called "ibm,nmi-interlock"
>> +         */
>> +        if (spapr->mc_status == cpu->vcpu_id) {
>> +            qemu_system_guest_panicked(NULL);
>> +        }
>> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
>> +        /* Meanwhile if the system is reset, then just return */
>> +        if (spapr->guest_machine_check_addr == -1) {
>> +            return;
>> +        }
>> +    }
>> +    spapr->mc_status = cpu->vcpu_id;
>> +}
>> +
>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                              uint32_t token, uint32_t nargs,
>>                              target_ulong args,
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index c2f3991..d3499f9 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>          /* NMI register not called */
>>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>      } else {
>> +        /*
>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>> +         * hence unset mc_status.
>> +         */
>> +        spapr->mc_status = -1;
>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>      }
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index ec6f33e..f7204d0 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -189,6 +189,11 @@ struct SpaprMachineState {
>>  
>>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>>      target_ulong guest_machine_check_addr;
>> +    /*
>> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
>> +     * handling the mc.
>> +     */
>> +    int mc_status;
>>      QemuCond mc_delivery_cond;
>>  
>>      /*< public >*/
>> @@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>>  int spapr_max_server_number(SpaprMachineState *spapr);
>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                        uint64_t pte0, uint64_t pte1);
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>  
>>  /* DRC callbacks. */
>>  void spapr_core_release(DeviceState *dev);
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 9e86db0..5eedce8 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>          ret = 0;
>>          break;
>>  
>> +    case KVM_EXIT_NMI:
>> +        trace_kvm_handle_nmi_exception();
>> +        ret = kvm_handle_nmi(cpu, run);
>> +        break;
>> +
>>      default:
>>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>>          ret = -1;
>> @@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>      return data & 0xffff;
>>  }
>>  
>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>> +{
>> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
>> +
>> +    cpu_synchronize_state(CPU(cpu));
>> +
>> +    spapr_mce_req_event(cpu, recovered);
>> +
>> +    return 0;
>> +}
>> +
>>  int kvmppc_enable_hwrng(void)
>>  {
>>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index 2238513..6edc42f 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
>> @@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>>  
>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
>> +
>>  #else
>>  
>>  static inline uint32_t kvmppc_get_tbfreq(void)
>> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
>> index 7b3cfe1..d5691d2 100644
>> --- a/target/ppc/trace-events
>> +++ b/target/ppc/trace-events
>> @@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>>  kvm_handle_epr(void) "handle epr"
>>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>>  kvm_handle_debug_exception(void) "handle debug exception"
>> +kvm_handle_nmi_exception(void) "handle NMI exception"
>> +
>>
> 

-- 
Regards,
Aravinda

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
@ 2019-04-24  4:50       ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-24  4:50 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, aik, qemu-ppc, qemu-devel



On Tuesday 23 April 2019 12:23 PM, David Gibson wrote:
> On Mon, Apr 22, 2019 at 12:33:16PM +0530, Aravinda Prasad wrote:
>> Memory error such as bit flips that cannot be corrected
>> by hardware are passed on to the kernel for handling.
>> If the memory address in error belongs to guest then
>> the guest kernel is responsible for taking suitable action.
>> Patch [1] enhances KVM to exit guest with exit reason
>> set to KVM_EXIT_NMI in such cases. This patch handles
>> KVM_EXIT_NMI exit.
>>
>> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>>     (e20bbd3d and related commits)
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> 
> LGTM, apart from one detail noted below.
> 
>> ---
>>  hw/ppc/spapr.c          |    3 +++
>>  hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
>>  hw/ppc/spapr_rtas.c     |    5 +++++
>>  include/hw/ppc/spapr.h  |    6 ++++++
>>  target/ppc/kvm.c        |   16 ++++++++++++++++
>>  target/ppc/kvm_ppc.h    |    2 ++
>>  target/ppc/trace-events |    2 ++
>>  7 files changed, 56 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 6642cb5..2779efe 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
>>  
>>      spapr->cas_reboot = false;
>>  
>> +    spapr->mc_status = -1;
>>      spapr->guest_machine_check_addr = -1;
>>  
>>      /* Signal all vCPUs waiting on this condition */
>> @@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
>>      .minimum_version_id = 1,
>>      .fields = (VMStateField[]) {
>>          VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> 
> So, technically this is a breaking change to the migration stream.  If
> this is applied immediately after the earlier patch introducing the
> subsection it would be ok in practice, but it would still be
> preferable to make all the migration stream changes together.

Do you mean that all .fields entries to vmstate_spapr_machine_check
should be in a single patch?

Because this patch introduced the variable mc_status, I added it to
vmstate_spapr_machine_check.

Regards,
Aravinda

> 
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> @@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
>>          kvmppc_spapr_enable_inkernel_multitce();
>>      }
>>  
>> +    spapr->mc_status = -1;
>>      qemu_cond_init(&spapr->mc_delivery_cond);
>>  }
>>  
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index ae0f093..9922a23 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>  }
>>  
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +
>> +    while (spapr->mc_status != -1) {
>> +        /*
>> +         * Check whether the same CPU got machine check error
>> +         * while still handling the mc error (i.e., before
>> +         * that CPU called "ibm,nmi-interlock"
>> +         */
>> +        if (spapr->mc_status == cpu->vcpu_id) {
>> +            qemu_system_guest_panicked(NULL);
>> +        }
>> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
>> +        /* Meanwhile if the system is reset, then just return */
>> +        if (spapr->guest_machine_check_addr == -1) {
>> +            return;
>> +        }
>> +    }
>> +    spapr->mc_status = cpu->vcpu_id;
>> +}
>> +
>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                              uint32_t token, uint32_t nargs,
>>                              target_ulong args,
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index c2f3991..d3499f9 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>          /* NMI register not called */
>>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>      } else {
>> +        /*
>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>> +         * hence unset mc_status.
>> +         */
>> +        spapr->mc_status = -1;
>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>      }
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index ec6f33e..f7204d0 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -189,6 +189,11 @@ struct SpaprMachineState {
>>  
>>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>>      target_ulong guest_machine_check_addr;
>> +    /*
>> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
>> +     * handling the mc.
>> +     */
>> +    int mc_status;
>>      QemuCond mc_delivery_cond;
>>  
>>      /*< public >*/
>> @@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>>  int spapr_max_server_number(SpaprMachineState *spapr);
>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                        uint64_t pte0, uint64_t pte1);
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>  
>>  /* DRC callbacks. */
>>  void spapr_core_release(DeviceState *dev);
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 9e86db0..5eedce8 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>          ret = 0;
>>          break;
>>  
>> +    case KVM_EXIT_NMI:
>> +        trace_kvm_handle_nmi_exception();
>> +        ret = kvm_handle_nmi(cpu, run);
>> +        break;
>> +
>>      default:
>>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>>          ret = -1;
>> @@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>      return data & 0xffff;
>>  }
>>  
>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>> +{
>> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
>> +
>> +    cpu_synchronize_state(CPU(cpu));
>> +
>> +    spapr_mce_req_event(cpu, recovered);
>> +
>> +    return 0;
>> +}
>> +
>>  int kvmppc_enable_hwrng(void)
>>  {
>>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index 2238513..6edc42f 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
>> @@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>>  
>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
>> +
>>  #else
>>  
>>  static inline uint32_t kvmppc_get_tbfreq(void)
>> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
>> index 7b3cfe1..d5691d2 100644
>> --- a/target/ppc/trace-events
>> +++ b/target/ppc/trace-events
>> @@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>>  kvm_handle_epr(void) "handle epr"
>>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>>  kvm_handle_debug_exception(void) "handle debug exception"
>> +kvm_handle_nmi_exception(void) "handle NMI exception"
>> +
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
@ 2019-04-24  4:51       ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-24  4:51 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: aik, mahesh, qemu-devel, paulus, qemu-ppc, david



On Tuesday 23 April 2019 08:08 PM, Fabiano Rosas wrote:
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> writes:
> 
>> +    /*
>> +     * Properly set bits in MSR before we invoke the handler.
>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>> +     */
>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>> +        msr |= (1ULL << MSR_LE);
>> +    }
>> +
>> +    if (env->msr && (1ULL << MSR_SF)) {
> 
> Don't you mean & instead of &&?

Ah.. yes.. Thanks for pointing out.

> 
>> +        msr |= (1ULL << MSR_SF);
>> +    }
> 
> 

-- 
Regards,
Aravinda

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
@ 2019-04-24  4:51       ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-24  4:51 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Tuesday 23 April 2019 08:08 PM, Fabiano Rosas wrote:
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> writes:
> 
>> +    /*
>> +     * Properly set bits in MSR before we invoke the handler.
>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>> +     */
>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>> +        msr |= (1ULL << MSR_LE);
>> +    }
>> +
>> +    if (env->msr && (1ULL << MSR_SF)) {
> 
> Don't you mean & instead of &&?

Ah.. yes.. Thanks for pointing out.

> 
>> +        msr |= (1ULL << MSR_SF);
>> +    }
> 
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
@ 2019-04-25  4:56       ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-25  4:56 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-ppc, qemu-devel, paulus, mahesh



On Tuesday 23 April 2019 12:15 PM, David Gibson wrote:
> On Mon, Apr 22, 2019 at 12:32:58PM +0530, Aravinda Prasad wrote:
>> This patch adds support in QEMU to handle "ibm,nmi-register"
>> and "ibm,nmi-interlock" RTAS calls.
>>
>> The machine check notification address is saved when the
>> OS issues "ibm,nmi-register" RTAS call.
>>
>> This patch also handles the case when multiple processors
>> experience machine check at or about the same time by
>> handling "ibm,nmi-interlock" call. In such cases, as per
>> PAPR, subsequent processors serialize waiting for the first
>> processor to issue the "ibm,nmi-interlock" call. The second
>> processor that also received a machine check error waits
>> till the first processor is done reading the error log.
>> The first processor issues "ibm,nmi-interlock" call
>> when the error log is consumed. This patch implements the
>> releasing part of the error-log while subsequent patch
>> (which builds error log) handles the locking part.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Although I wonder if it needs to be moved later in the series to avoid
> advertising the availability of the RTAS calls to the guest before all
> the prereq patches are in place to make them work properly.

Patch 3 and 4 uses "guest_machine_check_addr", which is set in this
patch. If we push this beyond patch 4, then I feel the use of
"guest_machine_check_addr" looks odd in patch 3 and 4.

Regards,
Aravinda

> 
>> ---
>>  hw/ppc/spapr.c         |   18 ++++++++++++++
>>  hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |    9 ++++++-
>>  3 files changed, 87 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index c56939a..6642cb5 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
>>      first_ppc_cpu->env.gpr[5] = 0;
>>  
>>      spapr->cas_reboot = false;
>> +
>> +    spapr->guest_machine_check_addr = -1;
>> +
>> +    /* Signal all vCPUs waiting on this condition */
>> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>>  }
>>  
>>  static void spapr_create_nvram(SpaprMachineState *spapr)
>> @@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
>>      },
>>  };
>>  
>> +static const VMStateDescription vmstate_spapr_machine_check = {
>> +    .name = "spapr_machine_check",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  static const VMStateDescription vmstate_spapr = {
>>      .name = "spapr",
>>      .version_id = 3,
>> @@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
>>          &vmstate_spapr_dtb,
>>          &vmstate_spapr_cap_large_decr,
>>          &vmstate_spapr_cap_ccf_assist,
>> +        &vmstate_spapr_machine_check,
>>          NULL
>>      }
>>  };
>> @@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
>>  
>>          kvmppc_spapr_enable_inkernel_multitce();
>>      }
>> +
>> +    qemu_cond_init(&spapr->mc_delivery_cond);
>>  }
>>  
>>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index ee24212..c2f3991 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>      rtas_st(rets, 1, 100);
>>  }
>>  
>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>> +                                  SpaprMachineState *spapr,
>> +                                  uint32_t token, uint32_t nargs,
>> +                                  target_ulong args,
>> +                                  uint32_t nret, target_ulong rets)
>> +{
>> +    uint64_t rtas_addr = spapr_get_rtas_addr();
>> +
>> +    if (!rtas_addr) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
>> +    if (!spapr->guest_machine_check_addr) {
>> +        /* NMI register not called */
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +    } else {
>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +    }
>> +}
>> +
>> +
>>  static struct rtas_call {
>>      const char *name;
>>      spapr_rtas_fn fn;
>> @@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>>      }
>>  }
>>  
>> +uint64_t spapr_get_rtas_addr(void)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int rtas_node;
>> +    const struct fdt_property *rtas_addr_prop;
>> +    void *fdt = spapr->fdt_blob;
>> +    uint32_t rtas_addr;
>> +
>> +    /* fetch rtas addr from fdt */
>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>> +    if (rtas_node == 0) {
>> +        return 0;
>> +    }
>> +
>> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
>> +    if (!rtas_addr_prop) {
>> +        return 0;
>> +    }
>> +
>> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
>> +    return (uint64_t)rtas_addr;
>> +}
>> +
>> +
>>  static void core_rtas_register_types(void)
>>  {
>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>> @@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
>>                          rtas_set_power_level);
>>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>                          rtas_get_power_level);
>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>> +                        rtas_ibm_nmi_register);
>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>> +                        rtas_ibm_nmi_interlock);
>>  }
>>  
>>  type_init(core_rtas_register_types)
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 7e32f30..ec6f33e 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -187,6 +187,10 @@ struct SpaprMachineState {
>>       * occurs during the unplug process. */
>>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>>  
>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>> +    target_ulong guest_machine_check_addr;
>> +    QemuCond mc_delivery_cond;
>> +
>>      /*< public >*/
>>      char *kvm_type;
>>      char *host_model;
>> @@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>  
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>  
>>  /* RTAS ibm,get-system-parameter token values */
>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>> @@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>  
>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>> +uint64_t spapr_get_rtas_addr(void);
>>  #endif /* HW_SPAPR_H */
>>
> 

-- 
Regards,
Aravinda

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
@ 2019-04-25  4:56       ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-04-25  4:56 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-ppc, aik, qemu-devel



On Tuesday 23 April 2019 12:15 PM, David Gibson wrote:
> On Mon, Apr 22, 2019 at 12:32:58PM +0530, Aravinda Prasad wrote:
>> This patch adds support in QEMU to handle "ibm,nmi-register"
>> and "ibm,nmi-interlock" RTAS calls.
>>
>> The machine check notification address is saved when the
>> OS issues "ibm,nmi-register" RTAS call.
>>
>> This patch also handles the case when multiple processors
>> experience machine check at or about the same time by
>> handling "ibm,nmi-interlock" call. In such cases, as per
>> PAPR, subsequent processors serialize waiting for the first
>> processor to issue the "ibm,nmi-interlock" call. The second
>> processor that also received a machine check error waits
>> till the first processor is done reading the error log.
>> The first processor issues "ibm,nmi-interlock" call
>> when the error log is consumed. This patch implements the
>> releasing part of the error-log while subsequent patch
>> (which builds error log) handles the locking part.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Although I wonder if it needs to be moved later in the series to avoid
> advertising the availability of the RTAS calls to the guest before all
> the prereq patches are in place to make them work properly.

Patch 3 and 4 uses "guest_machine_check_addr", which is set in this
patch. If we push this beyond patch 4, then I feel the use of
"guest_machine_check_addr" looks odd in patch 3 and 4.

Regards,
Aravinda

> 
>> ---
>>  hw/ppc/spapr.c         |   18 ++++++++++++++
>>  hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |    9 ++++++-
>>  3 files changed, 87 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index c56939a..6642cb5 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
>>      first_ppc_cpu->env.gpr[5] = 0;
>>  
>>      spapr->cas_reboot = false;
>> +
>> +    spapr->guest_machine_check_addr = -1;
>> +
>> +    /* Signal all vCPUs waiting on this condition */
>> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>>  }
>>  
>>  static void spapr_create_nvram(SpaprMachineState *spapr)
>> @@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
>>      },
>>  };
>>  
>> +static const VMStateDescription vmstate_spapr_machine_check = {
>> +    .name = "spapr_machine_check",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  static const VMStateDescription vmstate_spapr = {
>>      .name = "spapr",
>>      .version_id = 3,
>> @@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
>>          &vmstate_spapr_dtb,
>>          &vmstate_spapr_cap_large_decr,
>>          &vmstate_spapr_cap_ccf_assist,
>> +        &vmstate_spapr_machine_check,
>>          NULL
>>      }
>>  };
>> @@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
>>  
>>          kvmppc_spapr_enable_inkernel_multitce();
>>      }
>> +
>> +    qemu_cond_init(&spapr->mc_delivery_cond);
>>  }
>>  
>>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index ee24212..c2f3991 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>      rtas_st(rets, 1, 100);
>>  }
>>  
>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>> +                                  SpaprMachineState *spapr,
>> +                                  uint32_t token, uint32_t nargs,
>> +                                  target_ulong args,
>> +                                  uint32_t nret, target_ulong rets)
>> +{
>> +    uint64_t rtas_addr = spapr_get_rtas_addr();
>> +
>> +    if (!rtas_addr) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
>> +    if (!spapr->guest_machine_check_addr) {
>> +        /* NMI register not called */
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +    } else {
>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +    }
>> +}
>> +
>> +
>>  static struct rtas_call {
>>      const char *name;
>>      spapr_rtas_fn fn;
>> @@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>>      }
>>  }
>>  
>> +uint64_t spapr_get_rtas_addr(void)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int rtas_node;
>> +    const struct fdt_property *rtas_addr_prop;
>> +    void *fdt = spapr->fdt_blob;
>> +    uint32_t rtas_addr;
>> +
>> +    /* fetch rtas addr from fdt */
>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>> +    if (rtas_node == 0) {
>> +        return 0;
>> +    }
>> +
>> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
>> +    if (!rtas_addr_prop) {
>> +        return 0;
>> +    }
>> +
>> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
>> +    return (uint64_t)rtas_addr;
>> +}
>> +
>> +
>>  static void core_rtas_register_types(void)
>>  {
>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>> @@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
>>                          rtas_set_power_level);
>>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>                          rtas_get_power_level);
>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>> +                        rtas_ibm_nmi_register);
>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>> +                        rtas_ibm_nmi_interlock);
>>  }
>>  
>>  type_init(core_rtas_register_types)
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 7e32f30..ec6f33e 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -187,6 +187,10 @@ struct SpaprMachineState {
>>       * occurs during the unplug process. */
>>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>>  
>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>> +    target_ulong guest_machine_check_addr;
>> +    QemuCond mc_delivery_cond;
>> +
>>      /*< public >*/
>>      char *kvm_type;
>>      char *host_model;
>> @@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>  
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>  
>>  /* RTAS ibm,get-system-parameter token values */
>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>> @@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>  
>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>> +uint64_t spapr_get_rtas_addr(void);
>>  #endif /* HW_SPAPR_H */
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
  2019-04-24  4:50       ` Aravinda Prasad
  (?)
@ 2019-05-10  6:37       ` David Gibson
  2019-05-10  6:58         ` Aravinda Prasad
  -1 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-10  6:37 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, aik, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 8484 bytes --]

On Wed, Apr 24, 2019 at 10:20:42AM +0530, Aravinda Prasad wrote:
65;5601;1c> 
> 
> On Tuesday 23 April 2019 12:23 PM, David Gibson wrote:
> > On Mon, Apr 22, 2019 at 12:33:16PM +0530, Aravinda Prasad wrote:
> >> Memory error such as bit flips that cannot be corrected
> >> by hardware are passed on to the kernel for handling.
> >> If the memory address in error belongs to guest then
> >> the guest kernel is responsible for taking suitable action.
> >> Patch [1] enhances KVM to exit guest with exit reason
> >> set to KVM_EXIT_NMI in such cases. This patch handles
> >> KVM_EXIT_NMI exit.
> >>
> >> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
> >>     (e20bbd3d and related commits)
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > 
> > LGTM, apart from one detail noted below.
> > 
> >> ---
> >>  hw/ppc/spapr.c          |    3 +++
> >>  hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
> >>  hw/ppc/spapr_rtas.c     |    5 +++++
> >>  include/hw/ppc/spapr.h  |    6 ++++++
> >>  target/ppc/kvm.c        |   16 ++++++++++++++++
> >>  target/ppc/kvm_ppc.h    |    2 ++
> >>  target/ppc/trace-events |    2 ++
> >>  7 files changed, 56 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 6642cb5..2779efe 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
> >>  
> >>      spapr->cas_reboot = false;
> >>  
> >> +    spapr->mc_status = -1;
> >>      spapr->guest_machine_check_addr = -1;
> >>  
> >>      /* Signal all vCPUs waiting on this condition */
> >> @@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
> >>      .minimum_version_id = 1,
> >>      .fields = (VMStateField[]) {
> >>          VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> >> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> > 
> > So, technically this is a breaking change to the migration stream.  If
> > this is applied immediately after the earlier patch introducing the
> > subsection it would be ok in practice, but it would still be
> > preferable to make all the migration stream changes together.
> 
> Do you mean that all .fields entries to vmstate_spapr_machine_check
> should be in a single patch?

Yes, that's preferable.  If necessary you can move the migration
support out into its own patch which goes after the implementation of
the underlying state.

> 
> Because this patch introduced the variable mc_status, I added it to
> vmstate_spapr_machine_check.
> 
> Regards,
> Aravinda
> 
> > 
> >>          VMSTATE_END_OF_LIST()
> >>      },
> >>  };
> >> @@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
> >>          kvmppc_spapr_enable_inkernel_multitce();
> >>      }
> >>  
> >> +    spapr->mc_status = -1;
> >>      qemu_cond_init(&spapr->mc_delivery_cond);
> >>  }
> >>  
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index ae0f093..9922a23 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> >>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> >>  }
> >>  
> >> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >> +{
> >> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +
> >> +    while (spapr->mc_status != -1) {
> >> +        /*
> >> +         * Check whether the same CPU got machine check error
> >> +         * while still handling the mc error (i.e., before
> >> +         * that CPU called "ibm,nmi-interlock"
> >> +         */
> >> +        if (spapr->mc_status == cpu->vcpu_id) {
> >> +            qemu_system_guest_panicked(NULL);
> >> +        }
> >> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> >> +        /* Meanwhile if the system is reset, then just return */
> >> +        if (spapr->guest_machine_check_addr == -1) {
> >> +            return;
> >> +        }
> >> +    }
> >> +    spapr->mc_status = cpu->vcpu_id;
> >> +}
> >> +
> >>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>                              uint32_t token, uint32_t nargs,
> >>                              target_ulong args,
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index c2f3991..d3499f9 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>          /* NMI register not called */
> >>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>      } else {
> >> +        /*
> >> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> >> +         * hence unset mc_status.
> >> +         */
> >> +        spapr->mc_status = -1;
> >>          qemu_cond_signal(&spapr->mc_delivery_cond);
> >>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>      }
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index ec6f33e..f7204d0 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -189,6 +189,11 @@ struct SpaprMachineState {
> >>  
> >>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> >>      target_ulong guest_machine_check_addr;
> >> +    /*
> >> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
> >> +     * handling the mc.
> >> +     */
> >> +    int mc_status;
> >>      QemuCond mc_delivery_cond;
> >>  
> >>      /*< public >*/
> >> @@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
> >>  int spapr_max_server_number(SpaprMachineState *spapr);
> >>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> >>                        uint64_t pte0, uint64_t pte1);
> >> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> >>  
> >>  /* DRC callbacks. */
> >>  void spapr_core_release(DeviceState *dev);
> >> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> >> index 9e86db0..5eedce8 100644
> >> --- a/target/ppc/kvm.c
> >> +++ b/target/ppc/kvm.c
> >> @@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> >>          ret = 0;
> >>          break;
> >>  
> >> +    case KVM_EXIT_NMI:
> >> +        trace_kvm_handle_nmi_exception();
> >> +        ret = kvm_handle_nmi(cpu, run);
> >> +        break;
> >> +
> >>      default:
> >>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
> >>          ret = -1;
> >> @@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> >>      return data & 0xffff;
> >>  }
> >>  
> >> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
> >> +{
> >> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> >> +
> >> +    cpu_synchronize_state(CPU(cpu));
> >> +
> >> +    spapr_mce_req_event(cpu, recovered);
> >> +
> >> +    return 0;
> >> +}
> >> +
> >>  int kvmppc_enable_hwrng(void)
> >>  {
> >>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
> >> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> >> index 2238513..6edc42f 100644
> >> --- a/target/ppc/kvm_ppc.h
> >> +++ b/target/ppc/kvm_ppc.h
> >> @@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
> >>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
> >>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
> >>  
> >> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
> >> +
> >>  #else
> >>  
> >>  static inline uint32_t kvmppc_get_tbfreq(void)
> >> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
> >> index 7b3cfe1..d5691d2 100644
> >> --- a/target/ppc/trace-events
> >> +++ b/target/ppc/trace-events
> >> @@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
> >>  kvm_handle_epr(void) "handle epr"
> >>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
> >>  kvm_handle_debug_exception(void) "handle debug exception"
> >> +kvm_handle_nmi_exception(void) "handle NMI exception"
> >> +
> >>
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-04-22  7:03   ` Aravinda Prasad
  (?)
  (?)
@ 2019-05-10  6:42   ` David Gibson
  2019-05-10  7:05     ` Aravinda Prasad
  -1 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-10  6:42 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, qemu-ppc, aik, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 13866 bytes --]

On Mon, Apr 22, 2019 at 12:33:26PM +0530, Aravinda Prasad wrote:
> Upon a machine check exception (MCE) in a guest address space,
> KVM causes a guest exit to enable QEMU to build and pass the
> error to the guest in the PAPR defined rtas error log format.
> 
> This patch builds the rtas error log, copies it to the rtas_addr
> and then invokes the guest registered machine check handler. The
> handler in the guest takes suitable action(s) depending on the type
> and criticality of the error. For example, if an error is
> unrecoverable memory corruption in an application inside the
> guest, then the guest kernel sends a SIGBUS to the application.
> For recoverable errors, the guest performs recovery actions and
> logs the error.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    4 +
>  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    4 +
>  3 files changed, 253 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2779efe..ffd1715 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
>          error_report("Could not get size of LPAR rtas '%s'", filename);
>          exit(1);
>      }
> +
> +    /* Resize blob to accommodate error log. */
> +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
> +
>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>          error_report("Could not load LPAR rtas '%s'", filename);
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 9922a23..4032db0 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -212,6 +212,106 @@ struct hp_extended_log {
>      struct rtas_event_log_v6_hp hp;
>  } QEMU_PACKED;
>  
> +struct rtas_event_log_v6_mc {
> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> +    struct rtas_event_log_v6_section_header hdr;
> +    uint32_t fru_id;
> +    uint32_t proc_id;
> +    uint8_t error_type;
> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> +    uint8_t sub_err_type;
> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> +    uint8_t reserved_1[6];
> +    uint64_t effective_address;
> +    uint64_t logical_address;
> +} QEMU_PACKED;
> +
> +struct mc_extended_log {
> +    struct rtas_event_log_v6 v6hdr;
> +    struct rtas_event_log_v6_mc mc;
> +} QEMU_PACKED;
> +
> +struct MC_ierror_table {
> +    unsigned long srr1_mask;
> +    unsigned long srr1_value;
> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_ierror_table mc_ierror_table[] = {
> +{ 0x00000000081c0000, 0x0000000000040000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000080000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000100000, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000140000, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000180000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0, 0, 0, 0, 0, 0 } };
> +
> +struct MC_derror_table {
> +    unsigned long dsisr_value;
> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_derror_table mc_derror_table[] = {
> +{ 0x00008000, false,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00004000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000800, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000400, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000080, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000100, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0, false, 0, 0, 0, 0 } };
> +
> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> +
>  typedef enum EventClass {
>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>      EVENT_CLASS_EPOW                = 1,
> @@ -620,6 +720,147 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_size)
> +{
> +    g_assert(old_rtas_size < RTAS_ERRLOG_OFFSET);
> +    return RTAS_ERROR_LOG_MAX;
> +}
> +
> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> +                                        struct mc_extended_log *ext_elog)
> +{
> +    int i;
> +    CPUPPCState *env = &cpu->env;
> +    uint32_t summary;
> +    uint64_t dsisr = env->spr[SPR_DSISR];
> +
> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> +    if (recovered) {
> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> +    } else {
> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> +    }
> +
> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> +            if (mc_derror_table[i].dar_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> +            }
> +
> +            summary |= mc_derror_table[i].initiator
> +                        | mc_derror_table[i].severity;
> +
> +            return summary;
> +        }
> +    } else {
> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> +                    mc_ierror_table[i].srr1_value) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> +            if (mc_ierror_table[i].nip_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> +            }
> +
> +            summary |= mc_ierror_table[i].initiator
> +                        | mc_ierror_table[i].severity;
> +
> +            return summary;
> +        }
> +    }
> +
> +    summary |= RTAS_LOG_INITIATOR_CPU;
> +    return summary;
> +}
> +
> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    CPUState *cs = CPU(cpu);
> +    uint64_t rtas_addr;
> +    CPUPPCState *env = &cpu->env;
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +    target_ulong r3, msr = 0;
> +    struct rtas_error_log log;
> +    struct mc_extended_log *ext_elog;
> +    uint32_t summary;
> +
> +    /*
> +     * Properly set bits in MSR before we invoke the handler.
> +     * SRR0/1, DAR and DSISR are properly set by KVM
> +     */
> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> +        msr |= (1ULL << MSR_LE);
> +    }
> +
> +    if (env->msr && (1ULL << MSR_SF)) {
> +        msr |= (1ULL << MSR_SF);
> +    }
> +
> +    msr |= (1ULL << MSR_ME);
> +
> +    if (spapr->guest_machine_check_addr == -1) {
> +        /*
> +         * This implies that we have hit a machine check between system
> +         * reset and "ibm,nmi-register". Fall back to the old machine
> +         * check behavior in such cases.
> +         */
> +        env->spr[SPR_SRR0] = env->nip;
> +        env->spr[SPR_SRR1] = env->msr;
> +        env->msr = msr;
> +        env->nip = 0x200;
> +        return;
> +    }
> +
> +    ext_elog = g_malloc0(sizeof(struct mc_extended_log));
> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> +
> +    log.summary = cpu_to_be32(summary);
> +    log.extended_length = cpu_to_be32(sizeof(struct mc_extended_log));
> +
> +    /* r3 should be in BE always */
> +    r3 = cpu_to_be64(env->gpr[3]);
> +    env->msr = msr;
> +
> +    spapr_init_v6hdr(&ext_elog->v6hdr);
> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> +    ext_elog->mc.hdr.section_length =
> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> +    ext_elog->mc.hdr.section_version = 1;
> +
> +    /* get rtas addr from fdt */
> +    rtas_addr = spapr_get_rtas_addr();
> +    if (!rtas_addr) {
> +        /* Unable to fetch rtas_addr. Hence reset the guest */
> +        ppc_cpu_do_system_reset(cs);
> +    }
> +
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET, &r3, sizeof(r3));
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3),
> +                              &log, sizeof(log));
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3) +
> +                              sizeof(log), ext_elog,
> +                              sizeof(struct mc_extended_log));
> +
> +    /* Save gpr[3] in the guest endian mode */
> +    if ((*pcc->interrupts_big_endian)(cpu)) {
> +        env->gpr[3] = cpu_to_be64(rtas_addr + RTAS_ERRLOG_OFFSET);

I don't think this is right.  AIUI env->gpr[] are all stored in *host*
endianness (for ease of doing arithmetic).

> +    } else {
> +        env->gpr[3] = cpu_to_le64(rtas_addr + RTAS_ERRLOG_OFFSET);
> +    }
> +
> +    env->nip = spapr->guest_machine_check_addr;
> +}
> +
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> @@ -640,6 +881,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>          }
>      }
>      spapr->mc_status = cpu->vcpu_id;
> +
> +    spapr_mce_dispatch_elog(cpu, recovered);
> +
> +    return;
>  }
>  
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index f7204d0..03f34bf 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -661,6 +661,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
>  #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
>  
> +/* Offset from rtas-base where error log is placed */
> +#define RTAS_ERRLOG_OFFSET       0x25

Is this offset PAPR defined, or chosen here?  Using an entirely
unaliged (odd) address seems a very strange choice.

> +
>  static inline uint64_t ppc64_phys_to_real(uint64_t addr)
>  {
>      return addr & ~0xF000000000000000ULL;
> @@ -798,6 +801,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
  2019-04-22  7:03   ` Aravinda Prasad
  (?)
@ 2019-05-10  6:46   ` David Gibson
  2019-05-10  7:15     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
  -1 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-10  6:46 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, qemu-ppc, aik, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 7722 bytes --]

On Mon, Apr 22, 2019 at 12:33:35PM +0530, Aravinda Prasad wrote:
> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
> the KVM causes guest exit with NMI as exit reason
> when it encounters a machine check exception on the
> address belonging to a guest. Without this capability
> enabled, KVM redirects machine check exceptions to
> guest's 0x200 vector.
> 
> This patch also deals with the case when a guest with
> the KVM_CAP_PPC_FWNMI capability enabled is attempted
> to migrate to a host that does not support this
> capability.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    1 +
>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>  hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
>  include/hw/ppc/spapr.h |    4 +++-
>  target/ppc/kvm.c       |   14 ++++++++++++++
>  target/ppc/kvm_ppc.h   |    6 ++++++
>  6 files changed, 64 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ffd1715..44e09bb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>      spapr_caps_add_properties(smc, &error_abort);
>      smc->irq = &spapr_irq_xics;
>      smc->dr_phb_enabled = true;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index edc5ed0..5b3af04 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>      }
>  }
>  
> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> +                                Error **errp)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> +
> +    if (!val) {
> +        return; /* Disabled by default */
> +    }
> +
> +    if (kvm_enabled()) {
> +        if (kvmppc_fwnmi_enable(cpu)) {
> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
> +        }
> +    }
> +}
> +
>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>      [SPAPR_CAP_HTM] = {
>          .name = "htm",
> @@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>          .type = "bool",
>          .apply = cap_ccf_assist_apply,
>      },
> +    [SPAPR_CAP_FWNMI_MCE] = {
> +        .name = "fwnmi-mce",
> +        .description = "Handle fwnmi machine check exceptions",
> +        .index = SPAPR_CAP_FWNMI_MCE,
> +        .get = spapr_cap_get_bool,
> +        .set = spapr_cap_set_bool,
> +        .type = "bool",
> +        .apply = cap_fwnmi_mce_apply,
> +    },
>  };
>  
>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> @@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>  
>  void spapr_caps_init(SpaprMachineState *spapr)
>  {
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index d3499f9..997cf19 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -49,6 +49,7 @@
>  #include "hw/ppc/fdt.h"
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
> +#include "kvm_ppc.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>                                    target_ulong args,
>                                    uint32_t nret, target_ulong rets)
>  {
> +    int ret;
>      uint64_t rtas_addr = spapr_get_rtas_addr();
>  
>      if (!rtas_addr) {
> @@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>          return;
>      }
>  
> +    ret = kvmppc_fwnmi_enable(cpu);

You shouldn't need this here as well as in cap_fwnmi_mce_apply().

Instead, you should unconditionally fail the nmi-register if the
capability is not enabled.

> +    if (ret == 1) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    if (ret < 0) {
> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +        return;
> +    }
> +
>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 03f34bf..9d16ad1 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -78,8 +78,10 @@ typedef enum {
>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>  /* Count Cache Flush Assist HW Instruction */
>  #define SPAPR_CAP_CCF_ASSIST            0x09
> +/* FWNMI machine check handling */
> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>  /* Num Caps */
> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>  
>  /*
>   * Capability Values
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 5eedce8..9c7b71d 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -83,6 +83,7 @@ static int cap_ppc_safe_indirect_branch;
>  static int cap_ppc_count_cache_flush_assist;
>  static int cap_ppc_nested_kvm_hv;
>  static int cap_large_decr;
> +static int cap_ppc_fwnmi;
>  
>  static uint32_t debug_inst_opcode;
>  
> @@ -150,6 +151,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>      kvmppc_get_cpu_characteristics(s);
>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>      cap_large_decr = kvmppc_get_dec_bits();
> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>      /*
>       * Note: setting it to false because there is not such capability
>       * in KVM at this moment.
> @@ -2117,6 +2119,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>      }
>  }
>  
> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> +{
> +    CPUState *cs = CPU(cpu);
> +
> +    if (!cap_ppc_fwnmi) {
> +        return 1;
> +    }
> +
> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
> +}
> +
> +
>  int kvmppc_smt_threads(void)
>  {
>      return cap_ppc_smt ? cap_ppc_smt : 1;
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 6edc42f..28919d3 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -27,6 +27,7 @@ void kvmppc_enable_h_page_init(void);
>  void kvmppc_set_papr(PowerPCCPU *cpu);
>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
>  int kvmppc_smt_threads(void);
>  void kvmppc_hint_smt_possible(Error **errp);
>  int kvmppc_set_smt_threads(int smt);
> @@ -159,6 +160,11 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>  {
>  }
>  
> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> +{
> +    return 1;
> +}
> +
>  static inline int kvmppc_smt_threads(void)
>  {
>      return 1;
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 6/6] migration: Block migration while handling machine check
  2019-04-22  7:03   ` Aravinda Prasad
  (?)
@ 2019-05-10  6:51   ` David Gibson
  2019-05-10  7:16     ` Aravinda Prasad
  2019-05-29  5:46     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
  -1 siblings, 2 replies; 65+ messages in thread
From: David Gibson @ 2019-05-10  6:51 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, qemu-ppc, aik, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4037 bytes --]

On Mon, Apr 22, 2019 at 12:33:45PM +0530, Aravinda Prasad wrote:
> Block VM migration requests until the machine check
> error handling is complete as (i) these errors are
> specific to the source hardware and is irrelevant on
> the target hardware, (ii) these errors cause data
> corruption and should be handled before migration.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>  hw/ppc/spapr_rtas.c    |    4 ++++
>  include/hw/ppc/spapr.h |    3 +++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 4032db0..45b990c 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -41,6 +41,7 @@
>  #include "qemu/bcd.h"
>  #include "hw/ppc/spapr_ovec.h"
>  #include <libfdt.h>
> +#include "migration/blocker.h"
>  
>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>  #define   RTAS_LOG_VERSION_6                    0x06000000
> @@ -864,6 +865,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int ret;
> +    Error *local_err = NULL;
> +
> +    error_setg(&spapr->migration_blocker,
> +            "Live migration not supported during machine check handling");
> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
> +    if (ret < 0) {
> +        /*
> +         * We don't want to abort and let the migration to continue. In a
> +         * rare case, the machine check handler will run on the target
> +         * hardware. Though this is not preferable, it is better than aborting
> +         * the migration or killing the VM.
> +         */
> +        error_free(spapr->migration_blocker);
> +        fprintf(stderr, "Warning: Machine check during VM migration\n");

Use report_err() instead of a raw fprintf().

> +    }
>  
>      while (spapr->mc_status != -1) {
>          /*
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 997cf19..1229a0e 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -50,6 +50,7 @@
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
>  #include "kvm_ppc.h"
> +#include "migration/blocker.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -396,6 +397,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>          spapr->mc_status = -1;
>          qemu_cond_signal(&spapr->mc_delivery_cond);
>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +        migrate_del_blocker(spapr->migration_blocker);
> +        error_free(spapr->migration_blocker);
> +        spapr->migration_blocker = NULL;
>      }
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9d16ad1..dda5fd2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -10,6 +10,7 @@
>  #include "hw/ppc/spapr_irq.h"
>  #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
>  #include "hw/ppc/xics.h"        /* For ICSState */
> +#include "qapi/error.h"
>  
>  struct SpaprVioBus;
>  struct SpaprPhbState;
> @@ -213,6 +214,8 @@ struct SpaprMachineState {
>      SpaprCapabilities def, eff, mig;
>  
>      unsigned gpu_numa_id;
> +
> +    Error *migration_blocker;

This name doesn't seem good - it's specific to fwnmi, not any other
migration blockers we might have in future.  It also always contains
the same string - could you just initialize that in a global and just
do the migrate_add_blocker() / migrate_del_blocker() instead?

>  };
>  
>  #define H_SUCCESS         0
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
  2019-05-10  6:37       ` David Gibson
@ 2019-05-10  6:58         ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-10  6:58 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-ppc, aik, qemu-devel



On Friday 10 May 2019 12:07 PM, David Gibson wrote:
> On Wed, Apr 24, 2019 at 10:20:42AM +0530, Aravinda Prasad wrote:
> 65;5601;1c> 
>>
>> On Tuesday 23 April 2019 12:23 PM, David Gibson wrote:
>>> On Mon, Apr 22, 2019 at 12:33:16PM +0530, Aravinda Prasad wrote:
>>>> Memory error such as bit flips that cannot be corrected
>>>> by hardware are passed on to the kernel for handling.
>>>> If the memory address in error belongs to guest then
>>>> the guest kernel is responsible for taking suitable action.
>>>> Patch [1] enhances KVM to exit guest with exit reason
>>>> set to KVM_EXIT_NMI in such cases. This patch handles
>>>> KVM_EXIT_NMI exit.
>>>>
>>>> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>>>>     (e20bbd3d and related commits)
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>
>>> LGTM, apart from one detail noted below.
>>>
>>>> ---
>>>>  hw/ppc/spapr.c          |    3 +++
>>>>  hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
>>>>  hw/ppc/spapr_rtas.c     |    5 +++++
>>>>  include/hw/ppc/spapr.h  |    6 ++++++
>>>>  target/ppc/kvm.c        |   16 ++++++++++++++++
>>>>  target/ppc/kvm_ppc.h    |    2 ++
>>>>  target/ppc/trace-events |    2 ++
>>>>  7 files changed, 56 insertions(+)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 6642cb5..2779efe 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
>>>>  
>>>>      spapr->cas_reboot = false;
>>>>  
>>>> +    spapr->mc_status = -1;
>>>>      spapr->guest_machine_check_addr = -1;
>>>>  
>>>>      /* Signal all vCPUs waiting on this condition */
>>>> @@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
>>>>      .minimum_version_id = 1,
>>>>      .fields = (VMStateField[]) {
>>>>          VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>>>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
>>>
>>> So, technically this is a breaking change to the migration stream.  If
>>> this is applied immediately after the earlier patch introducing the
>>> subsection it would be ok in practice, but it would still be
>>> preferable to make all the migration stream changes together.
>>
>> Do you mean that all .fields entries to vmstate_spapr_machine_check
>> should be in a single patch?
> 
> Yes, that's preferable.  If necessary you can move the migration
> support out into its own patch which goes after the implementation of
> the underlying state.

Sure..

> 
>>
>> Because this patch introduced the variable mc_status, I added it to
>> vmstate_spapr_machine_check.
>>
>> Regards,
>> Aravinda
>>
>>>
>>>>          VMSTATE_END_OF_LIST()
>>>>      },
>>>>  };
>>>> @@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
>>>>          kvmppc_spapr_enable_inkernel_multitce();
>>>>      }
>>>>  
>>>> +    spapr->mc_status = -1;
>>>>      qemu_cond_init(&spapr->mc_delivery_cond);
>>>>  }
>>>>  
>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>> index ae0f093..9922a23 100644
>>>> --- a/hw/ppc/spapr_events.c
>>>> +++ b/hw/ppc/spapr_events.c
>>>> @@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>>>  }
>>>>  
>>>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>> +{
>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>> +
>>>> +    while (spapr->mc_status != -1) {
>>>> +        /*
>>>> +         * Check whether the same CPU got machine check error
>>>> +         * while still handling the mc error (i.e., before
>>>> +         * that CPU called "ibm,nmi-interlock"
>>>> +         */
>>>> +        if (spapr->mc_status == cpu->vcpu_id) {
>>>> +            qemu_system_guest_panicked(NULL);
>>>> +        }
>>>> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
>>>> +        /* Meanwhile if the system is reset, then just return */
>>>> +        if (spapr->guest_machine_check_addr == -1) {
>>>> +            return;
>>>> +        }
>>>> +    }
>>>> +    spapr->mc_status = cpu->vcpu_id;
>>>> +}
>>>> +
>>>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>                              uint32_t token, uint32_t nargs,
>>>>                              target_ulong args,
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index c2f3991..d3499f9 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>>          /* NMI register not called */
>>>>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>>      } else {
>>>> +        /*
>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>>>> +         * hence unset mc_status.
>>>> +         */
>>>> +        spapr->mc_status = -1;
>>>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>      }
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index ec6f33e..f7204d0 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -189,6 +189,11 @@ struct SpaprMachineState {
>>>>  
>>>>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>>>>      target_ulong guest_machine_check_addr;
>>>> +    /*
>>>> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
>>>> +     * handling the mc.
>>>> +     */
>>>> +    int mc_status;
>>>>      QemuCond mc_delivery_cond;
>>>>  
>>>>      /*< public >*/
>>>> @@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>>>>  int spapr_max_server_number(SpaprMachineState *spapr);
>>>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>>>                        uint64_t pte0, uint64_t pte1);
>>>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>>>  
>>>>  /* DRC callbacks. */
>>>>  void spapr_core_release(DeviceState *dev);
>>>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>>>> index 9e86db0..5eedce8 100644
>>>> --- a/target/ppc/kvm.c
>>>> +++ b/target/ppc/kvm.c
>>>> @@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>>>          ret = 0;
>>>>          break;
>>>>  
>>>> +    case KVM_EXIT_NMI:
>>>> +        trace_kvm_handle_nmi_exception();
>>>> +        ret = kvm_handle_nmi(cpu, run);
>>>> +        break;
>>>> +
>>>>      default:
>>>>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>>>>          ret = -1;
>>>> @@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>>>      return data & 0xffff;
>>>>  }
>>>>  
>>>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>>>> +{
>>>> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
>>>> +
>>>> +    cpu_synchronize_state(CPU(cpu));
>>>> +
>>>> +    spapr_mce_req_event(cpu, recovered);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>  int kvmppc_enable_hwrng(void)
>>>>  {
>>>>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
>>>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>>>> index 2238513..6edc42f 100644
>>>> --- a/target/ppc/kvm_ppc.h
>>>> +++ b/target/ppc/kvm_ppc.h
>>>> @@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>>>>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>>>>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>>>>  
>>>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
>>>> +
>>>>  #else
>>>>  
>>>>  static inline uint32_t kvmppc_get_tbfreq(void)
>>>> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
>>>> index 7b3cfe1..d5691d2 100644
>>>> --- a/target/ppc/trace-events
>>>> +++ b/target/ppc/trace-events
>>>> @@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>>>>  kvm_handle_epr(void) "handle epr"
>>>>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>>>>  kvm_handle_debug_exception(void) "handle debug exception"
>>>> +kvm_handle_nmi_exception(void) "handle NMI exception"
>>>> +
>>>>
>>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-10  6:42   ` [Qemu-devel] " David Gibson
@ 2019-05-10  7:05     ` Aravinda Prasad
  2019-05-10  9:52       ` David Gibson
  0 siblings, 1 reply; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-10  7:05 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-ppc, aik, qemu-devel



On Friday 10 May 2019 12:12 PM, David Gibson wrote:
> On Mon, Apr 22, 2019 at 12:33:26PM +0530, Aravinda Prasad wrote:
>> Upon a machine check exception (MCE) in a guest address space,
>> KVM causes a guest exit to enable QEMU to build and pass the
>> error to the guest in the PAPR defined rtas error log format.
>>
>> This patch builds the rtas error log, copies it to the rtas_addr
>> and then invokes the guest registered machine check handler. The
>> handler in the guest takes suitable action(s) depending on the type
>> and criticality of the error. For example, if an error is
>> unrecoverable memory corruption in an application inside the
>> guest, then the guest kernel sends a SIGBUS to the application.
>> For recoverable errors, the guest performs recovery actions and
>> logs the error.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |    4 +
>>  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |    4 +
>>  3 files changed, 253 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 2779efe..ffd1715 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
>>          error_report("Could not get size of LPAR rtas '%s'", filename);
>>          exit(1);
>>      }
>> +
>> +    /* Resize blob to accommodate error log. */
>> +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
>> +
>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>>          error_report("Could not load LPAR rtas '%s'", filename);
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 9922a23..4032db0 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -212,6 +212,106 @@ struct hp_extended_log {
>>      struct rtas_event_log_v6_hp hp;
>>  } QEMU_PACKED;
>>  
>> +struct rtas_event_log_v6_mc {
>> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
>> +    struct rtas_event_log_v6_section_header hdr;
>> +    uint32_t fru_id;
>> +    uint32_t proc_id;
>> +    uint8_t error_type;
>> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
>> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
>> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
>> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
>> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
>> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
>> +    uint8_t sub_err_type;
>> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
>> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
>> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
>> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
>> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
>> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
>> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
>> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
>> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
>> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
>> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
>> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
>> +    uint8_t reserved_1[6];
>> +    uint64_t effective_address;
>> +    uint64_t logical_address;
>> +} QEMU_PACKED;
>> +
>> +struct mc_extended_log {
>> +    struct rtas_event_log_v6 v6hdr;
>> +    struct rtas_event_log_v6_mc mc;
>> +} QEMU_PACKED;
>> +
>> +struct MC_ierror_table {
>> +    unsigned long srr1_mask;
>> +    unsigned long srr1_value;
>> +    bool nip_valid; /* nip is a valid indicator of faulting address */
>> +    uint8_t error_type;
>> +    uint8_t error_subtype;
>> +    unsigned int initiator;
>> +    unsigned int severity;
>> +};
>> +
>> +static const struct MC_ierror_table mc_ierror_table[] = {
>> +{ 0x00000000081c0000, 0x0000000000040000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000080000, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x00000000000c0000, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000100000, true,
>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000140000, true,
>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000000081c0000, 0x0000000000180000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0, 0, 0, 0, 0, 0 } };
>> +
>> +struct MC_derror_table {
>> +    unsigned long dsisr_value;
>> +    bool dar_valid; /* dar is a valid indicator of faulting address */
>> +    uint8_t error_type;
>> +    uint8_t error_subtype;
>> +    unsigned int initiator;
>> +    unsigned int severity;
>> +};
>> +
>> +static const struct MC_derror_table mc_derror_table[] = {
>> +{ 0x00008000, false,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00004000, true,
>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000800, true,
>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000400, true,
>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000080, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0x00000100, true,
>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>> +{ 0, false, 0, 0, 0, 0 } };
>> +
>> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
>> +
>>  typedef enum EventClass {
>>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>>      EVENT_CLASS_EPOW                = 1,
>> @@ -620,6 +720,147 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>  }
>>  
>> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_size)
>> +{
>> +    g_assert(old_rtas_size < RTAS_ERRLOG_OFFSET);
>> +    return RTAS_ERROR_LOG_MAX;
>> +}
>> +
>> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>> +                                        struct mc_extended_log *ext_elog)
>> +{
>> +    int i;
>> +    CPUPPCState *env = &cpu->env;
>> +    uint32_t summary;
>> +    uint64_t dsisr = env->spr[SPR_DSISR];
>> +
>> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
>> +    if (recovered) {
>> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
>> +    } else {
>> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
>> +    }
>> +
>> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
>> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
>> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
>> +                continue;
>> +            }
>> +
>> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
>> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
>> +            if (mc_derror_table[i].dar_valid) {
>> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
>> +            }
>> +
>> +            summary |= mc_derror_table[i].initiator
>> +                        | mc_derror_table[i].severity;
>> +
>> +            return summary;
>> +        }
>> +    } else {
>> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
>> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
>> +                    mc_ierror_table[i].srr1_value) {
>> +                continue;
>> +            }
>> +
>> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
>> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
>> +            if (mc_ierror_table[i].nip_valid) {
>> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
>> +            }
>> +
>> +            summary |= mc_ierror_table[i].initiator
>> +                        | mc_ierror_table[i].severity;
>> +
>> +            return summary;
>> +        }
>> +    }
>> +
>> +    summary |= RTAS_LOG_INITIATOR_CPU;
>> +    return summary;
>> +}
>> +
>> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    CPUState *cs = CPU(cpu);
>> +    uint64_t rtas_addr;
>> +    CPUPPCState *env = &cpu->env;
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> +    target_ulong r3, msr = 0;
>> +    struct rtas_error_log log;
>> +    struct mc_extended_log *ext_elog;
>> +    uint32_t summary;
>> +
>> +    /*
>> +     * Properly set bits in MSR before we invoke the handler.
>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>> +     */
>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>> +        msr |= (1ULL << MSR_LE);
>> +    }
>> +
>> +    if (env->msr && (1ULL << MSR_SF)) {
>> +        msr |= (1ULL << MSR_SF);
>> +    }
>> +
>> +    msr |= (1ULL << MSR_ME);
>> +
>> +    if (spapr->guest_machine_check_addr == -1) {
>> +        /*
>> +         * This implies that we have hit a machine check between system
>> +         * reset and "ibm,nmi-register". Fall back to the old machine
>> +         * check behavior in such cases.
>> +         */
>> +        env->spr[SPR_SRR0] = env->nip;
>> +        env->spr[SPR_SRR1] = env->msr;
>> +        env->msr = msr;
>> +        env->nip = 0x200;
>> +        return;
>> +    }
>> +
>> +    ext_elog = g_malloc0(sizeof(struct mc_extended_log));
>> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
>> +
>> +    log.summary = cpu_to_be32(summary);
>> +    log.extended_length = cpu_to_be32(sizeof(struct mc_extended_log));
>> +
>> +    /* r3 should be in BE always */
>> +    r3 = cpu_to_be64(env->gpr[3]);
>> +    env->msr = msr;
>> +
>> +    spapr_init_v6hdr(&ext_elog->v6hdr);
>> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
>> +    ext_elog->mc.hdr.section_length =
>> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
>> +    ext_elog->mc.hdr.section_version = 1;
>> +
>> +    /* get rtas addr from fdt */
>> +    rtas_addr = spapr_get_rtas_addr();
>> +    if (!rtas_addr) {
>> +        /* Unable to fetch rtas_addr. Hence reset the guest */
>> +        ppc_cpu_do_system_reset(cs);
>> +    }
>> +
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET, &r3, sizeof(r3));
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3),
>> +                              &log, sizeof(log));
>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3) +
>> +                              sizeof(log), ext_elog,
>> +                              sizeof(struct mc_extended_log));
>> +
>> +    /* Save gpr[3] in the guest endian mode */
>> +    if ((*pcc->interrupts_big_endian)(cpu)) {
>> +        env->gpr[3] = cpu_to_be64(rtas_addr + RTAS_ERRLOG_OFFSET);
> 
> I don't think this is right.  AIUI env->gpr[] are all stored in *host*
> endianness (for ease of doing arithmetic).

env-gpr[3] is later used by guest to fetch the RTAS log. My guess is
that we will not do an endianness change of all the gprs during a switch
from host to guest (that will be costly). But let me cross check.

> 
>> +    } else {
>> +        env->gpr[3] = cpu_to_le64(rtas_addr + RTAS_ERRLOG_OFFSET);
>> +    }
>> +
>> +    env->nip = spapr->guest_machine_check_addr;
>> +}
>> +
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>  {
>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> @@ -640,6 +881,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>          }
>>      }
>>      spapr->mc_status = cpu->vcpu_id;
>> +
>> +    spapr_mce_dispatch_elog(cpu, recovered);
>> +
>> +    return;
>>  }
>>  
>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index f7204d0..03f34bf 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -661,6 +661,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>  #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
>>  #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
>>  
>> +/* Offset from rtas-base where error log is placed */
>> +#define RTAS_ERRLOG_OFFSET       0x25
> 
> Is this offset PAPR defined, or chosen here?  Using an entirely
> unaliged (odd) address seems a very strange choice.

This is not PAPR defined. I will make it 0x30. Or do you prefer any
other offset?

Regards,
Aravinda

> 
>> +
>>  static inline uint64_t ppc64_phys_to_real(uint64_t addr)
>>  {
>>      return addr & ~0xF000000000000000ULL;
>> @@ -798,6 +801,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                        uint64_t pte0, uint64_t pte1);
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
>>  
>>  /* DRC callbacks. */
>>  void spapr_core_release(DeviceState *dev);
>>
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-10  6:46   ` David Gibson
@ 2019-05-10  7:15     ` Aravinda Prasad
  2019-05-10  9:53       ` David Gibson
  0 siblings, 1 reply; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-10  7:15 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, aik, qemu-ppc, qemu-devel



On Friday 10 May 2019 12:16 PM, David Gibson wrote:
> On Mon, Apr 22, 2019 at 12:33:35PM +0530, Aravinda Prasad wrote:
>> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
>> the KVM causes guest exit with NMI as exit reason
>> when it encounters a machine check exception on the
>> address belonging to a guest. Without this capability
>> enabled, KVM redirects machine check exceptions to
>> guest's 0x200 vector.
>>
>> This patch also deals with the case when a guest with
>> the KVM_CAP_PPC_FWNMI capability enabled is attempted
>> to migrate to a host that does not support this
>> capability.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |    1 +
>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>>  hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
>>  include/hw/ppc/spapr.h |    4 +++-
>>  target/ppc/kvm.c       |   14 ++++++++++++++
>>  target/ppc/kvm_ppc.h   |    6 ++++++
>>  6 files changed, 64 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index ffd1715..44e09bb 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>      spapr_caps_add_properties(smc, &error_abort);
>>      smc->irq = &spapr_irq_xics;
>>      smc->dr_phb_enabled = true;
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index edc5ed0..5b3af04 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>      }
>>  }
>>  
>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>> +                                Error **errp)
>> +{
>> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
>> +
>> +    if (!val) {
>> +        return; /* Disabled by default */
>> +    }
>> +
>> +    if (kvm_enabled()) {
>> +        if (kvmppc_fwnmi_enable(cpu)) {
>> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
>> +        }
>> +    }
>> +}
>> +
>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>      [SPAPR_CAP_HTM] = {
>>          .name = "htm",
>> @@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>          .type = "bool",
>>          .apply = cap_ccf_assist_apply,
>>      },
>> +    [SPAPR_CAP_FWNMI_MCE] = {
>> +        .name = "fwnmi-mce",
>> +        .description = "Handle fwnmi machine check exceptions",
>> +        .index = SPAPR_CAP_FWNMI_MCE,
>> +        .get = spapr_cap_get_bool,
>> +        .set = spapr_cap_set_bool,
>> +        .type = "bool",
>> +        .apply = cap_fwnmi_mce_apply,
>> +    },
>>  };
>>  
>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
>> @@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>>  
>>  void spapr_caps_init(SpaprMachineState *spapr)
>>  {
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index d3499f9..997cf19 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -49,6 +49,7 @@
>>  #include "hw/ppc/fdt.h"
>>  #include "target/ppc/mmu-hash64.h"
>>  #include "target/ppc/mmu-book3s-v3.h"
>> +#include "kvm_ppc.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>                                    target_ulong args,
>>                                    uint32_t nret, target_ulong rets)
>>  {
>> +    int ret;
>>      uint64_t rtas_addr = spapr_get_rtas_addr();
>>  
>>      if (!rtas_addr) {
>> @@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>          return;
>>      }
>>  
>> +    ret = kvmppc_fwnmi_enable(cpu);
> 
> You shouldn't need this here as well as in cap_fwnmi_mce_apply().
> 
> Instead, you should unconditionally fail the nmi-register if the
> capability is not enabled.

cap_fwnmi is not enabled by default, because if it is enabled by default
them KVM will start routing machine check exceptions via guest exit
instead of routing it to guest's 0x200.

During early boot since guest has not yet issued nmi-register, KVM is
expected to route exceptions to 0x200. Therefore we enable cap_fwnmi
only when a guest issues nmi-register.

Or we should take the approach of enabling this capability by default
and then from QEMU route the error to 0x200 if guest has not issued
nmi-register.

> 
>> +    if (ret == 1) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    if (ret < 0) {
>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> +        return;
>> +    }
>> +
>>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>  }
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 03f34bf..9d16ad1 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -78,8 +78,10 @@ typedef enum {
>>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>>  /* Count Cache Flush Assist HW Instruction */
>>  #define SPAPR_CAP_CCF_ASSIST            0x09
>> +/* FWNMI machine check handling */
>> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>>  /* Num Caps */
>> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
>> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>>  
>>  /*
>>   * Capability Values
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 5eedce8..9c7b71d 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -83,6 +83,7 @@ static int cap_ppc_safe_indirect_branch;
>>  static int cap_ppc_count_cache_flush_assist;
>>  static int cap_ppc_nested_kvm_hv;
>>  static int cap_large_decr;
>> +static int cap_ppc_fwnmi;
>>  
>>  static uint32_t debug_inst_opcode;
>>  
>> @@ -150,6 +151,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>      kvmppc_get_cpu_characteristics(s);
>>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>>      cap_large_decr = kvmppc_get_dec_bits();
>> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>>      /*
>>       * Note: setting it to false because there is not such capability
>>       * in KVM at this moment.
>> @@ -2117,6 +2119,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>      }
>>  }
>>  
>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>> +{
>> +    CPUState *cs = CPU(cpu);
>> +
>> +    if (!cap_ppc_fwnmi) {
>> +        return 1;
>> +    }
>> +
>> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
>> +}
>> +
>> +
>>  int kvmppc_smt_threads(void)
>>  {
>>      return cap_ppc_smt ? cap_ppc_smt : 1;
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index 6edc42f..28919d3 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
>> @@ -27,6 +27,7 @@ void kvmppc_enable_h_page_init(void);
>>  void kvmppc_set_papr(PowerPCCPU *cpu);
>>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
>>  int kvmppc_smt_threads(void);
>>  void kvmppc_hint_smt_possible(Error **errp);
>>  int kvmppc_set_smt_threads(int smt);
>> @@ -159,6 +160,11 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>  {
>>  }
>>  
>> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>> +{
>> +    return 1;
>> +}
>> +
>>  static inline int kvmppc_smt_threads(void)
>>  {
>>      return 1;
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 6/6] migration: Block migration while handling machine check
  2019-05-10  6:51   ` David Gibson
@ 2019-05-10  7:16     ` Aravinda Prasad
  2019-05-29  5:46     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
  1 sibling, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-10  7:16 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-ppc, aik, qemu-devel



On Friday 10 May 2019 12:21 PM, David Gibson wrote:
> On Mon, Apr 22, 2019 at 12:33:45PM +0530, Aravinda Prasad wrote:
>> Block VM migration requests until the machine check
>> error handling is complete as (i) these errors are
>> specific to the source hardware and is irrelevant on
>> the target hardware, (ii) these errors cause data
>> corruption and should be handled before migration.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>>  hw/ppc/spapr_rtas.c    |    4 ++++
>>  include/hw/ppc/spapr.h |    3 +++
>>  3 files changed, 24 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 4032db0..45b990c 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -41,6 +41,7 @@
>>  #include "qemu/bcd.h"
>>  #include "hw/ppc/spapr_ovec.h"
>>  #include <libfdt.h>
>> +#include "migration/blocker.h"
>>  
>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>>  #define   RTAS_LOG_VERSION_6                    0x06000000
>> @@ -864,6 +865,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>  {
>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int ret;
>> +    Error *local_err = NULL;
>> +
>> +    error_setg(&spapr->migration_blocker,
>> +            "Live migration not supported during machine check handling");
>> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
>> +    if (ret < 0) {
>> +        /*
>> +         * We don't want to abort and let the migration to continue. In a
>> +         * rare case, the machine check handler will run on the target
>> +         * hardware. Though this is not preferable, it is better than aborting
>> +         * the migration or killing the VM.
>> +         */
>> +        error_free(spapr->migration_blocker);
>> +        fprintf(stderr, "Warning: Machine check during VM migration\n");
> 
> Use report_err() instead of a raw fprintf().

sure..

> 
>> +    }
>>  
>>      while (spapr->mc_status != -1) {
>>          /*
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 997cf19..1229a0e 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -50,6 +50,7 @@
>>  #include "target/ppc/mmu-hash64.h"
>>  #include "target/ppc/mmu-book3s-v3.h"
>>  #include "kvm_ppc.h"
>> +#include "migration/blocker.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -396,6 +397,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>          spapr->mc_status = -1;
>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +        migrate_del_blocker(spapr->migration_blocker);
>> +        error_free(spapr->migration_blocker);
>> +        spapr->migration_blocker = NULL;
>>      }
>>  }
>>  
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 9d16ad1..dda5fd2 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -10,6 +10,7 @@
>>  #include "hw/ppc/spapr_irq.h"
>>  #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
>>  #include "hw/ppc/xics.h"        /* For ICSState */
>> +#include "qapi/error.h"
>>  
>>  struct SpaprVioBus;
>>  struct SpaprPhbState;
>> @@ -213,6 +214,8 @@ struct SpaprMachineState {
>>      SpaprCapabilities def, eff, mig;
>>  
>>      unsigned gpu_numa_id;
>> +
>> +    Error *migration_blocker;
> 
> This name doesn't seem good - it's specific to fwnmi, not any other
> migration blockers we might have in future.  It also always contains
> the same string - could you just initialize that in a global and just
> do the migrate_add_blocker() / migrate_del_blocker() instead?

sure..

> 
>>  };
>>  
>>  #define H_SUCCESS         0
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-04-22  7:02   ` Aravinda Prasad
  (?)
  (?)
@ 2019-05-10  9:06   ` Greg Kurz
  2019-05-10  9:54     ` David Gibson
                       ` (2 more replies)
  -1 siblings, 3 replies; 65+ messages in thread
From: Greg Kurz @ 2019-05-10  9:06 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Mon, 22 Apr 2019 12:32:58 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> This patch adds support in QEMU to handle "ibm,nmi-register"
> and "ibm,nmi-interlock" RTAS calls.
> 
> The machine check notification address is saved when the
> OS issues "ibm,nmi-register" RTAS call.
> 
> This patch also handles the case when multiple processors
> experience machine check at or about the same time by
> handling "ibm,nmi-interlock" call. In such cases, as per
> PAPR, subsequent processors serialize waiting for the first
> processor to issue the "ibm,nmi-interlock" call. The second
> processor that also received a machine check error waits
> till the first processor is done reading the error log.
> The first processor issues "ibm,nmi-interlock" call
> when the error log is consumed. This patch implements the
> releasing part of the error-log while subsequent patch
> (which builds error log) handles the locking part.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |   18 ++++++++++++++
>  hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    9 ++++++-
>  3 files changed, 87 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c56939a..6642cb5 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
>      first_ppc_cpu->env.gpr[5] = 0;
>  
>      spapr->cas_reboot = false;
> +
> +    spapr->guest_machine_check_addr = -1;
> +
> +    /* Signal all vCPUs waiting on this condition */
> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>  }
>  
>  static void spapr_create_nvram(SpaprMachineState *spapr)
> @@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
>      },
>  };
>  
> +static const VMStateDescription vmstate_spapr_machine_check = {
> +    .name = "spapr_machine_check",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },

This VMState descriptor is missing a .needed field because we only want
to migrate the subsection if the guest has called NMI register, ie.
spapr->guest_machine_check_addr != (target_ulong) -1.

> +};
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
> @@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
>          &vmstate_spapr_dtb,
>          &vmstate_spapr_cap_large_decr,
>          &vmstate_spapr_cap_ccf_assist,
> +        &vmstate_spapr_machine_check,
>          NULL
>      }
>  };
> @@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
>  
>          kvmppc_spapr_enable_inkernel_multitce();
>      }
> +
> +    qemu_cond_init(&spapr->mc_delivery_cond);
>  }
>  
>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index ee24212..c2f3991 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      rtas_st(rets, 1, 100);
>  }
>  
> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> +                                  SpaprMachineState *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args,
> +                                  uint32_t nret, target_ulong rets)
> +{
> +    uint64_t rtas_addr = spapr_get_rtas_addr();
> +
> +    if (!rtas_addr) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    if (!spapr->guest_machine_check_addr) {

Hmm... the default value is -1. It looks like the check should rather be:

    if (spapr->guest_machine_check_addr == (target_ulong) -1) {


> +        /* NMI register not called */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +    } else {
> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    }
> +}
> +
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>      }
>  }
>  
> +uint64_t spapr_get_rtas_addr(void)

Shouldn't this be hwaddr instead of uint64_t ?

> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int rtas_node;
> +    const struct fdt_property *rtas_addr_prop;
> +    void *fdt = spapr->fdt_blob;
> +    uint32_t rtas_addr;
> +
> +    /* fetch rtas addr from fdt */
> +    rtas_node = fdt_path_offset(fdt, "/rtas");
> +    if (rtas_node == 0) {
> +        return 0;
> +    }
> +
> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> +    if (!rtas_addr_prop) {

Just for curiosity: this is ok for linux, but what about other OSes (eg. AIX) ?

> +        return 0;
> +    }
> +
> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);

Also this assumes the OS called RTAS instantiate-rtas, but some other
OS might have called RTAS instantiate-rtas-64 instead. I guess it is
ok for now because SLOF only provides the 32-bit variant, but a
comment would certainly help IMHO.

> +    return (uint64_t)rtas_addr;
> +}
> +
> +
>  static void core_rtas_register_types(void)
>  {
>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> @@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
>                          rtas_set_power_level);
>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>                          rtas_get_power_level);
> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> +                        rtas_ibm_nmi_register);
> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> +                        rtas_ibm_nmi_interlock);
>  }
>  
>  type_init(core_rtas_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 7e32f30..ec6f33e 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -187,6 +187,10 @@ struct SpaprMachineState {
>       * occurs during the unplug process. */
>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>  
> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> +    target_ulong guest_machine_check_addr;
> +    QemuCond mc_delivery_cond;
> +
>      /*< public >*/
>      char *kvm_type;
>      char *host_model;
> @@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> @@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> +uint64_t spapr_get_rtas_addr(void);
>  #endif /* HW_SPAPR_H */
> 
> 



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-10  7:05     ` Aravinda Prasad
@ 2019-05-10  9:52       ` David Gibson
  2019-05-13  5:00         ` Aravinda Prasad
  0 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-10  9:52 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, qemu-ppc, aik, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 15075 bytes --]

On Fri, May 10, 2019 at 12:35:13PM +0530, Aravinda Prasad wrote:
> 
> 
> On Friday 10 May 2019 12:12 PM, David Gibson wrote:
> > On Mon, Apr 22, 2019 at 12:33:26PM +0530, Aravinda Prasad wrote:
> >> Upon a machine check exception (MCE) in a guest address space,
> >> KVM causes a guest exit to enable QEMU to build and pass the
> >> error to the guest in the PAPR defined rtas error log format.
> >>
> >> This patch builds the rtas error log, copies it to the rtas_addr
> >> and then invokes the guest registered machine check handler. The
> >> handler in the guest takes suitable action(s) depending on the type
> >> and criticality of the error. For example, if an error is
> >> unrecoverable memory corruption in an application inside the
> >> guest, then the guest kernel sends a SIGBUS to the application.
> >> For recoverable errors, the guest performs recovery actions and
> >> logs the error.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |    4 +
> >>  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr.h |    4 +
> >>  3 files changed, 253 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 2779efe..ffd1715 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
> >>          error_report("Could not get size of LPAR rtas '%s'", filename);
> >>          exit(1);
> >>      }
> >> +
> >> +    /* Resize blob to accommodate error log. */
> >> +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
> >> +
> >>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
> >>          error_report("Could not load LPAR rtas '%s'", filename);
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index 9922a23..4032db0 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -212,6 +212,106 @@ struct hp_extended_log {
> >>      struct rtas_event_log_v6_hp hp;
> >>  } QEMU_PACKED;
> >>  
> >> +struct rtas_event_log_v6_mc {
> >> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> >> +    struct rtas_event_log_v6_section_header hdr;
> >> +    uint32_t fru_id;
> >> +    uint32_t proc_id;
> >> +    uint8_t error_type;
> >> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> >> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> >> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> >> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> >> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> >> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> >> +    uint8_t sub_err_type;
> >> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> >> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> >> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> >> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> >> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> >> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> >> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> >> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> >> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> >> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> >> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> >> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> >> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> >> +    uint8_t reserved_1[6];
> >> +    uint64_t effective_address;
> >> +    uint64_t logical_address;
> >> +} QEMU_PACKED;
> >> +
> >> +struct mc_extended_log {
> >> +    struct rtas_event_log_v6 v6hdr;
> >> +    struct rtas_event_log_v6_mc mc;
> >> +} QEMU_PACKED;
> >> +
> >> +struct MC_ierror_table {
> >> +    unsigned long srr1_mask;
> >> +    unsigned long srr1_value;
> >> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> >> +    uint8_t error_type;
> >> +    uint8_t error_subtype;
> >> +    unsigned int initiator;
> >> +    unsigned int severity;
> >> +};
> >> +
> >> +static const struct MC_ierror_table mc_ierror_table[] = {
> >> +{ 0x00000000081c0000, 0x0000000000040000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000080000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000100000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000140000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000000081c0000, 0x0000000000180000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0, 0, 0, 0, 0, 0 } };
> >> +
> >> +struct MC_derror_table {
> >> +    unsigned long dsisr_value;
> >> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> >> +    uint8_t error_type;
> >> +    uint8_t error_subtype;
> >> +    unsigned int initiator;
> >> +    unsigned int severity;
> >> +};
> >> +
> >> +static const struct MC_derror_table mc_derror_table[] = {
> >> +{ 0x00008000, false,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00004000, true,
> >> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000800, true,
> >> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000400, true,
> >> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000080, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0x00000100, true,
> >> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> >> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> >> +{ 0, false, 0, 0, 0, 0 } };
> >> +
> >> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> >> +
> >>  typedef enum EventClass {
> >>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
> >>      EVENT_CLASS_EPOW                = 1,
> >> @@ -620,6 +720,147 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> >>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> >>  }
> >>  
> >> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_size)
> >> +{
> >> +    g_assert(old_rtas_size < RTAS_ERRLOG_OFFSET);
> >> +    return RTAS_ERROR_LOG_MAX;
> >> +}
> >> +
> >> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> >> +                                        struct mc_extended_log *ext_elog)
> >> +{
> >> +    int i;
> >> +    CPUPPCState *env = &cpu->env;
> >> +    uint32_t summary;
> >> +    uint64_t dsisr = env->spr[SPR_DSISR];
> >> +
> >> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> >> +    if (recovered) {
> >> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> >> +    } else {
> >> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> >> +    }
> >> +
> >> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> >> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> >> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> >> +                continue;
> >> +            }
> >> +
> >> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> >> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> >> +            if (mc_derror_table[i].dar_valid) {
> >> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> >> +            }
> >> +
> >> +            summary |= mc_derror_table[i].initiator
> >> +                        | mc_derror_table[i].severity;
> >> +
> >> +            return summary;
> >> +        }
> >> +    } else {
> >> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> >> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> >> +                    mc_ierror_table[i].srr1_value) {
> >> +                continue;
> >> +            }
> >> +
> >> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> >> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> >> +            if (mc_ierror_table[i].nip_valid) {
> >> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> >> +            }
> >> +
> >> +            summary |= mc_ierror_table[i].initiator
> >> +                        | mc_ierror_table[i].severity;
> >> +
> >> +            return summary;
> >> +        }
> >> +    }
> >> +
> >> +    summary |= RTAS_LOG_INITIATOR_CPU;
> >> +    return summary;
> >> +}
> >> +
> >> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >> +{
> >> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +    CPUState *cs = CPU(cpu);
> >> +    uint64_t rtas_addr;
> >> +    CPUPPCState *env = &cpu->env;
> >> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> >> +    target_ulong r3, msr = 0;
> >> +    struct rtas_error_log log;
> >> +    struct mc_extended_log *ext_elog;
> >> +    uint32_t summary;
> >> +
> >> +    /*
> >> +     * Properly set bits in MSR before we invoke the handler.
> >> +     * SRR0/1, DAR and DSISR are properly set by KVM
> >> +     */
> >> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> >> +        msr |= (1ULL << MSR_LE);
> >> +    }
> >> +
> >> +    if (env->msr && (1ULL << MSR_SF)) {
> >> +        msr |= (1ULL << MSR_SF);
> >> +    }
> >> +
> >> +    msr |= (1ULL << MSR_ME);
> >> +
> >> +    if (spapr->guest_machine_check_addr == -1) {
> >> +        /*
> >> +         * This implies that we have hit a machine check between system
> >> +         * reset and "ibm,nmi-register". Fall back to the old machine
> >> +         * check behavior in such cases.
> >> +         */
> >> +        env->spr[SPR_SRR0] = env->nip;
> >> +        env->spr[SPR_SRR1] = env->msr;
> >> +        env->msr = msr;
> >> +        env->nip = 0x200;
> >> +        return;
> >> +    }
> >> +
> >> +    ext_elog = g_malloc0(sizeof(struct mc_extended_log));
> >> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> >> +
> >> +    log.summary = cpu_to_be32(summary);
> >> +    log.extended_length = cpu_to_be32(sizeof(struct mc_extended_log));
> >> +
> >> +    /* r3 should be in BE always */
> >> +    r3 = cpu_to_be64(env->gpr[3]);
> >> +    env->msr = msr;
> >> +
> >> +    spapr_init_v6hdr(&ext_elog->v6hdr);
> >> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> >> +    ext_elog->mc.hdr.section_length =
> >> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> >> +    ext_elog->mc.hdr.section_version = 1;
> >> +
> >> +    /* get rtas addr from fdt */
> >> +    rtas_addr = spapr_get_rtas_addr();
> >> +    if (!rtas_addr) {
> >> +        /* Unable to fetch rtas_addr. Hence reset the guest */
> >> +        ppc_cpu_do_system_reset(cs);
> >> +    }
> >> +
> >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET, &r3, sizeof(r3));
> >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3),
> >> +                              &log, sizeof(log));
> >> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3) +
> >> +                              sizeof(log), ext_elog,
> >> +                              sizeof(struct mc_extended_log));
> >> +
> >> +    /* Save gpr[3] in the guest endian mode */
> >> +    if ((*pcc->interrupts_big_endian)(cpu)) {
> >> +        env->gpr[3] = cpu_to_be64(rtas_addr + RTAS_ERRLOG_OFFSET);
> > 
> > I don't think this is right.  AIUI env->gpr[] are all stored in *host*
> > endianness (for ease of doing arithmetic).
> 
> env-gpr[3] is later used by guest to fetch the RTAS log. My guess is
> that we will not do an endianness change of all the gprs during a switch
> from host to guest (that will be costly).

There's no need to "change endianness".  In TCG the host needs to do
arithmetic on the values and so they are in host endian.  With KVM the
env values are only synchronized when we enter/exit KVM and they're
going to registers, not memory and so have no endianness.

> But let me cross check.
> 
> > 
> >> +    } else {
> >> +        env->gpr[3] = cpu_to_le64(rtas_addr + RTAS_ERRLOG_OFFSET);
> >> +    }
> >> +
> >> +    env->nip = spapr->guest_machine_check_addr;
> >> +}
> >> +
> >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>  {
> >>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> @@ -640,6 +881,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>          }
> >>      }
> >>      spapr->mc_status = cpu->vcpu_id;
> >> +
> >> +    spapr_mce_dispatch_elog(cpu, recovered);
> >> +
> >> +    return;
> >>  }
> >>  
> >>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index f7204d0..03f34bf 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -661,6 +661,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
> >>  #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
> >>  #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
> >>  
> >> +/* Offset from rtas-base where error log is placed */
> >> +#define RTAS_ERRLOG_OFFSET       0x25
> > 
> > Is this offset PAPR defined, or chosen here?  Using an entirely
> > unaliged (odd) address seems a very strange choice.
> 
> This is not PAPR defined. I will make it 0x30. Or do you prefer any
> other offset?

0x30 should be fine.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-10  7:15     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
@ 2019-05-10  9:53       ` David Gibson
  2019-05-13 10:30         ` Aravinda Prasad
  0 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-10  9:53 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, aik, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 9181 bytes --]

On Fri, May 10, 2019 at 12:45:29PM +0530, Aravinda Prasad wrote:
> 
> 
> On Friday 10 May 2019 12:16 PM, David Gibson wrote:
> > On Mon, Apr 22, 2019 at 12:33:35PM +0530, Aravinda Prasad wrote:
> >> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
> >> the KVM causes guest exit with NMI as exit reason
> >> when it encounters a machine check exception on the
> >> address belonging to a guest. Without this capability
> >> enabled, KVM redirects machine check exceptions to
> >> guest's 0x200 vector.
> >>
> >> This patch also deals with the case when a guest with
> >> the KVM_CAP_PPC_FWNMI capability enabled is attempted
> >> to migrate to a host that does not support this
> >> capability.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |    1 +
> >>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
> >>  hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
> >>  include/hw/ppc/spapr.h |    4 +++-
> >>  target/ppc/kvm.c       |   14 ++++++++++++++
> >>  target/ppc/kvm_ppc.h   |    6 ++++++
> >>  6 files changed, 64 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index ffd1715..44e09bb 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>      spapr_caps_add_properties(smc, &error_abort);
> >>      smc->irq = &spapr_irq_xics;
> >>      smc->dr_phb_enabled = true;
> >> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >> index edc5ed0..5b3af04 100644
> >> --- a/hw/ppc/spapr_caps.c
> >> +++ b/hw/ppc/spapr_caps.c
> >> @@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
> >>      }
> >>  }
> >>  
> >> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> >> +                                Error **errp)
> >> +{
> >> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> >> +
> >> +    if (!val) {
> >> +        return; /* Disabled by default */
> >> +    }
> >> +
> >> +    if (kvm_enabled()) {
> >> +        if (kvmppc_fwnmi_enable(cpu)) {
> >> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
> >> +        }
> >> +    }
> >> +}
> >> +
> >>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >>      [SPAPR_CAP_HTM] = {
> >>          .name = "htm",
> >> @@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >>          .type = "bool",
> >>          .apply = cap_ccf_assist_apply,
> >>      },
> >> +    [SPAPR_CAP_FWNMI_MCE] = {
> >> +        .name = "fwnmi-mce",
> >> +        .description = "Handle fwnmi machine check exceptions",
> >> +        .index = SPAPR_CAP_FWNMI_MCE,
> >> +        .get = spapr_cap_get_bool,
> >> +        .set = spapr_cap_set_bool,
> >> +        .type = "bool",
> >> +        .apply = cap_fwnmi_mce_apply,
> >> +    },
> >>  };
> >>  
> >>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> >> @@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
> >>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
> >>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
> >>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
> >> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
> >>  
> >>  void spapr_caps_init(SpaprMachineState *spapr)
> >>  {
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index d3499f9..997cf19 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -49,6 +49,7 @@
> >>  #include "hw/ppc/fdt.h"
> >>  #include "target/ppc/mmu-hash64.h"
> >>  #include "target/ppc/mmu-book3s-v3.h"
> >> +#include "kvm_ppc.h"
> >>  
> >>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>                                     uint32_t token, uint32_t nargs,
> >> @@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>                                    target_ulong args,
> >>                                    uint32_t nret, target_ulong rets)
> >>  {
> >> +    int ret;
> >>      uint64_t rtas_addr = spapr_get_rtas_addr();
> >>  
> >>      if (!rtas_addr) {
> >> @@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>          return;
> >>      }
> >>  
> >> +    ret = kvmppc_fwnmi_enable(cpu);
> > 
> > You shouldn't need this here as well as in cap_fwnmi_mce_apply().
> > 
> > Instead, you should unconditionally fail the nmi-register if the
> > capability is not enabled.
> 
> cap_fwnmi is not enabled by default, because if it is enabled by default
> them KVM will start routing machine check exceptions via guest exit
> instead of routing it to guest's 0x200.
> 
> During early boot since guest has not yet issued nmi-register, KVM is
> expected to route exceptions to 0x200. Therefore we enable cap_fwnmi
> only when a guest issues nmi-register.

Except that's not true - you enable it in cap_fwnmi_mce_apply() which
will be executed whenever the machine capability is enabled.

> Or we should take the approach of enabling this capability by default
> and then from QEMU route the error to 0x200 if guest has not issued
> nmi-register.
> 
> > 
> >> +    if (ret == 1) {
> >> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >> +        return;
> >> +    }
> >> +
> >> +    if (ret < 0) {
> >> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >> +        return;
> >> +    }
> >> +
> >>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>  }
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index 03f34bf..9d16ad1 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -78,8 +78,10 @@ typedef enum {
> >>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
> >>  /* Count Cache Flush Assist HW Instruction */
> >>  #define SPAPR_CAP_CCF_ASSIST            0x09
> >> +/* FWNMI machine check handling */
> >> +#define SPAPR_CAP_FWNMI_MCE             0x0A
> >>  /* Num Caps */
> >> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
> >> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
> >>  
> >>  /*
> >>   * Capability Values
> >> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> >> index 5eedce8..9c7b71d 100644
> >> --- a/target/ppc/kvm.c
> >> +++ b/target/ppc/kvm.c
> >> @@ -83,6 +83,7 @@ static int cap_ppc_safe_indirect_branch;
> >>  static int cap_ppc_count_cache_flush_assist;
> >>  static int cap_ppc_nested_kvm_hv;
> >>  static int cap_large_decr;
> >> +static int cap_ppc_fwnmi;
> >>  
> >>  static uint32_t debug_inst_opcode;
> >>  
> >> @@ -150,6 +151,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >>      kvmppc_get_cpu_characteristics(s);
> >>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
> >>      cap_large_decr = kvmppc_get_dec_bits();
> >> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
> >>      /*
> >>       * Note: setting it to false because there is not such capability
> >>       * in KVM at this moment.
> >> @@ -2117,6 +2119,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
> >>      }
> >>  }
> >>  
> >> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> >> +{
> >> +    CPUState *cs = CPU(cpu);
> >> +
> >> +    if (!cap_ppc_fwnmi) {
> >> +        return 1;
> >> +    }
> >> +
> >> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
> >> +}
> >> +
> >> +
> >>  int kvmppc_smt_threads(void)
> >>  {
> >>      return cap_ppc_smt ? cap_ppc_smt : 1;
> >> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> >> index 6edc42f..28919d3 100644
> >> --- a/target/ppc/kvm_ppc.h
> >> +++ b/target/ppc/kvm_ppc.h
> >> @@ -27,6 +27,7 @@ void kvmppc_enable_h_page_init(void);
> >>  void kvmppc_set_papr(PowerPCCPU *cpu);
> >>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
> >>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
> >> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
> >>  int kvmppc_smt_threads(void);
> >>  void kvmppc_hint_smt_possible(Error **errp);
> >>  int kvmppc_set_smt_threads(int smt);
> >> @@ -159,6 +160,11 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
> >>  {
> >>  }
> >>  
> >> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> >> +{
> >> +    return 1;
> >> +}
> >> +
> >>  static inline int kvmppc_smt_threads(void)
> >>  {
> >>      return 1;
> >>
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-05-10  9:06   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-05-10  9:54     ` David Gibson
  2019-05-10 14:33     ` Greg Kurz
  2019-05-13  4:53     ` Aravinda Prasad
  2 siblings, 0 replies; 65+ messages in thread
From: David Gibson @ 2019-05-10  9:54 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, Aravinda Prasad

[-- Attachment #1: Type: text/plain, Size: 6620 bytes --]

On Fri, May 10, 2019 at 11:06:04AM +0200, Greg Kurz wrote:
> On Mon, 22 Apr 2019 12:32:58 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
> > This patch adds support in QEMU to handle "ibm,nmi-register"
> > and "ibm,nmi-interlock" RTAS calls.
> > 
> > The machine check notification address is saved when the
> > OS issues "ibm,nmi-register" RTAS call.
> > 
> > This patch also handles the case when multiple processors
> > experience machine check at or about the same time by
> > handling "ibm,nmi-interlock" call. In such cases, as per
> > PAPR, subsequent processors serialize waiting for the first
> > processor to issue the "ibm,nmi-interlock" call. The second
> > processor that also received a machine check error waits
> > till the first processor is done reading the error log.
> > The first processor issues "ibm,nmi-interlock" call
> > when the error log is consumed. This patch implements the
> > releasing part of the error-log while subsequent patch
> > (which builds error log) handles the locking part.
> > 
> > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c         |   18 ++++++++++++++
> >  hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr.h |    9 ++++++-
> >  3 files changed, 87 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index c56939a..6642cb5 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
> >      first_ppc_cpu->env.gpr[5] = 0;
> >  
> >      spapr->cas_reboot = false;
> > +
> > +    spapr->guest_machine_check_addr = -1;
> > +
> > +    /* Signal all vCPUs waiting on this condition */
> > +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
> >  }
> >  
> >  static void spapr_create_nvram(SpaprMachineState *spapr)
> > @@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
> >      },
> >  };
> >  
> > +static const VMStateDescription vmstate_spapr_machine_check = {
> > +    .name = "spapr_machine_check",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> 
> This VMState descriptor is missing a .needed field because we only want
> to migrate the subsection if the guest has called NMI register, ie.
> spapr->guest_machine_check_addr != (target_ulong) -1.
> 
> > +};
> > +
> >  static const VMStateDescription vmstate_spapr = {
> >      .name = "spapr",
> >      .version_id = 3,
> > @@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
> >          &vmstate_spapr_dtb,
> >          &vmstate_spapr_cap_large_decr,
> >          &vmstate_spapr_cap_ccf_assist,
> > +        &vmstate_spapr_machine_check,
> >          NULL
> >      }
> >  };
> > @@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
> >  
> >          kvmppc_spapr_enable_inkernel_multitce();
> >      }
> > +
> > +    qemu_cond_init(&spapr->mc_delivery_cond);
> >  }
> >  
> >  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > index ee24212..c2f3991 100644
> > --- a/hw/ppc/spapr_rtas.c
> > +++ b/hw/ppc/spapr_rtas.c
> > @@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >      rtas_st(rets, 1, 100);
> >  }
> >  
> > +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> > +                                  SpaprMachineState *spapr,
> > +                                  uint32_t token, uint32_t nargs,
> > +                                  target_ulong args,
> > +                                  uint32_t nret, target_ulong rets)
> > +{
> > +    uint64_t rtas_addr = spapr_get_rtas_addr();
> > +
> > +    if (!rtas_addr) {
> > +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> > +        return;
> > +    }
> > +
> > +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +}
> > +
> > +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> > +                                   SpaprMachineState *spapr,
> > +                                   uint32_t token, uint32_t nargs,
> > +                                   target_ulong args,
> > +                                   uint32_t nret, target_ulong rets)
> > +{
> > +    if (!spapr->guest_machine_check_addr) {
> 
> Hmm... the default value is -1. It looks like the check should rather be:
> 
>     if (spapr->guest_machine_check_addr == (target_ulong) -1) {
> 
> 
> > +        /* NMI register not called */
> > +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > +    } else {
> > +        qemu_cond_signal(&spapr->mc_delivery_cond);
> > +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +    }
> > +}
> > +
> > +
> >  static struct rtas_call {
> >      const char *name;
> >      spapr_rtas_fn fn;
> > @@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
> >      }
> >  }
> >  
> > +uint64_t spapr_get_rtas_addr(void)
> 
> Shouldn't this be hwaddr instead of uint64_t ?
> 
> > +{
> > +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > +    int rtas_node;
> > +    const struct fdt_property *rtas_addr_prop;
> > +    void *fdt = spapr->fdt_blob;
> > +    uint32_t rtas_addr;
> > +
> > +    /* fetch rtas addr from fdt */
> > +    rtas_node = fdt_path_offset(fdt, "/rtas");
> > +    if (rtas_node == 0) {
> > +        return 0;
> > +    }
> > +
> > +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> > +    if (!rtas_addr_prop) {
> 
> Just for curiosity: this is ok for linux, but what about other OSes (eg. AIX) ?
> 
> > +        return 0;
> > +    }
> > +
> > +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
> 
> Also this assumes the OS called RTAS instantiate-rtas, but some other
> OS might have called RTAS instantiate-rtas-64 instead. I guess it is
> ok for now because SLOF only provides the 32-bit variant, but a
> comment would certainly help IMHO.

I have a feeling kvm-unit-tests may not call instantiate-rtas at all.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 2/6] Wrapper function to wait on condition for the main loop mutex
  2019-04-22  7:03   ` Aravinda Prasad
  (?)
  (?)
@ 2019-05-10 13:14   ` Greg Kurz
  -1 siblings, 0 replies; 65+ messages in thread
From: Greg Kurz @ 2019-05-10 13:14 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Mon, 22 Apr 2019 12:33:07 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> Introduce a wrapper function to wait on condition for
> the main loop mutex. This function atomically releases
> the main loop mutex and causes the calling thread to
> block on the condition. This wrapper is required because
> qemu_global_mutex is a static variable.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  cpus.c                   |    5 +++++
>  include/qemu/main-loop.h |    8 ++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/cpus.c b/cpus.c
> index e83f72b..d9379e7 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1858,6 +1858,11 @@ void qemu_mutex_unlock_iothread(void)
>      qemu_mutex_unlock(&qemu_global_mutex);
>  }
>  
> +void qemu_cond_wait_iothread(QemuCond *cond)
> +{
> +    qemu_cond_wait(cond, &qemu_global_mutex);
> +}
> +
>  static bool all_vcpus_paused(void)
>  {
>      CPUState *cpu;
> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> index f6ba78e..a6d20b0 100644
> --- a/include/qemu/main-loop.h
> +++ b/include/qemu/main-loop.h
> @@ -295,6 +295,14 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
>   */
>  void qemu_mutex_unlock_iothread(void);
>  
> +/*
> + * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
> + *
> + * This function atomically releases the main loop mutex and causes
> + * the calling thread to block on the condition.
> + */
> +void qemu_cond_wait_iothread(QemuCond *cond);
> +
>  /* internal interfaces */
>  
>  void qemu_fd_register(int fd);
> 
> 



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-05-10  9:06   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2019-05-10  9:54     ` David Gibson
@ 2019-05-10 14:33     ` Greg Kurz
  2019-05-13  4:57       ` Aravinda Prasad
  2019-05-13  4:53     ` Aravinda Prasad
  2 siblings, 1 reply; 65+ messages in thread
From: Greg Kurz @ 2019-05-10 14:33 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Fri, 10 May 2019 11:06:04 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Mon, 22 Apr 2019 12:32:58 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
> > This patch adds support in QEMU to handle "ibm,nmi-register"
> > and "ibm,nmi-interlock" RTAS calls.
> > 
> > The machine check notification address is saved when the
> > OS issues "ibm,nmi-register" RTAS call.
> > 
> > This patch also handles the case when multiple processors
> > experience machine check at or about the same time by
> > handling "ibm,nmi-interlock" call. In such cases, as per
> > PAPR, subsequent processors serialize waiting for the first
> > processor to issue the "ibm,nmi-interlock" call. The second
> > processor that also received a machine check error waits
> > till the first processor is done reading the error log.
> > The first processor issues "ibm,nmi-interlock" call
> > when the error log is consumed. This patch implements the
> > releasing part of the error-log while subsequent patch
> > (which builds error log) handles the locking part.
> > 
> > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c         |   18 ++++++++++++++
> >  hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr.h |    9 ++++++-
> >  3 files changed, 87 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index c56939a..6642cb5 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
> >      first_ppc_cpu->env.gpr[5] = 0;
> >  
> >      spapr->cas_reboot = false;
> > +
> > +    spapr->guest_machine_check_addr = -1;
> > +
> > +    /* Signal all vCPUs waiting on this condition */
> > +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
> >  }
> >  
> >  static void spapr_create_nvram(SpaprMachineState *spapr)
> > @@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
> >      },
> >  };
> >  
> > +static const VMStateDescription vmstate_spapr_machine_check = {
> > +    .name = "spapr_machine_check",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),

Also this should use VMSTATE_UINTTL()

> > +        VMSTATE_END_OF_LIST()
> > +    },  
> 
> This VMState descriptor is missing a .needed field because we only want
> to migrate the subsection if the guest has called NMI register, ie.
> spapr->guest_machine_check_addr != (target_ulong) -1.
> 
> > +};
> > +
> >  static const VMStateDescription vmstate_spapr = {765cf442a8afe8e5c8c6896b5072066df5129077
> >      .name = "spapr",
> >      .version_id = 3,
> > @@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
> >          &vmstate_spapr_dtb,
> >          &vmstate_spapr_cap_large_decr,
> >          &vmstate_spapr_cap_ccf_assist,
> > +        &vmstate_spapr_machine_check,
> >          NULL
> >      }
> >  };
> > @@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
> >  
> >          kvmppc_spapr_enable_inkernel_multitce();
> >      }
> > +
> > +    qemu_cond_init(&spapr->mc_delivery_cond);
> >  }
> >  
> >  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
> > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > index ee24212..c2f3991 100644
> > --- a/hw/ppc/spapr_rtas.c
> > +++ b/hw/ppc/spapr_rtas.c
> > @@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >      rtas_st(rets, 1, 100);
> >  }
> >  
> > +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> > +                                  SpaprMachineState *spapr,
> > +                                  uint32_t token, uint32_t nargs,
> > +                                  target_ulong args,
> > +                                  uint32_t nret, target_ulong rets)
> > +{
> > +    uint64_t rtas_addr = spapr_get_rtas_addr();
> > +
> > +    if (!rtas_addr) {
> > +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> > +        return;
> > +    }
> > +
> > +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +}
> > +
> > +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> > +                                   SpaprMachineState *spapr,
> > +                                   uint32_t token, uint32_t nargs,
> > +                                   target_ulong args,
> > +                                   uint32_t nret, target_ulong rets)
> > +{
> > +    if (!spapr->guest_machine_check_addr) {  
> 
> Hmm... the default value is -1. It looks like the check should rather be:
> 
>     if (spapr->guest_machine_check_addr == (target_ulong) -1) {
> 
> 
> > +        /* NMI register not called */
> > +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > +    } else {
> > +        qemu_cond_signal(&spapr->mc_delivery_cond);
> > +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +    }
> > +}
> > +
> > +
> >  static struct rtas_call {
> >      const char *name;
> >      spapr_rtas_fn fn;
> > @@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
> >      }
> >  }
> >  
> > +uint64_t spapr_get_rtas_addr(void)  
> 
> Shouldn't this be hwaddr instead of uint64_t ?
> 
> > +{
> > +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > +    int rtas_node;
> > +    const struct fdt_property *rtas_addr_prop;
> > +    void *fdt = spapr->fdt_blob;
> > +    uint32_t rtas_addr;
> > +
> > +    /* fetch rtas addr from fdt */
> > +    rtas_node = fdt_path_offset(fdt, "/rtas");
> > +    if (rtas_node == 0) {
> > +        return 0;
> > +    }
> > +
> > +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
> > +    if (!rtas_addr_prop) {  
> 
> Just for curiosity: this is ok for linux, but what about other OSes (eg. AIX) ?
> 
> > +        return 0;
> > +    }
> > +
> > +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);  
> 
> Also this assumes the OS called RTAS instantiate-rtas, but some other
> OS might have called RTAS instantiate-rtas-64 instead. I guess it is
> ok for now because SLOF only provides the 32-bit variant, but a
> comment would certainly help IMHO.
> 
> > +    return (uint64_t)rtas_addr;
> > +}
> > +
> > +
> >  static void core_rtas_register_types(void)
> >  {
> >      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
> > @@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
> >                          rtas_set_power_level);
> >      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
> >                          rtas_get_power_level);
> > +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> > +                        rtas_ibm_nmi_register);
> > +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> > +                        rtas_ibm_nmi_interlock);
> >  }
> >  
> >  type_init(core_rtas_register_types)
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 7e32f30..ec6f33e 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -187,6 +187,10 @@ struct SpaprMachineState {
> >       * occurs during the unplug process. */
> >      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
> >  
> > +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> > +    target_ulong guest_machine_check_addr;
> > +    QemuCond mc_delivery_cond;
> > +
> >      /*< public >*/
> >      char *kvm_type;
> >      char *host_model;
> > @@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
> >  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
> >  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
> >  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
> > +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
> > +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
> >  
> > -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
> > +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
> >  
> >  /* RTAS ibm,get-system-parameter token values */
> >  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> > @@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> >  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
> >  
> >  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> > +uint64_t spapr_get_rtas_addr(void);
> >  #endif /* HW_SPAPR_H */
> > 
> >   
> 
> 



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
  2019-04-22  7:03   ` Aravinda Prasad
  (?)
  (?)
@ 2019-05-10 16:25   ` Greg Kurz
  2019-05-13  5:40     ` Aravinda Prasad
  -1 siblings, 1 reply; 65+ messages in thread
From: Greg Kurz @ 2019-05-10 16:25 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Mon, 22 Apr 2019 12:33:16 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> Memory error such as bit flips that cannot be corrected
> by hardware are passed on to the kernel for handling.
> If the memory address in error belongs to guest then
> the guest kernel is responsible for taking suitable action.
> Patch [1] enhances KVM to exit guest with exit reason
> set to KVM_EXIT_NMI in such cases. This patch handles
> KVM_EXIT_NMI exit.
> 
> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>     (e20bbd3d and related commits)
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c          |    3 +++
>  hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
>  hw/ppc/spapr_rtas.c     |    5 +++++
>  include/hw/ppc/spapr.h  |    6 ++++++
>  target/ppc/kvm.c        |   16 ++++++++++++++++
>  target/ppc/kvm_ppc.h    |    2 ++
>  target/ppc/trace-events |    2 ++
>  7 files changed, 56 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6642cb5..2779efe 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
>  
>      spapr->cas_reboot = false;
>  
> +    spapr->mc_status = -1;
>      spapr->guest_machine_check_addr = -1;
>  
>      /* Signal all vCPUs waiting on this condition */
> @@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
>      .minimum_version_id = 1,
>      .fields = (VMStateField[]) {
>          VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> +        VMSTATE_INT32(mc_status, SpaprMachineState),
>          VMSTATE_END_OF_LIST()
>      },
>  };
> @@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
>          kvmppc_spapr_enable_inkernel_multitce();
>      }
>  
> +    spapr->mc_status = -1;

Since this is done at reset, do we need it here ?

>      qemu_cond_init(&spapr->mc_delivery_cond);
>  }
>  
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index ae0f093..9922a23 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +
> +    while (spapr->mc_status != -1) {
> +        /*
> +         * Check whether the same CPU got machine check error
> +         * while still handling the mc error (i.e., before
> +         * that CPU called "ibm,nmi-interlock"

Missing )

> +         */
> +        if (spapr->mc_status == cpu->vcpu_id) {
> +            qemu_system_guest_panicked(NULL);

If we don't also return, is there a chance we end up stuck in
qemu_cond_wait_iothread() below ?

> +        }
> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> +        /* Meanwhile if the system is reset, then just return */
> +        if (spapr->guest_machine_check_addr == -1) {
> +            return;
> +        }
> +    }
> +    spapr->mc_status = cpu->vcpu_id;
> +}
> +
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                              uint32_t token, uint32_t nargs,
>                              target_ulong args,
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index c2f3991..d3499f9 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>          /* NMI register not called */
>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>      } else {
> +        /*
> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> +         * hence unset mc_status.
> +         */
> +        spapr->mc_status = -1;
>          qemu_cond_signal(&spapr->mc_delivery_cond);
>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>      }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ec6f33e..f7204d0 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -189,6 +189,11 @@ struct SpaprMachineState {
>  
>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>      target_ulong guest_machine_check_addr;
> +    /*
> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
> +     * handling the mc.
> +     */
> +    int mc_status;
>      QemuCond mc_delivery_cond;
>  
>      /*< public >*/
> @@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 9e86db0..5eedce8 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ret = 0;
>          break;
>  
> +    case KVM_EXIT_NMI:
> +        trace_kvm_handle_nmi_exception();
> +        ret = kvm_handle_nmi(cpu, run);
> +        break;
> +
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> @@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>      return data & 0xffff;
>  }
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
> +{
> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> +
> +    cpu_synchronize_state(CPU(cpu));
> +
> +    spapr_mce_req_event(cpu, recovered);
> +
> +    return 0;
> +}
> +
>  int kvmppc_enable_hwrng(void)
>  {
>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 2238513..6edc42f 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>  
> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
> +
>  #else
>  
>  static inline uint32_t kvmppc_get_tbfreq(void)
> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
> index 7b3cfe1..d5691d2 100644
> --- a/target/ppc/trace-events
> +++ b/target/ppc/trace-events
> @@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>  kvm_handle_epr(void) "handle epr"
>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>  kvm_handle_debug_exception(void) "handle debug exception"
> +kvm_handle_nmi_exception(void) "handle NMI exception"
> +

new blank line at EOF.

> 
> 



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-05-10  9:06   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2019-05-10  9:54     ` David Gibson
  2019-05-10 14:33     ` Greg Kurz
@ 2019-05-13  4:53     ` Aravinda Prasad
  2 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-13  4:53 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Friday 10 May 2019 02:36 PM, Greg Kurz wrote:
> On Mon, 22 Apr 2019 12:32:58 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> This patch adds support in QEMU to handle "ibm,nmi-register"
>> and "ibm,nmi-interlock" RTAS calls.
>>
>> The machine check notification address is saved when the
>> OS issues "ibm,nmi-register" RTAS call.
>>
>> This patch also handles the case when multiple processors
>> experience machine check at or about the same time by
>> handling "ibm,nmi-interlock" call. In such cases, as per
>> PAPR, subsequent processors serialize waiting for the first
>> processor to issue the "ibm,nmi-interlock" call. The second
>> processor that also received a machine check error waits
>> till the first processor is done reading the error log.
>> The first processor issues "ibm,nmi-interlock" call
>> when the error log is consumed. This patch implements the
>> releasing part of the error-log while subsequent patch
>> (which builds error log) handles the locking part.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |   18 ++++++++++++++
>>  hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |    9 ++++++-
>>  3 files changed, 87 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index c56939a..6642cb5 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
>>      first_ppc_cpu->env.gpr[5] = 0;
>>  
>>      spapr->cas_reboot = false;
>> +
>> +    spapr->guest_machine_check_addr = -1;
>> +
>> +    /* Signal all vCPUs waiting on this condition */
>> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>>  }
>>  
>>  static void spapr_create_nvram(SpaprMachineState *spapr)
>> @@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
>>      },
>>  };
>>  
>> +static const VMStateDescription vmstate_spapr_machine_check = {
>> +    .name = "spapr_machine_check",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>> +        VMSTATE_END_OF_LIST()
>> +    },
> 
> This VMState descriptor is missing a .needed field because we only want
> to migrate the subsection if the guest has called NMI register, ie.
> spapr->guest_machine_check_addr != (target_ulong) -1.

Ok.. let me check.

> 
>> +};
>> +
>>  static const VMStateDescription vmstate_spapr = {
>>      .name = "spapr",
>>      .version_id = 3,
>> @@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
>>          &vmstate_spapr_dtb,
>>          &vmstate_spapr_cap_large_decr,
>>          &vmstate_spapr_cap_ccf_assist,
>> +        &vmstate_spapr_machine_check,
>>          NULL
>>      }
>>  };
>> @@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
>>  
>>          kvmppc_spapr_enable_inkernel_multitce();
>>      }
>> +
>> +    qemu_cond_init(&spapr->mc_delivery_cond);
>>  }
>>  
>>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index ee24212..c2f3991 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>      rtas_st(rets, 1, 100);
>>  }
>>  
>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>> +                                  SpaprMachineState *spapr,
>> +                                  uint32_t token, uint32_t nargs,
>> +                                  target_ulong args,
>> +                                  uint32_t nret, target_ulong rets)
>> +{
>> +    uint64_t rtas_addr = spapr_get_rtas_addr();
>> +
>> +    if (!rtas_addr) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
>> +    if (!spapr->guest_machine_check_addr) {
> 
> Hmm... the default value is -1. It looks like the check should rather be:
> 
>     if (spapr->guest_machine_check_addr == (target_ulong) -1) {

ok..

> 
> 
>> +        /* NMI register not called */
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +    } else {
>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +    }
>> +}
>> +
>> +
>>  static struct rtas_call {
>>      const char *name;
>>      spapr_rtas_fn fn;
>> @@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>>      }
>>  }
>>  
>> +uint64_t spapr_get_rtas_addr(void)
> 
> Shouldn't this be hwaddr instead of uint64_t ?

Yes, I will change it.

> 
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int rtas_node;
>> +    const struct fdt_property *rtas_addr_prop;
>> +    void *fdt = spapr->fdt_blob;
>> +    uint32_t rtas_addr;
>> +
>> +    /* fetch rtas addr from fdt */
>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>> +    if (rtas_node == 0) {
>> +        return 0;
>> +    }
>> +
>> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
>> +    if (!rtas_addr_prop) {
> 
> Just for curiosity: this is ok for linux, but what about other OSes (eg. AIX) ?

Really not sure! Need to check.

> 
>> +        return 0;
>> +    }
>> +
>> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);
> 
> Also this assumes the OS called RTAS instantiate-rtas, but some other
> OS might have called RTAS instantiate-rtas-64 instead. I guess it is
> ok for now because SLOF only provides the 32-bit variant, but a
> comment would certainly help IMHO.

Sure..

Regards,
Aravinda

> 
>> +    return (uint64_t)rtas_addr;
>> +}
>> +
>> +
>>  static void core_rtas_register_types(void)
>>  {
>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>> @@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
>>                          rtas_set_power_level);
>>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>                          rtas_get_power_level);
>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>> +                        rtas_ibm_nmi_register);
>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>> +                        rtas_ibm_nmi_interlock);
>>  }
>>  
>>  type_init(core_rtas_register_types)
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 7e32f30..ec6f33e 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -187,6 +187,10 @@ struct SpaprMachineState {
>>       * occurs during the unplug process. */
>>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>>  
>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>> +    target_ulong guest_machine_check_addr;
>> +    QemuCond mc_delivery_cond;
>> +
>>      /*< public >*/
>>      char *kvm_type;
>>      char *host_model;
>> @@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>  
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>  
>>  /* RTAS ibm,get-system-parameter token values */
>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>> @@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>  
>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>> +uint64_t spapr_get_rtas_addr(void);
>>  #endif /* HW_SPAPR_H */
>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2019-05-10 14:33     ` Greg Kurz
@ 2019-05-13  4:57       ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-13  4:57 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Friday 10 May 2019 08:03 PM, Greg Kurz wrote:
> On Fri, 10 May 2019 11:06:04 +0200
> Greg Kurz <groug@kaod.org> wrote:
> 
>> On Mon, 22 Apr 2019 12:32:58 +0530
>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>
>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>> and "ibm,nmi-interlock" RTAS calls.
>>>
>>> The machine check notification address is saved when the
>>> OS issues "ibm,nmi-register" RTAS call.
>>>
>>> This patch also handles the case when multiple processors
>>> experience machine check at or about the same time by
>>> handling "ibm,nmi-interlock" call. In such cases, as per
>>> PAPR, subsequent processors serialize waiting for the first
>>> processor to issue the "ibm,nmi-interlock" call. The second
>>> processor that also received a machine check error waits
>>> till the first processor is done reading the error log.
>>> The first processor issues "ibm,nmi-interlock" call
>>> when the error log is consumed. This patch implements the
>>> releasing part of the error-log while subsequent patch
>>> (which builds error log) handles the locking part.
>>>
>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>> ---
>>>  hw/ppc/spapr.c         |   18 ++++++++++++++
>>>  hw/ppc/spapr_rtas.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/hw/ppc/spapr.h |    9 ++++++-
>>>  3 files changed, 87 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index c56939a..6642cb5 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -1805,6 +1805,11 @@ static void spapr_machine_reset(void)
>>>      first_ppc_cpu->env.gpr[5] = 0;
>>>  
>>>      spapr->cas_reboot = false;
>>> +
>>> +    spapr->guest_machine_check_addr = -1;
>>> +
>>> +    /* Signal all vCPUs waiting on this condition */
>>> +    qemu_cond_broadcast(&spapr->mc_delivery_cond);
>>>  }
>>>  
>>>  static void spapr_create_nvram(SpaprMachineState *spapr)
>>> @@ -2095,6 +2100,16 @@ static const VMStateDescription vmstate_spapr_dtb = {
>>>      },
>>>  };
>>>  
>>> +static const VMStateDescription vmstate_spapr_machine_check = {
>>> +    .name = "spapr_machine_check",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .fields = (VMStateField[]) {
>>> +        VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> 
> Also this should use VMSTATE_UINTTL()

sure..

Regards,
Aravinda

> 
>>> +        VMSTATE_END_OF_LIST()
>>> +    },  
>>
>> This VMState descriptor is missing a .needed field because we only want
>> to migrate the subsection if the guest has called NMI register, ie.
>> spapr->guest_machine_check_addr != (target_ulong) -1.
>>
>>> +};
>>> +
>>>  static const VMStateDescription vmstate_spapr = {765cf442a8afe8e5c8c6896b5072066df5129077
>>>      .name = "spapr",
>>>      .version_id = 3,
>>> @@ -2127,6 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
>>>          &vmstate_spapr_dtb,
>>>          &vmstate_spapr_cap_large_decr,
>>>          &vmstate_spapr_cap_ccf_assist,
>>> +        &vmstate_spapr_machine_check,
>>>          NULL
>>>      }
>>>  };
>>> @@ -3068,6 +3084,8 @@ static void spapr_machine_init(MachineState *machine)
>>>  
>>>          kvmppc_spapr_enable_inkernel_multitce();
>>>      }
>>> +
>>> +    qemu_cond_init(&spapr->mc_delivery_cond);
>>>  }
>>>  
>>>  static int spapr_kvm_type(MachineState *machine, const char *vm_type)
>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>> index ee24212..c2f3991 100644
>>> --- a/hw/ppc/spapr_rtas.c
>>> +++ b/hw/ppc/spapr_rtas.c
>>> @@ -348,6 +348,39 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>      rtas_st(rets, 1, 100);
>>>  }
>>>  
>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>> +                                  SpaprMachineState *spapr,
>>> +                                  uint32_t token, uint32_t nargs,
>>> +                                  target_ulong args,
>>> +                                  uint32_t nret, target_ulong rets)
>>> +{
>>> +    uint64_t rtas_addr = spapr_get_rtas_addr();
>>> +
>>> +    if (!rtas_addr) {
>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>> +        return;
>>> +    }
>>> +
>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +}
>>> +
>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>> +                                   SpaprMachineState *spapr,
>>> +                                   uint32_t token, uint32_t nargs,
>>> +                                   target_ulong args,
>>> +                                   uint32_t nret, target_ulong rets)
>>> +{
>>> +    if (!spapr->guest_machine_check_addr) {  
>>
>> Hmm... the default value is -1. It looks like the check should rather be:
>>
>>     if (spapr->guest_machine_check_addr == (target_ulong) -1) {
>>
>>
>>> +        /* NMI register not called */
>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>> +    } else {
>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +    }
>>> +}
>>> +
>>> +
>>>  static struct rtas_call {
>>>      const char *name;
>>>      spapr_rtas_fn fn;
>>> @@ -466,6 +499,30 @@ void spapr_load_rtas(SpaprMachineState *spapr, void *fdt, hwaddr addr)
>>>      }
>>>  }
>>>  
>>> +uint64_t spapr_get_rtas_addr(void)  
>>
>> Shouldn't this be hwaddr instead of uint64_t ?
>>
>>> +{
>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>> +    int rtas_node;
>>> +    const struct fdt_property *rtas_addr_prop;
>>> +    void *fdt = spapr->fdt_blob;
>>> +    uint32_t rtas_addr;
>>> +
>>> +    /* fetch rtas addr from fdt */
>>> +    rtas_node = fdt_path_offset(fdt, "/rtas");
>>> +    if (rtas_node == 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    rtas_addr_prop = fdt_get_property(fdt, rtas_node, "linux,rtas-base", NULL);
>>> +    if (!rtas_addr_prop) {  
>>
>> Just for curiosity: this is ok for linux, but what about other OSes (eg. AIX) ?
>>
>>> +        return 0;
>>> +    }
>>> +
>>> +    rtas_addr = fdt32_to_cpu(*(uint32_t *)rtas_addr_prop->data);  
>>
>> Also this assumes the OS called RTAS instantiate-rtas, but some other
>> OS might have called RTAS instantiate-rtas-64 instead. I guess it is
>> ok for now because SLOF only provides the 32-bit variant, but a
>> comment would certainly help IMHO.
>>
>>> +    return (uint64_t)rtas_addr;
>>> +}
>>> +
>>> +
>>>  static void core_rtas_register_types(void)
>>>  {
>>>      spapr_rtas_register(RTAS_DISPLAY_CHARACTER, "display-character",
>>> @@ -489,6 +546,10 @@ static void core_rtas_register_types(void)
>>>                          rtas_set_power_level);
>>>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>>                          rtas_get_power_level);
>>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>>> +                        rtas_ibm_nmi_register);
>>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>>> +                        rtas_ibm_nmi_interlock);
>>>  }
>>>  
>>>  type_init(core_rtas_register_types)
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index 7e32f30..ec6f33e 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -187,6 +187,10 @@ struct SpaprMachineState {
>>>       * occurs during the unplug process. */
>>>      QTAILQ_HEAD(, SpaprDimmState) pending_dimm_unplugs;
>>>  
>>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>>> +    target_ulong guest_machine_check_addr;
>>> +    QemuCond mc_delivery_cond;
>>> +
>>>      /*< public >*/
>>>      char *kvm_type;
>>>      char *host_model;
>>> @@ -623,8 +627,10 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>>  #define RTAS_IBM_CREATE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x27)
>>>  #define RTAS_IBM_REMOVE_PE_DMA_WINDOW           (RTAS_TOKEN_BASE + 0x28)
>>>  #define RTAS_IBM_RESET_PE_DMA_WINDOW            (RTAS_TOKEN_BASE + 0x29)
>>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2A)
>>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2B)
>>>  
>>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2A)
>>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2C)
>>>  
>>>  /* RTAS ibm,get-system-parameter token values */
>>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>>> @@ -874,4 +880,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>>>  #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
>>>  
>>>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>>> +uint64_t spapr_get_rtas_addr(void);
>>>  #endif /* HW_SPAPR_H */
>>>
>>>   
>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-10  9:52       ` David Gibson
@ 2019-05-13  5:00         ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-13  5:00 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, qemu-ppc, aik, qemu-devel



On Friday 10 May 2019 03:22 PM, David Gibson wrote:
> On Fri, May 10, 2019 at 12:35:13PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Friday 10 May 2019 12:12 PM, David Gibson wrote:
>>> On Mon, Apr 22, 2019 at 12:33:26PM +0530, Aravinda Prasad wrote:

[...]

>>>> +    /* Save gpr[3] in the guest endian mode */
>>>> +    if ((*pcc->interrupts_big_endian)(cpu)) {
>>>> +        env->gpr[3] = cpu_to_be64(rtas_addr + RTAS_ERRLOG_OFFSET);
>>>
>>> I don't think this is right.  AIUI env->gpr[] are all stored in *host*
>>> endianness (for ease of doing arithmetic).
>>
>> env-gpr[3] is later used by guest to fetch the RTAS log. My guess is
>> that we will not do an endianness change of all the gprs during a switch
>> from host to guest (that will be costly).
> 
> There's no need to "change endianness".  In TCG the host needs to do
> arithmetic on the values and so they are in host endian.  With KVM the
> env values are only synchronized when we enter/exit KVM and they're
> going to registers, not memory and so have no endianness.

Ah.. ok.

> 
>> But let me cross check.
>>
>>>
>>>> +    } else {
>>>> +        env->gpr[3] = cpu_to_le64(rtas_addr + RTAS_ERRLOG_OFFSET);
>>>> +    }
>>>> +
>>>> +    env->nip = spapr->guest_machine_check_addr;
>>>> +}
>>>> +
>>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>>  {
>>>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>> @@ -640,6 +881,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>>          }
>>>>      }
>>>>      spapr->mc_status = cpu->vcpu_id;
>>>> +
>>>> +    spapr_mce_dispatch_elog(cpu, recovered);
>>>> +
>>>> +    return;
>>>>  }
>>>>  
>>>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index f7204d0..03f34bf 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -661,6 +661,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>>>  #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
>>>>  #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
>>>>  
>>>> +/* Offset from rtas-base where error log is placed */
>>>> +#define RTAS_ERRLOG_OFFSET       0x25
>>>
>>> Is this offset PAPR defined, or chosen here?  Using an entirely
>>> unaliged (odd) address seems a very strange choice.
>>
>> This is not PAPR defined. I will make it 0x30. Or do you prefer any
>> other offset?
> 
> 0x30 should be fine.

ok..

> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
  2019-05-10 16:25   ` Greg Kurz
@ 2019-05-13  5:40     ` Aravinda Prasad
  2019-05-13  5:56       ` David Gibson
  0 siblings, 1 reply; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-13  5:40 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Friday 10 May 2019 09:55 PM, Greg Kurz wrote:
> On Mon, 22 Apr 2019 12:33:16 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> Memory error such as bit flips that cannot be corrected
>> by hardware are passed on to the kernel for handling.
>> If the memory address in error belongs to guest then
>> the guest kernel is responsible for taking suitable action.
>> Patch [1] enhances KVM to exit guest with exit reason
>> set to KVM_EXIT_NMI in such cases. This patch handles
>> KVM_EXIT_NMI exit.
>>
>> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
>>     (e20bbd3d and related commits)
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c          |    3 +++
>>  hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
>>  hw/ppc/spapr_rtas.c     |    5 +++++
>>  include/hw/ppc/spapr.h  |    6 ++++++
>>  target/ppc/kvm.c        |   16 ++++++++++++++++
>>  target/ppc/kvm_ppc.h    |    2 ++
>>  target/ppc/trace-events |    2 ++
>>  7 files changed, 56 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 6642cb5..2779efe 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
>>  
>>      spapr->cas_reboot = false;
>>  
>> +    spapr->mc_status = -1;
>>      spapr->guest_machine_check_addr = -1;
>>  
>>      /* Signal all vCPUs waiting on this condition */
>> @@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
>>      .minimum_version_id = 1,
>>      .fields = (VMStateField[]) {
>>          VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
>> +        VMSTATE_INT32(mc_status, SpaprMachineState),
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> @@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
>>          kvmppc_spapr_enable_inkernel_multitce();
>>      }
>>  
>> +    spapr->mc_status = -1;
> 
> Since this is done at reset, do we need it here ?

Yes, because we need to initialize this on a fresh boot. I need to
check, but if spapr_machine_reset() is called every time a system boots
then we don't need qemu_cond_init() here as well.

> 
>>      qemu_cond_init(&spapr->mc_delivery_cond);
>>  }
>>  
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index ae0f093..9922a23 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>  }
>>  
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +
>> +    while (spapr->mc_status != -1) {
>> +        /*
>> +         * Check whether the same CPU got machine check error
>> +         * while still handling the mc error (i.e., before
>> +         * that CPU called "ibm,nmi-interlock"
> 
> Missing )

ok.

> 
>> +         */
>> +        if (spapr->mc_status == cpu->vcpu_id) {
>> +            qemu_system_guest_panicked(NULL);
> 
> If we don't also return, is there a chance we end up stuck in
> qemu_cond_wait_iothread() below ?

I think I need to return here


> 
>> +        }
>> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
>> +        /* Meanwhile if the system is reset, then just return */
>> +        if (spapr->guest_machine_check_addr == -1) {
>> +            return;
>> +        }
>> +    }
>> +    spapr->mc_status = cpu->vcpu_id;
>> +}
>> +
>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                              uint32_t token, uint32_t nargs,
>>                              target_ulong args,
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index c2f3991..d3499f9 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>          /* NMI register not called */
>>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>      } else {
>> +        /*
>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>> +         * hence unset mc_status.
>> +         */
>> +        spapr->mc_status = -1;
>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>      }
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index ec6f33e..f7204d0 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -189,6 +189,11 @@ struct SpaprMachineState {
>>  
>>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>>      target_ulong guest_machine_check_addr;
>> +    /*
>> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
>> +     * handling the mc.
>> +     */
>> +    int mc_status;
>>      QemuCond mc_delivery_cond;
>>  
>>      /*< public >*/
>> @@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
>>  int spapr_max_server_number(SpaprMachineState *spapr);
>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                        uint64_t pte0, uint64_t pte1);
>> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>  
>>  /* DRC callbacks. */
>>  void spapr_core_release(DeviceState *dev);
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 9e86db0..5eedce8 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>          ret = 0;
>>          break;
>>  
>> +    case KVM_EXIT_NMI:
>> +        trace_kvm_handle_nmi_exception();
>> +        ret = kvm_handle_nmi(cpu, run);
>> +        break;
>> +
>>      default:
>>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>>          ret = -1;
>> @@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>      return data & 0xffff;
>>  }
>>  
>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
>> +{
>> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
>> +
>> +    cpu_synchronize_state(CPU(cpu));
>> +
>> +    spapr_mce_req_event(cpu, recovered);
>> +
>> +    return 0;
>> +}
>> +
>>  int kvmppc_enable_hwrng(void)
>>  {
>>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index 2238513..6edc42f 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
>> @@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
>>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
>>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
>>  
>> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
>> +
>>  #else
>>  
>>  static inline uint32_t kvmppc_get_tbfreq(void)
>> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
>> index 7b3cfe1..d5691d2 100644
>> --- a/target/ppc/trace-events
>> +++ b/target/ppc/trace-events
>> @@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
>>  kvm_handle_epr(void) "handle epr"
>>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
>>  kvm_handle_debug_exception(void) "handle debug exception"
>> +kvm_handle_nmi_exception(void) "handle NMI exception"
>> +
> 
> new blank line at EOF.

ok

> 
>>
>>
> 
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 3/6] target/ppc: Handle NMI guest exit
  2019-05-13  5:40     ` Aravinda Prasad
@ 2019-05-13  5:56       ` David Gibson
  0 siblings, 0 replies; 65+ messages in thread
From: David Gibson @ 2019-05-13  5:56 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 8296 bytes --]

On Mon, May 13, 2019 at 11:10:28AM +0530, Aravinda Prasad wrote:
> 
> 
> On Friday 10 May 2019 09:55 PM, Greg Kurz wrote:
> > On Mon, 22 Apr 2019 12:33:16 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> > 
> >> Memory error such as bit flips that cannot be corrected
> >> by hardware are passed on to the kernel for handling.
> >> If the memory address in error belongs to guest then
> >> the guest kernel is responsible for taking suitable action.
> >> Patch [1] enhances KVM to exit guest with exit reason
> >> set to KVM_EXIT_NMI in such cases. This patch handles
> >> KVM_EXIT_NMI exit.
> >>
> >> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
> >>     (e20bbd3d and related commits)
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c          |    3 +++
> >>  hw/ppc/spapr_events.c   |   22 ++++++++++++++++++++++
> >>  hw/ppc/spapr_rtas.c     |    5 +++++
> >>  include/hw/ppc/spapr.h  |    6 ++++++
> >>  target/ppc/kvm.c        |   16 ++++++++++++++++
> >>  target/ppc/kvm_ppc.h    |    2 ++
> >>  target/ppc/trace-events |    2 ++
> >>  7 files changed, 56 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 6642cb5..2779efe 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -1806,6 +1806,7 @@ static void spapr_machine_reset(void)
> >>  
> >>      spapr->cas_reboot = false;
> >>  
> >> +    spapr->mc_status = -1;
> >>      spapr->guest_machine_check_addr = -1;
> >>  
> >>      /* Signal all vCPUs waiting on this condition */
> >> @@ -2106,6 +2107,7 @@ static const VMStateDescription vmstate_spapr_machine_check = {
> >>      .minimum_version_id = 1,
> >>      .fields = (VMStateField[]) {
> >>          VMSTATE_UINT64(guest_machine_check_addr, SpaprMachineState),
> >> +        VMSTATE_INT32(mc_status, SpaprMachineState),
> >>          VMSTATE_END_OF_LIST()
> >>      },
> >>  };
> >> @@ -3085,6 +3087,7 @@ static void spapr_machine_init(MachineState *machine)
> >>          kvmppc_spapr_enable_inkernel_multitce();
> >>      }
> >>  
> >> +    spapr->mc_status = -1;
> > 
> > Since this is done at reset, do we need it here ?
> 
> Yes, because we need to initialize this on a fresh boot. I need to
> check, but if spapr_machine_reset() is called every time a system boots

It is.

> then we don't need qemu_cond_init() here as well.
> 
> > 
> >>      qemu_cond_init(&spapr->mc_delivery_cond);
> >>  }
> >>  
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index ae0f093..9922a23 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -620,6 +620,28 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> >>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> >>  }
> >>  
> >> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >> +{
> >> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +
> >> +    while (spapr->mc_status != -1) {
> >> +        /*
> >> +         * Check whether the same CPU got machine check error
> >> +         * while still handling the mc error (i.e., before
> >> +         * that CPU called "ibm,nmi-interlock"
> > 
> > Missing )
> 
> ok.
> 
> > 
> >> +         */
> >> +        if (spapr->mc_status == cpu->vcpu_id) {
> >> +            qemu_system_guest_panicked(NULL);
> > 
> > If we don't also return, is there a chance we end up stuck in
> > qemu_cond_wait_iothread() below ?
> 
> I think I need to return here
> 
> 
> > 
> >> +        }
> >> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> >> +        /* Meanwhile if the system is reset, then just return */
> >> +        if (spapr->guest_machine_check_addr == -1) {
> >> +            return;
> >> +        }
> >> +    }
> >> +    spapr->mc_status = cpu->vcpu_id;
> >> +}
> >> +
> >>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>                              uint32_t token, uint32_t nargs,
> >>                              target_ulong args,
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index c2f3991..d3499f9 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -375,6 +375,11 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>          /* NMI register not called */
> >>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>      } else {
> >> +        /*
> >> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> >> +         * hence unset mc_status.
> >> +         */
> >> +        spapr->mc_status = -1;
> >>          qemu_cond_signal(&spapr->mc_delivery_cond);
> >>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>      }
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index ec6f33e..f7204d0 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -189,6 +189,11 @@ struct SpaprMachineState {
> >>  
> >>      /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> >>      target_ulong guest_machine_check_addr;
> >> +    /*
> >> +     * mc_status is set to -1 if mc is not in progress, else is set to the CPU
> >> +     * handling the mc.
> >> +     */
> >> +    int mc_status;
> >>      QemuCond mc_delivery_cond;
> >>  
> >>      /*< public >*/
> >> @@ -792,6 +797,7 @@ void spapr_clear_pending_events(SpaprMachineState *spapr);
> >>  int spapr_max_server_number(SpaprMachineState *spapr);
> >>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> >>                        uint64_t pte0, uint64_t pte1);
> >> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> >>  
> >>  /* DRC callbacks. */
> >>  void spapr_core_release(DeviceState *dev);
> >> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> >> index 9e86db0..5eedce8 100644
> >> --- a/target/ppc/kvm.c
> >> +++ b/target/ppc/kvm.c
> >> @@ -1759,6 +1759,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> >>          ret = 0;
> >>          break;
> >>  
> >> +    case KVM_EXIT_NMI:
> >> +        trace_kvm_handle_nmi_exception();
> >> +        ret = kvm_handle_nmi(cpu, run);
> >> +        break;
> >> +
> >>      default:
> >>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
> >>          ret = -1;
> >> @@ -2837,6 +2842,17 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> >>      return data & 0xffff;
> >>  }
> >>  
> >> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
> >> +{
> >> +    bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
> >> +
> >> +    cpu_synchronize_state(CPU(cpu));
> >> +
> >> +    spapr_mce_req_event(cpu, recovered);
> >> +
> >> +    return 0;
> >> +}
> >> +
> >>  int kvmppc_enable_hwrng(void)
> >>  {
> >>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_PPC_HWRNG)) {
> >> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> >> index 2238513..6edc42f 100644
> >> --- a/target/ppc/kvm_ppc.h
> >> +++ b/target/ppc/kvm_ppc.h
> >> @@ -80,6 +80,8 @@ bool kvmppc_hpt_needs_host_contiguous_pages(void);
> >>  void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
> >>  void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
> >>  
> >> +int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
> >> +
> >>  #else
> >>  
> >>  static inline uint32_t kvmppc_get_tbfreq(void)
> >> diff --git a/target/ppc/trace-events b/target/ppc/trace-events
> >> index 7b3cfe1..d5691d2 100644
> >> --- a/target/ppc/trace-events
> >> +++ b/target/ppc/trace-events
> >> @@ -28,3 +28,5 @@ kvm_handle_papr_hcall(void) "handle PAPR hypercall"
> >>  kvm_handle_epr(void) "handle epr"
> >>  kvm_handle_watchdog_expiry(void) "handle watchdog expiry"
> >>  kvm_handle_debug_exception(void) "handle debug exception"
> >> +kvm_handle_nmi_exception(void) "handle NMI exception"
> >> +
> > 
> > new blank line at EOF.
> 
> ok
> 
> > 
> >>
> >>
> > 
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-10  9:53       ` David Gibson
@ 2019-05-13 10:30         ` Aravinda Prasad
  2019-05-14  4:47           ` David Gibson
  0 siblings, 1 reply; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-13 10:30 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, aik, qemu-ppc, qemu-devel



On Friday 10 May 2019 03:23 PM, David Gibson wrote:
> On Fri, May 10, 2019 at 12:45:29PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Friday 10 May 2019 12:16 PM, David Gibson wrote:
>>> On Mon, Apr 22, 2019 at 12:33:35PM +0530, Aravinda Prasad wrote:
>>>> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
>>>> the KVM causes guest exit with NMI as exit reason
>>>> when it encounters a machine check exception on the
>>>> address belonging to a guest. Without this capability
>>>> enabled, KVM redirects machine check exceptions to
>>>> guest's 0x200 vector.
>>>>
>>>> This patch also deals with the case when a guest with
>>>> the KVM_CAP_PPC_FWNMI capability enabled is attempted
>>>> to migrate to a host that does not support this
>>>> capability.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/ppc/spapr.c         |    1 +
>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>>>>  hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
>>>>  include/hw/ppc/spapr.h |    4 +++-
>>>>  target/ppc/kvm.c       |   14 ++++++++++++++
>>>>  target/ppc/kvm_ppc.h   |    6 ++++++
>>>>  6 files changed, 64 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index ffd1715..44e09bb 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>      smc->irq = &spapr_irq_xics;
>>>>      smc->dr_phb_enabled = true;
>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>> index edc5ed0..5b3af04 100644
>>>> --- a/hw/ppc/spapr_caps.c
>>>> +++ b/hw/ppc/spapr_caps.c
>>>> @@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>>>      }
>>>>  }
>>>>  
>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>> +                                Error **errp)
>>>> +{
>>>> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
>>>> +
>>>> +    if (!val) {
>>>> +        return; /* Disabled by default */
>>>> +    }
>>>> +
>>>> +    if (kvm_enabled()) {
>>>> +        if (kvmppc_fwnmi_enable(cpu)) {
>>>> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>      [SPAPR_CAP_HTM] = {
>>>>          .name = "htm",
>>>> @@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>          .type = "bool",
>>>>          .apply = cap_ccf_assist_apply,
>>>>      },
>>>> +    [SPAPR_CAP_FWNMI_MCE] = {
>>>> +        .name = "fwnmi-mce",
>>>> +        .description = "Handle fwnmi machine check exceptions",
>>>> +        .index = SPAPR_CAP_FWNMI_MCE,
>>>> +        .get = spapr_cap_get_bool,
>>>> +        .set = spapr_cap_set_bool,
>>>> +        .type = "bool",
>>>> +        .apply = cap_fwnmi_mce_apply,
>>>> +    },
>>>>  };
>>>>  
>>>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
>>>> @@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>>>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>>>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>>>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>>>>  
>>>>  void spapr_caps_init(SpaprMachineState *spapr)
>>>>  {
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index d3499f9..997cf19 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -49,6 +49,7 @@
>>>>  #include "hw/ppc/fdt.h"
>>>>  #include "target/ppc/mmu-hash64.h"
>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>> +#include "kvm_ppc.h"
>>>>  
>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>                                     uint32_t token, uint32_t nargs,
>>>> @@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>                                    target_ulong args,
>>>>                                    uint32_t nret, target_ulong rets)
>>>>  {
>>>> +    int ret;
>>>>      uint64_t rtas_addr = spapr_get_rtas_addr();
>>>>  
>>>>      if (!rtas_addr) {
>>>> @@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>          return;
>>>>      }
>>>>  
>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>
>>> You shouldn't need this here as well as in cap_fwnmi_mce_apply().
>>>
>>> Instead, you should unconditionally fail the nmi-register if the
>>> capability is not enabled.
>>
>> cap_fwnmi is not enabled by default, because if it is enabled by default
>> them KVM will start routing machine check exceptions via guest exit
>> instead of routing it to guest's 0x200.
>>
>> During early boot since guest has not yet issued nmi-register, KVM is
>> expected to route exceptions to 0x200. Therefore we enable cap_fwnmi
>> only when a guest issues nmi-register.
> 
> Except that's not true - you enable it in cap_fwnmi_mce_apply() which
> will be executed whenever the machine capability is enabled.

I enable cap_fwnmi in cap_fwnmi_mce_apply() only when the "val" argument
(which is the effective cap value) is set. In early boot "val" is not
set as cap_fwnmi by default is not set, hence cap_fwnmi is not enabled.

My understanding is that, cap_fwnmi_mce_apply() is also called during
migration on the target machine. If effective cap for cap_fwnmi is
enabled on source machine than I think "val" will be set when
cap_fwnmi_mce_apply() is called on target machine. I then call
kvmppc_fwnmi_enable() to enable cap_fwnmi on target.

Regards,
Aravinda

> 
>> Or we should take the approach of enabling this capability by default
>> and then from QEMU route the error to 0x200 if guest has not issued
>> nmi-register.
>>
>>>
>>>> +    if (ret == 1) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (ret < 0) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>> +        return;
>>>> +    }
>>>> +
>>>>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>  }
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index 03f34bf..9d16ad1 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -78,8 +78,10 @@ typedef enum {
>>>>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>>>>  /* Count Cache Flush Assist HW Instruction */
>>>>  #define SPAPR_CAP_CCF_ASSIST            0x09
>>>> +/* FWNMI machine check handling */
>>>> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>>>>  /* Num Caps */
>>>> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
>>>> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>>>>  
>>>>  /*
>>>>   * Capability Values
>>>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>>>> index 5eedce8..9c7b71d 100644
>>>> --- a/target/ppc/kvm.c
>>>> +++ b/target/ppc/kvm.c
>>>> @@ -83,6 +83,7 @@ static int cap_ppc_safe_indirect_branch;
>>>>  static int cap_ppc_count_cache_flush_assist;
>>>>  static int cap_ppc_nested_kvm_hv;
>>>>  static int cap_large_decr;
>>>> +static int cap_ppc_fwnmi;
>>>>  
>>>>  static uint32_t debug_inst_opcode;
>>>>  
>>>> @@ -150,6 +151,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>      kvmppc_get_cpu_characteristics(s);
>>>>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>>>>      cap_large_decr = kvmppc_get_dec_bits();
>>>> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>>>>      /*
>>>>       * Note: setting it to false because there is not such capability
>>>>       * in KVM at this moment.
>>>> @@ -2117,6 +2119,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>>>      }
>>>>  }
>>>>  
>>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>>>> +{
>>>> +    CPUState *cs = CPU(cpu);
>>>> +
>>>> +    if (!cap_ppc_fwnmi) {
>>>> +        return 1;
>>>> +    }
>>>> +
>>>> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
>>>> +}
>>>> +
>>>> +
>>>>  int kvmppc_smt_threads(void)
>>>>  {
>>>>      return cap_ppc_smt ? cap_ppc_smt : 1;
>>>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>>>> index 6edc42f..28919d3 100644
>>>> --- a/target/ppc/kvm_ppc.h
>>>> +++ b/target/ppc/kvm_ppc.h
>>>> @@ -27,6 +27,7 @@ void kvmppc_enable_h_page_init(void);
>>>>  void kvmppc_set_papr(PowerPCCPU *cpu);
>>>>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>>>>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
>>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
>>>>  int kvmppc_smt_threads(void);
>>>>  void kvmppc_hint_smt_possible(Error **errp);
>>>>  int kvmppc_set_smt_threads(int smt);
>>>> @@ -159,6 +160,11 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>>>  {
>>>>  }
>>>>  
>>>> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>>>> +{
>>>> +    return 1;
>>>> +}
>>>> +
>>>>  static inline int kvmppc_smt_threads(void)
>>>>  {
>>>>      return 1;
>>>>
>>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-04-22  7:03   ` Aravinda Prasad
                     ` (2 preceding siblings ...)
  (?)
@ 2019-05-13 11:30   ` Greg Kurz
  2019-05-14  0:08     ` David Gibson
  -1 siblings, 1 reply; 65+ messages in thread
From: Greg Kurz @ 2019-05-13 11:30 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Mon, 22 Apr 2019 12:33:26 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> Upon a machine check exception (MCE) in a guest address space,
> KVM causes a guest exit to enable QEMU to build and pass the
> error to the guest in the PAPR defined rtas error log format.
> 
> This patch builds the rtas error log, copies it to the rtas_addr
> and then invokes the guest registered machine check handler. The
> handler in the guest takes suitable action(s) depending on the type
> and criticality of the error. For example, if an error is
> unrecoverable memory corruption in an application inside the
> guest, then the guest kernel sends a SIGBUS to the application.
> For recoverable errors, the guest performs recovery actions and
> logs the error.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    4 +
>  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |    4 +
>  3 files changed, 253 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2779efe..ffd1715 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
>          error_report("Could not get size of LPAR rtas '%s'", filename);
>          exit(1);
>      }
> +
> +    /* Resize blob to accommodate error log. */
> +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
> +

This is the only user for spapr_get_rtas_size(), which is trivial.
I suggest you simply open-code it here.

But also, spapr->rtas_size is a guest visible thing, "rtas-size" prop in the
DT. Since existing machine types don't do that, I guess we should only use
the new size if cap-fwnmi-mce=on for the sake of compatibility.

>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>          error_report("Could not load LPAR rtas '%s'", filename);
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 9922a23..4032db0 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -212,6 +212,106 @@ struct hp_extended_log {
>      struct rtas_event_log_v6_hp hp;
>  } QEMU_PACKED;
>  
> +struct rtas_event_log_v6_mc {

Even if the rest of the code in this file seems to ignore CODING_STYLE,
maybe it's time to start using CamelCase.

David ?

> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> +    struct rtas_event_log_v6_section_header hdr;
> +    uint32_t fru_id;
> +    uint32_t proc_id;
> +    uint8_t error_type;
> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> +    uint8_t sub_err_type;
> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> +    uint8_t reserved_1[6];
> +    uint64_t effective_address;
> +    uint64_t logical_address;
> +} QEMU_PACKED;
> +
> +struct mc_extended_log {
> +    struct rtas_event_log_v6 v6hdr;
> +    struct rtas_event_log_v6_mc mc;
> +} QEMU_PACKED;
> +
> +struct MC_ierror_table {
> +    unsigned long srr1_mask;
> +    unsigned long srr1_value;
> +    bool nip_valid; /* nip is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_ierror_table mc_ierror_table[] = {
> +{ 0x00000000081c0000, 0x0000000000040000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000080000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x00000000000c0000, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000100000, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000140000, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000000081c0000, 0x0000000000180000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0, 0, 0, 0, 0, 0 } };
> +
> +struct MC_derror_table {
> +    unsigned long dsisr_value;
> +    bool dar_valid; /* dar is a valid indicator of faulting address */
> +    uint8_t error_type;
> +    uint8_t error_subtype;
> +    unsigned int initiator;
> +    unsigned int severity;
> +};
> +
> +static const struct MC_derror_table mc_derror_table[] = {
> +{ 0x00008000, false,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00004000, true,
> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000800, true,
> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000400, true,
> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000080, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0x00000100, true,
> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> +{ 0, false, 0, 0, 0, 0 } };
> +
> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> +
>  typedef enum EventClass {
>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>      EVENT_CLASS_EPOW                = 1,
> @@ -620,6 +720,147 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_size)
> +{
> +    g_assert(old_rtas_size < RTAS_ERRLOG_OFFSET);
> +    return RTAS_ERROR_LOG_MAX;
> +}
> +
> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> +                                        struct mc_extended_log *ext_elog)
> +{
> +    int i;
> +    CPUPPCState *env = &cpu->env;
> +    uint32_t summary;
> +    uint64_t dsisr = env->spr[SPR_DSISR];
> +
> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> +    if (recovered) {
> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> +    } else {
> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> +    }
> +
> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> +            if (mc_derror_table[i].dar_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> +            }
> +
> +            summary |= mc_derror_table[i].initiator
> +                        | mc_derror_table[i].severity;
> +
> +            return summary;
> +        }
> +    } else {
> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> +                    mc_ierror_table[i].srr1_value) {
> +                continue;
> +            }
> +
> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> +            if (mc_ierror_table[i].nip_valid) {
> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> +            }
> +
> +            summary |= mc_ierror_table[i].initiator
> +                        | mc_ierror_table[i].severity;
> +
> +            return summary;
> +        }
> +    }
> +
> +    summary |= RTAS_LOG_INITIATOR_CPU;
> +    return summary;
> +}
> +
> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    CPUState *cs = CPU(cpu);
> +    uint64_t rtas_addr;
> +    CPUPPCState *env = &cpu->env;
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +    target_ulong r3, msr = 0;
> +    struct rtas_error_log log;
> +    struct mc_extended_log *ext_elog;
> +    uint32_t summary;
> +
> +    /*
> +     * Properly set bits in MSR before we invoke the handler.
> +     * SRR0/1, DAR and DSISR are properly set by KVM
> +     */
> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> +        msr |= (1ULL << MSR_LE);
> +    }
> +
> +    if (env->msr && (1ULL << MSR_SF)) {
> +        msr |= (1ULL << MSR_SF);
> +    }
> +
> +    msr |= (1ULL << MSR_ME);
> +
> +    if (spapr->guest_machine_check_addr == -1) {

Should be (target_ulong) -1

> +        /*
> +         * This implies that we have hit a machine check between system
> +         * reset and "ibm,nmi-register". Fall back to the old machine
> +         * check behavior in such cases.
> +         */
> +        env->spr[SPR_SRR0] = env->nip;
> +        env->spr[SPR_SRR1] = env->msr;
> +        env->msr = msr;
> +        env->nip = 0x200;
> +        return;
> +    }
> +
> +    ext_elog = g_malloc0(sizeof(struct mc_extended_log));

sizeof(*ext_elog) is preferable IMHO, same remark for the other sizeof sites.

Also, I can't find the corresponding call to g_free(), which should be
somewhere in this function IIUC.

> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> +
> +    log.summary = cpu_to_be32(summary);
> +    log.extended_length = cpu_to_be32(sizeof(struct mc_extended_log));
> +
> +    /* r3 should be in BE always */
> +    r3 = cpu_to_be64(env->gpr[3]);
> +    env->msr = msr;
> +
> +    spapr_init_v6hdr(&ext_elog->v6hdr);
> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> +    ext_elog->mc.hdr.section_length =
> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> +    ext_elog->mc.hdr.section_version = 1;
> +
> +    /* get rtas addr from fdt */
> +    rtas_addr = spapr_get_rtas_addr();
> +    if (!rtas_addr) {
> +        /* Unable to fetch rtas_addr. Hence reset the guest */
> +        ppc_cpu_do_system_reset(cs);
> +    }
> +
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET, &r3, sizeof(r3));
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3),
> +                              &log, sizeof(log));
> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3) +
> +                              sizeof(log), ext_elog,
> +                              sizeof(struct mc_extended_log));
> +
> +    /* Save gpr[3] in the guest endian mode */
> +    if ((*pcc->interrupts_big_endian)(cpu)) {
> +        env->gpr[3] = cpu_to_be64(rtas_addr + RTAS_ERRLOG_OFFSET);
> +    } else {
> +        env->gpr[3] = cpu_to_le64(rtas_addr + RTAS_ERRLOG_OFFSET);
> +    }
> +
> +    env->nip = spapr->guest_machine_check_addr;
> +}
> +
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> @@ -640,6 +881,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>          }
>      }
>      spapr->mc_status = cpu->vcpu_id;
> +
> +    spapr_mce_dispatch_elog(cpu, recovered);
> +
> +    return;
>  }
>  
>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index f7204d0..03f34bf 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -661,6 +661,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
>  #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
>  
> +/* Offset from rtas-base where error log is placed */
> +#define RTAS_ERRLOG_OFFSET       0x25
> +

We already have an RTAS_ERROR_LOG_MAX macro defined in this file.
Maybe use the same "ERROR_LOG" wording for consistency.

>  static inline uint64_t ppc64_phys_to_real(uint64_t addr)
>  {
>      return addr & ~0xF000000000000000ULL;
> @@ -798,6 +801,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
>  
>  /* DRC callbacks. */
>  void spapr_core_release(DeviceState *dev);
> 
> 



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-13 11:30   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-05-14  0:08     ` David Gibson
  2019-05-14  4:26       ` Aravinda Prasad
  0 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-14  0:08 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, Aravinda Prasad

[-- Attachment #1: Type: text/plain, Size: 15528 bytes --]

On Mon, May 13, 2019 at 01:30:53PM +0200, Greg Kurz wrote:
> On Mon, 22 Apr 2019 12:33:26 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
> > Upon a machine check exception (MCE) in a guest address space,
> > KVM causes a guest exit to enable QEMU to build and pass the
> > error to the guest in the PAPR defined rtas error log format.
> > 
> > This patch builds the rtas error log, copies it to the rtas_addr
> > and then invokes the guest registered machine check handler. The
> > handler in the guest takes suitable action(s) depending on the type
> > and criticality of the error. For example, if an error is
> > unrecoverable memory corruption in an application inside the
> > guest, then the guest kernel sends a SIGBUS to the application.
> > For recoverable errors, the guest performs recovery actions and
> > logs the error.
> > 
> > Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c         |    4 +
> >  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr.h |    4 +
> >  3 files changed, 253 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 2779efe..ffd1715 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
> >          error_report("Could not get size of LPAR rtas '%s'", filename);
> >          exit(1);
> >      }
> > +
> > +    /* Resize blob to accommodate error log. */
> > +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
> > +
> 
> This is the only user for spapr_get_rtas_size(), which is trivial.
> I suggest you simply open-code it here.

I agree.

> But also, spapr->rtas_size is a guest visible thing, "rtas-size" prop in the
> DT. Since existing machine types don't do that, I guess we should only use
> the new size if cap-fwnmi-mce=on for the sake of compatibility.

Yes, that's a good idea.  Changing this is very unlikely to break a
guest, but it's easy to be safe here so let's do it.

> 
> >      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
> >          error_report("Could not load LPAR rtas '%s'", filename);
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index 9922a23..4032db0 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -212,6 +212,106 @@ struct hp_extended_log {
> >      struct rtas_event_log_v6_hp hp;
> >  } QEMU_PACKED;
> >  
> > +struct rtas_event_log_v6_mc {
> 
> Even if the rest of the code in this file seems to ignore CODING_STYLE,
> maybe it's time to start using CamelCase.
> 
> David ?

Out of scope here, I think.

> > +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
> > +    struct rtas_event_log_v6_section_header hdr;
> > +    uint32_t fru_id;
> > +    uint32_t proc_id;
> > +    uint8_t error_type;
> > +#define RTAS_LOG_V6_MC_TYPE_UE                           0
> > +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
> > +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
> > +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
> > +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
> > +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
> > +    uint8_t sub_err_type;
> > +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
> > +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
> > +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
> > +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
> > +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
> > +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
> > +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
> > +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
> > +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
> > +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
> > +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
> > +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
> > +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
> > +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
> > +    uint8_t reserved_1[6];
> > +    uint64_t effective_address;
> > +    uint64_t logical_address;
> > +} QEMU_PACKED;
> > +
> > +struct mc_extended_log {
> > +    struct rtas_event_log_v6 v6hdr;
> > +    struct rtas_event_log_v6_mc mc;
> > +} QEMU_PACKED;
> > +
> > +struct MC_ierror_table {
> > +    unsigned long srr1_mask;
> > +    unsigned long srr1_value;
> > +    bool nip_valid; /* nip is a valid indicator of faulting address */
> > +    uint8_t error_type;
> > +    uint8_t error_subtype;
> > +    unsigned int initiator;
> > +    unsigned int severity;
> > +};
> > +
> > +static const struct MC_ierror_table mc_ierror_table[] = {
> > +{ 0x00000000081c0000, 0x0000000000040000, true,
> > +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00000000081c0000, 0x0000000000080000, true,
> > +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00000000081c0000, 0x00000000000c0000, true,
> > +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00000000081c0000, 0x0000000000100000, true,
> > +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00000000081c0000, 0x0000000000140000, true,
> > +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00000000081c0000, 0x0000000000180000, true,
> > +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0, 0, 0, 0, 0, 0 } };
> > +
> > +struct MC_derror_table {
> > +    unsigned long dsisr_value;
> > +    bool dar_valid; /* dar is a valid indicator of faulting address */
> > +    uint8_t error_type;
> > +    uint8_t error_subtype;
> > +    unsigned int initiator;
> > +    unsigned int severity;
> > +};
> > +
> > +static const struct MC_derror_table mc_derror_table[] = {
> > +{ 0x00008000, false,
> > +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00004000, true,
> > +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00000800, true,
> > +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00000400, true,
> > +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00000080, true,
> > +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0x00000100, true,
> > +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
> > +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
> > +{ 0, false, 0, 0, 0, 0 } };
> > +
> > +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
> > +
> >  typedef enum EventClass {
> >      EVENT_CLASS_INTERNAL_ERRORS     = 0,
> >      EVENT_CLASS_EPOW                = 1,
> > @@ -620,6 +720,147 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> >                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> >  }
> >  
> > +ssize_t spapr_get_rtas_size(ssize_t old_rtas_size)
> > +{
> > +    g_assert(old_rtas_size < RTAS_ERRLOG_OFFSET);
> > +    return RTAS_ERROR_LOG_MAX;
> > +}
> > +
> > +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
> > +                                        struct mc_extended_log *ext_elog)
> > +{
> > +    int i;
> > +    CPUPPCState *env = &cpu->env;
> > +    uint32_t summary;
> > +    uint64_t dsisr = env->spr[SPR_DSISR];
> > +
> > +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
> > +    if (recovered) {
> > +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
> > +    } else {
> > +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
> > +    }
> > +
> > +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
> > +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
> > +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
> > +                continue;
> > +            }
> > +
> > +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
> > +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
> > +            if (mc_derror_table[i].dar_valid) {
> > +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> > +            }
> > +
> > +            summary |= mc_derror_table[i].initiator
> > +                        | mc_derror_table[i].severity;
> > +
> > +            return summary;
> > +        }
> > +    } else {
> > +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
> > +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
> > +                    mc_ierror_table[i].srr1_value) {
> > +                continue;
> > +            }
> > +
> > +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
> > +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
> > +            if (mc_ierror_table[i].nip_valid) {
> > +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> > +            }
> > +
> > +            summary |= mc_ierror_table[i].initiator
> > +                        | mc_ierror_table[i].severity;
> > +
> > +            return summary;
> > +        }
> > +    }
> > +
> > +    summary |= RTAS_LOG_INITIATOR_CPU;
> > +    return summary;
> > +}
> > +
> > +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> > +{
> > +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > +    CPUState *cs = CPU(cpu);
> > +    uint64_t rtas_addr;
> > +    CPUPPCState *env = &cpu->env;
> > +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> > +    target_ulong r3, msr = 0;
> > +    struct rtas_error_log log;
> > +    struct mc_extended_log *ext_elog;
> > +    uint32_t summary;
> > +
> > +    /*
> > +     * Properly set bits in MSR before we invoke the handler.
> > +     * SRR0/1, DAR and DSISR are properly set by KVM
> > +     */
> > +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> > +        msr |= (1ULL << MSR_LE);
> > +    }
> > +
> > +    if (env->msr && (1ULL << MSR_SF)) {
> > +        msr |= (1ULL << MSR_SF);
> > +    }
> > +
> > +    msr |= (1ULL << MSR_ME);
> > +
> > +    if (spapr->guest_machine_check_addr == -1) {
> 
> Should be (target_ulong) -1

I think the == itself should perform the necessary coercion.

> > +        /*
> > +         * This implies that we have hit a machine check between system
> > +         * reset and "ibm,nmi-register". Fall back to the old machine
> > +         * check behavior in such cases.
> > +         */
> > +        env->spr[SPR_SRR0] = env->nip;
> > +        env->spr[SPR_SRR1] = env->msr;
> > +        env->msr = msr;
> > +        env->nip = 0x200;
> > +        return;
> > +    }
> > +
> > +    ext_elog = g_malloc0(sizeof(struct mc_extended_log));
> 
> sizeof(*ext_elog) is preferable IMHO, same remark for the other sizeof sites.

Agreed.

> Also, I can't find the corresponding call to g_free(), which should be
> somewhere in this function IIUC.

Yes, that needs fixing.

> 
> > +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
> > +
> > +    log.summary = cpu_to_be32(summary);
> > +    log.extended_length = cpu_to_be32(sizeof(struct mc_extended_log));
> > +
> > +    /* r3 should be in BE always */
> > +    r3 = cpu_to_be64(env->gpr[3]);
> > +    env->msr = msr;
> > +
> > +    spapr_init_v6hdr(&ext_elog->v6hdr);
> > +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
> > +    ext_elog->mc.hdr.section_length =
> > +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
> > +    ext_elog->mc.hdr.section_version = 1;
> > +
> > +    /* get rtas addr from fdt */
> > +    rtas_addr = spapr_get_rtas_addr();
> > +    if (!rtas_addr) {
> > +        /* Unable to fetch rtas_addr. Hence reset the guest */
> > +        ppc_cpu_do_system_reset(cs);
> > +    }
> > +
> > +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET, &r3, sizeof(r3));
> > +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3),
> > +                              &log, sizeof(log));
> > +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3) +
> > +                              sizeof(log), ext_elog,
> > +                              sizeof(struct mc_extended_log));
> > +
> > +    /* Save gpr[3] in the guest endian mode */
> > +    if ((*pcc->interrupts_big_endian)(cpu)) {
> > +        env->gpr[3] = cpu_to_be64(rtas_addr + RTAS_ERRLOG_OFFSET);
> > +    } else {
> > +        env->gpr[3] = cpu_to_le64(rtas_addr + RTAS_ERRLOG_OFFSET);
> > +    }
> > +
> > +    env->nip = spapr->guest_machine_check_addr;
> > +}
> > +
> >  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >  {
> >      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > @@ -640,6 +881,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >          }
> >      }
> >      spapr->mc_status = cpu->vcpu_id;
> > +
> > +    spapr_mce_dispatch_elog(cpu, recovered);
> > +
> > +    return;
> >  }
> >  
> >  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index f7204d0..03f34bf 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -661,6 +661,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
> >  #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
> >  #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
> >  
> > +/* Offset from rtas-base where error log is placed */
> > +#define RTAS_ERRLOG_OFFSET       0x25
> > +
> 
> We already have an RTAS_ERROR_LOG_MAX macro defined in this file.
> Maybe use the same "ERROR_LOG" wording for consistency.

Agreed.

> >  static inline uint64_t ppc64_phys_to_real(uint64_t addr)
> >  {
> >      return addr & ~0xF000000000000000ULL;
> > @@ -798,6 +801,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
> >  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> >                        uint64_t pte0, uint64_t pte1);
> >  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> > +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
> >  
> >  /* DRC callbacks. */
> >  void spapr_core_release(DeviceState *dev);
> > 
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-14  0:08     ` David Gibson
@ 2019-05-14  4:26       ` Aravinda Prasad
  2019-05-14  4:40         ` David Gibson
  0 siblings, 1 reply; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-14  4:26 UTC (permalink / raw)
  To: David Gibson, Greg Kurz; +Cc: paulus, qemu-ppc, aik, qemu-devel



On Tuesday 14 May 2019 05:38 AM, David Gibson wrote:
> On Mon, May 13, 2019 at 01:30:53PM +0200, Greg Kurz wrote:
>> On Mon, 22 Apr 2019 12:33:26 +0530
>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>
>>> Upon a machine check exception (MCE) in a guest address space,
>>> KVM causes a guest exit to enable QEMU to build and pass the
>>> error to the guest in the PAPR defined rtas error log format.
>>>
>>> This patch builds the rtas error log, copies it to the rtas_addr
>>> and then invokes the guest registered machine check handler. The
>>> handler in the guest takes suitable action(s) depending on the type
>>> and criticality of the error. For example, if an error is
>>> unrecoverable memory corruption in an application inside the
>>> guest, then the guest kernel sends a SIGBUS to the application.
>>> For recoverable errors, the guest performs recovery actions and
>>> logs the error.
>>>
>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>> ---
>>>  hw/ppc/spapr.c         |    4 +
>>>  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/hw/ppc/spapr.h |    4 +
>>>  3 files changed, 253 insertions(+)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 2779efe..ffd1715 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
>>>          error_report("Could not get size of LPAR rtas '%s'", filename);
>>>          exit(1);
>>>      }
>>> +
>>> +    /* Resize blob to accommodate error log. */
>>> +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
>>> +
>>
>> This is the only user for spapr_get_rtas_size(), which is trivial.
>> I suggest you simply open-code it here.
> 
> I agree.

Sure.

> 
>> But also, spapr->rtas_size is a guest visible thing, "rtas-size" prop in the
>> DT. Since existing machine types don't do that, I guess we should only use
>> the new size if cap-fwnmi-mce=on for the sake of compatibility.
> 
> Yes, that's a good idea.  Changing this is very unlikely to break a
> guest, but it's easy to be safe here so let's do it.

I did it like that because the rtas_blob is allocated based on rtas_size
in spapr_machine_init(). During spapr_machine_init() it is not know if
the guest calls "ibm, nmi-register". So if we want to use the new size
only when cap_fwnmi=on, then we have to realloc the blob in "ibm,
nmi-register".


> 
>>
>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>>>          error_report("Could not load LPAR rtas '%s'", filename);
>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>> index 9922a23..4032db0 100644
>>> --- a/hw/ppc/spapr_events.c
>>> +++ b/hw/ppc/spapr_events.c
>>> @@ -212,6 +212,106 @@ struct hp_extended_log {
>>>      struct rtas_event_log_v6_hp hp;
>>>  } QEMU_PACKED;
>>>  
>>> +struct rtas_event_log_v6_mc {
>>
>> Even if the rest of the code in this file seems to ignore CODING_STYLE,
>> maybe it's time to start using CamelCase.
>>
>> David ?
> 
> Out of scope here, I think.
> 
>>> +#define RTAS_LOG_V6_SECTION_ID_MC                   0x4D43 /* MC */
>>> +    struct rtas_event_log_v6_section_header hdr;
>>> +    uint32_t fru_id;
>>> +    uint32_t proc_id;
>>> +    uint8_t error_type;
>>> +#define RTAS_LOG_V6_MC_TYPE_UE                           0
>>> +#define RTAS_LOG_V6_MC_TYPE_SLB                          1
>>> +#define RTAS_LOG_V6_MC_TYPE_ERAT                         2
>>> +#define RTAS_LOG_V6_MC_TYPE_TLB                          4
>>> +#define RTAS_LOG_V6_MC_TYPE_D_CACHE                      5
>>> +#define RTAS_LOG_V6_MC_TYPE_I_CACHE                      7
>>> +    uint8_t sub_err_type;
>>> +#define RTAS_LOG_V6_MC_UE_INDETERMINATE                  0
>>> +#define RTAS_LOG_V6_MC_UE_IFETCH                         1
>>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH         2
>>> +#define RTAS_LOG_V6_MC_UE_LOAD_STORE                     3
>>> +#define RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE     4
>>> +#define RTAS_LOG_V6_MC_SLB_PARITY                        0
>>> +#define RTAS_LOG_V6_MC_SLB_MULTIHIT                      1
>>> +#define RTAS_LOG_V6_MC_SLB_INDETERMINATE                 2
>>> +#define RTAS_LOG_V6_MC_ERAT_PARITY                       1
>>> +#define RTAS_LOG_V6_MC_ERAT_MULTIHIT                     2
>>> +#define RTAS_LOG_V6_MC_ERAT_INDETERMINATE                3
>>> +#define RTAS_LOG_V6_MC_TLB_PARITY                        1
>>> +#define RTAS_LOG_V6_MC_TLB_MULTIHIT                      2
>>> +#define RTAS_LOG_V6_MC_TLB_INDETERMINATE                 3
>>> +    uint8_t reserved_1[6];
>>> +    uint64_t effective_address;
>>> +    uint64_t logical_address;
>>> +} QEMU_PACKED;
>>> +
>>> +struct mc_extended_log {
>>> +    struct rtas_event_log_v6 v6hdr;
>>> +    struct rtas_event_log_v6_mc mc;
>>> +} QEMU_PACKED;
>>> +
>>> +struct MC_ierror_table {
>>> +    unsigned long srr1_mask;
>>> +    unsigned long srr1_value;
>>> +    bool nip_valid; /* nip is a valid indicator of faulting address */
>>> +    uint8_t error_type;
>>> +    uint8_t error_subtype;
>>> +    unsigned int initiator;
>>> +    unsigned int severity;
>>> +};
>>> +
>>> +static const struct MC_ierror_table mc_ierror_table[] = {
>>> +{ 0x00000000081c0000, 0x0000000000040000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_IFETCH,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x0000000000080000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x00000000000c0000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x0000000000100000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x0000000000140000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000000081c0000, 0x0000000000180000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_IFETCH,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0, 0, 0, 0, 0, 0 } };
>>> +
>>> +struct MC_derror_table {
>>> +    unsigned long dsisr_value;
>>> +    bool dar_valid; /* dar is a valid indicator of faulting address */
>>> +    uint8_t error_type;
>>> +    uint8_t error_subtype;
>>> +    unsigned int initiator;
>>> +    unsigned int severity;
>>> +};
>>> +
>>> +static const struct MC_derror_table mc_derror_table[] = {
>>> +{ 0x00008000, false,
>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_LOAD_STORE,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00004000, true,
>>> +  RTAS_LOG_V6_MC_TYPE_UE, RTAS_LOG_V6_MC_UE_PAGE_TABLE_WALK_LOAD_STORE,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000800, true,
>>> +  RTAS_LOG_V6_MC_TYPE_ERAT, RTAS_LOG_V6_MC_ERAT_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000400, true,
>>> +  RTAS_LOG_V6_MC_TYPE_TLB, RTAS_LOG_V6_MC_TLB_MULTIHIT,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000080, true,
>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_MULTIHIT,  /* Before PARITY */
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0x00000100, true,
>>> +  RTAS_LOG_V6_MC_TYPE_SLB, RTAS_LOG_V6_MC_SLB_PARITY,
>>> +  RTAS_LOG_INITIATOR_CPU, RTAS_LOG_SEVERITY_ERROR_SYNC, },
>>> +{ 0, false, 0, 0, 0, 0 } };
>>> +
>>> +#define SRR1_MC_LOADSTORE(srr1) ((srr1) & PPC_BIT(42))
>>> +
>>>  typedef enum EventClass {
>>>      EVENT_CLASS_INTERNAL_ERRORS     = 0,
>>>      EVENT_CLASS_EPOW                = 1,
>>> @@ -620,6 +720,147 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>>>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>>>  }
>>>  
>>> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_size)
>>> +{
>>> +    g_assert(old_rtas_size < RTAS_ERRLOG_OFFSET);
>>> +    return RTAS_ERROR_LOG_MAX;
>>> +}
>>> +
>>> +static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>>> +                                        struct mc_extended_log *ext_elog)
>>> +{
>>> +    int i;
>>> +    CPUPPCState *env = &cpu->env;
>>> +    uint32_t summary;
>>> +    uint64_t dsisr = env->spr[SPR_DSISR];
>>> +
>>> +    summary = RTAS_LOG_VERSION_6 | RTAS_LOG_OPTIONAL_PART_PRESENT;
>>> +    if (recovered) {
>>> +        summary |= RTAS_LOG_DISPOSITION_FULLY_RECOVERED;
>>> +    } else {
>>> +        summary |= RTAS_LOG_DISPOSITION_NOT_RECOVERED;
>>> +    }
>>> +
>>> +    if (SRR1_MC_LOADSTORE(env->spr[SPR_SRR1])) {
>>> +        for (i = 0; mc_derror_table[i].dsisr_value; i++) {
>>> +            if (!(dsisr & mc_derror_table[i].dsisr_value)) {
>>> +                continue;
>>> +            }
>>> +
>>> +            ext_elog->mc.error_type = mc_derror_table[i].error_type;
>>> +            ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
>>> +            if (mc_derror_table[i].dar_valid) {
>>> +                ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
>>> +            }
>>> +
>>> +            summary |= mc_derror_table[i].initiator
>>> +                        | mc_derror_table[i].severity;
>>> +
>>> +            return summary;
>>> +        }
>>> +    } else {
>>> +        for (i = 0; mc_ierror_table[i].srr1_mask; i++) {
>>> +            if ((env->spr[SPR_SRR1] & mc_ierror_table[i].srr1_mask) !=
>>> +                    mc_ierror_table[i].srr1_value) {
>>> +                continue;
>>> +            }
>>> +
>>> +            ext_elog->mc.error_type = mc_ierror_table[i].error_type;
>>> +            ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
>>> +            if (mc_ierror_table[i].nip_valid) {
>>> +                ext_elog->mc.effective_address = cpu_to_be64(env->nip);
>>> +            }
>>> +
>>> +            summary |= mc_ierror_table[i].initiator
>>> +                        | mc_ierror_table[i].severity;
>>> +
>>> +            return summary;
>>> +        }
>>> +    }
>>> +
>>> +    summary |= RTAS_LOG_INITIATOR_CPU;
>>> +    return summary;
>>> +}
>>> +
>>> +static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>> +{
>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>> +    CPUState *cs = CPU(cpu);
>>> +    uint64_t rtas_addr;
>>> +    CPUPPCState *env = &cpu->env;
>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>> +    target_ulong r3, msr = 0;
>>> +    struct rtas_error_log log;
>>> +    struct mc_extended_log *ext_elog;
>>> +    uint32_t summary;
>>> +
>>> +    /*
>>> +     * Properly set bits in MSR before we invoke the handler.
>>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>>> +     */
>>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>>> +        msr |= (1ULL << MSR_LE);
>>> +    }
>>> +
>>> +    if (env->msr && (1ULL << MSR_SF)) {
>>> +        msr |= (1ULL << MSR_SF);
>>> +    }
>>> +
>>> +    msr |= (1ULL << MSR_ME);
>>> +
>>> +    if (spapr->guest_machine_check_addr == -1) {
>>
>> Should be (target_ulong) -1
> 
> I think the == itself should perform the necessary coercion.
> 
>>> +        /*
>>> +         * This implies that we have hit a machine check between system
>>> +         * reset and "ibm,nmi-register". Fall back to the old machine
>>> +         * check behavior in such cases.
>>> +         */
>>> +        env->spr[SPR_SRR0] = env->nip;
>>> +        env->spr[SPR_SRR1] = env->msr;
>>> +        env->msr = msr;
>>> +        env->nip = 0x200;
>>> +        return;
>>> +    }
>>> +
>>> +    ext_elog = g_malloc0(sizeof(struct mc_extended_log));
>>
>> sizeof(*ext_elog) is preferable IMHO, same remark for the other sizeof sites.
> 
> Agreed.

ok.

> 
>> Also, I can't find the corresponding call to g_free(), which should be
>> somewhere in this function IIUC.
> 
> Yes, that needs fixing.

Yes, missed calling free. Will fix it.

> 
>>
>>> +    summary = spapr_mce_get_elog_type(cpu, recovered, ext_elog);
>>> +
>>> +    log.summary = cpu_to_be32(summary);
>>> +    log.extended_length = cpu_to_be32(sizeof(struct mc_extended_log));
>>> +
>>> +    /* r3 should be in BE always */
>>> +    r3 = cpu_to_be64(env->gpr[3]);
>>> +    env->msr = msr;
>>> +
>>> +    spapr_init_v6hdr(&ext_elog->v6hdr);
>>> +    ext_elog->mc.hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MC);
>>> +    ext_elog->mc.hdr.section_length =
>>> +                    cpu_to_be16(sizeof(struct rtas_event_log_v6_mc));
>>> +    ext_elog->mc.hdr.section_version = 1;
>>> +
>>> +    /* get rtas addr from fdt */
>>> +    rtas_addr = spapr_get_rtas_addr();
>>> +    if (!rtas_addr) {
>>> +        /* Unable to fetch rtas_addr. Hence reset the guest */
>>> +        ppc_cpu_do_system_reset(cs);
>>> +    }
>>> +
>>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET, &r3, sizeof(r3));
>>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3),
>>> +                              &log, sizeof(log));
>>> +    cpu_physical_memory_write(rtas_addr + RTAS_ERRLOG_OFFSET + sizeof(r3) +
>>> +                              sizeof(log), ext_elog,
>>> +                              sizeof(struct mc_extended_log));
>>> +
>>> +    /* Save gpr[3] in the guest endian mode */
>>> +    if ((*pcc->interrupts_big_endian)(cpu)) {
>>> +        env->gpr[3] = cpu_to_be64(rtas_addr + RTAS_ERRLOG_OFFSET);
>>> +    } else {
>>> +        env->gpr[3] = cpu_to_le64(rtas_addr + RTAS_ERRLOG_OFFSET);
>>> +    }
>>> +
>>> +    env->nip = spapr->guest_machine_check_addr;
>>> +}
>>> +
>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>  {
>>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>> @@ -640,6 +881,10 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>          }
>>>      }
>>>      spapr->mc_status = cpu->vcpu_id;
>>> +
>>> +    spapr_mce_dispatch_elog(cpu, recovered);
>>> +
>>> +    return;
>>>  }
>>>  
>>>  static void check_exception(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index f7204d0..03f34bf 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -661,6 +661,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>>  #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
>>>  #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
>>>  
>>> +/* Offset from rtas-base where error log is placed */
>>> +#define RTAS_ERRLOG_OFFSET       0x25
>>> +
>>
>> We already have an RTAS_ERROR_LOG_MAX macro defined in this file.
>> Maybe use the same "ERROR_LOG" wording for consistency.
> 
> Agreed.

ok.

Regards,
Aravinda

> 
>>>  static inline uint64_t ppc64_phys_to_real(uint64_t addr)
>>>  {
>>>      return addr & ~0xF000000000000000ULL;
>>> @@ -798,6 +801,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
>>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>>                        uint64_t pte0, uint64_t pte1);
>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>>> +ssize_t spapr_get_rtas_size(ssize_t old_rtas_sizea);
>>>  
>>>  /* DRC callbacks. */
>>>  void spapr_core_release(DeviceState *dev);
>>>
>>>
>>
> 

-- 
Regards,
Aravinda


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-14  4:26       ` Aravinda Prasad
@ 2019-05-14  4:40         ` David Gibson
  2019-05-14  5:06           ` Aravinda Prasad
  0 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-14  4:40 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 3078 bytes --]

On Tue, May 14, 2019 at 09:56:41AM +0530, Aravinda Prasad wrote:
> 
> 
> On Tuesday 14 May 2019 05:38 AM, David Gibson wrote:
> > On Mon, May 13, 2019 at 01:30:53PM +0200, Greg Kurz wrote:
> >> On Mon, 22 Apr 2019 12:33:26 +0530
> >> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> >>
> >>> Upon a machine check exception (MCE) in a guest address space,
> >>> KVM causes a guest exit to enable QEMU to build and pass the
> >>> error to the guest in the PAPR defined rtas error log format.
> >>>
> >>> This patch builds the rtas error log, copies it to the rtas_addr
> >>> and then invokes the guest registered machine check handler. The
> >>> handler in the guest takes suitable action(s) depending on the type
> >>> and criticality of the error. For example, if an error is
> >>> unrecoverable memory corruption in an application inside the
> >>> guest, then the guest kernel sends a SIGBUS to the application.
> >>> For recoverable errors, the guest performs recovery actions and
> >>> logs the error.
> >>>
> >>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>> ---
> >>>  hw/ppc/spapr.c         |    4 +
> >>>  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  include/hw/ppc/spapr.h |    4 +
> >>>  3 files changed, 253 insertions(+)
> >>>
> >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>> index 2779efe..ffd1715 100644
> >>> --- a/hw/ppc/spapr.c
> >>> +++ b/hw/ppc/spapr.c
> >>> @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
> >>>          error_report("Could not get size of LPAR rtas '%s'", filename);
> >>>          exit(1);
> >>>      }
> >>> +
> >>> +    /* Resize blob to accommodate error log. */
> >>> +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
> >>> +
> >>
> >> This is the only user for spapr_get_rtas_size(), which is trivial.
> >> I suggest you simply open-code it here.
> > 
> > I agree.
> 
> Sure.
> 
> > 
> >> But also, spapr->rtas_size is a guest visible thing, "rtas-size" prop in the
> >> DT. Since existing machine types don't do that, I guess we should only use
> >> the new size if cap-fwnmi-mce=on for the sake of compatibility.
> > 
> > Yes, that's a good idea.  Changing this is very unlikely to break a
> > guest, but it's easy to be safe here so let's do it.
> 
> I did it like that because the rtas_blob is allocated based on rtas_size
> in spapr_machine_init(). During spapr_machine_init() it is not know if
> the guest calls "ibm, nmi-register". So if we want to use the new size
> only when cap_fwnmi=on, then we have to realloc the blob in "ibm,
> nmi-register".

What?  Just always allocate the necessary space in
spapr_machine_init() if cap_fwnmi=on, it'll be wasted if
ibm,nmi-register is never called, but it's not that much space so we
don't really care.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-13 10:30         ` Aravinda Prasad
@ 2019-05-14  4:47           ` David Gibson
  2019-05-14  5:32             ` Aravinda Prasad
  0 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-14  4:47 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, aik, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 11303 bytes --]

On Mon, May 13, 2019 at 04:00:43PM +0530, Aravinda Prasad wrote:
> 
> 
> On Friday 10 May 2019 03:23 PM, David Gibson wrote:
> > On Fri, May 10, 2019 at 12:45:29PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Friday 10 May 2019 12:16 PM, David Gibson wrote:
> >>> On Mon, Apr 22, 2019 at 12:33:35PM +0530, Aravinda Prasad wrote:
> >>>> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
> >>>> the KVM causes guest exit with NMI as exit reason
> >>>> when it encounters a machine check exception on the
> >>>> address belonging to a guest. Without this capability
> >>>> enabled, KVM redirects machine check exceptions to
> >>>> guest's 0x200 vector.
> >>>>
> >>>> This patch also deals with the case when a guest with
> >>>> the KVM_CAP_PPC_FWNMI capability enabled is attempted
> >>>> to migrate to a host that does not support this
> >>>> capability.
> >>>>
> >>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>> ---
> >>>>  hw/ppc/spapr.c         |    1 +
> >>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
> >>>>  hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
> >>>>  include/hw/ppc/spapr.h |    4 +++-
> >>>>  target/ppc/kvm.c       |   14 ++++++++++++++
> >>>>  target/ppc/kvm_ppc.h   |    6 ++++++
> >>>>  6 files changed, 64 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>> index ffd1715..44e09bb 100644
> >>>> --- a/hw/ppc/spapr.c
> >>>> +++ b/hw/ppc/spapr.c
> >>>> @@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>      smc->irq = &spapr_irq_xics;
> >>>>      smc->dr_phb_enabled = true;
> >>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >>>> index edc5ed0..5b3af04 100644
> >>>> --- a/hw/ppc/spapr_caps.c
> >>>> +++ b/hw/ppc/spapr_caps.c
> >>>> @@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
> >>>>      }
> >>>>  }
> >>>>  
> >>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> >>>> +                                Error **errp)
> >>>> +{
> >>>> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> >>>> +
> >>>> +    if (!val) {
> >>>> +        return; /* Disabled by default */
> >>>> +    }
> >>>> +
> >>>> +    if (kvm_enabled()) {
> >>>> +        if (kvmppc_fwnmi_enable(cpu)) {
> >>>> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
> >>>> +        }
> >>>> +    }
> >>>> +}
> >>>> +
> >>>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >>>>      [SPAPR_CAP_HTM] = {
> >>>>          .name = "htm",
> >>>> @@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >>>>          .type = "bool",
> >>>>          .apply = cap_ccf_assist_apply,
> >>>>      },
> >>>> +    [SPAPR_CAP_FWNMI_MCE] = {
> >>>> +        .name = "fwnmi-mce",
> >>>> +        .description = "Handle fwnmi machine check exceptions",
> >>>> +        .index = SPAPR_CAP_FWNMI_MCE,
> >>>> +        .get = spapr_cap_get_bool,
> >>>> +        .set = spapr_cap_set_bool,
> >>>> +        .type = "bool",
> >>>> +        .apply = cap_fwnmi_mce_apply,
> >>>> +    },
> >>>>  };
> >>>>  
> >>>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> >>>> @@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
> >>>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
> >>>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
> >>>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
> >>>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
> >>>>  
> >>>>  void spapr_caps_init(SpaprMachineState *spapr)
> >>>>  {
> >>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>> index d3499f9..997cf19 100644
> >>>> --- a/hw/ppc/spapr_rtas.c
> >>>> +++ b/hw/ppc/spapr_rtas.c
> >>>> @@ -49,6 +49,7 @@
> >>>>  #include "hw/ppc/fdt.h"
> >>>>  #include "target/ppc/mmu-hash64.h"
> >>>>  #include "target/ppc/mmu-book3s-v3.h"
> >>>> +#include "kvm_ppc.h"
> >>>>  
> >>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>                                     uint32_t token, uint32_t nargs,
> >>>> @@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>                                    target_ulong args,
> >>>>                                    uint32_t nret, target_ulong rets)
> >>>>  {
> >>>> +    int ret;
> >>>>      uint64_t rtas_addr = spapr_get_rtas_addr();
> >>>>  
> >>>>      if (!rtas_addr) {
> >>>> @@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>          return;
> >>>>      }
> >>>>  
> >>>> +    ret = kvmppc_fwnmi_enable(cpu);
> >>>
> >>> You shouldn't need this here as well as in cap_fwnmi_mce_apply().
> >>>
> >>> Instead, you should unconditionally fail the nmi-register if the
> >>> capability is not enabled.
> >>
> >> cap_fwnmi is not enabled by default, because if it is enabled by default
> >> them KVM will start routing machine check exceptions via guest exit
> >> instead of routing it to guest's 0x200.
> >>
> >> During early boot since guest has not yet issued nmi-register, KVM is
> >> expected to route exceptions to 0x200. Therefore we enable cap_fwnmi
> >> only when a guest issues nmi-register.
> > 
> > Except that's not true - you enable it in cap_fwnmi_mce_apply() which
> > will be executed whenever the machine capability is enabled.
> 
> I enable cap_fwnmi in cap_fwnmi_mce_apply() only when the "val" argument
> (which is the effective cap value) is set. In early boot "val" is not
> set as cap_fwnmi by default is not set, hence cap_fwnmi is not
> enabled.

Uh.. if that's true, something else is horribly wrong.  SPAPR caps are
designed to have a fixed value for the lifetime of the VM.  Otherwise
they will fail in their purpose of making sure we have a consistent
environment across migrations.  So if the 'val' changes after the
first call to apply(), then something is broken.

> 
> My understanding is that, cap_fwnmi_mce_apply() is also called during
> migration on the target machine.

Only in the sense that the machine is initialized before processing
the incoming migration.  The capability values must be equal on either
side of the migration (that's checked elsewhere).  Well, actually,
you're allowed to increase the cap value across a migration, just not
decrease it.

> If effective cap for cap_fwnmi is
> enabled on source machine than I think "val" will be set when
> cap_fwnmi_mce_apply() is called on target machine.

Nope.  The migrated value of the cap will be *validated* against the
value set on the destination setup, but it won't *alter* the value on
the destination (the result is that you have it enabled on the source,
but not the destination, the migration will fail).

> I then call
> kvmppc_fwnmi_enable() to enable cap_fwnmi on target.
> 
> Regards,
> Aravinda
> 
> > 
> >> Or we should take the approach of enabling this capability by default
> >> and then from QEMU route the error to 0x200 if guest has not issued
> >> nmi-register.
> >>
> >>>
> >>>> +    if (ret == 1) {
> >>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    if (ret < 0) {
> >>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>>>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>  }
> >>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >>>> index 03f34bf..9d16ad1 100644
> >>>> --- a/include/hw/ppc/spapr.h
> >>>> +++ b/include/hw/ppc/spapr.h
> >>>> @@ -78,8 +78,10 @@ typedef enum {
> >>>>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
> >>>>  /* Count Cache Flush Assist HW Instruction */
> >>>>  #define SPAPR_CAP_CCF_ASSIST            0x09
> >>>> +/* FWNMI machine check handling */
> >>>> +#define SPAPR_CAP_FWNMI_MCE             0x0A
> >>>>  /* Num Caps */
> >>>> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
> >>>> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
> >>>>  
> >>>>  /*
> >>>>   * Capability Values
> >>>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> >>>> index 5eedce8..9c7b71d 100644
> >>>> --- a/target/ppc/kvm.c
> >>>> +++ b/target/ppc/kvm.c
> >>>> @@ -83,6 +83,7 @@ static int cap_ppc_safe_indirect_branch;
> >>>>  static int cap_ppc_count_cache_flush_assist;
> >>>>  static int cap_ppc_nested_kvm_hv;
> >>>>  static int cap_large_decr;
> >>>> +static int cap_ppc_fwnmi;
> >>>>  
> >>>>  static uint32_t debug_inst_opcode;
> >>>>  
> >>>> @@ -150,6 +151,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >>>>      kvmppc_get_cpu_characteristics(s);
> >>>>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
> >>>>      cap_large_decr = kvmppc_get_dec_bits();
> >>>> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
> >>>>      /*
> >>>>       * Note: setting it to false because there is not such capability
> >>>>       * in KVM at this moment.
> >>>> @@ -2117,6 +2119,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
> >>>>      }
> >>>>  }
> >>>>  
> >>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> >>>> +{
> >>>> +    CPUState *cs = CPU(cpu);
> >>>> +
> >>>> +    if (!cap_ppc_fwnmi) {
> >>>> +        return 1;
> >>>> +    }
> >>>> +
> >>>> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
> >>>> +}
> >>>> +
> >>>> +
> >>>>  int kvmppc_smt_threads(void)
> >>>>  {
> >>>>      return cap_ppc_smt ? cap_ppc_smt : 1;
> >>>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> >>>> index 6edc42f..28919d3 100644
> >>>> --- a/target/ppc/kvm_ppc.h
> >>>> +++ b/target/ppc/kvm_ppc.h
> >>>> @@ -27,6 +27,7 @@ void kvmppc_enable_h_page_init(void);
> >>>>  void kvmppc_set_papr(PowerPCCPU *cpu);
> >>>>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
> >>>>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
> >>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
> >>>>  int kvmppc_smt_threads(void);
> >>>>  void kvmppc_hint_smt_possible(Error **errp);
> >>>>  int kvmppc_set_smt_threads(int smt);
> >>>> @@ -159,6 +160,11 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
> >>>>  {
> >>>>  }
> >>>>  
> >>>> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
> >>>> +{
> >>>> +    return 1;
> >>>> +}
> >>>> +
> >>>>  static inline int kvmppc_smt_threads(void)
> >>>>  {
> >>>>      return 1;
> >>>>
> >>>
> >>
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-14  4:40         ` David Gibson
@ 2019-05-14  5:06           ` Aravinda Prasad
  2019-05-16  1:47             ` David Gibson
  0 siblings, 1 reply; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-14  5:06 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc



On Tuesday 14 May 2019 10:10 AM, David Gibson wrote:
> On Tue, May 14, 2019 at 09:56:41AM +0530, Aravinda Prasad wrote:
>>
>>
>> On Tuesday 14 May 2019 05:38 AM, David Gibson wrote:
>>> On Mon, May 13, 2019 at 01:30:53PM +0200, Greg Kurz wrote:
>>>> On Mon, 22 Apr 2019 12:33:26 +0530
>>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>>>
>>>>> Upon a machine check exception (MCE) in a guest address space,
>>>>> KVM causes a guest exit to enable QEMU to build and pass the
>>>>> error to the guest in the PAPR defined rtas error log format.
>>>>>
>>>>> This patch builds the rtas error log, copies it to the rtas_addr
>>>>> and then invokes the guest registered machine check handler. The
>>>>> handler in the guest takes suitable action(s) depending on the type
>>>>> and criticality of the error. For example, if an error is
>>>>> unrecoverable memory corruption in an application inside the
>>>>> guest, then the guest kernel sends a SIGBUS to the application.
>>>>> For recoverable errors, the guest performs recovery actions and
>>>>> logs the error.
>>>>>
>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>> ---
>>>>>  hw/ppc/spapr.c         |    4 +
>>>>>  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  include/hw/ppc/spapr.h |    4 +
>>>>>  3 files changed, 253 insertions(+)
>>>>>
>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>> index 2779efe..ffd1715 100644
>>>>> --- a/hw/ppc/spapr.c
>>>>> +++ b/hw/ppc/spapr.c
>>>>> @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
>>>>>          error_report("Could not get size of LPAR rtas '%s'", filename);
>>>>>          exit(1);
>>>>>      }
>>>>> +
>>>>> +    /* Resize blob to accommodate error log. */
>>>>> +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
>>>>> +
>>>>
>>>> This is the only user for spapr_get_rtas_size(), which is trivial.
>>>> I suggest you simply open-code it here.
>>>
>>> I agree.
>>
>> Sure.
>>
>>>
>>>> But also, spapr->rtas_size is a guest visible thing, "rtas-size" prop in the
>>>> DT. Since existing machine types don't do that, I guess we should only use
>>>> the new size if cap-fwnmi-mce=on for the sake of compatibility.
>>>
>>> Yes, that's a good idea.  Changing this is very unlikely to break a
>>> guest, but it's easy to be safe here so let's do it.
>>
>> I did it like that because the rtas_blob is allocated based on rtas_size
>> in spapr_machine_init(). During spapr_machine_init() it is not know if
>> the guest calls "ibm, nmi-register". So if we want to use the new size
>> only when cap_fwnmi=on, then we have to realloc the blob in "ibm,
>> nmi-register".
> 
> What?  Just always allocate the necessary space in
> spapr_machine_init() if cap_fwnmi=on, it'll be wasted if
> ibm,nmi-register is never called, but it's not that much space so we
> don't really care.

Yes, not that much space, and ibm,nmi-register is called when the Linux
kernel boots. I guess, even though other OSes might not call
ibm,nmi-register, they do not constitute significant QEMU on Power users.

So I think, I will keep the code as is.

> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-14  4:47           ` David Gibson
@ 2019-05-14  5:32             ` Aravinda Prasad
  2019-05-16  1:45               ` David Gibson
  0 siblings, 1 reply; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-14  5:32 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, aik, qemu-ppc, qemu-devel



On Tuesday 14 May 2019 10:17 AM, David Gibson wrote:
> On Mon, May 13, 2019 at 04:00:43PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Friday 10 May 2019 03:23 PM, David Gibson wrote:
>>> On Fri, May 10, 2019 at 12:45:29PM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Friday 10 May 2019 12:16 PM, David Gibson wrote:
>>>>> On Mon, Apr 22, 2019 at 12:33:35PM +0530, Aravinda Prasad wrote:
>>>>>> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
>>>>>> the KVM causes guest exit with NMI as exit reason
>>>>>> when it encounters a machine check exception on the
>>>>>> address belonging to a guest. Without this capability
>>>>>> enabled, KVM redirects machine check exceptions to
>>>>>> guest's 0x200 vector.
>>>>>>
>>>>>> This patch also deals with the case when a guest with
>>>>>> the KVM_CAP_PPC_FWNMI capability enabled is attempted
>>>>>> to migrate to a host that does not support this
>>>>>> capability.
>>>>>>
>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  hw/ppc/spapr.c         |    1 +
>>>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>>>>>>  hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
>>>>>>  include/hw/ppc/spapr.h |    4 +++-
>>>>>>  target/ppc/kvm.c       |   14 ++++++++++++++
>>>>>>  target/ppc/kvm_ppc.h   |    6 ++++++
>>>>>>  6 files changed, 64 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>> index ffd1715..44e09bb 100644
>>>>>> --- a/hw/ppc/spapr.c
>>>>>> +++ b/hw/ppc/spapr.c
>>>>>> @@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>>>      smc->irq = &spapr_irq_xics;
>>>>>>      smc->dr_phb_enabled = true;
>>>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>>>> index edc5ed0..5b3af04 100644
>>>>>> --- a/hw/ppc/spapr_caps.c
>>>>>> +++ b/hw/ppc/spapr_caps.c
>>>>>> @@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>>      }
>>>>>>  }
>>>>>>  
>>>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>> +                                Error **errp)
>>>>>> +{
>>>>>> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
>>>>>> +
>>>>>> +    if (!val) {
>>>>>> +        return; /* Disabled by default */
>>>>>> +    }
>>>>>> +
>>>>>> +    if (kvm_enabled()) {
>>>>>> +        if (kvmppc_fwnmi_enable(cpu)) {
>>>>>> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
>>>>>> +        }
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>>>      [SPAPR_CAP_HTM] = {
>>>>>>          .name = "htm",
>>>>>> @@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>>>          .type = "bool",
>>>>>>          .apply = cap_ccf_assist_apply,
>>>>>>      },
>>>>>> +    [SPAPR_CAP_FWNMI_MCE] = {
>>>>>> +        .name = "fwnmi-mce",
>>>>>> +        .description = "Handle fwnmi machine check exceptions",
>>>>>> +        .index = SPAPR_CAP_FWNMI_MCE,
>>>>>> +        .get = spapr_cap_get_bool,
>>>>>> +        .set = spapr_cap_set_bool,
>>>>>> +        .type = "bool",
>>>>>> +        .apply = cap_fwnmi_mce_apply,
>>>>>> +    },
>>>>>>  };
>>>>>>  
>>>>>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
>>>>>> @@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>>>>>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>>>>>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>>>>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>>>>>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>>>>>>  
>>>>>>  void spapr_caps_init(SpaprMachineState *spapr)
>>>>>>  {
>>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>>>> index d3499f9..997cf19 100644
>>>>>> --- a/hw/ppc/spapr_rtas.c
>>>>>> +++ b/hw/ppc/spapr_rtas.c
>>>>>> @@ -49,6 +49,7 @@
>>>>>>  #include "hw/ppc/fdt.h"
>>>>>>  #include "target/ppc/mmu-hash64.h"
>>>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>>>> +#include "kvm_ppc.h"
>>>>>>  
>>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>>>                                     uint32_t token, uint32_t nargs,
>>>>>> @@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>>>                                    target_ulong args,
>>>>>>                                    uint32_t nret, target_ulong rets)
>>>>>>  {
>>>>>> +    int ret;
>>>>>>      uint64_t rtas_addr = spapr_get_rtas_addr();
>>>>>>  
>>>>>>      if (!rtas_addr) {
>>>>>> @@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>>>          return;
>>>>>>      }
>>>>>>  
>>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>>>
>>>>> You shouldn't need this here as well as in cap_fwnmi_mce_apply().
>>>>>
>>>>> Instead, you should unconditionally fail the nmi-register if the
>>>>> capability is not enabled.
>>>>
>>>> cap_fwnmi is not enabled by default, because if it is enabled by default
>>>> them KVM will start routing machine check exceptions via guest exit
>>>> instead of routing it to guest's 0x200.
>>>>
>>>> During early boot since guest has not yet issued nmi-register, KVM is
>>>> expected to route exceptions to 0x200. Therefore we enable cap_fwnmi
>>>> only when a guest issues nmi-register.
>>>
>>> Except that's not true - you enable it in cap_fwnmi_mce_apply() which
>>> will be executed whenever the machine capability is enabled.
>>
>> I enable cap_fwnmi in cap_fwnmi_mce_apply() only when the "val" argument
>> (which is the effective cap value) is set. In early boot "val" is not
>> set as cap_fwnmi by default is not set, hence cap_fwnmi is not
>> enabled.
> 
> Uh.. if that's true, something else is horribly wrong.  SPAPR caps are
> designed to have a fixed value for the lifetime of the VM.  Otherwise
> they will fail in their purpose of making sure we have a consistent
> environment across migrations.  So if the 'val' changes after the
> first call to apply(), then something is broken.

If SPAPR caps are initialized before boot that do not change later, then
for cap_fwnmi, the default value is off at boot and the cap is enabled
only when guest issues "ibm,nmi-register" call. Should SPAPR caps be
updated when "ibm,nmi-register" is called?

> 
>>
>> My understanding is that, cap_fwnmi_mce_apply() is also called during
>> migration on the target machine.
> 
> Only in the sense that the machine is initialized before processing
> the incoming migration.  The capability values must be equal on either
> side of the migration (that's checked elsewhere).  Well, actually,
> you're allowed to increase the cap value across a migration, just not
> decrease it.

Ah.. ok.. I am still familiarizing myself with the migration code..

> 
>> If effective cap for cap_fwnmi is
>> enabled on source machine than I think "val" will be set when
>> cap_fwnmi_mce_apply() is called on target machine.
> 
> Nope.  The migrated value of the cap will be *validated* against the
> value set on the destination setup, but it won't *alter* the value on
> the destination (the result is that you have it enabled on the source,
> but not the destination, the migration will fail).

But if cap_fwnmi is set on the host, which function is responsible to
enable it on the destination? I think cap_fwnmi_mce_apply() is
responsible for enabling it on the destination. If that is the case
cap_fwnmi_mce_apply() should know if cap_fwnmi is set on the host and
the only way it can check that is based on the "val" argument passed on
to it.

Or am I missing something here?

Regards,
Aravinda

> 
>> I then call
>> kvmppc_fwnmi_enable() to enable cap_fwnmi on target.
>>
>> Regards,
>> Aravinda
>>
>>>
>>>> Or we should take the approach of enabling this capability by default
>>>> and then from QEMU route the error to 0x200 if guest has not issued
>>>> nmi-register.
>>>>
>>>>>
>>>>>> +    if (ret == 1) {
>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (ret < 0) {
>>>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>>>>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>>>  }
>>>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>>>> index 03f34bf..9d16ad1 100644
>>>>>> --- a/include/hw/ppc/spapr.h
>>>>>> +++ b/include/hw/ppc/spapr.h
>>>>>> @@ -78,8 +78,10 @@ typedef enum {
>>>>>>  #define SPAPR_CAP_LARGE_DECREMENTER     0x08
>>>>>>  /* Count Cache Flush Assist HW Instruction */
>>>>>>  #define SPAPR_CAP_CCF_ASSIST            0x09
>>>>>> +/* FWNMI machine check handling */
>>>>>> +#define SPAPR_CAP_FWNMI_MCE             0x0A
>>>>>>  /* Num Caps */
>>>>>> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_CCF_ASSIST + 1)
>>>>>> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_FWNMI_MCE + 1)
>>>>>>  
>>>>>>  /*
>>>>>>   * Capability Values
>>>>>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>>>>>> index 5eedce8..9c7b71d 100644
>>>>>> --- a/target/ppc/kvm.c
>>>>>> +++ b/target/ppc/kvm.c
>>>>>> @@ -83,6 +83,7 @@ static int cap_ppc_safe_indirect_branch;
>>>>>>  static int cap_ppc_count_cache_flush_assist;
>>>>>>  static int cap_ppc_nested_kvm_hv;
>>>>>>  static int cap_large_decr;
>>>>>> +static int cap_ppc_fwnmi;
>>>>>>  
>>>>>>  static uint32_t debug_inst_opcode;
>>>>>>  
>>>>>> @@ -150,6 +151,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>>>      kvmppc_get_cpu_characteristics(s);
>>>>>>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>>>>>>      cap_large_decr = kvmppc_get_dec_bits();
>>>>>> +    cap_ppc_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>>>>>>      /*
>>>>>>       * Note: setting it to false because there is not such capability
>>>>>>       * in KVM at this moment.
>>>>>> @@ -2117,6 +2119,18 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>>>>>      }
>>>>>>  }
>>>>>>  
>>>>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>>>>>> +{
>>>>>> +    CPUState *cs = CPU(cpu);
>>>>>> +
>>>>>> +    if (!cap_ppc_fwnmi) {
>>>>>> +        return 1;
>>>>>> +    }
>>>>>> +
>>>>>> +    return kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>>  int kvmppc_smt_threads(void)
>>>>>>  {
>>>>>>      return cap_ppc_smt ? cap_ppc_smt : 1;
>>>>>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>>>>>> index 6edc42f..28919d3 100644
>>>>>> --- a/target/ppc/kvm_ppc.h
>>>>>> +++ b/target/ppc/kvm_ppc.h
>>>>>> @@ -27,6 +27,7 @@ void kvmppc_enable_h_page_init(void);
>>>>>>  void kvmppc_set_papr(PowerPCCPU *cpu);
>>>>>>  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>>>>>>  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
>>>>>> +int kvmppc_fwnmi_enable(PowerPCCPU *cpu);
>>>>>>  int kvmppc_smt_threads(void);
>>>>>>  void kvmppc_hint_smt_possible(Error **errp);
>>>>>>  int kvmppc_set_smt_threads(int smt);
>>>>>> @@ -159,6 +160,11 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>>>>>  {
>>>>>>  }
>>>>>>  
>>>>>> +static inline int kvmppc_fwnmi_enable(PowerPCCPU *cpu)
>>>>>> +{
>>>>>> +    return 1;
>>>>>> +}
>>>>>> +
>>>>>>  static inline int kvmppc_smt_threads(void)
>>>>>>  {
>>>>>>      return 1;
>>>>>>
>>>>>
>>>>
>>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-14  5:32             ` Aravinda Prasad
@ 2019-05-16  1:45               ` David Gibson
  2019-05-16  4:59                 ` Aravinda Prasad
  0 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-16  1:45 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: paulus, aik, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 9977 bytes --]

On Tue, May 14, 2019 at 11:02:07AM +0530, Aravinda Prasad wrote:
> 
> 
> On Tuesday 14 May 2019 10:17 AM, David Gibson wrote:
> > On Mon, May 13, 2019 at 04:00:43PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Friday 10 May 2019 03:23 PM, David Gibson wrote:
> >>> On Fri, May 10, 2019 at 12:45:29PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Friday 10 May 2019 12:16 PM, David Gibson wrote:
> >>>>> On Mon, Apr 22, 2019 at 12:33:35PM +0530, Aravinda Prasad wrote:
> >>>>>> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
> >>>>>> the KVM causes guest exit with NMI as exit reason
> >>>>>> when it encounters a machine check exception on the
> >>>>>> address belonging to a guest. Without this capability
> >>>>>> enabled, KVM redirects machine check exceptions to
> >>>>>> guest's 0x200 vector.
> >>>>>>
> >>>>>> This patch also deals with the case when a guest with
> >>>>>> the KVM_CAP_PPC_FWNMI capability enabled is attempted
> >>>>>> to migrate to a host that does not support this
> >>>>>> capability.
> >>>>>>
> >>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>>>> ---
> >>>>>>  hw/ppc/spapr.c         |    1 +
> >>>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
> >>>>>>  hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
> >>>>>>  include/hw/ppc/spapr.h |    4 +++-
> >>>>>>  target/ppc/kvm.c       |   14 ++++++++++++++
> >>>>>>  target/ppc/kvm_ppc.h   |    6 ++++++
> >>>>>>  6 files changed, 64 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>>> index ffd1715..44e09bb 100644
> >>>>>> --- a/hw/ppc/spapr.c
> >>>>>> +++ b/hw/ppc/spapr.c
> >>>>>> @@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>>>      smc->irq = &spapr_irq_xics;
> >>>>>>      smc->dr_phb_enabled = true;
> >>>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >>>>>> index edc5ed0..5b3af04 100644
> >>>>>> --- a/hw/ppc/spapr_caps.c
> >>>>>> +++ b/hw/ppc/spapr_caps.c
> >>>>>> @@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
> >>>>>>      }
> >>>>>>  }
> >>>>>>  
> >>>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
> >>>>>> +                                Error **errp)
> >>>>>> +{
> >>>>>> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> >>>>>> +
> >>>>>> +    if (!val) {
> >>>>>> +        return; /* Disabled by default */
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    if (kvm_enabled()) {
> >>>>>> +        if (kvmppc_fwnmi_enable(cpu)) {
> >>>>>> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
> >>>>>> +        }
> >>>>>> +    }
> >>>>>> +}
> >>>>>> +
> >>>>>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >>>>>>      [SPAPR_CAP_HTM] = {
> >>>>>>          .name = "htm",
> >>>>>> @@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >>>>>>          .type = "bool",
> >>>>>>          .apply = cap_ccf_assist_apply,
> >>>>>>      },
> >>>>>> +    [SPAPR_CAP_FWNMI_MCE] = {
> >>>>>> +        .name = "fwnmi-mce",
> >>>>>> +        .description = "Handle fwnmi machine check exceptions",
> >>>>>> +        .index = SPAPR_CAP_FWNMI_MCE,
> >>>>>> +        .get = spapr_cap_get_bool,
> >>>>>> +        .set = spapr_cap_set_bool,
> >>>>>> +        .type = "bool",
> >>>>>> +        .apply = cap_fwnmi_mce_apply,
> >>>>>> +    },
> >>>>>>  };
> >>>>>>  
> >>>>>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> >>>>>> @@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
> >>>>>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
> >>>>>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
> >>>>>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
> >>>>>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
> >>>>>>  
> >>>>>>  void spapr_caps_init(SpaprMachineState *spapr)
> >>>>>>  {
> >>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>>>> index d3499f9..997cf19 100644
> >>>>>> --- a/hw/ppc/spapr_rtas.c
> >>>>>> +++ b/hw/ppc/spapr_rtas.c
> >>>>>> @@ -49,6 +49,7 @@
> >>>>>>  #include "hw/ppc/fdt.h"
> >>>>>>  #include "target/ppc/mmu-hash64.h"
> >>>>>>  #include "target/ppc/mmu-book3s-v3.h"
> >>>>>> +#include "kvm_ppc.h"
> >>>>>>  
> >>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>>>>>                                     uint32_t token, uint32_t nargs,
> >>>>>> @@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>>>                                    target_ulong args,
> >>>>>>                                    uint32_t nret, target_ulong rets)
> >>>>>>  {
> >>>>>> +    int ret;
> >>>>>>      uint64_t rtas_addr = spapr_get_rtas_addr();
> >>>>>>  
> >>>>>>      if (!rtas_addr) {
> >>>>>> @@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>>>          return;
> >>>>>>      }
> >>>>>>  
> >>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
> >>>>>
> >>>>> You shouldn't need this here as well as in cap_fwnmi_mce_apply().
> >>>>>
> >>>>> Instead, you should unconditionally fail the nmi-register if the
> >>>>> capability is not enabled.
> >>>>
> >>>> cap_fwnmi is not enabled by default, because if it is enabled by default
> >>>> them KVM will start routing machine check exceptions via guest exit
> >>>> instead of routing it to guest's 0x200.
> >>>>
> >>>> During early boot since guest has not yet issued nmi-register, KVM is
> >>>> expected to route exceptions to 0x200. Therefore we enable cap_fwnmi
> >>>> only when a guest issues nmi-register.
> >>>
> >>> Except that's not true - you enable it in cap_fwnmi_mce_apply() which
> >>> will be executed whenever the machine capability is enabled.
> >>
> >> I enable cap_fwnmi in cap_fwnmi_mce_apply() only when the "val" argument
> >> (which is the effective cap value) is set. In early boot "val" is not
> >> set as cap_fwnmi by default is not set, hence cap_fwnmi is not
> >> enabled.
> > 
> > Uh.. if that's true, something else is horribly wrong.  SPAPR caps are
> > designed to have a fixed value for the lifetime of the VM.  Otherwise
> > they will fail in their purpose of making sure we have a consistent
> > environment across migrations.  So if the 'val' changes after the
> > first call to apply(), then something is broken.
> 
> If SPAPR caps are initialized before boot that do not change later, then
> for cap_fwnmi, the default value is off at boot and the cap is enabled
> only when guest issues "ibm,nmi-register" call. Should SPAPR caps be
> updated when "ibm,nmi-register" is called?

So the confusing thing here is that there are spapr machine caps, and
those are separate from the KVM caps for the VM.  Then the KVM caps
also have whether the cap is possible and whether it is current
activated.

The spapr machine caps *must* remain static for the VM's lifetime and
only cover possibilities, not runtime configuration.  KVM caps may be
activated as necessary.

So you can leave activating the KVM cap until nmi-register.  But if
the spapr cap is disabled you must prohibit nmi-register.

The apply() functions are responsible for checking if the spapr caps
are possible on the KVM implementation.  So if cap_fwnmi_mci_apply()
is called with val==1 and KVM doesn't support the fwnmi extensions, it
must fail outright.

> >> My understanding is that, cap_fwnmi_mce_apply() is also called during
> >> migration on the target machine.
> > 
> > Only in the sense that the machine is initialized before processing
> > the incoming migration.  The capability values must be equal on either
> > side of the migration (that's checked elsewhere).  Well, actually,
> > you're allowed to increase the cap value across a migration, just not
> > decrease it.
> 
> Ah.. ok.. I am still familiarizing myself with the migration code..
> 
> > 
> >> If effective cap for cap_fwnmi is
> >> enabled on source machine than I think "val" will be set when
> >> cap_fwnmi_mce_apply() is called on target machine.
> > 
> > Nope.  The migrated value of the cap will be *validated* against the
> > value set on the destination setup, but it won't *alter* the value on
> > the destination (the result is that you have it enabled on the source,
> > but not the destination, the migration will fail).
> 
> But if cap_fwnmi is set on the host, which function is responsible to

I'm not sure what you mean by "the host" here.

> enable it on the destination? I think cap_fwnmi_mce_apply() is
> responsible for enabling it on the destination.

Enabling the spapr cap?  It is set based on the command line and
machine type, just like on the source machine.

> If that is the case
> cap_fwnmi_mce_apply() should know if cap_fwnmi is set on the host and
> the only way it can check that is based on the "val" argument passed on
> to it.
> 
> Or am I missing something here?

Probably, but I'm not sure exactly what.

The val argument to apply() is set to the value of the spapr cap.
This is based on the qemu command line and machine type, and must be
the same on source and destination.  As a general rule, qemu requires
that the same machine options are used on source and destination.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-14  5:06           ` Aravinda Prasad
@ 2019-05-16  1:47             ` David Gibson
  2019-05-16  4:54               ` Aravinda Prasad
  0 siblings, 1 reply; 65+ messages in thread
From: David Gibson @ 2019-05-16  1:47 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 3860 bytes --]

On Tue, May 14, 2019 at 10:36:17AM +0530, Aravinda Prasad wrote:
> 
> 
> On Tuesday 14 May 2019 10:10 AM, David Gibson wrote:
> > On Tue, May 14, 2019 at 09:56:41AM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Tuesday 14 May 2019 05:38 AM, David Gibson wrote:
> >>> On Mon, May 13, 2019 at 01:30:53PM +0200, Greg Kurz wrote:
> >>>> On Mon, 22 Apr 2019 12:33:26 +0530
> >>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> >>>>
> >>>>> Upon a machine check exception (MCE) in a guest address space,
> >>>>> KVM causes a guest exit to enable QEMU to build and pass the
> >>>>> error to the guest in the PAPR defined rtas error log format.
> >>>>>
> >>>>> This patch builds the rtas error log, copies it to the rtas_addr
> >>>>> and then invokes the guest registered machine check handler. The
> >>>>> handler in the guest takes suitable action(s) depending on the type
> >>>>> and criticality of the error. For example, if an error is
> >>>>> unrecoverable memory corruption in an application inside the
> >>>>> guest, then the guest kernel sends a SIGBUS to the application.
> >>>>> For recoverable errors, the guest performs recovery actions and
> >>>>> logs the error.
> >>>>>
> >>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >>>>> ---
> >>>>>  hw/ppc/spapr.c         |    4 +
> >>>>>  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>  include/hw/ppc/spapr.h |    4 +
> >>>>>  3 files changed, 253 insertions(+)
> >>>>>
> >>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>> index 2779efe..ffd1715 100644
> >>>>> --- a/hw/ppc/spapr.c
> >>>>> +++ b/hw/ppc/spapr.c
> >>>>> @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
> >>>>>          error_report("Could not get size of LPAR rtas '%s'", filename);
> >>>>>          exit(1);
> >>>>>      }
> >>>>> +
> >>>>> +    /* Resize blob to accommodate error log. */
> >>>>> +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
> >>>>> +
> >>>>
> >>>> This is the only user for spapr_get_rtas_size(), which is trivial.
> >>>> I suggest you simply open-code it here.
> >>>
> >>> I agree.
> >>
> >> Sure.
> >>
> >>>
> >>>> But also, spapr->rtas_size is a guest visible thing, "rtas-size" prop in the
> >>>> DT. Since existing machine types don't do that, I guess we should only use
> >>>> the new size if cap-fwnmi-mce=on for the sake of compatibility.
> >>>
> >>> Yes, that's a good idea.  Changing this is very unlikely to break a
> >>> guest, but it's easy to be safe here so let's do it.
> >>
> >> I did it like that because the rtas_blob is allocated based on rtas_size
> >> in spapr_machine_init(). During spapr_machine_init() it is not know if
> >> the guest calls "ibm, nmi-register". So if we want to use the new size
> >> only when cap_fwnmi=on, then we have to realloc the blob in "ibm,
> >> nmi-register".
> > 
> > What?  Just always allocate the necessary space in
> > spapr_machine_init() if cap_fwnmi=on, it'll be wasted if
> > ibm,nmi-register is never called, but it's not that much space so we
> > don't really care.
> 
> Yes, not that much space, and ibm,nmi-register is called when the Linux
> kernel boots. I guess, even though other OSes might not call
> ibm,nmi-register, they do not constitute significant QEMU on Power users.
> 
> So I think, I will keep the code as is.

No, that's not right.  It's impractical to change the allocation
depending on whether fwnmi is currently active.  But you *can* (and
should) base the allocation on whether fwnmi is *possible* - that is,
the value of the spapr cap.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE
  2019-05-16  1:47             ` David Gibson
@ 2019-05-16  4:54               ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-16  4:54 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc



On Thursday 16 May 2019 07:17 AM, David Gibson wrote:
> On Tue, May 14, 2019 at 10:36:17AM +0530, Aravinda Prasad wrote:
>>
>>
>> On Tuesday 14 May 2019 10:10 AM, David Gibson wrote:
>>> On Tue, May 14, 2019 at 09:56:41AM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Tuesday 14 May 2019 05:38 AM, David Gibson wrote:
>>>>> On Mon, May 13, 2019 at 01:30:53PM +0200, Greg Kurz wrote:
>>>>>> On Mon, 22 Apr 2019 12:33:26 +0530
>>>>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>>> Upon a machine check exception (MCE) in a guest address space,
>>>>>>> KVM causes a guest exit to enable QEMU to build and pass the
>>>>>>> error to the guest in the PAPR defined rtas error log format.
>>>>>>>
>>>>>>> This patch builds the rtas error log, copies it to the rtas_addr
>>>>>>> and then invokes the guest registered machine check handler. The
>>>>>>> handler in the guest takes suitable action(s) depending on the type
>>>>>>> and criticality of the error. For example, if an error is
>>>>>>> unrecoverable memory corruption in an application inside the
>>>>>>> guest, then the guest kernel sends a SIGBUS to the application.
>>>>>>> For recoverable errors, the guest performs recovery actions and
>>>>>>> logs the error.
>>>>>>>
>>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>>>> ---
>>>>>>>  hw/ppc/spapr.c         |    4 +
>>>>>>>  hw/ppc/spapr_events.c  |  245 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  include/hw/ppc/spapr.h |    4 +
>>>>>>>  3 files changed, 253 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>>> index 2779efe..ffd1715 100644
>>>>>>> --- a/hw/ppc/spapr.c
>>>>>>> +++ b/hw/ppc/spapr.c
>>>>>>> @@ -2918,6 +2918,10 @@ static void spapr_machine_init(MachineState *machine)
>>>>>>>          error_report("Could not get size of LPAR rtas '%s'", filename);
>>>>>>>          exit(1);
>>>>>>>      }
>>>>>>> +
>>>>>>> +    /* Resize blob to accommodate error log. */
>>>>>>> +    spapr->rtas_size = spapr_get_rtas_size(spapr->rtas_size);
>>>>>>> +
>>>>>>
>>>>>> This is the only user for spapr_get_rtas_size(), which is trivial.
>>>>>> I suggest you simply open-code it here.
>>>>>
>>>>> I agree.
>>>>
>>>> Sure.
>>>>
>>>>>
>>>>>> But also, spapr->rtas_size is a guest visible thing, "rtas-size" prop in the
>>>>>> DT. Since existing machine types don't do that, I guess we should only use
>>>>>> the new size if cap-fwnmi-mce=on for the sake of compatibility.
>>>>>
>>>>> Yes, that's a good idea.  Changing this is very unlikely to break a
>>>>> guest, but it's easy to be safe here so let's do it.
>>>>
>>>> I did it like that because the rtas_blob is allocated based on rtas_size
>>>> in spapr_machine_init(). During spapr_machine_init() it is not know if
>>>> the guest calls "ibm, nmi-register". So if we want to use the new size
>>>> only when cap_fwnmi=on, then we have to realloc the blob in "ibm,
>>>> nmi-register".
>>>
>>> What?  Just always allocate the necessary space in
>>> spapr_machine_init() if cap_fwnmi=on, it'll be wasted if
>>> ibm,nmi-register is never called, but it's not that much space so we
>>> don't really care.
>>
>> Yes, not that much space, and ibm,nmi-register is called when the Linux
>> kernel boots. I guess, even though other OSes might not call
>> ibm,nmi-register, they do not constitute significant QEMU on Power users.
>>
>> So I think, I will keep the code as is.
> 
> No, that's not right.  It's impractical to change the allocation
> depending on whether fwnmi is currently active.  But you *can* (and
> should) base the allocation on whether fwnmi is *possible* - that is,
> the value of the spapr cap.

Sure..

> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability
  2019-05-16  1:45               ` David Gibson
@ 2019-05-16  4:59                 ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-16  4:59 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, aik, qemu-ppc, qemu-devel



On Thursday 16 May 2019 07:15 AM, David Gibson wrote:
> On Tue, May 14, 2019 at 11:02:07AM +0530, Aravinda Prasad wrote:
>>
>>
>> On Tuesday 14 May 2019 10:17 AM, David Gibson wrote:
>>> On Mon, May 13, 2019 at 04:00:43PM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Friday 10 May 2019 03:23 PM, David Gibson wrote:
>>>>> On Fri, May 10, 2019 at 12:45:29PM +0530, Aravinda Prasad wrote:
>>>>>>
>>>>>>
>>>>>> On Friday 10 May 2019 12:16 PM, David Gibson wrote:
>>>>>>> On Mon, Apr 22, 2019 at 12:33:35PM +0530, Aravinda Prasad wrote:
>>>>>>>> Enable the KVM capability KVM_CAP_PPC_FWNMI so that
>>>>>>>> the KVM causes guest exit with NMI as exit reason
>>>>>>>> when it encounters a machine check exception on the
>>>>>>>> address belonging to a guest. Without this capability
>>>>>>>> enabled, KVM redirects machine check exceptions to
>>>>>>>> guest's 0x200 vector.
>>>>>>>>
>>>>>>>> This patch also deals with the case when a guest with
>>>>>>>> the KVM_CAP_PPC_FWNMI capability enabled is attempted
>>>>>>>> to migrate to a host that does not support this
>>>>>>>> capability.
>>>>>>>>
>>>>>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>>>>>> ---
>>>>>>>>  hw/ppc/spapr.c         |    1 +
>>>>>>>>  hw/ppc/spapr_caps.c    |   26 ++++++++++++++++++++++++++
>>>>>>>>  hw/ppc/spapr_rtas.c    |   14 ++++++++++++++
>>>>>>>>  include/hw/ppc/spapr.h |    4 +++-
>>>>>>>>  target/ppc/kvm.c       |   14 ++++++++++++++
>>>>>>>>  target/ppc/kvm_ppc.h   |    6 ++++++
>>>>>>>>  6 files changed, 64 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>>>> index ffd1715..44e09bb 100644
>>>>>>>> --- a/hw/ppc/spapr.c
>>>>>>>> +++ b/hw/ppc/spapr.c
>>>>>>>> @@ -4372,6 +4372,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>>>>>      smc->irq = &spapr_irq_xics;
>>>>>>>>      smc->dr_phb_enabled = true;
>>>>>>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>>>>>>> index edc5ed0..5b3af04 100644
>>>>>>>> --- a/hw/ppc/spapr_caps.c
>>>>>>>> +++ b/hw/ppc/spapr_caps.c
>>>>>>>> @@ -473,6 +473,22 @@ static void cap_ccf_assist_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void cap_fwnmi_mce_apply(SpaprMachineState *spapr, uint8_t val,
>>>>>>>> +                                Error **errp)
>>>>>>>> +{
>>>>>>>> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
>>>>>>>> +
>>>>>>>> +    if (!val) {
>>>>>>>> +        return; /* Disabled by default */
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (kvm_enabled()) {
>>>>>>>> +        if (kvmppc_fwnmi_enable(cpu)) {
>>>>>>>> +            error_setg(errp, "Requested fwnmi capability not support by KVM");
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>>>>>      [SPAPR_CAP_HTM] = {
>>>>>>>>          .name = "htm",
>>>>>>>> @@ -571,6 +587,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>>>>>>          .type = "bool",
>>>>>>>>          .apply = cap_ccf_assist_apply,
>>>>>>>>      },
>>>>>>>> +    [SPAPR_CAP_FWNMI_MCE] = {
>>>>>>>> +        .name = "fwnmi-mce",
>>>>>>>> +        .description = "Handle fwnmi machine check exceptions",
>>>>>>>> +        .index = SPAPR_CAP_FWNMI_MCE,
>>>>>>>> +        .get = spapr_cap_get_bool,
>>>>>>>> +        .set = spapr_cap_set_bool,
>>>>>>>> +        .type = "bool",
>>>>>>>> +        .apply = cap_fwnmi_mce_apply,
>>>>>>>> +    },
>>>>>>>>  };
>>>>>>>>  
>>>>>>>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
>>>>>>>> @@ -706,6 +731,7 @@ SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>>>>>>>>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>>>>>>>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>>>>>>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>>>>>>>> +SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI_MCE);
>>>>>>>>  
>>>>>>>>  void spapr_caps_init(SpaprMachineState *spapr)
>>>>>>>>  {
>>>>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>>>>>> index d3499f9..997cf19 100644
>>>>>>>> --- a/hw/ppc/spapr_rtas.c
>>>>>>>> +++ b/hw/ppc/spapr_rtas.c
>>>>>>>> @@ -49,6 +49,7 @@
>>>>>>>>  #include "hw/ppc/fdt.h"
>>>>>>>>  #include "target/ppc/mmu-hash64.h"
>>>>>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>>>>>> +#include "kvm_ppc.h"
>>>>>>>>  
>>>>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>>>>>                                     uint32_t token, uint32_t nargs,
>>>>>>>> @@ -354,6 +355,7 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>>>>>                                    target_ulong args,
>>>>>>>>                                    uint32_t nret, target_ulong rets)
>>>>>>>>  {
>>>>>>>> +    int ret;
>>>>>>>>      uint64_t rtas_addr = spapr_get_rtas_addr();
>>>>>>>>  
>>>>>>>>      if (!rtas_addr) {
>>>>>>>> @@ -361,6 +363,18 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>>>>>          return;
>>>>>>>>      }
>>>>>>>>  
>>>>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>>>>>
>>>>>>> You shouldn't need this here as well as in cap_fwnmi_mce_apply().
>>>>>>>
>>>>>>> Instead, you should unconditionally fail the nmi-register if the
>>>>>>> capability is not enabled.
>>>>>>
>>>>>> cap_fwnmi is not enabled by default, because if it is enabled by default
>>>>>> them KVM will start routing machine check exceptions via guest exit
>>>>>> instead of routing it to guest's 0x200.
>>>>>>
>>>>>> During early boot since guest has not yet issued nmi-register, KVM is
>>>>>> expected to route exceptions to 0x200. Therefore we enable cap_fwnmi
>>>>>> only when a guest issues nmi-register.
>>>>>
>>>>> Except that's not true - you enable it in cap_fwnmi_mce_apply() which
>>>>> will be executed whenever the machine capability is enabled.
>>>>
>>>> I enable cap_fwnmi in cap_fwnmi_mce_apply() only when the "val" argument
>>>> (which is the effective cap value) is set. In early boot "val" is not
>>>> set as cap_fwnmi by default is not set, hence cap_fwnmi is not
>>>> enabled.
>>>
>>> Uh.. if that's true, something else is horribly wrong.  SPAPR caps are
>>> designed to have a fixed value for the lifetime of the VM.  Otherwise
>>> they will fail in their purpose of making sure we have a consistent
>>> environment across migrations.  So if the 'val' changes after the
>>> first call to apply(), then something is broken.
>>
>> If SPAPR caps are initialized before boot that do not change later, then
>> for cap_fwnmi, the default value is off at boot and the cap is enabled
>> only when guest issues "ibm,nmi-register" call. Should SPAPR caps be
>> updated when "ibm,nmi-register" is called?
> 
> So the confusing thing here is that there are spapr machine caps, and
> those are separate from the KVM caps for the VM.  Then the KVM caps
> also have whether the cap is possible and whether it is current
> activated.
> 
> The spapr machine caps *must* remain static for the VM's lifetime and
> only cover possibilities, not runtime configuration.  KVM caps may be
> activated as necessary.
> 
> So you can leave activating the KVM cap until nmi-register.  But if
> the spapr cap is disabled you must prohibit nmi-register.
> 
> The apply() functions are responsible for checking if the spapr caps
> are possible on the KVM implementation.  So if cap_fwnmi_mci_apply()
> is called with val==1 and KVM doesn't support the fwnmi extensions, it
> must fail outright.

I see, this clears my confusion on spapr machine caps and KVM caps..

> 
>>>> My understanding is that, cap_fwnmi_mce_apply() is also called during
>>>> migration on the target machine.
>>>
>>> Only in the sense that the machine is initialized before processing
>>> the incoming migration.  The capability values must be equal on either
>>> side of the migration (that's checked elsewhere).  Well, actually,
>>> you're allowed to increase the cap value across a migration, just not
>>> decrease it.
>>
>> Ah.. ok.. I am still familiarizing myself with the migration code..
>>
>>>
>>>> If effective cap for cap_fwnmi is
>>>> enabled on source machine than I think "val" will be set when
>>>> cap_fwnmi_mce_apply() is called on target machine.
>>>
>>> Nope.  The migrated value of the cap will be *validated* against the
>>> value set on the destination setup, but it won't *alter* the value on
>>> the destination (the result is that you have it enabled on the source,
>>> but not the destination, the migration will fail).
>>
>> But if cap_fwnmi is set on the host, which function is responsible to
> 
> I'm not sure what you mean by "the host" here.
> 
>> enable it on the destination? I think cap_fwnmi_mce_apply() is
>> responsible for enabling it on the destination.
> 
> Enabling the spapr cap?  It is set based on the command line and
> machine type, just like on the source machine.
> 
>> If that is the case
>> cap_fwnmi_mce_apply() should know if cap_fwnmi is set on the host and
>> the only way it can check that is based on the "val" argument passed on
>> to it.
>>
>> Or am I missing something here?
> 
> Probably, but I'm not sure exactly what.
> 
> The val argument to apply() is set to the value of the spapr cap.
> This is based on the qemu command line and machine type, and must be
> the same on source and destination.  As a general rule, qemu requires
> that the same machine options are used on source and destination.

Please ignore my previous statements this was made without clear
understanding of spapr machine caps and KVM caps.

I will resend the patches with the modifications.

> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 6/6] migration: Block migration while handling machine check
  2019-04-22  7:03   ` Aravinda Prasad
  (?)
  (?)
@ 2019-05-16 10:54   ` Greg Kurz
  2019-05-16 10:59     ` Aravinda Prasad
  -1 siblings, 1 reply; 65+ messages in thread
From: Greg Kurz @ 2019-05-16 10:54 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, qemu-devel, paulus, qemu-ppc, david

On Mon, 22 Apr 2019 12:33:45 +0530
Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:

> Block VM migration requests until the machine check
> error handling is complete as (i) these errors are
> specific to the source hardware and is irrelevant on
> the target hardware, (ii) these errors cause data
> corruption and should be handled before migration.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>  hw/ppc/spapr_rtas.c    |    4 ++++
>  include/hw/ppc/spapr.h |    3 +++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 4032db0..45b990c 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -41,6 +41,7 @@
>  #include "qemu/bcd.h"
>  #include "hw/ppc/spapr_ovec.h"
>  #include <libfdt.h>
> +#include "migration/blocker.h"
>  
>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>  #define   RTAS_LOG_VERSION_6                    0x06000000
> @@ -864,6 +865,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    int ret;
> +    Error *local_err = NULL;
> +
> +    error_setg(&spapr->migration_blocker,
> +            "Live migration not supported during machine check handling");
> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);

migrate_add_blocker() propagates the reason of the failure in local_err,
ie. because a migration is already in progress or --only-migratable was
passed on the QEMU command line, along with the error message passed in
the first argument. This means that...

> +    if (ret < 0) {
> +        /*
> +         * We don't want to abort and let the migration to continue. In a
> +         * rare case, the machine check handler will run on the target
> +         * hardware. Though this is not preferable, it is better than aborting
> +         * the migration or killing the VM.
> +         */
> +        error_free(spapr->migration_blocker);
> +        fprintf(stderr, "Warning: Machine check during VM migration\n");

... you should just do:

        error_report_err(local_err);

This also takes care of freeing local_err which would be leaked otherwise.

> +    }
>  
>      while (spapr->mc_status != -1) {
>          /*
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 997cf19..1229a0e 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -50,6 +50,7 @@
>  #include "target/ppc/mmu-hash64.h"
>  #include "target/ppc/mmu-book3s-v3.h"
>  #include "kvm_ppc.h"
> +#include "migration/blocker.h"
>  
>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -396,6 +397,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>          spapr->mc_status = -1;
>          qemu_cond_signal(&spapr->mc_delivery_cond);
>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +        migrate_del_blocker(spapr->migration_blocker);
> +        error_free(spapr->migration_blocker);
> +        spapr->migration_blocker = NULL;
>      }
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9d16ad1..dda5fd2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -10,6 +10,7 @@
>  #include "hw/ppc/spapr_irq.h"
>  #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
>  #include "hw/ppc/xics.h"        /* For ICSState */
> +#include "qapi/error.h"
>  
>  struct SpaprVioBus;
>  struct SpaprPhbState;
> @@ -213,6 +214,8 @@ struct SpaprMachineState {
>      SpaprCapabilities def, eff, mig;
>  
>      unsigned gpu_numa_id;
> +
> +    Error *migration_blocker;
>  };
>  
>  #define H_SUCCESS         0
> 
> 



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 6/6] migration: Block migration while handling machine check
  2019-05-16 10:54   ` Greg Kurz
@ 2019-05-16 10:59     ` Aravinda Prasad
  2019-05-16 14:17       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-16 10:59 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, qemu-devel, paulus, qemu-ppc, david



On Thursday 16 May 2019 04:24 PM, Greg Kurz wrote:
> On Mon, 22 Apr 2019 12:33:45 +0530
> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> 
>> Block VM migration requests until the machine check
>> error handling is complete as (i) these errors are
>> specific to the source hardware and is irrelevant on
>> the target hardware, (ii) these errors cause data
>> corruption and should be handled before migration.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>>  hw/ppc/spapr_rtas.c    |    4 ++++
>>  include/hw/ppc/spapr.h |    3 +++
>>  3 files changed, 24 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 4032db0..45b990c 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -41,6 +41,7 @@
>>  #include "qemu/bcd.h"
>>  #include "hw/ppc/spapr_ovec.h"
>>  #include <libfdt.h>
>> +#include "migration/blocker.h"
>>  
>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>>  #define   RTAS_LOG_VERSION_6                    0x06000000
>> @@ -864,6 +865,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>  {
>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int ret;
>> +    Error *local_err = NULL;
>> +
>> +    error_setg(&spapr->migration_blocker,
>> +            "Live migration not supported during machine check handling");
>> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
> 
> migrate_add_blocker() propagates the reason of the failure in local_err,
> ie. because a migration is already in progress or --only-migratable was
> passed on the QEMU command line, along with the error message passed in
> the first argument. This means that...
> 
>> +    if (ret < 0) {
>> +        /*
>> +         * We don't want to abort and let the migration to continue. In a
>> +         * rare case, the machine check handler will run on the target
>> +         * hardware. Though this is not preferable, it is better than aborting
>> +         * the migration or killing the VM.
>> +         */
>> +        error_free(spapr->migration_blocker);
>> +        fprintf(stderr, "Warning: Machine check during VM migration\n");
> 
> ... you should just do:
> 
>         error_report_err(local_err);
> 
> This also takes care of freeing local_err which would be leaked otherwise.

Sure. I am planning to use warn_report_err() as I don't want to abort.

Regards,
Aravinda

> 
>> +    }
>>  
>>      while (spapr->mc_status != -1) {
>>          /*
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 997cf19..1229a0e 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -50,6 +50,7 @@
>>  #include "target/ppc/mmu-hash64.h"
>>  #include "target/ppc/mmu-book3s-v3.h"
>>  #include "kvm_ppc.h"
>> +#include "migration/blocker.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -396,6 +397,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>          spapr->mc_status = -1;
>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +        migrate_del_blocker(spapr->migration_blocker);
>> +        error_free(spapr->migration_blocker);
>> +        spapr->migration_blocker = NULL;
>>      }
>>  }
>>  
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 9d16ad1..dda5fd2 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -10,6 +10,7 @@
>>  #include "hw/ppc/spapr_irq.h"
>>  #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
>>  #include "hw/ppc/xics.h"        /* For ICSState */
>> +#include "qapi/error.h"
>>  
>>  struct SpaprVioBus;
>>  struct SpaprPhbState;
>> @@ -213,6 +214,8 @@ struct SpaprMachineState {
>>      SpaprCapabilities def, eff, mig;
>>  
>>      unsigned gpu_numa_id;
>> +
>> +    Error *migration_blocker;
>>  };
>>  
>>  #define H_SUCCESS         0
>>
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 6/6] migration: Block migration while handling machine check
  2019-05-16 10:59     ` Aravinda Prasad
@ 2019-05-16 14:17       ` Dr. David Alan Gilbert
  2019-05-20  5:57         ` Aravinda Prasad
  0 siblings, 1 reply; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2019-05-16 14:17 UTC (permalink / raw)
  To: Aravinda Prasad; +Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc, david

* Aravinda Prasad (aravinda@linux.vnet.ibm.com) wrote:
> 
> 
> On Thursday 16 May 2019 04:24 PM, Greg Kurz wrote:
> > On Mon, 22 Apr 2019 12:33:45 +0530
> > Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
> > 
> >> Block VM migration requests until the machine check
> >> error handling is complete as (i) these errors are
> >> specific to the source hardware and is irrelevant on
> >> the target hardware, (ii) these errors cause data
> >> corruption and should be handled before migration.
> >>
> >> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
> >>  hw/ppc/spapr_rtas.c    |    4 ++++
> >>  include/hw/ppc/spapr.h |    3 +++
> >>  3 files changed, 24 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index 4032db0..45b990c 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -41,6 +41,7 @@
> >>  #include "qemu/bcd.h"
> >>  #include "hw/ppc/spapr_ovec.h"
> >>  #include <libfdt.h>
> >> +#include "migration/blocker.h"
> >>  
> >>  #define RTAS_LOG_VERSION_MASK                   0xff000000
> >>  #define   RTAS_LOG_VERSION_6                    0x06000000
> >> @@ -864,6 +865,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
> >>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >>  {
> >>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +    int ret;
> >> +    Error *local_err = NULL;
> >> +
> >> +    error_setg(&spapr->migration_blocker,
> >> +            "Live migration not supported during machine check handling");
> >> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
> > 
> > migrate_add_blocker() propagates the reason of the failure in local_err,
> > ie. because a migration is already in progress or --only-migratable was
> > passed on the QEMU command line, along with the error message passed in
> > the first argument. This means that...
> > 
> >> +    if (ret < 0) {
> >> +        /*
> >> +         * We don't want to abort and let the migration to continue. In a
> >> +         * rare case, the machine check handler will run on the target
> >> +         * hardware. Though this is not preferable, it is better than aborting
> >> +         * the migration or killing the VM.
> >> +         */
> >> +        error_free(spapr->migration_blocker);
> >> +        fprintf(stderr, "Warning: Machine check during VM migration\n");
> > 
> > ... you should just do:
> > 
> >         error_report_err(local_err);
> > 
> > This also takes care of freeing local_err which would be leaked otherwise.
> 
> Sure. I am planning to use warn_report_err() as I don't want to abort.

I worry what the high level effect of this blocker will be.
Since failing hardware is a common reason for wanting to do a migrate
I worry that if the hardware is reporting lots of errors you might not
be able to migrate the VM to more solid hardware because of this
blocker.

Dave

> Regards,
> Aravinda
> 
> > 
> >> +    }
> >>  
> >>      while (spapr->mc_status != -1) {
> >>          /*
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index 997cf19..1229a0e 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -50,6 +50,7 @@
> >>  #include "target/ppc/mmu-hash64.h"
> >>  #include "target/ppc/mmu-book3s-v3.h"
> >>  #include "kvm_ppc.h"
> >> +#include "migration/blocker.h"
> >>  
> >>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>                                     uint32_t token, uint32_t nargs,
> >> @@ -396,6 +397,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>          spapr->mc_status = -1;
> >>          qemu_cond_signal(&spapr->mc_delivery_cond);
> >>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >> +        migrate_del_blocker(spapr->migration_blocker);
> >> +        error_free(spapr->migration_blocker);
> >> +        spapr->migration_blocker = NULL;
> >>      }
> >>  }
> >>  
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index 9d16ad1..dda5fd2 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -10,6 +10,7 @@
> >>  #include "hw/ppc/spapr_irq.h"
> >>  #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
> >>  #include "hw/ppc/xics.h"        /* For ICSState */
> >> +#include "qapi/error.h"
> >>  
> >>  struct SpaprVioBus;
> >>  struct SpaprPhbState;
> >> @@ -213,6 +214,8 @@ struct SpaprMachineState {
> >>      SpaprCapabilities def, eff, mig;
> >>  
> >>      unsigned gpu_numa_id;
> >> +
> >> +    Error *migration_blocker;
> >>  };
> >>  
> >>  #define H_SUCCESS         0
> >>
> >>
> > 
> 
> -- 
> Regards,
> Aravinda
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 6/6] migration: Block migration while handling machine check
  2019-05-16 14:17       ` Dr. David Alan Gilbert
@ 2019-05-20  5:57         ` Aravinda Prasad
  0 siblings, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-20  5:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aik, Greg Kurz, qemu-devel, paulus, qemu-ppc, david



On Thursday 16 May 2019 07:47 PM, Dr. David Alan Gilbert wrote:
> * Aravinda Prasad (aravinda@linux.vnet.ibm.com) wrote:
>>
>>
>> On Thursday 16 May 2019 04:24 PM, Greg Kurz wrote:
>>> On Mon, 22 Apr 2019 12:33:45 +0530
>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> wrote:
>>>
>>>> Block VM migration requests until the machine check
>>>> error handling is complete as (i) these errors are
>>>> specific to the source hardware and is irrelevant on
>>>> the target hardware, (ii) these errors cause data
>>>> corruption and should be handled before migration.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>>>>  hw/ppc/spapr_rtas.c    |    4 ++++
>>>>  include/hw/ppc/spapr.h |    3 +++
>>>>  3 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>> index 4032db0..45b990c 100644
>>>> --- a/hw/ppc/spapr_events.c
>>>> +++ b/hw/ppc/spapr_events.c
>>>> @@ -41,6 +41,7 @@
>>>>  #include "qemu/bcd.h"
>>>>  #include "hw/ppc/spapr_ovec.h"
>>>>  #include <libfdt.h>
>>>> +#include "migration/blocker.h"
>>>>  
>>>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>>>>  #define   RTAS_LOG_VERSION_6                    0x06000000
>>>> @@ -864,6 +865,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>>  {
>>>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>> +    int ret;
>>>> +    Error *local_err = NULL;
>>>> +
>>>> +    error_setg(&spapr->migration_blocker,
>>>> +            "Live migration not supported during machine check handling");
>>>> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
>>>
>>> migrate_add_blocker() propagates the reason of the failure in local_err,
>>> ie. because a migration is already in progress or --only-migratable was
>>> passed on the QEMU command line, along with the error message passed in
>>> the first argument. This means that...
>>>
>>>> +    if (ret < 0) {
>>>> +        /*
>>>> +         * We don't want to abort and let the migration to continue. In a
>>>> +         * rare case, the machine check handler will run on the target
>>>> +         * hardware. Though this is not preferable, it is better than aborting
>>>> +         * the migration or killing the VM.
>>>> +         */
>>>> +        error_free(spapr->migration_blocker);
>>>> +        fprintf(stderr, "Warning: Machine check during VM migration\n");
>>>
>>> ... you should just do:
>>>
>>>         error_report_err(local_err);
>>>
>>> This also takes care of freeing local_err which would be leaked otherwise.
>>
>> Sure. I am planning to use warn_report_err() as I don't want to abort.
> 
> I worry what the high level effect of this blocker will be.
> Since failing hardware is a common reason for wanting to do a migrate
> I worry that if the hardware is reporting lots of errors you might not
> be able to migrate the VM to more solid hardware because of this
> blocker.

We handle two cases, (i) migration initiated during error handling which
we block as we don't want to migrate when we are handling the error. For
example, for memory errors, we need to take some actions like poisoning
the page. If we allow migration during error handling, the handler may
execute on the target host and may poison a clean page on the target.
But, a migration retry will succeed, (ii) errors reported after
migration is initiated: in such cases we let the migration continue
without blocking/aborting.

This is because memory errors are not very frequent, but are still
important to handle as it can cause data corruption. However, if the
hardware is reporting lots of errors, then the chances of host itself
crashing is very high.

> 
> Dave
> 
>> Regards,
>> Aravinda
>>
>>>
>>>> +    }
>>>>  
>>>>      while (spapr->mc_status != -1) {
>>>>          /*
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index 997cf19..1229a0e 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -50,6 +50,7 @@
>>>>  #include "target/ppc/mmu-hash64.h"
>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>>  #include "kvm_ppc.h"
>>>> +#include "migration/blocker.h"
>>>>  
>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>                                     uint32_t token, uint32_t nargs,
>>>> @@ -396,6 +397,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>>          spapr->mc_status = -1;
>>>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>> +        migrate_del_blocker(spapr->migration_blocker);
>>>> +        error_free(spapr->migration_blocker);
>>>> +        spapr->migration_blocker = NULL;
>>>>      }
>>>>  }
>>>>  
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index 9d16ad1..dda5fd2 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -10,6 +10,7 @@
>>>>  #include "hw/ppc/spapr_irq.h"
>>>>  #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
>>>>  #include "hw/ppc/xics.h"        /* For ICSState */
>>>> +#include "qapi/error.h"
>>>>  
>>>>  struct SpaprVioBus;
>>>>  struct SpaprPhbState;
>>>> @@ -213,6 +214,8 @@ struct SpaprMachineState {
>>>>      SpaprCapabilities def, eff, mig;
>>>>  
>>>>      unsigned gpu_numa_id;
>>>> +
>>>> +    Error *migration_blocker;
>>>>  };
>>>>  
>>>>  #define H_SUCCESS         0
>>>>
>>>>
>>>
>>
>> -- 
>> Regards,
>> Aravinda
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v8 6/6] migration: Block migration while handling machine check
  2019-05-10  6:51   ` David Gibson
  2019-05-10  7:16     ` Aravinda Prasad
@ 2019-05-29  5:46     ` Aravinda Prasad
  1 sibling, 0 replies; 65+ messages in thread
From: Aravinda Prasad @ 2019-05-29  5:46 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, aik, qemu-ppc, qemu-devel



On Friday 10 May 2019 12:21 PM, David Gibson wrote:
> On Mon, Apr 22, 2019 at 12:33:45PM +0530, Aravinda Prasad wrote:
>> Block VM migration requests until the machine check
>> error handling is complete as (i) these errors are
>> specific to the source hardware and is irrelevant on
>> the target hardware, (ii) these errors cause data
>> corruption and should be handled before migration.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>>  hw/ppc/spapr_rtas.c    |    4 ++++
>>  include/hw/ppc/spapr.h |    3 +++
>>  3 files changed, 24 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 4032db0..45b990c 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -41,6 +41,7 @@
>>  #include "qemu/bcd.h"
>>  #include "hw/ppc/spapr_ovec.h"
>>  #include <libfdt.h>
>> +#include "migration/blocker.h"
>>  
>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>>  #define   RTAS_LOG_VERSION_6                    0x06000000
>> @@ -864,6 +865,22 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, bool recovered)
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>  {
>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    int ret;
>> +    Error *local_err = NULL;
>> +
>> +    error_setg(&spapr->migration_blocker,
>> +            "Live migration not supported during machine check handling");
>> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
>> +    if (ret < 0) {
>> +        /*
>> +         * We don't want to abort and let the migration to continue. In a
>> +         * rare case, the machine check handler will run on the target
>> +         * hardware. Though this is not preferable, it is better than aborting
>> +         * the migration or killing the VM.
>> +         */
>> +        error_free(spapr->migration_blocker);
>> +        fprintf(stderr, "Warning: Machine check during VM migration\n");
> 
> Use report_err() instead of a raw fprintf().
> 
>> +    }
>>  
>>      while (spapr->mc_status != -1) {
>>          /*
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 997cf19..1229a0e 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -50,6 +50,7 @@
>>  #include "target/ppc/mmu-hash64.h"
>>  #include "target/ppc/mmu-book3s-v3.h"
>>  #include "kvm_ppc.h"
>> +#include "migration/blocker.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -396,6 +397,9 @@ static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>          spapr->mc_status = -1;
>>          qemu_cond_signal(&spapr->mc_delivery_cond);
>>          rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +        migrate_del_blocker(spapr->migration_blocker);
>> +        error_free(spapr->migration_blocker);
>> +        spapr->migration_blocker = NULL;
>>      }
>>  }
>>  
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 9d16ad1..dda5fd2 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -10,6 +10,7 @@
>>  #include "hw/ppc/spapr_irq.h"
>>  #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
>>  #include "hw/ppc/xics.h"        /* For ICSState */
>> +#include "qapi/error.h"
>>  
>>  struct SpaprVioBus;
>>  struct SpaprPhbState;
>> @@ -213,6 +214,8 @@ struct SpaprMachineState {
>>      SpaprCapabilities def, eff, mig;
>>  
>>      unsigned gpu_numa_id;
>> +
>> +    Error *migration_blocker;
> 
> This name doesn't seem good - it's specific to fwnmi, not any other
> migration blockers we might have in future.  It also always contains
> the same string - could you just initialize that in a global and just
> do the migrate_add_blocker() / migrate_del_blocker() instead?

I retained it in SpaprMachineState instead of a global variable because
we add the blocker in spapr_events.c and delete it in spapr_rtas.c

But I have renamed it to fwnmi_migration_blocker.

Regards,
Aravinda

> 
>>  };
>>  
>>  #define H_SUCCESS         0
>>
> 

-- 
Regards,
Aravinda



^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2019-05-29  5:48 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-22  7:02 [Qemu-devel] [PATCH v8 0/6] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
2019-04-22  7:02 ` Aravinda Prasad
2019-04-22  7:02 ` [Qemu-devel] [PATCH v8 1/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
2019-04-22  7:02   ` Aravinda Prasad
2019-04-23  6:45   ` David Gibson
2019-04-23  6:45     ` David Gibson
2019-04-25  4:56     ` Aravinda Prasad
2019-04-25  4:56       ` Aravinda Prasad
2019-05-10  9:06   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-05-10  9:54     ` David Gibson
2019-05-10 14:33     ` Greg Kurz
2019-05-13  4:57       ` Aravinda Prasad
2019-05-13  4:53     ` Aravinda Prasad
2019-04-22  7:03 ` [Qemu-devel] [PATCH v8 2/6] Wrapper function to wait on condition for the main loop mutex Aravinda Prasad
2019-04-22  7:03   ` Aravinda Prasad
2019-04-23  6:47   ` David Gibson
2019-04-23  6:47     ` David Gibson
2019-05-10 13:14   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-04-22  7:03 ` [Qemu-devel] [PATCH v8 3/6] target/ppc: Handle NMI guest exit Aravinda Prasad
2019-04-22  7:03   ` Aravinda Prasad
2019-04-23  6:53   ` David Gibson
2019-04-23  6:53     ` David Gibson
2019-04-24  4:50     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
2019-04-24  4:50       ` Aravinda Prasad
2019-05-10  6:37       ` David Gibson
2019-05-10  6:58         ` Aravinda Prasad
2019-05-10 16:25   ` Greg Kurz
2019-05-13  5:40     ` Aravinda Prasad
2019-05-13  5:56       ` David Gibson
2019-04-22  7:03 ` [Qemu-devel] [PATCH v8 4/6] target/ppc: Build rtas error log upon an MCE Aravinda Prasad
2019-04-22  7:03   ` Aravinda Prasad
2019-04-23 14:38   ` Fabiano Rosas
2019-04-23 14:38     ` Fabiano Rosas
2019-04-24  4:51     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
2019-04-24  4:51       ` Aravinda Prasad
2019-05-10  6:42   ` [Qemu-devel] " David Gibson
2019-05-10  7:05     ` Aravinda Prasad
2019-05-10  9:52       ` David Gibson
2019-05-13  5:00         ` Aravinda Prasad
2019-05-13 11:30   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-05-14  0:08     ` David Gibson
2019-05-14  4:26       ` Aravinda Prasad
2019-05-14  4:40         ` David Gibson
2019-05-14  5:06           ` Aravinda Prasad
2019-05-16  1:47             ` David Gibson
2019-05-16  4:54               ` Aravinda Prasad
2019-04-22  7:03 ` [Qemu-devel] [PATCH v8 5/6] ppc: spapr: Enable FWNMI capability Aravinda Prasad
2019-04-22  7:03   ` Aravinda Prasad
2019-05-10  6:46   ` David Gibson
2019-05-10  7:15     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
2019-05-10  9:53       ` David Gibson
2019-05-13 10:30         ` Aravinda Prasad
2019-05-14  4:47           ` David Gibson
2019-05-14  5:32             ` Aravinda Prasad
2019-05-16  1:45               ` David Gibson
2019-05-16  4:59                 ` Aravinda Prasad
2019-04-22  7:03 ` [Qemu-devel] [PATCH v8 6/6] migration: Block migration while handling machine check Aravinda Prasad
2019-04-22  7:03   ` Aravinda Prasad
2019-05-10  6:51   ` David Gibson
2019-05-10  7:16     ` Aravinda Prasad
2019-05-29  5:46     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
2019-05-16 10:54   ` Greg Kurz
2019-05-16 10:59     ` Aravinda Prasad
2019-05-16 14:17       ` Dr. David Alan Gilbert
2019-05-20  5:57         ` Aravinda Prasad

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.