All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/4] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests
@ 2015-12-16  6:08 Aravinda Prasad
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 1/4] spapr: Register and handle HCALL to receive updated RTAS region Aravinda Prasad
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Aravinda Prasad @ 2015-12-16  6:08 UTC (permalink / raw)
  To: qemu-ppc, agraf, qemu-devel; +Cc: benh, aik, paulus, sam.bobroff, david

This series of patches add support for FWNMI in PowerKVM guests.

Memory error such as bit flips that cannot be corrected
by hardware is passed on to the kernel for handling
by raising machine check exception (an NMI). Upon such
machine check exception, if the address in error belongs
to guest then KVM causes a guest exit with KVM_EXIT_NMI
exit reason [1].

This patch series adds functionality to pass on such
machine check exception to the guest kernel by suitably
handling KVM_EXIT_NMI exit and building the error log.
This patch series explores the alternate design
discussed in the QEMU mailing list [2], [3].

In earlier design [2] KVM invoked guest's NMI interrupt
vector 0x200 upon machine check exception, while QEMU
patched 0x200 vector to issue hcall to facilitate
building and copying of error log into RTAS space.

The new design simplifies couple of things. Thanks to
David Gibson for the suggestion:

    - Eliminates patching of 0x200 interrupt vector from
      QEMU, hence, trampoline not required
    - Avoids introducing a new private hcall to be called
      from guest's 0x200 vector to enable QEMU to build
      error log
    - Simplifies re-trying by other processors when one
      of the processor is consuming/processing the error
      log.

Change Log v2:
    - Added KVM capability
    - Serialized multiple NMIs with a conditional wait

[1] http://marc.info/?l=kvm&m=145024538523497
[2] https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
[3] https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00898.html

---

Aravinda Prasad (4):
      spapr: Register and handle HCALL to receive updated RTAS region
      spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
      target-ppc: Handle NMI guest exit
      spapr: Introduce FWNMI KVM capability


 cpus.c                    |    5 +++
 hw/ppc/spapr.c            |    6 +++
 hw/ppc/spapr_hcall.c      |    8 ++++
 hw/ppc/spapr_rtas.c       |   53 +++++++++++++++++++++++++++
 include/hw/ppc/spapr.h    |   13 ++++++-
 include/qemu/main-loop.h  |    8 ++++
 linux-headers/linux/kvm.h |    1 +
 target-ppc/kvm.c          |   88 +++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm_ppc.h      |   81 +++++++++++++++++++++++++++++++++++++++++
 9 files changed, 261 insertions(+), 2 deletions(-)

-- 
Aravinda Prasad

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v2 1/4] spapr: Register and handle HCALL to receive updated RTAS region
  2015-12-16  6:08 [Qemu-devel] [PATCH v2 0/4] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
@ 2015-12-16  6:08 ` Aravinda Prasad
  2015-12-17  3:25   ` David Gibson
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 2/4] spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Aravinda Prasad @ 2015-12-16  6:08 UTC (permalink / raw)
  To: qemu-ppc, agraf, qemu-devel; +Cc: benh, aik, paulus, sam.bobroff, david

Receive updates from SLOF about the updated rtas-base.
A separate patch for SLOF [1] adds functionality to invoke
a private HCALL whenever OS issues instantiate-rtas with
a new rtas-base.

This is required as QEMU needs to know the updated rtas-base
as it allocates error reporting structure in RTAS space upon
a machine check exception.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-August/120386.html

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr_hcall.c   |    8 ++++++++
 include/hw/ppc/spapr.h |    3 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 652ddf6..9703ed2 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -580,6 +580,13 @@ static target_ulong h_rtas(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                            nret, rtas_r3 + 12 + 4*nargs);
 }
 
+static target_ulong h_rtas_update(PowerPCCPU *cpu, sPAPRMachineState *spapr,
+                                  target_ulong opcode, target_ulong *args)
+{
+    spapr->rtas_addr = args[0];
+    return 0;
+}
+
 static target_ulong h_logical_load(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                    target_ulong opcode, target_ulong *args)
 {
@@ -1008,6 +1015,7 @@ static void hypercall_register_types(void)
 
     /* qemu/KVM-PPC specific hcalls */
     spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
+    spapr_register_hypercall(KVMPPC_H_RTAS_UPDATE, h_rtas_update);
 
     spapr_register_hypercall(H_SET_MODE, h_set_mode);
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 91a61ab..b5cadd7 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -347,7 +347,8 @@ struct sPAPRMachineState {
 #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
 /* Client Architecture support */
 #define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
-#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
+#define KVMPPC_H_RTAS_UPDATE    (KVMPPC_HCALL_BASE + 0x3)
+#define KVMPPC_HCALL_MAX        KVMPPC_H_RTAS_UPDATE
 
 typedef struct sPAPRDeviceTreeUpdateHeader {
     uint32_t version_id;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v2 2/4] spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2015-12-16  6:08 [Qemu-devel] [PATCH v2 0/4] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 1/4] spapr: Register and handle HCALL to receive updated RTAS region Aravinda Prasad
@ 2015-12-16  6:08 ` Aravinda Prasad
  2015-12-17  3:51   ` David Gibson
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 3/4] target-ppc: Handle NMI guest exit Aravinda Prasad
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability Aravinda Prasad
  3 siblings, 1 reply; 14+ messages in thread
From: Aravinda Prasad @ 2015-12-16  6:08 UTC (permalink / raw)
  To: qemu-ppc, agraf, qemu-devel; +Cc: benh, aik, paulus, sam.bobroff, david

This patch adds support in QEMU to handle "ibm,nmi-register"
and "ibm,nmi-interlock" RTAS calls.

The machine check notification address is saved when the
OS issues "ibm,nmi-register" RTAS call.

This patch also handles the case when multiple processors
experience machine check at or about the same time by
handling "ibm,nmi-interlock" call. In such cases, as per
PAPR, subsequent processors serialize waiting for the first
processor to issue the "ibm,nmi-interlock" call. The second
processor waits till the first processor, which also
received a machine check error, is done reading the error
log. The first processor issues "ibm,nmi-interlock" call
when the error log is consumed. This patch implements the
releasing part of the error-log while subsequent patch
(which builds error log) handles the locking part.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c    |   36 ++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |   10 +++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 9869bc9..17c4672 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -597,6 +597,38 @@ out:
     rtas_st(rets, 0, rc);
 }
 
+static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
+                                  sPAPRMachineState *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args,
+                                  uint32_t nret, target_ulong rets)
+{
+    spapr->mc_in_progress = false;
+    qemu_cond_init(&spapr->mc_delivery_cond);
+    spapr->guest_machine_check_addr = rtas_ld(args, 1);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
+static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
+                                   sPAPRMachineState *spapr,
+                                   uint32_t token, uint32_t nargs,
+                                   target_ulong args,
+                                   uint32_t nret, target_ulong rets)
+{
+    if (!spapr->guest_machine_check_addr) {
+        /* NMI register not called */
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+    } else {
+        /*
+         * VCPU issuing "ibm,nmi-interlock" is done with NMI handling,
+         * hence unset mc_in_progress.
+         */
+        spapr->mc_in_progress = false;
+        qemu_cond_signal(&spapr->mc_delivery_cond);
+        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    }
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -747,6 +779,10 @@ static void core_rtas_register_types(void)
                         rtas_get_sensor_state);
     spapr_rtas_register(RTAS_IBM_CONFIGURE_CONNECTOR, "ibm,configure-connector",
                         rtas_ibm_configure_connector);
+    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
+                        rtas_ibm_nmi_register);
+    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
+                        rtas_ibm_nmi_interlock);
 }
 
 type_init(core_rtas_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index b5cadd7..de84a4e 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -74,6 +74,12 @@ struct sPAPRMachineState {
     /* RTAS state */
     QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
 
+    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
+    target_ulong guest_machine_check_addr;
+    bool mc_in_progress;
+    int mc_cpu;
+    QemuCond mc_delivery_cond;
+
     /*< public >*/
     char *kvm_type;
 };
@@ -458,8 +464,10 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
 #define RTAS_IBM_SET_SLOT_RESET                 (RTAS_TOKEN_BASE + 0x23)
 #define RTAS_IBM_CONFIGURE_PE                   (RTAS_TOKEN_BASE + 0x24)
 #define RTAS_IBM_SLOT_ERROR_DETAIL              (RTAS_TOKEN_BASE + 0x25)
+#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x26)
+#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x27)
 
-#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x26)
+#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x28)
 
 /* RTAS ibm,get-system-parameter token values */
 #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v2 3/4] target-ppc: Handle NMI guest exit
  2015-12-16  6:08 [Qemu-devel] [PATCH v2 0/4] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 1/4] spapr: Register and handle HCALL to receive updated RTAS region Aravinda Prasad
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 2/4] spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
@ 2015-12-16  6:08 ` Aravinda Prasad
  2015-12-17  4:00   ` David Gibson
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability Aravinda Prasad
  3 siblings, 1 reply; 14+ messages in thread
From: Aravinda Prasad @ 2015-12-16  6:08 UTC (permalink / raw)
  To: qemu-ppc, agraf, qemu-devel; +Cc: benh, aik, paulus, sam.bobroff, david

Memory error such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to guest then
guest kernel is responsible for taking suitable action.
Patch [1] enhances KVM to exit guest with exit reason
set to KVM_EXIT_NMI in such cases.

This patch handles KVM_EXIT_NMI exit. If the guest OS
has registered the machine check handling routine by
calling "ibm,nmi-register", then the handler builds
the error log and invokes the registered handler else
invokes the handler at 0x200.

[1] http://marc.info/?l=kvm&m=145024538523497

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 cpus.c                   |    5 +++
 hw/ppc/spapr.c           |    6 +++
 include/qemu/main-loop.h |    8 ++++
 target-ppc/kvm.c         |   86 ++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm_ppc.h     |   81 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 186 insertions(+)

diff --git a/cpus.c b/cpus.c
index dddd056..7b7dd0f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1154,6 +1154,11 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
+void qemu_cond_wait_iothread(QemuCond *cond)
+{
+    qemu_cond_wait(cond, &qemu_global_mutex);
+}
+
 static int all_vcpus_paused(void)
 {
     CPUState *cpu;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 05926a3..501dd70 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1556,6 +1556,12 @@ static void ppc_spapr_init(MachineState *machine)
         exit(1);
     }
     spapr->rtas_size = get_image_size(filename);
+
+    assert(spapr->rtas_size < RTAS_ERRLOG_OFFSET);
+
+    /* Resize blob to accommodate error log. */
+    spapr->rtas_size = RTAS_ERRLOG_OFFSET + sizeof(struct RtasMCELog);
+
     spapr->rtas_blob = g_malloc(spapr->rtas_size);
     if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
         error_report("Could not load LPAR rtas '%s'", filename);
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 9976909..c4d4446 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -263,6 +263,14 @@ void qemu_mutex_lock_iothread(void);
  */
 void qemu_mutex_unlock_iothread(void);
 
+/**
+ * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ *
+ * This function atomically releases the main loop mutex and causes
+ * the calling thread to block on the condition.
+ */
+void qemu_cond_wait_iothread(QemuCond *cond);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 110436d..2bbb46d 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1665,6 +1665,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ret = 0;
         break;
 
+    case KVM_EXIT_NMI:
+        DPRINTF("handle NMI exception\n");
+        ret = kvm_handle_nmi(cpu);
+        break;
+
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
@@ -2484,3 +2489,84 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
     return data & 0xffff;
 }
+
+int kvm_handle_nmi(PowerPCCPU *cpu)
+{
+    struct RtasMCELog mc_log;
+    CPUPPCState *env = &cpu->env;
+    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+    target_ulong msr = 0;
+
+    cpu_synchronize_state(CPU(ppc_env_get_cpu(env)));
+
+    /*
+     * Properly set bits in MSR before we invoke the handler.
+     * SRR0/1, DAR and DSISR are properly set by KVM
+     */
+    if (!(*pcc->interrupts_big_endian)(cpu)) {
+        msr |= (1ULL << MSR_LE);
+    }
+
+    if (env->msr && (1ULL << MSR_SF)) {
+        msr |= (1ULL << MSR_SF);
+    }
+
+    msr |= (1ULL << MSR_ME);
+    env->msr = msr;
+
+    if (!spapr->guest_machine_check_addr) {
+        /*
+         * If OS has not registered with "ibm,nmi-register"
+         * jump to 0x200
+         */
+        env->nip = 0x200;
+        return 0;
+    }
+
+    while (spapr->mc_in_progress) {
+        /*
+         * Check whether the same CPU got machine check error
+         * while still handling the mc error (i.e., before
+         * that CPU called "ibm,nmi-interlock"
+         */
+        if (spapr->mc_cpu == cpu->cpu_dt_id) {
+            qemu_system_guest_panicked();
+        }
+        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
+    }
+    spapr->mc_in_progress = true;
+    spapr->mc_cpu = cpu->cpu_dt_id;
+
+    /* Set error log fields */
+    mc_log.r3 = env->gpr[3];
+    mc_log.err_log.byte0 = 0;
+    mc_log.err_log.byte1 =
+        (RTAS_SEVERITY_ERROR_SYNC << RTAS_ELOG_SEVERITY_SHIFT);
+    mc_log.err_log.byte1 |=
+        (RTAS_DISP_NOT_RECOVERED << RTAS_ELOG_DISPOSITION_SHIFT);
+    mc_log.err_log.byte2 =
+        (RTAS_INITIATOR_MEMORY << RTAS_ELOG_INITIATOR_SHIFT);
+    mc_log.err_log.byte2 |= RTAS_TARGET_MEMORY;
+
+    if (env->spr[SPR_DSISR] & P7_DSISR_MC_UE) {
+        mc_log.err_log.byte3 = RTAS_TYPE_ECC_UNCORR;
+    } else {
+        mc_log.err_log.byte3 = 0;
+    }
+
+    /* Handle all Host/Guest LE/BE combinations */
+    if (env->msr & (1ULL << MSR_LE)) {
+        mc_log.r3 = cpu_to_le64(mc_log.r3);
+    } else {
+        mc_log.r3 = cpu_to_be64(mc_log.r3);
+    }
+
+    cpu_physical_memory_write(spapr->rtas_addr + RTAS_ERRLOG_OFFSET,
+                              &mc_log, sizeof(mc_log));
+
+    env->nip = spapr->guest_machine_check_addr;
+    env->gpr[3] = spapr->rtas_addr + RTAS_ERRLOG_OFFSET;
+
+    return 0;
+}
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 5c1d334..ea3345b 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -53,6 +53,87 @@ void kvmppc_hash64_free_pteg(uint64_t token);
 void kvmppc_hash64_write_pte(CPUPPCState *env, target_ulong pte_index,
                              target_ulong pte0, target_ulong pte1);
 bool kvmppc_has_cap_fixup_hcalls(void);
+int kvm_handle_nmi(PowerPCCPU *cpu);
+
+/* Offset from rtas-base where error log is placed */
+#define RTAS_ERRLOG_OFFSET       0x200
+
+#define RTAS_ELOG_SEVERITY_SHIFT         0x5
+#define RTAS_ELOG_DISPOSITION_SHIFT      0x3
+#define RTAS_ELOG_INITIATOR_SHIFT        0x4
+
+/*
+ * Only required RTAS event severity, disposition, initiator
+ * target and type are copied from arch/powerpc/include/asm/rtas.h
+ */
+
+/* RTAS event severity */
+#define RTAS_SEVERITY_ERROR_SYNC    0x3
+
+/* RTAS event disposition */
+#define RTAS_DISP_NOT_RECOVERED     0x2
+
+/* RTAS event initiator */
+#define RTAS_INITIATOR_MEMORY       0x4
+
+/* RTAS event target */
+#define RTAS_TARGET_MEMORY          0x4
+
+/* RTAS event type */
+#define RTAS_TYPE_ECC_UNCORR        0x09
+
+/*
+ * Currently KVM only passes on the uncorrected machine
+ * check memory error to guest. Other machine check errors
+ * such as SLB multi-hit and TLB multi-hit are recovered
+ * in KVM and are not passed on to guest.
+ *
+ * DSISR Bit for uncorrected machine check error. Based
+ * on arch/powerpc/include/asm/mce.h
+ */
+#define PPC_BIT(bit)                (0x8000000000000000ULL >> bit)
+#define P7_DSISR_MC_UE              (PPC_BIT(48))  /* P8 too */
+
+/* Adopted from kernel source arch/powerpc/include/asm/rtas.h */
+struct rtas_error_log {
+    /* Byte 0 */
+    uint8_t     byte0;          /* Architectural version */
+
+    /* Byte 1 */
+    uint8_t     byte1;
+    /* XXXXXXXX
+     * XXX      3: Severity level of error
+     *    XX    2: Degree of recovery
+     *      X   1: Extended log present?
+     *       XX 2: Reserved
+     */
+
+    /* Byte 2 */
+    uint8_t     byte2;
+    /* XXXXXXXX
+     * XXXX     4: Initiator of event
+     *     XXXX 4: Target of failed operation
+     */
+    uint8_t     byte3;          /* General event or error*/
+    __be32      extended_log_length;    /* length in bytes */
+    unsigned char   buffer[1];      /* Start of extended log */
+                                /* Variable length.      */
+};
+
+/*
+ * Data format in RTAS-Blob
+ *
+ * This structure contains error information related to Machine
+ * Check exception. This is filled up and copied to rtas-blob
+ * upon machine check exception. The address of rtas-blob is
+ * passed on to OS registered machine check notification
+ * routines upon machine check exception
+ */
+struct RtasMCELog {
+    target_ulong r3;
+    struct rtas_error_log err_log;
+};
+
 
 #else
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability
  2015-12-16  6:08 [Qemu-devel] [PATCH v2 0/4] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
                   ` (2 preceding siblings ...)
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 3/4] target-ppc: Handle NMI guest exit Aravinda Prasad
@ 2015-12-16  6:08 ` Aravinda Prasad
  2015-12-17  4:02   ` David Gibson
  2015-12-17  7:03   ` Thomas Huth
  3 siblings, 2 replies; 14+ messages in thread
From: Aravinda Prasad @ 2015-12-16  6:08 UTC (permalink / raw)
  To: qemu-ppc, agraf, qemu-devel; +Cc: benh, aik, paulus, sam.bobroff, david

Introduce a new KVM capability to control how KVM
behaves on machine check exception.

Without this capability, KVM redirects machine check
exceptions to guest's 0x200 vector if the address in
error belongs to guest. With this capability KVM
causes a guest exit with NMI exit reason.

This is required to avoid problem if a new kernel/KVM
is used with an old QEMU. As old QEMU might not
understand the new NMI exit type and treat it as a
fatal error, even though the guest could have actually
handled the error if the exception was delivered to
guest's 0x200 interrupt vector.

PS: KVM_CAP_PPC_FWNMI is set to 121 as 119 and 120 are
used by KVM_CAP_GUEST_DEBUG_HW_BPS and KVM_CAP_GUEST_DEBUG_HW_WPS
in KVM code, but still not reflected in QEMU code.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c       |   17 +++++++++++++++++
 linux-headers/linux/kvm.h |    1 +
 target-ppc/kvm.c          |    2 ++
 3 files changed, 20 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 17c4672..53319da 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -38,6 +38,8 @@
 #include <libfdt.h>
 #include "hw/ppc/spapr_drc.h"
 
+extern int cap_fwnmi;
+
 /* #define DEBUG_SPAPR */
 
 #ifdef DEBUG_SPAPR
@@ -603,9 +605,24 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
                                   target_ulong args,
                                   uint32_t nret, target_ulong rets)
 {
+    int ret;
+    CPUState *cs = CPU(cpu);
+
+    if (!cap_fwnmi) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
     spapr->mc_in_progress = false;
     qemu_cond_init(&spapr->mc_delivery_cond);
     spapr->guest_machine_check_addr = rtas_ld(args, 1);
+
+    ret = kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
+    if (ret < 0) {
+        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+        return;
+    }
+
     rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
 
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 683f713..2db1fba 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -819,6 +819,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_DISABLE_QUIRKS 116
 #define KVM_CAP_X86_SMM 117
 #define KVM_CAP_MULTI_ADDRESS_SPACE 118
+#define KVM_CAP_PPC_FWNMI 121
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 2bbb46d..5339c04 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -74,6 +74,7 @@ static int cap_ppc_watchdog;
 static int cap_papr;
 static int cap_htab_fd;
 static int cap_fixup_hcalls;
+int cap_fwnmi;
 
 static uint32_t debug_inst_opcode;
 
@@ -116,6 +117,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
      * only activated after this by kvmppc_set_papr() */
     cap_htab_fd = kvm_check_extension(s, KVM_CAP_PPC_HTAB_FD);
     cap_fixup_hcalls = kvm_check_extension(s, KVM_CAP_PPC_FIXUP_HCALL);
+    cap_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] spapr: Register and handle HCALL to receive updated RTAS region
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 1/4] spapr: Register and handle HCALL to receive updated RTAS region Aravinda Prasad
@ 2015-12-17  3:25   ` David Gibson
  0 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2015-12-17  3:25 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: benh, aik, agraf, qemu-devel, qemu-ppc, paulus, sam.bobroff

[-- Attachment #1: Type: text/plain, Size: 834 bytes --]

On Wed, Dec 16, 2015 at 11:38:13AM +0530, Aravinda Prasad wrote:
> Receive updates from SLOF about the updated rtas-base.
> A separate patch for SLOF [1] adds functionality to invoke
> a private HCALL whenever OS issues instantiate-rtas with
> a new rtas-base.
> 
> This is required as QEMU needs to know the updated rtas-base
> as it allocates error reporting structure in RTAS space upon
> a machine check exception.
> 
> [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-August/120386.html
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 2/4] spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
@ 2015-12-17  3:51   ` David Gibson
  2015-12-17  4:45     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
  0 siblings, 1 reply; 14+ messages in thread
From: David Gibson @ 2015-12-17  3:51 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: benh, aik, agraf, qemu-devel, qemu-ppc, paulus, sam.bobroff

[-- Attachment #1: Type: text/plain, Size: 5437 bytes --]

On Wed, Dec 16, 2015 at 11:38:22AM +0530, Aravinda Prasad wrote:
> This patch adds support in QEMU to handle "ibm,nmi-register"
> and "ibm,nmi-interlock" RTAS calls.
> 
> The machine check notification address is saved when the
> OS issues "ibm,nmi-register" RTAS call.
> 
> This patch also handles the case when multiple processors
> experience machine check at or about the same time by
> handling "ibm,nmi-interlock" call. In such cases, as per
> PAPR, subsequent processors serialize waiting for the first
> processor to issue the "ibm,nmi-interlock" call. The second
> processor waits till the first processor, which also
> received a machine check error, is done reading the error
> log. The first processor issues "ibm,nmi-interlock" call
> when the error log is consumed. This patch implements the
> releasing part of the error-log while subsequent patch
> (which builds error log) handles the locking part.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c    |   36 ++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |   10 +++++++++-
>  2 files changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 9869bc9..17c4672 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -597,6 +597,38 @@ out:
>      rtas_st(rets, 0, rc);
>  }
>  
> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> +                                  sPAPRMachineState *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args,
> +                                  uint32_t nret, target_ulong rets)
> +{
> +    spapr->mc_in_progress = false;
> +    qemu_cond_init(&spapr->mc_delivery_cond);

I think initializing mc_in_progress and the condition variable should
go in the overall initialization (in fact during system reset) instead
of here.  Things using these shouldn't be invoked until after the
nmi_register, but it's safer to have the qemu internal variables
initialized beforehand.

> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}

It also looks like you need to add code to clear
guest_machine_check_addr if the system is reset

> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> +                                   sPAPRMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    if (!spapr->guest_machine_check_addr) {
> +        /* NMI register not called */
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +    } else {
> +        /*
> +         * VCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> +         * hence unset mc_in_progress.
> +         */
> +        spapr->mc_in_progress = false;
> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    }
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -747,6 +779,10 @@ static void core_rtas_register_types(void)
>                          rtas_get_sensor_state);
>      spapr_rtas_register(RTAS_IBM_CONFIGURE_CONNECTOR, "ibm,configure-connector",
>                          rtas_ibm_configure_connector);
> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
> +                        rtas_ibm_nmi_register);
> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
> +                        rtas_ibm_nmi_interlock);
>  }
>  
>  type_init(core_rtas_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index b5cadd7..de84a4e 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -74,6 +74,12 @@ struct sPAPRMachineState {
>      /* RTAS state */
>      QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
>  
> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
> +    target_ulong guest_machine_check_addr;
> +    bool mc_in_progress;
> +    int mc_cpu;

mc_cpu doesn't appear to be used.

I think guest_machine_check_addr and mc_in_progress will also need to
be added to migration state.

> +    QemuCond mc_delivery_cond;
> +
>      /*< public >*/
>      char *kvm_type;
>  };
> @@ -458,8 +464,10 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
>  #define RTAS_IBM_SET_SLOT_RESET                 (RTAS_TOKEN_BASE + 0x23)
>  #define RTAS_IBM_CONFIGURE_PE                   (RTAS_TOKEN_BASE + 0x24)
>  #define RTAS_IBM_SLOT_ERROR_DETAIL              (RTAS_TOKEN_BASE + 0x25)
> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x26)
> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x27)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x26)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x28)
>  
>  /* RTAS ibm,get-system-parameter token values */
>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/4] target-ppc: Handle NMI guest exit
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 3/4] target-ppc: Handle NMI guest exit Aravinda Prasad
@ 2015-12-17  4:00   ` David Gibson
  2015-12-17  5:01     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
  0 siblings, 1 reply; 14+ messages in thread
From: David Gibson @ 2015-12-17  4:00 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: benh, aik, agraf, qemu-devel, qemu-ppc, paulus, sam.bobroff

[-- Attachment #1: Type: text/plain, Size: 9892 bytes --]

On Wed, Dec 16, 2015 at 11:38:37AM +0530, Aravinda Prasad wrote:
> Memory error such as bit flips that cannot be corrected
> by hardware are passed on to the kernel for handling.
> If the memory address in error belongs to guest then
> guest kernel is responsible for taking suitable action.
> Patch [1] enhances KVM to exit guest with exit reason
> set to KVM_EXIT_NMI in such cases.
> 
> This patch handles KVM_EXIT_NMI exit. If the guest OS
> has registered the machine check handling routine by
> calling "ibm,nmi-register", then the handler builds
> the error log and invokes the registered handler else
> invokes the handler at 0x200.
> 
> [1] http://marc.info/?l=kvm&m=145024538523497
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  cpus.c                   |    5 +++
>  hw/ppc/spapr.c           |    6 +++
>  include/qemu/main-loop.h |    8 ++++
>  target-ppc/kvm.c         |   86 ++++++++++++++++++++++++++++++++++++++++++++++
>  target-ppc/kvm_ppc.h     |   81 +++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 186 insertions(+)
> 
> diff --git a/cpus.c b/cpus.c
> index dddd056..7b7dd0f 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1154,6 +1154,11 @@ void qemu_mutex_unlock_iothread(void)
>      qemu_mutex_unlock(&qemu_global_mutex);
>  }
>  
> +void qemu_cond_wait_iothread(QemuCond *cond)
> +{
> +    qemu_cond_wait(cond, &qemu_global_mutex);
> +}
> +

Even though it's trivial, this generic change should probably go in a
separate patch from the papr specific pieces.

Speaking of which, I think it's sufficiently trivial you could just
inline it in the header.

>  static int all_vcpus_paused(void)
>  {
>      CPUState *cpu;
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 05926a3..501dd70 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1556,6 +1556,12 @@ static void ppc_spapr_init(MachineState *machine)
>          exit(1);
>      }
>      spapr->rtas_size = get_image_size(filename);
> +
> +    assert(spapr->rtas_size < RTAS_ERRLOG_OFFSET);
> +
> +    /* Resize blob to accommodate error log. */
> +    spapr->rtas_size = RTAS_ERRLOG_OFFSET + sizeof(struct RtasMCELog);
> +
>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>          error_report("Could not load LPAR rtas '%s'", filename);
> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> index 9976909..c4d4446 100644
> --- a/include/qemu/main-loop.h
> +++ b/include/qemu/main-loop.h
> @@ -263,6 +263,14 @@ void qemu_mutex_lock_iothread(void);
>   */
>  void qemu_mutex_unlock_iothread(void);
>  
> +/**
> + * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
> + *
> + * This function atomically releases the main loop mutex and causes
> + * the calling thread to block on the condition.
> + */
> +void qemu_cond_wait_iothread(QemuCond *cond);
> +
>  /* internal interfaces */
>  
>  void qemu_fd_register(int fd);
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 110436d..2bbb46d 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -1665,6 +1665,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ret = 0;
>          break;
>  
> +    case KVM_EXIT_NMI:
> +        DPRINTF("handle NMI exception\n");
> +        ret = kvm_handle_nmi(cpu);
> +        break;
> +
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> @@ -2484,3 +2489,84 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  {
>      return data & 0xffff;
>  }
> +
> +int kvm_handle_nmi(PowerPCCPU *cpu)
> +{
> +    struct RtasMCELog mc_log;
> +    CPUPPCState *env = &cpu->env;

You go from cpu to env here..

> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +    target_ulong msr = 0;
> +
> +    cpu_synchronize_state(CPU(ppc_env_get_cpu(env)));

Then back again awkwardly, although you still have the cpu variable.

> +
> +    /*
> +     * Properly set bits in MSR before we invoke the handler.
> +     * SRR0/1, DAR and DSISR are properly set by KVM
> +     */
> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
> +        msr |= (1ULL << MSR_LE);
> +    }
> +
> +    if (env->msr && (1ULL << MSR_SF)) {
> +        msr |= (1ULL << MSR_SF);
> +    }
> +
> +    msr |= (1ULL << MSR_ME);

Based on earlier discussions, sounds like assert(msr & (1ULL <<
MSR_ME)) would actually be correct here.

> +    env->msr = msr;
> +
> +    if (!spapr->guest_machine_check_addr) {
> +        /*
> +         * If OS has not registered with "ibm,nmi-register"
> +         * jump to 0x200
> +         */
> +        env->nip = 0x200;
> +        return 0;
> +    }
> +
> +    while (spapr->mc_in_progress) {
> +        /*
> +         * Check whether the same CPU got machine check error
> +         * while still handling the mc error (i.e., before
> +         * that CPU called "ibm,nmi-interlock"
> +         */
> +        if (spapr->mc_cpu == cpu->cpu_dt_id) {
> +            qemu_system_guest_panicked();
> +        }
> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> +    }
> +    spapr->mc_in_progress = true;
> +    spapr->mc_cpu = cpu->cpu_dt_id;
> +
> +    /* Set error log fields */
> +    mc_log.r3 = env->gpr[3];
> +    mc_log.err_log.byte0 = 0;
> +    mc_log.err_log.byte1 =
> +        (RTAS_SEVERITY_ERROR_SYNC << RTAS_ELOG_SEVERITY_SHIFT);
> +    mc_log.err_log.byte1 |=
> +        (RTAS_DISP_NOT_RECOVERED << RTAS_ELOG_DISPOSITION_SHIFT);
> +    mc_log.err_log.byte2 =
> +        (RTAS_INITIATOR_MEMORY << RTAS_ELOG_INITIATOR_SHIFT);
> +    mc_log.err_log.byte2 |= RTAS_TARGET_MEMORY;
> +
> +    if (env->spr[SPR_DSISR] & P7_DSISR_MC_UE) {
> +        mc_log.err_log.byte3 = RTAS_TYPE_ECC_UNCORR;
> +    } else {
> +        mc_log.err_log.byte3 = 0;
> +    }
> +
> +    /* Handle all Host/Guest LE/BE combinations */
> +    if (env->msr & (1ULL << MSR_LE)) {
> +        mc_log.r3 = cpu_to_le64(mc_log.r3);
> +    } else {
> +        mc_log.r3 = cpu_to_be64(mc_log.r3);
> +    }
> +
> +    cpu_physical_memory_write(spapr->rtas_addr + RTAS_ERRLOG_OFFSET,
> +                              &mc_log, sizeof(mc_log));
> +
> +    env->nip = spapr->guest_machine_check_addr;
> +    env->gpr[3] = spapr->rtas_addr + RTAS_ERRLOG_OFFSET;
> +
> +    return 0;
> +}
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index 5c1d334..ea3345b 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -53,6 +53,87 @@ void kvmppc_hash64_free_pteg(uint64_t token);
>  void kvmppc_hash64_write_pte(CPUPPCState *env, target_ulong pte_index,
>                               target_ulong pte0, target_ulong pte1);
>  bool kvmppc_has_cap_fixup_hcalls(void);
> +int kvm_handle_nmi(PowerPCCPU *cpu);
> +
> +/* Offset from rtas-base where error log is placed */
> +#define RTAS_ERRLOG_OFFSET       0x200
> +
> +#define RTAS_ELOG_SEVERITY_SHIFT         0x5
> +#define RTAS_ELOG_DISPOSITION_SHIFT      0x3
> +#define RTAS_ELOG_INITIATOR_SHIFT        0x4
> +
> +/*
> + * Only required RTAS event severity, disposition, initiator
> + * target and type are copied from arch/powerpc/include/asm/rtas.h
> + */
> +
> +/* RTAS event severity */
> +#define RTAS_SEVERITY_ERROR_SYNC    0x3
> +
> +/* RTAS event disposition */
> +#define RTAS_DISP_NOT_RECOVERED     0x2
> +
> +/* RTAS event initiator */
> +#define RTAS_INITIATOR_MEMORY       0x4
> +
> +/* RTAS event target */
> +#define RTAS_TARGET_MEMORY          0x4
> +
> +/* RTAS event type */
> +#define RTAS_TYPE_ECC_UNCORR        0x09
> +
> +/*
> + * Currently KVM only passes on the uncorrected machine
> + * check memory error to guest. Other machine check errors
> + * such as SLB multi-hit and TLB multi-hit are recovered
> + * in KVM and are not passed on to guest.
> + *
> + * DSISR Bit for uncorrected machine check error. Based
> + * on arch/powerpc/include/asm/mce.h
> + */
> +#define PPC_BIT(bit)                (0x8000000000000000ULL >> bit)
> +#define P7_DSISR_MC_UE              (PPC_BIT(48))  /* P8 too */
> +
> +/* Adopted from kernel source arch/powerpc/include/asm/rtas.h */
> +struct rtas_error_log {
> +    /* Byte 0 */
> +    uint8_t     byte0;          /* Architectural version */
> +
> +    /* Byte 1 */
> +    uint8_t     byte1;
> +    /* XXXXXXXX
> +     * XXX      3: Severity level of error
> +     *    XX    2: Degree of recovery
> +     *      X   1: Extended log present?
> +     *       XX 2: Reserved
> +     */
> +
> +    /* Byte 2 */
> +    uint8_t     byte2;
> +    /* XXXXXXXX
> +     * XXXX     4: Initiator of event
> +     *     XXXX 4: Target of failed operation
> +     */
> +    uint8_t     byte3;          /* General event or error*/
> +    __be32      extended_log_length;    /* length in bytes */
> +    unsigned char   buffer[1];      /* Start of extended log */
> +                                /* Variable length.      */
> +};
> +
> +/*
> + * Data format in RTAS-Blob
> + *
> + * This structure contains error information related to Machine
> + * Check exception. This is filled up and copied to rtas-blob
> + * upon machine check exception. The address of rtas-blob is
> + * passed on to OS registered machine check notification
> + * routines upon machine check exception
> + */
> +struct RtasMCELog {
> +    target_ulong r3;
> +    struct rtas_error_log err_log;
> +};
> +
>  
>  #else
>  
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability Aravinda Prasad
@ 2015-12-17  4:02   ` David Gibson
  2015-12-17  4:38     ` Aravinda Prasad
  2015-12-17  7:03   ` Thomas Huth
  1 sibling, 1 reply; 14+ messages in thread
From: David Gibson @ 2015-12-17  4:02 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: benh, aik, agraf, qemu-devel, qemu-ppc, paulus, sam.bobroff

[-- Attachment #1: Type: text/plain, Size: 3758 bytes --]

On Wed, Dec 16, 2015 at 11:38:47AM +0530, Aravinda Prasad wrote:
> Introduce a new KVM capability to control how KVM
> behaves on machine check exception.
> 
> Without this capability, KVM redirects machine check
> exceptions to guest's 0x200 vector if the address in
> error belongs to guest. With this capability KVM
> causes a guest exit with NMI exit reason.
> 
> This is required to avoid problem if a new kernel/KVM
> is used with an old QEMU. As old QEMU might not
> understand the new NMI exit type and treat it as a
> fatal error, even though the guest could have actually
> handled the error if the exception was delivered to
> guest's 0x200 interrupt vector.
> 
> PS: KVM_CAP_PPC_FWNMI is set to 121 as 119 and 120 are
> used by KVM_CAP_GUEST_DEBUG_HW_BPS and KVM_CAP_GUEST_DEBUG_HW_WPS
> in KVM code, but still not reflected in QEMU code.

The commit message seems to be written as if this were the kernel
patch adding the capability there, rather than the qemu patch using
it.

> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c       |   17 +++++++++++++++++
>  linux-headers/linux/kvm.h |    1 +
>  target-ppc/kvm.c          |    2 ++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 17c4672..53319da 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -38,6 +38,8 @@
>  #include <libfdt.h>
>  #include "hw/ppc/spapr_drc.h"
>  
> +extern int cap_fwnmi;
> +
>  /* #define DEBUG_SPAPR */
>  
>  #ifdef DEBUG_SPAPR
> @@ -603,9 +605,24 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>                                    target_ulong args,
>                                    uint32_t nret, target_ulong rets)
>  {
> +    int ret;
> +    CPUState *cs = CPU(cpu);
> +
> +    if (!cap_fwnmi) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
>      spapr->mc_in_progress = false;
>      qemu_cond_init(&spapr->mc_delivery_cond);
>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +
> +    ret = kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
> +    if (ret < 0) {
> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +        return;
> +    }
> +
>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
>  
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index 683f713..2db1fba 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -819,6 +819,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_DISABLE_QUIRKS 116
>  #define KVM_CAP_X86_SMM 117
>  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
> +#define KVM_CAP_PPC_FWNMI 121
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 2bbb46d..5339c04 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -74,6 +74,7 @@ static int cap_ppc_watchdog;
>  static int cap_papr;
>  static int cap_htab_fd;
>  static int cap_fixup_hcalls;
> +int cap_fwnmi;
>  
>  static uint32_t debug_inst_opcode;
>  
> @@ -116,6 +117,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>       * only activated after this by kvmppc_set_papr() */
>      cap_htab_fd = kvm_check_extension(s, KVM_CAP_PPC_HTAB_FD);
>      cap_fixup_hcalls = kvm_check_extension(s, KVM_CAP_PPC_FIXUP_HCALL);
> +    cap_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>  
>      if (!cap_interrupt_level) {
>          fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability
  2015-12-17  4:02   ` David Gibson
@ 2015-12-17  4:38     ` Aravinda Prasad
  2016-10-13  2:02       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Aravinda Prasad @ 2015-12-17  4:38 UTC (permalink / raw)
  To: David Gibson; +Cc: benh, aik, agraf, qemu-devel, paulus, qemu-ppc, sam.bobroff



On Thursday 17 December 2015 09:32 AM, David Gibson wrote:
> On Wed, Dec 16, 2015 at 11:38:47AM +0530, Aravinda Prasad wrote:
>> Introduce a new KVM capability to control how KVM
>> behaves on machine check exception.
>>
>> Without this capability, KVM redirects machine check
>> exceptions to guest's 0x200 vector if the address in
>> error belongs to guest. With this capability KVM
>> causes a guest exit with NMI exit reason.
>>
>> This is required to avoid problem if a new kernel/KVM
>> is used with an old QEMU. As old QEMU might not
>> understand the new NMI exit type and treat it as a
>> fatal error, even though the guest could have actually
>> handled the error if the exception was delivered to
>> guest's 0x200 interrupt vector.
>>
>> PS: KVM_CAP_PPC_FWNMI is set to 121 as 119 and 120 are
>> used by KVM_CAP_GUEST_DEBUG_HW_BPS and KVM_CAP_GUEST_DEBUG_HW_WPS
>> in KVM code, but still not reflected in QEMU code.
> 
> The commit message seems to be written as if this were the kernel
> patch adding the capability there, rather than the qemu patch using
> it.
> 

I will reword it.

Regards,
Aravinda

>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr_rtas.c       |   17 +++++++++++++++++
>>  linux-headers/linux/kvm.h |    1 +
>>  target-ppc/kvm.c          |    2 ++
>>  3 files changed, 20 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 17c4672..53319da 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -38,6 +38,8 @@
>>  #include <libfdt.h>
>>  #include "hw/ppc/spapr_drc.h"
>>  
>> +extern int cap_fwnmi;
>> +
>>  /* #define DEBUG_SPAPR */
>>  
>>  #ifdef DEBUG_SPAPR
>> @@ -603,9 +605,24 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>                                    target_ulong args,
>>                                    uint32_t nret, target_ulong rets)
>>  {
>> +    int ret;
>> +    CPUState *cs = CPU(cpu);
>> +
>> +    if (!cap_fwnmi) {
>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>> +        return;
>> +    }
>> +
>>      spapr->mc_in_progress = false;
>>      qemu_cond_init(&spapr->mc_delivery_cond);
>>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>> +
>> +    ret = kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
>> +    if (ret < 0) {
>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> +        return;
>> +    }
>> +
>>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>  }
>>  
>> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
>> index 683f713..2db1fba 100644
>> --- a/linux-headers/linux/kvm.h
>> +++ b/linux-headers/linux/kvm.h
>> @@ -819,6 +819,7 @@ struct kvm_ppc_smmu_info {
>>  #define KVM_CAP_DISABLE_QUIRKS 116
>>  #define KVM_CAP_X86_SMM 117
>>  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
>> +#define KVM_CAP_PPC_FWNMI 121
>>  
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>  
>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> index 2bbb46d..5339c04 100644
>> --- a/target-ppc/kvm.c
>> +++ b/target-ppc/kvm.c
>> @@ -74,6 +74,7 @@ static int cap_ppc_watchdog;
>>  static int cap_papr;
>>  static int cap_htab_fd;
>>  static int cap_fixup_hcalls;
>> +int cap_fwnmi;
>>  
>>  static uint32_t debug_inst_opcode;
>>  
>> @@ -116,6 +117,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>       * only activated after this by kvmppc_set_papr() */
>>      cap_htab_fd = kvm_check_extension(s, KVM_CAP_PPC_HTAB_FD);
>>      cap_fixup_hcalls = kvm_check_extension(s, KVM_CAP_PPC_FIXUP_HCALL);
>> +    cap_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>>  
>>      if (!cap_interrupt_level) {
>>          fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
>>
> 

-- 
Regards,
Aravinda

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 2/4] spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
  2015-12-17  3:51   ` David Gibson
@ 2015-12-17  4:45     ` Aravinda Prasad
  0 siblings, 0 replies; 14+ messages in thread
From: Aravinda Prasad @ 2015-12-17  4:45 UTC (permalink / raw)
  To: David Gibson; +Cc: benh, qemu-devel, qemu-ppc, paulus, sam.bobroff



On Thursday 17 December 2015 09:21 AM, David Gibson wrote:
> On Wed, Dec 16, 2015 at 11:38:22AM +0530, Aravinda Prasad wrote:
>> This patch adds support in QEMU to handle "ibm,nmi-register"
>> and "ibm,nmi-interlock" RTAS calls.
>>
>> The machine check notification address is saved when the
>> OS issues "ibm,nmi-register" RTAS call.
>>
>> This patch also handles the case when multiple processors
>> experience machine check at or about the same time by
>> handling "ibm,nmi-interlock" call. In such cases, as per
>> PAPR, subsequent processors serialize waiting for the first
>> processor to issue the "ibm,nmi-interlock" call. The second
>> processor waits till the first processor, which also
>> received a machine check error, is done reading the error
>> log. The first processor issues "ibm,nmi-interlock" call
>> when the error log is consumed. This patch implements the
>> releasing part of the error-log while subsequent patch
>> (which builds error log) handles the locking part.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr_rtas.c    |   36 ++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |   10 +++++++++-
>>  2 files changed, 45 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 9869bc9..17c4672 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -597,6 +597,38 @@ out:
>>      rtas_st(rets, 0, rc);
>>  }
>>  
>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>> +                                  sPAPRMachineState *spapr,
>> +                                  uint32_t token, uint32_t nargs,
>> +                                  target_ulong args,
>> +                                  uint32_t nret, target_ulong rets)
>> +{
>> +    spapr->mc_in_progress = false;
>> +    qemu_cond_init(&spapr->mc_delivery_cond);
> 
> I think initializing mc_in_progress and the condition variable should
> go in the overall initialization (in fact during system reset) instead
> of here.  Things using these shouldn't be invoked until after the
> nmi_register, but it's safer to have the qemu internal variables
> initialized beforehand.

sure

> 
>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
> 
> It also looks like you need to add code to clear
> guest_machine_check_addr if the system is reset

will add it

> 
>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>> +                                   sPAPRMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
>> +    if (!spapr->guest_machine_check_addr) {
>> +        /* NMI register not called */
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +    } else {
>> +        /*
>> +         * VCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>> +         * hence unset mc_in_progress.
>> +         */
>> +        spapr->mc_in_progress = false;
>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>> +        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +    }
>> +}
>> +
>>  static struct rtas_call {
>>      const char *name;
>>      spapr_rtas_fn fn;
>> @@ -747,6 +779,10 @@ static void core_rtas_register_types(void)
>>                          rtas_get_sensor_state);
>>      spapr_rtas_register(RTAS_IBM_CONFIGURE_CONNECTOR, "ibm,configure-connector",
>>                          rtas_ibm_configure_connector);
>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER, "ibm,nmi-register",
>> +                        rtas_ibm_nmi_register);
>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>> +                        rtas_ibm_nmi_interlock);
>>  }
>>  
>>  type_init(core_rtas_register_types)
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index b5cadd7..de84a4e 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -74,6 +74,12 @@ struct sPAPRMachineState {
>>      /* RTAS state */
>>      QTAILQ_HEAD(, sPAPRConfigureConnectorState) ccs_list;
>>  
>> +    /* State related to "ibm,nmi-register" and "ibm,nmi-interlock" calls */
>> +    target_ulong guest_machine_check_addr;
>> +    bool mc_in_progress;
>> +    int mc_cpu;
> 
> mc_cpu doesn't appear to be used.

mc_cpu is used in patch 3.

> 
> I think guest_machine_check_addr and mc_in_progress will also need to
> be added to migration state.

sure. I have not looked into migration state before, but I think I need
to check if the KVM on target after migration has the KVM_CAP_PPC_FWNMI
capability.

Regards,
Aravinda

> 
>> +    QemuCond mc_delivery_cond;
>> +
>>      /*< public >*/
>>      char *kvm_type;
>>  };
>> @@ -458,8 +464,10 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
>>  #define RTAS_IBM_SET_SLOT_RESET                 (RTAS_TOKEN_BASE + 0x23)
>>  #define RTAS_IBM_CONFIGURE_PE                   (RTAS_TOKEN_BASE + 0x24)
>>  #define RTAS_IBM_SLOT_ERROR_DETAIL              (RTAS_TOKEN_BASE + 0x25)
>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x26)
>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x27)
>>  
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x26)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x28)
>>  
>>  /* RTAS ibm,get-system-parameter token values */
>>  #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>>
> 

-- 
Regards,
Aravinda

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 3/4] target-ppc: Handle NMI guest exit
  2015-12-17  4:00   ` David Gibson
@ 2015-12-17  5:01     ` Aravinda Prasad
  0 siblings, 0 replies; 14+ messages in thread
From: Aravinda Prasad @ 2015-12-17  5:01 UTC (permalink / raw)
  To: David Gibson; +Cc: benh, qemu-devel, qemu-ppc, paulus, sam.bobroff



On Thursday 17 December 2015 09:30 AM, David Gibson wrote:
> On Wed, Dec 16, 2015 at 11:38:37AM +0530, Aravinda Prasad wrote:
>> Memory error such as bit flips that cannot be corrected
>> by hardware are passed on to the kernel for handling.
>> If the memory address in error belongs to guest then
>> guest kernel is responsible for taking suitable action.
>> Patch [1] enhances KVM to exit guest with exit reason
>> set to KVM_EXIT_NMI in such cases.
>>
>> This patch handles KVM_EXIT_NMI exit. If the guest OS
>> has registered the machine check handling routine by
>> calling "ibm,nmi-register", then the handler builds
>> the error log and invokes the registered handler else
>> invokes the handler at 0x200.
>>
>> [1] http://marc.info/?l=kvm&m=145024538523497
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  cpus.c                   |    5 +++
>>  hw/ppc/spapr.c           |    6 +++
>>  include/qemu/main-loop.h |    8 ++++
>>  target-ppc/kvm.c         |   86 ++++++++++++++++++++++++++++++++++++++++++++++
>>  target-ppc/kvm_ppc.h     |   81 +++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 186 insertions(+)
>>
>> diff --git a/cpus.c b/cpus.c
>> index dddd056..7b7dd0f 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1154,6 +1154,11 @@ void qemu_mutex_unlock_iothread(void)
>>      qemu_mutex_unlock(&qemu_global_mutex);
>>  }
>>  
>> +void qemu_cond_wait_iothread(QemuCond *cond)
>> +{
>> +    qemu_cond_wait(cond, &qemu_global_mutex);
>> +}
>> +
> 
> Even though it's trivial, this generic change should probably go in a
> separate patch from the papr specific pieces.
> 
> Speaking of which, I think it's sufficiently trivial you could just
> inline it in the header.

I had it in a separate patch before. I changed my mind later as it is
trivial.

I will include it in a separate patch.

> 
>>  static int all_vcpus_paused(void)
>>  {
>>      CPUState *cpu;
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 05926a3..501dd70 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1556,6 +1556,12 @@ static void ppc_spapr_init(MachineState *machine)
>>          exit(1);
>>      }
>>      spapr->rtas_size = get_image_size(filename);
>> +
>> +    assert(spapr->rtas_size < RTAS_ERRLOG_OFFSET);
>> +
>> +    /* Resize blob to accommodate error log. */
>> +    spapr->rtas_size = RTAS_ERRLOG_OFFSET + sizeof(struct RtasMCELog);
>> +
>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>      if (load_image_size(filename, spapr->rtas_blob, spapr->rtas_size) < 0) {
>>          error_report("Could not load LPAR rtas '%s'", filename);
>> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
>> index 9976909..c4d4446 100644
>> --- a/include/qemu/main-loop.h
>> +++ b/include/qemu/main-loop.h
>> @@ -263,6 +263,14 @@ void qemu_mutex_lock_iothread(void);
>>   */
>>  void qemu_mutex_unlock_iothread(void);
>>  
>> +/**
>> + * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
>> + *
>> + * This function atomically releases the main loop mutex and causes
>> + * the calling thread to block on the condition.
>> + */
>> +void qemu_cond_wait_iothread(QemuCond *cond);
>> +
>>  /* internal interfaces */
>>  
>>  void qemu_fd_register(int fd);
>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> index 110436d..2bbb46d 100644
>> --- a/target-ppc/kvm.c
>> +++ b/target-ppc/kvm.c
>> @@ -1665,6 +1665,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>          ret = 0;
>>          break;
>>  
>> +    case KVM_EXIT_NMI:
>> +        DPRINTF("handle NMI exception\n");
>> +        ret = kvm_handle_nmi(cpu);
>> +        break;
>> +
>>      default:
>>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>>          ret = -1;
>> @@ -2484,3 +2489,84 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>  {
>>      return data & 0xffff;
>>  }
>> +
>> +int kvm_handle_nmi(PowerPCCPU *cpu)
>> +{
>> +    struct RtasMCELog mc_log;
>> +    CPUPPCState *env = &cpu->env;
> 
> You go from cpu to env here..
> 
>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> +    target_ulong msr = 0;
>> +
>> +    cpu_synchronize_state(CPU(ppc_env_get_cpu(env)));
> 
> Then back again awkwardly, although you still have the cpu variable.

ah.. I overlooked it.

> 
>> +
>> +    /*
>> +     * Properly set bits in MSR before we invoke the handler.
>> +     * SRR0/1, DAR and DSISR are properly set by KVM
>> +     */
>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>> +        msr |= (1ULL << MSR_LE);
>> +    }
>> +
>> +    if (env->msr && (1ULL << MSR_SF)) {
>> +        msr |= (1ULL << MSR_SF);
>> +    }
>> +
>> +    msr |= (1ULL << MSR_ME);
> 
> Based on earlier discussions, sounds like assert(msr & (1ULL <<
> MSR_ME)) would actually be correct here.

Based on
http://lists.nongnu.org/archive/html/qemu-ppc/2015-11/msg00306.html, I
always set MSR_ME and don't assert if not set. Or am I missing anything
here?

Regards,
Aravinda

> 
>> +    env->msr = msr;
>> +
>> +    if (!spapr->guest_machine_check_addr) {
>> +        /*
>> +         * If OS has not registered with "ibm,nmi-register"
>> +         * jump to 0x200
>> +         */
>> +        env->nip = 0x200;
>> +        return 0;
>> +    }
>> +
>> +    while (spapr->mc_in_progress) {
>> +        /*
>> +         * Check whether the same CPU got machine check error
>> +         * while still handling the mc error (i.e., before
>> +         * that CPU called "ibm,nmi-interlock"
>> +         */
>> +        if (spapr->mc_cpu == cpu->cpu_dt_id) {
>> +            qemu_system_guest_panicked();
>> +        }
>> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
>> +    }
>> +    spapr->mc_in_progress = true;
>> +    spapr->mc_cpu = cpu->cpu_dt_id;
>> +
>> +    /* Set error log fields */
>> +    mc_log.r3 = env->gpr[3];
>> +    mc_log.err_log.byte0 = 0;
>> +    mc_log.err_log.byte1 =
>> +        (RTAS_SEVERITY_ERROR_SYNC << RTAS_ELOG_SEVERITY_SHIFT);
>> +    mc_log.err_log.byte1 |=
>> +        (RTAS_DISP_NOT_RECOVERED << RTAS_ELOG_DISPOSITION_SHIFT);
>> +    mc_log.err_log.byte2 =
>> +        (RTAS_INITIATOR_MEMORY << RTAS_ELOG_INITIATOR_SHIFT);
>> +    mc_log.err_log.byte2 |= RTAS_TARGET_MEMORY;
>> +
>> +    if (env->spr[SPR_DSISR] & P7_DSISR_MC_UE) {
>> +        mc_log.err_log.byte3 = RTAS_TYPE_ECC_UNCORR;
>> +    } else {
>> +        mc_log.err_log.byte3 = 0;
>> +    }
>> +
>> +    /* Handle all Host/Guest LE/BE combinations */
>> +    if (env->msr & (1ULL << MSR_LE)) {
>> +        mc_log.r3 = cpu_to_le64(mc_log.r3);
>> +    } else {
>> +        mc_log.r3 = cpu_to_be64(mc_log.r3);
>> +    }
>> +
>> +    cpu_physical_memory_write(spapr->rtas_addr + RTAS_ERRLOG_OFFSET,
>> +                              &mc_log, sizeof(mc_log));
>> +
>> +    env->nip = spapr->guest_machine_check_addr;
>> +    env->gpr[3] = spapr->rtas_addr + RTAS_ERRLOG_OFFSET;
>> +
>> +    return 0;
>> +}
>> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
>> index 5c1d334..ea3345b 100644
>> --- a/target-ppc/kvm_ppc.h
>> +++ b/target-ppc/kvm_ppc.h
>> @@ -53,6 +53,87 @@ void kvmppc_hash64_free_pteg(uint64_t token);
>>  void kvmppc_hash64_write_pte(CPUPPCState *env, target_ulong pte_index,
>>                               target_ulong pte0, target_ulong pte1);
>>  bool kvmppc_has_cap_fixup_hcalls(void);
>> +int kvm_handle_nmi(PowerPCCPU *cpu);
>> +
>> +/* Offset from rtas-base where error log is placed */
>> +#define RTAS_ERRLOG_OFFSET       0x200
>> +
>> +#define RTAS_ELOG_SEVERITY_SHIFT         0x5
>> +#define RTAS_ELOG_DISPOSITION_SHIFT      0x3
>> +#define RTAS_ELOG_INITIATOR_SHIFT        0x4
>> +
>> +/*
>> + * Only required RTAS event severity, disposition, initiator
>> + * target and type are copied from arch/powerpc/include/asm/rtas.h
>> + */
>> +
>> +/* RTAS event severity */
>> +#define RTAS_SEVERITY_ERROR_SYNC    0x3
>> +
>> +/* RTAS event disposition */
>> +#define RTAS_DISP_NOT_RECOVERED     0x2
>> +
>> +/* RTAS event initiator */
>> +#define RTAS_INITIATOR_MEMORY       0x4
>> +
>> +/* RTAS event target */
>> +#define RTAS_TARGET_MEMORY          0x4
>> +
>> +/* RTAS event type */
>> +#define RTAS_TYPE_ECC_UNCORR        0x09
>> +
>> +/*
>> + * Currently KVM only passes on the uncorrected machine
>> + * check memory error to guest. Other machine check errors
>> + * such as SLB multi-hit and TLB multi-hit are recovered
>> + * in KVM and are not passed on to guest.
>> + *
>> + * DSISR Bit for uncorrected machine check error. Based
>> + * on arch/powerpc/include/asm/mce.h
>> + */
>> +#define PPC_BIT(bit)                (0x8000000000000000ULL >> bit)
>> +#define P7_DSISR_MC_UE              (PPC_BIT(48))  /* P8 too */
>> +
>> +/* Adopted from kernel source arch/powerpc/include/asm/rtas.h */
>> +struct rtas_error_log {
>> +    /* Byte 0 */
>> +    uint8_t     byte0;          /* Architectural version */
>> +
>> +    /* Byte 1 */
>> +    uint8_t     byte1;
>> +    /* XXXXXXXX
>> +     * XXX      3: Severity level of error
>> +     *    XX    2: Degree of recovery
>> +     *      X   1: Extended log present?
>> +     *       XX 2: Reserved
>> +     */
>> +
>> +    /* Byte 2 */
>> +    uint8_t     byte2;
>> +    /* XXXXXXXX
>> +     * XXXX     4: Initiator of event
>> +     *     XXXX 4: Target of failed operation
>> +     */
>> +    uint8_t     byte3;          /* General event or error*/
>> +    __be32      extended_log_length;    /* length in bytes */
>> +    unsigned char   buffer[1];      /* Start of extended log */
>> +                                /* Variable length.      */
>> +};
>> +
>> +/*
>> + * Data format in RTAS-Blob
>> + *
>> + * This structure contains error information related to Machine
>> + * Check exception. This is filled up and copied to rtas-blob
>> + * upon machine check exception. The address of rtas-blob is
>> + * passed on to OS registered machine check notification
>> + * routines upon machine check exception
>> + */
>> +struct RtasMCELog {
>> +    target_ulong r3;
>> +    struct rtas_error_log err_log;
>> +};
>> +
>>  
>>  #else
>>  
>>
> 

-- 
Regards,
Aravinda

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability
  2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability Aravinda Prasad
  2015-12-17  4:02   ` David Gibson
@ 2015-12-17  7:03   ` Thomas Huth
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Huth @ 2015-12-17  7:03 UTC (permalink / raw)
  To: Aravinda Prasad, qemu-ppc, agraf, qemu-devel
  Cc: aik, benh, paulus, sam.bobroff, david

On 16/12/15 07:08, Aravinda Prasad wrote:
> Introduce a new KVM capability to control how KVM
> behaves on machine check exception.
> 
> Without this capability, KVM redirects machine check
> exceptions to guest's 0x200 vector if the address in
> error belongs to guest. With this capability KVM
> causes a guest exit with NMI exit reason.
> 
> This is required to avoid problem if a new kernel/KVM
> is used with an old QEMU. As old QEMU might not
> understand the new NMI exit type and treat it as a
> fatal error, even though the guest could have actually
> handled the error if the exception was delivered to
> guest's 0x200 interrupt vector.
> 
> PS: KVM_CAP_PPC_FWNMI is set to 121 as 119 and 120 are
> used by KVM_CAP_GUEST_DEBUG_HW_BPS and KVM_CAP_GUEST_DEBUG_HW_WPS
> in KVM code, but still not reflected in QEMU code.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c       |   17 +++++++++++++++++
>  linux-headers/linux/kvm.h |    1 +
>  target-ppc/kvm.c          |    2 ++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 17c4672..53319da 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -38,6 +38,8 @@
>  #include <libfdt.h>
>  #include "hw/ppc/spapr_drc.h"
>  
> +extern int cap_fwnmi;
> +
>  /* #define DEBUG_SPAPR */
>  
>  #ifdef DEBUG_SPAPR
> @@ -603,9 +605,24 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>                                    target_ulong args,
>                                    uint32_t nret, target_ulong rets)
>  {
> +    int ret;
> +    CPUState *cs = CPU(cpu);
> +
> +    if (!cap_fwnmi) {
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> +        return;
> +    }
> +
>      spapr->mc_in_progress = false;
>      qemu_cond_init(&spapr->mc_delivery_cond);
>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
> +
> +    ret = kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
> +    if (ret < 0) {
> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> +        return;
> +    }
> +
>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
>  
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index 683f713..2db1fba 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -819,6 +819,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_DISABLE_QUIRKS 116
>  #define KVM_CAP_X86_SMM 117
>  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
> +#define KVM_CAP_PPC_FWNMI 121
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  

As far as I know, the proper way to update the files in the
linux-headers directory, is to use the scripts/update-linux-headers.sh
script to refresh them. So you also have to make sure that the line with
KVM_CAP_PPC_FWNMI has been accepted first by the upstream kernel folks,
i.e. it should be first in Paolo's KVM tree for example before you can
update the QEMU sources.

Please also do that update in a separate patch (since it can get bigger
sometimes; and that way it can also be ACKed separately from people who
are not directly involved in the ppc stuff).

 Thomas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability
  2015-12-17  4:38     ` Aravinda Prasad
@ 2016-10-13  2:02       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 14+ messages in thread
From: Alexey Kardashevskiy @ 2016-10-13  2:02 UTC (permalink / raw)
  To: Aravinda Prasad, David Gibson
  Cc: benh, agraf, qemu-devel, qemu-ppc, paulus, sam.bobroff

On 17/12/15 15:38, Aravinda Prasad wrote:
> 
> 
> On Thursday 17 December 2015 09:32 AM, David Gibson wrote:
>> On Wed, Dec 16, 2015 at 11:38:47AM +0530, Aravinda Prasad wrote:
>>> Introduce a new KVM capability to control how KVM
>>> behaves on machine check exception.
>>>
>>> Without this capability, KVM redirects machine check
>>> exceptions to guest's 0x200 vector if the address in
>>> error belongs to guest. With this capability KVM
>>> causes a guest exit with NMI exit reason.
>>>
>>> This is required to avoid problem if a new kernel/KVM
>>> is used with an old QEMU. As old QEMU might not
>>> understand the new NMI exit type and treat it as a
>>> fatal error, even though the guest could have actually
>>> handled the error if the exception was delivered to
>>> guest's 0x200 interrupt vector.
>>>
>>> PS: KVM_CAP_PPC_FWNMI is set to 121 as 119 and 120 are
>>> used by KVM_CAP_GUEST_DEBUG_HW_BPS and KVM_CAP_GUEST_DEBUG_HW_WPS
>>> in KVM code, but still not reflected in QEMU code.
>>
>> The commit message seems to be written as if this were the kernel
>> patch adding the capability there, rather than the qemu patch using
>> it.
>>
> 
> I will reword it.


Was there any progress in regard of FWNMI?



> 
> Regards,
> Aravinda
> 
>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>> ---
>>>  hw/ppc/spapr_rtas.c       |   17 +++++++++++++++++
>>>  linux-headers/linux/kvm.h |    1 +
>>>  target-ppc/kvm.c          |    2 ++
>>>  3 files changed, 20 insertions(+)
>>>
>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>> index 17c4672..53319da 100644
>>> --- a/hw/ppc/spapr_rtas.c
>>> +++ b/hw/ppc/spapr_rtas.c
>>> @@ -38,6 +38,8 @@
>>>  #include <libfdt.h>
>>>  #include "hw/ppc/spapr_drc.h"
>>>  
>>> +extern int cap_fwnmi;
>>> +
>>>  /* #define DEBUG_SPAPR */
>>>  
>>>  #ifdef DEBUG_SPAPR
>>> @@ -603,9 +605,24 @@ static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>                                    target_ulong args,
>>>                                    uint32_t nret, target_ulong rets)
>>>  {
>>> +    int ret;
>>> +    CPUState *cs = CPU(cpu);
>>> +
>>> +    if (!cap_fwnmi) {
>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>> +        return;
>>> +    }
>>> +
>>>      spapr->mc_in_progress = false;
>>>      qemu_cond_init(&spapr->mc_delivery_cond);
>>>      spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>> +
>>> +    ret = kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);
>>> +    if (ret < 0) {
>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>> +        return;
>>> +    }
>>> +
>>>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>  }
>>>  
>>> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
>>> index 683f713..2db1fba 100644
>>> --- a/linux-headers/linux/kvm.h
>>> +++ b/linux-headers/linux/kvm.h
>>> @@ -819,6 +819,7 @@ struct kvm_ppc_smmu_info {
>>>  #define KVM_CAP_DISABLE_QUIRKS 116
>>>  #define KVM_CAP_X86_SMM 117
>>>  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
>>> +#define KVM_CAP_PPC_FWNMI 121
>>>  
>>>  #ifdef KVM_CAP_IRQ_ROUTING
>>>  
>>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>>> index 2bbb46d..5339c04 100644
>>> --- a/target-ppc/kvm.c
>>> +++ b/target-ppc/kvm.c
>>> @@ -74,6 +74,7 @@ static int cap_ppc_watchdog;
>>>  static int cap_papr;
>>>  static int cap_htab_fd;
>>>  static int cap_fixup_hcalls;
>>> +int cap_fwnmi;
>>>  
>>>  static uint32_t debug_inst_opcode;
>>>  
>>> @@ -116,6 +117,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>       * only activated after this by kvmppc_set_papr() */
>>>      cap_htab_fd = kvm_check_extension(s, KVM_CAP_PPC_HTAB_FD);
>>>      cap_fixup_hcalls = kvm_check_extension(s, KVM_CAP_PPC_FIXUP_HCALL);
>>> +    cap_fwnmi = kvm_check_extension(s, KVM_CAP_PPC_FWNMI);
>>>  
>>>      if (!cap_interrupt_level) {
>>>          fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
>>>
>>
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-10-13  2:02 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-16  6:08 [Qemu-devel] [PATCH v2 0/4] target-ppc/spapr: Add FWNMI support in QEMU for PowerKVM guests Aravinda Prasad
2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 1/4] spapr: Register and handle HCALL to receive updated RTAS region Aravinda Prasad
2015-12-17  3:25   ` David Gibson
2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 2/4] spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls Aravinda Prasad
2015-12-17  3:51   ` David Gibson
2015-12-17  4:45     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 3/4] target-ppc: Handle NMI guest exit Aravinda Prasad
2015-12-17  4:00   ` David Gibson
2015-12-17  5:01     ` [Qemu-devel] [Qemu-ppc] " Aravinda Prasad
2015-12-16  6:08 ` [Qemu-devel] [PATCH v2 4/4] spapr: Introduce FWNMI KVM capability Aravinda Prasad
2015-12-17  4:02   ` David Gibson
2015-12-17  4:38     ` Aravinda Prasad
2016-10-13  2:02       ` Alexey Kardashevskiy
2015-12-17  7:03   ` Thomas Huth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.