All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
@ 2017-07-05 17:13 Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 01/26] spapr: introduce the XIVE_EXPLOIT option in CAS Cédric Le Goater
                   ` (27 more replies)
  0 siblings, 28 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with an
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode, the newer POWER9 interrupt model. This
patchset is a first proposal to add XIVE support in the sPAPR machine.

The first patches introduce the XIVE exploitation mode in CAS.

Follow models for the XIVE interrupt controller, source and presenter.
We try to reuse the ICS and ICP models of XICS because the sPAPR
machine is tied to the XICSFabric interface and should be using a
common framework to be able to switch from one controller model to
another. To be discussed of course.

Then comes support for the Hypervisor's call which are used to
configure the interrupt sources and the event/notification queues of
the guest.

Finally, the last patches try to integrate the XIVE interrupt model in
the sPAPR machine and this not without a couple of serious hacks to
have something to test. See 'Caveats' below for more details.

This is a first draft and I expect a lot of rewrite before it reaches
mainline QEMU. Nevertheless, it compiles, boots and can be used for
some testing.

Code is here:

  https://github.com/legoater/qemu/commits/xive
  https://github.com/legoater/linux/commits/xive

Pre-compiled kernel (4.12) and initrd images can be found :

  http://kaod.org/qemu/ppc-xive/
       
Caveats :

 - Unnecessary complexity 

   I started working on XIVE looking at OPAL because I had the
   ambition to provide a common framework for the PowerNV and sPAPR
   machines. This is still the goal but the XIVE support for the
   PowerNV machine will be *much *more complex and we could use
   something simpler for sPAPR probably. This is why there are some
   clumsiness with the IRQ allocator and at the end of the patchset
   with the IPI interrupt source.

 - Switching interrupt model after CAS. 

   We now need a way to configure the guest with the interrupt model
   negotiated in CAS.

   But, currently, the sPAPR machine make uses of the controller very
   early in the initialization sequence. The interrupt source is used
   to allocate IRQ numbers and populate the device tree and the
   interrupt presenter objects are created along with the CPU.

   One approach would be to support the reset of the ICP and the ICS
   objects of the guest. We could be use a bitmap to allocate the IRQ
   numbers needed to populate the device tree and then instantiate the
   correct ICS with the bitmap as a parameter. The ICPs could be
   allocated later in the boot process. May be on demand, when a CPU
   is first notified.

 - Migration not addressed

 - Hotplug not addressed

 - KVM support

   The guest needs to be run with kernel_irqchip=off on a POWER9
   system.

 - LSI

   lightly tested.
   
Thanks,

C. 

Cédric Le Goater (26):
  spapr: introduce the XIVE_EXPLOIT option in CAS
  spapr: populate device tree depending on XIVE_EXPLOIT option
  target/ppc/POWER9: add POWERPC_EXCP_POWER9
  ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  ppc/xive: define XIVE internal tables
  ppc/xive: introduce a XIVE interrupt source model
  ppc/xive: add MMIO handlers to the XIVE interrupt source
  ppc/xive: add flags to the XIVE interrupt source
  ppc/xive: add an overall memory region for the ESBs
  ppc/xive: record interrupt source MMIO address for hcalls
  ppc/xics: introduce a print_info() handler to the ICS and ICP objects
  ppc/xive: add a print_info() handler for the interrupt source
  ppc/xive: introduce a XIVE interrupt presenter model
  ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  ppc/xive: push EQ data in OS event queues
  ppc/xive: notify CPU when interrupt priority is more privileged
  ppc/xive: add hcalls support
  ppc/xive: add device tree support
  ppc/xive: introduce a helper to map the XIVE memory regions
  ppc/xive: introduce a helper to create XIVE interrupt source objects
  ppc/xive: introduce routines to allocate IRQ numbers
  ppc/xive: create an XIVE interrupt source to handle IPIs
  spapr: add a XIVE object to the sPAPR machine
  spapr: include the XIVE interrupt source for IPIs
  spapr: print the XIVE interrupt source for IPIs in the monitor
  spapr: force XIVE exploitation mode for POWER9 (HACK)

 default-configs/ppc64-softmmu.mak |    2 +
 hw/intc/Makefile.objs             |    2 +
 hw/intc/xics.c                    |   36 +-
 hw/intc/xive-internal.h           |  218 ++++++++
 hw/intc/xive.c                    | 1024 +++++++++++++++++++++++++++++++++++++
 hw/intc/xive_spapr.c              |  796 ++++++++++++++++++++++++++++
 hw/ppc/spapr.c                    |  141 ++++-
 include/hw/ppc/spapr.h            |   17 +-
 include/hw/ppc/spapr_ovec.h       |    1 +
 include/hw/ppc/xics.h             |    2 +
 include/hw/ppc/xive.h             |   80 +++
 target/ppc/cpu-qom.h              |    2 +
 target/ppc/excp_helper.c          |    9 +-
 target/ppc/translate.c            |    3 +-
 target/ppc/translate_init.c       |    2 +-
 15 files changed, 2306 insertions(+), 29 deletions(-)
 create mode 100644 hw/intc/xive-internal.h
 create mode 100644 hw/intc/xive.c
 create mode 100644 hw/intc/xive_spapr.c
 create mode 100644 include/hw/ppc/xive.h

-- 
2.7.5

^ permalink raw reply	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 01/26] spapr: introduce the XIVE_EXPLOIT option in CAS
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 02/26] spapr: populate device tree depending on XIVE_EXPLOIT option Cédric Le Goater
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility
(the former POWER8 interrupt model) or in XIVE exploitation mode (the
newer POWER9 interrupt model).

Bit 7 of Byte 23 of vector 5 is used for this purpose.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c              | 13 +++++++------
 include/hw/ppc/spapr_ovec.h |  1 +
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d4d781876b27..27b12adc3582 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -910,7 +910,8 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
 {
     PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
 
-    char val[2 * 3] = {
+    char val[2 * 4] = {
+        23, 0x00, /* Xive mode: 0 = legacy (as in ISA 2.7), 1 = Exploitation */
         24, 0x00, /* Hash/Radix, filled in below. */
         25, 0x00, /* Hash options: Segment Tables == no, GTSE == no. */
         26, 0x40, /* Radix options: GTSE == yes. */
@@ -918,19 +919,19 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
 
     if (kvm_enabled()) {
         if (kvmppc_has_cap_mmu_radix() && kvmppc_has_cap_mmu_hash_v3()) {
-            val[1] = 0x80; /* OV5_MMU_BOTH */
+            val[3] = 0x80; /* OV5_MMU_BOTH */
         } else if (kvmppc_has_cap_mmu_radix()) {
-            val[1] = 0x40; /* OV5_MMU_RADIX_300 */
+            val[3] = 0x40; /* OV5_MMU_RADIX_300 */
         } else {
-            val[1] = 0x00; /* Hash */
+            val[3] = 0x00; /* Hash */
         }
     } else {
         if (first_ppc_cpu->env.mmu_model & POWERPC_MMU_V3) {
             /* V3 MMU supports both hash and radix (with dynamic switching) */
-            val[1] = 0xC0;
+            val[3] = 0xC0;
         } else {
             /* Otherwise we can only do hash */
-            val[1] = 0x00;
+            val[3] = 0x00;
         }
     }
     _FDT(fdt_setprop(fdt, chosen, "ibm,arch-vec-5-platform-support",
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
index f088833204de..0b464e22e75d 100644
--- a/include/hw/ppc/spapr_ovec.h
+++ b/include/hw/ppc/spapr_ovec.h
@@ -50,6 +50,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
 #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
 #define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
 #define OV5_HP_EVT              OV_BIT(6, 5)
+#define OV5_XIVE_EXPLOIT        OV_BIT(23, 7)
 
 /* ISA 3.00 MMU features: */
 #define OV5_MMU_BOTH            OV_BIT(24, 0) /* Radix and hash */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 02/26] spapr: populate device tree depending on XIVE_EXPLOIT option
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 01/26] spapr: introduce the XIVE_EXPLOIT option in CAS Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9 Cédric Le Goater
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

When XIVE is supported, the device tree should be populated
accordingly and the XIVE memory regions mapped to activate MMIOs.

Depending on the design we choose, we could also allocate different
ICS and ICP objects, or switch between objects. This needs to be
discussed.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 27b12adc3582..0256e7a537bf 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -778,6 +778,11 @@ static int spapr_dt_cas_updates(sPAPRMachineState *spapr, void *fdt,
         }
     }
 
+    /* /interrupt controller */
+    if (!spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)) {
+        spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
+    }
+
     offset = fdt_path_offset(fdt, "/chosen");
     if (offset < 0) {
         offset = fdt_add_subnode(fdt, 0, "chosen");
@@ -801,7 +806,7 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
 
     size -= sizeof(hdr);
 
-    /* Create sceleton */
+    /* Create skeleton */
     fdt_skel = g_malloc0(size);
     _FDT((fdt_create(fdt_skel, size)));
     _FDT((fdt_begin_node(fdt_skel, "")));
@@ -1069,9 +1074,6 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     _FDT(fdt_setprop_cell(fdt, 0, "#address-cells", 2));
     _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
-    /* /interrupt controller */
-    spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
-
     ret = spapr_populate_memory(spapr, fdt);
     if (ret < 0) {
         error_report("couldn't setup memory nodes in fdt");
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 01/26] spapr: introduce the XIVE_EXPLOIT option in CAS Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 02/26] spapr: populate device tree depending on XIVE_EXPLOIT option Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-10 10:26   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model Cédric Le Goater
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Prepare ground for the new exception model XIVE of POWER9.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 target/ppc/cpu-qom.h        | 2 ++
 target/ppc/excp_helper.c    | 9 ++++++---
 target/ppc/translate.c      | 3 ++-
 target/ppc/translate_init.c | 2 +-
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index d0cf6ca2a971..d7b78cf3f71c 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -132,6 +132,8 @@ enum powerpc_excp_t {
     POWERPC_EXCP_POWER7,
     /* POWER8 exception model           */
     POWERPC_EXCP_POWER8,
+    /* POWER9 exception model           */
+    POWERPC_EXCP_POWER9,
 };
 
 /*****************************************************************************/
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 3a9f0861e773..dc7dff36a580 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -148,9 +148,11 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
      */
 #if defined(TARGET_PPC64)
     if (excp_model == POWERPC_EXCP_POWER7 ||
-        excp_model == POWERPC_EXCP_POWER8) {
+        excp_model == POWERPC_EXCP_POWER8 ||
+        excp_model == POWERPC_EXCP_POWER9) {
         lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
-        if (excp_model == POWERPC_EXCP_POWER8) {
+        if (excp_model == POWERPC_EXCP_POWER8 ||
+            excp_model == POWERPC_EXCP_POWER9) {
             ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
         } else {
             ail = 0;
@@ -651,7 +653,8 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
         if (!(new_msr & MSR_HVB) && (env->spr[SPR_LPCR] & LPCR_ILE)) {
             new_msr |= (target_ulong)1 << MSR_LE;
         }
-    } else if (excp_model == POWERPC_EXCP_POWER8) {
+    } else if (excp_model == POWERPC_EXCP_POWER8 ||
+               excp_model == POWERPC_EXCP_POWER9) {
         if (new_msr & MSR_HVB) {
             if (env->spr[SPR_HID0] & HID0_HILE) {
                 new_msr |= (target_ulong)1 << MSR_LE;
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c0cd64d927c2..2d8c1b9e6836 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7064,7 +7064,8 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
 
 #if defined(TARGET_PPC64)
     if (env->excp_model == POWERPC_EXCP_POWER7 ||
-        env->excp_model == POWERPC_EXCP_POWER8) {
+        env->excp_model == POWERPC_EXCP_POWER8 ||
+        env->excp_model == POWERPC_EXCP_POWER9) {
         cpu_fprintf(f, "HSRR0 " TARGET_FMT_lx " HSRR1 " TARGET_FMT_lx "\n",
                     env->spr[SPR_HSRR0], env->spr[SPR_HSRR1]);
     }
diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 53aff5a7b734..b8c7b8150318 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8962,7 +8962,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
     pcc->sps = &POWER7_POWER8_sps;
     pcc->radix_page_info = &POWER9_radix_page_info;
 #endif
-    pcc->excp_model = POWERPC_EXCP_POWER8;
+    pcc->excp_model = POWERPC_EXCP_POWER9;
     pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
     pcc->bfd_mach = bfd_mach_ppc64;
     pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (2 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9 Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-19  3:08   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 05/26] ppc/xive: define XIVE internal tables Cédric Le Goater
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Let's provide an empty shell for the XIVE controller model with a
couple of attributes for the IRQ number allocator. The latter is
largely inspired by OPAL which allocates IPI IRQ numbers from the
bottom of the IRQ number space and allocates the HW IRQ numbers from
the top.

The number of IPIs is simply deduced from the max number of CPUs the
guest supports and we provision a arbitrary number of HW irqs.

The XIVE object is kept private because it will hold internal tables
which do not need to be exposed to sPAPR.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 default-configs/ppc64-softmmu.mak |  1 +
 hw/intc/Makefile.objs             |  1 +
 hw/intc/xive-internal.h           | 28 ++++++++++++
 hw/intc/xive.c                    | 94 +++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/xive.h             | 27 +++++++++++
 5 files changed, 151 insertions(+)
 create mode 100644 hw/intc/xive-internal.h
 create mode 100644 hw/intc/xive.c
 create mode 100644 include/hw/ppc/xive.h

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 46c95993217d..1179c07e6e9f 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -56,6 +56,7 @@ CONFIG_SM501=y
 CONFIG_XICS=$(CONFIG_PSERIES)
 CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
+CONFIG_XIVE=$(CONFIG_PSERIES)
 # For PReP
 CONFIG_SERIAL_ISA=y
 CONFIG_MC146818RTC=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 78426a7dafcd..28b83456bfcc 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
 obj-$(CONFIG_XICS) += xics.o
 obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
+obj-$(CONFIG_XIVE) += xive.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
new file mode 100644
index 000000000000..155c2dcd6066
--- /dev/null
+++ b/hw/intc/xive-internal.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright 2016,2017 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _INTC_XIVE_INTERNAL_H
+#define _INTC_XIVE_INTERNAL_H
+
+#include <hw/sysbus.h>
+
+struct XIVE {
+    SysBusDevice parent;
+
+    /* Properties */
+    uint32_t     nr_targets;
+
+    /* IRQ number allocator */
+    uint32_t     int_count;     /* Number of interrupts: nr_targets + HW IRQs */
+    uint32_t     int_base;      /* Min index */
+    uint32_t     int_max;       /* Max index */
+    uint32_t     int_hw_bot;    /* Bottom index of HW IRQ allocator */
+    uint32_t     int_ipi_top;   /* Highest IPI index handed out so far + 1 */
+};
+
+#endif /* _INTC_XIVE_INTERNAL_H */
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
new file mode 100644
index 000000000000..5b4ea915d87c
--- /dev/null
+++ b/hw/intc/xive.c
@@ -0,0 +1,94 @@
+/*
+ * QEMU PowerPC XIVE model
+ *
+ * Copyright (c) 2017, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "target/ppc/cpu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/dma.h"
+#include "monitor/monitor.h"
+#include "hw/ppc/xive.h"
+
+#include "xive-internal.h"
+
+/*
+ * Main XIVE object
+ */
+
+/* Let's provision some HW IRQ numbers. We could use a XIVE property
+ * also but it does not seem necessary for the moment.
+ */
+#define MAX_HW_IRQS_ENTRIES (8 * 1024)
+
+static void xive_init(Object *obj)
+{
+    ;
+}
+
+static void xive_realize(DeviceState *dev, Error **errp)
+{
+    XIVE *x = XIVE(dev);
+
+    if (!x->nr_targets) {
+        error_setg(errp, "Number of interrupt targets needs to be greater 0");
+        return;
+    }
+
+    /* Initialize IRQ number allocator. Let's use a base number if we
+     * need to introduce a notion of blocks one day.
+     */
+    x->int_base = 0;
+    x->int_count = x->nr_targets + MAX_HW_IRQS_ENTRIES;
+    x->int_max = x->int_base + x->int_count;
+    x->int_hw_bot = x->int_max;
+    x->int_ipi_top = x->int_base;
+
+    /* Reserve some numbers as OPAL does ? */
+    if (x->int_ipi_top < 0x10) {
+        x->int_ipi_top = 0x10;
+    }
+}
+
+static Property xive_properties[] = {
+    DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xive_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = xive_realize;
+    dc->props = xive_properties;
+    dc->desc = "XIVE";
+}
+
+static const TypeInfo xive_info = {
+    .name = TYPE_XIVE,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_init = xive_init,
+    .instance_size = sizeof(XIVE),
+    .class_init = xive_class_init,
+};
+
+static void xive_register_types(void)
+{
+    type_register_static(&xive_info);
+}
+
+type_init(xive_register_types)
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
new file mode 100644
index 000000000000..863f5a9c6b5f
--- /dev/null
+++ b/include/hw/ppc/xive.h
@@ -0,0 +1,27 @@
+/*
+ * QEMU PowerPC XIVE model
+ *
+ * Copyright (c) 2017, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef PPC_XIVE_H
+#define PPC_XIVE_H
+
+typedef struct XIVE XIVE;
+
+#define TYPE_XIVE "xive"
+#define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
+
+#endif /* PPC_XIVE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 05/26] ppc/xive: define XIVE internal tables
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (3 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-19  3:24   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model Cédric Le Goater
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

The XIVE interrupt controller of the POWER9 uses a set of tables to
redirect exception from event sources to CPU threads. Among which we
choose to model :

 - the State Bit Entries (SBE), also known as Event State Buffer
   (ESB). This is a two bit state machine for each event source which
   is used to trigger events. The bits are named "P" (pending) and "Q"
   (queued) and can be controlled by MMIO.

 - the Interrupt Virtualization Entry (IVE) table, also known as Event
   Assignment Structure (EAS). This table is indexed by the IRQ number
   and is looked up to find the Event Queue associated with a
   triggered event.

 - the Event Queue Descriptor (EQD) table, also known as Event
   Notification Descriptor (END). The EQD contains fields that specify
   the Event Queue on which event data is posted (and later pulled by
   the OS) and also a target (or VPD) to notify.

An additional table was not modeled but we might need to to support
the H_INT_SET_OS_REPORTING_LINE hcall:

 - the Virtual Processor Descriptor (VPD) table, also known as
   Notification Virtual Target (NVT).

The XIVE object is expanded with the tables described above. The size
of each table depends on the number of provisioned IRQ and the maximum
number of CPUs in the system. The indexing is very basic and might
need to be improved for the EQs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive-internal.h | 95 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/intc/xive.c          | 72 +++++++++++++++++++++++++++++++++++++
 2 files changed, 167 insertions(+)

diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
index 155c2dcd6066..8e755aa88a14 100644
--- a/hw/intc/xive-internal.h
+++ b/hw/intc/xive-internal.h
@@ -11,6 +11,89 @@
 
 #include <hw/sysbus.h>
 
+/* Utilities to manipulate these (originaly from OPAL) */
+#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
+#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
+#define SETFIELD(m, v, val)                             \
+        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
+
+#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
+#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
+#define PPC_BIT8(bit)           (0x80UL >> (bit))
+#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
+#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
+                                 PPC_BIT32(bs))
+
+/* IVE/EAS
+ *
+ * One per interrupt source. Targets that interrupt to a given EQ
+ * and provides the corresponding logical interrupt number (EQ data)
+ *
+ * We also map this structure to the escalation descriptor inside
+ * an EQ, though in that case the valid and masked bits are not used.
+ */
+typedef struct XiveIVE {
+        /* Use a single 64-bit definition to make it easier to
+         * perform atomic updates
+         */
+        uint64_t        w;
+#define IVE_VALID       PPC_BIT(0)
+#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
+#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
+#define IVE_MASKED      PPC_BIT(32)              /* Masked */
+#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
+} XiveIVE;
+
+/* EQ */
+typedef struct XiveEQ {
+        uint32_t        w0;
+#define EQ_W0_VALID             PPC_BIT32(0)
+#define EQ_W0_ENQUEUE           PPC_BIT32(1)
+#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
+#define EQ_W0_BACKLOG           PPC_BIT32(3)
+#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
+#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
+#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
+#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
+#define EQ_W0_SW0               PPC_BIT32(16)
+#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
+#define EQ_QSIZE_4K             0
+#define EQ_QSIZE_64K            4
+#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
+        uint32_t        w1;
+#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
+#define EQ_W1_ESn_P             PPC_BIT32(0)
+#define EQ_W1_ESn_Q             PPC_BIT32(1)
+#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
+#define EQ_W1_ESe_P             PPC_BIT32(2)
+#define EQ_W1_ESe_Q             PPC_BIT32(3)
+#define EQ_W1_GENERATION        PPC_BIT32(9)
+#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
+        uint32_t        w2;
+#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
+#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
+        uint32_t        w3;
+#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
+        uint32_t        w4;
+#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
+#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
+        uint32_t        w5;
+#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
+        uint32_t        w6;
+#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
+#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
+#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
+        uint32_t        w7;
+#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
+#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
+#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
+#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
+#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
+} XiveEQ;
+
+#define XIVE_EQ_PRIORITY_COUNT 8
+#define XIVE_PRIORITY_MAX  (XIVE_EQ_PRIORITY_COUNT - 1)
+
 struct XIVE {
     SysBusDevice parent;
 
@@ -23,6 +106,18 @@ struct XIVE {
     uint32_t     int_max;       /* Max index */
     uint32_t     int_hw_bot;    /* Bottom index of HW IRQ allocator */
     uint32_t     int_ipi_top;   /* Highest IPI index handed out so far + 1 */
+
+    /* XIVE internal tables */
+    void         *sbe;
+    XiveIVE      *ivt;
+    XiveEQ       *eqdt;
 };
 
+void xive_reset(void *dev);
+XiveIVE *xive_get_ive(XIVE *x, uint32_t isn);
+XiveEQ *xive_get_eq(XIVE *x, uint32_t idx);
+
+bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t prio,
+                        uint32_t *out_eq_idx);
+
 #endif /* _INTC_XIVE_INTERNAL_H */
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 5b4ea915d87c..5b14d8155317 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -35,6 +35,27 @@
  */
 #define MAX_HW_IRQS_ENTRIES (8 * 1024)
 
+
+void xive_reset(void *dev)
+{
+    XIVE *x = XIVE(dev);
+    int i;
+
+    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
+    memset(x->sbe, 0x55, x->int_count / 4);
+
+    /* Clear and mask all valid IVEs */
+    for (i = x->int_base; i < x->int_max; i++) {
+        XiveIVE *ive = &x->ivt[i];
+        if (ive->w & IVE_VALID) {
+            ive->w = IVE_VALID | IVE_MASKED;
+        }
+    }
+
+    /* clear all EQs */
+    memset(x->eqdt, 0, x->nr_targets * XIVE_EQ_PRIORITY_COUNT * sizeof(XiveEQ));
+}
+
 static void xive_init(Object *obj)
 {
     ;
@@ -62,6 +83,19 @@ static void xive_realize(DeviceState *dev, Error **errp)
     if (x->int_ipi_top < 0x10) {
         x->int_ipi_top = 0x10;
     }
+
+    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
+    x->sbe = g_malloc0(x->int_count / 4);
+
+    /* Allocate the IVT (Interrupt Virtualization Table) */
+    x->ivt = g_malloc0(x->int_count * sizeof(XiveIVE));
+
+    /* Allocate the EQDT (Event Queue Descriptor Table), 8 priorities
+     * for each thread in the system */
+    x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
+                        sizeof(XiveEQ));
+
+    qemu_register_reset(xive_reset, dev);
 }
 
 static Property xive_properties[] = {
@@ -92,3 +126,41 @@ static void xive_register_types(void)
 }
 
 type_init(xive_register_types)
+
+XiveIVE *xive_get_ive(XIVE *x, uint32_t lisn)
+{
+    uint32_t idx = lisn;
+
+    if (idx < x->int_base || idx >= x->int_max) {
+        return NULL;
+    }
+
+    return &x->ivt[idx];
+}
+
+XiveEQ *xive_get_eq(XIVE *x, uint32_t idx)
+{
+    if (idx >= x->nr_targets * XIVE_EQ_PRIORITY_COUNT) {
+        return NULL;
+    }
+
+    return &x->eqdt[idx];
+}
+
+/* TODO: improve EQ indexing. This is very simple and relies on the
+ * fact that target (CPU) numbers start at 0 and are contiguous. It
+ * should be OK for sPAPR.
+ */
+bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t priority,
+                        uint32_t *out_eq_idx)
+{
+    if (priority > XIVE_PRIORITY_MAX || target >= x->nr_targets) {
+        return false;
+    }
+
+    if (out_eq_idx) {
+        *out_eq_idx = target + priority;
+    }
+
+    return true;
+}
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (4 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 05/26] ppc/xive: define XIVE internal tables Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-24  4:02   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source Cédric Le Goater
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

This is very similar to the current ICS_SIMPLE model in XICS. We try
to reuse the ICS model because the sPAPR machine is tied to the
XICSFabric interface and should be using a common framework to switch
from one controller model to another: XICS <-> XIVE.

The next patch will introduce the MMIO handlers to interact with XIVE
interrupt sources.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c        | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/xive.h |  12 ++++++
 2 files changed, 122 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 5b14d8155317..9ff14c0da595 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -26,6 +26,115 @@
 
 #include "xive-internal.h"
 
+static void xive_icp_irq(XiveICSState *xs, int lisn)
+{
+
+}
+
+/*
+ * XIVE Interrupt Source
+ */
+static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
+{
+    if (val) {
+        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
+    }
+}
+
+static void xive_ics_set_irq_lsi(XiveICSState *xs, int srcno, int val)
+{
+    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
+
+    if (val) {
+        irq->status |= XICS_STATUS_ASSERTED;
+    } else {
+        irq->status &= ~XICS_STATUS_ASSERTED;
+    }
+
+    if (irq->status & XICS_STATUS_ASSERTED
+        && !(irq->status & XICS_STATUS_SENT)) {
+        irq->status |= XICS_STATUS_SENT;
+        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
+    }
+}
+
+static void xive_ics_set_irq(void *opaque, int srcno, int val)
+{
+    XiveICSState *xs = ICS_XIVE(opaque);
+    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
+
+    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
+        xive_ics_set_irq_lsi(xs, srcno, val);
+    } else {
+        xive_ics_set_irq_msi(xs, srcno, val);
+    }
+}
+
+static void xive_ics_reset(void *dev)
+{
+    ICSState *ics = ICS_BASE(dev);
+    int i;
+    uint8_t flags[ics->nr_irqs];
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        flags[i] = ics->irqs[i].flags;
+    }
+
+    memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        ics->irqs[i].flags = flags[i];
+    }
+}
+
+static void xive_ics_realize(ICSState *ics, Error **errp)
+{
+    XiveICSState *xs = ICS_XIVE(ics);
+    Object *obj;
+    Error *err = NULL;
+
+    obj = object_property_get_link(OBJECT(xs), "xive", &err);
+    if (!obj) {
+        error_setg(errp, "%s: required link 'xive' not found: %s",
+                   __func__, error_get_pretty(err));
+        return;
+    }
+    xs->xive = XIVE(obj);
+
+    if (!ics->nr_irqs) {
+        error_setg(errp, "Number of interrupts needs to be greater 0");
+        return;
+    }
+
+    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
+    ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
+
+    qemu_register_reset(xive_ics_reset, xs);
+}
+
+static Property xive_ics_properties[] = {
+    DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
+    DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xive_ics_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    ICSStateClass *isc = ICS_BASE_CLASS(klass);
+
+    isc->realize = xive_ics_realize;
+
+    dc->props = xive_ics_properties;
+}
+
+static const TypeInfo xive_ics_info = {
+    .name = TYPE_ICS_XIVE,
+    .parent = TYPE_ICS_BASE,
+    .instance_size = sizeof(XiveICSState),
+    .class_init = xive_ics_class_init,
+};
+
 /*
  * Main XIVE object
  */
@@ -123,6 +232,7 @@ static const TypeInfo xive_info = {
 static void xive_register_types(void)
 {
     type_register_static(&xive_info);
+    type_register_static(&xive_ics_info);
 }
 
 type_init(xive_register_types)
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 863f5a9c6b5f..544cc6e0c796 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -19,9 +19,21 @@
 #ifndef PPC_XIVE_H
 #define PPC_XIVE_H
 
+#include "hw/ppc/xics.h"
+
 typedef struct XIVE XIVE;
+typedef struct XiveICSState XiveICSState;
 
 #define TYPE_XIVE "xive"
 #define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
 
+#define TYPE_ICS_XIVE "xive-source"
+#define ICS_XIVE(obj) OBJECT_CHECK(XiveICSState, (obj), TYPE_ICS_XIVE)
+
+struct XiveICSState {
+    ICSState parent_obj;
+
+    XIVE         *xive;
+};
+
 #endif /* PPC_XIVE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (5 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-24  4:29   ` David Gibson
  2017-07-24  6:50   ` Alexey Kardashevskiy
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags " Cédric Le Goater
                   ` (20 subsequent siblings)
  27 siblings, 2 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Each interrupt source is associated with a 2-bit state machine called
an Event State Buffer (ESB). It is controlled by MMIO to trigger
events.

See code for more details on the states.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c        | 230 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/xive.h |   3 +
 2 files changed, 233 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 9ff14c0da595..816031b8ac81 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -32,6 +32,226 @@ static void xive_icp_irq(XiveICSState *xs, int lisn)
 }
 
 /*
+ * "magic" Event State Buffer (ESB) MMIO offsets.
+ *
+ * Each interrupt source has a 2-bit state machine called ESB
+ * which can be controlled by MMIO. It's made of 2 bits, P and
+ * Q. P indicates that an interrupt is pending (has been sent
+ * to a queue and is waiting for an EOI). Q indicates that the
+ * interrupt has been triggered while pending.
+ *
+ * This acts as a coalescing mechanism in order to guarantee
+ * that a given interrupt only occurs at most once in a queue.
+ *
+ * When doing an EOI, the Q bit will indicate if the interrupt
+ * needs to be re-triggered.
+ *
+ * The following offsets into the ESB MMIO allow to read or
+ * manipulate the PQ bits. They must be used with an 8-bytes
+ * load instruction. They all return the previous state of the
+ * interrupt (atomically).
+ *
+ * Additionally, some ESB pages support doing an EOI via a
+ * store at 0 and some ESBs support doing a trigger via a
+ * separate trigger page.
+ */
+#define XIVE_ESB_GET            0x800
+#define XIVE_ESB_SET_PQ_00      0xc00
+#define XIVE_ESB_SET_PQ_01      0xd00
+#define XIVE_ESB_SET_PQ_10      0xe00
+#define XIVE_ESB_SET_PQ_11      0xf00
+
+#define XIVE_ESB_VAL_P          0x2
+#define XIVE_ESB_VAL_Q          0x1
+
+#define XIVE_ESB_RESET          0x0
+#define XIVE_ESB_PENDING        0x2
+#define XIVE_ESB_QUEUED         0x3
+#define XIVE_ESB_OFF            0x1
+
+static uint8_t xive_pq_get(XIVE *x, uint32_t lisn)
+{
+    uint32_t idx = lisn;
+    uint32_t byte = idx / 4;
+    uint32_t bit  = (idx % 4) * 2;
+    uint8_t* pqs = (uint8_t *) x->sbe;
+
+    return (pqs[byte] >> bit) & 0x3;
+}
+
+static void xive_pq_set(XIVE *x, uint32_t lisn, uint8_t pq)
+{
+    uint32_t idx = lisn;
+    uint32_t byte = idx / 4;
+    uint32_t bit  = (idx % 4) * 2;
+    uint8_t* pqs = (uint8_t *) x->sbe;
+
+    pqs[byte] &= ~(0x3 << bit);
+    pqs[byte] |= (pq & 0x3) << bit;
+}
+
+static bool xive_pq_eoi(XIVE *x, uint32_t lisn)
+{
+    uint8_t old_pq = xive_pq_get(x, lisn);
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        xive_pq_set(x, lisn, XIVE_ESB_RESET);
+        return false;
+    case XIVE_ESB_PENDING:
+        xive_pq_set(x, lisn, XIVE_ESB_RESET);
+        return false;
+    case XIVE_ESB_QUEUED:
+        xive_pq_set(x, lisn, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_OFF:
+        xive_pq_set(x, lisn, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+static bool xive_pq_trigger(XIVE *x, uint32_t lisn)
+{
+    uint8_t old_pq = xive_pq_get(x, lisn);
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        xive_pq_set(x, lisn, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_PENDING:
+        xive_pq_set(x, lisn, XIVE_ESB_QUEUED);
+        return true;
+    case XIVE_ESB_QUEUED:
+        xive_pq_set(x, lisn, XIVE_ESB_QUEUED);
+        return true;
+    case XIVE_ESB_OFF:
+        xive_pq_set(x, lisn, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+/*
+ * XIVE Interrupt Source MMIOs
+ */
+static void xive_ics_eoi(XiveICSState *xs, uint32_t srcno)
+{
+    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
+
+    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
+        irq->status &= ~XICS_STATUS_SENT;
+    }
+}
+
+/* TODO: handle second page */
+static uint64_t xive_esb_read(void *opaque, hwaddr addr, unsigned size)
+{
+    XiveICSState *xs = ICS_XIVE(opaque);
+    XIVE *x = xs->xive;
+    uint32_t offset = addr & 0xF00;
+    uint32_t srcno = addr >> xs->esb_shift;
+    uint32_t lisn = srcno + ICS_BASE(xs)->offset;
+    XiveIVE *ive;
+    uint64_t ret = -1;
+
+    ive = xive_get_ive(x, lisn);
+    if (!ive || !(ive->w & IVE_VALID))  {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
+        goto out;
+    }
+
+    if (srcno >= ICS_BASE(xs)->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "XIVE: invalid IRQ number: %d/%d lisn: %d\n",
+                      srcno, ICS_BASE(xs)->nr_irqs, lisn);
+        goto out;
+    }
+
+    switch (offset) {
+    case 0:
+        xive_ics_eoi(xs, srcno);
+
+        /* return TRUE or FALSE depending on PQ value */
+        ret = xive_pq_eoi(x, lisn);
+        break;
+
+    case XIVE_ESB_GET:
+        ret = xive_pq_get(x, lisn);
+        break;
+
+    case XIVE_ESB_SET_PQ_00:
+    case XIVE_ESB_SET_PQ_01:
+    case XIVE_ESB_SET_PQ_10:
+    case XIVE_ESB_SET_PQ_11:
+        ret = xive_pq_get(x, lisn);
+        xive_pq_set(x, lisn, (offset >> 8) & 0x3);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
+    }
+
+out:
+    return ret;
+}
+
+static void xive_esb_write(void *opaque, hwaddr addr,
+                           uint64_t value, unsigned size)
+{
+    XiveICSState *xs = ICS_XIVE(opaque);
+    XIVE *x = xs->xive;
+    uint32_t offset = addr & 0xF00;
+    uint32_t srcno = addr >> xs->esb_shift;
+    uint32_t lisn = srcno + ICS_BASE(xs)->offset;
+    XiveIVE *ive;
+    bool notify = false;
+
+    ive = xive_get_ive(x, lisn);
+    if (!ive || !(ive->w & IVE_VALID))  {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
+        return;
+    }
+
+    if (srcno >= ICS_BASE(xs)->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "XIVE: invalid IRQ number: %d/%d lisn: %d\n",
+                      srcno, ICS_BASE(xs)->nr_irqs, lisn);
+        return;
+    }
+
+    switch (offset) {
+    case 0:
+        /* TODO: should we trigger even if the IVE is masked ? */
+        notify = xive_pq_trigger(x, lisn);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
+                      offset);
+        return;
+    }
+
+    if (notify && !(ive->w & IVE_MASKED)) {
+        qemu_irq_pulse(ICS_BASE(xs)->qirqs[srcno]);
+    }
+}
+
+static const MemoryRegionOps xive_esb_ops = {
+    .read = xive_esb_read,
+    .write = xive_esb_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+};
+
+/*
  * XIVE Interrupt Source
  */
 static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
@@ -106,15 +326,25 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
         return;
     }
 
+    if (!xs->esb_shift) {
+        error_setg(errp, "ESB page size needs to be greater 0");
+        return;
+    }
+
     ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
     ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
 
+    memory_region_init_io(&xs->esb_iomem, OBJECT(xs), &xive_esb_ops, xs,
+                          "xive.esb",
+                          (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
+
     qemu_register_reset(xive_ics_reset, xs);
 }
 
 static Property xive_ics_properties[] = {
     DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
     DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
+    DEFINE_PROP_UINT32("shift", XiveICSState, esb_shift, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 544cc6e0c796..5303d96f5f59 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -33,6 +33,9 @@ typedef struct XiveICSState XiveICSState;
 struct XiveICSState {
     ICSState parent_obj;
 
+    uint32_t     esb_shift;
+    MemoryRegion esb_iomem;
+
     XIVE         *xive;
 };
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (6 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-24  4:36   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs Cédric Le Goater
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

These flags define some characteristics of the source :

 - XIVE_SRC_H_INT_ESB  the Event State Buffer are controlled with a
                       specific hcall H_INT_ESB
 - XIVE_SRC_LSI        LSI or MSI source
 - XIVE_SRC_TRIGGER    the full function page supports trigger
 - XIVE_SRC_STORE_EOI  EOI can with a store.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c        | 1 +
 include/hw/ppc/xive.h | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 816031b8ac81..8f8bb8b787bd 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -345,6 +345,7 @@ static Property xive_ics_properties[] = {
     DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
     DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
     DEFINE_PROP_UINT32("shift", XiveICSState, esb_shift, 0),
+    DEFINE_PROP_UINT64("flags", XiveICSState, flags, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 5303d96f5f59..1178300c9df3 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -30,9 +30,18 @@ typedef struct XiveICSState XiveICSState;
 #define TYPE_ICS_XIVE "xive-source"
 #define ICS_XIVE(obj) OBJECT_CHECK(XiveICSState, (obj), TYPE_ICS_XIVE)
 
+/*
+ * XIVE Interrupt source flags
+ */
+#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
+#define XIVE_SRC_LSI           (1ull << (63 - 61))
+#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
+#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
+
 struct XiveICSState {
     ICSState parent_obj;
 
+    uint64_t     flags;
     uint32_t     esb_shift;
     MemoryRegion esb_iomem;
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (7 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags " Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-24  4:49   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 10/26] ppc/xive: record interrupt source MMIO address for hcalls Cédric Le Goater
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Each source adds its own ESB mempry region to the overall ESB memory
region of the controller. It will be mapped in the CPU address space
when XIVE is activated.

The default mapping address for the ESB memory region is the same one
used on baremetal.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive-internal.h |  5 +++++
 hw/intc/xive.c          | 44 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
index 8e755aa88a14..c06be823aad0 100644
--- a/hw/intc/xive-internal.h
+++ b/hw/intc/xive-internal.h
@@ -98,6 +98,7 @@ struct XIVE {
     SysBusDevice parent;
 
     /* Properties */
+    uint32_t     chip_id;
     uint32_t     nr_targets;
 
     /* IRQ number allocator */
@@ -111,6 +112,10 @@ struct XIVE {
     void         *sbe;
     XiveIVE      *ivt;
     XiveEQ       *eqdt;
+
+    /* ESB and TIMA memory location */
+    hwaddr       vc_base;
+    MemoryRegion esb_iomem;
 };
 
 void xive_reset(void *dev);
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 8f8bb8b787bd..a1cb87a07b76 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -312,6 +312,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
     XiveICSState *xs = ICS_XIVE(ics);
     Object *obj;
     Error *err = NULL;
+    XIVE *x;
 
     obj = object_property_get_link(OBJECT(xs), "xive", &err);
     if (!obj) {
@@ -319,7 +320,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
                    __func__, error_get_pretty(err));
         return;
     }
-    xs->xive = XIVE(obj);
+    x = xs->xive = XIVE(obj);
 
     if (!ics->nr_irqs) {
         error_setg(errp, "Number of interrupts needs to be greater 0");
@@ -338,6 +339,11 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
                           "xive.esb",
                           (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
 
+    /* Install the ESB memory region in the overall one */
+    memory_region_add_subregion(&x->esb_iomem,
+                                ICS_BASE(xs)->offset * (1 << xs->esb_shift),
+                                &xs->esb_iomem);
+
     qemu_register_reset(xive_ics_reset, xs);
 }
 
@@ -375,6 +381,32 @@ static const TypeInfo xive_ics_info = {
  */
 #define MAX_HW_IRQS_ENTRIES (8 * 1024)
 
+/* VC BAR contains set translations for the ESBs and the EQs. */
+#define VC_BAR_DEFAULT   0x10000000000ull
+#define VC_BAR_SIZE      0x08000000000ull
+
+#define P9_MMIO_BASE     0x006000000000000ull
+#define P9_CHIP_BASE(id) (P9_MMIO_BASE | (0x40000000000ull * (uint64_t) (id)))
+
+static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
+{
+    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
+                  __func__, offset, size);
+    return 0;
+}
+
+static void xive_esb_default_write(void *opaque, hwaddr offset, uint64_t value,
+                unsigned size)
+{
+    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
+                  __func__, offset, value, size);
+}
+
+static const MemoryRegionOps xive_esb_default_ops = {
+    .read = xive_esb_default_read,
+    .write = xive_esb_default_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
 
 void xive_reset(void *dev)
 {
@@ -435,10 +467,20 @@ static void xive_realize(DeviceState *dev, Error **errp)
     x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
                         sizeof(XiveEQ));
 
+    /* VC BAR. That's the full window but we will only map the
+     * subregions in use. */
+    x->vc_base = (hwaddr)(P9_CHIP_BASE(x->chip_id) | VC_BAR_DEFAULT);
+
+    /* install default memory region handlers to log bogus access */
+    memory_region_init_io(&x->esb_iomem, NULL, &xive_esb_default_ops,
+                          NULL, "xive.esb", VC_BAR_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->esb_iomem);
+
     qemu_register_reset(xive_reset, dev);
 }
 
 static Property xive_properties[] = {
+    DEFINE_PROP_UINT32("chip-id", XIVE, chip_id, 0),
     DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 10/26] ppc/xive: record interrupt source MMIO address for hcalls
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (8 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-24  5:11   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 11/26] ppc/xics: introduce a print_info() handler to the ICS and ICP objects Cédric Le Goater
                   ` (17 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

The address of the MMIO page through which the Event State Buffer is
controlled is returned to the guest by the H_INT_GET_SOURCE_INFO hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c        | 3 +++
 include/hw/ppc/xive.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index a1cb87a07b76..0db97fd33981 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -344,6 +344,9 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
                                 ICS_BASE(xs)->offset * (1 << xs->esb_shift),
                                 &xs->esb_iomem);
 
+    /* Record base address which is needed by the hcalls */
+    xs->esb_base = x->vc_base + ICS_BASE(xs)->offset * (1 << xs->esb_shift);
+
     qemu_register_reset(xive_ics_reset, xs);
 }
 
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 1178300c9df3..b06bc861b845 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -43,6 +43,7 @@ struct XiveICSState {
 
     uint64_t     flags;
     uint32_t     esb_shift;
+    hwaddr       esb_base;
     MemoryRegion esb_iomem;
 
     XIVE         *xive;
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 11/26] ppc/xics: introduce a print_info() handler to the ICS and ICP objects
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (9 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 10/26] ppc/xive: record interrupt source MMIO address for hcalls Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-24  5:13   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 12/26] ppc/xive: add a print_info() handler for the interrupt source Cédric Le Goater
                   ` (16 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

This handler will be used to customize the ouput of the XIVE interrupt
source and presenter objects.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xics.c        | 36 ++++++++++++++++++++++++------------
 include/hw/ppc/xics.h |  2 ++
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index faa5c631f655..7837c2022b4a 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -40,18 +40,26 @@
 
 void icp_pic_print_info(ICPState *icp, Monitor *mon)
 {
+    ICPStateClass *k = ICP_GET_CLASS(icp);
     int cpu_index = icp->cs ? icp->cs->cpu_index : -1;
 
     if (!icp->output) {
         return;
     }
-    monitor_printf(mon, "CPU %d XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
-                   cpu_index, icp->xirr, icp->xirr_owner,
-                   icp->pending_priority, icp->mfrr);
+
+    monitor_printf(mon, "CPU %d ", cpu_index);
+    if (k->print_info) {
+        k->print_info(icp, mon);
+    } else {
+        monitor_printf(mon, "XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
+                       icp->xirr, icp->xirr_owner,
+                       icp->pending_priority, icp->mfrr);
+    }
 }
 
 void ics_pic_print_info(ICSState *ics, Monitor *mon)
 {
+    ICSStateClass *k = ICS_BASE_GET_CLASS(ics);
     uint32_t i;
 
     monitor_printf(mon, "ICS %4x..%4x %p\n",
@@ -61,17 +69,21 @@ void ics_pic_print_info(ICSState *ics, Monitor *mon)
         return;
     }
 
-    for (i = 0; i < ics->nr_irqs; i++) {
-        ICSIRQState *irq = ics->irqs + i;
+    if (k->print_info) {
+        k->print_info(ics, mon);
+    } else {
+        for (i = 0; i < ics->nr_irqs; i++) {
+            ICSIRQState *irq = ics->irqs + i;
 
-        if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
-            continue;
+            if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
+                continue;
+            }
+            monitor_printf(mon, "  %4x %s %02x %02x\n",
+                           ics->offset + i,
+                           (irq->flags & XICS_FLAGS_IRQ_LSI) ?
+                           "LSI" : "MSI",
+                           irq->priority, irq->status);
         }
-        monitor_printf(mon, "  %4x %s %02x %02x\n",
-                       ics->offset + i,
-                       (irq->flags & XICS_FLAGS_IRQ_LSI) ?
-                       "LSI" : "MSI",
-                       irq->priority, irq->status);
     }
 }
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 28d248abad61..902f3bfd0e33 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -69,6 +69,7 @@ struct ICPStateClass {
     void (*pre_save)(ICPState *icp);
     int (*post_load)(ICPState *icp, int version_id);
     void (*reset)(ICPState *icp);
+    void (*print_info)(ICPState *icp, Monitor *mon);
 };
 
 struct ICPState {
@@ -119,6 +120,7 @@ struct ICSStateClass {
     void (*reject)(ICSState *s, uint32_t irq);
     void (*resend)(ICSState *s);
     void (*eoi)(ICSState *s, uint32_t irq);
+    void (*print_info)(ICSState *s, Monitor *mon);
 };
 
 struct ICSState {
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 12/26] ppc/xive: add a print_info() handler for the interrupt source
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (10 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 11/26] ppc/xics: introduce a print_info() handler to the ICS and ICP objects Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 13/26] ppc/xive: introduce a XIVE interrupt presenter model Cédric Le Goater
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

This is much like the default one but we expose the PQ bits also.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 0db97fd33981..db808e0cbe3d 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -290,6 +290,25 @@ static void xive_ics_set_irq(void *opaque, int srcno, int val)
     }
 }
 
+static void xive_ics_print_info(ICSState *ics, Monitor *mon)
+{
+    XiveICSState *xs = ICS_XIVE(ics);
+    int i;
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        ICSIRQState *irq = ics->irqs + i;
+
+        if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
+            continue;
+        }
+        monitor_printf(mon, "  %4x %s pq=%02x status=%02x\n",
+                       ics->offset + i,
+                       (irq->flags & XICS_FLAGS_IRQ_LSI) ? "LSI" : "MSI",
+                       xive_pq_get(xs->xive, ics->offset + i),
+                       irq->status);
+    }
+}
+
 static void xive_ics_reset(void *dev)
 {
     ICSState *ics = ICS_BASE(dev);
@@ -364,6 +383,7 @@ static void xive_ics_class_init(ObjectClass *klass, void *data)
     ICSStateClass *isc = ICS_BASE_CLASS(klass);
 
     isc->realize = xive_ics_realize;
+    isc->print_info = xive_ics_print_info;
 
     dc->props = xive_ics_properties;
 }
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 13/26] ppc/xive: introduce a XIVE interrupt presenter model
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (11 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 12/26] ppc/xive: add a print_info() handler for the interrupt source Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-24  6:05   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the " Cédric Le Goater
                   ` (14 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Just like the interrupt source model, we try to reuse the ICP model
because the sPAPR machine is tied to the XICSFabric interface and
should be using a common framework to switch from one controller model
to another: XICS <-> XIVE.

The XIVE interrupt presenter exposes a set of Thread Interrupt
Management Areas, also called rings, one per different level of
privilege (four in all). We only expose the OS ring for the sPAPR
support for the moment. This area is used to handle priority
management and interrupt acknowledgment among other things.

The next patch will introduce the MMIO handlers to interact with the
TIMA, OS only.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive-internal.h | 84 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/intc/xive.c          | 43 +++++++++++++++++++++++++
 include/hw/ppc/xive.h   | 14 +++++++++
 3 files changed, 141 insertions(+)

diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
index c06be823aad0..ba5e648a5258 100644
--- a/hw/intc/xive-internal.h
+++ b/hw/intc/xive-internal.h
@@ -24,6 +24,90 @@
 #define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
                                  PPC_BIT32(bs))
 
+/*
+ * Thread Management (aka "TM") registers
+ */
+
+/* TM register offsets */
+#define TM_QW0_USER             0x000 /* All rings */
+#define TM_QW1_OS               0x010 /* Ring 0..2 */
+#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
+#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
+
+/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
+#define TM_NSR                  0x0  /*  +   +   -   +  */
+#define TM_CPPR                 0x1  /*  -   +   -   +  */
+#define TM_IPB                  0x2  /*  -   +   +   +  */
+#define TM_LSMFB                0x3  /*  -   +   +   +  */
+#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
+#define TM_INC                  0x5  /*  -   +   -   +  */
+#define TM_AGE                  0x6  /*  -   +   -   +  */
+#define TM_PIPR                 0x7  /*  -   +   -   +  */
+
+#define TM_WORD0                0x0
+#define TM_WORD1                0x4
+
+/*
+ * QW word 2 contains the valid bit at the top and other fields
+ * depending on the QW.
+ */
+#define TM_WORD2                0x8
+#define   TM_QW0W2_VU           PPC_BIT32(0)
+#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
+#define   TM_QW1W2_VO           PPC_BIT32(0)
+#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
+#define   TM_QW2W2_VP           PPC_BIT32(0)
+#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
+#define   TM_QW3W2_VT           PPC_BIT32(0)
+#define   TM_QW3W2_LP           PPC_BIT32(6)
+#define   TM_QW3W2_LE           PPC_BIT32(7)
+#define   TM_QW3W2_T            PPC_BIT32(31)
+
+/*
+ * In addition to normal loads to "peek" and writes (only when invalid)
+ * using 4 and 8 bytes accesses, the above registers support these
+ * "special" byte operations:
+ *
+ *   - Byte load from QW0[NSR] - User level NSR (EBB)
+ *   - Byte store to QW0[NSR] - User level NSR (EBB)
+ *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
+ *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
+ *                                    otherwise VT||0000000
+ *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
+ *
+ * Then we have all these "special" CI ops at these offset that trigger
+ * all sorts of side effects:
+ */
+#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
+#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
+#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
+#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
+                                         * context */
+#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
+#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
+                                         * context to reg */
+#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
+                                         * context to reg*/
+#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
+#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
+                                         * line */
+#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
+#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
+                                         * line */
+#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
+/* XXX more... */
+
+/* NSR fields for the various QW ack types */
+#define TM_QW0_NSR_EB           PPC_BIT8(0)
+#define TM_QW1_NSR_EO           PPC_BIT8(0)
+#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
+#define  TM_QW3_NSR_HE_NONE     0
+#define  TM_QW3_NSR_HE_POOL     1
+#define  TM_QW3_NSR_HE_PHYS     2
+#define  TM_QW3_NSR_HE_LSI      3
+#define TM_QW3_NSR_I            PPC_BIT8(2)
+#define TM_QW3_NSR_GRP_LVL      PPC_BIT8(3, 7)
+
 /* IVE/EAS
  *
  * One per interrupt source. Targets that interrupt to a given EQ
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index db808e0cbe3d..c08a4f8efb58 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -26,6 +26,48 @@
 
 #include "xive-internal.h"
 
+static void xive_icp_reset(ICPState *icp)
+{
+    XiveICPState *xicp = XIVE_ICP(icp);
+
+    memset(xicp->tima, 0, sizeof(xicp->tima));
+}
+
+static void xive_icp_print_info(ICPState *icp, Monitor *mon)
+{
+    XiveICPState *xicp = XIVE_ICP(icp);
+
+    monitor_printf(mon, " CPPR=%02x IPB=%02x PIPR=%02x NSR=%02x\n",
+                   xicp->tima_os[TM_CPPR], xicp->tima_os[TM_IPB],
+                   xicp->tima_os[TM_PIPR], xicp->tima_os[TM_NSR]);
+}
+
+static void xive_icp_init(Object *obj)
+{
+    XiveICPState *xicp = XIVE_ICP(obj);
+
+    xicp->tima_os = &xicp->tima[TM_QW1_OS];
+}
+
+static void xive_icp_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    ICPStateClass *icpc = ICP_CLASS(klass);
+
+    dc->desc = "PowerNV Xive ICP";
+    icpc->reset = xive_icp_reset;
+    icpc->print_info = xive_icp_print_info;
+}
+
+static const TypeInfo xive_icp_info = {
+    .name          = TYPE_XIVE_ICP,
+    .parent        = TYPE_ICP,
+    .instance_size = sizeof(XiveICPState),
+    .instance_init = xive_icp_init,
+    .class_init    = xive_icp_class_init,
+    .class_size    = sizeof(ICPStateClass),
+};
+
 static void xive_icp_irq(XiveICSState *xs, int lisn)
 {
 
@@ -529,6 +571,7 @@ static void xive_register_types(void)
 {
     type_register_static(&xive_info);
     type_register_static(&xive_ics_info);
+    type_register_static(&xive_icp_info);
 }
 
 type_init(xive_register_types)
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index b06bc861b845..f87df8107dd9 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -23,6 +23,7 @@
 
 typedef struct XIVE XIVE;
 typedef struct XiveICSState XiveICSState;
+typedef struct XiveICPState XiveICPState;
 
 #define TYPE_XIVE "xive"
 #define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
@@ -38,6 +39,9 @@ typedef struct XiveICSState XiveICSState;
 #define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
 #define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
 
+#define TYPE_XIVE_ICP "xive-icp"
+#define XIVE_ICP(obj) OBJECT_CHECK(XiveICPState, (obj), TYPE_XIVE_ICP)
+
 struct XiveICSState {
     ICSState parent_obj;
 
@@ -49,4 +53,14 @@ struct XiveICSState {
     XIVE         *xive;
 };
 
+/* Number of Thread Management Interrupt Areas */
+#define XIVE_TM_RING_COUNT 4
+
+struct XiveICPState {
+    ICPState parent_obj;
+
+    uint8_t tima[XIVE_TM_RING_COUNT * 0x10];
+    uint8_t *tima_os;
+};
+
 #endif /* PPC_XIVE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (12 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 13/26] ppc/xive: introduce a XIVE interrupt presenter model Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-24  6:35   ` David Gibson
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 15/26] ppc/xive: push EQ data in OS event queues Cédric Le Goater
                   ` (13 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

The Thread Interrupt Management Area for the OS is mostly used to
acknowledge interrupts and set the CPPR of the CPU.

The TIMA is mapped at the same address for each CPU. 'current_cpu' is
used to retrieve the targeted interrupt presenter object.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive-internal.h |   4 ++
 hw/intc/xive.c          | 187 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 191 insertions(+)

diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
index ba5e648a5258..5e8b78a1ea6a 100644
--- a/hw/intc/xive-internal.h
+++ b/hw/intc/xive-internal.h
@@ -200,6 +200,10 @@ struct XIVE {
     /* ESB and TIMA memory location */
     hwaddr       vc_base;
     MemoryRegion esb_iomem;
+
+    uint32_t     tm_shift;
+    hwaddr       tm_base;
+    MemoryRegion tm_iomem;
 };
 
 void xive_reset(void *dev);
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index c08a4f8efb58..82b2f0dcda0b 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -26,6 +26,180 @@
 
 #include "xive-internal.h"
 
+static uint8_t priority_to_ipb(uint8_t priority)
+{
+    return priority < XIVE_EQ_PRIORITY_COUNT ? 1 << (7 - priority) : 0;
+}
+
+static uint64_t xive_icp_accept(XiveICPState *xicp)
+{
+    ICPState *icp = ICP(xicp);
+    uint8_t nsr = xicp->tima_os[TM_NSR];
+
+    qemu_irq_lower(icp->output);
+
+    if (xicp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
+        uint8_t cppr = xicp->tima_os[TM_PIPR];
+
+        xicp->tima_os[TM_CPPR] = cppr;
+
+        /* Reset the pending buffer bit */
+        xicp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
+
+        /* Drop Exception bit for OS */
+        xicp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
+    }
+
+    return (nsr << 8) | xicp->tima_os[TM_CPPR];
+}
+
+static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
+{
+    if (cppr > XIVE_PRIORITY_MAX) {
+        cppr = 0xff;
+    }
+
+    xicp->tima_os[TM_CPPR] = cppr;
+}
+
+/*
+ * Thread Interrupt Management Area MMIO
+ */
+static uint64_t xive_tm_read_special(XiveICPState *icp, hwaddr offset,
+                                     unsigned size)
+{
+    uint64_t ret = -1;
+
+    if (offset == TM_SPC_ACK_OS_REG && size == 2) {
+        ret = xive_icp_accept(icp);
+    } else {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
+                      HWADDR_PRIx" size %d\n", offset, size);
+    }
+
+    return ret;
+}
+
+static uint64_t xive_tm_read(void *opaque, hwaddr offset, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    XiveICPState *icp = XIVE_ICP(cpu->intc);
+    uint64_t ret = -1;
+    int i;
+
+    if (offset >= TM_SPC_ACK_EBB) {
+        return xive_tm_read_special(icp, offset, size);
+    }
+
+    if (offset & TM_QW1_OS) {
+        switch (size) {
+        case 1:
+        case 2:
+        case 4:
+        case 8:
+            if (QEMU_IS_ALIGNED(offset, size)) {
+                ret = 0;
+                for (i = 0; i < size; i++) {
+                    ret |= icp->tima[offset + i] << (8 * i);
+                }
+            } else {
+                qemu_log_mask(LOG_GUEST_ERROR,
+                              "XIVE: invalid TIMA read alignment @%"
+                              HWADDR_PRIx" size %d\n", offset, size);
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
+                      HWADDR_PRIx"\n", offset);
+    }
+
+    return ret;
+}
+
+static bool xive_tm_is_readonly(uint8_t index)
+{
+    /* Let's be optimistic and prepare ground for HV mode support */
+    switch (index) {
+    case TM_QW1_OS + TM_CPPR:
+        return false;
+    default:
+        return true;
+    }
+}
+
+static void xive_tm_write_special(XiveICPState *xicp, hwaddr offset,
+                                  uint64_t value, unsigned size)
+{
+    if (offset == TM_SPC_SET_OS_PENDING && size == 1) {
+        xicp->tima_os[TM_IPB] |= priority_to_ipb(value & 0xff);
+    } else {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
+                      HWADDR_PRIx" size %d\n", offset, size);
+    }
+
+    /* TODO: support TM_SPC_ACK_OS_EL */
+}
+
+static void xive_tm_write(void *opaque, hwaddr offset,
+                           uint64_t value, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    XiveICPState *icp = XIVE_ICP(cpu->intc);
+    int i;
+
+    if (offset >= TM_SPC_ACK_EBB) {
+        xive_tm_write_special(icp, offset, value, size);
+        return;
+    }
+
+    if (offset & TM_QW1_OS) {
+        switch (size) {
+        case 1:
+            if (offset == TM_QW1_OS + TM_CPPR) {
+                xive_icp_set_cppr(icp, value & 0xff);
+            }
+            break;
+        case 4:
+        case 8:
+            if (QEMU_IS_ALIGNED(offset, size)) {
+                for (i = 0; i < size; i++) {
+                    if (!xive_tm_is_readonly(offset + i)) {
+                        icp->tima[offset + i] = (value >> (8 * i)) & 0xff;
+                    }
+                }
+            } else {
+                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
+                              HWADDR_PRIx" size %d\n", offset, size);
+            }
+            break;
+        default:
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
+                          HWADDR_PRIx" size %d\n", offset, size);
+        }
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
+                      HWADDR_PRIx"\n", offset);
+    }
+}
+
+
+static const MemoryRegionOps xive_tm_ops = {
+    .read = xive_tm_read,
+    .write = xive_tm_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+};
+
 static void xive_icp_reset(ICPState *icp)
 {
     XiveICPState *xicp = XIVE_ICP(icp);
@@ -453,6 +627,11 @@ static const TypeInfo xive_ics_info = {
 #define P9_MMIO_BASE     0x006000000000000ull
 #define P9_CHIP_BASE(id) (P9_MMIO_BASE | (0x40000000000ull * (uint64_t) (id)))
 
+/* Thread Interrupt Management Area MMIO */
+#define TM_BAR_DEFAULT   0x30203180000ull
+#define TM_SHIFT         16
+#define TM_BAR_SIZE      (XIVE_TM_RING_COUNT * (1 << TM_SHIFT))
+
 static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
 {
     qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
@@ -541,6 +720,14 @@ static void xive_realize(DeviceState *dev, Error **errp)
                           NULL, "xive.esb", VC_BAR_SIZE);
     sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->esb_iomem);
 
+    /* TM BAR. Same address for each chip */
+    x->tm_base = (P9_MMIO_BASE | TM_BAR_DEFAULT);
+    x->tm_shift = TM_SHIFT;
+
+    memory_region_init_io(&x->tm_iomem, OBJECT(x), &xive_tm_ops, x,
+                          "xive.tm", TM_BAR_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->tm_iomem);
+
     qemu_register_reset(xive_reset, dev);
 }
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 15/26] ppc/xive: push EQ data in OS event queues
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (13 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the " Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 16/26] ppc/xive: notify CPU when interrupt priority is more privileged Cédric Le Goater
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

If a triggered event is let through, the event queue data defined in
the associated IVE is pushed in the in-memory event queue of the
OS. The latter is a memory ring buffer defined by the OS with
H_INT_SET_QUEUE_CONFIG hcall.

Then, an interrupt presenter is located and notified. See next patch.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 82b2f0dcda0b..c3c1e9c9db2d 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -242,9 +242,103 @@ static const TypeInfo xive_icp_info = {
     .class_size    = sizeof(ICPStateClass),
 };
 
+static XiveICPState *xive_icp_get(XICSFabric *xi, int server)
+{
+    XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
+    ICPState *icp = xic->icp_get(xi, server);
+
+    return XIVE_ICP(icp);
+}
+
+static void xive_eq_push(XiveEQ *eq, uint32_t data)
+{
+    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
+    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
+    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
+    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
+
+    uint64_t qaddr = qaddr_base + (qindex << 2);
+    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
+    uint32_t qentries = 1 << (qsize + 10);
+
+    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
+                      HWADDR_PRIx "\n", __func__, qaddr);
+        return;
+    }
+
+    qindex = (qindex + 1) % qentries;
+    if (qindex == 0) {
+        qgen ^= 1;
+        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
+    }
+    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
+}
+
 static void xive_icp_irq(XiveICSState *xs, int lisn)
 {
+    XIVE *x = xs->xive;
+    XiveICPState *xicp;
+    XiveIVE *ive;
+    XiveEQ *eq;
+    uint32_t eq_idx;
+    uint32_t priority;
+    uint32_t target;
+
+    ive = xive_get_ive(x, lisn);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
+        return;
+    }
 
+    if (ive->w & IVE_MASKED) {
+        return;
+    }
+
+    /* Find our XiveEQ */
+    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
+    eq = xive_get_eq(x, eq_idx);
+    if (!eq) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No EQ for LISN %d\n", lisn);
+        return;
+    }
+
+    if (eq->w0 & EQ_W0_ENQUEUE) {
+        xive_eq_push(eq, GETFIELD(IVE_EQ_DATA, ive->w));
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: !ENQUEUE not implemented\n");
+    }
+
+    if (!(eq->w0 & EQ_W0_UCOND_NOTIFY)) {
+        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
+    }
+
+    target = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
+
+    /* use the XICSFabric (machine) to get the ICP */
+    xicp = xive_icp_get(ICS_BASE(xs)->xics, target);
+    if (!xicp) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for target %d\n", target);
+        return;
+    }
+
+    if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
+        priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
+
+        /* The EQ is masked. Can this happen ?  */
+        if (priority == 0xff) {
+            return;
+        }
+
+        /* Update the IPB (Interrupt Pending Buffer) with the priority
+         * of the new notification and inform the ICP, which will
+         * decide to raise the exception, or not, depending on its
+         * current CPPR value.
+         */
+        xicp->tima_os[TM_IPB] |= priority_to_ipb(priority);
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
+    }
 }
 
 /*
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 16/26] ppc/xive: notify CPU when interrupt priority is more privileged
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (14 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 15/26] ppc/xive: push EQ data in OS event queues Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-09-09  7:39   ` Benjamin Herrenschmidt
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 17/26] ppc/xive: add hcalls support Cédric Le Goater
                   ` (11 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index c3c1e9c9db2d..cda1fa18e44d 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -53,6 +53,21 @@ static uint64_t xive_icp_accept(XiveICPState *xicp)
     return (nsr << 8) | xicp->tima_os[TM_CPPR];
 }
 
+static uint8_t ipb_to_pipr(uint8_t ibp)
+{
+    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
+}
+
+static void xive_icp_notify(XiveICPState *xicp)
+{
+    xicp->tima_os[TM_PIPR] = ipb_to_pipr(xicp->tima_os[TM_IPB]);
+
+    if (xicp->tima_os[TM_PIPR] < xicp->tima_os[TM_CPPR]) {
+        xicp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
+        qemu_irq_raise(ICP(xicp)->output);
+    }
+}
+
 static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
 {
     if (cppr > XIVE_PRIORITY_MAX) {
@@ -60,6 +75,10 @@ static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
     }
 
     xicp->tima_os[TM_CPPR] = cppr;
+
+    /* CPPR has changed, inform the ICP which might raise an
+     * exception */
+    xive_icp_notify(xicp);
 }
 
 /*
@@ -339,6 +358,8 @@ static void xive_icp_irq(XiveICSState *xs, int lisn)
     } else {
         qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
     }
+
+    xive_icp_notify(xicp);
 }
 
 /*
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 17/26] ppc/xive: add hcalls support
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (15 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 16/26] ppc/xive: notify CPU when interrupt priority is more privileged Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-24  9:39   ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 18/26] ppc/xive: add device tree support Cédric Le Goater
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

A set of Hypervisor's call are used to configure the interrupt sources
and the event/notification queues of the guest:

   H_INT_GET_SOURCE_INFO
   H_INT_SET_SOURCE_CONFIG
   H_INT_GET_SOURCE_CONFIG
   H_INT_GET_QUEUE_INFO
   H_INT_SET_QUEUE_CONFIG
   H_INT_GET_QUEUE_CONFIG
   H_INT_RESET
   H_INT_ESB

Calls that still need to be addressed :

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE
   H_INT_SYNC

See below for the documentation on each hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 default-configs/ppc64-softmmu.mak |   1 +
 hw/intc/Makefile.objs             |   1 +
 hw/intc/xive_spapr.c              | 745 ++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h            |  17 +-
 include/hw/ppc/xive.h             |   4 +
 5 files changed, 767 insertions(+), 1 deletion(-)
 create mode 100644 hw/intc/xive_spapr.c

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 1179c07e6e9f..3888168adf95 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -57,6 +57,7 @@ CONFIG_XICS=$(CONFIG_PSERIES)
 CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_XIVE=$(CONFIG_PSERIES)
+CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
 # For PReP
 CONFIG_SERIAL_ISA=y
 CONFIG_MC146818RTC=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 28b83456bfcc..31b4fae2d1a8 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -36,6 +36,7 @@ obj-$(CONFIG_XICS) += xics.o
 obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
 obj-$(CONFIG_XIVE) += xive.o
+obj-$(CONFIG_XIVE_SPAPR) += xive_spapr.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
diff --git a/hw/intc/xive_spapr.c b/hw/intc/xive_spapr.c
new file mode 100644
index 000000000000..b634d1f28f10
--- /dev/null
+++ b/hw/intc/xive_spapr.c
@@ -0,0 +1,745 @@
+/*
+ * QEMU PowerPC XIVE model for pSeries
+ *
+ * Copyright (c) 2017, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "cpu.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/xive.h"
+#include "hw/ppc/fdt.h"
+#include "monitor/monitor.h"
+
+#include "xive-internal.h"
+
+static XiveICSState *xive_ics_find(sPAPRMachineState *spapr, uint32_t lisn)
+{
+    XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(spapr);
+    ICSState *ics = xic->ics_get(XICS_FABRIC(spapr), lisn);
+
+    return ICS_XIVE(ics);
+}
+
+static bool priority_is_valid(int priority)
+{
+    return priority >= 0 && priority < 8;
+}
+
+/*
+ * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
+ * real address of the MMIO page through which the Event State Buffer
+ * entry associated with the value of the "lisn" parameter is managed.
+ *
+ * Parameters:
+ * Input
+ * - "flags"
+ *       Bits 0-63 reserved
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as returned
+ *       by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output
+ * - R4: "flags"
+ *       Bits 0-59: Reserved
+ *       Bit 60: H_INT_ESB must be used for Event State Buffer
+ *               management
+ *       Bit 61: 1 == LSI  0 == MSI
+ *       Bit 62: the full function page supports trigger
+ *       Bit 63: Store EOI Supported
+ * - R5: Logical Real address of full function Event State Buffer
+ *       management page, -1 if ESB hcall flag is set to 1.
+ * - R6: Logical Real Address of trigger only Event State Buffer
+ *       management page or -1.
+ * - R7: Power of 2 page size for the ESB management pages returned in
+ *       R5 and R6.
+ */
+static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
+                                          sPAPRMachineState *spapr,
+                                          target_ulong opcode,
+                                          target_ulong *args)
+{
+    target_ulong flags  = args[0];
+    target_ulong lisn   = args[1];
+    XiveICSState *xs;
+    uint32_t srcno;
+    uint64_t mmio_base;
+    ICSIRQState *irq;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    xs = xive_ics_find(spapr, lisn);
+    if (!xs) {
+        return H_P2;
+    }
+
+    srcno = lisn - ICS_BASE(xs)->offset;
+    mmio_base = (uint64_t)xs->esb_base + (1ull << xs->esb_shift) * srcno;
+    irq = &ICS_BASE(xs)->irqs[srcno];
+
+    args[0] = 0;
+    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
+        args[0] |= XIVE_SRC_LSI;
+    }
+    if (xs->flags & XIVE_SRC_TRIGGER) {
+        args[0] |= XIVE_SRC_TRIGGER;
+    }
+
+    /* never used in QEMU  */
+    if (xs->flags & XIVE_SRC_H_INT_ESB) {
+        args[1] = -1;
+    } else {
+        args[1] = mmio_base;
+        if (xs->flags & XIVE_SRC_TRIGGER) {
+            args[2] = -1; /* No specific trigger page */
+        } else {
+            args[2] = -1; /* TODO: support for specific trigger page */
+        }
+    }
+
+    args[3] = xs->esb_shift;
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
+ * Interrupt Source to a target. The Logical Interrupt Source is
+ * designated with the "lisn" parameter and the target is designated
+ * with the "target" and "priority" parameters.  Upon return from the
+ * hcall(), no additional interrupts will be directed to the old EQ.
+ * The old EQ should be investigated for interrupts that occurred
+ * prior to or during the hcall().
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-61: Reserved
+ *      Bit 62: set the "eisn" in the EA
+ *      Bit 63: masks the interrupt source in the hardware interrupt
+ *      control structure. An interrupt masked by this mechanism will
+ *      be dropped, but it's source state bits will still be
+ *      set. There is no race-free way of unmasking and restoring the
+ *      source. Thus this should only be used in interrupts that are
+ *      also masked at the source, and only in cases where the
+ *      interrupt is not meant to be used for a large amount of time
+ *      because no valid target exists for it for example
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as returned by
+ *      the H_ALLOCATE_VAS_WINDOW hcall
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *      "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *      "ibm,plat-res-int-priorities"
+ * - "eisn" is the guest EISN associated with the "lisn"
+ *
+ * Output:
+ * - None
+ */
+
+#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
+#define XIVE_SRC_MASK     (1ull << (63 - 63))
+
+static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    XiveIVE *ive;
+    uint64_t new_ive;
+    target_ulong flags    = args[0];
+    target_ulong lisn     = args[1];
+    target_ulong target   = args[2];
+    target_ulong priority = args[3];
+    target_ulong eisn     = args[4];
+    uint32_t eq_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~(XIVE_SRC_SET_EISN | XIVE_SRC_MASK)) {
+        return H_PARAMETER;
+    }
+
+    ive = xive_get_ive(spapr->xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+    new_ive = ive->w;
+
+    /* Let's handle 0xff priority as if the interrupt was masked */
+    if (priority == 0xff || (flags & XIVE_SRC_MASK)) {
+        new_ive |= IVE_MASKED;
+        priority = 7;
+    } else {
+        new_ive = ive->w & ~IVE_MASKED;
+    }
+
+    if (!priority_is_valid(priority)) {
+        return H_P4;
+    }
+
+    /* First find the EQ corresponding to the target */
+    if (!xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
+        return H_P3;
+    }
+
+    /* And update */
+    new_ive = SETFIELD(IVE_EQ_BLOCK, new_ive, 0ul);
+    new_ive = SETFIELD(IVE_EQ_INDEX, new_ive, eq_idx);
+
+    if (flags & XIVE_SRC_SET_EISN) {
+        new_ive = SETFIELD(IVE_EQ_DATA, new_ive, eisn);
+    }
+
+    ive->w = new_ive;
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine to which
+ * target/priority pair is assigned to the specified Logical Interrupt
+ * Source.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63 Reserved
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as
+ *      returned by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output:
+ * - R4: Target to which the specified Logical Interrupt Source is
+ *       assigned
+ * - R5: Priority to which the specified Logical Interrupt Source is
+ *       assigned
+ */
+static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong lisn = args[1];
+    XiveIVE *ive;
+    XiveEQ *eq;
+    uint32_t eq_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    ive = xive_get_ive(spapr->xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
+    eq = xive_get_eq(spapr->xive, eq_idx);
+    if (!eq) {
+        return H_P2;
+    }
+
+    if (ive->w & IVE_MASKED) {
+        args[1] = 0xff;
+    } else {
+        args[1] = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
+    }
+
+    args[0] = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_INFO hcall() is used to get the logical real
+ * address of the notification management page associated with the
+ * specified target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *       Bits 0-63 Reserved
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: Logical real address of notification page
+ * - R5: Power of 2 page size of the notification page
+ */
+static target_ulong h_int_get_queue_info(PowerPCCPU *cpu,
+                                         sPAPRMachineState *spapr,
+                                         target_ulong opcode,
+                                         target_ulong *args)
+{
+    target_ulong flags    = args[0];
+    target_ulong target   = args[1];
+    target_ulong priority = args[2];
+    uint32_t eq_idx;
+    XiveEQ *eq;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    if (!priority_is_valid(priority)) {
+        return H_P3;
+    }
+
+    if (!xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
+        return H_P2;
+    }
+
+    eq = xive_get_eq(spapr->xive, eq_idx);
+    if (!eq)  {
+        return H_PARAMETER;
+    }
+
+    args[0] = -1; /* TODO: return ESn page */
+    if (eq->w0 & EQ_W0_ENQUEUE) {
+        args[1] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
+    } else {
+        args[1] = 0;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_QUEUE_CONFIG hcall() is used to set or reset a EQ for
+ * a given "target" and "priority".  It is also used to set the
+ * notification config associated with the EQ.  An EQ size of 0 is
+ * used to reset the EQ config for a given target and priority. If
+ * resetting the EQ config, the END associated with the given "target"
+ * and "priority" will be changed to disable queueing.
+ *
+ * Upon return from the hcall(), no additional interrupts will be
+ * directed to the old EQ (if one was set).  The old EQ (if one was
+ * set) should be investigated for interrupts that occurred prior to
+ * or during the hcall().
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-62: Reserved
+ *      Bit 63: Unconditional Notify (n) per the XIVE spec
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ * - "eventQueue": The logical real address of the start of the EQ
+ * - "eventQueueSize": The power of 2 EQ size per "ibm,xive-eq-sizes"
+ *
+ * Output:
+ * - None
+ */
+
+#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))
+
+static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    target_ulong flags    = args[0];
+    target_ulong target   = args[1];
+    target_ulong priority = args[2];
+    target_ulong qpage    = args[3];
+    target_ulong qsize    = args[4];
+    uint32_t eq_idx;
+    XiveEQ *old_eq;
+    XiveEQ eq;
+    uint32_t qdata;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~XIVE_EQ_ALWAYS_NOTIFY) {
+        return H_PARAMETER;
+    }
+
+    if (!priority_is_valid(priority)) {
+        return H_P3;
+    }
+
+    if (!xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
+        return H_P2;
+    }
+
+    old_eq = xive_get_eq(spapr->xive, eq_idx);
+    if (!old_eq)  {
+        return H_HARDWARE;
+    }
+
+    eq = *old_eq;
+
+    /* Let's validate the EQ address with a read of first EQ entry */
+    if (address_space_read(&address_space_memory, qpage, MEMTXATTRS_UNSPECIFIED,
+                           (uint8_t *) &qdata, sizeof(qdata))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to read EQ data @0x%"
+                      HWADDR_PRIx "\n", __func__, qpage);
+        return H_P4;
+    }
+
+    switch (qsize) {
+    case 12:
+    case 16:
+    case 21:
+    case 24:
+        eq.w3 = ((uint64_t)qpage) & 0xffffffff;
+        eq.w2 = (((uint64_t)qpage)) >> 32 & 0x0fffffff;
+        eq.w0 |= EQ_W0_ENQUEUE;
+        eq.w0 = SETFIELD(EQ_W0_QSIZE, eq.w0, qsize - 12);
+        break;
+    case 0:
+        eq.w2 = eq.w3 = 0;
+        eq.w0 &= ~EQ_W0_ENQUEUE;
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid EQ size %"PRIx64"\n",
+                      __func__, qsize);
+        return H_P5;
+    }
+
+    /* Ensure the priority and target are correctly set (they will not
+     * be right after allocation
+     */
+    eq.w6 = SETFIELD(EQ_W6_NVT_BLOCK, 0ul, 0ul) |
+        SETFIELD(EQ_W6_NVT_INDEX, 0ul, target);
+    eq.w7 = SETFIELD(EQ_W7_F0_PRIORITY, 0ul, priority);
+
+    /* TODO: depends on notitification page (ESn) from H_INT_GET_QUEUE_INFO */
+    if (flags & XIVE_EQ_ALWAYS_NOTIFY) {
+        eq.w0 |= EQ_W0_UCOND_NOTIFY;
+    }
+
+    eq.w1 = EQ_W1_GENERATION | SETFIELD(EQ_W1_PAGE_OFF, 0ul, 0ul);
+    eq.w0 |= EQ_W0_VALID;
+
+    /* Update EQ */
+    *old_eq = eq;
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_CONFIG hcall() is used to get a EQ for a given
+ * target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: "flags":
+ *       Bits 0-62: Reserved
+ *       Bit 63: The value of Unconditional Notify (n) per the XIVE spec *
+ * - R5: The logical real address of the start of the EQ
+ * - R6: The power of 2 EQ size per "ibm,xive-eq-sizes"
+ */
+static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    target_ulong flags    = args[0];
+    target_ulong target   = args[1];
+    target_ulong priority = args[2];
+    uint32_t eq_idx;
+    XiveEQ *eq;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    if (!priority_is_valid(priority)) {
+        return H_P3;
+    }
+
+    if (!xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
+        return H_P2;
+    }
+
+    eq = xive_get_eq(spapr->xive, eq_idx);
+    if (!eq)  {
+        return H_HARDWARE;
+    }
+
+    if (eq->w0 & EQ_W0_UCOND_NOTIFY) {
+        args[0] = XIVE_EQ_ALWAYS_NOTIFY;
+    } else {
+        args[0] = 0;
+    }
+
+    if (eq->w0 & EQ_W0_ENQUEUE) {
+        args[1] =
+            (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
+        args[2] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
+    } else {
+        args[1] = 0;
+        args[2] = 0;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_OS_REPORTING_LINE hcall() is used to set the
+ * reporting cache line pair for the input "target".  The reporting
+ * cache lines will contain the OS interrupt context when the OS
+ * issues a CI store byte to @TIMA+0xC10 to acknowledge the OS
+ * interrupt. The reporting cache lines can be reset by inputting -1
+ * in "reportingLine".  Issuing the CI store byte without reporting
+ * cache lines registered will result in the data not being accessible
+ * to the OS.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "reportingLine": The logical real address of the reporting cache
+ *    line pair
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_set_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /* TODO: H_INT_SET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_GET_OS_REPORTING_LINE hcall() is used to get the logical
+ * real address of the reporting cache line pair set for the input
+ * "target".  If no reporting cache line pair has been set, -1 is
+ * returned.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "reportingLine": The logical real address of the reporting cache
+ *   line pair
+ *
+ * Output:
+ * - R4: The logical real address of the reporting line if set, else -1
+ */
+static target_ulong h_int_get_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /* TODO: H_INT_GET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_ESB hcall() is used to issue a load or store to the ESB
+ * page for the input "lisn".  This hcall is only supported for LISNs
+ * that have the ESB hcall flag set to 1 when returned from hcall()
+ * H_INT_GET_SOURCE_INFO.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-62: Reserved
+ *      bit 63: Store: Store=1, store operation, else load operation
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *          "ibm,xive-lisn-ranges" properties, or as returned by the
+ *          ibm,query-interrupt-source-number RTAS call, or as
+ *          returned by the H_ALLOCATE_VAS_WINDOW hcall
+ * - "esbOffset" is the offset into the ESB page for the load or store operation
+ * - "storeData" is the data to write for a store operation
+ *
+ * Output:
+ * - R4: R4: The value of the load if load operation, else -1
+ */
+
+#define XIVE_ESB_STORE (1ull << (63 - 63))
+
+static target_ulong h_int_esb(PowerPCCPU *cpu,
+                              sPAPRMachineState *spapr,
+                              target_ulong opcode,
+                              target_ulong *args)
+{
+    target_ulong flags   = args[0];
+    target_ulong lisn    = args[1];
+    target_ulong offset  = args[2];
+    target_ulong data    = args[3];
+    XiveICSState *xs;
+    uint32_t srcno;
+    uint64_t esb_base;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~XIVE_ESB_STORE) {
+        return H_PARAMETER;
+    }
+
+    xs = xive_ics_find(spapr, lisn);
+    if (!xs) {
+        return H_P2;
+    }
+
+    if (offset > (1ull << xs->esb_shift)) {
+        return H_P3;
+    }
+
+    srcno = lisn - ICS_BASE(xs)->offset;
+    esb_base = (uint64_t)xs->esb_base + (1ull << xs->esb_shift) * srcno;
+    esb_base += offset;
+
+    if (dma_memory_rw(&address_space_memory, esb_base, &data, 8,
+                      (flags & XIVE_ESB_STORE))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to rw data @0x%"
+                      HWADDR_PRIx "\n", __func__, esb_base);
+        return H_HARDWARE;
+    }
+    args[0] = (flags & XIVE_ESB_STORE) ? -1 : data;
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SYNC hcall() is used to issue syncs.  Is this IPI sync
+ * and HW sync?  Need the OS teams to let us know what syncs need to
+ * be provided.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_sync(PowerPCCPU *cpu,
+                               sPAPRMachineState *spapr,
+                               target_ulong opcode,
+                               target_ulong *args)
+{
+    target_ulong flags   = args[0];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    /* TODO: H_INT_SYNC, I have no idea what needs to be done */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_RESET hcall() is used to reset all of the partition's
+ * interrupt exploitation structures to their initial state.  This
+ * means losing all previously set interrupt state set via
+ * H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_reset(PowerPCCPU *cpu,
+                                sPAPRMachineState *spapr,
+                                target_ulong opcode,
+                                target_ulong *args)
+{
+    target_ulong flags   = args[0];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    xive_reset(spapr->xive);
+    return H_SUCCESS;
+}
+
+void xive_spapr_init(sPAPRMachineState *spapr)
+{
+    spapr_register_hypercall(H_INT_GET_SOURCE_INFO, h_int_get_source_info);
+    spapr_register_hypercall(H_INT_SET_SOURCE_CONFIG, h_int_set_source_config);
+    spapr_register_hypercall(H_INT_GET_SOURCE_CONFIG, h_int_get_source_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_INFO, h_int_get_queue_info);
+    spapr_register_hypercall(H_INT_SET_QUEUE_CONFIG, h_int_set_queue_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_CONFIG, h_int_get_queue_config);
+    spapr_register_hypercall(H_INT_SET_OS_REPORTING_LINE,
+                             h_int_set_os_reporting_line);
+    spapr_register_hypercall(H_INT_GET_OS_REPORTING_LINE,
+                             h_int_get_os_reporting_line);
+    spapr_register_hypercall(H_INT_ESB, h_int_esb);
+    spapr_register_hypercall(H_INT_SYNC, h_int_sync);
+    spapr_register_hypercall(H_INT_RESET, h_int_reset);
+}
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a66bbac35242..dd69c084baa6 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -13,6 +13,7 @@ struct sPAPRPHBState;
 struct sPAPRNVRAM;
 typedef struct sPAPREventLogEntry sPAPREventLogEntry;
 typedef struct sPAPREventSource sPAPREventSource;
+typedef struct XIVE XIVE;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -115,6 +116,7 @@ struct sPAPRMachineState {
     MemoryHotplugState hotplug_memory;
 
     const char *icp_type;
+    XIVE    *xive;
 };
 
 #define H_SUCCESS         0
@@ -371,7 +373,20 @@ struct sPAPRMachineState {
 #define H_INVALIDATE_PID        0x378
 #define H_REGISTER_PROC_TBL     0x37C
 #define H_SIGNAL_SYS_RESET      0x380
-#define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
+
+#define H_INT_GET_SOURCE_INFO   0x3A8
+#define H_INT_SET_SOURCE_CONFIG 0x3AC
+#define H_INT_GET_SOURCE_CONFIG 0x3B0
+#define H_INT_GET_QUEUE_INFO    0x3B4
+#define H_INT_SET_QUEUE_CONFIG  0x3B8
+#define H_INT_GET_QUEUE_CONFIG  0x3BC
+#define H_INT_SET_OS_REPORTING_LINE 0x3C0
+#define H_INT_GET_OS_REPORTING_LINE 0x3C4
+#define H_INT_ESB               0x3C8
+#define H_INT_SYNC              0x3CC
+#define H_INT_RESET             0x3D0
+
+#define MAX_HCALL_OPCODE        H_INT_RESET
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index f87df8107dd9..af48d62cc776 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -63,4 +63,8 @@ struct XiveICPState {
     uint8_t *tima_os;
 };
 
+typedef struct sPAPRMachineState sPAPRMachineState;
+
+void xive_spapr_init(sPAPRMachineState *spapr);
+
 #endif /* PPC_XIVE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 18/26] ppc/xive: add device tree support
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (16 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 17/26] ppc/xive: add hcalls support Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 19/26] ppc/xive: introduce a helper to map the XIVE memory regions Cédric Le Goater
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

As for XICS, the XIVE interface for the guest is described in the
device tree under the interrupt controller node. A couple of new
properties are specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   managnement areas (TIMA) for the user level for the OS level. Only
   the OS level is taken into account.

 - "ibm,xive-eq-sizes"

   the size of the event queues.

 - "ibm,xive-lisn-ranges"

   the interrupt numbers ranges assigned to the guest. These are
   allocated using a simple bitmap.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive_spapr.c  | 36 ++++++++++++++++++++++++++++++++++++
 include/hw/ppc/xive.h |  1 +
 2 files changed, 37 insertions(+)

diff --git a/hw/intc/xive_spapr.c b/hw/intc/xive_spapr.c
index b634d1f28f10..64282cb4bfab 100644
--- a/hw/intc/xive_spapr.c
+++ b/hw/intc/xive_spapr.c
@@ -743,3 +743,39 @@ void xive_spapr_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_INT_SYNC, h_int_sync);
     spapr_register_hypercall(H_INT_RESET, h_int_reset);
 }
+
+void xive_spapr_populate(XIVE *x, void *fdt)
+{
+    int node;
+    uint64_t timas[2 * 2];
+    uint32_t lisn_ranges[] = {
+        cpu_to_be32(x->int_ipi_top - x->int_base - x->nr_targets),  /* start */
+        cpu_to_be32(x->nr_targets),  /* count */
+    };
+    uint32_t eq_sizes[] = {
+        cpu_to_be32(12), /* 4K */
+        cpu_to_be32(16), /* 64K */
+        cpu_to_be32(21), /* 2M */
+        cpu_to_be32(24), /* 16M */
+    };
+    int i;
+
+    /* Thread Interrupt Management Areas : User and OS */
+    for (i = 0; i < 2; i++) {
+        timas[i * 2] = cpu_to_be64(x->tm_base + i * (1 << x->tm_shift));
+        timas[i * 2 + 1] = cpu_to_be64(1 << x->tm_shift);
+    }
+
+    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
+
+    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));
+    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
+
+    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
+    _FDT(fdt_setprop_cell(fdt, node, "#interrupt-cells", 2));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
+                     sizeof(eq_sizes)));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
+                     sizeof(lisn_ranges)));
+}
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index af48d62cc776..288116aeb8f4 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -66,5 +66,6 @@ struct XiveICPState {
 typedef struct sPAPRMachineState sPAPRMachineState;
 
 void xive_spapr_init(sPAPRMachineState *spapr);
+void xive_spapr_populate(XIVE *x, void *fdt);
 
 #endif /* PPC_XIVE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 19/26] ppc/xive: introduce a helper to map the XIVE memory regions
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (17 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 18/26] ppc/xive: add device tree support Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-25  2:54   ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 20/26] ppc/xive: introduce a helper to create XIVE interrupt source objects Cédric Le Goater
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

It will be used when the guest chooses the XIVE exploitation mode in
CAS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c        | 11 +++++++++++
 include/hw/ppc/xive.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index cda1fa18e44d..895dd2b2f61b 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -915,3 +915,14 @@ bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t priority,
 
     return true;
 }
+
+void xive_mmio_map(XIVE *x)
+{
+    /* ESBs */
+    sysbus_mmio_map(SYS_BUS_DEVICE(x), 0, x->vc_base);
+
+    /* Thread Management Interrupt Areas */
+    /* TODO: Only map the OS TIMA for the moment. Mapping the whole
+     * region needs some rework in the handlers */
+    sysbus_mmio_map(SYS_BUS_DEVICE(x), 1, x->tm_base + (1 << x->tm_shift));
+}
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 288116aeb8f4..560f6ab66f73 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -68,4 +68,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
 void xive_spapr_init(sPAPRMachineState *spapr);
 void xive_spapr_populate(XIVE *x, void *fdt);
 
+void xive_mmio_map(XIVE *x);
+
 #endif /* PPC_XIVE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 20/26] ppc/xive: introduce a helper to create XIVE interrupt source objects
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (18 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 19/26] ppc/xive: introduce a helper to map the XIVE memory regions Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 21/26] ppc/xive: introduce routines to allocate IRQ numbers Cédric Le Goater
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c        | 21 +++++++++++++++++++++
 include/hw/ppc/xive.h |  4 ++++
 2 files changed, 25 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 895dd2b2f61b..bec123649ebd 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -726,6 +726,27 @@ static const TypeInfo xive_ics_info = {
     .class_init = xive_ics_class_init,
 };
 
+void xive_ics_create(XiveICSState *xs, XIVE *x, uint32_t offset,
+                     uint32_t nr_irqs, uint32_t shift,
+                     uint32_t flags, Error **errp)
+{
+    Error *error = NULL;
+
+    object_property_add_const_link(OBJECT(xs), "xive", OBJECT(x),
+                                   &error_fatal);
+    object_property_add_const_link(OBJECT(xs), "xics",
+                                   OBJECT(qdev_get_machine()), &error_fatal);
+    object_property_set_int(OBJECT(xs), shift, "shift", &error_fatal);
+    object_property_set_int(OBJECT(xs), flags, "flags", &error_fatal);
+    object_property_set_int(OBJECT(xs), offset, "irq-base", &error_fatal);
+    object_property_set_int(OBJECT(xs), nr_irqs, "nr-irqs", &error_fatal);
+    object_property_set_bool(OBJECT(xs), true, "realized", &error);
+    if (error) {
+        error_propagate(errp, error);
+        return;
+    }
+}
+
 /*
  * Main XIVE object
  */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 560f6ab66f73..a1c7797658ba 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -70,4 +70,8 @@ void xive_spapr_populate(XIVE *x, void *fdt);
 
 void xive_mmio_map(XIVE *x);
 
+void xive_ics_create(XiveICSState *xs, XIVE *x, uint32_t offset,
+                     uint32_t nr_irqs, uint32_t shift, uint32_t flags,
+                     Error **errp);
+
 #endif /* PPC_XIVE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 21/26] ppc/xive: introduce routines to allocate IRQ numbers
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (19 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 20/26] ppc/xive: introduce a helper to create XIVE interrupt source objects Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 22/26] ppc/xive: create an XIVE interrupt source to handle IPIs Cédric Le Goater
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

The IRQ number allocator is inspired by OPAL which allocates IPI IRQ
numbers from the bottom of the IRQ number space and allocates the HW
IRQ numbers from the top.

So, this might be slightly overkill for our need. Needs to be
discussed.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c        | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/xive.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index bec123649ebd..42eefbe7fd65 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -748,6 +748,59 @@ void xive_ics_create(XiveICSState *xs, XIVE *x, uint32_t offset,
 }
 
 /*
+ * IRQ number allocators
+ */
+uint32_t xive_alloc_hw_irqs(XIVE *x, uint32_t count, uint32_t align)
+{
+    uint32_t base;
+    int i;
+
+    base = x->int_hw_bot - count;
+    base &= ~(align - 1);
+    if (base < x->int_ipi_top) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "XIVE: HW alloc request for %d interrupts "
+                      "aligned to %d failed\n",
+                      count, align);
+        return -1;
+    }
+
+    x->int_hw_bot = base;
+
+    for (i = 0; i < count; i++) {
+        XiveIVE *ive = xive_get_ive(x, base + i);
+
+        ive->w = IVE_VALID | IVE_MASKED;
+    }
+    return base;
+}
+
+static uint32_t xive_alloc_ipi_irqs(XIVE *x, uint32_t count, uint32_t align)
+{
+    uint32_t base;
+    int i;
+
+    base = x->int_ipi_top + (align - 1);
+    base &= ~(align - 1);
+    if (base >= x->int_hw_bot) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "IPI alloc request for %d interrupts aligned to %d "
+                      "failed\n",
+                      count, align);
+                return -1;
+    }
+
+    x->int_ipi_top = base + count;
+
+    for (i = 0; i < count; i++) {
+        XiveIVE *ive = xive_get_ive(x, base + i);
+
+        ive->w = IVE_VALID | IVE_MASKED;
+    }
+    return base;
+}
+
+/*
  * Main XIVE object
  */
 
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index a1c7797658ba..3c1cd96ea4d0 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -69,6 +69,7 @@ void xive_spapr_init(sPAPRMachineState *spapr);
 void xive_spapr_populate(XIVE *x, void *fdt);
 
 void xive_mmio_map(XIVE *x);
+uint32_t xive_alloc_hw_irqs(XIVE *x, uint32_t count, uint32_t align);
 
 void xive_ics_create(XiveICSState *xs, XIVE *x, uint32_t offset,
                      uint32_t nr_irqs, uint32_t shift, uint32_t flags,
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 22/26] ppc/xive: create an XIVE interrupt source to handle IPIs
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (20 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 21/26] ppc/xive: introduce routines to allocate IRQ numbers Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 23/26] spapr: add a XIVE object to the sPAPR machine Cédric Le Goater
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Isolate the IPIs in their own interrupt source.

This is not strictly needed for sPAPR, but it might useful for
PowerNV.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive-internal.h |  2 ++
 hw/intc/xive.c          | 24 +++++++++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
index 5e8b78a1ea6a..f37e07f00038 100644
--- a/hw/intc/xive-internal.h
+++ b/hw/intc/xive-internal.h
@@ -204,6 +204,8 @@ struct XIVE {
     uint32_t     tm_shift;
     hwaddr       tm_base;
     MemoryRegion tm_iomem;
+
+    XiveICSState ipi_xs;
 };
 
 void xive_reset(void *dev);
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 42eefbe7fd65..257b324e1d32 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -821,6 +821,9 @@ static uint32_t xive_alloc_ipi_irqs(XIVE *x, uint32_t count, uint32_t align)
 #define TM_SHIFT         16
 #define TM_BAR_SIZE      (XIVE_TM_RING_COUNT * (1 << TM_SHIFT))
 
+/* One 64k page. OPAL has two */
+#define IPI_ESB_SHIFT    (16)
+
 static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
 {
     qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
@@ -863,12 +866,18 @@ void xive_reset(void *dev)
 
 static void xive_init(Object *obj)
 {
-    ;
+    XIVE *x = XIVE(obj);
+
+    object_initialize(&x->ipi_xs, sizeof(x->ipi_xs), TYPE_ICS_XIVE);
+    object_property_add_child(obj, "ipis", OBJECT(&x->ipi_xs), NULL);
 }
 
 static void xive_realize(DeviceState *dev, Error **errp)
 {
+    Error *error = NULL;
     XIVE *x = XIVE(dev);
+    uint32_t ipi_base;
+    int i;
 
     if (!x->nr_targets) {
         error_setg(errp, "Number of interrupt targets needs to be greater 0");
@@ -917,6 +926,19 @@ static void xive_realize(DeviceState *dev, Error **errp)
                           "xive.tm", TM_BAR_SIZE);
     sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->tm_iomem);
 
+    /* IPI source */
+    ipi_base = xive_alloc_ipi_irqs(x, x->nr_targets, 1);
+
+    xive_ics_create(&x->ipi_xs, x, ipi_base, x->nr_targets,
+                    IPI_ESB_SHIFT, XIVE_SRC_TRIGGER, &error);
+    if (error) {
+        error_propagate(errp, error);
+        return;
+    }
+
+    for (i = 0; i < ICS_BASE(&x->ipi_xs)->nr_irqs; i++) {
+        ics_set_irq_type(ICS_BASE(&x->ipi_xs), i, false);
+    }
     qemu_register_reset(xive_reset, dev);
 }
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 23/26] spapr: add a XIVE object to the sPAPR machine
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (21 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 22/26] ppc/xive: create an XIVE interrupt source to handle IPIs Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 24/26] spapr: include the XIVE interrupt source for IPIs Cédric Le Goater
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Let's create the XIVE object whether it used or not by the
machine. CAS will decide which model will be used for the interrupt
controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0256e7a537bf..45527b4c5eca 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -54,6 +54,7 @@
 #include "hw/ppc/spapr_vio.h"
 #include "hw/pci-host/spapr.h"
 #include "hw/ppc/xics.h"
+#include "hw/ppc/xive.h"
 #include "hw/pci/msi.h"
 
 #include "hw/pci/pci.h"
@@ -204,6 +205,38 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
     }
 }
 
+static XIVE *spapr_xive_create(sPAPRMachineState *spapr, int nr_servers,
+                               Error **errp)
+{
+    Error *local_err = NULL;
+    Object *obj;
+
+    /* TODO: We don't have KVM support yet so check irqchip=off here */
+    if (kvm_enabled() && machine_kernel_irqchip_required(MACHINE(spapr))) {
+        error_prepend(errp, "kernel_irqchip requested but unavailable");
+        return NULL;
+    }
+
+    obj = object_new(TYPE_XIVE);
+    object_property_add_child(OBJECT(spapr), "xive", obj, &error_abort);
+    object_property_set_int(obj, nr_servers, "nr-targets", &local_err);
+    if (local_err) {
+        goto error;
+    }
+    object_property_set_bool(obj, true, "realized", &local_err);
+    if (local_err) {
+        goto error;
+    }
+
+    /* Install hcalls */
+    xive_spapr_init(spapr);
+
+    return XIVE(obj);
+error:
+    error_propagate(errp, local_err);
+    return NULL;
+}
+
 static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
                                   int smt_threads)
 {
@@ -2192,6 +2225,14 @@ static void ppc_spapr_init(MachineState *machine)
     /* Set up Interrupt Controller before we create the VCPUs */
     xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal);
 
+    /* Set up XIVE. CAS will choose whether the guest runs in XICS
+     * (legacy mode) or XIVE Exploitation mode
+     *
+     * TODO: if XIVE creation fails, force the use of XICS legacy
+     */
+    spapr->xive = spapr_xive_create(spapr, xics_max_server_number(),
+                                    &error_fatal);
+
     /* Set up containers for ibm,client-set-architecture negotiated options */
     spapr->ov5 = spapr_ovec_new();
     spapr->ov5_cas = spapr_ovec_new();
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 24/26] spapr: include the XIVE interrupt source for IPIs
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (22 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 23/26] spapr: add a XIVE object to the sPAPR machine Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 25/26] spapr: print the XIVE interrupt source for IPIs in the monitor Cédric Le Goater
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive_spapr.c  | 10 ++++++++++
 hw/ppc/spapr.c        | 11 ++++++++++-
 include/hw/ppc/xive.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/intc/xive_spapr.c b/hw/intc/xive_spapr.c
index 64282cb4bfab..eb8a5c081e51 100644
--- a/hw/intc/xive_spapr.c
+++ b/hw/intc/xive_spapr.c
@@ -26,6 +26,16 @@
 
 #include "xive-internal.h"
 
+/*
+ * Used by the XICSFabric ics_get handler in sPAPR
+ */
+ICSState *xive_ics_get(XIVE *x, uint32_t lisn)
+{
+    ICSState *ics = ICS_BASE(&x->ipi_xs);
+
+    return ics_valid_irq(ics, lisn) ? ics : NULL;
+}
+
 static XiveICSState *xive_ics_find(sPAPRMachineState *spapr, uint32_t lisn)
 {
     XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(spapr);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 45527b4c5eca..816661f4c9ad 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3361,7 +3361,16 @@ static ICSState *spapr_ics_get(XICSFabric *dev, int irq)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(dev);
 
-    return ics_valid_irq(spapr->ics, irq) ? spapr->ics : NULL;
+    if (ics_valid_irq(spapr->ics, irq)) {
+        return spapr->ics;
+    }
+
+    /* If needed, check the XIVE IPI source also */
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return xive_ics_get(spapr->xive, irq);
+    }
+
+    return NULL;
 }
 
 static void spapr_ics_resend(XICSFabric *dev)
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 3c1cd96ea4d0..dc5309264422 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -74,5 +74,6 @@ uint32_t xive_alloc_hw_irqs(XIVE *x, uint32_t count, uint32_t align);
 void xive_ics_create(XiveICSState *xs, XIVE *x, uint32_t offset,
                      uint32_t nr_irqs, uint32_t shift, uint32_t flags,
                      Error **errp);
+ICSState *xive_ics_get(XIVE *x, uint32_t lisn);
 
 #endif /* PPC_XIVE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 25/26] spapr: print the XIVE interrupt source for IPIs in the monitor
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (23 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 24/26] spapr: include the XIVE interrupt source for IPIs Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 26/26] spapr: force XIVE exploitation mode for POWER9 (HACK) Cédric Le Goater
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive_spapr.c  | 5 +++++
 hw/ppc/spapr.c        | 4 ++++
 include/hw/ppc/xive.h | 1 +
 3 files changed, 10 insertions(+)

diff --git a/hw/intc/xive_spapr.c b/hw/intc/xive_spapr.c
index eb8a5c081e51..4f689f8b97c0 100644
--- a/hw/intc/xive_spapr.c
+++ b/hw/intc/xive_spapr.c
@@ -36,6 +36,11 @@ ICSState *xive_ics_get(XIVE *x, uint32_t lisn)
     return ics_valid_irq(ics, lisn) ? ics : NULL;
 }
 
+void xive_ics_pic_print_info(XIVE *x, Monitor *mon)
+{
+    ics_pic_print_info(ICS_BASE(&x->ipi_xs), mon);
+}
+
 static XiveICSState *xive_ics_find(sPAPRMachineState *spapr, uint32_t lisn)
 {
     XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(spapr);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 816661f4c9ad..ca3a6bc2ea16 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3399,6 +3399,10 @@ static void spapr_pic_print_info(InterruptStatsProvider *obj,
         icp_pic_print_info(ICP(cpu->intc), mon);
     }
 
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        xive_ics_pic_print_info(spapr->xive, mon);
+    }
+
     ics_pic_print_info(spapr->ics, mon);
 }
 
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index dc5309264422..ee9b32d8c884 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -75,5 +75,6 @@ void xive_ics_create(XiveICSState *xs, XIVE *x, uint32_t offset,
                      uint32_t nr_irqs, uint32_t shift, uint32_t flags,
                      Error **errp);
 ICSState *xive_ics_get(XIVE *x, uint32_t lisn);
+void xive_ics_pic_print_info(XIVE *x, Monitor *mon);
 
 #endif /* PPC_XIVE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [Qemu-devel] [RFC PATCH 26/26] spapr: force XIVE exploitation mode for POWER9 (HACK)
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (24 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 25/26] spapr: print the XIVE interrupt source for IPIs in the monitor Cédric Le Goater
@ 2017-07-05 17:13 ` Cédric Le Goater
  2017-07-25  2:43   ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
  2017-07-10 10:24 ` [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) David Gibson
  2017-07-19  3:00 ` David Gibson
  27 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel,
	Cédric Le Goater

The CAS negotiation process determines the interrupt controller model
to use in the guest but currently, the sPAPR machine make uses of the
controller very early in the initialization sequence. The interrupt
source is used to allocate IRQ numbers and populate the device tree
and the interrupt presenter objects are created along with the CPU.

One solution would be use a bitmap to allocate these IRQ numbers and
then instantiate the interrupt source object of the correct type with
the bitmap as a constructor parameter.

As for the interrupt presenter objects, we could allocated them later
in the boot process. May be on demand, when a CPU is first notified.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ca3a6bc2ea16..623fc776c886 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -237,6 +237,38 @@ error:
     return NULL;
 }
 
+static XiveICSState *spapr_xive_ics_create(XIVE *x, int nr_irqs, Error **errp)
+{
+    Error *local_err = NULL;
+    int irq_base;
+    Object *obj;
+
+    /*
+     * TODO: use an XICS_IRQ_BASE alignment to be in sync with XICS
+     * irq numbers. we should probably simplify the XIVE model or use
+     * a common allocator. a bitmap maybe ?
+     */
+    irq_base = xive_alloc_hw_irqs(x, nr_irqs, XICS_IRQ_BASE);
+    if (irq_base < 0) {
+        error_setg(errp, "Failed to allocate %d irqs", nr_irqs);
+        return NULL;
+    }
+
+    obj = object_new(TYPE_ICS_XIVE);
+    object_property_add_child(OBJECT(x), "hw", obj, NULL);
+
+    xive_ics_create(ICS_XIVE(obj), x, irq_base, nr_irqs, 16 /* 64KB page */,
+                    XIVE_SRC_TRIGGER, &local_err);
+    if (local_err) {
+        goto error;
+    }
+    return ICS_XIVE(obj);
+
+error:
+    error_propagate(errp, local_err);
+    return NULL;
+}
+
 static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
                                   int smt_threads)
 {
@@ -814,6 +846,11 @@ static int spapr_dt_cas_updates(sPAPRMachineState *spapr, void *fdt,
     /* /interrupt controller */
     if (!spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)) {
         spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
+    } else {
+        xive_spapr_populate(spapr->xive, fdt);
+
+        /* Install XIVE MMIOs */
+        xive_mmio_map(spapr->xive);
     }
 
     offset = fdt_path_offset(fdt, "/chosen");
@@ -963,6 +1000,13 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
         } else {
             val[3] = 0x00; /* Hash */
         }
+
+        /* TODO: introduce a kvmppc_has_cap_xive() ? Works with
+         * irqchip=off for now
+         */
+        if (first_ppc_cpu->env.excp_model & POWERPC_EXCP_POWER9) {
+            val[1] = 0x01;
+        }
     } else {
         if (first_ppc_cpu->env.mmu_model & POWERPC_MMU_V3) {
             /* V3 MMU supports both hash and radix (with dynamic switching) */
@@ -971,6 +1015,9 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
             /* Otherwise we can only do hash */
             val[3] = 0x00;
         }
+        if (first_ppc_cpu->env.excp_model & POWERPC_EXCP_POWER9) {
+            val[1] = 0x01;
+        }
     }
     _FDT(fdt_setprop(fdt, chosen, "ibm,arch-vec-5-platform-support",
                      val, sizeof(val)));
@@ -2237,6 +2284,21 @@ static void ppc_spapr_init(MachineState *machine)
     spapr->ov5 = spapr_ovec_new();
     spapr->ov5_cas = spapr_ovec_new();
 
+    /* TODO: force XIVE mode by default on POWER9.
+     *
+     * Switching from XICS to XIVE is badly broken. The ICP type is
+     * incorrect and the ICS is needed before the CAS negotiation to
+     * allocate irq numbers ...
+     */
+    if (strstr(machine->cpu_model, "POWER9") ||
+        !strcmp(machine->cpu_model, "host")) {
+        spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
+
+        spapr->icp_type = TYPE_XIVE_ICP;
+        spapr->ics = ICS_BASE(
+            spapr_xive_ics_create(spapr->xive, XICS_IRQS_SPAPR, &error_fatal));
+    }
+
     if (smc->dr_lmb_enabled) {
         spapr_ovec_set(spapr->ov5, OV5_DRCONF_MEMORY);
         spapr_validate_node_memory(machine, &error_fatal);
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (25 preceding siblings ...)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 26/26] spapr: force XIVE exploitation mode for POWER9 (HACK) Cédric Le Goater
@ 2017-07-10 10:24 ` David Gibson
  2017-07-10 12:36   ` Cédric Le Goater
  2017-07-19  3:00 ` David Gibson
  27 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-10 10:24 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1666 bytes --]

On Wed, Jul 05, 2017 at 07:13:13PM +0200, Cédric Le Goater wrote:
> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> negotiation process determines whether the guest operates with an
> interrupt controller using the XICS legacy model, as found on POWER8,
> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> patchset is a first proposal to add XIVE support in the sPAPR machine.
> 
> The first patches introduce the XIVE exploitation mode in CAS.
> 
> Follow models for the XIVE interrupt controller, source and presenter.
> We try to reuse the ICS and ICP models of XICS because the sPAPR
> machine is tied to the XICSFabric interface and should be using a
> common framework to be able to switch from one controller model to
> another. To be discussed of course.
> 
> Then comes support for the Hypervisor's call which are used to
> configure the interrupt sources and the event/notification queues of
> the guest.
> 
> Finally, the last patches try to integrate the XIVE interrupt model in
> the sPAPR machine and this not without a couple of serious hacks to
> have something to test. See 'Caveats' below for more details.
> 
> This is a first draft and I expect a lot of rewrite before it reaches
> mainline QEMU. Nevertheless, it compiles, boots and can be used for
> some testing.

1 & 2 are straightforward enough that I've applied them already.  The
rest will take longer to review, obviously.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9 Cédric Le Goater
@ 2017-07-10 10:26   ` David Gibson
  2017-07-10 12:49     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-10 10:26 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4104 bytes --]

On Wed, Jul 05, 2017 at 07:13:16PM +0200, Cédric Le Goater wrote:
> Prepare ground for the new exception model XIVE of POWER9.

I'm a bit confused by this.  The excp_model is about the CPU core's
irq model, not the external irq controller's.

Now.. I could imagine the POWER9 having a different core model that
came along with XIVE, but I can't see this new model being used for
anything anywhere in the rest of the series.

> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  target/ppc/cpu-qom.h        | 2 ++
>  target/ppc/excp_helper.c    | 9 ++++++---
>  target/ppc/translate.c      | 3 ++-
>  target/ppc/translate_init.c | 2 +-
>  4 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> index d0cf6ca2a971..d7b78cf3f71c 100644
> --- a/target/ppc/cpu-qom.h
> +++ b/target/ppc/cpu-qom.h
> @@ -132,6 +132,8 @@ enum powerpc_excp_t {
>      POWERPC_EXCP_POWER7,
>      /* POWER8 exception model           */
>      POWERPC_EXCP_POWER8,
> +    /* POWER9 exception model           */
> +    POWERPC_EXCP_POWER9,
>  };
>  
>  /*****************************************************************************/
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 3a9f0861e773..dc7dff36a580 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -148,9 +148,11 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>       */
>  #if defined(TARGET_PPC64)
>      if (excp_model == POWERPC_EXCP_POWER7 ||
> -        excp_model == POWERPC_EXCP_POWER8) {
> +        excp_model == POWERPC_EXCP_POWER8 ||
> +        excp_model == POWERPC_EXCP_POWER9) {
>          lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
> -        if (excp_model == POWERPC_EXCP_POWER8) {
> +        if (excp_model == POWERPC_EXCP_POWER8 ||
> +            excp_model == POWERPC_EXCP_POWER9) {
>              ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
>          } else {
>              ail = 0;
> @@ -651,7 +653,8 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>          if (!(new_msr & MSR_HVB) && (env->spr[SPR_LPCR] & LPCR_ILE)) {
>              new_msr |= (target_ulong)1 << MSR_LE;
>          }
> -    } else if (excp_model == POWERPC_EXCP_POWER8) {
> +    } else if (excp_model == POWERPC_EXCP_POWER8 ||
> +               excp_model == POWERPC_EXCP_POWER9) {
>          if (new_msr & MSR_HVB) {
>              if (env->spr[SPR_HID0] & HID0_HILE) {
>                  new_msr |= (target_ulong)1 << MSR_LE;
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index c0cd64d927c2..2d8c1b9e6836 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -7064,7 +7064,8 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>  
>  #if defined(TARGET_PPC64)
>      if (env->excp_model == POWERPC_EXCP_POWER7 ||
> -        env->excp_model == POWERPC_EXCP_POWER8) {
> +        env->excp_model == POWERPC_EXCP_POWER8 ||
> +        env->excp_model == POWERPC_EXCP_POWER9) {
>          cpu_fprintf(f, "HSRR0 " TARGET_FMT_lx " HSRR1 " TARGET_FMT_lx "\n",
>                      env->spr[SPR_HSRR0], env->spr[SPR_HSRR1]);
>      }
> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> index 53aff5a7b734..b8c7b8150318 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8962,7 +8962,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
>      pcc->sps = &POWER7_POWER8_sps;
>      pcc->radix_page_info = &POWER9_radix_page_info;
>  #endif
> -    pcc->excp_model = POWERPC_EXCP_POWER8;
> +    pcc->excp_model = POWERPC_EXCP_POWER9;
>      pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
>      pcc->bfd_mach = bfd_mach_ppc64;
>      pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
  2017-07-10 10:24 ` [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) David Gibson
@ 2017-07-10 12:36   ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-10 12:36 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/10/2017 12:24 PM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:13PM +0200, Cédric Le Goater wrote:
>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
>> negotiation process determines whether the guest operates with an
>> interrupt controller using the XICS legacy model, as found on POWER8,
>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
>> patchset is a first proposal to add XIVE support in the sPAPR machine.
>>
>> The first patches introduce the XIVE exploitation mode in CAS.
>>
>> Follow models for the XIVE interrupt controller, source and presenter.
>> We try to reuse the ICS and ICP models of XICS because the sPAPR
>> machine is tied to the XICSFabric interface and should be using a
>> common framework to be able to switch from one controller model to
>> another. To be discussed of course.
>>
>> Then comes support for the Hypervisor's call which are used to
>> configure the interrupt sources and the event/notification queues of
>> the guest.
>>
>> Finally, the last patches try to integrate the XIVE interrupt model in
>> the sPAPR machine and this not without a couple of serious hacks to
>> have something to test. See 'Caveats' below for more details.
>>
>> This is a first draft and I expect a lot of rewrite before it reaches
>> mainline QEMU. Nevertheless, it compiles, boots and can be used for
>> some testing.
> 
> 1 & 2 are straightforward enough that I've applied them already.  The
> rest will take longer to review, obviously.

For sure ... I don't expect anything soon. This is really a first 
draft to show the differences with XICS in the overall mechanics. 
The guest boots and perf are OK but the integration with the sPAPR 
machine is a mess. I also think the IRQ allocator is too complex 
for the sPAPR need and the Xive ICP object is useless. The changelogs 
are too short. 

I have continued working on CAS support and have found a solution
which allows a guest to switch interrupt controller: XICS <-> XIVE, 
under TCG and under KVM,kernel_irqchip=off. 

The XIVE ICP lives under ICPState for ease of use. As for the ICS, 
two different objects, XIVE and XICS, are maintained under the 
sPAPR machine in which the 'irqs' array needs to be synced when 
changing model. It's not too much of a hack I think and it is 
migration friendly. We will see when discussed.

I have pushed on github these changes and I am now exploring the 
abyssal zone of migration and cpu hot-plugging.

Cheers,

C.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9
  2017-07-10 10:26   ` David Gibson
@ 2017-07-10 12:49     ` Cédric Le Goater
  2017-07-10 21:00       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-10 12:49 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/10/2017 12:26 PM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:16PM +0200, Cédric Le Goater wrote:
>> Prepare ground for the new exception model XIVE of POWER9.
> 
> I'm a bit confused by this.  The excp_model is about the CPU core's
> irq model, not the external irq controller's.

yes this is true, but the POWER9 CPU is the only criteria we have 
to distinguish a machine supporting XIVE and XICS from one only 
supporting XICS.

My idea was to use this flag to activate the OV5_XIVE_EXPLOIT bit 
in ibm,arch-vec-5-platform-support ov5_platform, like this is done
for the MMU. See spapr_dt_ov5_platform_support()
 
> Now.. I could imagine the POWER9 having a different core model that
> came along with XIVE, but I can't see this new model being used for
> anything anywhere in the rest of the series.

See patch 26. But, maybe, I am taking a shortcut and we need another
family of flags. 

Thanks,

C. 

>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  target/ppc/cpu-qom.h        | 2 ++
>>  target/ppc/excp_helper.c    | 9 ++++++---
>>  target/ppc/translate.c      | 3 ++-
>>  target/ppc/translate_init.c | 2 +-
>>  4 files changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
>> index d0cf6ca2a971..d7b78cf3f71c 100644
>> --- a/target/ppc/cpu-qom.h
>> +++ b/target/ppc/cpu-qom.h
>> @@ -132,6 +132,8 @@ enum powerpc_excp_t {
>>      POWERPC_EXCP_POWER7,
>>      /* POWER8 exception model           */
>>      POWERPC_EXCP_POWER8,
>> +    /* POWER9 exception model           */
>> +    POWERPC_EXCP_POWER9,
>>  };
>>  
>>  /*****************************************************************************/
>> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
>> index 3a9f0861e773..dc7dff36a580 100644
>> --- a/target/ppc/excp_helper.c
>> +++ b/target/ppc/excp_helper.c
>> @@ -148,9 +148,11 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>>       */
>>  #if defined(TARGET_PPC64)
>>      if (excp_model == POWERPC_EXCP_POWER7 ||
>> -        excp_model == POWERPC_EXCP_POWER8) {
>> +        excp_model == POWERPC_EXCP_POWER8 ||
>> +        excp_model == POWERPC_EXCP_POWER9) {
>>          lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
>> -        if (excp_model == POWERPC_EXCP_POWER8) {
>> +        if (excp_model == POWERPC_EXCP_POWER8 ||
>> +            excp_model == POWERPC_EXCP_POWER9) {
>>              ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
>>          } else {
>>              ail = 0;
>> @@ -651,7 +653,8 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>>          if (!(new_msr & MSR_HVB) && (env->spr[SPR_LPCR] & LPCR_ILE)) {
>>              new_msr |= (target_ulong)1 << MSR_LE;
>>          }
>> -    } else if (excp_model == POWERPC_EXCP_POWER8) {
>> +    } else if (excp_model == POWERPC_EXCP_POWER8 ||
>> +               excp_model == POWERPC_EXCP_POWER9) {
>>          if (new_msr & MSR_HVB) {
>>              if (env->spr[SPR_HID0] & HID0_HILE) {
>>                  new_msr |= (target_ulong)1 << MSR_LE;
>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>> index c0cd64d927c2..2d8c1b9e6836 100644
>> --- a/target/ppc/translate.c
>> +++ b/target/ppc/translate.c
>> @@ -7064,7 +7064,8 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>>  
>>  #if defined(TARGET_PPC64)
>>      if (env->excp_model == POWERPC_EXCP_POWER7 ||
>> -        env->excp_model == POWERPC_EXCP_POWER8) {
>> +        env->excp_model == POWERPC_EXCP_POWER8 ||
>> +        env->excp_model == POWERPC_EXCP_POWER9) {
>>          cpu_fprintf(f, "HSRR0 " TARGET_FMT_lx " HSRR1 " TARGET_FMT_lx "\n",
>>                      env->spr[SPR_HSRR0], env->spr[SPR_HSRR1]);
>>      }
>> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
>> index 53aff5a7b734..b8c7b8150318 100644
>> --- a/target/ppc/translate_init.c
>> +++ b/target/ppc/translate_init.c
>> @@ -8962,7 +8962,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
>>      pcc->sps = &POWER7_POWER8_sps;
>>      pcc->radix_page_info = &POWER9_radix_page_info;
>>  #endif
>> -    pcc->excp_model = POWERPC_EXCP_POWER8;
>> +    pcc->excp_model = POWERPC_EXCP_POWER9;
>>      pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
>>      pcc->bfd_mach = bfd_mach_ppc64;
>>      pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9
  2017-07-10 12:49     ` Cédric Le Goater
@ 2017-07-10 21:00       ` Benjamin Herrenschmidt
  2017-07-11  9:01         ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-10 21:00 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On Mon, 2017-07-10 at 14:49 +0200, Cédric Le Goater wrote:
> On 07/10/2017 12:26 PM, David Gibson wrote:
> > On Wed, Jul 05, 2017 at 07:13:16PM +0200, Cédric Le Goater wrote:
> > > Prepare ground for the new exception model XIVE of POWER9.
> > 
> > I'm a bit confused by this.  The excp_model is about the CPU core's
> > irq model, not the external irq controller's.
> 
> yes this is true, but the POWER9 CPU is the only criteria we have 
> to distinguish a machine supporting XIVE and XICS from one only 
> supporting XICS.

Why ? I don't understand.

We do want an EXCP_POWER9 for other things, like the fact that we have
a separate interrupt input for hypervisor, with associated vectors
etc...  but that still doesn't relate to what interrupt controller is
there.

> My idea was to use this flag to activate the OV5_XIVE_EXPLOIT bit 
> in ibm,arch-vec-5-platform-support ov5_platform, like this is done
> for the MMU. See spapr_dt_ov5_platform_support()

I disagree, the MMU is in the core, the XIVE isn't. It would be
possibly to make a P9 core if a XICS in theory :-)

> > Now.. I could imagine the POWER9 having a different core model that
> > came along with XIVE, but I can't see this new model being used for
> > anything anywhere in the rest of the series.
> 
> See patch 26. But, maybe, I am taking a shortcut and we need another
> family of flags. 

Or just some kind of enum for the interrupt controller, how do we do
with OpenPIC vs. XICS already ? Old POWER3 had OpenPIC.

> Thanks,
> 
> C. 
> 
> > > 
> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > > ---
> > >  target/ppc/cpu-qom.h        | 2 ++
> > >  target/ppc/excp_helper.c    | 9 ++++++---
> > >  target/ppc/translate.c      | 3 ++-
> > >  target/ppc/translate_init.c | 2 +-
> > >  4 files changed, 11 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> > > index d0cf6ca2a971..d7b78cf3f71c 100644
> > > --- a/target/ppc/cpu-qom.h
> > > +++ b/target/ppc/cpu-qom.h
> > > @@ -132,6 +132,8 @@ enum powerpc_excp_t {
> > >      POWERPC_EXCP_POWER7,
> > >      /* POWER8 exception model           */
> > >      POWERPC_EXCP_POWER8,
> > > +    /* POWER9 exception model           */
> > > +    POWERPC_EXCP_POWER9,
> > >  };
> > >  
> > >  /*****************************************************************************/
> > > diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> > > index 3a9f0861e773..dc7dff36a580 100644
> > > --- a/target/ppc/excp_helper.c
> > > +++ b/target/ppc/excp_helper.c
> > > @@ -148,9 +148,11 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
> > >       */
> > >  #if defined(TARGET_PPC64)
> > >      if (excp_model == POWERPC_EXCP_POWER7 ||
> > > -        excp_model == POWERPC_EXCP_POWER8) {
> > > +        excp_model == POWERPC_EXCP_POWER8 ||
> > > +        excp_model == POWERPC_EXCP_POWER9) {
> > >          lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
> > > -        if (excp_model == POWERPC_EXCP_POWER8) {
> > > +        if (excp_model == POWERPC_EXCP_POWER8 ||
> > > +            excp_model == POWERPC_EXCP_POWER9) {
> > >              ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
> > >          } else {
> > >              ail = 0;
> > > @@ -651,7 +653,8 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
> > >          if (!(new_msr & MSR_HVB) && (env->spr[SPR_LPCR] & LPCR_ILE)) {
> > >              new_msr |= (target_ulong)1 << MSR_LE;
> > >          }
> > > -    } else if (excp_model == POWERPC_EXCP_POWER8) {
> > > +    } else if (excp_model == POWERPC_EXCP_POWER8 ||
> > > +               excp_model == POWERPC_EXCP_POWER9) {
> > >          if (new_msr & MSR_HVB) {
> > >              if (env->spr[SPR_HID0] & HID0_HILE) {
> > >                  new_msr |= (target_ulong)1 << MSR_LE;
> > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > > index c0cd64d927c2..2d8c1b9e6836 100644
> > > --- a/target/ppc/translate.c
> > > +++ b/target/ppc/translate.c
> > > @@ -7064,7 +7064,8 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
> > >  
> > >  #if defined(TARGET_PPC64)
> > >      if (env->excp_model == POWERPC_EXCP_POWER7 ||
> > > -        env->excp_model == POWERPC_EXCP_POWER8) {
> > > +        env->excp_model == POWERPC_EXCP_POWER8 ||
> > > +        env->excp_model == POWERPC_EXCP_POWER9) {
> > >          cpu_fprintf(f, "HSRR0 " TARGET_FMT_lx " HSRR1 " TARGET_FMT_lx "\n",
> > >                      env->spr[SPR_HSRR0], env->spr[SPR_HSRR1]);
> > >      }
> > > diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> > > index 53aff5a7b734..b8c7b8150318 100644
> > > --- a/target/ppc/translate_init.c
> > > +++ b/target/ppc/translate_init.c
> > > @@ -8962,7 +8962,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
> > >      pcc->sps = &POWER7_POWER8_sps;
> > >      pcc->radix_page_info = &POWER9_radix_page_info;
> > >  #endif
> > > -    pcc->excp_model = POWERPC_EXCP_POWER8;
> > > +    pcc->excp_model = POWERPC_EXCP_POWER9;
> > >      pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
> > >      pcc->bfd_mach = bfd_mach_ppc64;
> > >      pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9
  2017-07-10 21:00       ` Benjamin Herrenschmidt
@ 2017-07-11  9:01         ` Cédric Le Goater
  2017-07-11 13:27           ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-11  9:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On 07/10/2017 11:00 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-07-10 at 14:49 +0200, Cédric Le Goater wrote:
>> On 07/10/2017 12:26 PM, David Gibson wrote:
>>> On Wed, Jul 05, 2017 at 07:13:16PM +0200, Cédric Le Goater wrote:
>>>> Prepare ground for the new exception model XIVE of POWER9.
>>>
>>> I'm a bit confused by this.  The excp_model is about the CPU core's
>>> irq model, not the external irq controller's.
>>
>> yes this is true, but the POWER9 CPU is the only criteria we have 
>> to distinguish a machine supporting XIVE and XICS from one only 
>> supporting XICS.
> 
> Why ? I don't understand.
> 
> We do want an EXCP_POWER9 for other things, like the fact that we have
> a separate interrupt input for hypervisor, with associated vectors
> etc...  but that still doesn't relate to what interrupt controller is
> there.
> 
>> My idea was to use this flag to activate the OV5_XIVE_EXPLOIT bit 
>> in ibm,arch-vec-5-platform-support ov5_platform, like this is done
>> for the MMU. See spapr_dt_ov5_platform_support()
> 
> I disagree, the MMU is in the core, the XIVE isn't. It would be
> possibly to make a P9 core if a XICS in theory :-)

ok. I understand. We could even "build" one in QEMU. HW would be 
another story ... 

So should XIVE support be a sPAPR machine property only enabled if 
'cpu_model' matches "POWER9.*" ? The XICS/XIVE initialization is done 
quite early in the machine init so this needs some checks.

>>> Now.. I could imagine the POWER9 having a different core model that
>>> came along with XIVE, but I can't see this new model being used for
>>> anything anywhere in the rest of the series.
>>
>> See patch 26. But, maybe, I am taking a shortcut and we need another
>> family of flags. 
> 
> Or just some kind of enum for the interrupt controller, how do we do
> with OpenPIC vs. XICS already ? Old POWER3 had OpenPIC.

AFAICT, we don't have such a CPU in QEMU/ppc. 


We could use some extra flag to change the ICS behavior. The path I am
taking duplicates the ICS code but in real, we only need to change the
irq handlers. 

Thanks,

C. 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9
  2017-07-11  9:01         ` Cédric Le Goater
@ 2017-07-11 13:27           ` David Gibson
  2017-07-11 13:52             ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-11 13:27 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2897 bytes --]

On Tue, Jul 11, 2017 at 11:01:15AM +0200, Cédric Le Goater wrote:
> On 07/10/2017 11:00 PM, Benjamin Herrenschmidt wrote:
> > On Mon, 2017-07-10 at 14:49 +0200, Cédric Le Goater wrote:
> >> On 07/10/2017 12:26 PM, David Gibson wrote:
> >>> On Wed, Jul 05, 2017 at 07:13:16PM +0200, Cédric Le Goater wrote:
> >>>> Prepare ground for the new exception model XIVE of POWER9.
> >>>
> >>> I'm a bit confused by this.  The excp_model is about the CPU core's
> >>> irq model, not the external irq controller's.
> >>
> >> yes this is true, but the POWER9 CPU is the only criteria we have 
> >> to distinguish a machine supporting XIVE and XICS from one only 
> >> supporting XICS.
> > 
> > Why ? I don't understand.
> > 
> > We do want an EXCP_POWER9 for other things, like the fact that we have
> > a separate interrupt input for hypervisor, with associated vectors
> > etc...  but that still doesn't relate to what interrupt controller is
> > there.
> > 
> >> My idea was to use this flag to activate the OV5_XIVE_EXPLOIT bit 
> >> in ibm,arch-vec-5-platform-support ov5_platform, like this is done
> >> for the MMU. See spapr_dt_ov5_platform_support()
> > 
> > I disagree, the MMU is in the core, the XIVE isn't. It would be
> > possibly to make a P9 core if a XICS in theory :-)
> 
> ok. I understand. We could even "build" one in QEMU. HW would be 
> another story ... 
> 
> So should XIVE support be a sPAPR machine property only enabled if 
> 'cpu_model' matches "POWER9.*" ? The XICS/XIVE initialization is done 
> quite early in the machine init so this needs some checks.

Basically, yes.  The interrupt controller setup is generally something
the machine looks after.  What I'd actually suggest is a machine
parameter for XICS vs. XIVE, whose default value is based on the CPU
model.  Just as we could build a POWER9 with XICS in qemu, we could
build a POWER8 with XIVE.

> 
> >>> Now.. I could imagine the POWER9 having a different core model that
> >>> came along with XIVE, but I can't see this new model being used for
> >>> anything anywhere in the rest of the series.
> >>
> >> See patch 26. But, maybe, I am taking a shortcut and we need another
> >> family of flags. 
> > 
> > Or just some kind of enum for the interrupt controller, how do we do
> > with OpenPIC vs. XICS already ? Old POWER3 had OpenPIC.
> 
> AFAICT, we don't have such a CPU in QEMU/ppc.

More to the point we don't have any machine type for those old POWER3
setups.

> We could use some extra flag to change the ICS behavior. The path I am
> taking duplicates the ICS code but in real, we only need to change the
> irq handlers. 
> 
> Thanks,
> 
> C. 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9
  2017-07-11 13:27           ` David Gibson
@ 2017-07-11 13:52             ` Cédric Le Goater
  2017-07-11 21:20               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-11 13:52 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/11/2017 03:27 PM, David Gibson wrote:
> On Tue, Jul 11, 2017 at 11:01:15AM +0200, Cédric Le Goater wrote:
>> On 07/10/2017 11:00 PM, Benjamin Herrenschmidt wrote:
>>> On Mon, 2017-07-10 at 14:49 +0200, Cédric Le Goater wrote:
>>>> On 07/10/2017 12:26 PM, David Gibson wrote:
>>>>> On Wed, Jul 05, 2017 at 07:13:16PM +0200, Cédric Le Goater wrote:
>>>>>> Prepare ground for the new exception model XIVE of POWER9.
>>>>>
>>>>> I'm a bit confused by this.  The excp_model is about the CPU core's
>>>>> irq model, not the external irq controller's.
>>>>
>>>> yes this is true, but the POWER9 CPU is the only criteria we have 
>>>> to distinguish a machine supporting XIVE and XICS from one only 
>>>> supporting XICS.
>>>
>>> Why ? I don't understand.
>>>
>>> We do want an EXCP_POWER9 for other things, like the fact that we have
>>> a separate interrupt input for hypervisor, with associated vectors
>>> etc...  but that still doesn't relate to what interrupt controller is
>>> there.
>>>
>>>> My idea was to use this flag to activate the OV5_XIVE_EXPLOIT bit 
>>>> in ibm,arch-vec-5-platform-support ov5_platform, like this is done
>>>> for the MMU. See spapr_dt_ov5_platform_support()
>>>
>>> I disagree, the MMU is in the core, the XIVE isn't. It would be
>>> possibly to make a P9 core if a XICS in theory :-)
>>
>> ok. I understand. We could even "build" one in QEMU. HW would be 
>> another story ... 
>>
>> So should XIVE support be a sPAPR machine property only enabled if 
>> 'cpu_model' matches "POWER9.*" ? The XICS/XIVE initialization is done 
>> quite early in the machine init so this needs some checks.
> 
> Basically, yes.  The interrupt controller setup is generally something
> the machine looks after.  What I'd actually suggest is a machine
> parameter for XICS vs. XIVE, whose default value is based on the CPU
> model.  

yes. That is what I have been starting to work with now : 
 
+static bool ppc_support_xive(MachineState *machine)
+{
+   PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(first_cpu);
+
+   return pcc->pvr_match(pcc, CPU_POWERPC_POWER9_BASE);
+}

I am using the PVR to catch the cpu 'host' model. We could add 
later on a machine property to disable the XIVE mode.  


> Just as we could build a POWER9 with XICS in qemu, we could build a 
> POWER8 with XIVE.

That might be the case with a POWER9 running in POWER8 compat mode. 
I need to check. 

Thanks,

C. 


>>>>> Now.. I could imagine the POWER9 having a different core model that
>>>>> came along with XIVE, but I can't see this new model being used for
>>>>> anything anywhere in the rest of the series.
>>>>
>>>> See patch 26. But, maybe, I am taking a shortcut and we need another
>>>> family of flags. 
>>>
>>> Or just some kind of enum for the interrupt controller, how do we do
>>> with OpenPIC vs. XICS already ? Old POWER3 had OpenPIC.
>>
>> AFAICT, we don't have such a CPU in QEMU/ppc.
> 
> More to the point we don't have any machine type for those old POWER3
> setups.
> 
>> We could use some extra flag to change the ICS behavior. The path I am
>> taking duplicates the ICS code but in real, we only need to change the
>> irq handlers. 
>>
>> Thanks,
>>
>> C. 
>>
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9
  2017-07-11 13:52             ` Cédric Le Goater
@ 2017-07-11 21:20               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-11 21:20 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On Tue, 2017-07-11 at 15:52 +0200, Cédric Le Goater wrote:
> 
> > Just as we could build a POWER9 with XICS in qemu, we could build a 
> > POWER8 with XIVE.
> 
> That might be the case with a POWER9 running in POWER8 compat mode. 
> I need to check. 

Yes. In "compat" mode (or more generally if the guest doesn't advertize
support for native XIVE, whether it's in compat or native P9 mode is
irrelevant in fact), we provide the old hcalls and effectively need to
instantiate a XICS in qemu (KVM will exploit the XIVE under the hood
but will make it look like a XICS to qemu ioctl's as well).

Cheers,
Ben.

> Thanks,
> 
> C. 
> 
> 
> > > > > > Now.. I could imagine the POWER9 having a different core model that
> > > > > > came along with XIVE, but I can't see this new model being used for
> > > > > > anything anywhere in the rest of the series.
> > > > > 
> > > > > See patch 26. But, maybe, I am taking a shortcut and we need another
> > > > > family of flags. 
> > > > 
> > > > Or just some kind of enum for the interrupt controller, how do we do
> > > > with OpenPIC vs. XICS already ? Old POWER3 had OpenPIC.
> > > 
> > > AFAICT, we don't have such a CPU in QEMU/ppc.
> > 
> > More to the point we don't have any machine type for those old POWER3
> > setups.
> > 
> > > We could use some extra flag to change the ICS behavior. The path I am
> > > taking duplicates the ICS code but in real, we only need to change the
> > > irq handlers. 
> > > 
> > > Thanks,
> > > 
> > > C. 
> > > 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
  2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (26 preceding siblings ...)
  2017-07-10 10:24 ` [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) David Gibson
@ 2017-07-19  3:00 ` David Gibson
  2017-07-19  3:55   ` Benjamin Herrenschmidt
  27 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-19  3:00 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1740 bytes --]

On Wed, Jul 05, 2017 at 07:13:13PM +0200, Cédric Le Goater wrote:
> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> negotiation process determines whether the guest operates with an
> interrupt controller using the XICS legacy model, as found on POWER8,
> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> patchset is a first proposal to add XIVE support in the sPAPR machine.
> 
> The first patches introduce the XIVE exploitation mode in CAS.
> 
> Follow models for the XIVE interrupt controller, source and presenter.
> We try to reuse the ICS and ICP models of XICS because the sPAPR
> machine is tied to the XICSFabric interface and should be using a
> common framework to be able to switch from one controller model to
> another. To be discussed of course.
> 
> Then comes support for the Hypervisor's call which are used to
> configure the interrupt sources and the event/notification queues of
> the guest.
> 
> Finally, the last patches try to integrate the XIVE interrupt model in
> the sPAPR machine and this not without a couple of serious hacks to
> have something to test. See 'Caveats' below for more details.
> 
> This is a first draft and I expect a lot of rewrite before it reaches
> mainline QEMU. Nevertheless, it compiles, boots and can be used for
> some testing.

So, this is probably obvious, but I'm not considering this a candidate
for qemu 2.10 (seeing as the soft freeze was yesterday).  I'll still
try to review and, once ready, queue for 2.11.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model Cédric Le Goater
@ 2017-07-19  3:08   ` David Gibson
  2017-07-19  3:23     ` David Gibson
                       ` (3 more replies)
  0 siblings, 4 replies; 122+ messages in thread
From: David Gibson @ 2017-07-19  3:08 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 8630 bytes --]

On Wed, Jul 05, 2017 at 07:13:17PM +0200, Cédric Le Goater wrote:
> Let's provide an empty shell for the XIVE controller model with a
> couple of attributes for the IRQ number allocator. The latter is
> largely inspired by OPAL which allocates IPI IRQ numbers from the
> bottom of the IRQ number space and allocates the HW IRQ numbers from
> the top.
> 
> The number of IPIs is simply deduced from the max number of CPUs the
> guest supports and we provision a arbitrary number of HW irqs.
> 
> The XIVE object is kept private because it will hold internal tables
> which do not need to be exposed to sPAPR.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  default-configs/ppc64-softmmu.mak |  1 +
>  hw/intc/Makefile.objs             |  1 +
>  hw/intc/xive-internal.h           | 28 ++++++++++++
>  hw/intc/xive.c                    | 94 +++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/xive.h             | 27 +++++++++++
>  5 files changed, 151 insertions(+)
>  create mode 100644 hw/intc/xive-internal.h
>  create mode 100644 hw/intc/xive.c
>  create mode 100644 include/hw/ppc/xive.h
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 46c95993217d..1179c07e6e9f 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -56,6 +56,7 @@ CONFIG_SM501=y
>  CONFIG_XICS=$(CONFIG_PSERIES)
>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
> +CONFIG_XIVE=$(CONFIG_PSERIES)
>  # For PReP
>  CONFIG_SERIAL_ISA=y
>  CONFIG_MC146818RTC=y
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index 78426a7dafcd..28b83456bfcc 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
>  obj-$(CONFIG_XICS) += xics.o
>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> +obj-$(CONFIG_XIVE) += xive.o
>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> new file mode 100644
> index 000000000000..155c2dcd6066
> --- /dev/null
> +++ b/hw/intc/xive-internal.h
> @@ -0,0 +1,28 @@
> +/*
> + * Copyright 2016,2017 IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#ifndef _INTC_XIVE_INTERNAL_H
> +#define _INTC_XIVE_INTERNAL_H
> +
> +#include <hw/sysbus.h>
> +
> +struct XIVE {
> +    SysBusDevice parent;

XIVE probably shouldn't be a SysBusDevice.  According to agraf, that
should only be used for things which have an MMIO presence on a bus
structure that's not worth the bother of more specifically modelling.

I don't think that's the case for XIVE, so it should just have
TYPE_DEVICE as its parent.  There are several pseries things which
already get this wrong (mostly because I made them before fully
understanding the role of the SysBus), but we should avoid adding
others.

> +    /* Properties */
> +    uint32_t     nr_targets;
> +
> +    /* IRQ number allocator */
> +    uint32_t     int_count;     /* Number of interrupts: nr_targets + HW IRQs */
> +    uint32_t     int_base;      /* Min index */
> +    uint32_t     int_max;       /* Max index */
> +    uint32_t     int_hw_bot;    /* Bottom index of HW IRQ allocator */
> +    uint32_t     int_ipi_top;   /* Highest IPI index handed out so far + 1 */
> +};
> +
> +#endif /* _INTC_XIVE_INTERNAL_H */
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> new file mode 100644
> index 000000000000..5b4ea915d87c
> --- /dev/null
> +++ b/hw/intc/xive.c
> @@ -0,0 +1,94 @@
> +/*
> + * QEMU PowerPC XIVE model
> + *
> + * Copyright (c) 2017, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qapi/error.h"
> +#include "target/ppc/cpu.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/dma.h"
> +#include "monitor/monitor.h"
> +#include "hw/ppc/xive.h"
> +
> +#include "xive-internal.h"
> +
> +/*
> + * Main XIVE object

As with XICs, does it really make sense for there to be a "main" XIVE
object, or should be an interface attached to the machine?

> + */
> +
> +/* Let's provision some HW IRQ numbers. We could use a XIVE property
> + * also but it does not seem necessary for the moment.
> + */
> +#define MAX_HW_IRQS_ENTRIES (8 * 1024)
> +
> +static void xive_init(Object *obj)
> +{
> +    ;
> +}
> +
> +static void xive_realize(DeviceState *dev, Error **errp)
> +{
> +    XIVE *x = XIVE(dev);
> +
> +    if (!x->nr_targets) {
> +        error_setg(errp, "Number of interrupt targets needs to be greater 0");
> +        return;
> +    }
> +
> +    /* Initialize IRQ number allocator. Let's use a base number if we
> +     * need to introduce a notion of blocks one day.
> +     */
> +    x->int_base = 0;
> +    x->int_count = x->nr_targets + MAX_HW_IRQS_ENTRIES;
> +    x->int_max = x->int_base + x->int_count;
> +    x->int_hw_bot = x->int_max;
> +    x->int_ipi_top = x->int_base;
> +
> +    /* Reserve some numbers as OPAL does ? */
> +    if (x->int_ipi_top < 0x10) {
> +        x->int_ipi_top = 0x10;
> +    }

I'm somewhat uncomfortable with an irq allocater here in the intc
code.  As a rule, irq allocation is the responsibility of the machine,
not any sub-component.  Furthermore, it should allocate in a way which
is repeatable, since they need to stay stable across reboots and
migrations.

And, yes, we have an allocator of sorts in XICS - it has caused a
number of problems in the past.

> +}
> +
> +static Property xive_properties[] = {
> +    DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void xive_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->realize = xive_realize;
> +    dc->props = xive_properties;
> +    dc->desc = "XIVE";
> +}
> +
> +static const TypeInfo xive_info = {
> +    .name = TYPE_XIVE,
> +    .parent = TYPE_SYS_BUS_DEVICE,
> +    .instance_init = xive_init,
> +    .instance_size = sizeof(XIVE),
> +    .class_init = xive_class_init,
> +};
> +
> +static void xive_register_types(void)
> +{
> +    type_register_static(&xive_info);
> +}
> +
> +type_init(xive_register_types)
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> new file mode 100644
> index 000000000000..863f5a9c6b5f
> --- /dev/null
> +++ b/include/hw/ppc/xive.h
> @@ -0,0 +1,27 @@
> +/*
> + * QEMU PowerPC XIVE model
> + *
> + * Copyright (c) 2017, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef PPC_XIVE_H
> +#define PPC_XIVE_H
> +
> +typedef struct XIVE XIVE;
> +
> +#define TYPE_XIVE "xive"
> +#define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
> +
> +#endif /* PPC_XIVE_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-19  3:08   ` David Gibson
@ 2017-07-19  3:23     ` David Gibson
  2017-07-19  3:56     ` Benjamin Herrenschmidt
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 122+ messages in thread
From: David Gibson @ 2017-07-19  3:23 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 9342 bytes --]

On Wed, Jul 19, 2017 at 01:08:49PM +1000, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:17PM +0200, Cédric Le Goater wrote:
> > Let's provide an empty shell for the XIVE controller model with a
> > couple of attributes for the IRQ number allocator. The latter is
> > largely inspired by OPAL which allocates IPI IRQ numbers from the
> > bottom of the IRQ number space and allocates the HW IRQ numbers from
> > the top.
> > 
> > The number of IPIs is simply deduced from the max number of CPUs the
> > guest supports and we provision a arbitrary number of HW irqs.
> > 
> > The XIVE object is kept private because it will hold internal tables
> > which do not need to be exposed to sPAPR.
> > 
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  default-configs/ppc64-softmmu.mak |  1 +
> >  hw/intc/Makefile.objs             |  1 +
> >  hw/intc/xive-internal.h           | 28 ++++++++++++
> >  hw/intc/xive.c                    | 94 +++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/xive.h             | 27 +++++++++++
> >  5 files changed, 151 insertions(+)
> >  create mode 100644 hw/intc/xive-internal.h
> >  create mode 100644 hw/intc/xive.c
> >  create mode 100644 include/hw/ppc/xive.h
> > 
> > diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> > index 46c95993217d..1179c07e6e9f 100644
> > --- a/default-configs/ppc64-softmmu.mak
> > +++ b/default-configs/ppc64-softmmu.mak
> > @@ -56,6 +56,7 @@ CONFIG_SM501=y
> >  CONFIG_XICS=$(CONFIG_PSERIES)
> >  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
> >  CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
> > +CONFIG_XIVE=$(CONFIG_PSERIES)
> >  # For PReP
> >  CONFIG_SERIAL_ISA=y
> >  CONFIG_MC146818RTC=y
> > diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> > index 78426a7dafcd..28b83456bfcc 100644
> > --- a/hw/intc/Makefile.objs
> > +++ b/hw/intc/Makefile.objs
> > @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
> >  obj-$(CONFIG_XICS) += xics.o
> >  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
> >  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> > +obj-$(CONFIG_XIVE) += xive.o
> >  obj-$(CONFIG_POWERNV) += xics_pnv.o
> >  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
> >  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> > diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> > new file mode 100644
> > index 000000000000..155c2dcd6066
> > --- /dev/null
> > +++ b/hw/intc/xive-internal.h
> > @@ -0,0 +1,28 @@
> > +/*
> > + * Copyright 2016,2017 IBM Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> > +#ifndef _INTC_XIVE_INTERNAL_H
> > +#define _INTC_XIVE_INTERNAL_H
> > +
> > +#include <hw/sysbus.h>
> > +
> > +struct XIVE {
> > +    SysBusDevice parent;
> 
> XIVE probably shouldn't be a SysBusDevice.  According to agraf, that
> should only be used for things which have an MMIO presence on a bus
> structure that's not worth the bother of more specifically modelling.
> 
> I don't think that's the case for XIVE, so it should just have
> TYPE_DEVICE as its parent.  There are several pseries things which
> already get this wrong (mostly because I made them before fully
> understanding the role of the SysBus), but we should avoid adding
> others.
> 
> > +    /* Properties */
> > +    uint32_t     nr_targets;
> > +
> > +    /* IRQ number allocator */
> > +    uint32_t     int_count;     /* Number of interrupts: nr_targets + HW IRQs */
> > +    uint32_t     int_base;      /* Min index */
> > +    uint32_t     int_max;       /* Max index */
> > +    uint32_t     int_hw_bot;    /* Bottom index of HW IRQ allocator */
> > +    uint32_t     int_ipi_top;   /* Highest IPI index handed out so far + 1 */
> > +};
> > +
> > +#endif /* _INTC_XIVE_INTERNAL_H */
> > diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> > new file mode 100644
> > index 000000000000..5b4ea915d87c
> > --- /dev/null
> > +++ b/hw/intc/xive.c
> > @@ -0,0 +1,94 @@
> > +/*
> > + * QEMU PowerPC XIVE model
> > + *
> > + * Copyright (c) 2017, IBM Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "qapi/error.h"
> > +#include "target/ppc/cpu.h"
> > +#include "sysemu/cpus.h"
> > +#include "sysemu/dma.h"
> > +#include "monitor/monitor.h"
> > +#include "hw/ppc/xive.h"
> > +
> > +#include "xive-internal.h"
> > +
> > +/*
> > + * Main XIVE object
> 
> As with XICs, does it really make sense for there to be a "main" XIVE
> object, or should be an interface attached to the machine?
> 
> > + */
> > +
> > +/* Let's provision some HW IRQ numbers. We could use a XIVE property
> > + * also but it does not seem necessary for the moment.
> > + */
> > +#define MAX_HW_IRQS_ENTRIES (8 * 1024)
> > +
> > +static void xive_init(Object *obj)
> > +{
> > +    ;
> > +}
> > +
> > +static void xive_realize(DeviceState *dev, Error **errp)
> > +{
> > +    XIVE *x = XIVE(dev);
> > +
> > +    if (!x->nr_targets) {
> > +        error_setg(errp, "Number of interrupt targets needs to be greater 0");
> > +        return;
> > +    }
> > +
> > +    /* Initialize IRQ number allocator. Let's use a base number if we
> > +     * need to introduce a notion of blocks one day.
> > +     */
> > +    x->int_base = 0;
> > +    x->int_count = x->nr_targets + MAX_HW_IRQS_ENTRIES;
> > +    x->int_max = x->int_base + x->int_count;

Also storing a value which is easily derivable from values already in
the structure is a bit silly.  I think you should drop either
int_count or int_max.

> > +    x->int_hw_bot = x->int_max;
> > +    x->int_ipi_top = x->int_base;
> > +
> > +    /* Reserve some numbers as OPAL does ? */
> > +    if (x->int_ipi_top < 0x10) {
> > +        x->int_ipi_top = 0x10;
> > +    }
> 
> I'm somewhat uncomfortable with an irq allocater here in the intc
> code.  As a rule, irq allocation is the responsibility of the machine,
> not any sub-component.  Furthermore, it should allocate in a way which
> is repeatable, since they need to stay stable across reboots and
> migrations.
> 
> And, yes, we have an allocator of sorts in XICS - it has caused a
> number of problems in the past.
> 
> > +}
> > +
> > +static Property xive_properties[] = {
> > +    DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
> > +    DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> > +static void xive_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +
> > +    dc->realize = xive_realize;
> > +    dc->props = xive_properties;
> > +    dc->desc = "XIVE";
> > +}
> > +
> > +static const TypeInfo xive_info = {
> > +    .name = TYPE_XIVE,
> > +    .parent = TYPE_SYS_BUS_DEVICE,
> > +    .instance_init = xive_init,
> > +    .instance_size = sizeof(XIVE),
> > +    .class_init = xive_class_init,
> > +};
> > +
> > +static void xive_register_types(void)
> > +{
> > +    type_register_static(&xive_info);
> > +}
> > +
> > +type_init(xive_register_types)
> > diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> > new file mode 100644
> > index 000000000000..863f5a9c6b5f
> > --- /dev/null
> > +++ b/include/hw/ppc/xive.h
> > @@ -0,0 +1,27 @@
> > +/*
> > + * QEMU PowerPC XIVE model
> > + *
> > + * Copyright (c) 2017, IBM Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef PPC_XIVE_H
> > +#define PPC_XIVE_H
> > +
> > +typedef struct XIVE XIVE;
> > +
> > +#define TYPE_XIVE "xive"
> > +#define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
> > +
> > +#endif /* PPC_XIVE_H */
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 05/26] ppc/xive: define XIVE internal tables
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 05/26] ppc/xive: define XIVE internal tables Cédric Le Goater
@ 2017-07-19  3:24   ` David Gibson
  2017-07-24 12:52     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-19  3:24 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 9703 bytes --]

On Wed, Jul 05, 2017 at 07:13:18PM +0200, Cédric Le Goater wrote:
> The XIVE interrupt controller of the POWER9 uses a set of tables to
> redirect exception from event sources to CPU threads. Among which we
> choose to model :
> 
>  - the State Bit Entries (SBE), also known as Event State Buffer
>    (ESB). This is a two bit state machine for each event source which
>    is used to trigger events. The bits are named "P" (pending) and "Q"
>    (queued) and can be controlled by MMIO.
> 
>  - the Interrupt Virtualization Entry (IVE) table, also known as Event
>    Assignment Structure (EAS). This table is indexed by the IRQ number
>    and is looked up to find the Event Queue associated with a
>    triggered event.
> 
>  - the Event Queue Descriptor (EQD) table, also known as Event
>    Notification Descriptor (END). The EQD contains fields that specify
>    the Event Queue on which event data is posted (and later pulled by
>    the OS) and also a target (or VPD) to notify.
> 
> An additional table was not modeled but we might need to to support
> the H_INT_SET_OS_REPORTING_LINE hcall:
> 
>  - the Virtual Processor Descriptor (VPD) table, also known as
>    Notification Virtual Target (NVT).
> 
> The XIVE object is expanded with the tables described above. The size
> of each table depends on the number of provisioned IRQ and the maximum
> number of CPUs in the system. The indexing is very basic and might
> need to be improved for the EQs.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xive-internal.h | 95 +++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/intc/xive.c          | 72 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 167 insertions(+)
> 
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> index 155c2dcd6066..8e755aa88a14 100644
> --- a/hw/intc/xive-internal.h
> +++ b/hw/intc/xive-internal.h
> @@ -11,6 +11,89 @@
>  
>  #include <hw/sysbus.h>
>  
> +/* Utilities to manipulate these (originaly from OPAL) */
> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
> +#define SETFIELD(m, v, val)                             \
> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
> +
> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
> +                                 PPC_BIT32(bs))
> +
> +/* IVE/EAS
> + *
> + * One per interrupt source. Targets that interrupt to a given EQ
> + * and provides the corresponding logical interrupt number (EQ data)
> + *
> + * We also map this structure to the escalation descriptor inside
> + * an EQ, though in that case the valid and masked bits are not used.
> + */
> +typedef struct XiveIVE {
> +        /* Use a single 64-bit definition to make it easier to
> +         * perform atomic updates
> +         */
> +        uint64_t        w;
> +#define IVE_VALID       PPC_BIT(0)
> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> +} XiveIVE;
> +
> +/* EQ */
> +typedef struct XiveEQ {
> +        uint32_t        w0;
> +#define EQ_W0_VALID             PPC_BIT32(0)
> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
> +#define EQ_W0_SW0               PPC_BIT32(16)
> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
> +#define EQ_QSIZE_4K             0
> +#define EQ_QSIZE_64K            4
> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
> +        uint32_t        w1;
> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
> +#define EQ_W1_ESn_P             PPC_BIT32(0)
> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
> +#define EQ_W1_ESe_P             PPC_BIT32(2)
> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
> +#define EQ_W1_GENERATION        PPC_BIT32(9)
> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
> +        uint32_t        w2;
> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
> +        uint32_t        w3;
> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
> +        uint32_t        w4;
> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
> +        uint32_t        w5;
> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
> +        uint32_t        w6;
> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
> +        uint32_t        w7;
> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> +} XiveEQ;
> +
> +#define XIVE_EQ_PRIORITY_COUNT 8
> +#define XIVE_PRIORITY_MAX  (XIVE_EQ_PRIORITY_COUNT - 1)
> +
>  struct XIVE {
>      SysBusDevice parent;
>  
> @@ -23,6 +106,18 @@ struct XIVE {
>      uint32_t     int_max;       /* Max index */
>      uint32_t     int_hw_bot;    /* Bottom index of HW IRQ allocator */
>      uint32_t     int_ipi_top;   /* Highest IPI index handed out so far + 1 */
> +
> +    /* XIVE internal tables */
> +    void         *sbe;
> +    XiveIVE      *ivt;
> +    XiveEQ       *eqdt;
>  };
>  
> +void xive_reset(void *dev);
> +XiveIVE *xive_get_ive(XIVE *x, uint32_t isn);
> +XiveEQ *xive_get_eq(XIVE *x, uint32_t idx);
> +
> +bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t prio,
> +                        uint32_t *out_eq_idx);
> +
>  #endif /* _INTC_XIVE_INTERNAL_H */
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 5b4ea915d87c..5b14d8155317 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -35,6 +35,27 @@
>   */
>  #define MAX_HW_IRQS_ENTRIES (8 * 1024)
>  
> +
> +void xive_reset(void *dev)
> +{
> +    XIVE *x = XIVE(dev);
> +    int i;
> +
> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
> +    memset(x->sbe, 0x55, x->int_count / 4);

I think strictly this should be a DIV_ROUND_UP to handle the case of
int_count not a multiple of 4.

> +
> +    /* Clear and mask all valid IVEs */
> +    for (i = x->int_base; i < x->int_max; i++) {
> +        XiveIVE *ive = &x->ivt[i];
> +        if (ive->w & IVE_VALID) {
> +            ive->w = IVE_VALID | IVE_MASKED;
> +        }
> +    }
> +
> +    /* clear all EQs */
> +    memset(x->eqdt, 0, x->nr_targets * XIVE_EQ_PRIORITY_COUNT * sizeof(XiveEQ));
> +}
> +
>  static void xive_init(Object *obj)
>  {
>      ;
> @@ -62,6 +83,19 @@ static void xive_realize(DeviceState *dev, Error **errp)
>      if (x->int_ipi_top < 0x10) {
>          x->int_ipi_top = 0x10;
>      }
> +
> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
> +    x->sbe = g_malloc0(x->int_count / 4);

And here as well.

> +
> +    /* Allocate the IVT (Interrupt Virtualization Table) */
> +    x->ivt = g_malloc0(x->int_count * sizeof(XiveIVE));
> +
> +    /* Allocate the EQDT (Event Queue Descriptor Table), 8 priorities
> +     * for each thread in the system */
> +    x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
> +                        sizeof(XiveEQ));
> +
> +    qemu_register_reset(xive_reset, dev);
>  }
>  
>  static Property xive_properties[] = {
> @@ -92,3 +126,41 @@ static void xive_register_types(void)
>  }
>  
>  type_init(xive_register_types)
> +
> +XiveIVE *xive_get_ive(XIVE *x, uint32_t lisn)
> +{
> +    uint32_t idx = lisn;
> +
> +    if (idx < x->int_base || idx >= x->int_max) {
> +        return NULL;
> +    }
> +
> +    return &x->ivt[idx];

Should be idx - int_base, no?

> +}
> +
> +XiveEQ *xive_get_eq(XIVE *x, uint32_t idx)
> +{
> +    if (idx >= x->nr_targets * XIVE_EQ_PRIORITY_COUNT) {
> +        return NULL;
> +    }
> +
> +    return &x->eqdt[idx];
> +}
> +
> +/* TODO: improve EQ indexing. This is very simple and relies on the
> + * fact that target (CPU) numbers start at 0 and are contiguous. It
> + * should be OK for sPAPR.
> + */
> +bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t priority,
> +                        uint32_t *out_eq_idx)
> +{
> +    if (priority > XIVE_PRIORITY_MAX || target >= x->nr_targets) {
> +        return false;
> +    }
> +
> +    if (out_eq_idx) {
> +        *out_eq_idx = target + priority;
> +    }
> +
> +    return true;

Seems a clunky interface.  Why not return a XiveEQ *, NULL if the
inputs aren't valud.

> +}

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
  2017-07-19  3:00 ` David Gibson
@ 2017-07-19  3:55   ` Benjamin Herrenschmidt
  2017-07-24  7:28     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  3:55 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On Wed, 2017-07-19 at 13:00 +1000, David Gibson wrote:
> So, this is probably obvious, but I'm not considering this a candidate
> for qemu 2.10 (seeing as the soft freeze was yesterday).  I'll still
> try to review and, once ready, queue for 2.11.

Right. I need to review still and we need to make sure we have the
right plumbing for migration etc... and of course I need to do the
KVM bits. So it's definitely not 2.10 material.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-19  3:08   ` David Gibson
  2017-07-19  3:23     ` David Gibson
@ 2017-07-19  3:56     ` Benjamin Herrenschmidt
  2017-07-19  4:01       ` David Gibson
  2017-07-19  4:02     ` Benjamin Herrenschmidt
  2017-07-24 13:00     ` Cédric Le Goater
  3 siblings, 1 reply; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  3:56 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On Wed, 2017-07-19 at 13:08 +1000, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:17PM +0200, Cédric Le Goater wrote:
> > Let's provide an empty shell for the XIVE controller model with a
> > couple of attributes for the IRQ number allocator. The latter is
> > largely inspired by OPAL which allocates IPI IRQ numbers from the
> > bottom of the IRQ number space and allocates the HW IRQ numbers from
> > the top.
> > 
> > The number of IPIs is simply deduced from the max number of CPUs the
> > guest supports and we provision a arbitrary number of HW irqs.
> > 
> > The XIVE object is kept private because it will hold internal tables
> > which do not need to be exposed to sPAPR.

It does have an MMIO presence though... more than one even. There's the
TIMA (per-HW thread control area) and there's the per-interrupt MMIO
space which are exposed to the guest. There's also the per-queue
MMIO control area too.

Ben.

> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  default-configs/ppc64-softmmu.mak |  1 +
> >  hw/intc/Makefile.objs             |  1 +
> >  hw/intc/xive-internal.h           | 28 ++++++++++++
> >  hw/intc/xive.c                    | 94 +++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/xive.h             | 27 +++++++++++
> >  5 files changed, 151 insertions(+)
> >  create mode 100644 hw/intc/xive-internal.h
> >  create mode 100644 hw/intc/xive.c
> >  create mode 100644 include/hw/ppc/xive.h
> > 
> > diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> > index 46c95993217d..1179c07e6e9f 100644
> > --- a/default-configs/ppc64-softmmu.mak
> > +++ b/default-configs/ppc64-softmmu.mak
> > @@ -56,6 +56,7 @@ CONFIG_SM501=y
> >  CONFIG_XICS=$(CONFIG_PSERIES)
> >  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
> >  CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
> > +CONFIG_XIVE=$(CONFIG_PSERIES)
> >  # For PReP
> >  CONFIG_SERIAL_ISA=y
> >  CONFIG_MC146818RTC=y
> > diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> > index 78426a7dafcd..28b83456bfcc 100644
> > --- a/hw/intc/Makefile.objs
> > +++ b/hw/intc/Makefile.objs
> > @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
> >  obj-$(CONFIG_XICS) += xics.o
> >  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
> >  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> > +obj-$(CONFIG_XIVE) += xive.o
> >  obj-$(CONFIG_POWERNV) += xics_pnv.o
> >  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
> >  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> > diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> > new file mode 100644
> > index 000000000000..155c2dcd6066
> > --- /dev/null
> > +++ b/hw/intc/xive-internal.h
> > @@ -0,0 +1,28 @@
> > +/*
> > + * Copyright 2016,2017 IBM Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> > +#ifndef _INTC_XIVE_INTERNAL_H
> > +#define _INTC_XIVE_INTERNAL_H
> > +
> > +#include <hw/sysbus.h>
> > +
> > +struct XIVE {
> > +    SysBusDevice parent;
> 
> XIVE probably shouldn't be a SysBusDevice.  According to agraf, that
> should only be used for things which have an MMIO presence on a bus
> structure that's not worth the bother of more specifically modelling.
> 
> I don't think that's the case for XIVE, so it should just have
> TYPE_DEVICE as its parent.  There are several pseries things which
> already get this wrong (mostly because I made them before fully
> understanding the role of the SysBus), but we should avoid adding
> others.
> 
> > +    /* Properties */
> > +    uint32_t     nr_targets;
> > +
> > +    /* IRQ number allocator */
> > +    uint32_t     int_count;     /* Number of interrupts: nr_targets + HW IRQs */
> > +    uint32_t     int_base;      /* Min index */
> > +    uint32_t     int_max;       /* Max index */
> > +    uint32_t     int_hw_bot;    /* Bottom index of HW IRQ allocator */
> > +    uint32_t     int_ipi_top;   /* Highest IPI index handed out so far + 1 */
> > +};
> > +
> > +#endif /* _INTC_XIVE_INTERNAL_H */
> > diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> > new file mode 100644
> > index 000000000000..5b4ea915d87c
> > --- /dev/null
> > +++ b/hw/intc/xive.c
> > @@ -0,0 +1,94 @@
> > +/*
> > + * QEMU PowerPC XIVE model
> > + *
> > + * Copyright (c) 2017, IBM Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "qapi/error.h"
> > +#include "target/ppc/cpu.h"
> > +#include "sysemu/cpus.h"
> > +#include "sysemu/dma.h"
> > +#include "monitor/monitor.h"
> > +#include "hw/ppc/xive.h"
> > +
> > +#include "xive-internal.h"
> > +
> > +/*
> > + * Main XIVE object
> 
> As with XICs, does it really make sense for there to be a "main" XIVE
> object, or should be an interface attached to the machine?
> 
> > + */
> > +
> > +/* Let's provision some HW IRQ numbers. We could use a XIVE property
> > + * also but it does not seem necessary for the moment.
> > + */
> > +#define MAX_HW_IRQS_ENTRIES (8 * 1024)
> > +
> > +static void xive_init(Object *obj)
> > +{
> > +    ;
> > +}
> > +
> > +static void xive_realize(DeviceState *dev, Error **errp)
> > +{
> > +    XIVE *x = XIVE(dev);
> > +
> > +    if (!x->nr_targets) {
> > +        error_setg(errp, "Number of interrupt targets needs to be greater 0");
> > +        return;
> > +    }
> > +
> > +    /* Initialize IRQ number allocator. Let's use a base number if we
> > +     * need to introduce a notion of blocks one day.
> > +     */
> > +    x->int_base = 0;
> > +    x->int_count = x->nr_targets + MAX_HW_IRQS_ENTRIES;
> > +    x->int_max = x->int_base + x->int_count;
> > +    x->int_hw_bot = x->int_max;
> > +    x->int_ipi_top = x->int_base;
> > +
> > +    /* Reserve some numbers as OPAL does ? */
> > +    if (x->int_ipi_top < 0x10) {
> > +        x->int_ipi_top = 0x10;
> > +    }
> 
> I'm somewhat uncomfortable with an irq allocater here in the intc
> code.  As a rule, irq allocation is the responsibility of the machine,
> not any sub-component.  Furthermore, it should allocate in a way which
> is repeatable, since they need to stay stable across reboots and
> migrations.
> 
> And, yes, we have an allocator of sorts in XICS - it has caused a
> number of problems in the past.
> 
> > +}
> > +
> > +static Property xive_properties[] = {
> > +    DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
> > +    DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> > +static void xive_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +
> > +    dc->realize = xive_realize;
> > +    dc->props = xive_properties;
> > +    dc->desc = "XIVE";
> > +}
> > +
> > +static const TypeInfo xive_info = {
> > +    .name = TYPE_XIVE,
> > +    .parent = TYPE_SYS_BUS_DEVICE,
> > +    .instance_init = xive_init,
> > +    .instance_size = sizeof(XIVE),
> > +    .class_init = xive_class_init,
> > +};
> > +
> > +static void xive_register_types(void)
> > +{
> > +    type_register_static(&xive_info);
> > +}
> > +
> > +type_init(xive_register_types)
> > diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> > new file mode 100644
> > index 000000000000..863f5a9c6b5f
> > --- /dev/null
> > +++ b/include/hw/ppc/xive.h
> > @@ -0,0 +1,27 @@
> > +/*
> > + * QEMU PowerPC XIVE model
> > + *
> > + * Copyright (c) 2017, IBM Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef PPC_XIVE_H
> > +#define PPC_XIVE_H
> > +
> > +typedef struct XIVE XIVE;
> > +
> > +#define TYPE_XIVE "xive"
> > +#define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
> > +
> > +#endif /* PPC_XIVE_H */
> 
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-19  3:56     ` Benjamin Herrenschmidt
@ 2017-07-19  4:01       ` David Gibson
  2017-07-19  4:18         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-19  4:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1492 bytes --]

On Wed, Jul 19, 2017 at 01:56:57PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2017-07-19 at 13:08 +1000, David Gibson wrote:
> > On Wed, Jul 05, 2017 at 07:13:17PM +0200, Cédric Le Goater wrote:
> > > Let's provide an empty shell for the XIVE controller model with a
> > > couple of attributes for the IRQ number allocator. The latter is
> > > largely inspired by OPAL which allocates IPI IRQ numbers from the
> > > bottom of the IRQ number space and allocates the HW IRQ numbers from
> > > the top.
> > > 
> > > The number of IPIs is simply deduced from the max number of CPUs the
> > > guest supports and we provision a arbitrary number of HW irqs.
> > > 
> > > The XIVE object is kept private because it will hold internal tables
> > > which do not need to be exposed to sPAPR.
> 
> It does have an MMIO presence though... more than one even. There's the
> TIMA (per-HW thread control area) and there's the per-interrupt MMIO
> space which are exposed to the guest. There's also the per-queue
> MMIO control area too.

Ok.  Always?  Or just on powernv?

If it only has an MMIO presence on powernv, then the "core" xive
object should probably be TYPE_DEVICE, with the powernv specific
device being a SysBusDevice which incorporates the core xive device
inside it.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-19  3:08   ` David Gibson
  2017-07-19  3:23     ` David Gibson
  2017-07-19  3:56     ` Benjamin Herrenschmidt
@ 2017-07-19  4:02     ` Benjamin Herrenschmidt
  2017-07-21  7:50       ` David Gibson
  2017-07-24 13:00     ` Cédric Le Goater
  3 siblings, 1 reply; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:02 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On Wed, 2017-07-19 at 13:08 +1000, David Gibson wrote:
> 
> I'm somewhat uncomfortable with an irq allocater here in the intc
> code.  As a rule, irq allocation is the responsibility of the machine,
> not any sub-component.  Furthermore, it should allocate in a way which
> is repeatable, since they need to stay stable across reboots and
> migrations.
> 
> And, yes, we have an allocator of sorts in XICS - it has caused a
> number of problems in the past.

So....

For a bare metal model (which we don't have yet) of XIVE, the IRQ
numbering is entirely an artifact of how the HW is configured. There
should thus be no interrupt numbers visible to qemu.

For a PAPR model things are a bit different, but if we want to
maximize code re-use between the two, we probably need to make sure
the interrupts "allocated" by the machine for XIVE can be represented
by the HW model.

That means:

 - Each chip has a range (high bits are the block ID, which maps to a
chip, low bits, around 512K to 1M interrupts is the per-chip space).

 - Interrupts 0...N of that range (N depends on how much backing
memory and MMIO space is provisioned for each chip) are "generic IPIs"
which are somewhat generic interrupt source that can be triggered with
an MMIO store and routed to any target. Those are used in PAPR for
things like IPIs and some type of accelerator interrupts.

 - Portions of that range (which may or may not overlap the 0...N
above, if they do they "shadow" the generic interrupts) can be
configured to be the HW sources from the various PCIe bridges and
the PSI controller.

Cheers,
Ben.

> > +}
> > +
> > +static Property xive_properties[] = {
> > +    DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
> > +    DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> > +static void xive_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +
> > +    dc->realize = xive_realize;
> > +    dc->props = xive_properties;
> > +    dc->desc = "XIVE";
> > +}
> > +
> > +static const TypeInfo xive_info = {
> > +    .name = TYPE_XIVE,
> > +    .parent = TYPE_SYS_BUS_DEVICE,
> > +    .instance_init = xive_init,
> > +    .instance_size = sizeof(XIVE),
> > +    .class_init = xive_class_init,
> > +};
> > +
> > +static void xive_register_types(void)
> > +{
> > +    type_register_static(&xive_info);
> > +}
> > +
> > +type_init(xive_register_types)
> > diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> > new file mode 100644
> > index 000000000000..863f5a9c6b5f
> > --- /dev/null
> > +++ b/include/hw/ppc/xive.h
> > @@ -0,0 +1,27 @@
> > +/*
> > + * QEMU PowerPC XIVE model
> > + *
> > + * Copyright (c) 2017, IBM Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef PPC_XIVE_H
> > +#define PPC_XIVE_H
> > +
> > +typedef struct XIVE XIVE;
> > +
> > +#define TYPE_XIVE "xive"
> > +#define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
> > +
> > +#endif /* PPC_XIVE_H */
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-19  4:01       ` David Gibson
@ 2017-07-19  4:18         ` Benjamin Herrenschmidt
  2017-07-19  4:25           ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:18 UTC (permalink / raw)
  To: David Gibson; +Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

On Wed, 2017-07-19 at 14:01 +1000, David Gibson wrote:
> On Wed, Jul 19, 2017 at 01:56:57PM +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2017-07-19 at 13:08 +1000, David Gibson wrote:
> > > On Wed, Jul 05, 2017 at 07:13:17PM +0200, Cédric Le Goater wrote:
> > > > Let's provide an empty shell for the XIVE controller model with a
> > > > couple of attributes for the IRQ number allocator. The latter is
> > > > largely inspired by OPAL which allocates IPI IRQ numbers from the
> > > > bottom of the IRQ number space and allocates the HW IRQ numbers from
> > > > the top.
> > > > 
> > > > The number of IPIs is simply deduced from the max number of CPUs the
> > > > guest supports and we provision a arbitrary number of HW irqs.
> > > > 
> > > > The XIVE object is kept private because it will hold internal tables
> > > > which do not need to be exposed to sPAPR.
> > 
> > It does have an MMIO presence though... more than one even. There's the
> > TIMA (per-HW thread control area) and there's the per-interrupt MMIO
> > space which are exposed to the guest. There's also the per-queue
> > MMIO control area too.
> 
> Ok.  Always?  Or just on powernv?
> 
> If it only has an MMIO presence on powernv, then the "core" xive
> object should probably be TYPE_DEVICE, with the powernv specific
> device being a SysBusDevice which incorporates the core xive device
> inside it.

No the ones above are on PAPR. PowerNV has even more :-)

The TIMA (thread management area) is the MMIO area through which
you control the current CPU priority etc...

It's designed in HW to "know" which core/thread is accessing it (it's
at a fixed address) and respond appropriately based on that and which
virtual CPU has been activated on that core/thread.

It's part of what allows XIVE to deliver interrupts without any HV
calls.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-19  4:18         ` Benjamin Herrenschmidt
@ 2017-07-19  4:25           ` David Gibson
  0 siblings, 0 replies; 122+ messages in thread
From: David Gibson @ 2017-07-19  4:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2242 bytes --]

On Wed, Jul 19, 2017 at 02:18:17PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2017-07-19 at 14:01 +1000, David Gibson wrote:
> > On Wed, Jul 19, 2017 at 01:56:57PM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2017-07-19 at 13:08 +1000, David Gibson wrote:
> > > > On Wed, Jul 05, 2017 at 07:13:17PM +0200, Cédric Le Goater wrote:
> > > > > Let's provide an empty shell for the XIVE controller model with a
> > > > > couple of attributes for the IRQ number allocator. The latter is
> > > > > largely inspired by OPAL which allocates IPI IRQ numbers from the
> > > > > bottom of the IRQ number space and allocates the HW IRQ numbers from
> > > > > the top.
> > > > > 
> > > > > The number of IPIs is simply deduced from the max number of CPUs the
> > > > > guest supports and we provision a arbitrary number of HW irqs.
> > > > > 
> > > > > The XIVE object is kept private because it will hold internal tables
> > > > > which do not need to be exposed to sPAPR.
> > > 
> > > It does have an MMIO presence though... more than one even. There's the
> > > TIMA (per-HW thread control area) and there's the per-interrupt MMIO
> > > space which are exposed to the guest. There's also the per-queue
> > > MMIO control area too.
> > 
> > Ok.  Always?  Or just on powernv?
> > 
> > If it only has an MMIO presence on powernv, then the "core" xive
> > object should probably be TYPE_DEVICE, with the powernv specific
> > device being a SysBusDevice which incorporates the core xive device
> > inside it.
> 
> No the ones above are on PAPR. PowerNV has even more :-)

Ok.  SusBusDevice is reasonable then.

> The TIMA (thread management area) is the MMIO area through which
> you control the current CPU priority etc...
> 
> It's designed in HW to "know" which core/thread is accessing it (it's
> at a fixed address) and respond appropriately based on that and which
> virtual CPU has been activated on that core/thread.
> 
> It's part of what allows XIVE to deliver interrupts without any HV
> calls.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-19  4:02     ` Benjamin Herrenschmidt
@ 2017-07-21  7:50       ` David Gibson
  2017-07-21  8:21         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-21  7:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2313 bytes --]

On Wed, Jul 19, 2017 at 02:02:18PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2017-07-19 at 13:08 +1000, David Gibson wrote:
> > 
> > I'm somewhat uncomfortable with an irq allocater here in the intc
> > code.  As a rule, irq allocation is the responsibility of the machine,
> > not any sub-component.  Furthermore, it should allocate in a way which
> > is repeatable, since they need to stay stable across reboots and
> > migrations.
> > 
> > And, yes, we have an allocator of sorts in XICS - it has caused a
> > number of problems in the past.
> 
> So....
> 
> For a bare metal model (which we don't have yet) of XIVE, the IRQ
> numbering is entirely an artifact of how the HW is configured. There
> should thus be no interrupt numbers visible to qemu.

Uh.. I don't entirely follow.  Do you mean that during boot the guest
programs the irq numbers into the various components?

In that case this allocator stuff definitely doesn't belong on the
xive code.

> For a PAPR model things are a bit different, but if we want to
> maximize code re-use between the two, we probably need to make sure
> the interrupts "allocated" by the machine for XIVE can be represented
> by the HW model.
> 
> That means:
> 
>  - Each chip has a range (high bits are the block ID, which maps to a
> chip, low bits, around 512K to 1M interrupts is the per-chip space).
> 
>  - Interrupts 0...N of that range (N depends on how much backing
> memory and MMIO space is provisioned for each chip) are "generic IPIs"
> which are somewhat generic interrupt source that can be triggered with
> an MMIO store and routed to any target. Those are used in PAPR for
> things like IPIs and some type of accelerator interrupts.
> 
>  - Portions of that range (which may or may not overlap the 0...N
> above, if they do they "shadow" the generic interrupts) can be
> configured to be the HW sources from the various PCIe bridges and
> the PSI controller.

Err.. I'm confused how this not sure this relates to spapr.  There are
no chips or PSI there, and the PCI bridges aren't really the same
thing.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-21  7:50       ` David Gibson
@ 2017-07-21  8:21         ` Benjamin Herrenschmidt
  2017-07-24  3:28           ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-21  8:21 UTC (permalink / raw)
  To: David Gibson; +Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

On Fri, 2017-07-21 at 17:50 +1000, David Gibson wrote:
> On Wed, Jul 19, 2017 at 02:02:18PM +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2017-07-19 at 13:08 +1000, David Gibson wrote:
> > > 
> > > I'm somewhat uncomfortable with an irq allocater here in the intc
> > > code.  As a rule, irq allocation is the responsibility of the machine,
> > > not any sub-component.  Furthermore, it should allocate in a way which
> > > is repeatable, since they need to stay stable across reboots and
> > > migrations.
> > > 
> > > And, yes, we have an allocator of sorts in XICS - it has caused a
> > > number of problems in the past.
> > 
> > So....
> > 
> > For a bare metal model (which we don't have yet) of XIVE, the IRQ
> > numbering is entirely an artifact of how the HW is configured. There
> > should thus be no interrupt numbers visible to qemu.
> 
> Uh.. I don't entirely follow.  Do you mean that during boot the guest
> programs the irq numbers into the various components?

I said a "bare metal model" but yes. Pretty much. 
> 
> In that case this allocator stuff definitely doesn't belong on the
> xive code.
> 
> > For a PAPR model things are a bit different, but if we want to
> > maximize code re-use between the two, we probably need to make sure
> > the interrupts "allocated" by the machine for XIVE can be represented
> > by the HW model.
> > 
> > That means:
> > 
> >  - Each chip has a range (high bits are the block ID, which maps to a
> > chip, low bits, around 512K to 1M interrupts is the per-chip space).
> > 
> >  - Interrupts 0...N of that range (N depends on how much backing
> > memory and MMIO space is provisioned for each chip) are "generic IPIs"
> > which are somewhat generic interrupt source that can be triggered with
> > an MMIO store and routed to any target. Those are used in PAPR for
> > things like IPIs and some type of accelerator interrupts.
> > 
> >  - Portions of that range (which may or may not overlap the 0...N
> > above, if they do they "shadow" the generic interrupts) can be
> > configured to be the HW sources from the various PCIe bridges and
> > the PSI controller.
> 
> Err.. I'm confused how this not sure this relates to spapr.  There are
> no chips or PSI there, and the PCI bridges aren't really the same
> thing.

The above is the HW model, sorry for the confusion. With a few comments
about how they are used in PAPR.

So yes, in PAPR there's an "allocator" because the hypervisor will
create a guest "virtual" (or logical to use PAPR terminology) interrupt
number space, in order to represents the various interrupts into the
guest.

Those numbers however are just tokens, they don't have to represent any
real HW concept. So they can be "allocated" in a rather fixed way, for
example, you could have something like a fixed map where you put all
the PCI interrupts at a certain number (a factor of the PHB# with room
or a fix number per PHB, maybe 16K or so, the HW does 4K max). Another
based would have a chunk of "general purpose" IPIs (for use for actual
IPIs and for other things to come). And a range for the virtual device
interrupts for example. Or you can just use an allocator.

But it's fundamentally an allocator that sits in the hypervisor, so in
our case, I would say in the spapr "component" of XIVE, rather than the
XIVE HW model itself.

Now what Cedric did, because XIVE is very complex and we need something
for PAPR quickly, is not a complete HW model, but a somewhat simplified
one that only handles what PAPR exposes. So in that case where the
allocator sits is a bit of a TBD...

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-21  8:21         ` Benjamin Herrenschmidt
@ 2017-07-24  3:28           ` David Gibson
  2017-07-24  3:53             ` Alexey Kardashevskiy
  2017-07-24  5:04             ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 122+ messages in thread
From: David Gibson @ 2017-07-24  3:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4787 bytes --]

On Fri, Jul 21, 2017 at 06:21:31PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2017-07-21 at 17:50 +1000, David Gibson wrote:
> > On Wed, Jul 19, 2017 at 02:02:18PM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2017-07-19 at 13:08 +1000, David Gibson wrote:
> > > > 
> > > > I'm somewhat uncomfortable with an irq allocater here in the intc
> > > > code.  As a rule, irq allocation is the responsibility of the machine,
> > > > not any sub-component.  Furthermore, it should allocate in a way which
> > > > is repeatable, since they need to stay stable across reboots and
> > > > migrations.
> > > > 
> > > > And, yes, we have an allocator of sorts in XICS - it has caused a
> > > > number of problems in the past.
> > > 
> > > So....
> > > 
> > > For a bare metal model (which we don't have yet) of XIVE, the IRQ
> > > numbering is entirely an artifact of how the HW is configured. There
> > > should thus be no interrupt numbers visible to qemu.
> > 
> > Uh.. I don't entirely follow.  Do you mean that during boot the guest
> > programs the irq numbers into the various components?
> 
> I said a "bare metal model" but yes. Pretty much. 

Right, by "guest" I meant the kernel running under qemu, even if its
running on a bare-metal equivalent platform.

> > In that case this allocator stuff definitely doesn't belong on the
> > xive code.
> > 
> > > For a PAPR model things are a bit different, but if we want to
> > > maximize code re-use between the two, we probably need to make sure
> > > the interrupts "allocated" by the machine for XIVE can be represented
> > > by the HW model.
> > > 
> > > That means:
> > > 
> > >  - Each chip has a range (high bits are the block ID, which maps to a
> > > chip, low bits, around 512K to 1M interrupts is the per-chip space).
> > > 
> > >  - Interrupts 0...N of that range (N depends on how much backing
> > > memory and MMIO space is provisioned for each chip) are "generic IPIs"
> > > which are somewhat generic interrupt source that can be triggered with
> > > an MMIO store and routed to any target. Those are used in PAPR for
> > > things like IPIs and some type of accelerator interrupts.
> > > 
> > >  - Portions of that range (which may or may not overlap the 0...N
> > > above, if they do they "shadow" the generic interrupts) can be
> > > configured to be the HW sources from the various PCIe bridges and
> > > the PSI controller.
> > 
> > Err.. I'm confused how this not sure this relates to spapr.  There are
> > no chips or PSI there, and the PCI bridges aren't really the same
> > thing.
> 
> The above is the HW model, sorry for the confusion. With a few comments
> about how they are used in PAPR.
> 
> So yes, in PAPR there's an "allocator" because the hypervisor will
> create a guest "virtual" (or logical to use PAPR terminology) interrupt
> number space, in order to represents the various interrupts into the
> guest.

Ok, but are each of those logical irqs bound to a specific device/PHB
line/whatever, or can they be configured by the guest?

> Those numbers however are just tokens, they don't have to represent any
> real HW concept. So they can be "allocated" in a rather fixed way, for
> example, you could have something like a fixed map where you put all
> the PCI interrupts at a certain number (a factor of the PHB# with room
> or a fix number per PHB, maybe 16K or so, the HW does 4K max). Another
> based would have a chunk of "general purpose" IPIs (for use for actual
> IPIs and for other things to come). And a range for the virtual device
> interrupts for example. Or you can just use an allocator.

Hm.  So what I'm meaning by an "allocator" is something at least
partially dynamic.  Something you say "give me an irq" and it gives
you the next available or similar.  As opposed to any mapping from
devices to (logical) irqs, which the machine will need to supply one
way or another.

> But it's fundamentally an allocator that sits in the hypervisor, so in
> our case, I would say in the spapr "component" of XIVE, rather than the
> XIVE HW model itself.

Maybe..

> Now what Cedric did, because XIVE is very complex and we need something
> for PAPR quickly, is not a complete HW model, but a somewhat simplified
> one that only handles what PAPR exposes. So in that case where the
> allocator sits is a bit of a TBD...

Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
machine type level needs extreme caution, or the irqs may not be
stable which will generally break migration.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-24  3:28           ` David Gibson
@ 2017-07-24  3:53             ` Alexey Kardashevskiy
  2017-07-24  5:04             ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-24  3:53 UTC (permalink / raw)
  To: David Gibson, Benjamin Herrenschmidt
  Cc: qemu-devel, qemu-ppc, Cédric Le Goater, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 4957 bytes --]

On 24/07/17 13:28, David Gibson wrote:
> On Fri, Jul 21, 2017 at 06:21:31PM +1000, Benjamin Herrenschmidt wrote:
>> On Fri, 2017-07-21 at 17:50 +1000, David Gibson wrote:
>>> On Wed, Jul 19, 2017 at 02:02:18PM +1000, Benjamin Herrenschmidt wrote:
>>>> On Wed, 2017-07-19 at 13:08 +1000, David Gibson wrote:
>>>>>
>>>>> I'm somewhat uncomfortable with an irq allocater here in the intc
>>>>> code.  As a rule, irq allocation is the responsibility of the machine,
>>>>> not any sub-component.  Furthermore, it should allocate in a way which
>>>>> is repeatable, since they need to stay stable across reboots and
>>>>> migrations.
>>>>>
>>>>> And, yes, we have an allocator of sorts in XICS - it has caused a
>>>>> number of problems in the past.
>>>>
>>>> So....
>>>>
>>>> For a bare metal model (which we don't have yet) of XIVE, the IRQ
>>>> numbering is entirely an artifact of how the HW is configured. There
>>>> should thus be no interrupt numbers visible to qemu.
>>>
>>> Uh.. I don't entirely follow.  Do you mean that during boot the guest
>>> programs the irq numbers into the various components?
>>
>> I said a "bare metal model" but yes. Pretty much. 
> 
> Right, by "guest" I meant the kernel running under qemu, even if its
> running on a bare-metal equivalent platform.
> 
>>> In that case this allocator stuff definitely doesn't belong on the
>>> xive code.
>>>
>>>> For a PAPR model things are a bit different, but if we want to
>>>> maximize code re-use between the two, we probably need to make sure
>>>> the interrupts "allocated" by the machine for XIVE can be represented
>>>> by the HW model.
>>>>
>>>> That means:
>>>>
>>>>  - Each chip has a range (high bits are the block ID, which maps to a
>>>> chip, low bits, around 512K to 1M interrupts is the per-chip space).
>>>>
>>>>  - Interrupts 0...N of that range (N depends on how much backing
>>>> memory and MMIO space is provisioned for each chip) are "generic IPIs"
>>>> which are somewhat generic interrupt source that can be triggered with
>>>> an MMIO store and routed to any target. Those are used in PAPR for
>>>> things like IPIs and some type of accelerator interrupts.
>>>>
>>>>  - Portions of that range (which may or may not overlap the 0...N
>>>> above, if they do they "shadow" the generic interrupts) can be
>>>> configured to be the HW sources from the various PCIe bridges and
>>>> the PSI controller.
>>>
>>> Err.. I'm confused how this not sure this relates to spapr.  There are
>>> no chips or PSI there, and the PCI bridges aren't really the same
>>> thing.
>>
>> The above is the HW model, sorry for the confusion. With a few comments
>> about how they are used in PAPR.
>>
>> So yes, in PAPR there's an "allocator" because the hypervisor will
>> create a guest "virtual" (or logical to use PAPR terminology) interrupt
>> number space, in order to represents the various interrupts into the
>> guest.
> 
> Ok, but are each of those logical irqs bound to a specific device/PHB
> line/whatever, or can they be configured by the guest?
> 
>> Those numbers however are just tokens, they don't have to represent any
>> real HW concept. So they can be "allocated" in a rather fixed way, for
>> example, you could have something like a fixed map where you put all
>> the PCI interrupts at a certain number (a factor of the PHB# with room
>> or a fix number per PHB, maybe 16K or so, the HW does 4K max). Another
>> based would have a chunk of "general purpose" IPIs (for use for actual
>> IPIs and for other things to come). And a range for the virtual device
>> interrupts for example. Or you can just use an allocator.
> 
> Hm.  So what I'm meaning by an "allocator" is something at least
> partially dynamic.  Something you say "give me an irq" and it gives
> you the next available or similar.  As opposed to any mapping from
> devices to (logical) irqs, which the machine will need to supply one
> way or another.

I am probably reading it wrong but the XIVE's allocator allocates IRQ
ranges for interrupt source controls (which are CPU cores, PHBs, PSI - in
bare metal - so they allocate just once per machine creation). Individual
interrupts are still allocated via spapr_ics_alloc_block().



> 
>> But it's fundamentally an allocator that sits in the hypervisor, so in
>> our case, I would say in the spapr "component" of XIVE, rather than the
>> XIVE HW model itself.
> 
> Maybe..
> 
>> Now what Cedric did, because XIVE is very complex and we need something
>> for PAPR quickly, is not a complete HW model, but a somewhat simplified
>> one that only handles what PAPR exposes. So in that case where the
>> allocator sits is a bit of a TBD...
> 
> Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
> machine type level needs extreme caution, or the irqs may not be
> stable which will generally break migration.
> 


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model Cédric Le Goater
@ 2017-07-24  4:02   ` David Gibson
  2017-07-24  6:00     ` Alexey Kardashevskiy
  2017-07-24 15:13     ` Cédric Le Goater
  0 siblings, 2 replies; 122+ messages in thread
From: David Gibson @ 2017-07-24  4:02 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 6072 bytes --]

On Wed, Jul 05, 2017 at 07:13:19PM +0200, Cédric Le Goater wrote:
> This is very similar to the current ICS_SIMPLE model in XICS. We try
> to reuse the ICS model because the sPAPR machine is tied to the
> XICSFabric interface and should be using a common framework to switch
> from one controller model to another: XICS <-> XIVE.

Hm.  I'm not entirely concvinced re-using the xics ICSState class in
this way is a good idea, though maybe it's a reasonable first step.
With this patch alone some code is shared, but there are some real
uglies around the edges.

Seems to me at least long term you need to either 1) make the XIVE ics
separate, even if it has similarities to the XICS one or 2) truly
unify them, with a common base type and methods to handle the
differences.


> The next patch will introduce the MMIO handlers to interact with XIVE
> interrupt sources.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xive.c        | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/xive.h |  12 ++++++
>  2 files changed, 122 insertions(+)
> 
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 5b14d8155317..9ff14c0da595 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -26,6 +26,115 @@
>  
>  #include "xive-internal.h"
>  
> +static void xive_icp_irq(XiveICSState *xs, int lisn)
> +{
> +
> +}
> +
> +/*
> + * XIVE Interrupt Source
> + */
> +static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
> +{
> +    if (val) {
> +        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
> +    }
> +}
> +
> +static void xive_ics_set_irq_lsi(XiveICSState *xs, int srcno, int val)
> +{
> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
> +
> +    if (val) {
> +        irq->status |= XICS_STATUS_ASSERTED;
> +    } else {
> +        irq->status &= ~XICS_STATUS_ASSERTED;
> +    }
> +
> +    if (irq->status & XICS_STATUS_ASSERTED
> +        && !(irq->status & XICS_STATUS_SENT)) {
> +        irq->status |= XICS_STATUS_SENT;
> +        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
> +    }
> +}
> +
> +static void xive_ics_set_irq(void *opaque, int srcno, int val)
> +{
> +    XiveICSState *xs = ICS_XIVE(opaque);
> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
> +
> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
> +        xive_ics_set_irq_lsi(xs, srcno, val);
> +    } else {
> +        xive_ics_set_irq_msi(xs, srcno, val);
> +    }
> +}

e.g. you have some code re-use, but still need to more-or-less
duplicate the set_irq code as above.

> +static void xive_ics_reset(void *dev)
> +{
> +    ICSState *ics = ICS_BASE(dev);
> +    int i;
> +    uint8_t flags[ics->nr_irqs];
> +
> +    for (i = 0; i < ics->nr_irqs; i++) {
> +        flags[i] = ics->irqs[i].flags;
> +    }
> +
> +    memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
> +
> +    for (i = 0; i < ics->nr_irqs; i++) {
> +        ics->irqs[i].flags = flags[i];
> +    }

This save, clear, restore is also kind ugly.  I'm also not sure why
this needs a reset method when I can't find one for the xics ICS.

Does the xics irqstate structure really cover what you need for xive?
I had the impression elsewhere that xive had a different priority
model to xics.  And there's the xics pointer in the icsstate structure
which is definitely redundant.

> +}
> +
> +static void xive_ics_realize(ICSState *ics, Error **errp)
> +{
> +    XiveICSState *xs = ICS_XIVE(ics);
> +    Object *obj;
> +    Error *err = NULL;
> +
> +    obj = object_property_get_link(OBJECT(xs), "xive", &err);
> +    if (!obj) {
> +        error_setg(errp, "%s: required link 'xive' not found: %s",
> +                   __func__, error_get_pretty(err));
> +        return;
> +    }
> +    xs->xive = XIVE(obj);
> +
> +    if (!ics->nr_irqs) {
> +        error_setg(errp, "Number of interrupts needs to be greater 0");
> +        return;
> +    }
> +
> +    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
> +    ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
> +
> +    qemu_register_reset(xive_ics_reset, xs);
> +}
> +
> +static Property xive_ics_properties[] = {
> +    DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
> +    DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void xive_ics_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    ICSStateClass *isc = ICS_BASE_CLASS(klass);
> +
> +    isc->realize = xive_ics_realize;
> +
> +    dc->props = xive_ics_properties;
> +}
> +
> +static const TypeInfo xive_ics_info = {
> +    .name = TYPE_ICS_XIVE,
> +    .parent = TYPE_ICS_BASE,
> +    .instance_size = sizeof(XiveICSState),
> +    .class_init = xive_ics_class_init,
> +};
> +
>  /*
>   * Main XIVE object
>   */
> @@ -123,6 +232,7 @@ static const TypeInfo xive_info = {
>  static void xive_register_types(void)
>  {
>      type_register_static(&xive_info);
> +    type_register_static(&xive_ics_info);
>  }
>  
>  type_init(xive_register_types)
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 863f5a9c6b5f..544cc6e0c796 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -19,9 +19,21 @@
>  #ifndef PPC_XIVE_H
>  #define PPC_XIVE_H
>  
> +#include "hw/ppc/xics.h"
> +
>  typedef struct XIVE XIVE;
> +typedef struct XiveICSState XiveICSState;
>  
>  #define TYPE_XIVE "xive"
>  #define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
>  
> +#define TYPE_ICS_XIVE "xive-source"
> +#define ICS_XIVE(obj) OBJECT_CHECK(XiveICSState, (obj), TYPE_ICS_XIVE)
> +
> +struct XiveICSState {
> +    ICSState parent_obj;
> +
> +    XIVE         *xive;
> +};

>  #endif /* PPC_XIVE_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source Cédric Le Goater
@ 2017-07-24  4:29   ` David Gibson
  2017-07-24  8:56     ` Benjamin Herrenschmidt
  2017-07-24 15:55     ` Cédric Le Goater
  2017-07-24  6:50   ` Alexey Kardashevskiy
  1 sibling, 2 replies; 122+ messages in thread
From: David Gibson @ 2017-07-24  4:29 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 10016 bytes --]

On Wed, Jul 05, 2017 at 07:13:20PM +0200, Cédric Le Goater wrote:
> Each interrupt source is associated with a 2-bit state machine called
> an Event State Buffer (ESB). It is controlled by MMIO to trigger
> events.
> 
> See code for more details on the states.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xive.c        | 230 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/xive.h |   3 +
>  2 files changed, 233 insertions(+)
> 
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 9ff14c0da595..816031b8ac81 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -32,6 +32,226 @@ static void xive_icp_irq(XiveICSState *xs, int lisn)
>  }
>  
>  /*
> + * "magic" Event State Buffer (ESB) MMIO offsets.
> + *
> + * Each interrupt source has a 2-bit state machine called ESB
> + * which can be controlled by MMIO. It's made of 2 bits, P and
> + * Q. P indicates that an interrupt is pending (has been sent
> + * to a queue and is waiting for an EOI). Q indicates that the
> + * interrupt has been triggered while pending.
> + *
> + * This acts as a coalescing mechanism in order to guarantee
> + * that a given interrupt only occurs at most once in a queue.
> + *
> + * When doing an EOI, the Q bit will indicate if the interrupt
> + * needs to be re-triggered.
> + *
> + * The following offsets into the ESB MMIO allow to read or
> + * manipulate the PQ bits. They must be used with an 8-bytes
> + * load instruction. They all return the previous state of the
> + * interrupt (atomically).
> + *
> + * Additionally, some ESB pages support doing an EOI via a
> + * store at 0 and some ESBs support doing a trigger via a
> + * separate trigger page.
> + */
> +#define XIVE_ESB_GET            0x800
> +#define XIVE_ESB_SET_PQ_00      0xc00
> +#define XIVE_ESB_SET_PQ_01      0xd00
> +#define XIVE_ESB_SET_PQ_10      0xe00
> +#define XIVE_ESB_SET_PQ_11      0xf00
> +
> +#define XIVE_ESB_VAL_P          0x2
> +#define XIVE_ESB_VAL_Q          0x1
> +
> +#define XIVE_ESB_RESET          0x0
> +#define XIVE_ESB_PENDING        0x2
> +#define XIVE_ESB_QUEUED         0x3
> +#define XIVE_ESB_OFF            0x1
> +
> +static uint8_t xive_pq_get(XIVE *x, uint32_t lisn)
> +{
> +    uint32_t idx = lisn;
> +    uint32_t byte = idx / 4;
> +    uint32_t bit  = (idx % 4) * 2;
> +    uint8_t* pqs = (uint8_t *) x->sbe;
> +
> +    return (pqs[byte] >> bit) & 0x3;
> +}
> +
> +static void xive_pq_set(XIVE *x, uint32_t lisn, uint8_t pq)
> +{
> +    uint32_t idx = lisn;
> +    uint32_t byte = idx / 4;
> +    uint32_t bit  = (idx % 4) * 2;
> +    uint8_t* pqs = (uint8_t *) x->sbe;
> +
> +    pqs[byte] &= ~(0x3 << bit);
> +    pqs[byte] |= (pq & 0x3) << bit;

I know it probably amounts to the same thing given the context, but
I'd be more comfortable with a temporary and an obviously atomic
update than two writes to the real state variable.

> +}
> +
> +static bool xive_pq_eoi(XIVE *x, uint32_t lisn)
> +{
> +    uint8_t old_pq = xive_pq_get(x, lisn);
> +
> +    switch (old_pq) {
> +    case XIVE_ESB_RESET:
> +        xive_pq_set(x, lisn, XIVE_ESB_RESET);
> +        return false;
> +    case XIVE_ESB_PENDING:
> +        xive_pq_set(x, lisn, XIVE_ESB_RESET);
> +        return false;
> +    case XIVE_ESB_QUEUED:
> +        xive_pq_set(x, lisn, XIVE_ESB_PENDING);
> +        return true;
> +    case XIVE_ESB_OFF:
> +        xive_pq_set(x, lisn, XIVE_ESB_OFF);
> +        return false;
> +    default:
> +         g_assert_not_reached();
> +    }
> +}
> +
> +static bool xive_pq_trigger(XIVE *x, uint32_t lisn)
> +{
> +    uint8_t old_pq = xive_pq_get(x, lisn);
> +
> +    switch (old_pq) {
> +    case XIVE_ESB_RESET:
> +        xive_pq_set(x, lisn, XIVE_ESB_PENDING);
> +        return true;
> +    case XIVE_ESB_PENDING:
> +        xive_pq_set(x, lisn, XIVE_ESB_QUEUED);
> +        return true;
> +    case XIVE_ESB_QUEUED:
> +        xive_pq_set(x, lisn, XIVE_ESB_QUEUED);
> +        return true;
> +    case XIVE_ESB_OFF:
> +        xive_pq_set(x, lisn, XIVE_ESB_OFF);
> +        return false;
> +    default:
> +         g_assert_not_reached();
> +    }
> +}
> +
> +/*
> + * XIVE Interrupt Source MMIOs
> + */
> +static void xive_ics_eoi(XiveICSState *xs, uint32_t srcno)
> +{
> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
> +
> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
> +        irq->status &= ~XICS_STATUS_SENT;
> +    }
> +}
> +
> +/* TODO: handle second page */

Is this comment still relevent?

> +static uint64_t xive_esb_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    XiveICSState *xs = ICS_XIVE(opaque);
> +    XIVE *x = xs->xive;
> +    uint32_t offset = addr & 0xF00;
> +    uint32_t srcno = addr >> xs->esb_shift;
> +    uint32_t lisn = srcno + ICS_BASE(xs)->offset;
> +    XiveIVE *ive;
> +    uint64_t ret = -1;
> +
> +    ive = xive_get_ive(x, lisn);
> +    if (!ive || !(ive->w & IVE_VALID))  {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> +        goto out;
> +    }
> +
> +    if (srcno >= ICS_BASE(xs)->nr_irqs) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "XIVE: invalid IRQ number: %d/%d lisn: %d\n",
> +                      srcno, ICS_BASE(xs)->nr_irqs, lisn);
> +        goto out;
> +    }
> +
> +    switch (offset) {
> +    case 0:
> +        xive_ics_eoi(xs, srcno);
> +
> +        /* return TRUE or FALSE depending on PQ value */
> +        ret = xive_pq_eoi(x, lisn);
> +        break;
> +
> +    case XIVE_ESB_GET:
> +        ret = xive_pq_get(x, lisn);
> +        break;
> +
> +    case XIVE_ESB_SET_PQ_00:
> +    case XIVE_ESB_SET_PQ_01:
> +    case XIVE_ESB_SET_PQ_10:
> +    case XIVE_ESB_SET_PQ_11:
> +        ret = xive_pq_get(x, lisn);
> +        xive_pq_set(x, lisn, (offset >> 8) & 0x3);

Again I'd prefer xive_pq_set() return the old value itself, for more
obvious atomicity.

> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
> +    }
> +
> +out:
> +    return ret;
> +}
> +
> +static void xive_esb_write(void *opaque, hwaddr addr,
> +                           uint64_t value, unsigned size)
> +{
> +    XiveICSState *xs = ICS_XIVE(opaque);
> +    XIVE *x = xs->xive;
> +    uint32_t offset = addr & 0xF00;
> +    uint32_t srcno = addr >> xs->esb_shift;
> +    uint32_t lisn = srcno + ICS_BASE(xs)->offset;
> +    XiveIVE *ive;
> +    bool notify = false;
> +
> +    ive = xive_get_ive(x, lisn);
> +    if (!ive || !(ive->w & IVE_VALID))  {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> +        return;
> +    }

Having this code associated with the individual ICS look directly at
the IVE table in the core xive object seems a bit dubious.  This also
points out another mismatch between the re-used ICS code and the new
XIVE code: ICS gathers all the per-source-irq flags/state into the
irqstate structure, whereas xive has per-irq information in the
centralized ecb and IVE tables.  There can certainly be good reasons
for that, but using both at once is kind of clunky.

> +    if (srcno >= ICS_BASE(xs)->nr_irqs) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "XIVE: invalid IRQ number: %d/%d lisn: %d\n",
> +                      srcno, ICS_BASE(xs)->nr_irqs, lisn);
> +        return;
> +    }
> +
> +    switch (offset) {
> +    case 0:
> +        /* TODO: should we trigger even if the IVE is masked ? */
> +        notify = xive_pq_trigger(x, lisn);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
> +                      offset);
> +        return;
> +    }
> +
> +    if (notify && !(ive->w & IVE_MASKED)) {
> +        qemu_irq_pulse(ICS_BASE(xs)->qirqs[srcno]);
> +    }
> +}
> +
> +static const MemoryRegionOps xive_esb_ops = {
> +    .read = xive_esb_read,
> +    .write = xive_esb_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +};
> +
> +/*
>   * XIVE Interrupt Source
>   */
>  static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
> @@ -106,15 +326,25 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>          return;
>      }
>  
> +    if (!xs->esb_shift) {
> +        error_setg(errp, "ESB page size needs to be greater 0");
> +        return;
> +    }
> +
>      ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
>      ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
>  
> +    memory_region_init_io(&xs->esb_iomem, OBJECT(xs), &xive_esb_ops, xs,
> +                          "xive.esb",
> +                          (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
> +
>      qemu_register_reset(xive_ics_reset, xs);
>  }
>  
>  static Property xive_ics_properties[] = {
>      DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
>      DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
> +    DEFINE_PROP_UINT32("shift", XiveICSState, esb_shift, 0),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 544cc6e0c796..5303d96f5f59 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -33,6 +33,9 @@ typedef struct XiveICSState XiveICSState;
>  struct XiveICSState {
>      ICSState parent_obj;
>  
> +    uint32_t     esb_shift;
> +    MemoryRegion esb_iomem;
> +
>      XIVE         *xive;
>  };
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags " Cédric Le Goater
@ 2017-07-24  4:36   ` David Gibson
  2017-07-24  7:00     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-24  4:36 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2251 bytes --]

On Wed, Jul 05, 2017 at 07:13:21PM +0200, Cédric Le Goater wrote:
> These flags define some characteristics of the source :
> 
>  - XIVE_SRC_H_INT_ESB  the Event State Buffer are controlled with a
>                        specific hcall H_INT_ESB

What's the other option?

>  - XIVE_SRC_LSI        LSI or MSI source

Hrm.  This definitely duplicates info that is in the XICS per irq
state which you're re-using (and which you're using in the xive code
at this point).

>  - XIVE_SRC_TRIGGER    the full function page supports trigger
>  - XIVE_SRC_STORE_EOI  EOI can with a store.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xive.c        | 1 +
>  include/hw/ppc/xive.h | 9 +++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 816031b8ac81..8f8bb8b787bd 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -345,6 +345,7 @@ static Property xive_ics_properties[] = {
>      DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
>      DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
>      DEFINE_PROP_UINT32("shift", XiveICSState, esb_shift, 0),
> +    DEFINE_PROP_UINT64("flags", XiveICSState, flags, 0),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 5303d96f5f59..1178300c9df3 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -30,9 +30,18 @@ typedef struct XiveICSState XiveICSState;
>  #define TYPE_ICS_XIVE "xive-source"
>  #define ICS_XIVE(obj) OBJECT_CHECK(XiveICSState, (obj), TYPE_ICS_XIVE)
>  
> +/*
> + * XIVE Interrupt source flags
> + */
> +#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
> +#define XIVE_SRC_LSI           (1ull << (63 - 61))
> +#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
> +#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
> +
>  struct XiveICSState {
>      ICSState parent_obj;
>  
> +    uint64_t     flags;
>      uint32_t     esb_shift;
>      MemoryRegion esb_iomem;
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs Cédric Le Goater
@ 2017-07-24  4:49   ` David Gibson
  2017-07-24  6:09     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-24  4:49 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 5082 bytes --]

On Wed, Jul 05, 2017 at 07:13:22PM +0200, Cédric Le Goater wrote:
> Each source adds its own ESB mempry region to the overall ESB memory
> region of the controller. It will be mapped in the CPU address space
> when XIVE is activated.
> 
> The default mapping address for the ESB memory region is the same one
> used on baremetal.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xive-internal.h |  5 +++++
>  hw/intc/xive.c          | 44 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 48 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> index 8e755aa88a14..c06be823aad0 100644
> --- a/hw/intc/xive-internal.h
> +++ b/hw/intc/xive-internal.h
> @@ -98,6 +98,7 @@ struct XIVE {
>      SysBusDevice parent;
>  
>      /* Properties */
> +    uint32_t     chip_id;

So there is a XIVE object per chip.  How does this work on PAPR?  One
logical chip/XIVE, or something more complex?

>      uint32_t     nr_targets;
>  
>      /* IRQ number allocator */
> @@ -111,6 +112,10 @@ struct XIVE {
>      void         *sbe;
>      XiveIVE      *ivt;
>      XiveEQ       *eqdt;
> +
> +    /* ESB and TIMA memory location */
> +    hwaddr       vc_base;
> +    MemoryRegion esb_iomem;
>  };
>  
>  void xive_reset(void *dev);
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 8f8bb8b787bd..a1cb87a07b76 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -312,6 +312,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>      XiveICSState *xs = ICS_XIVE(ics);
>      Object *obj;
>      Error *err = NULL;
> +    XIVE *x;

I don't really like just 'x' for a context variable like this (as
opposed to a temporary).

>  
>      obj = object_property_get_link(OBJECT(xs), "xive", &err);
>      if (!obj) {
> @@ -319,7 +320,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>                     __func__, error_get_pretty(err));
>          return;
>      }
> -    xs->xive = XIVE(obj);
> +    x = xs->xive = XIVE(obj);
>  
>      if (!ics->nr_irqs) {
>          error_setg(errp, "Number of interrupts needs to be greater 0");
> @@ -338,6 +339,11 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>                            "xive.esb",
>                            (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
>  
> +    /* Install the ESB memory region in the overall one */
> +    memory_region_add_subregion(&x->esb_iomem,
> +                                ICS_BASE(xs)->offset * (1 << xs->esb_shift),
> +                                &xs->esb_iomem);
> +
>      qemu_register_reset(xive_ics_reset, xs);
>  }
>  
> @@ -375,6 +381,32 @@ static const TypeInfo xive_ics_info = {
>   */
>  #define MAX_HW_IRQS_ENTRIES (8 * 1024)
>  
> +/* VC BAR contains set translations for the ESBs and the EQs. */
> +#define VC_BAR_DEFAULT   0x10000000000ull
> +#define VC_BAR_SIZE      0x08000000000ull
> +
> +#define P9_MMIO_BASE     0x006000000000000ull
> +#define P9_CHIP_BASE(id) (P9_MMIO_BASE | (0x40000000000ull * (uint64_t) (id)))

chip-based MMIO addresses leaking into the PAPR model seems like it
might not be what you want

> +static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
> +{
> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
> +                  __func__, offset, size);
> +    return 0;
> +}
> +
> +static void xive_esb_default_write(void *opaque, hwaddr offset, uint64_t value,
> +                unsigned size)
> +{
> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
> +                  __func__, offset, value, size);
> +}
> +
> +static const MemoryRegionOps xive_esb_default_ops = {
> +    .read = xive_esb_default_read,
> +    .write = xive_esb_default_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +};
>  
>  void xive_reset(void *dev)
>  {
> @@ -435,10 +467,20 @@ static void xive_realize(DeviceState *dev, Error **errp)
>      x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
>                          sizeof(XiveEQ));
>  
> +    /* VC BAR. That's the full window but we will only map the
> +     * subregions in use. */
> +    x->vc_base = (hwaddr)(P9_CHIP_BASE(x->chip_id) | VC_BAR_DEFAULT);
> +
> +    /* install default memory region handlers to log bogus access */
> +    memory_region_init_io(&x->esb_iomem, NULL, &xive_esb_default_ops,
> +                          NULL, "xive.esb", VC_BAR_SIZE);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->esb_iomem);
> +
>      qemu_register_reset(xive_reset, dev);
>  }
>  
>  static Property xive_properties[] = {
> +    DEFINE_PROP_UINT32("chip-id", XIVE, chip_id, 0),
>      DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
>      DEFINE_PROP_END_OF_LIST(),
>  };

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-24  3:28           ` David Gibson
  2017-07-24  3:53             ` Alexey Kardashevskiy
@ 2017-07-24  5:04             ` Benjamin Herrenschmidt
  2017-07-24  5:38               ` David Gibson
  1 sibling, 1 reply; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  5:04 UTC (permalink / raw)
  To: David Gibson; +Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

On Mon, 2017-07-24 at 13:28 +1000, David Gibson wrote:
> > So yes, in PAPR there's an "allocator" because the hypervisor will
> > create a guest "virtual" (or logical to use PAPR terminology) interrupt
> > number space, in order to represents the various interrupts into the
> > guest.
> 
> Ok, but are each of those logical irqs bound to a specific device/PHB
> line/whatever, or can they be configured by the guest?

So for clarity, let's first establish the terminology :-)

 - HW number is a HW interrupt number on a "bare metal" system or
powernv guest. For now we will ignore those, they are effectively a
side effect of how skiboot configure the XIVE and qemu per-se doesn't
allocate them.

 - A logical number is a "guest physical" interrupt number for a PAPR
guest. These fall into roughly 2 categories at the moment:

    * "interrupts" (or related) properties in the DT, typically
interrupts for a PCI device, ranges of MSIs etc... that correspond to
HW sources from a PHB.

    * "generic IPIs". Those are ranges of "generic" interrupts that the
hypervisor gives the guest. On a real system, they correspond to chunks
allocated off a HW facility for generic interrupts. Generic interrupts
are the same as normal interrupts from the prespective of
managing/receiving them, but are "triggered" by an MMIO to a certain HW
page. There's a DT property telling the guest the interrupt number
ranges for these guys.

So that logical number above is what a PAPR guest obtains from the DT
and uses for the various H-call used to manage and configure interrupt
sources.

In addition, the XIVE supports renumbering the interrupt number that
you obtain in the queues. Both bare metal linux, KVM and guests make
use of this. This only changes the number you observe in a queue when
you receive an interrupt, it has no effect on the HW number or logical
number used for the various management calls.

This is used by Linux so that:

  - On bare metal systems or PAPR guest with "exploitation mode" (ie,
PAPR guest directly using the XIVE), we put the linux interrupt number
in there as to avoid the reverse-mapping done by linux otherwise when
receiving an interrupt.

  - On PARP guests using the legacy hcalls, KVM configures the logical
number there.

> > Those numbers however are just tokens, they don't have to represent any
> > real HW concept. So they can be "allocated" in a rather fixed way, for
> > example, you could have something like a fixed map where you put all
> > the PCI interrupts at a certain number (a factor of the PHB# with room
> > or a fix number per PHB, maybe 16K or so, the HW does 4K max). Another
> > based would have a chunk of "general purpose" IPIs (for use for actual
> > IPIs and for other things to come). And a range for the virtual device
> > interrupts for example. Or you can just use an allocator.
> 
> Hm.  So what I'm meaning by an "allocator" is something at least
> partially dynamic.  Something you say "give me an irq" and it gives
> you the next available or similar.  As opposed to any mapping from
> devices to (logical) irqs, which the machine will need to supply one
> way or another.

For the sake of repeatability/migration etc... I think a mapping is
better than an allocator.  IE, a fixed number scheme so that the range
of interrupts for PHB#x is always a fixed function of x.

We can fix the number of "generic" interrupts given to a guest. The
only requirements from a PAPR perspective is that there should be at
least as many as there are possible threads in the guest so they can be
used as IPIs.

But we may need more for other things. We can make this a machine
parameter with a default value of something like 4096. If we call N
that number of extra generic interrupts, then the number of generic
interrutps would be #possible-vcpu's + N, or something like that.

> > But it's fundamentally an allocator that sits in the hypervisor, so in
> > our case, I would say in the spapr "component" of XIVE, rather than the
> > XIVE HW model itself.
> 
> Maybe..

You are right in that a mapping is a better term than an allocator
here.

> > Now what Cedric did, because XIVE is very complex and we need something
> > for PAPR quickly, is not a complete HW model, but a somewhat simplified
> > one that only handles what PAPR exposes. So in that case where the
> > allocator sits is a bit of a TBD...
> 
> Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
> machine type level needs extreme caution, or the irqs may not be
> stable which will generally break migration.

Yes you are right. We should probably create a more "static" scheme.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 10/26] ppc/xive: record interrupt source MMIO address for hcalls
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 10/26] ppc/xive: record interrupt source MMIO address for hcalls Cédric Le Goater
@ 2017-07-24  5:11   ` David Gibson
  2017-07-24 13:45     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-24  5:11 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1695 bytes --]

On Wed, Jul 05, 2017 at 07:13:23PM +0200, Cédric Le Goater wrote:
> The address of the MMIO page through which the Event State Buffer is
> controlled is returned to the guest by the H_INT_GET_SOURCE_INFO hcall.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xive.c        | 3 +++
>  include/hw/ppc/xive.h | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index a1cb87a07b76..0db97fd33981 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -344,6 +344,9 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>                                  ICS_BASE(xs)->offset * (1 << xs->esb_shift),
>                                  &xs->esb_iomem);
>  
> +    /* Record base address which is needed by the hcalls */
> +    xs->esb_base = x->vc_base + ICS_BASE(xs)->offset * (1 << xs->esb_shift);

This doesn't seem like it needs to be stored in the persistent object
- it can be calculated when the hcall is made.  Plus if it's for the
hcll it only makes sense for spapr.

>      qemu_register_reset(xive_ics_reset, xs);
>  }
>  
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 1178300c9df3..b06bc861b845 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -43,6 +43,7 @@ struct XiveICSState {
>  
>      uint64_t     flags;
>      uint32_t     esb_shift;
> +    hwaddr       esb_base;
>      MemoryRegion esb_iomem;
>  
>      XIVE         *xive;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/26] ppc/xics: introduce a print_info() handler to the ICS and ICP objects
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 11/26] ppc/xics: introduce a print_info() handler to the ICS and ICP objects Cédric Le Goater
@ 2017-07-24  5:13   ` David Gibson
  2017-07-24 13:58     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-24  5:13 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3798 bytes --]

On Wed, Jul 05, 2017 at 07:13:24PM +0200, Cédric Le Goater wrote:
> This handler will be used to customize the ouput of the XIVE interrupt
> source and presenter objects.

I'm not really happy with this without having a clear idea of where
this is heading - are you trying to share ICP and or ICS object
classes between XICS and XIVE, or will they eventually be separated
again?

> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xics.c        | 36 ++++++++++++++++++++++++------------
>  include/hw/ppc/xics.h |  2 ++
>  2 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index faa5c631f655..7837c2022b4a 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -40,18 +40,26 @@
>  
>  void icp_pic_print_info(ICPState *icp, Monitor *mon)
>  {
> +    ICPStateClass *k = ICP_GET_CLASS(icp);
>      int cpu_index = icp->cs ? icp->cs->cpu_index : -1;
>  
>      if (!icp->output) {
>          return;
>      }
> -    monitor_printf(mon, "CPU %d XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
> -                   cpu_index, icp->xirr, icp->xirr_owner,
> -                   icp->pending_priority, icp->mfrr);
> +
> +    monitor_printf(mon, "CPU %d ", cpu_index);
> +    if (k->print_info) {
> +        k->print_info(icp, mon);
> +    } else {
> +        monitor_printf(mon, "XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
> +                       icp->xirr, icp->xirr_owner,
> +                       icp->pending_priority, icp->mfrr);
> +    }
>  }
>  
>  void ics_pic_print_info(ICSState *ics, Monitor *mon)
>  {
> +    ICSStateClass *k = ICS_BASE_GET_CLASS(ics);
>      uint32_t i;
>  
>      monitor_printf(mon, "ICS %4x..%4x %p\n",
> @@ -61,17 +69,21 @@ void ics_pic_print_info(ICSState *ics, Monitor *mon)
>          return;
>      }
>  
> -    for (i = 0; i < ics->nr_irqs; i++) {
> -        ICSIRQState *irq = ics->irqs + i;
> +    if (k->print_info) {
> +        k->print_info(ics, mon);
> +    } else {
> +        for (i = 0; i < ics->nr_irqs; i++) {
> +            ICSIRQState *irq = ics->irqs + i;
>  
> -        if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
> -            continue;
> +            if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
> +                continue;
> +            }
> +            monitor_printf(mon, "  %4x %s %02x %02x\n",
> +                           ics->offset + i,
> +                           (irq->flags & XICS_FLAGS_IRQ_LSI) ?
> +                           "LSI" : "MSI",
> +                           irq->priority, irq->status);
>          }
> -        monitor_printf(mon, "  %4x %s %02x %02x\n",
> -                       ics->offset + i,
> -                       (irq->flags & XICS_FLAGS_IRQ_LSI) ?
> -                       "LSI" : "MSI",
> -                       irq->priority, irq->status);
>      }
>  }
>  
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 28d248abad61..902f3bfd0e33 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -69,6 +69,7 @@ struct ICPStateClass {
>      void (*pre_save)(ICPState *icp);
>      int (*post_load)(ICPState *icp, int version_id);
>      void (*reset)(ICPState *icp);
> +    void (*print_info)(ICPState *icp, Monitor *mon);
>  };
>  
>  struct ICPState {
> @@ -119,6 +120,7 @@ struct ICSStateClass {
>      void (*reject)(ICSState *s, uint32_t irq);
>      void (*resend)(ICSState *s);
>      void (*eoi)(ICSState *s, uint32_t irq);
> +    void (*print_info)(ICSState *s, Monitor *mon);
>  };
>  
>  struct ICSState {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-24  5:04             ` Benjamin Herrenschmidt
@ 2017-07-24  5:38               ` David Gibson
  2017-07-24  7:20                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-24  5:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 5707 bytes --]

On Mon, Jul 24, 2017 at 03:04:05PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2017-07-24 at 13:28 +1000, David Gibson wrote:
> > > So yes, in PAPR there's an "allocator" because the hypervisor will
> > > create a guest "virtual" (or logical to use PAPR terminology) interrupt
> > > number space, in order to represents the various interrupts into the
> > > guest.
> > 
> > Ok, but are each of those logical irqs bound to a specific device/PHB
> > line/whatever, or can they be configured by the guest?
> 
> So for clarity, let's first establish the terminology :-)
> 
>  - HW number is a HW interrupt number on a "bare metal" system or
> powernv guest. For now we will ignore those, they are effectively a
> side effect of how skiboot configure the XIVE and qemu per-se doesn't
> allocate them.
> 
>  - A logical number is a "guest physical" interrupt number for a PAPR
> guest. These fall into roughly 2 categories at the moment:
> 
>     * "interrupts" (or related) properties in the DT, typically
> interrupts for a PCI device, ranges of MSIs etc... that correspond to
> HW sources from a PHB.

Ok, I think this is the one I've mostly been thinking of.

>     * "generic IPIs". Those are ranges of "generic" interrupts that the
> hypervisor gives the guest. On a real system, they correspond to chunks
> allocated off a HW facility for generic interrupts. Generic interrupts
> are the same as normal interrupts from the prespective of
> managing/receiving them, but are "triggered" by an MMIO to a certain HW
> page. There's a DT property telling the guest the interrupt number
> ranges for these guys.
> 
> So that logical number above is what a PAPR guest obtains from the DT
> and uses for the various H-call used to manage and configure interrupt
> sources.

Ok.

> In addition, the XIVE supports renumbering the interrupt number that
> you obtain in the queues. Both bare metal linux, KVM and guests make
> use of this. This only changes the number you observe in a queue when
> you receive an interrupt, it has no effect on the HW number or logical
> number used for the various management calls.

Ok.

> This is used by Linux so that:
> 
>   - On bare metal systems or PAPR guest with "exploitation mode" (ie,
> PAPR guest directly using the XIVE), we put the linux interrupt number
> in there as to avoid the reverse-mapping done by linux otherwise when
> receiving an interrupt.
> 
>   - On PARP guests using the legacy hcalls, KVM configures the logical
> number there.

Ok.

> > > Those numbers however are just tokens, they don't have to represent any
> > > real HW concept. So they can be "allocated" in a rather fixed way, for
> > > example, you could have something like a fixed map where you put all
> > > the PCI interrupts at a certain number (a factor of the PHB# with room
> > > or a fix number per PHB, maybe 16K or so, the HW does 4K max). Another
> > > based would have a chunk of "general purpose" IPIs (for use for actual
> > > IPIs and for other things to come). And a range for the virtual device
> > > interrupts for example. Or you can just use an allocator.
> > 
> > Hm.  So what I'm meaning by an "allocator" is something at least
> > partially dynamic.  Something you say "give me an irq" and it gives
> > you the next available or similar.  As opposed to any mapping from
> > devices to (logical) irqs, which the machine will need to supply one
> > way or another.
> 
> For the sake of repeatability/migration etc... I think a mapping is
> better than an allocator.  IE, a fixed number scheme so that the range
> of interrupts for PHB#x is always a fixed function of x.

Yes, I agree.  In fact that's pretty much exactly the point I'm trying
to make.

Can we assign our logical numbers sparsely, or will that cause other
problems?

Note that for PAPR we also have the question of finding logical
interrupts for legacy PAPR VIO devices.

> We can fix the number of "generic" interrupts given to a guest. The
> only requirements from a PAPR perspective is that there should be at
> least as many as there are possible threads in the guest so they can be
> used as IPIs.

Ok.  If we can do things sparsely, allocating these well away from the
hw interrupts would make things easier.

> But we may need more for other things. We can make this a machine
> parameter with a default value of something like 4096. If we call N
> that number of extra generic interrupts, then the number of generic
> interrutps would be #possible-vcpu's + N, or something like that.

That seems reasonable.

> > > But it's fundamentally an allocator that sits in the hypervisor, so in
> > > our case, I would say in the spapr "component" of XIVE, rather than the
> > > XIVE HW model itself.
> > 
> > Maybe..
> 
> You are right in that a mapping is a better term than an allocator
> here.
> 
> > > Now what Cedric did, because XIVE is very complex and we need something
> > > for PAPR quickly, is not a complete HW model, but a somewhat simplified
> > > one that only handles what PAPR exposes. So in that case where the
> > > allocator sits is a bit of a TBD...
> > 
> > Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
> > machine type level needs extreme caution, or the irqs may not be
> > stable which will generally break migration.
> 
> Yes you are right. We should probably create a more "static" scheme.

Sounds like we're in violent agreement.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model
  2017-07-24  4:02   ` David Gibson
@ 2017-07-24  6:00     ` Alexey Kardashevskiy
  2017-07-24 15:20       ` Cédric Le Goater
  2017-07-24 15:13     ` Cédric Le Goater
  1 sibling, 1 reply; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-24  6:00 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: qemu-ppc, Alexander Graf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 6277 bytes --]

On 24/07/17 14:02, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:19PM +0200, Cédric Le Goater wrote:
>> This is very similar to the current ICS_SIMPLE model in XICS. We try
>> to reuse the ICS model because the sPAPR machine is tied to the
>> XICSFabric interface and should be using a common framework to switch
>> from one controller model to another: XICS <-> XIVE.
> 
> Hm.  I'm not entirely concvinced re-using the xics ICSState class in
> this way is a good idea, though maybe it's a reasonable first step.
> With this patch alone some code is shared, but there are some real
> uglies around the edges.


Agree, using the "ICS" term in XIVE is quite confusing as "ICS" is not
mentioned in neither XIVE nor P9 specs.

> 
> Seems to me at least long term you need to either 1) make the XIVE ics
> separate, even if it has similarities to the XICS one or 2) truly
> unify them, with a common base type and methods to handle the
> differences.
> 
> 
>> The next patch will introduce the MMIO handlers to interact with XIVE
>> interrupt sources.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xive.c        | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/xive.h |  12 ++++++
>>  2 files changed, 122 insertions(+)
>>
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 5b14d8155317..9ff14c0da595 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -26,6 +26,115 @@
>>  
>>  #include "xive-internal.h"
>>  
>> +static void xive_icp_irq(XiveICSState *xs, int lisn)
>> +{
>> +
>> +}
>> +
>> +/*
>> + * XIVE Interrupt Source
>> + */
>> +static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
>> +{
>> +    if (val) {
>> +        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
>> +    }
>> +}
>> +
>> +static void xive_ics_set_irq_lsi(XiveICSState *xs, int srcno, int val)
>> +{
>> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
>> +
>> +    if (val) {
>> +        irq->status |= XICS_STATUS_ASSERTED;
>> +    } else {
>> +        irq->status &= ~XICS_STATUS_ASSERTED;
>> +    }
>> +
>> +    if (irq->status & XICS_STATUS_ASSERTED
>> +        && !(irq->status & XICS_STATUS_SENT)) {
>> +        irq->status |= XICS_STATUS_SENT;
>> +        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
>> +    }
>> +}
>> +
>> +static void xive_ics_set_irq(void *opaque, int srcno, int val)
>> +{
>> +    XiveICSState *xs = ICS_XIVE(opaque);
>> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
>> +
>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
>> +        xive_ics_set_irq_lsi(xs, srcno, val);
>> +    } else {
>> +        xive_ics_set_irq_msi(xs, srcno, val);
>> +    }
>> +}
> 
> e.g. you have some code re-use, but still need to more-or-less
> duplicate the set_irq code as above.
> 
>> +static void xive_ics_reset(void *dev)
>> +{
>> +    ICSState *ics = ICS_BASE(dev);
>> +    int i;
>> +    uint8_t flags[ics->nr_irqs];
>> +
>> +    for (i = 0; i < ics->nr_irqs; i++) {
>> +        flags[i] = ics->irqs[i].flags;
>> +    }
>> +
>> +    memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
>> +
>> +    for (i = 0; i < ics->nr_irqs; i++) {
>> +        ics->irqs[i].flags = flags[i];
>> +    }
> 
> This save, clear, restore is also kind ugly.  I'm also not sure why
> this needs a reset method when I can't find one for the xics ICS.
> 
> Does the xics irqstate structure really cover what you need for xive?
> I had the impression elsewhere that xive had a different priority
> model to xics.  And there's the xics pointer in the icsstate structure
> which is definitely redundant.
> 
>> +}
>> +
>> +static void xive_ics_realize(ICSState *ics, Error **errp)
>> +{
>> +    XiveICSState *xs = ICS_XIVE(ics);
>> +    Object *obj;
>> +    Error *err = NULL;
>> +
>> +    obj = object_property_get_link(OBJECT(xs), "xive", &err);
>> +    if (!obj) {
>> +        error_setg(errp, "%s: required link 'xive' not found: %s",
>> +                   __func__, error_get_pretty(err));
>> +        return;
>> +    }
>> +    xs->xive = XIVE(obj);
>> +
>> +    if (!ics->nr_irqs) {
>> +        error_setg(errp, "Number of interrupts needs to be greater 0");
>> +        return;
>> +    }
>> +
>> +    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
>> +    ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
>> +
>> +    qemu_register_reset(xive_ics_reset, xs);
>> +}
>> +
>> +static Property xive_ics_properties[] = {
>> +    DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
>> +    DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void xive_ics_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    ICSStateClass *isc = ICS_BASE_CLASS(klass);
>> +
>> +    isc->realize = xive_ics_realize;
>> +
>> +    dc->props = xive_ics_properties;
>> +}
>> +
>> +static const TypeInfo xive_ics_info = {
>> +    .name = TYPE_ICS_XIVE,
>> +    .parent = TYPE_ICS_BASE,
>> +    .instance_size = sizeof(XiveICSState),
>> +    .class_init = xive_ics_class_init,
>> +};
>> +
>>  /*
>>   * Main XIVE object
>>   */
>> @@ -123,6 +232,7 @@ static const TypeInfo xive_info = {
>>  static void xive_register_types(void)
>>  {
>>      type_register_static(&xive_info);
>> +    type_register_static(&xive_ics_info);
>>  }
>>  
>>  type_init(xive_register_types)
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 863f5a9c6b5f..544cc6e0c796 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -19,9 +19,21 @@
>>  #ifndef PPC_XIVE_H
>>  #define PPC_XIVE_H
>>  
>> +#include "hw/ppc/xics.h"
>> +
>>  typedef struct XIVE XIVE;
>> +typedef struct XiveICSState XiveICSState;
>>  
>>  #define TYPE_XIVE "xive"
>>  #define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
>>  
>> +#define TYPE_ICS_XIVE "xive-source"
>> +#define ICS_XIVE(obj) OBJECT_CHECK(XiveICSState, (obj), TYPE_ICS_XIVE)
>> +
>> +struct XiveICSState {
>> +    ICSState parent_obj;
>> +
>> +    XIVE         *xive;
>> +};
> 
>>  #endif /* PPC_XIVE_H */
> 


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 13/26] ppc/xive: introduce a XIVE interrupt presenter model
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 13/26] ppc/xive: introduce a XIVE interrupt presenter model Cédric Le Goater
@ 2017-07-24  6:05   ` David Gibson
  2017-07-24 14:02     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-24  6:05 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 8956 bytes --]

On Wed, Jul 05, 2017 at 07:13:26PM +0200, Cédric Le Goater wrote:
> Just like the interrupt source model, we try to reuse the ICP model
> because the sPAPR machine is tied to the XICSFabric interface and
> should be using a common framework to switch from one controller model
> to another: XICS <-> XIVE.
> 
> The XIVE interrupt presenter exposes a set of Thread Interrupt
> Management Areas, also called rings, one per different level of
> privilege (four in all). We only expose the OS ring for the sPAPR
> support for the moment. This area is used to handle priority
> management and interrupt acknowledgment among other things.
> 
> The next patch will introduce the MMIO handlers to interact with the
> TIMA, OS only.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

As with the ICS, I'm not really clear where you're going with this.
Is this a first step towards independent xics and xive ICP objects, or
a first step towards fully unified xics/xive ICPs?

> ---
>  hw/intc/xive-internal.h | 84 +++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/intc/xive.c          | 43 +++++++++++++++++++++++++
>  include/hw/ppc/xive.h   | 14 +++++++++
>  3 files changed, 141 insertions(+)
> 
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> index c06be823aad0..ba5e648a5258 100644
> --- a/hw/intc/xive-internal.h
> +++ b/hw/intc/xive-internal.h
> @@ -24,6 +24,90 @@
>  #define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
>                                   PPC_BIT32(bs))
>  
> +/*
> + * Thread Management (aka "TM") registers
> + */
> +
> +/* TM register offsets */
> +#define TM_QW0_USER             0x000 /* All rings */
> +#define TM_QW1_OS               0x010 /* Ring 0..2 */
> +#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
> +#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
> +
> +/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
> +#define TM_NSR                  0x0  /*  +   +   -   +  */
> +#define TM_CPPR                 0x1  /*  -   +   -   +  */
> +#define TM_IPB                  0x2  /*  -   +   +   +  */
> +#define TM_LSMFB                0x3  /*  -   +   +   +  */
> +#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
> +#define TM_INC                  0x5  /*  -   +   -   +  */
> +#define TM_AGE                  0x6  /*  -   +   -   +  */
> +#define TM_PIPR                 0x7  /*  -   +   -   +  */
> +
> +#define TM_WORD0                0x0
> +#define TM_WORD1                0x4
> +
> +/*
> + * QW word 2 contains the valid bit at the top and other fields
> + * depending on the QW.
> + */
> +#define TM_WORD2                0x8
> +#define   TM_QW0W2_VU           PPC_BIT32(0)
> +#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
> +#define   TM_QW1W2_VO           PPC_BIT32(0)
> +#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
> +#define   TM_QW2W2_VP           PPC_BIT32(0)
> +#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
> +#define   TM_QW3W2_VT           PPC_BIT32(0)
> +#define   TM_QW3W2_LP           PPC_BIT32(6)
> +#define   TM_QW3W2_LE           PPC_BIT32(7)
> +#define   TM_QW3W2_T            PPC_BIT32(31)
> +
> +/*
> + * In addition to normal loads to "peek" and writes (only when invalid)
> + * using 4 and 8 bytes accesses, the above registers support these
> + * "special" byte operations:
> + *
> + *   - Byte load from QW0[NSR] - User level NSR (EBB)
> + *   - Byte store to QW0[NSR] - User level NSR (EBB)
> + *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
> + *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
> + *                                    otherwise VT||0000000
> + *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
> + *
> + * Then we have all these "special" CI ops at these offset that trigger
> + * all sorts of side effects:
> + */
> +#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
> +#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
> +#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
> +#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
> +                                         * context */
> +#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
> +#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
> +                                         * context to reg */
> +#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
> +                                         * context to reg*/
> +#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
> +#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
> +                                         * line */
> +#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
> +#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
> +                                         * line */
> +#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
> +/* XXX more... */
> +
> +/* NSR fields for the various QW ack types */
> +#define TM_QW0_NSR_EB           PPC_BIT8(0)
> +#define TM_QW1_NSR_EO           PPC_BIT8(0)
> +#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
> +#define  TM_QW3_NSR_HE_NONE     0
> +#define  TM_QW3_NSR_HE_POOL     1
> +#define  TM_QW3_NSR_HE_PHYS     2
> +#define  TM_QW3_NSR_HE_LSI      3
> +#define TM_QW3_NSR_I            PPC_BIT8(2)
> +#define TM_QW3_NSR_GRP_LVL      PPC_BIT8(3, 7)
> +
>  /* IVE/EAS
>   *
>   * One per interrupt source. Targets that interrupt to a given EQ
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index db808e0cbe3d..c08a4f8efb58 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -26,6 +26,48 @@
>  
>  #include "xive-internal.h"
>  
> +static void xive_icp_reset(ICPState *icp)
> +{
> +    XiveICPState *xicp = XIVE_ICP(icp);
> +
> +    memset(xicp->tima, 0, sizeof(xicp->tima));
> +}
> +
> +static void xive_icp_print_info(ICPState *icp, Monitor *mon)
> +{
> +    XiveICPState *xicp = XIVE_ICP(icp);
> +
> +    monitor_printf(mon, " CPPR=%02x IPB=%02x PIPR=%02x NSR=%02x\n",
> +                   xicp->tima_os[TM_CPPR], xicp->tima_os[TM_IPB],
> +                   xicp->tima_os[TM_PIPR], xicp->tima_os[TM_NSR]);
> +}
> +
> +static void xive_icp_init(Object *obj)
> +{
> +    XiveICPState *xicp = XIVE_ICP(obj);
> +
> +    xicp->tima_os = &xicp->tima[TM_QW1_OS];

Storing an easily derivable pointer in your structure seems a bit
pointless.

> +}
> +
> +static void xive_icp_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    ICPStateClass *icpc = ICP_CLASS(klass);
> +
> +    dc->desc = "PowerNV Xive ICP";
> +    icpc->reset = xive_icp_reset;
> +    icpc->print_info = xive_icp_print_info;
> +}
> +
> +static const TypeInfo xive_icp_info = {
> +    .name          = TYPE_XIVE_ICP,
> +    .parent        = TYPE_ICP,
> +    .instance_size = sizeof(XiveICPState),
> +    .instance_init = xive_icp_init,
> +    .class_init    = xive_icp_class_init,
> +    .class_size    = sizeof(ICPStateClass),
> +};
> +
>  static void xive_icp_irq(XiveICSState *xs, int lisn)
>  {
>  
> @@ -529,6 +571,7 @@ static void xive_register_types(void)
>  {
>      type_register_static(&xive_info);
>      type_register_static(&xive_ics_info);
> +    type_register_static(&xive_icp_info);
>  }
>  
>  type_init(xive_register_types)
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index b06bc861b845..f87df8107dd9 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -23,6 +23,7 @@
>  
>  typedef struct XIVE XIVE;
>  typedef struct XiveICSState XiveICSState;
> +typedef struct XiveICPState XiveICPState;
>  
>  #define TYPE_XIVE "xive"
>  #define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
> @@ -38,6 +39,9 @@ typedef struct XiveICSState XiveICSState;
>  #define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
>  #define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
>  
> +#define TYPE_XIVE_ICP "xive-icp"
> +#define XIVE_ICP(obj) OBJECT_CHECK(XiveICPState, (obj), TYPE_XIVE_ICP)
> +
>  struct XiveICSState {
>      ICSState parent_obj;
>  
> @@ -49,4 +53,14 @@ struct XiveICSState {
>      XIVE         *xive;
>  };
>  
> +/* Number of Thread Management Interrupt Areas */
> +#define XIVE_TM_RING_COUNT 4
> +
> +struct XiveICPState {
> +    ICPState parent_obj;
> +
> +    uint8_t tima[XIVE_TM_RING_COUNT * 0x10];
> +    uint8_t *tima_os;
> +};
> +
>  #endif /* PPC_XIVE_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs
  2017-07-24  4:49   ` David Gibson
@ 2017-07-24  6:09     ` Benjamin Herrenschmidt
  2017-07-24  6:39       ` David Gibson
  2017-07-24 13:25       ` Cédric Le Goater
  0 siblings, 2 replies; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  6:09 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On Mon, 2017-07-24 at 14:49 +1000, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:22PM +0200, Cédric Le Goater wrote:
> > Each source adds its own ESB mempry region to the overall ESB memory
> > region of the controller. It will be mapped in the CPU address space
> > when XIVE is activated.
> > 
> > The default mapping address for the ESB memory region is the same one
> > used on baremetal.
> > 
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  hw/intc/xive-internal.h |  5 +++++
> >  hw/intc/xive.c          | 44 +++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 48 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> > index 8e755aa88a14..c06be823aad0 100644
> > --- a/hw/intc/xive-internal.h
> > +++ b/hw/intc/xive-internal.h
> > @@ -98,6 +98,7 @@ struct XIVE {
> >      SysBusDevice parent;
> >  
> >      /* Properties */
> > +    uint32_t     chip_id;
> 
> So there is a XIVE object per chip.  How does this work on PAPR?  One
> logical chip/XIVE, or something more complex?

One global XIVE for PAPR. For the MMIOs, the way it works is that:

 - For MMIOs pertaining to a specific interrupt or queue, there's an H-
call that will return the proper "guest physical" address. For qemu
with KVM we'll have to probably create a single chunk of qemu address
space (a single mem region) that contains individual pages mapped with
MAP_FIXED originating from the different HW bits, we still need to sort
out how exactly we'll do that in practice.

 - For the TIMA (the presentation MMIOs), those are always at the same
physical address for everybody (so for a guest it's a single memory
region we'll map to that physical address), the HW "knows" which HW
thread is talking to it (and the hypervisor tells the HW which vcpu is
running on a given HW thread at a given point in time). That address is
obtained from the device-tree

> >      uint32_t     nr_targets;
> >  
> >      /* IRQ number allocator */
> > @@ -111,6 +112,10 @@ struct XIVE {
> >      void         *sbe;
> >      XiveIVE      *ivt;
> >      XiveEQ       *eqdt;
> > +
> > +    /* ESB and TIMA memory location */
> > +    hwaddr       vc_base;
> > +    MemoryRegion esb_iomem;
> >  };
> >  
> >  void xive_reset(void *dev);
> > diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> > index 8f8bb8b787bd..a1cb87a07b76 100644
> > --- a/hw/intc/xive.c
> > +++ b/hw/intc/xive.c
> > @@ -312,6 +312,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
> >      XiveICSState *xs = ICS_XIVE(ics);
> >      Object *obj;
> >      Error *err = NULL;
> > +    XIVE *x;
> 
> I don't really like just 'x' for a context variable like this (as
> opposed to a temporary).
> 
> >  
> >      obj = object_property_get_link(OBJECT(xs), "xive", &err);
> >      if (!obj) {
> > @@ -319,7 +320,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
> >                     __func__, error_get_pretty(err));
> >          return;
> >      }
> > -    xs->xive = XIVE(obj);
> > +    x = xs->xive = XIVE(obj);
> >  
> >      if (!ics->nr_irqs) {
> >          error_setg(errp, "Number of interrupts needs to be greater 0");
> > @@ -338,6 +339,11 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
> >                            "xive.esb",
> >                            (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
> >  
> > +    /* Install the ESB memory region in the overall one */
> > +    memory_region_add_subregion(&x->esb_iomem,
> > +                                ICS_BASE(xs)->offset * (1 << xs->esb_shift),
> > +                                &xs->esb_iomem);
> > +
> >      qemu_register_reset(xive_ics_reset, xs);
> >  }
> >  
> > @@ -375,6 +381,32 @@ static const TypeInfo xive_ics_info = {
> >   */
> >  #define MAX_HW_IRQS_ENTRIES (8 * 1024)
> >  
> > +/* VC BAR contains set translations for the ESBs and the EQs. */
> > +#define VC_BAR_DEFAULT   0x10000000000ull
> > +#define VC_BAR_SIZE      0x08000000000ull
> > +
> > +#define P9_MMIO_BASE     0x006000000000000ull
> > +#define P9_CHIP_BASE(id) (P9_MMIO_BASE | (0x40000000000ull * (uint64_t) (id)))
> 
> chip-based MMIO addresses leaking into the PAPR model seems like it
> might not be what you want
> 
> > +static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
> > +{
> > +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
> > +                  __func__, offset, size);
> > +    return 0;
> > +}
> > +
> > +static void xive_esb_default_write(void *opaque, hwaddr offset, uint64_t value,
> > +                unsigned size)
> > +{
> > +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
> > +                  __func__, offset, value, size);
> > +}
> > +
> > +static const MemoryRegionOps xive_esb_default_ops = {
> > +    .read = xive_esb_default_read,
> > +    .write = xive_esb_default_write,
> > +    .endianness = DEVICE_BIG_ENDIAN,
> > +};
> >  
> >  void xive_reset(void *dev)
> >  {
> > @@ -435,10 +467,20 @@ static void xive_realize(DeviceState *dev, Error **errp)
> >      x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
> >                          sizeof(XiveEQ));
> >  
> > +    /* VC BAR. That's the full window but we will only map the
> > +     * subregions in use. */
> > +    x->vc_base = (hwaddr)(P9_CHIP_BASE(x->chip_id) | VC_BAR_DEFAULT);
> > +
> > +    /* install default memory region handlers to log bogus access */
> > +    memory_region_init_io(&x->esb_iomem, NULL, &xive_esb_default_ops,
> > +                          NULL, "xive.esb", VC_BAR_SIZE);
> > +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->esb_iomem);
> > +
> >      qemu_register_reset(xive_reset, dev);
> >  }
> >  
> >  static Property xive_properties[] = {
> > +    DEFINE_PROP_UINT32("chip-id", XIVE, chip_id, 0),
> >      DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> 
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the " Cédric Le Goater
@ 2017-07-24  6:35   ` David Gibson
  2017-07-24 14:44     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-24  6:35 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 8217 bytes --]

On Wed, Jul 05, 2017 at 07:13:27PM +0200, Cédric Le Goater wrote:
> The Thread Interrupt Management Area for the OS is mostly used to
> acknowledge interrupts and set the CPPR of the CPU.
> 
> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
> used to retrieve the targeted interrupt presenter object.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Am I right in thinking that this shoehorns the XIVE TIMA state into
the existing XICS ICP object.  That.. doesn't seem like a good idea.

> ---
>  hw/intc/xive-internal.h |   4 ++
>  hw/intc/xive.c          | 187 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 191 insertions(+)
> 
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> index ba5e648a5258..5e8b78a1ea6a 100644
> --- a/hw/intc/xive-internal.h
> +++ b/hw/intc/xive-internal.h
> @@ -200,6 +200,10 @@ struct XIVE {
>      /* ESB and TIMA memory location */
>      hwaddr       vc_base;
>      MemoryRegion esb_iomem;
> +
> +    uint32_t     tm_shift;
> +    hwaddr       tm_base;
> +    MemoryRegion tm_iomem;
>  };
>  
>  void xive_reset(void *dev);
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index c08a4f8efb58..82b2f0dcda0b 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -26,6 +26,180 @@
>  
>  #include "xive-internal.h"
>  
> +static uint8_t priority_to_ipb(uint8_t priority)
> +{
> +    return priority < XIVE_EQ_PRIORITY_COUNT ? 1 << (7 - priority) : 0;
> +}
> +
> +static uint64_t xive_icp_accept(XiveICPState *xicp)
> +{
> +    ICPState *icp = ICP(xicp);
> +    uint8_t nsr = xicp->tima_os[TM_NSR];
> +
> +    qemu_irq_lower(icp->output);
> +
> +    if (xicp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
> +        uint8_t cppr = xicp->tima_os[TM_PIPR];
> +
> +        xicp->tima_os[TM_CPPR] = cppr;
> +
> +        /* Reset the pending buffer bit */
> +        xicp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
> +
> +        /* Drop Exception bit for OS */
> +        xicp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
> +    }
> +
> +    return (nsr << 8) | xicp->tima_os[TM_CPPR];
> +}
> +
> +static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
> +{
> +    if (cppr > XIVE_PRIORITY_MAX) {
> +        cppr = 0xff;
> +    }
> +
> +    xicp->tima_os[TM_CPPR] = cppr;
> +}
> +
> +/*
> + * Thread Interrupt Management Area MMIO
> + */
> +static uint64_t xive_tm_read_special(XiveICPState *icp, hwaddr offset,
> +                                     unsigned size)
> +{
> +    uint64_t ret = -1;
> +
> +    if (offset == TM_SPC_ACK_OS_REG && size == 2) {
> +        ret = xive_icp_accept(icp);
> +    } else {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
> +                      HWADDR_PRIx" size %d\n", offset, size);
> +    }
> +
> +    return ret;
> +}
> +
> +static uint64_t xive_tm_read(void *opaque, hwaddr offset, unsigned size)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> +    XiveICPState *icp = XIVE_ICP(cpu->intc);
> +    uint64_t ret = -1;
> +    int i;
> +
> +    if (offset >= TM_SPC_ACK_EBB) {
> +        return xive_tm_read_special(icp, offset, size);
> +    }
> +
> +    if (offset & TM_QW1_OS) {
> +        switch (size) {
> +        case 1:
> +        case 2:
> +        case 4:
> +        case 8:
> +            if (QEMU_IS_ALIGNED(offset, size)) {
> +                ret = 0;
> +                for (i = 0; i < size; i++) {
> +                    ret |= icp->tima[offset + i] << (8 * i);
> +                }
> +            } else {
> +                qemu_log_mask(LOG_GUEST_ERROR,
> +                              "XIVE: invalid TIMA read alignment @%"
> +                              HWADDR_PRIx" size %d\n", offset, size);
> +            }
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +    } else {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
> +                      HWADDR_PRIx"\n", offset);
> +    }
> +
> +    return ret;
> +}
> +
> +static bool xive_tm_is_readonly(uint8_t index)
> +{
> +    /* Let's be optimistic and prepare ground for HV mode support */
> +    switch (index) {
> +    case TM_QW1_OS + TM_CPPR:
> +        return false;
> +    default:
> +        return true;
> +    }
> +}
> +
> +static void xive_tm_write_special(XiveICPState *xicp, hwaddr offset,
> +                                  uint64_t value, unsigned size)
> +{
> +    if (offset == TM_SPC_SET_OS_PENDING && size == 1) {
> +        xicp->tima_os[TM_IPB] |= priority_to_ipb(value & 0xff);
> +    } else {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
> +                      HWADDR_PRIx" size %d\n", offset, size);
> +    }
> +
> +    /* TODO: support TM_SPC_ACK_OS_EL */
> +}
> +
> +static void xive_tm_write(void *opaque, hwaddr offset,
> +                           uint64_t value, unsigned size)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> +    XiveICPState *icp = XIVE_ICP(cpu->intc);
> +    int i;
> +
> +    if (offset >= TM_SPC_ACK_EBB) {
> +        xive_tm_write_special(icp, offset, value, size);
> +        return;
> +    }
> +
> +    if (offset & TM_QW1_OS) {
> +        switch (size) {
> +        case 1:
> +            if (offset == TM_QW1_OS + TM_CPPR) {
> +                xive_icp_set_cppr(icp, value & 0xff);
> +            }
> +            break;
> +        case 4:
> +        case 8:
> +            if (QEMU_IS_ALIGNED(offset, size)) {
> +                for (i = 0; i < size; i++) {
> +                    if (!xive_tm_is_readonly(offset + i)) {
> +                        icp->tima[offset + i] = (value >> (8 * i)) & 0xff;
> +                    }
> +                }
> +            } else {
> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
> +                              HWADDR_PRIx" size %d\n", offset, size);
> +            }
> +            break;
> +        default:
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
> +                          HWADDR_PRIx" size %d\n", offset, size);
> +        }
> +    } else {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
> +                      HWADDR_PRIx"\n", offset);
> +    }
> +}
> +
> +
> +static const MemoryRegionOps xive_tm_ops = {
> +    .read = xive_tm_read,
> +    .write = xive_tm_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    },
> +};
> +
>  static void xive_icp_reset(ICPState *icp)
>  {
>      XiveICPState *xicp = XIVE_ICP(icp);
> @@ -453,6 +627,11 @@ static const TypeInfo xive_ics_info = {
>  #define P9_MMIO_BASE     0x006000000000000ull
>  #define P9_CHIP_BASE(id) (P9_MMIO_BASE | (0x40000000000ull * (uint64_t) (id)))
>  
> +/* Thread Interrupt Management Area MMIO */
> +#define TM_BAR_DEFAULT   0x30203180000ull
> +#define TM_SHIFT         16
> +#define TM_BAR_SIZE      (XIVE_TM_RING_COUNT * (1 << TM_SHIFT))
> +
>  static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
>  {
>      qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
> @@ -541,6 +720,14 @@ static void xive_realize(DeviceState *dev, Error **errp)
>                            NULL, "xive.esb", VC_BAR_SIZE);
>      sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->esb_iomem);
>  
> +    /* TM BAR. Same address for each chip */
> +    x->tm_base = (P9_MMIO_BASE | TM_BAR_DEFAULT);
> +    x->tm_shift = TM_SHIFT;
> +
> +    memory_region_init_io(&x->tm_iomem, OBJECT(x), &xive_tm_ops, x,
> +                          "xive.tm", TM_BAR_SIZE);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->tm_iomem);
> +
>      qemu_register_reset(xive_reset, dev);
>  }
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs
  2017-07-24  6:09     ` Benjamin Herrenschmidt
@ 2017-07-24  6:39       ` David Gibson
  2017-07-24 13:27         ` Cédric Le Goater
  2017-07-24 13:25       ` Cédric Le Goater
  1 sibling, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-24  6:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2423 bytes --]

On Mon, Jul 24, 2017 at 04:09:31PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2017-07-24 at 14:49 +1000, David Gibson wrote:
> > On Wed, Jul 05, 2017 at 07:13:22PM +0200, Cédric Le Goater wrote:
> > > Each source adds its own ESB mempry region to the overall ESB memory
> > > region of the controller. It will be mapped in the CPU address space
> > > when XIVE is activated.
> > > 
> > > The default mapping address for the ESB memory region is the same one
> > > used on baremetal.
> > > 
> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > > ---
> > >  hw/intc/xive-internal.h |  5 +++++
> > >  hw/intc/xive.c          | 44 +++++++++++++++++++++++++++++++++++++++++++-
> > >  2 files changed, 48 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> > > index 8e755aa88a14..c06be823aad0 100644
> > > --- a/hw/intc/xive-internal.h
> > > +++ b/hw/intc/xive-internal.h
> > > @@ -98,6 +98,7 @@ struct XIVE {
> > >      SysBusDevice parent;
> > >  
> > >      /* Properties */
> > > +    uint32_t     chip_id;
> > 
> > So there is a XIVE object per chip.  How does this work on PAPR?  One
> > logical chip/XIVE, or something more complex?
> 
> One global XIVE for PAPR. For the MMIOs, the way it works is that:
> 
>  - For MMIOs pertaining to a specific interrupt or queue, there's an H-
> call that will return the proper "guest physical" address. For qemu
> with KVM we'll have to probably create a single chunk of qemu address
> space (a single mem region) that contains individual pages mapped with
> MAP_FIXED originating from the different HW bits, we still need to sort
> out how exactly we'll do that in practice.
> 
>  - For the TIMA (the presentation MMIOs), those are always at the same
> physical address for everybody (so for a guest it's a single memory
> region we'll map to that physical address), the HW "knows" which HW
> thread is talking to it (and the hypervisor tells the HW which vcpu is
> running on a given HW thread at a given point in time). That address is
> obtained from the device-tree

Ok.  That leaves "chip_id" as a rather surprising thing to see in an
object which will appear on PAPR.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source Cédric Le Goater
  2017-07-24  4:29   ` David Gibson
@ 2017-07-24  6:50   ` Alexey Kardashevskiy
  2017-07-24 15:39     ` Cédric Le Goater
  1 sibling, 1 reply; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-24  6:50 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, Alexander Graf, qemu-devel

On 06/07/17 03:13, Cédric Le Goater wrote:
> Each interrupt source is associated with a 2-bit state machine called
> an Event State Buffer (ESB). It is controlled by MMIO to trigger
> events.
> 
> See code for more details on the states.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xive.c        | 230 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/xive.h |   3 +
>  2 files changed, 233 insertions(+)
> 
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 9ff14c0da595..816031b8ac81 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -32,6 +32,226 @@ static void xive_icp_irq(XiveICSState *xs, int lisn)
>  }
>  
>  /*
> + * "magic" Event State Buffer (ESB) MMIO offsets.
> + *
> + * Each interrupt source has a 2-bit state machine called ESB
> + * which can be controlled by MMIO. It's made of 2 bits, P and
> + * Q. P indicates that an interrupt is pending (has been sent
> + * to a queue and is waiting for an EOI). Q indicates that the
> + * interrupt has been triggered while pending.
> + *
> + * This acts as a coalescing mechanism in order to guarantee
> + * that a given interrupt only occurs at most once in a queue.
> + *
> + * When doing an EOI, the Q bit will indicate if the interrupt
> + * needs to be re-triggered.
> + *
> + * The following offsets into the ESB MMIO allow to read or
> + * manipulate the PQ bits. They must be used with an 8-bytes
> + * load instruction. They all return the previous state of the
> + * interrupt (atomically).
> + *
> + * Additionally, some ESB pages support doing an EOI via a
> + * store at 0 and some ESBs support doing a trigger via a
> + * separate trigger page.
> + */
> +#define XIVE_ESB_GET            0x800
> +#define XIVE_ESB_SET_PQ_00      0xc00
> +#define XIVE_ESB_SET_PQ_01      0xd00
> +#define XIVE_ESB_SET_PQ_10      0xe00
> +#define XIVE_ESB_SET_PQ_11      0xf00
> +
> +#define XIVE_ESB_VAL_P          0x2
> +#define XIVE_ESB_VAL_Q          0x1


These are not used. I'd suggest defining the states below using these two.


> +
> +#define XIVE_ESB_RESET          0x0
> +#define XIVE_ESB_PENDING        0x2
> +#define XIVE_ESB_QUEUED         0x3
> +#define XIVE_ESB_OFF            0x1
> +
> +static uint8_t xive_pq_get(XIVE *x, uint32_t lisn)
> +{
> +    uint32_t idx = lisn;
> +    uint32_t byte = idx / 4;
> +    uint32_t bit  = (idx % 4) * 2;
> +    uint8_t* pqs = (uint8_t *) x->sbe;
> +
> +    return (pqs[byte] >> bit) & 0x3;
> +}
> +
> +static void xive_pq_set(XIVE *x, uint32_t lisn, uint8_t pq)
> +{
> +    uint32_t idx = lisn;
> +    uint32_t byte = idx / 4;
> +    uint32_t bit  = (idx % 4) * 2;
> +    uint8_t* pqs = (uint8_t *) x->sbe;
> +
> +    pqs[byte] &= ~(0x3 << bit);
> +    pqs[byte] |= (pq & 0x3) << bit;
> +}
> +
> +static bool xive_pq_eoi(XIVE *x, uint32_t lisn)


Should not it return uint8_t as well (like xive_pq_get() does)? The value
than returned from .read() is uint64_t (a binary value).




-- 
Alexey

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-24  4:36   ` David Gibson
@ 2017-07-24  7:00     ` Benjamin Herrenschmidt
  2017-07-24  9:50       ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  7:00 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On Mon, 2017-07-24 at 14:36 +1000, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:21PM +0200, Cédric Le Goater wrote:
> > These flags define some characteristics of the source :
> > 
> >  - XIVE_SRC_H_INT_ESB  the Event State Buffer are controlled with a
> >                        specific hcall H_INT_ESB
> 
> What's the other option?

Direct MMIO access. Normally all interrupts use normal MMIOs,
each interrupts has an associated MMIO page with special MMIOs
to control the source state (PQ bits). This is something I added
to the PAPR spec (and the OPAL <-> Linux interface) to allow firmware
to work around broken HW (which happens on some P9 versions).

> >  - XIVE_SRC_LSI        LSI or MSI source
> 
> Hrm.  This definitely duplicates info that is in the XICS per irq
> state which you're re-using (and which you're using in the xive code
> at this point).

I think all those flags correspond to the flags passed via the PAPR
API, so it makes sense to have them there.

> >  - XIVE_SRC_TRIGGER    the full function page supports trigger
> >  - XIVE_SRC_STORE_EOI  EOI can with a store.
> > 
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  hw/intc/xive.c        | 1 +
> >  include/hw/ppc/xive.h | 9 +++++++++
> >  2 files changed, 10 insertions(+)
> > 
> > diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> > index 816031b8ac81..8f8bb8b787bd 100644
> > --- a/hw/intc/xive.c
> > +++ b/hw/intc/xive.c
> > @@ -345,6 +345,7 @@ static Property xive_ics_properties[] = {
> >      DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
> >      DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
> >      DEFINE_PROP_UINT32("shift", XiveICSState, esb_shift, 0),
> > +    DEFINE_PROP_UINT64("flags", XiveICSState, flags, 0),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> > index 5303d96f5f59..1178300c9df3 100644
> > --- a/include/hw/ppc/xive.h
> > +++ b/include/hw/ppc/xive.h
> > @@ -30,9 +30,18 @@ typedef struct XiveICSState XiveICSState;
> >  #define TYPE_ICS_XIVE "xive-source"
> >  #define ICS_XIVE(obj) OBJECT_CHECK(XiveICSState, (obj), TYPE_ICS_XIVE)
> >  
> > +/*
> > + * XIVE Interrupt source flags
> > + */
> > +#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
> > +#define XIVE_SRC_LSI           (1ull << (63 - 61))
> > +#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
> > +#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
> > +
> >  struct XiveICSState {
> >      ICSState parent_obj;
> >  
> > +    uint64_t     flags;
> >      uint32_t     esb_shift;
> >      MemoryRegion esb_iomem;
> >  
> 
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-24  5:38               ` David Gibson
@ 2017-07-24  7:20                 ` Benjamin Herrenschmidt
  2017-07-24 10:03                   ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  7:20 UTC (permalink / raw)
  To: David Gibson; +Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

On Mon, 2017-07-24 at 15:38 +1000, David Gibson wrote:
> 
> Can we assign our logical numbers sparsely, or will that cause other
> problems?

The main issue is that they probably needs to be the same between XICS
and XIVE because by the time we get the CAS call to chose between XICS
and XIVE, we have already handed out interrupts and constructed the DT,
no ? Unless we do a real CAS reboot...

Otherwise, there's no reason they can't be sparse no.

> Note that for PAPR we also have the question of finding logical
> interrupts for legacy PAPR VIO devices.

We just make them another range ? With KVM legacy today, I just use the
generic interrupt facility for those. So when you do the ioctl to
"trigger" one, I just do an MMIO to the corresponding page and the
interrupt magically shows up wherever the guest is running the target
vcpu. In fact, I'd like to add a way to mmap that page into qemu so
that qemu can triggers them without an ioctl.

The guest doesn't care, from the guest perspective they are interrupts
coming from the DT, so they are like PCI etc...

> > We can fix the number of "generic" interrupts given to a guest. The
> > only requirements from a PAPR perspective is that there should be at
> > least as many as there are possible threads in the guest so they can be
> > used as IPIs.
> 
> Ok.  If we can do things sparsely, allocating these well away from the
> hw interrupts would make things easier.
> 
> > But we may need more for other things. We can make this a machine
> > parameter with a default value of something like 4096. If we call N
> > that number of extra generic interrupts, then the number of generic
> > interrutps would be #possible-vcpu's + N, or something like that.
> 
> That seems reasonable.
> 
> > > > But it's fundamentally an allocator that sits in the hypervisor, so in
> > > > our case, I would say in the spapr "component" of XIVE, rather than the
> > > > XIVE HW model itself.
> > > 
> > > Maybe..
> > 
> > You are right in that a mapping is a better term than an allocator
> > here.
> > 
> > > > Now what Cedric did, because XIVE is very complex and we need something
> > > > for PAPR quickly, is not a complete HW model, but a somewhat simplified
> > > > one that only handles what PAPR exposes. So in that case where the
> > > > allocator sits is a bit of a TBD...
> > > 
> > > Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
> > > machine type level needs extreme caution, or the irqs may not be
> > > stable which will generally break migration.
> > 
> > Yes you are right. We should probably create a more "static" scheme.
> 
> Sounds like we're in violent agreement.

Yup :)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
  2017-07-19  3:55   ` Benjamin Herrenschmidt
@ 2017-07-24  7:28     ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24  7:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On 07/19/2017 05:55 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2017-07-19 at 13:00 +1000, David Gibson wrote:
>> So, this is probably obvious, but I'm not considering this a candidate
>> for qemu 2.10 (seeing as the soft freeze was yesterday).  I'll still
>> try to review and, once ready, queue for 2.11.
> 
> Right. I need to review still and we need to make sure we have the
> right plumbing for migration etc... and of course I need to do the
> KVM bits. So it's definitely not 2.10 material.

yes. This is not for 2.10 clearly. This is just an RFC to get
some feedback on the approach and on some ugly hacks I have 
put in place.

I have given KVM a quick look and it should be addressed before
we start merging anything. I think PowerNV should wait a bit. 

As for TCG, my branch supports reset, changing model XICS <-> XIVE, 
migration and CPU hotplug. KVM+kernel_irqchip=off is supported
also. Most of the issues have found a solution but now we need 
to discuss.   

I was out last week. Catching up.

Thanks,

C.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source
  2017-07-24  4:29   ` David Gibson
@ 2017-07-24  8:56     ` Benjamin Herrenschmidt
  2017-07-24 15:55     ` Cédric Le Goater
  1 sibling, 0 replies; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24  8:56 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On Mon, 2017-07-24 at 14:29 +1000, David Gibson wrote:
> > +    case XIVE_ESB_SET_PQ_00:
> > +    case XIVE_ESB_SET_PQ_01:
> > +    case XIVE_ESB_SET_PQ_10:
> > +    case XIVE_ESB_SET_PQ_11:
> > +        ret = xive_pq_get(x, lisn);
> > +        xive_pq_set(x, lisn, (offset >> 8) & 0x3);
> 
> Again I'd prefer xive_pq_set() return the old value itself, for more
> obvious atomicity.

Agreed.  That will also help with StoreEOI (store to 0x400 of the EOI
page) which does an EOI then re-sends an interrupt if the old value was
11 (while the load EOI doesn't resend).

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 17/26] ppc/xive: add hcalls support
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 17/26] ppc/xive: add hcalls support Cédric Le Goater
@ 2017-07-24  9:39   ` Alexey Kardashevskiy
  2017-07-24 14:55     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-24  9:39 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On 06/07/17 03:13, Cédric Le Goater wrote:
> A set of Hypervisor's call are used to configure the interrupt sources
> and the event/notification queues of the guest:
> 
>    H_INT_GET_SOURCE_INFO
>    H_INT_SET_SOURCE_CONFIG
>    H_INT_GET_SOURCE_CONFIG
>    H_INT_GET_QUEUE_INFO
>    H_INT_SET_QUEUE_CONFIG
>    H_INT_GET_QUEUE_CONFIG
>    H_INT_RESET
>    H_INT_ESB
> 
> Calls that still need to be addressed :
> 
>    H_INT_SET_OS_REPORTING_LINE
>    H_INT_GET_OS_REPORTING_LINE
>    H_INT_SYNC
> 
> See below for the documentation on each hcall.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  default-configs/ppc64-softmmu.mak |   1 +
>  hw/intc/Makefile.objs             |   1 +
>  hw/intc/xive_spapr.c              | 745 ++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h            |  17 +-
>  include/hw/ppc/xive.h             |   4 +
>  5 files changed, 767 insertions(+), 1 deletion(-)
>  create mode 100644 hw/intc/xive_spapr.c
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 1179c07e6e9f..3888168adf95 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -57,6 +57,7 @@ CONFIG_XICS=$(CONFIG_PSERIES)
>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
>  CONFIG_XIVE=$(CONFIG_PSERIES)
> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>  # For PReP
>  CONFIG_SERIAL_ISA=y
>  CONFIG_MC146818RTC=y
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index 28b83456bfcc..31b4fae2d1a8 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -36,6 +36,7 @@ obj-$(CONFIG_XICS) += xics.o
>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
>  obj-$(CONFIG_XIVE) += xive.o
> +obj-$(CONFIG_XIVE_SPAPR) += xive_spapr.o
>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> diff --git a/hw/intc/xive_spapr.c b/hw/intc/xive_spapr.c
> new file mode 100644
> index 000000000000..b634d1f28f10
> --- /dev/null
> +++ b/hw/intc/xive_spapr.c
> @@ -0,0 +1,745 @@
> +/*
> + * QEMU PowerPC XIVE model for pSeries
> + *
> + * Copyright (c) 2017, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qapi/error.h"
> +#include "cpu.h"
> +#include "hw/ppc/spapr.h"
> +#include "hw/ppc/xive.h"
> +#include "hw/ppc/fdt.h"
> +#include "monitor/monitor.h"
> +
> +#include "xive-internal.h"
> +
> +static XiveICSState *xive_ics_find(sPAPRMachineState *spapr, uint32_t lisn)
> +{
> +    XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(spapr);
> +    ICSState *ics = xic->ics_get(XICS_FABRIC(spapr), lisn);
> +
> +    return ICS_XIVE(ics);
> +}
> +
> +static bool priority_is_valid(int priority)
> +{
> +    return priority >= 0 && priority < 8;
> +}
> +
> +/*
> + * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
> + * real address of the MMIO page through which the Event State Buffer
> + * entry associated with the value of the "lisn" parameter is managed.
> + *
> + * Parameters:
> + * Input
> + * - "flags"
> + *       Bits 0-63 reserved
> + * - "lisn" is per "interrupts", "interrupt-map", or
> + *       "ibm,xive-lisn-ranges" properties, or as returned by the
> + *       ibm,query-interrupt-source-number RTAS call, or as returned
> + *       by the H_ALLOCATE_VAS_WINDOW hcall
> + *
> + * Output
> + * - R4: "flags"
> + *       Bits 0-59: Reserved
> + *       Bit 60: H_INT_ESB must be used for Event State Buffer
> + *               management
> + *       Bit 61: 1 == LSI  0 == MSI
> + *       Bit 62: the full function page supports trigger
> + *       Bit 63: Store EOI Supported
> + * - R5: Logical Real address of full function Event State Buffer
> + *       management page, -1 if ESB hcall flag is set to 1.
> + * - R6: Logical Real Address of trigger only Event State Buffer
> + *       management page or -1.
> + * - R7: Power of 2 page size for the ESB management pages returned in
> + *       R5 and R6.
> + */
> +static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
> +                                          sPAPRMachineState *spapr,
> +                                          target_ulong opcode,
> +                                          target_ulong *args)
> +{
> +    target_ulong flags  = args[0];
> +    target_ulong lisn   = args[1];
> +    XiveICSState *xs;
> +    uint32_t srcno;
> +    uint64_t mmio_base;
> +    ICSIRQState *irq;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    xs = xive_ics_find(spapr, lisn);
> +    if (!xs) {
> +        return H_P2;
> +    }
> +
> +    srcno = lisn - ICS_BASE(xs)->offset;
> +    mmio_base = (uint64_t)xs->esb_base + (1ull << xs->esb_shift) * srcno;
> +    irq = &ICS_BASE(xs)->irqs[srcno];
> +
> +    args[0] = 0;
> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
> +        args[0] |= XIVE_SRC_LSI;
> +    }
> +    if (xs->flags & XIVE_SRC_TRIGGER) {
> +        args[0] |= XIVE_SRC_TRIGGER;
> +    }
> +
> +    /* never used in QEMU  */
> +    if (xs->flags & XIVE_SRC_H_INT_ESB) {
> +        args[1] = -1;


args[2] in undefined here.


> +    } else {
> +        args[1] = mmio_base;
> +        if (xs->flags & XIVE_SRC_TRIGGER) {
> +            args[2] = -1; /* No specific trigger page */
> +        } else {
> +            args[2] = -1; /* TODO: support for specific trigger page */
> +        }
> +    }
> +
> +    args[3] = xs->esb_shift;
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
> + * Interrupt Source to a target. The Logical Interrupt Source is
> + * designated with the "lisn" parameter and the target is designated
> + * with the "target" and "priority" parameters.  Upon return from the
> + * hcall(), no additional interrupts will be directed to the old EQ.
> + * The old EQ should be investigated for interrupts that occurred
> + * prior to or during the hcall().
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-61: Reserved
> + *      Bit 62: set the "eisn" in the EA
> + *      Bit 63: masks the interrupt source in the hardware interrupt
> + *      control structure. An interrupt masked by this mechanism will
> + *      be dropped, but it's source state bits will still be
> + *      set. There is no race-free way of unmasking and restoring the
> + *      source. Thus this should only be used in interrupts that are
> + *      also masked at the source, and only in cases where the
> + *      interrupt is not meant to be used for a large amount of time
> + *      because no valid target exists for it for example
> + * - "lisn" is per "interrupts", "interrupt-map", or
> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> + *      ibm,query-interrupt-source-number RTAS call, or as returned by
> + *      the H_ALLOCATE_VAS_WINDOW hcall
> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> + *      "ibm,ppc-interrupt-gserver#s"
> + * - "priority" is a valid priority not in
> + *      "ibm,plat-res-int-priorities"
> + * - "eisn" is the guest EISN associated with the "lisn"
> + *
> + * Output:
> + * - None
> + */
> +
> +#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
> +#define XIVE_SRC_MASK     (1ull << (63 - 63))
> +
> +static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
> +                                            sPAPRMachineState *spapr,
> +                                            target_ulong opcode,
> +                                            target_ulong *args)
> +{
> +    XiveIVE *ive;
> +    uint64_t new_ive;
> +    target_ulong flags    = args[0];
> +    target_ulong lisn     = args[1];
> +    target_ulong target   = args[2];
> +    target_ulong priority = args[3];
> +    target_ulong eisn     = args[4];
> +    uint32_t eq_idx;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags & ~(XIVE_SRC_SET_EISN | XIVE_SRC_MASK)) {
> +        return H_PARAMETER;
> +    }
> +
> +    ive = xive_get_ive(spapr->xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID)) {
> +        return H_P2;
> +    }
> +    new_ive = ive->w;
> +
> +    /* Let's handle 0xff priority as if the interrupt was masked */
> +    if (priority == 0xff || (flags & XIVE_SRC_MASK)) {
> +        new_ive |= IVE_MASKED;
> +        priority = 7;
> +    } else {
> +        new_ive = ive->w & ~IVE_MASKED;
> +    }
> +
> +    if (!priority_is_valid(priority)) {
> +        return H_P4;
> +    }
> +
> +    /* First find the EQ corresponding to the target */
> +    if (!xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
> +        return H_P3;
> +    }
> +
> +    /* And update */
> +    new_ive = SETFIELD(IVE_EQ_BLOCK, new_ive, 0ul);
> +    new_ive = SETFIELD(IVE_EQ_INDEX, new_ive, eq_idx);
> +
> +    if (flags & XIVE_SRC_SET_EISN) {
> +        new_ive = SETFIELD(IVE_EQ_DATA, new_ive, eisn);
> +    }
> +
> +    ive->w = new_ive;
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine to which
> + * target/priority pair is assigned to the specified Logical Interrupt
> + * Source.
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-63 Reserved
> + * - "lisn" is per "interrupts", "interrupt-map", or
> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> + *      ibm,query-interrupt-source-number RTAS call, or as
> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
> + *
> + * Output:
> + * - R4: Target to which the specified Logical Interrupt Source is
> + *       assigned
> + * - R5: Priority to which the specified Logical Interrupt Source is
> + *       assigned
> + */
> +static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
> +                                            sPAPRMachineState *spapr,
> +                                            target_ulong opcode,
> +                                            target_ulong *args)
> +{
> +    target_ulong flags = args[0];
> +    target_ulong lisn = args[1];
> +    XiveIVE *ive;
> +    XiveEQ *eq;
> +    uint32_t eq_idx;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    ive = xive_get_ive(spapr->xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID)) {
> +        return H_P2;
> +    }
> +
> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
> +    eq = xive_get_eq(spapr->xive, eq_idx);
> +    if (!eq) {
> +        return H_P2;
> +    }
> +
> +    if (ive->w & IVE_MASKED) {
> +        args[1] = 0xff;
> +    } else {
> +        args[1] = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> +    }
> +
> +    args[0] = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);



R6 is missing but you added it in your github tree so never mind :)





-- 
Alexey

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-24  7:00     ` Benjamin Herrenschmidt
@ 2017-07-24  9:50       ` David Gibson
  2017-07-24 11:07         ` Benjamin Herrenschmidt
  2017-07-25  8:17         ` Cédric Le Goater
  0 siblings, 2 replies; 122+ messages in thread
From: David Gibson @ 2017-07-24  9:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1037 bytes --]

On Mon, Jul 24, 2017 at 05:00:57PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2017-07-24 at 14:36 +1000, David Gibson wrote:
> > On Wed, Jul 05, 2017 at 07:13:21PM +0200, Cédric Le Goater wrote:
> > > These flags define some characteristics of the source :
> > > 
> > >  - XIVE_SRC_H_INT_ESB  the Event State Buffer are controlled with a
> > >                        specific hcall H_INT_ESB
> > 
> > What's the other option?
> 
> Direct MMIO access. Normally all interrupts use normal MMIOs,
> each interrupts has an associated MMIO page with special MMIOs
> to control the source state (PQ bits). This is something I added
> to the PAPR spec (and the OPAL <-> Linux interface) to allow firmware
> to work around broken HW (which happens on some P9 versions).

Ok.. and that's something that can be decided at runtime?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-24  7:20                 ` Benjamin Herrenschmidt
@ 2017-07-24 10:03                   ` David Gibson
  2017-07-25  8:52                     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-24 10:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3336 bytes --]

On Mon, Jul 24, 2017 at 05:20:26PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2017-07-24 at 15:38 +1000, David Gibson wrote:
> > 
> > Can we assign our logical numbers sparsely, or will that cause other
> > problems?
> 
> The main issue is that they probably needs to be the same between XICS
> and XIVE because by the time we get the CAS call to chose between XICS
> and XIVE, we have already handed out interrupts and constructed the DT,
> no ? Unless we do a real CAS reboot...

A real CAS reboot probably isn't unreasonable for this case.

I definitely think we need to go one way or the other - either fully
unify the irq mapping between xics and xive, or fully separate them.

> Otherwise, there's no reason they can't be sparse no.
> 
> > Note that for PAPR we also have the question of finding logical
> > interrupts for legacy PAPR VIO devices.
> 
> We just make them another range ? With KVM legacy today, I just use the
> generic interrupt facility for those. So when you do the ioctl to
> "trigger" one, I just do an MMIO to the corresponding page and the
> interrupt magically shows up wherever the guest is running the target
> vcpu. In fact, I'd like to add a way to mmap that page into qemu so
> that qemu can triggers them without an ioctl.

Ok.

> The guest doesn't care, from the guest perspective they are interrupts
> coming from the DT, so they are like PCI etc...

Ok.

> > > We can fix the number of "generic" interrupts given to a guest. The
> > > only requirements from a PAPR perspective is that there should be at
> > > least as many as there are possible threads in the guest so they can be
> > > used as IPIs.
> > 
> > Ok.  If we can do things sparsely, allocating these well away from the
> > hw interrupts would make things easier.
> > 
> > > But we may need more for other things. We can make this a machine
> > > parameter with a default value of something like 4096. If we call N
> > > that number of extra generic interrupts, then the number of generic
> > > interrutps would be #possible-vcpu's + N, or something like that.
> > 
> > That seems reasonable.
> > 
> > > > > But it's fundamentally an allocator that sits in the hypervisor, so in
> > > > > our case, I would say in the spapr "component" of XIVE, rather than the
> > > > > XIVE HW model itself.
> > > > 
> > > > Maybe..
> > > 
> > > You are right in that a mapping is a better term than an allocator
> > > here.
> > > 
> > > > > Now what Cedric did, because XIVE is very complex and we need something
> > > > > for PAPR quickly, is not a complete HW model, but a somewhat simplified
> > > > > one that only handles what PAPR exposes. So in that case where the
> > > > > allocator sits is a bit of a TBD...
> > > > 
> > > > Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
> > > > machine type level needs extreme caution, or the irqs may not be
> > > > stable which will generally break migration.
> > > 
> > > Yes you are right. We should probably create a more "static" scheme.
> > 
> > Sounds like we're in violent agreement.
> 
> Yup :)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-24  9:50       ` David Gibson
@ 2017-07-24 11:07         ` Benjamin Herrenschmidt
  2017-07-24 11:47           ` Cédric Le Goater
  2017-07-25  4:18           ` David Gibson
  2017-07-25  8:17         ` Cédric Le Goater
  1 sibling, 2 replies; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-24 11:07 UTC (permalink / raw)
  To: David Gibson; +Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

On Mon, 2017-07-24 at 19:50 +1000, David Gibson wrote:
> On Mon, Jul 24, 2017 at 05:00:57PM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2017-07-24 at 14:36 +1000, David Gibson wrote:
> > > On Wed, Jul 05, 2017 at 07:13:21PM +0200, Cédric Le Goater wrote:
> > > > These flags define some characteristics of the source :
> > > > 
> > > >  - XIVE_SRC_H_INT_ESB  the Event State Buffer are controlled with a
> > > >                        specific hcall H_INT_ESB
> > > 
> > > What's the other option?
> > 
> > Direct MMIO access. Normally all interrupts use normal MMIOs,
> > each interrupts has an associated MMIO page with special MMIOs
> > to control the source state (PQ bits). This is something I added
> > to the PAPR spec (and the OPAL <-> Linux interface) to allow firmware
> > to work around broken HW (which happens on some P9 versions).
> 
> Ok.. and that's something that can be decided at runtime?

Well, at this point I think nothing will set that flag.... It's there
for workaround around HW bugs on some chips. At least in full emu it
shouldn't happen unless we try to emulate those bugs. Hopefully direct
MMIO will just work.

Ben.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-24 11:07         ` Benjamin Herrenschmidt
@ 2017-07-24 11:47           ` Cédric Le Goater
  2017-07-25  4:19             ` David Gibson
  2017-07-25  4:18           ` David Gibson
  1 sibling, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 11:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 01:07 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-07-24 at 19:50 +1000, David Gibson wrote:
>> On Mon, Jul 24, 2017 at 05:00:57PM +1000, Benjamin Herrenschmidt wrote:
>>> On Mon, 2017-07-24 at 14:36 +1000, David Gibson wrote:
>>>> On Wed, Jul 05, 2017 at 07:13:21PM +0200, Cédric Le Goater wrote:
>>>>> These flags define some characteristics of the source :
>>>>>
>>>>>  - XIVE_SRC_H_INT_ESB  the Event State Buffer are controlled with a
>>>>>                        specific hcall H_INT_ESB
>>>>
>>>> What's the other option?
>>>
>>> Direct MMIO access. Normally all interrupts use normal MMIOs,
>>> each interrupts has an associated MMIO page with special MMIOs
>>> to control the source state (PQ bits). This is something I added
>>> to the PAPR spec (and the OPAL <-> Linux interface) to allow firmware
>>> to work around broken HW (which happens on some P9 versions).
>>
>> Ok.. and that's something that can be decided at runtime?
> 
> Well, at this point I think nothing will set that flag.... It's there
> for workaround around HW bugs on some chips. At least in full emu it
> shouldn't happen unless we try to emulate those bugs. Hopefully direct
> MMIO will just work.

Nevertheless I have added support for the hcall in Linux and QEMU.
To use, I think we could create a specific source. 

C. 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 05/26] ppc/xive: define XIVE internal tables
  2017-07-19  3:24   ` David Gibson
@ 2017-07-24 12:52     ` Cédric Le Goater
  2017-07-25  2:16       ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 12:52 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/19/2017 05:24 AM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:18PM +0200, Cédric Le Goater wrote:
>> The XIVE interrupt controller of the POWER9 uses a set of tables to
>> redirect exception from event sources to CPU threads. Among which we
>> choose to model :
>>
>>  - the State Bit Entries (SBE), also known as Event State Buffer
>>    (ESB). This is a two bit state machine for each event source which
>>    is used to trigger events. The bits are named "P" (pending) and "Q"
>>    (queued) and can be controlled by MMIO.
>>
>>  - the Interrupt Virtualization Entry (IVE) table, also known as Event
>>    Assignment Structure (EAS). This table is indexed by the IRQ number
>>    and is looked up to find the Event Queue associated with a
>>    triggered event.
>>
>>  - the Event Queue Descriptor (EQD) table, also known as Event
>>    Notification Descriptor (END). The EQD contains fields that specify
>>    the Event Queue on which event data is posted (and later pulled by
>>    the OS) and also a target (or VPD) to notify.
>>
>> An additional table was not modeled but we might need to to support
>> the H_INT_SET_OS_REPORTING_LINE hcall:
>>
>>  - the Virtual Processor Descriptor (VPD) table, also known as
>>    Notification Virtual Target (NVT).
>>
>> The XIVE object is expanded with the tables described above. The size
>> of each table depends on the number of provisioned IRQ and the maximum
>> number of CPUs in the system. The indexing is very basic and might
>> need to be improved for the EQs.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xive-internal.h | 95 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  hw/intc/xive.c          | 72 +++++++++++++++++++++++++++++++++++++
>>  2 files changed, 167 insertions(+)
>>
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> index 155c2dcd6066..8e755aa88a14 100644
>> --- a/hw/intc/xive-internal.h
>> +++ b/hw/intc/xive-internal.h
>> @@ -11,6 +11,89 @@
>>  
>>  #include <hw/sysbus.h>
>>  
>> +/* Utilities to manipulate these (originaly from OPAL) */
>> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
>> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
>> +#define SETFIELD(m, v, val)                             \
>> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
>> +
>> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
>> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
>> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
>> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
>> +                                 PPC_BIT32(bs))
>> +
>> +/* IVE/EAS
>> + *
>> + * One per interrupt source. Targets that interrupt to a given EQ
>> + * and provides the corresponding logical interrupt number (EQ data)
>> + *
>> + * We also map this structure to the escalation descriptor inside
>> + * an EQ, though in that case the valid and masked bits are not used.
>> + */
>> +typedef struct XiveIVE {
>> +        /* Use a single 64-bit definition to make it easier to
>> +         * perform atomic updates
>> +         */
>> +        uint64_t        w;
>> +#define IVE_VALID       PPC_BIT(0)
>> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
>> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
>> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
>> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>> +} XiveIVE;
>> +
>> +/* EQ */
>> +typedef struct XiveEQ {
>> +        uint32_t        w0;
>> +#define EQ_W0_VALID             PPC_BIT32(0)
>> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
>> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
>> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
>> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
>> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
>> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
>> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
>> +#define EQ_W0_SW0               PPC_BIT32(16)
>> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
>> +#define EQ_QSIZE_4K             0
>> +#define EQ_QSIZE_64K            4
>> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
>> +        uint32_t        w1;
>> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
>> +#define EQ_W1_ESn_P             PPC_BIT32(0)
>> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
>> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
>> +#define EQ_W1_ESe_P             PPC_BIT32(2)
>> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
>> +#define EQ_W1_GENERATION        PPC_BIT32(9)
>> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
>> +        uint32_t        w2;
>> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
>> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
>> +        uint32_t        w3;
>> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
>> +        uint32_t        w4;
>> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
>> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
>> +        uint32_t        w5;
>> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
>> +        uint32_t        w6;
>> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
>> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
>> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
>> +        uint32_t        w7;
>> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
>> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
>> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
>> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
>> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
>> +} XiveEQ;
>> +
>> +#define XIVE_EQ_PRIORITY_COUNT 8
>> +#define XIVE_PRIORITY_MAX  (XIVE_EQ_PRIORITY_COUNT - 1)
>> +
>>  struct XIVE {
>>      SysBusDevice parent;
>>  
>> @@ -23,6 +106,18 @@ struct XIVE {
>>      uint32_t     int_max;       /* Max index */
>>      uint32_t     int_hw_bot;    /* Bottom index of HW IRQ allocator */
>>      uint32_t     int_ipi_top;   /* Highest IPI index handed out so far + 1 */
>> +
>> +    /* XIVE internal tables */
>> +    void         *sbe;
>> +    XiveIVE      *ivt;
>> +    XiveEQ       *eqdt;
>>  };
>>  
>> +void xive_reset(void *dev);
>> +XiveIVE *xive_get_ive(XIVE *x, uint32_t isn);
>> +XiveEQ *xive_get_eq(XIVE *x, uint32_t idx);
>> +
>> +bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t prio,
>> +                        uint32_t *out_eq_idx);
>> +
>>  #endif /* _INTC_XIVE_INTERNAL_H */
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 5b4ea915d87c..5b14d8155317 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -35,6 +35,27 @@
>>   */
>>  #define MAX_HW_IRQS_ENTRIES (8 * 1024)
>>  
>> +
>> +void xive_reset(void *dev)
>> +{
>> +    XIVE *x = XIVE(dev);
>> +    int i;
>> +
>> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
>> +    memset(x->sbe, 0x55, x->int_count / 4);
> 
> I think strictly this should be a DIV_ROUND_UP to handle the case of
> int_count not a multiple of 4.

ok. 
 
>> +
>> +    /* Clear and mask all valid IVEs */
>> +    for (i = x->int_base; i < x->int_max; i++) {
>> +        XiveIVE *ive = &x->ivt[i];
>> +        if (ive->w & IVE_VALID) {
>> +            ive->w = IVE_VALID | IVE_MASKED;
>> +        }
>> +    }
>> +
>> +    /* clear all EQs */
>> +    memset(x->eqdt, 0, x->nr_targets * XIVE_EQ_PRIORITY_COUNT * sizeof(XiveEQ));
>> +}
>> +
>>  static void xive_init(Object *obj)
>>  {
>>      ;
>> @@ -62,6 +83,19 @@ static void xive_realize(DeviceState *dev, Error **errp)
>>      if (x->int_ipi_top < 0x10) {
>>          x->int_ipi_top = 0x10;
>>      }
>> +
>> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>> +    x->sbe = g_malloc0(x->int_count / 4);
> 
> And here as well.

yes.

>> +
>> +    /* Allocate the IVT (Interrupt Virtualization Table) */
>> +    x->ivt = g_malloc0(x->int_count * sizeof(XiveIVE));
>> +
>> +    /* Allocate the EQDT (Event Queue Descriptor Table), 8 priorities
>> +     * for each thread in the system */
>> +    x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
>> +                        sizeof(XiveEQ));
>> +
>> +    qemu_register_reset(xive_reset, dev);
>>  }
>>  
>>  static Property xive_properties[] = {
>> @@ -92,3 +126,41 @@ static void xive_register_types(void)
>>  }
>>  
>>  type_init(xive_register_types)
>> +
>> +XiveIVE *xive_get_ive(XIVE *x, uint32_t lisn)
>> +{
>> +    uint32_t idx = lisn;
>> +
>> +    if (idx < x->int_base || idx >= x->int_max) {
>> +        return NULL;
>> +    }
>> +
>> +    return &x->ivt[idx];
> 
> Should be idx - int_base, no?

no, not in the allocator model I have chosen. The IRQ numbers 
are exposed to the guest with their offset. But this is another 
discussion which I would rather continue in another thread. 
 
>> +}
>> +
>> +XiveEQ *xive_get_eq(XIVE *x, uint32_t idx)
>> +{
>> +    if (idx >= x->nr_targets * XIVE_EQ_PRIORITY_COUNT) {
>> +        return NULL;
>> +    }
>> +
>> +    return &x->eqdt[idx];
>> +}
>> +
>> +/* TODO: improve EQ indexing. This is very simple and relies on the
>> + * fact that target (CPU) numbers start at 0 and are contiguous. It
>> + * should be OK for sPAPR.
>> + */
>> +bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t priority,
>> +                        uint32_t *out_eq_idx)
>> +{
>> +    if (priority > XIVE_PRIORITY_MAX || target >= x->nr_targets) {
>> +        return false;
>> +    }
>> +
>> +    if (out_eq_idx) {
>> +        *out_eq_idx = target + priority;
>> +    }
>> +
>> +    return true;
> 
> Seems a clunky interface.  Why not return a XiveEQ *, NULL if the
> inputs aren't valud.

Yes. This interface is inherited from OPAL and it's not consistent 
with the other xive_get_*() routines. But we are missing a XIVE 
internal table for VPs which explains the difference. I need to look 
at the support of the OS_REPORTING_LINE hcalls before simplifying.

Thanks,

C. 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-19  3:08   ` David Gibson
                       ` (2 preceding siblings ...)
  2017-07-19  4:02     ` Benjamin Herrenschmidt
@ 2017-07-24 13:00     ` Cédric Le Goater
  2017-07-25  1:26       ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
  3 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 13:00 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

>> +#include "qemu/osdep.h"
>> +#include "qemu/log.h"
>> +#include "qapi/error.h"
>> +#include "target/ppc/cpu.h"
>> +#include "sysemu/cpus.h"
>> +#include "sysemu/dma.h"
>> +#include "monitor/monitor.h"
>> +#include "hw/ppc/xive.h"
>> +
>> +#include "xive-internal.h"
>> +
>> +/*
>> + * Main XIVE object
> 
> As with XICs, does it really make sense for there to be a "main" XIVE
> object, or should be an interface attached to the machine?

yes. There are internal tables which are very specific to the controller 
and I don't think they belong to the machine.

C.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs
  2017-07-24  6:09     ` Benjamin Herrenschmidt
  2017-07-24  6:39       ` David Gibson
@ 2017-07-24 13:25       ` Cédric Le Goater
  2017-07-25  2:19         ` David Gibson
  1 sibling, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 13:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 08:09 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-07-24 at 14:49 +1000, David Gibson wrote:
>> On Wed, Jul 05, 2017 at 07:13:22PM +0200, Cédric Le Goater wrote:
>>> Each source adds its own ESB mempry region to the overall ESB memory
>>> region of the controller. It will be mapped in the CPU address space
>>> when XIVE is activated.
>>>
>>> The default mapping address for the ESB memory region is the same one
>>> used on baremetal.
>>>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>> ---
>>>  hw/intc/xive-internal.h |  5 +++++
>>>  hw/intc/xive.c          | 44 +++++++++++++++++++++++++++++++++++++++++++-
>>>  2 files changed, 48 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>>> index 8e755aa88a14..c06be823aad0 100644
>>> --- a/hw/intc/xive-internal.h
>>> +++ b/hw/intc/xive-internal.h
>>> @@ -98,6 +98,7 @@ struct XIVE {
>>>      SysBusDevice parent;
>>>  
>>>      /* Properties */
>>> +    uint32_t     chip_id;
>>
>> So there is a XIVE object per chip.  How does this work on PAPR?  One
>> logical chip/XIVE, or something more complex?
> 
> One global XIVE for PAPR. 

Yes. 

The chip-id is useless for sPAPR (0 is the default) but for a PowerNV
system, the address used to map the ESB memory region depends on the 
chip-id and  I thought we could reuse the same XIVE object. 

So, a sPAPR guest would use the address of a single chip baremetal 
system. This needs more explanation I agree. Thanks to Ben who is 
providing a lot. I will update the changelogs in the next version. 

The TIMA is mapped at a fixed address so the chip-id does not come 
in play.

> For the MMIOs, the way it works is that:
> 
>  - For MMIOs pertaining to a specific interrupt or queue, there's an H-
> call that will return the proper "guest physical" address. For qemu
> with KVM we'll have to probably create a single chunk of qemu address
> space (a single mem region) that contains individual pages mapped with
> MAP_FIXED originating from the different HW bits, we still need to sort
> out how exactly we'll do that in practice.

I haven't looked at all the KVM details. But, regarding the ESBs, I had
the above in mind and used a single memory region to contain them all. 
 
>  - For the TIMA (the presentation MMIOs), those are always at the same
> physical address for everybody (so for a guest it's a single memory
> region we'll map to that physical address), the HW "knows" which HW
> thread is talking to it (and the hypervisor tells the HW which vcpu is
> running on a given HW thread at a given point in time). That address is
> obtained from the device-tree
> 
>>>      uint32_t     nr_targets;
>>>  
>>>      /* IRQ number allocator */
>>> @@ -111,6 +112,10 @@ struct XIVE {
>>>      void         *sbe;
>>>      XiveIVE      *ivt;
>>>      XiveEQ       *eqdt;
>>> +
>>> +    /* ESB and TIMA memory location */
>>> +    hwaddr       vc_base;
>>> +    MemoryRegion esb_iomem;
>>>  };
>>>  
>>>  void xive_reset(void *dev);
>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>>> index 8f8bb8b787bd..a1cb87a07b76 100644
>>> --- a/hw/intc/xive.c
>>> +++ b/hw/intc/xive.c
>>> @@ -312,6 +312,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>>>      XiveICSState *xs = ICS_XIVE(ics);
>>>      Object *obj;
>>>      Error *err = NULL;
>>> +    XIVE *x;
>>
>> I don't really like just 'x' for a context variable like this (as
>> opposed to a temporary).

OK. I will change 'x' in 'xive' then.

>>>  
>>>      obj = object_property_get_link(OBJECT(xs), "xive", &err);
>>>      if (!obj) {
>>> @@ -319,7 +320,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>>>                     __func__, error_get_pretty(err));
>>>          return;
>>>      }
>>> -    xs->xive = XIVE(obj);
>>> +    x = xs->xive = XIVE(obj);
>>>  
>>>      if (!ics->nr_irqs) {
>>>          error_setg(errp, "Number of interrupts needs to be greater 0");
>>> @@ -338,6 +339,11 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>>>                            "xive.esb",
>>>                            (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
>>>  
>>> +    /* Install the ESB memory region in the overall one */
>>> +    memory_region_add_subregion(&x->esb_iomem,
>>> +                                ICS_BASE(xs)->offset * (1 << xs->esb_shift),
>>> +                                &xs->esb_iomem);
>>> +
>>>      qemu_register_reset(xive_ics_reset, xs);
>>>  }
>>>  
>>> @@ -375,6 +381,32 @@ static const TypeInfo xive_ics_info = {
>>>   */
>>>  #define MAX_HW_IRQS_ENTRIES (8 * 1024)
>>>  
>>> +/* VC BAR contains set translations for the ESBs and the EQs. */
>>> +#define VC_BAR_DEFAULT   0x10000000000ull
>>> +#define VC_BAR_SIZE      0x08000000000ull
>>> +
>>> +#define P9_MMIO_BASE     0x006000000000000ull
>>> +#define P9_CHIP_BASE(id) (P9_MMIO_BASE | (0x40000000000ull * (uint64_t) (id)))
>>
>> chip-based MMIO addresses leaking into the PAPR model seems like it
>> might not be what you want

See above for the reason.


Thanks,

C. 

>>
>>> +static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
>>> +{
>>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
>>> +                  __func__, offset, size);
>>> +    return 0;
>>> +}
>>> +
>>> +static void xive_esb_default_write(void *opaque, hwaddr offset, uint64_t value,
>>> +                unsigned size)
>>> +{
>>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
>>> +                  __func__, offset, value, size);
>>> +}
>>> +
>>> +static const MemoryRegionOps xive_esb_default_ops = {
>>> +    .read = xive_esb_default_read,
>>> +    .write = xive_esb_default_write,
>>> +    .endianness = DEVICE_BIG_ENDIAN,
>>> +};
>>>  
>>>  void xive_reset(void *dev)
>>>  {
>>> @@ -435,10 +467,20 @@ static void xive_realize(DeviceState *dev, Error **errp)
>>>      x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
>>>                          sizeof(XiveEQ));
>>>  
>>> +    /* VC BAR. That's the full window but we will only map the
>>> +     * subregions in use. */
>>> +    x->vc_base = (hwaddr)(P9_CHIP_BASE(x->chip_id) | VC_BAR_DEFAULT);
>>> +
>>> +    /* install default memory region handlers to log bogus access */
>>> +    memory_region_init_io(&x->esb_iomem, NULL, &xive_esb_default_ops,
>>> +                          NULL, "xive.esb", VC_BAR_SIZE);
>>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->esb_iomem);
>>> +
>>>      qemu_register_reset(xive_reset, dev);
>>>  }
>>>  
>>>  static Property xive_properties[] = {
>>> +    DEFINE_PROP_UINT32("chip-id", XIVE, chip_id, 0),
>>>      DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
>>>      DEFINE_PROP_END_OF_LIST(),
>>>  };
>>
>>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs
  2017-07-24  6:39       ` David Gibson
@ 2017-07-24 13:27         ` Cédric Le Goater
  2017-07-25  2:19           ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 13:27 UTC (permalink / raw)
  To: David Gibson, Benjamin Herrenschmidt; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 08:39 AM, David Gibson wrote:
> On Mon, Jul 24, 2017 at 04:09:31PM +1000, Benjamin Herrenschmidt wrote:
>> On Mon, 2017-07-24 at 14:49 +1000, David Gibson wrote:
>>> On Wed, Jul 05, 2017 at 07:13:22PM +0200, Cédric Le Goater wrote:
>>>> Each source adds its own ESB mempry region to the overall ESB memory
>>>> region of the controller. It will be mapped in the CPU address space
>>>> when XIVE is activated.
>>>>
>>>> The default mapping address for the ESB memory region is the same one
>>>> used on baremetal.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/intc/xive-internal.h |  5 +++++
>>>>  hw/intc/xive.c          | 44 +++++++++++++++++++++++++++++++++++++++++++-
>>>>  2 files changed, 48 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>>>> index 8e755aa88a14..c06be823aad0 100644
>>>> --- a/hw/intc/xive-internal.h
>>>> +++ b/hw/intc/xive-internal.h
>>>> @@ -98,6 +98,7 @@ struct XIVE {
>>>>      SysBusDevice parent;
>>>>  
>>>>      /* Properties */
>>>> +    uint32_t     chip_id;
>>>
>>> So there is a XIVE object per chip.  How does this work on PAPR?  One
>>> logical chip/XIVE, or something more complex?
>>
>> One global XIVE for PAPR. For the MMIOs, the way it works is that:
>>
>>  - For MMIOs pertaining to a specific interrupt or queue, there's an H-
>> call that will return the proper "guest physical" address. For qemu
>> with KVM we'll have to probably create a single chunk of qemu address
>> space (a single mem region) that contains individual pages mapped with
>> MAP_FIXED originating from the different HW bits, we still need to sort
>> out how exactly we'll do that in practice.
>>
>>  - For the TIMA (the presentation MMIOs), those are always at the same
>> physical address for everybody (so for a guest it's a single memory
>> region we'll map to that physical address), the HW "knows" which HW
>> thread is talking to it (and the hypervisor tells the HW which vcpu is
>> running on a given HW thread at a given point in time). That address is
>> obtained from the device-tree
> 
> Ok.  That leaves "chip_id" as a rather surprising thing to see in an
> object which will appear on PAPR.

We could also pass the address as a property instead of the chip-id when
creating the XIVE object. May be better for sPAPR.

C.  

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 10/26] ppc/xive: record interrupt source MMIO address for hcalls
  2017-07-24  5:11   ` David Gibson
@ 2017-07-24 13:45     ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 13:45 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 07:11 AM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:23PM +0200, Cédric Le Goater wrote:
>> The address of the MMIO page through which the Event State Buffer is
>> controlled is returned to the guest by the H_INT_GET_SOURCE_INFO hcall.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xive.c        | 3 +++
>>  include/hw/ppc/xive.h | 1 +
>>  2 files changed, 4 insertions(+)
>>
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index a1cb87a07b76..0db97fd33981 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -344,6 +344,9 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>>                                  ICS_BASE(xs)->offset * (1 << xs->esb_shift),
>>                                  &xs->esb_iomem);
>>  
>> +    /* Record base address which is needed by the hcalls */
>> +    xs->esb_base = x->vc_base + ICS_BASE(xs)->offset * (1 << xs->esb_shift);
> 
> This doesn't seem like it needs to be stored in the persistent object
> - it can be calculated when the hcall is made.  Plus if it's for the
> hcll it only makes sense for spapr.

yes. you are right. I will get rid of it.

Thanks,

C. 

>>      qemu_register_reset(xive_ics_reset, xs);
>>  }
>>  
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 1178300c9df3..b06bc861b845 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -43,6 +43,7 @@ struct XiveICSState {
>>  
>>      uint64_t     flags;
>>      uint32_t     esb_shift;
>> +    hwaddr       esb_base;
>>      MemoryRegion esb_iomem;
>>  
>>      XIVE         *xive;
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/26] ppc/xics: introduce a print_info() handler to the ICS and ICP objects
  2017-07-24  5:13   ` David Gibson
@ 2017-07-24 13:58     ` Cédric Le Goater
  2017-07-25 13:26       ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 13:58 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 07:13 AM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:24PM +0200, Cédric Le Goater wrote:
>> This handler will be used to customize the ouput of the XIVE interrupt
>> source and presenter objects.
> 
> I'm not really happy with this without having a clear idea of where
> this is heading - are you trying to share ICP and or ICS object
> classes between XICS and XIVE, or will they eventually be separated
> again?

Because of the XICSFabric interface of the sPAPR machine, we need 
to use ICPState and ICSState objects. 

sPAPR is strongly tied to ICPState and it is complex to introduce 
a new ICPState class for the sPAPR machine. We did introduce a new 
class in the past but that was for a new machine : PowerNV.
So I think we should just add a couple of attributes to ICPState
to support XIVE. That is not what the patchset does but I have
made progress since with hotplug and migration and came to that
conclusion. The consequence is that the print_info() handler is 
now obsolete for ICPs and we will need to find another way to 
customize the output.

For the interrupt source, the constraints are less strong, adding 
a new ICSState class seems like a good option and so does the 
print_info() handler.

Thanks,

C. 


>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xics.c        | 36 ++++++++++++++++++++++++------------
>>  include/hw/ppc/xics.h |  2 ++
>>  2 files changed, 26 insertions(+), 12 deletions(-)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index faa5c631f655..7837c2022b4a 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -40,18 +40,26 @@
>>  
>>  void icp_pic_print_info(ICPState *icp, Monitor *mon)
>>  {
>> +    ICPStateClass *k = ICP_GET_CLASS(icp);
>>      int cpu_index = icp->cs ? icp->cs->cpu_index : -1;
>>  
>>      if (!icp->output) {
>>          return;
>>      }
>> -    monitor_printf(mon, "CPU %d XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
>> -                   cpu_index, icp->xirr, icp->xirr_owner,
>> -                   icp->pending_priority, icp->mfrr);
>> +
>> +    monitor_printf(mon, "CPU %d ", cpu_index);
>> +    if (k->print_info) {
>> +        k->print_info(icp, mon);
>> +    } else {
>> +        monitor_printf(mon, "XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
>> +                       icp->xirr, icp->xirr_owner,
>> +                       icp->pending_priority, icp->mfrr);
>> +    }
>>  }
>>  
>>  void ics_pic_print_info(ICSState *ics, Monitor *mon)
>>  {
>> +    ICSStateClass *k = ICS_BASE_GET_CLASS(ics);
>>      uint32_t i;
>>  
>>      monitor_printf(mon, "ICS %4x..%4x %p\n",
>> @@ -61,17 +69,21 @@ void ics_pic_print_info(ICSState *ics, Monitor *mon)
>>          return;
>>      }
>>  
>> -    for (i = 0; i < ics->nr_irqs; i++) {
>> -        ICSIRQState *irq = ics->irqs + i;
>> +    if (k->print_info) {
>> +        k->print_info(ics, mon);
>> +    } else {
>> +        for (i = 0; i < ics->nr_irqs; i++) {
>> +            ICSIRQState *irq = ics->irqs + i;
>>  
>> -        if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
>> -            continue;
>> +            if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
>> +                continue;
>> +            }
>> +            monitor_printf(mon, "  %4x %s %02x %02x\n",
>> +                           ics->offset + i,
>> +                           (irq->flags & XICS_FLAGS_IRQ_LSI) ?
>> +                           "LSI" : "MSI",
>> +                           irq->priority, irq->status);
>>          }
>> -        monitor_printf(mon, "  %4x %s %02x %02x\n",
>> -                       ics->offset + i,
>> -                       (irq->flags & XICS_FLAGS_IRQ_LSI) ?
>> -                       "LSI" : "MSI",
>> -                       irq->priority, irq->status);
>>      }
>>  }
>>  
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index 28d248abad61..902f3bfd0e33 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -69,6 +69,7 @@ struct ICPStateClass {
>>      void (*pre_save)(ICPState *icp);
>>      int (*post_load)(ICPState *icp, int version_id);
>>      void (*reset)(ICPState *icp);
>> +    void (*print_info)(ICPState *icp, Monitor *mon);
>>  };
>>  
>>  struct ICPState {
>> @@ -119,6 +120,7 @@ struct ICSStateClass {
>>      void (*reject)(ICSState *s, uint32_t irq);
>>      void (*resend)(ICSState *s);
>>      void (*eoi)(ICSState *s, uint32_t irq);
>> +    void (*print_info)(ICSState *s, Monitor *mon);
>>  };
>>  
>>  struct ICSState {
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 13/26] ppc/xive: introduce a XIVE interrupt presenter model
  2017-07-24  6:05   ` David Gibson
@ 2017-07-24 14:02     ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 14:02 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 08:05 AM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:26PM +0200, Cédric Le Goater wrote:
>> Just like the interrupt source model, we try to reuse the ICP model
>> because the sPAPR machine is tied to the XICSFabric interface and
>> should be using a common framework to switch from one controller model
>> to another: XICS <-> XIVE.
>>
>> The XIVE interrupt presenter exposes a set of Thread Interrupt
>> Management Areas, also called rings, one per different level of
>> privilege (four in all). We only expose the OS ring for the sPAPR
>> support for the moment. This area is used to handle priority
>> management and interrupt acknowledgment among other things.
>>
>> The next patch will introduce the MMIO handlers to interact with the
>> TIMA, OS only.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> As with the ICS, I'm not really clear where you're going with this.
> Is this a first step towards independent xics and xive ICP objects, or
> a first step towards fully unified xics/xive ICPs?

As stated in a previous email, I think that this patch is the wrong 
approach. sPAPR is strongly tied to ICPState and it is complex to introduce 
a new ICPState class for the sPAPR machine. We should just add a couple 
of attributes to ICPState to support XIVE.

> 
>> ---
>>  hw/intc/xive-internal.h | 84 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  hw/intc/xive.c          | 43 +++++++++++++++++++++++++
>>  include/hw/ppc/xive.h   | 14 +++++++++
>>  3 files changed, 141 insertions(+)
>>
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> index c06be823aad0..ba5e648a5258 100644
>> --- a/hw/intc/xive-internal.h
>> +++ b/hw/intc/xive-internal.h
>> @@ -24,6 +24,90 @@
>>  #define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
>>                                   PPC_BIT32(bs))
>>  
>> +/*
>> + * Thread Management (aka "TM") registers
>> + */
>> +
>> +/* TM register offsets */
>> +#define TM_QW0_USER             0x000 /* All rings */
>> +#define TM_QW1_OS               0x010 /* Ring 0..2 */
>> +#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
>> +#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
>> +
>> +/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
>> +#define TM_NSR                  0x0  /*  +   +   -   +  */
>> +#define TM_CPPR                 0x1  /*  -   +   -   +  */
>> +#define TM_IPB                  0x2  /*  -   +   +   +  */
>> +#define TM_LSMFB                0x3  /*  -   +   +   +  */
>> +#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
>> +#define TM_INC                  0x5  /*  -   +   -   +  */
>> +#define TM_AGE                  0x6  /*  -   +   -   +  */
>> +#define TM_PIPR                 0x7  /*  -   +   -   +  */
>> +
>> +#define TM_WORD0                0x0
>> +#define TM_WORD1                0x4
>> +
>> +/*
>> + * QW word 2 contains the valid bit at the top and other fields
>> + * depending on the QW.
>> + */
>> +#define TM_WORD2                0x8
>> +#define   TM_QW0W2_VU           PPC_BIT32(0)
>> +#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
>> +#define   TM_QW1W2_VO           PPC_BIT32(0)
>> +#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
>> +#define   TM_QW2W2_VP           PPC_BIT32(0)
>> +#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
>> +#define   TM_QW3W2_VT           PPC_BIT32(0)
>> +#define   TM_QW3W2_LP           PPC_BIT32(6)
>> +#define   TM_QW3W2_LE           PPC_BIT32(7)
>> +#define   TM_QW3W2_T            PPC_BIT32(31)
>> +
>> +/*
>> + * In addition to normal loads to "peek" and writes (only when invalid)
>> + * using 4 and 8 bytes accesses, the above registers support these
>> + * "special" byte operations:
>> + *
>> + *   - Byte load from QW0[NSR] - User level NSR (EBB)
>> + *   - Byte store to QW0[NSR] - User level NSR (EBB)
>> + *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
>> + *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
>> + *                                    otherwise VT||0000000
>> + *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
>> + *
>> + * Then we have all these "special" CI ops at these offset that trigger
>> + * all sorts of side effects:
>> + */
>> +#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
>> +#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
>> +#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
>> +#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
>> +                                         * context */
>> +#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
>> +#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
>> +                                         * context to reg */
>> +#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
>> +                                         * context to reg*/
>> +#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
>> +#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
>> +                                         * line */
>> +#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
>> +#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
>> +                                         * line */
>> +#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
>> +/* XXX more... */
>> +
>> +/* NSR fields for the various QW ack types */
>> +#define TM_QW0_NSR_EB           PPC_BIT8(0)
>> +#define TM_QW1_NSR_EO           PPC_BIT8(0)
>> +#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
>> +#define  TM_QW3_NSR_HE_NONE     0
>> +#define  TM_QW3_NSR_HE_POOL     1
>> +#define  TM_QW3_NSR_HE_PHYS     2
>> +#define  TM_QW3_NSR_HE_LSI      3
>> +#define TM_QW3_NSR_I            PPC_BIT8(2)
>> +#define TM_QW3_NSR_GRP_LVL      PPC_BIT8(3, 7)
>> +
>>  /* IVE/EAS
>>   *
>>   * One per interrupt source. Targets that interrupt to a given EQ
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index db808e0cbe3d..c08a4f8efb58 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -26,6 +26,48 @@
>>  
>>  #include "xive-internal.h"
>>  
>> +static void xive_icp_reset(ICPState *icp)
>> +{
>> +    XiveICPState *xicp = XIVE_ICP(icp);
>> +
>> +    memset(xicp->tima, 0, sizeof(xicp->tima));
>> +}
>> +
>> +static void xive_icp_print_info(ICPState *icp, Monitor *mon)
>> +{
>> +    XiveICPState *xicp = XIVE_ICP(icp);
>> +
>> +    monitor_printf(mon, " CPPR=%02x IPB=%02x PIPR=%02x NSR=%02x\n",
>> +                   xicp->tima_os[TM_CPPR], xicp->tima_os[TM_IPB],
>> +                   xicp->tima_os[TM_PIPR], xicp->tima_os[TM_NSR]);
>> +}
>> +
>> +static void xive_icp_init(Object *obj)
>> +{
>> +    XiveICPState *xicp = XIVE_ICP(obj);
>> +
>> +    xicp->tima_os = &xicp->tima[TM_QW1_OS];
> 
> Storing an easily derivable pointer in your structure seems a bit
> pointless.

Well, it is a nice-to-have to simplify a bit the code.  

Thanks,

C.

>> +}
>> +
>> +static void xive_icp_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    ICPStateClass *icpc = ICP_CLASS(klass);
>> +
>> +    dc->desc = "PowerNV Xive ICP";
>> +    icpc->reset = xive_icp_reset;
>> +    icpc->print_info = xive_icp_print_info;
>> +}
>> +
>> +static const TypeInfo xive_icp_info = {
>> +    .name          = TYPE_XIVE_ICP,
>> +    .parent        = TYPE_ICP,
>> +    .instance_size = sizeof(XiveICPState),
>> +    .instance_init = xive_icp_init,
>> +    .class_init    = xive_icp_class_init,
>> +    .class_size    = sizeof(ICPStateClass),
>> +};
>> +
>>  static void xive_icp_irq(XiveICSState *xs, int lisn)
>>  {
>>  
>> @@ -529,6 +571,7 @@ static void xive_register_types(void)
>>  {
>>      type_register_static(&xive_info);
>>      type_register_static(&xive_ics_info);
>> +    type_register_static(&xive_icp_info);
>>  }
>>  
>>  type_init(xive_register_types)
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index b06bc861b845..f87df8107dd9 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -23,6 +23,7 @@
>>  
>>  typedef struct XIVE XIVE;
>>  typedef struct XiveICSState XiveICSState;
>> +typedef struct XiveICPState XiveICPState;
>>  
>>  #define TYPE_XIVE "xive"
>>  #define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
>> @@ -38,6 +39,9 @@ typedef struct XiveICSState XiveICSState;
>>  #define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
>>  #define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
>>  
>> +#define TYPE_XIVE_ICP "xive-icp"
>> +#define XIVE_ICP(obj) OBJECT_CHECK(XiveICPState, (obj), TYPE_XIVE_ICP)
>> +
>>  struct XiveICSState {
>>      ICSState parent_obj;
>>  
>> @@ -49,4 +53,14 @@ struct XiveICSState {
>>      XIVE         *xive;
>>  };
>>  
>> +/* Number of Thread Management Interrupt Areas */
>> +#define XIVE_TM_RING_COUNT 4
>> +
>> +struct XiveICPState {
>> +    ICPState parent_obj;
>> +
>> +    uint8_t tima[XIVE_TM_RING_COUNT * 0x10];
>> +    uint8_t *tima_os;
>> +};
>> +
>>  #endif /* PPC_XIVE_H */
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  2017-07-24  6:35   ` David Gibson
@ 2017-07-24 14:44     ` Cédric Le Goater
  2017-07-25  4:20       ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 14:44 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 08:35 AM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:27PM +0200, Cédric Le Goater wrote:
>> The Thread Interrupt Management Area for the OS is mostly used to
>> acknowledge interrupts and set the CPPR of the CPU.
>>
>> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
>> used to retrieve the targeted interrupt presenter object.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> Am I right in thinking that this shoehorns the XIVE TIMA state into
> the existing XICS ICP object.  That.. doesn't seem like a good idea.

The TIMA memory region is under the XIVE object because it is 
unique for the system. The lookup of the ICP is simply done using 
'current_cpu'. The TIMA state is under the ICPState, yes, but this 
model does not seem incorrect to me as this state contains the 
interrupt information presented to a CPU.   

Thanks,

C.

>> ---
>>  hw/intc/xive-internal.h |   4 ++
>>  hw/intc/xive.c          | 187 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 191 insertions(+)
>>
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> index ba5e648a5258..5e8b78a1ea6a 100644
>> --- a/hw/intc/xive-internal.h
>> +++ b/hw/intc/xive-internal.h
>> @@ -200,6 +200,10 @@ struct XIVE {
>>      /* ESB and TIMA memory location */
>>      hwaddr       vc_base;
>>      MemoryRegion esb_iomem;
>> +
>> +    uint32_t     tm_shift;
>> +    hwaddr       tm_base;
>> +    MemoryRegion tm_iomem;
>>  };
>>  
>>  void xive_reset(void *dev);
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index c08a4f8efb58..82b2f0dcda0b 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -26,6 +26,180 @@
>>  
>>  #include "xive-internal.h"
>>  
>> +static uint8_t priority_to_ipb(uint8_t priority)
>> +{
>> +    return priority < XIVE_EQ_PRIORITY_COUNT ? 1 << (7 - priority) : 0;
>> +}
>> +
>> +static uint64_t xive_icp_accept(XiveICPState *xicp)
>> +{
>> +    ICPState *icp = ICP(xicp);
>> +    uint8_t nsr = xicp->tima_os[TM_NSR];
>> +
>> +    qemu_irq_lower(icp->output);
>> +
>> +    if (xicp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
>> +        uint8_t cppr = xicp->tima_os[TM_PIPR];
>> +
>> +        xicp->tima_os[TM_CPPR] = cppr;
>> +
>> +        /* Reset the pending buffer bit */
>> +        xicp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
>> +
>> +        /* Drop Exception bit for OS */
>> +        xicp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
>> +    }
>> +
>> +    return (nsr << 8) | xicp->tima_os[TM_CPPR];
>> +}
>> +
>> +static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
>> +{
>> +    if (cppr > XIVE_PRIORITY_MAX) {
>> +        cppr = 0xff;
>> +    }
>> +
>> +    xicp->tima_os[TM_CPPR] = cppr;
>> +}
>> +
>> +/*
>> + * Thread Interrupt Management Area MMIO
>> + */
>> +static uint64_t xive_tm_read_special(XiveICPState *icp, hwaddr offset,
>> +                                     unsigned size)
>> +{
>> +    uint64_t ret = -1;
>> +
>> +    if (offset == TM_SPC_ACK_OS_REG && size == 2) {
>> +        ret = xive_icp_accept(icp);
>> +    } else {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
>> +                      HWADDR_PRIx" size %d\n", offset, size);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static uint64_t xive_tm_read(void *opaque, hwaddr offset, unsigned size)
>> +{
>> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
>> +    XiveICPState *icp = XIVE_ICP(cpu->intc);
>> +    uint64_t ret = -1;
>> +    int i;
>> +
>> +    if (offset >= TM_SPC_ACK_EBB) {
>> +        return xive_tm_read_special(icp, offset, size);
>> +    }
>> +
>> +    if (offset & TM_QW1_OS) {
>> +        switch (size) {
>> +        case 1:
>> +        case 2:
>> +        case 4:
>> +        case 8:
>> +            if (QEMU_IS_ALIGNED(offset, size)) {
>> +                ret = 0;
>> +                for (i = 0; i < size; i++) {
>> +                    ret |= icp->tima[offset + i] << (8 * i);
>> +                }
>> +            } else {
>> +                qemu_log_mask(LOG_GUEST_ERROR,
>> +                              "XIVE: invalid TIMA read alignment @%"
>> +                              HWADDR_PRIx" size %d\n", offset, size);
>> +            }
>> +            break;
>> +        default:
>> +            g_assert_not_reached();
>> +        }
>> +    } else {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
>> +                      HWADDR_PRIx"\n", offset);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static bool xive_tm_is_readonly(uint8_t index)
>> +{
>> +    /* Let's be optimistic and prepare ground for HV mode support */
>> +    switch (index) {
>> +    case TM_QW1_OS + TM_CPPR:
>> +        return false;
>> +    default:
>> +        return true;
>> +    }
>> +}
>> +
>> +static void xive_tm_write_special(XiveICPState *xicp, hwaddr offset,
>> +                                  uint64_t value, unsigned size)
>> +{
>> +    if (offset == TM_SPC_SET_OS_PENDING && size == 1) {
>> +        xicp->tima_os[TM_IPB] |= priority_to_ipb(value & 0xff);
>> +    } else {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
>> +                      HWADDR_PRIx" size %d\n", offset, size);
>> +    }
>> +
>> +    /* TODO: support TM_SPC_ACK_OS_EL */
>> +}
>> +
>> +static void xive_tm_write(void *opaque, hwaddr offset,
>> +                           uint64_t value, unsigned size)
>> +{
>> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
>> +    XiveICPState *icp = XIVE_ICP(cpu->intc);
>> +    int i;
>> +
>> +    if (offset >= TM_SPC_ACK_EBB) {
>> +        xive_tm_write_special(icp, offset, value, size);
>> +        return;
>> +    }
>> +
>> +    if (offset & TM_QW1_OS) {
>> +        switch (size) {
>> +        case 1:
>> +            if (offset == TM_QW1_OS + TM_CPPR) {
>> +                xive_icp_set_cppr(icp, value & 0xff);
>> +            }
>> +            break;
>> +        case 4:
>> +        case 8:
>> +            if (QEMU_IS_ALIGNED(offset, size)) {
>> +                for (i = 0; i < size; i++) {
>> +                    if (!xive_tm_is_readonly(offset + i)) {
>> +                        icp->tima[offset + i] = (value >> (8 * i)) & 0xff;
>> +                    }
>> +                }
>> +            } else {
>> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
>> +                              HWADDR_PRIx" size %d\n", offset, size);
>> +            }
>> +            break;
>> +        default:
>> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
>> +                          HWADDR_PRIx" size %d\n", offset, size);
>> +        }
>> +    } else {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
>> +                      HWADDR_PRIx"\n", offset);
>> +    }
>> +}
>> +
>> +
>> +static const MemoryRegionOps xive_tm_ops = {
>> +    .read = xive_tm_read,
>> +    .write = xive_tm_write,
>> +    .endianness = DEVICE_BIG_ENDIAN,
>> +    .valid = {
>> +        .min_access_size = 1,
>> +        .max_access_size = 8,
>> +    },
>> +    .impl = {
>> +        .min_access_size = 1,
>> +        .max_access_size = 8,
>> +    },
>> +};
>> +
>>  static void xive_icp_reset(ICPState *icp)
>>  {
>>      XiveICPState *xicp = XIVE_ICP(icp);
>> @@ -453,6 +627,11 @@ static const TypeInfo xive_ics_info = {
>>  #define P9_MMIO_BASE     0x006000000000000ull
>>  #define P9_CHIP_BASE(id) (P9_MMIO_BASE | (0x40000000000ull * (uint64_t) (id)))
>>  
>> +/* Thread Interrupt Management Area MMIO */
>> +#define TM_BAR_DEFAULT   0x30203180000ull
>> +#define TM_SHIFT         16
>> +#define TM_BAR_SIZE      (XIVE_TM_RING_COUNT * (1 << TM_SHIFT))
>> +
>>  static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
>>  {
>>      qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
>> @@ -541,6 +720,14 @@ static void xive_realize(DeviceState *dev, Error **errp)
>>                            NULL, "xive.esb", VC_BAR_SIZE);
>>      sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->esb_iomem);
>>  
>> +    /* TM BAR. Same address for each chip */
>> +    x->tm_base = (P9_MMIO_BASE | TM_BAR_DEFAULT);
>> +    x->tm_shift = TM_SHIFT;
>> +
>> +    memory_region_init_io(&x->tm_iomem, OBJECT(x), &xive_tm_ops, x,
>> +                          "xive.tm", TM_BAR_SIZE);
>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->tm_iomem);
>> +
>>      qemu_register_reset(xive_reset, dev);
>>  }
>>  
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 17/26] ppc/xive: add hcalls support
  2017-07-24  9:39   ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
@ 2017-07-24 14:55     ` Cédric Le Goater
  2017-07-25  2:09       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 14:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Gibson; +Cc: qemu-ppc, qemu-devel

On 07/24/2017 11:39 AM, Alexey Kardashevskiy wrote:
> On 06/07/17 03:13, Cédric Le Goater wrote:
>> A set of Hypervisor's call are used to configure the interrupt sources
>> and the event/notification queues of the guest:
>>
>>    H_INT_GET_SOURCE_INFO
>>    H_INT_SET_SOURCE_CONFIG
>>    H_INT_GET_SOURCE_CONFIG
>>    H_INT_GET_QUEUE_INFO
>>    H_INT_SET_QUEUE_CONFIG
>>    H_INT_GET_QUEUE_CONFIG
>>    H_INT_RESET
>>    H_INT_ESB
>>
>> Calls that still need to be addressed :
>>
>>    H_INT_SET_OS_REPORTING_LINE
>>    H_INT_GET_OS_REPORTING_LINE
>>    H_INT_SYNC
>>
>> See below for the documentation on each hcall.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  default-configs/ppc64-softmmu.mak |   1 +
>>  hw/intc/Makefile.objs             |   1 +
>>  hw/intc/xive_spapr.c              | 745 ++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h            |  17 +-
>>  include/hw/ppc/xive.h             |   4 +
>>  5 files changed, 767 insertions(+), 1 deletion(-)
>>  create mode 100644 hw/intc/xive_spapr.c
>>
>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>> index 1179c07e6e9f..3888168adf95 100644
>> --- a/default-configs/ppc64-softmmu.mak
>> +++ b/default-configs/ppc64-softmmu.mak
>> @@ -57,6 +57,7 @@ CONFIG_XICS=$(CONFIG_PSERIES)
>>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>>  CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
>>  CONFIG_XIVE=$(CONFIG_PSERIES)
>> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>>  # For PReP
>>  CONFIG_SERIAL_ISA=y
>>  CONFIG_MC146818RTC=y
>> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
>> index 28b83456bfcc..31b4fae2d1a8 100644
>> --- a/hw/intc/Makefile.objs
>> +++ b/hw/intc/Makefile.objs
>> @@ -36,6 +36,7 @@ obj-$(CONFIG_XICS) += xics.o
>>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
>>  obj-$(CONFIG_XIVE) += xive.o
>> +obj-$(CONFIG_XIVE_SPAPR) += xive_spapr.o
>>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
>> diff --git a/hw/intc/xive_spapr.c b/hw/intc/xive_spapr.c
>> new file mode 100644
>> index 000000000000..b634d1f28f10
>> --- /dev/null
>> +++ b/hw/intc/xive_spapr.c
>> @@ -0,0 +1,745 @@
>> +/*
>> + * QEMU PowerPC XIVE model for pSeries
>> + *
>> + * Copyright (c) 2017, IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +#include "qemu/osdep.h"
>> +#include "qemu/log.h"
>> +#include "qapi/error.h"
>> +#include "cpu.h"
>> +#include "hw/ppc/spapr.h"
>> +#include "hw/ppc/xive.h"
>> +#include "hw/ppc/fdt.h"
>> +#include "monitor/monitor.h"
>> +
>> +#include "xive-internal.h"
>> +
>> +static XiveICSState *xive_ics_find(sPAPRMachineState *spapr, uint32_t lisn)
>> +{
>> +    XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(spapr);
>> +    ICSState *ics = xic->ics_get(XICS_FABRIC(spapr), lisn);
>> +
>> +    return ICS_XIVE(ics);
>> +}
>> +
>> +static bool priority_is_valid(int priority)
>> +{
>> +    return priority >= 0 && priority < 8;
>> +}
>> +
>> +/*
>> + * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
>> + * real address of the MMIO page through which the Event State Buffer
>> + * entry associated with the value of the "lisn" parameter is managed.
>> + *
>> + * Parameters:
>> + * Input
>> + * - "flags"
>> + *       Bits 0-63 reserved
>> + * - "lisn" is per "interrupts", "interrupt-map", or
>> + *       "ibm,xive-lisn-ranges" properties, or as returned by the
>> + *       ibm,query-interrupt-source-number RTAS call, or as returned
>> + *       by the H_ALLOCATE_VAS_WINDOW hcall
>> + *
>> + * Output
>> + * - R4: "flags"
>> + *       Bits 0-59: Reserved
>> + *       Bit 60: H_INT_ESB must be used for Event State Buffer
>> + *               management
>> + *       Bit 61: 1 == LSI  0 == MSI
>> + *       Bit 62: the full function page supports trigger
>> + *       Bit 63: Store EOI Supported
>> + * - R5: Logical Real address of full function Event State Buffer
>> + *       management page, -1 if ESB hcall flag is set to 1.
>> + * - R6: Logical Real Address of trigger only Event State Buffer
>> + *       management page or -1.
>> + * - R7: Power of 2 page size for the ESB management pages returned in
>> + *       R5 and R6.
>> + */
>> +static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
>> +                                          sPAPRMachineState *spapr,
>> +                                          target_ulong opcode,
>> +                                          target_ulong *args)
>> +{
>> +    target_ulong flags  = args[0];
>> +    target_ulong lisn   = args[1];
>> +    XiveICSState *xs;
>> +    uint32_t srcno;
>> +    uint64_t mmio_base;
>> +    ICSIRQState *irq;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    xs = xive_ics_find(spapr, lisn);
>> +    if (!xs) {
>> +        return H_P2;
>> +    }
>> +
>> +    srcno = lisn - ICS_BASE(xs)->offset;
>> +    mmio_base = (uint64_t)xs->esb_base + (1ull << xs->esb_shift) * srcno;
>> +    irq = &ICS_BASE(xs)->irqs[srcno];
>> +
>> +    args[0] = 0;
>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
>> +        args[0] |= XIVE_SRC_LSI;
>> +    }
>> +    if (xs->flags & XIVE_SRC_TRIGGER) {
>> +        args[0] |= XIVE_SRC_TRIGGER;
>> +    }
>> +
>> +    /* never used in QEMU  */
>> +    if (xs->flags & XIVE_SRC_H_INT_ESB) {
>> +        args[1] = -1;
> 
> 
> args[2] in undefined here.

ah, yes indeed. I will fix that.
> 
> 
>> +    } else {
>> +        args[1] = mmio_base;
>> +        if (xs->flags & XIVE_SRC_TRIGGER) {
>> +            args[2] = -1; /* No specific trigger page */
>> +        } else {
>> +            args[2] = -1; /* TODO: support for specific trigger page */
>> +        }
>> +    }
>> +
>> +    args[3] = xs->esb_shift;
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
>> + * Interrupt Source to a target. The Logical Interrupt Source is
>> + * designated with the "lisn" parameter and the target is designated
>> + * with the "target" and "priority" parameters.  Upon return from the
>> + * hcall(), no additional interrupts will be directed to the old EQ.
>> + * The old EQ should be investigated for interrupts that occurred
>> + * prior to or during the hcall().
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-61: Reserved
>> + *      Bit 62: set the "eisn" in the EA
>> + *      Bit 63: masks the interrupt source in the hardware interrupt
>> + *      control structure. An interrupt masked by this mechanism will
>> + *      be dropped, but it's source state bits will still be
>> + *      set. There is no race-free way of unmasking and restoring the
>> + *      source. Thus this should only be used in interrupts that are
>> + *      also masked at the source, and only in cases where the
>> + *      interrupt is not meant to be used for a large amount of time
>> + *      because no valid target exists for it for example
>> + * - "lisn" is per "interrupts", "interrupt-map", or
>> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
>> + *      ibm,query-interrupt-source-number RTAS call, or as returned by
>> + *      the H_ALLOCATE_VAS_WINDOW hcall
>> + * - "target" is per "ibm,ppc-interrupt-server#s" or
>> + *      "ibm,ppc-interrupt-gserver#s"
>> + * - "priority" is a valid priority not in
>> + *      "ibm,plat-res-int-priorities"
>> + * - "eisn" is the guest EISN associated with the "lisn"
>> + *
>> + * Output:
>> + * - None
>> + */
>> +
>> +#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
>> +#define XIVE_SRC_MASK     (1ull << (63 - 63))
>> +
>> +static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
>> +                                            sPAPRMachineState *spapr,
>> +                                            target_ulong opcode,
>> +                                            target_ulong *args)
>> +{
>> +    XiveIVE *ive;
>> +    uint64_t new_ive;
>> +    target_ulong flags    = args[0];
>> +    target_ulong lisn     = args[1];
>> +    target_ulong target   = args[2];
>> +    target_ulong priority = args[3];
>> +    target_ulong eisn     = args[4];
>> +    uint32_t eq_idx;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags & ~(XIVE_SRC_SET_EISN | XIVE_SRC_MASK)) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    ive = xive_get_ive(spapr->xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID)) {
>> +        return H_P2;
>> +    }
>> +    new_ive = ive->w;
>> +
>> +    /* Let's handle 0xff priority as if the interrupt was masked */
>> +    if (priority == 0xff || (flags & XIVE_SRC_MASK)) {
>> +        new_ive |= IVE_MASKED;
>> +        priority = 7;
>> +    } else {
>> +        new_ive = ive->w & ~IVE_MASKED;
>> +    }
>> +
>> +    if (!priority_is_valid(priority)) {
>> +        return H_P4;
>> +    }
>> +
>> +    /* First find the EQ corresponding to the target */
>> +    if (!xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
>> +        return H_P3;
>> +    }
>> +
>> +    /* And update */
>> +    new_ive = SETFIELD(IVE_EQ_BLOCK, new_ive, 0ul);
>> +    new_ive = SETFIELD(IVE_EQ_INDEX, new_ive, eq_idx);
>> +
>> +    if (flags & XIVE_SRC_SET_EISN) {
>> +        new_ive = SETFIELD(IVE_EQ_DATA, new_ive, eisn);
>> +    }
>> +
>> +    ive->w = new_ive;
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine to which
>> + * target/priority pair is assigned to the specified Logical Interrupt
>> + * Source.
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-63 Reserved
>> + * - "lisn" is per "interrupts", "interrupt-map", or
>> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
>> + *      ibm,query-interrupt-source-number RTAS call, or as
>> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
>> + *
>> + * Output:
>> + * - R4: Target to which the specified Logical Interrupt Source is
>> + *       assigned
>> + * - R5: Priority to which the specified Logical Interrupt Source is
>> + *       assigned
>> + */
>> +static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
>> +                                            sPAPRMachineState *spapr,
>> +                                            target_ulong opcode,
>> +                                            target_ulong *args)
>> +{
>> +    target_ulong flags = args[0];
>> +    target_ulong lisn = args[1];
>> +    XiveIVE *ive;
>> +    XiveEQ *eq;
>> +    uint32_t eq_idx;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    ive = xive_get_ive(spapr->xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID)) {
>> +        return H_P2;
>> +    }
>> +
>> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
>> +    eq = xive_get_eq(spapr->xive, eq_idx);
>> +    if (!eq) {
>> +        return H_P2;
>> +    }
>> +
>> +    if (ive->w & IVE_MASKED) {
>> +        args[1] = 0xff;
>> +    } else {
>> +        args[1] = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
>> +    }
>> +
>> +    args[0] = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
> 
> 
> 
> R6 is missing but you added it in your github tree so never mind :)
> 

Yes. I have updated the hcalls in my github tree with some fixes and 
also some small recent changes in the specs.

Thanks,

C.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model
  2017-07-24  4:02   ` David Gibson
  2017-07-24  6:00     ` Alexey Kardashevskiy
@ 2017-07-24 15:13     ` Cédric Le Goater
  1 sibling, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 15:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 06:02 AM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:19PM +0200, Cédric Le Goater wrote:
>> This is very similar to the current ICS_SIMPLE model in XICS. We try
>> to reuse the ICS model because the sPAPR machine is tied to the
>> XICSFabric interface and should be using a common framework to switch
>> from one controller model to another: XICS <-> XIVE.
> 
> Hm.  I'm not entirely concvinced re-using the xics ICSState class in
> this way is a good idea, though maybe it's a reasonable first step.
> With this patch alone some code is shared, but there are some real
> uglies around the edges.

yes. I agree. The patchset is here to discuss these model issues. 

> Seems to me at least long term you need to either 1) make the XIVE ics
> separate, even if it has similarities to the XICS one or 2) truly
> unify them, with a common base type and methods to handle the
> differences.

OK. We should also discuss the IRQ number allocator. That's another 
email thread.

>> The next patch will introduce the MMIO handlers to interact with XIVE
>> interrupt sources.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xive.c        | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/xive.h |  12 ++++++
>>  2 files changed, 122 insertions(+)
>>
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 5b14d8155317..9ff14c0da595 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -26,6 +26,115 @@
>>  
>>  #include "xive-internal.h"
>>  
>> +static void xive_icp_irq(XiveICSState *xs, int lisn)
>> +{
>> +
>> +}
>> +
>> +/*
>> + * XIVE Interrupt Source
>> + */
>> +static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
>> +{
>> +    if (val) {
>> +        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
>> +    }
>> +}
>> +
>> +static void xive_ics_set_irq_lsi(XiveICSState *xs, int srcno, int val)
>> +{
>> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
>> +
>> +    if (val) {
>> +        irq->status |= XICS_STATUS_ASSERTED;
>> +    } else {
>> +        irq->status &= ~XICS_STATUS_ASSERTED;
>> +    }
>> +
>> +    if (irq->status & XICS_STATUS_ASSERTED
>> +        && !(irq->status & XICS_STATUS_SENT)) {
>> +        irq->status |= XICS_STATUS_SENT;
>> +        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
>> +    }
>> +}
>> +
>> +static void xive_ics_set_irq(void *opaque, int srcno, int val)
>> +{
>> +    XiveICSState *xs = ICS_XIVE(opaque);
>> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
>> +
>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
>> +        xive_ics_set_irq_lsi(xs, srcno, val);
>> +    } else {
>> +        xive_ics_set_irq_msi(xs, srcno, val);
>> +    }
>> +}
> 
> e.g. you have some code re-use, but still need to more-or-less
> duplicate the set_irq code as above.

yes. I am not sure how to do this though. We could use some property 
on the ICS to know in which interrupt mode we are running and branch, 
but wouldn't that pollute a lot the current code ? 

>> +static void xive_ics_reset(void *dev)
>> +{
>> +    ICSState *ics = ICS_BASE(dev);
>> +    int i;
>> +    uint8_t flags[ics->nr_irqs];
>> +
>> +    for (i = 0; i < ics->nr_irqs; i++) {
>> +        flags[i] = ics->irqs[i].flags;
>> +    }
>> +
>> +    memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
>> +
>> +    for (i = 0; i < ics->nr_irqs; i++) {
>> +        ics->irqs[i].flags = flags[i];
>> +    }
> 
> This save, clear, restore is also kind ugly.  I'm also not sure why
> this needs a reset method when I can't find one for the xics ICS.

Hmm, this is a copy paste of ics_simple_reset() but we can fix both.

> Does the xics irqstate structure really cover what you need for xive?

There is too much in it. We only need the flags to know if the IRQ is
allocated and if it is a LSI. In fact, the ICSIRQState array is the
only information we need to share to support resets between the XIVE 
and XICS modes. The allocator should be the same of course. But it
is a bit of a hack for now.

> I had the impression elsewhere that xive had a different priority
> model to xics.  And there's the xics pointer in the icsstate structure
> which is definitely redundant.

In the hcalls, we need to do ICS lookups using IRQ numbers and this
is when the ics_get() handler of the XICSFabric interface is used.
I agree we could find some other ways but that is what we have put
in place for sPAPR and PowerNV.

Thanks,

C. 

> 
>> +}
>> +
>> +static void xive_ics_realize(ICSState *ics, Error **errp)
>> +{
>> +    XiveICSState *xs = ICS_XIVE(ics);
>> +    Object *obj;
>> +    Error *err = NULL;
>> +
>> +    obj = object_property_get_link(OBJECT(xs), "xive", &err);
>> +    if (!obj) {
>> +        error_setg(errp, "%s: required link 'xive' not found: %s",
>> +                   __func__, error_get_pretty(err));
>> +        return;
>> +    }
>> +    xs->xive = XIVE(obj);
>> +
>> +    if (!ics->nr_irqs) {
>> +        error_setg(errp, "Number of interrupts needs to be greater 0");
>> +        return;
>> +    }
>> +
>> +    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
>> +    ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
>> +
>> +    qemu_register_reset(xive_ics_reset, xs);
>> +}
>> +
>> +static Property xive_ics_properties[] = {
>> +    DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
>> +    DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void xive_ics_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    ICSStateClass *isc = ICS_BASE_CLASS(klass);
>> +
>> +    isc->realize = xive_ics_realize;
>> +
>> +    dc->props = xive_ics_properties;
>> +}
>> +
>> +static const TypeInfo xive_ics_info = {
>> +    .name = TYPE_ICS_XIVE,
>> +    .parent = TYPE_ICS_BASE,
>> +    .instance_size = sizeof(XiveICSState),
>> +    .class_init = xive_ics_class_init,
>> +};
>> +
>>  /*
>>   * Main XIVE object
>>   */
>> @@ -123,6 +232,7 @@ static const TypeInfo xive_info = {
>>  static void xive_register_types(void)
>>  {
>>      type_register_static(&xive_info);
>> +    type_register_static(&xive_ics_info);
>>  }
>>  
>>  type_init(xive_register_types)
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 863f5a9c6b5f..544cc6e0c796 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -19,9 +19,21 @@
>>  #ifndef PPC_XIVE_H
>>  #define PPC_XIVE_H
>>  
>> +#include "hw/ppc/xics.h"
>> +
>>  typedef struct XIVE XIVE;
>> +typedef struct XiveICSState XiveICSState;
>>  
>>  #define TYPE_XIVE "xive"
>>  #define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
>>  
>> +#define TYPE_ICS_XIVE "xive-source"
>> +#define ICS_XIVE(obj) OBJECT_CHECK(XiveICSState, (obj), TYPE_ICS_XIVE)
>> +
>> +struct XiveICSState {
>> +    ICSState parent_obj;
>> +
>> +    XIVE         *xive;
>> +};
> 
>>  #endif /* PPC_XIVE_H */
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model
  2017-07-24  6:00     ` Alexey Kardashevskiy
@ 2017-07-24 15:20       ` Cédric Le Goater
  2017-07-25  3:06         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 15:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Gibson; +Cc: qemu-ppc, Alexander Graf, qemu-devel

On 07/24/2017 08:00 AM, Alexey Kardashevskiy wrote:
> On 24/07/17 14:02, David Gibson wrote:
>> On Wed, Jul 05, 2017 at 07:13:19PM +0200, Cédric Le Goater wrote:
>>> This is very similar to the current ICS_SIMPLE model in XICS. We try
>>> to reuse the ICS model because the sPAPR machine is tied to the
>>> XICSFabric interface and should be using a common framework to switch
>>> from one controller model to another: XICS <-> XIVE.
>>
>> Hm.  I'm not entirely concvinced re-using the xics ICSState class in
>> this way is a good idea, though maybe it's a reasonable first step.
>> With this patch alone some code is shared, but there are some real
>> uglies around the edges.
> 
> 
> Agree, using the "ICS" term in XIVE is quite confusing as "ICS" is not
> mentioned in neither XIVE nor P9 specs.

Indeed. 

The XIVE specs mention Source Controller (P3SC) or Interrupt 
Virtualization Source Engine (IVSE). The sPAPR specs use 
Interrupt Source a lot.

Let's unify them all under one name ? I propose ICS :)

Thanks,

C. 


 
>>
>> Seems to me at least long term you need to either 1) make the XIVE ics
>> separate, even if it has similarities to the XICS one or 2) truly
>> unify them, with a common base type and methods to handle the
>> differences.
>>
>>
>>> The next patch will introduce the MMIO handlers to interact with XIVE
>>> interrupt sources.
>>>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>> ---
>>>  hw/intc/xive.c        | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/hw/ppc/xive.h |  12 ++++++
>>>  2 files changed, 122 insertions(+)
>>>
>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>>> index 5b14d8155317..9ff14c0da595 100644
>>> --- a/hw/intc/xive.c
>>> +++ b/hw/intc/xive.c
>>> @@ -26,6 +26,115 @@
>>>  
>>>  #include "xive-internal.h"
>>>  
>>> +static void xive_icp_irq(XiveICSState *xs, int lisn)
>>> +{
>>> +
>>> +}
>>> +
>>> +/*
>>> + * XIVE Interrupt Source
>>> + */
>>> +static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
>>> +{
>>> +    if (val) {
>>> +        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
>>> +    }
>>> +}
>>> +
>>> +static void xive_ics_set_irq_lsi(XiveICSState *xs, int srcno, int val)
>>> +{
>>> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
>>> +
>>> +    if (val) {
>>> +        irq->status |= XICS_STATUS_ASSERTED;
>>> +    } else {
>>> +        irq->status &= ~XICS_STATUS_ASSERTED;
>>> +    }
>>> +
>>> +    if (irq->status & XICS_STATUS_ASSERTED
>>> +        && !(irq->status & XICS_STATUS_SENT)) {
>>> +        irq->status |= XICS_STATUS_SENT;
>>> +        xive_icp_irq(xs, srcno + ICS_BASE(xs)->offset);
>>> +    }
>>> +}
>>> +
>>> +static void xive_ics_set_irq(void *opaque, int srcno, int val)
>>> +{
>>> +    XiveICSState *xs = ICS_XIVE(opaque);
>>> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
>>> +
>>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
>>> +        xive_ics_set_irq_lsi(xs, srcno, val);
>>> +    } else {
>>> +        xive_ics_set_irq_msi(xs, srcno, val);
>>> +    }
>>> +}
>>
>> e.g. you have some code re-use, but still need to more-or-less
>> duplicate the set_irq code as above.
>>
>>> +static void xive_ics_reset(void *dev)
>>> +{
>>> +    ICSState *ics = ICS_BASE(dev);
>>> +    int i;
>>> +    uint8_t flags[ics->nr_irqs];
>>> +
>>> +    for (i = 0; i < ics->nr_irqs; i++) {
>>> +        flags[i] = ics->irqs[i].flags;
>>> +    }
>>> +
>>> +    memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
>>> +
>>> +    for (i = 0; i < ics->nr_irqs; i++) {
>>> +        ics->irqs[i].flags = flags[i];
>>> +    }
>>
>> This save, clear, restore is also kind ugly.  I'm also not sure why
>> this needs a reset method when I can't find one for the xics ICS.
>>
>> Does the xics irqstate structure really cover what you need for xive?
>> I had the impression elsewhere that xive had a different priority
>> model to xics.  And there's the xics pointer in the icsstate structure
>> which is definitely redundant.
>>
>>> +}
>>> +
>>> +static void xive_ics_realize(ICSState *ics, Error **errp)
>>> +{
>>> +    XiveICSState *xs = ICS_XIVE(ics);
>>> +    Object *obj;
>>> +    Error *err = NULL;
>>> +
>>> +    obj = object_property_get_link(OBJECT(xs), "xive", &err);
>>> +    if (!obj) {
>>> +        error_setg(errp, "%s: required link 'xive' not found: %s",
>>> +                   __func__, error_get_pretty(err));
>>> +        return;
>>> +    }
>>> +    xs->xive = XIVE(obj);
>>> +
>>> +    if (!ics->nr_irqs) {
>>> +        error_setg(errp, "Number of interrupts needs to be greater 0");
>>> +        return;
>>> +    }
>>> +
>>> +    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
>>> +    ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
>>> +
>>> +    qemu_register_reset(xive_ics_reset, xs);
>>> +}
>>> +
>>> +static Property xive_ics_properties[] = {
>>> +    DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
>>> +    DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
>>> +    DEFINE_PROP_END_OF_LIST(),
>>> +};
>>> +
>>> +static void xive_ics_class_init(ObjectClass *klass, void *data)
>>> +{
>>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>>> +    ICSStateClass *isc = ICS_BASE_CLASS(klass);
>>> +
>>> +    isc->realize = xive_ics_realize;
>>> +
>>> +    dc->props = xive_ics_properties;
>>> +}
>>> +
>>> +static const TypeInfo xive_ics_info = {
>>> +    .name = TYPE_ICS_XIVE,
>>> +    .parent = TYPE_ICS_BASE,
>>> +    .instance_size = sizeof(XiveICSState),
>>> +    .class_init = xive_ics_class_init,
>>> +};
>>> +
>>>  /*
>>>   * Main XIVE object
>>>   */
>>> @@ -123,6 +232,7 @@ static const TypeInfo xive_info = {
>>>  static void xive_register_types(void)
>>>  {
>>>      type_register_static(&xive_info);
>>> +    type_register_static(&xive_ics_info);
>>>  }
>>>  
>>>  type_init(xive_register_types)
>>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>>> index 863f5a9c6b5f..544cc6e0c796 100644
>>> --- a/include/hw/ppc/xive.h
>>> +++ b/include/hw/ppc/xive.h
>>> @@ -19,9 +19,21 @@
>>>  #ifndef PPC_XIVE_H
>>>  #define PPC_XIVE_H
>>>  
>>> +#include "hw/ppc/xics.h"
>>> +
>>>  typedef struct XIVE XIVE;
>>> +typedef struct XiveICSState XiveICSState;
>>>  
>>>  #define TYPE_XIVE "xive"
>>>  #define XIVE(obj) OBJECT_CHECK(XIVE, (obj), TYPE_XIVE)
>>>  
>>> +#define TYPE_ICS_XIVE "xive-source"
>>> +#define ICS_XIVE(obj) OBJECT_CHECK(XiveICSState, (obj), TYPE_ICS_XIVE)
>>> +
>>> +struct XiveICSState {
>>> +    ICSState parent_obj;
>>> +
>>> +    XIVE         *xive;
>>> +};
>>
>>>  #endif /* PPC_XIVE_H */
>>
> 
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source
  2017-07-24  6:50   ` Alexey Kardashevskiy
@ 2017-07-24 15:39     ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 15:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Gibson; +Cc: qemu-ppc, Alexander Graf, qemu-devel

On 07/24/2017 08:50 AM, Alexey Kardashevskiy wrote:
> On 06/07/17 03:13, Cédric Le Goater wrote:
>> Each interrupt source is associated with a 2-bit state machine called
>> an Event State Buffer (ESB). It is controlled by MMIO to trigger
>> events.
>>
>> See code for more details on the states.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xive.c        | 230 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/xive.h |   3 +
>>  2 files changed, 233 insertions(+)
>>
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 9ff14c0da595..816031b8ac81 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -32,6 +32,226 @@ static void xive_icp_irq(XiveICSState *xs, int lisn)
>>  }
>>  
>>  /*
>> + * "magic" Event State Buffer (ESB) MMIO offsets.
>> + *
>> + * Each interrupt source has a 2-bit state machine called ESB
>> + * which can be controlled by MMIO. It's made of 2 bits, P and
>> + * Q. P indicates that an interrupt is pending (has been sent
>> + * to a queue and is waiting for an EOI). Q indicates that the
>> + * interrupt has been triggered while pending.
>> + *
>> + * This acts as a coalescing mechanism in order to guarantee
>> + * that a given interrupt only occurs at most once in a queue.
>> + *
>> + * When doing an EOI, the Q bit will indicate if the interrupt
>> + * needs to be re-triggered.
>> + *
>> + * The following offsets into the ESB MMIO allow to read or
>> + * manipulate the PQ bits. They must be used with an 8-bytes
>> + * load instruction. They all return the previous state of the
>> + * interrupt (atomically).
>> + *
>> + * Additionally, some ESB pages support doing an EOI via a
>> + * store at 0 and some ESBs support doing a trigger via a
>> + * separate trigger page.
>> + */
>> +#define XIVE_ESB_GET            0x800
>> +#define XIVE_ESB_SET_PQ_00      0xc00
>> +#define XIVE_ESB_SET_PQ_01      0xd00
>> +#define XIVE_ESB_SET_PQ_10      0xe00
>> +#define XIVE_ESB_SET_PQ_11      0xf00
>> +
>> +#define XIVE_ESB_VAL_P          0x2
>> +#define XIVE_ESB_VAL_Q          0x1
> 
> 
> These are not used. I'd suggest defining the states below using these two.

yes. I will add a VAL_PQ also.

> 
>> +
>> +#define XIVE_ESB_RESET          0x0
>> +#define XIVE_ESB_PENDING        0x2
>> +#define XIVE_ESB_QUEUED         0x3
>> +#define XIVE_ESB_OFF            0x1
>> +
>> +static uint8_t xive_pq_get(XIVE *x, uint32_t lisn)
>> +{
>> +    uint32_t idx = lisn;
>> +    uint32_t byte = idx / 4;
>> +    uint32_t bit  = (idx % 4) * 2;
>> +    uint8_t* pqs = (uint8_t *) x->sbe;
>> +
>> +    return (pqs[byte] >> bit) & 0x3;
>> +}
>> +
>> +static void xive_pq_set(XIVE *x, uint32_t lisn, uint8_t pq)
>> +{
>> +    uint32_t idx = lisn;
>> +    uint32_t byte = idx / 4;
>> +    uint32_t bit  = (idx % 4) * 2;
>> +    uint8_t* pqs = (uint8_t *) x->sbe;
>> +
>> +    pqs[byte] &= ~(0x3 << bit);
>> +    pqs[byte] |= (pq & 0x3) << bit;
>> +}
>> +
>> +static bool xive_pq_eoi(XIVE *x, uint32_t lisn)
> 
> 
> Should not it return uint8_t as well (like xive_pq_get() does)? The value
> than returned from .read() is uint64_t (a binary value).

Yes. The bool only reflects the state machine specs but we can 
change that in the code.   

Thanks,

C. 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source
  2017-07-24  4:29   ` David Gibson
  2017-07-24  8:56     ` Benjamin Herrenschmidt
@ 2017-07-24 15:55     ` Cédric Le Goater
  2017-07-25 12:21       ` David Gibson
  1 sibling, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-24 15:55 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 06:29 AM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:20PM +0200, Cédric Le Goater wrote:
>> Each interrupt source is associated with a 2-bit state machine called
>> an Event State Buffer (ESB). It is controlled by MMIO to trigger
>> events.
>>
>> See code for more details on the states.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xive.c        | 230 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/xive.h |   3 +
>>  2 files changed, 233 insertions(+)
>>
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 9ff14c0da595..816031b8ac81 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -32,6 +32,226 @@ static void xive_icp_irq(XiveICSState *xs, int lisn)
>>  }
>>  
>>  /*
>> + * "magic" Event State Buffer (ESB) MMIO offsets.
>> + *
>> + * Each interrupt source has a 2-bit state machine called ESB
>> + * which can be controlled by MMIO. It's made of 2 bits, P and
>> + * Q. P indicates that an interrupt is pending (has been sent
>> + * to a queue and is waiting for an EOI). Q indicates that the
>> + * interrupt has been triggered while pending.
>> + *
>> + * This acts as a coalescing mechanism in order to guarantee
>> + * that a given interrupt only occurs at most once in a queue.
>> + *
>> + * When doing an EOI, the Q bit will indicate if the interrupt
>> + * needs to be re-triggered.
>> + *
>> + * The following offsets into the ESB MMIO allow to read or
>> + * manipulate the PQ bits. They must be used with an 8-bytes
>> + * load instruction. They all return the previous state of the
>> + * interrupt (atomically).
>> + *
>> + * Additionally, some ESB pages support doing an EOI via a
>> + * store at 0 and some ESBs support doing a trigger via a
>> + * separate trigger page.
>> + */
>> +#define XIVE_ESB_GET            0x800
>> +#define XIVE_ESB_SET_PQ_00      0xc00
>> +#define XIVE_ESB_SET_PQ_01      0xd00
>> +#define XIVE_ESB_SET_PQ_10      0xe00
>> +#define XIVE_ESB_SET_PQ_11      0xf00
>> +
>> +#define XIVE_ESB_VAL_P          0x2
>> +#define XIVE_ESB_VAL_Q          0x1
>> +
>> +#define XIVE_ESB_RESET          0x0
>> +#define XIVE_ESB_PENDING        0x2
>> +#define XIVE_ESB_QUEUED         0x3
>> +#define XIVE_ESB_OFF            0x1
>> +
>> +static uint8_t xive_pq_get(XIVE *x, uint32_t lisn)
>> +{
>> +    uint32_t idx = lisn;
>> +    uint32_t byte = idx / 4;
>> +    uint32_t bit  = (idx % 4) * 2;
>> +    uint8_t* pqs = (uint8_t *) x->sbe;
>> +
>> +    return (pqs[byte] >> bit) & 0x3;
>> +}
>> +
>> +static void xive_pq_set(XIVE *x, uint32_t lisn, uint8_t pq)
>> +{
>> +    uint32_t idx = lisn;
>> +    uint32_t byte = idx / 4;
>> +    uint32_t bit  = (idx % 4) * 2;
>> +    uint8_t* pqs = (uint8_t *) x->sbe;
>> +
>> +    pqs[byte] &= ~(0x3 << bit);
>> +    pqs[byte] |= (pq & 0x3) << bit;
> 
> I know it probably amounts to the same thing given the context, but
> I'd be more comfortable with a temporary and an obviously atomic
> update than two writes to the real state variable.

yes. I will look better.

>> +}
>> +
>> +static bool xive_pq_eoi(XIVE *x, uint32_t lisn)
>> +{
>> +    uint8_t old_pq = xive_pq_get(x, lisn);
>> +
>> +    switch (old_pq) {
>> +    case XIVE_ESB_RESET:
>> +        xive_pq_set(x, lisn, XIVE_ESB_RESET);
>> +        return false;
>> +    case XIVE_ESB_PENDING:
>> +        xive_pq_set(x, lisn, XIVE_ESB_RESET);
>> +        return false;
>> +    case XIVE_ESB_QUEUED:
>> +        xive_pq_set(x, lisn, XIVE_ESB_PENDING);
>> +        return true;
>> +    case XIVE_ESB_OFF:
>> +        xive_pq_set(x, lisn, XIVE_ESB_OFF);
>> +        return false;
>> +    default:
>> +         g_assert_not_reached();
>> +    }
>> +}
>> +
>> +static bool xive_pq_trigger(XIVE *x, uint32_t lisn)
>> +{
>> +    uint8_t old_pq = xive_pq_get(x, lisn);
>> +
>> +    switch (old_pq) {
>> +    case XIVE_ESB_RESET:
>> +        xive_pq_set(x, lisn, XIVE_ESB_PENDING);
>> +        return true;
>> +    case XIVE_ESB_PENDING:
>> +        xive_pq_set(x, lisn, XIVE_ESB_QUEUED);
>> +        return true;
>> +    case XIVE_ESB_QUEUED:
>> +        xive_pq_set(x, lisn, XIVE_ESB_QUEUED);
>> +        return true;
>> +    case XIVE_ESB_OFF:
>> +        xive_pq_set(x, lisn, XIVE_ESB_OFF);
>> +        return false;
>> +    default:
>> +         g_assert_not_reached();
>> +    }
>> +}
>> +
>> +/*
>> + * XIVE Interrupt Source MMIOs
>> + */
>> +static void xive_ics_eoi(XiveICSState *xs, uint32_t srcno)
>> +{
>> +    ICSIRQState *irq = &ICS_BASE(xs)->irqs[srcno];
>> +
>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
>> +        irq->status &= ~XICS_STATUS_SENT;
>> +    }
>> +}
>> +
>> +/* TODO: handle second page */
> 
> Is this comment still relevent?

Some HW have a second page to trigger the event. I am not sure we need 
to model it though. I will make some inquiries. 

>> +static uint64_t xive_esb_read(void *opaque, hwaddr addr, unsigned size)
>> +{
>> +    XiveICSState *xs = ICS_XIVE(opaque);
>> +    XIVE *x = xs->xive;
>> +    uint32_t offset = addr & 0xF00;
>> +    uint32_t srcno = addr >> xs->esb_shift;
>> +    uint32_t lisn = srcno + ICS_BASE(xs)->offset;
>> +    XiveIVE *ive;
>> +    uint64_t ret = -1;
>> +
>> +    ive = xive_get_ive(x, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID))  {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
>> +        goto out;
>> +    }
>> +
>> +    if (srcno >= ICS_BASE(xs)->nr_irqs) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                      "XIVE: invalid IRQ number: %d/%d lisn: %d\n",
>> +                      srcno, ICS_BASE(xs)->nr_irqs, lisn);
>> +        goto out;
>> +    }
>> +
>> +    switch (offset) {
>> +    case 0:
>> +        xive_ics_eoi(xs, srcno);
>> +
>> +        /* return TRUE or FALSE depending on PQ value */
>> +        ret = xive_pq_eoi(x, lisn);
>> +        break;
>> +
>> +    case XIVE_ESB_GET:
>> +        ret = xive_pq_get(x, lisn);
>> +        break;
>> +
>> +    case XIVE_ESB_SET_PQ_00:
>> +    case XIVE_ESB_SET_PQ_01:
>> +    case XIVE_ESB_SET_PQ_10:
>> +    case XIVE_ESB_SET_PQ_11:
>> +        ret = xive_pq_get(x, lisn);
>> +        xive_pq_set(x, lisn, (offset >> 8) & 0x3);
> 
> Again I'd prefer xive_pq_set() return the old value itself, for more
> obvious atomicity.

yes. ok.

> 
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
>> +    }
>> +
>> +out:
>> +    return ret;
>> +}
>> +
>> +static void xive_esb_write(void *opaque, hwaddr addr,
>> +                           uint64_t value, unsigned size)
>> +{
>> +    XiveICSState *xs = ICS_XIVE(opaque);
>> +    XIVE *x = xs->xive;
>> +    uint32_t offset = addr & 0xF00;
>> +    uint32_t srcno = addr >> xs->esb_shift;
>> +    uint32_t lisn = srcno + ICS_BASE(xs)->offset;
>> +    XiveIVE *ive;
>> +    bool notify = false;
>> +
>> +    ive = xive_get_ive(x, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID))  {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
>> +        return;
>> +    }
> 
> Having this code associated with the individual ICS look directly at
> the IVE table in the core xive object seems a bit dubious.

The IVE table holds the validity and mask status of the interrupt 
entries, so we need that lookup. However, (continues below) ...

> This also
> points out another mismatch between the re-used ICS code and the new
> XIVE code: ICS gathers all the per-source-irq flags/state into the
> irqstate structure, whereas xive has per-irq information in the
> centralized ecb and IVE tables.  There can certainly be good reasons
> for that, but using both at once is kind of clunky.

I understand that you would rather put the esbs in the source they 
belong to. That is the case on real HW but it makes the modeling a 
bit more difficult. We would need to choose a MMIO address to give 
to the guest OS. I had some issues with the allocator (I need 
to look at this problem closer).

It might also be an "issue" for KVM. Ben talked about maintaining 
all the esbs of a guest under a single memory region to be able to 
map the pages in the host.

Any how, I agree this is another point to discuss in the sPAPR 
model.

Thanks,

C. 


>> +    if (srcno >= ICS_BASE(xs)->nr_irqs) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                      "XIVE: invalid IRQ number: %d/%d lisn: %d\n",
>> +                      srcno, ICS_BASE(xs)->nr_irqs, lisn);
>> +        return;
>> +    }
>> +
>> +    switch (offset) {
>> +    case 0:
>> +        /* TODO: should we trigger even if the IVE is masked ? */
>> +        notify = xive_pq_trigger(x, lisn);
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
>> +                      offset);
>> +        return;
>> +    }
>> +
>> +    if (notify && !(ive->w & IVE_MASKED)) {
>> +        qemu_irq_pulse(ICS_BASE(xs)->qirqs[srcno]);
>> +    }
>> +}
>> +
>> +static const MemoryRegionOps xive_esb_ops = {
>> +    .read = xive_esb_read,
>> +    .write = xive_esb_write,
>> +    .endianness = DEVICE_BIG_ENDIAN,
>> +    .valid = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +    .impl = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +};
>> +
>> +/*
>>   * XIVE Interrupt Source
>>   */
>>  static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
>> @@ -106,15 +326,25 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>>          return;
>>      }
>>  
>> +    if (!xs->esb_shift) {
>> +        error_setg(errp, "ESB page size needs to be greater 0");
>> +        return;
>> +    }
>> +
>>      ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
>>      ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
>>  
>> +    memory_region_init_io(&xs->esb_iomem, OBJECT(xs), &xive_esb_ops, xs,
>> +                          "xive.esb",
>> +                          (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
>> +
>>      qemu_register_reset(xive_ics_reset, xs);
>>  }
>>  
>>  static Property xive_ics_properties[] = {
>>      DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
>>      DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
>> +    DEFINE_PROP_UINT32("shift", XiveICSState, esb_shift, 0),
>>      DEFINE_PROP_END_OF_LIST(),
>>  };
>>  
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 544cc6e0c796..5303d96f5f59 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -33,6 +33,9 @@ typedef struct XiveICSState XiveICSState;
>>  struct XiveICSState {
>>      ICSState parent_obj;
>>  
>> +    uint32_t     esb_shift;
>> +    MemoryRegion esb_iomem;
>> +
>>      XIVE         *xive;
>>  };
>>  
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-24 13:00     ` Cédric Le Goater
@ 2017-07-25  1:26       ` Alexey Kardashevskiy
  2017-07-25  2:17         ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-25  1:26 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On 24/07/17 23:00, Cédric Le Goater wrote:
>>> +#include "qemu/osdep.h"
>>> +#include "qemu/log.h"
>>> +#include "qapi/error.h"
>>> +#include "target/ppc/cpu.h"
>>> +#include "sysemu/cpus.h"
>>> +#include "sysemu/dma.h"
>>> +#include "monitor/monitor.h"
>>> +#include "hw/ppc/xive.h"
>>> +
>>> +#include "xive-internal.h"
>>> +
>>> +/*
>>> + * Main XIVE object
>>
>> As with XICs, does it really make sense for there to be a "main" XIVE
>> object, or should be an interface attached to the machine?
> 
> yes. There are internal tables which are very specific to the controller 
> and I don't think they belong to the machine.

These tables belong to a CPU chip (die?) and we do not emulate these now
(machines and cores are the closest) and since we do not want (do we?) to
treat a core as a chip, the machine is the most obvious owner for these tables.


-- 
Alexey

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 17/26] ppc/xive: add hcalls support
  2017-07-24 14:55     ` Cédric Le Goater
@ 2017-07-25  2:09       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-25  2:09 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On 25/07/17 00:55, Cédric Le Goater wrote:
> On 07/24/2017 11:39 AM, Alexey Kardashevskiy wrote:
>> On 06/07/17 03:13, Cédric Le Goater wrote:
>>> A set of Hypervisor's call are used to configure the interrupt sources
>>> and the event/notification queues of the guest:
>>>
>>>    H_INT_GET_SOURCE_INFO
>>>    H_INT_SET_SOURCE_CONFIG
>>>    H_INT_GET_SOURCE_CONFIG
>>>    H_INT_GET_QUEUE_INFO
>>>    H_INT_SET_QUEUE_CONFIG
>>>    H_INT_GET_QUEUE_CONFIG
>>>    H_INT_RESET
>>>    H_INT_ESB
>>>
>>> Calls that still need to be addressed :
>>>
>>>    H_INT_SET_OS_REPORTING_LINE
>>>    H_INT_GET_OS_REPORTING_LINE
>>>    H_INT_SYNC
>>>
>>> See below for the documentation on each hcall.
>>>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>> ---

[...]

>>
>> R6 is missing but you added it in your github tree so never mind :)
>>
> 
> Yes. I have updated the hcalls in my github tree with some fixes and 
> also some small recent changes in the specs.


This was rather a note for other reviewers if they read the specs and find
mismatches (18/26 is also updated), really :)


-- 
Alexey

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 05/26] ppc/xive: define XIVE internal tables
  2017-07-24 12:52     ` Cédric Le Goater
@ 2017-07-25  2:16       ` David Gibson
  2017-07-25 15:54         ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-25  2:16 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 11355 bytes --]

On Mon, Jul 24, 2017 at 02:52:29PM +0200, Cédric Le Goater wrote:
> On 07/19/2017 05:24 AM, David Gibson wrote:
> > On Wed, Jul 05, 2017 at 07:13:18PM +0200, Cédric Le Goater wrote:
> >> The XIVE interrupt controller of the POWER9 uses a set of tables to
> >> redirect exception from event sources to CPU threads. Among which we
> >> choose to model :
> >>
> >>  - the State Bit Entries (SBE), also known as Event State Buffer
> >>    (ESB). This is a two bit state machine for each event source which
> >>    is used to trigger events. The bits are named "P" (pending) and "Q"
> >>    (queued) and can be controlled by MMIO.
> >>
> >>  - the Interrupt Virtualization Entry (IVE) table, also known as Event
> >>    Assignment Structure (EAS). This table is indexed by the IRQ number
> >>    and is looked up to find the Event Queue associated with a
> >>    triggered event.
> >>
> >>  - the Event Queue Descriptor (EQD) table, also known as Event
> >>    Notification Descriptor (END). The EQD contains fields that specify
> >>    the Event Queue on which event data is posted (and later pulled by
> >>    the OS) and also a target (or VPD) to notify.
> >>
> >> An additional table was not modeled but we might need to to support
> >> the H_INT_SET_OS_REPORTING_LINE hcall:
> >>
> >>  - the Virtual Processor Descriptor (VPD) table, also known as
> >>    Notification Virtual Target (NVT).
> >>
> >> The XIVE object is expanded with the tables described above. The size
> >> of each table depends on the number of provisioned IRQ and the maximum
> >> number of CPUs in the system. The indexing is very basic and might
> >> need to be improved for the EQs.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/xive-internal.h | 95 +++++++++++++++++++++++++++++++++++++++++++++++++
> >>  hw/intc/xive.c          | 72 +++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 167 insertions(+)
> >>
> >> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> >> index 155c2dcd6066..8e755aa88a14 100644
> >> --- a/hw/intc/xive-internal.h
> >> +++ b/hw/intc/xive-internal.h
> >> @@ -11,6 +11,89 @@
> >>  
> >>  #include <hw/sysbus.h>
> >>  
> >> +/* Utilities to manipulate these (originaly from OPAL) */
> >> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
> >> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
> >> +#define SETFIELD(m, v, val)                             \
> >> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
> >> +
> >> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
> >> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
> >> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
> >> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
> >> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
> >> +                                 PPC_BIT32(bs))
> >> +
> >> +/* IVE/EAS
> >> + *
> >> + * One per interrupt source. Targets that interrupt to a given EQ
> >> + * and provides the corresponding logical interrupt number (EQ data)
> >> + *
> >> + * We also map this structure to the escalation descriptor inside
> >> + * an EQ, though in that case the valid and masked bits are not used.
> >> + */
> >> +typedef struct XiveIVE {
> >> +        /* Use a single 64-bit definition to make it easier to
> >> +         * perform atomic updates
> >> +         */
> >> +        uint64_t        w;
> >> +#define IVE_VALID       PPC_BIT(0)
> >> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
> >> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
> >> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
> >> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> >> +} XiveIVE;
> >> +
> >> +/* EQ */
> >> +typedef struct XiveEQ {
> >> +        uint32_t        w0;
> >> +#define EQ_W0_VALID             PPC_BIT32(0)
> >> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
> >> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
> >> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
> >> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
> >> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
> >> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
> >> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
> >> +#define EQ_W0_SW0               PPC_BIT32(16)
> >> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
> >> +#define EQ_QSIZE_4K             0
> >> +#define EQ_QSIZE_64K            4
> >> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
> >> +        uint32_t        w1;
> >> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
> >> +#define EQ_W1_ESn_P             PPC_BIT32(0)
> >> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
> >> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
> >> +#define EQ_W1_ESe_P             PPC_BIT32(2)
> >> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
> >> +#define EQ_W1_GENERATION        PPC_BIT32(9)
> >> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
> >> +        uint32_t        w2;
> >> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
> >> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
> >> +        uint32_t        w3;
> >> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
> >> +        uint32_t        w4;
> >> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
> >> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
> >> +        uint32_t        w5;
> >> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
> >> +        uint32_t        w6;
> >> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
> >> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
> >> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
> >> +        uint32_t        w7;
> >> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
> >> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
> >> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
> >> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
> >> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> >> +} XiveEQ;
> >> +
> >> +#define XIVE_EQ_PRIORITY_COUNT 8
> >> +#define XIVE_PRIORITY_MAX  (XIVE_EQ_PRIORITY_COUNT - 1)
> >> +
> >>  struct XIVE {
> >>      SysBusDevice parent;
> >>  
> >> @@ -23,6 +106,18 @@ struct XIVE {
> >>      uint32_t     int_max;       /* Max index */
> >>      uint32_t     int_hw_bot;    /* Bottom index of HW IRQ allocator */
> >>      uint32_t     int_ipi_top;   /* Highest IPI index handed out so far + 1 */
> >> +
> >> +    /* XIVE internal tables */
> >> +    void         *sbe;
> >> +    XiveIVE      *ivt;
> >> +    XiveEQ       *eqdt;
> >>  };
> >>  
> >> +void xive_reset(void *dev);
> >> +XiveIVE *xive_get_ive(XIVE *x, uint32_t isn);
> >> +XiveEQ *xive_get_eq(XIVE *x, uint32_t idx);
> >> +
> >> +bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t prio,
> >> +                        uint32_t *out_eq_idx);
> >> +
> >>  #endif /* _INTC_XIVE_INTERNAL_H */
> >> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> >> index 5b4ea915d87c..5b14d8155317 100644
> >> --- a/hw/intc/xive.c
> >> +++ b/hw/intc/xive.c
> >> @@ -35,6 +35,27 @@
> >>   */
> >>  #define MAX_HW_IRQS_ENTRIES (8 * 1024)
> >>  
> >> +
> >> +void xive_reset(void *dev)
> >> +{
> >> +    XIVE *x = XIVE(dev);
> >> +    int i;
> >> +
> >> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
> >> +    memset(x->sbe, 0x55, x->int_count / 4);
> > 
> > I think strictly this should be a DIV_ROUND_UP to handle the case of
> > int_count not a multiple of 4.
> 
> ok. 
>  
> >> +
> >> +    /* Clear and mask all valid IVEs */
> >> +    for (i = x->int_base; i < x->int_max; i++) {
> >> +        XiveIVE *ive = &x->ivt[i];
> >> +        if (ive->w & IVE_VALID) {
> >> +            ive->w = IVE_VALID | IVE_MASKED;
> >> +        }
> >> +    }
> >> +
> >> +    /* clear all EQs */
> >> +    memset(x->eqdt, 0, x->nr_targets * XIVE_EQ_PRIORITY_COUNT * sizeof(XiveEQ));
> >> +}
> >> +
> >>  static void xive_init(Object *obj)
> >>  {
> >>      ;
> >> @@ -62,6 +83,19 @@ static void xive_realize(DeviceState *dev, Error **errp)
> >>      if (x->int_ipi_top < 0x10) {
> >>          x->int_ipi_top = 0x10;
> >>      }
> >> +
> >> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
> >> +    x->sbe = g_malloc0(x->int_count / 4);
> > 
> > And here as well.
> 
> yes.
> 
> >> +
> >> +    /* Allocate the IVT (Interrupt Virtualization Table) */
> >> +    x->ivt = g_malloc0(x->int_count * sizeof(XiveIVE));
> >> +
> >> +    /* Allocate the EQDT (Event Queue Descriptor Table), 8 priorities
> >> +     * for each thread in the system */
> >> +    x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
> >> +                        sizeof(XiveEQ));
> >> +
> >> +    qemu_register_reset(xive_reset, dev);
> >>  }
> >>  
> >>  static Property xive_properties[] = {
> >> @@ -92,3 +126,41 @@ static void xive_register_types(void)
> >>  }
> >>  
> >>  type_init(xive_register_types)
> >> +
> >> +XiveIVE *xive_get_ive(XIVE *x, uint32_t lisn)
> >> +{
> >> +    uint32_t idx = lisn;
> >> +
> >> +    if (idx < x->int_base || idx >= x->int_max) {
> >> +        return NULL;
> >> +    }
> >> +
> >> +    return &x->ivt[idx];
> > 
> > Should be idx - int_base, no?
> 
> no, not in the allocator model I have chosen. The IRQ numbers 
> are exposed to the guest with their offset. But this is another 
> discussion which I would rather continue in another thread. 

Uh.. but you're using idx to index IVT directly, after verifying that
it lies between int_base and int_max.  AFAICT IVT is only allocated
with int_max - int_base entries, so without an offset here you'll
overrun it, won't you?

> >> +}
> >> +
> >> +XiveEQ *xive_get_eq(XIVE *x, uint32_t idx)
> >> +{
> >> +    if (idx >= x->nr_targets * XIVE_EQ_PRIORITY_COUNT) {
> >> +        return NULL;
> >> +    }
> >> +
> >> +    return &x->eqdt[idx];
> >> +}
> >> +
> >> +/* TODO: improve EQ indexing. This is very simple and relies on the
> >> + * fact that target (CPU) numbers start at 0 and are contiguous. It
> >> + * should be OK for sPAPR.
> >> + */
> >> +bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t priority,
> >> +                        uint32_t *out_eq_idx)
> >> +{
> >> +    if (priority > XIVE_PRIORITY_MAX || target >= x->nr_targets) {
> >> +        return false;
> >> +    }
> >> +
> >> +    if (out_eq_idx) {
> >> +        *out_eq_idx = target + priority;
> >> +    }
> >> +
> >> +    return true;
> > 
> > Seems a clunky interface.  Why not return a XiveEQ *, NULL if the
> > inputs aren't valud.
> 
> Yes. This interface is inherited from OPAL and it's not consistent 
> with the other xive_get_*() routines. But we are missing a XIVE 
> internal table for VPs which explains the difference. I need to look 
> at the support of the OS_REPORTING_LINE hcalls before simplifying.
> 
> Thanks,
> 
> C. 
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-25  1:26       ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
@ 2017-07-25  2:17         ` David Gibson
  0 siblings, 0 replies; 122+ messages in thread
From: David Gibson @ 2017-07-25  2:17 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Cédric Le Goater, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1282 bytes --]

On Tue, Jul 25, 2017 at 11:26:13AM +1000, Alexey Kardashevskiy wrote:
> On 24/07/17 23:00, Cédric Le Goater wrote:
> >>> +#include "qemu/osdep.h"
> >>> +#include "qemu/log.h"
> >>> +#include "qapi/error.h"
> >>> +#include "target/ppc/cpu.h"
> >>> +#include "sysemu/cpus.h"
> >>> +#include "sysemu/dma.h"
> >>> +#include "monitor/monitor.h"
> >>> +#include "hw/ppc/xive.h"
> >>> +
> >>> +#include "xive-internal.h"
> >>> +
> >>> +/*
> >>> + * Main XIVE object
> >>
> >> As with XICs, does it really make sense for there to be a "main" XIVE
> >> object, or should be an interface attached to the machine?
> > 
> > yes. There are internal tables which are very specific to the controller 
> > and I don't think they belong to the machine.
> 
> These tables belong to a CPU chip (die?) and we do not emulate these now
> (machines and cores are the closest) and since we do not want (do we?) to
> treat a core as a chip, the machine is the most obvious owner for these tables.

No, I think it's reasonable for them to be owned by a XIVE object
under the machine.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs
  2017-07-24 13:25       ` Cédric Le Goater
@ 2017-07-25  2:19         ` David Gibson
  2017-07-25  9:50           ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-25  2:19 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 7566 bytes --]

On Mon, Jul 24, 2017 at 03:25:29PM +0200, Cédric Le Goater wrote:
> On 07/24/2017 08:09 AM, Benjamin Herrenschmidt wrote:
> > On Mon, 2017-07-24 at 14:49 +1000, David Gibson wrote:
> >> On Wed, Jul 05, 2017 at 07:13:22PM +0200, Cédric Le Goater wrote:
> >>> Each source adds its own ESB mempry region to the overall ESB memory
> >>> region of the controller. It will be mapped in the CPU address space
> >>> when XIVE is activated.
> >>>
> >>> The default mapping address for the ESB memory region is the same one
> >>> used on baremetal.
> >>>
> >>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>> ---
> >>>  hw/intc/xive-internal.h |  5 +++++
> >>>  hw/intc/xive.c          | 44 +++++++++++++++++++++++++++++++++++++++++++-
> >>>  2 files changed, 48 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> >>> index 8e755aa88a14..c06be823aad0 100644
> >>> --- a/hw/intc/xive-internal.h
> >>> +++ b/hw/intc/xive-internal.h
> >>> @@ -98,6 +98,7 @@ struct XIVE {
> >>>      SysBusDevice parent;
> >>>  
> >>>      /* Properties */
> >>> +    uint32_t     chip_id;
> >>
> >> So there is a XIVE object per chip.  How does this work on PAPR?  One
> >> logical chip/XIVE, or something more complex?
> > 
> > One global XIVE for PAPR. 
> 
> Yes. 
> 
> The chip-id is useless for sPAPR (0 is the default) but for a PowerNV
> system, the address used to map the ESB memory region depends on the 
> chip-id and  I thought we could reuse the same XIVE object. 

Hmm, maybe.

> So, a sPAPR guest would use the address of a single chip baremetal 
> system. This needs more explanation I agree. Thanks to Ben who is 
> providing a lot. I will update the changelogs in the next version. 

> The TIMA is mapped at a fixed address so the chip-id does not come 
> in play.
> 
> > For the MMIOs, the way it works is that:
> > 
> >  - For MMIOs pertaining to a specific interrupt or queue, there's an H-
> > call that will return the proper "guest physical" address. For qemu
> > with KVM we'll have to probably create a single chunk of qemu address
> > space (a single mem region) that contains individual pages mapped with
> > MAP_FIXED originating from the different HW bits, we still need to sort
> > out how exactly we'll do that in practice.
> 
> I haven't looked at all the KVM details. But, regarding the ESBs, I had
> the above in mind and used a single memory region to contain them all. 
>  
> >  - For the TIMA (the presentation MMIOs), those are always at the same
> > physical address for everybody (so for a guest it's a single memory
> > region we'll map to that physical address), the HW "knows" which HW
> > thread is talking to it (and the hypervisor tells the HW which vcpu is
> > running on a given HW thread at a given point in time). That address is
> > obtained from the device-tree
> > 
> >>>      uint32_t     nr_targets;
> >>>  
> >>>      /* IRQ number allocator */
> >>> @@ -111,6 +112,10 @@ struct XIVE {
> >>>      void         *sbe;
> >>>      XiveIVE      *ivt;
> >>>      XiveEQ       *eqdt;
> >>> +
> >>> +    /* ESB and TIMA memory location */
> >>> +    hwaddr       vc_base;
> >>> +    MemoryRegion esb_iomem;
> >>>  };
> >>>  
> >>>  void xive_reset(void *dev);
> >>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> >>> index 8f8bb8b787bd..a1cb87a07b76 100644
> >>> --- a/hw/intc/xive.c
> >>> +++ b/hw/intc/xive.c
> >>> @@ -312,6 +312,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
> >>>      XiveICSState *xs = ICS_XIVE(ics);
> >>>      Object *obj;
> >>>      Error *err = NULL;
> >>> +    XIVE *x;
> >>
> >> I don't really like just 'x' for a context variable like this (as
> >> opposed to a temporary).
> 
> OK. I will change 'x' in 'xive' then.
> 
> >>>  
> >>>      obj = object_property_get_link(OBJECT(xs), "xive", &err);
> >>>      if (!obj) {
> >>> @@ -319,7 +320,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
> >>>                     __func__, error_get_pretty(err));
> >>>          return;
> >>>      }
> >>> -    xs->xive = XIVE(obj);
> >>> +    x = xs->xive = XIVE(obj);
> >>>  
> >>>      if (!ics->nr_irqs) {
> >>>          error_setg(errp, "Number of interrupts needs to be greater 0");
> >>> @@ -338,6 +339,11 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
> >>>                            "xive.esb",
> >>>                            (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
> >>>  
> >>> +    /* Install the ESB memory region in the overall one */
> >>> +    memory_region_add_subregion(&x->esb_iomem,
> >>> +                                ICS_BASE(xs)->offset * (1 << xs->esb_shift),
> >>> +                                &xs->esb_iomem);
> >>> +
> >>>      qemu_register_reset(xive_ics_reset, xs);
> >>>  }
> >>>  
> >>> @@ -375,6 +381,32 @@ static const TypeInfo xive_ics_info = {
> >>>   */
> >>>  #define MAX_HW_IRQS_ENTRIES (8 * 1024)
> >>>  
> >>> +/* VC BAR contains set translations for the ESBs and the EQs. */
> >>> +#define VC_BAR_DEFAULT   0x10000000000ull
> >>> +#define VC_BAR_SIZE      0x08000000000ull
> >>> +
> >>> +#define P9_MMIO_BASE     0x006000000000000ull
> >>> +#define P9_CHIP_BASE(id) (P9_MMIO_BASE | (0x40000000000ull * (uint64_t) (id)))
> >>
> >> chip-based MMIO addresses leaking into the PAPR model seems like it
> >> might not be what you want
> 
> See above for the reason.
> 
> 
> Thanks,
> 
> C. 
> 
> >>
> >>> +static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
> >>> +{
> >>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
> >>> +                  __func__, offset, size);
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static void xive_esb_default_write(void *opaque, hwaddr offset, uint64_t value,
> >>> +                unsigned size)
> >>> +{
> >>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
> >>> +                  __func__, offset, value, size);
> >>> +}
> >>> +
> >>> +static const MemoryRegionOps xive_esb_default_ops = {
> >>> +    .read = xive_esb_default_read,
> >>> +    .write = xive_esb_default_write,
> >>> +    .endianness = DEVICE_BIG_ENDIAN,
> >>> +};
> >>>  
> >>>  void xive_reset(void *dev)
> >>>  {
> >>> @@ -435,10 +467,20 @@ static void xive_realize(DeviceState *dev, Error **errp)
> >>>      x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
> >>>                          sizeof(XiveEQ));
> >>>  
> >>> +    /* VC BAR. That's the full window but we will only map the
> >>> +     * subregions in use. */
> >>> +    x->vc_base = (hwaddr)(P9_CHIP_BASE(x->chip_id) | VC_BAR_DEFAULT);
> >>> +
> >>> +    /* install default memory region handlers to log bogus access */
> >>> +    memory_region_init_io(&x->esb_iomem, NULL, &xive_esb_default_ops,
> >>> +                          NULL, "xive.esb", VC_BAR_SIZE);
> >>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->esb_iomem);
> >>> +
> >>>      qemu_register_reset(xive_reset, dev);
> >>>  }
> >>>  
> >>>  static Property xive_properties[] = {
> >>> +    DEFINE_PROP_UINT32("chip-id", XIVE, chip_id, 0),
> >>>      DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
> >>>      DEFINE_PROP_END_OF_LIST(),
> >>>  };
> >>
> >>
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs
  2017-07-24 13:27         ` Cédric Le Goater
@ 2017-07-25  2:19           ` David Gibson
  0 siblings, 0 replies; 122+ messages in thread
From: David Gibson @ 2017-07-25  2:19 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2816 bytes --]

On Mon, Jul 24, 2017 at 03:27:18PM +0200, Cédric Le Goater wrote:
> On 07/24/2017 08:39 AM, David Gibson wrote:
> > On Mon, Jul 24, 2017 at 04:09:31PM +1000, Benjamin Herrenschmidt wrote:
> >> On Mon, 2017-07-24 at 14:49 +1000, David Gibson wrote:
> >>> On Wed, Jul 05, 2017 at 07:13:22PM +0200, Cédric Le Goater wrote:
> >>>> Each source adds its own ESB mempry region to the overall ESB memory
> >>>> region of the controller. It will be mapped in the CPU address space
> >>>> when XIVE is activated.
> >>>>
> >>>> The default mapping address for the ESB memory region is the same one
> >>>> used on baremetal.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>> ---
> >>>>  hw/intc/xive-internal.h |  5 +++++
> >>>>  hw/intc/xive.c          | 44 +++++++++++++++++++++++++++++++++++++++++++-
> >>>>  2 files changed, 48 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> >>>> index 8e755aa88a14..c06be823aad0 100644
> >>>> --- a/hw/intc/xive-internal.h
> >>>> +++ b/hw/intc/xive-internal.h
> >>>> @@ -98,6 +98,7 @@ struct XIVE {
> >>>>      SysBusDevice parent;
> >>>>  
> >>>>      /* Properties */
> >>>> +    uint32_t     chip_id;
> >>>
> >>> So there is a XIVE object per chip.  How does this work on PAPR?  One
> >>> logical chip/XIVE, or something more complex?
> >>
> >> One global XIVE for PAPR. For the MMIOs, the way it works is that:
> >>
> >>  - For MMIOs pertaining to a specific interrupt or queue, there's an H-
> >> call that will return the proper "guest physical" address. For qemu
> >> with KVM we'll have to probably create a single chunk of qemu address
> >> space (a single mem region) that contains individual pages mapped with
> >> MAP_FIXED originating from the different HW bits, we still need to sort
> >> out how exactly we'll do that in practice.
> >>
> >>  - For the TIMA (the presentation MMIOs), those are always at the same
> >> physical address for everybody (so for a guest it's a single memory
> >> region we'll map to that physical address), the HW "knows" which HW
> >> thread is talking to it (and the hypervisor tells the HW which vcpu is
> >> running on a given HW thread at a given point in time). That address is
> >> obtained from the device-tree
> > 
> > Ok.  That leaves "chip_id" as a rather surprising thing to see in an
> > object which will appear on PAPR.
> 
> We could also pass the address as a property instead of the chip-id when
> creating the XIVE object. May be better for sPAPR.

Yes, I think that sounds like a much better option.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 26/26] spapr: force XIVE exploitation mode for POWER9 (HACK)
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 26/26] spapr: force XIVE exploitation mode for POWER9 (HACK) Cédric Le Goater
@ 2017-07-25  2:43   ` Alexey Kardashevskiy
  2017-07-25  9:20     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-25  2:43 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On 06/07/17 03:13, Cédric Le Goater wrote:
> The CAS negotiation process determines the interrupt controller model
> to use in the guest but currently, the sPAPR machine make uses of the
> controller very early in the initialization sequence. The interrupt
> source is used to allocate IRQ numbers and populate the device tree
> and the interrupt presenter objects are created along with the CPU.
> 
> One solution would be use a bitmap to allocate these IRQ numbers and
> then instantiate the interrupt source object of the correct type with
> the bitmap as a constructor parameter.
> 
> As for the interrupt presenter objects, we could allocated them later
> in the boot process. May be on demand, when a CPU is first notified.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/ppc/spapr.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ca3a6bc2ea16..623fc776c886 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -237,6 +237,38 @@ error:
>      return NULL;
>  }
>  
> +static XiveICSState *spapr_xive_ics_create(XIVE *x, int nr_irqs, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    int irq_base;
> +    Object *obj;
> +
> +    /*
> +     * TODO: use an XICS_IRQ_BASE alignment to be in sync with XICS
> +     * irq numbers. we should probably simplify the XIVE model or use
> +     * a common allocator. a bitmap maybe ?
> +     */
> +    irq_base = xive_alloc_hw_irqs(x, nr_irqs, XICS_IRQ_BASE);
> +    if (irq_base < 0) {
> +        error_setg(errp, "Failed to allocate %d irqs", nr_irqs);
> +        return NULL;
> +    }
> +
> +    obj = object_new(TYPE_ICS_XIVE);
> +    object_property_add_child(OBJECT(x), "hw", obj, NULL);
> +
> +    xive_ics_create(ICS_XIVE(obj), x, irq_base, nr_irqs, 16 /* 64KB page */,
> +                    XIVE_SRC_TRIGGER, &local_err);
> +    if (local_err) {
> +        goto error;
> +    }
> +    return ICS_XIVE(obj);
> +
> +error:
> +    error_propagate(errp, local_err);
> +    return NULL;
> +}
> +
>  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
>                                    int smt_threads)
>  {
> @@ -814,6 +846,11 @@ static int spapr_dt_cas_updates(sPAPRMachineState *spapr, void *fdt,
>      /* /interrupt controller */
>      if (!spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)) {
>          spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
> +    } else {
> +        xive_spapr_populate(spapr->xive, fdt);
> +
> +        /* Install XIVE MMIOs */
> +        xive_mmio_map(spapr->xive);


xive_mmio_map() could be called where sysbus_init_mmio() is called as once
these are mmap'ed, they are never unmapped and tm_base/vc_base never
change. And XIVE is always created on P9 anyway.



>      }
>  
>      offset = fdt_path_offset(fdt, "/chosen");
> @@ -963,6 +1000,13 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
>          } else {
>              val[3] = 0x00; /* Hash */
>          }
> +
> +        /* TODO: introduce a kvmppc_has_cap_xive() ? Works with


Yes.

> +         * irqchip=off for now
> +         */
> +        if (first_ppc_cpu->env.excp_model & POWERPC_EXCP_POWER9) {
> +            val[1] = 0x01;
> +        }
>      } else {
>          if (first_ppc_cpu->env.mmu_model & POWERPC_MMU_V3) {
>              /* V3 MMU supports both hash and radix (with dynamic switching) */
> @@ -971,6 +1015,9 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
>              /* Otherwise we can only do hash */
>              val[3] = 0x00;
>          }
> +        if (first_ppc_cpu->env.excp_model & POWERPC_EXCP_POWER9) {
> +            val[1] = 0x01;
> +        }
>      }
>      _FDT(fdt_setprop(fdt, chosen, "ibm,arch-vec-5-platform-support",
>                       val, sizeof(val)));
> @@ -2237,6 +2284,21 @@ static void ppc_spapr_init(MachineState *machine)
>      spapr->ov5 = spapr_ovec_new();
>      spapr->ov5_cas = spapr_ovec_new();
>  
> +    /* TODO: force XIVE mode by default on POWER9.
> +     *
> +     * Switching from XICS to XIVE is badly broken. The ICP type is
> +     * incorrect and the ICS is needed before the CAS negotiation to
> +     * allocate irq numbers ...
> +     */
> +    if (strstr(machine->cpu_model, "POWER9") ||
> +        !strcmp(machine->cpu_model, "host")) {
> +        spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
> +
> +        spapr->icp_type = TYPE_XIVE_ICP;
> +        spapr->ics = ICS_BASE(
> +            spapr_xive_ics_create(spapr->xive, XICS_IRQS_SPAPR, &error_fatal));
> +    }
> +
>      if (smc->dr_lmb_enabled) {
>          spapr_ovec_set(spapr->ov5, OV5_DRCONF_MEMORY);
>          spapr_validate_node_memory(machine, &error_fatal);
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 19/26] ppc/xive: introduce a helper to map the XIVE memory regions
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 19/26] ppc/xive: introduce a helper to map the XIVE memory regions Cédric Le Goater
@ 2017-07-25  2:54   ` Alexey Kardashevskiy
  2017-07-25  9:18     ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-25  2:54 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On 06/07/17 03:13, Cédric Le Goater wrote:
> It will be used when the guest chooses the XIVE exploitation mode in
> CAS.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xive.c        | 11 +++++++++++
>  include/hw/ppc/xive.h |  2 ++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index cda1fa18e44d..895dd2b2f61b 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -915,3 +915,14 @@ bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t priority,
>  
>      return true;
>  }
> +
> +void xive_mmio_map(XIVE *x)
> +{
> +    /* ESBs */
> +    sysbus_mmio_map(SYS_BUS_DEVICE(x), 0, x->vc_base);
> +
> +    /* Thread Management Interrupt Areas */
> +    /* TODO: Only map the OS TIMA for the moment. Mapping the whole
> +     * region needs some rework in the handlers */
> +    sysbus_mmio_map(SYS_BUS_DEVICE(x), 1, x->tm_base + (1 << x->tm_shift));
> +}


imho it makes more sense to squash such small patches (this one and 20/26,
21/26) into those which actually make use of the new helpers - easier to
review, better for bisectability.


> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 288116aeb8f4..560f6ab66f73 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -68,4 +68,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>  void xive_spapr_init(sPAPRMachineState *spapr);
>  void xive_spapr_populate(XIVE *x, void *fdt);
>  
> +void xive_mmio_map(XIVE *x);
> +
>  #endif /* PPC_XIVE_H */
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model
  2017-07-24 15:20       ` Cédric Le Goater
@ 2017-07-25  3:06         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-25  3:06 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, Alexander Graf, qemu-devel

On 25/07/17 01:20, Cédric Le Goater wrote:
> On 07/24/2017 08:00 AM, Alexey Kardashevskiy wrote:
>> On 24/07/17 14:02, David Gibson wrote:
>>> On Wed, Jul 05, 2017 at 07:13:19PM +0200, Cédric Le Goater wrote:
>>>> This is very similar to the current ICS_SIMPLE model in XICS. We try
>>>> to reuse the ICS model because the sPAPR machine is tied to the
>>>> XICSFabric interface and should be using a common framework to switch
>>>> from one controller model to another: XICS <-> XIVE.
>>>
>>> Hm.  I'm not entirely concvinced re-using the xics ICSState class in
>>> this way is a good idea, though maybe it's a reasonable first step.
>>> With this patch alone some code is shared, but there are some real
>>> uglies around the edges.
>>
>>
>> Agree, using the "ICS" term in XIVE is quite confusing as "ICS" is not
>> mentioned in neither XIVE nor P9 specs.
> 
> Indeed. 
> 
> The XIVE specs mention Source Controller (P3SC) or Interrupt 
> Virtualization Source Engine (IVSE). The sPAPR specs use 
> Interrupt Source a lot.
> 
> Let's unify them all under one name ? I propose ICS :)


Too late because it is a part of XICS :) Ben calls them "source
controller", seems appropriate.


-- 
Alexey

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-24 11:07         ` Benjamin Herrenschmidt
  2017-07-24 11:47           ` Cédric Le Goater
@ 2017-07-25  4:18           ` David Gibson
  2017-07-25  5:47             ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-25  4:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1559 bytes --]

On Mon, Jul 24, 2017 at 09:07:19PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2017-07-24 at 19:50 +1000, David Gibson wrote:
> > On Mon, Jul 24, 2017 at 05:00:57PM +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2017-07-24 at 14:36 +1000, David Gibson wrote:
> > > > On Wed, Jul 05, 2017 at 07:13:21PM +0200, Cédric Le Goater wrote:
> > > > > These flags define some characteristics of the source :
> > > > > 
> > > > >  - XIVE_SRC_H_INT_ESB  the Event State Buffer are controlled with a
> > > > >                        specific hcall H_INT_ESB
> > > > 
> > > > What's the other option?
> > > 
> > > Direct MMIO access. Normally all interrupts use normal MMIOs,
> > > each interrupts has an associated MMIO page with special MMIOs
> > > to control the source state (PQ bits). This is something I added
> > > to the PAPR spec (and the OPAL <-> Linux interface) to allow firmware
> > > to work around broken HW (which happens on some P9 versions).
> > 
> > Ok.. and that's something that can be decided at runtime?
> 
> Well, at this point I think nothing will set that flag.... It's there
> for workaround around HW bugs on some chips. At least in full emu it
> shouldn't happen unless we try to emulate those bugs. Hopefully direct
> MMIO will just work.

Hm.  That doesn't seem like a good match for a per-irq state
structure.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-24 11:47           ` Cédric Le Goater
@ 2017-07-25  4:19             ` David Gibson
  2017-07-25  5:49               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-25  4:19 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1849 bytes --]

On Mon, Jul 24, 2017 at 01:47:28PM +0200, Cédric Le Goater wrote:
> On 07/24/2017 01:07 PM, Benjamin Herrenschmidt wrote:
> > On Mon, 2017-07-24 at 19:50 +1000, David Gibson wrote:
> >> On Mon, Jul 24, 2017 at 05:00:57PM +1000, Benjamin Herrenschmidt wrote:
> >>> On Mon, 2017-07-24 at 14:36 +1000, David Gibson wrote:
> >>>> On Wed, Jul 05, 2017 at 07:13:21PM +0200, Cédric Le Goater wrote:
> >>>>> These flags define some characteristics of the source :
> >>>>>
> >>>>>  - XIVE_SRC_H_INT_ESB  the Event State Buffer are controlled with a
> >>>>>                        specific hcall H_INT_ESB
> >>>>
> >>>> What's the other option?
> >>>
> >>> Direct MMIO access. Normally all interrupts use normal MMIOs,
> >>> each interrupts has an associated MMIO page with special MMIOs
> >>> to control the source state (PQ bits). This is something I added
> >>> to the PAPR spec (and the OPAL <-> Linux interface) to allow firmware
> >>> to work around broken HW (which happens on some P9 versions).
> >>
> >> Ok.. and that's something that can be decided at runtime?
> > 
> > Well, at this point I think nothing will set that flag.... It's there
> > for workaround around HW bugs on some chips. At least in full emu it
> > shouldn't happen unless we try to emulate those bugs. Hopefully direct
> > MMIO will just work.
> 
> Nevertheless I have added support for the hcall in Linux and QEMU.
> To use, I think we could create a specific source.

So, IIUC, it's host constraints that would make this one way or the
other.  So what happens when a guest migrates from a host which has it
one way to one which has it the other way?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  2017-07-24 14:44     ` Cédric Le Goater
@ 2017-07-25  4:20       ` David Gibson
  2017-07-25  9:08         ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-25  4:20 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1369 bytes --]

On Mon, Jul 24, 2017 at 04:44:00PM +0200, Cédric Le Goater wrote:
> On 07/24/2017 08:35 AM, David Gibson wrote:
> > On Wed, Jul 05, 2017 at 07:13:27PM +0200, Cédric Le Goater wrote:
> >> The Thread Interrupt Management Area for the OS is mostly used to
> >> acknowledge interrupts and set the CPPR of the CPU.
> >>
> >> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
> >> used to retrieve the targeted interrupt presenter object.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > 
> > Am I right in thinking that this shoehorns the XIVE TIMA state into
> > the existing XICS ICP object.  That.. doesn't seem like a good idea.
> 
> The TIMA memory region is under the XIVE object because it is 
> unique for the system. The lookup of the ICP is simply done using 
> 'current_cpu'. The TIMA state is under the ICPState, yes, but this 
> model does not seem incorrect to me as this state contains the 
> interrupt information presented to a CPU.

Yeah, that's not the point I'm making.  My point is that the TIMA
state isn't really the same as xics ICP state.  You're squeezing one
into the other in a pretty ugly way.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-25  4:18           ` David Gibson
@ 2017-07-25  5:47             ` Benjamin Herrenschmidt
  2017-07-25  8:28               ` Cédric Le Goater
  2017-07-25 12:24               ` David Gibson
  0 siblings, 2 replies; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-25  5:47 UTC (permalink / raw)
  To: David Gibson; +Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

On Tue, 2017-07-25 at 14:18 +1000, David Gibson wrote:
> > Well, at this point I think nothing will set that flag.... It's there
> > for workaround around HW bugs on some chips. At least in full emu it
> > shouldn't happen unless we try to emulate those bugs. Hopefully direct
> > MMIO will just work.
> 
> Hm.  That doesn't seem like a good match for a per-irq state
> structure.

The flag is returned to the guest on a per-IRQ basis, so are the LSI
etc... flags, but at the HW level, indeed, they correspond to
attributes of blocks of interrupts.

It might be easier in qemu to keep that in the per-source flags
though.

Especially when we start having actual HW interrupts under the
hood with KVM. it's easier to keep the state self contained
for each of them.

Ben.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-25  4:19             ` David Gibson
@ 2017-07-25  5:49               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-25  5:49 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On Tue, 2017-07-25 at 14:19 +1000, David Gibson wrote:
> > Nevertheless I have added support for the hcall in Linux and QEMU.
> > To use, I think we could create a specific source.
> 
> So, IIUC, it's host constraints that would make this one way or the
> other.  So what happens when a guest migrates from a host which has it
> one way to one which has it the other way?

It's probably ok to always call the hcall for the awy set -> unset, the
other way around is a problem, but it will end up depending on the kind
of interrupts we pass through.

Ben.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-24  9:50       ` David Gibson
  2017-07-24 11:07         ` Benjamin Herrenschmidt
@ 2017-07-25  8:17         ` Cédric Le Goater
  1 sibling, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25  8:17 UTC (permalink / raw)
  To: David Gibson, Benjamin Herrenschmidt; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 11:50 AM, David Gibson wrote:
> On Mon, Jul 24, 2017 at 05:00:57PM +1000, Benjamin Herrenschmidt wrote:
>> On Mon, 2017-07-24 at 14:36 +1000, David Gibson wrote:
>>> On Wed, Jul 05, 2017 at 07:13:21PM +0200, Cédric Le Goater wrote:
>>>> These flags define some characteristics of the source :
>>>>
>>>>  - XIVE_SRC_H_INT_ESB  the Event State Buffer are controlled with a
>>>>                        specific hcall H_INT_ESB
>>>
>>> What's the other option?
>>
>> Direct MMIO access. Normally all interrupts use normal MMIOs,
>> each interrupts has an associated MMIO page with special MMIOs
>> to control the source state (PQ bits). This is something I added
>> to the PAPR spec (and the OPAL <-> Linux interface) to allow firmware
>> to work around broken HW (which happens on some P9 versions).
> 
> Ok.. and that's something that can be decided at runtime?
> 

This is a characteristic of an Interrupt Source and the associated 
object should be created with such a flag. But I don't think will 
ever use it in QEMU, maybe with KVM.

C.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-25  5:47             ` Benjamin Herrenschmidt
@ 2017-07-25  8:28               ` Cédric Le Goater
  2017-07-25 12:24               ` David Gibson
  1 sibling, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25  8:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On 07/25/2017 07:47 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2017-07-25 at 14:18 +1000, David Gibson wrote:
>>> Well, at this point I think nothing will set that flag.... It's there
>>> for workaround around HW bugs on some chips. At least in full emu it
>>> shouldn't happen unless we try to emulate those bugs. Hopefully direct
>>> MMIO will just work.
>>
>> Hm.  That doesn't seem like a good match for a per-irq state
>> structure.
> 
> The flag is returned to the guest on a per-IRQ basis, so are the LSI
> etc... flags, but at the HW level, indeed, they correspond to
> attributes of blocks of interrupts.
> 
> It might be easier in qemu to keep that in the per-source flags
> though.

Theses flags are at the source level :

  XIVE_SRC_H_INT_ESB 
  XIVE_SRC_TRIGGER    
  XIVE_SRC_STORE_EOI  


The XIVE_SRC_LSI flag is only returned in the GET_SOURCE_INFO hcall 
but, internally, we use the ICSIRQState flags XICS_FLAGS_IRQ_LSI to 
store the information per-irq.       
 
Thanks,

C.


> Especially when we start having actual HW interrupts under the
> hood with KVM. it's easier to keep the state self contained
> for each of them.
> 
> Ben.
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-24 10:03                   ` David Gibson
@ 2017-07-25  8:52                     ` Cédric Le Goater
  2017-07-25 12:39                       ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25  8:52 UTC (permalink / raw)
  To: David Gibson, Benjamin Herrenschmidt; +Cc: Alexander Graf, qemu-ppc, qemu-devel

On 07/24/2017 12:03 PM, David Gibson wrote:
> On Mon, Jul 24, 2017 at 05:20:26PM +1000, Benjamin Herrenschmidt wrote:
>> On Mon, 2017-07-24 at 15:38 +1000, David Gibson wrote:
>>>
>>> Can we assign our logical numbers sparsely, or will that cause other
>>> problems?
>>
>> The main issue is that they probably needs to be the same between XICS
>> and XIVE because by the time we get the CAS call to chose between XICS
>> and XIVE, we have already handed out interrupts and constructed the DT,
>> no ? Unless we do a real CAS reboot...
> 
> A real CAS reboot probably isn't unreasonable for this case.
> 
> I definitely think we need to go one way or the other - either fully
> unify the irq mapping between xics and xive, or fully separate them.

To be able to change interrupt model at CAS time, we need to unify 
the IRQ numbering. We don't have much choice because the DT is 
already populated. We also need to share the ICSIRQState flags unless
we share the interrupt source object between the XIVE and XICS mode. 

In my current tree, I made sure that the same IRQ number ranges 
were being used in the XIVE and in the XICS allocator and that the 
ICSIRQState flags of the different sPAPR Interrupt sources (XIVE 
and XICS) were in sync. That works pretty well for reset, migration 
and hotplug, but it is bit hacky.

C.


>> Otherwise, there's no reason they can't be sparse no.
>>
>>> Note that for PAPR we also have the question of finding logical
>>> interrupts for legacy PAPR VIO devices.
>>
>> We just make them another range ? With KVM legacy today, I just use the
>> generic interrupt facility for those. So when you do the ioctl to
>> "trigger" one, I just do an MMIO to the corresponding page and the
>> interrupt magically shows up wherever the guest is running the target
>> vcpu. In fact, I'd like to add a way to mmap that page into qemu so
>> that qemu can triggers them without an ioctl.
> 
> Ok.
> 
>> The guest doesn't care, from the guest perspective they are interrupts
>> coming from the DT, so they are like PCI etc...
> 
> Ok.
> 
>>>> We can fix the number of "generic" interrupts given to a guest. The
>>>> only requirements from a PAPR perspective is that there should be at
>>>> least as many as there are possible threads in the guest so they can be
>>>> used as IPIs.
>>>
>>> Ok.  If we can do things sparsely, allocating these well away from the
>>> hw interrupts would make things easier.
>>>
>>>> But we may need more for other things. We can make this a machine
>>>> parameter with a default value of something like 4096. If we call N
>>>> that number of extra generic interrupts, then the number of generic
>>>> interrutps would be #possible-vcpu's + N, or something like that.
>>>
>>> That seems reasonable.
>>>
>>>>>> But it's fundamentally an allocator that sits in the hypervisor, so in
>>>>>> our case, I would say in the spapr "component" of XIVE, rather than the
>>>>>> XIVE HW model itself.
>>>>>
>>>>> Maybe..
>>>>
>>>> You are right in that a mapping is a better term than an allocator
>>>> here.
>>>>
>>>>>> Now what Cedric did, because XIVE is very complex and we need something
>>>>>> for PAPR quickly, is not a complete HW model, but a somewhat simplified
>>>>>> one that only handles what PAPR exposes. So in that case where the
>>>>>> allocator sits is a bit of a TBD...
>>>>>
>>>>> Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
>>>>> machine type level needs extreme caution, or the irqs may not be
>>>>> stable which will generally break migration.
>>>>
>>>> Yes you are right. We should probably create a more "static" scheme.
>>>
>>> Sounds like we're in violent agreement.
>>
>> Yup :)
>>
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  2017-07-25  4:20       ` David Gibson
@ 2017-07-25  9:08         ` Cédric Le Goater
  2017-07-25 13:21           ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25  9:08 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/25/2017 06:20 AM, David Gibson wrote:
> On Mon, Jul 24, 2017 at 04:44:00PM +0200, Cédric Le Goater wrote:
>> On 07/24/2017 08:35 AM, David Gibson wrote:
>>> On Wed, Jul 05, 2017 at 07:13:27PM +0200, Cédric Le Goater wrote:
>>>> The Thread Interrupt Management Area for the OS is mostly used to
>>>> acknowledge interrupts and set the CPPR of the CPU.
>>>>
>>>> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
>>>> used to retrieve the targeted interrupt presenter object.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>
>>> Am I right in thinking that this shoehorns the XIVE TIMA state into
>>> the existing XICS ICP object.  That.. doesn't seem like a good idea.
>>
>> The TIMA memory region is under the XIVE object because it is 
>> unique for the system. The lookup of the ICP is simply done using 
>> 'current_cpu'. The TIMA state is under the ICPState, yes, but this 
>> model does not seem incorrect to me as this state contains the 
>> interrupt information presented to a CPU.
> 
> Yeah, that's not the point I'm making.  My point is that the TIMA
> state isn't really the same as xics ICP state.  You're squeezing one
> into the other in a pretty ugly way.

yes, well, we need to have compatible objects between the XICS and XIVE 
mode because of the CAS negotiation. for migration compatibility, it is 
much easier to extend existing objects. This approach I am taking today.


C.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 19/26] ppc/xive: introduce a helper to map the XIVE memory regions
  2017-07-25  2:54   ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
@ 2017-07-25  9:18     ` Cédric Le Goater
  2017-07-25 14:16       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25  9:18 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Gibson; +Cc: qemu-ppc, qemu-devel

On 07/25/2017 04:54 AM, Alexey Kardashevskiy wrote:
> On 06/07/17 03:13, Cédric Le Goater wrote:
>> It will be used when the guest chooses the XIVE exploitation mode in
>> CAS.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xive.c        | 11 +++++++++++
>>  include/hw/ppc/xive.h |  2 ++
>>  2 files changed, 13 insertions(+)
>>
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index cda1fa18e44d..895dd2b2f61b 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -915,3 +915,14 @@ bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t priority,
>>  
>>      return true;
>>  }
>> +
>> +void xive_mmio_map(XIVE *x)
>> +{
>> +    /* ESBs */
>> +    sysbus_mmio_map(SYS_BUS_DEVICE(x), 0, x->vc_base);
>> +
>> +    /* Thread Management Interrupt Areas */
>> +    /* TODO: Only map the OS TIMA for the moment. Mapping the whole
>> +     * region needs some rework in the handlers */
>> +    sysbus_mmio_map(SYS_BUS_DEVICE(x), 1, x->tm_base + (1 << x->tm_shift));
>> +}
> 
> 
> imho it makes more sense to squash such small patches (this one and 20/26,
> 21/26) into those which actually make use of the new helpers - easier to
> review, better for bisectability.

ok. I am also realizing we should unmap.

Thanks,

C.
 
> 
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 288116aeb8f4..560f6ab66f73 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -68,4 +68,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>>  void xive_spapr_init(sPAPRMachineState *spapr);
>>  void xive_spapr_populate(XIVE *x, void *fdt);
>>  
>> +void xive_mmio_map(XIVE *x);
>> +
>>  #endif /* PPC_XIVE_H */
>>
> 
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 26/26] spapr: force XIVE exploitation mode for POWER9 (HACK)
  2017-07-25  2:43   ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
@ 2017-07-25  9:20     ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25  9:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Gibson; +Cc: qemu-ppc, qemu-devel

On 07/25/2017 04:43 AM, Alexey Kardashevskiy wrote:
> On 06/07/17 03:13, Cédric Le Goater wrote:
>> The CAS negotiation process determines the interrupt controller model
>> to use in the guest but currently, the sPAPR machine make uses of the
>> controller very early in the initialization sequence. The interrupt
>> source is used to allocate IRQ numbers and populate the device tree
>> and the interrupt presenter objects are created along with the CPU.
>>
>> One solution would be use a bitmap to allocate these IRQ numbers and
>> then instantiate the interrupt source object of the correct type with
>> the bitmap as a constructor parameter.
>>
>> As for the interrupt presenter objects, we could allocated them later
>> in the boot process. May be on demand, when a CPU is first notified.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/ppc/spapr.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 62 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index ca3a6bc2ea16..623fc776c886 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -237,6 +237,38 @@ error:
>>      return NULL;
>>  }
>>  
>> +static XiveICSState *spapr_xive_ics_create(XIVE *x, int nr_irqs, Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +    int irq_base;
>> +    Object *obj;
>> +
>> +    /*
>> +     * TODO: use an XICS_IRQ_BASE alignment to be in sync with XICS
>> +     * irq numbers. we should probably simplify the XIVE model or use
>> +     * a common allocator. a bitmap maybe ?
>> +     */
>> +    irq_base = xive_alloc_hw_irqs(x, nr_irqs, XICS_IRQ_BASE);
>> +    if (irq_base < 0) {
>> +        error_setg(errp, "Failed to allocate %d irqs", nr_irqs);
>> +        return NULL;
>> +    }
>> +
>> +    obj = object_new(TYPE_ICS_XIVE);
>> +    object_property_add_child(OBJECT(x), "hw", obj, NULL);
>> +
>> +    xive_ics_create(ICS_XIVE(obj), x, irq_base, nr_irqs, 16 /* 64KB page */,
>> +                    XIVE_SRC_TRIGGER, &local_err);
>> +    if (local_err) {
>> +        goto error;
>> +    }
>> +    return ICS_XIVE(obj);
>> +
>> +error:
>> +    error_propagate(errp, local_err);
>> +    return NULL;
>> +}
>> +
>>  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
>>                                    int smt_threads)
>>  {
>> @@ -814,6 +846,11 @@ static int spapr_dt_cas_updates(sPAPRMachineState *spapr, void *fdt,
>>      /* /interrupt controller */
>>      if (!spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)) {
>>          spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
>> +    } else {
>> +        xive_spapr_populate(spapr->xive, fdt);
>> +
>> +        /* Install XIVE MMIOs */
>> +        xive_mmio_map(spapr->xive);
> 
> 
> xive_mmio_map() could be called where sysbus_init_mmio() is called as once
> these are mmap'ed, they are never unmapped and tm_base/vc_base never
> change. And XIVE is always created on P9 anyway.

OK. So you don't think we should map/unmap depending on 
CAS negotiation of the OV5_XIVE_EXPLOIT bit ? 

Thanks,

C. 


> 
> 
>>      }
>>  
>>      offset = fdt_path_offset(fdt, "/chosen");
>> @@ -963,6 +1000,13 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
>>          } else {
>>              val[3] = 0x00; /* Hash */
>>          }
>> +
>> +        /* TODO: introduce a kvmppc_has_cap_xive() ? Works with
> 
> 
> Yes.
> 
>> +         * irqchip=off for now
>> +         */
>> +        if (first_ppc_cpu->env.excp_model & POWERPC_EXCP_POWER9) {
>> +            val[1] = 0x01;
>> +        }
>>      } else {
>>          if (first_ppc_cpu->env.mmu_model & POWERPC_MMU_V3) {
>>              /* V3 MMU supports both hash and radix (with dynamic switching) */
>> @@ -971,6 +1015,9 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
>>              /* Otherwise we can only do hash */
>>              val[3] = 0x00;
>>          }
>> +        if (first_ppc_cpu->env.excp_model & POWERPC_EXCP_POWER9) {
>> +            val[1] = 0x01;
>> +        }
>>      }
>>      _FDT(fdt_setprop(fdt, chosen, "ibm,arch-vec-5-platform-support",
>>                       val, sizeof(val)));
>> @@ -2237,6 +2284,21 @@ static void ppc_spapr_init(MachineState *machine)
>>      spapr->ov5 = spapr_ovec_new();
>>      spapr->ov5_cas = spapr_ovec_new();
>>  
>> +    /* TODO: force XIVE mode by default on POWER9.
>> +     *
>> +     * Switching from XICS to XIVE is badly broken. The ICP type is
>> +     * incorrect and the ICS is needed before the CAS negotiation to
>> +     * allocate irq numbers ...
>> +     */
>> +    if (strstr(machine->cpu_model, "POWER9") ||
>> +        !strcmp(machine->cpu_model, "host")) {
>> +        spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
>> +
>> +        spapr->icp_type = TYPE_XIVE_ICP;
>> +        spapr->ics = ICS_BASE(
>> +            spapr_xive_ics_create(spapr->xive, XICS_IRQS_SPAPR, &error_fatal));
>> +    }
>> +
>>      if (smc->dr_lmb_enabled) {
>>          spapr_ovec_set(spapr->ov5, OV5_DRCONF_MEMORY);
>>          spapr_validate_node_memory(machine, &error_fatal);
>>
> 
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs
  2017-07-25  2:19         ` David Gibson
@ 2017-07-25  9:50           ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25  9:50 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/25/2017 04:19 AM, David Gibson wrote:
> On Mon, Jul 24, 2017 at 03:25:29PM +0200, Cédric Le Goater wrote:
>> On 07/24/2017 08:09 AM, Benjamin Herrenschmidt wrote:
>>> On Mon, 2017-07-24 at 14:49 +1000, David Gibson wrote:
>>>> On Wed, Jul 05, 2017 at 07:13:22PM +0200, Cédric Le Goater wrote:
>>>>> Each source adds its own ESB mempry region to the overall ESB memory
>>>>> region of the controller. It will be mapped in the CPU address space
>>>>> when XIVE is activated.
>>>>>
>>>>> The default mapping address for the ESB memory region is the same one
>>>>> used on baremetal.
>>>>>
>>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>>> ---
>>>>>  hw/intc/xive-internal.h |  5 +++++
>>>>>  hw/intc/xive.c          | 44 +++++++++++++++++++++++++++++++++++++++++++-
>>>>>  2 files changed, 48 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>>>>> index 8e755aa88a14..c06be823aad0 100644
>>>>> --- a/hw/intc/xive-internal.h
>>>>> +++ b/hw/intc/xive-internal.h
>>>>> @@ -98,6 +98,7 @@ struct XIVE {
>>>>>      SysBusDevice parent;
>>>>>  
>>>>>      /* Properties */
>>>>> +    uint32_t     chip_id;
>>>>
>>>> So there is a XIVE object per chip.  How does this work on PAPR?  One
>>>> logical chip/XIVE, or something more complex?
>>>
>>> One global XIVE for PAPR. 
>>
>> Yes. 
>>
>> The chip-id is useless for sPAPR (0 is the default) but for a PowerNV
>> system, the address used to map the ESB memory region depends on the 
>> chip-id and  I thought we could reuse the same XIVE object. 
> 
> Hmm, maybe.

yes. 

I am thinking of greatly simplifying the allocator to fit only 
the sPAPR needs : a simple range of IRQ numbers with the IPIs 
at the beginning, and the HW interrupts starting at an offset 
(in sync with the XICS allocator). That's what I ended doing 
for CAS negotiation.

So we could just call it sPAPRXive and forget about PowerNV
support for the moment. 
 
C.

> 
>> So, a sPAPR guest would use the address of a single chip baremetal 
>> system. This needs more explanation I agree. Thanks to Ben who is 
>> providing a lot. I will update the changelogs in the next version. 
> 
>> The TIMA is mapped at a fixed address so the chip-id does not come 
>> in play.
>>
>>> For the MMIOs, the way it works is that:
>>>
>>>  - For MMIOs pertaining to a specific interrupt or queue, there's an H-
>>> call that will return the proper "guest physical" address. For qemu
>>> with KVM we'll have to probably create a single chunk of qemu address
>>> space (a single mem region) that contains individual pages mapped with
>>> MAP_FIXED originating from the different HW bits, we still need to sort
>>> out how exactly we'll do that in practice.
>>
>> I haven't looked at all the KVM details. But, regarding the ESBs, I had
>> the above in mind and used a single memory region to contain them all. 
>>  
>>>  - For the TIMA (the presentation MMIOs), those are always at the same
>>> physical address for everybody (so for a guest it's a single memory
>>> region we'll map to that physical address), the HW "knows" which HW
>>> thread is talking to it (and the hypervisor tells the HW which vcpu is
>>> running on a given HW thread at a given point in time). That address is
>>> obtained from the device-tree
>>>
>>>>>      uint32_t     nr_targets;
>>>>>  
>>>>>      /* IRQ number allocator */
>>>>> @@ -111,6 +112,10 @@ struct XIVE {
>>>>>      void         *sbe;
>>>>>      XiveIVE      *ivt;
>>>>>      XiveEQ       *eqdt;
>>>>> +
>>>>> +    /* ESB and TIMA memory location */
>>>>> +    hwaddr       vc_base;
>>>>> +    MemoryRegion esb_iomem;
>>>>>  };
>>>>>  
>>>>>  void xive_reset(void *dev);
>>>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>>>>> index 8f8bb8b787bd..a1cb87a07b76 100644
>>>>> --- a/hw/intc/xive.c
>>>>> +++ b/hw/intc/xive.c
>>>>> @@ -312,6 +312,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>>>>>      XiveICSState *xs = ICS_XIVE(ics);
>>>>>      Object *obj;
>>>>>      Error *err = NULL;
>>>>> +    XIVE *x;
>>>>
>>>> I don't really like just 'x' for a context variable like this (as
>>>> opposed to a temporary).
>>
>> OK. I will change 'x' in 'xive' then.
>>
>>>>>  
>>>>>      obj = object_property_get_link(OBJECT(xs), "xive", &err);
>>>>>      if (!obj) {
>>>>> @@ -319,7 +320,7 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>>>>>                     __func__, error_get_pretty(err));
>>>>>          return;
>>>>>      }
>>>>> -    xs->xive = XIVE(obj);
>>>>> +    x = xs->xive = XIVE(obj);
>>>>>  
>>>>>      if (!ics->nr_irqs) {
>>>>>          error_setg(errp, "Number of interrupts needs to be greater 0");
>>>>> @@ -338,6 +339,11 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>>>>>                            "xive.esb",
>>>>>                            (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
>>>>>  
>>>>> +    /* Install the ESB memory region in the overall one */
>>>>> +    memory_region_add_subregion(&x->esb_iomem,
>>>>> +                                ICS_BASE(xs)->offset * (1 << xs->esb_shift),
>>>>> +                                &xs->esb_iomem);
>>>>> +
>>>>>      qemu_register_reset(xive_ics_reset, xs);
>>>>>  }
>>>>>  
>>>>> @@ -375,6 +381,32 @@ static const TypeInfo xive_ics_info = {
>>>>>   */
>>>>>  #define MAX_HW_IRQS_ENTRIES (8 * 1024)
>>>>>  
>>>>> +/* VC BAR contains set translations for the ESBs and the EQs. */
>>>>> +#define VC_BAR_DEFAULT   0x10000000000ull
>>>>> +#define VC_BAR_SIZE      0x08000000000ull
>>>>> +
>>>>> +#define P9_MMIO_BASE     0x006000000000000ull
>>>>> +#define P9_CHIP_BASE(id) (P9_MMIO_BASE | (0x40000000000ull * (uint64_t) (id)))
>>>>
>>>> chip-based MMIO addresses leaking into the PAPR model seems like it
>>>> might not be what you want
>>
>> See above for the reason.
>>
>>
>> Thanks,
>>
>> C. 
>>
>>>>
>>>>> +static uint64_t xive_esb_default_read(void *p, hwaddr offset, unsigned size)
>>>>> +{
>>>>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
>>>>> +                  __func__, offset, size);
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static void xive_esb_default_write(void *opaque, hwaddr offset, uint64_t value,
>>>>> +                unsigned size)
>>>>> +{
>>>>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
>>>>> +                  __func__, offset, value, size);
>>>>> +}
>>>>> +
>>>>> +static const MemoryRegionOps xive_esb_default_ops = {
>>>>> +    .read = xive_esb_default_read,
>>>>> +    .write = xive_esb_default_write,
>>>>> +    .endianness = DEVICE_BIG_ENDIAN,
>>>>> +};
>>>>>  
>>>>>  void xive_reset(void *dev)
>>>>>  {
>>>>> @@ -435,10 +467,20 @@ static void xive_realize(DeviceState *dev, Error **errp)
>>>>>      x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
>>>>>                          sizeof(XiveEQ));
>>>>>  
>>>>> +    /* VC BAR. That's the full window but we will only map the
>>>>> +     * subregions in use. */
>>>>> +    x->vc_base = (hwaddr)(P9_CHIP_BASE(x->chip_id) | VC_BAR_DEFAULT);
>>>>> +
>>>>> +    /* install default memory region handlers to log bogus access */
>>>>> +    memory_region_init_io(&x->esb_iomem, NULL, &xive_esb_default_ops,
>>>>> +                          NULL, "xive.esb", VC_BAR_SIZE);
>>>>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &x->esb_iomem);
>>>>> +
>>>>>      qemu_register_reset(xive_reset, dev);
>>>>>  }
>>>>>  
>>>>>  static Property xive_properties[] = {
>>>>> +    DEFINE_PROP_UINT32("chip-id", XIVE, chip_id, 0),
>>>>>      DEFINE_PROP_UINT32("nr-targets", XIVE, nr_targets, 0),
>>>>>      DEFINE_PROP_END_OF_LIST(),
>>>>>  };
>>>>
>>>>
>>
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source
  2017-07-24 15:55     ` Cédric Le Goater
@ 2017-07-25 12:21       ` David Gibson
  2017-07-25 15:42         ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-25 12:21 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 5634 bytes --]

On Mon, Jul 24, 2017 at 05:55:42PM +0200, Cédric Le Goater wrote:
> On 07/24/2017 06:29 AM, David Gibson wrote:
> > On Wed, Jul 05, 2017 at 07:13:20PM +0200, Cédric Le Goater wrote:
> >> Each interrupt source is associated with a 2-bit state machine called
> >> an Event State Buffer (ESB). It is controlled by MMIO to trigger
> >> events.
[snip]
> >> +/* TODO: handle second page */
> > 
> > Is this comment still relevent?
> 
> Some HW have a second page to trigger the event. I am not sure we need 
> to model it though. I will make some inquiries.

Ah, ok.  Maybe clarify the comment a bit.

[snip]
> >> +static void xive_esb_write(void *opaque, hwaddr addr,
> >> +                           uint64_t value, unsigned size)
> >> +{
> >> +    XiveICSState *xs = ICS_XIVE(opaque);
> >> +    XIVE *x = xs->xive;
> >> +    uint32_t offset = addr & 0xF00;
> >> +    uint32_t srcno = addr >> xs->esb_shift;
> >> +    uint32_t lisn = srcno + ICS_BASE(xs)->offset;
> >> +    XiveIVE *ive;
> >> +    bool notify = false;
> >> +
> >> +    ive = xive_get_ive(x, lisn);
> >> +    if (!ive || !(ive->w & IVE_VALID))  {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> >> +        return;
> >> +    }
> > 
> > Having this code associated with the individual ICS look directly at
> > the IVE table in the core xive object seems a bit dubious.
> 
> The IVE table holds the validity and mask status of the interrupt 
> entries, so we need that lookup. However, (continues below) ...
> 
> > This also
> > points out another mismatch between the re-used ICS code and the new
> > XIVE code: ICS gathers all the per-source-irq flags/state into the
> > irqstate structure, whereas xive has per-irq information in the
> > centralized ecb and IVE tables.  There can certainly be good reasons
> > for that, but using both at once is kind of clunky.
> 
> I understand that you would rather put the esbs in the source they 
> belong to. That is the case on real HW but it makes the modeling a 
> bit more difficult. We would need to choose a MMIO address to give 
> to the guest OS. I had some issues with the allocator (I need 
> to look at this problem closer).

Uh.. what do MMIO addresses have to do with this?  I'm talking about
the actual ESB state in the packed bit array.

> It might also be an "issue" for KVM. Ben talked about maintaining 
> all the esbs of a guest under a single memory region to be able to 
> map the pages in the host.
> 
> Any how, I agree this is another point to discuss in the sPAPR 
> model.
> 
> Thanks,
> 
> C. 
> 
> 
> >> +    if (srcno >= ICS_BASE(xs)->nr_irqs) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR,
> >> +                      "XIVE: invalid IRQ number: %d/%d lisn: %d\n",
> >> +                      srcno, ICS_BASE(xs)->nr_irqs, lisn);
> >> +        return;
> >> +    }
> >> +
> >> +    switch (offset) {
> >> +    case 0:
> >> +        /* TODO: should we trigger even if the IVE is masked ? */
> >> +        notify = xive_pq_trigger(x, lisn);
> >> +        break;
> >> +    default:
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
> >> +                      offset);
> >> +        return;
> >> +    }
> >> +
> >> +    if (notify && !(ive->w & IVE_MASKED)) {
> >> +        qemu_irq_pulse(ICS_BASE(xs)->qirqs[srcno]);
> >> +    }
> >> +}
> >> +
> >> +static const MemoryRegionOps xive_esb_ops = {
> >> +    .read = xive_esb_read,
> >> +    .write = xive_esb_write,
> >> +    .endianness = DEVICE_BIG_ENDIAN,
> >> +    .valid = {
> >> +        .min_access_size = 8,
> >> +        .max_access_size = 8,
> >> +    },
> >> +    .impl = {
> >> +        .min_access_size = 8,
> >> +        .max_access_size = 8,
> >> +    },
> >> +};
> >> +
> >> +/*
> >>   * XIVE Interrupt Source
> >>   */
> >>  static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
> >> @@ -106,15 +326,25 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
> >>          return;
> >>      }
> >>  
> >> +    if (!xs->esb_shift) {
> >> +        error_setg(errp, "ESB page size needs to be greater 0");
> >> +        return;
> >> +    }
> >> +
> >>      ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
> >>      ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
> >>  
> >> +    memory_region_init_io(&xs->esb_iomem, OBJECT(xs), &xive_esb_ops, xs,
> >> +                          "xive.esb",
> >> +                          (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
> >> +
> >>      qemu_register_reset(xive_ics_reset, xs);
> >>  }
> >>  
> >>  static Property xive_ics_properties[] = {
> >>      DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
> >>      DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
> >> +    DEFINE_PROP_UINT32("shift", XiveICSState, esb_shift, 0),
> >>      DEFINE_PROP_END_OF_LIST(),
> >>  };
> >>  
> >> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> >> index 544cc6e0c796..5303d96f5f59 100644
> >> --- a/include/hw/ppc/xive.h
> >> +++ b/include/hw/ppc/xive.h
> >> @@ -33,6 +33,9 @@ typedef struct XiveICSState XiveICSState;
> >>  struct XiveICSState {
> >>      ICSState parent_obj;
> >>  
> >> +    uint32_t     esb_shift;
> >> +    MemoryRegion esb_iomem;
> >> +
> >>      XIVE         *xive;
> >>  };
> >>  
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags to the XIVE interrupt source
  2017-07-25  5:47             ` Benjamin Herrenschmidt
  2017-07-25  8:28               ` Cédric Le Goater
@ 2017-07-25 12:24               ` David Gibson
  1 sibling, 0 replies; 122+ messages in thread
From: David Gibson @ 2017-07-25 12:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1125 bytes --]

On Tue, Jul 25, 2017 at 03:47:37PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2017-07-25 at 14:18 +1000, David Gibson wrote:
> > > Well, at this point I think nothing will set that flag.... It's there
> > > for workaround around HW bugs on some chips. At least in full emu it
> > > shouldn't happen unless we try to emulate those bugs. Hopefully direct
> > > MMIO will just work.
> > 
> > Hm.  That doesn't seem like a good match for a per-irq state
> > structure.
> 
> The flag is returned to the guest on a per-IRQ basis, so are the LSI
> etc... flags, but at the HW level, indeed, they correspond to
> attributes of blocks of interrupts.
> 
> It might be easier in qemu to keep that in the per-source flags
> though.

Yeah, I think so.

> Especially when we start having actual HW interrupts under the
> hood with KVM. it's easier to keep the state self contained
> for each of them.
> 
> Ben.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-25  8:52                     ` Cédric Le Goater
@ 2017-07-25 12:39                       ` David Gibson
  2017-07-25 13:48                         ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-25 12:39 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4321 bytes --]

On Tue, Jul 25, 2017 at 10:52:27AM +0200, Cédric Le Goater wrote:
> On 07/24/2017 12:03 PM, David Gibson wrote:
> > On Mon, Jul 24, 2017 at 05:20:26PM +1000, Benjamin Herrenschmidt wrote:
> >> On Mon, 2017-07-24 at 15:38 +1000, David Gibson wrote:
> >>>
> >>> Can we assign our logical numbers sparsely, or will that cause other
> >>> problems?
> >>
> >> The main issue is that they probably needs to be the same between XICS
> >> and XIVE because by the time we get the CAS call to chose between XICS
> >> and XIVE, we have already handed out interrupts and constructed the DT,
> >> no ? Unless we do a real CAS reboot...
> > 
> > A real CAS reboot probably isn't unreasonable for this case.
> > 
> > I definitely think we need to go one way or the other - either fully
> > unify the irq mapping between xics and xive, or fully separate them.
> 
> To be able to change interrupt model at CAS time, we need to unify 
> the IRQ numbering.

Not necessarily, though it certainly might make things easier.

> We don't have much choice because the DT is 
> already populated.

We could change that, though.

> We also need to share the ICSIRQState flags unless
> we share the interrupt source object between the XIVE and XICS mode. 
> 
> In my current tree, I made sure that the same IRQ number ranges 
> were being used in the XIVE and in the XICS allocator and that the 
> ICSIRQState flags of the different sPAPR Interrupt sources (XIVE 
> and XICS) were in sync. That works pretty well for reset, migration 
> and hotplug, but it is bit hacky.
> 
> C.
> 
> 
> >> Otherwise, there's no reason they can't be sparse no.
> >>
> >>> Note that for PAPR we also have the question of finding logical
> >>> interrupts for legacy PAPR VIO devices.
> >>
> >> We just make them another range ? With KVM legacy today, I just use the
> >> generic interrupt facility for those. So when you do the ioctl to
> >> "trigger" one, I just do an MMIO to the corresponding page and the
> >> interrupt magically shows up wherever the guest is running the target
> >> vcpu. In fact, I'd like to add a way to mmap that page into qemu so
> >> that qemu can triggers them without an ioctl.
> > 
> > Ok.
> > 
> >> The guest doesn't care, from the guest perspective they are interrupts
> >> coming from the DT, so they are like PCI etc...
> > 
> > Ok.
> > 
> >>>> We can fix the number of "generic" interrupts given to a guest. The
> >>>> only requirements from a PAPR perspective is that there should be at
> >>>> least as many as there are possible threads in the guest so they can be
> >>>> used as IPIs.
> >>>
> >>> Ok.  If we can do things sparsely, allocating these well away from the
> >>> hw interrupts would make things easier.
> >>>
> >>>> But we may need more for other things. We can make this a machine
> >>>> parameter with a default value of something like 4096. If we call N
> >>>> that number of extra generic interrupts, then the number of generic
> >>>> interrutps would be #possible-vcpu's + N, or something like that.
> >>>
> >>> That seems reasonable.
> >>>
> >>>>>> But it's fundamentally an allocator that sits in the hypervisor, so in
> >>>>>> our case, I would say in the spapr "component" of XIVE, rather than the
> >>>>>> XIVE HW model itself.
> >>>>>
> >>>>> Maybe..
> >>>>
> >>>> You are right in that a mapping is a better term than an allocator
> >>>> here.
> >>>>
> >>>>>> Now what Cedric did, because XIVE is very complex and we need something
> >>>>>> for PAPR quickly, is not a complete HW model, but a somewhat simplified
> >>>>>> one that only handles what PAPR exposes. So in that case where the
> >>>>>> allocator sits is a bit of a TBD...
> >>>>>
> >>>>> Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
> >>>>> machine type level needs extreme caution, or the irqs may not be
> >>>>> stable which will generally break migration.
> >>>>
> >>>> Yes you are right. We should probably create a more "static" scheme.
> >>>
> >>> Sounds like we're in violent agreement.
> >>
> >> Yup :)
> >>
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  2017-07-25  9:08         ` Cédric Le Goater
@ 2017-07-25 13:21           ` David Gibson
  2017-07-25 15:01             ` Cédric Le Goater
  0 siblings, 1 reply; 122+ messages in thread
From: David Gibson @ 2017-07-25 13:21 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3176 bytes --]

On Tue, Jul 25, 2017 at 11:08:46AM +0200, Cédric Le Goater wrote:
> On 07/25/2017 06:20 AM, David Gibson wrote:
> > On Mon, Jul 24, 2017 at 04:44:00PM +0200, Cédric Le Goater wrote:
> >> On 07/24/2017 08:35 AM, David Gibson wrote:
> >>> On Wed, Jul 05, 2017 at 07:13:27PM +0200, Cédric Le Goater wrote:
> >>>> The Thread Interrupt Management Area for the OS is mostly used to
> >>>> acknowledge interrupts and set the CPPR of the CPU.
> >>>>
> >>>> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
> >>>> used to retrieve the targeted interrupt presenter object.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>
> >>> Am I right in thinking that this shoehorns the XIVE TIMA state into
> >>> the existing XICS ICP object.  That.. doesn't seem like a good idea.
> >>
> >> The TIMA memory region is under the XIVE object because it is 
> >> unique for the system. The lookup of the ICP is simply done using 
> >> 'current_cpu'. The TIMA state is under the ICPState, yes, but this 
> >> model does not seem incorrect to me as this state contains the 
> >> interrupt information presented to a CPU.
> > 
> > Yeah, that's not the point I'm making.  My point is that the TIMA
> > state isn't really the same as xics ICP state.  You're squeezing one
> > into the other in a pretty ugly way.
> 
> yes, well, we need to have compatible objects between the XICS and XIVE 
> mode because of the CAS negotiation. for migration compatibility, it is 
> much easier to extend existing objects. This approach I am taking today.

Yeah, I really don't think this approach is workable.

Roughly speaking, you're keeping the same structures between xics and
xive, but with mostly different code.  I can't see any way that's not
going to be horribly fragile, with any update to xics OR xive
requiring enormous caution not to break the other.

I really think we have to go one of two ways:

1) Abstract the notion of interrupt source and interrupt presenter, so
we can use a truly common model between xics and xive.

Given the differences between the two, I don't know this is even
possible.

2) Separate the objects entirely.  ICPs are entirely separate from
TIMAs, like wise ICSes and xive interrupt sources.

I think this is probably the way to go.  To make this work with CAS
switching will require different methods, but I don't think it's
impossible.

For example, we could (on the new machine type) create both xics ICSes
and ICPs and xive sources and TIMAs.  We'd have a (migrated) flag in
the machine saying which is currently active.  All the objects would
hang around, but only the active ones would do anything.

Now obviously that means we'd be migrating a bunch of redundant state,
but I still think it's preferable to a Frankenstinian fusion of
xics-ish and xive-ish state.  I think there's a good chance we can
improve on the basic idea to remove most or all of that redundant
state.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/26] ppc/xics: introduce a print_info() handler to the ICS and ICP objects
  2017-07-24 13:58     ` Cédric Le Goater
@ 2017-07-25 13:26       ` David Gibson
  0 siblings, 0 replies; 122+ messages in thread
From: David Gibson @ 2017-07-25 13:26 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 5167 bytes --]

On Mon, Jul 24, 2017 at 03:58:50PM +0200, Cédric Le Goater wrote:
> On 07/24/2017 07:13 AM, David Gibson wrote:
> > On Wed, Jul 05, 2017 at 07:13:24PM +0200, Cédric Le Goater wrote:
> >> This handler will be used to customize the ouput of the XIVE interrupt
> >> source and presenter objects.
> > 
> > I'm not really happy with this without having a clear idea of where
> > this is heading - are you trying to share ICP and or ICS object
> > classes between XICS and XIVE, or will they eventually be separated
> > again?
> 
> Because of the XICSFabric interface of the sPAPR machine, we need 
> to use ICPState and ICSState objects. 

Yeah, I don't think we do.  See further comments elsewhere

> sPAPR is strongly tied to ICPState and it is complex to introduce 
> a new ICPState class for the sPAPR machine. We did introduce a new 
> class in the past but that was for a new machine : PowerNV.
> So I think we should just add a couple of attributes to ICPState
> to support XIVE. That is not what the patchset does but I have
> made progress since with hotplug and migration and came to that
> conclusion. The consequence is that the print_info() handler is 
> now obsolete for ICPs and we will need to find another way to 
> customize the output.
> 
> For the interrupt source, the constraints are less strong, adding 
> a new ICSState class seems like a good option and so does the 
> print_info() handler.
> 
> Thanks,
> 
> C. 
> 
> 
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/xics.c        | 36 ++++++++++++++++++++++++------------
> >>  include/hw/ppc/xics.h |  2 ++
> >>  2 files changed, 26 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> >> index faa5c631f655..7837c2022b4a 100644
> >> --- a/hw/intc/xics.c
> >> +++ b/hw/intc/xics.c
> >> @@ -40,18 +40,26 @@
> >>  
> >>  void icp_pic_print_info(ICPState *icp, Monitor *mon)
> >>  {
> >> +    ICPStateClass *k = ICP_GET_CLASS(icp);
> >>      int cpu_index = icp->cs ? icp->cs->cpu_index : -1;
> >>  
> >>      if (!icp->output) {
> >>          return;
> >>      }
> >> -    monitor_printf(mon, "CPU %d XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
> >> -                   cpu_index, icp->xirr, icp->xirr_owner,
> >> -                   icp->pending_priority, icp->mfrr);
> >> +
> >> +    monitor_printf(mon, "CPU %d ", cpu_index);
> >> +    if (k->print_info) {
> >> +        k->print_info(icp, mon);
> >> +    } else {
> >> +        monitor_printf(mon, "XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
> >> +                       icp->xirr, icp->xirr_owner,
> >> +                       icp->pending_priority, icp->mfrr);
> >> +    }
> >>  }
> >>  
> >>  void ics_pic_print_info(ICSState *ics, Monitor *mon)
> >>  {
> >> +    ICSStateClass *k = ICS_BASE_GET_CLASS(ics);
> >>      uint32_t i;
> >>  
> >>      monitor_printf(mon, "ICS %4x..%4x %p\n",
> >> @@ -61,17 +69,21 @@ void ics_pic_print_info(ICSState *ics, Monitor *mon)
> >>          return;
> >>      }
> >>  
> >> -    for (i = 0; i < ics->nr_irqs; i++) {
> >> -        ICSIRQState *irq = ics->irqs + i;
> >> +    if (k->print_info) {
> >> +        k->print_info(ics, mon);
> >> +    } else {
> >> +        for (i = 0; i < ics->nr_irqs; i++) {
> >> +            ICSIRQState *irq = ics->irqs + i;
> >>  
> >> -        if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
> >> -            continue;
> >> +            if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
> >> +                continue;
> >> +            }
> >> +            monitor_printf(mon, "  %4x %s %02x %02x\n",
> >> +                           ics->offset + i,
> >> +                           (irq->flags & XICS_FLAGS_IRQ_LSI) ?
> >> +                           "LSI" : "MSI",
> >> +                           irq->priority, irq->status);
> >>          }
> >> -        monitor_printf(mon, "  %4x %s %02x %02x\n",
> >> -                       ics->offset + i,
> >> -                       (irq->flags & XICS_FLAGS_IRQ_LSI) ?
> >> -                       "LSI" : "MSI",
> >> -                       irq->priority, irq->status);
> >>      }
> >>  }
> >>  
> >> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> >> index 28d248abad61..902f3bfd0e33 100644
> >> --- a/include/hw/ppc/xics.h
> >> +++ b/include/hw/ppc/xics.h
> >> @@ -69,6 +69,7 @@ struct ICPStateClass {
> >>      void (*pre_save)(ICPState *icp);
> >>      int (*post_load)(ICPState *icp, int version_id);
> >>      void (*reset)(ICPState *icp);
> >> +    void (*print_info)(ICPState *icp, Monitor *mon);
> >>  };
> >>  
> >>  struct ICPState {
> >> @@ -119,6 +120,7 @@ struct ICSStateClass {
> >>      void (*reject)(ICSState *s, uint32_t irq);
> >>      void (*resend)(ICSState *s);
> >>      void (*eoi)(ICSState *s, uint32_t irq);
> >> +    void (*print_info)(ICSState *s, Monitor *mon);
> >>  };
> >>  
> >>  struct ICSState {
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  2017-07-25 12:39                       ` David Gibson
@ 2017-07-25 13:48                         ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25 13:48 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, Alexander Graf, qemu-devel

On 07/25/2017 02:39 PM, David Gibson wrote:
> On Tue, Jul 25, 2017 at 10:52:27AM +0200, Cédric Le Goater wrote:
>> On 07/24/2017 12:03 PM, David Gibson wrote:
>>> On Mon, Jul 24, 2017 at 05:20:26PM +1000, Benjamin Herrenschmidt wrote:
>>>> On Mon, 2017-07-24 at 15:38 +1000, David Gibson wrote:
>>>>>
>>>>> Can we assign our logical numbers sparsely, or will that cause other
>>>>> problems?
>>>>
>>>> The main issue is that they probably needs to be the same between XICS
>>>> and XIVE because by the time we get the CAS call to chose between XICS
>>>> and XIVE, we have already handed out interrupts and constructed the DT,
>>>> no ? Unless we do a real CAS reboot...
>>>
>>> A real CAS reboot probably isn't unreasonable for this case.
>>>
>>> I definitely think we need to go one way or the other - either fully
>>> unify the irq mapping between xics and xive, or fully separate them.
>>
>> To be able to change interrupt model at CAS time, we need to unify 
>> the IRQ numbering.
> 
> Not necessarily, though it certainly might make things easier.
> 
>> We don't have much choice because the DT is already populated.
> 
> We could change that, though.

It would certainly help to manage the change of interrupt mode.
May be we could even instantiate only one sPAPR interrupt source
object at a time. Today I am instantiating two and switch from 
one to another. I am not sure how compatible this is with migration 
tough.

We would need to postpone all IRQ allocation until the CAS 
negotiation is complete and populate the DT accordingly in the 
CAS response. Hopefully I think this is the only reason why we do
an early allocation of the XICS or XIVE objects today. 

I need to take a closer look at the spapr_ics_alloc() calls in 
spapr_{events,pci,vio}.c files.

C.
 
 
>> We also need to share the ICSIRQState flags unless
>> we share the interrupt source object between the XIVE and XICS mode. 
>>
>> In my current tree, I made sure that the same IRQ number ranges 
>> were being used in the XIVE and in the XICS allocator and that the 
>> ICSIRQState flags of the different sPAPR Interrupt sources (XIVE 
>> and XICS) were in sync. That works pretty well for reset, migration 
>> and hotplug, but it is bit hacky.
>>
>> C.
>>
>>
>>>> Otherwise, there's no reason they can't be sparse no.
>>>>
>>>>> Note that for PAPR we also have the question of finding logical
>>>>> interrupts for legacy PAPR VIO devices.
>>>>
>>>> We just make them another range ? With KVM legacy today, I just use the
>>>> generic interrupt facility for those. So when you do the ioctl to
>>>> "trigger" one, I just do an MMIO to the corresponding page and the
>>>> interrupt magically shows up wherever the guest is running the target
>>>> vcpu. In fact, I'd like to add a way to mmap that page into qemu so
>>>> that qemu can triggers them without an ioctl.
>>>
>>> Ok.
>>>
>>>> The guest doesn't care, from the guest perspective they are interrupts
>>>> coming from the DT, so they are like PCI etc...
>>>
>>> Ok.
>>>
>>>>>> We can fix the number of "generic" interrupts given to a guest. The
>>>>>> only requirements from a PAPR perspective is that there should be at
>>>>>> least as many as there are possible threads in the guest so they can be
>>>>>> used as IPIs.
>>>>>
>>>>> Ok.  If we can do things sparsely, allocating these well away from the
>>>>> hw interrupts would make things easier.
>>>>>
>>>>>> But we may need more for other things. We can make this a machine
>>>>>> parameter with a default value of something like 4096. If we call N
>>>>>> that number of extra generic interrupts, then the number of generic
>>>>>> interrutps would be #possible-vcpu's + N, or something like that.
>>>>>
>>>>> That seems reasonable.
>>>>>
>>>>>>>> But it's fundamentally an allocator that sits in the hypervisor, so in
>>>>>>>> our case, I would say in the spapr "component" of XIVE, rather than the
>>>>>>>> XIVE HW model itself.
>>>>>>>
>>>>>>> Maybe..
>>>>>>
>>>>>> You are right in that a mapping is a better term than an allocator
>>>>>> here.
>>>>>>
>>>>>>>> Now what Cedric did, because XIVE is very complex and we need something
>>>>>>>> for PAPR quickly, is not a complete HW model, but a somewhat simplified
>>>>>>>> one that only handles what PAPR exposes. So in that case where the
>>>>>>>> allocator sits is a bit of a TBD...
>>>>>>>
>>>>>>> Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
>>>>>>> machine type level needs extreme caution, or the irqs may not be
>>>>>>> stable which will generally break migration.
>>>>>>
>>>>>> Yes you are right. We should probably create a more "static" scheme.
>>>>>
>>>>> Sounds like we're in violent agreement.
>>>>
>>>> Yup :)
>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH 19/26] ppc/xive: introduce a helper to map the XIVE memory regions
  2017-07-25  9:18     ` Cédric Le Goater
@ 2017-07-25 14:16       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 122+ messages in thread
From: Alexey Kardashevskiy @ 2017-07-25 14:16 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On 25/07/17 19:18, Cédric Le Goater wrote:
> On 07/25/2017 04:54 AM, Alexey Kardashevskiy wrote:
>> On 06/07/17 03:13, Cédric Le Goater wrote:
>>> It will be used when the guest chooses the XIVE exploitation mode in
>>> CAS.
>>>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>> ---
>>>  hw/intc/xive.c        | 11 +++++++++++
>>>  include/hw/ppc/xive.h |  2 ++
>>>  2 files changed, 13 insertions(+)
>>>
>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>>> index cda1fa18e44d..895dd2b2f61b 100644
>>> --- a/hw/intc/xive.c
>>> +++ b/hw/intc/xive.c
>>> @@ -915,3 +915,14 @@ bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t priority,
>>>  
>>>      return true;
>>>  }
>>> +
>>> +void xive_mmio_map(XIVE *x)
>>> +{
>>> +    /* ESBs */
>>> +    sysbus_mmio_map(SYS_BUS_DEVICE(x), 0, x->vc_base);
>>> +
>>> +    /* Thread Management Interrupt Areas */
>>> +    /* TODO: Only map the OS TIMA for the moment. Mapping the whole
>>> +     * region needs some rework in the handlers */
>>> +    sysbus_mmio_map(SYS_BUS_DEVICE(x), 1, x->tm_base + (1 << x->tm_shift));
>>> +}
>>
>>
>> imho it makes more sense to squash such small patches (this one and 20/26,
>> 21/26) into those which actually make use of the new helpers - easier to
>> review, better for bisectability.
> 
> ok. I am also realizing we should unmap.


I could not find a helper to do unmapping for sysbus regions so I suppose
it has to be memory_region_set_enabled() or (imho better)
memory_region_set_size() which would take zero size for the XICS emulation
case.


> 
> Thanks,
> 
> C.
>  
>>
>>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>>> index 288116aeb8f4..560f6ab66f73 100644
>>> --- a/include/hw/ppc/xive.h
>>> +++ b/include/hw/ppc/xive.h
>>> @@ -68,4 +68,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>>>  void xive_spapr_init(sPAPRMachineState *spapr);
>>>  void xive_spapr_populate(XIVE *x, void *fdt);
>>>  
>>> +void xive_mmio_map(XIVE *x);
>>> +
>>>  #endif /* PPC_XIVE_H */
>>>
>>
>>
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  2017-07-25 13:21           ` David Gibson
@ 2017-07-25 15:01             ` Cédric Le Goater
  2017-07-26  2:02               ` David Gibson
  0 siblings, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25 15:01 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/25/2017 03:21 PM, David Gibson wrote:
> On Tue, Jul 25, 2017 at 11:08:46AM +0200, Cédric Le Goater wrote:
>> On 07/25/2017 06:20 AM, David Gibson wrote:
>>> On Mon, Jul 24, 2017 at 04:44:00PM +0200, Cédric Le Goater wrote:
>>>> On 07/24/2017 08:35 AM, David Gibson wrote:
>>>>> On Wed, Jul 05, 2017 at 07:13:27PM +0200, Cédric Le Goater wrote:
>>>>>> The Thread Interrupt Management Area for the OS is mostly used to
>>>>>> acknowledge interrupts and set the CPPR of the CPU.
>>>>>>
>>>>>> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
>>>>>> used to retrieve the targeted interrupt presenter object.
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>>>
>>>>> Am I right in thinking that this shoehorns the XIVE TIMA state into
>>>>> the existing XICS ICP object.  That.. doesn't seem like a good idea.
>>>>
>>>> The TIMA memory region is under the XIVE object because it is 
>>>> unique for the system. The lookup of the ICP is simply done using 
>>>> 'current_cpu'. The TIMA state is under the ICPState, yes, but this 
>>>> model does not seem incorrect to me as this state contains the 
>>>> interrupt information presented to a CPU.
>>>
>>> Yeah, that's not the point I'm making.  My point is that the TIMA
>>> state isn't really the same as xics ICP state.  You're squeezing one
>>> into the other in a pretty ugly way.
>>
>> yes, well, we need to have compatible objects between the XICS and XIVE 
>> mode because of the CAS negotiation. for migration compatibility, it is 
>> much easier to extend existing objects. This approach I am taking today.
> 
> Yeah, I really don't think this approach is workable.
> 
> Roughly speaking, you're keeping the same structures between xics and
> xive, but with mostly different code.  I can't see any way that's not
> going to be horribly fragile, with any update to xics OR xive
> requiring enormous caution not to break the other.
> 
> I really think we have to go one of two ways:
> 
> 1) Abstract the notion of interrupt source and interrupt presenter, so
> we can use a truly common model between xics and xive.
> 
> Given the differences between the two, I don't know this is even
> possible.
> 
> 2) Separate the objects entirely.  ICPs are entirely separate from
> TIMAs, like wise ICSes and xive interrupt sources.
> 
> I think this is probably the way to go.  To make this work with CAS
> switching will require different methods, but I don't think it's
> impossible.
>
> For example, we could (on the new machine type) create both xics ICSes
> and ICPs and xive sources and TIMAs.  We'd have a (migrated) flag in
> the machine saying which is currently active.  All the objects would
> hang around, but only the active ones would do anything.

ok. There is still the question of sharing the IRQ numbers unless we 
find away to allocate them after the CAS negotiation. 

I will take a look at KVM support now before reworking the patchset.

Thanks,

C.

> Now obviously that means we'd be migrating a bunch of redundant state,
> but I still think it's preferable to a Frankenstinian fusion of
> xics-ish and xive-ish state.  I think there's a good chance we can
> improve on the basic idea to remove most or all of that redundant
> state.
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source
  2017-07-25 12:21       ` David Gibson
@ 2017-07-25 15:42         ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25 15:42 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/25/2017 02:21 PM, David Gibson wrote:
> On Mon, Jul 24, 2017 at 05:55:42PM +0200, Cédric Le Goater wrote:
>> On 07/24/2017 06:29 AM, David Gibson wrote:
>>> On Wed, Jul 05, 2017 at 07:13:20PM +0200, Cédric Le Goater wrote:
>>>> Each interrupt source is associated with a 2-bit state machine called
>>>> an Event State Buffer (ESB). It is controlled by MMIO to trigger
>>>> events.
> [snip]
>>>> +/* TODO: handle second page */
>>>
>>> Is this comment still relevent?
>>
>> Some HW have a second page to trigger the event. I am not sure we need 
>> to model it though. I will make some inquiries.
> 
> Ah, ok.  Maybe clarify the comment a bit.
> 
> [snip]
>>>> +static void xive_esb_write(void *opaque, hwaddr addr,
>>>> +                           uint64_t value, unsigned size)
>>>> +{
>>>> +    XiveICSState *xs = ICS_XIVE(opaque);
>>>> +    XIVE *x = xs->xive;
>>>> +    uint32_t offset = addr & 0xF00;
>>>> +    uint32_t srcno = addr >> xs->esb_shift;
>>>> +    uint32_t lisn = srcno + ICS_BASE(xs)->offset;
>>>> +    XiveIVE *ive;
>>>> +    bool notify = false;
>>>> +
>>>> +    ive = xive_get_ive(x, lisn);
>>>> +    if (!ive || !(ive->w & IVE_VALID))  {
>>>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
>>>> +        return;
>>>> +    }
>>>
>>> Having this code associated with the individual ICS look directly at
>>> the IVE table in the core xive object seems a bit dubious.
>>
>> The IVE table holds the validity and mask status of the interrupt 
>> entries, so we need that lookup. However, (continues below) ...
>>
>>> This also
>>> points out another mismatch between the re-used ICS code and the new
>>> XIVE code: ICS gathers all the per-source-irq flags/state into the
>>> irqstate structure, whereas xive has per-irq information in the
>>> centralized ecb and IVE tables.  There can certainly be good reasons
>>> for that, but using both at once is kind of clunky.
>>
>> I understand that you would rather put the esbs in the source they 
>> belong to. That is the case on real HW but it makes the modeling a 
>> bit more difficult. We would need to choose a MMIO address to give 
>> to the guest OS. I had some issues with the allocator (I need 
>> to look at this problem closer).
> 
> Uh.. what do MMIO addresses have to do with this?  I'm talking about
> the actual ESB state in the packed bit array.

To ease the modeling, the ESB states of all IRQs are under the same
bit array and they are exposed to the guest OS through a single memory 
region so that all accesses to the ESB states are centralized. On real 
HW, the ESB cache and the MMIO base depends on the source. 
 
Anyway, that might not be an issue. I will see what I can do to 
decorrelate the XIVE source from the main XIVE object. We can 
probably move the ESB state array below the XIVE source.

C.

>> It might also be an "issue" for KVM. Ben talked about maintaining 
>> all the esbs of a guest under a single memory region to be able to 
>> map the pages in the host.
>>
>> Any how, I agree this is another point to discuss in the sPAPR 
>> model.
>>
>> Thanks,
>>
>> C. 
>>
>>
>>>> +    if (srcno >= ICS_BASE(xs)->nr_irqs) {
>>>> +        qemu_log_mask(LOG_GUEST_ERROR,
>>>> +                      "XIVE: invalid IRQ number: %d/%d lisn: %d\n",
>>>> +                      srcno, ICS_BASE(xs)->nr_irqs, lisn);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    switch (offset) {
>>>> +    case 0:
>>>> +        /* TODO: should we trigger even if the IVE is masked ? */
>>>> +        notify = xive_pq_trigger(x, lisn);
>>>> +        break;
>>>> +    default:
>>>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
>>>> +                      offset);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (notify && !(ive->w & IVE_MASKED)) {
>>>> +        qemu_irq_pulse(ICS_BASE(xs)->qirqs[srcno]);
>>>> +    }
>>>> +}
>>>> +
>>>> +static const MemoryRegionOps xive_esb_ops = {
>>>> +    .read = xive_esb_read,
>>>> +    .write = xive_esb_write,
>>>> +    .endianness = DEVICE_BIG_ENDIAN,
>>>> +    .valid = {
>>>> +        .min_access_size = 8,
>>>> +        .max_access_size = 8,
>>>> +    },
>>>> +    .impl = {
>>>> +        .min_access_size = 8,
>>>> +        .max_access_size = 8,
>>>> +    },
>>>> +};
>>>> +
>>>> +/*
>>>>   * XIVE Interrupt Source
>>>>   */
>>>>  static void xive_ics_set_irq_msi(XiveICSState *xs, int srcno, int val)
>>>> @@ -106,15 +326,25 @@ static void xive_ics_realize(ICSState *ics, Error **errp)
>>>>          return;
>>>>      }
>>>>  
>>>> +    if (!xs->esb_shift) {
>>>> +        error_setg(errp, "ESB page size needs to be greater 0");
>>>> +        return;
>>>> +    }
>>>> +
>>>>      ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
>>>>      ics->qirqs = qemu_allocate_irqs(xive_ics_set_irq, xs, ics->nr_irqs);
>>>>  
>>>> +    memory_region_init_io(&xs->esb_iomem, OBJECT(xs), &xive_esb_ops, xs,
>>>> +                          "xive.esb",
>>>> +                          (1ull << xs->esb_shift) * ICS_BASE(xs)->nr_irqs);
>>>> +
>>>>      qemu_register_reset(xive_ics_reset, xs);
>>>>  }
>>>>  
>>>>  static Property xive_ics_properties[] = {
>>>>      DEFINE_PROP_UINT32("nr-irqs", ICSState, nr_irqs, 0),
>>>>      DEFINE_PROP_UINT32("irq-base", ICSState, offset, 0),
>>>> +    DEFINE_PROP_UINT32("shift", XiveICSState, esb_shift, 0),
>>>>      DEFINE_PROP_END_OF_LIST(),
>>>>  };
>>>>  
>>>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>>>> index 544cc6e0c796..5303d96f5f59 100644
>>>> --- a/include/hw/ppc/xive.h
>>>> +++ b/include/hw/ppc/xive.h
>>>> @@ -33,6 +33,9 @@ typedef struct XiveICSState XiveICSState;
>>>>  struct XiveICSState {
>>>>      ICSState parent_obj;
>>>>  
>>>> +    uint32_t     esb_shift;
>>>> +    MemoryRegion esb_iomem;
>>>> +
>>>>      XIVE         *xive;
>>>>  };
>>>>  
>>>
>>
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 05/26] ppc/xive: define XIVE internal tables
  2017-07-25  2:16       ` David Gibson
@ 2017-07-25 15:54         ` Cédric Le Goater
  0 siblings, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-07-25 15:54 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

On 07/25/2017 04:16 AM, David Gibson wrote:
> On Mon, Jul 24, 2017 at 02:52:29PM +0200, Cédric Le Goater wrote:
>> On 07/19/2017 05:24 AM, David Gibson wrote:
>>> On Wed, Jul 05, 2017 at 07:13:18PM +0200, Cédric Le Goater wrote:
>>>> The XIVE interrupt controller of the POWER9 uses a set of tables to
>>>> redirect exception from event sources to CPU threads. Among which we
>>>> choose to model :
>>>>
>>>>  - the State Bit Entries (SBE), also known as Event State Buffer
>>>>    (ESB). This is a two bit state machine for each event source which
>>>>    is used to trigger events. The bits are named "P" (pending) and "Q"
>>>>    (queued) and can be controlled by MMIO.
>>>>
>>>>  - the Interrupt Virtualization Entry (IVE) table, also known as Event
>>>>    Assignment Structure (EAS). This table is indexed by the IRQ number
>>>>    and is looked up to find the Event Queue associated with a
>>>>    triggered event.
>>>>
>>>>  - the Event Queue Descriptor (EQD) table, also known as Event
>>>>    Notification Descriptor (END). The EQD contains fields that specify
>>>>    the Event Queue on which event data is posted (and later pulled by
>>>>    the OS) and also a target (or VPD) to notify.
>>>>
>>>> An additional table was not modeled but we might need to to support
>>>> the H_INT_SET_OS_REPORTING_LINE hcall:
>>>>
>>>>  - the Virtual Processor Descriptor (VPD) table, also known as
>>>>    Notification Virtual Target (NVT).
>>>>
>>>> The XIVE object is expanded with the tables described above. The size
>>>> of each table depends on the number of provisioned IRQ and the maximum
>>>> number of CPUs in the system. The indexing is very basic and might
>>>> need to be improved for the EQs.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/intc/xive-internal.h | 95 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  hw/intc/xive.c          | 72 +++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 167 insertions(+)
>>>>
>>>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>>>> index 155c2dcd6066..8e755aa88a14 100644
>>>> --- a/hw/intc/xive-internal.h
>>>> +++ b/hw/intc/xive-internal.h
>>>> @@ -11,6 +11,89 @@
>>>>  
>>>>  #include <hw/sysbus.h>
>>>>  
>>>> +/* Utilities to manipulate these (originaly from OPAL) */
>>>> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
>>>> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
>>>> +#define SETFIELD(m, v, val)                             \
>>>> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
>>>> +
>>>> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
>>>> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
>>>> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
>>>> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>>>> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
>>>> +                                 PPC_BIT32(bs))
>>>> +
>>>> +/* IVE/EAS
>>>> + *
>>>> + * One per interrupt source. Targets that interrupt to a given EQ
>>>> + * and provides the corresponding logical interrupt number (EQ data)
>>>> + *
>>>> + * We also map this structure to the escalation descriptor inside
>>>> + * an EQ, though in that case the valid and masked bits are not used.
>>>> + */
>>>> +typedef struct XiveIVE {
>>>> +        /* Use a single 64-bit definition to make it easier to
>>>> +         * perform atomic updates
>>>> +         */
>>>> +        uint64_t        w;
>>>> +#define IVE_VALID       PPC_BIT(0)
>>>> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
>>>> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
>>>> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
>>>> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>>>> +} XiveIVE;
>>>> +
>>>> +/* EQ */
>>>> +typedef struct XiveEQ {
>>>> +        uint32_t        w0;
>>>> +#define EQ_W0_VALID             PPC_BIT32(0)
>>>> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
>>>> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
>>>> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
>>>> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
>>>> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
>>>> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
>>>> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
>>>> +#define EQ_W0_SW0               PPC_BIT32(16)
>>>> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
>>>> +#define EQ_QSIZE_4K             0
>>>> +#define EQ_QSIZE_64K            4
>>>> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
>>>> +        uint32_t        w1;
>>>> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
>>>> +#define EQ_W1_ESn_P             PPC_BIT32(0)
>>>> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
>>>> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
>>>> +#define EQ_W1_ESe_P             PPC_BIT32(2)
>>>> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
>>>> +#define EQ_W1_GENERATION        PPC_BIT32(9)
>>>> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
>>>> +        uint32_t        w2;
>>>> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
>>>> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
>>>> +        uint32_t        w3;
>>>> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
>>>> +        uint32_t        w4;
>>>> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
>>>> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
>>>> +        uint32_t        w5;
>>>> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
>>>> +        uint32_t        w6;
>>>> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
>>>> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
>>>> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
>>>> +        uint32_t        w7;
>>>> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
>>>> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
>>>> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
>>>> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
>>>> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
>>>> +} XiveEQ;
>>>> +
>>>> +#define XIVE_EQ_PRIORITY_COUNT 8
>>>> +#define XIVE_PRIORITY_MAX  (XIVE_EQ_PRIORITY_COUNT - 1)
>>>> +
>>>>  struct XIVE {
>>>>      SysBusDevice parent;
>>>>  
>>>> @@ -23,6 +106,18 @@ struct XIVE {
>>>>      uint32_t     int_max;       /* Max index */
>>>>      uint32_t     int_hw_bot;    /* Bottom index of HW IRQ allocator */
>>>>      uint32_t     int_ipi_top;   /* Highest IPI index handed out so far + 1 */
>>>> +
>>>> +    /* XIVE internal tables */
>>>> +    void         *sbe;
>>>> +    XiveIVE      *ivt;
>>>> +    XiveEQ       *eqdt;
>>>>  };
>>>>  
>>>> +void xive_reset(void *dev);
>>>> +XiveIVE *xive_get_ive(XIVE *x, uint32_t isn);
>>>> +XiveEQ *xive_get_eq(XIVE *x, uint32_t idx);
>>>> +
>>>> +bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t prio,
>>>> +                        uint32_t *out_eq_idx);
>>>> +
>>>>  #endif /* _INTC_XIVE_INTERNAL_H */
>>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>>>> index 5b4ea915d87c..5b14d8155317 100644
>>>> --- a/hw/intc/xive.c
>>>> +++ b/hw/intc/xive.c
>>>> @@ -35,6 +35,27 @@
>>>>   */
>>>>  #define MAX_HW_IRQS_ENTRIES (8 * 1024)
>>>>  
>>>> +
>>>> +void xive_reset(void *dev)
>>>> +{
>>>> +    XIVE *x = XIVE(dev);
>>>> +    int i;
>>>> +
>>>> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
>>>> +    memset(x->sbe, 0x55, x->int_count / 4);
>>>
>>> I think strictly this should be a DIV_ROUND_UP to handle the case of
>>> int_count not a multiple of 4.
>>
>> ok. 
>>  
>>>> +
>>>> +    /* Clear and mask all valid IVEs */
>>>> +    for (i = x->int_base; i < x->int_max; i++) {
>>>> +        XiveIVE *ive = &x->ivt[i];
>>>> +        if (ive->w & IVE_VALID) {
>>>> +            ive->w = IVE_VALID | IVE_MASKED;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /* clear all EQs */
>>>> +    memset(x->eqdt, 0, x->nr_targets * XIVE_EQ_PRIORITY_COUNT * sizeof(XiveEQ));
>>>> +}
>>>> +
>>>>  static void xive_init(Object *obj)
>>>>  {
>>>>      ;
>>>> @@ -62,6 +83,19 @@ static void xive_realize(DeviceState *dev, Error **errp)
>>>>      if (x->int_ipi_top < 0x10) {
>>>>          x->int_ipi_top = 0x10;
>>>>      }
>>>> +
>>>> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>>>> +    x->sbe = g_malloc0(x->int_count / 4);
>>>
>>> And here as well.
>>
>> yes.
>>
>>>> +
>>>> +    /* Allocate the IVT (Interrupt Virtualization Table) */
>>>> +    x->ivt = g_malloc0(x->int_count * sizeof(XiveIVE));
>>>> +
>>>> +    /* Allocate the EQDT (Event Queue Descriptor Table), 8 priorities
>>>> +     * for each thread in the system */
>>>> +    x->eqdt = g_malloc0(x->nr_targets * XIVE_EQ_PRIORITY_COUNT *
>>>> +                        sizeof(XiveEQ));
>>>> +
>>>> +    qemu_register_reset(xive_reset, dev);
>>>>  }
>>>>  
>>>>  static Property xive_properties[] = {
>>>> @@ -92,3 +126,41 @@ static void xive_register_types(void)
>>>>  }
>>>>  
>>>>  type_init(xive_register_types)
>>>> +
>>>> +XiveIVE *xive_get_ive(XIVE *x, uint32_t lisn)
>>>> +{
>>>> +    uint32_t idx = lisn;
>>>> +
>>>> +    if (idx < x->int_base || idx >= x->int_max) {
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    return &x->ivt[idx];
>>>
>>> Should be idx - int_base, no?
>>
>> no, not in the allocator model I have chosen. The IRQ numbers 
>> are exposed to the guest with their offset. But this is another 
>> discussion which I would rather continue in another thread. 
> 
> Uh.. but you're using idx to index IVT directly, after verifying that
> it lies between int_base and int_max.  AFAICT IVT is only allocated
> with int_max - int_base entries, so without an offset here you'll
> overrun it, won't you?

ah yes, you are right. I got confused because the idx used to be
calculated in a different way. Luckily, 'int_base' is zero for the 
moment. Anyway I need to rework the allocator and the indexing 
of these tables, it's too complex for sPAPR. 

Thanks,

C. 

>>>> +}
>>>> +
>>>> +XiveEQ *xive_get_eq(XIVE *x, uint32_t idx)
>>>> +{
>>>> +    if (idx >= x->nr_targets * XIVE_EQ_PRIORITY_COUNT) {
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    return &x->eqdt[idx];
>>>> +}
>>>> +
>>>> +/* TODO: improve EQ indexing. This is very simple and relies on the
>>>> + * fact that target (CPU) numbers start at 0 and are contiguous. It
>>>> + * should be OK for sPAPR.
>>>> + */
>>>> +bool xive_eq_for_target(XIVE *x, uint32_t target, uint8_t priority,
>>>> +                        uint32_t *out_eq_idx)
>>>> +{
>>>> +    if (priority > XIVE_PRIORITY_MAX || target >= x->nr_targets) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    if (out_eq_idx) {
>>>> +        *out_eq_idx = target + priority;
>>>> +    }
>>>> +
>>>> +    return true;
>>>
>>> Seems a clunky interface.  Why not return a XiveEQ *, NULL if the
>>> inputs aren't valud.
>>
>> Yes. This interface is inherited from OPAL and it's not consistent 
>> with the other xive_get_*() routines. But we are missing a XIVE 
>> internal table for VPs which explains the difference. I need to look 
>> at the support of the OS_REPORTING_LINE hcalls before simplifying.
>>
>> Thanks,
>>
>> C. 
>>
>>
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  2017-07-25 15:01             ` Cédric Le Goater
@ 2017-07-26  2:02               ` David Gibson
  0 siblings, 0 replies; 122+ messages in thread
From: David Gibson @ 2017-07-26  2:02 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Benjamin Herrenschmidt, Alexander Graf, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4339 bytes --]

On Tue, Jul 25, 2017 at 05:01:48PM +0200, Cédric Le Goater wrote:
> On 07/25/2017 03:21 PM, David Gibson wrote:
> > On Tue, Jul 25, 2017 at 11:08:46AM +0200, Cédric Le Goater wrote:
> >> On 07/25/2017 06:20 AM, David Gibson wrote:
> >>> On Mon, Jul 24, 2017 at 04:44:00PM +0200, Cédric Le Goater wrote:
> >>>> On 07/24/2017 08:35 AM, David Gibson wrote:
> >>>>> On Wed, Jul 05, 2017 at 07:13:27PM +0200, Cédric Le Goater wrote:
> >>>>>> The Thread Interrupt Management Area for the OS is mostly used to
> >>>>>> acknowledge interrupts and set the CPPR of the CPU.
> >>>>>>
> >>>>>> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
> >>>>>> used to retrieve the targeted interrupt presenter object.
> >>>>>>
> >>>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>>>
> >>>>> Am I right in thinking that this shoehorns the XIVE TIMA state into
> >>>>> the existing XICS ICP object.  That.. doesn't seem like a good idea.
> >>>>
> >>>> The TIMA memory region is under the XIVE object because it is 
> >>>> unique for the system. The lookup of the ICP is simply done using 
> >>>> 'current_cpu'. The TIMA state is under the ICPState, yes, but this 
> >>>> model does not seem incorrect to me as this state contains the 
> >>>> interrupt information presented to a CPU.
> >>>
> >>> Yeah, that's not the point I'm making.  My point is that the TIMA
> >>> state isn't really the same as xics ICP state.  You're squeezing one
> >>> into the other in a pretty ugly way.
> >>
> >> yes, well, we need to have compatible objects between the XICS and XIVE 
> >> mode because of the CAS negotiation. for migration compatibility, it is 
> >> much easier to extend existing objects. This approach I am taking today.
> > 
> > Yeah, I really don't think this approach is workable.
> > 
> > Roughly speaking, you're keeping the same structures between xics and
> > xive, but with mostly different code.  I can't see any way that's not
> > going to be horribly fragile, with any update to xics OR xive
> > requiring enormous caution not to break the other.
> > 
> > I really think we have to go one of two ways:
> > 
> > 1) Abstract the notion of interrupt source and interrupt presenter, so
> > we can use a truly common model between xics and xive.
> > 
> > Given the differences between the two, I don't know this is even
> > possible.
> > 
> > 2) Separate the objects entirely.  ICPs are entirely separate from
> > TIMAs, like wise ICSes and xive interrupt sources.
> > 
> > I think this is probably the way to go.  To make this work with CAS
> > switching will require different methods, but I don't think it's
> > impossible.
> >
> > For example, we could (on the new machine type) create both xics ICSes
> > and ICPs and xive sources and TIMAs.  We'd have a (migrated) flag in
> > the machine saying which is currently active.  All the objects would
> > hang around, but only the active ones would do anything.
> 
> ok. There is still the question of sharing the IRQ numbers unless we 
> find away to allocate them after the CAS negotiation. 

Right.  As I said, I think allocating after CAS is probably possible,
but it may not be necessary.

Merely lining up the irq numbers should be much simpler than actually
fusing the source and presenter structures.  Especially since both
xics and xive are flexible enough that we can map the irqs almost
however we want.

This definitely does mean the irq mapping / allocation should sit
inside the machine type, rather than the xics *or* the xive code, and
preferably be entirely common.  Then it's just a matter of setting up
both the xics and xive components so they can back that irq allocation.

> I will take a look at KVM support now before reworking the patchset.
> 
> Thanks,
> 
> C.
> 
> > Now obviously that means we'd be migrating a bunch of redundant state,
> > but I still think it's preferable to a Frankenstinian fusion of
> > xics-ish and xive-ish state.  I think there's a good chance we can
> > improve on the basic idea to remove most or all of that redundant
> > state.
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 16/26] ppc/xive: notify CPU when interrupt priority is more privileged
  2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 16/26] ppc/xive: notify CPU when interrupt priority is more privileged Cédric Le Goater
@ 2017-09-09  7:39   ` Benjamin Herrenschmidt
  2017-09-09  8:08     ` Cédric Le Goater
  2017-09-09  8:24     ` Cédric Le Goater
  0 siblings, 2 replies; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-09  7:39 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On Wed, 2017-07-05 at 19:13 +0200, Cédric Le Goater wrote:
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xive.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index c3c1e9c9db2d..cda1fa18e44d 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -53,6 +53,21 @@ static uint64_t xive_icp_accept(XiveICPState *xicp)
>      return (nsr << 8) | xicp->tima_os[TM_CPPR];
>  }
>  
> +static uint8_t ipb_to_pipr(uint8_t ibp)
> +{
> +    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
> +}

The PIPR needs to be updated also on accept etc... anything that change
IPBs or CPPR really.

Also I just learned something from the designers: The IPIR is clamped
to CPPR.

So basically the value in the PIPR is:

	v = leftmost_bit_of(ipb) (or 0xff);
	pipr = v < cppr ? v : cppr;

which means it's never actually 0xff ... surprise !

That also means I need to fix my implementation of H_IPOLL in KVM.

> +static void xive_icp_notify(XiveICPState *xicp)
> +{
> +    xicp->tima_os[TM_PIPR] = ipb_to_pipr(xicp->tima_os[TM_IPB]);
> +
> +    if (xicp->tima_os[TM_PIPR] < xicp->tima_os[TM_CPPR]) {
> +        xicp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
> +        qemu_irq_raise(ICP(xicp)->output);
> +    }
> +}
> +
>  static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
>  {
>      if (cppr > XIVE_PRIORITY_MAX) {
> @@ -60,6 +75,10 @@ static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
>      }
>  
>      xicp->tima_os[TM_CPPR] = cppr;
> +
> +    /* CPPR has changed, inform the ICP which might raise an
> +     * exception */
> +    xive_icp_notify(xicp);
>  }
>  
>  /*
> @@ -339,6 +358,8 @@ static void xive_icp_irq(XiveICSState *xs, int lisn)
>      } else {
>          qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
>      }
> +
> +    xive_icp_notify(xicp);
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 16/26] ppc/xive: notify CPU when interrupt priority is more privileged
  2017-09-09  7:39   ` Benjamin Herrenschmidt
@ 2017-09-09  8:08     ` Cédric Le Goater
  2017-09-09  8:40       ` Benjamin Herrenschmidt
  2017-09-09  8:24     ` Cédric Le Goater
  1 sibling, 1 reply; 122+ messages in thread
From: Cédric Le Goater @ 2017-09-09  8:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: qemu-ppc, qemu-devel

On 09/09/2017 09:39 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2017-07-05 at 19:13 +0200, Cédric Le Goater wrote:
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xive.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index c3c1e9c9db2d..cda1fa18e44d 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -53,6 +53,21 @@ static uint64_t xive_icp_accept(XiveICPState *xicp)
>>      return (nsr << 8) | xicp->tima_os[TM_CPPR];
>>  }
>>  
>> +static uint8_t ipb_to_pipr(uint8_t ibp)
>> +{
>> +    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
>> +}
> 
> The PIPR needs to be updated also on accept etc... anything that change
> IPBs or CPPR really.

I did forget the update in accept ... thanks. 

> Also I just learned something from the designers: The IPIR is clamped
> to CPPR.
> 
> So basically the value in the PIPR is:
> 
> 	v = leftmost_bit_of(ipb) (or 0xff);
> 	pipr = v < cppr ? v : cppr;
> 
> which means it's never actually 0xff ... surprise !

ah. that is not what the specs say but it shouldn't be a problem.
I will change the code to reflect that.

C.

> That also means I need to fix my implementation of H_IPOLL in KVM.
> 
>> +static void xive_icp_notify(XiveICPState *xicp)
>> +{
>> +    xicp->tima_os[TM_PIPR] = ipb_to_pipr(xicp->tima_os[TM_IPB]);
>> +
>> +    if (xicp->tima_os[TM_PIPR] < xicp->tima_os[TM_CPPR]) {
>> +        xicp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
>> +        qemu_irq_raise(ICP(xicp)->output);
>> +    }
>> +}
>> +
>>  static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
>>  {
>>      if (cppr > XIVE_PRIORITY_MAX) {
>> @@ -60,6 +75,10 @@ static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
>>      }
>>  
>>      xicp->tima_os[TM_CPPR] = cppr;
>> +
>> +    /* CPPR has changed, inform the ICP which might raise an
>> +     * exception */
>> +    xive_icp_notify(xicp);
>>  }
>>  
>>  /*
>> @@ -339,6 +358,8 @@ static void xive_icp_irq(XiveICSState *xs, int lisn)
>>      } else {
>>          qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
>>      }
>> +
>> +    xive_icp_notify(xicp);
>>  }
>>  
>>  /*

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 16/26] ppc/xive: notify CPU when interrupt priority is more privileged
  2017-09-09  7:39   ` Benjamin Herrenschmidt
  2017-09-09  8:08     ` Cédric Le Goater
@ 2017-09-09  8:24     ` Cédric Le Goater
  1 sibling, 0 replies; 122+ messages in thread
From: Cédric Le Goater @ 2017-09-09  8:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: qemu-ppc, qemu-devel

On 09/09/2017 09:39 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2017-07-05 at 19:13 +0200, Cédric Le Goater wrote:
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xive.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index c3c1e9c9db2d..cda1fa18e44d 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -53,6 +53,21 @@ static uint64_t xive_icp_accept(XiveICPState *xicp)
>>      return (nsr << 8) | xicp->tima_os[TM_CPPR];
>>  }
>>  
>> +static uint8_t ipb_to_pipr(uint8_t ibp)
>> +{
>> +    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
>> +}
> 
> The PIPR needs to be updated also on accept etc... anything that change
> IPBs or CPPR really.

but not for the SET_OS_PENDING special write I suppose.

C.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 16/26] ppc/xive: notify CPU when interrupt priority is more privileged
  2017-09-09  8:08     ` Cédric Le Goater
@ 2017-09-09  8:40       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 122+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-09  8:40 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On Sat, 2017-09-09 at 10:08 +0200, Cédric Le Goater wrote:
> On 09/09/2017 09:39 AM, Benjamin Herrenschmidt wrote:
> > On Wed, 2017-07-05 at 19:13 +0200, Cédric Le Goater wrote:
> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > > ---
> > >  hw/intc/xive.c | 21 +++++++++++++++++++++
> > >  1 file changed, 21 insertions(+)
> > > 
> > > diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> > > index c3c1e9c9db2d..cda1fa18e44d 100644
> > > --- a/hw/intc/xive.c
> > > +++ b/hw/intc/xive.c
> > > @@ -53,6 +53,21 @@ static uint64_t xive_icp_accept(XiveICPState *xicp)
> > >      return (nsr << 8) | xicp->tima_os[TM_CPPR];
> > >  }
> > >  
> > > +static uint8_t ipb_to_pipr(uint8_t ibp)
> > > +{
> > > +    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
> > > +}
> > 
> > The PIPR needs to be updated also on accept etc... anything that change
> > IPBs or CPPR really.
> 
> I did forget the update in accept ... thanks. 
> 
> > Also I just learned something from the designers: The IPIR is clamped
> > to CPPR.
> > 
> > So basically the value in the PIPR is:
> > 
> > 	v = leftmost_bit_of(ipb) (or 0xff);
> > 	pipr = v < cppr ? v : cppr;
> > 
> > which means it's never actually 0xff ... surprise !
> 
> ah. that is not what the specs say but it shouldn't be a problem.
> I will change the code to reflect that.

Right, I've asked the HW guys to update the spec.

Cheers,
Ben.

> C.
> 
> > That also means I need to fix my implementation of H_IPOLL in KVM.
> > 
> > > +static void xive_icp_notify(XiveICPState *xicp)
> > > +{
> > > +    xicp->tima_os[TM_PIPR] = ipb_to_pipr(xicp->tima_os[TM_IPB]);
> > > +
> > > +    if (xicp->tima_os[TM_PIPR] < xicp->tima_os[TM_CPPR]) {
> > > +        xicp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
> > > +        qemu_irq_raise(ICP(xicp)->output);
> > > +    }
> > > +}
> > > +
> > >  static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
> > >  {
> > >      if (cppr > XIVE_PRIORITY_MAX) {
> > > @@ -60,6 +75,10 @@ static void xive_icp_set_cppr(XiveICPState *xicp, uint8_t cppr)
> > >      }
> > >  
> > >      xicp->tima_os[TM_CPPR] = cppr;
> > > +
> > > +    /* CPPR has changed, inform the ICP which might raise an
> > > +     * exception */
> > > +    xive_icp_notify(xicp);
> > >  }
> > >  
> > >  /*
> > > @@ -339,6 +358,8 @@ static void xive_icp_irq(XiveICSState *xs, int lisn)
> > >      } else {
> > >          qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
> > >      }
> > > +
> > > +    xive_icp_notify(xicp);
> > >  }
> > >  
> > >  /*

^ permalink raw reply	[flat|nested] 122+ messages in thread

end of thread, other threads:[~2017-09-09  8:40 UTC | newest]

Thread overview: 122+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-05 17:13 [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 01/26] spapr: introduce the XIVE_EXPLOIT option in CAS Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 02/26] spapr: populate device tree depending on XIVE_EXPLOIT option Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 03/26] target/ppc/POWER9: add POWERPC_EXCP_POWER9 Cédric Le Goater
2017-07-10 10:26   ` David Gibson
2017-07-10 12:49     ` Cédric Le Goater
2017-07-10 21:00       ` Benjamin Herrenschmidt
2017-07-11  9:01         ` Cédric Le Goater
2017-07-11 13:27           ` David Gibson
2017-07-11 13:52             ` Cédric Le Goater
2017-07-11 21:20               ` Benjamin Herrenschmidt
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 04/26] ppc/xive: introduce a skeleton for the XIVE interrupt controller model Cédric Le Goater
2017-07-19  3:08   ` David Gibson
2017-07-19  3:23     ` David Gibson
2017-07-19  3:56     ` Benjamin Herrenschmidt
2017-07-19  4:01       ` David Gibson
2017-07-19  4:18         ` Benjamin Herrenschmidt
2017-07-19  4:25           ` David Gibson
2017-07-19  4:02     ` Benjamin Herrenschmidt
2017-07-21  7:50       ` David Gibson
2017-07-21  8:21         ` Benjamin Herrenschmidt
2017-07-24  3:28           ` David Gibson
2017-07-24  3:53             ` Alexey Kardashevskiy
2017-07-24  5:04             ` Benjamin Herrenschmidt
2017-07-24  5:38               ` David Gibson
2017-07-24  7:20                 ` Benjamin Herrenschmidt
2017-07-24 10:03                   ` David Gibson
2017-07-25  8:52                     ` Cédric Le Goater
2017-07-25 12:39                       ` David Gibson
2017-07-25 13:48                         ` Cédric Le Goater
2017-07-24 13:00     ` Cédric Le Goater
2017-07-25  1:26       ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
2017-07-25  2:17         ` David Gibson
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 05/26] ppc/xive: define XIVE internal tables Cédric Le Goater
2017-07-19  3:24   ` David Gibson
2017-07-24 12:52     ` Cédric Le Goater
2017-07-25  2:16       ` David Gibson
2017-07-25 15:54         ` Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 06/26] ppc/xive: introduce a XIVE interrupt source model Cédric Le Goater
2017-07-24  4:02   ` David Gibson
2017-07-24  6:00     ` Alexey Kardashevskiy
2017-07-24 15:20       ` Cédric Le Goater
2017-07-25  3:06         ` Alexey Kardashevskiy
2017-07-24 15:13     ` Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 07/26] ppc/xive: add MMIO handlers to the XIVE interrupt source Cédric Le Goater
2017-07-24  4:29   ` David Gibson
2017-07-24  8:56     ` Benjamin Herrenschmidt
2017-07-24 15:55     ` Cédric Le Goater
2017-07-25 12:21       ` David Gibson
2017-07-25 15:42         ` Cédric Le Goater
2017-07-24  6:50   ` Alexey Kardashevskiy
2017-07-24 15:39     ` Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 08/26] ppc/xive: add flags " Cédric Le Goater
2017-07-24  4:36   ` David Gibson
2017-07-24  7:00     ` Benjamin Herrenschmidt
2017-07-24  9:50       ` David Gibson
2017-07-24 11:07         ` Benjamin Herrenschmidt
2017-07-24 11:47           ` Cédric Le Goater
2017-07-25  4:19             ` David Gibson
2017-07-25  5:49               ` Benjamin Herrenschmidt
2017-07-25  4:18           ` David Gibson
2017-07-25  5:47             ` Benjamin Herrenschmidt
2017-07-25  8:28               ` Cédric Le Goater
2017-07-25 12:24               ` David Gibson
2017-07-25  8:17         ` Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 09/26] ppc/xive: add an overall memory region for the ESBs Cédric Le Goater
2017-07-24  4:49   ` David Gibson
2017-07-24  6:09     ` Benjamin Herrenschmidt
2017-07-24  6:39       ` David Gibson
2017-07-24 13:27         ` Cédric Le Goater
2017-07-25  2:19           ` David Gibson
2017-07-24 13:25       ` Cédric Le Goater
2017-07-25  2:19         ` David Gibson
2017-07-25  9:50           ` Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 10/26] ppc/xive: record interrupt source MMIO address for hcalls Cédric Le Goater
2017-07-24  5:11   ` David Gibson
2017-07-24 13:45     ` Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 11/26] ppc/xics: introduce a print_info() handler to the ICS and ICP objects Cédric Le Goater
2017-07-24  5:13   ` David Gibson
2017-07-24 13:58     ` Cédric Le Goater
2017-07-25 13:26       ` David Gibson
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 12/26] ppc/xive: add a print_info() handler for the interrupt source Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 13/26] ppc/xive: introduce a XIVE interrupt presenter model Cédric Le Goater
2017-07-24  6:05   ` David Gibson
2017-07-24 14:02     ` Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 14/26] ppc/xive: add MMIO handlers to the " Cédric Le Goater
2017-07-24  6:35   ` David Gibson
2017-07-24 14:44     ` Cédric Le Goater
2017-07-25  4:20       ` David Gibson
2017-07-25  9:08         ` Cédric Le Goater
2017-07-25 13:21           ` David Gibson
2017-07-25 15:01             ` Cédric Le Goater
2017-07-26  2:02               ` David Gibson
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 15/26] ppc/xive: push EQ data in OS event queues Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 16/26] ppc/xive: notify CPU when interrupt priority is more privileged Cédric Le Goater
2017-09-09  7:39   ` Benjamin Herrenschmidt
2017-09-09  8:08     ` Cédric Le Goater
2017-09-09  8:40       ` Benjamin Herrenschmidt
2017-09-09  8:24     ` Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 17/26] ppc/xive: add hcalls support Cédric Le Goater
2017-07-24  9:39   ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
2017-07-24 14:55     ` Cédric Le Goater
2017-07-25  2:09       ` Alexey Kardashevskiy
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 18/26] ppc/xive: add device tree support Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 19/26] ppc/xive: introduce a helper to map the XIVE memory regions Cédric Le Goater
2017-07-25  2:54   ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
2017-07-25  9:18     ` Cédric Le Goater
2017-07-25 14:16       ` Alexey Kardashevskiy
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 20/26] ppc/xive: introduce a helper to create XIVE interrupt source objects Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 21/26] ppc/xive: introduce routines to allocate IRQ numbers Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 22/26] ppc/xive: create an XIVE interrupt source to handle IPIs Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 23/26] spapr: add a XIVE object to the sPAPR machine Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 24/26] spapr: include the XIVE interrupt source for IPIs Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 25/26] spapr: print the XIVE interrupt source for IPIs in the monitor Cédric Le Goater
2017-07-05 17:13 ` [Qemu-devel] [RFC PATCH 26/26] spapr: force XIVE exploitation mode for POWER9 (HACK) Cédric Le Goater
2017-07-25  2:43   ` [Qemu-devel] [Qemu-ppc] " Alexey Kardashevskiy
2017-07-25  9:20     ` Cédric Le Goater
2017-07-10 10:24 ` [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9) David Gibson
2017-07-10 12:36   ` Cédric Le Goater
2017-07-19  3:00 ` David Gibson
2017-07-19  3:55   ` Benjamin Herrenschmidt
2017-07-24  7:28     ` Cédric Le Goater

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.