* [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9)
@ 2017-11-23 13:29 Cédric Le Goater
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 01/25] ppc/xics: introduce an icp_create() helper Cédric Le Goater
                   ` (24 more replies)
  0 siblings, 25 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

Hello,

On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with an
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode, the newer POWER9 interrupt model. XIVE
is a complex interrupt controller introducing a large number of new
features, for virtualization in particular. Here is a brief overview
for the sPAPR platform which only requires a small subset of these
functions.

XIVE uses a set of internal tables to redirect exceptions from
interrupt sources to the CPU. Each interrupt source has a 2-bit state
machine, the Event State Buffer (ESB), that allows events to be
triggered. If the event is let through, XIVE looks up the Interrupt
Virtualization Entry (IVE) table to find the Event Queue Descriptor
defined for the source. Each Event Queue Descriptor defines a
notification path to a CPU and an in-memory queue in which an event
identifier is recorded for the OS to pull.
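
To illustrate the ESB gating described above, here is a minimal C
sketch of a 2-bit state machine. The state names, their encodings and
the esb_trigger()/esb_eoi() helpers are assumptions made for this
example, not code from the patchset; the real model may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2-bit ESB states. Names and encodings are assumptions of this sketch. */
enum esb_state {
    ESB_RESET   = 0x0,  /* idle: the next trigger is let through       */
    ESB_OFF     = 0x1,  /* source masked: triggers are dropped         */
    ESB_PENDING = 0x2,  /* one event let through, not yet acknowledged */
    ESB_QUEUED  = 0x3,  /* a further event arrived while pending       */
};

/*
 * Trigger an event on a source. Returns true when the event is let
 * through, i.e. when the IVE table lookup should take place.
 */
static bool esb_trigger(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        /* Coalesce: remember that at least one more event occurred */
        *pq = ESB_QUEUED;
        return false;
    default: /* ESB_OFF */
        return false;
    }
}

/*
 * End of interrupt on a source. Returns true when a queued event must
 * now be forwarded.
 */
static bool esb_eoi(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_RESET:
    case ESB_PENDING:
        *pq = ESB_RESET;
        return false;
    case ESB_QUEUED:
        *pq = ESB_PENDING;
        return true;
    default: /* ESB_OFF */
        return false;
    }
}
```

In this sketch, a source in the reset state forwards its first trigger
and coalesces any further ones until the OS performs an EOI.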


This patchset is the first non-RFC proposal to add XIVE support in a
POWER9 sPAPR machine. It should address the comments made on the
previous RFCs, the most important points being:

 - use a single IRQ allocator for the machine,
 - remove any relation with the XICS model,
 - get rid of 'nr_servers' under the XIVE object. 

The high level ideas of the current design are :

 - move all IRQ allocation code under the machine to make sure that
   the allocated IRQ numbers are kept in sync between XICS and
   XIVE. This is necessary for the devices which allocate IRQs and
   populate the device tree before the machine is even started.

 - introduce a persistent XIVE object under the sPAPR machine and let
   the CAS negotiation process decide whether it should be used or
   not. Use the 'ov5_cas' attribute for this purpose.

 - introduce a persistent XIVE interrupt presenter under the sPAPR
   core and switch ICP after CAS. Each core now has two ICPs, one
   active through the 'intc' pointer and another one among its
   children, ready to be used if the guest requires it.

 - move the XIVE EQs under the cores to simplify the XIVE model.

 - allocate the CPU IPIs at the beginning of the IRQ number space to
   be compatible with XICS (which starts at 4096) and also to simplify
   the model. This means that the XIVE model covers the whole IRQ
   number space. Unlike in XICS, there is no offset splitting the IRQ
   number space.
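
The single allocator idea can be sketched on its own. The following is
a simplified, standalone variant of the free-block search that patch 04
moves under the machine (ics_find_free_block()). The bool array standing
in for the ICS state and the find_free_block() name are assumptions made
for illustration only.

```c
#include <stdbool.h>

/*
 * Find 'num' consecutive free entries in 'in_use', trying only start
 * positions that are multiples of 'alignnum'. Returns the index of the
 * first entry of the block, or -1 if no such block exists.
 *
 * 'in_use' is a hypothetical stand-in for the per-IRQ flags kept in
 * the ICS state.
 */
static int find_free_block(const bool *in_use, int nr_irqs,
                           int num, int alignnum)
{
    int first, i;

    if (num <= 0 || alignnum <= 0) {
        return -1;
    }

    for (first = 0; first < nr_irqs; first += alignnum) {
        /* Not enough room left for a block of 'num' entries */
        if (num > nr_irqs - first) {
            return -1;
        }
        for (i = first; i < first + num; i++) {
            if (in_use[i]) {
                break;
            }
        }
        if (i == first + num) {
            return first;
        }
    }

    return -1;
}
```

The returned value is an index into the source array. A caller would
still add the base of the IRQ number space (ics->offset in the XICS
code) to obtain the global IRQ number.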


The patchset begins with cleanups and code movements in the XICS
model and in the sPAPR machine to prepare the ground for the
integration of the XIVE model. Some could be merged directly.

It continues with patches introducing new models for XIVE:

 - sPAPRXive holding the internal tables and the MMIO regions used by
   the XIVE controller.
 - sPAPRXiveICP object acting as the XIVE interrupt presenter

and describing the notification process and the interrupt delivery to
the CPU.

It finishes with the integration of the sPAPRXive object under the
sPAPR machine, the introduction of the new XIVE hcalls, the new device
tree layout, and the necessary adjustments to support the CAS
negotiation.

Migration, CPU hotplug, and support for older machines and QEMU
versions are also addressed.


Code is here:

  https://github.com/legoater/qemu/commits/xive

Caveats :

 - KVM support : not addressed yet
   The guest needs to be run with kernel_irqchip=off on a POWER9 system.
 - LSI : hardly tested.
   
Thanks,

C.

Tests :

 - make check on each patch
 - migration :
     qemu-2.12 (pseries-2.12) <->  qemu-2.12 (pseries-2.12)
     qemu-2.12 (pseries-2.10) <->  qemu-2.12 (pseries-2.10)
     qemu-2.10 (pseries-2.10) <->  qemu-2.12 (pseries-2.10)

Cédric Le Goater (25):
  ppc/xics: introduce an icp_create() helper
  ppc/xics: assign of the CPU 'intc' pointer under the core
  spapr: introduce a spapr_icp_create() helper
  spapr: move the IRQ allocation routines under the machine
  spapr: introduce a spapr_irq_set() helper
  spapr: introduce a spapr_irq_get_qirq() helper
  migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC
  spapr: introduce a skeleton for the XIVE interrupt controller
  spapr: introduce handlers for XIVE interrupt sources
  spapr: add MMIO handlers for the XIVE interrupt sources
  spapr: describe the XIVE interrupt source flags
  spapr: introduce a XIVE interrupt presenter model
  spapr: introduce the XIVE Event Queues
  spapr: push the XIVE EQ data in OS event queue
  spapr: notify the CPU when the XIVE interrupt priority is more
    privileged
  spapr: add support for the SET_OS_PENDING command (XIVE)
  spapr: add a sPAPRXive object to the machine
  spapr: allocate IRQ numbers for the XIVE interrupt mode
  spapr: add hcalls support for the XIVE interrupt mode
  spapr: add device tree support for the XIVE interrupt mode
  spapr: introduce a helper to map the XIVE memory regions
  spapr: add XIVE support to spapr_irq_get_qirq()
  spapr: toggle the ICP depending on the selected interrupt mode
  spapr: add support to dump XIVE information
  spapr: advertise XIVE exploitation mode in CAS

 default-configs/ppc64-softmmu.mak |   1 +
 hw/intc/Makefile.objs             |   1 +
 hw/intc/spapr_xive.c              | 964 ++++++++++++++++++++++++++++++++++++++
 hw/intc/spapr_xive_hcall.c        | 949 +++++++++++++++++++++++++++++++++++++
 hw/intc/trace-events              |   4 -
 hw/intc/xics.c                    |  35 +-
 hw/intc/xics_spapr.c              | 114 -----
 hw/intc/xive-internal.h           | 189 ++++++++
 hw/ppc/pnv_core.c                 |  10 +-
 hw/ppc/spapr.c                    | 286 ++++++++++-
 hw/ppc/spapr_cpu_core.c           |  44 +-
 hw/ppc/spapr_events.c             |  16 +-
 hw/ppc/spapr_hcall.c              |   6 +
 hw/ppc/spapr_pci.c                |  10 +-
 hw/ppc/spapr_vio.c                |   2 +-
 hw/ppc/trace-events               |   4 +
 include/hw/pci-host/spapr.h       |   2 +-
 include/hw/ppc/spapr.h            |  27 +-
 include/hw/ppc/spapr_cpu_core.h   |   1 +
 include/hw/ppc/spapr_vio.h        |   2 +-
 include/hw/ppc/spapr_xive.h       |  89 ++++
 include/hw/ppc/xics.h             |   8 +-
 include/migration/vmstate.h       |  10 +
 23 files changed, 2592 insertions(+), 182 deletions(-)
 create mode 100644 hw/intc/spapr_xive.c
 create mode 100644 hw/intc/spapr_xive_hcall.c
 create mode 100644 hw/intc/xive-internal.h
 create mode 100644 include/hw/ppc/spapr_xive.h

-- 
2.13.6


* [Qemu-devel] [PATCH 01/25] ppc/xics: introduce an icp_create() helper
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-24  2:51   ` David Gibson
  2017-11-24  9:08   ` Greg Kurz
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 02/25] ppc/xics: assign of the CPU 'intc' pointer under the core Cédric Le Goater
                   ` (23 subsequent siblings)
  24 siblings, 2 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The sPAPR and the PowerNV core objects create the interrupt presenter
object of the CPUs in a very similar way. Let's provide a common
routine in which we use the presenter 'type' as a child identifier.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xics.c          | 22 ++++++++++++++++++++++
 hw/ppc/pnv_core.c       | 10 +---------
 hw/ppc/spapr_cpu_core.c | 13 ++-----------
 include/hw/ppc/xics.h   |  3 +++
 4 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index a1cc0e420c98..e4ccdff8f577 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -384,6 +384,28 @@ static const TypeInfo icp_info = {
     .class_size = sizeof(ICPStateClass),
 };
 
+Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi, Error **errp)
+{
+    Object *child = OBJECT(cs);
+    Error *local_err = NULL;
+    Object *obj;
+
+    obj = object_new(type);
+    object_property_add_child(child, type, obj, &error_abort);
+    object_unref(obj);
+    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
+                                   &error_abort);
+    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
+    object_property_set_bool(obj, true, "realized", &local_err);
+    if (local_err) {
+        object_unparent(obj);
+        error_propagate(errp, local_err);
+        obj = NULL;
+    }
+
+    return obj;
+}
+
 /*
  * ICS: Source layer
  */
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index 82ff440b3334..a066736846f8 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -126,7 +126,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
     Error *local_err = NULL;
     CPUState *cs = CPU(child);
     PowerPCCPU *cpu = POWERPC_CPU(cs);
-    Object *obj;
 
     object_property_set_bool(child, true, "realized", &local_err);
     if (local_err) {
@@ -134,13 +133,7 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
         return;
     }
 
-    obj = object_new(TYPE_PNV_ICP);
-    object_property_add_child(child, "icp", obj, NULL);
-    object_unref(obj);
-    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
-                                   &error_abort);
-    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
-    object_property_set_bool(obj, true, "realized", &local_err);
+    icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
@@ -148,7 +141,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
 
     powernv_cpu_init(cpu, &local_err);
     if (local_err) {
-        object_unparent(obj);
         error_propagate(errp, local_err);
         return;
     }
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 4ba8563d49e4..f8a520a2fa2d 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -111,7 +111,6 @@ static void spapr_cpu_core_realize_child(Object *child,
     Error *local_err = NULL;
     CPUState *cs = CPU(child);
     PowerPCCPU *cpu = POWERPC_CPU(cs);
-    Object *obj;
 
     object_property_set_bool(child, true, "realized", &local_err);
     if (local_err) {
@@ -123,21 +122,13 @@ static void spapr_cpu_core_realize_child(Object *child,
         goto error;
     }
 
-    obj = object_new(spapr->icp_type);
-    object_property_add_child(child, "icp", obj, &error_abort);
-    object_unref(obj);
-    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(spapr),
-                                   &error_abort);
-    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
-    object_property_set_bool(obj, true, "realized", &local_err);
+    icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
     if (local_err) {
-        goto free_icp;
+        goto error;
     }
 
     return;
 
-free_icp:
-    object_unparent(obj);
 error:
     error_propagate(errp, local_err);
 }
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 2df99be111ce..126b47dec38b 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -212,4 +212,7 @@ typedef struct sPAPRMachineState sPAPRMachineState;
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
 void xics_spapr_init(sPAPRMachineState *spapr);
 
+Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi,
+                   Error **errp);
+
 #endif /* XICS_H */
-- 
2.13.6


* [Qemu-devel] [PATCH 02/25] ppc/xics: assign of the CPU 'intc' pointer under the core
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 01/25] ppc/xics: introduce an icp_create() helper Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-24  2:57   ` David Gibson
  2017-11-24  9:21   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 03/25] spapr: introduce a spapr_icp_create() helper Cédric Le Goater
                   ` (22 subsequent siblings)
  24 siblings, 2 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The 'intc' pointer of the CPU references the interrupt presenter in
the XICS interrupt mode. When the XIVE interrupt mode is available and
activated, the machine will need to reassign this pointer to reflect
the change.

Moving this assignment under the realize routine of the CPU will ease
the process when the interrupt mode is toggled.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xics.c          | 1 -
 hw/ppc/pnv_core.c       | 2 +-
 hw/ppc/spapr_cpu_core.c | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index e4ccdff8f577..0f2e7273bc8f 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -334,7 +334,6 @@ static void icp_realize(DeviceState *dev, Error **errp)
     }
 
     cpu = POWERPC_CPU(obj);
-    cpu->intc = OBJECT(icp);
     icp->cs = CPU(obj);
 
     env = &cpu->env;
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index a066736846f8..90acaac45889 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -133,7 +133,7 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
         return;
     }
 
-    icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
+    cpu->intc = icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index f8a520a2fa2d..f7cc74512481 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -122,7 +122,7 @@ static void spapr_cpu_core_realize_child(Object *child,
         goto error;
     }
 
-    icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
+    cpu->intc = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
     if (local_err) {
         goto error;
     }
-- 
2.13.6


* [Qemu-devel] [PATCH 03/25] spapr: introduce a spapr_icp_create() helper
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 01/25] ppc/xics: introduce an icp_create() helper Cédric Le Goater
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 02/25] ppc/xics: assign of the CPU 'intc' pointer under the core Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-24 10:09   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 04/25] spapr: move the IRQ allocation routines under the machine Cédric Le Goater
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

On sPAPR, the creation of the interrupt presenter depends on some of
the machine attributes. When the XIVE interrupt mode is available,
this will get more complex. So provide a machine-level helper to
isolate the process and hide the details from the sPAPR core realize
function.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c          | 14 ++++++++++++++
 hw/ppc/spapr_cpu_core.c |  2 +-
 include/hw/ppc/spapr.h  |  2 ++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 174e7ff0678d..925cbd3c1bf4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3556,6 +3556,20 @@ static ICPState *spapr_icp_get(XICSFabric *xi, int vcpu_id)
     return cpu ? ICP(cpu->intc) : NULL;
 }
 
+Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
+{
+    Error *local_err = NULL;
+    Object *obj;
+
+    obj = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    return obj;
+}
+
 static void spapr_pic_print_info(InterruptStatsProvider *obj,
                                  Monitor *mon)
 {
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index f7cc74512481..61a9850e688b 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -122,7 +122,7 @@ static void spapr_cpu_core_realize_child(Object *child,
         goto error;
     }
 
-    cpu->intc = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
+    cpu->intc = spapr_icp_create(spapr, cs, &local_err);
     if (local_err) {
         goto error;
     }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9d21ca9bde3a..9da38de34277 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -707,4 +707,6 @@ void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
 int spapr_vcpu_id(PowerPCCPU *cpu);
 PowerPCCPU *spapr_find_cpu(int vcpu_id);
 
+Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp);
+
 #endif /* HW_SPAPR_H */
-- 
2.13.6


* [Qemu-devel] [PATCH 04/25] spapr: move the IRQ allocation routines under the machine
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (2 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 03/25] spapr: introduce a spapr_icp_create() helper Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-24  3:13   ` David Gibson
  2017-11-28 10:57   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 05/25] spapr: introduce a spapr_irq_set() helper Cédric Le Goater
                   ` (20 subsequent siblings)
  24 siblings, 2 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

Also change the prototypes to take a sPAPRMachineState and prefix the
routines with spapr_irq_. This will let us synchronise the IRQ
allocation with the XIVE interrupt mode when available.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/trace-events   |   4 --
 hw/intc/xics_spapr.c   | 114 -------------------------------------------------
 hw/ppc/spapr.c         | 114 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_events.c  |   4 +-
 hw/ppc/spapr_pci.c     |   8 ++--
 hw/ppc/spapr_vio.c     |   2 +-
 hw/ppc/trace-events    |   4 ++
 include/hw/ppc/spapr.h |   6 +++
 include/hw/ppc/xics.h  |   4 --
 9 files changed, 131 insertions(+), 129 deletions(-)

diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index b298fac7c6a8..7077aaaee6d0 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -64,10 +64,6 @@ xics_ics_simple_set_irq_lsi(int srcno, int nr) "set_irq_lsi: srcno %d [irq 0x%x]
 xics_ics_simple_write_xive(int nr, int srcno, int server, uint8_t priority) "ics_write_xive: irq 0x%x [src %d] server 0x%x prio 0x%x"
 xics_ics_simple_reject(int nr, int srcno) "reject irq 0x%x [src %d]"
 xics_ics_simple_eoi(int nr) "ics_eoi: irq 0x%x"
-xics_alloc(int irq) "irq %d"
-xics_alloc_block(int first, int num, bool lsi, int align) "first irq %d, %d irqs, lsi=%d, alignnum %d"
-xics_ics_free(int src, int irq, int num) "Source#%d, first irq %d, %d irqs"
-xics_ics_free_warn(int src, int irq) "Source#%d, irq %d is already free"
 
 # hw/intc/s390_flic_kvm.c
 flic_create_device(int err) "flic: create device failed %d"
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index e8c0a1b3e903..5a0967caf430 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -245,120 +245,6 @@ void xics_spapr_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_IPOLL, h_ipoll);
 }
 
-#define ICS_IRQ_FREE(ics, srcno)   \
-    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
-
-static int ics_find_free_block(ICSState *ics, int num, int alignnum)
-{
-    int first, i;
-
-    for (first = 0; first < ics->nr_irqs; first += alignnum) {
-        if (num > (ics->nr_irqs - first)) {
-            return -1;
-        }
-        for (i = first; i < first + num; ++i) {
-            if (!ICS_IRQ_FREE(ics, i)) {
-                break;
-            }
-        }
-        if (i == (first + num)) {
-            return first;
-        }
-    }
-
-    return -1;
-}
-
-int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, Error **errp)
-{
-    int irq;
-
-    if (!ics) {
-        return -1;
-    }
-    if (irq_hint) {
-        if (!ICS_IRQ_FREE(ics, irq_hint - ics->offset)) {
-            error_setg(errp, "can't allocate IRQ %d: already in use", irq_hint);
-            return -1;
-        }
-        irq = irq_hint;
-    } else {
-        irq = ics_find_free_block(ics, 1, 1);
-        if (irq < 0) {
-            error_setg(errp, "can't allocate IRQ: no IRQ left");
-            return -1;
-        }
-        irq += ics->offset;
-    }
-
-    ics_set_irq_type(ics, irq - ics->offset, lsi);
-    trace_xics_alloc(irq);
-
-    return irq;
-}
-
-/*
- * Allocate block of consecutive IRQs, and return the number of the first IRQ in
- * the block. If align==true, aligns the first IRQ number to num.
- */
-int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
-                          bool align, Error **errp)
-{
-    int i, first = -1;
-
-    if (!ics) {
-        return -1;
-    }
-
-    /*
-     * MSIMesage::data is used for storing VIRQ so
-     * it has to be aligned to num to support multiple
-     * MSI vectors. MSI-X is not affected by this.
-     * The hint is used for the first IRQ, the rest should
-     * be allocated continuously.
-     */
-    if (align) {
-        assert((num == 1) || (num == 2) || (num == 4) ||
-               (num == 8) || (num == 16) || (num == 32));
-        first = ics_find_free_block(ics, num, num);
-    } else {
-        first = ics_find_free_block(ics, num, 1);
-    }
-    if (first < 0) {
-        error_setg(errp, "can't find a free %d-IRQ block", num);
-        return -1;
-    }
-
-    for (i = first; i < first + num; ++i) {
-        ics_set_irq_type(ics, i, lsi);
-    }
-    first += ics->offset;
-
-    trace_xics_alloc_block(first, num, lsi, align);
-
-    return first;
-}
-
-static void ics_free(ICSState *ics, int srcno, int num)
-{
-    int i;
-
-    for (i = srcno; i < srcno + num; ++i) {
-        if (ICS_IRQ_FREE(ics, i)) {
-            trace_xics_ics_free_warn(0, i + ics->offset);
-        }
-        memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
-    }
-}
-
-void spapr_ics_free(ICSState *ics, int irq, int num)
-{
-    if (ics_valid_irq(ics, irq)) {
-        trace_xics_ics_free(0, irq, num);
-        ics_free(ics, irq - ics->offset, num);
-    }
-}
-
 void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle)
 {
     uint32_t interrupt_server_ranges_prop[] = {
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 925cbd3c1bf4..7ae84d40bdb4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3570,6 +3570,120 @@ Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
     return obj;
 }
 
+#define ICS_IRQ_FREE(ics, srcno)   \
+    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
+
+static int ics_find_free_block(ICSState *ics, int num, int alignnum)
+{
+    int first, i;
+
+    for (first = 0; first < ics->nr_irqs; first += alignnum) {
+        if (num > (ics->nr_irqs - first)) {
+            return -1;
+        }
+        for (i = first; i < first + num; ++i) {
+            if (!ICS_IRQ_FREE(ics, i)) {
+                break;
+            }
+        }
+        if (i == (first + num)) {
+            return first;
+        }
+    }
+
+    return -1;
+}
+
+int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
+                    Error **errp)
+{
+    ICSState *ics = spapr->ics;
+    int irq;
+
+    if (!ics) {
+        return -1;
+    }
+    if (irq_hint) {
+        if (!ICS_IRQ_FREE(ics, irq_hint - ics->offset)) {
+            error_setg(errp, "can't allocate IRQ %d: already in use", irq_hint);
+            return -1;
+        }
+        irq = irq_hint;
+    } else {
+        irq = ics_find_free_block(ics, 1, 1);
+        if (irq < 0) {
+            error_setg(errp, "can't allocate IRQ: no IRQ left");
+            return -1;
+        }
+        irq += ics->offset;
+    }
+
+    ics_set_irq_type(ics, irq - ics->offset, lsi);
+    trace_spapr_irq_alloc(irq);
+
+    return irq;
+}
+
+/*
+ * Allocate block of consecutive IRQs, and return the number of the first IRQ in
+ * the block. If align==true, aligns the first IRQ number to num.
+ */
+int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
+                          bool align, Error **errp)
+{
+    ICSState *ics = spapr->ics;
+    int i, first = -1;
+
+    if (!ics) {
+        return -1;
+    }
+
+    /*
+     * MSIMesage::data is used for storing VIRQ so
+     * it has to be aligned to num to support multiple
+     * MSI vectors. MSI-X is not affected by this.
+     * The hint is used for the first IRQ, the rest should
+     * be allocated continuously.
+     */
+    if (align) {
+        assert((num == 1) || (num == 2) || (num == 4) ||
+               (num == 8) || (num == 16) || (num == 32));
+        first = ics_find_free_block(ics, num, num);
+    } else {
+        first = ics_find_free_block(ics, num, 1);
+    }
+    if (first < 0) {
+        error_setg(errp, "can't find a free %d-IRQ block", num);
+        return -1;
+    }
+
+    for (i = first; i < first + num; ++i) {
+        ics_set_irq_type(ics, i, lsi);
+    }
+    first += ics->offset;
+
+    trace_spapr_irq_alloc_block(first, num, lsi, align);
+
+    return first;
+}
+
+void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
+{
+    ICSState *ics = spapr->ics;
+    int srcno = irq - ics->offset;
+    int i;
+
+    if (ics_valid_irq(ics, irq)) {
+        trace_spapr_irq_free(0, irq, num);
+        for (i = srcno; i < srcno + num; ++i) {
+            if (ICS_IRQ_FREE(ics, i)) {
+                trace_spapr_irq_free_warn(0, i + ics->offset);
+            }
+            memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
+        }
+    }
+}
+
 static void spapr_pic_print_info(InterruptStatsProvider *obj,
                                  Monitor *mon)
 {
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index e377fc7ddea2..cead596f3e7a 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -718,7 +718,7 @@ void spapr_events_init(sPAPRMachineState *spapr)
     spapr->event_sources = spapr_event_sources_new();
 
     spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_EPOW,
-                                 spapr_ics_alloc(spapr->ics, 0, false,
+                                 spapr_irq_alloc(spapr, 0, false,
                                                   &error_fatal));
 
     /* NOTE: if machine supports modern/dedicated hotplug event source,
@@ -731,7 +731,7 @@ void spapr_events_init(sPAPRMachineState *spapr)
      */
     if (spapr->use_hotplug_event_source) {
         spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_HOT_PLUG,
-                                     spapr_ics_alloc(spapr->ics, 0, false,
+                                     spapr_irq_alloc(spapr, 0, false,
                                                       &error_fatal));
     }
 
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 5a3122a9f9f9..e0ef77a480e5 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -314,7 +314,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
             return;
         }
 
-        spapr_ics_free(spapr->ics, msi->first_irq, msi->num);
+        spapr_irq_free(spapr, msi->first_irq, msi->num);
         if (msi_present(pdev)) {
             spapr_msi_setmsg(pdev, 0, false, 0, 0);
         }
@@ -352,7 +352,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     }
 
     /* Allocate MSIs */
-    irq = spapr_ics_alloc_block(spapr->ics, req_num, false,
+    irq = spapr_irq_alloc_block(spapr, req_num, false,
                            ret_intr_type == RTAS_TYPE_MSI, &err);
     if (err) {
         error_reportf_err(err, "Can't allocate MSIs for device %x: ",
@@ -363,7 +363,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 
     /* Release previous MSIs */
     if (msi) {
-        spapr_ics_free(spapr->ics, msi->first_irq, msi->num);
+        spapr_irq_free(spapr, msi->first_irq, msi->num);
         g_hash_table_remove(phb->msi, &config_addr);
     }
 
@@ -1675,7 +1675,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         uint32_t irq;
         Error *local_err = NULL;
 
-        irq = spapr_ics_alloc_block(spapr->ics, 1, true, false, &local_err);
+        irq = spapr_irq_alloc_block(spapr, 1, true, false, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             error_prepend(errp, "can't allocate LSIs: ");
diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index ea3bc8bd9e21..bb7ed2c537b0 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -454,7 +454,7 @@ static void spapr_vio_busdev_realize(DeviceState *qdev, Error **errp)
         dev->qdev.id = id;
     }
 
-    dev->irq = spapr_ics_alloc(spapr->ics, dev->irq, false, &local_err);
+    dev->irq = spapr_irq_alloc(spapr, dev->irq, false, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
index 4a6a6490fa78..b7c3e64b5ee7 100644
--- a/hw/ppc/trace-events
+++ b/hw/ppc/trace-events
@@ -12,6 +12,10 @@ spapr_pci_msi_retry(unsigned config_addr, unsigned req_num, unsigned max_irqs) "
 # hw/ppc/spapr.c
 spapr_cas_failed(unsigned long n) "DT diff buffer is too small: %ld bytes"
 spapr_cas_continue(unsigned long n) "Copy changes to the guest: %ld bytes"
+spapr_irq_alloc(int irq) "irq %d"
+spapr_irq_alloc_block(int first, int num, bool lsi, int align) "first irq %d, %d irqs, lsi=%d, alignnum %d"
+spapr_irq_free(int src, int irq, int num) "Source#%d, first irq %d, %d irqs"
+spapr_irq_free_warn(int src, int irq) "Source#%d, irq %d is already free"
 
 # hw/ppc/spapr_hcall.c
 spapr_cas_pvr_try(uint32_t pvr) "0x%x"
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9da38de34277..7a133f80411a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -709,4 +709,10 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id);
 
 Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp);
 
+int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
+                    Error **errp);
+int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
+                          bool align, Error **errp);
+void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
+
 #endif /* HW_SPAPR_H */
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 126b47dec38b..cea462bc7f3e 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -181,10 +181,6 @@ typedef struct XICSFabricClass {
 
 #define XICS_IRQS_SPAPR               1024
 
-int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, Error **errp);
-int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi, bool align,
-                           Error **errp);
-void spapr_ics_free(ICSState *ics, int irq, int num);
 void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
 
 qemu_irq xics_get_qirq(XICSFabric *xi, int irq);
-- 
2.13.6


* [Qemu-devel] [PATCH 05/25] spapr: introduce a spapr_irq_set() helper
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (3 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 04/25] spapr: move the IRQ allocation routines under the machine Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-24  3:16   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 06/25] spapr: introduce a spapr_irq_get_qirq() helper Cédric Le Goater
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

This helper will make it easier to keep the IRQ type in sync with the
XIVE interrupt mode once it is available. The 'irq' parameter refers
to the global IRQ number space.
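
As an illustration only, here is a minimal standalone sketch of the
global-to-local conversion the helper hides from its callers. The
DemoICS type, the demo_* names and the 0x1000 offset used in the note
below are assumptions made for this example, not the QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_IRQS 16

/* Hypothetical stand-in for an interrupt source; not the QEMU ICSState. */
typedef struct DemoICS {
    uint32_t offset;              /* first global IRQ number of the source */
    uint32_t nr_irqs;             /* number of IRQs served by the source */
    bool     lsi[DEMO_MAX_IRQS];  /* per-IRQ type: true = LSI, false = MSI */
} DemoICS;

/* Map a global IRQ number to a source-local index, or -1 if out of range. */
static int demo_irq_to_index(const DemoICS *ics, uint32_t irq)
{
    assert(ics->nr_irqs <= DEMO_MAX_IRQS);

    if (irq < ics->offset || irq - ics->offset >= ics->nr_irqs) {
        return -1;
    }
    return (int)(irq - ics->offset);
}

/* Callers pass a global IRQ number; the offset is applied in one place. */
static bool demo_irq_set(DemoICS *ics, uint32_t irq, bool lsi)
{
    int idx = demo_irq_to_index(ics, irq);

    if (idx < 0) {
        return false;
    }
    ics->lsi[idx] = lsi;
    return true;
}
```

With an offset of 0x1000 and 16 IRQs, global IRQ 0x1003 maps to local
index 3 and 0x1010 is rejected.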

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7ae84d40bdb4..79f38a9ff4e1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3594,6 +3594,11 @@ static int ics_find_free_block(ICSState *ics, int num, int alignnum)
     return -1;
 }
 
+static void spapr_irq_set(sPAPRMachineState *spapr, int irq, bool lsi)
+{
+    ics_set_irq_type(spapr->ics, irq - spapr->ics->offset, lsi);
+}
+
 int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
                     Error **errp)
 {
@@ -3618,7 +3623,7 @@ int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
         irq += ics->offset;
     }
 
-    ics_set_irq_type(ics, irq - ics->offset, lsi);
+    spapr_irq_set(spapr, irq, lsi);
     trace_spapr_irq_alloc(irq);
 
     return irq;
@@ -3657,10 +3662,10 @@ int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
         return -1;
     }
 
+    first += ics->offset;
     for (i = first; i < first + num; ++i) {
-        ics_set_irq_type(ics, i, lsi);
+        spapr_irq_set(spapr, i, lsi);
     }
-    first += ics->offset;
 
     trace_spapr_irq_alloc_block(first, num, lsi, align);
 
-- 
2.13.6


* [Qemu-devel] [PATCH 06/25] spapr: introduce a spapr_irq_get_qirq() helper
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (4 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 05/25] spapr: introduce a spapr_irq_set() helper Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-24  3:18   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 07/25] migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC Cédric Le Goater
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

xics_get_qirq() is only used by the sPAPR machine. Let's move it there
and change its name to reflect its scope. It will be useful for XIVE
support which will use its own set of qirqs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xics.c              | 12 ------------
 hw/ppc/spapr.c              | 11 +++++++++++
 hw/ppc/spapr_events.c       | 12 +++++-------
 hw/ppc/spapr_pci.c          |  2 +-
 include/hw/pci-host/spapr.h |  2 +-
 include/hw/ppc/spapr.h      |  1 +
 include/hw/ppc/spapr_vio.h  |  2 +-
 include/hw/ppc/xics.h       |  1 -
 8 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 0f2e7273bc8f..a78b4dbd033d 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -714,18 +714,6 @@ static const TypeInfo xics_fabric_info = {
 /*
  * Exported functions
  */
-qemu_irq xics_get_qirq(XICSFabric *xi, int irq)
-{
-    XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
-    ICSState *ics = xic->ics_get(xi, irq);
-
-    if (ics) {
-        return ics->qirqs[irq - ics->offset];
-    }
-
-    return NULL;
-}
-
 ICPState *xics_icp_get(XICSFabric *xi, int server)
 {
     XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 79f38a9ff4e1..5d3325ca3c88 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3689,6 +3689,17 @@ void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
     }
 }
 
+qemu_irq spapr_irq_get_qirq(sPAPRMachineState *spapr, int irq)
+{
+    ICSState *ics = spapr->ics;
+
+    if (ics_valid_irq(ics, irq)) {
+        return ics->qirqs[irq - ics->offset];
+    }
+
+    return NULL;
+}
+
 static void spapr_pic_print_info(InterruptStatsProvider *obj,
                                  Monitor *mon)
 {
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index cead596f3e7a..0427590e9cac 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -472,9 +472,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
 
     rtas_event_log_queue(spapr, entry);
 
-    qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr),
-                                 rtas_event_log_to_irq(spapr,
-                                                       RTAS_LOG_TYPE_EPOW)));
+    qemu_irq_pulse(spapr_irq_get_qirq(spapr,
+                   rtas_event_log_to_irq(spapr, RTAS_LOG_TYPE_EPOW)));
 }
 
 static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
@@ -556,9 +555,8 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
 
     rtas_event_log_queue(spapr, entry);
 
-    qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr),
-                                 rtas_event_log_to_irq(spapr,
-                                                       RTAS_LOG_TYPE_HOTPLUG)));
+    qemu_irq_pulse(spapr_irq_get_qirq(spapr,
+                   rtas_event_log_to_irq(spapr, RTAS_LOG_TYPE_HOTPLUG)));
 }
 
 void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
@@ -678,7 +676,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                 spapr_event_sources_get_source(spapr->event_sources, i);
 
             g_assert(source->enabled);
-            qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr), source->irq));
+            qemu_irq_pulse(spapr_irq_get_qirq(spapr, source->irq));
         }
     }
 
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index e0ef77a480e5..a02faa12333e 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -723,7 +723,7 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
 
     trace_spapr_pci_msi_write(addr, data, irq);
 
-    qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr), irq));
+    qemu_irq_pulse(spapr_irq_get_qirq(spapr, irq));
 }
 
 static const MemoryRegionOps spapr_msi_ops = {
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 38470b2f0e5c..3059fdd614e6 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -108,7 +108,7 @@ static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
-    return xics_get_qirq(XICS_FABRIC(spapr), phb->lsi_table[pin].irq);
+    return spapr_irq_get_qirq(spapr, phb->lsi_table[pin].irq);
 }
 
 PCIHostState *spapr_create_phb(sPAPRMachineState *spapr, int index);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 7a133f80411a..9a3885593c86 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -714,5 +714,6 @@ int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
 int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
                           bool align, Error **errp);
 void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
+qemu_irq spapr_irq_get_qirq(sPAPRMachineState *spapr, int irq);
 
 #endif /* HW_SPAPR_H */
diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index 2e9685a5d900..404f1de2c046 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -87,7 +87,7 @@ static inline qemu_irq spapr_vio_qirq(VIOsPAPRDevice *dev)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
-    return xics_get_qirq(XICS_FABRIC(spapr), dev->irq);
+    return spapr_irq_get_qirq(spapr, dev->irq);
 }
 
 static inline bool spapr_vio_dma_valid(VIOsPAPRDevice *dev, uint64_t taddr,
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index cea462bc7f3e..2f1f35294e6d 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -183,7 +183,6 @@ typedef struct XICSFabricClass {
 
 void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
 
-qemu_irq xics_get_qirq(XICSFabric *xi, int irq);
 ICPState *xics_icp_get(XICSFabric *xi, int server);
 
 /* Internal XICS interfaces */
-- 
2.13.6


* [Qemu-devel] [PATCH 07/25] migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (5 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 06/25] spapr: introduce a spapr_irq_get_qirq() helper Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller Cédric Le Goater
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/migration/vmstate.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 88b55df5ae0c..c0bf06e7bf89 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -560,6 +560,16 @@ extern const VMStateInfo vmstate_info_qtailq;
     .offset     = vmstate_offset_pointer(_state, _field, _type),     \
 }
 
+#define VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
+    .name       = (stringify(_field)),                               \
+    .version_id = (_version),                                        \
+    .vmsd       = &(_vmsd),                                          \
+    .num_offset = vmstate_offset_value(_state, _field_num, uint32_t), \
+    .size       = sizeof(_type),                                     \
+    .flags      = VMS_STRUCT|VMS_VARRAY_UINT32|VMS_ALLOC|VMS_POINTER, \
+    .offset     = vmstate_offset_pointer(_state, _field, _type),     \
+}
+
 #define VMSTATE_STATIC_BUFFER(_field, _state, _version, _test, _start, _size) { \
     .name         = (stringify(_field)),                             \
     .version_id   = (_version),                                      \
-- 
2.13.6


* [Qemu-devel] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (6 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 07/25] migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-28  5:40   ` David Gibson
  2017-11-29 11:49   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 09/25] spapr: introduce handlers for XIVE interrupt sources Cédric Le Goater
                   ` (16 subsequent siblings)
  24 siblings, 2 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The XIVE interrupt controller uses a set of tables to redirect exceptions
from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
table, also known as Event Assignment Structure (EAS), is one of them.

The XIVE model is designed to make use of the full range of the IRQ
number space and does not use an offset like the XICS mode does.
Hence, the IVE table is directly indexed by the IRQ number.

The IVE stores Event Queue data associated with a source. The lookups
are performed when the source is configured or when an event is
triggered.
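
For readers unfamiliar with the IBM bit numbering (bit 0 is the most
significant bit), the following standalone sketch shows how the Event
Queue index and data could be packed into a single 64-bit IVE word. It
mirrors the field layout used in this patch, but the DEMO_* macros and
demo_ive_make() are names invented for the example, and it relies on
the GCC/Clang __builtin_ffsll() builtin.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of the 64-bit word. */
#define DEMO_BIT(bit)            (0x8000000000000000ULL >> (bit))
#define DEMO_BITMASK(bs, be)     ((DEMO_BIT(bs) - DEMO_BIT(be)) | DEMO_BIT(bs))

/* Shift count of a mask = position of its lowest set bit. */
#define DEMO_LSH(m)              (__builtin_ffsll(m) - 1)
#define DEMO_GETFIELD(m, v)      (((v) & (m)) >> DEMO_LSH(m))
#define DEMO_SETFIELD(m, v, val) \
    (((v) & ~(m)) | ((((uint64_t)(val)) << DEMO_LSH(m)) & (m)))

#define DEMO_IVE_VALID     DEMO_BIT(0)
#define DEMO_IVE_EQ_INDEX  DEMO_BITMASK(8, 31)   /* 24-bit EQ index */
#define DEMO_IVE_MASKED    DEMO_BIT(32)
#define DEMO_IVE_EQ_DATA   DEMO_BITMASK(33, 63)  /* 31-bit EQ data */

/* Build a valid, unmasked IVE word targeting the given EQ. */
static uint64_t demo_ive_make(uint32_t eq_index, uint32_t eq_data)
{
    uint64_t w = DEMO_IVE_VALID;

    assert(eq_index < (1u << 24));
    assert(eq_data < (1u << 31));

    w = DEMO_SETFIELD(DEMO_IVE_EQ_INDEX, w, eq_index);
    w = DEMO_SETFIELD(DEMO_IVE_EQ_DATA, w, eq_data);
    return w;
}
```

Keeping everything in one 64-bit word is what allows an entry to be
updated with a single store, as the comment in xive-internal.h notes.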

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 default-configs/ppc64-softmmu.mak |   1 +
 hw/intc/Makefile.objs             |   1 +
 hw/intc/spapr_xive.c              | 165 ++++++++++++++++++++++++++++++++++++++
 hw/intc/xive-internal.h           |  50 ++++++++++++
 include/hw/ppc/spapr_xive.h       |  44 ++++++++++
 5 files changed, 261 insertions(+)
 create mode 100644 hw/intc/spapr_xive.c
 create mode 100644 hw/intc/xive-internal.h
 create mode 100644 include/hw/ppc/spapr_xive.h

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index d1b3a6dd50f8..4a7f6a0696de 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -56,6 +56,7 @@ CONFIG_SM501=y
 CONFIG_XICS=$(CONFIG_PSERIES)
 CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
+CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
 # For PReP
 CONFIG_SERIAL_ISA=y
 CONFIG_MC146818RTC=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index ae358569a155..49e13e7aeeee 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
 obj-$(CONFIG_XICS) += xics.o
 obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
+obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
new file mode 100644
index 000000000000..b2fc3007c85f
--- /dev/null
+++ b/hw/intc/spapr_xive.c
@@ -0,0 +1,165 @@
+/*
+ * QEMU PowerPC sPAPR XIVE model
+ *
+ * Copyright (c) 2017, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "target/ppc/cpu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/dma.h"
+#include "monitor/monitor.h"
+#include "hw/ppc/spapr_xive.h"
+
+#include "xive-internal.h"
+
+/*
+ * Main XIVE object
+ */
+
+void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
+{
+    int i;
+
+    for (i = 0; i < xive->nr_irqs; i++) {
+        XiveIVE *ive = &xive->ivt[i];
+
+        if (!(ive->w & IVE_VALID)) {
+            continue;
+        }
+
+        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
+                       ive->w & IVE_MASKED ? "M" : " ",
+                       (int) GETFIELD(IVE_EQ_INDEX, ive->w),
+                       (int) GETFIELD(IVE_EQ_DATA, ive->w));
+    }
+}
+
+void spapr_xive_reset(void *dev)
+{
+    sPAPRXive *xive = SPAPR_XIVE(dev);
+    int i;
+
+    /* Mask all valid IVEs in the IRQ number space. */
+    for (i = 0; i < xive->nr_irqs; i++) {
+        XiveIVE *ive = &xive->ivt[i];
+        if (ive->w & IVE_VALID) {
+            ive->w |= IVE_MASKED;
+        }
+    }
+}
+
+static void spapr_xive_realize(DeviceState *dev, Error **errp)
+{
+    sPAPRXive *xive = SPAPR_XIVE(dev);
+
+    if (!xive->nr_irqs) {
+        error_setg(errp, "Number of interrupts needs to be greater than 0");
+        return;
+    }
+
+    /* Allocate the IVT (Interrupt Virtualization Table) */
+    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
+
+    qemu_register_reset(spapr_xive_reset, dev);
+}
+
+static const VMStateDescription vmstate_spapr_xive_ive = {
+    .name = TYPE_SPAPR_XIVE "/ive",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(w, XiveIVE),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool vmstate_spapr_xive_needed(void *opaque)
+{
+    /* TODO check machine XIVE support */
+    return true;
+}
+
+static const VMStateDescription vmstate_spapr_xive = {
+    .name = TYPE_SPAPR_XIVE,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = vmstate_spapr_xive_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
+        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
+                                           vmstate_spapr_xive_ive, XiveIVE),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static Property spapr_xive_properties[] = {
+    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void spapr_xive_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = spapr_xive_realize;
+    dc->props = spapr_xive_properties;
+    dc->desc = "sPAPR XIVE interrupt controller";
+    dc->vmsd = &vmstate_spapr_xive;
+}
+
+static const TypeInfo spapr_xive_info = {
+    .name = TYPE_SPAPR_XIVE,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(sPAPRXive),
+    .class_init = spapr_xive_class_init,
+};
+
+static void spapr_xive_register_types(void)
+{
+    type_register_static(&spapr_xive_info);
+}
+
+type_init(spapr_xive_register_types)
+
+XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
+{
+    return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
+}
+
+bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
+{
+    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
+
+    if (!ive) {
+        return false;
+    }
+
+    ive->w |= IVE_VALID;
+    return true;
+}
+
+bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
+{
+    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
+
+    if (!ive) {
+        return false;
+    }
+
+    ive->w &= ~IVE_VALID;
+    return true;
+}
diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
new file mode 100644
index 000000000000..bea88d82992c
--- /dev/null
+++ b/hw/intc/xive-internal.h
@@ -0,0 +1,50 @@
+/*
+ * QEMU PowerPC XIVE model
+ *
+ * Copyright 2016,2017 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _INTC_XIVE_INTERNAL_H
+#define _INTC_XIVE_INTERNAL_H
+
+/* Utilities to manipulate these (originally from OPAL) */
+#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
+#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
+#define SETFIELD(m, v, val)                             \
+        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
+
+#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
+#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
+#define PPC_BIT8(bit)           (0x80UL >> (bit))
+#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
+#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
+                                 PPC_BIT32(bs))
+
+/* IVE/EAS
+ *
+ * One per interrupt source. Targets that interrupt to a given EQ
+ * and provides the corresponding logical interrupt number (EQ data)
+ *
+ * We also map this structure to the escalation descriptor inside
+ * an EQ, though in that case the valid and masked bits are not used.
+ */
+typedef struct XiveIVE {
+        /* Use a single 64-bit definition to make it easier to
+         * perform atomic updates
+         */
+        uint64_t        w;
+#define IVE_VALID       PPC_BIT(0)
+#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
+#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
+#define IVE_MASKED      PPC_BIT(32)              /* Masked */
+#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
+} XiveIVE;
+
+void spapr_xive_reset(void *dev);
+XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
+
+#endif /* _INTC_XIVE_INTERNAL_H */
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
new file mode 100644
index 000000000000..795b3f4ded7c
--- /dev/null
+++ b/include/hw/ppc/spapr_xive.h
@@ -0,0 +1,44 @@
+/*
+ * QEMU PowerPC sPAPR XIVE model
+ *
+ * Copyright (c) 2017, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef PPC_SPAPR_XIVE_H
+#define PPC_SPAPR_XIVE_H
+
+#include <hw/sysbus.h>
+
+typedef struct sPAPRXive sPAPRXive;
+typedef struct XiveIVE XiveIVE;
+
+#define TYPE_SPAPR_XIVE "spapr-xive"
+#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
+
+struct sPAPRXive {
+    SysBusDevice parent;
+
+    /* Properties */
+    uint32_t     nr_irqs;
+
+    /* XIVE internal tables */
+    XiveIVE      *ivt;
+};
+
+bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn);
+bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
+void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
+
+#endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.6


* [Qemu-devel] [PATCH 09/25] spapr: introduce handlers for XIVE interrupt sources
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (7 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-28  5:45   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the " Cédric Le Goater
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

These are very similar to the XICS handlers, in a simpler form. They make
use of a status array for the LSI interrupts. The spapr_xive_irq() routine,
in charge of triggering the CPU interrupt line, will be filled in later.
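
To make the role of the status bits concrete, here is a small
standalone sketch of the LSI handling: a level change updates the
ASSERTED bit, and a notification is only forwarded when the line is
asserted and no notification is still outstanding. The DEMO_* flags
and demo_lsi_* functions are illustrative names, not part of the
patch, and the return value stands in for the call to spapr_xive_irq().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_STATUS_LSI       0x1  /* source is level sensitive */
#define DEMO_STATUS_ASSERTED  0x2  /* input line is currently high */
#define DEMO_STATUS_SENT      0x4  /* a notification is outstanding */

/*
 * Record the new line level. Returns 1 when a notification should be
 * forwarded, 0 otherwise.
 */
static int demo_lsi_set_level(uint8_t *status, int level)
{
    assert(status != NULL);

    if (level) {
        *status |= DEMO_STATUS_ASSERTED;
    } else {
        *status &= ~DEMO_STATUS_ASSERTED;
    }

    if ((*status & DEMO_STATUS_ASSERTED) && !(*status & DEMO_STATUS_SENT)) {
        *status |= DEMO_STATUS_SENT;
        return 1;
    }
    return 0;
}

/* An EOI clears the outstanding notification so the next one can go out. */
static void demo_lsi_eoi(uint8_t *status)
{
    assert(status != NULL);
    *status &= ~DEMO_STATUS_SENT;
}
```

Raising the line twice in a row forwards a single notification; a
second one can only be forwarded after an EOI has cleared the SENT bit.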

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 55 +++++++++++++++++++++++++++++++++++++++++++--
 include/hw/ppc/spapr_xive.h | 14 +++++++++++-
 2 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index b2fc3007c85f..66c533fb1d78 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -26,6 +26,47 @@
 
 #include "xive-internal.h"
 
+static void spapr_xive_irq(sPAPRXive *xive, int lisn)
+{
+
+}
+
+/*
+ * XIVE Interrupt Source
+ */
+static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int lisn, int val)
+{
+    if (val) {
+        spapr_xive_irq(xive, lisn);
+    }
+}
+
+static void spapr_xive_source_set_irq_lsi(sPAPRXive *xive, int lisn, int val)
+{
+    if (val) {
+        xive->status[lisn] |= XIVE_STATUS_ASSERTED;
+    } else {
+        xive->status[lisn] &= ~XIVE_STATUS_ASSERTED;
+    }
+
+    if (xive->status[lisn] & XIVE_STATUS_ASSERTED &&
+        !(xive->status[lisn] & XIVE_STATUS_SENT)) {
+        xive->status[lisn] |= XIVE_STATUS_SENT;
+        spapr_xive_irq(xive, lisn);
+    }
+}
+
+static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
+{
+    sPAPRXive *xive = SPAPR_XIVE(opaque);
+
+    if (spapr_xive_irq_is_lsi(xive, lisn)) {
+        spapr_xive_source_set_irq_lsi(xive, lisn, val);
+    } else {
+        spapr_xive_source_set_irq_msi(xive, lisn, val);
+    }
+}
+
 /*
  * Main XIVE object
  */
@@ -41,7 +82,8 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
             continue;
         }
 
-        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
+        monitor_printf(mon, "  %4x %s %s %08x %08x\n", i,
+                       spapr_xive_irq_is_lsi(xive, i) ? "LSI" : "MSI",
                        ive->w & IVE_MASKED ? "M" : " ",
                        (int) GETFIELD(IVE_EQ_INDEX, ive->w),
                        (int) GETFIELD(IVE_EQ_DATA, ive->w));
@@ -53,6 +95,8 @@ void spapr_xive_reset(void *dev)
     sPAPRXive *xive = SPAPR_XIVE(dev);
     int i;
 
+    /* Do not clear IRQs status */
+
     /* Mask all valid IVEs in the IRQ number space. */
     for (i = 0; i < xive->nr_irqs; i++) {
         XiveIVE *ive = &xive->ivt[i];
@@ -71,6 +115,11 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /* QEMU IRQs */
+    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
+                                     xive->nr_irqs);
+    xive->status = g_malloc0(xive->nr_irqs);
+
     /* Allocate the IVT (Interrupt Virtualization Table) */
     xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
 
@@ -102,6 +151,7 @@ static const VMStateDescription vmstate_spapr_xive = {
         VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
         VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
                                            vmstate_spapr_xive_ive, XiveIVE),
+        VMSTATE_VBUFFER_UINT32(status, sPAPRXive, 1, NULL, nr_irqs),
         VMSTATE_END_OF_LIST()
     },
 };
@@ -140,7 +190,7 @@ XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
     return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
 }
 
-bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
+bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn, bool lsi)
 {
     XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
 
@@ -149,6 +199,7 @@ bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
     }
 
     ive->w |= IVE_VALID;
+    xive->status[lisn] |= lsi ? XIVE_STATUS_LSI : 0;
     return true;
 }
 
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 795b3f4ded7c..6a799cdaba66 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -33,11 +33,23 @@ struct sPAPRXive {
     /* Properties */
     uint32_t     nr_irqs;
 
+    /* IRQ */
+    qemu_irq     *qirqs;
+#define XIVE_STATUS_LSI                0x1
+#define XIVE_STATUS_ASSERTED           0x2
+#define XIVE_STATUS_SENT               0x4
+    uint8_t      *status;
+
     /* XIVE internal tables */
     XiveIVE      *ivt;
 };
 
-bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn);
+static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
+{
+    return xive->status[lisn] & XIVE_STATUS_LSI;
+}
+
+bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn, bool lsi);
 bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
 
-- 
2.13.6


* [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (8 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 09/25] spapr: introduce handlers for XIVE interrupt sources Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-28  6:38   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags Cédric Le Goater
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

Each interrupt source is associated with a two-bit state machine
called an Event State Buffer (ESB). The bits are named "P" (pending)
and "Q" (queued) and can be controlled by MMIO. It is used to trigger
events. See code for more details on the states and transitions.
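
As a quick reference, the sketch below shows one way to store the two
PQ bits of four sources per byte, together with the state transitions
for a trigger and for an EOI. It only computes the next state and
leaves out the decision to forward an event. The demo_* names and the
DEMO_ESB_* values are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PQ encodings: P is bit 1, Q is bit 0. */
enum {
    DEMO_ESB_RESET   = 0x0,  /* P=0 Q=0 */
    DEMO_ESB_OFF     = 0x1,  /* P=0 Q=1 */
    DEMO_ESB_PENDING = 0x2,  /* P=1 Q=0 */
    DEMO_ESB_QUEUED  = 0x3,  /* P=1 Q=1 */
};

/* Read the 2-bit state of a source; four sources share one byte. */
static uint8_t demo_pq_get(const uint8_t *sbe, size_t size, uint32_t lisn)
{
    uint32_t byte  = lisn / 4;
    uint32_t shift = (lisn % 4) * 2;

    assert(byte < size);
    return (sbe[byte] >> shift) & 0x3;
}

/* Write the 2-bit state of a source without touching its neighbours. */
static void demo_pq_set(uint8_t *sbe, size_t size, uint32_t lisn, uint8_t pq)
{
    uint32_t byte  = lisn / 4;
    uint32_t shift = (lisn % 4) * 2;

    assert(byte < size);
    sbe[byte] = (uint8_t)((sbe[byte] & ~(0x3u << shift)) |
                          ((pq & 0x3u) << shift));
}

/* Next state after a trigger. */
static uint8_t demo_pq_after_trigger(uint8_t pq)
{
    switch (pq & 0x3) {
    case DEMO_ESB_RESET:   return DEMO_ESB_PENDING;
    case DEMO_ESB_PENDING: return DEMO_ESB_QUEUED;
    case DEMO_ESB_QUEUED:  return DEMO_ESB_QUEUED;
    default:               return DEMO_ESB_OFF;
    }
}

/* Next state after an EOI. */
static uint8_t demo_pq_after_eoi(uint8_t pq)
{
    switch (pq & 0x3) {
    case DEMO_ESB_QUEUED:  return DEMO_ESB_PENDING;
    case DEMO_ESB_OFF:     return DEMO_ESB_OFF;
    default:               return DEMO_ESB_RESET;
    }
}
```

A source in the OFF state ignores both triggers and EOIs, which is how
an interrupt is masked at the source level.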

The MMIO space for the ESB translation is 512GB large on baremetal
(powernv) systems and the BAR depends on the chip id. In our model for
the sPAPR machine, we choose to only map a sub memory region for the
provisioned IRQ numbers and to use the mapping address of chip 0 on a
real system. The OS will get the address of the MMIO page of the ESB
entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 268 +++++++++++++++++++++++++++++++++++++++++++-
 include/hw/ppc/spapr_xive.h |   8 ++
 2 files changed, 275 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 66c533fb1d78..f45f50fd017e 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -32,6 +32,216 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
 }
 
 /*
+ * "magic" Event State Buffer (ESB) MMIO offsets.
+ *
+ * Each interrupt source has a 2-bit state machine called ESB
+ * which can be controlled by MMIO. It's made of 2 bits, P and
+ * Q. P indicates that an interrupt is pending (has been sent
+ * to a queue and is waiting for an EOI). Q indicates that the
+ * interrupt has been triggered while pending.
+ *
+ * This acts as a coalescing mechanism in order to guarantee
+ * that a given interrupt only occurs at most once in a queue.
+ *
+ * When doing an EOI, the Q bit will indicate if the interrupt
+ * needs to be re-triggered.
+ *
+ * The following offsets into the ESB MMIO allow reading or
+ * manipulating the PQ bits. They must be used with an 8-byte
+ * load instruction. They all return the previous state of the
+ * interrupt (atomically).
+ *
+ * Additionally, some ESB pages support doing an EOI via a
+ * store at 0 and some ESBs support doing a trigger via a
+ * separate trigger page.
+ */
+#define XIVE_ESB_GET            0x800
+#define XIVE_ESB_SET_PQ_00      0xc00
+#define XIVE_ESB_SET_PQ_01      0xd00
+#define XIVE_ESB_SET_PQ_10      0xe00
+#define XIVE_ESB_SET_PQ_11      0xf00
+
+#define XIVE_ESB_VAL_P          0x2
+#define XIVE_ESB_VAL_Q          0x1
+
+#define XIVE_ESB_RESET          0x0
+#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
+#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
+#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
+
+static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t lisn)
+{
+    uint32_t byte = lisn / 4;
+    uint32_t bit  = (lisn % 4) * 2;
+
+    assert(byte < xive->sbe_size);
+
+    return (xive->sbe[byte] >> bit) & 0x3;
+}
+
+static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t lisn, uint8_t pq)
+{
+    uint32_t byte = lisn / 4;
+    uint32_t bit  = (lisn % 4) * 2;
+    uint8_t old, new;
+
+    assert(byte < xive->sbe_size);
+
+    old = xive->sbe[byte];
+
+    new = xive->sbe[byte] & ~(0x3 << bit);
+    new |= (pq & 0x3) << bit;
+
+    xive->sbe[byte] = new;
+
+    return (old >> bit) & 0x3;
+}
+
+static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t lisn)
+{
+    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
+        return false;
+    case XIVE_ESB_PENDING:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
+        return false;
+    case XIVE_ESB_QUEUED:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_OFF:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t lisn)
+{
+    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_PENDING:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
+        return true;
+    case XIVE_ESB_QUEUED:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
+        return true;
+    case XIVE_ESB_OFF:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+/*
+ * XIVE Interrupt Source MMIOs
+ */
+static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t lisn)
+{
+    if (spapr_xive_irq_is_lsi(xive, lisn)) {
+        xive->status[lisn] &= ~XIVE_STATUS_SENT;
+    }
+}
+
+/* TODO: handle second page
+ *
+ * Some HW use a separate page for trigger. We only support the case
+ * in which the trigger can be done in the same page as the EOI.
+ */
+static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
+{
+    sPAPRXive *xive = SPAPR_XIVE(opaque);
+    uint32_t offset = addr & 0xF00;
+    uint32_t lisn = addr >> xive->esb_shift;
+    XiveIVE *ive;
+    uint64_t ret = -1;
+
+    ive = spapr_xive_get_ive(xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID))  {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
+        goto out;
+    }
+
+    switch (offset) {
+    case 0:
+        spapr_xive_source_eoi(xive, lisn);
+
+        /* return TRUE or FALSE depending on PQ value */
+        ret = spapr_xive_pq_eoi(xive, lisn);
+        break;
+
+    case XIVE_ESB_GET:
+        ret = spapr_xive_pq_get(xive, lisn);
+        break;
+
+    case XIVE_ESB_SET_PQ_00:
+    case XIVE_ESB_SET_PQ_01:
+    case XIVE_ESB_SET_PQ_10:
+    case XIVE_ESB_SET_PQ_11:
+        ret = spapr_xive_pq_set(xive, lisn, (offset >> 8) & 0x3);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
+    }
+
+out:
+    return ret;
+}
+
+static void spapr_xive_esb_write(void *opaque, hwaddr addr,
+                           uint64_t value, unsigned size)
+{
+    sPAPRXive *xive = SPAPR_XIVE(opaque);
+    uint32_t offset = addr & 0xF00;
+    uint32_t lisn = addr >> xive->esb_shift;
+    XiveIVE *ive;
+    bool notify = false;
+
+    ive = spapr_xive_get_ive(xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID))  {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
+        return;
+    }
+
+    switch (offset) {
+    case 0:
+        /* TODO: should we trigger even if the IVE is masked ? */
+        notify = spapr_xive_pq_trigger(xive, lisn);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
+                      offset);
+        return;
+    }
+
+    if (notify && !(ive->w & IVE_MASKED)) {
+        qemu_irq_pulse(xive->qirqs[lisn]);
+    }
+}
+
+static const MemoryRegionOps spapr_xive_esb_ops = {
+    .read = spapr_xive_esb_read,
+    .write = spapr_xive_esb_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+};
+
+/*
  * XIVE Interrupt Source
  */
 static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int lisn, int val)
@@ -70,6 +280,33 @@ static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
 /*
  * Main XIVE object
  */
+#define P9_MMIO_BASE     0x006000000000000ull
+
+/* VC BAR contains set translations for the ESBs and the EQs. */
+#define VC_BAR_DEFAULT   0x10000000000ull
+#define VC_BAR_SIZE      0x08000000000ull
+#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
+
+static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
+                                            unsigned size)
+{
+    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
+                  __func__, offset, size);
+    return 0;
+}
+
+static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,
+                                         uint64_t value, unsigned size)
+{
+    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
+                  __func__, offset, value, size);
+}
+
+static const MemoryRegionOps spapr_xive_esb_default_ops = {
+    .read = spapr_xive_esb_default_read,
+    .write = spapr_xive_esb_default_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
 
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
 {
@@ -77,14 +314,19 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
 
     for (i = 0; i < xive->nr_irqs; i++) {
         XiveIVE *ive = &xive->ivt[i];
+        uint8_t pq;
 
         if (!(ive->w & IVE_VALID)) {
             continue;
         }
 
-        monitor_printf(mon, "  %4x %s %s %08x %08x\n", i,
+        pq = spapr_xive_pq_get(xive, i);
+
+        monitor_printf(mon, "  %4x %s %s %c%c %08x %08x\n", i,
                        spapr_xive_irq_is_lsi(xive, i) ? "LSI" : "MSI",
                        ive->w & IVE_MASKED ? "M" : " ",
+                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
+                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
                        (int) GETFIELD(IVE_EQ_INDEX, ive->w),
                        (int) GETFIELD(IVE_EQ_DATA, ive->w));
     }
@@ -104,6 +346,9 @@ void spapr_xive_reset(void *dev)
             ive->w |= IVE_MASKED;
         }
     }
+
+    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
+    memset(xive->sbe, 0x55, xive->sbe_size);
 }
 
 static void spapr_xive_realize(DeviceState *dev, Error **errp)
@@ -123,6 +368,26 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     /* Allocate the IVT (Interrupt Virtualization Table) */
     xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
 
+    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
+    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
+    xive->sbe = g_malloc0(xive->sbe_size);
+
+    /* VC BAR. That's the full window but we will only map the
+     * subregions in use. */
+    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
+    xive->esb_shift = ESB_SHIFT;
+
+    /* Install default memory region handlers to log bogus access */
+    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,
+                          NULL, "xive.esb.full", VC_BAR_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);
+
+    /* Install the ESB memory region in the overall one */
+    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,
+                          xive, "xive.esb",
+                          (1ull << xive->esb_shift) * xive->nr_irqs);
+    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
+
     qemu_register_reset(spapr_xive_reset, dev);
 }
 
@@ -152,6 +417,7 @@ static const VMStateDescription vmstate_spapr_xive = {
         VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
                                            vmstate_spapr_xive_ive, XiveIVE),
         VMSTATE_VBUFFER_UINT32(status, sPAPRXive, 1, NULL, nr_irqs),
+        VMSTATE_VBUFFER_UINT32(sbe, sPAPRXive, 1, NULL, sbe_size),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 6a799cdaba66..84c910e62e56 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -42,6 +42,14 @@ struct sPAPRXive {
 
     /* XIVE internal tables */
     XiveIVE      *ivt;
+    uint8_t      *sbe;
+    uint32_t     sbe_size;
+
+    /* ESB memory region */
+    uint32_t     esb_shift;
+    hwaddr       esb_base;
+    MemoryRegion esb_mr;
+    MemoryRegion esb_iomem;
 };
 
 static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
-- 
2.13.6


* [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (9 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the " Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-28  6:40   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model Cédric Le Goater
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The XIVE interrupt sources can have different characteristics depending
on their nature and the HW level in use. The sPAPR specs provide a set of
flags to describe them :

 - XIVE_SRC_H_INT_ESB  the Event State Buffers are controlled with a
                       specific hcall H_INT_ESB and not with MMIO
 - XIVE_SRC_LSI        LSI or MSI source (ICSIRQState level)
 - XIVE_SRC_TRIGGER    the full function page supports trigger
 - XIVE_SRC_STORE_EOI  EOI can be done with a store.

Our QEMU emulation of XIVE for the sPAPR machine gathers all sources under
the same model and provides a common source with the XIVE_SRC_TRIGGER type.
The above list is therefore mostly informative, apart from the XIVE_SRC_LSI
flag, which is deduced from the XIVE_STATUS_LSI flag of each source.

The OS retrieves this information on the source with the
H_INT_GET_SOURCE_INFO hcall.
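
As an illustration only, the per-source flags returned to the OS could
be assembled as sketched below. The flag values are the ones defined by
this patch; the helper xive_source_flags() and its 'is_lsi' parameter
are hypothetical names used for the example and are not part of the
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values from this patch (IBM bit numbering: bit 63 is the LSB) */
#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))  /* 0x8 */
#define XIVE_SRC_LSI           (1ull << (63 - 61))  /* 0x4 */
#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))  /* 0x2 */
#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))  /* 0x1 */

/*
 * Hypothetical helper: combine the flags shared by all sources with
 * the LSI flag, which is the only per-source characteristic.
 */
static uint64_t xive_source_flags(uint64_t common_flags, bool is_lsi)
{
    uint64_t flags = common_flags;

    /* The LSI bit is per source and must not be part of the common set */
    assert((common_flags & XIVE_SRC_LSI) == 0);

    if (is_lsi) {
        flags |= XIVE_SRC_LSI;
    }
    return flags;
}
```

With the common flags set to XIVE_SRC_TRIGGER, as done in this patch, an
MSI source would report 0x2 and an LSI source 0x6.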

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 4 ++++
 include/hw/ppc/spapr_xive.h | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index f45f50fd017e..b1e3f8710cff 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -368,6 +368,10 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     /* Allocate the IVT (Interrupt Virtualization Table) */
     xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
 
+    /* All sources are emulated under the XIVE object and share the
+     * same characteristics */
+    xive->flags = XIVE_SRC_TRIGGER;
+
     /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
     xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
     xive->sbe = g_malloc0(xive->sbe_size);
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 84c910e62e56..7a308fb4db2b 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -40,6 +40,13 @@ struct sPAPRXive {
 #define XIVE_STATUS_SENT               0x4
     uint8_t      *status;
 
+    /* Interrupt source flags */
+#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
+#define XIVE_SRC_LSI           (1ull << (63 - 61))
+#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
+#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
+    uint32_t     flags;
+
     /* XIVE internal tables */
     XiveIVE      *ivt;
     uint8_t      *sbe;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (10 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-29  5:11   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues Cédric Le Goater
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The XIVE interrupt presenter exposes a set of rings, also called
Thread Interrupt Management Areas (TIMA), to handle priority
management and interrupt acknowledgment among other things. There is
one ring per level of privilege, four in all. The one we are
interested in for the sPAPR machine is the OS ring.

The TIMA is mapped at the same address for each CPU. 'current_cpu' is
used to retrieve the targeted interrupt presenter object holding the
cached data of the registers the model uses.
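
A minimal sketch of the layout described above, assuming four rings of
0x10 bytes each with the OS ring at offset 0x10, as in the definitions
added by this patch. The TimaSketch type and the tima_sketch_*()
helpers are hypothetical stand-ins for sPAPRXiveICP and are only meant
to show how the OS ring is addressed and how the CPPR is clamped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TM_RING_COUNT      4      /* one ring per privilege level */
#define TM_QW1_OS          0x010  /* offset of the OS ring in the TIMA */
#define TM_CPPR            0x1    /* CPPR byte offset inside a ring */
#define XIVE_PRIORITY_MAX  7

/* Hypothetical stand-in for the per-CPU presenter state */
typedef struct {
    uint8_t  tima[TM_RING_COUNT * 0x10];
    uint8_t *tima_os;             /* shortcut to the OS ring */
} TimaSketch;

static void tima_sketch_init(TimaSketch *t)
{
    memset(t->tima, 0, sizeof(t->tima));
    t->tima_os = &t->tima[TM_QW1_OS];
}

/* Same clamping as spapr_xive_icp_set_cppr() in this patch */
static void tima_sketch_set_cppr(TimaSketch *t, uint8_t cppr)
{
    assert(t->tima_os != NULL);

    if (cppr > XIVE_PRIORITY_MAX) {
        cppr = 0xff;
    }
    t->tima_os[TM_CPPR] = cppr;
}
```

Because every CPU sees the TIMA at the same address, the MMIO handlers
cannot use the address to find the presenter and rely on 'current_cpu'
instead.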

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 271 ++++++++++++++++++++++++++++++++++++++++++++
 hw/intc/xive-internal.h     |  89 +++++++++++++++
 include/hw/ppc/spapr_xive.h |  11 ++
 3 files changed, 371 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index b1e3f8710cff..554b25e0884c 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -23,9 +23,166 @@
 #include "sysemu/dma.h"
 #include "monitor/monitor.h"
 #include "hw/ppc/spapr_xive.h"
+#include "hw/ppc/xics.h"
 
 #include "xive-internal.h"
 
+struct sPAPRXiveICP {
+    DeviceState parent_obj;
+
+    CPUState  *cs;
+    uint8_t   tima[TM_RING_COUNT * 0x10];
+    uint8_t   *tima_os;
+    qemu_irq  output;
+};
+
+static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
+{
+    return 0;
+}
+
+static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
+{
+    if (cppr > XIVE_PRIORITY_MAX) {
+        cppr = 0xff;
+    }
+
+    icp->tima_os[TM_CPPR] = cppr;
+}
+
+/*
+ * Thread Interrupt Management Area MMIO
+ */
+static uint64_t spapr_xive_tm_read_special(sPAPRXiveICP *icp, hwaddr offset,
+                                     unsigned size)
+{
+    uint64_t ret = -1;
+
+    if (offset == TM_SPC_ACK_OS_REG && size == 2) {
+        ret = spapr_xive_icp_accept(icp);
+    } else {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
+                      HWADDR_PRIx" size %d\n", offset, size);
+    }
+
+    return ret;
+}
+
+static uint64_t spapr_xive_tm_read(void *opaque, hwaddr offset, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    sPAPRXiveICP *icp = SPAPR_XIVE_ICP(cpu->intc);
+    uint64_t ret = -1;
+    int i;
+
+    if (offset >= TM_SPC_ACK_EBB) {
+        return spapr_xive_tm_read_special(icp, offset, size);
+    }
+
+    if ((offset & 0xf0) == TM_QW1_OS) {
+        switch (size) {
+        case 1:
+        case 2:
+        case 4:
+        case 8:
+            if (QEMU_IS_ALIGNED(offset, size)) {
+                ret = 0;
+                for (i = 0; i < size; i++) {
+                    ret |= (uint64_t) icp->tima[offset + i] << (8 * i);
+                }
+            } else {
+                qemu_log_mask(LOG_GUEST_ERROR,
+                              "XIVE: invalid TIMA read alignment @%"
+                              HWADDR_PRIx" size %d\n", offset, size);
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: does not handle non-OS TIMA ring @%"
+                      HWADDR_PRIx"\n", offset);
+    }
+
+    return ret;
+}
+
+static bool spapr_xive_tm_is_readonly(uint8_t offset)
+{
+    /* Let's be optimistic and prepare ground for HV mode support */
+    switch (offset) {
+    case TM_QW1_OS + TM_CPPR:
+        return false;
+    default:
+        return true;
+    }
+}
+
+static void spapr_xive_tm_write_special(sPAPRXiveICP *icp, hwaddr offset,
+                                  uint64_t value, unsigned size)
+{
+    /* TODO: support TM_SPC_SET_OS_PENDING */
+
+    /* TODO: support TM_SPC_ACK_OS_EL */
+}
+
+static void spapr_xive_tm_write(void *opaque, hwaddr offset,
+                           uint64_t value, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    sPAPRXiveICP *icp = SPAPR_XIVE_ICP(cpu->intc);
+    int i;
+
+    if (offset >= TM_SPC_ACK_EBB) {
+        spapr_xive_tm_write_special(icp, offset, value, size);
+        return;
+    }
+
+    if ((offset & 0xf0) == TM_QW1_OS) {
+        switch (size) {
+        case 1:
+            if (offset == TM_QW1_OS + TM_CPPR) {
+                spapr_xive_icp_set_cppr(icp, value & 0xff);
+            }
+            break;
+        case 4:
+        case 8:
+            if (QEMU_IS_ALIGNED(offset, size)) {
+                for (i = 0; i < size; i++) {
+                    if (!spapr_xive_tm_is_readonly(offset + i)) {
+                        icp->tima[offset + i] = (value >> (8 * i)) & 0xff;
+                    }
+                }
+            } else {
+                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
+                              HWADDR_PRIx" size %d\n", offset, size);
+            }
+            break;
+        default:
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
+                          HWADDR_PRIx" size %d\n", offset, size);
+        }
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: does not handle non-OS TIMA ring @%"
+                      HWADDR_PRIx"\n", offset);
+    }
+}
+
+
+static const MemoryRegionOps spapr_xive_tm_ops = {
+    .read = spapr_xive_tm_read,
+    .write = spapr_xive_tm_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+};
+
 static void spapr_xive_irq(sPAPRXive *xive, int lisn)
 {
 
@@ -287,6 +444,11 @@ static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
 #define VC_BAR_SIZE      0x08000000000ull
 #define ESB_SHIFT        16 /* One 64k page. OPAL has two */
 
+/* Thread Interrupt Management Area MMIO */
+#define TM_BAR_DEFAULT   0x30203180000ull
+#define TM_SHIFT         16
+#define TM_BAR_SIZE      (TM_RING_COUNT * (1 << TM_SHIFT))
+
 static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
                                             unsigned size)
 {
@@ -392,6 +554,14 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
                           (1ull << xive->esb_shift) * xive->nr_irqs);
     memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
 
+    /* TM BAR. Same address for each chip */
+    xive->tm_base = (P9_MMIO_BASE | TM_BAR_DEFAULT);
+    xive->tm_shift = TM_SHIFT;
+
+    memory_region_init_io(&xive->tm_iomem, OBJECT(xive), &spapr_xive_tm_ops,
+                          xive, "xive.tm", TM_BAR_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_iomem);
+
     qemu_register_reset(spapr_xive_reset, dev);
 }
 
@@ -448,9 +618,110 @@ static const TypeInfo spapr_xive_info = {
     .class_init = spapr_xive_class_init,
 };
 
+void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon)
+{
+    int cpu_index = xicp->cs ? xicp->cs->cpu_index : -1;
+
+    monitor_printf(mon, "CPU %d CPPR=%02x IPB=%02x PIPR=%02x NSR=%02x\n",
+                   cpu_index, xicp->tima_os[TM_CPPR], xicp->tima_os[TM_IPB],
+                   xicp->tima_os[TM_PIPR], xicp->tima_os[TM_NSR]);
+}
+
+static void spapr_xive_icp_reset(void *dev)
+{
+    sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
+
+    memset(xicp->tima, 0, sizeof(xicp->tima));
+}
+
+static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
+{
+    sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
+    PowerPCCPU *cpu;
+    CPUPPCState *env;
+    Object *obj;
+    Error *err = NULL;
+
+    obj = object_property_get_link(OBJECT(dev), ICP_PROP_CPU, &err);
+    if (!obj) {
+        error_propagate(errp, err);
+        error_prepend(errp, "required link '" ICP_PROP_CPU "' not found: ");
+        return;
+    }
+
+    cpu = POWERPC_CPU(obj);
+    xicp->cs = CPU(obj);
+
+    env = &cpu->env;
+    switch (PPC_INPUT(env)) {
+    case PPC_FLAGS_INPUT_POWER7:
+        xicp->output = env->irq_inputs[POWER7_INPUT_INT];
+        break;
+
+    case PPC_FLAGS_INPUT_970:
+        xicp->output = env->irq_inputs[PPC970_INPUT_INT];
+        break;
+
+    default:
+        error_setg(errp, "XIVE interrupt controller does not support "
+                   "this CPU bus model");
+        return;
+    }
+
+    qemu_register_reset(spapr_xive_icp_reset, dev);
+}
+
+static void spapr_xive_icp_unrealize(DeviceState *dev, Error **errp)
+{
+    qemu_unregister_reset(spapr_xive_icp_reset, dev);
+}
+
+static void spapr_xive_icp_init(Object *obj)
+{
+    sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(obj);
+
+    xicp->tima_os = &xicp->tima[TM_QW1_OS];
+}
+
+static bool vmstate_spapr_xive_icp_needed(void *opaque)
+{
+    /* TODO check machine XIVE support */
+    return true;
+}
+
+static const VMStateDescription vmstate_spapr_xive_icp = {
+    .name = TYPE_SPAPR_XIVE_ICP,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = vmstate_spapr_xive_icp_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_BUFFER(tima, sPAPRXiveICP),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static void spapr_xive_icp_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = spapr_xive_icp_realize;
+    dc->unrealize = spapr_xive_icp_unrealize;
+    dc->desc = "sPAPR XIVE Interrupt Presenter";
+    dc->vmsd = &vmstate_spapr_xive_icp;
+}
+
+static const TypeInfo xive_icp_info = {
+    .name          = TYPE_SPAPR_XIVE_ICP,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(sPAPRXiveICP),
+    .instance_init = spapr_xive_icp_init,
+    .class_init    = spapr_xive_icp_class_init,
+};
+
 static void spapr_xive_register_types(void)
 {
     type_register_static(&spapr_xive_info);
+    type_register_static(&xive_icp_info);
 }
 
 type_init(spapr_xive_register_types)
diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
index bea88d82992c..7d329f203a9b 100644
--- a/hw/intc/xive-internal.h
+++ b/hw/intc/xive-internal.h
@@ -24,6 +24,93 @@
 #define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
                                  PPC_BIT32(bs))
 
+/*
+ * Thread Management (aka "TM") registers
+ */
+
+/* Number of Thread Interrupt Management Areas */
+#define TM_RING_COUNT 4
+
+/* TM register offsets */
+#define TM_QW0_USER             0x000 /* All rings */
+#define TM_QW1_OS               0x010 /* Ring 0..2 */
+#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
+#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
+
+/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
+#define TM_NSR                  0x0  /*  +   +   -   +  */
+#define TM_CPPR                 0x1  /*  -   +   -   +  */
+#define TM_IPB                  0x2  /*  -   +   +   +  */
+#define TM_LSMFB                0x3  /*  -   +   +   +  */
+#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
+#define TM_INC                  0x5  /*  -   +   -   +  */
+#define TM_AGE                  0x6  /*  -   +   -   +  */
+#define TM_PIPR                 0x7  /*  -   +   -   +  */
+
+#define TM_WORD0                0x0
+#define TM_WORD1                0x4
+
+/*
+ * QW word 2 contains the valid bit at the top and other fields
+ * depending on the QW.
+ */
+#define TM_WORD2                0x8
+#define   TM_QW0W2_VU           PPC_BIT32(0)
+#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
+#define   TM_QW1W2_VO           PPC_BIT32(0)
+#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
+#define   TM_QW2W2_VP           PPC_BIT32(0)
+#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
+#define   TM_QW3W2_VT           PPC_BIT32(0)
+#define   TM_QW3W2_LP           PPC_BIT32(6)
+#define   TM_QW3W2_LE           PPC_BIT32(7)
+#define   TM_QW3W2_T            PPC_BIT32(31)
+
+/*
+ * In addition to normal loads to "peek" and writes (only when invalid)
+ * using 4 and 8 bytes accesses, the above registers support these
+ * "special" byte operations:
+ *
+ *   - Byte load from QW0[NSR] - User level NSR (EBB)
+ *   - Byte store to QW0[NSR] - User level NSR (EBB)
+ *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
+ *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
+ *                                    otherwise VT||0000000
+ *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
+ *
+ * Then we have all these "special" CI ops at these offset that trigger
+ * all sorts of side effects:
+ */
+#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
+#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
+#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
+#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
+                                         * context */
+#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
+#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
+                                         * context to reg */
+#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
+                                         * context to reg*/
+#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
+#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
+                                         * line */
+#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
+#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
+                                         * line */
+#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
+/* XXX more... */
+
+/* NSR fields for the various QW ack types */
+#define TM_QW0_NSR_EB           PPC_BIT8(0)
+#define TM_QW1_NSR_EO           PPC_BIT8(0)
+#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
+#define  TM_QW3_NSR_HE_NONE     0
+#define  TM_QW3_NSR_HE_POOL     1
+#define  TM_QW3_NSR_HE_PHYS     2
+#define  TM_QW3_NSR_HE_LSI      3
+#define TM_QW3_NSR_I            PPC_BIT8(2)
+#define TM_QW3_NSR_GRP_LVL      PPC_BITMASK8(3, 7)
+
 /* IVE/EAS
  *
  * One per interrupt source. Targets that interrupt to a given EQ
@@ -44,6 +131,8 @@ typedef struct XiveIVE {
 #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
 } XiveIVE;
 
+#define XIVE_PRIORITY_MAX  7
+
 void spapr_xive_reset(void *dev);
 XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
 
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 7a308fb4db2b..6e8a189e723f 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -23,10 +23,15 @@
 
 typedef struct sPAPRXive sPAPRXive;
 typedef struct XiveIVE XiveIVE;
+typedef struct sPAPRXiveICP sPAPRXiveICP;
 
 #define TYPE_SPAPR_XIVE "spapr-xive"
 #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
 
+#define TYPE_SPAPR_XIVE_ICP "spapr-xive-icp"
+#define SPAPR_XIVE_ICP(obj) \
+    OBJECT_CHECK(sPAPRXiveICP, (obj), TYPE_SPAPR_XIVE_ICP)
+
 struct sPAPRXive {
     SysBusDevice parent;
 
@@ -57,6 +62,11 @@ struct sPAPRXive {
     hwaddr       esb_base;
     MemoryRegion esb_mr;
     MemoryRegion esb_iomem;
+
+    /* TIMA memory region */
+    uint32_t     tm_shift;
+    hwaddr       tm_base;
+    MemoryRegion tm_iomem;
 };
 
 static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
@@ -67,5 +77,6 @@ static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
 bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn, bool lsi);
 bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
+void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
 
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (11 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-23 20:31   ` Benjamin Herrenschmidt
  2017-11-30  4:38   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue Cédric Le Goater
                   ` (11 subsequent siblings)
  24 siblings, 2 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The Event Queue Descriptor (EQD) table, also known as Event Notification
Descriptor (END), is one of the internal tables the XIVE interrupt
controller uses to redirect exceptions from event sources to CPU
threads.

The EQD specifies on which Event Queue the event data should be posted
when an exception occurs (later on pulled by the OS) and which server
(VPD in XIVE terminology) to notify. The Event Queue is a much more
complex structure but we start with a simple model for the sPAPR
machine.

There is one XiveEQ per priority and the model chooses to store them
under the XIVE interrupt presenter model. They are retrieved, just
like for XICS, through the 'intc' object pointer of the CPU.

The EQ indexing follows a simple pattern:

       (server << 3) | (priority & 0x7)
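
The indexing can be sketched as a pair of encode/decode helpers. The
names eq_index(), eq_index_server() and eq_index_priority() are
hypothetical and only illustrate the bit layout; the series implements
the same arithmetic in spapr_xive_eq_for_server() and
spapr_xive_get_eq().

```c
#include <assert.h>
#include <stdint.h>

#define XIVE_PRIORITY_MAX 7

/* Encode: the low 3 bits hold the priority, the rest the server */
static uint32_t eq_index(uint32_t server, uint8_t priority)
{
    assert(priority <= XIVE_PRIORITY_MAX);
    return (server << 3) | (priority & 0x7);
}

/* Decode the server number from an EQ index */
static uint32_t eq_index_server(uint32_t eq_idx)
{
    return eq_idx >> 3;
}

/* Decode the priority from an EQ index */
static uint8_t eq_index_priority(uint32_t eq_idx)
{
    return eq_idx & 0x7;
}
```

For instance, server 2 with priority 5 gives EQ index 21, which decodes
back to the same server and priority.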

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/intc/xive-internal.h | 50 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 554b25e0884c..983317a6b3f6 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -23,6 +23,7 @@
 #include "sysemu/dma.h"
 #include "monitor/monitor.h"
 #include "hw/ppc/spapr_xive.h"
+#include "hw/ppc/spapr.h"
 #include "hw/ppc/xics.h"
 
 #include "xive-internal.h"
@@ -34,6 +35,8 @@ struct sPAPRXiveICP {
     uint8_t   tima[TM_RING_COUNT * 0x10];
     uint8_t   *tima_os;
     qemu_irq  output;
+
+    XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
 };
 
 static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
@@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
     },
 };
 
+static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
+{
+    PowerPCCPU *cpu = spapr_find_cpu(server);
+
+    return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
+}
+
 static void spapr_xive_irq(sPAPRXive *xive, int lisn)
 {
 
@@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
     sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
 
     memset(xicp->tima, 0, sizeof(xicp->tima));
+
+    memset(xicp->eqt, 0, sizeof(xicp->eqt));
 }
 
 static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
@@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
     xicp->tima_os = &xicp->tima[TM_QW1_OS];
 }
 
+static const VMStateDescription vmstate_spapr_xive_icp_eq = {
+    .name = TYPE_SPAPR_XIVE_ICP "/eq",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT32(w0, XiveEQ),
+        VMSTATE_UINT32(w1, XiveEQ),
+        VMSTATE_UINT32(w2, XiveEQ),
+        VMSTATE_UINT32(w3, XiveEQ),
+        VMSTATE_UINT32(w4, XiveEQ),
+        VMSTATE_UINT32(w5, XiveEQ),
+        VMSTATE_UINT32(w6, XiveEQ),
+        VMSTATE_UINT32(w7, XiveEQ),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static bool vmstate_spapr_xive_icp_needed(void *opaque)
 {
     /* TODO check machine XIVE support */
@@ -696,6 +725,8 @@ static const VMStateDescription vmstate_spapr_xive_icp = {
     .needed = vmstate_spapr_xive_icp_needed,
     .fields = (VMStateField[]) {
         VMSTATE_BUFFER(tima, sPAPRXiveICP),
+        VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 1), 1,
+                             vmstate_spapr_xive_icp_eq, XiveEQ),
         VMSTATE_END_OF_LIST()
     },
 };
@@ -755,3 +786,28 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
     ive->w &= ~IVE_VALID;
     return true;
 }
+
+/*
+ * Use a simple indexing for the EQs.
+ */
+XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t eq_idx)
+{
+    int priority = eq_idx & 0x7;
+    sPAPRXiveICP *xicp = spapr_xive_icp_get(xive, eq_idx >> 3);
+
+    return xicp ? &xicp->eqt[priority] : NULL;
+}
+
+bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server,
+                              uint8_t priority, uint32_t *out_eq_idx)
+{
+    if (priority > XIVE_PRIORITY_MAX) {
+        return false;
+    }
+
+    if (out_eq_idx) {
+        *out_eq_idx = (server << 3) | (priority & 0x7);
+    }
+
+    return true;
+}
diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
index 7d329f203a9b..c3949671aa03 100644
--- a/hw/intc/xive-internal.h
+++ b/hw/intc/xive-internal.h
@@ -131,9 +131,59 @@ typedef struct XiveIVE {
 #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
 } XiveIVE;
 
+/* EQ */
+typedef struct XiveEQ {
+        uint32_t        w0;
+#define EQ_W0_VALID             PPC_BIT32(0)
+#define EQ_W0_ENQUEUE           PPC_BIT32(1)
+#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
+#define EQ_W0_BACKLOG           PPC_BIT32(3)
+#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
+#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
+#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
+#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
+#define EQ_W0_SW0               PPC_BIT32(16)
+#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
+#define EQ_QSIZE_4K             0
+#define EQ_QSIZE_64K            4
+#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
+        uint32_t        w1;
+#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
+#define EQ_W1_ESn_P             PPC_BIT32(0)
+#define EQ_W1_ESn_Q             PPC_BIT32(1)
+#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
+#define EQ_W1_ESe_P             PPC_BIT32(2)
+#define EQ_W1_ESe_Q             PPC_BIT32(3)
+#define EQ_W1_GENERATION        PPC_BIT32(9)
+#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
+        uint32_t        w2;
+#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
+#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
+        uint32_t        w3;
+#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
+        uint32_t        w4;
+#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
+#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
+        uint32_t        w5;
+#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
+        uint32_t        w6;
+#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
+#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
+#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
+        uint32_t        w7;
+#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
+#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
+#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
+#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
+#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
+} XiveEQ;
+
 #define XIVE_PRIORITY_MAX  7
 
 void spapr_xive_reset(void *dev);
 XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
+XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
+bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server, uint8_t prio,
+                              uint32_t *out_eq_idx);
 
 #endif /* _INTC_XIVE_INTERNAL_H */
-- 
2.13.6


* [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (12 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-30  4:49   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged Cédric Le Goater
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

If a triggered event is let through, the Event Queue data defined in the
associated IVE is pushed to the in-memory event queue. The latter is a
circular buffer provided by the OS using the H_INT_SET_QUEUE_CONFIG hcall,
one per (server, priority) pair. It is composed of Event Queue entries
which are 4 bytes long: the first bit is a 'generation' bit and the
following 31 bits hold the EQ Data field.

The EQ Data field provides a way to set an invariant logical event source
number for an IRQ. It is set with the H_INT_SET_SOURCE_CONFIG hcall.
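
The entry format and the wrap-around behaviour described above can be
sketched as follows. This is a simplified stand-alone model, not the
code of this patch: the EventQueue structure and eq_push() are
hypothetical, entries are kept in host byte order (the model writes
them big-endian into guest memory), and the initial generation value is
left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified queue state; this is not the XiveEQ layout. */
typedef struct {
    uint32_t *entries;   /* ring of 4-byte entries */
    uint32_t  nentries;  /* number of entries, a power of two */
    uint32_t  index;     /* next slot to write */
    uint32_t  gen;       /* generation bit of the current pass: 0 or 1 */
} EventQueue;

/* Write one entry: generation bit on top, 31 bits of EQ data below. */
static void eq_push(EventQueue *q, uint32_t data)
{
    assert(q->nentries && (q->nentries & (q->nentries - 1)) == 0);

    q->entries[q->index] = (q->gen << 31) | (data & 0x7fffffffu);

    /* Advance, and flip the generation bit each time the ring wraps. */
    q->index = (q->index + 1) & (q->nentries - 1);
    if (q->index == 0) {
        q->gen ^= 1;
    }
}
```

Flipping the generation bit on each wrap lets a consumer tell a fresh
entry from a stale one left over from the previous pass without any
shared head/tail pointer.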

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 983317a6b3f6..df14c5a88275 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -193,9 +193,76 @@ static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
     return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
 }
 
+static void spapr_xive_eq_push(XiveEQ *eq, uint32_t data)
+{
+    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
+    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
+    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
+    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
+
+    uint64_t qaddr = qaddr_base + (qindex << 2);
+    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
+    uint32_t qentries = 1 << (qsize + 10);
+
+    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
+                      HWADDR_PRIx "\n", __func__, qaddr);
+        return;
+    }
+
+    qindex = (qindex + 1) % qentries;
+    if (qindex == 0) {
+        qgen ^= 1;
+        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
+    }
+    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
+}
+
 static void spapr_xive_irq(sPAPRXive *xive, int lisn)
 {
+    XiveIVE *ive;
+    XiveEQ *eq;
+    uint32_t eq_idx;
+    uint8_t priority;
+
+    ive = spapr_xive_get_ive(xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
+        return;
+    }
 
+    if (ive->w & IVE_MASKED) {
+        return;
+    }
+
+    /* Find our XiveEQ */
+    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
+    eq = spapr_xive_get_eq(xive, eq_idx);
+    if (!eq) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No EQ for LISN %d\n", lisn);
+        return;
+    }
+
+    if (eq->w0 & EQ_W0_ENQUEUE) {
+        spapr_xive_eq_push(eq, GETFIELD(IVE_EQ_DATA, ive->w));
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: !ENQUEUE not implemented\n");
+    }
+
+    if (!(eq->w0 & EQ_W0_UCOND_NOTIFY)) {
+        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
+    }
+
+    if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
+        priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
+
+        /* The EQ is masked. Can this happen ?  */
+        if (priority == 0xff) {
+            g_assert_not_reached();
+        }
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
+    }
 }
 
 /*
-- 
2.13.6


* [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (13 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-30  5:00   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 16/25] spapr: add support for the SET_OS_PENDING command (XIVE) Cédric Le Goater
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The Pending Interrupt Priority Register (PIPR) contains the priority
of the most favored pending notification. It is calculated from the
Interrupt Pending Buffer (IPB) which indicates a pending interrupt at
the priority corresponding to the bit number.

If the PIPR is more favored (1) than the Current Processor Priority
Register (CPPR), the CPU interrupt line is raised and the EO bit of
the Notification Source Register is updated to notify the presence of
an exception for the O/S. The check needs to be done whenever the PIPR
or the CPPR is changed.

Then, the O/S Exception is raised and the O/S acknowledges the
interrupt with a special read in the TIMA. If the EO bit of the
Notification Source Register (NSR) is set (and it should be), the
Current Processor Priority Register (CPPR) takes the value of the
Pending Interrupt Priority Register (PIPR). The bit in the Interrupt
Pending Buffer (IPB) corresponding to the priority of the pending
interrupt is reset and so is the EO bit of the NSR.

(1) numerically less than
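
The register interplay described above can be modelled in isolation as
below. This is a sketch under assumptions: the OsRing structure and the
ring_post()/ring_accept() helpers are hypothetical stand-ins for the
TIMA fields, the EO bit is reduced to a boolean, and the PIPR is not
clamped to the CPPR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIORITY_MAX 7

/* Hypothetical stand-in for the O/S ring of the TIMA. */
typedef struct {
    uint8_t ipb;   /* one pending bit per priority, priority 0 leftmost */
    uint8_t pipr;  /* most favored pending priority, 0xff if none */
    uint8_t cppr;  /* current processor priority */
    bool    eo;    /* stands in for the EO bit of the NSR */
} OsRing;

/* Priority 0 maps to the leftmost bit (0x80), priority 7 to 0x01. */
static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > PRIORITY_MAX ?
        0 : (uint8_t)(1u << (PRIORITY_MAX - priority));
}

/* The leftmost set bit gives the most favored pending priority. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    uint8_t prio;

    for (prio = 0; prio <= PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/* Record a notification; flag an exception if it beats the CPPR. */
static void ring_post(OsRing *r, uint8_t priority)
{
    r->ipb |= priority_to_ipb(priority);
    r->pipr = ipb_to_pipr(r->ipb);
    if (r->pipr < r->cppr) {
        r->eo = true;
    }
}

/* Acknowledge: CPPR takes the PIPR, the IPB bit and EO are cleared. */
static void ring_accept(OsRing *r)
{
    if (!r->eo) {
        return;
    }
    assert(r->pipr <= PRIORITY_MAX);
    r->cppr = r->pipr;
    r->ipb &= (uint8_t)~priority_to_ipb(r->cppr);
    r->pipr = ipb_to_pipr(r->ipb);
    r->eo = false;
}
```

After an acknowledge, a less favored notification may still be pending
in the IPB; it only raises a new exception once the CPPR is set back to
a less favored value, which is why the check is repeated on every CPPR
update.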

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 76 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index df14c5a88275..fead9c7031f3 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -39,9 +39,63 @@ struct sPAPRXiveICP {
     XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
 };
 
+/* Convert a priority number to an Interrupt Pending Buffer (IPB)
+ * register, which indicates a pending interrupt at the priority
+ * corresponding to the bit number
+ */
+static uint8_t priority_to_ipb(uint8_t priority)
+{
+    return priority > XIVE_PRIORITY_MAX ?
+        0 : 1 << (XIVE_PRIORITY_MAX - priority);
+}
+
+/* Convert an Interrupt Pending Buffer (IPB) register to a Pending
+ * Interrupt Priority Register (PIPR), which contains the priority of
+ * the most favored pending notification.
+ *
+ * TODO:
+ *
+ *   PIPR is clamped to CPPR. So the value in the PIPR is:
+ *
+ *     v = leftmost_bit_of(ipb) (or 0xff);
+ *     pipr = v < cppr ? v : cppr;
+ *
+ * Ben says: "which means it's never actually 0xff ... surprise !".
+ * But, the CPPR can be set to 0xFF ... I am confused ...
+ */
+static uint8_t ipb_to_pipr(uint8_t ibp)
+{
+    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
+}
+
 static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
 {
-    return 0;
+    uint8_t nsr = icp->tima_os[TM_NSR];
+
+    qemu_irq_lower(icp->output);
+
+    if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
+        uint8_t cppr = icp->tima_os[TM_PIPR];
+
+        icp->tima_os[TM_CPPR] = cppr;
+
+        /* Reset the pending buffer bit */
+        icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
+        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
+
+        /* Drop Exception bit for OS */
+        icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
+    }
+
+    return (nsr << 8) | icp->tima_os[TM_CPPR];
+}
+
+static void spapr_xive_icp_notify(sPAPRXiveICP *icp)
+{
+    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
+        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
+        qemu_irq_raise(icp->output);
+    }
 }
 
 static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
@@ -51,6 +105,9 @@ static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
     }
 
     icp->tima_os[TM_CPPR] = cppr;
+
+    /* CPPR has changed, inform the ICP which might raise an exception */
+    spapr_xive_icp_notify(icp);
 }
 
 /*
@@ -224,6 +281,8 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
     XiveEQ *eq;
     uint32_t eq_idx;
     uint8_t priority;
+    uint32_t server;
+    sPAPRXiveICP *icp;
 
     ive = spapr_xive_get_ive(xive, lisn);
     if (!ive || !(ive->w & IVE_VALID)) {
@@ -253,6 +312,13 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
         qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
     }
 
+    server = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
+    icp = spapr_xive_icp_get(xive, server);
+    if (!icp) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for server %d\n", server);
+        return;
+    }
+
     if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
         priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
 
@@ -260,9 +326,18 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
         if (priority == 0xff) {
             g_assert_not_reached();
         }
+
+        /* Update the IPB (Interrupt Pending Buffer) with the priority
+         * of the new notification and inform the ICP, which will
+         * decide to raise the exception, or not, depending on the CPPR.
+         */
+        icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
+        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
     } else {
         qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
     }
+
+    spapr_xive_icp_notify(icp);
 }
 
 /*
-- 
2.13.6


* [Qemu-devel] [PATCH 16/25] spapr: add support for the SET_OS_PENDING command (XIVE)
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (14 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine Cédric Le Goater
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

Adjusting the Interrupt Pending Buffer for the O/S would allow a CPU
to process event queues of other priorities during one physical
interrupt cycle. This is not currently used by the XIVE support for
sPAPR in Linux but it is by the hypervisor.

From Ben:

  It's a way to avoid the SW replay on EOI.

  IE, assume you have 2 interrupts in the queue. You take the exception,
  ack the first one, process it etc... Then you EOI, the HW won't send
  a second notification. You need to look at the queue and continue
  consuming until it's empty.

  Today Linux checks the queue on EOI and uses a SW mechanism to
  synthesize a new pseudo-external interrupt.

  This MMIO command would allow the OS to instead set back the
  corresponding priority bit to 1 in the IPB and cause the HW to
  re-emit the interrupt instead of SW.

  Linux doesn't use this today because DD1 didn't support it for the
  HV level, but other OSes might and we also might use it when we do
  groups, thus allowing redistribution.
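
To illustrate the "continue consuming until it's empty" step, here is a
guest-side sketch of a queue reader. It is an assumption-laden model,
not Linux or QEMU code: the EventQueueReader structure and the
eq_pop()/eq_drain() helpers are hypothetical, and entries are read in
host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical consumer-side view of an event queue. */
typedef struct {
    const uint32_t *entries;  /* ring of 4-byte entries */
    uint32_t nentries;        /* number of entries, a power of two */
    uint32_t index;           /* next slot to read */
    uint32_t gen;             /* generation value marking a valid entry */
} EventQueueReader;

/* Return true and store the 31-bit data if an entry is pending. */
static bool eq_pop(EventQueueReader *q, uint32_t *data)
{
    uint32_t entry;

    assert(q->nentries && (q->nentries & (q->nentries - 1)) == 0);

    entry = q->entries[q->index];
    if ((entry >> 31) != q->gen) {
        return false;   /* stale entry from the previous pass: empty */
    }

    *data = entry & 0x7fffffffu;
    q->index = (q->index + 1) & (q->nentries - 1);
    if (q->index == 0) {
        q->gen ^= 1;    /* expect the other generation after a wrap */
    }
    return true;
}

/* Consume pending entries, up to max; return how many were read. */
static unsigned eq_drain(EventQueueReader *q, uint32_t *out, unsigned max)
{
    unsigned n = 0;

    while (n < max && eq_pop(q, &out[n])) {
        n++;
    }
    return n;
}
```

An OS that stops after the first entry has to replay the remaining ones
in software; setting the priority bit back in the IPB with
SET_OS_PENDING asks the presenter to signal them again instead.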

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index fead9c7031f3..b732aaf4f8ba 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -181,7 +181,14 @@ static bool spapr_xive_tm_is_readonly(uint8_t offset)
 static void spapr_xive_tm_write_special(sPAPRXiveICP *icp, hwaddr offset,
                                   uint64_t value, unsigned size)
 {
-    /* TODO: support TM_SPC_SET_OS_PENDING */
+    if (offset == TM_SPC_SET_OS_PENDING && size == 1) {
+        icp->tima_os[TM_IPB] |= priority_to_ipb(value & 0xff);
+        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
+        spapr_xive_icp_notify(icp);
+    } else {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
+                      HWADDR_PRIx" size %d\n", offset, size);
+    }
 
     /* TODO: support TM_SPC_ACK_OS_EL */
 }
-- 
2.13.6


* [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (15 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 16/25] spapr: add support for the SET_OS_PENDING command (XIVE) Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-30  5:55   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 18/25] spapr: allocate IRQ numbers for the XIVE interrupt mode Cédric Le Goater
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The XIVE object is designed to be always available, so it is created
unconditionally on newer machines. Depending on the configuration and
the guest capabilities, the CAS negotiation process will decide which
interrupt model to use, legacy or XIVE.

The XIVE model makes use of the full range of the IRQ number space
because the IRQ numbers for the CPU IPIs are allocated in the range
below XICS_IRQ_BASE, which is unused by XICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c         | 34 ++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |  2 ++
 2 files changed, 36 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5d3325ca3c88..0e0107c8272c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -56,6 +56,7 @@
 #include "hw/ppc/spapr_vio.h"
 #include "hw/pci-host/spapr.h"
 #include "hw/ppc/xics.h"
+#include "hw/ppc/spapr_xive.h"
 #include "hw/pci/msi.h"
 
 #include "hw/pci/pci.h"
@@ -204,6 +205,29 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
     }
 }
 
+static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
+                                    Error **errp)
+{
+    Error *local_err = NULL;
+    Object *obj;
+
+    obj = object_new(TYPE_SPAPR_XIVE);
+    object_property_add_child(OBJECT(spapr), "xive", obj, &error_abort);
+    object_property_set_int(obj, nr_irqs, "nr-irqs",  &local_err);
+    if (local_err) {
+        goto error;
+    }
+    object_property_set_bool(obj, true, "realized", &local_err);
+    if (local_err) {
+        goto error;
+    }
+
+    return SPAPR_XIVE(obj);
+error:
+    error_propagate(errp, local_err);
+    return NULL;
+}
+
 static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
                                   int smt_threads)
 {
@@ -2360,6 +2384,16 @@ static void ppc_spapr_init(MachineState *machine)
     /* Set up Interrupt Controller before we create the VCPUs */
     xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal);
 
+    /* We don't have KVM support yet, so check for irqchip=on */
+    if (kvm_enabled() && machine_kernel_irqchip_required(machine)) {
+        error_report("kernel_irqchip requested. no XIVE support");
+    } else {
+        /* XIVE uses the full range of IRQ numbers. The CPU IPIs will
+         * use the range below XICS_IRQ_BASE, which is unused by XICS. */
+        spapr->xive = spapr_xive_create(spapr, XICS_IRQ_BASE + XICS_IRQS_SPAPR,
+                                        &error_fatal);
+    }
+
     /* Set up containers for ibm,client-architecture-support negotiated options
      */
     spapr->ov5 = spapr_ovec_new();
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9a3885593c86..90e2b0f6c678 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -14,6 +14,7 @@ struct sPAPRNVRAM;
 typedef struct sPAPREventLogEntry sPAPREventLogEntry;
 typedef struct sPAPREventSource sPAPREventSource;
 typedef struct sPAPRPendingHPT sPAPRPendingHPT;
+typedef struct sPAPRXive sPAPRXive;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -127,6 +128,7 @@ struct sPAPRMachineState {
     MemoryHotplugState hotplug_memory;
 
     const char *icp_type;
+    sPAPRXive  *xive;
 };
 
 #define H_SUCCESS         0
-- 
2.13.6


* [Qemu-devel] [PATCH 18/25] spapr: allocate IRQ numbers for the XIVE interrupt mode
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (16 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 19/25] spapr: add hcalls support " Cédric Le Goater
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The IRQ numbers for the IPIs are allocated at the bottom of the IRQ
number space to preserve compatibility with XICS which only uses IRQ
numbers above 4096.

Also make sure that the allocated IRQ numbers are kept in sync between
XICS and XIVE.
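
The resulting layout of the IRQ number space can be sketched as below.
The constants are assumptions standing in for XICS_IRQ_BASE (taken to
be 4096, matching the text above) and XICS_IRQS_SPAPR (taken to be
1024); the IrqKind enum and irq_classify() are hypothetical and only
serve the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_BASE   4096u  /* assumed value of XICS_IRQ_BASE */
#define IRQ_COUNT  1024u  /* assumed value of XICS_IRQS_SPAPR */

typedef enum {
    IRQ_KIND_IPI,      /* XIVE only: one per server, from 0 upward */
    IRQ_KIND_UNUSED,   /* gap between the IPIs and the XICS range */
    IRQ_KIND_DEVICE,   /* shared by XICS and XIVE */
    IRQ_KIND_INVALID,  /* beyond the allocated number space */
} IrqKind;

/* Tell which part of the number space an IRQ number falls into. */
static IrqKind irq_classify(uint32_t irq, uint32_t nr_servers)
{
    assert(nr_servers <= IRQ_BASE);

    if (irq < nr_servers) {
        return IRQ_KIND_IPI;
    }
    if (irq < IRQ_BASE) {
        return IRQ_KIND_UNUSED;
    }
    if (irq < IRQ_BASE + IRQ_COUNT) {
        return IRQ_KIND_DEVICE;
    }
    return IRQ_KIND_INVALID;
}
```

Keeping the device range identical in both modes means a number handed
out by the common allocator stays valid whichever interrupt model the
guest negotiates.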

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0e0107c8272c..ca4e72187f60 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2392,6 +2392,11 @@ static void ppc_spapr_init(MachineState *machine)
          * use the range below XICS_IRQ_BASE, which is unused by XICS. */
         spapr->xive = spapr_xive_create(spapr, XICS_IRQ_BASE + XICS_IRQS_SPAPR,
                                         &error_fatal);
+
+        /* Allocate the first IRQ numbers for the XIVE IPIs */
+        for (i = 0; i < xics_max_server_number(); ++i) {
+            spapr_xive_irq_set(spapr->xive, i, false);
+        }
     }
 
     /* Set up containers for ibm,client-architecture-support negotiated options
@@ -3631,6 +3636,7 @@ static int ics_find_free_block(ICSState *ics, int num, int alignnum)
 static void spapr_irq_set(sPAPRMachineState *spapr, int irq, bool lsi)
 {
     ics_set_irq_type(spapr->ics, irq - spapr->ics->offset, lsi);
+    spapr_xive_irq_set(spapr->xive, irq, lsi);
 }
 
 int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
@@ -3721,6 +3727,7 @@ void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
             memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
         }
     }
+    spapr_xive_irq_unset(spapr->xive, irq);
 }
 
 qemu_irq spapr_irq_get_qirq(sPAPRMachineState *spapr, int irq)
-- 
2.13.6


* [Qemu-devel] [PATCH 19/25] spapr: add hcalls support for the XIVE interrupt mode
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (17 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 18/25] spapr: allocate IRQ numbers for the XIVE interrupt mode Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-12-01  4:01   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 20/25] spapr: add device tree " Cédric Le Goater
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

A set of hypervisor calls is used to configure the interrupt sources
and the event/notification queues of the guest:

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (PQ bits) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines which "target" and "priority" are assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification config associated with the
   queue, only unconditional notification for the moment.  Reset is
   performed with a queue size of 0 and queueing is disabled in that
   case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the partition's interrupt exploitation structures to
   their initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issues a synchronisation on a source to make sure all
   notifications have reached their queue.

Calls that still need to be addressed :

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE

See the code for more documentation on each hcall.
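
As an example of what H_INT_GET_SOURCE_INFO returns, the address of the
ESB management page of a source is derived in this patch from a base
address, a page shift and the LISN. The stand-alone helpers below
restate that arithmetic; their names are hypothetical and the values
used in any call are up to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one ESB management page, a power of two. */
static uint64_t esb_page_size(uint32_t esb_shift)
{
    assert(esb_shift < 64);
    return 1ull << esb_shift;
}

/* Address of the ESB management page of a source: one page per LISN. */
static uint64_t esb_page_addr(uint64_t esb_base, uint32_t esb_shift,
                              uint32_t lisn)
{
    return esb_base + esb_page_size(esb_shift) * lisn;
}
```

The guest gets the page address in R5 and the shift in R7, so it can
map exactly one page per source without knowing the base address.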

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/Makefile.objs       |   2 +-
 hw/intc/spapr_xive_hcall.c  | 885 ++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr.c              |   2 +
 include/hw/ppc/spapr.h      |  15 +-
 include/hw/ppc/spapr_xive.h |   4 +
 5 files changed, 906 insertions(+), 2 deletions(-)
 create mode 100644 hw/intc/spapr_xive_hcall.c

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 49e13e7aeeee..122e2ec77e8d 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -35,7 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
 obj-$(CONFIG_XICS) += xics.o
 obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
-obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
+obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o spapr_xive_hcall.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
new file mode 100644
index 000000000000..676fe0e2d5c7
--- /dev/null
+++ b/hw/intc/spapr_xive_hcall.c
@@ -0,0 +1,885 @@
+/*
+ * QEMU PowerPC sPAPR XIVE model
+ *
+ * Copyright (c) 2017, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "cpu.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_xive.h"
+#include "hw/ppc/fdt.h"
+#include "monitor/monitor.h"
+
+#include "xive-internal.h"
+
+/* Priority ranges reserved by the hypervisor. The Linux driver is
+ * expected to choose priority 6.
+ */
+static const uint32_t reserved_priorities[] = {
+    7,    /* start */
+    0xf8, /* count */
+};
+
+static bool priority_is_valid(uint32_t priority)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(reserved_priorities) / 2; i++) {
+        uint32_t base  = reserved_priorities[2 * i];
+        uint32_t count = reserved_priorities[2 * i + 1];
+
+        if (priority >= base && priority < base + count) {
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: priority %d is reserved\n",
+                          __func__, priority);
+            return false;
+        }
+    }
+
+    return true;
+}
+
+/*
+ * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
+ * real address of the MMIO page through which the Event State Buffer
+ * entry associated with the value of the "lisn" parameter is managed.
+ *
+ * Parameters:
+ * Input
+ * - "flags"
+ *       Bits 0-63 reserved
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as returned
+ *       by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output
+ * - R4: "flags"
+ *       Bits 0-59: Reserved
+ *       Bit 60: H_INT_ESB must be used for Event State Buffer
+ *               management
+ *       Bit 61: 1 == LSI  0 == MSI
+ *       Bit 62: the full function page supports trigger
+ *       Bit 63: Store EOI Supported
+ * - R5: Logical Real address of full function Event State Buffer
+ *       management page, -1 if ESB hcall flag is set to 1.
+ * - R6: Logical Real Address of trigger only Event State Buffer
+ *       management page or -1.
+ * - R7: Power of 2 page size for the ESB management pages returned in
+ *       R5 and R6.
+ */
+static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
+                                          sPAPRMachineState *spapr,
+                                          target_ulong opcode,
+                                          target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveIVE *ive;
+    target_ulong flags  = args[0];
+    target_ulong lisn   = args[1];
+    uint64_t mmio_base;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    ive = spapr_xive_get_ive(spapr->xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    mmio_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * lisn;
+
+    args[0] = 0;
+    if (spapr_xive_irq_is_lsi(xive, lisn)) {
+        args[0] |= XIVE_SRC_LSI;
+    }
+    if (xive->flags & XIVE_SRC_TRIGGER) {
+        args[0] |= XIVE_SRC_TRIGGER;
+    }
+
+    if (xive->flags & XIVE_SRC_H_INT_ESB) {
+        args[1] = -1; /* never used in QEMU  */
+        args[2] = -1;
+    } else {
+        args[1] = mmio_base;
+        if (xive->flags & XIVE_SRC_TRIGGER) {
+            args[2] = -1; /* No specific trigger page */
+        } else {
+            args[2] = -1; /* TODO: support for specific trigger page */
+        }
+    }
+
+    args[3] = xive->esb_shift;
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
+ * Interrupt Source to a target. The Logical Interrupt Source is
+ * designated with the "lisn" parameter and the target is designated
+ * with the "target" and "priority" parameters.  Upon return from the
+ * hcall(), no additional interrupts will be directed to the old EQ.
+ *
+ * TODO: The old EQ should be investigated for interrupts that
+ * occurred prior to or during the hcall().
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-61: Reserved
+ *      Bit 62: set the "eisn" in the EA
+ *      Bit 63: masks the interrupt source in the hardware interrupt
+ *      control structure. An interrupt masked by this mechanism will
+ *      be dropped, but its source state bits will still be
+ *      set. There is no race-free way of unmasking and restoring the
+ *      source. Thus this should only be used in interrupts that are
+ *      also masked at the source, and only in cases where the
+ *      interrupt is not meant to be used for a large amount of time
+ *      because no valid target exists for it for example
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as returned by
+ *      the H_ALLOCATE_VAS_WINDOW hcall
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *      "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *      "ibm,plat-res-int-priorities"
+ * - "eisn" is the guest EISN associated with the "lisn"
+ *
+ * Output:
+ * - None
+ */
+
+#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
+#define XIVE_SRC_MASK     (1ull << (63 - 63))
+
+static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    XiveIVE *ive;
+    uint64_t new_ive;
+    target_ulong flags    = args[0];
+    target_ulong lisn     = args[1];
+    target_ulong target   = args[2];
+    target_ulong priority = args[3];
+    target_ulong eisn     = args[4];
+    uint32_t eq_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~(XIVE_SRC_SET_EISN | XIVE_SRC_MASK)) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    ive = spapr_xive_get_ive(spapr->xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    /* priority 0xff is used to reset the IVE */
+    if (priority == 0xff) {
+        new_ive = IVE_VALID | IVE_MASKED;
+        goto out;
+    }
+
+    new_ive = ive->w;
+
+    if (flags & XIVE_SRC_MASK) {
+        new_ive = ive->w | IVE_MASKED;
+    } else {
+        new_ive = ive->w & ~IVE_MASKED;
+    }
+
+    if (!priority_is_valid(priority)) {
+        return H_P4;
+    }
+
+    /* TODO: If the partition thread count is greater than the
+     * hardware thread count, validate the "target" has a
+     * corresponding hardware thread else return H_NOT_AVAILABLE.
+     */
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the EQ corresponding to the
+     * target.
+     */
+    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
+        return H_P3;
+    }
+
+    new_ive = SETFIELD(IVE_EQ_BLOCK, new_ive, 0ul);
+    new_ive = SETFIELD(IVE_EQ_INDEX, new_ive, eq_idx);
+
+    if (flags & XIVE_SRC_SET_EISN) {
+        new_ive = SETFIELD(IVE_EQ_DATA, new_ive, eisn);
+    }
+
+out:
+    /* TODO: handle syncs ? */
+
+    /* And update */
+    ive->w = new_ive;
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine which
+ * target/priority pair is assigned to the specified Logical Interrupt
+ * Source.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63 Reserved
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as
+ *      returned by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output:
+ * - R4: Target to which the specified Logical Interrupt Source is
+ *       assigned
+ * - R5: Priority to which the specified Logical Interrupt Source is
+ *       assigned
+ * - R6: EISN for the specified Logical Interrupt Source (this will be
+ *       equivalent to the LISN if not changed by H_INT_SET_SOURCE_CONFIG)
+ */
+static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong lisn = args[1];
+    XiveIVE *ive;
+    XiveEQ *eq;
+    uint32_t eq_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    ive = spapr_xive_get_ive(spapr->xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
+    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
+    if (!eq) {
+        return H_HARDWARE;
+    }
+
+    args[0] = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
+
+    if (ive->w & IVE_MASKED) {
+        args[1] = 0xff;
+    } else {
+        args[1] = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
+    }
+
+    args[2] = GETFIELD(IVE_EQ_DATA, ive->w);
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_INFO hcall() is used to get the logical real
+ * address of the notification management page associated with the
+ * specified target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *       Bits 0-63 Reserved
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: Logical real address of notification page
+ * - R5: Power of 2 page size of the notification page
+ */
+static target_ulong h_int_get_queue_info(PowerPCCPU *cpu,
+                                         sPAPRMachineState *spapr,
+                                         target_ulong opcode,
+                                         target_ulong *args)
+{
+    target_ulong flags    = args[0];
+    target_ulong target   = args[1];
+    target_ulong priority = args[2];
+    uint32_t eq_idx;
+    XiveEQ *eq;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (!priority_is_valid(priority)) {
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the EQ corresponding to the
+     * target.
+     */
+    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
+        return H_P2;
+    }
+
+    /* TODO: If the partition thread count is greater than the
+     * hardware thread count, validate the "target" has a
+     * corresponding hardware thread else return H_NOT_AVAILABLE.
+     */
+
+    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
+    if (!eq)  {
+        return H_HARDWARE;
+    }
+
+    args[0] = -1; /* TODO: return ESn page */
+    if (eq->w0 & EQ_W0_ENQUEUE) {
+        args[1] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
+    } else {
+        args[1] = 0;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_QUEUE_CONFIG hcall() is used to set or reset an EQ for
+ * a given "target" and "priority".  It is also used to set the
+ * notification config associated with the EQ.  An EQ size of 0 is
+ * used to reset the EQ config for a given target and priority. If
+ * resetting the EQ config, the END associated with the given "target"
+ * and "priority" will be changed to disable queueing.
+ *
+ * Upon return from the hcall(), no additional interrupts will be
+ * directed to the old EQ (if one was set). The old EQ (if one was
+ * set) should be investigated for interrupts that occurred prior to
+ * or during the hcall().
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-62: Reserved
+ *      Bit 63: Unconditional Notify (n) per the XIVE spec
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ * - "eventQueue": The logical real address of the start of the EQ
+ * - "eventQueueSize": The power of 2 EQ size per "ibm,xive-eq-sizes"
+ *
+ * Output:
+ * - None
+ */
+
+#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))
+
+static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    target_ulong flags    = args[0];
+    target_ulong target   = args[1];
+    target_ulong priority = args[2];
+    target_ulong qpage    = args[3];
+    target_ulong qsize    = args[4];
+    uint32_t eq_idx;
+    XiveEQ *old_eq;
+    XiveEQ eq;
+    uint32_t qdata;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~XIVE_EQ_ALWAYS_NOTIFY) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (!priority_is_valid(priority)) {
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the EQ corresponding to the
+     * target.
+     */
+    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
+        return H_P2;
+    }
+
+    /* TODO: If the partition thread count is greater than the
+     * hardware thread count, validate the "target" has a
+     * corresponding hardware thread else return H_NOT_AVAILABLE.
+     */
+
+    old_eq = spapr_xive_get_eq(spapr->xive, eq_idx);
+    if (!old_eq)  {
+        return H_HARDWARE;
+    }
+
+    eq = *old_eq;
+
+    switch (qsize) {
+    case 12:
+    case 16:
+    case 21:
+    case 24:
+        eq.w3 = ((uint64_t)qpage) & 0xffffffff;
+        eq.w2 = (((uint64_t)qpage) >> 32) & 0x0fffffff;
+        eq.w0 |= EQ_W0_ENQUEUE;
+        eq.w0 = SETFIELD(EQ_W0_QSIZE, eq.w0, qsize - 12);
+        break;
+    case 0:
+        /* reset queue and disable queueing */
+        eq.w2 = eq.w3 = 0;
+        eq.w0 &= ~EQ_W0_ENQUEUE;
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid EQ size %"PRIx64"\n",
+                      __func__, qsize);
+        return H_P5;
+    }
+
+    if (qsize) {
+        /*
+         * Let's validate the EQ address with a read of the first EQ
+         * entry. We could also check that the full queue has been
+         * zeroed by the OS.
+         */
+        if (address_space_read(&address_space_memory, qpage,
+                               MEMTXATTRS_UNSPECIFIED,
+                               (uint8_t *) &qdata, sizeof(qdata))) {
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to read EQ data @0x%"
+                          HWADDR_PRIx "\n", __func__, qpage);
+            return H_P4;
+        }
+    }
+
+    /* Ensure the priority and target are correctly set (they will not
+     * be right after allocation)
+     */
+    eq.w6 = SETFIELD(EQ_W6_NVT_BLOCK, 0ul, 0ul) |
+        SETFIELD(EQ_W6_NVT_INDEX, 0ul, target);
+    eq.w7 = SETFIELD(EQ_W7_F0_PRIORITY, 0ul, priority);
+
+    /* TODO: depends on notification page (ESn) from H_INT_GET_QUEUE_INFO */
+    if (flags & XIVE_EQ_ALWAYS_NOTIFY) {
+        eq.w0 |= EQ_W0_UCOND_NOTIFY;
+    }
+
+    /* The generation bit for the EQ starts at 1 and the EQ page
+     * offset counter starts at 0.
+     */
+    eq.w1 = EQ_W1_GENERATION | SETFIELD(EQ_W1_PAGE_OFF, 0ul, 0ul);
+    eq.w0 |= EQ_W0_VALID;
+
+    /* TODO: issue syncs required to ensure all in-flight interrupts
+     * are complete on the old EQ */
+
+    /* Update EQ */
+    *old_eq = eq;
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_CONFIG hcall() is used to get an EQ for a given
+ * target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-62: Reserved
+ *      Bit 63: Debug: Return debug data
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: "flags":
+ *       Bits 0-62: Reserved
+ *       Bit 63: The value of Unconditional Notify (n) per the XIVE spec
+ * - R5: The logical real address of the start of the EQ
+ * - R6: The power of 2 EQ size per "ibm,xive-eq-sizes"
+ * - R7: The value of Event Queue Offset Counter per XIVE spec
+ *       if "Debug" = 1, else 0
+ *
+ */
+
+#define XIVE_EQ_DEBUG     (1ull << (63 - 63))
+
+static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    target_ulong flags    = args[0];
+    target_ulong target   = args[1];
+    target_ulong priority = args[2];
+    uint32_t eq_idx;
+    XiveEQ *eq;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~XIVE_EQ_DEBUG) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (!priority_is_valid(priority)) {
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the EQ corresponding to the
+     * target.
+     */
+    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
+        return H_P2;
+    }
+
+    /* TODO: If the partition thread count is greater than the
+     * hardware thread count, validate the "target" has a
+     * corresponding hardware thread else return H_NOT_AVAILABLE.
+     */
+
+    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
+    if (!eq)  {
+        return H_HARDWARE;
+    }
+
+    args[0] = 0;
+    if (eq->w0 & EQ_W0_UCOND_NOTIFY) {
+        args[0] |= XIVE_EQ_ALWAYS_NOTIFY;
+    }
+
+    if (eq->w0 & EQ_W0_ENQUEUE) {
+        args[1] =
+            (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
+        args[2] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
+    } else {
+        args[1] = 0;
+        args[2] = 0;
+    }
+
+    /* TODO: do we need any locking on the EQ ? */
+    if (flags & XIVE_EQ_DEBUG) {
+        /* Load the event queue generation number into the return flags */
+        args[0] |= GETFIELD(EQ_W1_GENERATION, eq->w1);
+
+        /* Load R7 with the event queue offset counter */
+        args[3] = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_OS_REPORTING_LINE hcall() is used to set the
+ * reporting cache line pair for the calling thread.  The reporting
+ * cache lines will contain the OS interrupt context when the OS
+ * issues a CI store byte to @TIMA+0xC10 to acknowledge the OS
+ * interrupt. The reporting cache lines can be reset by inputting -1
+ * in "reportingLine".  Issuing the CI store byte without reporting
+ * cache lines registered will result in the data not being accessible
+ * to the OS.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "reportingLine": The logical real address of the reporting cache
+ *    line pair
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_set_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* TODO: H_INT_SET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_GET_OS_REPORTING_LINE hcall() is used to get the logical
+ * real address of the reporting cache line pair set for the input
+ * "target".  If no reporting cache line pair has been set, -1 is
+ * returned.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "reportingLine": The logical real address of the reporting cache
+ *   line pair
+ *
+ * Output:
+ * - R4: The logical real address of the reporting line if set, else -1
+ */
+static target_ulong h_int_get_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* TODO: H_INT_GET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_ESB hcall() is used to issue a load or store to the ESB
+ * page for the input "lisn".  This hcall is only supported for LISNs
+ * that have the ESB hcall flag set to 1 when returned from hcall()
+ * H_INT_GET_SOURCE_INFO.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-62: Reserved
+ *      Bit 63: Store: Store=1, store operation, else load operation
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as
+ *      returned by the H_ALLOCATE_VAS_WINDOW hcall
+ * - "esbOffset" is the offset into the ESB page for the load or store operation
+ * - "storeData" is the data to write for a store operation
+ *
+ * Output:
+ * - R4: The value of the load if load operation, else -1
+ */
+
+#define XIVE_ESB_STORE (1ull << (63 - 63))
+
+static target_ulong h_int_esb(PowerPCCPU *cpu,
+                              sPAPRMachineState *spapr,
+                              target_ulong opcode,
+                              target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveIVE *ive;
+    target_ulong flags   = args[0];
+    target_ulong lisn    = args[1];
+    target_ulong offset  = args[2];
+    target_ulong data    = args[3];
+    uint64_t esb_base;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~XIVE_ESB_STORE) {
+        return H_PARAMETER;
+    }
+
+    ive = spapr_xive_get_ive(xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    if (offset >= (1ull << xive->esb_shift)) {
+        return H_P3;
+    }
+
+    esb_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * lisn;
+    esb_base += offset;
+
+    if (dma_memory_rw(&address_space_memory, esb_base, &data, 8,
+                      (flags & XIVE_ESB_STORE))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to rw data @0x%"
+                      HWADDR_PRIx "\n", __func__, esb_base);
+        return H_HARDWARE;
+    }
+    args[0] = (flags & XIVE_ESB_STORE) ? -1 : data;
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SYNC hcall() is used to issue hardware syncs that will
+ * ensure any in flight events for the input lisn are in the event
+ * queue.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as
+ *      returned by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_sync(PowerPCCPU *cpu,
+                               sPAPRMachineState *spapr,
+                               target_ulong opcode,
+                               target_ulong *args)
+{
+    XiveIVE *ive;
+    target_ulong flags   = args[0];
+    target_ulong lisn    = args[1];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    ive = spapr_xive_get_ive(spapr->xive, lisn);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* This is not real hardware. Nothing to be done */
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_RESET hcall() is used to reset all of the partition's
+ * interrupt exploitation structures to their initial state.  This
+ * means losing all previously set interrupt state set via
+ * H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_reset(PowerPCCPU *cpu,
+                                sPAPRMachineState *spapr,
+                                target_ulong opcode,
+                                target_ulong *args)
+{
+    target_ulong flags   = args[0];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    spapr_xive_reset(spapr->xive);
+    return H_SUCCESS;
+}
+
+void spapr_xive_hcall_init(sPAPRMachineState *spapr)
+{
+    spapr_register_hypercall(H_INT_GET_SOURCE_INFO, h_int_get_source_info);
+    spapr_register_hypercall(H_INT_SET_SOURCE_CONFIG, h_int_set_source_config);
+    spapr_register_hypercall(H_INT_GET_SOURCE_CONFIG, h_int_get_source_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_INFO, h_int_get_queue_info);
+    spapr_register_hypercall(H_INT_SET_QUEUE_CONFIG, h_int_set_queue_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_CONFIG, h_int_get_queue_config);
+    spapr_register_hypercall(H_INT_SET_OS_REPORTING_LINE,
+                             h_int_set_os_reporting_line);
+    spapr_register_hypercall(H_INT_GET_OS_REPORTING_LINE,
+                             h_int_get_os_reporting_line);
+    spapr_register_hypercall(H_INT_ESB, h_int_esb);
+    spapr_register_hypercall(H_INT_SYNC, h_int_sync);
+    spapr_register_hypercall(H_INT_RESET, h_int_reset);
+}
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ca4e72187f60..8b15c0b500d0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -222,6 +222,8 @@ static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
         goto error;
     }
 
+    spapr_xive_hcall_init(spapr);
+
     return SPAPR_XIVE(obj);
 error:
     error_propagate(errp, local_err);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 90e2b0f6c678..a25e218b34e2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -387,7 +387,20 @@ struct sPAPRMachineState {
 #define H_INVALIDATE_PID        0x378
 #define H_REGISTER_PROC_TBL     0x37C
 #define H_SIGNAL_SYS_RESET      0x380
-#define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
+
+#define H_INT_GET_SOURCE_INFO   0x3A8
+#define H_INT_SET_SOURCE_CONFIG 0x3AC
+#define H_INT_GET_SOURCE_CONFIG 0x3B0
+#define H_INT_GET_QUEUE_INFO    0x3B4
+#define H_INT_SET_QUEUE_CONFIG  0x3B8
+#define H_INT_GET_QUEUE_CONFIG  0x3BC
+#define H_INT_SET_OS_REPORTING_LINE 0x3C0
+#define H_INT_GET_OS_REPORTING_LINE 0x3C4
+#define H_INT_ESB               0x3C8
+#define H_INT_SYNC              0x3CC
+#define H_INT_RESET             0x3D0
+
+#define MAX_HCALL_OPCODE        H_INT_RESET
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 6e8a189e723f..3f822220647f 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -79,4 +79,8 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
 void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
 
+typedef struct sPAPRMachineState sPAPRMachineState;
+
+void spapr_xive_hcall_init(sPAPRMachineState *spapr);
+
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.6


* [Qemu-devel] [PATCH 20/25] spapr: add device tree support for the XIVE interrupt mode
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (18 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 19/25] spapr: add hcalls support " Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-12-04  7:49   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 21/25] spapr: introduce a helper to map the XIVE memory regions Cédric Le Goater
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The XIVE interface for the guest is described in the device tree under
the "interrupt-controller" node. A few new properties are specific to
XIVE:

 - "reg"

   contains the base address and size of the thread interrupt
   management areas (TIMA), also called rings, for the User level and
   for the Guest OS level. Only the Guest OS level is taken into
   account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the interrupt number ranges assigned to the guest. These are
   allocated using a simple bitmap.

and also under the root node :

 - "ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use. Simulate ranges as defined by the PowerVM Hypervisor.
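
As an illustration of the "ibm,xive-eq-sizes" encoding described above,
here is a minimal sketch of how a consumer of the device tree might turn
one cell of the property into a queue size in bytes. It assumes only what
is stated here: each cell is a big-endian 32-bit value holding log2 of the
size. The helper names cell_be32() and xive_eq_size_bytes() are
hypothetical and are not part of QEMU, SLOF or Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one big-endian 32-bit device tree cell. */
static uint32_t cell_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: return the size in bytes of the n-th event queue
 * size advertised in an "ibm,xive-eq-sizes" property blob of 'len' bytes.
 * Each cell holds log2 of the queue size.
 */
static uint64_t xive_eq_size_bytes(const uint8_t *prop, size_t len, size_t n)
{
    uint32_t shift;

    /* The requested cell must lie entirely inside the property. */
    assert((n + 1) * 4 <= len);

    shift = cell_be32(prop + n * 4);
    assert(shift < 64);

    return (uint64_t)1 << shift;
}
```

With the four cells emitted by this patch (12, 16, 21, 24), the helper
yields 4K, 64K, 2M and 16M, which matches the sizes accepted by
H_INT_SET_QUEUE_CONFIG.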

When the XIVE interrupt mode is activated after the CAS negotiation,
the machine will perform a reboot to rebuild the device tree.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive_hcall.c  | 50 +++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr.c              |  7 ++++++-
 hw/ppc/spapr_hcall.c        |  6 ++++++
 include/hw/ppc/spapr_xive.h |  2 ++
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
index 676fe0e2d5c7..60c6c9f4be8f 100644
--- a/hw/intc/spapr_xive_hcall.c
+++ b/hw/intc/spapr_xive_hcall.c
@@ -883,3 +883,53 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_INT_SYNC, h_int_sync);
     spapr_register_hypercall(H_INT_RESET, h_int_reset);
 }
+
+void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers,
+                         void *fdt, uint32_t phandle)
+{
+    sPAPRXive *xive = spapr->xive;
+    int node;
+    uint64_t timas[2 * 2];
+    uint32_t lisn_ranges[] = {
+        cpu_to_be32(0),
+        cpu_to_be32(nr_servers),
+    };
+    uint32_t eq_sizes[] = {
+        cpu_to_be32(12), /* 4K */
+        cpu_to_be32(16), /* 64K */
+        cpu_to_be32(21), /* 2M */
+        cpu_to_be32(24), /* 16M */
+    };
+    uint32_t plat_res_int_priorities[ARRAY_SIZE(reserved_priorities)];
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(plat_res_int_priorities); i++) {
+        plat_res_int_priorities[i] = cpu_to_be32(reserved_priorities[i]);
+    }
+
+    /* Thread Interrupt Management Areas : User and OS */
+    for (i = 0; i < 2; i++) {
+        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
+        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
+    }
+
+    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
+
+    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));
+    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
+
+    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
+                     sizeof(eq_sizes)));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
+                     sizeof(lisn_ranges)));
+
+    /* For SLOF */
+    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
+    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
+
+    /* top properties */
+    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
+                     plat_res_int_priorities, sizeof(plat_res_int_priorities)));
+}
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8b15c0b500d0..3a62369883cc 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1127,7 +1127,12 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
     /* /interrupt controller */
-    spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
+    } else {
+        /* Populate device tree for XIVE */
+        spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
+    }
 
     ret = spapr_populate_memory(spapr, fdt);
     if (ret < 0) {
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index be22a6b2895f..e2a1665beee9 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1646,6 +1646,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
             (spapr_h_cas_compose_response(spapr, args[1], args[2],
                                           ov5_updates) != 0);
     }
+
+    /* We need to rebuild the device tree for XIVE, generate a reset */
+    if (!spapr->cas_reboot) {
+        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT);
+    }
+
     spapr_ovec_cleanup(ov5_updates);
 
     if (spapr->cas_reboot) {
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 3f822220647f..f6d4bf26e06a 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -82,5 +82,7 @@ void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
 typedef struct sPAPRMachineState sPAPRMachineState;
 
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
+void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers, void *fdt,
+                         uint32_t phandle);
 
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.6


* [Qemu-devel] [PATCH 21/25] spapr: introduce a helper to map the XIVE memory regions
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (19 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 20/25] spapr: add device tree " Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-12-04  7:52   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 22/25] spapr: add XIVE support to spapr_irq_get_qirq() Cédric Le Goater
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

When the XIVE interrupt mode is activated, the machine needs to expose
to the guest the MMIO regions used by the controller:

  - Event State Buffer (ESB)
  - Thread Interrupt Management Area (TIMA)

Migration will also need to reflect the current interrupt mode in use.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive_hcall.c  | 14 ++++++++++++++
 hw/ppc/spapr.c              |  5 +++++
 include/hw/ppc/spapr_xive.h |  1 +
 3 files changed, 20 insertions(+)
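
Illustration only, not part of the patch: a minimal sketch of the
address arithmetic used for the TIMA mapping, assuming the region is
split into pages of (1 << tm_shift) bytes and that the OS view is the
page at index 1, as the offset in the hunk below suggests. The helper
names and the base addresses used to exercise them are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: address of page 'page' inside a TIMA region
 * that starts at tm_base and uses pages of (1 << tm_shift) bytes. */
static inline uint64_t tima_page_addr(uint64_t tm_base, unsigned tm_shift,
                                      unsigned page)
{
    assert(tm_shift < 64);
    /* Shift a 64-bit value so that large shifts cannot overflow an int */
    return tm_base + ((uint64_t)page << tm_shift);
}

/* Only the OS view is mapped for now, which sits one page into the
 * region (assumption drawn from the '1 << tm_shift' offset). */
static inline uint64_t tima_os_addr(uint64_t tm_base, unsigned tm_shift)
{
    return tima_page_addr(tm_base, tm_shift, 1);
}
```

With a 64K page (tm_shift = 16), the OS view would land 0x10000 bytes
above tm_base.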

diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
index 60c6c9f4be8f..ba217144878e 100644
--- a/hw/intc/spapr_xive_hcall.c
+++ b/hw/intc/spapr_xive_hcall.c
@@ -933,3 +933,17 @@ void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers,
     _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
                      plat_res_int_priorities, sizeof(plat_res_int_priorities)));
 }
+
+void spapr_xive_mmio_map(sPAPRMachineState *spapr)
+{
+    sPAPRXive *xive = spapr->xive;
+
+    /* ESBs */
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->esb_base);
+
+    /* Thread Interrupt Management Areas (TIMA) */
+    /* TODO: Only map the OS TIMA for the moment. Mapping the whole
+     * region needs some rework in the handlers */
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1,
+                    xive->tm_base + (1 << xive->tm_shift));
+}
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3a62369883cc..734706c18cb3 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1132,6 +1132,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     } else {
         /* Populate device tree for XIVE */
         spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
+        spapr_xive_mmio_map(spapr);
     }
 
     ret = spapr_populate_memory(spapr, fdt);
@@ -1613,6 +1614,10 @@ static int spapr_post_load(void *opaque, int version_id)
         }
     }
 
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        spapr_xive_mmio_map(spapr);
+    }
+
     return err;
 }
 
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index f6d4bf26e06a..88355f7eb643 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -84,5 +84,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
 void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers, void *fdt,
                          uint32_t phandle);
+void spapr_xive_mmio_map(sPAPRMachineState *spapr);
 
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.6

* [Qemu-devel] [PATCH 22/25] spapr: add XIVE support to spapr_irq_get_qirq()
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (20 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 21/25] spapr: introduce a helper to map the XIVE memory regions Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-12-04  7:52   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 23/25] spapr: toggle the ICP depending on the selected interrupt mode Cédric Le Goater
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

The XIVE object has its own set of qirqs which is to be used when the
XIVE interrupt mode is activated.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
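
Illustration only, not part of the patch: a standalone sketch of the
selection logic, with plain pointers standing in for qemu_irq. The
struct irq_bank type and both helpers are hypothetical. Unlike the
hunk below, the sketch range-checks the XIVE index as well; that
check is an addition of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int *irq_handle;            /* stand-in for qemu_irq */

/* Hypothetical model of one controller's IRQ array */
struct irq_bank {
    irq_handle *qirqs;              /* one handle per source */
    int offset;                     /* first IRQ number of the bank */
    int nr_irqs;                    /* number of sources */
};

/* Return the handle for 'irq', or NULL when it is out of range */
static inline irq_handle bank_get(const struct irq_bank *b, int irq)
{
    int idx;

    assert(b != NULL && b->qirqs != NULL);
    idx = irq - b->offset;
    if (idx < 0 || idx >= b->nr_irqs) {
        return NULL;
    }
    return b->qirqs[idx];
}

/* Pick the bank matching the interrupt mode negotiated at CAS time */
static inline irq_handle get_qirq(bool xive_mode,
                                  const struct irq_bank *xics,
                                  const struct irq_bank *xive, int irq)
{
    return bank_get(xive_mode ? xive : xics, irq);
}
```

The XICS bank keeps its own offset while the XIVE bank is indexed from
zero, mirroring the two branches of spapr_irq_get_qirq().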

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 734706c18cb3..a91ec1c0751a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3746,8 +3746,12 @@ qemu_irq spapr_irq_get_qirq(sPAPRMachineState *spapr, int irq)
 {
     ICSState *ics = spapr->ics;
 
-    if (ics_valid_irq(ics, irq)) {
-        return ics->qirqs[irq - ics->offset];
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return spapr->xive->qirqs[irq];
+    } else {
+        if (ics_valid_irq(ics, irq)) {
+            return ics->qirqs[irq - ics->offset];
+        }
     }
 
     return NULL;
-- 
2.13.6

* [Qemu-devel] [PATCH 23/25] spapr: toggle the ICP depending on the selected interrupt mode
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (21 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 22/25] spapr: add XIVE support to spapr_irq_get_qirq() Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-12-04  7:56   ` David Gibson
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 24/25] spapr: add support to dump XIVE information Cédric Le Goater
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 25/25] spapr: advertise XIVE exploitation mode in CAS Cédric Le Goater
  24 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

Each interrupt mode has its own specific interrupt presenter object,
that we store under the CPU object, one for XICS and one for XIVE. The
active presenter, corresponding to the current interrupt mode, is
simply selected with a lookup on the children of the CPU.

Migration and CPU hotplug also need to reflect the current interrupt
mode in use.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c                  | 21 ++++++++++++++++++++-
 hw/ppc/spapr_cpu_core.c         | 31 +++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_cpu_core.h |  1 +
 3 files changed, 52 insertions(+), 1 deletion(-)
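
Illustration only, not part of the patch: a standalone model of the
presenter lookup, assuming each CPU owns a small array of children
tagged with a type name. All types, helpers and type strings here are
hypothetical, and the sketch compares names with strcmp() where the
real code relies on object_dynamic_cast().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct child {
    const char *type;               /* type name of the child object */
};

struct cpu {
    struct child *children;         /* presenters created for this CPU */
    size_t nr_children;
    struct child *intc;             /* currently active presenter */
};

/* Return the first child of the given type, or NULL if none exists */
static inline struct child *cpu_find_child(struct cpu *cpu, const char *type)
{
    for (size_t i = 0; i < cpu->nr_children; i++) {
        if (strcmp(cpu->children[i].type, type) == 0) {
            return &cpu->children[i];
        }
    }
    return NULL;
}

/* Point every CPU at the presenter matching the interrupt mode */
static inline void cpus_set_icp(struct cpu *cpus, size_t nr_cpus,
                                const char *type)
{
    for (size_t i = 0; i < nr_cpus; i++) {
        struct child *icp = cpu_find_child(&cpus[i], type);

        assert(icp != NULL);        /* both presenters must exist */
        cpus[i].intc = icp;
    }
}
```

Switching mode then only rewrites the 'intc' pointer; both presenters
stay allocated under the CPU for its whole lifetime.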

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a91ec1c0751a..b7389dbdf5ca 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1128,8 +1128,10 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
 
     /* /interrupt controller */
     if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        spapr_cpu_core_set_icp(spapr->icp_type);
         spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
     } else {
+        spapr_cpu_core_set_icp(TYPE_SPAPR_XIVE_ICP);
         /* Populate device tree for XIVE */
         spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
         spapr_xive_mmio_map(spapr);
@@ -1615,6 +1617,7 @@ static int spapr_post_load(void *opaque, int version_id)
     }
 
     if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        spapr_cpu_core_set_icp(TYPE_SPAPR_XIVE_ICP);
         spapr_xive_mmio_map(spapr);
     }
 
@@ -3610,7 +3613,7 @@ static ICPState *spapr_icp_get(XICSFabric *xi, int vcpu_id)
 Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
 {
     Error *local_err = NULL;
-    Object *obj;
+    Object *obj, *obj_xive;
 
     obj = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
     if (local_err) {
@@ -3618,6 +3621,22 @@ Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
         return NULL;
     }
 
+    /* Add a XIVE interrupt presenter. The machine will switch the CPU
+     * ICP depending on the interrupt model negotiated at CAS time.
+     */
+    obj_xive = icp_create(cs, TYPE_SPAPR_XIVE_ICP, XICS_FABRIC(spapr),
+                          &local_err);
+    if (local_err) {
+        object_unparent(obj);
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    /* when hotplugged, the CPU should have the correct ICP */
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return obj_xive;
+    }
+
     return obj;
 }
 
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 61a9850e688b..b0e39270f262 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -257,3 +257,34 @@ static const TypeInfo spapr_cpu_core_type_infos[] = {
 };
 
 DEFINE_TYPES(spapr_cpu_core_type_infos)
+
+typedef struct ForeachFindICPArgs {
+    const char *icp_type;
+    Object *icp;
+} ForeachFindICPArgs;
+
+static int spapr_cpu_core_find_icp(Object *child, void *opaque)
+{
+    ForeachFindICPArgs *args = opaque;
+
+    if (object_dynamic_cast(child, args->icp_type)) {
+        args->icp = child;
+    }
+
+    return args->icp != NULL;
+}
+
+void spapr_cpu_core_set_icp(const char *icp_type)
+{
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        ForeachFindICPArgs args = { icp_type, NULL };
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        object_child_foreach(OBJECT(cs), spapr_cpu_core_find_icp, &args);
+        g_assert(args.icp);
+
+        cpu->intc = args.icp;
+    }
+}
diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index f2d48d6a6786..a657dfb8863c 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -38,4 +38,5 @@ typedef struct sPAPRCPUCoreClass {
 } sPAPRCPUCoreClass;
 
 const char *spapr_get_cpu_core_type(const char *cpu_type);
+void spapr_cpu_core_set_icp(const char *icp_type);
 #endif
-- 
2.13.6

* [Qemu-devel] [PATCH 24/25] spapr: add support to dump XIVE information
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (22 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 23/25] spapr: toggle the ICP depending on the selected interrupt mode Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 25/25] spapr: advertise XIVE exploitation mode in CAS Cédric Le Goater
  24 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

Modify the InterruptStatsProvider output to reflect the interrupt mode
currently in use by the machine.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b7389dbdf5ca..9fe3a9966b12 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3785,10 +3785,18 @@ static void spapr_pic_print_info(InterruptStatsProvider *obj,
     CPU_FOREACH(cs) {
         PowerPCCPU *cpu = POWERPC_CPU(cs);
 
-        icp_pic_print_info(ICP(cpu->intc), mon);
+        if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+            spapr_xive_icp_pic_print_info(SPAPR_XIVE_ICP(cpu->intc), mon);
+        } else {
+            icp_pic_print_info(ICP(cpu->intc), mon);
+        }
     }
 
-    ics_pic_print_info(spapr->ics, mon);
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        spapr_xive_pic_print_info(spapr->xive, mon);
+    } else {
+        ics_pic_print_info(spapr->ics, mon);
+    }
 }
 
 int spapr_vcpu_id(PowerPCCPU *cpu)
-- 
2.13.6

* [Qemu-devel] [PATCH 25/25] spapr: advertise XIVE exploitation mode in CAS
  2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (23 preceding siblings ...)
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 24/25] spapr: add support to dump XIVE information Cédric Le Goater
@ 2017-11-23 13:29 ` Cédric Le Goater
  24 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-23 13:29 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt
  Cc: Cédric Le Goater

We introduce a 'xive_exploitation' boolean at the machine level to
enable the XIVE interrupt mode for newer machines and to disable it on
older ones.

The XIVE interrupt mode can still be forced on the command line with
the 'xive-exploitation' machine option. That might be a bit dangerous.
To be discussed.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c   | 10 ++++++----
 hw/ppc/spapr.c         | 52 +++++++++++++++++++++++++++++++++++++++++++++-----
 include/hw/ppc/spapr.h |  1 +
 3 files changed, 54 insertions(+), 9 deletions(-)
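
Illustration only, not part of the patch: a sketch of how the
ibm,arch-vec-5-platform-support value could be filled for the TCG
case. It assumes the property is a list of (byte index, value) pairs
covering bytes 23..26 of option vector 5, which is inferred from the
val[1] and val[3] accesses in the hunk below. The helper name is
hypothetical and bytes 25 and 26 are simply left at zero here, which
may not match the real table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { OV5_PAIRS = 4 };             /* bytes 23..26 of option vector 5 */

/* Fill 'val' with (index, value) pairs; only the TCG case is modelled */
static inline void ov5_platform_support(bool xive_exploitation,
                                        uint8_t val[2 * OV5_PAIRS])
{
    static const uint8_t init[2 * OV5_PAIRS] = {
        23, 0x00,                   /* XIVE mode, filled in below */
        24, 0x00,                   /* MMU mode, filled in below */
        25, 0x00,                   /* not modelled in this sketch */
        26, 0x00,                   /* not modelled in this sketch */
    };

    assert(val != NULL);
    memcpy(val, init, sizeof(init));

    if (xive_exploitation) {
        val[1] = 0x80;              /* OV5_XIVE_BOTH */
    }
    val[3] = 0xC0;                  /* hash and radix both available */
}
```

On an older machine type the property would presumably be switched
back on with something like "-machine pseries-2.11,xive-exploitation=on",
given the property name registered in spapr_machine_initfn().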

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index b732aaf4f8ba..f7fab70cb8bb 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -736,8 +736,9 @@ static const VMStateDescription vmstate_spapr_xive_ive = {
 
 static bool vmstate_spapr_xive_needed(void *opaque)
 {
-    /* TODO check machine XIVE support */
-    return true;
+    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+
+    return spapr->xive_exploitation;
 }
 
 static const VMStateDescription vmstate_spapr_xive = {
@@ -863,8 +864,9 @@ static const VMStateDescription vmstate_spapr_xive_icp_eq = {
 
 static bool vmstate_spapr_xive_icp_needed(void *opaque)
 {
-    /* TODO check machine XIVE support */
-    return true;
+    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+
+    return spapr->xive_exploitation;
 }
 
 static const VMStateDescription vmstate_spapr_xive_icp = {
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9fe3a9966b12..d3a4b1f8f6f9 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -961,10 +961,11 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
     spapr_dt_rtas_tokens(fdt, rtas);
 }
 
-/* Prepare ibm,arch-vec-5-platform-support, which indicates the MMU features
- * that the guest may request and thus the valid values for bytes 24..26 of
- * option vector 5: */
-static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
+/* Prepare ibm,arch-vec-5-platform-support, which indicates the MMU
+ * and the XIVE features that the guest may request and thus the valid
+ * values for bytes 23..26 of option vector 5: */
+static void spapr_dt_ov5_platform_support(sPAPRMachineState *spapr, void *fdt,
+                                          int chosen)
 {
     PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
 
@@ -987,7 +988,16 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
         } else {
             val[3] = 0x00; /* Hash */
         }
+        /* TODO: introduce a kvmppc_has_cap_xive() ? Works with
+         * irqchip=off for now
+         */
+        if (spapr->xive_exploitation) {
+            val[1] = 0x80; /* OV5_XIVE_BOTH */
+        }
     } else {
+        if (spapr->xive_exploitation) {
+            val[1] = 0x80; /* OV5_XIVE_BOTH */
+        }
         /* V3 MMU supports both hash and radix in tcg (with dynamic switching) */
         val[3] = 0xC0;
     }
@@ -1048,7 +1058,7 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, void *fdt)
         _FDT(fdt_setprop_string(fdt, chosen, "linux,stdout-path", stdout_path));
     }
 
-    spapr_dt_ov5_platform_support(fdt, chosen);
+    spapr_dt_ov5_platform_support(spapr, fdt, chosen);
 
     g_free(stdout_path);
     g_free(bootlist);
@@ -2441,6 +2451,11 @@ static void ppc_spapr_init(MachineState *machine)
         spapr_ovec_set(spapr->ov5, OV5_HPT_RESIZE);
     }
 
+    /* advertise XIVE if not disabled by the user */
+    if (spapr->xive_exploitation) {
+        spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
+    }
+
     /* init CPUs */
     spapr_set_vsmt_mode(spapr, &error_fatal);
 
@@ -2840,6 +2855,21 @@ static void spapr_set_vsmt(Object *obj, Visitor *v, const char *name,
     visit_type_uint32(v, name, (uint32_t *)opaque, errp);
 }
 
+static bool spapr_get_xive_exploitation(Object *obj, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    return spapr->xive_exploitation;
+}
+
+static void spapr_set_xive_exploitation(Object *obj, bool value,
+                                            Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    spapr->xive_exploitation = value;
+}
+
 static void spapr_machine_initfn(Object *obj)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
@@ -2875,6 +2905,15 @@ static void spapr_machine_initfn(Object *obj)
     object_property_set_description(obj, "vsmt",
                                     "Virtual SMT: KVM behaves as if this were"
                                     " the host's SMT mode", &error_abort);
+
+    spapr->xive_exploitation = true;
+    object_property_add_bool(obj, "xive-exploitation",
+                            spapr_get_xive_exploitation,
+                            spapr_set_xive_exploitation,
+                            NULL);
+    object_property_set_description(obj, "xive-exploitation",
+                                    "XIVE exploitation mode POWER9",
+                                    NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -3956,7 +3995,10 @@ DEFINE_SPAPR_MACHINE(2_12, "2.12", true);
 
 static void spapr_machine_2_11_instance_options(MachineState *machine)
 {
+    sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
+
     spapr_machine_2_12_instance_options(machine);
+    spapr->xive_exploitation = false;
 }
 
 static void spapr_machine_2_11_class_options(MachineClass *mc)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a25e218b34e2..c4f051f974fe 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -129,6 +129,7 @@ struct sPAPRMachineState {
 
     const char *icp_type;
     sPAPRXive  *xive;
+    bool xive_exploitation;
 };
 
 #define H_SUCCESS         0
-- 
2.13.6

* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues Cédric Le Goater
@ 2017-11-23 20:31   ` Benjamin Herrenschmidt
  2017-11-24  8:15     ` Cédric Le Goater
  2017-11-30  4:38   ` David Gibson
  1 sibling, 1 reply; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-11-23 20:31 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc, qemu-devel, David Gibson

On Thu, 2017-11-23 at 14:29 +0100, Cédric Le Goater wrote:
> The Event Queue Descriptor (EQD) table, also known as Event Notification
> Descriptor (END), is one of the internal tables the XIVE interrupt
> controller uses to redirect exception from event sources to CPU
> threads.

Keep in mind that we want to expose only priorities 0..6 to the guest,
as priority 7 will not be available with KVM on DD2.0 chips.
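
A rough sketch of what such a limit could look like when applied to
the EQ indexing quoted further down, (server << 3) | (priority & 0x7).
GUEST_PRIORITY_MAX and both helper names are hypothetical; only the
indexing pattern and XIVE_PRIORITY_MAX come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XIVE_PRIORITY_MAX   7       /* value used by the patch */
#define GUEST_PRIORITY_MAX  6       /* hypothetical: guest sees 0..6 */

/* Compute the EQ index, refusing priorities hidden from the guest */
static inline bool eq_for_server(uint32_t server, uint8_t priority,
                                 uint32_t *out_eq_idx)
{
    if (priority > GUEST_PRIORITY_MAX) {
        return false;
    }
    if (out_eq_idx) {
        *out_eq_idx = (server << 3) | (priority & 0x7);
    }
    return true;
}

/* Split an EQ index back into its server and priority parts */
static inline void eq_decode(uint32_t eq_idx, uint32_t *server,
                             uint8_t *priority)
{
    assert(server != NULL && priority != NULL);
    *server = eq_idx >> 3;
    *priority = eq_idx & 0x7;
}
```

The three low bits still leave room for priority 7 in the index, so
the per-CPU table can keep its XIVE_PRIORITY_MAX + 1 entries while the
lookup rejects the last one.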

> The EQD specifies on which Event Queue the event data should be posted
> when an exception occurs (later on pulled by the OS) and which server
> (VPD in XIVE terminology) to notify. The Event Queue is a much more
> complex structure but we start with a simple model for the sPAPR
> machine.
> 
> There is one XiveEQ per priority and the model chooses to store them
> under the Xive Interrupt presenter model. It will be retrieved, just
> like for XICS, through the 'intc' object pointer of the CPU.
> 
> The EQ indexing follows a simple pattern:
> 
>        (server << 3) | (priority & 0x7)
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/intc/xive-internal.h | 50 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 106 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 554b25e0884c..983317a6b3f6 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -23,6 +23,7 @@
>  #include "sysemu/dma.h"
>  #include "monitor/monitor.h"
>  #include "hw/ppc/spapr_xive.h"
> +#include "hw/ppc/spapr.h"
>  #include "hw/ppc/xics.h"
>  
>  #include "xive-internal.h"
> @@ -34,6 +35,8 @@ struct sPAPRXiveICP {
>      uint8_t   tima[TM_RING_COUNT * 0x10];
>      uint8_t   *tima_os;
>      qemu_irq  output;
> +
> +    XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
>  };
>  
>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
> @@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
>      },
>  };
>  
> +static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
> +{
> +    PowerPCCPU *cpu = spapr_find_cpu(server);
> +
> +    return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
> +}
> +
>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>  {
>  
> @@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
>      sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
>  
>      memset(xicp->tima, 0, sizeof(xicp->tima));
> +
> +    memset(xicp->eqt, 0, sizeof(xicp->eqt));
>  }
>  
>  static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
> @@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
>      xicp->tima_os = &xicp->tima[TM_QW1_OS];
>  }
>  
> +static const VMStateDescription vmstate_spapr_xive_icp_eq = {
> +    .name = TYPE_SPAPR_XIVE_ICP "/eq",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT32(w0, XiveEQ),
> +        VMSTATE_UINT32(w1, XiveEQ),
> +        VMSTATE_UINT32(w2, XiveEQ),
> +        VMSTATE_UINT32(w3, XiveEQ),
> +        VMSTATE_UINT32(w4, XiveEQ),
> +        VMSTATE_UINT32(w5, XiveEQ),
> +        VMSTATE_UINT32(w6, XiveEQ),
> +        VMSTATE_UINT32(w7, XiveEQ),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static bool vmstate_spapr_xive_icp_needed(void *opaque)
>  {
>      /* TODO check machine XIVE support */
> @@ -696,6 +725,8 @@ static const VMStateDescription vmstate_spapr_xive_icp = {
>      .needed = vmstate_spapr_xive_icp_needed,
>      .fields = (VMStateField[]) {
>          VMSTATE_BUFFER(tima, sPAPRXiveICP),
> +        VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 1), 1,
> +                             vmstate_spapr_xive_icp_eq, XiveEQ),
>          VMSTATE_END_OF_LIST()
>      },
>  };
> @@ -755,3 +786,28 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
>      ive->w &= ~IVE_VALID;
>      return true;
>  }
> +
> +/*
> + * Use a simple indexing for the EQs.
> + */
> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t eq_idx)
> +{
> +    int priority = eq_idx & 0x7;
> +    sPAPRXiveICP *xicp = spapr_xive_icp_get(xive, eq_idx >> 3);
> +
> +    return xicp ? &xicp->eqt[priority] : NULL;
> +}
> +
> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server,
> +                              uint8_t priority, uint32_t *out_eq_idx)
> +{
> +    if (priority > XIVE_PRIORITY_MAX) {
> +        return false;
> +    }
> +
> +    if (out_eq_idx) {
> +        *out_eq_idx = (server << 3) | (priority & 0x7);
> +    }
> +
> +    return true;
> +}
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> index 7d329f203a9b..c3949671aa03 100644
> --- a/hw/intc/xive-internal.h
> +++ b/hw/intc/xive-internal.h
> @@ -131,9 +131,59 @@ typedef struct XiveIVE {
>  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>  } XiveIVE;
>  
> +/* EQ */
> +typedef struct XiveEQ {
> +        uint32_t        w0;
> +#define EQ_W0_VALID             PPC_BIT32(0)
> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
> +#define EQ_W0_SW0               PPC_BIT32(16)
> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
> +#define EQ_QSIZE_4K             0
> +#define EQ_QSIZE_64K            4
> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
> +        uint32_t        w1;
> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
> +#define EQ_W1_ESn_P             PPC_BIT32(0)
> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
> +#define EQ_W1_ESe_P             PPC_BIT32(2)
> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
> +#define EQ_W1_GENERATION        PPC_BIT32(9)
> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
> +        uint32_t        w2;
> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
> +        uint32_t        w3;
> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
> +        uint32_t        w4;
> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
> +        uint32_t        w5;
> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
> +        uint32_t        w6;
> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
> +        uint32_t        w7;
> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> +} XiveEQ;
> +
>  #define XIVE_PRIORITY_MAX  7
>  
>  void spapr_xive_reset(void *dev);
>  XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server, uint8_t prio,
> +                              uint32_t *out_eq_idx);
>  
>  #endif /* _INTC_XIVE_INTERNAL_H */

* Re: [Qemu-devel] [PATCH 01/25] ppc/xics: introduce an icp_create() helper
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 01/25] ppc/xics: introduce an icp_create() helper Cédric Le Goater
@ 2017-11-24  2:51   ` David Gibson
  2017-11-24  7:57     ` Cédric Le Goater
  2017-11-24  9:55     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-11-24  9:08   ` Greg Kurz
  1 sibling, 2 replies; 128+ messages in thread
From: David Gibson @ 2017-11-24  2:51 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:31PM +0100, Cédric Le Goater wrote:
> The sPAPR and the PowerNV core objects create the interrupt presenter
> object of the CPUs in a very similar way. Let's provide a common
> routine in which we use the presenter 'type' as a child identifier.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

One tiny nit; apart from that:

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/intc/xics.c          | 22 ++++++++++++++++++++++
>  hw/ppc/pnv_core.c       | 10 +---------
>  hw/ppc/spapr_cpu_core.c | 13 ++-----------
>  include/hw/ppc/xics.h   |  3 +++
>  4 files changed, 28 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index a1cc0e420c98..e4ccdff8f577 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -384,6 +384,28 @@ static const TypeInfo icp_info = {
>      .class_size = sizeof(ICPStateClass),
>  };
>  
> +Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi, Error **errp)
> +{
> +    Object *child = OBJECT(cs);

In the original context 'child' made sense, since it was the child
object of the core.  Here, it's misleading, since it's the parent of
the xics link.  It's only used in a couple of places, so I suggest you
just open-code OBJECT(cs) in each place.

> +    Error *local_err = NULL;
> +    Object *obj;
> +
> +    obj = object_new(type);
> +    object_property_add_child(child, type, obj, &error_abort);
> +    object_unref(obj);
> +    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
> +                                   &error_abort);
> +    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
> +    object_property_set_bool(obj, true, "realized", &local_err);
> +    if (local_err) {
> +        object_unparent(obj);
> +        error_propagate(errp, local_err);
> +        obj = NULL;
> +    }
> +
> +    return obj;
> +}
> +
>  /*
>   * ICS: Source layer
>   */
> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
> index 82ff440b3334..a066736846f8 100644
> --- a/hw/ppc/pnv_core.c
> +++ b/hw/ppc/pnv_core.c
> @@ -126,7 +126,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>      Error *local_err = NULL;
>      CPUState *cs = CPU(child);
>      PowerPCCPU *cpu = POWERPC_CPU(cs);
> -    Object *obj;
>  
>      object_property_set_bool(child, true, "realized", &local_err);
>      if (local_err) {
> @@ -134,13 +133,7 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>          return;
>      }
>  
> -    obj = object_new(TYPE_PNV_ICP);
> -    object_property_add_child(child, "icp", obj, NULL);
> -    object_unref(obj);
> -    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
> -                                   &error_abort);
> -    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
> -    object_property_set_bool(obj, true, "realized", &local_err);
> +    icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
> @@ -148,7 +141,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>  
>      powernv_cpu_init(cpu, &local_err);
>      if (local_err) {
> -        object_unparent(obj);
>          error_propagate(errp, local_err);
>          return;
>      }
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 4ba8563d49e4..f8a520a2fa2d 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -111,7 +111,6 @@ static void spapr_cpu_core_realize_child(Object *child,
>      Error *local_err = NULL;
>      CPUState *cs = CPU(child);
>      PowerPCCPU *cpu = POWERPC_CPU(cs);
> -    Object *obj;
>  
>      object_property_set_bool(child, true, "realized", &local_err);
>      if (local_err) {
> @@ -123,21 +122,13 @@ static void spapr_cpu_core_realize_child(Object *child,
>          goto error;
>      }
>  
> -    obj = object_new(spapr->icp_type);
> -    object_property_add_child(child, "icp", obj, &error_abort);
> -    object_unref(obj);
> -    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(spapr),
> -                                   &error_abort);
> -    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
> -    object_property_set_bool(obj, true, "realized", &local_err);
> +    icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
>      if (local_err) {
> -        goto free_icp;
> +        goto error;
>      }
>  
>      return;
>  
> -free_icp:
> -    object_unparent(obj);
>  error:
>      error_propagate(errp, local_err);
>  }
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 2df99be111ce..126b47dec38b 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -212,4 +212,7 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>  void xics_spapr_init(sPAPRMachineState *spapr);
>  
> +Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi,
> +                   Error **errp);
> +
>  #endif /* XICS_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 02/25] ppc/xics: assign of the CPU 'intc' pointer under the core
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 02/25] ppc/xics: assign of the CPU 'intc' pointer under the core Cédric Le Goater
@ 2017-11-24  2:57   ` David Gibson
  2017-11-24  9:21   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  1 sibling, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-11-24  2:57 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:32PM +0100, Cédric Le Goater wrote:
> The 'intc' pointer of the CPU references the interrupt presenter in
> the XICS interrupt mode. When the XIVE interrupt mode is available and
> activated, the machine will need to reassign this pointer to reflect
> the change.
> 
> Moving this assignment under the realize routine of the CPU will ease
> the process when the interrupt mode is toggled.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
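
For readers following the thread, the ownership change described in the
commit message can be sketched outside of QEMU. This is only an
illustration under stated assumptions: DemoCpu, DemoPresenter,
demo_presenter_create() and demo_cpu_set_presenter() are hypothetical
stand-ins, not QEMU types or functions. The point is that the creation
routine no longer writes the CPU's 'intc' pointer; the owner of the CPU
does, so it can later repoint it at a presenter of another model.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the CPU; only the 'intc' pointer matters. */
typedef struct DemoCpu {
    void *intc;             /* current interrupt presenter, any model */
} DemoCpu;

/* Hypothetical stand-in for an interrupt presenter object. */
typedef struct DemoPresenter {
    const char *model;
    DemoCpu *cpu;
} DemoPresenter;

/*
 * Create a presenter for 'cpu' but leave cpu->intc untouched.
 * Returns NULL if the allocation fails.
 */
DemoPresenter *demo_presenter_create(DemoCpu *cpu, const char *model)
{
    DemoPresenter *p = malloc(sizeof(*p));

    if (p) {
        p->model = model;
        p->cpu = cpu;
    }
    return p;
}

/* The owner of the CPU decides which presenter 'intc' points at. */
void demo_cpu_set_presenter(DemoCpu *cpu, DemoPresenter *p)
{
    assert(p == NULL || p->cpu == cpu);
    cpu->intc = p;
}
```

With this split, toggling the interrupt mode only needs a second call to
the setter; the creation path stays the same for every model.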

> ---
>  hw/intc/xics.c          | 1 -
>  hw/ppc/pnv_core.c       | 2 +-
>  hw/ppc/spapr_cpu_core.c | 2 +-
>  3 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index e4ccdff8f577..0f2e7273bc8f 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -334,7 +334,6 @@ static void icp_realize(DeviceState *dev, Error **errp)
>      }
>  
>      cpu = POWERPC_CPU(obj);
> -    cpu->intc = OBJECT(icp);
>      icp->cs = CPU(obj);
>  
>      env = &cpu->env;
> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
> index a066736846f8..90acaac45889 100644
> --- a/hw/ppc/pnv_core.c
> +++ b/hw/ppc/pnv_core.c
> @@ -133,7 +133,7 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>          return;
>      }
>  
> -    icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
> +    cpu->intc = icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index f8a520a2fa2d..f7cc74512481 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -122,7 +122,7 @@ static void spapr_cpu_core_realize_child(Object *child,
>          goto error;
>      }
>  
> -    icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
> +    cpu->intc = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
>      if (local_err) {
>          goto error;
>      }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 04/25] spapr: move the IRQ allocation routines under the machine
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 04/25] spapr: move the IRQ allocation routines under the machine Cédric Le Goater
@ 2017-11-24  3:13   ` David Gibson
  2017-11-28 10:57   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  1 sibling, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-11-24  3:13 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:34PM +0100, Cédric Le Goater wrote:
> Also change the prototype to use a sPAPRMachineState and prefix them
> with spapr_irq_. It will let us synchronise the IRQ allocation with
> the XIVE interrupt mode when available.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/intc/trace-events   |   4 --
>  hw/intc/xics_spapr.c   | 114 -------------------------------------------------
>  hw/ppc/spapr.c         | 114 +++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_events.c  |   4 +-
>  hw/ppc/spapr_pci.c     |   8 ++--
>  hw/ppc/spapr_vio.c     |   2 +-
>  hw/ppc/trace-events    |   4 ++
>  include/hw/ppc/spapr.h |   6 +++
>  include/hw/ppc/xics.h  |   4 --
>  9 files changed, 131 insertions(+), 129 deletions(-)
> 
> diff --git a/hw/intc/trace-events b/hw/intc/trace-events
> index b298fac7c6a8..7077aaaee6d0 100644
> --- a/hw/intc/trace-events
> +++ b/hw/intc/trace-events
> @@ -64,10 +64,6 @@ xics_ics_simple_set_irq_lsi(int srcno, int nr) "set_irq_lsi: srcno %d [irq 0x%x]
>  xics_ics_simple_write_xive(int nr, int srcno, int server, uint8_t priority) "ics_write_xive: irq 0x%x [src %d] server 0x%x prio 0x%x"
>  xics_ics_simple_reject(int nr, int srcno) "reject irq 0x%x [src %d]"
>  xics_ics_simple_eoi(int nr) "ics_eoi: irq 0x%x"
> -xics_alloc(int irq) "irq %d"
> -xics_alloc_block(int first, int num, bool lsi, int align) "first irq %d, %d irqs, lsi=%d, alignnum %d"
> -xics_ics_free(int src, int irq, int num) "Source#%d, first irq %d, %d irqs"
> -xics_ics_free_warn(int src, int irq) "Source#%d, irq %d is already free"
>  
>  # hw/intc/s390_flic_kvm.c
>  flic_create_device(int err) "flic: create device failed %d"
> diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
> index e8c0a1b3e903..5a0967caf430 100644
> --- a/hw/intc/xics_spapr.c
> +++ b/hw/intc/xics_spapr.c
> @@ -245,120 +245,6 @@ void xics_spapr_init(sPAPRMachineState *spapr)
>      spapr_register_hypercall(H_IPOLL, h_ipoll);
>  }
>  
> -#define ICS_IRQ_FREE(ics, srcno)   \
> -    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
> -
> -static int ics_find_free_block(ICSState *ics, int num, int alignnum)
> -{
> -    int first, i;
> -
> -    for (first = 0; first < ics->nr_irqs; first += alignnum) {
> -        if (num > (ics->nr_irqs - first)) {
> -            return -1;
> -        }
> -        for (i = first; i < first + num; ++i) {
> -            if (!ICS_IRQ_FREE(ics, i)) {
> -                break;
> -            }
> -        }
> -        if (i == (first + num)) {
> -            return first;
> -        }
> -    }
> -
> -    return -1;
> -}
> -
> -int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, Error **errp)
> -{
> -    int irq;
> -
> -    if (!ics) {
> -        return -1;
> -    }
> -    if (irq_hint) {
> -        if (!ICS_IRQ_FREE(ics, irq_hint - ics->offset)) {
> -            error_setg(errp, "can't allocate IRQ %d: already in use", irq_hint);
> -            return -1;
> -        }
> -        irq = irq_hint;
> -    } else {
> -        irq = ics_find_free_block(ics, 1, 1);
> -        if (irq < 0) {
> -            error_setg(errp, "can't allocate IRQ: no IRQ left");
> -            return -1;
> -        }
> -        irq += ics->offset;
> -    }
> -
> -    ics_set_irq_type(ics, irq - ics->offset, lsi);
> -    trace_xics_alloc(irq);
> -
> -    return irq;
> -}
> -
> -/*
> - * Allocate block of consecutive IRQs, and return the number of the first IRQ in
> - * the block. If align==true, aligns the first IRQ number to num.
> - */
> -int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
> -                          bool align, Error **errp)
> -{
> -    int i, first = -1;
> -
> -    if (!ics) {
> -        return -1;
> -    }
> -
> -    /*
> -     * MSIMesage::data is used for storing VIRQ so
> -     * it has to be aligned to num to support multiple
> -     * MSI vectors. MSI-X is not affected by this.
> -     * The hint is used for the first IRQ, the rest should
> -     * be allocated continuously.
> -     */
> -    if (align) {
> -        assert((num == 1) || (num == 2) || (num == 4) ||
> -               (num == 8) || (num == 16) || (num == 32));
> -        first = ics_find_free_block(ics, num, num);
> -    } else {
> -        first = ics_find_free_block(ics, num, 1);
> -    }
> -    if (first < 0) {
> -        error_setg(errp, "can't find a free %d-IRQ block", num);
> -        return -1;
> -    }
> -
> -    for (i = first; i < first + num; ++i) {
> -        ics_set_irq_type(ics, i, lsi);
> -    }
> -    first += ics->offset;
> -
> -    trace_xics_alloc_block(first, num, lsi, align);
> -
> -    return first;
> -}
> -
> -static void ics_free(ICSState *ics, int srcno, int num)
> -{
> -    int i;
> -
> -    for (i = srcno; i < srcno + num; ++i) {
> -        if (ICS_IRQ_FREE(ics, i)) {
> -            trace_xics_ics_free_warn(0, i + ics->offset);
> -        }
> -        memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
> -    }
> -}
> -
> -void spapr_ics_free(ICSState *ics, int irq, int num)
> -{
> -    if (ics_valid_irq(ics, irq)) {
> -        trace_xics_ics_free(0, irq, num);
> -        ics_free(ics, irq - ics->offset, num);
> -    }
> -}
> -
>  void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle)
>  {
>      uint32_t interrupt_server_ranges_prop[] = {
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 925cbd3c1bf4..7ae84d40bdb4 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3570,6 +3570,120 @@ Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
>      return obj;
>  }
>  
> +#define ICS_IRQ_FREE(ics, srcno)   \
> +    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
> +
> +static int ics_find_free_block(ICSState *ics, int num, int alignnum)
> +{
> +    int first, i;
> +
> +    for (first = 0; first < ics->nr_irqs; first += alignnum) {
> +        if (num > (ics->nr_irqs - first)) {
> +            return -1;
> +        }
> +        for (i = first; i < first + num; ++i) {
> +            if (!ICS_IRQ_FREE(ics, i)) {
> +                break;
> +            }
> +        }
> +        if (i == (first + num)) {
> +            return first;
> +        }
> +    }
> +
> +    return -1;
> +}
> +
> +int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
> +                    Error **errp)
> +{
> +    ICSState *ics = spapr->ics;
> +    int irq;
> +
> +    if (!ics) {
> +        return -1;
> +    }
> +    if (irq_hint) {
> +        if (!ICS_IRQ_FREE(ics, irq_hint - ics->offset)) {
> +            error_setg(errp, "can't allocate IRQ %d: already in use", irq_hint);
> +            return -1;
> +        }
> +        irq = irq_hint;
> +    } else {
> +        irq = ics_find_free_block(ics, 1, 1);
> +        if (irq < 0) {
> +            error_setg(errp, "can't allocate IRQ: no IRQ left");
> +            return -1;
> +        }
> +        irq += ics->offset;
> +    }
> +
> +    ics_set_irq_type(ics, irq - ics->offset, lsi);
> +    trace_spapr_irq_alloc(irq);
> +
> +    return irq;
> +}
> +
> +/*
> + * Allocate block of consecutive IRQs, and return the number of the first IRQ in
> + * the block. If align==true, aligns the first IRQ number to num.
> + */
> +int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
> +                          bool align, Error **errp)
> +{
> +    ICSState *ics = spapr->ics;
> +    int i, first = -1;
> +
> +    if (!ics) {
> +        return -1;
> +    }
> +
> +    /*
> +     * MSIMesage::data is used for storing VIRQ so
> +     * it has to be aligned to num to support multiple
> +     * MSI vectors. MSI-X is not affected by this.
> +     * The hint is used for the first IRQ, the rest should
> +     * be allocated continuously.
> +     */
> +    if (align) {
> +        assert((num == 1) || (num == 2) || (num == 4) ||
> +               (num == 8) || (num == 16) || (num == 32));
> +        first = ics_find_free_block(ics, num, num);
> +    } else {
> +        first = ics_find_free_block(ics, num, 1);
> +    }
> +    if (first < 0) {
> +        error_setg(errp, "can't find a free %d-IRQ block", num);
> +        return -1;
> +    }
> +
> +    for (i = first; i < first + num; ++i) {
> +        ics_set_irq_type(ics, i, lsi);
> +    }
> +    first += ics->offset;
> +
> +    trace_spapr_irq_alloc_block(first, num, lsi, align);
> +
> +    return first;
> +}
> +
> +void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
> +{
> +    ICSState *ics = spapr->ics;
> +    int srcno = irq - ics->offset;
> +    int i;
> +
> +    if (ics_valid_irq(ics, irq)) {
> +        trace_spapr_irq_free(0, irq, num);
> +        for (i = srcno; i < srcno + num; ++i) {
> +            if (ICS_IRQ_FREE(ics, i)) {
> +                trace_spapr_irq_free_warn(0, i + ics->offset);
> +            }
> +            memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
> +        }
> +    }
> +}
> +
>  static void spapr_pic_print_info(InterruptStatsProvider *obj,
>                                   Monitor *mon)
>  {
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index e377fc7ddea2..cead596f3e7a 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -718,7 +718,7 @@ void spapr_events_init(sPAPRMachineState *spapr)
>      spapr->event_sources = spapr_event_sources_new();
>  
>      spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_EPOW,
> -                                 spapr_ics_alloc(spapr->ics, 0, false,
> +                                 spapr_irq_alloc(spapr, 0, false,
>                                                    &error_fatal));
>  
>      /* NOTE: if machine supports modern/dedicated hotplug event source,
> @@ -731,7 +731,7 @@ void spapr_events_init(sPAPRMachineState *spapr)
>       */
>      if (spapr->use_hotplug_event_source) {
>          spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_HOT_PLUG,
> -                                     spapr_ics_alloc(spapr->ics, 0, false,
> +                                     spapr_irq_alloc(spapr, 0, false,
>                                                        &error_fatal));
>      }
>  
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 5a3122a9f9f9..e0ef77a480e5 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -314,7 +314,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>              return;
>          }
>  
> -        spapr_ics_free(spapr->ics, msi->first_irq, msi->num);
> +        spapr_irq_free(spapr, msi->first_irq, msi->num);
>          if (msi_present(pdev)) {
>              spapr_msi_setmsg(pdev, 0, false, 0, 0);
>          }
> @@ -352,7 +352,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>      }
>  
>      /* Allocate MSIs */
> -    irq = spapr_ics_alloc_block(spapr->ics, req_num, false,
> +    irq = spapr_irq_alloc_block(spapr, req_num, false,
>                             ret_intr_type == RTAS_TYPE_MSI, &err);
>      if (err) {
>          error_reportf_err(err, "Can't allocate MSIs for device %x: ",
> @@ -363,7 +363,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>  
>      /* Release previous MSIs */
>      if (msi) {
> -        spapr_ics_free(spapr->ics, msi->first_irq, msi->num);
> +        spapr_irq_free(spapr, msi->first_irq, msi->num);
>          g_hash_table_remove(phb->msi, &config_addr);
>      }
>  
> @@ -1675,7 +1675,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>          uint32_t irq;
>          Error *local_err = NULL;
>  
> -        irq = spapr_ics_alloc_block(spapr->ics, 1, true, false, &local_err);
> +        irq = spapr_irq_alloc_block(spapr, 1, true, false, &local_err);
>          if (local_err) {
>              error_propagate(errp, local_err);
>              error_prepend(errp, "can't allocate LSIs: ");
> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
> index ea3bc8bd9e21..bb7ed2c537b0 100644
> --- a/hw/ppc/spapr_vio.c
> +++ b/hw/ppc/spapr_vio.c
> @@ -454,7 +454,7 @@ static void spapr_vio_busdev_realize(DeviceState *qdev, Error **errp)
>          dev->qdev.id = id;
>      }
>  
> -    dev->irq = spapr_ics_alloc(spapr->ics, dev->irq, false, &local_err);
> +    dev->irq = spapr_irq_alloc(spapr, dev->irq, false, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 4a6a6490fa78..b7c3e64b5ee7 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -12,6 +12,10 @@ spapr_pci_msi_retry(unsigned config_addr, unsigned req_num, unsigned max_irqs) "
>  # hw/ppc/spapr.c
>  spapr_cas_failed(unsigned long n) "DT diff buffer is too small: %ld bytes"
>  spapr_cas_continue(unsigned long n) "Copy changes to the guest: %ld bytes"
> +spapr_irq_alloc(int irq) "irq %d"
> +spapr_irq_alloc_block(int first, int num, bool lsi, int align) "first irq %d, %d irqs, lsi=%d, alignnum %d"
> +spapr_irq_free(int src, int irq, int num) "Source#%d, first irq %d, %d irqs"
> +spapr_irq_free_warn(int src, int irq) "Source#%d, irq %d is already free"
>  
>  # hw/ppc/spapr_hcall.c
>  spapr_cas_pvr_try(uint32_t pvr) "0x%x"
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9da38de34277..7a133f80411a 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -709,4 +709,10 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id);
>  
>  Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp);
>  
> +int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
> +                    Error **errp);
> +int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
> +                          bool align, Error **errp);
> +void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
> +
>  #endif /* HW_SPAPR_H */
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 126b47dec38b..cea462bc7f3e 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -181,10 +181,6 @@ typedef struct XICSFabricClass {
>  
>  #define XICS_IRQS_SPAPR               1024
>  
> -int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, Error **errp);
> -int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi, bool align,
> -                           Error **errp);
> -void spapr_ics_free(ICSState *ics, int irq, int num);
>  void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>  
>  qemu_irq xics_get_qirq(XICSFabric *xi, int irq);
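
The block allocation moved by this patch can be modelled in a few lines
of standalone C. The following is a minimal sketch, not the QEMU code:
DemoSources, demo_find_free_block() and demo_irq_alloc_block() are
hypothetical names, the table size is arbitrary, and a plain 'used' flag
replaces the XICS_FLAGS_IRQ_MASK test. It shows how the alignment
argument constrains where a block may start and that the returned value
is a global IRQ number (source number plus offset).

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_IRQS 16

/* Hypothetical stand-in for the source table; not QEMU's ICSState. */
typedef struct DemoSources {
    int offset;                 /* first global IRQ number */
    bool used[DEMO_NR_IRQS];    /* true when the source is allocated */
} DemoSources;

/*
 * Return the first source number of a free run of 'num' entries whose
 * start is a multiple of 'alignnum', or -1 if there is none.
 */
int demo_find_free_block(const DemoSources *s, int num, int alignnum)
{
    int first, i;

    for (first = 0; first < DEMO_NR_IRQS; first += alignnum) {
        if (num > DEMO_NR_IRQS - first) {
            return -1;
        }
        for (i = first; i < first + num; i++) {
            if (s->used[i]) {
                break;
            }
        }
        if (i == first + num) {
            return first;
        }
    }
    return -1;
}

/* Allocate a block and return its first global IRQ number, or -1. */
int demo_irq_alloc_block(DemoSources *s, int num, bool align)
{
    int first, i;

    /* Aligned requests must be a power of two, as for multi-vector MSI. */
    assert(!align || (num > 0 && (num & (num - 1)) == 0));

    first = demo_find_free_block(s, num, align ? num : 1);
    if (first < 0) {
        return -1;
    }
    for (i = first; i < first + num; i++) {
        s->used[i] = true;
    }
    return first + s->offset;
}
```

An aligned request skips over free entries that do not start on a
multiple of 'num', which is why a later unaligned request can still land
in the gap left behind.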

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 05/25] spapr: introduce a spapr_irq_set() helper
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 05/25] spapr: introduce a spapr_irq_set() helper Cédric Le Goater
@ 2017-11-24  3:16   ` David Gibson
  2017-11-24  8:32     ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-24  3:16 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:35PM +0100, Cédric Le Goater wrote:
> It will make synchronisation easier with the XIVE interrupt mode when
> available. The 'irq' parameter refers to the global IRQ number space.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

s/spapr_irq_set/spapr_irq_set_lsi/

otherwise the name doesn't tell you what it sets.

With that change,

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
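
To make the naming point and the 'global IRQ number space' remark
concrete, here is a minimal standalone sketch. DemoIcs,
demo_irq_set_lsi() and demo_mark_block() are hypothetical names used
only for illustration and the table size is arbitrary. The helper takes
a global IRQ number and converts it to a source number itself, so a
caller iterating over a block has to add the offset before the loop
rather than after it.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_IRQS 8

/* Hypothetical stand-in for the source state; not QEMU's ICSState. */
typedef struct DemoIcs {
    int offset;               /* first global IRQ number */
    bool lsi[DEMO_NR_IRQS];   /* per-source type, indexed by source number */
} DemoIcs;

/* Takes a global IRQ number; the name says what is being set. */
void demo_irq_set_lsi(DemoIcs *ics, int irq, bool lsi)
{
    int srcno = irq - ics->offset;

    assert(srcno >= 0 && srcno < DEMO_NR_IRQS);
    ics->lsi[srcno] = lsi;
}

/*
 * Mark 'num' sources starting at source number 'first' and return the
 * first global IRQ number of the block.
 */
int demo_mark_block(DemoIcs *ics, int first, int num, bool lsi)
{
    int i;

    first += ics->offset;     /* convert to global numbers before the loop */
    for (i = first; i < first + num; i++) {
        demo_irq_set_lsi(ics, i, lsi);
    }
    return first;
}
```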

> ---
>  hw/ppc/spapr.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 7ae84d40bdb4..79f38a9ff4e1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3594,6 +3594,11 @@ static int ics_find_free_block(ICSState *ics, int num, int alignnum)
>      return -1;
>  }
>  
> +static void spapr_irq_set(sPAPRMachineState *spapr, int irq, bool lsi)
> +{
> +    ics_set_irq_type(spapr->ics, irq - spapr->ics->offset, lsi);
> +}
> +
>  int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
>                      Error **errp)
>  {
> @@ -3618,7 +3623,7 @@ int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
>          irq += ics->offset;
>      }
>  
> -    ics_set_irq_type(ics, irq - ics->offset, lsi);
> +    spapr_irq_set(spapr, irq, lsi);
>      trace_spapr_irq_alloc(irq);
>  
>      return irq;
> @@ -3657,10 +3662,10 @@ int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
>          return -1;
>      }
>  
> +    first += ics->offset;
>      for (i = first; i < first + num; ++i) {
> -        ics_set_irq_type(ics, i, lsi);
> +        spapr_irq_set(spapr, i, lsi);
>      }
> -    first += ics->offset;
>  
>      trace_spapr_irq_alloc_block(first, num, lsi, align);
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 06/25] spapr: introduce a spapr_irq_get_qirq() helper
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 06/25] spapr: introduce a spapr_irq_get_qirq() helper Cédric Le Goater
@ 2017-11-24  3:18   ` David Gibson
  2017-11-24  8:01     ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-24  3:18 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:36PM +0100, Cédric Le Goater wrote:
> xics_get_qirq() is only used by the sPAPR machine. Let's move it there
> and change its name to reflect its scope. It will be useful for XIVE
> support which will use its own set of qirqs.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

s/spapr_irq_get_qirq/spapr_qirq/

for brevity

With that change

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
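
As an illustration of what the shorter accessor does, here is a
standalone sketch. DemoLine, DemoMachine, demo_qirq() and demo_pulse()
are hypothetical stand-ins (DemoLine plays the role of qemu_irq) and the
table size is arbitrary. The lookup takes a global IRQ number, checks it
against the range owned by the machine, and returns NULL when it falls
outside.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_IRQS 4

/* Hypothetical stand-in for qemu_irq: just counts pulses. */
typedef struct DemoLine {
    int pulses;
} DemoLine;

/* Hypothetical stand-in for the machine owning the IRQ lines. */
typedef struct DemoMachine {
    int offset;                     /* first global IRQ number */
    DemoLine lines[DEMO_NR_IRQS];   /* indexed by source number */
} DemoMachine;

/* Short accessor in the spirit of the suggested spapr_qirq() name. */
DemoLine *demo_qirq(DemoMachine *m, int irq)
{
    if (irq >= m->offset && irq < m->offset + DEMO_NR_IRQS) {
        return &m->lines[irq - m->offset];
    }
    return NULL;
}

/* Callers that pass the result straight on must expect a valid line. */
void demo_pulse(DemoMachine *m, int irq)
{
    DemoLine *line = demo_qirq(m, irq);

    assert(line != NULL);
    line->pulses++;
}
```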

> ---
>  hw/intc/xics.c              | 12 ------------
>  hw/ppc/spapr.c              | 11 +++++++++++
>  hw/ppc/spapr_events.c       | 12 +++++-------
>  hw/ppc/spapr_pci.c          |  2 +-
>  include/hw/pci-host/spapr.h |  2 +-
>  include/hw/ppc/spapr.h      |  1 +
>  include/hw/ppc/spapr_vio.h  |  2 +-
>  include/hw/ppc/xics.h       |  1 -
>  8 files changed, 20 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 0f2e7273bc8f..a78b4dbd033d 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -714,18 +714,6 @@ static const TypeInfo xics_fabric_info = {
>  /*
>   * Exported functions
>   */
> -qemu_irq xics_get_qirq(XICSFabric *xi, int irq)
> -{
> -    XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
> -    ICSState *ics = xic->ics_get(xi, irq);
> -
> -    if (ics) {
> -        return ics->qirqs[irq - ics->offset];
> -    }
> -
> -    return NULL;
> -}
> -
>  ICPState *xics_icp_get(XICSFabric *xi, int server)
>  {
>      XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 79f38a9ff4e1..5d3325ca3c88 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3689,6 +3689,17 @@ void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
>      }
>  }
>  
> +qemu_irq spapr_irq_get_qirq(sPAPRMachineState *spapr, int irq)
> +{
> +    ICSState *ics = spapr->ics;
> +
> +    if (ics_valid_irq(ics, irq)) {
> +        return ics->qirqs[irq - ics->offset];
> +    }
> +
> +    return NULL;
> +}
> +
>  static void spapr_pic_print_info(InterruptStatsProvider *obj,
>                                   Monitor *mon)
>  {
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index cead596f3e7a..0427590e9cac 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -472,9 +472,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>  
>      rtas_event_log_queue(spapr, entry);
>  
> -    qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr),
> -                                 rtas_event_log_to_irq(spapr,
> -                                                       RTAS_LOG_TYPE_EPOW)));
> +    qemu_irq_pulse(spapr_irq_get_qirq(spapr,
> +                   rtas_event_log_to_irq(spapr, RTAS_LOG_TYPE_EPOW)));
>  }
>  
>  static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> @@ -556,9 +555,8 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>  
>      rtas_event_log_queue(spapr, entry);
>  
> -    qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr),
> -                                 rtas_event_log_to_irq(spapr,
> -                                                       RTAS_LOG_TYPE_HOTPLUG)));
> +    qemu_irq_pulse(spapr_irq_get_qirq(spapr,
> +                   rtas_event_log_to_irq(spapr, RTAS_LOG_TYPE_HOTPLUG)));
>  }
>  
>  void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
> @@ -678,7 +676,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>                  spapr_event_sources_get_source(spapr->event_sources, i);
>  
>              g_assert(source->enabled);
> -            qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr), source->irq));
> +            qemu_irq_pulse(spapr_irq_get_qirq(spapr, source->irq));
>          }
>      }
>  
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index e0ef77a480e5..a02faa12333e 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -723,7 +723,7 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
>  
>      trace_spapr_pci_msi_write(addr, data, irq);
>  
> -    qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr), irq));
> +    qemu_irq_pulse(spapr_irq_get_qirq(spapr, irq));
>  }
>  
>  static const MemoryRegionOps spapr_msi_ops = {
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 38470b2f0e5c..3059fdd614e6 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -108,7 +108,7 @@ static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  
> -    return xics_get_qirq(XICS_FABRIC(spapr), phb->lsi_table[pin].irq);
> +    return spapr_irq_get_qirq(spapr, phb->lsi_table[pin].irq);
>  }
>  
>  PCIHostState *spapr_create_phb(sPAPRMachineState *spapr, int index);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 7a133f80411a..9a3885593c86 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -714,5 +714,6 @@ int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
>  int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
>                            bool align, Error **errp);
>  void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
> +qemu_irq spapr_irq_get_qirq(sPAPRMachineState *spapr, int irq);
>  
>  #endif /* HW_SPAPR_H */
> diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
> index 2e9685a5d900..404f1de2c046 100644
> --- a/include/hw/ppc/spapr_vio.h
> +++ b/include/hw/ppc/spapr_vio.h
> @@ -87,7 +87,7 @@ static inline qemu_irq spapr_vio_qirq(VIOsPAPRDevice *dev)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  
> -    return xics_get_qirq(XICS_FABRIC(spapr), dev->irq);
> +    return spapr_irq_get_qirq(spapr, dev->irq);
>  }
>  
>  static inline bool spapr_vio_dma_valid(VIOsPAPRDevice *dev, uint64_t taddr,
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index cea462bc7f3e..2f1f35294e6d 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -183,7 +183,6 @@ typedef struct XICSFabricClass {
>  
>  void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>  
> -qemu_irq xics_get_qirq(XICSFabric *xi, int irq);
>  ICPState *xics_icp_get(XICSFabric *xi, int server);
>  
>  /* Internal XICS interfaces */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 01/25] ppc/xics: introduce an icp_create() helper
  2017-11-24  2:51   ` David Gibson
@ 2017-11-24  7:57     ` Cédric Le Goater
  2017-11-24  9:55     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  1 sibling, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-24  7:57 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/24/2017 03:51 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:31PM +0100, Cédric Le Goater wrote:
>> The sPAPR and the PowerNV core objects create the interrupt presenter
>> object of the CPUs in a very similar way. Let's provide a common
>> routine in which we use the presenter 'type' as a child identifier.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> One tiny nit.., apart from that
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
>> ---
>>  hw/intc/xics.c          | 22 ++++++++++++++++++++++
>>  hw/ppc/pnv_core.c       | 10 +---------
>>  hw/ppc/spapr_cpu_core.c | 13 ++-----------
>>  include/hw/ppc/xics.h   |  3 +++
>>  4 files changed, 28 insertions(+), 20 deletions(-)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index a1cc0e420c98..e4ccdff8f577 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -384,6 +384,28 @@ static const TypeInfo icp_info = {
>>      .class_size = sizeof(ICPStateClass),
>>  };
>>  
>> +Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi, Error **errp)
>> +{
>> +    Object *child = OBJECT(cs);
> 
> In the original context 'child' made sense, since it was the child
> object of the core.  Here, it's misleading, since it's the parent of
> the xics link.  It's only used in a couple of places, so I suggest you
> just opencode OBJECT(cs) in each place.

Yes. That was a leftover from the copy-paste.

Thanks,

C.


>> +    Error *local_err = NULL;
>> +    Object *obj;
>> +
>> +    obj = object_new(type);
>> +    object_property_add_child(child, type, obj, &error_abort);
>> +    object_unref(obj);
>> +    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
>> +                                   &error_abort);
>> +    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
>> +    object_property_set_bool(obj, true, "realized", &local_err);
>> +    if (local_err) {
>> +        object_unparent(obj);
>> +        error_propagate(errp, local_err);
>> +        obj = NULL;
>> +    }
>> +
>> +    return obj;
>> +}
>> +
>>  /*
>>   * ICS: Source layer
>>   */
>> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
>> index 82ff440b3334..a066736846f8 100644
>> --- a/hw/ppc/pnv_core.c
>> +++ b/hw/ppc/pnv_core.c
>> @@ -126,7 +126,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>>      Error *local_err = NULL;
>>      CPUState *cs = CPU(child);
>>      PowerPCCPU *cpu = POWERPC_CPU(cs);
>> -    Object *obj;
>>  
>>      object_property_set_bool(child, true, "realized", &local_err);
>>      if (local_err) {
>> @@ -134,13 +133,7 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>>          return;
>>      }
>>  
>> -    obj = object_new(TYPE_PNV_ICP);
>> -    object_property_add_child(child, "icp", obj, NULL);
>> -    object_unref(obj);
>> -    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
>> -                                   &error_abort);
>> -    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
>> -    object_property_set_bool(obj, true, "realized", &local_err);
>> +    icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
>>      if (local_err) {
>>          error_propagate(errp, local_err);
>>          return;
>> @@ -148,7 +141,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>>  
>>      powernv_cpu_init(cpu, &local_err);
>>      if (local_err) {
>> -        object_unparent(obj);
>>          error_propagate(errp, local_err);
>>          return;
>>      }
>> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
>> index 4ba8563d49e4..f8a520a2fa2d 100644
>> --- a/hw/ppc/spapr_cpu_core.c
>> +++ b/hw/ppc/spapr_cpu_core.c
>> @@ -111,7 +111,6 @@ static void spapr_cpu_core_realize_child(Object *child,
>>      Error *local_err = NULL;
>>      CPUState *cs = CPU(child);
>>      PowerPCCPU *cpu = POWERPC_CPU(cs);
>> -    Object *obj;
>>  
>>      object_property_set_bool(child, true, "realized", &local_err);
>>      if (local_err) {
>> @@ -123,21 +122,13 @@ static void spapr_cpu_core_realize_child(Object *child,
>>          goto error;
>>      }
>>  
>> -    obj = object_new(spapr->icp_type);
>> -    object_property_add_child(child, "icp", obj, &error_abort);
>> -    object_unref(obj);
>> -    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(spapr),
>> -                                   &error_abort);
>> -    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
>> -    object_property_set_bool(obj, true, "realized", &local_err);
>> +    icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
>>      if (local_err) {
>> -        goto free_icp;
>> +        goto error;
>>      }
>>  
>>      return;
>>  
>> -free_icp:
>> -    object_unparent(obj);
>>  error:
>>      error_propagate(errp, local_err);
>>  }
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index 2df99be111ce..126b47dec38b 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -212,4 +212,7 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>>  void xics_spapr_init(sPAPRMachineState *spapr);
>>  
>> +Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi,
>> +                   Error **errp);
>> +
>>  #endif /* XICS_H */
> 


* Re: [Qemu-devel] [PATCH 06/25] spapr: introduce a spapr_irq_get_qirq() helper
  2017-11-24  3:18   ` David Gibson
@ 2017-11-24  8:01     ` Cédric Le Goater
  0 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-24  8:01 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/24/2017 04:18 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:36PM +0100, Cédric Le Goater wrote:
>> xics_get_qirq() is only used by the sPAPR machine. Let's move it there
>> and change its name to reflect its scope. It will be useful for XIVE
>> support which will use its own set of qirqs.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> s/spapr_irq_get_qirq/spapr_qirq/
> 
> for brevity

Sure.
 
> With that change
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Thanks,

C.
 
>> ---
>>  hw/intc/xics.c              | 12 ------------
>>  hw/ppc/spapr.c              | 11 +++++++++++
>>  hw/ppc/spapr_events.c       | 12 +++++-------
>>  hw/ppc/spapr_pci.c          |  2 +-
>>  include/hw/pci-host/spapr.h |  2 +-
>>  include/hw/ppc/spapr.h      |  1 +
>>  include/hw/ppc/spapr_vio.h  |  2 +-
>>  include/hw/ppc/xics.h       |  1 -
>>  8 files changed, 20 insertions(+), 23 deletions(-)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index 0f2e7273bc8f..a78b4dbd033d 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -714,18 +714,6 @@ static const TypeInfo xics_fabric_info = {
>>  /*
>>   * Exported functions
>>   */
>> -qemu_irq xics_get_qirq(XICSFabric *xi, int irq)
>> -{
>> -    XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
>> -    ICSState *ics = xic->ics_get(xi, irq);
>> -
>> -    if (ics) {
>> -        return ics->qirqs[irq - ics->offset];
>> -    }
>> -
>> -    return NULL;
>> -}
>> -
>>  ICPState *xics_icp_get(XICSFabric *xi, int server)
>>  {
>>      XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 79f38a9ff4e1..5d3325ca3c88 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3689,6 +3689,17 @@ void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
>>      }
>>  }
>>  
>> +qemu_irq spapr_irq_get_qirq(sPAPRMachineState *spapr, int irq)
>> +{
>> +    ICSState *ics = spapr->ics;
>> +
>> +    if (ics_valid_irq(ics, irq)) {
>> +        return ics->qirqs[irq - ics->offset];
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>>  static void spapr_pic_print_info(InterruptStatsProvider *obj,
>>                                   Monitor *mon)
>>  {
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index cead596f3e7a..0427590e9cac 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -472,9 +472,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>>  
>>      rtas_event_log_queue(spapr, entry);
>>  
>> -    qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr),
>> -                                 rtas_event_log_to_irq(spapr,
>> -                                                       RTAS_LOG_TYPE_EPOW)));
>> +    qemu_irq_pulse(spapr_irq_get_qirq(spapr,
>> +                   rtas_event_log_to_irq(spapr, RTAS_LOG_TYPE_EPOW)));
>>  }
>>  
>>  static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>> @@ -556,9 +555,8 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>>  
>>      rtas_event_log_queue(spapr, entry);
>>  
>> -    qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr),
>> -                                 rtas_event_log_to_irq(spapr,
>> -                                                       RTAS_LOG_TYPE_HOTPLUG)));
>> +    qemu_irq_pulse(spapr_irq_get_qirq(spapr,
>> +                   rtas_event_log_to_irq(spapr, RTAS_LOG_TYPE_HOTPLUG)));
>>  }
>>  
>>  void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
>> @@ -678,7 +676,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>                  spapr_event_sources_get_source(spapr->event_sources, i);
>>  
>>              g_assert(source->enabled);
>> -            qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr), source->irq));
>> +            qemu_irq_pulse(spapr_irq_get_qirq(spapr, source->irq));
>>          }
>>      }
>>  
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index e0ef77a480e5..a02faa12333e 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -723,7 +723,7 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
>>  
>>      trace_spapr_pci_msi_write(addr, data, irq);
>>  
>> -    qemu_irq_pulse(xics_get_qirq(XICS_FABRIC(spapr), irq));
>> +    qemu_irq_pulse(spapr_irq_get_qirq(spapr, irq));
>>  }
>>  
>>  static const MemoryRegionOps spapr_msi_ops = {
>> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
>> index 38470b2f0e5c..3059fdd614e6 100644
>> --- a/include/hw/pci-host/spapr.h
>> +++ b/include/hw/pci-host/spapr.h
>> @@ -108,7 +108,7 @@ static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
>>  {
>>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>  
>> -    return xics_get_qirq(XICS_FABRIC(spapr), phb->lsi_table[pin].irq);
>> +    return spapr_irq_get_qirq(spapr, phb->lsi_table[pin].irq);
>>  }
>>  
>>  PCIHostState *spapr_create_phb(sPAPRMachineState *spapr, int index);
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 7a133f80411a..9a3885593c86 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -714,5 +714,6 @@ int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
>>  int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
>>                            bool align, Error **errp);
>>  void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
>> +qemu_irq spapr_irq_get_qirq(sPAPRMachineState *spapr, int irq);
>>  
>>  #endif /* HW_SPAPR_H */
>> diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
>> index 2e9685a5d900..404f1de2c046 100644
>> --- a/include/hw/ppc/spapr_vio.h
>> +++ b/include/hw/ppc/spapr_vio.h
>> @@ -87,7 +87,7 @@ static inline qemu_irq spapr_vio_qirq(VIOsPAPRDevice *dev)
>>  {
>>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>  
>> -    return xics_get_qirq(XICS_FABRIC(spapr), dev->irq);
>> +    return spapr_irq_get_qirq(spapr, dev->irq);
>>  }
>>  
>>  static inline bool spapr_vio_dma_valid(VIOsPAPRDevice *dev, uint64_t taddr,
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index cea462bc7f3e..2f1f35294e6d 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -183,7 +183,6 @@ typedef struct XICSFabricClass {
>>  
>>  void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>>  
>> -qemu_irq xics_get_qirq(XICSFabric *xi, int irq);
>>  ICPState *xics_icp_get(XICSFabric *xi, int server);
>>  
>>  /* Internal XICS interfaces */
> 


* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-11-23 20:31   ` Benjamin Herrenschmidt
@ 2017-11-24  8:15     ` Cédric Le Goater
  2017-11-26 21:52       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-24  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, qemu-ppc, qemu-devel, David Gibson

On 11/23/2017 09:31 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2017-11-23 at 14:29 +0100, Cédric Le Goater wrote:
>> The Event Queue Descriptor (EQD) table, also known as Event Notification
>> Descriptor (END), is one of the internal tables the XIVE interrupt
>> controller uses to redirect exceptions from event sources to CPU
>> threads.
> 
> Keep in mind that we want to only expose to the guest priorities 0..6
> as priority 7 will not be available with KVM on DD2.0 chips.

The hcall patch 19 introduces the "ibm,plat-res-int-priorities" property.
The priority ranges reserved by the hypervisor are the following :

    static const uint32_t reserved_priorities[] = {
        7,    /* start */
        0xf8, /* count */
    };

So the Linux driver is expected to choose priority 6. The validity of
the priority is then checked in each hcall, which returns H_P4/H_P3 on
failure.
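
As an illustration of that validity check, here is a minimal sketch of
how a (start, count) pair like the one above could be tested. The
helper name priority_is_reserved() is hypothetical and is not taken
from the patchset; only the table contents come from the snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same (start, count) pair as quoted for "ibm,plat-res-int-priorities". */
static const uint32_t reserved_priorities[] = {
    7,    /* start */
    0xf8, /* count */
};

/* Hypothetical helper: true if 'priority' falls in a reserved range. */
static bool priority_is_reserved(uint32_t priority)
{
    size_t n = sizeof(reserved_priorities) / sizeof(reserved_priorities[0]);
    size_t i;

    assert(n % 2 == 0); /* the table is a list of (start, count) pairs */

    for (i = 0; i < n; i += 2) {
        uint32_t start = reserved_priorities[i];
        uint32_t count = reserved_priorities[i + 1];

        /* priority - start is only evaluated when priority >= start */
        if (priority >= start && priority - start < count) {
            return true;
        }
    }
    return false;
}
```

With this table, priorities 0..6 are usable by the guest and 7..254
are reserved, so an hcall handler could reject a reserved priority
before touching any EQ.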

But it is true that we scale the arrays with :
 
    #define XIVE_PRIORITY_MAX  7

Do you want QEMU to completely remove prio 7 ? 

C. 

>> The EQD specifies on which Event Queue the event data should be posted
>> when an exception occurs (later on pulled by the OS) and which server
>> (VPD in XIVE terminology) to notify. The Event Queue is a much more
>> complex structure but we start with a simple model for the sPAPR
>> machine.
>>
>> There is one XiveEQ per priority and the model chooses to store them
>> under the Xive Interrupt presenter model. It will be retrieved, just
>> like for XICS, through the 'intc' object pointer of the CPU.
>>
>> The EQ indexing follows a simple pattern:
>>
>>        (server << 3) | (priority & 0x7)
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  hw/intc/xive-internal.h | 50 +++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 106 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 554b25e0884c..983317a6b3f6 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -23,6 +23,7 @@
>>  #include "sysemu/dma.h"
>>  #include "monitor/monitor.h"
>>  #include "hw/ppc/spapr_xive.h"
>> +#include "hw/ppc/spapr.h"
>>  #include "hw/ppc/xics.h"
>>  
>>  #include "xive-internal.h"
>> @@ -34,6 +35,8 @@ struct sPAPRXiveICP {
>>      uint8_t   tima[TM_RING_COUNT * 0x10];
>>      uint8_t   *tima_os;
>>      qemu_irq  output;
>> +
>> +    XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
>>  };
>>  
>>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
>> @@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
>>      },
>>  };
>>  
>> +static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
>> +{
>> +    PowerPCCPU *cpu = spapr_find_cpu(server);
>> +
>> +    return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
>> +}
>> +
>>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>  {
>>  
>> @@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
>>      sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
>>  
>>      memset(xicp->tima, 0, sizeof(xicp->tima));
>> +
>> +    memset(xicp->eqt, 0, sizeof(xicp->eqt));
>>  }
>>  
>>  static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
>> @@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
>>      xicp->tima_os = &xicp->tima[TM_QW1_OS];
>>  }
>>  
>> +static const VMStateDescription vmstate_spapr_xive_icp_eq = {
>> +    .name = TYPE_SPAPR_XIVE_ICP "/eq",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField []) {
>> +        VMSTATE_UINT32(w0, XiveEQ),
>> +        VMSTATE_UINT32(w1, XiveEQ),
>> +        VMSTATE_UINT32(w2, XiveEQ),
>> +        VMSTATE_UINT32(w3, XiveEQ),
>> +        VMSTATE_UINT32(w4, XiveEQ),
>> +        VMSTATE_UINT32(w5, XiveEQ),
>> +        VMSTATE_UINT32(w6, XiveEQ),
>> +        VMSTATE_UINT32(w7, XiveEQ),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  static bool vmstate_spapr_xive_icp_needed(void *opaque)
>>  {
>>      /* TODO check machine XIVE support */
>> @@ -696,6 +725,8 @@ static const VMStateDescription vmstate_spapr_xive_icp = {
>>      .needed = vmstate_spapr_xive_icp_needed,
>>      .fields = (VMStateField[]) {
>>          VMSTATE_BUFFER(tima, sPAPRXiveICP),
>> +        VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 1), 1,
>> +                             vmstate_spapr_xive_icp_eq, XiveEQ),
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> @@ -755,3 +786,28 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
>>      ive->w &= ~IVE_VALID;
>>      return true;
>>  }
>> +
>> +/*
>> + * Use a simple indexing for the EQs.
>> + */
>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t eq_idx)
>> +{
>> +    int priority = eq_idx & 0x7;
>> +    sPAPRXiveICP *xicp = spapr_xive_icp_get(xive, eq_idx >> 3);
>> +
>> +    return xicp ? &xicp->eqt[priority] : NULL;
>> +}
>> +
>> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server,
>> +                              uint8_t priority, uint32_t *out_eq_idx)
>> +{
>> +    if (priority > XIVE_PRIORITY_MAX) {
>> +        return false;
>> +    }
>> +
>> +    if (out_eq_idx) {
>> +        *out_eq_idx = (server << 3) | (priority & 0x7);
>> +    }
>> +
>> +    return true;
>> +}
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> index 7d329f203a9b..c3949671aa03 100644
>> --- a/hw/intc/xive-internal.h
>> +++ b/hw/intc/xive-internal.h
>> @@ -131,9 +131,59 @@ typedef struct XiveIVE {
>>  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>>  } XiveIVE;
>>  
>> +/* EQ */
>> +typedef struct XiveEQ {
>> +        uint32_t        w0;
>> +#define EQ_W0_VALID             PPC_BIT32(0)
>> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
>> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
>> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
>> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
>> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
>> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
>> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
>> +#define EQ_W0_SW0               PPC_BIT32(16)
>> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
>> +#define EQ_QSIZE_4K             0
>> +#define EQ_QSIZE_64K            4
>> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
>> +        uint32_t        w1;
>> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
>> +#define EQ_W1_ESn_P             PPC_BIT32(0)
>> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
>> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
>> +#define EQ_W1_ESe_P             PPC_BIT32(2)
>> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
>> +#define EQ_W1_GENERATION        PPC_BIT32(9)
>> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
>> +        uint32_t        w2;
>> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
>> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
>> +        uint32_t        w3;
>> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
>> +        uint32_t        w4;
>> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
>> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
>> +        uint32_t        w5;
>> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
>> +        uint32_t        w6;
>> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
>> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
>> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
>> +        uint32_t        w7;
>> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
>> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
>> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
>> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
>> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
>> +} XiveEQ;
>> +
>>  #define XIVE_PRIORITY_MAX  7
>>  
>>  void spapr_xive_reset(void *dev);
>>  XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
>> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server, uint8_t prio,
>> +                              uint32_t *out_eq_idx);
>>  
>>  #endif /* _INTC_XIVE_INTERNAL_H */
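
For reference, the EQ indexing pattern from the commit message,
(server << 3) | (priority & 0x7), can be sketched as a pair of
encode/decode helpers. The names eq_index_encode(), eq_index_server()
and eq_index_priority() are made up for this example; the patch itself
does this inside spapr_xive_eq_for_server() and spapr_xive_get_eq().

```c
#include <assert.h>
#include <stdint.h>

#define XIVE_PRIORITY_MAX  7

/* Hypothetical helper: pack a server number and a priority into an EQ index. */
static uint32_t eq_index_encode(uint32_t server, uint8_t priority)
{
    assert(priority <= XIVE_PRIORITY_MAX);
    return (server << 3) | (priority & 0x7);
}

/* Hypothetical helper: recover the server number from an EQ index. */
static uint32_t eq_index_server(uint32_t eq_idx)
{
    return eq_idx >> 3;
}

/* Hypothetical helper: recover the priority from an EQ index. */
static uint8_t eq_index_priority(uint32_t eq_idx)
{
    return eq_idx & 0x7;
}
```

The low three bits carry the priority, so each server owns a
contiguous block of eight indexes, one per priority 0..7.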


* Re: [Qemu-devel] [PATCH 05/25] spapr: introduce a spapr_irq_set() helper
  2017-11-24  3:16   ` David Gibson
@ 2017-11-24  8:32     ` Cédric Le Goater
  0 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-24  8:32 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/24/2017 04:16 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:35PM +0100, Cédric Le Goater wrote:
>> It will make synchronisation easier with the XIVE interrupt mode when
>> available. The 'irq' parameter refers to the global IRQ number space.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> s/spapr_irq_set/spapr_irq_set_lsi/
> 
> otherwise the name doesn't tell you what it sets.

That is when it gets confusing. This routine does two things :

 - it allocates the IRQ number 
 - it sets the type of the allocated IRQ number, LSI or MSI. 

because both pieces of information are held in the same flags field :

  ics->irqs[srcno].flags

But the main purpose of this routine is to do the allocation, 
so that is why I changed the name. 
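
To make the "one flag, two meanings" point concrete, here is a small
standalone sketch. The IrqState type and the IRQ_FLAG_* values are
illustrative stand-ins modelled on the XICS flags, not the actual QEMU
definitions, and irq_set_type() is only an approximation of what
ics_set_irq_type() is assumed to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values, assumed to mirror the XICS LSI/MSI flag bits. */
#define IRQ_FLAG_LSI   0x1
#define IRQ_FLAG_MSI   0x2
#define IRQ_FLAG_MASK  0x3

/* Stand-in for one entry of ics->irqs[]. */
typedef struct IrqState {
    uint8_t flags;
} IrqState;

/* An IRQ with neither type bit set is considered free. */
static bool irq_is_free(const IrqState *irq)
{
    return !(irq->flags & IRQ_FLAG_MASK);
}

/* Setting the type is also what marks the IRQ number as allocated. */
static void irq_set_type(IrqState *irq, bool lsi)
{
    assert(irq_is_free(irq));
    irq->flags |= lsi ? IRQ_FLAG_LSI : IRQ_FLAG_MSI;
}
```

Because a free IRQ is simply one with no type bit set, setting the
type and claiming the number cannot be separated in this layout.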

Now that you have the explanation, if you would still rather
have the '_lsi' suffix, please tell me. In any case, I will add
a comment on what the routine is doing.

Thanks,

C.    
 
 
> With that change,
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
>> ---
>>  hw/ppc/spapr.c | 11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 7ae84d40bdb4..79f38a9ff4e1 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3594,6 +3594,11 @@ static int ics_find_free_block(ICSState *ics, int num, int alignnum)
>>      return -1;
>>  }
>>  
>> +static void spapr_irq_set(sPAPRMachineState *spapr, int irq, bool lsi)
>> +{
>> +    ics_set_irq_type(spapr->ics, irq - spapr->ics->offset, lsi);
>> +}
>> +
>>  int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
>>                      Error **errp)
>>  {
>> @@ -3618,7 +3623,7 @@ int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
>>          irq += ics->offset;
>>      }
>>  
>> -    ics_set_irq_type(ics, irq - ics->offset, lsi);
>> +    spapr_irq_set(spapr, irq, lsi);
>>      trace_spapr_irq_alloc(irq);
>>  
>>      return irq;
>> @@ -3657,10 +3662,10 @@ int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
>>          return -1;
>>      }
>>  
>> +    first += ics->offset;
>>      for (i = first; i < first + num; ++i) {
>> -        ics_set_irq_type(ics, i, lsi);
>> +        spapr_irq_set(spapr, i, lsi);
>>      }
>> -    first += ics->offset;
>>  
>>      trace_spapr_irq_alloc_block(first, num, lsi, align);
>>  
> 


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 01/25] ppc/xics: introduce an icp_create() helper
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 01/25] ppc/xics: introduce an icp_create() helper Cédric Le Goater
  2017-11-24  2:51   ` David Gibson
@ 2017-11-24  9:08   ` Greg Kurz
  1 sibling, 0 replies; 128+ messages in thread
From: Greg Kurz @ 2017-11-24  9:08 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt

On Thu, 23 Nov 2017 14:29:31 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> The sPAPR and the PowerNV core objects create the interrupt presenter
> object of the CPUs in a very similar way. Let's provide a common
> routine in which we use the presenter 'type' as a child identifier.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/intc/xics.c          | 22 ++++++++++++++++++++++
>  hw/ppc/pnv_core.c       | 10 +---------
>  hw/ppc/spapr_cpu_core.c | 13 ++-----------
>  include/hw/ppc/xics.h   |  3 +++
>  4 files changed, 28 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index a1cc0e420c98..e4ccdff8f577 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -384,6 +384,28 @@ static const TypeInfo icp_info = {
>      .class_size = sizeof(ICPStateClass),
>  };
>  
> +Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi, Error **errp)
> +{
> +    Object *child = OBJECT(cs);
> +    Error *local_err = NULL;
> +    Object *obj;
> +
> +    obj = object_new(type);
> +    object_property_add_child(child, type, obj, &error_abort);
> +    object_unref(obj);
> +    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
> +                                   &error_abort);
> +    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
> +    object_property_set_bool(obj, true, "realized", &local_err);
> +    if (local_err) {
> +        object_unparent(obj);
> +        error_propagate(errp, local_err);
> +        obj = NULL;
> +    }
> +
> +    return obj;
> +}
> +
>  /*
>   * ICS: Source layer
>   */
> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
> index 82ff440b3334..a066736846f8 100644
> --- a/hw/ppc/pnv_core.c
> +++ b/hw/ppc/pnv_core.c
> @@ -126,7 +126,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>      Error *local_err = NULL;
>      CPUState *cs = CPU(child);
>      PowerPCCPU *cpu = POWERPC_CPU(cs);
> -    Object *obj;
>  
>      object_property_set_bool(child, true, "realized", &local_err);
>      if (local_err) {
> @@ -134,13 +133,7 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>          return;
>      }
>  
> -    obj = object_new(TYPE_PNV_ICP);
> -    object_property_add_child(child, "icp", obj, NULL);
> -    object_unref(obj);
> -    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
> -                                   &error_abort);
> -    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
> -    object_property_set_bool(obj, true, "realized", &local_err);
> +    icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
> @@ -148,7 +141,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>  
>      powernv_cpu_init(cpu, &local_err);
>      if (local_err) {
> -        object_unparent(obj);
>          error_propagate(errp, local_err);
>          return;
>      }
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 4ba8563d49e4..f8a520a2fa2d 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -111,7 +111,6 @@ static void spapr_cpu_core_realize_child(Object *child,
>      Error *local_err = NULL;
>      CPUState *cs = CPU(child);
>      PowerPCCPU *cpu = POWERPC_CPU(cs);
> -    Object *obj;
>  
>      object_property_set_bool(child, true, "realized", &local_err);
>      if (local_err) {
> @@ -123,21 +122,13 @@ static void spapr_cpu_core_realize_child(Object *child,
>          goto error;
>      }
>  
> -    obj = object_new(spapr->icp_type);
> -    object_property_add_child(child, "icp", obj, &error_abort);
> -    object_unref(obj);
> -    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(spapr),
> -                                   &error_abort);
> -    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
> -    object_property_set_bool(obj, true, "realized", &local_err);
> +    icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
>      if (local_err) {
> -        goto free_icp;
> +        goto error;
>      }
>  
>      return;
>  
> -free_icp:
> -    object_unparent(obj);
>  error:
>      error_propagate(errp, local_err);
>  }
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 2df99be111ce..126b47dec38b 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -212,4 +212,7 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>  void xics_spapr_init(sPAPRMachineState *spapr);
>  
> +Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi,
> +                   Error **errp);
> +
>  #endif /* XICS_H */


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/25] ppc/xics: assign of the CPU 'intc' pointer under the core
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 02/25] ppc/xics: assign of the CPU 'intc' pointer under the core Cédric Le Goater
  2017-11-24  2:57   ` David Gibson
@ 2017-11-24  9:21   ` Greg Kurz
  1 sibling, 0 replies; 128+ messages in thread
From: Greg Kurz @ 2017-11-24  9:21 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt

On Thu, 23 Nov 2017 14:29:32 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> The 'intc' pointer of the CPU references the interrupt presenter in
> the XICS interrupt mode. When the XIVE interrupt mode is available and
> activated, the machine will need to reassign this pointer to reflect
> the change.
> 
> Moving this assignment under the realize routine of the CPU will ease
> the process when the interrupt mode is toggled.
> 

No surprise since this was violating CPU internals actually :)

> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/intc/xics.c          | 1 -
>  hw/ppc/pnv_core.c       | 2 +-
>  hw/ppc/spapr_cpu_core.c | 2 +-
>  3 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index e4ccdff8f577..0f2e7273bc8f 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -334,7 +334,6 @@ static void icp_realize(DeviceState *dev, Error **errp)
>      }
>  
>      cpu = POWERPC_CPU(obj);
> -    cpu->intc = OBJECT(icp);
>      icp->cs = CPU(obj);
>  
>      env = &cpu->env;
> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
> index a066736846f8..90acaac45889 100644
> --- a/hw/ppc/pnv_core.c
> +++ b/hw/ppc/pnv_core.c
> @@ -133,7 +133,7 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
>          return;
>      }
>  
> -    icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
> +    cpu->intc = icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index f8a520a2fa2d..f7cc74512481 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -122,7 +122,7 @@ static void spapr_cpu_core_realize_child(Object *child,
>          goto error;
>      }
>  
> -    icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
> +    cpu->intc = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
>      if (local_err) {
>          goto error;
>      }


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 01/25] ppc/xics: introduce an icp_create() helper
  2017-11-24  2:51   ` David Gibson
  2017-11-24  7:57     ` Cédric Le Goater
@ 2017-11-24  9:55     ` Greg Kurz
  2017-11-27  7:20       ` David Gibson
  1 sibling, 1 reply; 128+ messages in thread
From: Greg Kurz @ 2017-11-24  9:55 UTC (permalink / raw)
  To: David Gibson; +Cc: Cédric Le Goater, qemu-ppc, qemu-devel


On Fri, 24 Nov 2017 13:51:00 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Nov 23, 2017 at 02:29:31PM +0100, Cédric Le Goater wrote:
> > The sPAPR and the PowerNV core objects create the interrupt presenter
> > object of the CPUs in a very similar way. Let's provide a common
> > routine in which we use the presenter 'type' as a child identifier.
> > 
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>  
> 
> One tiny nit; apart from that
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> > ---
> >  hw/intc/xics.c          | 22 ++++++++++++++++++++++
> >  hw/ppc/pnv_core.c       | 10 +---------
> >  hw/ppc/spapr_cpu_core.c | 13 ++-----------
> >  include/hw/ppc/xics.h   |  3 +++
> >  4 files changed, 28 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> > index a1cc0e420c98..e4ccdff8f577 100644
> > --- a/hw/intc/xics.c
> > +++ b/hw/intc/xics.c
> > @@ -384,6 +384,28 @@ static const TypeInfo icp_info = {
> >      .class_size = sizeof(ICPStateClass),
> >  };
> >  
> > +Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi, Error **errp)
> > +{
> > +    Object *child = OBJECT(cs);  
> 
> In the original context 'child' made sense, since it was the child
> object of the core.  Here, it's misleading, since it's the parent of
> the xics link.  It's only used in a couple of places, so I suggest you

Oops yes :)

> just opencode OBJECT(cs) in each place.
> 

or rename child to owner, as it is done with DRCs and TCE tables.

> > +    Error *local_err = NULL;
> > +    Object *obj;
> > +
> > +    obj = object_new(type);
> > +    object_property_add_child(child, type, obj, &error_abort);
> > +    object_unref(obj);
> > +    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
> > +                                   &error_abort);
> > +    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
> > +    object_property_set_bool(obj, true, "realized", &local_err);
> > +    if (local_err) {
> > +        object_unparent(obj);
> > +        error_propagate(errp, local_err);
> > +        obj = NULL;
> > +    }
> > +
> > +    return obj;
> > +}
> > +
> >  /*
> >   * ICS: Source layer
> >   */
> > diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
> > index 82ff440b3334..a066736846f8 100644
> > --- a/hw/ppc/pnv_core.c
> > +++ b/hw/ppc/pnv_core.c
> > @@ -126,7 +126,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
> >      Error *local_err = NULL;
> >      CPUState *cs = CPU(child);
> >      PowerPCCPU *cpu = POWERPC_CPU(cs);
> > -    Object *obj;
> >  
> >      object_property_set_bool(child, true, "realized", &local_err);
> >      if (local_err) {
> > @@ -134,13 +133,7 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
> >          return;
> >      }
> >  
> > -    obj = object_new(TYPE_PNV_ICP);
> > -    object_property_add_child(child, "icp", obj, NULL);
> > -    object_unref(obj);
> > -    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
> > -                                   &error_abort);
> > -    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
> > -    object_property_set_bool(obj, true, "realized", &local_err);
> > +    icp_create(cs, TYPE_PNV_ICP, xi, &local_err);
> >      if (local_err) {
> >          error_propagate(errp, local_err);
> >          return;
> > @@ -148,7 +141,6 @@ static void pnv_core_realize_child(Object *child, XICSFabric *xi, Error **errp)
> >  
> >      powernv_cpu_init(cpu, &local_err);
> >      if (local_err) {
> > -        object_unparent(obj);
> >          error_propagate(errp, local_err);
> >          return;
> >      }
> > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > index 4ba8563d49e4..f8a520a2fa2d 100644
> > --- a/hw/ppc/spapr_cpu_core.c
> > +++ b/hw/ppc/spapr_cpu_core.c
> > @@ -111,7 +111,6 @@ static void spapr_cpu_core_realize_child(Object *child,
> >      Error *local_err = NULL;
> >      CPUState *cs = CPU(child);
> >      PowerPCCPU *cpu = POWERPC_CPU(cs);
> > -    Object *obj;
> >  
> >      object_property_set_bool(child, true, "realized", &local_err);
> >      if (local_err) {
> > @@ -123,21 +122,13 @@ static void spapr_cpu_core_realize_child(Object *child,
> >          goto error;
> >      }
> >  
> > -    obj = object_new(spapr->icp_type);
> > -    object_property_add_child(child, "icp", obj, &error_abort);
> > -    object_unref(obj);
> > -    object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(spapr),
> > -                                   &error_abort);
> > -    object_property_add_const_link(obj, ICP_PROP_CPU, child, &error_abort);
> > -    object_property_set_bool(obj, true, "realized", &local_err);
> > +    icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
> >      if (local_err) {
> > -        goto free_icp;
> > +        goto error;
> >      }
> >  
> >      return;
> >  
> > -free_icp:
> > -    object_unparent(obj);
> >  error:
> >      error_propagate(errp, local_err);
> >  }
> > diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> > index 2df99be111ce..126b47dec38b 100644
> > --- a/include/hw/ppc/xics.h
> > +++ b/include/hw/ppc/xics.h
> > @@ -212,4 +212,7 @@ typedef struct sPAPRMachineState sPAPRMachineState;
> >  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
> >  void xics_spapr_init(sPAPRMachineState *spapr);
> >  
> > +Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi,
> > +                   Error **errp);
> > +
> >  #endif /* XICS_H */  
> 



* Re: [Qemu-devel] [Qemu-ppc] [PATCH 03/25] spapr: introduce a spapr_icp_create() helper
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 03/25] spapr: introduce a spapr_icp_create() helper Cédric Le Goater
@ 2017-11-24 10:09   ` Greg Kurz
  2017-11-24 12:26     ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: Greg Kurz @ 2017-11-24 10:09 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt

On Thu, 23 Nov 2017 14:29:33 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> On sPAPR, the creation of the interrupt presenter depends on some of
> the machine attributes. When the XIVE interrupt mode is available,
> this will get more complex. So provide a machine-level helper to
> isolate the process and hide the details from the sPAPR core realize
> function.
> 

Not sure it makes sense to introduce this helper that early in the series...
what about folding it in patch 23 where it is really needed ?

> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/ppc/spapr.c          | 14 ++++++++++++++
>  hw/ppc/spapr_cpu_core.c |  2 +-
>  include/hw/ppc/spapr.h  |  2 ++
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 174e7ff0678d..925cbd3c1bf4 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3556,6 +3556,20 @@ static ICPState *spapr_icp_get(XICSFabric *xi, int vcpu_id)
>      return cpu ? ICP(cpu->intc) : NULL;
>  }
>  
> +Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    Object *obj;
> +
> +    obj = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +
> +    return obj;
> +}
> +
>  static void spapr_pic_print_info(InterruptStatsProvider *obj,
>                                   Monitor *mon)
>  {
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index f7cc74512481..61a9850e688b 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -122,7 +122,7 @@ static void spapr_cpu_core_realize_child(Object *child,
>          goto error;
>      }
>  
> -    cpu->intc = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
> +    cpu->intc = spapr_icp_create(spapr, cs, &local_err);
>      if (local_err) {
>          goto error;
>      }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9d21ca9bde3a..9da38de34277 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -707,4 +707,6 @@ void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>  int spapr_vcpu_id(PowerPCCPU *cpu);
>  PowerPCCPU *spapr_find_cpu(int vcpu_id);
>  
> +Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp);
> +
>  #endif /* HW_SPAPR_H */


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 03/25] spapr: introduce a spapr_icp_create() helper
  2017-11-24 10:09   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2017-11-24 12:26     ` Cédric Le Goater
  2017-11-28 10:56       ` Greg Kurz
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-24 12:26 UTC (permalink / raw)
  To: Greg Kurz; +Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt

On 11/24/2017 10:09 AM, Greg Kurz wrote:
> On Thu, 23 Nov 2017 14:29:33 +0100
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>> On sPAPR, the creation of the interrupt presenter depends on some of
>> the machine attributes. When the XIVE interrupt mode is available,
>> this will get more complex. So provide a machine-level helper to
>> isolate the process and hide the details from the sPAPR core realize
>> function.
>>
> 
> Not sure it makes sense to introduce this helper that early in the series...
> what about folding it in patch 23 where it is really needed ?

It does hide 'icp_type' and the 'xics_fabric', which are machine concepts
around the sPAPR interrupt controller model.
 
But yes, it could come before patch 23. Maybe not folded, though.

C.


>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/ppc/spapr.c          | 14 ++++++++++++++
>>  hw/ppc/spapr_cpu_core.c |  2 +-
>>  include/hw/ppc/spapr.h  |  2 ++
>>  3 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 174e7ff0678d..925cbd3c1bf4 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3556,6 +3556,20 @@ static ICPState *spapr_icp_get(XICSFabric *xi, int vcpu_id)
>>      return cpu ? ICP(cpu->intc) : NULL;
>>  }
>>  
>> +Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +    Object *obj;
>> +
>> +    obj = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return NULL;
>> +    }
>> +
>> +    return obj;
>> +}
>> +
>>  static void spapr_pic_print_info(InterruptStatsProvider *obj,
>>                                   Monitor *mon)
>>  {
>> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
>> index f7cc74512481..61a9850e688b 100644
>> --- a/hw/ppc/spapr_cpu_core.c
>> +++ b/hw/ppc/spapr_cpu_core.c
>> @@ -122,7 +122,7 @@ static void spapr_cpu_core_realize_child(Object *child,
>>          goto error;
>>      }
>>  
>> -    cpu->intc = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
>> +    cpu->intc = spapr_icp_create(spapr, cs, &local_err);
>>      if (local_err) {
>>          goto error;
>>      }
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 9d21ca9bde3a..9da38de34277 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -707,4 +707,6 @@ void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>>  int spapr_vcpu_id(PowerPCCPU *cpu);
>>  PowerPCCPU *spapr_find_cpu(int vcpu_id);
>>  
>> +Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp);
>> +
>>  #endif /* HW_SPAPR_H */
> 


* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-11-24  8:15     ` Cédric Le Goater
@ 2017-11-26 21:52       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-11-26 21:52 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc, qemu-devel, David Gibson

On Fri, 2017-11-24 at 09:15 +0100, Cédric Le Goater wrote:
> So the Linux driver is expected to choose priority 6. The priority
> validity is then checked in each hcall returning H_P4/H_P3 in case of 
> failure.  
> 
> But it is true that we scale the arrays with :
>  
>     #define XIVE_PRIORITY_MAX  7
> 
> Do you want QEMU to completely remove prio 7 ? 

I'd like qemu to be consistent, at least make sure it errors out if the
OS tries to configure prio 7 or route an irq to it.

Cheers,
Ben.
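
A minimal sketch of the kind of check being asked for, assuming priority 7
is the only one the guest must not use. All names below (the sketch_*
functions and the SKETCH_* constants) are hypothetical and are not taken
from the patch set; a real hcall would return H_P3 or H_P4 as mentioned
above rather than the placeholder error code used here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: priorities 0..7 exist, 7 is refused to the guest. */
#define SKETCH_PRIORITY_COUNT     8
#define SKETCH_PRIORITY_RESERVED  7

/* Placeholder return codes standing in for the hcall status values. */
enum sketch_rc {
    SKETCH_OK = 0,
    SKETCH_BAD_PRIORITY = -1,
};

/* True only for priorities the guest is allowed to configure. */
static bool sketch_priority_is_valid(uint32_t prio)
{
    return prio < SKETCH_PRIORITY_COUNT && prio != SKETCH_PRIORITY_RESERVED;
}

/* Entry point of a hypothetical "configure queue" or "route irq" request. */
static enum sketch_rc sketch_set_queue_config(uint32_t prio)
{
    if (!sketch_priority_is_valid(prio)) {
        return SKETCH_BAD_PRIORITY;
    }
    /* ... the event queue for this priority would be configured here ... */
    return SKETCH_OK;
}
```

With such a helper, every path that takes a priority from the OS can call
the same check, so the arrays may stay scaled for 8 entries while prio 7 is
still rejected consistently.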


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 01/25] ppc/xics: introduce an icp_create() helper
  2017-11-24  9:55     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2017-11-27  7:20       ` David Gibson
  0 siblings, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-11-27  7:20 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Cédric Le Goater, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]

On Fri, Nov 24, 2017 at 10:55:47AM +0100, Greg Kurz wrote:
> On Fri, 24 Nov 2017 13:51:00 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Nov 23, 2017 at 02:29:31PM +0100, Cédric Le Goater wrote:
> > > The sPAPR and the PowerNV core objects create the interrupt presenter
> > > object of the CPUs in a very similar way. Let's provide a common
> > > routine in which we use the presenter 'type' as a child identifier.
> > > 
> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>  
> > 
> > One tiny nit.., apart from that
> > 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > 
> > > ---
> > >  hw/intc/xics.c          | 22 ++++++++++++++++++++++
> > >  hw/ppc/pnv_core.c       | 10 +---------
> > >  hw/ppc/spapr_cpu_core.c | 13 ++-----------
> > >  include/hw/ppc/xics.h   |  3 +++
> > >  4 files changed, 28 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> > > index a1cc0e420c98..e4ccdff8f577 100644
> > > --- a/hw/intc/xics.c
> > > +++ b/hw/intc/xics.c
> > > @@ -384,6 +384,28 @@ static const TypeInfo icp_info = {
> > >      .class_size = sizeof(ICPStateClass),
> > >  };
> > >  
> > > +Object *icp_create(CPUState *cs, const char *type, XICSFabric *xi, Error **errp)
> > > +{
> > > +    Object *child = OBJECT(cs);  
> > 
> > In the original context 'child' made sense, since it was the child
> > object of the core.  Here, it's misleading, since it's the parent of
> > the xics link.  It's only used in a couple of places, so I suggest you
> 
> Oops yes :)
> 
> > just opencode OBJECT(cs) in each place.
> > 
> 
> or rename child to owner, as it is done with DRCs and TCE tables.

Sure.  Either's fine by me.


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller Cédric Le Goater
@ 2017-11-28  5:40   ` David Gibson
  2017-11-28 10:44     ` Cédric Le Goater
  2017-11-29 11:49   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  1 sibling, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-28  5:40 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 11694 bytes --]

On Thu, Nov 23, 2017 at 02:29:38PM +0100, Cédric Le Goater wrote:
> The XIVE interrupt controller uses a set of tables to redirect exceptions
> from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
> table, also known as Event Assignment Structure (EAS), is one of them.
> 
> The XIVE model is designed to make use of the full range of the IRQ
> number space and does not use an offset like the XICS mode does.
> Hence, the IVE table is directly indexed by the IRQ number.
> 
> The IVE stores Event Queue data associated with a source. The lookups
> are performed when the source is configured or when an event is
> triggered.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  default-configs/ppc64-softmmu.mak |   1 +
>  hw/intc/Makefile.objs             |   1 +
>  hw/intc/spapr_xive.c              | 165 ++++++++++++++++++++++++++++++++++++++
>  hw/intc/xive-internal.h           |  50 ++++++++++++
>  include/hw/ppc/spapr_xive.h       |  44 ++++++++++
>  5 files changed, 261 insertions(+)
>  create mode 100644 hw/intc/spapr_xive.c
>  create mode 100644 hw/intc/xive-internal.h
>  create mode 100644 include/hw/ppc/spapr_xive.h
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index d1b3a6dd50f8..4a7f6a0696de 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -56,6 +56,7 @@ CONFIG_SM501=y
>  CONFIG_XICS=$(CONFIG_PSERIES)
>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>  # For PReP
>  CONFIG_SERIAL_ISA=y
>  CONFIG_MC146818RTC=y
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index ae358569a155..49e13e7aeeee 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
>  obj-$(CONFIG_XICS) += xics.o
>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> new file mode 100644
> index 000000000000..b2fc3007c85f
> --- /dev/null
> +++ b/hw/intc/spapr_xive.c
> @@ -0,0 +1,165 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE model
> + *
> + * Copyright (c) 2017, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qapi/error.h"
> +#include "target/ppc/cpu.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/dma.h"
> +#include "monitor/monitor.h"
> +#include "hw/ppc/spapr_xive.h"
> +
> +#include "xive-internal.h"
> +
> +/*
> + * Main XIVE object
> + */
> +
> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> +{
> +    int i;
> +
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        XiveIVE *ive = &xive->ivt[i];
> +
> +        if (!(ive->w & IVE_VALID)) {
> +            continue;
> +        }
> +
> +        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
> +                       ive->w & IVE_MASKED ? "M" : " ",
> +                       (int) GETFIELD(IVE_EQ_INDEX, ive->w),
> +                       (int) GETFIELD(IVE_EQ_DATA, ive->w));
> +    }
> +}
> +
> +void spapr_xive_reset(void *dev)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +    int i;
> +
> +    /* Mask all valid IVEs in the IRQ number space. */
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        XiveIVE *ive = &xive->ivt[i];
> +        if (ive->w & IVE_VALID) {
> +            ive->w |= IVE_MASKED;
> +        }
> +    }
> +}
> +
> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +
> +    if (!xive->nr_irqs) {
> +        error_setg(errp, "Number of interrupts needs to be greater than 0");
> +        return;
> +    }
> +
> +    /* Allocate the IVT (Interrupt Virtualization Table) */
> +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
> +
> +    qemu_register_reset(spapr_xive_reset, dev);
> +}
> +
> +static const VMStateDescription vmstate_spapr_xive_ive = {
> +    .name = TYPE_SPAPR_XIVE "/ive",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT64(w, XiveIVE),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool vmstate_spapr_xive_needed(void *opaque)
> +{
> +    /* TODO check machine XIVE support */
> +    return true;
> +}
> +
> +static const VMStateDescription vmstate_spapr_xive = {
> +    .name = TYPE_SPAPR_XIVE,
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = vmstate_spapr_xive_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
> +                                           vmstate_spapr_xive_ive, XiveIVE),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static Property spapr_xive_properties[] = {
> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void spapr_xive_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->realize = spapr_xive_realize;
> +    dc->props = spapr_xive_properties;
> +    dc->desc = "sPAPR XIVE interrupt controller";
> +    dc->vmsd = &vmstate_spapr_xive;
> +}
> +
> +static const TypeInfo spapr_xive_info = {
> +    .name = TYPE_SPAPR_XIVE,
> +    .parent = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(sPAPRXive),
> +    .class_init = spapr_xive_class_init,
> +};
> +
> +static void spapr_xive_register_types(void)
> +{
> +    type_register_static(&spapr_xive_info);
> +}
> +
> +type_init(spapr_xive_register_types)
> +
> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
> +{
> +    return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
> +}
> +
> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
> +{
> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
> +
> +    if (!ive) {
> +        return false;
> +    }
> +
> +    ive->w |= IVE_VALID;
> +    return true;
> +}

As I said in another comment, I don't like the name: sets what, exactly?

It's not really clear to me what the VALID bit means.  Why would an
irq within the allocated range be invalid?

> +
> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
> +{
> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
> +
> +    if (!ive) {
> +        return false;
> +    }
> +
> +    ive->w &= ~IVE_VALID;
> +    return true;
> +}
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> new file mode 100644
> index 000000000000..bea88d82992c
> --- /dev/null
> +++ b/hw/intc/xive-internal.h
> @@ -0,0 +1,50 @@
> +/*
> + * QEMU PowerPC XIVE model
> + *
> + * Copyright 2016,2017 IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#ifndef _INTC_XIVE_INTERNAL_H
> +#define _INTC_XIVE_INTERNAL_H
> +
> +/* Utilities to manipulate these (originally from OPAL) */
> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
> +#define SETFIELD(m, v, val)                             \
> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
> +
> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
> +                                 PPC_BIT32(bs))
> +
> +/* IVE/EAS
> + *
> + * One per interrupt source. Targets that interrupt to a given EQ
> + * and provides the corresponding logical interrupt number (EQ data)
> + *
> + * We also map this structure to the escalation descriptor inside
> + * an EQ, though in that case the valid and masked bits are not used.
> + */
> +typedef struct XiveIVE {
> +        /* Use a single 64-bit definition to make it easier to
> +         * perform atomic updates
> +         */
> +        uint64_t        w;
> +#define IVE_VALID       PPC_BIT(0)
> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> +} XiveIVE;
> +
> +void spapr_xive_reset(void *dev);
> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
> +
> +#endif /* _INTC_XIVE_INTERNAL_H */
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> new file mode 100644
> index 000000000000..795b3f4ded7c
> --- /dev/null
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -0,0 +1,44 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE model
> + *
> + * Copyright (c) 2017, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef PPC_SPAPR_XIVE_H
> +#define PPC_SPAPR_XIVE_H
> +
> +#include <hw/sysbus.h>
> +
> +typedef struct sPAPRXive sPAPRXive;
> +typedef struct XiveIVE XiveIVE;
> +
> +#define TYPE_SPAPR_XIVE "spapr-xive"
> +#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
> +
> +struct sPAPRXive {
> +    SysBusDevice parent;
> +
> +    /* Properties */
> +    uint32_t     nr_irqs;
> +
> +    /* XIVE internal tables */
> +    XiveIVE      *ivt;
> +};
> +
> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn);
> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> +
> +#endif /* PPC_SPAPR_XIVE_H */
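
For readers unfamiliar with the IBM bit numbering used by the IVE
definitions above (bit 0 is the most significant bit of the 64-bit word),
here is a small standalone sketch. The macros mirror the quoted
xive-internal.h, except that this sketch uses a ULL constant and
__builtin_ffsll (a GCC/Clang builtin) so that it also behaves on hosts
where long is 32 bits wide; that adjustment and the sketch_make_ive()
helper are assumptions of this example, not part of the patch.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit word. */
#define PPC_BIT(bit)         (0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be)  ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* Shift of the least significant set bit of a mask. */
#define MASK_TO_LSH(m)       (__builtin_ffsll(m) - 1)
#define GETFIELD(m, v)       (((v) & (m)) >> MASK_TO_LSH(m))
#define SETFIELD(m, v, val) \
    (((v) & ~(m)) | ((((uint64_t)(val)) << MASK_TO_LSH(m)) & (m)))

/* Field layout copied from the quoted XiveIVE definition. */
#define IVE_VALID     PPC_BIT(0)
#define IVE_EQ_INDEX  PPC_BITMASK(8, 31)
#define IVE_MASKED    PPC_BIT(32)
#define IVE_EQ_DATA   PPC_BITMASK(33, 63)

/* Hypothetical helper: build a valid, unmasked IVE word. */
static uint64_t sketch_make_ive(uint32_t eq_index, uint32_t eq_data)
{
    uint64_t w = IVE_VALID;

    w = SETFIELD(IVE_EQ_INDEX, w, eq_index);
    w = SETFIELD(IVE_EQ_DATA, w, eq_data);
    return w;
}
```

For instance, sketch_make_ive(0x12, 0x1234) places 0x12 in bits 8..31 and
0x1234 in bits 33..63, and GETFIELD() recovers both values unchanged.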

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 09/25] spapr: introduce handlers for XIVE interrupt sources
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 09/25] spapr: introduce handlers for XIVE interrupt sources Cédric Le Goater
@ 2017-11-28  5:45   ` David Gibson
  2017-11-28 18:18     ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-28  5:45 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 5582 bytes --]

On Thu, Nov 23, 2017 at 02:29:39PM +0100, Cédric Le Goater wrote:
> These are very similar to the XICS handlers in a simpler form. They make
> use of a status array for the LSI interrupts. The spapr_xive_irq() routine
> in charge of triggering the CPU interrupt line will be filled in later on.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Is the status word you add here architected as part of the XIVE spec,
or purely internal / implementation specific?

> ---
>  hw/intc/spapr_xive.c        | 55 +++++++++++++++++++++++++++++++++++++++++++--
>  include/hw/ppc/spapr_xive.h | 14 +++++++++++-
>  2 files changed, 66 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index b2fc3007c85f..66c533fb1d78 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -26,6 +26,47 @@
>  
>  #include "xive-internal.h"
>  
> +static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> +{
> +
> +}
> +
> +/*
> + * XIVE Interrupt Source
> + */
> +static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int lisn, int val)
> +{
> +    if (val) {
> +        spapr_xive_irq(xive, lisn);
> +    }
> +}
> +
> +static void spapr_xive_source_set_irq_lsi(sPAPRXive *xive, int lisn, int val)
> +{
> +    if (val) {
> +        xive->status[lisn] |= XIVE_STATUS_ASSERTED;
> +    } else {
> +        xive->status[lisn] &= ~XIVE_STATUS_ASSERTED;
> +    }
> +
> +    if (xive->status[lisn] & XIVE_STATUS_ASSERTED &&
> +        !(xive->status[lisn] & XIVE_STATUS_SENT)) {
> +        xive->status[lisn] |= XIVE_STATUS_SENT;
> +        spapr_xive_irq(xive, lisn);
> +    }
> +}
> +
> +static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> +
> +    if (spapr_xive_irq_is_lsi(xive, lisn)) {
> +        spapr_xive_source_set_irq_lsi(xive, lisn, val);
> +    } else {
> +        spapr_xive_source_set_irq_msi(xive, lisn, val);
> +    }
> +}
> +
>  /*
>   * Main XIVE object
>   */
> @@ -41,7 +82,8 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>              continue;
>          }
>  
> -        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
> +        monitor_printf(mon, "  %4x %s %s %08x %08x\n", i,
> +                       spapr_xive_irq_is_lsi(xive, i) ? "LSI" : "MSI",
>                         ive->w & IVE_MASKED ? "M" : " ",
>                         (int) GETFIELD(IVE_EQ_INDEX, ive->w),
>                         (int) GETFIELD(IVE_EQ_DATA, ive->w));
> @@ -53,6 +95,8 @@ void spapr_xive_reset(void *dev)
>      sPAPRXive *xive = SPAPR_XIVE(dev);
>      int i;
>  
> +    /* Do not clear IRQs status */
> +
>      /* Mask all valid IVEs in the IRQ number space. */
>      for (i = 0; i < xive->nr_irqs; i++) {
>          XiveIVE *ive = &xive->ivt[i];
> @@ -71,6 +115,11 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    /* QEMU IRQs */
> +    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
> +                                     xive->nr_irqs);
> +    xive->status = g_malloc0(xive->nr_irqs);
> +
>      /* Allocate the IVT (Interrupt Virtualization Table) */
>      xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
>  
> @@ -102,6 +151,7 @@ static const VMStateDescription vmstate_spapr_xive = {
>          VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
>          VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
>                                             vmstate_spapr_xive_ive, XiveIVE),
> +        VMSTATE_VBUFFER_UINT32(status, sPAPRXive, 1, NULL, nr_irqs),
>          VMSTATE_END_OF_LIST()
>      },
>  };
> @@ -140,7 +190,7 @@ XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
>      return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
>  }
>  
> -bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn, bool lsi)
>  {
>      XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
>  
> @@ -149,6 +199,7 @@ bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
>      }
>  
>      ive->w |= IVE_VALID;
> +    xive->status[lisn] |= lsi ? XIVE_STATUS_LSI : 0;

How does a hardware XIVE know which irqs are LSI and which are MSI?

>      return true;
>  }
>  
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 795b3f4ded7c..6a799cdaba66 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -33,11 +33,23 @@ struct sPAPRXive {
>      /* Properties */
>      uint32_t     nr_irqs;
>  
> +     /* IRQ */
> +    qemu_irq     *qirqs;
> +#define XIVE_STATUS_LSI                0x1
> +#define XIVE_STATUS_ASSERTED           0x2
> +#define XIVE_STATUS_SENT               0x4
> +    uint8_t      *status;
> +
>      /* XIVE internal tables */
>      XiveIVE      *ivt;
>  };
>  
> -bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn);
> +static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
> +{
> +    return xive->status[lisn] & XIVE_STATUS_LSI;
> +}
> +
> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn, bool lsi);
>  bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>  
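
To make the role of the ASSERTED and SENT status bits concrete, here is a
standalone sketch of the LSI path quoted above. The sketch_lsi_set()
function follows spapr_xive_source_set_irq_lsi() from the patch;
sketch_notify() is a hypothetical stand-in for spapr_xive_irq(), and
sketch_lsi_eoi() is an assumption about how an EOI would interact with the
latch (clear SENT, then notify again if the line is still asserted), since
this patch does not yet contain an EOI path.

```c
#include <stdint.h>

#define XIVE_STATUS_LSI       0x1
#define XIVE_STATUS_ASSERTED  0x2
#define XIVE_STATUS_SENT      0x4

/* Hypothetical stand-in for spapr_xive_irq(): counts notifications. */
static unsigned sketch_notifications;

static void sketch_notify(void)
{
    sketch_notifications++;
}

/* Level change on an LSI: notify once per assertion until the next EOI. */
static void sketch_lsi_set(uint8_t *status, int val)
{
    if (val) {
        *status |= XIVE_STATUS_ASSERTED;
    } else {
        *status &= (uint8_t)~XIVE_STATUS_ASSERTED;
    }

    if ((*status & XIVE_STATUS_ASSERTED) && !(*status & XIVE_STATUS_SENT)) {
        *status |= XIVE_STATUS_SENT;
        sketch_notify();
    }
}

/* Assumed EOI behaviour (not in the patch): allow the level to fire again. */
static void sketch_lsi_eoi(uint8_t *status)
{
    *status &= (uint8_t)~XIVE_STATUS_SENT;

    if (*status & XIVE_STATUS_ASSERTED) {
        *status |= XIVE_STATUS_SENT;
        sketch_notify();
    }
}
```

Raising the line twice in a row produces a single notification; a further
one is only generated after an EOI while the line is still high.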

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the " Cédric Le Goater
@ 2017-11-28  6:38   ` David Gibson
  2017-11-28 18:33     ` Cédric Le Goater
  2017-12-02 14:23     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 128+ messages in thread
From: David Gibson @ 2017-11-28  6:38 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 13650 bytes --]

On Thu, Nov 23, 2017 at 02:29:40PM +0100, Cédric Le Goater wrote:
> Each interrupt source is associated with a two bit state machine
> called an Event State Buffer (ESB). The bits are named "P" (pending)
> and "Q" (queued) and can be controlled by MMIO. It is used to trigger
> events. See code for more details on the states and transitions.
> 
> The MMIO space for the ESB translation is 512GB large on baremetal
> (powernv) systems and the BAR depends on the chip id. In our model for
> the sPAPR machine, we choose to only map a sub memory region for the
> provisioned IRQ numbers and to use the mapping address of chip 0 on a
> real system. The OS will get the address of the MMIO page of the ESB
> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
> 
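
The two-bit ESB state machine described above can be sketched in isolation
as follows. The packing (four sources per byte) and the EOI transitions
follow the code in the patch below; the trigger transitions are an
assumption inferred from the description of the P and Q bits, and the
sketch_* names and the explicit sbe/sbe_size parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ESB_VAL_P    0x2
#define ESB_VAL_Q    0x1

#define ESB_RESET    0x0
#define ESB_PENDING  ESB_VAL_P
#define ESB_QUEUED   (ESB_VAL_P | ESB_VAL_Q)
#define ESB_OFF      ESB_VAL_Q

/* Read the PQ bits of one source; four sources share each byte. */
static uint8_t sketch_pq_get(const uint8_t *sbe, size_t sbe_size,
                             uint32_t lisn)
{
    uint32_t byte = lisn / 4;
    uint32_t bit  = (lisn % 4) * 2;

    assert(byte < sbe_size);
    return (sbe[byte] >> bit) & 0x3;
}

/* Write the PQ bits of one source and return the previous value. */
static uint8_t sketch_pq_set(uint8_t *sbe, size_t sbe_size, uint32_t lisn,
                             uint8_t pq)
{
    uint32_t byte = lisn / 4;
    uint32_t bit  = (lisn % 4) * 2;
    uint8_t old;

    assert(byte < sbe_size);
    old = sbe[byte];
    sbe[byte] = (uint8_t)((old & ~(0x3 << bit)) | ((pq & 0x3) << bit));
    return (old >> bit) & 0x3;
}

/* Assumed trigger rules: returns true if the event must be forwarded. */
static bool sketch_pq_trigger(uint8_t *sbe, size_t sbe_size, uint32_t lisn)
{
    switch (sketch_pq_get(sbe, sbe_size, lisn)) {
    case ESB_RESET:
        sketch_pq_set(sbe, sbe_size, lisn, ESB_PENDING);
        return true;
    case ESB_PENDING:
        sketch_pq_set(sbe, sbe_size, lisn, ESB_QUEUED);
        return false;
    default: /* ESB_QUEUED or ESB_OFF: state is left unchanged */
        return false;
    }
}

/* EOI rules as in the patch: returns true if a re-trigger is needed. */
static bool sketch_pq_eoi(uint8_t *sbe, size_t sbe_size, uint32_t lisn)
{
    switch (sketch_pq_get(sbe, sbe_size, lisn)) {
    case ESB_PENDING:
        sketch_pq_set(sbe, sbe_size, lisn, ESB_RESET);
        return false;
    case ESB_QUEUED:
        sketch_pq_set(sbe, sbe_size, lisn, ESB_PENDING);
        return true;
    default: /* ESB_RESET or ESB_OFF: state is left unchanged */
        return false;
    }
}
```

Two back-to-back triggers on the same source forward only the first event;
the second one is remembered in the Q bit and replayed by the next EOI,
which is the coalescing behaviour the commit message refers to.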
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c        | 268 +++++++++++++++++++++++++++++++++++++++++++-
>  include/hw/ppc/spapr_xive.h |   8 ++
>  2 files changed, 275 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 66c533fb1d78..f45f50fd017e 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -32,6 +32,216 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>  }
>  
>  /*
> + * "magic" Event State Buffer (ESB) MMIO offsets.
> + *
> + * Each interrupt source has a 2-bit state machine called ESB
> + * which can be controlled by MMIO. It's made of 2 bits, P and
> + * Q. P indicates that an interrupt is pending (has been sent
> + * to a queue and is waiting for an EOI). Q indicates that the
> + * interrupt has been triggered while pending.
> + *
> + * This acts as a coalescing mechanism in order to guarantee
> + * that a given interrupt only occurs at most once in a queue.
> + *
> + * When doing an EOI, the Q bit will indicate if the interrupt
> + * needs to be re-triggered.
> + *
> + * The following offsets into the ESB MMIO allow reading or
> + * manipulating the PQ bits. They must be used with an 8-byte
> + * load instruction. They all return the previous state of the
> + * interrupt (atomically).
> + *
> + * Additionally, some ESB pages support doing an EOI via a
> + * store at 0 and some ESBs support doing a trigger via a
> + * separate trigger page.
> + */
> +#define XIVE_ESB_GET            0x800
> +#define XIVE_ESB_SET_PQ_00      0xc00
> +#define XIVE_ESB_SET_PQ_01      0xd00
> +#define XIVE_ESB_SET_PQ_10      0xe00
> +#define XIVE_ESB_SET_PQ_11      0xf00
> +
> +#define XIVE_ESB_VAL_P          0x2
> +#define XIVE_ESB_VAL_Q          0x1
> +
> +#define XIVE_ESB_RESET          0x0
> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
> +
> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t lisn)
> +{
> +    uint32_t byte = lisn / 4;
> +    uint32_t bit  = (lisn % 4) * 2;
> +
> +    assert(byte < xive->sbe_size);
> +
> +    return (xive->sbe[byte] >> bit) & 0x3;
> +}
> +
> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t lisn, uint8_t pq)
> +{
> +    uint32_t byte = lisn / 4;
> +    uint32_t bit  = (lisn % 4) * 2;
> +    uint8_t old, new;
> +
> +    assert(byte < xive->sbe_size);
> +
> +    old = xive->sbe[byte];
> +
> +    new = xive->sbe[byte] & ~(0x3 << bit);
> +    new |= (pq & 0x3) << bit;
> +
> +    xive->sbe[byte] = new;
> +
> +    return (old >> bit) & 0x3;
> +}
> +
> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t lisn)
> +{
> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
> +
> +    switch (old_pq) {
> +    case XIVE_ESB_RESET:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
> +        return false;
> +    case XIVE_ESB_PENDING:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
> +        return false;
> +    case XIVE_ESB_QUEUED:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
> +        return true;
> +    case XIVE_ESB_OFF:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
> +        return false;
> +    default:
> +         g_assert_not_reached();
> +    }
> +}
> +
> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t lisn)
> +{
> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
> +
> +    switch (old_pq) {
> +    case XIVE_ESB_RESET:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
> +        return true;
> +    case XIVE_ESB_PENDING:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
> +        return true;
> +    case XIVE_ESB_QUEUED:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
> +        return true;
> +    case XIVE_ESB_OFF:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
> +        return false;
> +    default:
> +         g_assert_not_reached();
> +    }
> +}
> +
> +/*
> + * XIVE Interrupt Source MMIOs
> + */
> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t lisn)
> +{
> +    if (spapr_xive_irq_is_lsi(xive, lisn)) {
> +        xive->status[lisn] &= ~XIVE_STATUS_SENT;
> +    }
> +}
> +
> +/* TODO: handle second page
> + *
> + * Some HW use a separate page for trigger. We only support the case
> + * in which the trigger can be done in the same page as the EOI.
> + */
> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> +    uint32_t offset = addr & 0xF00;
> +    uint32_t lisn = addr >> xive->esb_shift;
> +    XiveIVE *ive;
> +    uint64_t ret = -1;
> +
> +    ive = spapr_xive_get_ive(xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID))  {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> +        goto out;
> +    }
> +
> +    switch (offset) {
> +    case 0:
> +        spapr_xive_source_eoi(xive, lisn);

Hrm.  I don't love that you're dealing with clearing that LSI bit
here, but setting it at a different level.

The state machines are doing my head in a bit, is there any way
you could derive the STATUS_SENT bit from the PQ bits?
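
To keep the PQ side of this easy to reason about, here is a minimal
standalone sketch of the transitions as they are coded in the quoted
spapr_xive_pq_eoi() and spapr_xive_pq_trigger() helpers. The names
(sbe, NR_SOURCES, pq_get, pq_set, pq_eoi, pq_trigger) are stand-ins
for illustration only, not part of the patch, and the array here
starts in the RESET state rather than the "ints off" state that the
patch sets on reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PQ encodings as in the patch: P is bit 1, Q is bit 0. */
#define ESB_RESET   0x0
#define ESB_OFF     0x1
#define ESB_PENDING 0x2
#define ESB_QUEUED  0x3

/* Hypothetical stand-in for xive->sbe: four 2-bit entries per byte. */
#define NR_SOURCES 16
static uint8_t sbe[NR_SOURCES / 4];

static uint8_t pq_get(uint32_t lisn)
{
    assert(lisn < NR_SOURCES);
    return (sbe[lisn / 4] >> ((lisn % 4) * 2)) & 0x3;
}

/* Store a new PQ value and return the previous one. */
static uint8_t pq_set(uint32_t lisn, uint8_t pq)
{
    uint32_t shift = (lisn % 4) * 2;
    uint8_t old = pq_get(lisn);
    unsigned int cleared = sbe[lisn / 4] & ~(0x3u << shift);

    sbe[lisn / 4] = (uint8_t)(cleared | ((pq & 0x3u) << shift));
    return old;
}

/* Same table as the quoted spapr_xive_pq_eoi(): true means re-trigger. */
static bool pq_eoi(uint32_t lisn)
{
    switch (pq_get(lisn)) {
    case ESB_PENDING: pq_set(lisn, ESB_RESET);   return false;
    case ESB_QUEUED:  pq_set(lisn, ESB_PENDING); return true;
    default:          return false; /* RESET and OFF are left unchanged */
    }
}

/* Same table as the quoted spapr_xive_pq_trigger(). */
static bool pq_trigger(uint32_t lisn)
{
    switch (pq_get(lisn)) {
    case ESB_RESET:   pq_set(lisn, ESB_PENDING); return true;
    case ESB_PENDING: pq_set(lisn, ESB_QUEUED);  return true;
    case ESB_QUEUED:  return true;  /* state unchanged */
    default:          return false; /* ESB_OFF: state unchanged */
    }
}
```

With this model, two triggers on a source in RESET leave it in
QUEUED, the first EOI returns true and moves it back to PENDING, and
the second EOI returns false and moves it to RESET.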

> +        /* return TRUE or FALSE depending on PQ value */
> +        ret = spapr_xive_pq_eoi(xive, lisn);
> +        break;
> +
> +    case XIVE_ESB_GET:
> +        ret = spapr_xive_pq_get(xive, lisn);
> +        break;
> +
> +    case XIVE_ESB_SET_PQ_00:
> +    case XIVE_ESB_SET_PQ_01:
> +    case XIVE_ESB_SET_PQ_10:
> +    case XIVE_ESB_SET_PQ_11:
> +        ret = spapr_xive_pq_set(xive, lisn, (offset >> 8) & 0x3);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
> +    }
> +
> +out:
> +    return ret;
> +}
> +
> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,
> +                           uint64_t value, unsigned size)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> +    uint32_t offset = addr & 0xF00;
> +    uint32_t lisn = addr >> xive->esb_shift;
> +    XiveIVE *ive;
> +    bool notify = false;
> +
> +    ive = spapr_xive_get_ive(xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID))  {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> +        return;
> +    }
> +
> +    switch (offset) {
> +    case 0:
> +        /* TODO: should we trigger even if the IVE is masked ? */
> +        notify = spapr_xive_pq_trigger(xive, lisn);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
> +                      offset);
> +        return;
> +    }
> +
> +    if (notify && !(ive->w & IVE_MASKED)) {
> +        qemu_irq_pulse(xive->qirqs[lisn]);
> +    }
> +}
> +
> +static const MemoryRegionOps spapr_xive_esb_ops = {
> +    .read = spapr_xive_esb_read,
> +    .write = spapr_xive_esb_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +};
> +
> +/*
>   * XIVE Interrupt Source
>   */
>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int lisn, int val)
> @@ -70,6 +280,33 @@ static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
>  /*
>   * Main XIVE object
>   */
> +#define P9_MMIO_BASE     0x006000000000000ull
> +
> +/* VC BAR contains set translations for the ESBs and the EQs. */
> +#define VC_BAR_DEFAULT   0x10000000000ull
> +#define VC_BAR_SIZE      0x08000000000ull
> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
> +
> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
> +                                            unsigned size)
> +{
> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
> +                  __func__, offset, size);
> +    return 0;
> +}
> +
> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,
> +                                         uint64_t value, unsigned size)
> +{
> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
> +                  __func__, offset, value, size);
> +}
> +
> +static const MemoryRegionOps spapr_xive_esb_default_ops = {
> +    .read = spapr_xive_esb_default_read,
> +    .write = spapr_xive_esb_default_write,
> +    .endianness = DEVICE_BIG_ENDIAN,

I think you should at least have a valid access size field here.

> +};
>  
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>  {
> @@ -77,14 +314,19 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>  
>      for (i = 0; i < xive->nr_irqs; i++) {
>          XiveIVE *ive = &xive->ivt[i];
> +        uint8_t pq;
>  
>          if (!(ive->w & IVE_VALID)) {
>              continue;
>          }
>  
> -        monitor_printf(mon, "  %4x %s %s %08x %08x\n", i,
> +        pq = spapr_xive_pq_get(xive, i);
> +
> +        monitor_printf(mon, "  %4x %s %s %c%c %08x %08x\n", i,
>                         spapr_xive_irq_is_lsi(xive, i) ? "LSI" : "MSI",
>                         ive->w & IVE_MASKED ? "M" : " ",
> +                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
> +                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
>                         (int) GETFIELD(IVE_EQ_INDEX, ive->w),
>                         (int) GETFIELD(IVE_EQ_DATA, ive->w));
>      }
> @@ -104,6 +346,9 @@ void spapr_xive_reset(void *dev)
>              ive->w |= IVE_MASKED;
>          }
>      }
> +
> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
> +    memset(xive->sbe, 0x55, xive->sbe_size);
>  }
>  
>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
> @@ -123,6 +368,26 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>      /* Allocate the IVT (Interrupt Virtualization Table) */
>      xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
>  
> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
> +    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
> +    xive->sbe = g_malloc0(xive->sbe_size);
> +
> +    /* VC BAR. That's the full window but we will only map the
> +     * subregions in use. */
> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
> +    xive->esb_shift = ESB_SHIFT;

Any point to having this as a variable, if it's always the same size?

> +
> +    /* Install default memory region handlers to log bogus access */
> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,
> +                          NULL, "xive.esb.full", VC_BAR_SIZE);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);
> +
> +    /* Install the ESB memory region in the overall one */
> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,
> +                          xive, "xive.esb",
> +                          (1ull << xive->esb_shift) * xive->nr_irqs);
> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);

Is there a benefit to having these nested regions, rather than just
validating the lisn in the read/write functions?
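
As a rough sketch of what validating in the accessors could look
like, assuming the fixed 64k page per source and the 0xF00 offset
mask used in the quoted handlers (esb_decode is a hypothetical helper
name, not something in the patch):

```c
#include <stdbool.h>
#include <stdint.h>

#define ESB_SHIFT 16 /* one 64k page per source, as in the patch */

/*
 * Hypothetical helper: split an offset into the ESB window into a
 * source number and an in-page offset. Returns false when the source
 * number falls outside the provisioned range.
 */
static bool esb_decode(uint64_t addr, uint32_t nr_irqs,
                       uint32_t *lisn, uint32_t *offset)
{
    uint64_t n = addr >> ESB_SHIFT;

    if (n >= nr_irqs) {
        return false;
    }

    *lisn = (uint32_t)n;
    *offset = (uint32_t)(addr & 0xF00);
    return true;
}
```

A failed decode would take the same path as the existing "invalid
LISN" guest error, so a single region covering the whole window
might be enough.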

>      qemu_register_reset(spapr_xive_reset, dev);
>  }
>  
> @@ -152,6 +417,7 @@ static const VMStateDescription vmstate_spapr_xive = {
>          VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
>                                             vmstate_spapr_xive_ive, XiveIVE),
>          VMSTATE_VBUFFER_UINT32(status, sPAPRXive, 1, NULL, nr_irqs),
> +        VMSTATE_VBUFFER_UINT32(sbe, sPAPRXive, 1, NULL, sbe_size),
>          VMSTATE_END_OF_LIST()
>      },
>  };
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 6a799cdaba66..84c910e62e56 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -42,6 +42,14 @@ struct sPAPRXive {
>  
>      /* XIVE internal tables */
>      XiveIVE      *ivt;
> +    uint8_t      *sbe;
> +    uint32_t     sbe_size;

sbe_size is derivable from nr_irqs, so I don't think there's a point
to storing it separately.
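
Something along these lines would do, computed wherever the size is
needed (sbe_bytes is a hypothetical name; it is the same rounding as
the DIV_ROUND_UP(xive->nr_irqs, 4) in the realize function):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to hold nr_irqs 2-bit PQ entries. */
static inline size_t sbe_bytes(uint32_t nr_irqs)
{
    return ((size_t)nr_irqs + 3) / 4;
}
```

The one place this needs some care is the VMSTATE_VBUFFER_UINT32 entry,
which currently takes sbe_size as the name of the size field.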

> +
> +    /* ESB memory region */
> +    uint32_t     esb_shift;
> +    hwaddr       esb_base;
> +    MemoryRegion esb_mr;
> +    MemoryRegion esb_iomem;
>  };
>  
>  static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags Cédric Le Goater
@ 2017-11-28  6:40   ` David Gibson
  2017-11-28 18:23     ` Cédric Le Goater
  2017-12-02 14:24     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 128+ messages in thread
From: David Gibson @ 2017-11-28  6:40 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:41PM +0100, Cédric Le Goater wrote:
> The XIVE interrupt sources can have different characteristics depending
> on their nature and the HW level in use. The sPAPR specs provide a set of
> flags to describe them :
> 
>  - XIVE_SRC_H_INT_ESB  the Event State Buffers are controlled with a
>                        specific hcall H_INT_ESB and not with MMIO
>  - XIVE_SRC_LSI        LSI or MSI source (ICSIRQState level)
>  - XIVE_SRC_TRIGGER    the full function page supports trigger
>  - XIVE_SRC_STORE_EOI  EOI can be done with a store.
> 
> Our QEMU emulation of XIVE for the sPAPR machine gathers all sources under
> a same model and provides a common source with the XIVE_SRC_TRIGGER type.
> So, the above list is mostly informative apart from the XIVE_SRC_LSI flag
> which will be deduced from the XIVE_STATUS_LSI flag.
> 
> The OS retrieves this information on the source with the
> H_INT_GET_SOURCE_INFO hcall.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c        | 4 ++++
>  include/hw/ppc/spapr_xive.h | 7 +++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index f45f50fd017e..b1e3f8710cff 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -368,6 +368,10 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>      /* Allocate the IVT (Interrupt Virtualization Table) */
>      xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
>  
> +    /* All sources are emulated under the XIVE object and share the
> +     * same characteristic */
> +    xive->flags = XIVE_SRC_TRIGGER;

You never actually use this field.  And since it always has the same
value, is there a point to storing it?
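
For reference, here is a small sketch of how the flags could be
computed on demand instead of stored. The (63 - n) form in the header
is IBM bit numbering, so the four flags land in the low nibble.
SRC_FLAG and source_flags are hypothetical names, and the rule "always
TRIGGER, plus LSI for level sources" is only my reading of the commit
message.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM numbering: bit 0 is the most significant bit of a 64-bit word. */
#define SRC_FLAG(ibm_bit)  (1ull << (63 - (ibm_bit)))

#define SRC_H_INT_ESB  SRC_FLAG(60)  /* 0x8 */
#define SRC_LSI        SRC_FLAG(61)  /* 0x4 */
#define SRC_TRIGGER    SRC_FLAG(62)  /* 0x2 */
#define SRC_STORE_EOI  SRC_FLAG(63)  /* 0x1 */

/*
 * Hypothetical helper: flags reported for one source. Every emulated
 * source supports trigger; only the LSI bit varies per source.
 */
static uint64_t source_flags(bool is_lsi)
{
    return SRC_TRIGGER | (is_lsi ? SRC_LSI : 0);
}
```

The hcall implementation could call something like this with the
result of spapr_xive_irq_is_lsi(), with no per-device field.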

> +
>      /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>      xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>      xive->sbe = g_malloc0(xive->sbe_size);
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 84c910e62e56..7a308fb4db2b 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -40,6 +40,13 @@ struct sPAPRXive {
>  #define XIVE_STATUS_SENT               0x4
>      uint8_t      *status;
>  
> +    /* Interrupt source flags */
> +#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
> +#define XIVE_SRC_LSI           (1ull << (63 - 61))
> +#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
> +#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
> +    uint32_t     flags;
> +
>      /* XIVE internal tables */
>      XiveIVE      *ivt;
>      uint8_t      *sbe;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-28  5:40   ` David Gibson
@ 2017-11-28 10:44     ` Cédric Le Goater
  2017-11-29  4:47       ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-28 10:44 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/28/2017 05:40 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:38PM +0100, Cédric Le Goater wrote:
>> The XIVE interrupt controller uses a set of tables to redirect exceptions
>> from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
>> table, also known as Event Assignment Structure (EAS), is one of them.
>>
>> The XIVE model is designed to make use of the full range of the IRQ
>> number space and does not use an offset like the XICS mode does.
>> Hence, the IVE table is directly indexed by the IRQ number.
>>
>> The IVE stores Event Queue data associated with a source. The lookups
>> are performed when the source is configured or when an event is
>> triggered.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  default-configs/ppc64-softmmu.mak |   1 +
>>  hw/intc/Makefile.objs             |   1 +
>>  hw/intc/spapr_xive.c              | 165 ++++++++++++++++++++++++++++++++++++++
>>  hw/intc/xive-internal.h           |  50 ++++++++++++
>>  include/hw/ppc/spapr_xive.h       |  44 ++++++++++
>>  5 files changed, 261 insertions(+)
>>  create mode 100644 hw/intc/spapr_xive.c
>>  create mode 100644 hw/intc/xive-internal.h
>>  create mode 100644 include/hw/ppc/spapr_xive.h
>>
>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>> index d1b3a6dd50f8..4a7f6a0696de 100644
>> --- a/default-configs/ppc64-softmmu.mak
>> +++ b/default-configs/ppc64-softmmu.mak
>> @@ -56,6 +56,7 @@ CONFIG_SM501=y
>>  CONFIG_XICS=$(CONFIG_PSERIES)
>>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
>> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>>  # For PReP
>>  CONFIG_SERIAL_ISA=y
>>  CONFIG_MC146818RTC=y
>> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
>> index ae358569a155..49e13e7aeeee 100644
>> --- a/hw/intc/Makefile.objs
>> +++ b/hw/intc/Makefile.objs
>> @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
>>  obj-$(CONFIG_XICS) += xics.o
>>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
>> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
>>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> new file mode 100644
>> index 000000000000..b2fc3007c85f
>> --- /dev/null
>> +++ b/hw/intc/spapr_xive.c
>> @@ -0,0 +1,165 @@
>> +/*
>> + * QEMU PowerPC sPAPR XIVE model
>> + *
>> + * Copyright (c) 2017, IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +#include "qemu/osdep.h"
>> +#include "qemu/log.h"
>> +#include "qapi/error.h"
>> +#include "target/ppc/cpu.h"
>> +#include "sysemu/cpus.h"
>> +#include "sysemu/dma.h"
>> +#include "monitor/monitor.h"
>> +#include "hw/ppc/spapr_xive.h"
>> +
>> +#include "xive-internal.h"
>> +
>> +/*
>> + * Main XIVE object
>> + */
>> +
>> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < xive->nr_irqs; i++) {
>> +        XiveIVE *ive = &xive->ivt[i];
>> +
>> +        if (!(ive->w & IVE_VALID)) {
>> +            continue;
>> +        }
>> +
>> +        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
>> +                       ive->w & IVE_MASKED ? "M" : " ",
>> +                       (int) GETFIELD(IVE_EQ_INDEX, ive->w),
>> +                       (int) GETFIELD(IVE_EQ_DATA, ive->w));
>> +    }
>> +}
>> +
>> +void spapr_xive_reset(void *dev)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(dev);
>> +    int i;
>> +
>> +    /* Mask all valid IVEs in the IRQ number space. */
>> +    for (i = 0; i < xive->nr_irqs; i++) {
>> +        XiveIVE *ive = &xive->ivt[i];
>> +        if (ive->w & IVE_VALID) {
>> +            ive->w |= IVE_MASKED;
>> +        }
>> +    }
>> +}
>> +
>> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(dev);
>> +
>> +    if (!xive->nr_irqs) {
>> +        error_setg(errp, "Number of interrupt needs to be greater 0");
>> +        return;
>> +    }
>> +
>> +    /* Allocate the IVT (Interrupt Virtualization Table) */
>> +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
>> +
>> +    qemu_register_reset(spapr_xive_reset, dev);
>> +}
>> +
>> +static const VMStateDescription vmstate_spapr_xive_ive = {
>> +    .name = TYPE_SPAPR_XIVE "/ive",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField []) {
>> +        VMSTATE_UINT64(w, XiveIVE),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static bool vmstate_spapr_xive_needed(void *opaque)
>> +{
>> +    /* TODO check machine XIVE support */
>> +    return true;
>> +}
>> +
>> +static const VMStateDescription vmstate_spapr_xive = {
>> +    .name = TYPE_SPAPR_XIVE,
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = vmstate_spapr_xive_needed,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
>> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
>> +                                           vmstate_spapr_xive_ive, XiveIVE),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static Property spapr_xive_properties[] = {
>> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void spapr_xive_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +
>> +    dc->realize = spapr_xive_realize;
>> +    dc->props = spapr_xive_properties;
>> +    dc->desc = "sPAPR XIVE interrupt controller";
>> +    dc->vmsd = &vmstate_spapr_xive;
>> +}
>> +
>> +static const TypeInfo spapr_xive_info = {
>> +    .name = TYPE_SPAPR_XIVE,
>> +    .parent = TYPE_SYS_BUS_DEVICE,
>> +    .instance_size = sizeof(sPAPRXive),
>> +    .class_init = spapr_xive_class_init,
>> +};
>> +
>> +static void spapr_xive_register_types(void)
>> +{
>> +    type_register_static(&spapr_xive_info);
>> +}
>> +
>> +type_init(spapr_xive_register_types)
>> +
>> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
>> +}
>> +
>> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
>> +
>> +    if (!ive) {
>> +        return false;
>> +    }
>> +
>> +    ive->w |= IVE_VALID;
>> +    return true;
>> +}
> 
> As I said in another comment, I don't like the name: sets what, exactly?

Maybe spapr_xive_irq_alloc() would be better? I guess so.
> 
> It's not really clear to me what the VALID bit means.  Why would an
> irq within the allocated range be invalid?

'enable' might be a better choice, but 'valid' is how the specs
refer to this bit.

It is used by the HW to let through or stop the notification process
immediately when a trigger is performed on the ESB MMIOs.
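
A minimal sketch of that gating, using the bit positions from
xive-internal.h. ive_lets_through is a hypothetical name, and folding
the valid and masked checks into one predicate is a simplification:
the ESB handlers in the series check them at two different points.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering; ULL keeps the constant 64-bit on every host. */
#define PPC_BIT(bit)  (0x8000000000000000ULL >> (bit))

#define IVE_VALID     PPC_BIT(0)   /* 0x8000000000000000 */
#define IVE_MASKED    PPC_BIT(32)  /* 0x0000000080000000 */

/*
 * Hypothetical predicate: a trigger is forwarded to the event queue
 * only when the IVE is valid and not masked.
 */
static bool ive_lets_through(uint64_t w)
{
    return (w & IVE_VALID) && !(w & IVE_MASKED);
}
```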

Thanks,

C.

> 
>> +
>> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
>> +
>> +    if (!ive) {
>> +        return false;
>> +    }
>> +
>> +    ive->w &= ~IVE_VALID;
>> +    return true;
>> +}
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> new file mode 100644
>> index 000000000000..bea88d82992c
>> --- /dev/null
>> +++ b/hw/intc/xive-internal.h
>> @@ -0,0 +1,50 @@
>> +/*
>> + * QEMU PowerPC XIVE model
>> + *
>> + * Copyright 2016,2017 IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +#ifndef _INTC_XIVE_INTERNAL_H
>> +#define _INTC_XIVE_INTERNAL_H
>> +
>> +/* Utilities to manipulate these (originaly from OPAL) */
>> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
>> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
>> +#define SETFIELD(m, v, val)                             \
>> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
>> +
>> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
>> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
>> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
>> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
>> +                                 PPC_BIT32(bs))
>> +
>> +/* IVE/EAS
>> + *
>> + * One per interrupt source. Targets that interrupt to a given EQ
>> + * and provides the corresponding logical interrupt number (EQ data)
>> + *
>> + * We also map this structure to the escalation descriptor inside
>> + * an EQ, though in that case the valid and masked bits are not used.
>> + */
>> +typedef struct XiveIVE {
>> +        /* Use a single 64-bit definition to make it easier to
>> +         * perform atomic updates
>> +         */
>> +        uint64_t        w;
>> +#define IVE_VALID       PPC_BIT(0)
>> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
>> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
>> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
>> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>> +} XiveIVE;
>> +
>> +void spapr_xive_reset(void *dev);
>> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
>> +
>> +#endif /* _INTC_XIVE_INTERNAL_H */
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> new file mode 100644
>> index 000000000000..795b3f4ded7c
>> --- /dev/null
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -0,0 +1,44 @@
>> +/*
>> + * QEMU PowerPC sPAPR XIVE model
>> + *
>> + * Copyright (c) 2017, IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef PPC_SPAPR_XIVE_H
>> +#define PPC_SPAPR_XIVE_H
>> +
>> +#include <hw/sysbus.h>
>> +
>> +typedef struct sPAPRXive sPAPRXive;
>> +typedef struct XiveIVE XiveIVE;
>> +
>> +#define TYPE_SPAPR_XIVE "spapr-xive"
>> +#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
>> +
>> +struct sPAPRXive {
>> +    SysBusDevice parent;
>> +
>> +    /* Properties */
>> +    uint32_t     nr_irqs;
>> +
>> +    /* XIVE internal tables */
>> +    XiveIVE      *ivt;
>> +};
>> +
>> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn);
>> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
>> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>> +
>> +#endif /* PPC_SPAPR_XIVE_H */
> 

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 03/25] spapr: introduce a spapr_icp_create() helper
  2017-11-24 12:26     ` Cédric Le Goater
@ 2017-11-28 10:56       ` Greg Kurz
  0 siblings, 0 replies; 128+ messages in thread
From: Greg Kurz @ 2017-11-28 10:56 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, David Gibson

On Fri, 24 Nov 2017 12:26:21 +0000
Cédric Le Goater <clg@kaod.org> wrote:

> On 11/24/2017 10:09 AM, Greg Kurz wrote:
> > On Thu, 23 Nov 2017 14:29:33 +0100
> > Cédric Le Goater <clg@kaod.org> wrote:
> >   
> >> On sPAPR, the creation of the interrupt presenter depends on some of
> >> the machine attributes. When the XIVE interrupt mode is available,
> >> this will get more complex. So provide a machine-level helper to
> >> isolate the process and hide the details to the sPAPR core realize
> >> function.
> >>  
> > 
> > Not sure it makes sense to introduce this helper that early in the series...
> > what about folding it in patch 23 where it is really needed ?  
> 
> It does 'icp_type' and the 'xics_fabric' which are machine concepts 
> around the sPAPR interrupt controller model.
> 

Oh yes you're right, I guess I was looking at this from the perspective of
my PHB hotplug series :)

Hence,

Reviewed-by: Greg Kurz <groug@kaod.org>

> But yes, it could come before patch 23. Maybe not folded, though.
> 
> C.
> 
> 
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/ppc/spapr.c          | 14 ++++++++++++++
> >>  hw/ppc/spapr_cpu_core.c |  2 +-
> >>  include/hw/ppc/spapr.h  |  2 ++
> >>  3 files changed, 17 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 174e7ff0678d..925cbd3c1bf4 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -3556,6 +3556,20 @@ static ICPState *spapr_icp_get(XICSFabric *xi, int vcpu_id)
> >>      return cpu ? ICP(cpu->intc) : NULL;
> >>  }
> >>  
> >> +Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
> >> +{
> >> +    Error *local_err = NULL;
> >> +    Object *obj;
> >> +
> >> +    obj = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
> >> +    if (local_err) {
> >> +        error_propagate(errp, local_err);
> >> +        return NULL;
> >> +    }
> >> +
> >> +    return obj;
> >> +}
> >> +
> >>  static void spapr_pic_print_info(InterruptStatsProvider *obj,
> >>                                   Monitor *mon)
> >>  {
> >> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> >> index f7cc74512481..61a9850e688b 100644
> >> --- a/hw/ppc/spapr_cpu_core.c
> >> +++ b/hw/ppc/spapr_cpu_core.c
> >> @@ -122,7 +122,7 @@ static void spapr_cpu_core_realize_child(Object *child,
> >>          goto error;
> >>      }
> >>  
> >> -    cpu->intc = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
> >> +    cpu->intc = spapr_icp_create(spapr, cs, &local_err);
> >>      if (local_err) {
> >>          goto error;
> >>      }
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index 9d21ca9bde3a..9da38de34277 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -707,4 +707,6 @@ void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
> >>  int spapr_vcpu_id(PowerPCCPU *cpu);
> >>  PowerPCCPU *spapr_find_cpu(int vcpu_id);
> >>  
> >> +Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp);
> >> +
> >>  #endif /* HW_SPAPR_H */  
> >   
> 
> 

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 04/25] spapr: move the IRQ allocation routines under the machine
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 04/25] spapr: move the IRQ allocation routines under the machine Cédric Le Goater
  2017-11-24  3:13   ` David Gibson
@ 2017-11-28 10:57   ` Greg Kurz
  1 sibling, 0 replies; 128+ messages in thread
From: Greg Kurz @ 2017-11-28 10:57 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt

On Thu, 23 Nov 2017 14:29:34 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> Also change the prototype to use a sPAPRMachineState and prefix them
> with spapr_irq_. It will let us synchronise the IRQ allocation with
> the XIVE interrupt mode when available.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/intc/trace-events   |   4 --
>  hw/intc/xics_spapr.c   | 114 -------------------------------------------------
>  hw/ppc/spapr.c         | 114 +++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_events.c  |   4 +-
>  hw/ppc/spapr_pci.c     |   8 ++--
>  hw/ppc/spapr_vio.c     |   2 +-
>  hw/ppc/trace-events    |   4 ++
>  include/hw/ppc/spapr.h |   6 +++
>  include/hw/ppc/xics.h  |   4 --
>  9 files changed, 131 insertions(+), 129 deletions(-)
> 
> diff --git a/hw/intc/trace-events b/hw/intc/trace-events
> index b298fac7c6a8..7077aaaee6d0 100644
> --- a/hw/intc/trace-events
> +++ b/hw/intc/trace-events
> @@ -64,10 +64,6 @@ xics_ics_simple_set_irq_lsi(int srcno, int nr) "set_irq_lsi: srcno %d [irq 0x%x]
>  xics_ics_simple_write_xive(int nr, int srcno, int server, uint8_t priority) "ics_write_xive: irq 0x%x [src %d] server 0x%x prio 0x%x"
>  xics_ics_simple_reject(int nr, int srcno) "reject irq 0x%x [src %d]"
>  xics_ics_simple_eoi(int nr) "ics_eoi: irq 0x%x"
> -xics_alloc(int irq) "irq %d"
> -xics_alloc_block(int first, int num, bool lsi, int align) "first irq %d, %d irqs, lsi=%d, alignnum %d"
> -xics_ics_free(int src, int irq, int num) "Source#%d, first irq %d, %d irqs"
> -xics_ics_free_warn(int src, int irq) "Source#%d, irq %d is already free"
>  
>  # hw/intc/s390_flic_kvm.c
>  flic_create_device(int err) "flic: create device failed %d"
> diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
> index e8c0a1b3e903..5a0967caf430 100644
> --- a/hw/intc/xics_spapr.c
> +++ b/hw/intc/xics_spapr.c
> @@ -245,120 +245,6 @@ void xics_spapr_init(sPAPRMachineState *spapr)
>      spapr_register_hypercall(H_IPOLL, h_ipoll);
>  }
>  
> -#define ICS_IRQ_FREE(ics, srcno)   \
> -    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
> -
> -static int ics_find_free_block(ICSState *ics, int num, int alignnum)
> -{
> -    int first, i;
> -
> -    for (first = 0; first < ics->nr_irqs; first += alignnum) {
> -        if (num > (ics->nr_irqs - first)) {
> -            return -1;
> -        }
> -        for (i = first; i < first + num; ++i) {
> -            if (!ICS_IRQ_FREE(ics, i)) {
> -                break;
> -            }
> -        }
> -        if (i == (first + num)) {
> -            return first;
> -        }
> -    }
> -
> -    return -1;
> -}
> -
> -int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, Error **errp)
> -{
> -    int irq;
> -
> -    if (!ics) {
> -        return -1;
> -    }
> -    if (irq_hint) {
> -        if (!ICS_IRQ_FREE(ics, irq_hint - ics->offset)) {
> -            error_setg(errp, "can't allocate IRQ %d: already in use", irq_hint);
> -            return -1;
> -        }
> -        irq = irq_hint;
> -    } else {
> -        irq = ics_find_free_block(ics, 1, 1);
> -        if (irq < 0) {
> -            error_setg(errp, "can't allocate IRQ: no IRQ left");
> -            return -1;
> -        }
> -        irq += ics->offset;
> -    }
> -
> -    ics_set_irq_type(ics, irq - ics->offset, lsi);
> -    trace_xics_alloc(irq);
> -
> -    return irq;
> -}
> -
> -/*
> - * Allocate block of consecutive IRQs, and return the number of the first IRQ in
> - * the block. If align==true, aligns the first IRQ number to num.
> - */
> -int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
> -                          bool align, Error **errp)
> -{
> -    int i, first = -1;
> -
> -    if (!ics) {
> -        return -1;
> -    }
> -
> -    /*
> -     * MSIMesage::data is used for storing VIRQ so
> -     * it has to be aligned to num to support multiple
> -     * MSI vectors. MSI-X is not affected by this.
> -     * The hint is used for the first IRQ, the rest should
> -     * be allocated continuously.
> -     */
> -    if (align) {
> -        assert((num == 1) || (num == 2) || (num == 4) ||
> -               (num == 8) || (num == 16) || (num == 32));
> -        first = ics_find_free_block(ics, num, num);
> -    } else {
> -        first = ics_find_free_block(ics, num, 1);
> -    }
> -    if (first < 0) {
> -        error_setg(errp, "can't find a free %d-IRQ block", num);
> -        return -1;
> -    }
> -
> -    for (i = first; i < first + num; ++i) {
> -        ics_set_irq_type(ics, i, lsi);
> -    }
> -    first += ics->offset;
> -
> -    trace_xics_alloc_block(first, num, lsi, align);
> -
> -    return first;
> -}
> -
> -static void ics_free(ICSState *ics, int srcno, int num)
> -{
> -    int i;
> -
> -    for (i = srcno; i < srcno + num; ++i) {
> -        if (ICS_IRQ_FREE(ics, i)) {
> -            trace_xics_ics_free_warn(0, i + ics->offset);
> -        }
> -        memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
> -    }
> -}
> -
> -void spapr_ics_free(ICSState *ics, int irq, int num)
> -{
> -    if (ics_valid_irq(ics, irq)) {
> -        trace_xics_ics_free(0, irq, num);
> -        ics_free(ics, irq - ics->offset, num);
> -    }
> -}
> -
>  void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle)
>  {
>      uint32_t interrupt_server_ranges_prop[] = {
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 925cbd3c1bf4..7ae84d40bdb4 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3570,6 +3570,120 @@ Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
>      return obj;
>  }
>  
> +#define ICS_IRQ_FREE(ics, srcno)   \
> +    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
> +
> +static int ics_find_free_block(ICSState *ics, int num, int alignnum)
> +{
> +    int first, i;
> +
> +    for (first = 0; first < ics->nr_irqs; first += alignnum) {
> +        if (num > (ics->nr_irqs - first)) {
> +            return -1;
> +        }
> +        for (i = first; i < first + num; ++i) {
> +            if (!ICS_IRQ_FREE(ics, i)) {
> +                break;
> +            }
> +        }
> +        if (i == (first + num)) {
> +            return first;
> +        }
> +    }
> +
> +    return -1;
> +}
> +
> +int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
> +                    Error **errp)
> +{
> +    ICSState *ics = spapr->ics;
> +    int irq;
> +
> +    if (!ics) {
> +        return -1;
> +    }
> +    if (irq_hint) {
> +        if (!ICS_IRQ_FREE(ics, irq_hint - ics->offset)) {
> +            error_setg(errp, "can't allocate IRQ %d: already in use", irq_hint);
> +            return -1;
> +        }
> +        irq = irq_hint;
> +    } else {
> +        irq = ics_find_free_block(ics, 1, 1);
> +        if (irq < 0) {
> +            error_setg(errp, "can't allocate IRQ: no IRQ left");
> +            return -1;
> +        }
> +        irq += ics->offset;
> +    }
> +
> +    ics_set_irq_type(ics, irq - ics->offset, lsi);
> +    trace_spapr_irq_alloc(irq);
> +
> +    return irq;
> +}
> +
> +/*
> + * Allocate block of consecutive IRQs, and return the number of the first IRQ in
> + * the block. If align==true, aligns the first IRQ number to num.
> + */
> +int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
> +                          bool align, Error **errp)
> +{
> +    ICSState *ics = spapr->ics;
> +    int i, first = -1;
> +
> +    if (!ics) {
> +        return -1;
> +    }
> +
> +    /*
> +     * MSIMesage::data is used for storing VIRQ so
> +     * it has to be aligned to num to support multiple
> +     * MSI vectors. MSI-X is not affected by this.
> +     * The hint is used for the first IRQ, the rest should
> +     * be allocated continuously.
> +     */
> +    if (align) {
> +        assert((num == 1) || (num == 2) || (num == 4) ||
> +               (num == 8) || (num == 16) || (num == 32));
> +        first = ics_find_free_block(ics, num, num);
> +    } else {
> +        first = ics_find_free_block(ics, num, 1);
> +    }
> +    if (first < 0) {
> +        error_setg(errp, "can't find a free %d-IRQ block", num);
> +        return -1;
> +    }
> +
> +    for (i = first; i < first + num; ++i) {
> +        ics_set_irq_type(ics, i, lsi);
> +    }
> +    first += ics->offset;
> +
> +    trace_spapr_irq_alloc_block(first, num, lsi, align);
> +
> +    return first;
> +}
> +
> +void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
> +{
> +    ICSState *ics = spapr->ics;
> +    int srcno = irq - ics->offset;
> +    int i;
> +
> +    if (ics_valid_irq(ics, irq)) {
> +        trace_spapr_irq_free(0, irq, num);
> +        for (i = srcno; i < srcno + num; ++i) {
> +            if (ICS_IRQ_FREE(ics, i)) {
> +                trace_spapr_irq_free_warn(0, i + ics->offset);
> +            }
> +            memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
> +        }
> +    }
> +}
> +
>  static void spapr_pic_print_info(InterruptStatsProvider *obj,
>                                   Monitor *mon)
>  {
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index e377fc7ddea2..cead596f3e7a 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -718,7 +718,7 @@ void spapr_events_init(sPAPRMachineState *spapr)
>      spapr->event_sources = spapr_event_sources_new();
>  
>      spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_EPOW,
> -                                 spapr_ics_alloc(spapr->ics, 0, false,
> +                                 spapr_irq_alloc(spapr, 0, false,
>                                                    &error_fatal));
>  
>      /* NOTE: if machine supports modern/dedicated hotplug event source,
> @@ -731,7 +731,7 @@ void spapr_events_init(sPAPRMachineState *spapr)
>       */
>      if (spapr->use_hotplug_event_source) {
>          spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_HOT_PLUG,
> -                                     spapr_ics_alloc(spapr->ics, 0, false,
> +                                     spapr_irq_alloc(spapr, 0, false,
>                                                        &error_fatal));
>      }
>  
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 5a3122a9f9f9..e0ef77a480e5 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -314,7 +314,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>              return;
>          }
>  
> -        spapr_ics_free(spapr->ics, msi->first_irq, msi->num);
> +        spapr_irq_free(spapr, msi->first_irq, msi->num);
>          if (msi_present(pdev)) {
>              spapr_msi_setmsg(pdev, 0, false, 0, 0);
>          }
> @@ -352,7 +352,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>      }
>  
>      /* Allocate MSIs */
> -    irq = spapr_ics_alloc_block(spapr->ics, req_num, false,
> +    irq = spapr_irq_alloc_block(spapr, req_num, false,
>                             ret_intr_type == RTAS_TYPE_MSI, &err);
>      if (err) {
>          error_reportf_err(err, "Can't allocate MSIs for device %x: ",
> @@ -363,7 +363,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>  
>      /* Release previous MSIs */
>      if (msi) {
> -        spapr_ics_free(spapr->ics, msi->first_irq, msi->num);
> +        spapr_irq_free(spapr, msi->first_irq, msi->num);
>          g_hash_table_remove(phb->msi, &config_addr);
>      }
>  
> @@ -1675,7 +1675,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>          uint32_t irq;
>          Error *local_err = NULL;
>  
> -        irq = spapr_ics_alloc_block(spapr->ics, 1, true, false, &local_err);
> +        irq = spapr_irq_alloc_block(spapr, 1, true, false, &local_err);
>          if (local_err) {
>              error_propagate(errp, local_err);
>              error_prepend(errp, "can't allocate LSIs: ");
> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
> index ea3bc8bd9e21..bb7ed2c537b0 100644
> --- a/hw/ppc/spapr_vio.c
> +++ b/hw/ppc/spapr_vio.c
> @@ -454,7 +454,7 @@ static void spapr_vio_busdev_realize(DeviceState *qdev, Error **errp)
>          dev->qdev.id = id;
>      }
>  
> -    dev->irq = spapr_ics_alloc(spapr->ics, dev->irq, false, &local_err);
> +    dev->irq = spapr_irq_alloc(spapr, dev->irq, false, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 4a6a6490fa78..b7c3e64b5ee7 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -12,6 +12,10 @@ spapr_pci_msi_retry(unsigned config_addr, unsigned req_num, unsigned max_irqs) "
>  # hw/ppc/spapr.c
>  spapr_cas_failed(unsigned long n) "DT diff buffer is too small: %ld bytes"
>  spapr_cas_continue(unsigned long n) "Copy changes to the guest: %ld bytes"
> +spapr_irq_alloc(int irq) "irq %d"
> +spapr_irq_alloc_block(int first, int num, bool lsi, int align) "first irq %d, %d irqs, lsi=%d, alignnum %d"
> +spapr_irq_free(int src, int irq, int num) "Source#%d, first irq %d, %d irqs"
> +spapr_irq_free_warn(int src, int irq) "Source#%d, irq %d is already free"
>  
>  # hw/ppc/spapr_hcall.c
>  spapr_cas_pvr_try(uint32_t pvr) "0x%x"
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9da38de34277..7a133f80411a 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -709,4 +709,10 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id);
>  
>  Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp);
>  
> +int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
> +                    Error **errp);
> +int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
> +                          bool align, Error **errp);
> +void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
> +
>  #endif /* HW_SPAPR_H */
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 126b47dec38b..cea462bc7f3e 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -181,10 +181,6 @@ typedef struct XICSFabricClass {
>  
>  #define XICS_IRQS_SPAPR               1024
>  
> -int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, Error **errp);
> -int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi, bool align,
> -                           Error **errp);
> -void spapr_ics_free(ICSState *ics, int irq, int num);
>  void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>  
>  qemu_irq xics_get_qirq(XICSFabric *xi, int irq);
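
For readers following the move, the allocation logic that now lives
under the machine boils down to an aligned first-fit search. The sketch
below is a standalone illustration of that search, not the QEMU code:
irq_in_use[], NR_IRQS, find_free_block() and alloc_block() are
hypothetical stand-ins for the ICSState bookkeeping, and the offset
handling and tracing of the real routines are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_IRQS 16            /* arbitrary size, not XICS_IRQS_SPAPR */

/* Hypothetical stand-in for the per-source flags kept in ICSState. */
static bool irq_in_use[NR_IRQS];

/*
 * Return the first source number of a run of 'num' free entries whose
 * start is a multiple of 'alignnum', or -1 if no such run exists.
 */
static int find_free_block(int num, int alignnum)
{
    int first, i;

    assert(num > 0 && alignnum > 0);

    for (first = 0; first < NR_IRQS; first += alignnum) {
        if (num > NR_IRQS - first) {
            return -1;        /* not enough room left in the table */
        }
        for (i = first; i < first + num; i++) {
            if (irq_in_use[i]) {
                break;
            }
        }
        if (i == first + num) {
            return first;
        }
    }
    return -1;
}

/* Claim 'num' consecutive entries; 'align' mirrors the MSI case. */
static int alloc_block(int num, bool align)
{
    int first = find_free_block(num, align ? num : 1);
    int i;

    if (first < 0) {
        return -1;
    }
    for (i = first; i < first + num; i++) {
        irq_in_use[i] = true;
    }
    return first;
}
```

With align set, the start of the block is a multiple of num, which is
what multi-vector MSI needs since the VIRQ is stored in the MSI data.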


* Re: [Qemu-devel] [PATCH 09/25] spapr: introduce handlers for XIVE interrupt sources
  2017-11-28  5:45   ` David Gibson
@ 2017-11-28 18:18     ` Cédric Le Goater
  2017-12-02 14:26       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-28 18:18 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/28/2017 05:45 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:39PM +0100, Cédric Le Goater wrote:
>> These are very similar to the XICS handlers in a simpler form. They make
>> use of a status array for the LSI interrupts. The spapr_xive_irq() routine
>> in charge of triggering the CPU interrupt line will be filled later on.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> Is the status word you add here architected as part of the XIVE spec,
> or purely internal / implementation specific?

This is specific to the model.

> 
>> ---
>>  hw/intc/spapr_xive.c        | 55 +++++++++++++++++++++++++++++++++++++++++++--
>>  include/hw/ppc/spapr_xive.h | 14 +++++++++++-
>>  2 files changed, 66 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index b2fc3007c85f..66c533fb1d78 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -26,6 +26,47 @@
>>  
>>  #include "xive-internal.h"
>>  
>> +static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>> +{
>> +
>> +}
>> +
>> +/*
>> + * XIVE Interrupt Source
>> + */
>> +static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int lisn, int val)
>> +{
>> +    if (val) {
>> +        spapr_xive_irq(xive, lisn);
>> +    }
>> +}
>> +
>> +static void spapr_xive_source_set_irq_lsi(sPAPRXive *xive, int lisn, int val)
>> +{
>> +    if (val) {
>> +        xive->status[lisn] |= XIVE_STATUS_ASSERTED;
>> +    } else {
>> +        xive->status[lisn] &= ~XIVE_STATUS_ASSERTED;
>> +    }
>> +
>> +    if (xive->status[lisn] & XIVE_STATUS_ASSERTED &&
>> +        !(xive->status[lisn] & XIVE_STATUS_SENT)) {
>> +        xive->status[lisn] |= XIVE_STATUS_SENT;
>> +        spapr_xive_irq(xive, lisn);
>> +    }
>> +}
>> +
>> +static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +
>> +    if (spapr_xive_irq_is_lsi(xive, lisn)) {
>> +        spapr_xive_source_set_irq_lsi(xive, lisn, val);
>> +    } else {
>> +        spapr_xive_source_set_irq_msi(xive, lisn, val);
>> +    }
>> +}
>> +
>>  /*
>>   * Main XIVE object
>>   */
>> @@ -41,7 +82,8 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>>              continue;
>>          }
>>  
>> -        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
>> +        monitor_printf(mon, "  %4x %s %s %08x %08x\n", i,
>> +                       spapr_xive_irq_is_lsi(xive, i) ? "LSI" : "MSI",
>>                         ive->w & IVE_MASKED ? "M" : " ",
>>                         (int) GETFIELD(IVE_EQ_INDEX, ive->w),
>>                         (int) GETFIELD(IVE_EQ_DATA, ive->w));
>> @@ -53,6 +95,8 @@ void spapr_xive_reset(void *dev)
>>      sPAPRXive *xive = SPAPR_XIVE(dev);
>>      int i;
>>  
>> +    /* Do not clear IRQs status */
>> +
>>      /* Mask all valid IVEs in the IRQ number space. */
>>      for (i = 0; i < xive->nr_irqs; i++) {
>>          XiveIVE *ive = &xive->ivt[i];
>> @@ -71,6 +115,11 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>          return;
>>      }
>>  
>> +    /* QEMU IRQs */
>> +    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
>> +                                     xive->nr_irqs);
>> +    xive->status = g_malloc0(xive->nr_irqs);
>> +
>>      /* Allocate the IVT (Interrupt Virtualization Table) */
>>      xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
>>  
>> @@ -102,6 +151,7 @@ static const VMStateDescription vmstate_spapr_xive = {
>>          VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
>>          VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
>>                                             vmstate_spapr_xive_ive, XiveIVE),
>> +        VMSTATE_VBUFFER_UINT32(status, sPAPRXive, 1, NULL, nr_irqs),
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> @@ -140,7 +190,7 @@ XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
>>      return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
>>  }
>>  
>> -bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
>> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn, bool lsi)
>>  {
>>      XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
>>  
>> @@ -149,6 +199,7 @@ bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
>>      }
>>  
>>      ive->w |= IVE_VALID;
>> +    xive->status[lisn] |= lsi ? XIVE_STATUS_LSI : 0;
> 
> How does a hardware XIVE know which irqs are LSI and which are MSI?

AFAICT, it doesn't. LSI events are configured like the other XIVE interrupts.
The level is converted into the P bit and the Q bit should always be zero.
So I should be able to simplify the proposed model, which is still mimicking
XICS. I will take a look at it.

There is a special, degenerate kind of LSI, but it is only used for bringup.
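
For reference, the status handling that the proposed model borrows from
XICS can be reduced to the standalone sketch below. It is not the patch
code: notifications[] is a hypothetical counter standing in for
spapr_xive_irq(), and NR_IRQS is an arbitrary size.

```c
#include <assert.h>
#include <stdint.h>

#define XIVE_STATUS_ASSERTED   0x2
#define XIVE_STATUS_SENT       0x4

#define NR_IRQS 8                    /* arbitrary size for the sketch */

static uint8_t status[NR_IRQS];
static int notifications[NR_IRQS];   /* hypothetical: counts notifications */

/* Track the line level and notify once per assertion until EOI. */
static void lsi_set_level(int lisn, int val)
{
    assert(lisn >= 0 && lisn < NR_IRQS);

    if (val) {
        status[lisn] |= XIVE_STATUS_ASSERTED;
    } else {
        status[lisn] &= ~XIVE_STATUS_ASSERTED;
    }

    if ((status[lisn] & XIVE_STATUS_ASSERTED) &&
        !(status[lisn] & XIVE_STATUS_SENT)) {
        status[lisn] |= XIVE_STATUS_SENT;
        notifications[lisn]++;
    }
}

/* EOI only clears SENT, as in the patch; a later level update resends. */
static void lsi_eoi(int lisn)
{
    assert(lisn >= 0 && lisn < NR_IRQS);
    status[lisn] &= ~XIVE_STATUS_SENT;
}
```

If the level is folded into the P bit, the ASSERTED/SENT pair above is
presumably the part that can go away.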

Thanks,

C. 

>>      return true;
>>  }
>>  
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 795b3f4ded7c..6a799cdaba66 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -33,11 +33,23 @@ struct sPAPRXive {
>>      /* Properties */
>>      uint32_t     nr_irqs;
>>  
>> +     /* IRQ */
>> +    qemu_irq     *qirqs;
>> +#define XIVE_STATUS_LSI                0x1
>> +#define XIVE_STATUS_ASSERTED           0x2
>> +#define XIVE_STATUS_SENT               0x4
>> +    uint8_t      *status;
>> +
>>      /* XIVE internal tables */
>>      XiveIVE      *ivt;
>>  };
>>  
>> -bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn);
>> +static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
>> +{
>> +    return xive->status[lisn] & XIVE_STATUS_LSI;
>> +}
>> +
>> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn, bool lsi);
>>  bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
>>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>>  
> 


* Re: [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags
  2017-11-28  6:40   ` David Gibson
@ 2017-11-28 18:23     ` Cédric Le Goater
  2017-12-02 14:24     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-28 18:23 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/28/2017 06:40 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:41PM +0100, Cédric Le Goater wrote:
>> The XIVE interrupt sources can have different characteristics depending
>> on their nature and the HW level in use. The sPAPR specs provide a set of
>> flags to describe them :
>>
>>  - XIVE_SRC_H_INT_ESB  the Event State Buffers are controlled with a
>>                        specific hcall H_INT_ESB and not with MMIO
>>  - XIVE_SRC_LSI        LSI or MSI source (ICSIRQState level)
>>  - XIVE_SRC_TRIGGER    the full function page supports trigger
>>  - XIVE_SRC_STORE_EOI  EOI can be done with a store.
>>
>> Our QEMU emulation of XIVE for the sPAPR machine gathers all sources under
>> the same model and provides a common source with the XIVE_SRC_TRIGGER type.
>> So, the above list is mostly informative apart from the XIVE_SRC_LSI flag
>> which will be deduced from the XIVE_STATUS_LSI flag.
>>
>> The OS retrieves this information on the source with the
>> H_INT_GET_SOURCE_INFO hcall.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c        | 4 ++++
>>  include/hw/ppc/spapr_xive.h | 7 +++++++
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index f45f50fd017e..b1e3f8710cff 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -368,6 +368,10 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>      /* Allocate the IVT (Interrupt Virtualization Table) */
>>      xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
>>  
>> +    /* All sources are emulated under the XIVE object and share the
>> +     * same characteristic */
>> +    xive->flags = XIVE_SRC_TRIGGER;
> 
> You never actually use this field.  And since it always has the same
> value, is there a point to storing it?

Yep, not much. This is a leftover.

I will keep the defines and move the value to the XIVE hcall layer, where
it is used.
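
A minimal sketch of what computing the value on demand could look like
follows. The defines are the ones from the patch (IBM bit numbering,
bit 63 being the least significant); source_flags() is a hypothetical
helper name, and the way the hcall layer would call it is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Interrupt source flags, as defined in the patch. */
#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
#define XIVE_SRC_LSI           (1ull << (63 - 61))
#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))

/* The four flags occupy the low nibble of the 64-bit word. */
static_assert(XIVE_SRC_H_INT_ESB == 0x8, "bit 60 is 0x8");
static_assert(XIVE_SRC_STORE_EOI == 0x1, "bit 63 is 0x1");

/*
 * Hypothetical helper: every emulated source supports trigger, and
 * the LSI flag is deduced from the per-IRQ status.
 */
static uint64_t source_flags(bool is_lsi)
{
    uint64_t flags = XIVE_SRC_TRIGGER;

    if (is_lsi) {
        flags |= XIVE_SRC_LSI;
    }
    return flags;
}
```

This removes the need for a 'flags' field in the object, since the value
is derived each time it is requested.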

Thanks,

C.  

>> +
>>      /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>>      xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>>      xive->sbe = g_malloc0(xive->sbe_size);
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 84c910e62e56..7a308fb4db2b 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -40,6 +40,13 @@ struct sPAPRXive {
>>  #define XIVE_STATUS_SENT               0x4
>>      uint8_t      *status;
>>  
>> +    /* Interrupt source flags */
>> +#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
>> +#define XIVE_SRC_LSI           (1ull << (63 - 61))
>> +#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
>> +#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
>> +    uint32_t     flags;
>> +
>>      /* XIVE internal tables */
>>      XiveIVE      *ivt;
>>      uint8_t      *sbe;
> 


* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-28  6:38   ` David Gibson
@ 2017-11-28 18:33     ` Cédric Le Goater
  2017-11-29  4:59       ` David Gibson
  2017-12-02 14:23     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-28 18:33 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/28/2017 06:38 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:40PM +0100, Cédric Le Goater wrote:
>> Each interrupt source is associated with a two bit state machine
>> called an Event State Buffer (ESB). The bits are named "P" (pending)
>> and "Q" (queued) and can be controlled by MMIO. It is used to trigger
>> events. See code for more details on the states and transitions.
>>
>> The MMIO space for the ESB translation is 512GB large on baremetal
>> (powernv) systems and the BAR depends on the chip id. In our model for
>> the sPAPR machine, we choose to only map a sub memory region for the
>> provisioned IRQ numbers and to use the mapping address of chip 0 on a
>> real system. The OS will get the address of the MMIO page of the ESB
>> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c        | 268 +++++++++++++++++++++++++++++++++++++++++++-
>>  include/hw/ppc/spapr_xive.h |   8 ++
>>  2 files changed, 275 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 66c533fb1d78..f45f50fd017e 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -32,6 +32,216 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>  }
>>  
>>  /*
>> + * "magic" Event State Buffer (ESB) MMIO offsets.
>> + *
>> + * Each interrupt source has a 2-bit state machine called ESB
>> + * which can be controlled by MMIO. It's made of 2 bits, P and
>> + * Q. P indicates that an interrupt is pending (has been sent
>> + * to a queue and is waiting for an EOI). Q indicates that the
>> + * interrupt has been triggered while pending.
>> + *
>> + * This acts as a coalescing mechanism in order to guarantee
>> + * that a given interrupt only occurs at most once in a queue.
>> + *
>> + * When doing an EOI, the Q bit will indicate if the interrupt
>> + * needs to be re-triggered.
>> + *
>> + * The following offsets into the ESB MMIO allow to read or
>> + * manipulate the PQ bits. They must be used with an 8-bytes
>> + * load instruction. They all return the previous state of the
>> + * interrupt (atomically).
>> + *
>> + * Additionally, some ESB pages support doing an EOI via a
>> + * store at 0 and some ESBs support doing a trigger via a
>> + * separate trigger page.
>> + */
>> +#define XIVE_ESB_GET            0x800
>> +#define XIVE_ESB_SET_PQ_00      0xc00
>> +#define XIVE_ESB_SET_PQ_01      0xd00
>> +#define XIVE_ESB_SET_PQ_10      0xe00
>> +#define XIVE_ESB_SET_PQ_11      0xf00
>> +
>> +#define XIVE_ESB_VAL_P          0x2
>> +#define XIVE_ESB_VAL_Q          0x1
>> +
>> +#define XIVE_ESB_RESET          0x0
>> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
>> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
>> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
>> +
>> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    uint32_t byte = lisn / 4;
>> +    uint32_t bit  = (lisn % 4) * 2;
>> +
>> +    assert(byte < xive->sbe_size);
>> +
>> +    return (xive->sbe[byte] >> bit) & 0x3;
>> +}
>> +
>> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t lisn, uint8_t pq)
>> +{
>> +    uint32_t byte = lisn / 4;
>> +    uint32_t bit  = (lisn % 4) * 2;
>> +    uint8_t old, new;
>> +
>> +    assert(byte < xive->sbe_size);
>> +
>> +    old = xive->sbe[byte];
>> +
>> +    new = xive->sbe[byte] & ~(0x3 << bit);
>> +    new |= (pq & 0x3) << bit;
>> +
>> +    xive->sbe[byte] = new;
>> +
>> +    return (old >> bit) & 0x3;
>> +}
>> +
>> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
>> +
>> +    switch (old_pq) {
>> +    case XIVE_ESB_RESET:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
>> +        return false;
>> +    case XIVE_ESB_PENDING:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
>> +        return false;
>> +    case XIVE_ESB_QUEUED:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
>> +        return true;
>> +    case XIVE_ESB_OFF:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
>> +        return false;
>> +    default:
>> +         g_assert_not_reached();
>> +    }
>> +}
>> +
>> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
>> +
>> +    switch (old_pq) {
>> +    case XIVE_ESB_RESET:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
>> +        return true;
>> +    case XIVE_ESB_PENDING:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
>> +        return true;
>> +    case XIVE_ESB_QUEUED:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
>> +        return true;
>> +    case XIVE_ESB_OFF:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
>> +        return false;
>> +    default:
>> +         g_assert_not_reached();
>> +    }
>> +}
>> +
>> +/*
>> + * XIVE Interrupt Source MMIOs
>> + */
>> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    if (spapr_xive_irq_is_lsi(xive, lisn)) {
>> +        xive->status[lisn] &= ~XIVE_STATUS_SENT;
>> +    }
>> +}
>> +
>> +/* TODO: handle second page
>> + *
>> + * Some HW use a separate page for trigger. We only support the case
>> + * in which the trigger can be done in the same page as the EOI.
>> + */
>> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +    uint32_t offset = addr & 0xF00;
>> +    uint32_t lisn = addr >> xive->esb_shift;
>> +    XiveIVE *ive;
>> +    uint64_t ret = -1;
>> +
>> +    ive = spapr_xive_get_ive(xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID))  {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
>> +        goto out;
>> +    }
>> +
>> +    switch (offset) {
>> +    case 0:
>> +        spapr_xive_source_eoi(xive, lisn);
> 
> Hrm.  I don't love that you're dealing with clearing that LSI bit
> here, but setting it at a different level.
> 
> The state machines are doing my head in a bit, is there any way
> you could derive the STATUS_SENT bit from the PQ bits?

Yes, I should.

I am also lacking a guest driver to exercise these LSIs, so I didn't
pay a lot of attention to level interrupts. Any ideas?
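
To make the state machine easier to follow, here is a standalone sketch
of the PQ transitions with the "sent" state read back from the P bit.
It is an illustration under assumptions, not the patch code: the names
are shortened, NR_IRQS is arbitrary, and lsi_sent() is a hypothetical
helper. The trigger follows the coalescing description in the comment
(only RESET -> PENDING forwards an event), whereas the patch as posted
also returns true for the PENDING and QUEUED cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESB_VAL_P    0x2
#define ESB_VAL_Q    0x1

#define ESB_RESET    0x0
#define ESB_OFF      ESB_VAL_Q
#define ESB_PENDING  ESB_VAL_P
#define ESB_QUEUED   (ESB_VAL_P | ESB_VAL_Q)

#define NR_IRQS 8                        /* arbitrary size for the sketch */

static uint8_t sbe[(NR_IRQS + 3) / 4];   /* 2 bits per source, 4 per byte */

static uint8_t pq_get(uint32_t lisn)
{
    assert(lisn < NR_IRQS);
    return (sbe[lisn / 4] >> ((lisn % 4) * 2)) & 0x3;
}

/* Store a new PQ value and return the previous one. */
static uint8_t pq_set(uint32_t lisn, uint8_t pq)
{
    uint32_t bit = (lisn % 4) * 2;
    uint8_t old = pq_get(lisn);

    sbe[lisn / 4] = (uint8_t)((sbe[lisn / 4] & ~(0x3u << bit)) |
                              ((pq & 0x3u) << bit));
    return old;
}

/* Returns true when the event must be forwarded to a queue. */
static bool esb_trigger(uint32_t lisn)
{
    switch (pq_get(lisn)) {
    case ESB_RESET:
        pq_set(lisn, ESB_PENDING);
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        pq_set(lisn, ESB_QUEUED);        /* coalesce: remember it in Q */
        return false;
    default:                             /* ESB_OFF stays off */
        return false;
    }
}

/* Returns true when the interrupt must be triggered again. */
static bool esb_eoi(uint32_t lisn)
{
    switch (pq_get(lisn)) {
    case ESB_PENDING:
        pq_set(lisn, ESB_RESET);
        return false;
    case ESB_QUEUED:
        pq_set(lisn, ESB_PENDING);
        return true;
    default:                             /* RESET and OFF are unchanged */
        return false;
    }
}

/* Hypothetical replacement for XIVE_STATUS_SENT: P set means "sent". */
static bool lsi_sent(uint32_t lisn)
{
    return pq_get(lisn) & ESB_VAL_P;
}
```

With something like lsi_sent(), the EOI path would no longer need to
touch a separate status bit.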

>> +        /* return TRUE or FALSE depending on PQ value */
>> +        ret = spapr_xive_pq_eoi(xive, lisn);
>> +        break;
>> +
>> +    case XIVE_ESB_GET:
>> +        ret = spapr_xive_pq_get(xive, lisn);
>> +        break;
>> +
>> +    case XIVE_ESB_SET_PQ_00:
>> +    case XIVE_ESB_SET_PQ_01:
>> +    case XIVE_ESB_SET_PQ_10:
>> +    case XIVE_ESB_SET_PQ_11:
>> +        ret = spapr_xive_pq_set(xive, lisn, (offset >> 8) & 0x3);
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
>> +    }
>> +
>> +out:
>> +    return ret;
>> +}
>> +
>> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,
>> +                           uint64_t value, unsigned size)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +    uint32_t offset = addr & 0xF00;
>> +    uint32_t lisn = addr >> xive->esb_shift;
>> +    XiveIVE *ive;
>> +    bool notify = false;
>> +
>> +    ive = spapr_xive_get_ive(xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID))  {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
>> +        return;
>> +    }
>> +
>> +    switch (offset) {
>> +    case 0:
>> +        /* TODO: should we trigger even if the IVE is masked ? */
>> +        notify = spapr_xive_pq_trigger(xive, lisn);
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
>> +                      offset);
>> +        return;
>> +    }
>> +
>> +    if (notify && !(ive->w & IVE_MASKED)) {
>> +        qemu_irq_pulse(xive->qirqs[lisn]);
>> +    }
>> +}
>> +
>> +static const MemoryRegionOps spapr_xive_esb_ops = {
>> +    .read = spapr_xive_esb_read,
>> +    .write = spapr_xive_esb_write,
>> +    .endianness = DEVICE_BIG_ENDIAN,
>> +    .valid = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +    .impl = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +};
>> +
>> +/*
>>   * XIVE Interrupt Source
>>   */
>>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int lisn, int val)
>> @@ -70,6 +280,33 @@ static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
>>  /*
>>   * Main XIVE object
>>   */
>> +#define P9_MMIO_BASE     0x006000000000000ull
>> +
>> +/* VC BAR contains set translations for the ESBs and the EQs. */
>> +#define VC_BAR_DEFAULT   0x10000000000ull
>> +#define VC_BAR_SIZE      0x08000000000ull
>> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
>> +
>> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
>> +                                            unsigned size)
>> +{
>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
>> +                  __func__, offset, size);
>> +    return 0;
>> +}
>> +
>> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,
>> +                                         uint64_t value, unsigned size)
>> +{
>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
>> +                  __func__, offset, value, size);
>> +}
>> +
>> +static const MemoryRegionOps spapr_xive_esb_default_ops = {
>> +    .read = spapr_xive_esb_default_read,
>> +    .write = spapr_xive_esb_default_write,
>> +    .endianness = DEVICE_BIG_ENDIAN,
> 
> I think you should at least have a valid access size field here.

yes. 

>> +};
>>  
>>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>>  {
>> @@ -77,14 +314,19 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>>  
>>      for (i = 0; i < xive->nr_irqs; i++) {
>>          XiveIVE *ive = &xive->ivt[i];
>> +        uint8_t pq;
>>  
>>          if (!(ive->w & IVE_VALID)) {
>>              continue;
>>          }
>>  
>> -        monitor_printf(mon, "  %4x %s %s %08x %08x\n", i,
>> +        pq = spapr_xive_pq_get(xive, i);
>> +
>> +        monitor_printf(mon, "  %4x %s %s %c%c %08x %08x\n", i,
>>                         spapr_xive_irq_is_lsi(xive, i) ? "LSI" : "MSI",
>>                         ive->w & IVE_MASKED ? "M" : " ",
>> +                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
>> +                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
>>                         (int) GETFIELD(IVE_EQ_INDEX, ive->w),
>>                         (int) GETFIELD(IVE_EQ_DATA, ive->w));
>>      }
>> @@ -104,6 +346,9 @@ void spapr_xive_reset(void *dev)
>>              ive->w |= IVE_MASKED;
>>          }
>>      }
>> +
>> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
>> +    memset(xive->sbe, 0x55, xive->sbe_size);
>>  }
>>  
>>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
>> @@ -123,6 +368,26 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>      /* Allocate the IVT (Interrupt Virtualization Table) */
>>      xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
>>  
>> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>> +    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>> +    xive->sbe = g_malloc0(xive->sbe_size);
>> +
>> +    /* VC BAR. That's the full window but we will only map the
>> +     * subregions in use. */
>> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
>> +    xive->esb_shift = ESB_SHIFT;
> 
> Any point to having this as a variable, if it's always the same size?

Well, it is related to the page size and we could have a HW configuration 
with two pages, with one specific page for trigger. But we don't need to 
model this, yet.  

The 'esb_shift' field is used in different places. I would rather keep it.
>> +
>> +    /* Install default memory region handlers to log bogus access */
>> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,
>> +                          NULL, "xive.esb.full", VC_BAR_SIZE);
>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);
>> +
>> +    /* Install the ESB memory region in the overall one */
>> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,
>> +                          xive, "xive.esb",
>> +                          (1ull << xive->esb_shift) * xive->nr_irqs);
>> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
> 
> Is there a benefit to having these nested regions, rather than just
> validating the lisn in the read/write functions?

No. This was to represent the full ESB space of the system, but we can
reduce it to one region. I don't think this is a problem.

>>      qemu_register_reset(spapr_xive_reset, dev);
>>  }
>>  
>> @@ -152,6 +417,7 @@ static const VMStateDescription vmstate_spapr_xive = {
>>          VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
>>                                             vmstate_spapr_xive_ive, XiveIVE),
>>          VMSTATE_VBUFFER_UINT32(status, sPAPRXive, 1, NULL, nr_irqs),
>> +        VMSTATE_VBUFFER_UINT32(sbe, sPAPRXive, 1, NULL, sbe_size),
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 6a799cdaba66..84c910e62e56 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -42,6 +42,14 @@ struct sPAPRXive {
>>  
>>      /* XIVE internal tables */
>>      XiveIVE      *ivt;
>> +    uint8_t      *sbe;
>> +    uint32_t     sbe_size;
> 
> sbe_size is derivable from nr_irqs, so I don't think there's a point
> to storing it separately .

I needed the value for the vmstate macros. 

Thanks,

C. 

 
>> +
>> +    /* ESB memory region */
>> +    uint32_t     esb_shift;
>> +    hwaddr       esb_base;
>> +    MemoryRegion esb_mr;
>> +    MemoryRegion esb_iomem;
>>  };
>>  
>>  static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
> 
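
For reference, the SBE packing discussed above (2 bits per source, 4
sources per byte, reset to 0b01) can be sketched as a standalone
program. This is only an illustration under stated assumptions:
SbeTable and the sbe_*() helpers are hypothetical stand-ins for the
sbe, sbe_size and nr_irqs fields of sPAPRXive, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the sbe/sbe_size/nr_irqs fields of sPAPRXive. */
typedef struct {
    uint32_t nr_irqs;
    uint32_t sbe_size;   /* in bytes; stored so a vmstate-style macro can name it */
    uint8_t *sbe;        /* 2 bits per source, 4 sources per byte */
} SbeTable;

#define PQ_OFF 0x1       /* P=0, Q=1: "ints off" */

static void sbe_init(SbeTable *t, uint32_t nr_irqs)
{
    t->nr_irqs = nr_irqs;
    t->sbe_size = (nr_irqs + 3) / 4;   /* same result as DIV_ROUND_UP(nr_irqs, 4) */
    t->sbe = calloc(t->sbe_size, 1);
    assert(t->sbe != NULL);
}

static void sbe_reset(SbeTable *t)
{
    /* 0x55 is 0b01010101: every 2-bit entry becomes 0b01 (PQ_OFF). */
    memset(t->sbe, 0x55, t->sbe_size);
}

static uint8_t sbe_get(const SbeTable *t, uint32_t lisn)
{
    uint32_t byte = lisn / 4;
    uint32_t bit  = (lisn % 4) * 2;

    assert(byte < t->sbe_size);
    return (t->sbe[byte] >> bit) & 0x3;
}

/* Store a new PQ value and return the previous one. */
static uint8_t sbe_set(SbeTable *t, uint32_t lisn, uint8_t pq)
{
    uint32_t byte = lisn / 4;
    uint32_t bit  = (lisn % 4) * 2;
    uint8_t old;

    assert(byte < t->sbe_size);
    old = (t->sbe[byte] >> bit) & 0x3;
    t->sbe[byte] = (uint8_t)((t->sbe[byte] & ~(0x3u << bit)) |
                             ((pq & 0x3u) << bit));
    return old;
}
```

With 10 sources the table takes 3 bytes, and a change to one source
leaves its neighbours in the same byte untouched.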

* Re: [Qemu-devel] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-28 10:44     ` Cédric Le Goater
@ 2017-11-29  4:47       ` David Gibson
  0 siblings, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-11-29  4:47 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Tue, Nov 28, 2017 at 10:44:03AM +0000, Cédric Le Goater wrote:
> On 11/28/2017 05:40 AM, David Gibson wrote:
> > On Thu, Nov 23, 2017 at 02:29:38PM +0100, Cédric Le Goater wrote:
> >> The XIVE interrupt controller uses a set of tables to redirect exceptions
> >> from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
> >> table, also known as Event Assignment Structure (EAS), is one of them.
> >>
> >> The XIVE model is designed to make use of the full range of the IRQ
> >> number space and does not use an offset like the XICS mode does.
> >> Hence, the IVE table is directly indexed by the IRQ number.
> >>
> >> The IVE stores Event Queue data associated with a source. The lookups
> >> are performed when the source is configured or when an event is
> >> triggered.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  default-configs/ppc64-softmmu.mak |   1 +
> >>  hw/intc/Makefile.objs             |   1 +
> >>  hw/intc/spapr_xive.c              | 165 ++++++++++++++++++++++++++++++++++++++
> >>  hw/intc/xive-internal.h           |  50 ++++++++++++
> >>  include/hw/ppc/spapr_xive.h       |  44 ++++++++++
> >>  5 files changed, 261 insertions(+)
> >>  create mode 100644 hw/intc/spapr_xive.c
> >>  create mode 100644 hw/intc/xive-internal.h
> >>  create mode 100644 include/hw/ppc/spapr_xive.h
> >>
> >> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> >> index d1b3a6dd50f8..4a7f6a0696de 100644
> >> --- a/default-configs/ppc64-softmmu.mak
> >> +++ b/default-configs/ppc64-softmmu.mak
> >> @@ -56,6 +56,7 @@ CONFIG_SM501=y
> >>  CONFIG_XICS=$(CONFIG_PSERIES)
> >>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
> >>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
> >> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
> >>  # For PReP
> >>  CONFIG_SERIAL_ISA=y
> >>  CONFIG_MC146818RTC=y
> >> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> >> index ae358569a155..49e13e7aeeee 100644
> >> --- a/hw/intc/Makefile.objs
> >> +++ b/hw/intc/Makefile.objs
> >> @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
> >>  obj-$(CONFIG_XICS) += xics.o
> >>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
> >>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> >> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
> >>  obj-$(CONFIG_POWERNV) += xics_pnv.o
> >>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
> >>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> new file mode 100644
> >> index 000000000000..b2fc3007c85f
> >> --- /dev/null
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -0,0 +1,165 @@
> >> +/*
> >> + * QEMU PowerPC sPAPR XIVE model
> >> + *
> >> + * Copyright (c) 2017, IBM Corporation.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License, version 2, as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +#include "qemu/osdep.h"
> >> +#include "qemu/log.h"
> >> +#include "qapi/error.h"
> >> +#include "target/ppc/cpu.h"
> >> +#include "sysemu/cpus.h"
> >> +#include "sysemu/dma.h"
> >> +#include "monitor/monitor.h"
> >> +#include "hw/ppc/spapr_xive.h"
> >> +
> >> +#include "xive-internal.h"
> >> +
> >> +/*
> >> + * Main XIVE object
> >> + */
> >> +
> >> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> >> +{
> >> +    int i;
> >> +
> >> +    for (i = 0; i < xive->nr_irqs; i++) {
> >> +        XiveIVE *ive = &xive->ivt[i];
> >> +
> >> +        if (!(ive->w & IVE_VALID)) {
> >> +            continue;
> >> +        }
> >> +
> >> +        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
> >> +                       ive->w & IVE_MASKED ? "M" : " ",
> >> +                       (int) GETFIELD(IVE_EQ_INDEX, ive->w),
> >> +                       (int) GETFIELD(IVE_EQ_DATA, ive->w));
> >> +    }
> >> +}
> >> +
> >> +void spapr_xive_reset(void *dev)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> >> +    int i;
> >> +
> >> +    /* Mask all valid IVEs in the IRQ number space. */
> >> +    for (i = 0; i < xive->nr_irqs; i++) {
> >> +        XiveIVE *ive = &xive->ivt[i];
> >> +        if (ive->w & IVE_VALID) {
> >> +            ive->w |= IVE_MASKED;
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> >> +
> >> +    if (!xive->nr_irqs) {
> >> +        error_setg(errp, "Number of interrupts needs to be greater than 0");
> >> +        return;
> >> +    }
> >> +
> >> +    /* Allocate the IVT (Interrupt Virtualization Table) */
> >> +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
> >> +
> >> +    qemu_register_reset(spapr_xive_reset, dev);
> >> +}
> >> +
> >> +static const VMStateDescription vmstate_spapr_xive_ive = {
> >> +    .name = TYPE_SPAPR_XIVE "/ive",
> >> +    .version_id = 1,
> >> +    .minimum_version_id = 1,
> >> +    .fields = (VMStateField []) {
> >> +        VMSTATE_UINT64(w, XiveIVE),
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +};
> >> +
> >> +static bool vmstate_spapr_xive_needed(void *opaque)
> >> +{
> >> +    /* TODO check machine XIVE support */
> >> +    return true;
> >> +}
> >> +
> >> +static const VMStateDescription vmstate_spapr_xive = {
> >> +    .name = TYPE_SPAPR_XIVE,
> >> +    .version_id = 1,
> >> +    .minimum_version_id = 1,
> >> +    .needed = vmstate_spapr_xive_needed,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
> >> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
> >> +                                           vmstate_spapr_xive_ive, XiveIVE),
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +};
> >> +
> >> +static Property spapr_xive_properties[] = {
> >> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
> >> +    DEFINE_PROP_END_OF_LIST(),
> >> +};
> >> +
> >> +static void spapr_xive_class_init(ObjectClass *klass, void *data)
> >> +{
> >> +    DeviceClass *dc = DEVICE_CLASS(klass);
> >> +
> >> +    dc->realize = spapr_xive_realize;
> >> +    dc->props = spapr_xive_properties;
> >> +    dc->desc = "sPAPR XIVE interrupt controller";
> >> +    dc->vmsd = &vmstate_spapr_xive;
> >> +}
> >> +
> >> +static const TypeInfo spapr_xive_info = {
> >> +    .name = TYPE_SPAPR_XIVE,
> >> +    .parent = TYPE_SYS_BUS_DEVICE,
> >> +    .instance_size = sizeof(sPAPRXive),
> >> +    .class_init = spapr_xive_class_init,
> >> +};
> >> +
> >> +static void spapr_xive_register_types(void)
> >> +{
> >> +    type_register_static(&spapr_xive_info);
> >> +}
> >> +
> >> +type_init(spapr_xive_register_types)
> >> +
> >> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
> >> +{
> >> +    return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
> >> +}
> >> +
> >> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
> >> +{
> >> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
> >> +
> >> +    if (!ive) {
> >> +        return false;
> >> +    }
> >> +
> >> +    ive->w |= IVE_VALID;
> >> +    return true;
> >> +}
> > 
> > As I said in another comment, I don't like the name: sets what, exactly?
> 
> Maybe spapr_xive_irq_alloc() would be better? I guess so.

Or enable/disable.

> > It's not really clear to me what the VALID bit means.  Why would an
> > irq within the allocated range be invalid?
> 
> 'enable' might be a better choice, but that is how the specs
> refer to this bit.

Right, definitely keep the VALID name for the bit constant, if that's
what's in the specs.  But it still might be better to use
enable/disable for the functions that change it, since that's clearer.

Unless there's something else we might want to do with the irq that
could also be called enable/disable.

> It is used by the HW to let through or stop the notification process
> immediately when a trigger is performed on the ESB MMIOs.

Ok, so the concept does exist in hardware.
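
To make the naming suggestion concrete, here is a minimal sketch of
enable/disable helpers wrapped around the VALID bit. Everything in it
is an assumption made for illustration: XiveTable stands in for
sPAPRXive, the xive_irq_*() names are hypothetical, and the bit
positions are placeholders rather than the values in xive-internal.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit positions for this sketch only. */
#define IVE_VALID   (1ull << 63)
#define IVE_MASKED  (1ull << 31)

typedef struct {
    uint64_t w;
} XiveIVE;

/* Hypothetical stand-in for the nr_irqs/ivt fields of sPAPRXive. */
typedef struct {
    uint32_t nr_irqs;
    XiveIVE *ivt;
} XiveTable;

static XiveIVE *xive_get_ive(XiveTable *x, uint32_t lisn)
{
    assert(x != NULL);
    return lisn < x->nr_irqs ? &x->ivt[lisn] : NULL;
}

/* Set VALID: triggers on this source are let through from now on. */
static bool xive_irq_enable(XiveTable *x, uint32_t lisn)
{
    XiveIVE *ive = xive_get_ive(x, lisn);

    if (!ive) {
        return false;
    }
    ive->w |= IVE_VALID;
    return true;
}

/* Clear VALID: triggers on this source are stopped immediately. */
static bool xive_irq_disable(XiveTable *x, uint32_t lisn)
{
    XiveIVE *ive = xive_get_ive(x, lisn);

    if (!ive) {
        return false;
    }
    ive->w &= ~IVE_VALID;
    return true;
}

/* The check a trigger path would make before looking at MASKED. */
static bool xive_irq_is_enabled(XiveTable *x, uint32_t lisn)
{
    XiveIVE *ive = xive_get_ive(x, lisn);

    return ive && (ive->w & IVE_VALID);
}
```

The constant keeps the spec's name while the function names say what
the caller is doing to the source.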

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-28 18:33     ` Cédric Le Goater
@ 2017-11-29  4:59       ` David Gibson
  2017-11-29 13:56         ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-29  4:59 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Tue, Nov 28, 2017 at 06:33:06PM +0000, Cédric Le Goater wrote:
> On 11/28/2017 06:38 AM, David Gibson wrote:
> > On Thu, Nov 23, 2017 at 02:29:40PM +0100, Cédric Le Goater wrote:
> >> Each interrupt source is associated with a two bit state machine
> >> called an Event State Buffer (ESB). The bits are named "P" (pending)
> >> and "Q" (queued) and can be controlled by MMIO. It is used to trigger
> >> events. See code for more details on the states and transitions.
> >>
> >> The MMIO space for the ESB translation is 512GB large on baremetal
> >> (powernv) systems and the BAR depends on the chip id. In our model for
> >> the sPAPR machine, we choose to only map a sub memory region for the
> >> provisioned IRQ numbers and to use the mapping address of chip 0 on a
> >> real system. The OS will get the address of the MMIO page of the ESB
> >> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive.c        | 268 +++++++++++++++++++++++++++++++++++++++++++-
> >>  include/hw/ppc/spapr_xive.h |   8 ++
> >>  2 files changed, 275 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 66c533fb1d78..f45f50fd017e 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -32,6 +32,216 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> >>  }
> >>  
> >>  /*
> >> + * "magic" Event State Buffer (ESB) MMIO offsets.
> >> + *
> >> + * Each interrupt source has a 2-bit state machine called ESB
> >> + * which can be controlled by MMIO. It's made of 2 bits, P and
> >> + * Q. P indicates that an interrupt is pending (has been sent
> >> + * to a queue and is waiting for an EOI). Q indicates that the
> >> + * interrupt has been triggered while pending.
> >> + *
> >> + * This acts as a coalescing mechanism in order to guarantee
> >> + * that a given interrupt only occurs at most once in a queue.
> >> + *
> >> + * When doing an EOI, the Q bit will indicate if the interrupt
> >> + * needs to be re-triggered.
> >> + *
> >> + * The following offsets into the ESB MMIO allow reading or
> >> + * manipulating the PQ bits. They must be used with an 8-byte
> >> + * load instruction. They all return the previous state of the
> >> + * interrupt (atomically).
> >> + *
> >> + * Additionally, some ESB pages support doing an EOI via a
> >> + * store at 0 and some ESBs support doing a trigger via a
> >> + * separate trigger page.
> >> + */
> >> +#define XIVE_ESB_GET            0x800
> >> +#define XIVE_ESB_SET_PQ_00      0xc00
> >> +#define XIVE_ESB_SET_PQ_01      0xd00
> >> +#define XIVE_ESB_SET_PQ_10      0xe00
> >> +#define XIVE_ESB_SET_PQ_11      0xf00
> >> +
> >> +#define XIVE_ESB_VAL_P          0x2
> >> +#define XIVE_ESB_VAL_Q          0x1
> >> +
> >> +#define XIVE_ESB_RESET          0x0
> >> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
> >> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
> >> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
> >> +
> >> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t lisn)
> >> +{
> >> +    uint32_t byte = lisn / 4;
> >> +    uint32_t bit  = (lisn % 4) * 2;
> >> +
> >> +    assert(byte < xive->sbe_size);
> >> +
> >> +    return (xive->sbe[byte] >> bit) & 0x3;
> >> +}
> >> +
> >> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t lisn, uint8_t pq)
> >> +{
> >> +    uint32_t byte = lisn / 4;
> >> +    uint32_t bit  = (lisn % 4) * 2;
> >> +    uint8_t old, new;
> >> +
> >> +    assert(byte < xive->sbe_size);
> >> +
> >> +    old = xive->sbe[byte];
> >> +
> >> +    new = xive->sbe[byte] & ~(0x3 << bit);
> >> +    new |= (pq & 0x3) << bit;
> >> +
> >> +    xive->sbe[byte] = new;
> >> +
> >> +    return (old >> bit) & 0x3;
> >> +}
> >> +
> >> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t lisn)
> >> +{
> >> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
> >> +
> >> +    switch (old_pq) {
> >> +    case XIVE_ESB_RESET:
> >> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
> >> +        return false;
> >> +    case XIVE_ESB_PENDING:
> >> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
> >> +        return false;
> >> +    case XIVE_ESB_QUEUED:
> >> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
> >> +        return true;
> >> +    case XIVE_ESB_OFF:
> >> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
> >> +        return false;
> >> +    default:
> >> +         g_assert_not_reached();
> >> +    }
> >> +}
> >> +
> >> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t lisn)
> >> +{
> >> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
> >> +
> >> +    switch (old_pq) {
> >> +    case XIVE_ESB_RESET:
> >> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
> >> +        return true;
> >> +    case XIVE_ESB_PENDING:
> >> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
> >> +        return true;
> >> +    case XIVE_ESB_QUEUED:
> >> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
> >> +        return true;
> >> +    case XIVE_ESB_OFF:
> >> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
> >> +        return false;
> >> +    default:
> >> +         g_assert_not_reached();
> >> +    }
> >> +}
> >> +
> >> +/*
> >> + * XIVE Interrupt Source MMIOs
> >> + */
> >> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t lisn)
> >> +{
> >> +    if (spapr_xive_irq_is_lsi(xive, lisn)) {
> >> +        xive->status[lisn] &= ~XIVE_STATUS_SENT;
> >> +    }
> >> +}
> >> +
> >> +/* TODO: handle second page
> >> + *
> >> + * Some HW use a separate page for trigger. We only support the case
> >> + * in which the trigger can be done in the same page as the EOI.
> >> + */
> >> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> >> +    uint32_t offset = addr & 0xF00;
> >> +    uint32_t lisn = addr >> xive->esb_shift;
> >> +    XiveIVE *ive;
> >> +    uint64_t ret = -1;
> >> +
> >> +    ive = spapr_xive_get_ive(xive, lisn);
> >> +    if (!ive || !(ive->w & IVE_VALID))  {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> >> +        goto out;
> >> +    }
> >> +
> >> +    switch (offset) {
> >> +    case 0:
> >> +        spapr_xive_source_eoi(xive, lisn);
> > 
> > Hrm.  I don't love that you're dealing with clearing that LSI bit
> > here, but setting it at a different level.
> > 
> > The state machines are doing my head in a bit, is there any way
> > you could derive the STATUS_SENT bit from the PQ bits?
> 
> Yes. I should. 
> 
> I am also lacking a guest driver to exercise these LSIs, so I didn't
> pay a lot of attention to level interrupts. Any ideas?

How about an old-school emulated PCI device?  Maybe rtl8139?

> >> +        /* return TRUE or FALSE depending on PQ value */
> >> +        ret = spapr_xive_pq_eoi(xive, lisn);
> >> +        break;
> >> +
> >> +    case XIVE_ESB_GET:
> >> +        ret = spapr_xive_pq_get(xive, lisn);
> >> +        break;
> >> +
> >> +    case XIVE_ESB_SET_PQ_00:
> >> +    case XIVE_ESB_SET_PQ_01:
> >> +    case XIVE_ESB_SET_PQ_10:
> >> +    case XIVE_ESB_SET_PQ_11:
> >> +        ret = spapr_xive_pq_set(xive, lisn, (offset >> 8) & 0x3);
> >> +        break;
> >> +    default:
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
> >> +    }
> >> +
> >> +out:
> >> +    return ret;
> >> +}
> >> +
> >> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,
> >> +                           uint64_t value, unsigned size)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> >> +    uint32_t offset = addr & 0xF00;
> >> +    uint32_t lisn = addr >> xive->esb_shift;
> >> +    XiveIVE *ive;
> >> +    bool notify = false;
> >> +
> >> +    ive = spapr_xive_get_ive(xive, lisn);
> >> +    if (!ive || !(ive->w & IVE_VALID))  {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> >> +        return;
> >> +    }
> >> +
> >> +    switch (offset) {
> >> +    case 0:
> >> +        /* TODO: should we trigger even if the IVE is masked ? */
> >> +        notify = spapr_xive_pq_trigger(xive, lisn);
> >> +        break;
> >> +    default:
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
> >> +                      offset);
> >> +        return;
> >> +    }
> >> +
> >> +    if (notify && !(ive->w & IVE_MASKED)) {
> >> +        qemu_irq_pulse(xive->qirqs[lisn]);
> >> +    }
> >> +}
> >> +
> >> +static const MemoryRegionOps spapr_xive_esb_ops = {
> >> +    .read = spapr_xive_esb_read,
> >> +    .write = spapr_xive_esb_write,
> >> +    .endianness = DEVICE_BIG_ENDIAN,
> >> +    .valid = {
> >> +        .min_access_size = 8,
> >> +        .max_access_size = 8,
> >> +    },
> >> +    .impl = {
> >> +        .min_access_size = 8,
> >> +        .max_access_size = 8,
> >> +    },
> >> +};
> >> +
> >> +/*
> >>   * XIVE Interrupt Source
> >>   */
> >>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int lisn, int val)
> >> @@ -70,6 +280,33 @@ static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
> >>  /*
> >>   * Main XIVE object
> >>   */
> >> +#define P9_MMIO_BASE     0x006000000000000ull
> >> +
> >> +/* VC BAR contains set translations for the ESBs and the EQs. */
> >> +#define VC_BAR_DEFAULT   0x10000000000ull
> >> +#define VC_BAR_SIZE      0x08000000000ull
> >> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
> >> +
> >> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
> >> +                                            unsigned size)
> >> +{
> >> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
> >> +                  __func__, offset, size);
> >> +    return 0;
> >> +}
> >> +
> >> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,
> >> +                                         uint64_t value, unsigned size)
> >> +{
> >> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
> >> +                  __func__, offset, value, size);
> >> +}
> >> +
> >> +static const MemoryRegionOps spapr_xive_esb_default_ops = {
> >> +    .read = spapr_xive_esb_default_read,
> >> +    .write = spapr_xive_esb_default_write,
> >> +    .endianness = DEVICE_BIG_ENDIAN,
> > 
> > I think you should at least have a valid access size field here.
> 
> yes. 
> 
> >> +};
> >>  
> >>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> >>  {
> >> @@ -77,14 +314,19 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> >>  
> >>      for (i = 0; i < xive->nr_irqs; i++) {
> >>          XiveIVE *ive = &xive->ivt[i];
> >> +        uint8_t pq;
> >>  
> >>          if (!(ive->w & IVE_VALID)) {
> >>              continue;
> >>          }
> >>  
> >> -        monitor_printf(mon, "  %4x %s %s %08x %08x\n", i,
> >> +        pq = spapr_xive_pq_get(xive, i);
> >> +
> >> +        monitor_printf(mon, "  %4x %s %s %c%c %08x %08x\n", i,
> >>                         spapr_xive_irq_is_lsi(xive, i) ? "LSI" : "MSI",
> >>                         ive->w & IVE_MASKED ? "M" : " ",
> >> +                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
> >> +                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
> >>                         (int) GETFIELD(IVE_EQ_INDEX, ive->w),
> >>                         (int) GETFIELD(IVE_EQ_DATA, ive->w));
> >>      }
> >> @@ -104,6 +346,9 @@ void spapr_xive_reset(void *dev)
> >>              ive->w |= IVE_MASKED;
> >>          }
> >>      }
> >> +
> >> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
> >> +    memset(xive->sbe, 0x55, xive->sbe_size);
> >>  }
> >>  
> >>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >> @@ -123,6 +368,26 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >>      /* Allocate the IVT (Interrupt Virtualization Table) */
> >>      xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
> >>  
> >> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
> >> +    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
> >> +    xive->sbe = g_malloc0(xive->sbe_size);
> >> +
> >> +    /* VC BAR. That's the full window but we will only map the
> >> +     * subregions in use. */
> >> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
> >> +    xive->esb_shift = ESB_SHIFT;
> > 
> > Any point to having this as a variable, if it's always the same size?
> 
> Well, it is related to the page size and we could have a HW configuration 
> with two pages, with one specific page for trigger. But we don't need to 
> model this, yet.  
> 
> The 'esb_shift' field is used in different places. I would rather
> keep it.

Hm, ok.
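
For context, this is roughly how the shift is used when an ESB access
is decoded in the quoted handlers: the page number gives the source,
and bits 11:8 of the address select the operation. A minimal
standalone sketch, where EsbAccess, esb_decode() and
esb_offset_to_pq() are hypothetical names introduced here and only the
arithmetic is taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define ESB_SHIFT 16   /* one 64k page per source, as in the patch */

/* Hypothetical container for a decoded ESB MMIO access. */
typedef struct {
    uint32_t lisn;     /* source number: which ESB page was hit */
    uint32_t offset;   /* "magic" offset within the page, 0x000..0xF00 */
} EsbAccess;

static EsbAccess esb_decode(uint64_t addr, uint32_t esb_shift)
{
    EsbAccess a;

    assert(esb_shift < 64);
    a.lisn = (uint32_t)(addr >> esb_shift);
    a.offset = (uint32_t)(addr & 0xF00);
    return a;
}

/*
 * For the XIVE_ESB_SET_PQ_xx offsets (0xc00..0xf00), bits 9:8 of the
 * offset carry the new PQ value.
 */
static uint8_t esb_offset_to_pq(uint32_t offset)
{
    return (offset >> 8) & 0x3;
}
```

A load at page 5, offset 0xd00 therefore decodes to source 5 with a
new PQ value of 0b01.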

> >> +
> >> +    /* Install default memory region handlers to log bogus access */
> >> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,
> >> +                          NULL, "xive.esb.full", VC_BAR_SIZE);
> >> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);
> >> +
> >> +    /* Install the ESB memory region in the overall one */
> >> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,
> >> +                          xive, "xive.esb",
> >> +                          (1ull << xive->esb_shift) * xive->nr_irqs);
> >> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
> > 
> > Is there a benefit to having these nested regions, rather than just
> > validating the lisn in the read/write functions?
> 
> No. This was to represent the full ESB space of the system, but we can
> reduce it to one region. I don't think this is a problem.
> 
> >>      qemu_register_reset(spapr_xive_reset, dev);
> >>  }
> >>  
> >> @@ -152,6 +417,7 @@ static const VMStateDescription vmstate_spapr_xive = {
> >>          VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
> >>                                             vmstate_spapr_xive_ive, XiveIVE),
> >>          VMSTATE_VBUFFER_UINT32(status, sPAPRXive, 1, NULL, nr_irqs),
> >> +        VMSTATE_VBUFFER_UINT32(sbe, sPAPRXive, 1, NULL, sbe_size),
> >>          VMSTATE_END_OF_LIST()
> >>      },
> >>  };
> >> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> >> index 6a799cdaba66..84c910e62e56 100644
> >> --- a/include/hw/ppc/spapr_xive.h
> >> +++ b/include/hw/ppc/spapr_xive.h
> >> @@ -42,6 +42,14 @@ struct sPAPRXive {
> >>  
> >>      /* XIVE internal tables */
> >>      XiveIVE      *ivt;
> >> +    uint8_t      *sbe;
> >> +    uint32_t     sbe_size;
> > 
> > sbe_size is derivable from nr_irqs, so I don't think there's a point
> > to storing it separately .
> 
> I needed the value for the vmstate macros.

Ah, right.


> 
> Thanks,
> 
> C. 
> 
>  
> >> +
> >> +    /* ESB memory region */
> >> +    uint32_t     esb_shift;
> >> +    hwaddr       esb_base;
> >> +    MemoryRegion esb_mr;
> >> +    MemoryRegion esb_iomem;
> >>  };
> >>  
> >>  static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
> > 
> 
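
Since the state machines came up above, here is a compact sketch of
the trigger and EOI transitions as they appear in the quoted patch,
written as pure functions over a PQ value so they can be checked in
isolation. esb_trigger() and esb_eoi() are hypothetical names; the
transitions and return values mirror spapr_xive_pq_trigger() and
spapr_xive_pq_eoi() as posted, including a true return from trigger in
the PENDING and QUEUED states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESB_VAL_P    0x2
#define ESB_VAL_Q    0x1

#define ESB_RESET    0x0
#define ESB_OFF      ESB_VAL_Q
#define ESB_PENDING  ESB_VAL_P
#define ESB_QUEUED   (ESB_VAL_P | ESB_VAL_Q)

/* Apply a trigger to *pq; the return value mirrors the posted patch. */
static bool esb_trigger(uint8_t *pq)
{
    assert(*pq <= 0x3);

    switch (*pq) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
        *pq = ESB_QUEUED;
        return true;
    case ESB_QUEUED:
        return true;            /* state unchanged */
    default:
        return false;           /* ESB_OFF: state unchanged */
    }
}

/* Apply an EOI to *pq; returns true when Q was set (re-trigger needed). */
static bool esb_eoi(uint8_t *pq)
{
    assert(*pq <= 0x3);

    switch (*pq) {
    case ESB_PENDING:
        *pq = ESB_RESET;
        return false;
    case ESB_QUEUED:
        *pq = ESB_PENDING;
        return true;
    default:
        return false;           /* ESB_RESET and ESB_OFF: state unchanged */
    }
}
```

Written this way, the four states and their transitions fit on one
screen, which may help when working out whether the LSI "sent" status
can be derived from P alone.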

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model Cédric Le Goater
@ 2017-11-29  5:11   ` David Gibson
  2017-11-29  9:55     ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-29  5:11 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:42PM +0100, Cédric Le Goater wrote:
> The XIVE interrupt presenter exposes a set of rings, also called
> Thread Interrupt Management Areas (TIMA), to handle priority
> management and interrupt acknowledgment among other things. There is
> one ring per level of privilege, four in all. The one we are
> interested in for the sPAPR machine is the OS ring.
> 
> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
> used to retrieve the targeted interrupt presenter object holding the
> cache data of the registers the model uses.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c        | 271 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/intc/xive-internal.h     |  89 +++++++++++++++
>  include/hw/ppc/spapr_xive.h |  11 ++
>  3 files changed, 371 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index b1e3f8710cff..554b25e0884c 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -23,9 +23,166 @@
>  #include "sysemu/dma.h"
>  #include "monitor/monitor.h"
>  #include "hw/ppc/spapr_xive.h"
> +#include "hw/ppc/xics.h"
>  
>  #include "xive-internal.h"
>  
> +struct sPAPRXiveICP {

I'd really prefer to avoid calling anything in xive "icp" to avoid
confusion with xics.

> +    DeviceState parent_obj;
> +
> +    CPUState  *cs;
> +    uint8_t   tima[TM_RING_COUNT * 0x10];

What does the 0x10 represent?  #define for clarity, maybe.

Do we need to model the whole range as memory, or just the relevant
pieces with read/write meaning?
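
As a rough sketch of the #define suggestion: the name TM_RING_SIZE and
the trimmed PresenterSketch struct below are hypothetical, while the
other constants are copied from the patch. The 16-byte size follows from
the QW offsets (0x00, 0x10, 0x20, 0x30) in xive-internal.h.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patch's xive-internal.h. */
#define TM_RING_COUNT  4
#define TM_QW1_OS      0x010

/* Hypothetical name: each ring (quad-word) of the TIMA is 16 bytes. */
#define TM_RING_SIZE   0x10

/* Trimmed stand-in for the presenter state: only the TIMA cache. */
typedef struct {
    uint8_t tima[TM_RING_COUNT * TM_RING_SIZE];
} PresenterSketch;

/* Ring index of a byte offset inside the cached TIMA. */
static inline unsigned tm_ring_of(unsigned offset)
{
    return offset / TM_RING_SIZE;
}
```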

> +    uint8_t   *tima_os;
> +    qemu_irq  output;
> +};
> +
> +static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
> +{
> +    return 0;
> +}
> +
> +static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
> +{
> +    if (cppr > XIVE_PRIORITY_MAX) {
> +        cppr = 0xff;
> +    }
> +
> +    icp->tima_os[TM_CPPR] = cppr;
> +}
> +
> +/*
> + * Thread Interrupt Management Area MMIO
> + */
> +static uint64_t spapr_xive_tm_read_special(sPAPRXiveICP *icp, hwaddr offset,
> +                                     unsigned size)
> +{
> +    uint64_t ret = -1;
> +
> +    if (offset == TM_SPC_ACK_OS_REG && size == 2) {
> +        ret = spapr_xive_icp_accept(icp);
> +    } else {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
> +                      HWADDR_PRIx" size %d\n", offset, size);
> +    }
> +
> +    return ret;
> +}
> +
> +static uint64_t spapr_xive_tm_read(void *opaque, hwaddr offset, unsigned size)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);

So, strictly speaking, this could be handled by giving each CPU its
own address space, with that CPU's TIMA superimposed on
address_space_memory.  What you have might be more practical though.

> +    sPAPRXiveICP *icp = SPAPR_XIVE_ICP(cpu->intc);
> +    uint64_t ret = -1;
> +    int i;
> +
> +    if (offset >= TM_SPC_ACK_EBB) {
> +        return spapr_xive_tm_read_special(icp, offset, size);
> +    }
> +
> +    if ((offset & 0xf0) == TM_QW1_OS) {
> +        switch (size) {
> +        case 1:
> +        case 2:
> +        case 4:
> +        case 8:
> +            if (QEMU_IS_ALIGNED(offset, size)) {

Hm, the MR subsystem doesn't already split unaligned accesses?

> +                ret = 0;
> +                for (i = 0; i < size; i++) {
> +                    ret |= icp->tima[offset + i] << (8 * i);
> +                }
> +            } else {
> +                qemu_log_mask(LOG_GUEST_ERROR,
> +                              "XIVE: invalid TIMA read alignment @%"
> +                              HWADDR_PRIx" size %d\n", offset, size);
> +            }
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +    } else {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
> +                      HWADDR_PRIx"\n", offset);
> +    }
> +
> +    return ret;
> +}
> +
> +static bool spapr_xive_tm_is_readonly(uint8_t offset)
> +{
> +    /* Let's be optimistic and prepare ground for HV mode support */
> +    switch (offset) {
> +    case TM_QW1_OS + TM_CPPR:
> +        return false;
> +    default:
> +        return true;
> +    }
> +}
> +
> +static void spapr_xive_tm_write_special(sPAPRXiveICP *icp, hwaddr offset,
> +                                  uint64_t value, unsigned size)
> +{
> +    /* TODO: support TM_SPC_SET_OS_PENDING */
> +
> +    /* TODO: support TM_SPC_ACK_OS_EL */
> +}
> +
> +static void spapr_xive_tm_write(void *opaque, hwaddr offset,
> +                           uint64_t value, unsigned size)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> +    sPAPRXiveICP *icp = SPAPR_XIVE_ICP(cpu->intc);
> +    int i;
> +
> +    if (offset >= TM_SPC_ACK_EBB) {
> +        spapr_xive_tm_write_special(icp, offset, value, size);
> +        return;
> +    }
> +
> +    if ((offset & 0xf0) == TM_QW1_OS) {
> +        switch (size) {
> +        case 1:
> +            if (offset == TM_QW1_OS + TM_CPPR) {
> +                spapr_xive_icp_set_cppr(icp, value & 0xff);
> +            }
> +            break;
> +        case 4:
> +        case 8:
> +            if (QEMU_IS_ALIGNED(offset, size)) {
> +                for (i = 0; i < size; i++) {
> +                    if (!spapr_xive_tm_is_readonly(offset + i)) {
> +                        icp->tima[offset + i] = (value >> (8 * i)) & 0xff;
> +                    }
> +                }
> +            } else {
> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
> +                              HWADDR_PRIx" size %d\n", offset, size);
> +            }
> +            break;
> +        default:
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
> +                          HWADDR_PRIx" size %d\n", offset, size);
> +        }
> +    } else {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
> +                      HWADDR_PRIx"\n", offset);

The many qemu_log()s worry me a little.  They're not rate-limited, so
the guest could in principle chew through the host's log space.

IIUC these are very unlikely to be hit in practice, so maybe
tracepoints would be more suitable.
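
To make the rate-limiting concern concrete, here is a minimal
standalone sketch of a windowed limiter. It is not a QEMU API: LogLimit
and log_limit_allow() are hypothetical names, and the caller is assumed
to supply a monotonic timestamp in nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: at most `burst` messages per `window_ns`. */
typedef struct {
    int64_t  window_ns;     /* length of one accounting window */
    unsigned burst;         /* messages allowed per window */
    int64_t  window_start;  /* start time of the current window */
    unsigned count;         /* messages already let through */
} LogLimit;

/* Returns true if a message at time `now_ns` may be logged. */
static bool log_limit_allow(LogLimit *l, int64_t now_ns)
{
    if (now_ns - l->window_start >= l->window_ns) {
        /* A new window begins: forget the previous count. */
        l->window_start = now_ns;
        l->count = 0;
    }
    if (l->count >= l->burst) {
        return false;
    }
    l->count++;
    return true;
}
```

A guest hammering an invalid offset would then produce at most `burst`
lines per window instead of one line per access.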

> +    }
> +}
> +
> +
> +static const MemoryRegionOps spapr_xive_tm_ops = {
> +    .read = spapr_xive_tm_read,
> +    .write = spapr_xive_tm_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    },
> +};
> +
>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>  {
>  
> @@ -287,6 +444,11 @@ static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
>  #define VC_BAR_SIZE      0x08000000000ull
>  #define ESB_SHIFT        16 /* One 64k page. OPAL has two */
>  
> +/* Thread Interrupt Management Area MMIO */
> +#define TM_BAR_DEFAULT   0x30203180000ull
> +#define TM_SHIFT         16
> +#define TM_BAR_SIZE      (TM_RING_COUNT * (1 << TM_SHIFT))
> +
>  static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
>                                              unsigned size)
>  {
> @@ -392,6 +554,14 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>                            (1ull << xive->esb_shift) * xive->nr_irqs);
>      memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
>  
> +    /* TM BAR. Same address for each chip */
> +    xive->tm_base = (P9_MMIO_BASE | TM_BAR_DEFAULT);
> +    xive->tm_shift = TM_SHIFT;

Any reason for this to be a variable?

> +
> +    memory_region_init_io(&xive->tm_iomem, OBJECT(xive), &spapr_xive_tm_ops,
> +                          xive, "xive.tm", TM_BAR_SIZE);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_iomem);
> +
>      qemu_register_reset(spapr_xive_reset, dev);
>  }
>  
> @@ -448,9 +618,110 @@ static const TypeInfo spapr_xive_info = {
>      .class_init = spapr_xive_class_init,
>  };
>  
> +void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon)
> +{
> +    int cpu_index = xicp->cs ? xicp->cs->cpu_index : -1;
> +
> +    monitor_printf(mon, "CPU %d CPPR=%02x IPB=%02x PIPR=%02x NSR=%02x\n",
> +                   cpu_index, xicp->tima_os[TM_CPPR], xicp->tima_os[TM_IPB],
> +                   xicp->tima_os[TM_PIPR], xicp->tima_os[TM_NSR]);
> +}
> +
> +static void spapr_xive_icp_reset(void *dev)
> +{
> +    sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
> +
> +    memset(xicp->tima, 0, sizeof(xicp->tima));
> +}
> +
> +static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
> +{
> +    sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
> +    PowerPCCPU *cpu;
> +    CPUPPCState *env;
> +    Object *obj;
> +    Error *err = NULL;
> +
> +    obj = object_property_get_link(OBJECT(dev), ICP_PROP_CPU, &err);
> +    if (!obj) {
> +        error_propagate(errp, err);
> +        error_prepend(errp, "required link '" ICP_PROP_CPU "' not found: ");
> +        return;
> +    }
> +
> +    cpu = POWERPC_CPU(obj);
> +    xicp->cs = CPU(obj);
> +
> +    env = &cpu->env;
> +    switch (PPC_INPUT(env)) {
> +    case PPC_FLAGS_INPUT_POWER7:
> +        xicp->output = env->irq_inputs[POWER7_INPUT_INT];
> +        break;
> +
> +    case PPC_FLAGS_INPUT_970:
> +        xicp->output = env->irq_inputs[PPC970_INPUT_INT];
> +        break;

I really don't think we need to implement XIVE for 970.

> +
> +    default:
> +        error_setg(errp, "XIVE interrupt controller does not support "
> +                   "this CPU bus model");
> +        return;
> +    }
> +
> +    qemu_register_reset(spapr_xive_icp_reset, dev);
> +}
> +
> +static void spapr_xive_icp_unrealize(DeviceState *dev, Error **errp)
> +{
> +    qemu_unregister_reset(spapr_xive_icp_reset, dev);
> +}
> +
> +static void spapr_xive_icp_init(Object *obj)
> +{
> +    sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(obj);
> +
> +    xicp->tima_os = &xicp->tima[TM_QW1_OS];

This is a fixed offset, so why store it as a pointer?  For the PAPR
guest case, do we even need to model the other rings?
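
Something along these lines would do, as a sketch only: the accessor
name presenter_os_ring() and the trimmed struct are hypothetical, and
the constants are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Constants from the patch. */
#define TM_RING_COUNT  4
#define TM_QW1_OS      0x010
#define TM_CPPR        0x1

/* Trimmed stand-in for the presenter state: only the TIMA cache. */
typedef struct {
    uint8_t tima[TM_RING_COUNT * 0x10];
} PresenterSketch;

/*
 * Hypothetical accessor replacing the stored tima_os pointer: the OS
 * ring sits at a fixed offset, so it can be computed on demand.
 */
static inline uint8_t *presenter_os_ring(PresenterSketch *p)
{
    return &p->tima[TM_QW1_OS];
}
```

With this there is nothing to set up in the instance_init hook, and no
interior pointer left in the device state.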

> +}
> +
> +static bool vmstate_spapr_xive_icp_needed(void *opaque)
> +{
> +    /* TODO check machine XIVE support */
> +    return true;
> +}
> +
> +static const VMStateDescription vmstate_spapr_xive_icp = {
> +    .name = TYPE_SPAPR_XIVE_ICP,
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = vmstate_spapr_xive_icp_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_BUFFER(tima, sPAPRXiveICP),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static void spapr_xive_icp_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->realize = spapr_xive_icp_realize;
> +    dc->unrealize = spapr_xive_icp_unrealize;
> +    dc->desc = "sPAPR XIVE Interrupt Presenter";
> +    dc->vmsd = &vmstate_spapr_xive_icp;
> +}
> +
> +static const TypeInfo xive_icp_info = {
> +    .name          = TYPE_SPAPR_XIVE_ICP,
> +    .parent        = TYPE_DEVICE,
> +    .instance_size = sizeof(sPAPRXiveICP),
> +    .instance_init = spapr_xive_icp_init,
> +    .class_init    = spapr_xive_icp_class_init,
> +};
> +
>  static void spapr_xive_register_types(void)
>  {
>      type_register_static(&spapr_xive_info);
> +    type_register_static(&xive_icp_info);
>  }
>  
>  type_init(spapr_xive_register_types)
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> index bea88d82992c..7d329f203a9b 100644
> --- a/hw/intc/xive-internal.h
> +++ b/hw/intc/xive-internal.h
> @@ -24,6 +24,93 @@
>  #define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
>                                   PPC_BIT32(bs))
>  
> +/*
> + * Thread Management (aka "TM") registers

Because "TM" didn't stand for enough things already :/.

> + */
> +
> +/* Number of Thread Management Interrupt Areas */
> +#define TM_RING_COUNT 4
> +
> +/* TM register offsets */
> +#define TM_QW0_USER             0x000 /* All rings */
> +#define TM_QW1_OS               0x010 /* Ring 0..2 */
> +#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
> +#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
> +
> +/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
> +#define TM_NSR                  0x0  /*  +   +   -   +  */
> +#define TM_CPPR                 0x1  /*  -   +   -   +  */
> +#define TM_IPB                  0x2  /*  -   +   +   +  */
> +#define TM_LSMFB                0x3  /*  -   +   +   +  */
> +#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
> +#define TM_INC                  0x5  /*  -   +   -   +  */
> +#define TM_AGE                  0x6  /*  -   +   -   +  */
> +#define TM_PIPR                 0x7  /*  -   +   -   +  */
> +
> +#define TM_WORD0                0x0
> +#define TM_WORD1                0x4
> +
> +/*
> + * QW word 2 contains the valid bit at the top and other fields
> + * depending on the QW.
> + */
> +#define TM_WORD2                0x8
> +#define   TM_QW0W2_VU           PPC_BIT32(0)
> +#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
> +#define   TM_QW1W2_VO           PPC_BIT32(0)
> +#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
> +#define   TM_QW2W2_VP           PPC_BIT32(0)
> +#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
> +#define   TM_QW3W2_VT           PPC_BIT32(0)
> +#define   TM_QW3W2_LP           PPC_BIT32(6)
> +#define   TM_QW3W2_LE           PPC_BIT32(7)
> +#define   TM_QW3W2_T            PPC_BIT32(31)
> +
> +/*
> + * In addition to normal loads to "peek" and writes (only when invalid)
> + * using 4 and 8 bytes accesses, the above registers support these
> + * "special" byte operations:
> + *
> + *   - Byte load from QW0[NSR] - User level NSR (EBB)
> + *   - Byte store to QW0[NSR] - User level NSR (EBB)
> + *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
> + *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
> + *                                    otherwise VT||0000000
> + *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
> + *
> + * Then we have all these "special" CI ops at these offset that trigger
> + * all sorts of side effects:
> + */
> +#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
> +#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
> +#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
> +#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
> +                                         * context */
> +#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
> +#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
> +                                         * context to reg */
> +#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
> +                                         * context to reg*/
> +#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
> +#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
> +                                         * line */
> +#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
> +#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
> +                                         * line */
> +#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
> +/* XXX more... */
> +
> +/* NSR fields for the various QW ack types */
> +#define TM_QW0_NSR_EB           PPC_BIT8(0)
> +#define TM_QW1_NSR_EO           PPC_BIT8(0)
> +#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
> +#define  TM_QW3_NSR_HE_NONE     0
> +#define  TM_QW3_NSR_HE_POOL     1
> +#define  TM_QW3_NSR_HE_PHYS     2
> +#define  TM_QW3_NSR_HE_LSI      3
> +#define TM_QW3_NSR_I            PPC_BIT8(2)
> +#define TM_QW3_NSR_GRP_LVL      PPC_BIT8(3, 7)
> +
>  /* IVE/EAS
>   *
>   * One per interrupt source. Targets that interrupt to a given EQ
> @@ -44,6 +131,8 @@ typedef struct XiveIVE {
>  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>  } XiveIVE;
>  
> +#define XIVE_PRIORITY_MAX  7
> +
>  void spapr_xive_reset(void *dev);
>  XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
>  
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 7a308fb4db2b..6e8a189e723f 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -23,10 +23,15 @@
>  
>  typedef struct sPAPRXive sPAPRXive;
>  typedef struct XiveIVE XiveIVE;
> +typedef struct sPAPRXiveICP sPAPRXiveICP;
>  
>  #define TYPE_SPAPR_XIVE "spapr-xive"
>  #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
>  
> +#define TYPE_SPAPR_XIVE_ICP "spapr-xive-icp"
> +#define SPAPR_XIVE_ICP(obj) \
> +    OBJECT_CHECK(sPAPRXiveICP, (obj), TYPE_SPAPR_XIVE_ICP)
> +
>  struct sPAPRXive {
>      SysBusDevice parent;
>  
> @@ -57,6 +62,11 @@ struct sPAPRXive {
>      hwaddr       esb_base;
>      MemoryRegion esb_mr;
>      MemoryRegion esb_iomem;
> +
> +    /* TIMA memory region */
> +    uint32_t     tm_shift;
> +    hwaddr       tm_base;
> +    MemoryRegion tm_iomem;
>  };
>  
>  static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
> @@ -67,5 +77,6 @@ static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
>  bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn, bool lsi);
>  bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> +void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
>  
>  #endif /* PPC_SPAPR_XIVE_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model
  2017-11-29  5:11   ` David Gibson
@ 2017-11-29  9:55     ` Cédric Le Goater
  2017-11-30  4:06       ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-29  9:55 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/29/2017 06:11 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:42PM +0100, Cédric Le Goater wrote:
>> The XIVE interrupt presenter exposes a set of rings, also called
>> Thread Interrupt Management Areas (TIMA), to handle priority
>> management and interrupt acknowledgment among other things. There is
>> one ring per level of privilege, four in all. The one we are
>> interested in for the sPAPR machine is the OS ring.
>>
>> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
>> used to retrieve the targeted interrupt presenter object holding the
>> cache data of the registers the model use.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c        | 271 ++++++++++++++++++++++++++++++++++++++++++++
>>  hw/intc/xive-internal.h     |  89 +++++++++++++++
>>  include/hw/ppc/spapr_xive.h |  11 ++
>>  3 files changed, 371 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index b1e3f8710cff..554b25e0884c 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -23,9 +23,166 @@
>>  #include "sysemu/dma.h"
>>  #include "monitor/monitor.h"
>>  #include "hw/ppc/spapr_xive.h"
>> +#include "hw/ppc/xics.h"
>>  
>>  #include "xive-internal.h"
>>  
>> +struct sPAPRXiveICP {
> 
> I'd really prefer to avoid calling anything in xive "icp" to avoid
> confusion with xics.

OK. 

The spec refers to the whole as an IVPE (Interrupt Virtualization
Presentation Engine). In our model, we use the TIMA cached values of
the OS ring and the qemu_irq for the CPU line.

Would 'sPAPRXivePresenter' be fine?


>> +    DeviceState parent_obj;
>> +
>> +    CPUState  *cs;
>> +    uint8_t   tima[TM_RING_COUNT * 0x10];
> 
> What does the 0x10 represent?  #define for clarity, maybe.

Yes.

> Do we need to model the whole range as memory, or just the relevant
> pieces with read/write meaning?

Yes, we could limit the TIMA and MMIO region to only what sPAPR needs:
the OS ring.

> 
>> +    uint8_t   *tima_os;
>> +    qemu_irq  output;
>> +};
>> +
>> +static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
>> +{
>> +    return 0;
>> +}
>> +
>> +static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
>> +{
>> +    if (cppr > XIVE_PRIORITY_MAX) {
>> +        cppr = 0xff;
>> +    }
>> +
>> +    icp->tima_os[TM_CPPR] = cppr;
>> +}
>> +
>> +/*
>> + * Thread Interrupt Management Area MMIO
>> + */
>> +static uint64_t spapr_xive_tm_read_special(sPAPRXiveICP *icp, hwaddr offset,
>> +                                     unsigned size)
>> +{
>> +    uint64_t ret = -1;
>> +
>> +    if (offset == TM_SPC_ACK_OS_REG && size == 2) {
>> +        ret = spapr_xive_icp_accept(icp);
>> +    } else {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
>> +                      HWADDR_PRIx" size %d\n", offset, size);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static uint64_t spapr_xive_tm_read(void *opaque, hwaddr offset, unsigned size)
>> +{
>> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> 
> So, strictly speaking this could be handled by setting each of the
> CPUs address spaces separately, to something with their own TIMA
> superimposed on address_space_memory. 

Ah. I didn't know we could do that.

> What you have might be more practical though.

Well, you will see at the end of the patchset how cpu->intc is assigned.

>> +    sPAPRXiveICP *icp = SPAPR_XIVE_ICP(cpu->intc);
>> +    uint64_t ret = -1;
>> +    int i;
>> +
>> +    if (offset >= TM_SPC_ACK_EBB) {
>> +        return spapr_xive_tm_read_special(icp, offset, size);
>> +    }
>> +
>> +    if ((offset & 0xf0) == TM_QW1_OS) {
>> +        switch (size) {
>> +        case 1:
>> +        case 2:
>> +        case 4:
>> +        case 8:
>> +            if (QEMU_IS_ALIGNED(offset, size)) {
> 
> Hm, the MR subsystem doesn't already split unaligned accesses?

Er, yes. I might be doing a little too much.
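
If the alignment check goes away, the read path is mostly the byte
assembly loop. A minimal sketch of that loop follows; tima_read_bytes()
and TIMA_SIZE are hypothetical names. It keeps the byte order used in
the patch, but widens each byte before shifting: a uint8_t operand is
promoted to int, so the shift in the patch is undefined once the shift
count reaches the width of int (i >= 4 with a 32-bit int).

```c
#include <assert.h>
#include <stdint.h>

#define TIMA_SIZE 0x40  /* TM_RING_COUNT * 0x10 in the patch */

/*
 * Assemble `size` bytes starting at `offset`, byte i landing at bits
 * [8*i, 8*i+7], as the loop in the patch does. The uint64_t cast keeps
 * the shift well defined for all eight bytes. Returns all-ones when
 * the access would run past the cached area.
 */
static uint64_t tima_read_bytes(const uint8_t *tima, unsigned offset,
                                unsigned size)
{
    uint64_t ret = 0;
    unsigned i;

    if (size > 8 || offset > TIMA_SIZE - size) {
        return UINT64_MAX;
    }
    for (i = 0; i < size; i++) {
        ret |= (uint64_t)tima[offset + i] << (8 * i);
    }
    return ret;
}
```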

>> +                ret = 0;
>> +                for (i = 0; i < size; i++) {
>> +                    ret |= icp->tima[offset + i] << (8 * i);
>> +                }
>> +            } else {
>> +                qemu_log_mask(LOG_GUEST_ERROR,
>> +                              "XIVE: invalid TIMA read alignment @%"
>> +                              HWADDR_PRIx" size %d\n", offset, size);
>> +            }
>> +            break;
>> +        default:
>> +            g_assert_not_reached();
>> +        }
>> +    } else {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
>> +                      HWADDR_PRIx"\n", offset);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static bool spapr_xive_tm_is_readonly(uint8_t offset)
>> +{
>> +    /* Let's be optimistic and prepare ground for HV mode support */
>> +    switch (offset) {
>> +    case TM_QW1_OS + TM_CPPR:
>> +        return false;
>> +    default:
>> +        return true;
>> +    }
>> +}
>> +
>> +static void spapr_xive_tm_write_special(sPAPRXiveICP *icp, hwaddr offset,
>> +                                  uint64_t value, unsigned size)
>> +{
>> +    /* TODO: support TM_SPC_SET_OS_PENDING */
>> +
>> +    /* TODO: support TM_SPC_ACK_OS_EL */
>> +}
>> +
>> +static void spapr_xive_tm_write(void *opaque, hwaddr offset,
>> +                           uint64_t value, unsigned size)
>> +{
>> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
>> +    sPAPRXiveICP *icp = SPAPR_XIVE_ICP(cpu->intc);
>> +    int i;
>> +
>> +    if (offset >= TM_SPC_ACK_EBB) {
>> +        spapr_xive_tm_write_special(icp, offset, value, size);
>> +        return;
>> +    }
>> +
>> +    if ((offset & 0xf0) == TM_QW1_OS) {
>> +        switch (size) {
>> +        case 1:
>> +            if (offset == TM_QW1_OS + TM_CPPR) {
>> +                spapr_xive_icp_set_cppr(icp, value & 0xff);
>> +            }
>> +            break;
>> +        case 4:
>> +        case 8:
>> +            if (QEMU_IS_ALIGNED(offset, size)) {
>> +                for (i = 0; i < size; i++) {
>> +                    if (!spapr_xive_tm_is_readonly(offset + i)) {
>> +                        icp->tima[offset + i] = (value >> (8 * i)) & 0xff;
>> +                    }
>> +                }
>> +            } else {
>> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
>> +                              HWADDR_PRIx" size %d\n", offset, size);
>> +            }
>> +            break;
>> +        default:
>> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
>> +                          HWADDR_PRIx" size %d\n", offset, size);
>> +        }
>> +    } else {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: does handle non-OS TIMA ring @%"
>> +                      HWADDR_PRIx"\n", offset);
> 
> The many qemu_log()s worry me a little.  They're not ratelimited, so
> the guest could in principle chew through the host's log space.
> 
> IIUC these are very unlikely to be hit in practice, so maybe
> tracepoints would be more suitable.

OK.

>> +    }
>> +}
>> +
>> +
>> +static const MemoryRegionOps spapr_xive_tm_ops = {
>> +    .read = spapr_xive_tm_read,
>> +    .write = spapr_xive_tm_write,
>> +    .endianness = DEVICE_BIG_ENDIAN,
>> +    .valid = {
>> +        .min_access_size = 1,
>> +        .max_access_size = 8,
>> +    },
>> +    .impl = {
>> +        .min_access_size = 1,
>> +        .max_access_size = 8,
>> +    },
>> +};
>> +
>>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>  {
>>  
>> @@ -287,6 +444,11 @@ static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
>>  #define VC_BAR_SIZE      0x08000000000ull
>>  #define ESB_SHIFT        16 /* One 64k page. OPAL has two */
>>  
>> +/* Thread Interrupt Management Area MMIO */
>> +#define TM_BAR_DEFAULT   0x30203180000ull
>> +#define TM_SHIFT         16
>> +#define TM_BAR_SIZE      (TM_RING_COUNT * (1 << TM_SHIFT))
>> +
>>  static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
>>                                              unsigned size)
>>  {
>> @@ -392,6 +554,14 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>                            (1ull << xive->esb_shift) * xive->nr_irqs);
>>      memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
>>  
>> +    /* TM BAR. Same address for each chip */
>> +    xive->tm_base = (P9_MMIO_BASE | TM_BAR_DEFAULT);
>> +    xive->tm_shift = TM_SHIFT;
> 
> Any reason for this to be a variable?

No, we could just use TM_SHIFT. I will look into it.
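
For illustration, with the constant used directly the sizes work out as
below. The three defines are copied from the patch; tm_page_of() is a
hypothetical helper that only shows the shift being applied without a
per-device field.

```c
#include <assert.h>
#include <stdint.h>

/* Constants from the patch. */
#define TM_RING_COUNT   4
#define TM_SHIFT        16
#define TM_BAR_SIZE     (TM_RING_COUNT * (1 << TM_SHIFT))

/* Hypothetical helper: which 64k page of the TM BAR an offset falls in. */
static inline unsigned tm_page_of(uint64_t mmio_offset)
{
    return (unsigned)(mmio_offset >> TM_SHIFT);
}
```

TM_BAR_SIZE evaluates to 0x40000, that is four 64k pages.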

>> +
>> +    memory_region_init_io(&xive->tm_iomem, OBJECT(xive), &spapr_xive_tm_ops,
>> +                          xive, "xive.tm", TM_BAR_SIZE);
>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_iomem);
>> +
>>      qemu_register_reset(spapr_xive_reset, dev);
>>  }
>>  
>> @@ -448,9 +618,110 @@ static const TypeInfo spapr_xive_info = {
>>      .class_init = spapr_xive_class_init,
>>  };
>>  
>> +void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon)
>> +{
>> +    int cpu_index = xicp->cs ? xicp->cs->cpu_index : -1;
>> +
>> +    monitor_printf(mon, "CPU %d CPPR=%02x IPB=%02x PIPR=%02x NSR=%02x\n",
>> +                   cpu_index, xicp->tima_os[TM_CPPR], xicp->tima_os[TM_IPB],
>> +                   xicp->tima_os[TM_PIPR], xicp->tima_os[TM_NSR]);
>> +}
>> +
>> +static void spapr_xive_icp_reset(void *dev)
>> +{
>> +    sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
>> +
>> +    memset(xicp->tima, 0, sizeof(xicp->tima));
>> +}
>> +
>> +static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
>> +{
>> +    sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
>> +    PowerPCCPU *cpu;
>> +    CPUPPCState *env;
>> +    Object *obj;
>> +    Error *err = NULL;
>> +
>> +    obj = object_property_get_link(OBJECT(dev), ICP_PROP_CPU, &err);
>> +    if (!obj) {
>> +        error_propagate(errp, err);
>> +        error_prepend(errp, "required link '" ICP_PROP_CPU "' not found: ");
>> +        return;
>> +    }
>> +
>> +    cpu = POWERPC_CPU(obj);
>> +    xicp->cs = CPU(obj);
>> +
>> +    env = &cpu->env;
>> +    switch (PPC_INPUT(env)) {
>> +    case PPC_FLAGS_INPUT_POWER7:
>> +        xicp->output = env->irq_inputs[POWER7_INPUT_INT];
>> +        break;
>> +
>> +    case PPC_FLAGS_INPUT_970:
>> +        xicp->output = env->irq_inputs[PPC970_INPUT_INT];
>> +        break;
> 
> I really don't think we need to implement XIVE for 970.

Indeed. This is a leftover from a copy/paste of the ICPState
realize routine.

>> +
>> +    default:
>> +        error_setg(errp, "XIVE interrupt controller does not support "
>> +                   "this CPU bus model");
>> +        return;
>> +    }
>> +
>> +    qemu_register_reset(spapr_xive_icp_reset, dev);
>> +}
>> +
>> +static void spapr_xive_icp_unrealize(DeviceState *dev, Error **errp)
>> +{
>> +    qemu_unregister_reset(spapr_xive_icp_reset, dev);
>> +}
>> +
>> +static void spapr_xive_icp_init(Object *obj)
>> +{
>> +    sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(obj);
>> +
>> +    xicp->tima_os = &xicp->tima[TM_QW1_OS];
> 
> This is a fixed offset, so why store it as a pointer.  For the PAPR
> guest case, do we even need to model the other rings?

No, we don't. I will simplify.

Thanks,

C. 

> 
>> +}
>> +
>> +static bool vmstate_spapr_xive_icp_needed(void *opaque)
>> +{
>> +    /* TODO check machine XIVE support */
>> +    return true;
>> +}
>> +
>> +static const VMStateDescription vmstate_spapr_xive_icp = {
>> +    .name = TYPE_SPAPR_XIVE_ICP,
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = vmstate_spapr_xive_icp_needed,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_BUFFER(tima, sPAPRXiveICP),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static void spapr_xive_icp_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +
>> +    dc->realize = spapr_xive_icp_realize;
>> +    dc->unrealize = spapr_xive_icp_unrealize;
>> +    dc->desc = "sPAPR XIVE Interrupt Presenter";
>> +    dc->vmsd = &vmstate_spapr_xive_icp;
>> +}
>> +
>> +static const TypeInfo xive_icp_info = {
>> +    .name          = TYPE_SPAPR_XIVE_ICP,
>> +    .parent        = TYPE_DEVICE,
>> +    .instance_size = sizeof(sPAPRXiveICP),
>> +    .instance_init = spapr_xive_icp_init,
>> +    .class_init    = spapr_xive_icp_class_init,
>> +};
>> +
>>  static void spapr_xive_register_types(void)
>>  {
>>      type_register_static(&spapr_xive_info);
>> +    type_register_static(&xive_icp_info);
>>  }
>>  
>>  type_init(spapr_xive_register_types)
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> index bea88d82992c..7d329f203a9b 100644
>> --- a/hw/intc/xive-internal.h
>> +++ b/hw/intc/xive-internal.h
>> @@ -24,6 +24,93 @@
>>  #define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
>>                                   PPC_BIT32(bs))
>>  
>> +/*
>> + * Thread Management (aka "TM") registers
> 
> Because "TM" didn't stand for enough things already :/.
> 
>> + */
>> +
>> +/* Number of Thread Management Interrupt Areas */
>> +#define TM_RING_COUNT 4
>> +
>> +/* TM register offsets */
>> +#define TM_QW0_USER             0x000 /* All rings */
>> +#define TM_QW1_OS               0x010 /* Ring 0..2 */
>> +#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
>> +#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
>> +
>> +/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
>> +#define TM_NSR                  0x0  /*  +   +   -   +  */
>> +#define TM_CPPR                 0x1  /*  -   +   -   +  */
>> +#define TM_IPB                  0x2  /*  -   +   +   +  */
>> +#define TM_LSMFB                0x3  /*  -   +   +   +  */
>> +#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
>> +#define TM_INC                  0x5  /*  -   +   -   +  */
>> +#define TM_AGE                  0x6  /*  -   +   -   +  */
>> +#define TM_PIPR                 0x7  /*  -   +   -   +  */
>> +
>> +#define TM_WORD0                0x0
>> +#define TM_WORD1                0x4
>> +
>> +/*
>> + * QW word 2 contains the valid bit at the top and other fields
>> + * depending on the QW.
>> + */
>> +#define TM_WORD2                0x8
>> +#define   TM_QW0W2_VU           PPC_BIT32(0)
>> +#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
>> +#define   TM_QW1W2_VO           PPC_BIT32(0)
>> +#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
>> +#define   TM_QW2W2_VP           PPC_BIT32(0)
>> +#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
>> +#define   TM_QW3W2_VT           PPC_BIT32(0)
>> +#define   TM_QW3W2_LP           PPC_BIT32(6)
>> +#define   TM_QW3W2_LE           PPC_BIT32(7)
>> +#define   TM_QW3W2_T            PPC_BIT32(31)
>> +
>> +/*
>> + * In addition to normal loads to "peek" and writes (only when invalid)
>> + * using 4 and 8 bytes accesses, the above registers support these
>> + * "special" byte operations:
>> + *
>> + *   - Byte load from QW0[NSR] - User level NSR (EBB)
>> + *   - Byte store to QW0[NSR] - User level NSR (EBB)
>> + *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
>> + *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
>> + *                                    otherwise VT||0000000
>> + *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
>> + *
>> + * Then we have all these "special" CI ops at these offset that trigger
>> + * all sorts of side effects:
>> + */
>> +#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
>> +#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
>> +#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
>> +#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
>> +                                         * context */
>> +#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
>> +#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
>> +                                         * context to reg */
>> +#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
>> +                                         * context to reg*/
>> +#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
>> +#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
>> +                                         * line */
>> +#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
>> +#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
>> +                                         * line */
>> +#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
>> +/* XXX more... */
>> +
>> +/* NSR fields for the various QW ack types */
>> +#define TM_QW0_NSR_EB           PPC_BIT8(0)
>> +#define TM_QW1_NSR_EO           PPC_BIT8(0)
>> +#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
>> +#define  TM_QW3_NSR_HE_NONE     0
>> +#define  TM_QW3_NSR_HE_POOL     1
>> +#define  TM_QW3_NSR_HE_PHYS     2
>> +#define  TM_QW3_NSR_HE_LSI      3
>> +#define TM_QW3_NSR_I            PPC_BIT8(2)
>> +#define TM_QW3_NSR_GRP_LVL      PPC_BIT8(3, 7)
>> +
>>  /* IVE/EAS
>>   *
>>   * One per interrupt source. Targets that interrupt to a given EQ
>> @@ -44,6 +131,8 @@ typedef struct XiveIVE {
>>  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>>  } XiveIVE;
>>  
>> +#define XIVE_PRIORITY_MAX  7
>> +
>>  void spapr_xive_reset(void *dev);
>>  XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
>>  
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 7a308fb4db2b..6e8a189e723f 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -23,10 +23,15 @@
>>  
>>  typedef struct sPAPRXive sPAPRXive;
>>  typedef struct XiveIVE XiveIVE;
>> +typedef struct sPAPRXiveICP sPAPRXiveICP;
>>  
>>  #define TYPE_SPAPR_XIVE "spapr-xive"
>>  #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
>>  
>> +#define TYPE_SPAPR_XIVE_ICP "spapr-xive-icp"
>> +#define SPAPR_XIVE_ICP(obj) \
>> +    OBJECT_CHECK(sPAPRXiveICP, (obj), TYPE_SPAPR_XIVE_ICP)
>> +
>>  struct sPAPRXive {
>>      SysBusDevice parent;
>>  
>> @@ -57,6 +62,11 @@ struct sPAPRXive {
>>      hwaddr       esb_base;
>>      MemoryRegion esb_mr;
>>      MemoryRegion esb_iomem;
>> +
>> +    /* TIMA memory region */
>> +    uint32_t     tm_shift;
>> +    hwaddr       tm_base;
>> +    MemoryRegion tm_iomem;
>>  };
>>  
>>  static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
>> @@ -67,5 +77,6 @@ static inline bool spapr_xive_irq_is_lsi(sPAPRXive *xive, int lisn)
>>  bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn, bool lsi);
>>  bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
>>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>> +void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
> 
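
For anyone following the TM register layout quoted above, here is a small standalone sketch of how the offsets and the IBM-style bit numbering combine. It is not part of the patch. tm_reg_offset() and tm_qw3_nsr_he() are hypothetical helpers, and PPC_BITMASK8() is assumed to be the 8-bit counterpart of PPC_BITMASK32(), since its definition is not in the quoted hunk. Note that the hunk spells the group level field as PPC_BIT8(3, 7), while a range needs a two-argument mask macro; the sketch uses the mask form.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets copied from the quoted xive-internal.h hunk. */
#define TM_QW1_OS               0x010
#define TM_QW3_HV_PHYS          0x030
#define TM_NSR                  0x0
#define TM_CPPR                 0x1
#define TM_PIPR                 0x7

/* IBM bit numbering on one byte: bit 0 is the most significant bit. */
#define PPC_BIT8(bit)           (0x80u >> (bit))
/* Assumption: 8-bit counterpart of PPC_BITMASK32, not shown in the hunk. */
#define PPC_BITMASK8(bs, be)    ((PPC_BIT8(bs) - PPC_BIT8(be)) | PPC_BIT8(bs))

#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)   /* 0xc0 */
#define TM_QW3_NSR_HE_PHYS      2
#define TM_QW3_NSR_GRP_LVL      PPC_BITMASK8(3, 7)   /* 0x1f */

/* Hypothetical helper: byte offset of a register inside the TIMA page. */
static inline uint32_t tm_reg_offset(uint32_t qw, uint32_t reg)
{
    assert(reg < 0x10);         /* each QW is 16 bytes wide */
    return qw + reg;
}

/* Hypothetical helper: extract the 2-bit HE field from a QW3 NSR byte. */
static inline unsigned tm_qw3_nsr_he(uint8_t nsr)
{
    return (nsr & TM_QW3_NSR_HE) >> 6;
}
```

With these definitions the OS CPPR byte sits at offset 0x11 and the HV PIPR byte at 0x37.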

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller Cédric Le Goater
  2017-11-28  5:40   ` David Gibson
@ 2017-11-29 11:49   ` Greg Kurz
  2017-11-29 13:46     ` Cédric Le Goater
  2017-11-30  4:22     ` David Gibson
  1 sibling, 2 replies; 128+ messages in thread
From: Greg Kurz @ 2017-11-29 11:49 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt

On Thu, 23 Nov 2017 14:29:38 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> The XIVE interrupt controller uses a set of tables to redirect exceptions
> from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
> table, also known as Event Assignment Structure (EAS), is one of them.
> 
> The XIVE model is designed to make use of the full range of the IRQ
> number space and does not use an offset like the XICS mode does.
> Hence, the IVE table is directly indexed by the IRQ number.
> 
> The IVE stores Event Queue data associated with a source. The lookups
> are performed when the source is configured or when an event is
> triggered.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  default-configs/ppc64-softmmu.mak |   1 +
>  hw/intc/Makefile.objs             |   1 +
>  hw/intc/spapr_xive.c              | 165 ++++++++++++++++++++++++++++++++++++++
>  hw/intc/xive-internal.h           |  50 ++++++++++++
>  include/hw/ppc/spapr_xive.h       |  44 ++++++++++
>  5 files changed, 261 insertions(+)
>  create mode 100644 hw/intc/spapr_xive.c
>  create mode 100644 hw/intc/xive-internal.h
>  create mode 100644 include/hw/ppc/spapr_xive.h
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index d1b3a6dd50f8..4a7f6a0696de 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -56,6 +56,7 @@ CONFIG_SM501=y
>  CONFIG_XICS=$(CONFIG_PSERIES)
>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>  # For PReP
>  CONFIG_SERIAL_ISA=y
>  CONFIG_MC146818RTC=y
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index ae358569a155..49e13e7aeeee 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
>  obj-$(CONFIG_XICS) += xics.o
>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> new file mode 100644
> index 000000000000..b2fc3007c85f
> --- /dev/null
> +++ b/hw/intc/spapr_xive.c
> @@ -0,0 +1,165 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE model
> + *
> + * Copyright (c) 2017, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as

version 2 or (at your option) any later version.

> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qapi/error.h"
> +#include "target/ppc/cpu.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/dma.h"
> +#include "monitor/monitor.h"
> +#include "hw/ppc/spapr_xive.h"
> +
> +#include "xive-internal.h"
> +
> +/*
> + * Main XIVE object
> + */
> +
> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> +{
> +    int i;
> +
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        XiveIVE *ive = &xive->ivt[i];
> +
> +        if (!(ive->w & IVE_VALID)) {
> +            continue;
> +        }
> +
> +        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
> +                       ive->w & IVE_MASKED ? "M" : " ",
> +                       (int) GETFIELD(IVE_EQ_INDEX, ive->w),
> +                       (int) GETFIELD(IVE_EQ_DATA, ive->w));
> +    }
> +}
> +
> +void spapr_xive_reset(void *dev)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +    int i;
> +
> +    /* Mask all valid IVEs in the IRQ number space. */
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        XiveIVE *ive = &xive->ivt[i];
> +        if (ive->w & IVE_VALID) {
> +            ive->w |= IVE_MASKED;
> +        }
> +    }
> +}
> +
> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +
> +    if (!xive->nr_irqs) {
> +        error_setg(errp, "Number of interrupt needs to be greater 0");
> +        return;
> +    }
> +
> +    /* Allocate the IVT (Interrupt Virtualization Table) */
> +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));

Even if it isn't documented, AFAIK current recommended practice is to do:

    xive->ivt = g_new0(XiveIVE, xive->nr_irqs);
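
The reasoning, as far as I know: g_new0() takes the element type and the count separately, so the result is already typed and GLib checks the count * size multiplication for overflow instead of leaving it to the caller. A rough standalone illustration follows. NEW0() and ivt_alloc() are made-up names, and calloc() only stands in for GLib so that the snippet builds without it (calloc() returns NULL on overflow, whereas g_new0() aborts).

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct XiveIVE {
    uint64_t w;
} XiveIVE;

/*
 * Stand-in for g_new0(type, n): typed, zero-filled, and the n * size
 * product is checked by the allocator rather than computed by hand.
 */
#define NEW0(type, n) ((type *)calloc((n), sizeof(type)))

/* Hypothetical wrapper mirroring the allocation done in realize. */
static inline XiveIVE *ivt_alloc(uint32_t nr_irqs)
{
    return NEW0(XiveIVE, nr_irqs);
}
```

With a 32-bit nr_irqs and an 8-byte entry the hand-written multiplication can only wrap on hosts with a 32-bit size_t, so this is mostly about following the usual idiom.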

> +
> +    qemu_register_reset(spapr_xive_reset, dev);

Shouldn't you set dc->reset in spapr_xive_class_init() instead ?

> +}
> +
> +static const VMStateDescription vmstate_spapr_xive_ive = {
> +    .name = TYPE_SPAPR_XIVE "/ive",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT64(w, XiveIVE),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool vmstate_spapr_xive_needed(void *opaque)
> +{
> +    /* TODO check machine XIVE support */
> +    return true;
> +}
> +
> +static const VMStateDescription vmstate_spapr_xive = {
> +    .name = TYPE_SPAPR_XIVE,
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = vmstate_spapr_xive_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
> +                                           vmstate_spapr_xive_ive, XiveIVE),

Hmm... this array is allocated at realize and this will cause
the migration code to re-allocate it again with the same size,
and leak memory IIUC.
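
To spell out what I mean, here is a simplified model of the sequence. It is not QEMU code: it assumes the _ALLOC variant allocates the destination unconditionally on load, and all the names (Xive, model_realize, model_load_alloc, model_load_in_place, live_allocs) are made up for the illustration.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct XiveIVE {
    uint64_t w;
} XiveIVE;

typedef struct Xive {
    uint32_t nr_irqs;
    XiveIVE *ivt;
} Xive;

static int live_allocs;     /* blocks allocated and not yet freed */

static XiveIVE *ivt_new(uint32_t n)
{
    live_allocs++;
    return calloc(n, sizeof(XiveIVE));
}

static void ivt_free(XiveIVE *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Like realize: the table is allocated once, up front. */
static void model_realize(Xive *x, uint32_t n)
{
    x->nr_irqs = n;
    x->ivt = ivt_new(n);
}

/* Assumed _ALLOC behaviour: a fresh table replaces the pointer. */
static void model_load_alloc(Xive *x)
{
    x->ivt = ivt_new(x->nr_irqs);   /* the realize-time block is lost */
}

/* Alternative: load into the table that realize already provided. */
static void model_load_in_place(Xive *x, uint64_t incoming)
{
    for (uint32_t i = 0; i < x->nr_irqs; i++) {
        x->ivt[i].w = incoming;
    }
}
```

After model_realize() followed by model_load_alloc(), two blocks are live but only one is reachable through the struct.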

> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static Property spapr_xive_properties[] = {
> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void spapr_xive_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->realize = spapr_xive_realize;
> +    dc->props = spapr_xive_properties;
> +    dc->desc = "sPAPR XIVE interrupt controller";
> +    dc->vmsd = &vmstate_spapr_xive;
> +}
> +
> +static const TypeInfo spapr_xive_info = {
> +    .name = TYPE_SPAPR_XIVE,
> +    .parent = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(sPAPRXive),
> +    .class_init = spapr_xive_class_init,
> +};
> +
> +static void spapr_xive_register_types(void)
> +{
> +    type_register_static(&spapr_xive_info);
> +}
> +
> +type_init(spapr_xive_register_types)
> +
> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
> +{
> +    return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
> +}
> +
> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
> +{
> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
> +
> +    if (!ive) {
> +        return false;
> +    }
> +
> +    ive->w |= IVE_VALID;
> +    return true;
> +}
> +
> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
> +{
> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
> +
> +    if (!ive) {
> +        return false;
> +    }
> +
> +    ive->w &= ~IVE_VALID;
> +    return true;
> +}
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> new file mode 100644
> index 000000000000..bea88d82992c
> --- /dev/null
> +++ b/hw/intc/xive-internal.h
> @@ -0,0 +1,50 @@
> +/*
> + * QEMU PowerPC XIVE model
> + *
> + * Copyright 2016,2017 IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#ifndef _INTC_XIVE_INTERNAL_H
> +#define _INTC_XIVE_INTERNAL_H
> +
> +/* Utilities to manipulate these (originaly from OPAL) */
> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
> +#define SETFIELD(m, v, val)                             \
> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
> +
> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
> +                                 PPC_BIT32(bs))
> +
> +/* IVE/EAS
> + *
> + * One per interrupt source. Targets that interrupt to a given EQ
> + * and provides the corresponding logical interrupt number (EQ data)
> + *
> + * We also map this structure to the escalation descriptor inside
> + * an EQ, though in that case the valid and masked bits are not used.
> + */
> +typedef struct XiveIVE {
> +        /* Use a single 64-bit definition to make it easier to
> +         * perform atomic updates
> +         */
> +        uint64_t        w;
> +#define IVE_VALID       PPC_BIT(0)
> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> +} XiveIVE;
> +
> +void spapr_xive_reset(void *dev);
> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
> +
> +#endif /* _INTC_XIVE_INTERNAL_H */
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> new file mode 100644
> index 000000000000..795b3f4ded7c
> --- /dev/null
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -0,0 +1,44 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE model
> + *
> + * Copyright (c) 2017, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef PPC_SPAPR_XIVE_H
> +#define PPC_SPAPR_XIVE_H
> +
> +#include <hw/sysbus.h>
> +
> +typedef struct sPAPRXive sPAPRXive;
> +typedef struct XiveIVE XiveIVE;
> +
> +#define TYPE_SPAPR_XIVE "spapr-xive"
> +#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
> +
> +struct sPAPRXive {
> +    SysBusDevice parent;
> +
> +    /* Properties */
> +    uint32_t     nr_irqs;
> +
> +    /* XIVE internal tables */
> +    XiveIVE      *ivt;
> +};
> +
> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn);
> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> +
> +#endif /* PPC_SPAPR_XIVE_H */
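
As a side note on the IVE layout quoted above: PPC_BIT() counts from the most significant bit, so IVE_VALID is the top bit of the 64-bit word and IVE_EQ_DATA is the low 31 bits. The following standalone sketch packs and unpacks an entry. It departs from the patch in two labeled ways: it uses ULL constants with __builtin_ffsll() so the result does not depend on the width of long, and it casts to uint64_t instead of typeof(). ive_make() is a hypothetical helper.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit word. */
#define PPC_BIT(bit)            (0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* Same idea as the patch macros, with 64-bit safe variants. */
#define MASK_TO_LSH(m)          (__builtin_ffsll(m) - 1)
#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
#define SETFIELD(m, v, val) \
    (((v) & ~(m)) | ((((uint64_t)(val)) << MASK_TO_LSH(m)) & (m)))

/* Field definitions copied from the quoted XiveIVE structure. */
#define IVE_VALID       PPC_BIT(0)              /* 0x8000000000000000 */
#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)       /* 0x0f00000000000000 */
#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)      /* 0x00ffffff00000000 */
#define IVE_MASKED      PPC_BIT(32)             /* 0x0000000080000000 */
#define IVE_EQ_DATA     PPC_BITMASK(33, 63)     /* 0x000000007fffffff */

/* Hypothetical helper: build a valid, unmasked entry. */
static inline uint64_t ive_make(uint32_t eq_index, uint32_t eq_data)
{
    uint64_t w = IVE_VALID;

    w = SETFIELD(IVE_EQ_INDEX, w, eq_index);
    w = SETFIELD(IVE_EQ_DATA, w, eq_data);
    return w;
}
```

For example, ive_make(0x12, 0x1004) gives 0x8000001200001004, and GETFIELD() recovers both values from that word.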

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-29 11:49   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2017-11-29 13:46     ` Cédric Le Goater
  2017-11-29 15:51       ` Greg Kurz
  2017-11-30  4:23       ` David Gibson
  2017-11-30  4:22     ` David Gibson
  1 sibling, 2 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-29 13:46 UTC (permalink / raw)
  To: Greg Kurz; +Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt

On 11/29/2017 12:49 PM, Greg Kurz wrote:
> On Thu, 23 Nov 2017 14:29:38 +0100
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>> The XIVE interrupt controller uses a set of tables to redirect exceptions
>> from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
>> table, also known as Event Assignment Structure (EAS), is one of them.
>>
>> The XIVE model is designed to make use of the full range of the IRQ
>> number space and does not use an offset like the XICS mode does.
>> Hence, the IVE table is directly indexed by the IRQ number.
>>
>> The IVE stores Event Queue data associated with a source. The lookups
>> are performed when the source is configured or when an event is
>> triggered.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  default-configs/ppc64-softmmu.mak |   1 +
>>  hw/intc/Makefile.objs             |   1 +
>>  hw/intc/spapr_xive.c              | 165 ++++++++++++++++++++++++++++++++++++++
>>  hw/intc/xive-internal.h           |  50 ++++++++++++
>>  include/hw/ppc/spapr_xive.h       |  44 ++++++++++
>>  5 files changed, 261 insertions(+)
>>  create mode 100644 hw/intc/spapr_xive.c
>>  create mode 100644 hw/intc/xive-internal.h
>>  create mode 100644 include/hw/ppc/spapr_xive.h
>>
>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>> index d1b3a6dd50f8..4a7f6a0696de 100644
>> --- a/default-configs/ppc64-softmmu.mak
>> +++ b/default-configs/ppc64-softmmu.mak
>> @@ -56,6 +56,7 @@ CONFIG_SM501=y
>>  CONFIG_XICS=$(CONFIG_PSERIES)
>>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
>> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>>  # For PReP
>>  CONFIG_SERIAL_ISA=y
>>  CONFIG_MC146818RTC=y
>> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
>> index ae358569a155..49e13e7aeeee 100644
>> --- a/hw/intc/Makefile.objs
>> +++ b/hw/intc/Makefile.objs
>> @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
>>  obj-$(CONFIG_XICS) += xics.o
>>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
>> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
>>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> new file mode 100644
>> index 000000000000..b2fc3007c85f
>> --- /dev/null
>> +++ b/hw/intc/spapr_xive.c
>> @@ -0,0 +1,165 @@
>> +/*
>> + * QEMU PowerPC sPAPR XIVE model
>> + *
>> + * Copyright (c) 2017, IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
> 
> version 2 or (at your option) any later version.

yep. I will shorten the headers at the same time.

> 
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +#include "qemu/osdep.h"
>> +#include "qemu/log.h"
>> +#include "qapi/error.h"
>> +#include "target/ppc/cpu.h"
>> +#include "sysemu/cpus.h"
>> +#include "sysemu/dma.h"
>> +#include "monitor/monitor.h"
>> +#include "hw/ppc/spapr_xive.h"
>> +
>> +#include "xive-internal.h"
>> +
>> +/*
>> + * Main XIVE object
>> + */
>> +
>> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < xive->nr_irqs; i++) {
>> +        XiveIVE *ive = &xive->ivt[i];
>> +
>> +        if (!(ive->w & IVE_VALID)) {
>> +            continue;
>> +        }
>> +
>> +        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
>> +                       ive->w & IVE_MASKED ? "M" : " ",
>> +                       (int) GETFIELD(IVE_EQ_INDEX, ive->w),
>> +                       (int) GETFIELD(IVE_EQ_DATA, ive->w));
>> +    }
>> +}
>> +
>> +void spapr_xive_reset(void *dev)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(dev);
>> +    int i;
>> +
>> +    /* Mask all valid IVEs in the IRQ number space. */
>> +    for (i = 0; i < xive->nr_irqs; i++) {
>> +        XiveIVE *ive = &xive->ivt[i];
>> +        if (ive->w & IVE_VALID) {
>> +            ive->w |= IVE_MASKED;
>> +        }
>> +    }
>> +}
>> +
>> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(dev);
>> +
>> +    if (!xive->nr_irqs) {
>> +        error_setg(errp, "Number of interrupt needs to be greater 0");
>> +        return;
>> +    }
>> +
>> +    /* Allocate the IVT (Interrupt Virtualization Table) */
>> +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
> 
> Even if it isn't documented, AFAIK current recommended practice is to do:
> 
>     xive->ivt = g_new0(XiveIVE, xive->nr_irqs);

OK.
 
>> +
>> +    qemu_register_reset(spapr_xive_reset, dev);
> 
> Shouldn't you set dc->reset in spapr_xive_class_init() instead ?

qemu_register_reset() is a more general API. What is the best
practice ? 

>> +}
>> +
>> +static const VMStateDescription vmstate_spapr_xive_ive = {
>> +    .name = TYPE_SPAPR_XIVE "/ive",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField []) {
>> +        VMSTATE_UINT64(w, XiveIVE),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static bool vmstate_spapr_xive_needed(void *opaque)
>> +{
>> +    /* TODO check machine XIVE support */
>> +    return true;
>> +}
>> +
>> +static const VMStateDescription vmstate_spapr_xive = {
>> +    .name = TYPE_SPAPR_XIVE,
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = vmstate_spapr_xive_needed,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
>> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
>> +                                           vmstate_spapr_xive_ive, XiveIVE),
> 
> Hmm... this array is allocated at realize and this will cause
> the migration code to re-allocate it again with the same size,
> and leak memory IIUC.

I thought so too, but something was going wrong on the receive side
(memory corruption detected by valgrind). I have not found out why yet.

Thanks,

C.

>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static Property spapr_xive_properties[] = {
>> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void spapr_xive_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +
>> +    dc->realize = spapr_xive_realize;
>> +    dc->props = spapr_xive_properties;
>> +    dc->desc = "sPAPR XIVE interrupt controller";
>> +    dc->vmsd = &vmstate_spapr_xive;
>> +}
>> +
>> +static const TypeInfo spapr_xive_info = {
>> +    .name = TYPE_SPAPR_XIVE,
>> +    .parent = TYPE_SYS_BUS_DEVICE,
>> +    .instance_size = sizeof(sPAPRXive),
>> +    .class_init = spapr_xive_class_init,
>> +};
>> +
>> +static void spapr_xive_register_types(void)
>> +{
>> +    type_register_static(&spapr_xive_info);
>> +}
>> +
>> +type_init(spapr_xive_register_types)
>> +
>> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
>> +}
>> +
>> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
>> +
>> +    if (!ive) {
>> +        return false;
>> +    }
>> +
>> +    ive->w |= IVE_VALID;
>> +    return true;
>> +}
>> +
>> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
>> +
>> +    if (!ive) {
>> +        return false;
>> +    }
>> +
>> +    ive->w &= ~IVE_VALID;
>> +    return true;
>> +}
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> new file mode 100644
>> index 000000000000..bea88d82992c
>> --- /dev/null
>> +++ b/hw/intc/xive-internal.h
>> @@ -0,0 +1,50 @@
>> +/*
>> + * QEMU PowerPC XIVE model
>> + *
>> + * Copyright 2016,2017 IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +#ifndef _INTC_XIVE_INTERNAL_H
>> +#define _INTC_XIVE_INTERNAL_H
>> +
>> +/* Utilities to manipulate these (originaly from OPAL) */
>> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
>> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
>> +#define SETFIELD(m, v, val)                             \
>> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
>> +
>> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
>> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
>> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
>> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
>> +                                 PPC_BIT32(bs))
>> +
>> +/* IVE/EAS
>> + *
>> + * One per interrupt source. Targets that interrupt to a given EQ
>> + * and provides the corresponding logical interrupt number (EQ data)
>> + *
>> + * We also map this structure to the escalation descriptor inside
>> + * an EQ, though in that case the valid and masked bits are not used.
>> + */
>> +typedef struct XiveIVE {
>> +        /* Use a single 64-bit definition to make it easier to
>> +         * perform atomic updates
>> +         */
>> +        uint64_t        w;
>> +#define IVE_VALID       PPC_BIT(0)
>> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
>> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
>> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
>> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>> +} XiveIVE;
>> +
>> +void spapr_xive_reset(void *dev);
>> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
>> +
>> +#endif /* _INTC_XIVE_INTERNAL_H */
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> new file mode 100644
>> index 000000000000..795b3f4ded7c
>> --- /dev/null
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -0,0 +1,44 @@
>> +/*
>> + * QEMU PowerPC sPAPR XIVE model
>> + *
>> + * Copyright (c) 2017, IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef PPC_SPAPR_XIVE_H
>> +#define PPC_SPAPR_XIVE_H
>> +
>> +#include <hw/sysbus.h>
>> +
>> +typedef struct sPAPRXive sPAPRXive;
>> +typedef struct XiveIVE XiveIVE;
>> +
>> +#define TYPE_SPAPR_XIVE "spapr-xive"
>> +#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
>> +
>> +struct sPAPRXive {
>> +    SysBusDevice parent;
>> +
>> +    /* Properties */
>> +    uint32_t     nr_irqs;
>> +
>> +    /* XIVE internal tables */
>> +    XiveIVE      *ivt;
>> +};
>> +
>> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn);
>> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
>> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>> +
>> +#endif /* PPC_SPAPR_XIVE_H */
> 

* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-29  4:59       ` David Gibson
@ 2017-11-29 13:56         ` Cédric Le Goater
  2017-11-29 16:23           ` Cédric Le Goater
  2017-11-30  4:26           ` David Gibson
  0 siblings, 2 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-29 13:56 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

>>>> +    switch (offset) {
>>>> +    case 0:
>>>> +        spapr_xive_source_eoi(xive, lisn);
>>>
>>> Hrm.  I don't love that you're dealing with clearing that LSI bit
>>> here, but setting it at a different level.
>>>
>>> The state machines are doing my head in a bit, is there any way
>>> you could derive the STATUS_SENT bit from the PQ bits?
>>
>> Yes. I should. 
>>
>> I am also lacking a guest driver to exercise these LSIs so I didn't
>> pay a lot of attention to level interrupts. Any idea ?
> 
> How about an old-school emulated PCI device?  Maybe rtl8139?

Perfect. The current model is working but I will see how I can 
improve it to use the PQ bits instead.
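
To make the idea concrete, here is a rough standalone sketch of a 2-bit PQ state machine and of one possible way to read a "sent" status out of it. It is not the code from the patch. The state encoding, the transition rules and the helper names (esb_trigger, esb_eoi, esb_lsi_sent) are assumptions made for illustration, and treating "P set" as "sent" is only one reading of the suggestion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the two ESB bits: P is bit 1, Q is bit 0. */
enum {
    ESB_RESET   = 0x0,  /* P=0 Q=0: idle, next event is forwarded */
    ESB_OFF     = 0x1,  /* P=0 Q=1: source disabled */
    ESB_PENDING = 0x2,  /* P=1 Q=0: event forwarded, EOI expected */
    ESB_QUEUED  = 0x3,  /* P=1 Q=1: another event arrived meanwhile */
};

/* Returns true when the event should be forwarded to the IVE lookup. */
static inline bool esb_trigger(uint8_t *pq)
{
    switch (*pq) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        *pq = ESB_QUEUED;   /* coalesce until the next EOI */
        return false;
    default:                /* ESB_OFF: dropped */
        return false;
    }
}

/* Returns true when a queued event must be forwarded after the EOI. */
static inline bool esb_eoi(uint8_t *pq)
{
    switch (*pq) {
    case ESB_RESET:
    case ESB_PENDING:
        *pq = ESB_RESET;
        return false;
    case ESB_QUEUED:
        *pq = ESB_PENDING;
        return true;
    default:                /* ESB_OFF: stays off */
        return false;
    }
}

/* One possible derivation of "sent" for an LSI: the P bit is set. */
static inline bool esb_lsi_sent(uint8_t pq)
{
    return (pq & 0x2) != 0;
}
```

With this reading there would be no separate status bit to keep in sync on EOI, since the EOI transition itself clears P.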

I also found a couple of issues along the way.

We do need the "#interrupt-cells" and "interrupt-controller"
properties. They are missing from the XIVE sPAPR specs, but there
is no other way to find the parent controller for the LSIs.
I have asked the pHyp team again to include them in the specs and
have fixed the QEMU model.

Linux thinks the interrupt type is "edge" rather than "level":
  
  (initramfs) cat /proc/interrupts 
             CPU0       
   16:          0  XIVE-IPI    0 Edge      IPI
   17:         14  XIVE-IRQ 4100 Edge      enp0s0
   18:          0  XIVE-IRQ 4097 Edge      RAS_HOTPLUG
   19:          0  XIVE-IRQ 4096 Edge      RAS_EPOW
   20:         20  XIVE-IRQ 4098 Edge      hvc_console

and XIVE complains :

  [    8.319970] xive: Interrupt 17 (HW 0x1004) type mismatch, Linux says Edge, FW says Level

I am digging into this one.

Thanks.

C.

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-29 13:46     ` Cédric Le Goater
@ 2017-11-29 15:51       ` Greg Kurz
  2017-11-29 16:41         ` Cédric Le Goater
  2017-11-30  4:23       ` David Gibson
  1 sibling, 1 reply; 128+ messages in thread
From: Greg Kurz @ 2017-11-29 15:51 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt

On Wed, 29 Nov 2017 14:46:56 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> On 11/29/2017 12:49 PM, Greg Kurz wrote:
> 
> > Cédric Le Goater <clg@kaod.org> wrote:
> >   
> >> The XIVE interrupt controller uses a set of tables to redirect exceptions
> >> from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
> >> table, also known as Event Assignment Structure (EAS), is one of them.
> >>
> >> The XIVE model is designed to make use of the full range of the IRQ
> >> number space and does not use an offset like the XICS mode does.
> >> Hence, the IVE table is directly indexed by the IRQ number.
> >>
> >> The IVE stores Event Queue data associated with a source. The lookups
> >> are performed when the source is configured or when an event is
> >> triggered.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  default-configs/ppc64-softmmu.mak |   1 +
> >>  hw/intc/Makefile.objs             |   1 +
> >>  hw/intc/spapr_xive.c              | 165 ++++++++++++++++++++++++++++++++++++++
> >>  hw/intc/xive-internal.h           |  50 ++++++++++++
> >>  include/hw/ppc/spapr_xive.h       |  44 ++++++++++
> >>  5 files changed, 261 insertions(+)
> >>  create mode 100644 hw/intc/spapr_xive.c
> >>  create mode 100644 hw/intc/xive-internal.h
> >>  create mode 100644 include/hw/ppc/spapr_xive.h
> >>
> >> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> >> index d1b3a6dd50f8..4a7f6a0696de 100644
> >> --- a/default-configs/ppc64-softmmu.mak
> >> +++ b/default-configs/ppc64-softmmu.mak
> >> @@ -56,6 +56,7 @@ CONFIG_SM501=y
> >>  CONFIG_XICS=$(CONFIG_PSERIES)
> >>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
> >>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
> >> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
> >>  # For PReP
> >>  CONFIG_SERIAL_ISA=y
> >>  CONFIG_MC146818RTC=y
> >> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> >> index ae358569a155..49e13e7aeeee 100644
> >> --- a/hw/intc/Makefile.objs
> >> +++ b/hw/intc/Makefile.objs
> >> @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
> >>  obj-$(CONFIG_XICS) += xics.o
> >>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
> >>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> >> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
> >>  obj-$(CONFIG_POWERNV) += xics_pnv.o
> >>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
> >>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> new file mode 100644
> >> index 000000000000..b2fc3007c85f
> >> --- /dev/null
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -0,0 +1,165 @@
> >> +/*
> >> + * QEMU PowerPC sPAPR XIVE model
> >> + *
> >> + * Copyright (c) 2017, IBM Corporation.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License, version 2, as  
> > 
> > version 2 or (at your option) any later version.  
> 
> yep. I will shorten the headers at the same time.
> 

Yeah, I found a really short one from Peter in include/hw/arm/armv7m.h:

/*
 * ARMv7M CPU object
 *
 * Copyright (c) 2017 Linaro Ltd
 * Written by Peter Maydell <peter.maydell@linaro.org>
 *
 * This code is licensed under the GPL version 2 or later.
 */

> >   
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +#include "qemu/osdep.h"
> >> +#include "qemu/log.h"
> >> +#include "qapi/error.h"
> >> +#include "target/ppc/cpu.h"
> >> +#include "sysemu/cpus.h"
> >> +#include "sysemu/dma.h"
> >> +#include "monitor/monitor.h"
> >> +#include "hw/ppc/spapr_xive.h"
> >> +
> >> +#include "xive-internal.h"
> >> +
> >> +/*
> >> + * Main XIVE object
> >> + */
> >> +
> >> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> >> +{
> >> +    int i;
> >> +
> >> +    for (i = 0; i < xive->nr_irqs; i++) {
> >> +        XiveIVE *ive = &xive->ivt[i];
> >> +
> >> +        if (!(ive->w & IVE_VALID)) {
> >> +            continue;
> >> +        }
> >> +
> >> +        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
> >> +                       ive->w & IVE_MASKED ? "M" : " ",
> >> +                       (int) GETFIELD(IVE_EQ_INDEX, ive->w),
> >> +                       (int) GETFIELD(IVE_EQ_DATA, ive->w));
> >> +    }
> >> +}
> >> +
> >> +void spapr_xive_reset(void *dev)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> >> +    int i;
> >> +
> >> +    /* Mask all valid IVEs in the IRQ number space. */
> >> +    for (i = 0; i < xive->nr_irqs; i++) {
> >> +        XiveIVE *ive = &xive->ivt[i];
> >> +        if (ive->w & IVE_VALID) {
> >> +            ive->w |= IVE_MASKED;
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> >> +
> >> +    if (!xive->nr_irqs) {
> >> +        error_setg(errp, "Number of interrupt needs to be greater 0");
> >> +        return;
> >> +    }
> >> +
> >> +    /* Allocate the IVT (Interrupt Virtualization Table) */
> >> +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));  
> > 
> > Even if it isn't documented, AFAIK current recommended practice is to do:
> > 
> >     xive->ivt = g_new0(XiveIVE, xive->nr_irqs);  
> 
> OK.
>  
> >> +
> >> +    qemu_register_reset(spapr_xive_reset, dev);  
> > 
> > Shouldn't you set dc->reset in spapr_xive_class_init() instead ?  
> 
> qemu_register_reset() is a more general API. What is the best
> practice ? 
> 

I'm no expert but this is a sysbus device, right ? So I'd expect it to be
connected to some bus (sysbus_get_default() ?) and to get reset when the
parent bus is reset...

> >> +}
> >> +
> >> +static const VMStateDescription vmstate_spapr_xive_ive = {
> >> +    .name = TYPE_SPAPR_XIVE "/ive",
> >> +    .version_id = 1,
> >> +    .minimum_version_id = 1,
> >> +    .fields = (VMStateField []) {
> >> +        VMSTATE_UINT64(w, XiveIVE),
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +};
> >> +
> >> +static bool vmstate_spapr_xive_needed(void *opaque)
> >> +{
> >> +    /* TODO check machine XIVE support */
> >> +    return true;
> >> +}
> >> +
> >> +static const VMStateDescription vmstate_spapr_xive = {
> >> +    .name = TYPE_SPAPR_XIVE,
> >> +    .version_id = 1,
> >> +    .minimum_version_id = 1,
> >> +    .needed = vmstate_spapr_xive_needed,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
> >> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
> >> +                                           vmstate_spapr_xive_ive, XiveIVE),  
> > 
> > Hmm... this array is allocated at realize and this will cause
> > the migration code to re-allocate it again with the same size,
> > and leak memory IIUC.  
> 
> I thought so but something was going wrong on the receive side (memory 
> corruption detected by valgrind). I did not find why yet.
> 

Have you tried VMSTATE_STRUCT_VARRAY_POINTER_UINT32() ?

> Thanks,
> 
> C.
> 
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +};
> >> +
> >> +static Property spapr_xive_properties[] = {
> >> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
> >> +    DEFINE_PROP_END_OF_LIST(),
> >> +};
> >> +
> >> +static void spapr_xive_class_init(ObjectClass *klass, void *data)
> >> +{
> >> +    DeviceClass *dc = DEVICE_CLASS(klass);
> >> +
> >> +    dc->realize = spapr_xive_realize;
> >> +    dc->props = spapr_xive_properties;
> >> +    dc->desc = "sPAPR XIVE interrupt controller";
> >> +    dc->vmsd = &vmstate_spapr_xive;
> >> +}
> >> +
> >> +static const TypeInfo spapr_xive_info = {
> >> +    .name = TYPE_SPAPR_XIVE,
> >> +    .parent = TYPE_SYS_BUS_DEVICE,
> >> +    .instance_size = sizeof(sPAPRXive),
> >> +    .class_init = spapr_xive_class_init,
> >> +};
> >> +
> >> +static void spapr_xive_register_types(void)
> >> +{
> >> +    type_register_static(&spapr_xive_info);
> >> +}
> >> +
> >> +type_init(spapr_xive_register_types)
> >> +
> >> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
> >> +{
> >> +    return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
> >> +}
> >> +
> >> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn)
> >> +{
> >> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
> >> +
> >> +    if (!ive) {
> >> +        return false;
> >> +    }
> >> +
> >> +    ive->w |= IVE_VALID;
> >> +    return true;
> >> +}
> >> +
> >> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
> >> +{
> >> +    XiveIVE *ive = spapr_xive_get_ive(xive, lisn);
> >> +
> >> +    if (!ive) {
> >> +        return false;
> >> +    }
> >> +
> >> +    ive->w &= ~IVE_VALID;
> >> +    return true;
> >> +}
> >> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> >> new file mode 100644
> >> index 000000000000..bea88d82992c
> >> --- /dev/null
> >> +++ b/hw/intc/xive-internal.h
> >> @@ -0,0 +1,50 @@
> >> +/*
> >> + * QEMU PowerPC XIVE model
> >> + *
> >> + * Copyright 2016,2017 IBM Corporation.
> >> + *
> >> + * This program is free software; you can redistribute it and/or
> >> + * modify it under the terms of the GNU General Public License
> >> + * as published by the Free Software Foundation; either version
> >> + * 2 of the License, or (at your option) any later version.
> >> + */
> >> +#ifndef _INTC_XIVE_INTERNAL_H
> >> +#define _INTC_XIVE_INTERNAL_H
> >> +
> >> +/* Utilities to manipulate these (originaly from OPAL) */
> >> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
> >> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
> >> +#define SETFIELD(m, v, val)                             \
> >> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
> >> +
> >> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
> >> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
> >> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
> >> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
> >> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
> >> +                                 PPC_BIT32(bs))
> >> +
> >> +/* IVE/EAS
> >> + *
> >> + * One per interrupt source. Targets that interrupt to a given EQ
> >> + * and provides the corresponding logical interrupt number (EQ data)
> >> + *
> >> + * We also map this structure to the escalation descriptor inside
> >> + * an EQ, though in that case the valid and masked bits are not used.
> >> + */
> >> +typedef struct XiveIVE {
> >> +        /* Use a single 64-bit definition to make it easier to
> >> +         * perform atomic updates
> >> +         */
> >> +        uint64_t        w;
> >> +#define IVE_VALID       PPC_BIT(0)
> >> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
> >> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
> >> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
> >> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> >> +} XiveIVE;
> >> +
> >> +void spapr_xive_reset(void *dev);
> >> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
> >> +
> >> +#endif /* _INTC_XIVE_INTERNAL_H */
> >> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> >> new file mode 100644
> >> index 000000000000..795b3f4ded7c
> >> --- /dev/null
> >> +++ b/include/hw/ppc/spapr_xive.h
> >> @@ -0,0 +1,44 @@
> >> +/*
> >> + * QEMU PowerPC sPAPR XIVE model
> >> + *
> >> + * Copyright (c) 2017, IBM Corporation.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License, version 2, as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#ifndef PPC_SPAPR_XIVE_H
> >> +#define PPC_SPAPR_XIVE_H
> >> +
> >> +#include <hw/sysbus.h>
> >> +
> >> +typedef struct sPAPRXive sPAPRXive;
> >> +typedef struct XiveIVE XiveIVE;
> >> +
> >> +#define TYPE_SPAPR_XIVE "spapr-xive"
> >> +#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
> >> +
> >> +struct sPAPRXive {
> >> +    SysBusDevice parent;
> >> +
> >> +    /* Properties */
> >> +    uint32_t     nr_irqs;
> >> +
> >> +    /* XIVE internal tables */
> >> +    XiveIVE      *ivt;
> >> +};
> >> +
> >> +bool spapr_xive_irq_set(sPAPRXive *xive, uint32_t lisn);
> >> +bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
> >> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> >> +
> >> +#endif /* PPC_SPAPR_XIVE_H */  
> >   
> 


* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-29 13:56         ` Cédric Le Goater
@ 2017-11-29 16:23           ` Cédric Le Goater
  2017-11-30  4:28             ` David Gibson
  2017-12-02 14:28             ` Benjamin Herrenschmidt
  2017-11-30  4:26           ` David Gibson
  1 sibling, 2 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-29 16:23 UTC (permalink / raw)
  To: David Gibson; +Cc: list@suse.de:PowerPC, qemu-devel, Benjamin Herrenschmidt

On 11/29/2017 02:56 PM, Cédric Le Goater wrote:
>>>>> +    switch (offset) {
>>>>> +    case 0:
>>>>> +        spapr_xive_source_eoi(xive, lisn);
>>>>
>>>> Hrm.  I don't love that you're dealing with clearing that LSI bit
>>>> here, but setting it at a different level.
>>>>
>>>> The state machines are doing my head in a bit, is there any way
>>>> you could derive the STATUS_SENT bit from the PQ bits?
>>>
>>> Yes. I should. 
>>>
>>> I am also lacking a guest driver to exercise these LSIs so I didn't
>>> pay a lot of attention to level interrupts. Any idea ?
>>
>> How about an old-school emulated PCI device?  Maybe rtl8139?
> 
> Perfect. The current model is working but I will see how I can 
> improve it to use the PQ bits instead.

Using the PQ bits simplifies the model, but we still have to 
maintain an array to store the IRQ type. 

There are 3 unused bits in the IVE descriptor, bits[1-3]:  

  #define IVE_VALID       PPC_BIT(0)
  #define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
  #define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
  #define IVE_MASKED      PPC_BIT(32)              /* Masked */
  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */

We could hijack one of them to store the LSI type and get rid of 
the type array. Would you object to that ? 
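
A minimal sketch of that idea, assuming bit 1 is the one picked. The
name IVE_LSI and the two helpers are hypothetical and not part of the
posted patch; the other masks are copied from xive-internal.h, with
ULL used instead of UL so the sketch also builds on 32-bit hosts:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of the word. */
#define PPC_BIT(bit)        (0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

#define IVE_VALID       PPC_BIT(0)
#define IVE_LSI         PPC_BIT(1)           /* hypothetical: LSI source */
#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)    /* Destination EQ block# */
#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)   /* Destination EQ index */
#define IVE_MASKED      PPC_BIT(32)          /* Masked */
#define IVE_EQ_DATA     PPC_BITMASK(33, 63)  /* Data written to the EQ */

typedef struct XiveIVE {
    uint64_t w;
} XiveIVE;

/* Record the source type in the spare bit instead of a side array. */
static inline void ive_set_lsi(XiveIVE *ive, bool is_lsi)
{
    assert(ive != NULL);
    if (is_lsi) {
        ive->w |= IVE_LSI;
    } else {
        ive->w &= ~IVE_LSI;
    }
}

static inline bool ive_is_lsi(const XiveIVE *ive)
{
    assert(ive != NULL);
    return (ive->w & IVE_LSI) != 0;
}
```

Since the bit lives in the same 64-bit word, it would be migrated with
the existing VMSTATE_UINT64(w, XiveIVE) field, at the cost of using a
bit the descriptor layout currently leaves unused.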

C.


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-29 15:51       ` Greg Kurz
@ 2017-11-29 16:41         ` Cédric Le Goater
  0 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-29 16:41 UTC (permalink / raw)
  To: Greg Kurz; +Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt

>>>> +static const VMStateDescription vmstate_spapr_xive = {
>>>> +    .name = TYPE_SPAPR_XIVE,
>>>> +    .version_id = 1,
>>>> +    .minimum_version_id = 1,
>>>> +    .needed = vmstate_spapr_xive_needed,
>>>> +    .fields = (VMStateField[]) {
>>>> +        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
>>>> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
>>>> +                                           vmstate_spapr_xive_ive, XiveIVE),  
>>>
>>> Hmm... this array is allocated at realize and this will cause
>>> the migration code to re-allocate it again with the same size,
>>> and leak memory IIUC.  
>>
>> I thought so but something was going wrong on the receive side (memory 
>> corruption detected by valgrind). I did not find why yet.
>>
> 
> Have you tried VMSTATE_STRUCT_VARRAY_POINTER_UINT32() ?

yes. tcg/intel only though.

C. 


* Re: [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model
  2017-11-29  9:55     ` Cédric Le Goater
@ 2017-11-30  4:06       ` David Gibson
  2017-11-30 13:44         ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-30  4:06 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Wed, Nov 29, 2017 at 10:55:34AM +0100, Cédric Le Goater wrote:
> On 11/29/2017 06:11 AM, David Gibson wrote:
> > On Thu, Nov 23, 2017 at 02:29:42PM +0100, Cédric Le Goater wrote:
> >> The XIVE interrupt presenter exposes a set of rings, also called
> >> Thread Interrupt Management Areas (TIMA), to handle priority
> >> management and interrupt acknowledgment among other things. There is
> >> one ring per level of privilege, four in all. The one we are
> >> interested in for the sPAPR machine is the OS ring.
> >>
> >> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
> >> used to retrieve the targeted interrupt presenter object holding the
> >> cache data of the registers the model use.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive.c        | 271 ++++++++++++++++++++++++++++++++++++++++++++
> >>  hw/intc/xive-internal.h     |  89 +++++++++++++++
> >>  include/hw/ppc/spapr_xive.h |  11 ++
> >>  3 files changed, 371 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index b1e3f8710cff..554b25e0884c 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -23,9 +23,166 @@
> >>  #include "sysemu/dma.h"
> >>  #include "monitor/monitor.h"
> >>  #include "hw/ppc/spapr_xive.h"
> >> +#include "hw/ppc/xics.h"
> >>  
> >>  #include "xive-internal.h"
> >>  
> >> +struct sPAPRXiveICP {
> > 
> > I'd really prefer to avoid calling anything in xive "icp" to avoid
> > confusion with xics.
> 
> OK. 
> 
> The specs refers to the whole as an IVPE : Interrupt Virtualization 
> Presentation Engine. In our model, we use the TIMA cached values of 
> the OS ring and the qemu_irq for the CPU line. 
> 
> Would 'sPAPRXivePresenter' be fine ?

That'd be ok.  Or call it sPAPRIVPE.  Or even call it TIMA.  I'd be
fine with any of those.

[snip]
> >> +static uint64_t spapr_xive_tm_read(void *opaque, hwaddr offset, unsigned size)
> >> +{
> >> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> > 
> > So, strictly speaking this could be handled by setting each of the
> > CPUs address spaces separately, to something with their own TIMA
> > superimposed on address_space_memory. 
> 
> Ah. I didn't know we could do that.

I think that should work, judging from the code I have seen before.  I
haven't actually attempted it.

> > What you have might be more practical though.
> 
> well, you will see at the end of the patchset how cpu->intc is
> assigned.

[snip]

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-29 11:49   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-11-29 13:46     ` Cédric Le Goater
@ 2017-11-30  4:22     ` David Gibson
  1 sibling, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-11-30  4:22 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Cédric Le Goater, qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Wed, Nov 29, 2017 at 12:49:04PM +0100, Greg Kurz wrote:
> On Thu, 23 Nov 2017 14:29:38 +0100
> Cédric Le Goater <clg@kaod.org> wrote:
> 
> > The XIVE interrupt controller uses a set of tables to redirect exception
> > from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
> > table, also known as Event Assignment Structure (EAS), is one them.
> > 
> > The XIVE model is designed to make use of the full range of the IRQ
> > number space and does not use an offset like the XICS mode does.
> > Hence, the IVE table is directly indexed by the IRQ number.
> > 
> > The IVE stores Event Queue data associated with a source. The lookups
> > are performed when the source is configured or when an event is
> > triggered.
> > 
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  default-configs/ppc64-softmmu.mak |   1 +
> >  hw/intc/Makefile.objs             |   1 +
> >  hw/intc/spapr_xive.c              | 165 ++++++++++++++++++++++++++++++++++++++
> >  hw/intc/xive-internal.h           |  50 ++++++++++++
> >  include/hw/ppc/spapr_xive.h       |  44 ++++++++++
> >  5 files changed, 261 insertions(+)
> >  create mode 100644 hw/intc/spapr_xive.c
> >  create mode 100644 hw/intc/xive-internal.h
> >  create mode 100644 include/hw/ppc/spapr_xive.h
> > 
> > diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> > index d1b3a6dd50f8..4a7f6a0696de 100644
> > --- a/default-configs/ppc64-softmmu.mak
> > +++ b/default-configs/ppc64-softmmu.mak
> > @@ -56,6 +56,7 @@ CONFIG_SM501=y
> >  CONFIG_XICS=$(CONFIG_PSERIES)
> >  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
> >  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
> > +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
> >  # For PReP
> >  CONFIG_SERIAL_ISA=y
> >  CONFIG_MC146818RTC=y
> > diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> > index ae358569a155..49e13e7aeeee 100644
> > --- a/hw/intc/Makefile.objs
> > +++ b/hw/intc/Makefile.objs
> > @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
> >  obj-$(CONFIG_XICS) += xics.o
> >  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
> >  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> > +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
> >  obj-$(CONFIG_POWERNV) += xics_pnv.o
> >  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
> >  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> > diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> > new file mode 100644
> > index 000000000000..b2fc3007c85f
> > --- /dev/null
> > +++ b/hw/intc/spapr_xive.c
> > @@ -0,0 +1,165 @@
> > +/*
> > + * QEMU PowerPC sPAPR XIVE model
> > + *
> > + * Copyright (c) 2017, IBM Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> 
> version 2 or (at your option) any later version.
> 
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "qapi/error.h"
> > +#include "target/ppc/cpu.h"
> > +#include "sysemu/cpus.h"
> > +#include "sysemu/dma.h"
> > +#include "monitor/monitor.h"
> > +#include "hw/ppc/spapr_xive.h"
> > +
> > +#include "xive-internal.h"
> > +
> > +/*
> > + * Main XIVE object
> > + */
> > +
> > +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < xive->nr_irqs; i++) {
> > +        XiveIVE *ive = &xive->ivt[i];
> > +
> > +        if (!(ive->w & IVE_VALID)) {
> > +            continue;
> > +        }
> > +
> > +        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
> > +                       ive->w & IVE_MASKED ? "M" : " ",
> > +                       (int) GETFIELD(IVE_EQ_INDEX, ive->w),
> > +                       (int) GETFIELD(IVE_EQ_DATA, ive->w));
> > +    }
> > +}
> > +
> > +void spapr_xive_reset(void *dev)
> > +{
> > +    sPAPRXive *xive = SPAPR_XIVE(dev);
> > +    int i;
> > +
> > +    /* Mask all valid IVEs in the IRQ number space. */
> > +    for (i = 0; i < xive->nr_irqs; i++) {
> > +        XiveIVE *ive = &xive->ivt[i];
> > +        if (ive->w & IVE_VALID) {
> > +            ive->w |= IVE_MASKED;
> > +        }
> > +    }
> > +}
> > +
> > +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> > +{
> > +    sPAPRXive *xive = SPAPR_XIVE(dev);
> > +
> > +    if (!xive->nr_irqs) {
> > +        error_setg(errp, "Number of interrupt needs to be greater 0");
> > +        return;
> > +    }
> > +
> > +    /* Allocate the IVT (Interrupt Virtualization Table) */
> > +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
> 
> Even if it isn't documented, AFAIK current recommended practice is to do:

Yeah, that's a good idea - protects against integer overflows if
nothing else.
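
To illustrate the overflow point: with the open-coded multiplication,
nr_irqs * sizeof(XiveIVE) can wrap around on a host with a 32-bit
size_t, so the allocation ends up smaller than the table the code then
indexes. The sketch below is plain C, not GLib, and the helper names
are made up for the example. It shows the kind of check that g_new0()
does on the caller's behalf; GLib aborts on overflow, whereas this
sketch returns NULL:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if nmemb * size cannot be represented in a size_t. */
static bool alloc_size_overflows(size_t nmemb, size_t size)
{
    return size != 0 && nmemb > SIZE_MAX / size;
}

/*
 * Hypothetical stand-in for g_new0(): zeroed array of nmemb elements,
 * refusing requests whose byte count would wrap around.
 */
static void *checked_new0(size_t nmemb, size_t size)
{
    if (alloc_size_overflows(nmemb, size)) {
        return NULL;
    }
    return calloc(nmemb, size);
}
```

With the type and the count passed separately, the multiplication
never appears in the caller, which is what makes the g_new0() form
harder to get wrong.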

>     xive->ivt = g_new0(XiveIVE, xive->nr_irqs);
> 
> > +
> > +    qemu_register_reset(spapr_xive_reset, dev);
> 
> Shouldn't you set dc->reset in spapr_xive_class_init() instead ?

Yeah, that.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller
  2017-11-29 13:46     ` Cédric Le Goater
  2017-11-29 15:51       ` Greg Kurz
@ 2017-11-30  4:23       ` David Gibson
  1 sibling, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-11-30  4:23 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Greg Kurz, qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Wed, Nov 29, 2017 at 02:46:56PM +0100, Cédric Le Goater wrote:
> On 11/29/2017 12:49 PM, Greg Kurz wrote:
> > On Thu, 23 Nov 2017 14:29:38 +0100
> > Cédric Le Goater <clg@kaod.org> wrote:
> > 
> >> The XIVE interrupt controller uses a set of tables to redirect exception
> >> from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
> >> table, also known as Event Assignment Structure (EAS), is one them.
> >>
> >> The XIVE model is designed to make use of the full range of the IRQ
> >> number space and does not use an offset like the XICS mode does.
> >> Hence, the IVE table is directly indexed by the IRQ number.
> >>
> >> The IVE stores Event Queue data associated with a source. The lookups
> >> are performed when the source is configured or when an event is
> >> triggered.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  default-configs/ppc64-softmmu.mak |   1 +
> >>  hw/intc/Makefile.objs             |   1 +
> >>  hw/intc/spapr_xive.c              | 165 ++++++++++++++++++++++++++++++++++++++
> >>  hw/intc/xive-internal.h           |  50 ++++++++++++
> >>  include/hw/ppc/spapr_xive.h       |  44 ++++++++++
> >>  5 files changed, 261 insertions(+)
> >>  create mode 100644 hw/intc/spapr_xive.c
> >>  create mode 100644 hw/intc/xive-internal.h
> >>  create mode 100644 include/hw/ppc/spapr_xive.h
> >>
> >> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> >> index d1b3a6dd50f8..4a7f6a0696de 100644
> >> --- a/default-configs/ppc64-softmmu.mak
> >> +++ b/default-configs/ppc64-softmmu.mak
> >> @@ -56,6 +56,7 @@ CONFIG_SM501=y
> >>  CONFIG_XICS=$(CONFIG_PSERIES)
> >>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
> >>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
> >> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
> >>  # For PReP
> >>  CONFIG_SERIAL_ISA=y
> >>  CONFIG_MC146818RTC=y
> >> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> >> index ae358569a155..49e13e7aeeee 100644
> >> --- a/hw/intc/Makefile.objs
> >> +++ b/hw/intc/Makefile.objs
> >> @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
> >>  obj-$(CONFIG_XICS) += xics.o
> >>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
> >>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> >> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
> >>  obj-$(CONFIG_POWERNV) += xics_pnv.o
> >>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
> >>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> new file mode 100644
> >> index 000000000000..b2fc3007c85f
> >> --- /dev/null
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -0,0 +1,165 @@
> >> +/*
> >> + * QEMU PowerPC sPAPR XIVE model
> >> + *
> >> + * Copyright (c) 2017, IBM Corporation.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License, version 2, as
> > 
> > version 2 or (at your option) any later version.
> 
> yep. I will shorten the headers at the same time.
> 
> > 
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +#include "qemu/osdep.h"
> >> +#include "qemu/log.h"
> >> +#include "qapi/error.h"
> >> +#include "target/ppc/cpu.h"
> >> +#include "sysemu/cpus.h"
> >> +#include "sysemu/dma.h"
> >> +#include "monitor/monitor.h"
> >> +#include "hw/ppc/spapr_xive.h"
> >> +
> >> +#include "xive-internal.h"
> >> +
> >> +/*
> >> + * Main XIVE object
> >> + */
> >> +
> >> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> >> +{
> >> +    int i;
> >> +
> >> +    for (i = 0; i < xive->nr_irqs; i++) {
> >> +        XiveIVE *ive = &xive->ivt[i];
> >> +
> >> +        if (!(ive->w & IVE_VALID)) {
> >> +            continue;
> >> +        }
> >> +
> >> +        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
> >> +                       ive->w & IVE_MASKED ? "M" : " ",
> >> +                       (int) GETFIELD(IVE_EQ_INDEX, ive->w),
> >> +                       (int) GETFIELD(IVE_EQ_DATA, ive->w));
> >> +    }
> >> +}
> >> +
> >> +void spapr_xive_reset(void *dev)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> >> +    int i;
> >> +
> >> +    /* Mask all valid IVEs in the IRQ number space. */
> >> +    for (i = 0; i < xive->nr_irqs; i++) {
> >> +        XiveIVE *ive = &xive->ivt[i];
> >> +        if (ive->w & IVE_VALID) {
> >> +            ive->w |= IVE_MASKED;
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> >> +
> >> +    if (!xive->nr_irqs) {
> >> +        error_setg(errp, "Number of interrupt needs to be greater 0");
> >> +        return;
> >> +    }
> >> +
> >> +    /* Allocate the IVT (Interrupt Virtualization Table) */
> >> +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
> > 
> > Even if it isn't documented, AFAIK current recommended practice is to do:
> > 
> >     xive->ivt = g_new0(XiveIVE, xive->nr_irqs);
> 
> OK.
>  
> >> +
> >> +    qemu_register_reset(spapr_xive_reset, dev);
> > 
> > Shouldn't you set dc->reset in spapr_xive_class_init() instead ?
> 
> qemu_register_reset() is a more general API. What is the best
> practice ?

Usually dc->reset for those cases where it can be used - which is
basically anything sitting on a qbus.

There can be exceptions (e.g. the stuff I did for DRCs, because they
sometimes had a bus and sometimes didn't), but AFAICT this isn't one.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-29 13:56         ` Cédric Le Goater
  2017-11-29 16:23           ` Cédric Le Goater
@ 2017-11-30  4:26           ` David Gibson
  2017-11-30 15:40             ` Cédric Le Goater
  1 sibling, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-30  4:26 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 2030 bytes --]

On Wed, Nov 29, 2017 at 02:56:39PM +0100, Cédric Le Goater wrote:
> >>>> +    switch (offset) {
> >>>> +    case 0:
> >>>> +        spapr_xive_source_eoi(xive, lisn);
> >>>
> >>> Hrm.  I don't love that you're dealing with clearing that LSI bit
> >>> here, but setting it at a different level.
> >>>
> >>> The state machines are doing my head in a bit, is there any way
> >>> you could derive the STATUS_SENT bit from the PQ bits?
> >>
> >> Yes. I should. 
> >>
> >> I am also lacking a guest driver to exercise these LSIs so I didn't
> >> pay a lot of attention to level interrupts. Any idea ?
> > 
> > How about an old-school emulated PCI device?  Maybe rtl8139?
> 
> Perfect. The current model is working but I will see how I can 
> improve it to use the PQ bits instead.
> 
> I also found a couple of issues on the way. 
> 
> We do need the "#interrupt-cells" and "interrupt-controller" 
> properties. They are missing from the XIVE sPAPR specs but there
> is no other way to find the parent controller for the LSIs ... 
> I have re-asked the pHyp team to include them in the specs and 
> fixed the QEMU model.

Told ya so :).

> Linux thinks the interrupt type is an "edge" and not a "level" one :

Right, "edge" and message interrupts work basically the same way.

>   (initramfs) cat /proc/interrupts 
>              CPU0       
>    16:          0  XIVE-IPI    0 Edge      IPI
>    17:         14  XIVE-IRQ 4100 Edge      enp0s0
>    18:          0  XIVE-IRQ 4097 Edge      RAS_HOTPLUG
>    19:          0  XIVE-IRQ 4096 Edge      RAS_EPOW
>    20:         20  XIVE-IRQ 4098 Edge      hvc_console
> 
> and XIVE complains :
> 
>   [    8.319970] xive: Interrupt 17 (HW 0x1004) type mismatch, Linux says Edge, FW says Level
> 
> I am digging into this one.
> 
> Thanks.
> 
> C.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-29 16:23           ` Cédric Le Goater
@ 2017-11-30  4:28             ` David Gibson
  2017-11-30 16:05               ` Cédric Le Goater
  2017-12-02 14:33               ` Benjamin Herrenschmidt
  2017-12-02 14:28             ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 128+ messages in thread
From: David Gibson @ 2017-11-30  4:28 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: list@suse.de:PowerPC, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 2222 bytes --]

On Wed, Nov 29, 2017 at 05:23:25PM +0100, Cédric Le Goater wrote:
> On 11/29/2017 02:56 PM, Cédric Le Goater wrote:
> >>>>> +    switch (offset) {
> >>>>> +    case 0:
> >>>>> +        spapr_xive_source_eoi(xive, lisn);
> >>>>
> >>>> Hrm.  I don't love that you're dealing with clearing that LSI bit
> >>>> here, but setting it at a different level.
> >>>>
> >>>> The state machines are doing my head in a bit, is there any way
> >>>> you could derive the STATUS_SENT bit from the PQ bits?
> >>>
> >>> Yes. I should. 
> >>>
> >>> I am also lacking a guest driver to exercise these LSIs so I didn't
> >>> pay a lot of attention to level interrupts. Any idea ?
> >>
> >> How about an old-school emulated PCI device?  Maybe rtl8139?
> > 
> > Perfect. The current model is working but I will see how I can 
> > improve it to use the PQ bits instead.
> 
> Using the PQ bits is simplifying the model but we still have to 
> maintain an array to store the IRQ type. 
> 
> There are 3 unused bits in the IVE descriptor, bits[1-3]:  
> 
>   #define IVE_VALID       PPC_BIT(0)
>   #define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
>   #define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
>   #define IVE_MASKED      PPC_BIT(32)              /* Masked */
>   #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> 
> We could hijack one of them to store the LSI type and get rid of 
> the type array. Would you object to that ?

Hrm.  These IVE bits are architected, aren't they?  In which case I'd
be wary of stealing a reserved bit in case of future extensions.

I'm wondering if we want another word / structure for storing
non-architected, implementation specific flags or info.

How does this work at the hardware level?  Presumably the actual
hardware components don't communicate with the XIVE to request edge or
level.  So how does it know?  Specific ranges for LSIs?  If so, we
should probably do the same.
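
As an illustration of the IVE layout quoted above, here is a minimal
sketch of field access on a 64-bit word using IBM bit numbering (bit 0
is the most significant bit). The demo_* helpers and DemoIVE are
hypothetical stand-ins, not the PPC_BIT/GETFIELD macros from the patch,
and the separate flags byte is only one possible way to keep
non-architected information out of the reserved bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's macros; bit 0 is the MSB. */
static uint64_t demo_bit(unsigned bit)
{
    assert(bit <= 63);
    return 0x8000000000000000ULL >> bit;
}

/* Mask covering bits bs..be inclusive, with bs <= be. */
static uint64_t demo_mask(unsigned bs, unsigned be)
{
    assert(bs <= be);
    return (demo_bit(bs) - demo_bit(be)) | demo_bit(bs);
}

/* Extract the field bs..be, right-justified. */
static uint64_t demo_get(uint64_t w, unsigned bs, unsigned be)
{
    return (w & demo_mask(bs, be)) >> (63 - be);
}

/* Return w with the field bs..be replaced by v (truncated to fit). */
static uint64_t demo_set(uint64_t w, unsigned bs, unsigned be, uint64_t v)
{
    uint64_t m = demo_mask(bs, be);

    return (w & ~m) | ((v << (63 - be)) & m);
}

/* Assumption: model-private flags live beside the architected word. */
#define DEMO_IVE_FLAG_LSI 0x1

typedef struct DemoIVE {
    uint64_t w;      /* architected IVE word */
    uint8_t  flags;  /* non-architected, implementation-specific flags */
} DemoIVE;
```

With this shape the reserved bits [1-3] stay untouched, and the LSI
type travels in the side field instead.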

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues Cédric Le Goater
  2017-11-23 20:31   ` Benjamin Herrenschmidt
@ 2017-11-30  4:38   ` David Gibson
  2017-11-30 14:06     ` Cédric Le Goater
  2017-12-02 14:39     ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 128+ messages in thread
From: David Gibson @ 2017-11-30  4:38 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 7894 bytes --]

On Thu, Nov 23, 2017 at 02:29:43PM +0100, Cédric Le Goater wrote:
> The Event Queue Descriptor (EQD) table, also known as Event Notification
> Descriptor (END), is one of the internal tables the XIVE interrupt
> controller uses to redirect exception from event sources to CPU
> threads.
> 
> The EQD specifies on which Event Queue the event data should be posted
> when an exception occurs (later on pulled by the OS) and which server
> (VPD in XIVE terminology) to notify. The Event Queue is a much more
> complex structure but we start with a simple model for the sPAPR
> machine.

Just to clarify my understanding a server / VPD in XIVE would
typically correspond to a cpu - either real or virtual, yes?

> There is one XiveEQ per priority and the model chooses to store them
> under the Xive Interrupt presenter model. It will be retrieved, just
> like for XICS, through the 'intc' object pointer of the CPU.
> 
> The EQ indexing follows a simple pattern:
> 
>        (server << 3) | (priority & 0x7)
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/intc/xive-internal.h | 50 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 106 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 554b25e0884c..983317a6b3f6 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -23,6 +23,7 @@
>  #include "sysemu/dma.h"
>  #include "monitor/monitor.h"
>  #include "hw/ppc/spapr_xive.h"
> +#include "hw/ppc/spapr.h"
>  #include "hw/ppc/xics.h"
>  
>  #include "xive-internal.h"
> @@ -34,6 +35,8 @@ struct sPAPRXiveICP {
>      uint8_t   tima[TM_RING_COUNT * 0x10];
>      uint8_t   *tima_os;
>      qemu_irq  output;
> +
> +    XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
>  };
>  
>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
> @@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
>      },
>  };
>  
> +static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
> +{
> +    PowerPCCPU *cpu = spapr_find_cpu(server);
> +
> +    return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
> +}
> +
>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>  {
>  
> @@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
>      sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
>  
>      memset(xicp->tima, 0, sizeof(xicp->tima));
> +
> +    memset(xicp->eqt, 0, sizeof(xicp->eqt));
>  }
>  
>  static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
> @@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
>      xicp->tima_os = &xicp->tima[TM_QW1_OS];
>  }
>  
> +static const VMStateDescription vmstate_spapr_xive_icp_eq = {
> +    .name = TYPE_SPAPR_XIVE_ICP "/eq",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT32(w0, XiveEQ),
> +        VMSTATE_UINT32(w1, XiveEQ),
> +        VMSTATE_UINT32(w2, XiveEQ),
> +        VMSTATE_UINT32(w3, XiveEQ),
> +        VMSTATE_UINT32(w4, XiveEQ),
> +        VMSTATE_UINT32(w5, XiveEQ),
> +        VMSTATE_UINT32(w6, XiveEQ),
> +        VMSTATE_UINT32(w7, XiveEQ),

Wow.  Super descriptive field names there, but I guess that's not your fault.

> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static bool vmstate_spapr_xive_icp_needed(void *opaque)
>  {
>      /* TODO check machine XIVE support */
> @@ -696,6 +725,8 @@ static const VMStateDescription vmstate_spapr_xive_icp = {
>      .needed = vmstate_spapr_xive_icp_needed,
>      .fields = (VMStateField[]) {
>          VMSTATE_BUFFER(tima, sPAPRXiveICP),
> +        VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 1), 1,
> +                             vmstate_spapr_xive_icp_eq, XiveEQ),
>          VMSTATE_END_OF_LIST()
>      },
>  };
> @@ -755,3 +786,28 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
>      ive->w &= ~IVE_VALID;
>      return true;
>  }
> +
> +/*
> + * Use a simple indexing for the EQs.

Is this server+priority encoding architected anywhere?  Otherwise, why
not use separate parameters?

> + */
> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t eq_idx)
> +{
> +    int priority = eq_idx & 0x7;
> +    sPAPRXiveICP *xicp = spapr_xive_icp_get(xive, eq_idx >> 3);
> +
> +    return xicp ? &xicp->eqt[priority] : NULL;
> +}
> +
> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server,
> +                              uint8_t priority, uint32_t *out_eq_idx)
> +{
> +    if (priority > XIVE_PRIORITY_MAX) {
> +        return false;
> +    }
> +
> +    if (out_eq_idx) {
> +        *out_eq_idx = (server << 3) | (priority & 0x7);
> +    }
> +
> +    return true;
> +}
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> index 7d329f203a9b..c3949671aa03 100644
> --- a/hw/intc/xive-internal.h
> +++ b/hw/intc/xive-internal.h
> @@ -131,9 +131,59 @@ typedef struct XiveIVE {
>  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>  } XiveIVE;
>  
> +/* EQ */
> +typedef struct XiveEQ {
> +        uint32_t        w0;
> +#define EQ_W0_VALID             PPC_BIT32(0)
> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
> +#define EQ_W0_SW0               PPC_BIT32(16)
> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
> +#define EQ_QSIZE_4K             0
> +#define EQ_QSIZE_64K            4
> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
> +        uint32_t        w1;
> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
> +#define EQ_W1_ESn_P             PPC_BIT32(0)
> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
> +#define EQ_W1_ESe_P             PPC_BIT32(2)
> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
> +#define EQ_W1_GENERATION        PPC_BIT32(9)
> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
> +        uint32_t        w2;
> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
> +        uint32_t        w3;
> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
> +        uint32_t        w4;
> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
> +        uint32_t        w5;
> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
> +        uint32_t        w6;
> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
> +        uint32_t        w7;
> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> +} XiveEQ;
> +
>  #define XIVE_PRIORITY_MAX  7
>  
>  void spapr_xive_reset(void *dev);
>  XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server, uint8_t prio,
> +                              uint32_t *out_eq_idx);
>  
>  #endif /* _INTC_XIVE_INTERNAL_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue Cédric Le Goater
@ 2017-11-30  4:49   ` David Gibson
  2017-11-30 14:16     ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-30  4:49 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 4119 bytes --]

On Thu, Nov 23, 2017 at 02:29:44PM +0100, Cédric Le Goater wrote:
> If a triggered event is let through, the Event Queue data defined in the
> associated IVE is pushed in the in-memory event queue. The latter is a
> circular buffer provided by the OS using the H_INT_SET_QUEUE_CONFIG hcall,
> one per server and priority couple. It is composed of Event Queue entries
> which are 4 bytes long, the first bit being a 'generation' bit and the 31
> following bits the EQ Data field.
> 
> The EQ Data field provides a way to set an invariant logical event source
> number for an IRQ. It is set with the H_INT_SET_SOURCE_CONFIG hcall.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 67 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 983317a6b3f6..df14c5a88275 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -193,9 +193,76 @@ static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
>      return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
>  }
>  
> +static void spapr_xive_eq_push(XiveEQ *eq, uint32_t data)
> +{
> +    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
> +    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
> +    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
> +    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
> +
> +    uint64_t qaddr = qaddr_base + (qindex << 2);
> +    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
> +    uint32_t qentries = 1 << (qsize + 10);
> +
> +    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {

This suggests that uint32_t data contains guest endian data, which it
generally shouldn't.  Better to use stl_be_dma() (or whatever is
appropriate for the endianness of the data field).

> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
> +                      HWADDR_PRIx "\n", __func__, qaddr);
> +        return;
> +    }
> +
> +    qindex = (qindex + 1) % qentries;
> +    if (qindex == 0) {
> +        qgen ^= 1;
> +        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
> +    }
> +    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
> +}
> +
>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>  {
> +    XiveIVE *ive;
> +    XiveEQ *eq;
> +    uint32_t eq_idx;
> +    uint8_t priority;
> +
> +    ive = spapr_xive_get_ive(xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);

As mentioned on other patches, I'm a little concerned by these
guest-triggerable logs.  I guess the LOG_GUEST_ERROR mask will save
us, though.

> +        return;
> +    }
>  
> +    if (ive->w & IVE_MASKED) {
> +        return;
> +    }
> +
> +    /* Find our XiveEQ */
> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
> +    eq = spapr_xive_get_eq(xive, eq_idx);
> +    if (!eq) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No EQ for LISN %d\n", lisn);
> +        return;
> +    }
> +
> +    if (eq->w0 & EQ_W0_ENQUEUE) {
> +        spapr_xive_eq_push(eq, GETFIELD(IVE_EQ_DATA, ive->w));
> +    } else {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: !ENQUEUE not implemented\n");
> +    }
> +
> +    if (!(eq->w0 & EQ_W0_UCOND_NOTIFY)) {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
> +    }
> +
> +    if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
> +        priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> +
> +        /* The EQ is masked. Can this happen ?  */
> +        if (priority == 0xff) {
> +            g_assert_not_reached();
> +        }
> +    } else {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
> +    }
>  }
>  
>  /*
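
To make the queue mechanics concrete, here is a minimal host-memory
sketch of the push described in the commit message: 4-byte entries, a
generation bit in the top bit, and a generation flip on wrap-around.
DemoEQ and the demo_* functions are hypothetical; the real code writes
to guest memory through the DMA helpers, which this sketch does not
model. The byte-wise store keeps the entry big-endian regardless of
host endianness.

```c
#include <assert.h>
#include <stdint.h>

typedef struct DemoEQ {
    uint8_t  *base;      /* queue memory, qentries * 4 bytes */
    uint32_t  qentries;  /* number of 4-byte entries */
    uint32_t  qindex;    /* next slot to write */
    uint32_t  qgen;      /* current generation bit, 0 or 1 */
} DemoEQ;

/* Store a 32-bit value in big-endian byte order. */
static void demo_store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Append one event: generation bit on top, 31 bits of EQ data below. */
static void demo_eq_push(DemoEQ *eq, uint32_t data)
{
    uint32_t entry;

    assert(eq->qentries > 0 && eq->qindex < eq->qentries);

    entry = ((eq->qgen & 1u) << 31) | (data & 0x7fffffffu);
    demo_store_be32(eq->base + 4 * (uint64_t)eq->qindex, entry);

    /* Advance, and flip the generation bit each time the queue wraps. */
    eq->qindex = (eq->qindex + 1) % eq->qentries;
    if (eq->qindex == 0) {
        eq->qgen ^= 1;
    }
}
```

A consumer can then tell fresh entries from stale ones by comparing the
top bit of each entry with the generation it expects.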

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged Cédric Le Goater
@ 2017-11-30  5:00   ` David Gibson
  2017-11-30 16:17     ` Cédric Le Goater
                       ` (2 more replies)
  0 siblings, 3 replies; 128+ messages in thread
From: David Gibson @ 2017-11-30  5:00 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 5838 bytes --]

On Thu, Nov 23, 2017 at 02:29:45PM +0100, Cédric Le Goater wrote:
> The Pending Interrupt Priority Register (PIPR) contains the priority
> of the most favored pending notification. It is calculated from the
> Interrupt Pending Buffer (IPB) which indicates a pending interrupt at
> the priority corresponding to the bit number.
> 
> If the PIPR is more favored (1) than the Current Processor Priority
> Register (CPPR), the CPU interrupt line is raised and the EO bit of
> the Notification Source Register is updated to notify the presence of
> an exception for the O/S. The check needs to be done whenever the PIPR
> or the CPPR is changed.
> 
> Then, the O/S Exception is raised and the O/S acknowledges the
> interrupt with a special read in the TIMA. If the EO bit of the
> Notification Source Register (NSR) is set (and it should), the Current
> Processor Priority Register (CPPR) takes the value of the Pending
> Interrupt Priority Register (PIPR). The bit number in the Interrupt
> Pending Buffer (IPB) corresponding to the priority of the pending
> interrupt is reset and so is the EO bit of the NSR.
> 
> (1) numerically less than
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 76 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index df14c5a88275..fead9c7031f3 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -39,9 +39,63 @@ struct sPAPRXiveICP {
>      XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
>  };
>  
> +/* Convert a priority number to an Interrupt Pending Buffer (IPB)
> + * register, which indicates a pending interrupt at the priority
> + * corresponding to the bit number
> + */
> +static uint8_t priority_to_ipb(uint8_t priority)
> +{
> +    return priority > XIVE_PRIORITY_MAX ?
> +        0 : 1 << (XIVE_PRIORITY_MAX - priority);

Does handling out of bounds values here make sense, or should you just
assert() they're not passed in?

> +}
> +
> +/* Convert an Interrupt Pending Buffer (IPB) register to a Pending
> + * Interrupt Priority Register (PIPR), which contains the priority of
> + * the most favored pending notification.
> + *
> + * TODO:
> + *
> + *   PIPR is clamped to CPPR. So the value in the PIPR is:
> + *
> + *     v = leftmost_bit_of(ipb) (or 0xff);
> + *     pipr = v < cppr ? v : cppr;
> + *
> + * Ben says: "which means it's never actually 0xff ... surprise !".
> + * But, the CPPR can be set to 0xFF ... I am confused ...

A resolution to this would be nice..

> + */
> +static uint8_t ipb_to_pipr(uint8_t ibp)
> +{
> +    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
> +}
> +
>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
>  {
> -    return 0;
> +    uint8_t nsr = icp->tima_os[TM_NSR];
> +
> +    qemu_irq_lower(icp->output);
> +
> +    if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
> +        uint8_t cppr = icp->tima_os[TM_PIPR];
> +
> +        icp->tima_os[TM_CPPR] = cppr;
> +
> +        /* Reset the pending buffer bit */
> +        icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);

What if multiple irqs of the same priority were queued?

> +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> +
> +        /* Drop Exception bit for OS */
> +        icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
> +    }
> +
> +    return (nsr << 8) | icp->tima_os[TM_CPPR];
> +}
> +
> +static void spapr_xive_icp_notify(sPAPRXiveICP *icp)
> +{
> +    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
> +        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
> +        qemu_irq_raise(icp->output);
> +    }
>  }
>  
>  static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
> @@ -51,6 +105,9 @@ static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
>      }
>  
>      icp->tima_os[TM_CPPR] = cppr;
> +
> +    /* CPPR has changed, inform the ICP which might raise an exception */
> +    spapr_xive_icp_notify(icp);
>  }
>  
>  /*
> @@ -224,6 +281,8 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>      XiveEQ *eq;
>      uint32_t eq_idx;
>      uint8_t priority;
> +    uint32_t server;
> +    sPAPRXiveICP *icp;
>  
>      ive = spapr_xive_get_ive(xive, lisn);
>      if (!ive || !(ive->w & IVE_VALID)) {
> @@ -253,6 +312,13 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>          qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
>      }
>  
> +    server = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
> +    icp = spapr_xive_icp_get(xive, server);
> +    if (!icp) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for server %d\n", server);
> +        return;
> +    }
> +
>      if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
>          priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
>  
> @@ -260,9 +326,18 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>          if (priority == 0xff) {
>              g_assert_not_reached();
>          }
> +
> +        /* Update the IPB (Interrupt Pending Buffer) with the priority
> +         * of the new notification and inform the ICP, which will
>          * decide to raise the exception, or not, depending on the CPPR.
> +         */
> +        icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
> +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
>      } else {
>          qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
>      }
> +
> +    spapr_xive_icp_notify(icp);
>  }
>  
>  /*
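
As a sketch of the IPB/PIPR relationship discussed above, the
following standalone version uses assert() for out-of-range priorities
instead of returning 0, and a plain loop instead of clz32(). It is an
illustration only: should_notify() is a hypothetical name, and the
clamping of PIPR to CPPR mentioned in the TODO is deliberately left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XIVE_PRIORITY_MAX 7

/* Priority 0 (most favored) maps to the most significant IPB bit. */
static uint8_t priority_to_ipb(uint8_t priority)
{
    assert(priority <= XIVE_PRIORITY_MAX);
    return (uint8_t)(1u << (XIVE_PRIORITY_MAX - priority));
}

/* Most favored pending priority, or 0xff when nothing is pending. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    uint8_t prio;

    for (prio = 0; prio <= XIVE_PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/* "More favored" means numerically less than the CPPR. */
static bool should_notify(uint8_t pipr, uint8_t cppr)
{
    return pipr < cppr;
}
```

Note that one IPB bit per priority cannot count several interrupts
queued at the same priority, which is the case raised in the review.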

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine Cédric Le Goater
@ 2017-11-30  5:55   ` David Gibson
  2017-11-30 15:15     ` Cédric Le Goater
  2017-11-30 15:38     ` Cédric Le Goater
  0 siblings, 2 replies; 128+ messages in thread
From: David Gibson @ 2017-11-30  5:55 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 4553 bytes --]

On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote:
> The XIVE object is designed to be always available, so it is created
> unconditionally on newer machines.

There doesn't actually seem to be anything dependent on machine
version here.

> Depending on the configuration and
> the guest capabilities, the CAS negotiation process will decide which
> interrupt model to use, legacy or XIVE.
> 
> The XIVE model makes use of the full range of the IRQ number space
> because the IRQ numbers for the CPU IPIs are allocated in the range
> below XICS_IRQ_BASE, which is unused by XICS.

Ok.  And I take it 4096 is enough space for the XIVE IPIs for the
foreseeable future?

> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/ppc/spapr.c         | 34 ++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |  2 ++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5d3325ca3c88..0e0107c8272c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -56,6 +56,7 @@
>  #include "hw/ppc/spapr_vio.h"
>  #include "hw/pci-host/spapr.h"
>  #include "hw/ppc/xics.h"
> +#include "hw/ppc/spapr_xive.h"
>  #include "hw/pci/msi.h"
>  
>  #include "hw/pci/pci.h"
> @@ -204,6 +205,29 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
>      }
>  }
>  
> +static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
> +                                    Error **errp)
> +{
> +    Error *local_err = NULL;
> +    Object *obj;
> +
> +    obj = object_new(TYPE_SPAPR_XIVE);
> +    object_property_add_child(OBJECT(spapr), "xive", obj, &error_abort);
> +    object_property_set_int(obj, nr_irqs, "nr-irqs",  &local_err);
> +    if (local_err) {
> +        goto error;
> +    }
> +    object_property_set_bool(obj, true, "realized", &local_err);
> +    if (local_err) {
> +        goto error;
> +    }
> +
> +    return SPAPR_XIVE(obj);
> +error:
> +    error_propagate(errp, local_err);
> +    return NULL;
> +}
> +
>  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
>                                    int smt_threads)
>  {
> @@ -2360,6 +2384,16 @@ static void ppc_spapr_init(MachineState *machine)
>      /* Set up Interrupt Controller before we create the VCPUs */
>      xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal);
>  
> +    /* We don't have KVM support yet, so check for irqchip=on */
> +    if (kvm_enabled() && machine_kernel_irqchip_required(machine)) {
> +        error_report("kernel_irqchip requested. no XIVE support");

I think you want an actual exit(1) here, no?  error_report() will
print an error but keep going.

> +    } else {
> +        /* XIVE uses the full range of IRQ numbers. The CPU IPIs will
> +         * use the range below XICS_IRQ_BASE, which is unused by XICS. */
> +        spapr->xive = spapr_xive_create(spapr, XICS_IRQ_BASE + XICS_IRQS_SPAPR,
> +                                        &error_fatal);

XICS_IRQ_BASE == 4096, and XICS_IRQS_SPAPR (which we should rename at
some point) == 1024.

So we have a total irq space of 5k, which is a bit odd.  I'd be ok
with rounding it out to 8k for newer machines if that's useful.
Sparse allocations in there might make life easier for getting
consistent irq numbers without an "allocator" per se (because we can
use different regions for VIO, PCI intx, MSI, etc. etc.).

> +    }
> +
>      /* Set up containers for ibm,client-architecture-support negotiated options
>       */
>      spapr->ov5 = spapr_ovec_new();
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9a3885593c86..90e2b0f6c678 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -14,6 +14,7 @@ struct sPAPRNVRAM;
>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>  typedef struct sPAPREventSource sPAPREventSource;
>  typedef struct sPAPRPendingHPT sPAPRPendingHPT;
> +typedef struct sPAPRXive sPAPRXive;
>  
>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
>  #define SPAPR_ENTRY_POINT       0x100
> @@ -127,6 +128,7 @@ struct sPAPRMachineState {
>      MemoryHotplugState hotplug_memory;
>  
>      const char *icp_type;
> +    sPAPRXive  *xive;
>  };
>  
>  #define H_SUCCESS         0
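
The arithmetic behind the 5k and 8k figures above can be checked with a
small sketch. The two constants carry the values quoted in this review;
demo_round_up_pow2() is a hypothetical helper, not an existing QEMU
function.

```c
#include <assert.h>
#include <stdint.h>

#define XICS_IRQ_BASE    4096  /* value quoted in the review */
#define XICS_IRQS_SPAPR  1024  /* value quoted in the review */

/* Smallest power of two that is >= n, for 1 <= n <= 2^31. */
static uint32_t demo_round_up_pow2(uint32_t n)
{
    uint32_t p = 1;

    assert(n >= 1 && n <= 0x80000000u);

    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

Rounding 4096 + 1024 = 5120 up this way gives 8192, which leaves 3072
numbers free for sparse, per-device-class regions.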

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model
  2017-11-30  4:06       ` David Gibson
@ 2017-11-30 13:44         ` Cédric Le Goater
  2017-12-01  4:03           ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-30 13:44 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/30/2017 04:06 AM, David Gibson wrote:
> On Wed, Nov 29, 2017 at 10:55:34AM +0100, Cédric Le Goater wrote:
>> On 11/29/2017 06:11 AM, David Gibson wrote:
>>> On Thu, Nov 23, 2017 at 02:29:42PM +0100, Cédric Le Goater wrote:
>>>> The XIVE interrupt presenter exposes a set of rings, also called
>>>> Thread Interrupt Management Areas (TIMA), to handle priority
>>>> management and interrupt acknowledgment among other things. There is
>>>> one ring per level of privilege, four in all. The one we are
>>>> interested in for the sPAPR machine is the OS ring.
>>>>
>>>> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
>>>> used to retrieve the targeted interrupt presenter object holding the
>>>> cache data of the registers the model uses.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/intc/spapr_xive.c        | 271 ++++++++++++++++++++++++++++++++++++++++++++
>>>>  hw/intc/xive-internal.h     |  89 +++++++++++++++
>>>>  include/hw/ppc/spapr_xive.h |  11 ++
>>>>  3 files changed, 371 insertions(+)
>>>>
>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>> index b1e3f8710cff..554b25e0884c 100644
>>>> --- a/hw/intc/spapr_xive.c
>>>> +++ b/hw/intc/spapr_xive.c
>>>> @@ -23,9 +23,166 @@
>>>>  #include "sysemu/dma.h"
>>>>  #include "monitor/monitor.h"
>>>>  #include "hw/ppc/spapr_xive.h"
>>>> +#include "hw/ppc/xics.h"
>>>>  
>>>>  #include "xive-internal.h"
>>>>  
>>>> +struct sPAPRXiveICP {
>>>
>>> I'd really prefer to avoid calling anything in xive "icp" to avoid
>>> confusion with xics.
>>
>> OK. 
>>
>> The specs refers to the whole as an IVPE : Interrupt Virtualization 
>> Presentation Engine. In our model, we use the TIMA cached values of 
>> the OS ring and the qemu_irq for the CPU line. 
>>
>> Would 'sPAPRXivePresenter' be fine ?
> 
> That'd be ok.  Or call it sPAPRIVPE.  Or even call it TIMA.  I'd be
> fine with any of those.

In this model, I am making a lot of shortcuts in the XIVE concepts
(which I don't master completely yet ...) 

The IVPE is the part of the overall controller doing the interrupt 
presentation.

The TIMA refers to the MMIO region in which the thread interrupt 
management is done. 

The XIVE structure that contains the 'virtual processor' interrupt 
state is the NVT: Notification Virtual Target. An index to an NVT 
is stored in the EQs to do the routing. I did not introduce the NVT 
in sPAPRXive because it's rather big, 128 bytes, and we don't need 
much of it (NSR, CPPR, PIPR, IPB) but we could use a shortened one.

So I think sPAPRXiveNVT, or sPAPRXiveVP (VP for virtual processor)
would be better names.

We will need more of the NVT structure to support the hcalls
that set and get the address of the Reporting Cache line
(H_INT_{S,G}ET_OS_REPORTING_LINE). We can extend it when the
time comes.

C.


* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-11-30  4:38   ` David Gibson
@ 2017-11-30 14:06     ` Cédric Le Goater
  2017-11-30 23:35       ` David Gibson
  2017-12-02 14:39     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-30 14:06 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/30/2017 04:38 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:43PM +0100, Cédric Le Goater wrote:
>> The Event Queue Descriptor (EQD) table, also known as Event Notification
>> Descriptor (END), is one of the internal tables the XIVE interrupt
>> controller uses to redirect exception from event sources to CPU
>> threads.
>>
>> The EQD specifies on which Event Queue the event data should be posted
>> when an exception occurs (later on pulled by the OS) and which server
>> (VPD in XIVE terminology) to notify. The Event Queue is a much more
>> complex structure but we start with a simple model for the sPAPR
>> machine.
> 
> Just to clarify my understanding a server / VPD in XIVE would
> typically correspond to a cpu - either real or virtual, yes?

yes. VP for "virtual processor" and VPD for "virtual processor 
descriptor" which contains the XIVE interrupt state of the VP 
when not dispatched. It is still described in some documentation 
as an NVT : Notification Virtual Target.  

XIVE concepts were renamed at some point but the old names persisted.
I am still working my way through all the names.


>> There is one XiveEQ per priority and the model chooses to store them
>> under the Xive Interrupt presenter model. It will be retrieved, just
>> like for XICS, through the 'intc' object pointer of the CPU.
>>
>> The EQ indexing follows a simple pattern:
>>
>>        (server << 3) | (priority & 0x7)
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  hw/intc/xive-internal.h | 50 +++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 106 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 554b25e0884c..983317a6b3f6 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -23,6 +23,7 @@
>>  #include "sysemu/dma.h"
>>  #include "monitor/monitor.h"
>>  #include "hw/ppc/spapr_xive.h"
>> +#include "hw/ppc/spapr.h"
>>  #include "hw/ppc/xics.h"
>>  
>>  #include "xive-internal.h"
>> @@ -34,6 +35,8 @@ struct sPAPRXiveICP {
>>      uint8_t   tima[TM_RING_COUNT * 0x10];
>>      uint8_t   *tima_os;
>>      qemu_irq  output;
>> +
>> +    XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
>>  };
>>  
>>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
>> @@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
>>      },
>>  };
>>  
>> +static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
>> +{
>> +    PowerPCCPU *cpu = spapr_find_cpu(server);
>> +
>> +    return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
>> +}
>> +
>>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>  {
>>  
>> @@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
>>      sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
>>  
>>      memset(xicp->tima, 0, sizeof(xicp->tima));
>> +
>> +    memset(xicp->eqt, 0, sizeof(xicp->eqt));
>>  }
>>  
>>  static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
>> @@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
>>      xicp->tima_os = &xicp->tima[TM_QW1_OS];
>>  }
>>  
>> +static const VMStateDescription vmstate_spapr_xive_icp_eq = {
>> +    .name = TYPE_SPAPR_XIVE_ICP "/eq",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField []) {
>> +        VMSTATE_UINT32(w0, XiveEQ),
>> +        VMSTATE_UINT32(w1, XiveEQ),
>> +        VMSTATE_UINT32(w2, XiveEQ),
>> +        VMSTATE_UINT32(w3, XiveEQ),
>> +        VMSTATE_UINT32(w4, XiveEQ),
>> +        VMSTATE_UINT32(w5, XiveEQ),
>> +        VMSTATE_UINT32(w6, XiveEQ),
>> +        VMSTATE_UINT32(w7, XiveEQ),
> 
> Wow.  Super descriptive field names there, but I guess that's not your fault.

The defines in the "xive-internal.h" give a better view ... 

>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  static bool vmstate_spapr_xive_icp_needed(void *opaque)
>>  {
>>      /* TODO check machine XIVE support */
>> @@ -696,6 +725,8 @@ static const VMStateDescription vmstate_spapr_xive_icp = {
>>      .needed = vmstate_spapr_xive_icp_needed,
>>      .fields = (VMStateField[]) {
>>          VMSTATE_BUFFER(tima, sPAPRXiveICP),
>> +        VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 1), 1,
>> +                             vmstate_spapr_xive_icp_eq, XiveEQ),
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> @@ -755,3 +786,28 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
>>      ive->w &= ~IVE_VALID;
>>      return true;
>>  }
>> +
>> +/*
>> + * Use a simple indexing for the EQs.
> 
> Is this server+priority encoding architected anywhere?  

no. This is a model shortcut.

> Otherwise, why not use separate parameters?

yes. spapr_xive_get_eq() could use separate parameters and it would
shorten some of the hcalls.

The result is stored in a single field of the IVE, EQ_INDEX. So I will 
still need mangle/demangle routines but these could be simple macros.
I will look at it.
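
As a rough sketch of what such mangle/demangle helpers could look like,
assuming the (server << 3) | (priority & 0x7) layout used in the patch.
The macro and function names below are hypothetical and are not part of
the patch. Since IVE_EQ_INDEX is PPC_BITMASK(8, 31), i.e. 24 bits wide,
the server number would have to fit in 21 bits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XIVE_PRIORITY_MAX  7

/* Hypothetical helpers, not from the patch: pack and unpack the
 * model's EQ index, laid out as (server << 3) | (priority & 0x7). */
#define SPAPR_XIVE_EQ_INDEX(server, prio) \
    ((((uint32_t)(server)) << 3) | ((uint32_t)(prio) & 0x7))
#define SPAPR_XIVE_EQ_SERVER(eq_idx)   ((uint32_t)(eq_idx) >> 3)
#define SPAPR_XIVE_EQ_PRIO(eq_idx)     ((uint8_t)((eq_idx) & 0x7))

/* The IVE EQ_INDEX field is 24 bits wide and 3 of them hold the
 * priority, so only 21 bits are left for the server number. */
static inline bool spapr_xive_eq_index_valid(uint32_t server, uint8_t prio)
{
    return prio <= XIVE_PRIORITY_MAX && server < (1u << 21);
}

/* Checked variant of the packing macro. */
static inline uint32_t spapr_xive_eq_index(uint32_t server, uint8_t prio)
{
    assert(spapr_xive_eq_index_valid(server, prio));
    return SPAPR_XIVE_EQ_INDEX(server, prio);
}
```

With these, spapr_xive_get_eq() could take (server, priority) directly
and only the IVE accessors would need the packed form.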

>> + */
>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t eq_idx)
>> +{
>> +    int priority = eq_idx & 0x7;
>> +    sPAPRXiveICP *xicp = spapr_xive_icp_get(xive, eq_idx >> 3);
>> +
>> +    return xicp ? &xicp->eqt[priority] : NULL;
>> +}
>> +
>> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server,
>> +                              uint8_t priority, uint32_t *out_eq_idx)
>> +{
>> +    if (priority > XIVE_PRIORITY_MAX) {
>> +        return false;
>> +    }
>> +
>> +    if (out_eq_idx) {
>> +        *out_eq_idx = (server << 3) | (priority & 0x7);
>> +    }
>> +
>> +    return true;
>> +}
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> index 7d329f203a9b..c3949671aa03 100644
>> --- a/hw/intc/xive-internal.h
>> +++ b/hw/intc/xive-internal.h
>> @@ -131,9 +131,59 @@ typedef struct XiveIVE {
>>  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>>  } XiveIVE;
>>  
>> +/* EQ */
>> +typedef struct XiveEQ {
>> +        uint32_t        w0;
>> +#define EQ_W0_VALID             PPC_BIT32(0)
>> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
>> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
>> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
>> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
>> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
>> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
>> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
>> +#define EQ_W0_SW0               PPC_BIT32(16)
>> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
>> +#define EQ_QSIZE_4K             0
>> +#define EQ_QSIZE_64K            4
>> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
>> +        uint32_t        w1;
>> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
>> +#define EQ_W1_ESn_P             PPC_BIT32(0)
>> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
>> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
>> +#define EQ_W1_ESe_P             PPC_BIT32(2)
>> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
>> +#define EQ_W1_GENERATION        PPC_BIT32(9)
>> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
>> +        uint32_t        w2;
>> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
>> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
>> +        uint32_t        w3;
>> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
>> +        uint32_t        w4;
>> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
>> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
>> +        uint32_t        w5;
>> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
>> +        uint32_t        w6;
>> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
>> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
>> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
>> +        uint32_t        w7;
>> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
>> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
>> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
>> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
>> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
>> +} XiveEQ;
>> +
>>  #define XIVE_PRIORITY_MAX  7
>>  
>>  void spapr_xive_reset(void *dev);
>>  XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
>> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server, uint8_t prio,
>> +                              uint32_t *out_eq_idx);
>>  
>>  #endif /* _INTC_XIVE_INTERNAL_H */
> 


* Re: [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue
  2017-11-30  4:49   ` David Gibson
@ 2017-11-30 14:16     ` Cédric Le Goater
  2017-12-01  4:10       ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-30 14:16 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/30/2017 04:49 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:44PM +0100, Cédric Le Goater wrote:
>> If a triggered event is let through, the Event Queue data defined in the
>> associated IVE is pushed in the in-memory event queue. The latter is a
>> circular buffer provided by the OS using the H_INT_SET_QUEUE_CONFIG hcall,
>> one per server and priority couple. It is composed of Event Queue entries
>> which are 4 bytes long, the first bit being a 'generation' bit and the 31
>> following bits the EQ Data field.
>>
>> The EQ Data field provides a way to set an invariant logical event source
>> number for an IRQ. It is set with the H_INT_SET_SOURCE_CONFIG hcall.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 67 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 983317a6b3f6..df14c5a88275 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -193,9 +193,76 @@ static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
>>      return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
>>  }
>>  
>> +static void spapr_xive_eq_push(XiveEQ *eq, uint32_t data)
>> +{
>> +    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
>> +    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
>> +    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
>> +    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
>> +
>> +    uint64_t qaddr = qaddr_base + (qindex << 2);
>> +    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
>> +    uint32_t qentries = 1 << (qsize + 10);
>> +
>> +    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
> 
> This suggests that uint32_t data contains guest endian data, which it
> generally shouldn't.  Better to use stl_be_dma() (or whatever is
> appropriate for the endianness of the data field.

There is no requirement on the endianness of the data field; it is
just stored in the IVE by the H_INT_SET_SOURCE_CONFIG hcall.
So the guest can pass whatever it likes.
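
For reference, here is a standalone sketch of the queue entry format and
the wrap-around handling, under the assumption (implied by the
cpu_to_be32() in the patch) that each 4-byte entry is stored big-endian
with the generation bit on top of the 31-bit EQ data. It writes to a
plain buffer instead of guest memory; 'struct eq_ring' and
eq_ring_push() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the queue state kept in the EQ words. */
struct eq_ring {
    uint8_t  *buf;       /* queue memory, qentries * 4 bytes */
    uint32_t  qentries;  /* number of 4-byte entries */
    uint32_t  qindex;    /* next entry to write */
    uint32_t  qgen;      /* current generation bit, 0 or 1 */
};

static void eq_ring_push(struct eq_ring *q, uint32_t data)
{
    /* generation bit first, then the 31 bits of EQ data */
    uint32_t entry = (q->qgen << 31) | (data & 0x7fffffff);
    uint8_t *p = q->buf + ((size_t)q->qindex << 2);

    assert(q->qindex < q->qentries);

    /* Store big-endian byte by byte, whatever the host endianness. */
    p[0] = (uint8_t)(entry >> 24);
    p[1] = (uint8_t)(entry >> 16);
    p[2] = (uint8_t)(entry >> 8);
    p[3] = (uint8_t)entry;

    /* Advance, and flip the generation bit each time the ring wraps. */
    q->qindex = (q->qindex + 1) % q->qentries;
    if (q->qindex == 0) {
        q->qgen ^= 1;
    }
}
```

The consumer can then tell new entries from stale ones by comparing the
top bit of each entry with the generation it expects.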

>> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
>> +                      HWADDR_PRIx "\n", __func__, qaddr);
>> +        return;
>> +    }
>> +
>> +    qindex = (qindex + 1) % qentries;
>> +    if (qindex == 0) {
>> +        qgen ^= 1;
>> +        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
>> +    }
>> +    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
>> +}
>> +
>>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>  {
>> +    XiveIVE *ive;
>> +    XiveEQ *eq;
>> +    uint32_t eq_idx;
>> +    uint8_t priority;
>> +
>> +    ive = spapr_xive_get_ive(xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> 
> As mentioned on other patches, I'm a little concerned by these
> guest-triggerable logs.  I guess the LOG_GUEST_ERROR mask will save
> us, though.

I want to track 'invalid' interrupts but I haven't seen these show up
in my tests. I agree there are a few too many and some could just
be asserts.

Thanks,

C.

> 
>> +        return;
>> +    }
>>  
>> +    if (ive->w & IVE_MASKED) {
>> +        return;
>> +    }
>> +
>> +    /* Find our XiveEQ */
>> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
>> +    eq = spapr_xive_get_eq(xive, eq_idx);
>> +    if (!eq) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No EQ for LISN %d\n", lisn);
>> +        return;
>> +    }
>> +
>> +    if (eq->w0 & EQ_W0_ENQUEUE) {
>> +        spapr_xive_eq_push(eq, GETFIELD(IVE_EQ_DATA, ive->w));
>> +    } else {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: !ENQUEUE not implemented\n");
>> +    }
>> +
>> +    if (!(eq->w0 & EQ_W0_UCOND_NOTIFY)) {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
>> +    }
>> +
>> +    if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
>> +        priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
>> +
>> +        /* The EQ is masked. Can this happen ?  */
>> +        if (priority == 0xff) {
>> +            g_assert_not_reached();
>> +        }
>> +    } else {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
>> +    }
>>  }
>>  
>>  /*
> 


* Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-11-30  5:55   ` David Gibson
@ 2017-11-30 15:15     ` Cédric Le Goater
  2017-12-01  4:14       ` David Gibson
  2017-11-30 15:38     ` Cédric Le Goater
  1 sibling, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-30 15:15 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/30/2017 05:55 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote:
>> The XIVE object is designed to be always available, so it is created
>> unconditionally on newer machines.
> 
> There doesn't actually seem to be anything dependent on machine
> version here.

No. I thought that was too early in the patchset. This is handled 
in the last patch with a 'xive_exploitation' bool which is set to 
false on older machines. 

But, nevertheless, the XIVE objects are always created even if not
used. Something to discuss. 

>> Depending on the configuration and
>> the guest capabilities, the CAS negotiation process will decide which
>> interrupt model to use, legacy or XIVE.
>>
>> The XIVE model makes use of the full range of the IRQ number space
>> because the IRQ numbers for the CPU IPIs are allocated in the range
>> below XICS_IRQ_BASE, which is unused by XICS.
> 
> Ok.  And I take it 4096 is enough space for the XIVE IPIs for the
> foreseeable future?

The biggest real system I am aware of has 16 sockets, 192 cores, SMT8.
That's 1536 cpus. pseries has a max_cpus of 1024.

>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/ppc/spapr.c         | 34 ++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |  2 ++
>>  2 files changed, 36 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 5d3325ca3c88..0e0107c8272c 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -56,6 +56,7 @@
>>  #include "hw/ppc/spapr_vio.h"
>>  #include "hw/pci-host/spapr.h"
>>  #include "hw/ppc/xics.h"
>> +#include "hw/ppc/spapr_xive.h"
>>  #include "hw/pci/msi.h"
>>  
>>  #include "hw/pci/pci.h"
>> @@ -204,6 +205,29 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
>>      }
>>  }
>>  
>> +static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
>> +                                    Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +    Object *obj;
>> +
>> +    obj = object_new(TYPE_SPAPR_XIVE);
>> +    object_property_add_child(OBJECT(spapr), "xive", obj, &error_abort);
>> +    object_property_set_int(obj, nr_irqs, "nr-irqs",  &local_err);
>> +    if (local_err) {
>> +        goto error;
>> +    }
>> +    object_property_set_bool(obj, true, "realized", &local_err);
>> +    if (local_err) {
>> +        goto error;
>> +    }
>> +
>> +    return SPAPR_XIVE(obj);
>> +error:
>> +    error_propagate(errp, local_err);
>> +    return NULL;
>> +}
>> +
>>  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
>>                                    int smt_threads)
>>  {
>> @@ -2360,6 +2384,16 @@ static void ppc_spapr_init(MachineState *machine)
>>      /* Set up Interrupt Controller before we create the VCPUs */
>>      xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal);
>>  
>> +    /* We don't have KVM support yet, so check for irqchip=on */
>> +    if (kvm_enabled() && machine_kernel_irqchip_required(machine)) {
>> +        error_report("kernel_irqchip requested. no XIVE support");
> 
> I think you want an actual exit(1) here, no?  error_report() will
> print an error but keep going.

yes. Today, it coredumps. I am not sure why. I will add an exit().

> 
>> +    } else {
>> +        /* XIVE uses the full range of IRQ numbers. The CPU IPIs will
>> +         * use the range below XICS_IRQ_BASE, which is unused by XICS. */
>> +        spapr->xive = spapr_xive_create(spapr, XICS_IRQ_BASE + XICS_IRQS_SPAPR,
>> +                                        &error_fatal);
> 
> XICS_IRQ_BASE == 4096, and XICS_IRQS_SPAPR (which we should rename at
> some point) == 1024.
> 
> So we have a total irq space of 5k, which is a bit odd.  I'd be ok
> with rounding it out to 8k for newer machines if that's useful.
> Sparse allocations in there might make life easier for getting
> consistent irq numbers without an "allocator" per se (because we can
> use different regions for VIO, PCI intx, MSI, etc. etc.).
I will start another thread on that topic.

Thanks,

C. 

> 
>> +    }
>> +
>>      /* Set up containers for ibm,client-architecture-support negotiated options
>>       */
>>      spapr->ov5 = spapr_ovec_new();
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 9a3885593c86..90e2b0f6c678 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -14,6 +14,7 @@ struct sPAPRNVRAM;
>>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>>  typedef struct sPAPREventSource sPAPREventSource;
>>  typedef struct sPAPRPendingHPT sPAPRPendingHPT;
>> +typedef struct sPAPRXive sPAPRXive;
>>  
>>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
>>  #define SPAPR_ENTRY_POINT       0x100
>> @@ -127,6 +128,7 @@ struct sPAPRMachineState {
>>      MemoryHotplugState hotplug_memory;
>>  
>>      const char *icp_type;
>> +    sPAPRXive  *xive;
>>  };
>>  
>>  #define H_SUCCESS         0
> 


* Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-11-30  5:55   ` David Gibson
  2017-11-30 15:15     ` Cédric Le Goater
@ 2017-11-30 15:38     ` Cédric Le Goater
  2017-12-01  4:17       ` David Gibson
  1 sibling, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-30 15:38 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Greg Kurz

>> +    } else {
>> +        /* XIVE uses the full range of IRQ numbers. The CPU IPIs will
>> +         * use the range below XICS_IRQ_BASE, which is unused by XICS. */
>> +        spapr->xive = spapr_xive_create(spapr, XICS_IRQ_BASE + XICS_IRQS_SPAPR,
>> +                                        &error_fatal);
> 
> XICS_IRQ_BASE == 4096, and XICS_IRQS_SPAPR (which we should rename at
> some point) == 1024.

BTW, why XICS_IRQ_BASE == 4096 ? I could not find a reason for
this offset. 

> So we have a total irq space of 5k, which is a bit odd.  I'd be ok
> with rounding it out to 8k for newer machines if that's useful.

ok. and using a machine class value to maintain compatibility. That 
would be useful if we allocate more PHBs. 

> Sparse allocations in there might make life easier for getting
> consistent irq numbers without an "allocator" per se (because we can
> use different regions for VIO, PCI intx, MSI, etc. etc.).

So, do you think we should modify the IRQ allocator routines to be 
able to segment the IRQ number space and let devices specify the
range they want to use ? 

That would be useful for the PHB LSIs. The starting IRQ for the PHB
could be aligned on some value depending on the PHB index, first 
would come the LSI interrupts and then the MSIs which are allocated 
later on by the guest. We would have predictable values.
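
A minimal sketch of the idea, to make the question concrete. The base
and stride values are purely hypothetical placeholders (nothing in the
current code defines them); the only fixed number is the four INTx pins
of a PCI host bridge:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define IRQ_PHB_BASE     0x1000u  /* first IRQ number of PHB 0 */
#define IRQ_PHB_STRIDE   0x400u   /* IRQ numbers reserved per PHB */
#define PCI_NUM_PINS     4u       /* INTA..INTD */

/* LSIs come first in the PHB range, one per INTx pin. */
static uint32_t phb_lsi_irq(uint32_t phb_index, uint32_t pin)
{
    assert(pin < PCI_NUM_PINS);
    return IRQ_PHB_BASE + phb_index * IRQ_PHB_STRIDE + pin;
}

/* MSIs, allocated later on by the guest, start right after the LSIs. */
static uint32_t phb_msi_base(uint32_t phb_index)
{
    return IRQ_PHB_BASE + phb_index * IRQ_PHB_STRIDE + PCI_NUM_PINS;
}
```

With such a scheme the IRQ numbers of a PHB depend only on its index,
not on the order in which devices are created.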

Thanks,

C. 


* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-30  4:26           ` David Gibson
@ 2017-11-30 15:40             ` Cédric Le Goater
  0 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-30 15:40 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/30/2017 04:26 AM, David Gibson wrote:
> On Wed, Nov 29, 2017 at 02:56:39PM +0100, Cédric Le Goater wrote:
>>>>>> +    switch (offset) {
>>>>>> +    case 0:
>>>>>> +        spapr_xive_source_eoi(xive, lisn);
>>>>>
>>>>> Hrm.  I don't love that you're dealing with clearing that LSI bit
>>>>> here, but setting it at a different level.
>>>>>
>>>>> The state machines are doing my head in a bit, is there any way
>>>>> you could derive the STATUS_SENT bit from the PQ bits?
>>>>
>>>> Yes. I should. 
>>>>
>>>> I am also lacking a guest driver to exercise these LSIs so I didn't
>>>> pay a lot of attention to level interrupts. Any idea ?
>>>
>>> How about an old-school emulated PCI device?  Maybe rtl8139?
>>
>> Perfect. The current model is working but I will see how I can 
>> improve it to use the PQ bits instead.
>>
>> I also found a couple of issues on the way. 
>>
>> We do need the "#interrupt-cells" and "interrupt-controller" 
>> properties. They are missing from the XIVE sPAPR specs but there
>> is no other way to find the parent controller for the LSIs ... 
>> I have re-asked the pHyp team to include them in the specs and 
>> fixed the QEMU model.
> 
> Told ya so :).

I believed you ! I just needed a test case :)

C.


* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-30  4:28             ` David Gibson
@ 2017-11-30 16:05               ` Cédric Le Goater
  2017-12-02 14:33               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-30 16:05 UTC (permalink / raw)
  To: David Gibson; +Cc: list@suse.de:PowerPC, qemu-devel, Benjamin Herrenschmidt

On 11/30/2017 04:28 AM, David Gibson wrote:
> On Wed, Nov 29, 2017 at 05:23:25PM +0100, Cédric Le Goater wrote:
>> On 11/29/2017 02:56 PM, Cédric Le Goater wrote:
>>>>>>> +    switch (offset) {
>>>>>>> +    case 0:
>>>>>>> +        spapr_xive_source_eoi(xive, lisn);
>>>>>>
>>>>>> Hrm.  I don't love that you're dealing with clearing that LSI bit
>>>>>> here, but setting it at a different level.
>>>>>>
>>>>>> The state machines are doing my head in a bit, is there any way
>>>>>> you could derive the STATUS_SENT bit from the PQ bits?
>>>>>
>>>>> Yes. I should. 
>>>>>
>>>>> I am also lacking a guest driver to exercise these LSIs so I didn't
>>>>> pay a lot of attention to level interrupts. Any idea ?
>>>>
>>>> How about an old-school emulated PCI device?  Maybe rtl8139?
>>>
>>> Perfect. The current model is working but I will see how I can 
>>> improve it to use the PQ bits instead.
>>
>> Using the PQ bits is simplifying the model but we still have to 
>> maintain an array to store the IRQ type. 
>>
>> There are 3 unused bits in the IVE descriptor, bits[1-3]:  
>>
>>   #define IVE_VALID       PPC_BIT(0)
>>   #define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
>>   #define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
>>   #define IVE_MASKED      PPC_BIT(32)              /* Masked */
>>   #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>>
>> We could hijack one of them to store the LSI type and get rid of 
>> the type array. Would you object to that ?
> 
> Hrm.  These IVE bits are architected, aren't they?  

Yes and unused.

> In which case I'd
> be wary of stealing a reserved bit in case of future extensions.
> 
> I'm wondering if we want another word / structure for storing
> non-architected, implementation specific flags or info.

That's what is done with the status array. As migration will put
pressure on future changes, maybe we should go for an extra
word for these non-architected needs of the model.
 
> How does this work at the hardware level?  Presumably the actual
> hardware components don't communicate with the XIVE to request edge or
> level.  

No.

> So how does it know?  Specific ranges for LSIs? 
 
When OPAL allocates interrupt numbers for a device, it records 
their type to handle the EOI a little differently for LSIs. 
For sPAPR, it is really the same, the hcall H_INT_GET_SOURCE_INFO 
gives the interrupt type to the guest and the EOI is handled 
with a specific load. Apart from that, the LSI interrupts follow
the same path as the MSIs.
 
The device controller must have some extra logic to handle the 
level for these interrupts but I am no expert in the domain. 

> If so, we should probably do the same.

Modeling the LSI level with the PQ bits looks fine. We just need 
to store the IRQ type information somewhere under the XIVE object. 
We can keep it in a byte array or a bitmap to reduce the size. 
But if we foresee additional state to store, we might want to use 
the byte array directly.
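
To compare the two options, here is a small sketch of the bitmap
variant. It assumes the 5k IRQ space discussed in this thread
(XICS_IRQ_BASE + XICS_IRQS_SPAPR = 4096 + 1024); the names are
hypothetical and it uses a plain static array rather than QEMU's bitmap
helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_IRQS  5120u  /* 4096 + 1024, as discussed in this thread */

/* One bit per IRQ: set means LSI, clear means MSI. */
static uint8_t lsi_map[(NR_IRQS + 7) / 8];

static void irq_set_lsi(uint32_t lisn, bool is_lsi)
{
    assert(lisn < NR_IRQS);
    if (is_lsi) {
        lsi_map[lisn >> 3] |= (uint8_t)(1u << (lisn & 7));
    } else {
        lsi_map[lisn >> 3] &= (uint8_t)~(1u << (lisn & 7));
    }
}

static bool irq_is_lsi(uint32_t lisn)
{
    assert(lisn < NR_IRQS);
    return (lsi_map[lisn >> 3] >> (lisn & 7)) & 1;
}
```

The bitmap takes 640 bytes where a byte array takes 5120, but the byte
array leaves seven spare bits per IRQ for any additional state.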

Thanks,

C.
 


* Re: [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged
  2017-11-30  5:00   ` David Gibson
@ 2017-11-30 16:17     ` Cédric Le Goater
  2017-12-02 14:40     ` Benjamin Herrenschmidt
  2017-12-07 11:55     ` Cédric Le Goater
  2 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-11-30 16:17 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 11/30/2017 05:00 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:45PM +0100, Cédric Le Goater wrote:
>> The Pending Interrupt Priority Register (PIPR) contains the priority
>> of the most favored pending notification. It is calculated from the
>> Interrupt Pending Buffer (IPB) which indicates a pending interrupt at
>> the priority corresponding to the bit number.
>>
>> If the PIPR is more favored (1) than the Current Processor Priority
>> Register (CPPR), the CPU interrupt line is raised and the EO bit of
>> the Notification Source Register is updated to notify the presence of
>> an exception for the O/S. The check needs to be done whenever the PIPR
>> or the CPPR is changed.
>>
>> Then, the O/S Exception is raised and the O/S acknowledges the
>> interrupt with a special read in the TIMA. If the EO bit of the
>> Notification Source Register (NSR) is set (and it should), the Current
>> Processor Priority Register (CPPR) takes the value of the Pending
>> Interrupt Priority Register (PIPR). The bit number in the Interrupt
>> Pending Buffer (IPB) corresponding to the priority of the pending
>> interrupt is reseted and so is the EO bit of the NSR.
>>
>> (1) numerically less than
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 76 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index df14c5a88275..fead9c7031f3 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -39,9 +39,63 @@ struct sPAPRXiveICP {
>>      XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
>>  };
>>  
>> +/* Convert a priority number to an Interrupt Pending Buffer (IPB)
>> + * register, which indicates a pending interrupt at the priority
>> + * corresponding to the bit number
>> + */
>> +static uint8_t priority_to_ipb(uint8_t priority)
>> +{
>> +    return priority > XIVE_PRIORITY_MAX ?
>> +        0 : 1 << (XIVE_PRIORITY_MAX - priority);
> 
> Does handling out of bounds values here make sense, or should you just
> assert() they're not passed in?

Looking at the code, I think we could assert, yes. I need to 
check the SET_OS_PENDING command first.

>> +}
>> +
>> +/* Convert an Interrupt Pending Buffer (IPB) register to a Pending
>> + * Interrupt Priority Register (PIPR), which contains the priority of
>> + * the most favored pending notification.
>> + *
>> + * TODO:
>> + *
>> + *   PIPR is clamped to CPPR. So the value in the PIPR is:
>> + *
>> + *     v = leftmost_bit_of(ipb) (or 0xff);
>> + *     pipr = v < cppr ? v : cppr;
>> + *
>> + * Ben says: "which means it's never actually 0xff ... surprise !".
>> + * But, the CPPR can be set to 0xFF ... I am confused ...
> 
> A resolution to this would be nice..

That's on my TODO list. Not a big issue. 
 
>> + */
>> +static uint8_t ipb_to_pipr(uint8_t ibp)
>> +{
>> +    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
>> +}
>> +
>>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
>>  {
>> -    return 0;
>> +    uint8_t nsr = icp->tima_os[TM_NSR];
>> +
>> +    qemu_irq_lower(icp->output);
>> +
>> +    if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
>> +        uint8_t cppr = icp->tima_os[TM_PIPR];
>> +
>> +        icp->tima_os[TM_CPPR] = cppr;
>> +
>> +        /* Reset the pending buffer bit */
>> +        icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
> 
> What if multiple irqs of the same priority were queued?

When an interrupt is EOI'ed, the queue is scanned to check 
for any pending interrupts. If there are any, a replay is forced 
with a call to force_external_irq_replay().

C. 

> 
>> +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
>> +
>> +        /* Drop Exception bit for OS */
>> +        icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
>> +    }
>> +
>> +    return (nsr << 8) | icp->tima_os[TM_CPPR];
>> +}
>> +
>> +static void spapr_xive_icp_notify(sPAPRXiveICP *icp)
>> +{
>> +    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
>> +        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
>> +        qemu_irq_raise(icp->output);
>> +    }
>>  }
>>  
>>  static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
>> @@ -51,6 +105,9 @@ static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
>>      }
>>  
>>      icp->tima_os[TM_CPPR] = cppr;
>> +
>> +    /* CPPR has changed, inform the ICP which might raise an exception */
>> +    spapr_xive_icp_notify(icp);
>>  }
>>  
>>  /*
>> @@ -224,6 +281,8 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>      XiveEQ *eq;
>>      uint32_t eq_idx;
>>      uint8_t priority;
>> +    uint32_t server;
>> +    sPAPRXiveICP *icp;
>>  
>>      ive = spapr_xive_get_ive(xive, lisn);
>>      if (!ive || !(ive->w & IVE_VALID)) {
>> @@ -253,6 +312,13 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>          qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
>>      }
>>  
>> +    server = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
>> +    icp = spapr_xive_icp_get(xive, server);
>> +    if (!icp) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for server %d\n", server);
>> +        return;
>> +    }
>> +
>>      if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
>>          priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
>>  
>> @@ -260,9 +326,18 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>          if (priority == 0xff) {
>>              g_assert_not_reached();
>>          }
>> +
>> +        /* Update the IPB (Interrupt Pending Buffer) with the priority
>> +         * of the new notification and inform the ICP, which will
>> +         * decide to raise the exception, or not, depending on the CPPR.
>> +         */
>> +        icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
>> +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
>>      } else {
>>          qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
>>      }
>> +
>> +    spapr_xive_icp_notify(icp);
>>  }
>>  
>>  /*
> 


* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-11-30 14:06     ` Cédric Le Goater
@ 2017-11-30 23:35       ` David Gibson
  2017-12-01 16:36         ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-11-30 23:35 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 30, 2017 at 02:06:27PM +0000, Cédric Le Goater wrote:
> On 11/30/2017 04:38 AM, David Gibson wrote:
> > On Thu, Nov 23, 2017 at 02:29:43PM +0100, Cédric Le Goater wrote:
> >> The Event Queue Descriptor (EQD) table, also known as Event Notification
> >> Descriptor (END), is one of the internal tables the XIVE interrupt
> >> controller uses to redirect exceptions from event sources to CPU
> >> threads.
> >>
> >> The EQD specifies on which Event Queue the event data should be posted
> >> when an exception occurs (later on pulled by the OS) and which server
> >> (VPD in XIVE terminology) to notify. The Event Queue is a much more
> >> complex structure but we start with a simple model for the sPAPR
> >> machine.
> > 
> > Just to clarify my understanding a server / VPD in XIVE would
> > typically correspond to a cpu - either real or virtual, yes?
> 
> yes. VP for "virtual processor" and VPD for "virtual processor 
> descriptor" which contains the XIVE interrupt state of the VP 
> when not dispatched. It is still described in some documentation 
> as an NVT : Notification Virtual Target.  
> 
> XIVE concepts were renamed at some point but the old names persisted.
> I am still working my way through all the names.
> 
> 
> >> There is one XiveEQ per priority and the model chooses to store them
> >> under the Xive Interrupt presenter model. It will be retrieved, just
> >> like for XICS, through the 'intc' object pointer of the CPU.
> >>
> >> The EQ indexing follows a simple pattern:
> >>
> >>        (server << 3) | (priority & 0x7)
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
> >>  hw/intc/xive-internal.h | 50 +++++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 106 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 554b25e0884c..983317a6b3f6 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -23,6 +23,7 @@
> >>  #include "sysemu/dma.h"
> >>  #include "monitor/monitor.h"
> >>  #include "hw/ppc/spapr_xive.h"
> >> +#include "hw/ppc/spapr.h"
> >>  #include "hw/ppc/xics.h"
> >>  
> >>  #include "xive-internal.h"
> >> @@ -34,6 +35,8 @@ struct sPAPRXiveICP {
> >>      uint8_t   tima[TM_RING_COUNT * 0x10];
> >>      uint8_t   *tima_os;
> >>      qemu_irq  output;
> >> +
> >> +    XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
> >>  };
> >>  
> >>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
> >> @@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
> >>      },
> >>  };
> >>  
> >> +static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
> >> +{
> >> +    PowerPCCPU *cpu = spapr_find_cpu(server);
> >> +
> >> +    return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
> >> +}
> >> +
> >>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> >>  {
> >>  
> >> @@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
> >>      sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
> >>  
> >>      memset(xicp->tima, 0, sizeof(xicp->tima));
> >> +
> >> +    memset(xicp->eqt, 0, sizeof(xicp->eqt));
> >>  }
> >>  
> >>  static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
> >> @@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
> >>      xicp->tima_os = &xicp->tima[TM_QW1_OS];
> >>  }
> >>  
> >> +static const VMStateDescription vmstate_spapr_xive_icp_eq = {
> >> +    .name = TYPE_SPAPR_XIVE_ICP "/eq",
> >> +    .version_id = 1,
> >> +    .minimum_version_id = 1,
> >> +    .fields = (VMStateField []) {
> >> +        VMSTATE_UINT32(w0, XiveEQ),
> >> +        VMSTATE_UINT32(w1, XiveEQ),
> >> +        VMSTATE_UINT32(w2, XiveEQ),
> >> +        VMSTATE_UINT32(w3, XiveEQ),
> >> +        VMSTATE_UINT32(w4, XiveEQ),
> >> +        VMSTATE_UINT32(w5, XiveEQ),
> >> +        VMSTATE_UINT32(w6, XiveEQ),
> >> +        VMSTATE_UINT32(w7, XiveEQ),
> > 
> > Wow.  Super descriptive field names there, but I guess that's not your fault.
> 
> The defines in the "xive-internal.h" give a better view ... 
> 
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +};
> >> +
> >>  static bool vmstate_spapr_xive_icp_needed(void *opaque)
> >>  {
> >>      /* TODO check machine XIVE support */
> >> @@ -696,6 +725,8 @@ static const VMStateDescription vmstate_spapr_xive_icp = {
> >>      .needed = vmstate_spapr_xive_icp_needed,
> >>      .fields = (VMStateField[]) {
> >>          VMSTATE_BUFFER(tima, sPAPRXiveICP),
> >> +        VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 1), 1,
> >> +                             vmstate_spapr_xive_icp_eq, XiveEQ),
> >>          VMSTATE_END_OF_LIST()
> >>      },
> >>  };
> >> @@ -755,3 +786,28 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
> >>      ive->w &= ~IVE_VALID;
> >>      return true;
> >>  }
> >> +
> >> +/*
> >> + * Use a simple indexing for the EQs.
> > 
> > Is this server+priority encoding architected anywhere?  
> 
> no. This is a model shortcut.
> 
> > Otherwise, why not use separate parameters?
> 
> yes. spapr_xive_get_eq() could use separate parameters and it would
> shorten some of the hcalls.
> 
> The result is stored in a single field of the IVE, EQ_INDEX. So I will 
> still need mangle/demangle routines but these could be simple macros.
> I will look at it.

Hm, ok.  So it's architected in the sense that you're using the
encoding from the EQ_INDEX field throughout.  That could be a
reasonable choice, I can't really tell yet.

On the other hand, it might be easier to read if we use server and
priority as separate parameters until the point we actually encode
into the EQ_INDEX field.
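
Something along these lines, as a sketch only. The helper names are
invented for the example; the shift and mask are the ones the patch
already uses in spapr_xive_get_eq() and spapr_xive_eq_for_server():

```c
#include <stdint.h>

/* Hypothetical helpers: keep server and priority separate everywhere
 * and only build the combined index when writing the EQ_INDEX field. */
static inline uint32_t eq_idx_encode(uint32_t server, uint8_t priority)
{
    return (server << 3) | (priority & 0x7);
}

static inline uint32_t eq_idx_server(uint32_t eq_idx)
{
    return eq_idx >> 3;
}

static inline uint8_t eq_idx_priority(uint32_t eq_idx)
{
    return eq_idx & 0x7;
}
```

The lookup function could then take (server, priority) directly, and
only the hcalls that store into or load from the IVE would need the
encode/decode pair.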

> 
> >> + */
> >> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t eq_idx)
> >> +{
> >> +    int priority = eq_idx & 0x7;
> >> +    sPAPRXiveICP *xicp = spapr_xive_icp_get(xive, eq_idx >> 3);
> >> +
> >> +    return xicp ? &xicp->eqt[priority] : NULL;
> >> +}
> >> +
> >> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server,
> >> +                              uint8_t priority, uint32_t *out_eq_idx)
> >> +{
> >> +    if (priority > XIVE_PRIORITY_MAX) {
> >> +        return false;
> >> +    }
> >> +
> >> +    if (out_eq_idx) {
> >> +        *out_eq_idx = (server << 3) | (priority & 0x7);
> >> +    }
> >> +
> >> +    return true;
> >> +}
> >> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> >> index 7d329f203a9b..c3949671aa03 100644
> >> --- a/hw/intc/xive-internal.h
> >> +++ b/hw/intc/xive-internal.h
> >> @@ -131,9 +131,59 @@ typedef struct XiveIVE {
> >>  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> >>  } XiveIVE;
> >>  
> >> +/* EQ */
> >> +typedef struct XiveEQ {
> >> +        uint32_t        w0;
> >> +#define EQ_W0_VALID             PPC_BIT32(0)
> >> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
> >> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
> >> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
> >> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
> >> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
> >> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
> >> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
> >> +#define EQ_W0_SW0               PPC_BIT32(16)
> >> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
> >> +#define EQ_QSIZE_4K             0
> >> +#define EQ_QSIZE_64K            4
> >> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
> >> +        uint32_t        w1;
> >> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
> >> +#define EQ_W1_ESn_P             PPC_BIT32(0)
> >> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
> >> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
> >> +#define EQ_W1_ESe_P             PPC_BIT32(2)
> >> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
> >> +#define EQ_W1_GENERATION        PPC_BIT32(9)
> >> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
> >> +        uint32_t        w2;
> >> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
> >> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
> >> +        uint32_t        w3;
> >> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
> >> +        uint32_t        w4;
> >> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
> >> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
> >> +        uint32_t        w5;
> >> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
> >> +        uint32_t        w6;
> >> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
> >> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
> >> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
> >> +        uint32_t        w7;
> >> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
> >> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
> >> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
> >> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
> >> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> >> +} XiveEQ;
> >> +
> >>  #define XIVE_PRIORITY_MAX  7
> >>  
> >>  void spapr_xive_reset(void *dev);
> >>  XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
> >> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
> >> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server, uint8_t prio,
> >> +                              uint32_t *out_eq_idx);
> >>  
> >>  #endif /* _INTC_XIVE_INTERNAL_H */
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 19/25] spapr: add hcalls support for the XIVE interrupt mode
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 19/25] spapr: add hcalls support " Cédric Le Goater
@ 2017-12-01  4:01   ` David Gibson
  2017-12-01 17:46     ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-01  4:01 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:49PM +0100, Cédric Le Goater wrote:
> A set of hypervisor calls is used to configure the interrupt sources
> and the event/notification queues of the guest:
> 
>  - H_INT_GET_SOURCE_INFO
> 
>    used to obtain the address of the MMIO page of the Event State
>    Buffer (PQ bits) entry associated with the source.
> 
>  - H_INT_SET_SOURCE_CONFIG
> 
>    assigns a source to a "target".
> 
>  - H_INT_GET_SOURCE_CONFIG
> 
>    determines which "target" and "priority" are assigned to a source
> 
>  - H_INT_GET_QUEUE_INFO
> 
>    returns the address of the notification management page associated
>    with the specified "target" and "priority".
> 
>  - H_INT_SET_QUEUE_CONFIG
> 
>    sets or resets the event queue for a given "target" and "priority".
>    It is also used to set the notification config associated with the
>    queue, only unconditional notification for the moment.  Reset is
>    performed with a queue size of 0 and queueing is disabled in that
>    case.
> 
>  - H_INT_GET_QUEUE_CONFIG
> 
>    returns the queue settings for a given "target" and "priority".
> 
>  - H_INT_RESET
> 
>    resets all of the partition's interrupt exploitation structures to
>    their initial state, losing all configuration set via the hcalls
>    H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
> 
>  - H_INT_SYNC
> 
>    issues a synchronisation on a source to make sure all
>    notifications have reached their queue.
> 
> Calls that still need to be addressed :
> 
>    H_INT_SET_OS_REPORTING_LINE
>    H_INT_GET_OS_REPORTING_LINE
> 
> See the code for more documentation on each hcall.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/Makefile.objs       |   2 +-
>  hw/intc/spapr_xive_hcall.c  | 885 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr.c              |   2 +
>  include/hw/ppc/spapr.h      |  15 +-
>  include/hw/ppc/spapr_xive.h |   4 +
>  5 files changed, 906 insertions(+), 2 deletions(-)
>  create mode 100644 hw/intc/spapr_xive_hcall.c
> 
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index 49e13e7aeeee..122e2ec77e8d 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -35,7 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
>  obj-$(CONFIG_XICS) += xics.o
>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> -obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o spapr_xive_hcall.o
>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
> new file mode 100644
> index 000000000000..676fe0e2d5c7
> --- /dev/null
> +++ b/hw/intc/spapr_xive_hcall.c
> @@ -0,0 +1,885 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE model
> + *
> + * Copyright (c) 2017, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qapi/error.h"
> +#include "cpu.h"
> +#include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_xive.h"
> +#include "hw/ppc/fdt.h"
> +#include "monitor/monitor.h"
> +
> +#include "xive-internal.h"
> +
> +/* Priority ranges reserved by the hypervisor. The Linux driver is
> + * expected to choose priority 6.
> + */
> +static const uint32_t reserved_priorities[] = {
> +    7,    /* start */
> +    0xf8, /* count */
> +};
> +
> +static bool priority_is_valid(uint32_t priority)
> +{
> +    int i;
> +
> +    for (i = 0; i < ARRAY_SIZE(reserved_priorities) / 2; i++) {
> +        uint32_t base  = reserved_priorities[2 * i];
> +        uint32_t count = reserved_priorities[2 * i + 1];
> +
> +        if (priority >= base && priority < base + count) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: priority %d is reserved\n",
> +                          __func__, priority);
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}

This seems like overkill.  Aren't only levels 0..7 supported in
hardware?  In that case a one-byte bitmap would suffice to store the
reserved levels.
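
A sketch of what I have in mind, assuming priority 7 is the only
reserved level among 0..7 (that is where the range in the patch
starts). The RESERVED_PRIORITY_MAP name is made up for the example:

```c
#include <stdbool.h>
#include <stdint.h>

#define XIVE_PRIORITY_MAX 7 /* same value as in xive-internal.h */

/* Hypothetical bitmap: bit N set means priority N is reserved by the
 * hypervisor. Only priority 7 is reserved in this sketch. */
#define RESERVED_PRIORITY_MAP (1u << 7)

static bool priority_is_valid(uint32_t priority)
{
    if (priority > XIVE_PRIORITY_MAX) {
        return false;
    }
    return !(RESERVED_PRIORITY_MAP & (1u << priority));
}
```

Note that this version rejects 0xff, which the range table in the patch
lets through, so a caller that treats 0xff specially has to test for it
before calling priority_is_valid().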

To check my understanding again, if you're running this with KVM, the
host kernel and qemu will need to agree on which are the reserved
levels, yes?

> +
> +/*
> + * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
> + * real address of the MMIO page through which the Event State Buffer
> + * entry associated with the value of the "lisn" parameter is managed.
> + *
> + * Parameters:
> + * Input
> + * - "flags"
> + *       Bits 0-63 reserved
> + * - "lisn" is per "interrupts", "interrupt-map", or
> + *       "ibm,xive-lisn-ranges" properties, or as returned by the
> + *       ibm,query-interrupt-source-number RTAS call, or as returned
> + *       by the H_ALLOCATE_VAS_WINDOW hcall
> + *
> + * Output
> + * - R4: "flags"
> + *       Bits 0-59: Reserved
> + *       Bit 60: H_INT_ESB must be used for Event State Buffer
> + *               management
> + *       Bit 61: 1 == LSI  0 == MSI
> + *       Bit 62: the full function page supports trigger
> + *       Bit 63: Store EOI Supported
> + * - R5: Logical Real address of full function Event State Buffer
> + *       management page, -1 if ESB hcall flag is set to 1.
> + * - R6: Logical Real Address of trigger only Event State Buffer
> + *       management page or -1.
> + * - R7: Power of 2 page size for the ESB management pages returned in
> + *       R5 and R6.
> + */
> +static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
> +                                          sPAPRMachineState *spapr,
> +                                          target_ulong opcode,
> +                                          target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    XiveIVE *ive;
> +    target_ulong flags  = args[0];
> +    target_ulong lisn   = args[1];
> +    uint64_t mmio_base;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;

Is H_FUNCTION required by the PAPR ACRs here?  Usually we only use
H_FUNCTION if the hypercall doesn't exist at all, and if unavailable
for other reasons use H_AUTHORITY or something.

> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID)) {
> +        return H_P2;
> +    }
> +
> +    mmio_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * lisn;

Hrm.. why was xive->esb_base not already a u64?

> +    args[0] = 0;
> +    if (spapr_xive_irq_is_lsi(xive, lisn)) {
> +        args[0] |= XIVE_SRC_LSI;
> +    }
> +    if (xive->flags & XIVE_SRC_TRIGGER) {
> +        args[0] |= XIVE_SRC_TRIGGER;
> +    }
> +
> +    if (xive->flags & XIVE_SRC_H_INT_ESB) {
> +        args[1] = -1; /* never used in QEMU  */
> +        args[2] = -1;
> +    } else {
> +        args[1] = mmio_base;
> +        if (xive->flags & XIVE_SRC_TRIGGER) {
> +            args[2] = -1; /* No specific trigger page */
> +        } else {
> +            args[2] = -1; /* TODO: support for specific trigger page */
> +        }
> +    }

What does the availability of SRC_TRIGGER (and INT_ESB) depend on?  If
it varies with host capabilities, that's going to be a real pain for
migration.

> +
> +    args[3] = xive->esb_shift;
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
> + * Interrupt Source to a target. The Logical Interrupt Source is
> + * designated with the "lisn" parameter and the target is designated
> + * with the "target" and "priority" parameters.  Upon return from the
> + * hcall(), no additional interrupts will be directed to the old EQ.
> + *
> + * TODO: The old EQ should be investigated for interrupts that
> + * occurred prior to or during the hcall().
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-61: Reserved
> + *      Bit 62: set the "eisn" in the EA
> + *      Bit 63: masks the interrupt source in the hardware interrupt
> + *      control structure. An interrupt masked by this mechanism will
> + *      be dropped, but its source state bits will still be
> + *      set. There is no race-free way of unmasking and restoring the
> + *      source. Thus this should only be used in interrupts that are
> + *      also masked at the source, and only in cases where the
> + *      interrupt is not meant to be used for a large amount of time
> + *      because no valid target exists for it for example
> + * - "lisn" is per "interrupts", "interrupt-map", or
> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> + *      ibm,query-interrupt-source-number RTAS call, or as returned by
> + *      the H_ALLOCATE_VAS_WINDOW hcall
> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> + *      "ibm,ppc-interrupt-gserver#s"
> + * - "priority" is a valid priority not in
> + *      "ibm,plat-res-int-priorities"
> + * - "eisn" is the guest EISN associated with the "lisn"
> + *
> + * Output:
> + * - None
> + */
> +
> +#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
> +#define XIVE_SRC_MASK     (1ull << (63 - 63))

Aren't there already a bunch of macros you have for defining things in
terms of IBM bit numbers, so you can avoid open coding (63 - whatever)?
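
For example, with a 64-bit counterpart of the PPC_BIT32() helper used
in xive-internal.h. The sketch below defines a local stand-in,
IBM_BIT64(), so that it builds on its own; the name is an assumption
for the example, not an existing macro:

```c
#include <stdint.h>

/* Local stand-in for a PPC_BIT()-style helper using IBM bit numbering:
 * bit 0 is the most significant bit of a 64-bit word. */
#define IBM_BIT64(bit) (0x8000000000000000ULL >> (bit))

#define XIVE_SRC_SET_EISN IBM_BIT64(62)
#define XIVE_SRC_MASK     IBM_BIT64(63)
```

The values are the same as with the open-coded shifts (0x2 and 0x1),
but the bit numbers now read the same way as in the hcall description.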

> +
> +static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
> +                                            sPAPRMachineState *spapr,
> +                                            target_ulong opcode,
> +                                            target_ulong *args)
> +{
> +    XiveIVE *ive;
> +    uint64_t new_ive;
> +    target_ulong flags    = args[0];
> +    target_ulong lisn     = args[1];
> +    target_ulong target   = args[2];
> +    target_ulong priority = args[3];
> +    target_ulong eisn     = args[4];
> +    uint32_t eq_idx;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags & ~(XIVE_SRC_SET_EISN | XIVE_SRC_MASK)) {
> +        return H_PARAMETER;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID)) {
> +        return H_P2;
> +    }
> +
> +    /* priority 0xff is used to reset the IVE */
> +    if (priority == 0xff) {
> +        new_ive = IVE_VALID | IVE_MASKED;
> +        goto out;
> +    }
> +
> +    new_ive = ive->w;
> +
> +    if (flags & XIVE_SRC_MASK) {
> +        new_ive = ive->w | IVE_MASKED;
> +    } else {
> +        new_ive = ive->w & ~IVE_MASKED;
> +    }
> +
> +    if (!priority_is_valid(priority)) {
> +        return H_P4;
> +    }
> +
> +    /* TODO: If the partition thread count is greater than the
> +     * hardware thread count, validate the "target" has a
> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
> +     */

What's this about?  I thought the point of XIVE was you could set up
target queues for your vcpus regardless of mapping to physical cpus.

> +    /* Validate that "target" is part of the list of threads allocated
> +     * to the partition. For that, find the EQ corresponding to the
> +     * target.
> +     */
> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
> +        return H_P3;
> +    }
> +
> +    new_ive = SETFIELD(IVE_EQ_BLOCK, new_ive, 0ul);
> +    new_ive = SETFIELD(IVE_EQ_INDEX, new_ive, eq_idx);
> +
> +    if (flags & XIVE_SRC_SET_EISN) {
> +        new_ive = SETFIELD(IVE_EQ_DATA, new_ive, eisn);
> +    }
> +
> +out:
> +    /* TODO: handle syncs ? */
> +
> +    /* And update */
> +    ive->w = new_ive;
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine which
> + * target/priority pair is assigned to the specified Logical Interrupt
> + * Source.
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-63 Reserved
> + * - "lisn" is per "interrupts", "interrupt-map", or
> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> + *      ibm,query-interrupt-source-number RTAS call, or as
> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
> + *
> + * Output:
> + * - R4: Target to which the specified Logical Interrupt Source is
> + *       assigned
> + * - R5: Priority to which the specified Logical Interrupt Source is
> + *       assigned
> + * - R6: EISN for the specified Logical Interrupt Source (this will be
> + *       equivalent to the LISN if not changed by H_INT_SET_SOURCE_CONFIG)
> + */
> +static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
> +                                            sPAPRMachineState *spapr,
> +                                            target_ulong opcode,
> +                                            target_ulong *args)
> +{
> +    target_ulong flags = args[0];
> +    target_ulong lisn = args[1];
> +    XiveIVE *ive;
> +    XiveEQ *eq;
> +    uint32_t eq_idx;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID)) {
> +        return H_P2;
> +    }
> +
> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
> +    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
> +    if (!eq) {
> +        return H_HARDWARE;
> +    }
> +
> +    args[0] = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
> +
> +    if (ive->w & IVE_MASKED) {
> +        args[1] = 0xff;
> +    } else {
> +        args[1] = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> +    }
> +
> +    args[2] = GETFIELD(IVE_EQ_DATA, ive->w);
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_GET_QUEUE_INFO hcall() is used to get the logical real
> + * address of the notification management page associated with the
> + * specified target and priority.
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *       Bits 0-63 Reserved
> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> + *       "ibm,ppc-interrupt-gserver#s"
> + * - "priority" is a valid priority not in
> + *       "ibm,plat-res-int-priorities"
> + *
> + * Output:
> + * - R4: Logical real address of notification page
> + * - R5: Power of 2 page size of the notification page
> + */
> +static target_ulong h_int_get_queue_info(PowerPCCPU *cpu,
> +                                         sPAPRMachineState *spapr,
> +                                         target_ulong opcode,
> +                                         target_ulong *args)
> +{
> +    target_ulong flags    = args[0];
> +    target_ulong target   = args[1];
> +    target_ulong priority = args[2];
> +    uint32_t eq_idx;
> +    XiveEQ *eq;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    if (!priority_is_valid(priority)) {
> +        return H_P3;
> +    }
> +
> +    /* Validate that "target" is part of the list of threads allocated
> +     * to the partition. For that, find the EQ corresponding to the
> +     * target.
> +     */
> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
> +        return H_P2;
> +    }
> +
> +    /* TODO: If the partition thread count is greater than the
> +     * hardware thread count, validate the "target" has a
> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
> +     */
> +
> +    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
> +    if (!eq)  {
> +        return H_HARDWARE;
> +    }
> +
> +    args[0] = -1; /* TODO: return ESn page */
> +    if (eq->w0 & EQ_W0_ENQUEUE) {
> +        args[1] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
> +    } else {
> +        args[1] = 0;
> +    }
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_SET_QUEUE_CONFIG hcall() is used to set or reset a EQ for
> + * a given "target" and "priority".  It is also used to set the
> + * notification config associated with the EQ.  An EQ size of 0 is
> + * used to reset the EQ config for a given target and priority. If
> + * resetting the EQ config, the END associated with the given "target"
> + * and "priority" will be changed to disable queueing.
> + *
> + * Upon return from the hcall(), no additional interrupts will be
> + * directed to the old EQ (if one was set). The old EQ (if one was
> + * set) should be investigated for interrupts that occurred prior to
> + * or during the hcall().
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-62: Reserved
> + *      Bit 63: Unconditional Notify (n) per the XIVE spec
> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> + *       "ibm,ppc-interrupt-gserver#s"
> + * - "priority" is a valid priority not in
> + *       "ibm,plat-res-int-priorities"
> + * - "eventQueue": The logical real address of the start of the EQ
> + * - "eventQueueSize": The power of 2 EQ size per "ibm,xive-eq-sizes"
> + *
> + * Output:
> + * - None
> + */
> +
> +#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))
> +
> +static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
> +                                           sPAPRMachineState *spapr,
> +                                           target_ulong opcode,
> +                                           target_ulong *args)
> +{
> +    target_ulong flags    = args[0];
> +    target_ulong target   = args[1];
> +    target_ulong priority = args[2];
> +    target_ulong qpage    = args[3];
> +    target_ulong qsize    = args[4];
> +    uint32_t eq_idx;
> +    XiveEQ *old_eq;
> +    XiveEQ eq;
> +    uint32_t qdata;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags & ~XIVE_EQ_ALWAYS_NOTIFY) {
> +        return H_PARAMETER;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    if (!priority_is_valid(priority)) {
> +        return H_P3;
> +    }
> +
> +    /* Validate that "target" is part of the list of threads allocated
> +     * to the partition. For that, find the EQ corresponding to the
> +     * target.
> +     */
> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
> +        return H_P2;
> +    }
> +
> +    /* TODO: If the partition thread count is greater than the
> +     * hardware thread count, validate the "target" has a
> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
> +     */
> +
> +    old_eq = spapr_xive_get_eq(spapr->xive, eq_idx);
> +    if (!old_eq)  {
> +        return H_HARDWARE;
> +    }
> +
> +    eq = *old_eq;
> +
> +    switch (qsize) {
> +    case 12:
> +    case 16:
> +    case 21:
> +    case 24:
> +        eq.w3 = ((uint64_t)qpage) & 0xffffffff;
> +        eq.w2 = (((uint64_t)qpage)) >> 32 & 0x0fffffff;
> +        eq.w0 |= EQ_W0_ENQUEUE;
> +        eq.w0 = SETFIELD(EQ_W0_QSIZE, eq.w0, qsize - 12);
> +        break;
> +    case 0:
> +        /* reset queue and disable queueing */
> +        eq.w2 = eq.w3 = 0;
> +        eq.w0 &= ~EQ_W0_ENQUEUE;
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid EQ size %"PRIx64"\n",
> +                      __func__, qsize);
> +        return H_P5;
> +    }
> +
> +    if (qsize) {
> +        /*
> +         * Let's validate the EQ address with a read of the first EQ
> +         * entry. We could also check that the full queue has been
> +         * zeroed by the OS.
> +         */
> +        if (address_space_read(&address_space_memory, qpage,
> +                               MEMTXATTRS_UNSPECIFIED,
> +                               (uint8_t *) &qdata, sizeof(qdata))) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to read EQ data @0x%"
> +                          HWADDR_PRIx "\n", __func__, qpage);
> +            return H_P4;
> +        }
> +    }
> +
> +    /* Ensure the priority and target are correctly set (they will not
> +     * be right after allocation)
> +     */
> +    eq.w6 = SETFIELD(EQ_W6_NVT_BLOCK, 0ul, 0ul) |
> +        SETFIELD(EQ_W6_NVT_INDEX, 0ul, target);
> +    eq.w7 = SETFIELD(EQ_W7_F0_PRIORITY, 0ul, priority);
> +
> +    /* TODO: depends on notitification page (ESn) from H_INT_GET_QUEUE_INFO */
> +    if (flags & XIVE_EQ_ALWAYS_NOTIFY) {
> +        eq.w0 |= EQ_W0_UCOND_NOTIFY;

Do you need to also clear it if the flag is not set?  AFAICT eq.w0 is
inherited from the old queue and never reset from scratch.
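
To illustrate the concern, here is a minimal standalone sketch of the
"clear, then conditionally set" pattern.  The bit value and the helper
name are hypothetical; the real EQ_W0_UCOND_NOTIFY definition lives in
xive-internal.h and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only. */
#define SKETCH_UCOND_NOTIFY (1u << 3)

/*
 * Return w0 with the notify bit forced to match the request, so that
 * a value inherited from the previous queue config cannot leak through.
 */
static uint32_t sketch_apply_notify(uint32_t w0, bool always_notify)
{
    w0 &= ~SKETCH_UCOND_NOTIFY;      /* drop whatever was inherited */
    if (always_notify) {
        w0 |= SKETCH_UCOND_NOTIFY;   /* set only when asked for */
    }
    return w0;
}
```

With something like this, a second H_INT_SET_QUEUE_CONFIG without the
flag would actually turn unconditional notification off again.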

> +    }
> +
> +    /* The generation bit for the EQ starts at 1 and The EQ page
> +     * offset counter starts at 0.
> +     */
> +    eq.w1 = EQ_W1_GENERATION | SETFIELD(EQ_W1_PAGE_OFF, 0ul, 0ul);
> +    eq.w0 |= EQ_W0_VALID;
> +
> +    /* TODO: issue syncs required to ensure all in-flight interrupts
> +     * are complete on the old EQ */
> +
> +    /* Update EQ */
> +    *old_eq = eq;

Hrm.  The BQL probably saves you, but in general do you need to make
sure the ENQUEUE bit is set after updating everything else?
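
For reference, a rough sketch of what "set the enable bit last" could
look like if the EQ ever had to be updated without the BQL.  This uses
C11 atomics rather than QEMU's own helpers, and the struct, bit value
and function names are all made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ENQUEUE (1u << 0)   /* hypothetical bit position */

struct sketch_eq {
    _Atomic uint32_t w0;           /* control word, published last */
    uint32_t w2, w3;               /* queue page address, high/low */
};

/* Writer: fill in the address words, then publish ENQUEUE with release. */
static void sketch_eq_publish(struct sketch_eq *eq, uint64_t qpage,
                              uint32_t w0)
{
    eq->w3 = (uint32_t)qpage;
    eq->w2 = (uint32_t)(qpage >> 32) & 0x0fffffff;
    atomic_store_explicit(&eq->w0, w0 | SKETCH_ENQUEUE,
                          memory_order_release);
}

/* Reader: only trust w2/w3 once ENQUEUE has been observed with acquire. */
static bool sketch_eq_qpage(struct sketch_eq *eq, uint64_t *qpage)
{
    uint32_t w0 = atomic_load_explicit(&eq->w0, memory_order_acquire);

    if (!(w0 & SKETCH_ENQUEUE)) {
        return false;
    }
    *qpage = ((uint64_t)eq->w2 << 32) | eq->w3;
    return true;
}
```

This only covers enabling a queue that was previously disabled;
reconfiguring a live queue would need the sync mentioned in the TODO
above as well.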

> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_GET_QUEUE_CONFIG hcall() is used to get a EQ for a given
> + * target and priority.
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-62: Reserved
> + *      Bit 63: Debug: Return debug data
> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> + *       "ibm,ppc-interrupt-gserver#s"
> + * - "priority" is a valid priority not in
> + *       "ibm,plat-res-int-priorities"
> + *
> + * Output:
> + * - R4: "flags":
> + *       Bits 0-62: Reserved
> + *       Bit 63: The value of Unconditional Notify (n) per the XIVE spec
> + * - R5: The logical real address of the start of the EQ
> + * - R6: The power of 2 EQ size per "ibm,xive-eq-sizes"
> + * - R7: The value of Event Queue Offset Counter per XIVE spec
> + *       if "Debug" = 1, else 0
> + *
> + */
> +
> +#define XIVE_EQ_DEBUG     (1ull << (63 - 63))
> +
> +static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
> +                                           sPAPRMachineState *spapr,
> +                                           target_ulong opcode,
> +                                           target_ulong *args)
> +{
> +    target_ulong flags    = args[0];
> +    target_ulong target   = args[1];
> +    target_ulong priority = args[2];
> +    uint32_t eq_idx;
> +    XiveEQ *eq;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags & ~XIVE_EQ_DEBUG) {
> +        return H_PARAMETER;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    if (!priority_is_valid(priority)) {
> +        return H_P3;
> +    }
> +
> +    /* Validate that "target" is part of the list of threads allocated
> +     * to the partition. For that, find the EQ corresponding to the
> +     * target.
> +     */
> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
> +        return H_P2;
> +    }
> +
> +    /* TODO: If the partition thread count is greater than the
> +     * hardware thread count, validate the "target" has a
> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
> +     */
> +
> +    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
> +    if (!eq)  {
> +        return H_HARDWARE;
> +    }
> +
> +    args[0] = 0;
> +    if (eq->w0 & EQ_W0_UCOND_NOTIFY) {
> +        args[0] |= XIVE_EQ_ALWAYS_NOTIFY;
> +    }
> +
> +    if (eq->w0 & EQ_W0_ENQUEUE) {
> +        args[1] =
> +            (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
> +        args[2] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
> +    } else {
> +        args[1] = 0;
> +        args[2] = 0;
> +    }
> +
> +    /* TODO: do we need any locking on the EQ ? */

Probably not if you're designating it as protected by the BQL.

> +    if (flags & XIVE_EQ_DEBUG) {
> +        /* Load the event queue generation number into the return flags */
> +        args[0] |= GETFIELD(EQ_W1_GENERATION, eq->w1);
> +
> +        /* Load R7 with the event queue offset counter */
> +        args[3] = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
> +    }
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_SET_OS_REPORTING_LINE hcall() is used to set the
> + * reporting cache line pair for the calling thread.  The reporting
> + * cache lines will contain the OS interrupt context when the OS
> + * issues a CI store byte to @TIMA+0xC10 to acknowledge the OS
> + * interrupt. The reporting cache lines can be reset by inputting -1
> + * in "reportingLine".  Issuing the CI store byte without reporting
> + * cache lines registered will result in the data not being accessible
> + * to the OS.
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-63: Reserved
> + * - "reportingLine": The logical real address of the reporting cache
> + *    line pair
> + *
> + * Output:
> + * - None
> + */
> +static target_ulong h_int_set_os_reporting_line(PowerPCCPU *cpu,
> +                                                sPAPRMachineState *spapr,
> +                                                target_ulong opcode,
> +                                                target_ulong *args)
> +{
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    /* TODO: H_INT_SET_OS_REPORTING_LINE */
> +    return H_FUNCTION;
> +}
> +
> +/*
> + * The H_INT_GET_OS_REPORTING_LINE hcall() is used to get the logical
> + * real address of the reporting cache line pair set for the input
> + * "target".  If no reporting cache line pair has been set, -1 is
> + * returned.
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-63: Reserved
> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> + *       "ibm,ppc-interrupt-gserver#s"
> + * - "reportingLine": The logical real address of the reporting cache
> + *   line pair
> + *
> + * Output:
> + * - R4: The logical real address of the reporting line if set, else -1
> + */
> +static target_ulong h_int_get_os_reporting_line(PowerPCCPU *cpu,
> +                                                sPAPRMachineState *spapr,
> +                                                target_ulong opcode,
> +                                                target_ulong *args)
> +{
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    /* TODO: H_INT_GET_OS_REPORTING_LINE */
> +    return H_FUNCTION;
> +}
> +
> +/*
> + * The H_INT_ESB hcall() is used to issue a load or store to the ESB
> + * page for the input "lisn".  This hcall is only supported for LISNs
> + * that have the ESB hcall flag set to 1 when returned from hcall()
> + * H_INT_GET_SOURCE_INFO.
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-62: Reserved
> + *      bit 63: Store: Store=1, store operation, else load operation
> + * - "lisn" is per "interrupts", "interrupt-map", or
> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> + *      ibm,query-interrupt-source-number RTAS call, or as
> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
> + * - "esbOffset" is the offset into the ESB page for the load or store operation
> + * - "storeData" is the data to write for a store operation
> + *
> + * Output:
> + * - R4: R4: The value of the load if load operation, else -1
> + */
> +
> +#define XIVE_ESB_STORE (1ull << (63 - 63))
> +
> +static target_ulong h_int_esb(PowerPCCPU *cpu,
> +                              sPAPRMachineState *spapr,
> +                              target_ulong opcode,
> +                              target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    XiveIVE *ive;
> +    target_ulong flags   = args[0];
> +    target_ulong lisn    = args[1];
> +    target_ulong offset  = args[2];
> +    target_ulong data    = args[3];
> +    uint64_t esb_base;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags & ~XIVE_ESB_STORE) {
> +        return H_PARAMETER;
> +    }
> +
> +    ive = spapr_xive_get_ive(xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID)) {
> +        return H_P2;
> +    }
> +
> +    if (offset > (1ull << xive->esb_shift)) {
> +        return H_P3;
> +    }
> +
> +    esb_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * lisn;
> +    esb_base += offset;
> +
> +    if (dma_memory_rw(&address_space_memory, esb_base, &data, 8,
> +                      (flags & XIVE_ESB_STORE))) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to rw data @0x%"
> +                      HWADDR_PRIx "\n", __func__, esb_base);
> +        return H_HARDWARE;
> +    }
> +    args[0] = (flags & XIVE_ESB_STORE) ? -1 : data;
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_SYNC hcall() is used to issue hardware syncs that will
> + * ensure any in flight events for the input lisn are in the event
> + * queue.
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-63: Reserved
> + * - "lisn" is per "interrupts", "interrupt-map", or
> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> + *      ibm,query-interrupt-source-number RTAS call, or as
> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
> + *
> + * Output:
> + * - None
> + */
> +static target_ulong h_int_sync(PowerPCCPU *cpu,
> +                               sPAPRMachineState *spapr,
> +                               target_ulong opcode,
> +                               target_ulong *args)
> +{
> +    XiveIVE *ive;
> +    target_ulong flags   = args[0];
> +    target_ulong lisn    = args[1];
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
> +    if (!ive || !(ive->w & IVE_VALID)) {
> +        return H_P2;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    /* This is not real hardware. Nothing to be done */
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_RESET hcall() is used to reset all of the partition's
> + * interrupt exploitation structures to their initial state.  This
> + * means losing all previously set interrupt state set via
> + * H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
> + *
> + * Parameters:
> + * Input:
> + * - "flags"
> + *      Bits 0-63: Reserved
> + *
> + * Output:
> + * - None
> + */
> +static target_ulong h_int_reset(PowerPCCPU *cpu,
> +                                sPAPRMachineState *spapr,
> +                                target_ulong opcode,
> +                                target_ulong *args)
> +{
> +    target_ulong flags   = args[0];
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    spapr_xive_reset(spapr->xive);
> +    return H_SUCCESS;
> +}
> +
> +void spapr_xive_hcall_init(sPAPRMachineState *spapr)
> +{
> +    spapr_register_hypercall(H_INT_GET_SOURCE_INFO, h_int_get_source_info);
> +    spapr_register_hypercall(H_INT_SET_SOURCE_CONFIG, h_int_set_source_config);
> +    spapr_register_hypercall(H_INT_GET_SOURCE_CONFIG, h_int_get_source_config);
> +    spapr_register_hypercall(H_INT_GET_QUEUE_INFO, h_int_get_queue_info);
> +    spapr_register_hypercall(H_INT_SET_QUEUE_CONFIG, h_int_set_queue_config);
> +    spapr_register_hypercall(H_INT_GET_QUEUE_CONFIG, h_int_get_queue_config);
> +    spapr_register_hypercall(H_INT_SET_OS_REPORTING_LINE,
> +                             h_int_set_os_reporting_line);
> +    spapr_register_hypercall(H_INT_GET_OS_REPORTING_LINE,
> +                             h_int_get_os_reporting_line);
> +    spapr_register_hypercall(H_INT_ESB, h_int_esb);
> +    spapr_register_hypercall(H_INT_SYNC, h_int_sync);
> +    spapr_register_hypercall(H_INT_RESET, h_int_reset);
> +}
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ca4e72187f60..8b15c0b500d0 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -222,6 +222,8 @@ static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
>          goto error;
>      }
>  
> +    spapr_xive_hcall_init(spapr);
> +
>      return SPAPR_XIVE(obj);
>  error:
>      error_propagate(errp, local_err);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 90e2b0f6c678..a25e218b34e2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -387,7 +387,20 @@ struct sPAPRMachineState {
>  #define H_INVALIDATE_PID        0x378
>  #define H_REGISTER_PROC_TBL     0x37C
>  #define H_SIGNAL_SYS_RESET      0x380
> -#define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
> +
> +#define H_INT_GET_SOURCE_INFO   0x3A8
> +#define H_INT_SET_SOURCE_CONFIG 0x3AC
> +#define H_INT_GET_SOURCE_CONFIG 0x3B0
> +#define H_INT_GET_QUEUE_INFO    0x3B4
> +#define H_INT_SET_QUEUE_CONFIG  0x3B8
> +#define H_INT_GET_QUEUE_CONFIG  0x3BC
> +#define H_INT_SET_OS_REPORTING_LINE 0x3C0
> +#define H_INT_GET_OS_REPORTING_LINE 0x3C4
> +#define H_INT_ESB               0x3C8
> +#define H_INT_SYNC              0x3CC
> +#define H_INT_RESET             0x3D0
> +
> +#define MAX_HCALL_OPCODE        H_INT_RESET
>  
>  /* The hcalls above are standardized in PAPR and implemented by pHyp
>   * as well.
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 6e8a189e723f..3f822220647f 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -79,4 +79,8 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>  void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
>  
> +typedef struct sPAPRMachineState sPAPRMachineState;
> +
> +void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> +
>  #endif /* PPC_SPAPR_XIVE_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model
  2017-11-30 13:44         ` Cédric Le Goater
@ 2017-12-01  4:03           ` David Gibson
  2017-12-01  8:02             ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-01  4:03 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 3368 bytes --]

On Thu, Nov 30, 2017 at 01:44:51PM +0000, Cédric Le Goater wrote:
> On 11/30/2017 04:06 AM, David Gibson wrote:
> > On Wed, Nov 29, 2017 at 10:55:34AM +0100, Cédric Le Goater wrote:
> >> On 11/29/2017 06:11 AM, David Gibson wrote:
> >>> On Thu, Nov 23, 2017 at 02:29:42PM +0100, Cédric Le Goater wrote:
> >>>> The XIVE interrupt presenter exposes a set of rings, also called
> >>>> Thread Interrupt Management Areas (TIMA), to handle priority
> >>>> management and interrupt acknowledgment among other things. There is
> >>>> one ring per level of privilege, four in all. The one we are
> >>>> interested in for the sPAPR machine is the OS ring.
> >>>>
> >>>> The TIMA is mapped at the same address for each CPU. 'current_cpu' is
> >>>> used to retrieve the targeted interrupt presenter object holding the
> >>>> cache data of the registers the model use.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>> ---
> >>>>  hw/intc/spapr_xive.c        | 271 ++++++++++++++++++++++++++++++++++++++++++++
> >>>>  hw/intc/xive-internal.h     |  89 +++++++++++++++
> >>>>  include/hw/ppc/spapr_xive.h |  11 ++
> >>>>  3 files changed, 371 insertions(+)
> >>>>
> >>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >>>> index b1e3f8710cff..554b25e0884c 100644
> >>>> --- a/hw/intc/spapr_xive.c
> >>>> +++ b/hw/intc/spapr_xive.c
> >>>> @@ -23,9 +23,166 @@
> >>>>  #include "sysemu/dma.h"
> >>>>  #include "monitor/monitor.h"
> >>>>  #include "hw/ppc/spapr_xive.h"
> >>>> +#include "hw/ppc/xics.h"
> >>>>  
> >>>>  #include "xive-internal.h"
> >>>>  
> >>>> +struct sPAPRXiveICP {
> >>>
> >>> I'd really prefer to avoid calling anything in xive "icp" to avoid
> >>> confusion with xics.
> >>
> >> OK. 
> >>
> >> The specs refers to the whole as an IVPE : Interrupt Virtualization 
> >> Presentation Engine. In our model, we use the TIMA cached values of 
> >> the OS ring and the qemu_irq for the CPU line. 
> >>
> >> Would 'sPAPRXivePresenter' be fine ?
> > 
> > That'd be ok.  Or call if sPAPRIVPE.  Or even call it TIMA.  I'd be
> > fine with any of those.
> 
> In this model, I am making a lot of shortcuts in the XIVE concepts
> (which I don't master completely yet ...) 
> 
> The IVPE is the part of the overall controller doing the interrupt 
> presentation.
> 
> The TIMA refers to the MMIO region in which the thread interrupt 
> management is done. 
> 
> The XIVE structure that contains the 'virtual processor' interrupt 
> state is the NVT: Notification Virtual Target. An index to an NVT 
> is stored in the EQs to do the routing. I did not introduce the NVT 
> in sPAPRXive because it's rather big, 128 bytes, and we don't need 
> much of it (NSR, CPPR, PIPR, IPB) but we could use a shorten one.
> 
> So I think sPAPRXiveNVT, or sPAPRXiveVP (VP for virtual processor)
> would be better names.

Ok.  I prefer sPAPRXiveNVT of these two.

> 
> We will need more of the NVT structure to support the hcalls 
> doing the set and the get of the address of the Reporting Cache 
> line (H_INT_{S,G}ET_OS_REPORTING_LINE). We can extend it when 
> time comes.  
> 
> C.
> 


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue
  2017-11-30 14:16     ` Cédric Le Goater
@ 2017-12-01  4:10       ` David Gibson
  2017-12-01 16:43         ` Cédric Le Goater
  2017-12-02 14:45         ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 128+ messages in thread
From: David Gibson @ 2017-12-01  4:10 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 4333 bytes --]

On Thu, Nov 30, 2017 at 02:16:30PM +0000, Cédric Le Goater wrote:
> On 11/30/2017 04:49 AM, David Gibson wrote:
> > On Thu, Nov 23, 2017 at 02:29:44PM +0100, Cédric Le Goater wrote:
> >> If a triggered event is let through, the Event Queue data defined in the
> >> associated IVE is pushed in the in-memory event queue. The latter is a
> >> circular buffer provided by the OS using the H_INT_SET_QUEUE_CONFIG hcall,
> >> one per server and priority couple. It is composed of Event Queue entries
> >> which are 4 bytes long, the first bit being a 'generation' bit and the 31
> >> following bits the EQ Data field.
> >>
> >> The EQ Data field provides a way to set an invariant logical event source
> >> number for an IRQ. It is set with the H_INT_SET_SOURCE_CONFIG hcall.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 67 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 983317a6b3f6..df14c5a88275 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -193,9 +193,76 @@ static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
> >>      return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
> >>  }
> >>  
> >> +static void spapr_xive_eq_push(XiveEQ *eq, uint32_t data)
> >> +{
> >> +    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
> >> +    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
> >> +    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
> >> +    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
> >> +
> >> +    uint64_t qaddr = qaddr_base + (qindex << 2);
> >> +    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
> >> +    uint32_t qentries = 1 << (qsize + 10);
> >> +
> >> +    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
> > 
> > This suggests that uint32_t data contains guest endian data, which it
> > generally shouldn't.  Better to use stl_be_dma() (or whatever is
> > appropriate for the endianness of the data field.
> 
> There are no requirement on the endianness of the data field and 
> it is just stored in the IVE in the hcall H_INT_SET_SOURCE_CONFIG. 
> So the guest can pass whatever it likes.  

Hm, ok.  Guest endian (or at least, not definitively host-endian) data
in a plain uint32_t makes me uncomfortable.  Could we use char data[4]
instead, to make it clear it's a byte-ordered buffer, rather than a
number as far as the XIVE is concerned?

Hm.. except that doesn't quite work, because the hardware must define
which end that generation bit ends up in...
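
One way to make the byte order explicit is sketched below.  It assumes,
as the quoted patch does with cpu_to_be32(), that the entry is stored
big-endian with the generation bit in the most significant bit of the
first byte.  The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Encode one 4-byte EQ entry: bit 31 is the generation bit, the low
 * 31 bits carry the EQ data.  The bytes are written out explicitly so
 * the result does not depend on host endianness.
 */
static void sketch_eq_entry_encode(uint8_t out[4], unsigned gen,
                                   uint32_t data)
{
    uint32_t v = ((uint32_t)(gen & 1) << 31) | (data & 0x7fffffffu);

    out[0] = (uint8_t)(v >> 24);   /* generation bit ends up here */
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}
```

The 31-bit payload is still treated as a number here, which is the
part that does not fit a pure "opaque byte buffer" view.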

> >> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
> >> +                      HWADDR_PRIx "\n", __func__, qaddr);
> >> +        return;
> >> +    }
> >> +
> >> +    qindex = (qindex + 1) % qentries;
> >> +    if (qindex == 0) {
> >> +        qgen ^= 1;
> >> +        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
> >> +    }
> >> +    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
> >> +}
> >> +
> >>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> >>  {
> >> +    XiveIVE *ive;
> >> +    XiveEQ *eq;
> >> +    uint32_t eq_idx;
> >> +    uint8_t priority;
> >> +
> >> +    ive = spapr_xive_get_ive(xive, lisn);
> >> +    if (!ive || !(ive->w & IVE_VALID)) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> > 
> > As mentioned on other patches, I'm a little concerned by these
> > guest-triggerable logs.  I guess the LOG_GUEST_ERROR mask will save
> > us, though.
> 
> I want to track 'invalid' interrupts but I haven't seen these show up 
> in my tests. I agree there are a little too much and some could just 
> be asserts.

Uh.. I don't think many can be assert()s.  assert() is only
appropriate if it being tripped definitely indicates a bug in qemu.
Nearly all these qemu_log()s I've seen can be tripped by the guest
doing something bad, which absolutely should not assert() qemu.


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-11-30 15:15     ` Cédric Le Goater
@ 2017-12-01  4:14       ` David Gibson
  2017-12-01  8:10         ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-01  4:14 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 1966 bytes --]

On Thu, Nov 30, 2017 at 03:15:09PM +0000, Cédric Le Goater wrote:
> On 11/30/2017 05:55 AM, David Gibson wrote:
> > On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote:
> >> The XIVE object is designed to be always available, so it is created
> >> unconditionally on newer machines.
> > 
> > There doesn't actually seem to be anything dependent on machine
> > version here.
> 
> No. I thought that was too early in the patchset. This is handled 
> in the last patch with a 'xive_exploitation' bool which is set to 
> false on older machines. 
> 
> But, nevertheless, the XIVE objects are always created even if not
> used. Something to discuss.

That'll definitely break backwards migration, since the destination
won't understand the (unused but still present) xive state it
receives.  So xives can only be created on new machine types.  I'm ok
(at least tentatively) with always creating them on the newer machine
types, regardless of whether the guest ends up exploiting it or not.

> >> Depending on the configuration and
> >> the guest capabilities, the CAS negotiation process will decide which
> >> interrupt model to use, legacy or XIVE.
> >>
> >> The XIVE model makes use of the full range of the IRQ number space
> >> because the IRQ numbers for the CPU IPIs are allocated in the range
> >> below XICS_IRQ_BASE, which is unused by XICS.
> > 
> > Ok.  And I take it 4096 is enough space for the XIVE IPIs for the
> > forseeable future?
> 
> The biggest real system I am aware of as 16 sockets, 192 cores, SMT8. 
> That's 1536 cpus. pseries has a max_cpus of 1024.

Ok, so we can go to double the current system size, but not 4x.  Not
sure if that seems adequate or not.  Still it's a relatively minor
detail.
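
The arithmetic behind that estimate, as a small sketch.  The constants
are the figures quoted above (4096 IRQ numbers below XICS_IRQ_BASE,
192 cores at SMT8); the names are made up for the example.

```c
#include <stdbool.h>

enum {
    SKETCH_IPI_RANGE    = 4096,     /* IRQ numbers below XICS_IRQ_BASE */
    SKETCH_LARGEST_CPUS = 192 * 8,  /* 192 cores, SMT8 = 1536 threads */
};

/* Would one IPI per thread still fit if the system grew by "scale"? */
static bool sketch_ipis_fit(unsigned scale)
{
    return scale * SKETCH_LARGEST_CPUS <= SKETCH_IPI_RANGE;
}
```

So a 2x system (3072 threads) fits in the range and a 4x system (6144
threads) does not.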


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-11-30 15:38     ` Cédric Le Goater
@ 2017-12-01  4:17       ` David Gibson
  0 siblings, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-12-01  4:17 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Greg Kurz

[-- Attachment #1: Type: text/plain, Size: 2204 bytes --]

On Thu, Nov 30, 2017 at 03:38:46PM +0000, Cédric Le Goater wrote:
> >> +    } else {
> >> +        /* XIVE uses the full range of IRQ numbers. The CPU IPIs will
> >> +         * use the range below XICS_IRQ_BASE, which is unused by XICS. */
> >> +        spapr->xive = spapr_xive_create(spapr, XICS_IRQ_BASE + XICS_IRQS_SPAPR,
> >> +                                        &error_fatal);
> > 
> > XICS_IRQ_BASE == 4096, and XICS_IRQS_SPAPR (which we should rename at
> > some point) == 1024.
> 
> BTW, why XICS_IRQ_BASE == 4096 ? I could not find a reason for
> this offset.

It's basically arbitrary.  Possibly I copied the value used in
practice on a PowerVM system of the time, but I don't recall for sure.

> > So we have a total irq space of 5k, which is a bit odd.  I'd be ok
> > with rounding it out to 8k for newer machines if that's useful.
> 
> ok. and using a machine class value to maintain compatibility. That 
> would be useful if we allocate more PHBs. 
> 
> > Sparse allocations in there might make life easier for getting
> > consistent irq numbers without an "allocator" per se (because we can
> > use different regions for VIO, PCI intx, MSI, etc. etc.).
> 
> So, do you think we should modify the IRQ allocator routines to be 
> able to segment the IRQ number space and let devices specify the
> range they want to use ?

No, I'm suggesting *eliminating* the IRQ allocator routines (except
for backwards compat) and having devices "just know" their irq numbers
based on their own device number and the portion of the overall irq
space they're supposed to live in.

PCI MSI is an exception, obviously, it will need some sort of runtime
allocation.
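
Roughly what I have in mind, as a sketch only.  The region bases and
sizes below are invented for the example and are not a proposal for
the actual layout; the helper names are hypothetical too.

```c
#include <stdint.h>

/* Hypothetical carving of the IRQ number space into fixed regions. */
enum {
    SKETCH_IPI_BASE        = 0x0000,  /* CPU IPIs, below XICS_IRQ_BASE */
    SKETCH_VIO_BASE        = 0x1000,  /* one IRQ per VIO device */
    SKETCH_PHB_LSI_BASE    = 0x1100,  /* PCI INTx, 4 per PHB */
    SKETCH_PHB_LSI_PER_PHB = 4,
    SKETCH_MSI_BASE        = 0x1200,  /* still allocated at runtime */
};

/* A VIO device derives its IRQ from its own index in the region. */
static uint32_t sketch_vio_irq(uint32_t vio_index)
{
    return SKETCH_VIO_BASE + vio_index;
}

/* A PHB derives its LSI numbers from its index and the INTx pin (0-3). */
static uint32_t sketch_phb_lsi_irq(uint32_t phb_index, uint32_t intx_pin)
{
    return SKETCH_PHB_LSI_BASE
           + phb_index * SKETCH_PHB_LSI_PER_PHB
           + (intx_pin & 3);
}
```

Nothing here needs to track state, so the numbers come out the same on
both ends of a migration without an allocator being involved.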

> That would be useful for the PHB LSIs. The starting IRQ for the PHB
> could be aligned on some value depending on the PHB index, first 
> would come the LSI interrupts and then the MSIs which are allocated 
> later on by the guest. We would have predictable values.
> 
> Thanks,
> 
> C. 
> 


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model
  2017-12-01  4:03           ` David Gibson
@ 2017-12-01  8:02             ` Cédric Le Goater
  0 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-01  8:02 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

>>> That'd be ok.  Or call if sPAPRIVPE.  Or even call it TIMA.  I'd be
>>> fine with any of those.
>>
>> In this model, I am making a lot of shortcuts in the XIVE concepts
>> (which I don't master completely yet ...) 
>>
>> The IVPE is the part of the overall controller doing the interrupt 
>> presentation.
>>
>> The TIMA refers to the MMIO region in which the thread interrupt 
>> management is done. 
>>
>> The XIVE structure that contains the 'virtual processor' interrupt 
>> state is the NVT: Notification Virtual Target. An index to an NVT 
>> is stored in the EQs to do the routing. I did not introduce the NVT 
>> in sPAPRXive because it's rather big, 128 bytes, and we don't need 
>> much of it (NSR, CPPR, PIPR, IPB) but we could use a shortened one.
>>
>> So I think sPAPRXiveNVT, or sPAPRXiveVP (VP for virtual processor)
>> would be better names.
> 
> Ok.  I prefer sPAPRXiveNVT of these two.

Fine. Also, what about the location of the files? They are under
hw/intc/, but shouldn't we consider moving them under hw/spapr/?

Thanks,

C.

* Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-12-01  4:14       ` David Gibson
@ 2017-12-01  8:10         ` Cédric Le Goater
  2017-12-04  1:59           ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-01  8:10 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/01/2017 05:14 AM, David Gibson wrote:
> On Thu, Nov 30, 2017 at 03:15:09PM +0000, Cédric Le Goater wrote:
>> On 11/30/2017 05:55 AM, David Gibson wrote:
>>> On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote:
>>>> The XIVE object is designed to be always available, so it is created
>>>> unconditionally on newer machines.
>>>
>>> There doesn't actually seem to be anything dependent on machine
>>> version here.
>>
>> No. I thought that was too early in the patchset. This is handled 
>> in the last patch with a 'xive_exploitation' bool which is set to 
>> false on older machines. 
>>
>> But, nevertheless, the XIVE objects are always created even if not
>> used. Something to discuss.
> 
> That'll definitely break backwards migration, since the destination
> won't understand the (unused but still present) xive state it
> receives. 

No, because it is not sent: the vmstate 'needed' op of the sPAPRXive
object discards it:

    static bool vmstate_spapr_xive_needed(void *opaque)
    {
        sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
        return spapr->xive_exploitation;
    }

> So xives can only be created on new machine types. 

That would be better, I agree. I can probably use the 'xive_exploitation'
bool to condition its creation.

> I'm ok
> (at least tentatively) with always creating them on the newer machine
> types, regardless of whether the guest ends up exploiting it or not.

OK.


>>>> Depending on the configuration and
>>>> the guest capabilities, the CAS negotiation process will decide which
>>>> interrupt model to use, legacy or XIVE.
>>>>
>>>> The XIVE model makes use of the full range of the IRQ number space
>>>> because the IRQ numbers for the CPU IPIs are allocated in the range
>>>> below XICS_IRQ_BASE, which is unused by XICS.
>>>
>>> Ok.  And I take it 4096 is enough space for the XIVE IPIs for the
>>> foreseeable future?
>>
>> The biggest real system I am aware of has 16 sockets, 192 cores, SMT8.
>> That's 1536 cpus. pseries has a max_cpus of 1024.
> 
> Ok, so we can go to double the current system size, but not 4x.  Not
> sure if that seems adequate or not.  Still it's a relatively minor
> detail.
> 

* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-11-30 23:35       ` David Gibson
@ 2017-12-01 16:36         ` Cédric Le Goater
  2017-12-04  1:09           ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-01 16:36 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/01/2017 12:35 AM, David Gibson wrote:
> On Thu, Nov 30, 2017 at 02:06:27PM +0000, Cédric Le Goater wrote:
>> On 11/30/2017 04:38 AM, David Gibson wrote:
>>> On Thu, Nov 23, 2017 at 02:29:43PM +0100, Cédric Le Goater wrote:
>>>> The Event Queue Descriptor (EQD) table, also known as Event Notification
>>>> Descriptor (END), is one of the internal tables the XIVE interrupt
>>>> controller uses to redirect exception from event sources to CPU
>>>> threads.
>>>>
>>>> The EQD specifies on which Event Queue the event data should be posted
>>>> when an exception occurs (later on pulled by the OS) and which server
>>>> (VPD in XIVE terminology) to notify. The Event Queue is a much more
>>>> complex structure but we start with a simple model for the sPAPR
>>>> machine.
>>>
>>> Just to clarify my understanding a server / VPD in XIVE would
>>> typically correspond to a cpu - either real or virtual, yes?
>>
>> yes. VP for "virtual processor" and VPD for "virtual processor 
>> descriptor" which contains the XIVE interrupt state of the VP 
>> when not dispatched. It is still described in some documentation 
>> as an NVT : Notification Virtual Target.  
>>
>> XIVE concepts were renamed at some point, but the old names persisted.
>> I am still struggling my way through all the names.
>>
>>
>>>> There is one XiveEQ per priority and the model chooses to store them
>>>> under the Xive Interrupt presenter model. It will be retrieved, just
>>>> like for XICS, through the 'intc' object pointer of the CPU.
>>>>
>>>> The EQ indexing follows a simple pattern:
>>>>
>>>>        (server << 3) | (priority & 0x7)
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/intc/spapr_xive.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  hw/intc/xive-internal.h | 50 +++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 106 insertions(+)
>>>>
>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>> index 554b25e0884c..983317a6b3f6 100644
>>>> --- a/hw/intc/spapr_xive.c
>>>> +++ b/hw/intc/spapr_xive.c
>>>> @@ -23,6 +23,7 @@
>>>>  #include "sysemu/dma.h"
>>>>  #include "monitor/monitor.h"
>>>>  #include "hw/ppc/spapr_xive.h"
>>>> +#include "hw/ppc/spapr.h"
>>>>  #include "hw/ppc/xics.h"
>>>>  
>>>>  #include "xive-internal.h"
>>>> @@ -34,6 +35,8 @@ struct sPAPRXiveICP {
>>>>      uint8_t   tima[TM_RING_COUNT * 0x10];
>>>>      uint8_t   *tima_os;
>>>>      qemu_irq  output;
>>>> +
>>>> +    XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
>>>>  };
>>>>  
>>>>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
>>>> @@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
>>>>      },
>>>>  };
>>>>  
>>>> +static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
>>>> +{
>>>> +    PowerPCCPU *cpu = spapr_find_cpu(server);
>>>> +
>>>> +    return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
>>>> +}
>>>> +
>>>>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>>>  {
>>>>  
>>>> @@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
>>>>      sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
>>>>  
>>>>      memset(xicp->tima, 0, sizeof(xicp->tima));
>>>> +
>>>> +    memset(xicp->eqt, 0, sizeof(xicp->eqt));
>>>>  }
>>>>  
>>>>  static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
>>>> @@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
>>>>      xicp->tima_os = &xicp->tima[TM_QW1_OS];
>>>>  }
>>>>  
>>>> +static const VMStateDescription vmstate_spapr_xive_icp_eq = {
>>>> +    .name = TYPE_SPAPR_XIVE_ICP "/eq",
>>>> +    .version_id = 1,
>>>> +    .minimum_version_id = 1,
>>>> +    .fields = (VMStateField []) {
>>>> +        VMSTATE_UINT32(w0, XiveEQ),
>>>> +        VMSTATE_UINT32(w1, XiveEQ),
>>>> +        VMSTATE_UINT32(w2, XiveEQ),
>>>> +        VMSTATE_UINT32(w3, XiveEQ),
>>>> +        VMSTATE_UINT32(w4, XiveEQ),
>>>> +        VMSTATE_UINT32(w5, XiveEQ),
>>>> +        VMSTATE_UINT32(w6, XiveEQ),
>>>> +        VMSTATE_UINT32(w7, XiveEQ),
>>>
>>> Wow.  Super descriptive field names there, but I guess that's not your fault.
>>
>> The defines in the "xive-internal.h" give a better view ... 
>>
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    },
>>>> +};
>>>> +
>>>>  static bool vmstate_spapr_xive_icp_needed(void *opaque)
>>>>  {
>>>>      /* TODO check machine XIVE support */
>>>> @@ -696,6 +725,8 @@ static const VMStateDescription vmstate_spapr_xive_icp = {
>>>>      .needed = vmstate_spapr_xive_icp_needed,
>>>>      .fields = (VMStateField[]) {
>>>>          VMSTATE_BUFFER(tima, sPAPRXiveICP),
>>>> +        VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 1), 1,
>>>> +                             vmstate_spapr_xive_icp_eq, XiveEQ),
>>>>          VMSTATE_END_OF_LIST()
>>>>      },
>>>>  };
>>>> @@ -755,3 +786,28 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
>>>>      ive->w &= ~IVE_VALID;
>>>>      return true;
>>>>  }
>>>> +
>>>> +/*
>>>> + * Use a simple indexing for the EQs.
>>>
>>> Is this server+priority encoding architected anywhere?  
>>
>> no. This is a model shortcut.
>>
>>> Otherwise, why not use separate parameters?
>>
>> yes. spapr_xive_get_eq() could use separate parameters and it would
>> shorten some of the hcalls.
>>
>> The result is stored in a single field of the IVE, EQ_INDEX. So I will 
>> still need mangle/demangle routines but these could be simple macros.
>> I will look at it.
> 
> Hm, ok.  So it's architected in the sense that you're using the
> encoding from the EQ_INDEX field throughout.  That could be a
> reasonable choice, I can't really tell yet.
> 
> On the other hand, it might be easier to read if we use server and
> priority as separate parameters until the point we actually encode
> into the EQ_INDEX field.

In the architecture, the EQ_INDEX field contains an index to an
Event Queue Descriptor, and the Event Queue Descriptor has an
EQ_W6_NVT_INDEX field pointing to a Notification Virtual Target.
So the HW uses two extra tables, one for the EQs and one for the
NVTs.

In the sPAPR model, an EQ array is stored under the sPAPRXiveNVT
object, which is stored under the ->intc pointer of the CPUState
object.

So the EQ_INDEX field is really taking a shortcut, encoding
the cpu number and the priority to find an EQ, and the
EQ_W6_NVT_INDEX field holds a value which is the cpu number.
In the end, we save two tables.
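
The mangle/demangle routines could look something like the sketch below.
The helper names are hypothetical; only the encoding itself,
(server << 3) | (priority & 0x7), comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define XIVE_PRIORITY_MAX 7   /* same value as in xive-internal.h */

/* Hypothetical helper: pack a (server, priority) pair into the value
 * stored in the IVE EQ_INDEX field, using the model's simple indexing. */
static inline uint32_t eq_index_encode(uint32_t server, uint8_t priority)
{
    assert(priority <= XIVE_PRIORITY_MAX);
    return (server << 3) | (priority & 0x7);
}

/* Hypothetical helpers: recover each half from an encoded index. */
static inline uint32_t eq_index_server(uint32_t eq_idx)
{
    return eq_idx >> 3;
}

static inline uint8_t eq_index_priority(uint32_t eq_idx)
{
    return eq_idx & 0x7;
}
```

Callers would then pass server and priority separately and only encode at
the point where the IVE is written.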

C.


> 
>>
>>>> + */
>>>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t eq_idx)
>>>> +{
>>>> +    int priority = eq_idx & 0x7;
>>>> +    sPAPRXiveICP *xicp = spapr_xive_icp_get(xive, eq_idx >> 3);
>>>> +
>>>> +    return xicp ? &xicp->eqt[priority] : NULL;
>>>> +}
>>>> +
>>>> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server,
>>>> +                              uint8_t priority, uint32_t *out_eq_idx)
>>>> +{
>>>> +    if (priority > XIVE_PRIORITY_MAX) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    if (out_eq_idx) {
>>>> +        *out_eq_idx = (server << 3) | (priority & 0x7);
>>>> +    }
>>>> +
>>>> +    return true;
>>>> +}
>>>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>>>> index 7d329f203a9b..c3949671aa03 100644
>>>> --- a/hw/intc/xive-internal.h
>>>> +++ b/hw/intc/xive-internal.h
>>>> @@ -131,9 +131,59 @@ typedef struct XiveIVE {
>>>>  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>>>>  } XiveIVE;
>>>>  
>>>> +/* EQ */
>>>> +typedef struct XiveEQ {
>>>> +        uint32_t        w0;
>>>> +#define EQ_W0_VALID             PPC_BIT32(0)
>>>> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
>>>> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
>>>> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
>>>> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
>>>> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
>>>> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
>>>> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
>>>> +#define EQ_W0_SW0               PPC_BIT32(16)
>>>> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
>>>> +#define EQ_QSIZE_4K             0
>>>> +#define EQ_QSIZE_64K            4
>>>> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
>>>> +        uint32_t        w1;
>>>> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
>>>> +#define EQ_W1_ESn_P             PPC_BIT32(0)
>>>> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
>>>> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
>>>> +#define EQ_W1_ESe_P             PPC_BIT32(2)
>>>> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
>>>> +#define EQ_W1_GENERATION        PPC_BIT32(9)
>>>> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
>>>> +        uint32_t        w2;
>>>> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
>>>> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
>>>> +        uint32_t        w3;
>>>> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
>>>> +        uint32_t        w4;
>>>> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
>>>> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
>>>> +        uint32_t        w5;
>>>> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
>>>> +        uint32_t        w6;
>>>> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
>>>> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
>>>> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
>>>> +        uint32_t        w7;
>>>> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
>>>> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
>>>> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
>>>> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
>>>> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
>>>> +} XiveEQ;
>>>> +
>>>>  #define XIVE_PRIORITY_MAX  7
>>>>  
>>>>  void spapr_xive_reset(void *dev);
>>>>  XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
>>>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
>>>> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server, uint8_t prio,
>>>> +                              uint32_t *out_eq_idx);
>>>>  
>>>>  #endif /* _INTC_XIVE_INTERNAL_H */
>>>
>>
> 

* Re: [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue
  2017-12-01  4:10       ` David Gibson
@ 2017-12-01 16:43         ` Cédric Le Goater
  2017-12-02 14:45         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-01 16:43 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/01/2017 05:10 AM, David Gibson wrote:
> On Thu, Nov 30, 2017 at 02:16:30PM +0000, Cédric Le Goater wrote:
>> On 11/30/2017 04:49 AM, David Gibson wrote:
>>> On Thu, Nov 23, 2017 at 02:29:44PM +0100, Cédric Le Goater wrote:
>>>> If a triggered event is let through, the Event Queue data defined in the
>>>> associated IVE is pushed in the in-memory event queue. The latter is a
>>>> circular buffer provided by the OS using the H_INT_SET_QUEUE_CONFIG hcall,
>>>> one per server and priority couple. It is composed of Event Queue entries
>>>> which are 4 bytes long, the first bit being a 'generation' bit and the 31
>>>> following bits the EQ Data field.
>>>>
>>>> The EQ Data field provides a way to set an invariant logical event source
>>>> number for an IRQ. It is set with the H_INT_SET_SOURCE_CONFIG hcall.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/intc/spapr_xive.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 67 insertions(+)
>>>>
>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>> index 983317a6b3f6..df14c5a88275 100644
>>>> --- a/hw/intc/spapr_xive.c
>>>> +++ b/hw/intc/spapr_xive.c
>>>> @@ -193,9 +193,76 @@ static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
>>>>      return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
>>>>  }
>>>>  
>>>> +static void spapr_xive_eq_push(XiveEQ *eq, uint32_t data)
>>>> +{
>>>> +    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
>>>> +    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
>>>> +    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
>>>> +    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
>>>> +
>>>> +    uint64_t qaddr = qaddr_base + (qindex << 2);
>>>> +    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
>>>> +    uint32_t qentries = 1 << (qsize + 10);
>>>> +
>>>> +    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
>>>
>>> This suggests that uint32_t data contains guest endian data, which it
>>> generally shouldn't.  Better to use stl_be_dma() (or whatever is
>>> appropriate for the endianness of the data field).
>>
>> There are no requirement on the endianness of the data field and 
>> it is just stored in the IVE in the hcall H_INT_SET_SOURCE_CONFIG. 
>> So the guest can pass whatever it likes.  
> 
> Hm, ok.  Guest endian (or at least, not definitively host-endian) data
> in a plain uint32_t makes me uncomfortable.  Could we use char data[4]
> instead, to make it clear it's a byte-ordered buffer, rather than a
> number as far as the XIVE is concerned.
> 
> Hm.. except that doesn't quite work, because the hardware must define
> which end that generation bit ends up in...

Sorry, this is BE. My bad.
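
For reference, a minimal sketch of the entry layout and of the index
handling, assuming the big-endian format described in the commit log
(generation bit first, then the 31-bit EQ Data field). The function names
are hypothetical and this is not the code of the patch, which goes through
cpu_to_be32() and dma_memory_write().

```c
#include <assert.h>
#include <stdint.h>

/* Build one 4-byte EQ entry in big-endian byte order: the generation bit
 * is the most significant bit, followed by the 31-bit EQ Data field. */
static inline void eq_entry_store_be(uint8_t out[4], uint32_t qgen,
                                     uint32_t data)
{
    uint32_t entry = ((qgen & 1u) << 31) | (data & 0x7fffffffu);

    out[0] = (uint8_t)(entry >> 24);
    out[1] = (uint8_t)(entry >> 16);
    out[2] = (uint8_t)(entry >> 8);
    out[3] = (uint8_t)entry;
}

/* Advance the queue index, flipping the generation bit on wrap-around.
 * qentries is the number of 4-byte entries in the queue. */
static inline void eq_advance(uint32_t *qindex, uint32_t *qgen,
                              uint32_t qentries)
{
    assert(qentries != 0);
    *qindex = (*qindex + 1) % qentries;
    if (*qindex == 0) {
        *qgen ^= 1u;
    }
}
```

Writing the bytes explicitly makes it clear that the buffer never holds a
host-endian number, whatever the host is.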

C.
 
>>>> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
>>>> +                      HWADDR_PRIx "\n", __func__, qaddr);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    qindex = (qindex + 1) % qentries;
>>>> +    if (qindex == 0) {
>>>> +        qgen ^= 1;
>>>> +        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
>>>> +    }
>>>> +    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
>>>> +}
>>>> +
>>>>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>>>  {
>>>> +    XiveIVE *ive;
>>>> +    XiveEQ *eq;
>>>> +    uint32_t eq_idx;
>>>> +    uint8_t priority;
>>>> +
>>>> +    ive = spapr_xive_get_ive(xive, lisn);
>>>> +    if (!ive || !(ive->w & IVE_VALID)) {
>>>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
>>>
>>> As mentioned on other patches, I'm a little concerned by these
>>> guest-triggerable logs.  I guess the LOG_GUEST_ERROR mask will save
>>> us, though.
>>
>> I want to track 'invalid' interrupts but I haven't seen these show up 
>> in my tests. I agree there are a little too many and some could just 
>> be asserts.
> 
> Uh.. I don't think many can be assert()s.  assert() is only
> appropriate if it being tripped definitely indicates a bug in qemu.
> Nearly all these qemu_log()s I've seen can be tripped by the guest
> doing something bad, which absolutely should not assert() qemu.
> 

* Re: [Qemu-devel] [PATCH 19/25] spapr: add hcalls support for the XIVE interrupt mode
  2017-12-01  4:01   ` David Gibson
@ 2017-12-01 17:46     ` Cédric Le Goater
  2017-12-05  7:00       ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-01 17:46 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/01/2017 05:01 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:49PM +0100, Cédric Le Goater wrote:
>> A set of hypervisor calls is used to configure the interrupt sources
>> and the event/notification queues of the guest:
>>
>>  - H_INT_GET_SOURCE_INFO
>>
>>    used to obtain the address of the MMIO page of the Event State
>>    Buffer (PQ bits) entry associated with the source.
>>
>>  - H_INT_SET_SOURCE_CONFIG
>>
>>    assigns a source to a "target".
>>
>>  - H_INT_GET_SOURCE_CONFIG
>>
>>    determines which "target" and "priority" are assigned to a source
>>
>>  - H_INT_GET_QUEUE_INFO
>>
>>    returns the address of the notification management page associated
>>    with the specified "target" and "priority".
>>
>>  - H_INT_SET_QUEUE_CONFIG
>>
>>    sets or resets the event queue for a given "target" and "priority".
>>    It is also used to set the notification config associated with the
>>    queue, only unconditional notification for the moment.  Reset is
>>    performed with a queue size of 0 and queueing is disabled in that
>>    case.
>>
>>  - H_INT_GET_QUEUE_CONFIG
>>
>>    returns the queue settings for a given "target" and "priority".
>>
>>  - H_INT_RESET
>>
>>    resets all of the partition's interrupt exploitation structures to
>>    their initial state, losing all configuration set via the hcalls
>>    H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
>>
>>  - H_INT_SYNC
>>
>>    issues a synchronisation on a source to make sure all
>>    notifications have reached their queue.
>>
>> Calls that still need to be addressed :
>>
>>    H_INT_SET_OS_REPORTING_LINE
>>    H_INT_GET_OS_REPORTING_LINE
>>
>> See the code for more documentation on each hcall.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/Makefile.objs       |   2 +-
>>  hw/intc/spapr_xive_hcall.c  | 885 ++++++++++++++++++++++++++++++++++++++++++++
>>  hw/ppc/spapr.c              |   2 +
>>  include/hw/ppc/spapr.h      |  15 +-
>>  include/hw/ppc/spapr_xive.h |   4 +
>>  5 files changed, 906 insertions(+), 2 deletions(-)
>>  create mode 100644 hw/intc/spapr_xive_hcall.c
>>
>> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
>> index 49e13e7aeeee..122e2ec77e8d 100644
>> --- a/hw/intc/Makefile.objs
>> +++ b/hw/intc/Makefile.objs
>> @@ -35,7 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
>>  obj-$(CONFIG_XICS) += xics.o
>>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
>> -obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
>> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o spapr_xive_hcall.o
>>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
>> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
>> new file mode 100644
>> index 000000000000..676fe0e2d5c7
>> --- /dev/null
>> +++ b/hw/intc/spapr_xive_hcall.c
>> @@ -0,0 +1,885 @@
>> +/*
>> + * QEMU PowerPC sPAPR XIVE model
>> + *
>> + * Copyright (c) 2017, IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +#include "qemu/osdep.h"
>> +#include "qemu/log.h"
>> +#include "qapi/error.h"
>> +#include "cpu.h"
>> +#include "hw/ppc/spapr.h"
>> +#include "hw/ppc/spapr_xive.h"
>> +#include "hw/ppc/fdt.h"
>> +#include "monitor/monitor.h"
>> +
>> +#include "xive-internal.h"
>> +
>> +/* Priority ranges reserved by the hypervisor. The Linux driver is
>> + * expected to choose priority 6.
>> + */
>> +static const uint32_t reserved_priorities[] = {
>> +    7,    /* start */
>> +    0xf8, /* count */
>> +};
>> +
>> +static bool priority_is_valid(uint32_t priority)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(reserved_priorities) / 2; i++) {
>> +        uint32_t base  = reserved_priorities[2 * i];
>> +        uint32_t count = reserved_priorities[2 * i + 1];
>> +
>> +        if (priority >= base && priority < base + count) {
>> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: priority %d is reserved\n",
>> +                          __func__, priority);
>> +            return false;
>> +        }
>> +    }
>> +
>> +    return true;
>> +}
> 
> This seems like overkill.  Aren't there only 0..7 levels supported in
> hardware, in which case a one byte bitmap will suffice to store the
> reserved levels.

I was trying to use the same array that will be exposed in the device
tree in the "ibm,plat-res-int-priorities" property, defined as
follows in PAPR:

	property name that designates to the client program that the
	platform has reserved one or more interrupt priorities for its
	own use.
	
	prop-encoded-value: one or more (interrupt priority, range)
	pairs, where interrupt priority is a single cell hexadecimal
	number between 0x00 and 0xFF, and range is an integer encoded as
	with encode-int that represents the number of contiguous
	interrupt priorities that have been reserved by the platform for
	its internal use.


But I agree, it's a bit overkill to check for 0..7 levels ...
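
A one-byte bitmap version could look like the sketch below. The macro and
function names are hypothetical, and unlike the array-based check it
treats every value above 7 as invalid, so the 0xff "reset" priority has to
be handled by the caller before this test, as the hcall already does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per priority 0..7; bit N set means priority N is reserved.
 * Only priority 7 is reserved here. The macro name is hypothetical. */
#define RESERVED_PRIORITY_BITMAP  ((uint8_t)(1u << 7))

static inline bool priority_is_valid_bitmap(uint32_t priority)
{
    if (priority > 7) {
        return false;   /* outside the 0..7 hardware levels */
    }
    return !(RESERVED_PRIORITY_BITMAP & (1u << priority));
}
```

The device tree property would still have to be generated from the bitmap
as (start, count) pairs.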
 
> To check my understanding again, if you're running this with KVM, the
> host kernel and qemu will need to agree on which are the reserved
> levels, yes?

Hmm, these values are quite static. So I don't think there will be 
any sort of exchange between KVM and QEMU to define the range to 
expose to the guest. 

For the moment, Linux only uses one priority, the lowest, and Ben
has introduced in OPAL an automatic interrupt escalation feature
using queue 7 for all other queues (DD2.0 cpus). So we only expose 
range 0..6 to the guest for this purpose.

So we agreed orally.

> 
>> +
>> +/*
>> + * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
>> + * real address of the MMIO page through which the Event State Buffer
>> + * entry associated with the value of the "lisn" parameter is managed.
>> + *
>> + * Parameters:
>> + * Input
>> + * - "flags"
>> + *       Bits 0-63 reserved
>> + * - "lisn" is per "interrupts", "interrupt-map", or
>> + *       "ibm,xive-lisn-ranges" properties, or as returned by the
>> + *       ibm,query-interrupt-source-number RTAS call, or as returned
>> + *       by the H_ALLOCATE_VAS_WINDOW hcall
>> + *
>> + * Output
>> + * - R4: "flags"
>> + *       Bits 0-59: Reserved
>> + *       Bit 60: H_INT_ESB must be used for Event State Buffer
>> + *               management
>> + *       Bit 61: 1 == LSI  0 == MSI
>> + *       Bit 62: the full function page supports trigger
>> + *       Bit 63: Store EOI Supported
>> + * - R5: Logical Real address of full function Event State Buffer
>> + *       management page, -1 if ESB hcall flag is set to 1.
>> + * - R6: Logical Real Address of trigger only Event State Buffer
>> + *       management page or -1.
>> + * - R7: Power of 2 page size for the ESB management pages returned in
>> + *       R5 and R6.
>> + */
>> +static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
>> +                                          sPAPRMachineState *spapr,
>> +                                          target_ulong opcode,
>> +                                          target_ulong *args)
>> +{
>> +    sPAPRXive *xive = spapr->xive;
>> +    XiveIVE *ive;
>> +    target_ulong flags  = args[0];
>> +    target_ulong lisn   = args[1];
>> +    uint64_t mmio_base;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
> 
> Is H_FUNCTION required by the PAPR ACRs here?  

Yes. Quoting the specs:

	/* H_Function: The calling OS is not in exploitation mode */

I need to review all of the returned errors once more but, last time
I checked, they looked sane.

> Usually we only use
> H_FUNCTION if the hypercall doesn't exist at all, and if unavailable
> for other reasons use H_AUTHORITY or something.
> 
>> +    }
>> +
>> +    if (flags) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    /*
>> +     * H_STATE should be returned if a H_INT_RESET is in progress.
>> +     * This is not needed when running the emulation under QEMU
>> +     */
>> +
>> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID)) {
>> +        return H_P2;
>> +    }
>> +
>> +    mmio_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * lisn;
> 
> Hrm.. why was xive->esb_base not already a u64?

It's an 'hwaddr'. Yes, I can remove the cast.
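
Since 'hwaddr' is already a 64-bit type in QEMU, the computation can be
done without any cast on the base. A minimal standalone sketch, with a
hypothetical function name and plain uint64_t standing in for 'hwaddr':

```c
#include <assert.h>
#include <stdint.h>

/* Address of the ESB management page of a source: pages of
 * (1 << esb_shift) bytes are laid out back to back from esb_base,
 * one per LISN. The function name is hypothetical. */
static inline uint64_t esb_page_addr(uint64_t esb_base, uint32_t esb_shift,
                                     uint32_t lisn)
{
    assert(esb_shift < 64);
    return esb_base + ((uint64_t)lisn << esb_shift);
}
```

The only widening that matters is on the LISN, so that the shift is done
on 64 bits.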

>> +    args[0] = 0;
>> +    if (spapr_xive_irq_is_lsi(xive, lisn)) {
>> +        args[0] |= XIVE_SRC_LSI;
>> +    }
>> +    if (xive->flags & XIVE_SRC_TRIGGER) {
>> +        args[0] |= XIVE_SRC_TRIGGER;
>> +    }
>> +
>> +    if (xive->flags & XIVE_SRC_H_INT_ESB) {

Btw, this is why I have the ->flags field. Do you still want me to
remove it? I would like to keep the logic below, but it is no big
deal if not.

>> +        args[1] = -1; /* never used in QEMU  */
>> +        args[2] = -1;
>> +    } else {
>> +        args[1] = mmio_base;
>> +        if (xive->flags & XIVE_SRC_TRIGGER) {
>> +            args[2] = -1; /* No specific trigger page */
>> +        } else {
>> +            args[2] = -1; /* TODO: support for specific trigger page */
>> +        }
>> +    }
> 
> What does the availability of SRC_TRIGGER (and INT_ESB) depend on? 

The CPU revision. But we won't introduce XIVE exploitation mode on
anything other than DD2.0, which has full XIVE support, even for
STORE_EOI, which we should be adding.

> If it varies with host capabilities, that's going to be real pain for
> migration.

Yes. I am not aware of any future extension but I agree this is
something we need to keep an eye on.
 
>> +
>> +    args[3] = xive->esb_shift;
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
>> + * Interrupt Source to a target. The Logical Interrupt Source is
>> + * designated with the "lisn" parameter and the target is designated
>> + * with the "target" and "priority" parameters.  Upon return from the
>> + * hcall(), no additional interrupts will be directed to the old EQ.
>> + *
>> + * TODO: The old EQ should be investigated for interrupts that
>> + * occurred prior to or during the hcall().
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-61: Reserved
>> + *      Bit 62: set the "eisn" in the EA
>> + *      Bit 63: masks the interrupt source in the hardware interrupt
>> + *      control structure. An interrupt masked by this mechanism will
>> + *      be dropped, but its source state bits will still be
>> + *      set. There is no race-free way of unmasking and restoring the
>> + *      source. Thus this should only be used in interrupts that are
>> + *      also masked at the source, and only in cases where the
>> + *      interrupt is not meant to be used for a large amount of time
>> + *      because no valid target exists for it for example
>> + * - "lisn" is per "interrupts", "interrupt-map", or
>> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
>> + *      ibm,query-interrupt-source-number RTAS call, or as returned by
>> + *      the H_ALLOCATE_VAS_WINDOW hcall
>> + * - "target" is per "ibm,ppc-interrupt-server#s" or
>> + *      "ibm,ppc-interrupt-gserver#s"
>> + * - "priority" is a valid priority not in
>> + *      "ibm,plat-res-int-priorities"
>> + * - "eisn" is the guest EISN associated with the "lisn"
>> + *
>> + * Output:
>> + * - None
>> + */
>> +
>> +#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
>> +#define XIVE_SRC_MASK     (1ull << (63 - 63))
> 
> Aren't there already a bunch of macros you have for defining things in
> terms of IBM bit numbers, so you can avoid open coding (63 - whatever).

Yes. 

On that topic, could we include the PPC_BIT* macros somewhere under ppc ? 
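
Something along these lines is what I have in mind for the two flags. This
is only a sketch: IBM_BIT64 is a hypothetical stand-in name for whichever
PPC_BIT-style macro we end up sharing.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word.
 * IBM_BIT64 is a hypothetical name used here for illustration. */
#define IBM_BIT64(bit)  (0x8000000000000000ull >> (bit))

#define XIVE_SRC_SET_EISN  IBM_BIT64(62)
#define XIVE_SRC_MASK      IBM_BIT64(63)

/* Compile-time checks (C11) that this matches the open-coded form. */
static_assert(XIVE_SRC_SET_EISN == (1ull << (63 - 62)), "bit 62 mismatch");
static_assert(XIVE_SRC_MASK == (1ull << (63 - 63)), "bit 63 mismatch");
```

The values are unchanged, only the spelling follows the bit numbers used
in the specs.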

>> +
>> +static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
>> +                                            sPAPRMachineState *spapr,
>> +                                            target_ulong opcode,
>> +                                            target_ulong *args)
>> +{
>> +    XiveIVE *ive;
>> +    uint64_t new_ive;
>> +    target_ulong flags    = args[0];
>> +    target_ulong lisn     = args[1];
>> +    target_ulong target   = args[2];
>> +    target_ulong priority = args[3];
>> +    target_ulong eisn     = args[4];
>> +    uint32_t eq_idx;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags & ~(XIVE_SRC_SET_EISN | XIVE_SRC_MASK)) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    /*
>> +     * H_STATE should be returned if a H_INT_RESET is in progress.
>> +     * This is not needed when running the emulation under QEMU
>> +     */
>> +
>> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID)) {
>> +        return H_P2;
>> +    }
>> +
>> +    /* priority 0xff is used to reset the IVE */
>> +    if (priority == 0xff) {
>> +        new_ive = IVE_VALID | IVE_MASKED;
>> +        goto out;
>> +    }
>> +
>> +    new_ive = ive->w;
>> +
>> +    if (flags & XIVE_SRC_MASK) {
>> +        new_ive = ive->w | IVE_MASKED;
>> +    } else {
>> +        new_ive = ive->w & ~IVE_MASKED;
>> +    }
>> +
>> +    if (!priority_is_valid(priority)) {
>> +        return H_P4;
>> +    }
>> +
>> +    /* TODO: If the partition thread count is greater than the
>> +     * hardware thread count, validate the "target" has a
>> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
>> +     */
> 
> What's this about?  

That is from the specs and I haven't quite figured out what it meant.
I need to ask.

> I thought the point of XIVE was you could set up
> target queues for your vcpus regardless of mapping to physical cpus.

yes.

>> +    /* Validate that "target" is part of the list of threads allocated
>> +     * to the partition. For that, find the EQ corresponding to the
>> +     * target.
>> +     */
>> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
>> +        return H_P3;
>> +    }
>> +
>> +    new_ive = SETFIELD(IVE_EQ_BLOCK, new_ive, 0ul);
>> +    new_ive = SETFIELD(IVE_EQ_INDEX, new_ive, eq_idx);
>> +
>> +    if (flags & XIVE_SRC_SET_EISN) {
>> +        new_ive = SETFIELD(IVE_EQ_DATA, new_ive, eisn);
>> +    }
>> +
>> +out:
>> +    /* TODO: handle syncs ? */
>> +
>> +    /* And update */
>> +    ive->w = new_ive;
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine which
>> + * target/priority pair is assigned to the specified Logical Interrupt
>> + * Source.
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-63 Reserved
>> + * - "lisn" is per "interrupts", "interrupt-map", or
>> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
>> + *      ibm,query-interrupt-source-number RTAS call, or as
>> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
>> + *
>> + * Output:
>> + * - R4: Target to which the specified Logical Interrupt Source is
>> + *       assigned
>> + * - R5: Priority to which the specified Logical Interrupt Source is
>> + *       assigned
>> + * - R6: EISN for the specified Logical Interrupt Source (this will be
>> + *       equivalent to the LISN if not changed by H_INT_SET_SOURCE_CONFIG)
>> + */
>> +static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
>> +                                            sPAPRMachineState *spapr,
>> +                                            target_ulong opcode,
>> +                                            target_ulong *args)
>> +{
>> +    target_ulong flags = args[0];
>> +    target_ulong lisn = args[1];
>> +    XiveIVE *ive;
>> +    XiveEQ *eq;
>> +    uint32_t eq_idx;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    /*
>> +     * H_STATE should be returned if a H_INT_RESET is in progress.
>> +     * This is not needed when running the emulation under QEMU
>> +     */
>> +
>> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID)) {
>> +        return H_P2;
>> +    }
>> +
>> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
>> +    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
>> +    if (!eq) {
>> +        return H_HARDWARE;
>> +    }
>> +
>> +    args[0] = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
>> +
>> +    if (ive->w & IVE_MASKED) {
>> +        args[1] = 0xff;
>> +    } else {
>> +        args[1] = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
>> +    }
>> +
>> +    args[2] = GETFIELD(IVE_EQ_DATA, ive->w);
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_GET_QUEUE_INFO hcall() is used to get the logical real
>> + * address of the notification management page associated with the
>> + * specified target and priority.
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *       Bits 0-63 Reserved
>> + * - "target" is per "ibm,ppc-interrupt-server#s" or
>> + *       "ibm,ppc-interrupt-gserver#s"
>> + * - "priority" is a valid priority not in
>> + *       "ibm,plat-res-int-priorities"
>> + *
>> + * Output:
>> + * - R4: Logical real address of notification page
>> + * - R5: Power of 2 page size of the notification page
>> + */
>> +static target_ulong h_int_get_queue_info(PowerPCCPU *cpu,
>> +                                         sPAPRMachineState *spapr,
>> +                                         target_ulong opcode,
>> +                                         target_ulong *args)
>> +{
>> +    target_ulong flags    = args[0];
>> +    target_ulong target   = args[1];
>> +    target_ulong priority = args[2];
>> +    uint32_t eq_idx;
>> +    XiveEQ *eq;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    /*
>> +     * H_STATE should be returned if a H_INT_RESET is in progress.
>> +     * This is not needed when running the emulation under QEMU
>> +     */
>> +
>> +    if (!priority_is_valid(priority)) {
>> +        return H_P3;
>> +    }
>> +
>> +    /* Validate that "target" is part of the list of threads allocated
>> +     * to the partition. For that, find the EQ corresponding to the
>> +     * target.
>> +     */
>> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
>> +        return H_P2;
>> +    }
>> +
>> +    /* TODO: If the partition thread count is greater than the
>> +     * hardware thread count, validate the "target" has a
>> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
>> +     */
>> +
>> +    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
>> +    if (!eq)  {
>> +        return H_HARDWARE;
>> +    }
>> +
>> +    args[0] = -1; /* TODO: return ESn page */
>> +    if (eq->w0 & EQ_W0_ENQUEUE) {
>> +        args[1] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
>> +    } else {
>> +        args[1] = 0;
>> +    }
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_SET_QUEUE_CONFIG hcall() is used to set or reset an EQ for
>> + * a given "target" and "priority".  It is also used to set the
>> + * notification config associated with the EQ.  An EQ size of 0 is
>> + * used to reset the EQ config for a given target and priority. If
>> + * resetting the EQ config, the END associated with the given "target"
>> + * and "priority" will be changed to disable queueing.
>> + *
>> + * Upon return from the hcall(), no additional interrupts will be
>> + * directed to the old EQ (if one was set). The old EQ (if one was
>> + * set) should be investigated for interrupts that occurred prior to
>> + * or during the hcall().
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-62: Reserved
>> + *      Bit 63: Unconditional Notify (n) per the XIVE spec
>> + * - "target" is per "ibm,ppc-interrupt-server#s" or
>> + *       "ibm,ppc-interrupt-gserver#s"
>> + * - "priority" is a valid priority not in
>> + *       "ibm,plat-res-int-priorities"
>> + * - "eventQueue": The logical real address of the start of the EQ
>> + * - "eventQueueSize": The power of 2 EQ size per "ibm,xive-eq-sizes"
>> + *
>> + * Output:
>> + * - None
>> + */
>> +
>> +#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))
>> +
>> +static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
>> +                                           sPAPRMachineState *spapr,
>> +                                           target_ulong opcode,
>> +                                           target_ulong *args)
>> +{
>> +    target_ulong flags    = args[0];
>> +    target_ulong target   = args[1];
>> +    target_ulong priority = args[2];
>> +    target_ulong qpage    = args[3];
>> +    target_ulong qsize    = args[4];
>> +    uint32_t eq_idx;
>> +    XiveEQ *old_eq;
>> +    XiveEQ eq;
>> +    uint32_t qdata;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags & ~XIVE_EQ_ALWAYS_NOTIFY) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    /*
>> +     * H_STATE should be returned if a H_INT_RESET is in progress.
>> +     * This is not needed when running the emulation under QEMU
>> +     */
>> +
>> +    if (!priority_is_valid(priority)) {
>> +        return H_P3;
>> +    }
>> +
>> +    /* Validate that "target" is part of the list of threads allocated
>> +     * to the partition. For that, find the EQ corresponding to the
>> +     * target.
>> +     */
>> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
>> +        return H_P2;
>> +    }
>> +
>> +    /* TODO: If the partition thread count is greater than the
>> +     * hardware thread count, validate the "target" has a
>> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
>> +     */
>> +
>> +    old_eq = spapr_xive_get_eq(spapr->xive, eq_idx);
>> +    if (!old_eq)  {
>> +        return H_HARDWARE;
>> +    }
>> +
>> +    eq = *old_eq;
>> +
>> +    switch (qsize) {
>> +    case 12:
>> +    case 16:
>> +    case 21:
>> +    case 24:
>> +        eq.w3 = ((uint64_t)qpage) & 0xffffffff;
>> +        eq.w2 = (((uint64_t)qpage)) >> 32 & 0x0fffffff;
>> +        eq.w0 |= EQ_W0_ENQUEUE;
>> +        eq.w0 = SETFIELD(EQ_W0_QSIZE, eq.w0, qsize - 12);
>> +        break;
>> +    case 0:
>> +        /* reset queue and disable queueing */
>> +        eq.w2 = eq.w3 = 0;
>> +        eq.w0 &= ~EQ_W0_ENQUEUE;
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid EQ size %"PRIx64"\n",
>> +                      __func__, qsize);
>> +        return H_P5;
>> +    }
>> +
>> +    if (qsize) {
>> +        /*
>> +         * Let's validate the EQ address with a read of the first EQ
>> +         * entry. We could also check that the full queue has been
>> +         * zeroed by the OS.
>> +         */
>> +        if (address_space_read(&address_space_memory, qpage,
>> +                               MEMTXATTRS_UNSPECIFIED,
>> +                               (uint8_t *) &qdata, sizeof(qdata))) {
>> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to read EQ data @0x%"
>> +                          HWADDR_PRIx "\n", __func__, qpage);
>> +            return H_P4;
>> +        }
>> +    }
>> +
>> +    /* Ensure the priority and target are correctly set (they will not
>> +     * be right after allocation)
>> +     */
>> +    eq.w6 = SETFIELD(EQ_W6_NVT_BLOCK, 0ul, 0ul) |
>> +        SETFIELD(EQ_W6_NVT_INDEX, 0ul, target);
>> +    eq.w7 = SETFIELD(EQ_W7_F0_PRIORITY, 0ul, priority);
>> +
>> +    /* TODO: depends on notification page (ESn) from H_INT_GET_QUEUE_INFO */
>> +    if (flags & XIVE_EQ_ALWAYS_NOTIFY) {
>> +        eq.w0 |= EQ_W0_UCOND_NOTIFY;
> 
> Do you need to also clear if the flag is not set?  AFAICT eq.w0 is
> inherited from the old queue and never reset from scratch.

True. It is always on if the EQ is not reset. I also need
to be more precise in spapr_xive_irq() when dealing with
reset EQs. The model has not fallen into that trap yet.
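
A minimal sketch of the fix being discussed, assuming the flag should simply
mirror the hcall argument on every call. XIVE_EQ_ALWAYS_NOTIFY and
EQ_W0_UCOND_NOTIFY are the names from the patch, but the bit value given to
EQ_W0_UCOND_NOTIFY here is a placeholder, and eq_w0_apply_notify() is a
hypothetical helper, not part of the series:

```c
#include <assert.h>
#include <stdint.h>

#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))   /* hcall flag, as in the patch */
#define EQ_W0_UCOND_NOTIFY    (UINT32_C(1) << 29)   /* placeholder bit position */

/*
 * Return the new EQ word 0 with the unconditional-notify bit set or
 * cleared according to the hcall flags, leaving every other bit alone.
 */
static uint32_t eq_w0_apply_notify(uint32_t w0, uint64_t flags)
{
    /* Reserved flag bits are rejected with H_PARAMETER before this point. */
    assert((flags & ~XIVE_EQ_ALWAYS_NOTIFY) == 0);

    if (flags & XIVE_EQ_ALWAYS_NOTIFY) {
        w0 |= EQ_W0_UCOND_NOTIFY;
    } else {
        /* Clear a bit inherited from the old EQ when the flag is absent. */
        w0 &= ~EQ_W0_UCOND_NOTIFY;
    }
    return w0;
}
```

The else branch is the only functional change with respect to the quoted
hunk.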

>> +    }
>> +
>> +    /* The generation bit for the EQ starts at 1 and the EQ page
>> +     * offset counter starts at 0.
>> +     */
>> +    eq.w1 = EQ_W1_GENERATION | SETFIELD(EQ_W1_PAGE_OFF, 0ul, 0ul);
>> +    eq.w0 |= EQ_W0_VALID;
>> +
>> +    /* TODO: issue syncs required to ensure all in-flight interrupts
>> +     * are complete on the old EQ */
>> +
>> +    /* Update EQ */
>> +    *old_eq = eq;
> 
> Hrm.  The BQL probably saves you, but in general do you need to make
> sure the ENQUEUE bit is set after updating everything else?

There is a rather complex procedure to update the HW, cache and 
memory. See xive_eqc_cache_update() in OPAL. I will need to dig 
in for the PowerNV support ...

>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_GET_QUEUE_CONFIG hcall() is used to get an EQ for a given
>> + * target and priority.
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-62: Reserved
>> + *      Bit 63: Debug: Return debug data
>> + * - "target" is per "ibm,ppc-interrupt-server#s" or
>> + *       "ibm,ppc-interrupt-gserver#s"
>> + * - "priority" is a valid priority not in
>> + *       "ibm,plat-res-int-priorities"
>> + *
>> + * Output:
>> + * - R4: "flags":
>> + *       Bits 0-62: Reserved
>> + *       Bit 63: The value of Unconditional Notify (n) per the XIVE spec
>> + * - R5: The logical real address of the start of the EQ
>> + * - R6: The power of 2 EQ size per "ibm,xive-eq-sizes"
>> + * - R7: The value of Event Queue Offset Counter per XIVE spec
>> + *       if "Debug" = 1, else 0
>> + *
>> + */
>> +
>> +#define XIVE_EQ_DEBUG     (1ull << (63 - 63))
>> +
>> +static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
>> +                                           sPAPRMachineState *spapr,
>> +                                           target_ulong opcode,
>> +                                           target_ulong *args)
>> +{
>> +    target_ulong flags    = args[0];
>> +    target_ulong target   = args[1];
>> +    target_ulong priority = args[2];
>> +    uint32_t eq_idx;
>> +    XiveEQ *eq;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags & ~XIVE_EQ_DEBUG) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    /*
>> +     * H_STATE should be returned if a H_INT_RESET is in progress.
>> +     * This is not needed when running the emulation under QEMU
>> +     */
>> +
>> +    if (!priority_is_valid(priority)) {
>> +        return H_P3;
>> +    }
>> +
>> +    /* Validate that "target" is part of the list of threads allocated
>> +     * to the partition. For that, find the EQ corresponding to the
>> +     * target.
>> +     */
>> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
>> +        return H_P2;
>> +    }
>> +
>> +    /* TODO: If the partition thread count is greater than the
>> +     * hardware thread count, validate the "target" has a
>> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
>> +     */
>> +
>> +    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
>> +    if (!eq)  {
>> +        return H_HARDWARE;
>> +    }
>> +
>> +    args[0] = 0;
>> +    if (eq->w0 & EQ_W0_UCOND_NOTIFY) {
>> +        args[0] |= XIVE_EQ_ALWAYS_NOTIFY;
>> +    }
>> +
>> +    if (eq->w0 & EQ_W0_ENQUEUE) {
>> +        args[1] =
>> +            (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
>> +        args[2] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
>> +    } else {
>> +        args[1] = 0;
>> +        args[2] = 0;
>> +    }
>> +
>> +    /* TODO: do we need any locking on the EQ ? */
> 
> Probably not if you're designating it as protected by the BQL.

OK.

Thanks,

C. 
 
>> +    if (flags & XIVE_EQ_DEBUG) {
>> +        /* Load the event queue generation number into the return flags */
>> +        args[0] |= GETFIELD(EQ_W1_GENERATION, eq->w1);
>> +
>> +        /* Load R7 with the event queue offset counter */
>> +        args[3] = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
>> +    }
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_SET_OS_REPORTING_LINE hcall() is used to set the
>> + * reporting cache line pair for the calling thread.  The reporting
>> + * cache lines will contain the OS interrupt context when the OS
>> + * issues a CI store byte to @TIMA+0xC10 to acknowledge the OS
>> + * interrupt. The reporting cache lines can be reset by inputting -1
>> + * in "reportingLine".  Issuing the CI store byte without reporting
>> + * cache lines registered will result in the data not being accessible
>> + * to the OS.
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-63: Reserved
>> + * - "reportingLine": The logical real address of the reporting cache
>> + *    line pair
>> + *
>> + * Output:
>> + * - None
>> + */
>> +static target_ulong h_int_set_os_reporting_line(PowerPCCPU *cpu,
>> +                                                sPAPRMachineState *spapr,
>> +                                                target_ulong opcode,
>> +                                                target_ulong *args)
>> +{
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    /*
>> +     * H_STATE should be returned if a H_INT_RESET is in progress.
>> +     * This is not needed when running the emulation under QEMU
>> +     */
>> +
>> +    /* TODO: H_INT_SET_OS_REPORTING_LINE */
>> +    return H_FUNCTION;
>> +}
>> +
>> +/*
>> + * The H_INT_GET_OS_REPORTING_LINE hcall() is used to get the logical
>> + * real address of the reporting cache line pair set for the input
>> + * "target".  If no reporting cache line pair has been set, -1 is
>> + * returned.
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-63: Reserved
>> + * - "target" is per "ibm,ppc-interrupt-server#s" or
>> + *       "ibm,ppc-interrupt-gserver#s"
>> + * - "reportingLine": The logical real address of the reporting cache
>> + *   line pair
>> + *
>> + * Output:
>> + * - R4: The logical real address of the reporting line if set, else -1
>> + */
>> +static target_ulong h_int_get_os_reporting_line(PowerPCCPU *cpu,
>> +                                                sPAPRMachineState *spapr,
>> +                                                target_ulong opcode,
>> +                                                target_ulong *args)
>> +{
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    /*
>> +     * H_STATE should be returned if a H_INT_RESET is in progress.
>> +     * This is not needed when running the emulation under QEMU
>> +     */
>> +
>> +    /* TODO: H_INT_GET_OS_REPORTING_LINE */
>> +    return H_FUNCTION;
>> +}
>> +
>> +/*
>> + * The H_INT_ESB hcall() is used to issue a load or store to the ESB
>> + * page for the input "lisn".  This hcall is only supported for LISNs
>> + * that have the ESB hcall flag set to 1 when returned from hcall()
>> + * H_INT_GET_SOURCE_INFO.
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-62: Reserved
>> + *      bit 63: Store: Store=1, store operation, else load operation
>> + * - "lisn" is per "interrupts", "interrupt-map", or
>> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
>> + *      ibm,query-interrupt-source-number RTAS call, or as
>> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
>> + * - "esbOffset" is the offset into the ESB page for the load or store operation
>> + * - "storeData" is the data to write for a store operation
>> + *
>> + * Output:
>> + * - R4: The value of the load if load operation, else -1
>> + */
>> +
>> +#define XIVE_ESB_STORE (1ull << (63 - 63))
>> +
>> +static target_ulong h_int_esb(PowerPCCPU *cpu,
>> +                              sPAPRMachineState *spapr,
>> +                              target_ulong opcode,
>> +                              target_ulong *args)
>> +{
>> +    sPAPRXive *xive = spapr->xive;
>> +    XiveIVE *ive;
>> +    target_ulong flags   = args[0];
>> +    target_ulong lisn    = args[1];
>> +    target_ulong offset  = args[2];
>> +    target_ulong data    = args[3];
>> +    uint64_t esb_base;
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags & ~XIVE_ESB_STORE) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    ive = spapr_xive_get_ive(xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID)) {
>> +        return H_P2;
>> +    }
>> +
>> +    if (offset > (1ull << xive->esb_shift)) {
>> +        return H_P3;
>> +    }
>> +
>> +    esb_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * lisn;
>> +    esb_base += offset;
>> +
>> +    if (dma_memory_rw(&address_space_memory, esb_base, &data, 8,
>> +                      (flags & XIVE_ESB_STORE))) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to rw data @0x%"
>> +                      HWADDR_PRIx "\n", __func__, esb_base);
>> +        return H_HARDWARE;
>> +    }
>> +    args[0] = (flags & XIVE_ESB_STORE) ? -1 : data;
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_SYNC hcall() is used to issue hardware syncs that will
>> + * ensure any in flight events for the input lisn are in the event
>> + * queue.
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-63: Reserved
>> + * - "lisn" is per "interrupts", "interrupt-map", or
>> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
>> + *      ibm,query-interrupt-source-number RTAS call, or as
>> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
>> + *
>> + * Output:
>> + * - None
>> + */
>> +static target_ulong h_int_sync(PowerPCCPU *cpu,
>> +                               sPAPRMachineState *spapr,
>> +                               target_ulong opcode,
>> +                               target_ulong *args)
>> +{
>> +    XiveIVE *ive;
>> +    target_ulong flags   = args[0];
>> +    target_ulong lisn    = args[1];
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
>> +    if (!ive || !(ive->w & IVE_VALID)) {
>> +        return H_P2;
>> +    }
>> +
>> +    /*
>> +     * H_STATE should be returned if a H_INT_RESET is in progress.
>> +     * This is not needed when running the emulation under QEMU
>> +     */
>> +
>> +    /* This is not real hardware. Nothing to be done */
>> +    return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * The H_INT_RESET hcall() is used to reset all of the partition's
>> + * interrupt exploitation structures to their initial state.  This
>> + * means losing all previously set interrupt state set via
>> + * H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
>> + *
>> + * Parameters:
>> + * Input:
>> + * - "flags"
>> + *      Bits 0-63: Reserved
>> + *
>> + * Output:
>> + * - None
>> + */
>> +static target_ulong h_int_reset(PowerPCCPU *cpu,
>> +                                sPAPRMachineState *spapr,
>> +                                target_ulong opcode,
>> +                                target_ulong *args)
>> +{
>> +    target_ulong flags   = args[0];
>> +
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (flags) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    spapr_xive_reset(spapr->xive);
>> +    return H_SUCCESS;
>> +}
>> +
>> +void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>> +{
>> +    spapr_register_hypercall(H_INT_GET_SOURCE_INFO, h_int_get_source_info);
>> +    spapr_register_hypercall(H_INT_SET_SOURCE_CONFIG, h_int_set_source_config);
>> +    spapr_register_hypercall(H_INT_GET_SOURCE_CONFIG, h_int_get_source_config);
>> +    spapr_register_hypercall(H_INT_GET_QUEUE_INFO, h_int_get_queue_info);
>> +    spapr_register_hypercall(H_INT_SET_QUEUE_CONFIG, h_int_set_queue_config);
>> +    spapr_register_hypercall(H_INT_GET_QUEUE_CONFIG, h_int_get_queue_config);
>> +    spapr_register_hypercall(H_INT_SET_OS_REPORTING_LINE,
>> +                             h_int_set_os_reporting_line);
>> +    spapr_register_hypercall(H_INT_GET_OS_REPORTING_LINE,
>> +                             h_int_get_os_reporting_line);
>> +    spapr_register_hypercall(H_INT_ESB, h_int_esb);
>> +    spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>> +    spapr_register_hypercall(H_INT_RESET, h_int_reset);
>> +}
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index ca4e72187f60..8b15c0b500d0 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -222,6 +222,8 @@ static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
>>          goto error;
>>      }
>>  
>> +    spapr_xive_hcall_init(spapr);
>> +
>>      return SPAPR_XIVE(obj);
>>  error:
>>      error_propagate(errp, local_err);
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 90e2b0f6c678..a25e218b34e2 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -387,7 +387,20 @@ struct sPAPRMachineState {
>>  #define H_INVALIDATE_PID        0x378
>>  #define H_REGISTER_PROC_TBL     0x37C
>>  #define H_SIGNAL_SYS_RESET      0x380
>> -#define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
>> +
>> +#define H_INT_GET_SOURCE_INFO   0x3A8
>> +#define H_INT_SET_SOURCE_CONFIG 0x3AC
>> +#define H_INT_GET_SOURCE_CONFIG 0x3B0
>> +#define H_INT_GET_QUEUE_INFO    0x3B4
>> +#define H_INT_SET_QUEUE_CONFIG  0x3B8
>> +#define H_INT_GET_QUEUE_CONFIG  0x3BC
>> +#define H_INT_SET_OS_REPORTING_LINE 0x3C0
>> +#define H_INT_GET_OS_REPORTING_LINE 0x3C4
>> +#define H_INT_ESB               0x3C8
>> +#define H_INT_SYNC              0x3CC
>> +#define H_INT_RESET             0x3D0
>> +
>> +#define MAX_HCALL_OPCODE        H_INT_RESET
>>  
>>  /* The hcalls above are standardized in PAPR and implemented by pHyp
>>   * as well.
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 6e8a189e723f..3f822220647f 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -79,4 +79,8 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
>>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>>  void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
>>  
>> +typedef struct sPAPRMachineState sPAPRMachineState;
>> +
>> +void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>> +
>>  #endif /* PPC_SPAPR_XIVE_H */
> 

* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-28  6:38   ` David Gibson
  2017-11-28 18:33     ` Cédric Le Goater
@ 2017-12-02 14:23     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:23 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: qemu-ppc, qemu-devel

On Tue, 2017-11-28 at 17:38 +1100, David Gibson wrote:
> Hrm.  I don't love that you're dealing with clearing that LSI bit
> here, but setting it at a different level.
> 
> The state machines are doing my head in a bit, is there any way
> you could derive the STATUS_SENT bit from the PQ bits?

Yeah it should be...

So you should normally need only one extra bit of state for LSI which
is whether it's asserted or not and no extra bit of state for MSIs.

P is basically "sent". Q is whether another event has been queued up
(and is only meaningful for MSIs though the 01 combination will mask
LSIs too).

The state logic should be for MSIs on event:

	- if PQ=01 ignore (masked)
	- if P=1, set Q and finish
	- set P=1 and forward event to IVE

For EOI (load and store):

	- if PQ=01 ignore
	- P=Q, Q=0
	- (storeEOI only) if new P=1, forward event to IVE
 
For LSIs, an "event" is whenever the state is asserted, and Q is
meaningless, so basically on every change of state or ESB:

	- if PQ=01 ignore (masked)
	- if P=1 finish
	- set P=1 and forward event to IVE

For EOI (load and store):

	- if PQ=01 ignore
	- clear P
	- re-evaluate as above if asserted
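
The rules above can be sketched as a small state machine. This is only an
illustration of the logic as listed, not code from the series: EsbSource,
esb_trigger(), esb_lsi_set_level() and esb_eoi() are hypothetical names, and
the load side of an EOI (returning the old PQ value to the OS) is left out.
Each function returns true when the event should be forwarded to the IVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESB_P   0x2   /* P: event has been sent */
#define ESB_Q   0x1   /* Q: another event queued (MSI only) */
#define ESB_OFF 0x1   /* PQ=01: source is masked */

typedef struct {
    uint8_t pq;        /* 2-bit ESB state */
    bool    lsi;       /* true for a level-sensitive source */
    bool    asserted;  /* LSI input level, unused for MSIs */
} EsbSource;

/* An event arrives (MSI trigger, or LSI found asserted). */
static bool esb_trigger(EsbSource *s)
{
    assert(s->pq <= (ESB_P | ESB_Q));
    if (s->pq == ESB_OFF) {
        return false;              /* masked: ignore */
    }
    if (s->pq & ESB_P) {
        if (!s->lsi) {
            s->pq |= ESB_Q;        /* MSI: remember the extra event */
        }
        return false;              /* already sent: finish */
    }
    s->pq = ESB_P;                 /* PQ was 00: set P and forward */
    return true;
}

/* LSI input changes level; only an asserted line counts as an event. */
static bool esb_lsi_set_level(EsbSource *s, bool level)
{
    s->asserted = level;
    return level ? esb_trigger(s) : false;
}

/* EOI through a load (store_eoi == false) or a store (store_eoi == true). */
static bool esb_eoi(EsbSource *s, bool store_eoi)
{
    if (s->pq == ESB_OFF) {
        return false;              /* masked: ignore */
    }
    if (s->lsi) {
        s->pq &= (uint8_t)~ESB_P;  /* clear P, then re-evaluate the line */
        return s->asserted ? esb_trigger(s) : false;
    }
    s->pq = (s->pq & ESB_Q) ? ESB_P : 0;    /* MSI: P=Q, Q=0 */
    return store_eoi && (s->pq & ESB_P);    /* storeEOI replays the event */
}
```

The only per-source state beyond PQ is the "asserted" flag, and it is only
consulted for LSIs.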

Cheers,
Ben.

* Re: [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags
  2017-11-28  6:40   ` David Gibson
  2017-11-28 18:23     ` Cédric Le Goater
@ 2017-12-02 14:24     ` Benjamin Herrenschmidt
  2017-12-02 14:38       ` Cédric Le Goater
  1 sibling, 1 reply; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:24 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: qemu-ppc, qemu-devel

On Tue, 2017-11-28 at 17:40 +1100, David Gibson wrote:
> > @@ -368,6 +368,10 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >      /* Allocate the IVT (Interrupt Virtualization Table) */
> >      xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
> >  
> > +    /* All sources are emulated under the XIVE object and share the
> > +     * same characteristic */
> > +    xive->flags = XIVE_SRC_TRIGGER;
> 
> You never actually use this field.  And since it always has the same
> value, is there a point to storing it?

Some HW sources don't have it, so with pass-through maybe...

Cheers,
Ben

* Re: [Qemu-devel] [PATCH 09/25] spapr: introduce handlers for XIVE interrupt sources
  2017-11-28 18:18     ` Cédric Le Goater
@ 2017-12-02 14:26       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:26 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On Tue, 2017-11-28 at 18:18 +0000, Cédric Le Goater wrote:
> AFAICT, it doesn't. LSI events are configured as the other XIVE interrupts. 
> The level is converted in the P bit and the Q bit should always be zero.
> So I should be able to simplify the proposed model which still is mimicking 
> XICS  ... I will take a look at it. 
> 
> There are a sort of special degenerated LSIs but these are for bringup.

Not really. So for MSIs you don't need your state flags.

For LSIs, you do need the "asserted" one that keeps track of the LSI
input state. See my other note with the actual states.

Cheers,
Ben.

* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-29 16:23           ` Cédric Le Goater
  2017-11-30  4:28             ` David Gibson
@ 2017-12-02 14:28             ` Benjamin Herrenschmidt
  2017-12-02 14:47               ` Cédric Le Goater
  1 sibling, 1 reply; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:28 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: list@suse.de:PowerPC, qemu-devel

On Wed, 2017-11-29 at 17:23 +0100, Cédric Le Goater wrote:
> On 11/29/2017 02:56 PM, Cédric Le Goater wrote:
> > > > > > +    switch (offset) {
> > > > > > +    case 0:
> > > > > > +        spapr_xive_source_eoi(xive, lisn);
> > > > > 
> > > > > Hrm.  I don't love that you're dealing with clearing that LSI bit
> > > > > here, but setting it at a different level.
> > > > > 
> > > > > The state machines are doing my head in a bit, is there any way
> > > > > you could derive the STATUS_SENT bit from the PQ bits?
> > > > 
> > > > Yes. I should. 
> > > > 
> > > > I am also lacking a guest driver to exercise these LSIs so I didn't
> > > > pay a lot of attention to level interrupts. Any idea ?
> > > 
> > > How about an old-school emulated PCI device?  Maybe rtl8139?
> > 
> > Perfect. The current model is working but I will see how I can 
> > improve it to use the PQ bits instead.
> 
> Using the PQ bits is simplifying the model but we still have to 
> maintain an array to store the IRQ type. 
> 
> There are 3 unused bits in the IVE descriptor, bits[1-3]:  
> 
>   #define IVE_VALID       PPC_BIT(0)
>   #define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
>   #define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
>   #define IVE_MASKED      PPC_BIT(32)              /* Masked */
>   #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> 
> We could hijack one of them to store the LSI type and get rid of 
> the type array. Would you object to that ? 

This won't work well if/when you implement a real HW XIVE.

Another option is to have different source objects for LSIs and MSIs.

Cheers,
Ben.
> 
> C.

* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-11-30  4:28             ` David Gibson
  2017-11-30 16:05               ` Cédric Le Goater
@ 2017-12-02 14:33               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:33 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: list@suse.de:PowerPC, qemu-devel

On Thu, 2017-11-30 at 15:28 +1100, David Gibson wrote:
> 
> How does this work at the hardware level?  Presumably the actual
> hardware components don't communicate with the XIVE to request edge or
> level.  So how does it know?  Specific ranges for LSIs?  If so, we
> should probably do the same.

So the source controller and the IVE are separate. The source
controller sends an internal MMIO to the IVE for "translating" the
event into a queue etc...

The IVE only sees "events" which are effectively state transitions of
the P bit of the source.

The LSI vs MSI difference is thus entirely a property of the source
HW.

All the XIVE "Generic" built-in sources (the ones you can trigger with
an MMIO, which we use in KVM for all the IPIs and virtual interrupts)
are MSIs.

You find 2 kinds of blocks of LSIs in the chip, the one PSI block which
has a handful or two of LSI sources for random "stuff" (LPC
interrupt(s), i2c interrupts etc..) and the LSI blocks which are in
each PHB.

So the PHB has basically two different bits of logic, one for LSIs and
one for MSIs. Their HW state machine is different.
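
To make the "different state machine" point concrete, here is a simplified sketch of how an emulator could model the two kinds of source. The thread does not spell out the hardware transitions, so the PQ encoding, the transition tables and every name below are assumptions made for illustration. The point it tries to show is that an LSI needs an extra "asserted" state held by the source itself, which cannot be derived from the PQ bits or stored in the IVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-bit ESB state: P is the high bit, Q the low bit (assumed encoding). */
enum {
    ESB_RESET   = 0x0,  /* P=0 Q=0: idle, the next trigger is forwarded */
    ESB_OFF     = 0x1,  /* P=0 Q=1: source masked */
    ESB_PENDING = 0x2,  /* P=1 Q=0: event forwarded, EOI outstanding */
    ESB_QUEUED  = 0x3,  /* P=1 Q=1: a further trigger was coalesced */
};

/* Hypothetical per-source state; 'asserted' is only meaningful for LSIs. */
typedef struct {
    uint8_t pq;
    bool is_lsi;
    bool asserted;
} Source;

/* MSI trigger: returns true when an event must be forwarded to the IVE. */
static bool msi_trigger(Source *s)
{
    assert(!s->is_lsi);
    switch (s->pq) {
    case ESB_RESET:
        s->pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        s->pq = ESB_QUEUED;   /* coalesce, nothing forwarded yet */
        return false;
    default:                  /* ESB_OFF: masked */
        return false;
    }
}

/* MSI EOI: a coalesced trigger is replayed as a new event. */
static bool msi_eoi(Source *s)
{
    assert(!s->is_lsi);
    switch (s->pq) {
    case ESB_QUEUED:
        s->pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
        s->pq = ESB_RESET;
        return false;
    default:                  /* ESB_RESET or ESB_OFF: nothing to do */
        return false;
    }
}

/* LSI level change: the line state is remembered by the source. */
static bool lsi_set_level(Source *s, bool level)
{
    assert(s->is_lsi);
    s->asserted = level;
    if (level && s->pq == ESB_RESET) {
        s->pq = ESB_PENDING;
        return true;
    }
    return false;
}

/* LSI EOI: clear P, then forward again if the line is still asserted. */
static bool lsi_eoi(Source *s)
{
    assert(s->is_lsi);
    if (s->pq == ESB_PENDING) {
        s->pq = ESB_RESET;
    }
    return lsi_set_level(s, s->asserted);
}
```

In this model the LSI path never sets Q; the re-notification on EOI comes from the remembered line level instead.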

In fact in the PHB and the PSI, I think, there's even an MMIO backdoor
register that allows you to see the "state" of the LSI (asserted).

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags
  2017-12-02 14:24     ` Benjamin Herrenschmidt
@ 2017-12-02 14:38       ` Cédric Le Goater
  2017-12-02 14:48         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-02 14:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: qemu-ppc, qemu-devel

On 12/02/2017 03:24 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2017-11-28 at 17:40 +1100, David Gibson wrote:
>>> @@ -368,6 +368,10 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>>      /* Allocate the IVT (Interrupt Virtualization Table) */
>>>      xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
>>>  
>>> +    /* All sources are emulated under the XIVE object and share the
>>> +     * same characteristic */
>>> +    xive->flags = XIVE_SRC_TRIGGER;
>>
>> You never actually use this field.  And since it always has the same
>> value, is there a point to storing it?
> 
> Some HW sources don't have it, so with pass-through maybe...

Hmm, yes. So, the current design for sPAPR handles all sources 
under the same XIVE object with a global memory region for all 
the ESBs. 

The first RFC had a mechanism to register source objects into 
the XIVE main one, allocating the IRQs per source and mapping 
the ESBs in the overall region. A bit like OPAL does. I then 
simplified for the sake of clarity and merged everything under 
the same XIVE object. 

Shall I reintroduce multiple source support ? and provide a 
default one for IPIs and virtual devices of the machine. 

Thanks,

C. 

 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-11-30  4:38   ` David Gibson
  2017-11-30 14:06     ` Cédric Le Goater
@ 2017-12-02 14:39     ` Benjamin Herrenschmidt
  2017-12-02 14:41       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:39 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: qemu-ppc, qemu-devel

On Thu, 2017-11-30 at 15:38 +1100, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:43PM +0100, Cédric Le Goater wrote:
> > The Event Queue Descriptor (EQD) table, also known as Event Notification
> > Descriptor (END), is one of the internal tables the XIVE interrupt
> > controller uses to redirect exception from event sources to CPU
> > threads.
> > 
> > The EQD specifies on which Event Queue the event data should be posted
> > when an exception occurs (later on pulled by the OS) and which server
> > (VPD in XIVE terminology) to notify. The Event Queue is a much more
> > complex structure but we start with a simple model for the sPAPR
> > machine.
> 
> Just to clarify my understanding a server / VPD in XIVE would
> typically correspond to a cpu - either real or virtual, yes?

The IVEs and EQs are managed by the virtualization controller. The VPs
(aka ENDs) are managed by the presentation controller. There's a VP per
real and virtual CPU.

You can think of the XIVE as having 3 main component types:


 - Source controller(s). There are some in the PHBs, one generic in the
XIVE itself, and one in the PSI bridge. Those contain the PQ bits and
thus the trigger & coalescing logic. They effectively shoot an MMIO to
the virtualization controller on events.

 - Virtualization controller (one per chip). This receives the above
MMIOs from the sources, manages the IVEs to get the target queue and
remap the number, and manages the queues. When a queue is enabled for
notification (or escalation) and such an event occurs, an MMIO goes to
the corresponding presentation controller.

 - Presentation controller (one per chip). This receives the above
notifications and sets as a result the IPB bits for one of the 8
priorities. Basically this guy tracks a single pending bit per priority
for each VP indicating whether there's something in the queue for that
priority and delivers interrupts to the core accordingly.


Now this is a simplified view. The PC supports groups but we don't
handle that yet, there are escalation interrupts, there are
redistribution mechanisms etc... but for now you get the basic idea.
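
As a rough illustration of the presentation controller's bookkeeping described above, the sketch below keeps one pending bit per priority and derives the most favored pending priority from it. The helper names priority_to_ipb() and ipb_to_pipr() appear in the patches, but their definitions are not quoted in this thread, so the bodies here (priority 0 mapped to the most significant IPB bit, 0xff meaning "nothing pending") and the Presenter type are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIORITY_MAX 7

/* Hypothetical per-VP presenter state. */
typedef struct {
    uint8_t ipb;   /* one pending bit per priority */
    uint8_t pipr;  /* most favored pending priority, 0xff if none */
    uint8_t cppr;  /* current processor priority */
} Presenter;

/* Assumed mapping: priority 0 (most favored) is the MSB of the IPB. */
static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > PRIORITY_MAX
        ? 0 : (uint8_t)(1u << (PRIORITY_MAX - priority));
}

/* Scan from the most favored priority down to the least favored one. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    for (uint8_t prio = 0; prio <= PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/* Record a notification; returns true when the CPU should be interrupted. */
static bool presenter_post(Presenter *p, uint8_t priority)
{
    assert(priority <= PRIORITY_MAX);
    p->ipb |= priority_to_ipb(priority);
    p->pipr = ipb_to_pipr(p->ipb);
    return p->pipr < p->cppr;
}
```

A lower number is a more favored priority, so an interrupt is only presented when PIPR is strictly below CPPR.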

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged
  2017-11-30  5:00   ` David Gibson
  2017-11-30 16:17     ` Cédric Le Goater
@ 2017-12-02 14:40     ` Benjamin Herrenschmidt
  2017-12-04  1:17       ` David Gibson
  2017-12-07 11:55     ` Cédric Le Goater
  2 siblings, 1 reply; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:40 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: qemu-ppc, qemu-devel

On Thu, 2017-11-30 at 16:00 +1100, David Gibson wrote:
> 
> >  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
> >  {
> > -    return 0;
> > +    uint8_t nsr = icp->tima_os[TM_NSR];
> > +
> > +    qemu_irq_lower(icp->output);
> > +
> > +    if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
> > +        uint8_t cppr = icp->tima_os[TM_PIPR];
> > +
> > +        icp->tima_os[TM_CPPR] = cppr;
> > +
> > +        /* Reset the pending buffer bit */
> > +        icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
> 
> What if multiple irqs of the same priority were queued?

It's the job of the OS to handle that case by consuming from the queue
until it's empty. There is an MMIO the guest can use if it wants to
that can set the IPB bits back to 1 for a given priority. Otherwise in
Linux we just have a SW way to force a replay.
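
A small model of why the OS has to drain the queue: the pending buffer holds a single bit per priority no matter how many events were queued, so one accept covers all of them. Everything below (the Vp type, the counters standing in for the real in-memory queues, the function names) is hypothetical and only meant to illustrate the contract.

```c
#include <assert.h>
#include <stdint.h>

#define NPRIO 8

/* Hypothetical VP state: counters stand in for the per-priority queues. */
typedef struct {
    uint8_t  ipb;            /* one pending bit per priority */
    unsigned queued[NPRIO];  /* number of events sitting in each queue */
} Vp;

/* A new event: queue it and set the (single) pending bit for its priority. */
static void vp_post(Vp *vp, unsigned prio)
{
    assert(prio < NPRIO);
    vp->queued[prio]++;
    vp->ipb |= (uint8_t)(0x80u >> prio);
}

/*
 * Accept: report the most favored pending priority and clear its bit.
 * Returns -1 when nothing is pending.
 */
static int vp_accept(Vp *vp)
{
    for (unsigned prio = 0; prio < NPRIO; prio++) {
        uint8_t bit = (uint8_t)(0x80u >> prio);

        if (vp->ipb & bit) {
            vp->ipb &= (uint8_t)~bit;
            return (int)prio;
        }
    }
    return -1;
}

/* OS side: consume every event queued for that priority. */
static unsigned os_drain(Vp *vp, unsigned prio)
{
    unsigned handled = 0;

    assert(prio < NPRIO);
    while (vp->queued[prio] > 0) {
        vp->queued[prio]--;
        handled++;
    }
    return handled;
}
```

If the OS handled only one event after the accept, the remaining ones would sit in the queue with no pending bit set until something set it again.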

> > +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> > +
> > +        /* Drop Exception bit for OS */
> > +        icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
> > +    }
> > +
> > +    return (nsr << 8) | icp->tima_os[TM_CPPR];
> > +}
> > +
> > +static void spapr_xive_icp_notify(sPAPRXiveICP *icp)
> > +{
> > +    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
> > +        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
> > +        qemu_irq_raise(icp->output);
> > +    }
> >  }
> >  
> >  static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
> > @@ -51,6 +105,9 @@ static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
> >      }
> >  
> >      icp->tima_os[TM_CPPR] = cppr;
> > +
> > +    /* CPPR has changed, inform the ICP which might raise an exception */
> > +    spapr_xive_icp_notify(icp);
> >  }
> >  
> >  /*
> > @@ -224,6 +281,8 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> >      XiveEQ *eq;
> >      uint32_t eq_idx;
> >      uint8_t priority;
> > +    uint32_t server;
> > +    sPAPRXiveICP *icp;
> >  
> >      ive = spapr_xive_get_ive(xive, lisn);
> >      if (!ive || !(ive->w & IVE_VALID)) {
> > @@ -253,6 +312,13 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> >          qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
> >      }
> >  
> > +    server = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
> > +    icp = spapr_xive_icp_get(xive, server);
> > +    if (!icp) {
> > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for server %d\n", server);
> > +        return;
> > +    }
> > +
> >      if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
> >          priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> >  
> > @@ -260,9 +326,18 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> >          if (priority == 0xff) {
> >              g_assert_not_reached();
> >          }
> > +
> > +        /* Update the IPB (Interrupt Pending Buffer) with the priority
> > +         * of the new notification and inform the ICP, which will
> > +         * decide to raise the exception, or not, depending on the CPPR.
> > +         */
> > +        icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
> > +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> >      } else {
> >          qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
> >      }
> > +
> > +    spapr_xive_icp_notify(icp);
> >  }
> >  
> >  /*
> 
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-12-02 14:39     ` Benjamin Herrenschmidt
@ 2017-12-02 14:41       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:41 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: qemu-ppc, qemu-devel

On Sat, 2017-12-02 at 08:39 -0600, Benjamin Herrenschmidt wrote:
> The IVEs and EQs are managed by the virtualization controller. The VPs
> (aka ENDs) 

typo. aka NVTs

> are managed by the presentation controller. There's a VP per
> real and virtual CPU.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue
  2017-12-01  4:10       ` David Gibson
  2017-12-01 16:43         ` Cédric Le Goater
@ 2017-12-02 14:45         ` Benjamin Herrenschmidt
  2017-12-02 14:46           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:45 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: qemu-ppc, qemu-devel

On Fri, 2017-12-01 at 15:10 +1100, David Gibson wrote:
> 
> Hm, ok.  Guest endian (or at least, not definitively host-endian) data
> in a plain uint32_t makes me uncomfortable.  Could we use char data[4]
> instead, to make it clear it's a byte-ordered buffer, rather than a
> number as far as the XIVE is concerned.
> 
> Hm.. except that doesn't quite work, because the hardware must define
> which end that generation bit ends up in...

It also needs to be written atomically. Just say it's big endian.

Cheers,
Ben.

> > >> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
> > >> +                      HWADDR_PRIx "\n", __func__, qaddr);
> > >> +        return;
> > >> +    }
> > >> +
> > >> +    qindex = (qindex + 1) % qentries;
> > >> +    if (qindex == 0) {
> > >> +        qgen ^= 1;
> > >> +        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
> > >> +    }
> > >> +    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
> > >> +}
> > >> +
> > >>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > >>  {
> > >> +    XiveIVE *ive;
> > >> +    XiveEQ *eq;
> > >> +    uint32_t eq_idx;
> > >> +    uint8_t priority;
> > >> +
> > >> +    ive = spapr_xive_get_ive(xive, lisn);
> > >> +    if (!ive || !(ive->w & IVE_VALID)) {
> > >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> > > 
> > > As mentioned on other patches, I'm a little concerned by these
> > > guest-triggerable logs.  I guess the LOG_GUEST_ERROR mask will save
> > > us, though.
> > 
> > I want to track 'invalid' interrupts but I haven't seen these show up 
> > in my tests. I agree there are a few too many and some could just 
> > be asserts.
> 
> Uh.. I don't think many can be assert()s.  assert() is only
> appropriate if it being tripped definitely indicates a bug in qemu.
> Nearly all these qemu_log()s I've seen can be tripped by the guest
> doing something bad, which absolutely should not assert() qemu.
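
The distinction drawn above can be sketched as follows: an assertion guards an invariant of the emulator itself, while anything the guest can influence is reported and rejected. The types and the get_ive() helper are hypothetical stand-ins, and fprintf() stands in for the qemu_log_mask(LOG_GUEST_ERROR, ...) calls used in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define IVE_VALID 0x8000000000000000ULL

/* Hypothetical, trimmed-down model types. */
typedef struct {
    uint64_t w;
} Ive;

typedef struct {
    Ive *ivt;
    uint32_t nr_irqs;
} Xive;

static Ive *get_ive(Xive *x, uint32_t lisn)
{
    /* Internal invariant: tripping this would be a bug in the emulator. */
    assert(x != NULL && x->ivt != NULL);

    /* Guest-controlled input: report it and bail out, never abort. */
    if (lisn >= x->nr_irqs || !(x->ivt[lisn].w & IVE_VALID)) {
        fprintf(stderr, "guest error: invalid LISN %u\n", (unsigned)lisn);
        return NULL;
    }
    return &x->ivt[lisn];
}
```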

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue
  2017-12-02 14:45         ` Benjamin Herrenschmidt
@ 2017-12-02 14:46           ` Benjamin Herrenschmidt
  2017-12-04  1:20             ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:46 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: qemu-ppc, qemu-devel

On Sat, 2017-12-02 at 08:45 -0600, Benjamin Herrenschmidt wrote:
> On Fri, 2017-12-01 at 15:10 +1100, David Gibson wrote:
> > 
> > Hm, ok.  Guest endian (or at least, not definitively host-endian) data
> > in a plain uint32_t makes me uncomfortable.  Could we use char data[4]
> > instead, to make it clear it's a byte-ordered buffer, rather than a
> > number as far as the XIVE is concerned.
> > 
> > Hm.. except that doesn't quite work, because the hardware must define
> > which end that generation bit ends up in...
> 
> It also needs to be written atomically. Just say it's big endian.

Also the guest reads it using be32_to_cpup...
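
A minimal sketch of the producer/consumer contract being discussed, assuming the generation bit is the top bit of each big-endian 32-bit entry and that the producer starts with generation 1 on a zeroed page. Those two choices, the Eq type and all function names are assumptions for illustration; the byte-wise store below is also not atomic, whereas a real implementation would use a single 32-bit write as noted above.

```c
#include <assert.h>
#include <stdint.h>

#define QENTRIES 4u  /* tiny queue, for illustration only */

/* Hypothetical queue model: the page holds big-endian 32-bit entries. */
typedef struct {
    uint8_t  page[QENTRIES][4];
    uint32_t pidx, pgen;     /* producer index and generation */
    uint32_t cidx, ctoggle;  /* consumer index and expected polarity */
} Eq;

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void eq_init(Eq *q)
{
    *q = (Eq){ 0 };
    q->pgen = 1;  /* first pass writes entries with the top bit set */
}

/* Producer: write one entry, flip the generation on wrap-around. */
static void eq_push(Eq *q, uint32_t data)
{
    assert((data & 0x7fffffffu) != 0);  /* 0 is reserved for "empty" here */
    put_be32(q->page[q->pidx], (q->pgen << 31) | (data & 0x7fffffffu));
    q->pidx = (q->pidx + 1) % QENTRIES;
    if (q->pidx == 0) {
        q->pgen ^= 1;
    }
}

/* Consumer: returns 0 when the next entry is stale, i.e. the queue is empty. */
static uint32_t eq_pop(Eq *q)
{
    uint32_t cur = get_be32(q->page[q->cidx]);

    if ((cur >> 31) == q->ctoggle) {
        return 0;
    }
    q->cidx = (q->cidx + 1) % QENTRIES;
    if (q->cidx == 0) {
        q->ctoggle ^= 1;
    }
    return cur & 0x7fffffffu;
}
```

Because the entry is defined as big-endian, the position of the generation bit is the same whatever the host or guest endianness. Queue overflow is deliberately not handled in this sketch.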

> 
> Cheers,
> Ben.
> 
> > > > > +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
> > > > > +                      HWADDR_PRIx "\n", __func__, qaddr);
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    qindex = (qindex + 1) % qentries;
> > > > > +    if (qindex == 0) {
> > > > > +        qgen ^= 1;
> > > > > +        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
> > > > > +    }
> > > > > +    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
> > > > > +}
> > > > > +
> > > > >  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > > > >  {
> > > > > +    XiveIVE *ive;
> > > > > +    XiveEQ *eq;
> > > > > +    uint32_t eq_idx;
> > > > > +    uint8_t priority;
> > > > > +
> > > > > +    ive = spapr_xive_get_ive(xive, lisn);
> > > > > +    if (!ive || !(ive->w & IVE_VALID)) {
> > > > > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", lisn);
> > > > 
> > > > As mentioned on other patches, I'm a little concerned by these
> > > > guest-triggerable logs.  I guess the LOG_GUEST_ERROR mask will save
> > > > us, though.
> > > 
> > > I want to track 'invalid' interrupts but I haven't seen these show up 
> > > in my tests. I agree there are a little too much and some could just 
> > > be asserts.
> > 
> > Uh.. I don't think many can be assert()s.  assert() is only
> > appropriate if it being tripped definitely indicates a bug in qemu.
> > Nearly all these qemu_log()s I've seen can be tripped by the guest
> > doing something bad, which absolutely should not assert() qemu.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources
  2017-12-02 14:28             ` Benjamin Herrenschmidt
@ 2017-12-02 14:47               ` Cédric Le Goater
  0 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-02 14:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: list@suse.de:PowerPC, qemu-devel

On 12/02/2017 03:28 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2017-11-29 at 17:23 +0100, Cédric Le Goater wrote:
>> On 11/29/2017 02:56 PM, Cédric Le Goater wrote:
>>>>>>> +    switch (offset) {
>>>>>>> +    case 0:
>>>>>>> +        spapr_xive_source_eoi(xive, lisn);
>>>>>>
>>>>>> Hrm.  I don't love that you're dealing with clearing that LSI bit
>>>>>> here, but setting it at a different level.
>>>>>>
>>>>>> The state machines are doing my head in a bit, is there any way
>>>>>> you could derive the STATUS_SENT bit from the PQ bits?
>>>>>
>>>>> Yes. I should. 
>>>>>
>>>>> I am also lacking a guest driver to exercise these LSIs so I didn't
>>>>> pay a lot of attention to level interrupts. Any idea ?
>>>>
>>>> How about an old-school emulated PCI device?  Maybe rtl8139?
>>>
>>> Perfect. The current model is working but I will see how I can 
>>> improve it to use the PQ bits instead.
>>
>> Using the PQ bits simplifies the model but we still have to 
>> maintain an array to store the IRQ type. 
>>
>> There are 3 unused bits in the IVE descriptor, bits[1-3]:  
>>
>>   #define IVE_VALID       PPC_BIT(0)
>>   #define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
>>   #define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
>>   #define IVE_MASKED      PPC_BIT(32)              /* Masked */
>>   #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>>
>> We could hijack one of them to store the LSI type and get rid of 
>> the type array. Would you object to that ? 
> 
> This won't work well if/when you implement a real HW XIVE.
> 
> Another option is to have different source objects for LSIs and MSIs.

yes. Like for the PHB3 in PowerNV or in OPAL.

I will need to complexify the model a bit more with multiple source 
support like we did for PowerNV but that might be interesting for 
pass-through.

Thanks,

C.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags
  2017-12-02 14:38       ` Cédric Le Goater
@ 2017-12-02 14:48         ` Benjamin Herrenschmidt
  2017-12-02 14:50           ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-02 14:48 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On Sat, 2017-12-02 at 15:38 +0100, Cédric Le Goater wrote:
> Hmm, yes. So, the current design for sPAPR handles all sources 
> under the same XIVE object with a global memory region for all 
> the ESBs. 
> 
> The first RFC had a mechanism to register source objects into 
> the XIVE main one, allocating the IRQs per source and mapping 
> the ESBs in the overall region. A bit like OPAL does. I then 
> simplified for the sake of clarity and merged everything under 
> the same XIVE object. 
> 
> Shall I reintroduce multiple source support ? and provide a 
> default one for IPIs and virtual devices of the machine. 

That or you need state bits ;-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags
  2017-12-02 14:48         ` Benjamin Herrenschmidt
@ 2017-12-02 14:50           ` Cédric Le Goater
  0 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-02 14:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson; +Cc: qemu-ppc, qemu-devel

On 12/02/2017 03:48 PM, Benjamin Herrenschmidt wrote:
> On Sat, 2017-12-02 at 15:38 +0100, Cédric Le Goater wrote:
>> Hmm, yes. So, the current design for sPAPR handles all sources 
>> under the same XIVE object with a global memory region for all 
>> the ESBs. 
>>
>> The first RFC had a mechanism to register source objects into 
>> the XIVE main one, allocating the IRQs per source and mapping 
>> the ESBs in the overall region. A bit like OPAL does. I then 
>> simplified for the sake of clarity and merged everything under 
>> the same XIVE object. 
>>
>> Shall I reintroduce multiple source support ? and provide a 
>> default one for IPIs and virtual devices of the machine. 
> 
> That or you need state bits ;-)

yeah. I started with state bits and thought I could hack the PQ 
ones but that's wrong. 

Thanks,

C. 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-12-01 16:36         ` Cédric Le Goater
@ 2017-12-04  1:09           ` David Gibson
  2017-12-04 16:31             ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-04  1:09 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 11260 bytes --]

On Fri, Dec 01, 2017 at 05:36:39PM +0100, Cédric Le Goater wrote:
> On 12/01/2017 12:35 AM, David Gibson wrote:
> > On Thu, Nov 30, 2017 at 02:06:27PM +0000, Cédric Le Goater wrote:
> >> On 11/30/2017 04:38 AM, David Gibson wrote:
> >>> On Thu, Nov 23, 2017 at 02:29:43PM +0100, Cédric Le Goater wrote:
> >>>> The Event Queue Descriptor (EQD) table, also known as Event Notification
> >>>> Descriptor (END), is one of the internal tables the XIVE interrupt
> >>>> controller uses to redirect exception from event sources to CPU
> >>>> threads.
> >>>>
> >>>> The EQD specifies on which Event Queue the event data should be posted
> >>>> when an exception occurs (later on pulled by the OS) and which server
> >>>> (VPD in XIVE terminology) to notify. The Event Queue is a much more
> >>>> complex structure but we start with a simple model for the sPAPR
> >>>> machine.
> >>>
> >>> Just to clarify my understanding a server / VPD in XIVE would
> >>> typically correspond to a cpu - either real or virtual, yes?
> >>
> >> yes. VP for "virtual processor" and VPD for "virtual processor 
> >> descriptor" which contains the XIVE interrupt state of the VP 
> >> when not dispatched. It is still described in some documentation 
> >> as an NVT : Notification Virtual Target.  
> >>
> >> XIVE concepts were renamed at some point but the old names persisted.
> >> I am still struggling my way through all the names.
> >>
> >>
> >>>> There is one XiveEQ per priority and the model chooses to store them
> >>>> under the Xive Interrupt presenter model. It will be retrieved, just
> >>>> like for XICS, through the 'intc' object pointer of the CPU.
> >>>>
> >>>> The EQ indexing follows a simple pattern:
> >>>>
> >>>>        (server << 3) | (priority & 0x7)
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>> ---
> >>>>  hw/intc/spapr_xive.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  hw/intc/xive-internal.h | 50 +++++++++++++++++++++++++++++++++++++++++++
> >>>>  2 files changed, 106 insertions(+)
> >>>>
> >>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >>>> index 554b25e0884c..983317a6b3f6 100644
> >>>> --- a/hw/intc/spapr_xive.c
> >>>> +++ b/hw/intc/spapr_xive.c
> >>>> @@ -23,6 +23,7 @@
> >>>>  #include "sysemu/dma.h"
> >>>>  #include "monitor/monitor.h"
> >>>>  #include "hw/ppc/spapr_xive.h"
> >>>> +#include "hw/ppc/spapr.h"
> >>>>  #include "hw/ppc/xics.h"
> >>>>  
> >>>>  #include "xive-internal.h"
> >>>> @@ -34,6 +35,8 @@ struct sPAPRXiveICP {
> >>>>      uint8_t   tima[TM_RING_COUNT * 0x10];
> >>>>      uint8_t   *tima_os;
> >>>>      qemu_irq  output;
> >>>> +
> >>>> +    XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
> >>>>  };
> >>>>  
> >>>>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
> >>>> @@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
> >>>>      },
> >>>>  };
> >>>>  
> >>>> +static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
> >>>> +{
> >>>> +    PowerPCCPU *cpu = spapr_find_cpu(server);
> >>>> +
> >>>> +    return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
> >>>> +}
> >>>> +
> >>>>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> >>>>  {
> >>>>  
> >>>> @@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
> >>>>      sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
> >>>>  
> >>>>      memset(xicp->tima, 0, sizeof(xicp->tima));
> >>>> +
> >>>> +    memset(xicp->eqt, 0, sizeof(xicp->eqt));
> >>>>  }
> >>>>  
> >>>>  static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
> >>>> @@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
> >>>>      xicp->tima_os = &xicp->tima[TM_QW1_OS];
> >>>>  }
> >>>>  
> >>>> +static const VMStateDescription vmstate_spapr_xive_icp_eq = {
> >>>> +    .name = TYPE_SPAPR_XIVE_ICP "/eq",
> >>>> +    .version_id = 1,
> >>>> +    .minimum_version_id = 1,
> >>>> +    .fields = (VMStateField []) {
> >>>> +        VMSTATE_UINT32(w0, XiveEQ),
> >>>> +        VMSTATE_UINT32(w1, XiveEQ),
> >>>> +        VMSTATE_UINT32(w2, XiveEQ),
> >>>> +        VMSTATE_UINT32(w3, XiveEQ),
> >>>> +        VMSTATE_UINT32(w4, XiveEQ),
> >>>> +        VMSTATE_UINT32(w5, XiveEQ),
> >>>> +        VMSTATE_UINT32(w6, XiveEQ),
> >>>> +        VMSTATE_UINT32(w7, XiveEQ),
> >>>
> >>> Wow.  Super descriptive field names there, but I guess that's not your fault.
> >>
> >> The defines in the "xive-internal.h" give a better view ... 
> >>
> >>>> +        VMSTATE_END_OF_LIST()
> >>>> +    },
> >>>> +};
> >>>> +
> >>>>  static bool vmstate_spapr_xive_icp_needed(void *opaque)
> >>>>  {
> >>>>      /* TODO check machine XIVE support */
> >>>> @@ -696,6 +725,8 @@ static const VMStateDescription vmstate_spapr_xive_icp = {
> >>>>      .needed = vmstate_spapr_xive_icp_needed,
> >>>>      .fields = (VMStateField[]) {
> >>>>          VMSTATE_BUFFER(tima, sPAPRXiveICP),
> >>>> +        VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 1), 1,
> >>>> +                             vmstate_spapr_xive_icp_eq, XiveEQ),
> >>>>          VMSTATE_END_OF_LIST()
> >>>>      },
> >>>>  };
> >>>> @@ -755,3 +786,28 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
> >>>>      ive->w &= ~IVE_VALID;
> >>>>      return true;
> >>>>  }
> >>>> +
> >>>> +/*
> >>>> + * Use a simple indexing for the EQs.
> >>>
> >>> Is this server+priority encoding architected anywhere?  
> >>
> >> no. This is a model shortcut.
> >>
> >>> Otherwise, why not use separate parameters?
> >>
> >> yes. spapr_xive_get_eq() could use separate parameters and it would
> >> shorten some of the hcalls.
> >>
> >> The result is stored in a single field of the IVE, EQ_INDEX. So I will 
> >> still need mangle/demangle routines but these could be simple macros.
> >> I will look at it.
> > 
> > Hm, ok.  So it's architected in the sense that you're using the
> > encoding from the EQ_INDEX field throughout.  That could be a
> > reasonable choice, I can't really tell yet.
> > 
> > On the other hand, it might be easier to read if we use server and
> > priority as separate parameters until the point we actually encode
> > into the EQ_INDEX field.
> 
> In the architecture, the EQ_INDEX field contains an index to an 
> Event Queue Descriptor and the Event Queue Descriptor has a 
> EQ_W6_NVT_INDEX field pointing to a Notification Virtual Target.
> So there are two extra tables for the EQs and for the NVTs
> used by the HW.

Ok.  In the PAPR interface is the EQ_INDEX ever exposed to the guest?
Or does it just supply target/priority numbers and the hypervisor
manages the mapping to queues internally?

> In the sPAPR model, an EQ array is stored under the sPAPRXiveNVT 
> object which is stored under the ->intc pointer of the CPUState 
> object.
> 
> So the EQ_INDEX field is really taking a shortcut, encoding 
> the cpu number and the priority to find an EQ, and the 
> EQ_W6_NVT_INDEX field holds a value which is the cpu number.
> But at the end, we save two tables. 
> 
> C.
> 
> 
> > 
> >>
> >>>> + */
> >>>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t eq_idx)
> >>>> +{
> >>>> +    int priority = eq_idx & 0x7;
> >>>> +    sPAPRXiveICP *xicp = spapr_xive_icp_get(xive, eq_idx >> 3);
> >>>> +
> >>>> +    return xicp ? &xicp->eqt[priority] : NULL;
> >>>> +}
> >>>> +
> >>>> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server,
> >>>> +                              uint8_t priority, uint32_t *out_eq_idx)
> >>>> +{
> >>>> +    if (priority > XIVE_PRIORITY_MAX) {
> >>>> +        return false;
> >>>> +    }
> >>>> +
> >>>> +    if (out_eq_idx) {
> >>>> +        *out_eq_idx = (server << 3) | (priority & 0x7);
> >>>> +    }
> >>>> +
> >>>> +    return true;
> >>>> +}
> >>>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> >>>> index 7d329f203a9b..c3949671aa03 100644
> >>>> --- a/hw/intc/xive-internal.h
> >>>> +++ b/hw/intc/xive-internal.h
> >>>> @@ -131,9 +131,59 @@ typedef struct XiveIVE {
> >>>>  #define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> >>>>  } XiveIVE;
> >>>>  
> >>>> +/* EQ */
> >>>> +typedef struct XiveEQ {
> >>>> +        uint32_t        w0;
> >>>> +#define EQ_W0_VALID             PPC_BIT32(0)
> >>>> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
> >>>> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
> >>>> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
> >>>> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
> >>>> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
> >>>> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
> >>>> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
> >>>> +#define EQ_W0_SW0               PPC_BIT32(16)
> >>>> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
> >>>> +#define EQ_QSIZE_4K             0
> >>>> +#define EQ_QSIZE_64K            4
> >>>> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
> >>>> +        uint32_t        w1;
> >>>> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
> >>>> +#define EQ_W1_ESn_P             PPC_BIT32(0)
> >>>> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
> >>>> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
> >>>> +#define EQ_W1_ESe_P             PPC_BIT32(2)
> >>>> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
> >>>> +#define EQ_W1_GENERATION        PPC_BIT32(9)
> >>>> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
> >>>> +        uint32_t        w2;
> >>>> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
> >>>> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
> >>>> +        uint32_t        w3;
> >>>> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
> >>>> +        uint32_t        w4;
> >>>> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
> >>>> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
> >>>> +        uint32_t        w5;
> >>>> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
> >>>> +        uint32_t        w6;
> >>>> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
> >>>> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
> >>>> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
> >>>> +        uint32_t        w7;
> >>>> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
> >>>> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
> >>>> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
> >>>> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
> >>>> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> >>>> +} XiveEQ;
> >>>> +
> >>>>  #define XIVE_PRIORITY_MAX  7
> >>>>  
> >>>>  void spapr_xive_reset(void *dev);
> >>>>  XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn);
> >>>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
> >>>> +bool spapr_xive_eq_for_server(sPAPRXive *xive, uint32_t server, uint8_t prio,
> >>>> +                              uint32_t *out_eq_idx);
> >>>>  
> >>>>  #endif /* _INTC_XIVE_INTERNAL_H */
> >>>
> >>
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged
  2017-12-02 14:40     ` Benjamin Herrenschmidt
@ 2017-12-04  1:17       ` David Gibson
  2017-12-04 16:09         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-04  1:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Cédric Le Goater, qemu-ppc, qemu-devel

On Sat, Dec 02, 2017 at 08:40:58AM -0600, Benjamin Herrenschmidt wrote:
> On Thu, 2017-11-30 at 16:00 +1100, David Gibson wrote:
> > 
> > >  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
> > >  {
> > > -    return 0;
> > > +    uint8_t nsr = icp->tima_os[TM_NSR];
> > > +
> > > +    qemu_irq_lower(icp->output);
> > > +
> > > +    if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
> > > +        uint8_t cppr = icp->tima_os[TM_PIPR];
> > > +
> > > +        icp->tima_os[TM_CPPR] = cppr;
> > > +
> > > +        /* Reset the pending buffer bit */
> > > +        icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
> > 
> > What if multiple irqs of the same priority were queued?
> 
> It's the job of the OS to handle that case by consuming from the queue
> until it's empty. There is an MMIO the guest can use if it wants to
> that can set the IPB bits back to 1 for a given priority. Otherwise in
> Linux we just have a SW way to force a replay.

Ok, so "accept" is effectively saying the OS is accepting all
interrupts from that queue, right?
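
To make the IPB behaviour concrete, here is a minimal standalone model of the bookkeeping discussed above. It is only a sketch: the helper bodies are an assumption (the patch's own priority_to_ipb() and ipb_to_pipr() are not quoted in this mail), struct tima_model and the model_* names are hypothetical, and the NSR exception bit is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PRIORITY_MAX 7   /* mirrors XIVE_PRIORITY_MAX in the patch */

/* Assumed encoding: one bit per priority, priority 0 in the top bit. */
static uint8_t priority_to_ipb(uint8_t prio)
{
    return prio > PRIORITY_MAX ? 0 : (uint8_t)(1u << (PRIORITY_MAX - prio));
}

/* Most favored pending priority, or 0xff when nothing is pending. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    uint8_t prio;

    for (prio = 0; prio <= PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/* Hypothetical stand-in for the three TIMA bytes used here. */
struct tima_model {
    uint8_t ipb;
    uint8_t pipr;
    uint8_t cppr;
};

/* A new event only sets the bit of its priority; it does not count. */
static void model_notify_event(struct tima_model *t, uint8_t prio)
{
    t->ipb |= priority_to_ipb(prio);
    t->pipr = ipb_to_pipr(t->ipb);
}

/* Accept: take the pending priority as CPPR and clear its IPB bit. */
static void model_accept(struct tima_model *t)
{
    assert(t->pipr <= PRIORITY_MAX);   /* something must be pending */
    t->cppr = t->pipr;
    t->ipb &= (uint8_t)~priority_to_ipb(t->cppr);
    t->pipr = ipb_to_pipr(t->ipb);
}
```

In this model two events at the same priority collapse into a single IPB bit, so one accept clears the bit for both. That is why the OS has to drain the queue rather than expect one exception per event.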

> 
> > > +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> > > +
> > > +        /* Drop Exception bit for OS */
> > > +        icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
> > > +    }
> > > +
> > > +    return (nsr << 8) | icp->tima_os[TM_CPPR];
> > > +}
> > > +
> > > +static void spapr_xive_icp_notify(sPAPRXiveICP *icp)
> > > +{
> > > +    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
> > > +        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
> > > +        qemu_irq_raise(icp->output);
> > > +    }
> > >  }
> > >  
> > >  static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
> > > @@ -51,6 +105,9 @@ static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
> > >      }
> > >  
> > >      icp->tima_os[TM_CPPR] = cppr;
> > > +
> > > +    /* CPPR has changed, inform the ICP which might raise an exception */
> > > +    spapr_xive_icp_notify(icp);
> > >  }
> > >  
> > >  /*
> > > @@ -224,6 +281,8 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > >      XiveEQ *eq;
> > >      uint32_t eq_idx;
> > >      uint8_t priority;
> > > +    uint32_t server;
> > > +    sPAPRXiveICP *icp;
> > >  
> > >      ive = spapr_xive_get_ive(xive, lisn);
> > >      if (!ive || !(ive->w & IVE_VALID)) {
> > > @@ -253,6 +312,13 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > >          qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
> > >      }
> > >  
> > > +    server = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
> > > +    icp = spapr_xive_icp_get(xive, server);
> > > +    if (!icp) {
> > > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for server %d\n", server);
> > > +        return;
> > > +    }
> > > +
> > >      if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
> > >          priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> > >  
> > > @@ -260,9 +326,18 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > >          if (priority == 0xff) {
> > >              g_assert_not_reached();
> > >          }
> > > +
> > > +        /* Update the IPB (Interrupt Pending Buffer) with the priority
> > > +         * of the new notification and inform the ICP, which will
> > > +         * decide to raise the exception, or not, depending on the CPPR.
> > > +         */
> > > +        icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
> > > +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> > >      } else {
> > >          qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
> > >      }
> > > +
> > > +    spapr_xive_icp_notify(icp);
> > >  }
> > >  
> > >  /*
> > 
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue
  2017-12-02 14:46           ` Benjamin Herrenschmidt
@ 2017-12-04  1:20             ` David Gibson
  2017-12-05 10:58               ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-04  1:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Cédric Le Goater, qemu-ppc, qemu-devel

On Sat, Dec 02, 2017 at 08:46:19AM -0600, Benjamin Herrenschmidt wrote:
> On Sat, 2017-12-02 at 08:45 -0600, Benjamin Herrenschmidt wrote:
> > On Fri, 2017-12-01 at 15:10 +1100, David Gibson wrote:
> > > 
> > > Hm, ok.  Guest endian (or at least, not definitively host-endian) data
> > > in a plain uint32_t makes me uncomfortable.  Could we use char data[4]
> > > instead, to make it clear it's a byte-ordered buffer, rather than a
> > > number as far as the XIVE is concerned.
> > > 
> > > Hm.. except that doesn't quite work, because the hardware must define
> > > which end that generation bit ends up in...
> > 
> > It also needs to be written atomically. Just say it's big endian.
> 
> Also the guest reads it using be32_to_cpup...

Ok.  Definitely should be treated as BE and read/written with the be32
DMA helper functions.
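
As a rough illustration of "treat it as BE", the sketch below builds one queue entry and stores it most significant byte first. The layout (generation bit in the top bit, 31 bits of data below it) is an assumption for the example, and store_be32(), load_be32() and eq_entry() are hypothetical names, not QEMU helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value most significant byte first, whatever the host is. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Read back a big-endian 32-bit value, as the guest would. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumed entry layout: generation bit on top, 31 bits of event data. */
static uint32_t eq_entry(unsigned gen, uint32_t data)
{
    assert(gen <= 1);
    return ((uint32_t)gen << 31) | (data & 0x7fffffffu);
}
```

The byte-wise store only demonstrates the byte order. It does not model the atomicity requirement raised above, which needs a single 32-bit write.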

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-12-01  8:10         ` Cédric Le Goater
@ 2017-12-04  1:59           ` David Gibson
  2017-12-04  8:32             ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-04  1:59 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Fri, Dec 01, 2017 at 09:10:24AM +0100, Cédric Le Goater wrote:
> On 12/01/2017 05:14 AM, David Gibson wrote:
> > On Thu, Nov 30, 2017 at 03:15:09PM +0000, Cédric Le Goater wrote:
> >> On 11/30/2017 05:55 AM, David Gibson wrote:
> >>> On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote:
> >>>> The XIVE object is designed to be always available, so it is created
> >>>> unconditionally on newer machines.
> >>>
> >>> There doesn't actually seem to be anything dependent on machine
> >>> version here.
> >>
> >> No. I thought that was too early in the patchset. This is handled 
> >> in the last patch with a 'xive_exploitation' bool which is set to 
> >> false on older machines. 
> >>
> >> But, nevertheless, the XIVE objects are always created even if not
> >> used. Something to discuss.
> > 
> > That'll definitely break backwards migration, since the destination
> > won't understand the (unused but still present) xive state it
> > receives. 
> 
> no because it's not sent. the vmstate 'needed' op of the sPAPRXive
> object discards it :
> 
>     static bool vmstate_spapr_xive_needed(void *opaque)
>     {
>         sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  
>         return spapr->xive_exploitation;
>     }

Ah, sorry, missed that.  Once we have negotiation we'll need to make
sure the xive_exploitation bit is sent first, of course, but I'm
pretty sure the machine state is already sent first.
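
For readers less familiar with the 'needed' mechanism, here is a toy model of its effect on the stream. It is not QEMU's vmstate code: ModelMachine, ModelSection and model_sections_sent() are hypothetical names, and the only point is that a section whose predicate returns false never reaches the destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the machine state flag discussed above. */
typedef struct ModelMachine {
    bool xive_exploitation;
} ModelMachine;

typedef struct ModelSection {
    const char *name;
    bool (*needed)(const ModelMachine *m);   /* NULL means always sent */
} ModelSection;

/* Same idea as vmstate_spapr_xive_needed(): send only when enabled. */
static bool model_xive_needed(const ModelMachine *m)
{
    return m->xive_exploitation;
}

/* Count the sections that would be put on the wire for this machine. */
static size_t model_sections_sent(const ModelSection *s, size_t n,
                                  const ModelMachine *m)
{
    size_t i, sent = 0;

    assert(s != NULL && m != NULL);
    for (i = 0; i < n; i++) {
        if (!s[i].needed || s[i].needed(m)) {
            sent++;
        }
    }
    return sent;
}
```

With the flag false, only the unconditional section is counted, so an older destination is never handed state it cannot parse.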

> > So xives can only be created on new machine types. 
> 
> That would be better I agree. I can probably use the 'xive_exploitation'
> bool to condition its creation.

Hrm.  I'm less sure about that - I'm not sure the lifetimes line up.
But I'd like to avoid creating them on earlier machine types, even if
xive_exploitation can't do the trick.

> 
> > I'm ok
> > (at least tentatively) with always creating them on the newer machine
> > types, regardless of whether the guest ends up exploiting it or not.
> 
> OK.
> 
> 
> >>>> Depending on the configuration and
> >>>> the guest capabilities, the CAS negotiation process will decide which
> >>>> interrupt model to use, legacy or XIVE.
> >>>>
> >>>> The XIVE model makes use of the full range of the IRQ number space
> >>>> because the IRQ numbers for the CPU IPIs are allocated in the range
> >>>> below XICS_IRQ_BASE, which is unused by XICS.
> >>>
> >>> Ok.  And I take it 4096 is enough space for the XIVE IPIs for the
> >>> foreseeable future?
> >>
> >> The biggest real system I am aware of has 16 sockets, 192 cores, SMT8.
> >> That's 1536 cpus. pseries has a max_cpus of 1024.
> > 
> > Ok, so we can go to double the current system size, but not 4x.  Not
> > sure if that seems adequate or not.  Still it's a relatively minor
> > detail.
> > 
> 
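
The sizing argument above is plain arithmetic, sketched here with hypothetical helper names. The 4096 figure is the IPI range quoted in this thread, and the core and SMT counts are the ones Cédric gives.

```c
#include <assert.h>
#include <stdint.h>

#define IPI_RANGE 4096u   /* IRQ numbers below XICS_IRQ_BASE, per the thread */

/* Hardware threads for a given core count and SMT mode. */
static uint32_t hw_threads(uint32_t cores, uint32_t smt)
{
    return cores * smt;
}

/* True when every CPU can get an IPI number inside the range. */
static int fits_ipi_range(uint32_t nr_cpus)
{
    return nr_cpus <= IPI_RANGE;
}

/* How many whole times the given CPU count fits in the IPI range. */
static uint32_t ipi_headroom(uint32_t nr_cpus)
{
    assert(nr_cpus > 0);
    return IPI_RANGE / nr_cpus;
}
```

192 cores at SMT8 is 1536 threads. Twice that is 3072, which fits in 4096. Four times is 6144, which does not.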

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 20/25] spapr: add device tree support for the XIVE interrupt mode
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 20/25] spapr: add device tree " Cédric Le Goater
@ 2017-12-04  7:49   ` David Gibson
  2017-12-04 16:19     ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-04  7:49 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:50PM +0100, Cédric Le Goater wrote:
> The XIVE interface for the guest is described in the device tree under
> the "interrupt-controller" node. A couple of new properties are
> specific to XIVE :
> 
>  - "reg"
> 
>    contains the base address and size of the thread interrupt
>    management areas (TIMA), also called rings, for the User level and
>    for the Guest OS level. Only the Guest OS level is taken into
>    account today.
> 
>  - "ibm,xive-eq-sizes"
> 
>    the size of the event queues. One cell per size supported, contains
>    log2 of size, in ascending order.
> 
>  - "ibm,xive-lisn-ranges"
> 
>    the interrupt number ranges assigned to the guest. These are
>    allocated using a simple bitmap.
> 
> and also under the root node :
> 
>  - "ibm,plat-res-int-priorities"
> 
>    contains a list of priorities that the hypervisor has reserved for
>    its own use. Simulate ranges as defined by the PowerVM Hypervisor.
> 
> When the XIVE interrupt mode is activated after the CAS negotiation,
> the machine will perform a reboot to rebuild the device tree.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive_hcall.c  | 50 +++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr.c              |  7 ++++++-
>  hw/ppc/spapr_hcall.c        |  6 ++++++
>  include/hw/ppc/spapr_xive.h |  2 ++
>  4 files changed, 64 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
> index 676fe0e2d5c7..60c6c9f4be8f 100644
> --- a/hw/intc/spapr_xive_hcall.c
> +++ b/hw/intc/spapr_xive_hcall.c
> @@ -883,3 +883,53 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>  }
> +
> +void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers,
> +                         void *fdt, uint32_t phandle)

Call it spapr_dt_xive() please, I'm trying to standardize on that
pattern for functions creating DT pieces.

> +{
> +    sPAPRXive *xive = spapr->xive;
> +    int node;
> +    uint64_t timas[2 * 2];
> +    uint32_t lisn_ranges[] = {
> +        cpu_to_be32(0),
> +        cpu_to_be32(nr_servers),
> +    };
> +    uint32_t eq_sizes[] = {
> +        cpu_to_be32(12), /* 4K */
> +        cpu_to_be32(16), /* 64K */
> +        cpu_to_be32(21), /* 2M */
> +        cpu_to_be32(24), /* 16M */
> +    };
> +    uint32_t plat_res_int_priorities[ARRAY_SIZE(reserved_priorities)];
> +    int i;
> +
> +    for (i = 0; i < ARRAY_SIZE(plat_res_int_priorities); i++) {
> +        plat_res_int_priorities[i] = cpu_to_be32(reserved_priorities[i]);
> +    }
> +
> +    /* Thread Interrupt Management Areas : User and OS */
> +    for (i = 0; i < 2; i++) {
> +        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
> +        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
> +    }
> +
> +    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));

You need a unit address here matching the reg property.
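
A minimal sketch of what that could look like, assuming the unit address is the first address in "reg" (here the base of the TIMA region) printed in lowercase hex without leading zeros. dt_node_name() is a hypothetical helper, not an existing QEMU or libfdt function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "<name>@<addr>", where addr is the first address of the node's
 * "reg" property. Returns the name length, or -1 if buf is too small.
 */
static int dt_node_name(char *buf, size_t len, const char *name, uint64_t addr)
{
    int n;

    assert(buf != NULL && name != NULL);
    n = snprintf(buf, len, "%s@%" PRIx64, name, addr);

    return (n > 0 && (size_t)n < len) ? n : -1;
}
```

The resulting string would then be passed to fdt_add_subnode() in place of the bare "interrupt-controller".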

> +
> +    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));

You don't need to set name properties explicitly for flattened trees.

> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
> +
> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
> +                     sizeof(eq_sizes)));
> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
> +                     sizeof(lisn_ranges)));
> +
> +    /* For SLOF */
> +    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
> +    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
> +
> +    /* top properties */
> +    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
> +                     plat_res_int_priorities, sizeof(plat_res_int_priorities)));
> +}
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 8b15c0b500d0..3a62369883cc 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1127,7 +1127,12 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>      _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
>  
>      /* /interrupt controller */
> -    spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
> +    } else {
> +        /* Populate device tree for XIVE */
> +        spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
> +    }
>  
>      ret = spapr_populate_memory(spapr, fdt);
>      if (ret < 0) {
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index be22a6b2895f..e2a1665beee9 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1646,6 +1646,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>              (spapr_h_cas_compose_response(spapr, args[1], args[2],
>                                            ov5_updates) != 0);
>      }
> +
> +    /* We need to rebuild the device tree for XIVE, generate a reset */
> +    if (!spapr->cas_reboot) {
> +        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT);
> +    }
> +
>      spapr_ovec_cleanup(ov5_updates);
>  
>      if (spapr->cas_reboot) {
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 3f822220647f..f6d4bf26e06a 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -82,5 +82,7 @@ void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
>  typedef struct sPAPRMachineState sPAPRMachineState;
>  
>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> +void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers, void *fdt,
> +                         uint32_t phandle);
>  
>  #endif /* PPC_SPAPR_XIVE_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 21/25] spapr: introduce a helper to map the XIVE memory regions
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 21/25] spapr: introduce a helper to map the XIVE memory regions Cédric Le Goater
@ 2017-12-04  7:52   ` David Gibson
  2017-12-04 15:30     ` Cédric Le Goater
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-04  7:52 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:51PM +0100, Cédric Le Goater wrote:
> When the XIVE interrupt mode is activated, the machine needs to expose
> to the guest the MMIO regions used by the controller :
> 
>   - Event State Buffer (ESB)
>   - Thread Interrupt Management Area (TIMA)
> 
> Migration will also need to reflect the current interrupt mode in use.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive_hcall.c  | 14 ++++++++++++++
>  hw/ppc/spapr.c              |  5 +++++
>  include/hw/ppc/spapr_xive.h |  1 +
>  3 files changed, 20 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
> index 60c6c9f4be8f..ba217144878e 100644
> --- a/hw/intc/spapr_xive_hcall.c
> +++ b/hw/intc/spapr_xive_hcall.c
> @@ -933,3 +933,17 @@ void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers,
>      _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
>                       plat_res_int_priorities, sizeof(plat_res_int_priorities)));
>  }
> +
> +void spapr_xive_mmio_map(sPAPRMachineState *spapr)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +
> +    /* ESBs */
> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->esb_base);
> +
> +    /* Thread Management Interrupt Areas */
> +    /* TODO: Only map the OS TIMA for the moment. Mapping the whole
> +     * region needs some rework in the handlers */
> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1,
> +                    xive->tm_base + (1 << xive->tm_shift));

You probably shouldn't be exposing the user TIMA in the DT if you're
only allowing the OS TIMA to be mapped.

> +}
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3a62369883cc..734706c18cb3 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1132,6 +1132,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>      } else {
>          /* Populate device tree for XIVE */
>          spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
> +        spapr_xive_mmio_map(spapr);

This doesn't belong here, spapr_build_fdt() should _just_ build the
fdt, not have side effects on the actual device state.

>      }
>  
>      ret = spapr_populate_memory(spapr, fdt);
> @@ -1613,6 +1614,10 @@ static int spapr_post_load(void *opaque, int version_id)
>          }
>      }
>  
> +    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        spapr_xive_mmio_map(spapr);
> +    }
> +
>      return err;
>  }
>  
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index f6d4bf26e06a..88355f7eb643 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -84,5 +84,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>  void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers, void *fdt,
>                           uint32_t phandle);
> +void spapr_xive_mmio_map(sPAPRMachineState *spapr);
>  
>  #endif /* PPC_SPAPR_XIVE_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 22/25] spapr: add XIVE support to spapr_irq_get_qirq()
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 22/25] spapr: add XIVE support to spapr_irq_get_qirq() Cédric Le Goater
@ 2017-12-04  7:52   ` David Gibson
  0 siblings, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-12-04  7:52 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:52PM +0100, Cédric Le Goater wrote:
> The XIVE object has its own set of qirqs which is to be used when the
> XIVE interrupt mode is activated.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/ppc/spapr.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 734706c18cb3..a91ec1c0751a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3746,8 +3746,12 @@ qemu_irq spapr_irq_get_qirq(sPAPRMachineState *spapr, int irq)
>  {
>      ICSState *ics = spapr->ics;
>  
> -    if (ics_valid_irq(ics, irq)) {
> -        return ics->qirqs[irq - ics->offset];
> +    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return spapr->xive->qirqs[irq];

You should have a xive helper function for this - spapr code shouldn't
be reaching into the internal XIVE structure.
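
Something along these lines, as a standalone sketch with stand-in types. ModelXive, ModelIrq and model_xive_qirq() are hypothetical names, and the real helper would take a sPAPRXive and return a qemu_irq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for qemu_irq in this model. */
typedef struct ModelIrq {
    int level;
} ModelIrq;

/* Stand-in for the parts of the XIVE object that matter here. */
typedef struct ModelXive {
    uint32_t nr_irqs;
    ModelIrq *qirqs;
} ModelXive;

/* Hypothetical accessor: returns NULL instead of indexing out of bounds. */
static ModelIrq *model_xive_qirq(ModelXive *xive, int lisn)
{
    assert(xive != NULL);
    if (lisn < 0 || (uint32_t)lisn >= xive->nr_irqs) {
        return NULL;
    }
    return &xive->qirqs[lisn];
}
```

Besides hiding the structure, an accessor gives one place for the range check. The XICS branch has one via ics_valid_irq(), while the quoted XIVE branch indexes qirqs[] directly.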

> +    } else {
> +        if (ics_valid_irq(ics, irq)) {
> +            return ics->qirqs[irq - ics->offset];
> +        }
>      }
>  
>      return NULL;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 23/25] spapr: toggle the ICP depending on the selected interrupt mode
  2017-11-23 13:29 ` [Qemu-devel] [PATCH 23/25] spapr: toggle the ICP depending on the selected interrupt mode Cédric Le Goater
@ 2017-12-04  7:56   ` David Gibson
  0 siblings, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-12-04  7:56 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Thu, Nov 23, 2017 at 02:29:53PM +0100, Cédric Le Goater wrote:
> Each interrupt mode has its own specific interrupt presenter object,
> that we store under the CPU object, one for XICS and one for XIVE. The
> active presenter, corresponding to the current interrupt mode, is
> simply selected with a lookup on the children of the CPU.
> 
> Migration and CPU hotplug also need to reflect the current interrupt
> mode in use.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/ppc/spapr.c                  | 21 ++++++++++++++++++++-
>  hw/ppc/spapr_cpu_core.c         | 31 +++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_cpu_core.h |  1 +
>  3 files changed, 52 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a91ec1c0751a..b7389dbdf5ca 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1128,8 +1128,10 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>  
>      /* /interrupt controller */
>      if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        spapr_cpu_core_set_icp(spapr->icp_type);
>          spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
>      } else {
> +        spapr_cpu_core_set_icp(TYPE_SPAPR_XIVE_ICP);

Again you shouldn't have non-DT side-effects from spapr_build_fdt().

>          /* Populate device tree for XIVE */
>          spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
>          spapr_xive_mmio_map(spapr);
> @@ -1615,6 +1617,7 @@ static int spapr_post_load(void *opaque, int version_id)
>      }
>  
>      if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        spapr_cpu_core_set_icp(TYPE_SPAPR_XIVE_ICP);
>          spapr_xive_mmio_map(spapr);
>      }
>  
> @@ -3610,7 +3613,7 @@ static ICPState *spapr_icp_get(XICSFabric *xi, int vcpu_id)
>  Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
>  {
>      Error *local_err = NULL;
> -    Object *obj;
> +    Object *obj, *obj_xive;
>  
>      obj = icp_create(cs, spapr->icp_type, XICS_FABRIC(spapr), &local_err);
>      if (local_err) {
> @@ -3618,6 +3621,22 @@ Object *spapr_icp_create(sPAPRMachineState *spapr, CPUState *cs, Error **errp)
>          return NULL;
>      }
>  
> +    /* Add a XIVE interrupt presenter. The machine will switch the CPU
> +     * ICP depending on the interrupt model negotiated at CAS time.
> +     */
> +    obj_xive = icp_create(cs, TYPE_SPAPR_XIVE_ICP, XICS_FABRIC(spapr),
> +                          &local_err);

You shouldn't be using icp_create(), a xics function, for xive.

> +    if (local_err) {
> +        object_unparent(obj);
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +
> +    /* when hotplugged, the CPU should have the correct ICP */
> +    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return obj_xive;
> +    }
> +
>      return obj;
>  }
>  
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 61a9850e688b..b0e39270f262 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -257,3 +257,34 @@ static const TypeInfo spapr_cpu_core_type_infos[] = {
>  };
>  
>  DEFINE_TYPES(spapr_cpu_core_type_infos)
> +
> +typedef struct ForeachFindICPArgs {
> +    const char *icp_type;
> +    Object *icp;
> +} ForeachFindICPArgs;
> +
> +static int spapr_cpu_core_find_icp(Object *child, void *opaque)
> +{
> +    ForeachFindICPArgs *args = opaque;
> +
> +    if (object_dynamic_cast(child, args->icp_type)) {
> +        args->icp = child;
> +    }
> +
> +    return args->icp != NULL;
> +}
> +
> +void spapr_cpu_core_set_icp(const char *icp_type)
> +{
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        ForeachFindICPArgs args = { icp_type, NULL };
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +        object_child_foreach(OBJECT(cs), spapr_cpu_core_find_icp, &args);
> +        g_assert(args.icp);
> +
> +        cpu->intc = args.icp;
> +    }
> +}
> diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
> index f2d48d6a6786..a657dfb8863c 100644
> --- a/include/hw/ppc/spapr_cpu_core.h
> +++ b/include/hw/ppc/spapr_cpu_core.h
> @@ -38,4 +38,5 @@ typedef struct sPAPRCPUCoreClass {
>  } sPAPRCPUCoreClass;
>  
>  const char *spapr_get_cpu_core_type(const char *cpu_type);
> +void spapr_cpu_core_set_icp(const char *icp_type);
>  #endif

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-12-04  1:59           ` David Gibson
@ 2017-12-04  8:32             ` Cédric Le Goater
  2017-12-04  8:40               ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-04  8:32 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/04/2017 02:59 AM, David Gibson wrote:
> On Fri, Dec 01, 2017 at 09:10:24AM +0100, Cédric Le Goater wrote:
>> On 12/01/2017 05:14 AM, David Gibson wrote:
>>> On Thu, Nov 30, 2017 at 03:15:09PM +0000, Cédric Le Goater wrote:
>>>> On 11/30/2017 05:55 AM, David Gibson wrote:
>>>>> On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote:
>>>>>> The XIVE object is designed to be always available, so it is created
>>>>>> unconditionally on newer machines.
>>>>>
>>>>> There doesn't actually seem to be anything dependent on machine
>>>>> version here.
>>>>
>>>> No. I thought that was too early in the patchset. This is handled 
>>>> in the last patch with a 'xive_exploitation' bool which is set to 
>>>> false on older machines. 
>>>>
>>>> But, nevertheless, the XIVE objects are always created even if not
>>>> used. Something to discuss.
>>>
>>> That'll definitely break backwards migration, since the destination
>>> won't understand the (unused but still present) xive state it
>>> receives. 
>>
>> no because it's not sent. the vmstate 'needed' op of the sPAPRXive
>> object discards it :
>>
>>     static bool vmstate_spapr_xive_needed(void *opaque)
>>     {
>>         sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>  
>>         return spapr->xive_exploitation;
>>     }
> 
> Ah, sorry, missed that.  Once we have negotiation we'll need to make
> sure the xive_exploitation bit is sent first, of course, but I'm
> pretty sure the machine state is already sent first.
> 
>>> So xives can only be created on new machine types. 
>>
>> That would be better I agree. I can probably use the 'xive_exploitation'
>> bool to condition its creation.
> 
> Hrm.  I'm less sure about that - I'm not sure the lifetimes line up.
> But I'd like to avoid creating them on earlier machine types, even if
> xive_exploitation can't do the trick.

Yes. I agree. I think we can work something out without introducing
too much complexity. The XIVE object is directly used by the
machine only to set/unset IRQ numbers. Otherwise, it is always 
conditioned by :

    spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)

I think adding a couple more tests on the 'xive_exploitation'
bool should work out for older machines.

Thanks,
 
C.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
  2017-12-04  8:32             ` Cédric Le Goater
@ 2017-12-04  8:40               ` David Gibson
  0 siblings, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-12-04  8:40 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Mon, Dec 04, 2017 at 09:32:00AM +0100, Cédric Le Goater wrote:
> On 12/04/2017 02:59 AM, David Gibson wrote:
> > On Fri, Dec 01, 2017 at 09:10:24AM +0100, Cédric Le Goater wrote:
> >> On 12/01/2017 05:14 AM, David Gibson wrote:
> >>> On Thu, Nov 30, 2017 at 03:15:09PM +0000, Cédric Le Goater wrote:
> >>>> On 11/30/2017 05:55 AM, David Gibson wrote:
> >>>>> On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote:
> >>>>>> The XIVE object is designed to be always available, so it is created
> >>>>>> unconditionally on newer machines.
> >>>>>
> >>>>> There doesn't actually seem to be anything dependent on machine
> >>>>> version here.
> >>>>
> >>>> No. I thought that was too early in the patchset. This is handled 
> >>>> in the last patch with a 'xive_exploitation' bool which is set to 
> >>>> false on older machines. 
> >>>>
> >>>> But, nevertheless, the XIVE objects are always created even if not
> >>>> used. Something to discuss.
> >>>
> >>> That'll definitely break backwards migration, since the destination
> >>> won't understand the (unused but still present) xive state it
> >>> receives. 
> >>
> >> no because it's not sent. the vmstate 'needed' op of the sPAPRXive
> >> object discards it :
> >>
> >>     static bool vmstate_spapr_xive_needed(void *opaque)
> >>     {
> >>         sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >>  
> >>         return spapr->xive_exploitation;
> >>     }
> > 
> > Ah, sorry, missed that.  Once we have negotiation we'll need to make
> > sure the xive_exploitation bit is sent first, of course, but I'm
> > pretty sure the machine state is already sent first.
> > 
> >>> So xives can only be created on new machine types. 
> >>
> >> That would be better I agree. I can probably use the 'xive_exploitation'
> >> bool to condition its creation.
> > 
> > Hrm.  I'm less sure about that - I'm not sure the lifetimes line up.
> > But I'd like to avoid creating them on earlier machine types, even if
> > xive_exploitation can't do the trick.
> 
> Yes. I agree. I think we can work something out without introducing
> too much complexity. The XIVE object is directly used by the
> machine only to set/unset IRQ numbers. Otherwise, it is always 
> conditioned by :
> 
>     spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)
> 
> I think adding a couple more tests on the 'xive_exploitation'
> bool should work out for older machines.

Ok.  If not you can always add a "xive_possible" flag to the MachineClass.
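
To make the idea concrete, here is a minimal sketch of how such a flag
could gate both the creation of the object and the vmstate 'needed'
test. All the Demo* types and demo_* functions are hypothetical
stand-ins, not the QEMU ones, and 'xive_possible' is only the name
floated above, not an existing field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the machine class. */
typedef struct {
    bool xive_possible;      /* assumed per-machine-type capability flag */
} DemoMachineClass;

/* Hypothetical stand-in for the machine state. */
typedef struct {
    const DemoMachineClass *mc;
    bool xive_exploitation;  /* interrupt mode actually in use */
    bool xive_created;       /* whether the XIVE object exists at all */
} DemoMachine;

/* Create the XIVE object only on machine types that can ever use it. */
static void demo_machine_init(DemoMachine *m, const DemoMachineClass *mc)
{
    assert(m != NULL && mc != NULL);

    m->mc = mc;
    m->xive_exploitation = false;
    m->xive_created = mc->xive_possible;
}

/* Same shape as a vmstate 'needed' callback: send state only if in use. */
static bool demo_xive_needed(void *opaque)
{
    DemoMachine *m = opaque;

    return m->xive_created && m->xive_exploitation;
}
```

With this shape an older machine type never instantiates the object,
and a newer one still sends no XIVE state unless the mode is in use.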

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 21/25] spapr: introduce a helper to map the XIVE memory regions
  2017-12-04  7:52   ` David Gibson
@ 2017-12-04 15:30     ` Cédric Le Goater
  2017-12-05  2:24       ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-04 15:30 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/04/2017 08:52 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:51PM +0100, Cédric Le Goater wrote:
>> When the XIVE interrupt mode is activated, the machine needs to expose
>> to the guest the MMIO regions used by the controller :
>>
>>   - Event State Buffer (ESB)
>>   - Thread Interrupt Management Area (TIMA)
>>
>> Migration will also need to reflect the current interrupt mode in use.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive_hcall.c  | 14 ++++++++++++++
>>  hw/ppc/spapr.c              |  5 +++++
>>  include/hw/ppc/spapr_xive.h |  1 +
>>  3 files changed, 20 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
>> index 60c6c9f4be8f..ba217144878e 100644
>> --- a/hw/intc/spapr_xive_hcall.c
>> +++ b/hw/intc/spapr_xive_hcall.c
>> @@ -933,3 +933,17 @@ void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers,
>>      _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
>>                       plat_res_int_priorities, sizeof(plat_res_int_priorities)));
>>  }
>> +
>> +void spapr_xive_mmio_map(sPAPRMachineState *spapr)
>> +{
>> +    sPAPRXive *xive = spapr->xive;
>> +
>> +    /* ESBs */
>> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->esb_base);
>> +
>> +    /* Thread Interrupt Management Areas */
>> +    /* TODO: Only map the OS TIMA for the moment. Mapping the whole
>> +     * region needs some rework in the handlers */
>> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1,
>> +                    xive->tm_base + (1 << xive->tm_shift));
> 
> You probably shouldn't be exposing the user TIMA in the DT if you're
> only allowing the OS TIMA to be mapped.

The specs require mapping both the User and OS TIMA.
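
For reference, a small sketch of the layout the DT advertises,
following the loop in spapr_xive_populate() : one page of
(1 << tm_shift) bytes per ring, User first and OS second. The Demo*
names are hypothetical and the base address used in the example is
made up. The sketch uses 1ULL so that the shift is done on 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* One (base, size) pair as it would appear in a "reg" property. */
typedef struct {
    uint64_t base;
    uint64_t size;
} DemoTimaRange;

enum { DEMO_TIMA_USER = 0, DEMO_TIMA_OS = 1, DEMO_TIMA_COUNT = 2 };

/*
 * Compute the User and OS TIMA ranges. Each ring covers one page of
 * (1 << tm_shift) bytes and the OS ring follows the User ring.
 */
static void demo_tima_ranges(uint64_t tm_base, unsigned tm_shift,
                             DemoTimaRange out[DEMO_TIMA_COUNT])
{
    assert(tm_shift < 64);

    for (int i = 0; i < DEMO_TIMA_COUNT; i++) {
        out[i].base = tm_base + (uint64_t)i * (1ULL << tm_shift);
        out[i].size = 1ULL << tm_shift;
    }
}
```

With a made-up base of 0x10000000 and a 64K page (tm_shift = 16), the
OS ring would start at 0x10010000, which is the address the helper
maps today.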

> 
>> +}
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 3a62369883cc..734706c18cb3 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1132,6 +1132,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>>      } else {
>>          /* Populate device tree for XIVE */
>>          spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
>> +        spapr_xive_mmio_map(spapr);
> 
> This doesn't belong here, spapr_build_fdt() should _just_ build the
> fdt, not have side effects on the actual device state.

Yes. I will move the rest of the XIVE setup into the reset handler
before the device tree is built.

Thanks,

C.  

>>      }
>>  
>>      ret = spapr_populate_memory(spapr, fdt);
>> @@ -1613,6 +1614,10 @@ static int spapr_post_load(void *opaque, int version_id)
>>          }
>>      }
>>  
>> +    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        spapr_xive_mmio_map(spapr);
>> +    }
>> +
>>      return err;
>>  }
>>  
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index f6d4bf26e06a..88355f7eb643 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -84,5 +84,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>>  void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers, void *fdt,
>>                           uint32_t phandle);
>> +void spapr_xive_mmio_map(sPAPRMachineState *spapr);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
> 


* Re: [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged
  2017-12-04  1:17       ` David Gibson
@ 2017-12-04 16:09         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-04 16:09 UTC (permalink / raw)
  To: David Gibson; +Cc: Cédric Le Goater, qemu-ppc, qemu-devel

On Mon, 2017-12-04 at 12:17 +1100, David Gibson wrote:
> On Sat, Dec 02, 2017 at 08:40:58AM -0600, Benjamin Herrenschmidt wrote:
> > On Thu, 2017-11-30 at 16:00 +1100, David Gibson wrote:
> > > 
> > > >  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
> > > >  {
> > > > -    return 0;
> > > > +    uint8_t nsr = icp->tima_os[TM_NSR];
> > > > +
> > > > +    qemu_irq_lower(icp->output);
> > > > +
> > > > +    if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
> > > > +        uint8_t cppr = icp->tima_os[TM_PIPR];
> > > > +
> > > > +        icp->tima_os[TM_CPPR] = cppr;
> > > > +
> > > > +        /* Reset the pending buffer bit */
> > > > +        icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
> > > 
> > > What if multiple irqs of the same priority were queued?
> > 
> > It's the job of the OS to handle that case by consuming from the queue
> > until it's empty. There is an MMIO the guest can use if it wants to
> > that can set the IPB bits back to 1 for a given priority. Otherwise in
> > Linux we just have a SW way to force a replay.
> 
> Ok, so "accept" is effectively saying the OS is accepting all
> interrupts from that queue, right?

It's whatever you want it to mean. It's simply a test & clear on the
prio bit. From a HW standpoint, you could have multiple queues or just
set an internal SW flag to go check again later etc...

> 
> > 
> > > > +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> > > > +
> > > > +        /* Drop Exception bit for OS */
> > > > +        icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
> > > > +    }
> > > > +
> > > > +    return (nsr << 8) | icp->tima_os[TM_CPPR];
> > > > +}
> > > > +
> > > > +static void spapr_xive_icp_notify(sPAPRXiveICP *icp)
> > > > +{
> > > > +    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
> > > > +        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
> > > > +        qemu_irq_raise(icp->output);
> > > > +    }
> > > >  }
> > > >  
> > > >  static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
> > > > @@ -51,6 +105,9 @@ static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
> > > >      }
> > > >  
> > > >      icp->tima_os[TM_CPPR] = cppr;
> > > > +
> > > > +    /* CPPR has changed, inform the ICP which might raise an exception */
> > > > +    spapr_xive_icp_notify(icp);
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -224,6 +281,8 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > > >      XiveEQ *eq;
> > > >      uint32_t eq_idx;
> > > >      uint8_t priority;
> > > > +    uint32_t server;
> > > > +    sPAPRXiveICP *icp;
> > > >  
> > > >      ive = spapr_xive_get_ive(xive, lisn);
> > > >      if (!ive || !(ive->w & IVE_VALID)) {
> > > > @@ -253,6 +312,13 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > > >          qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
> > > >      }
> > > >  
> > > > +    server = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
> > > > +    icp = spapr_xive_icp_get(xive, server);
> > > > +    if (!icp) {
> > > > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for server %d\n", server);
> > > > +        return;
> > > > +    }
> > > > +
> > > >      if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
> > > >          priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> > > >  
> > > > @@ -260,9 +326,18 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > > >          if (priority == 0xff) {
> > > >              g_assert_not_reached();
> > > >          }
> > > > +
> > > > +        /* Update the IPB (Interrupt Pending Buffer) with the priority
> > > > +         * of the new notification and inform the ICP, which will
> > > > +         * decide to raise the exception, or not, depending on the CPPR.
> > > > +         */
> > > > +        icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
> > > > +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> > > >      } else {
> > > >          qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
> > > >      }
> > > > +
> > > > +    spapr_xive_icp_notify(icp);
> > > >  }
> > > >  
> > > >  /*
> > > 
> > > 
> 
> 


* Re: [Qemu-devel] [PATCH 20/25] spapr: add device tree support for the XIVE interrupt mode
  2017-12-04  7:49   ` David Gibson
@ 2017-12-04 16:19     ` Cédric Le Goater
  2017-12-05  3:38       ` David Gibson
  0 siblings, 1 reply; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-04 16:19 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/04/2017 08:49 AM, David Gibson wrote:
> On Thu, Nov 23, 2017 at 02:29:50PM +0100, Cédric Le Goater wrote:
>> The XIVE interface for the guest is described in the device tree under
>> the "interrupt-controller" node. A couple of new properties are
>> specific to XIVE :
>>
>>  - "reg"
>>
>>    contains the base address and size of the thread interrupt
>>    management areas (TIMA), also called rings, for the User level and
>>    for the Guest OS level. Only the Guest OS level is taken into
>>    account today.
>>
>>  - "ibm,xive-eq-sizes"
>>
>>    the size of the event queues. One cell per size supported, contains
>>    log2 of size, in ascending order.
>>
>>  - "ibm,xive-lisn-ranges"
>>
>>    the interrupt numbers ranges assigned to the guest. These are
>>    allocated using a simple bitmap.
>>
>> and also under the root node :
>>
>>  - "ibm,plat-res-int-priorities"
>>
>>    contains a list of priorities that the hypervisor has reserved for
>>    its own use. Simulate ranges as defined by the PowerVM Hypervisor.
>>
>> When the XIVE interrupt mode is activated after the CAS negotiation,
>> the machine will perform a reboot to rebuild the device tree.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive_hcall.c  | 50 +++++++++++++++++++++++++++++++++++++++++++++
>>  hw/ppc/spapr.c              |  7 ++++++-
>>  hw/ppc/spapr_hcall.c        |  6 ++++++
>>  include/hw/ppc/spapr_xive.h |  2 ++
>>  4 files changed, 64 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
>> index 676fe0e2d5c7..60c6c9f4be8f 100644
>> --- a/hw/intc/spapr_xive_hcall.c
>> +++ b/hw/intc/spapr_xive_hcall.c
>> @@ -883,3 +883,53 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>>  }
>> +
>> +void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers,
>> +                         void *fdt, uint32_t phandle)
> 
> Call it spapr_dt_xive() please, I'm trying to standardize on that
> pattern for functions creating DT pieces.

OK. And what about the first argument : sPAPRMachineState *spapr 
or sPAPRXive *xive ? I tend to prefer the first option because
it's related to the interface with the guest, like the hcalls.

> 
>> +{
>> +    sPAPRXive *xive = spapr->xive;
>> +    int node;
>> +    uint64_t timas[2 * 2];
>> +    uint32_t lisn_ranges[] = {
>> +        cpu_to_be32(0),
>> +        cpu_to_be32(nr_servers),
>> +    };
>> +    uint32_t eq_sizes[] = {
>> +        cpu_to_be32(12), /* 4K */
>> +        cpu_to_be32(16), /* 64K */
>> +        cpu_to_be32(21), /* 2M */
>> +        cpu_to_be32(24), /* 16M */
>> +    };
>> +    uint32_t plat_res_int_priorities[ARRAY_SIZE(reserved_priorities)];
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(plat_res_int_priorities); i++) {
>> +        plat_res_int_priorities[i] = cpu_to_be32(reserved_priorities[i]);
>> +    }
>> +
>> +    /* Thread Interrupt Management Areas : User and OS */
>> +    for (i = 0; i < 2; i++) {
>> +        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
>> +        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
>> +    }
>> +
>> +    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
> 
> You need a unit address here matching the reg property.

Indeed. I didn't notice. Curiously it was taking the first address 
specified in the reg property of the node.
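
Something along these lines should do for the node name. This is only
a standalone sketch: demo_dt_node_name() is a hypothetical helper, and
the unit address would be the first address of the reg property, i.e.
the TIMA base.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "<name>@<unit-address>", with the unit address in lowercase
 * hex and no "0x" prefix, as device tree node names conventionally do.
 * Returns the snprintf() result: the length needed, excluding the NUL.
 */
static int demo_dt_node_name(char *buf, size_t len, const char *name,
                             uint64_t reg_base)
{
    assert(buf != NULL && name != NULL);

    return snprintf(buf, len, "%s@%" PRIx64, name, reg_base);
}
```

The resulting string would then be passed to fdt_add_subnode() instead
of the bare "interrupt-controller" name.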

>> +
>> +    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));
> 
> You don't need to set name properties explicitly for flattened trees.

OK.

Thanks,

C. 



>> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
>> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
>> +
>> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
>> +                     sizeof(eq_sizes)));
>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
>> +                     sizeof(lisn_ranges)));
>> +
>> +    /* For SLOF */
>> +    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
>> +    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
>> +
>> +    /* top properties */
>> +    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
>> +                     plat_res_int_priorities, sizeof(plat_res_int_priorities)));
>> +}
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 8b15c0b500d0..3a62369883cc 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1127,7 +1127,12 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>>      _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
>>  
>>      /* /interrupt controller */
>> -    spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
>> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
>> +    } else {
>> +        /* Populate device tree for XIVE */
>> +        spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
>> +    }
>>  
>>      ret = spapr_populate_memory(spapr, fdt);
>>      if (ret < 0) {
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index be22a6b2895f..e2a1665beee9 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -1646,6 +1646,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>>              (spapr_h_cas_compose_response(spapr, args[1], args[2],
>>                                            ov5_updates) != 0);
>>      }
>> +
>> +    /* We need to rebuild the device tree for XIVE, generate a reset */
>> +    if (!spapr->cas_reboot) {
>> +        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT);
>> +    }
>> +
>>      spapr_ovec_cleanup(ov5_updates);
>>  
>>      if (spapr->cas_reboot) {
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 3f822220647f..f6d4bf26e06a 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -82,5 +82,7 @@ void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>  
>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>> +void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers, void *fdt,
>> +                         uint32_t phandle);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
> 


* Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues
  2017-12-04  1:09           ` David Gibson
@ 2017-12-04 16:31             ` Cédric Le Goater
  0 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-04 16:31 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/04/2017 02:09 AM, David Gibson wrote:
> On Fri, Dec 01, 2017 at 05:36:39PM +0100, Cédric Le Goater wrote:
>> On 12/01/2017 12:35 AM, David Gibson wrote:
>>> On Thu, Nov 30, 2017 at 02:06:27PM +0000, Cédric Le Goater wrote:
>>>> On 11/30/2017 04:38 AM, David Gibson wrote:
>>>>> On Thu, Nov 23, 2017 at 02:29:43PM +0100, Cédric Le Goater wrote:
>>>>>> The Event Queue Descriptor (EQD) table, also known as Event Notification
>>>>>> Descriptor (END), is one of the internal tables the XIVE interrupt
>>>>>> controller uses to redirect exception from event sources to CPU
>>>>>> threads.
>>>>>>
>>>>>> The EQD specifies on which Event Queue the event data should be posted
>>>>>> when an exception occurs (later on pulled by the OS) and which server
>>>>>> (VPD in XIVE terminology) to notify. The Event Queue is a much more
>>>>>> complex structure but we start with a simple model for the sPAPR
>>>>>> machine.
>>>>>
>>>>> Just to clarify my understanding a server / VPD in XIVE would
>>>>> typically correspond to a cpu - either real or virtual, yes?
>>>>
>>>> yes. VP for "virtual processor" and VPD for "virtual processor 
>>>> descriptor" which contains the XIVE interrupt state of the VP 
>>>> when not dispatched. It is still described in some documentation 
>>>> as an NVT : Notification Virtual Target.  
>>>>
>>>> XIVE concepts were renamed at some point but the old names persisted.
>>>> I am still struggling my way through all the names.
>>>>
>>>>
>>>>>> There is one XiveEQ per priority and the model chooses to store them
>>>>>> under the Xive Interrupt presenter model. It will be retrieved, just
>>>>>> like for XICS, through the 'intc' object pointer of the CPU.
>>>>>>
>>>>>> The EQ indexing follows a simple pattern:
>>>>>>
>>>>>>        (server << 3) | (priority & 0x7)
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>>>> ---
>>>>>>  hw/intc/spapr_xive.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  hw/intc/xive-internal.h | 50 +++++++++++++++++++++++++++++++++++++++++++
>>>>>>  2 files changed, 106 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>>>> index 554b25e0884c..983317a6b3f6 100644
>>>>>> --- a/hw/intc/spapr_xive.c
>>>>>> +++ b/hw/intc/spapr_xive.c
>>>>>> @@ -23,6 +23,7 @@
>>>>>>  #include "sysemu/dma.h"
>>>>>>  #include "monitor/monitor.h"
>>>>>>  #include "hw/ppc/spapr_xive.h"
>>>>>> +#include "hw/ppc/spapr.h"
>>>>>>  #include "hw/ppc/xics.h"
>>>>>>  
>>>>>>  #include "xive-internal.h"
>>>>>> @@ -34,6 +35,8 @@ struct sPAPRXiveICP {
>>>>>>      uint8_t   tima[TM_RING_COUNT * 0x10];
>>>>>>      uint8_t   *tima_os;
>>>>>>      qemu_irq  output;
>>>>>> +
>>>>>> +    XiveEQ    eqt[XIVE_PRIORITY_MAX + 1];
>>>>>>  };
>>>>>>  
>>>>>>  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
>>>>>> @@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
>>>>>>      },
>>>>>>  };
>>>>>>  
>>>>>> +static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
>>>>>> +{
>>>>>> +    PowerPCCPU *cpu = spapr_find_cpu(server);
>>>>>> +
>>>>>> +    return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
>>>>>> +}
>>>>>> +
>>>>>>  static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>>>>>  {
>>>>>>  
>>>>>> @@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
>>>>>>      sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
>>>>>>  
>>>>>>      memset(xicp->tima, 0, sizeof(xicp->tima));
>>>>>> +
>>>>>> +    memset(xicp->eqt, 0, sizeof(xicp->eqt));
>>>>>>  }
>>>>>>  
>>>>>>  static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
>>>>>> @@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
>>>>>>      xicp->tima_os = &xicp->tima[TM_QW1_OS];
>>>>>>  }
>>>>>>  
>>>>>> +static const VMStateDescription vmstate_spapr_xive_icp_eq = {
>>>>>> +    .name = TYPE_SPAPR_XIVE_ICP "/eq",
>>>>>> +    .version_id = 1,
>>>>>> +    .minimum_version_id = 1,
>>>>>> +    .fields = (VMStateField []) {
>>>>>> +        VMSTATE_UINT32(w0, XiveEQ),
>>>>>> +        VMSTATE_UINT32(w1, XiveEQ),
>>>>>> +        VMSTATE_UINT32(w2, XiveEQ),
>>>>>> +        VMSTATE_UINT32(w3, XiveEQ),
>>>>>> +        VMSTATE_UINT32(w4, XiveEQ),
>>>>>> +        VMSTATE_UINT32(w5, XiveEQ),
>>>>>> +        VMSTATE_UINT32(w6, XiveEQ),
>>>>>> +        VMSTATE_UINT32(w7, XiveEQ),
>>>>>
>>>>> Wow.  Super descriptive field names there, but I guess that's not your fault.
>>>>
>>>> The defines in the "xive-internal.h" give a better view ... 
>>>>
>>>>>> +        VMSTATE_END_OF_LIST()
>>>>>> +    },
>>>>>> +};
>>>>>> +
>>>>>>  static bool vmstate_spapr_xive_icp_needed(void *opaque)
>>>>>>  {
>>>>>>      /* TODO check machine XIVE support */
>>>>>> @@ -696,6 +725,8 @@ static const VMStateDescription vmstate_spapr_xive_icp = {
>>>>>>      .needed = vmstate_spapr_xive_icp_needed,
>>>>>>      .fields = (VMStateField[]) {
>>>>>>          VMSTATE_BUFFER(tima, sPAPRXiveICP),
>>>>>> +        VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 1), 1,
>>>>>> +                             vmstate_spapr_xive_icp_eq, XiveEQ),
>>>>>>          VMSTATE_END_OF_LIST()
>>>>>>      },
>>>>>>  };
>>>>>> @@ -755,3 +786,28 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn)
>>>>>>      ive->w &= ~IVE_VALID;
>>>>>>      return true;
>>>>>>  }
>>>>>> +
>>>>>> +/*
>>>>>> + * Use a simple indexing for the EQs.
>>>>>
>>>>> Is this server+priority encoding architected anywhere?  
>>>>
>>>> no. This is a model shortcut.
>>>>
>>>>> Otherwise, why not use separate parameters?
>>>>
>>>> yes. spapr_xive_get_eq() could use separate parameters and it would
>>>> shorten some of the hcalls.
>>>>
>>>> The result is stored in a single field of the IVE, EQ_INDEX. So I will 
>>>> still need mangle/demangle routines but these could be simple macros.
>>>> I will look at it.
>>>
>>> Hm, ok.  So it's architected in the sense that you're using the
>>> encoding from the EQ_INDEX field throughout.  That could be a
>>> reasonable choice, I can't really tell yet.
>>>
>>> On the other hand, it might be easier to read if we use server and
>>> priority as separate parameters until the point we actually encode
>>> into the EQ_INDEX field.
>>
>> In the architecture, the EQ_INDEX field contains an index to an 
>> Event Queue Descriptor and the Event Queue Descriptor has a 
>> EQ_W6_NVT_INDEX field pointing to a Notification Virtual Target.
>> So there are two extra tables for the EQs and for the NVTs
>> used by the HW.
> 
> Ok.  In the PAPR interface is the EQ_INDEX ever exposed to the guest?

never. 

> Or does it just supply target/priority numbers and the hypervisor
> manages the mapping to queues internally?

Yes. Target/priority numbers are the interface used by the hcalls.

Same for baremetal. OPAL handles the EQ indexing because it creates
the EQ table and registers it in the controller.
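
For the model's own shortcut, the mangle/demangle helpers mentioned
earlier could be as simple as the sketch below. It follows the
(server << 3) | (priority & 0x7) pattern from the commit log; the
demo_* names are hypothetical and nothing here is architected.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EQ_PRIORITY_BITS 3
#define DEMO_EQ_PRIORITY_MASK ((1u << DEMO_EQ_PRIORITY_BITS) - 1)

/* Pack (server, priority) into one index: (server << 3) | (priority & 0x7). */
static uint32_t demo_eq_index(uint32_t server, uint8_t priority)
{
    /* The server number must leave room for the priority bits. */
    assert(server <= (UINT32_MAX >> DEMO_EQ_PRIORITY_BITS));

    return (server << DEMO_EQ_PRIORITY_BITS) |
           (priority & DEMO_EQ_PRIORITY_MASK);
}

/* Recover the server number from a packed index. */
static uint32_t demo_eq_index_server(uint32_t eq_idx)
{
    return eq_idx >> DEMO_EQ_PRIORITY_BITS;
}

/* Recover the priority from a packed index. */
static uint8_t demo_eq_index_priority(uint32_t eq_idx)
{
    return (uint8_t)(eq_idx & DEMO_EQ_PRIORITY_MASK);
}
```

The hcalls would keep server and priority as separate parameters and
only pack them when the EQ_INDEX field of the IVE is written.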

C.
 


* Re: [Qemu-devel] [PATCH 21/25] spapr: introduce a helper to map the XIVE memory regions
  2017-12-04 15:30     ` Cédric Le Goater
@ 2017-12-05  2:24       ` David Gibson
  0 siblings, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-12-05  2:24 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt


On Mon, Dec 04, 2017 at 04:30:36PM +0100, Cédric Le Goater wrote:
> On 12/04/2017 08:52 AM, David Gibson wrote:
> > On Thu, Nov 23, 2017 at 02:29:51PM +0100, Cédric Le Goater wrote:
> >> When the XIVE interrupt mode is activated, the machine needs to expose
> >> to the guest the MMIO regions used by the controller :
> >>
> >>   - Event State Buffer (ESB)
> >>   - Thread Interrupt Management Area (TIMA)
> >>
> >> Migration will also need to reflect the current interrupt mode in use.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive_hcall.c  | 14 ++++++++++++++
> >>  hw/ppc/spapr.c              |  5 +++++
> >>  include/hw/ppc/spapr_xive.h |  1 +
> >>  3 files changed, 20 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
> >> index 60c6c9f4be8f..ba217144878e 100644
> >> --- a/hw/intc/spapr_xive_hcall.c
> >> +++ b/hw/intc/spapr_xive_hcall.c
> >> @@ -933,3 +933,17 @@ void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers,
> >>      _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
> >>                       plat_res_int_priorities, sizeof(plat_res_int_priorities)));
> >>  }
> >> +
> >> +void spapr_xive_mmio_map(sPAPRMachineState *spapr)
> >> +{
> >> +    sPAPRXive *xive = spapr->xive;
> >> +
> >> +    /* ESBs */
> >> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->esb_base);
> >> +
> >> +    /* Thread Interrupt Management Areas */
> >> +    /* TODO: Only map the OS TIMA for the moment. Mapping the whole
> >> +     * region needs some rework in the handlers */
> >> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1,
> >> +                    xive->tm_base + (1 << xive->tm_shift));
> > 
> > You probably shouldn't be exposing the user TIMA in the DT if you're
> > only allowing the OS TIMA to be mapped.
> 
> The specs require mapping both the User and OS TIMA.

Ok.

> > 
> >> +}
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 3a62369883cc..734706c18cb3 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -1132,6 +1132,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
> >>      } else {
> >>          /* Populate device tree for XIVE */
> >>          spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
> >> +        spapr_xive_mmio_map(spapr);
> > 
> > This doesn't belong here, spapr_build_fdt() should _just_ build the
> > fdt, not have side effects on the actual device state.
> 
> Yes. I will move the rest of the XIVE setup into the reset handler
> before the device tree is built.

Ok.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 20/25] spapr: add device tree support for the XIVE interrupt mode
  2017-12-04 16:19     ` Cédric Le Goater
@ 2017-12-05  3:38       ` David Gibson
  0 siblings, 0 replies; 128+ messages in thread
From: David Gibson @ 2017-12-05  3:38 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt


On Mon, Dec 04, 2017 at 05:19:03PM +0100, Cédric Le Goater wrote:
> On 12/04/2017 08:49 AM, David Gibson wrote:
> > On Thu, Nov 23, 2017 at 02:29:50PM +0100, Cédric Le Goater wrote:
> >> The XIVE interface for the guest is described in the device tree under
> >> the "interrupt-controller" node. A couple of new properties are
> >> specific to XIVE :
> >>
> >>  - "reg"
> >>
> >>    contains the base address and size of the thread interrupt
> >>    management areas (TIMA), also called rings, for the User level and
> >>    for the Guest OS level. Only the Guest OS level is taken into
> >>    account today.
> >>
> >>  - "ibm,xive-eq-sizes"
> >>
> >>    the size of the event queues. One cell per size supported, contains
> >>    log2 of size, in ascending order.
> >>
> >>  - "ibm,xive-lisn-ranges"
> >>
> >>    the interrupt numbers ranges assigned to the guest. These are
> >>    allocated using a simple bitmap.
> >>
> >> and also under the root node :
> >>
> >>  - "ibm,plat-res-int-priorities"
> >>
> >>    contains a list of priorities that the hypervisor has reserved for
> >>    its own use. Simulate ranges as defined by the PowerVM Hypervisor.
> >>
> >> When the XIVE interrupt mode is activated after the CAS negotiation,
> >> the machine will perform a reboot to rebuild the device tree.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive_hcall.c  | 50 +++++++++++++++++++++++++++++++++++++++++++++
> >>  hw/ppc/spapr.c              |  7 ++++++-
> >>  hw/ppc/spapr_hcall.c        |  6 ++++++
> >>  include/hw/ppc/spapr_xive.h |  2 ++
> >>  4 files changed, 64 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
> >> index 676fe0e2d5c7..60c6c9f4be8f 100644
> >> --- a/hw/intc/spapr_xive_hcall.c
> >> +++ b/hw/intc/spapr_xive_hcall.c
> >> @@ -883,3 +883,53 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
> >>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
> >>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
> >>  }
> >> +
> >> +void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers,
> >> +                         void *fdt, uint32_t phandle)
> > 
> > Call it spapr_dt_xive() please, I'm trying to standardize on that
> > pattern for functions creating DT pieces.
> 
> OK. And what about the first argument : sPAPRMachineState *spapr 
> or sPAPRXive *xive ? I tend to prefer the first option because
> it's related to the interface with the guest, like the hcalls.

Yes, using the MachineState as the first parameter is fine.

> 
> > 
> >> +{
> >> +    sPAPRXive *xive = spapr->xive;
> >> +    int node;
> >> +    uint64_t timas[2 * 2];
> >> +    uint32_t lisn_ranges[] = {
> >> +        cpu_to_be32(0),
> >> +        cpu_to_be32(nr_servers),
> >> +    };
> >> +    uint32_t eq_sizes[] = {
> >> +        cpu_to_be32(12), /* 4K */
> >> +        cpu_to_be32(16), /* 64K */
> >> +        cpu_to_be32(21), /* 2M */
> >> +        cpu_to_be32(24), /* 16M */
> >> +    };
> >> +    uint32_t plat_res_int_priorities[ARRAY_SIZE(reserved_priorities)];
> >> +    int i;
> >> +
> >> +    for (i = 0; i < ARRAY_SIZE(plat_res_int_priorities); i++) {
> >> +        plat_res_int_priorities[i] = cpu_to_be32(reserved_priorities[i]);
> >> +    }
> >> +
> >> +    /* Thread Interrupt Management Areas : User and OS */
> >> +    for (i = 0; i < 2; i++) {
> >> +        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
> >> +        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
> >> +    }
> >> +
> >> +    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
> > 
> > You need a unit address here matching the reg property.
> 
> Indeed. I didn't notice. Curiously it was taking the first address 
> specified in the reg property of the node.

I'm guessing that's SLOF's intervention.

> 
> >> +
> >> +    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));
> > 
> > You don't need to set name properties explicitly for flattened trees.
> 
> OK.
> 
> Thanks,
> 
> C. 
> 
> 
> 
> >> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
> >> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
> >> +
> >> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
> >> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
> >> +                     sizeof(eq_sizes)));
> >> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
> >> +                     sizeof(lisn_ranges)));
> >> +
> >> +    /* For SLOF */
> >> +    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
> >> +    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
> >> +
> >> +    /* top properties */
> >> +    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
> >> +                     plat_res_int_priorities, sizeof(plat_res_int_priorities)));
> >> +}
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 8b15c0b500d0..3a62369883cc 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -1127,7 +1127,12 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
> >>      _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
> >>  
> >>      /* /interrupt controller */
> >> -    spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
> >> +    } else {
> >> +        /* Populate device tree for XIVE */
> >> +        spapr_xive_populate(spapr, xics_max_server_number(), fdt, PHANDLE_XICP);
> >> +    }
> >>  
> >>      ret = spapr_populate_memory(spapr, fdt);
> >>      if (ret < 0) {
> >> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> >> index be22a6b2895f..e2a1665beee9 100644
> >> --- a/hw/ppc/spapr_hcall.c
> >> +++ b/hw/ppc/spapr_hcall.c
> >> @@ -1646,6 +1646,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >>              (spapr_h_cas_compose_response(spapr, args[1], args[2],
> >>                                            ov5_updates) != 0);
> >>      }
> >> +
> >> +    /* We need to rebuild the device tree for XIVE, generate a reset */
> >> +    if (!spapr->cas_reboot) {
> >> +        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT);
> >> +    }
> >> +
> >>      spapr_ovec_cleanup(ov5_updates);
> >>  
> >>      if (spapr->cas_reboot) {
> >> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> >> index 3f822220647f..f6d4bf26e06a 100644
> >> --- a/include/hw/ppc/spapr_xive.h
> >> +++ b/include/hw/ppc/spapr_xive.h
> >> @@ -82,5 +82,7 @@ void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
> >>  typedef struct sPAPRMachineState sPAPRMachineState;
> >>  
> >>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> >> +void spapr_xive_populate(sPAPRMachineState *spapr, int nr_servers, void *fdt,
> >> +                         uint32_t phandle);
> >>  
> >>  #endif /* PPC_SPAPR_XIVE_H */
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 19/25] spapr: add hcalls support for the XIVE interrupt mode
  2017-12-01 17:46     ` Cédric Le Goater
@ 2017-12-05  7:00       ` David Gibson
  2017-12-05 14:50         ` Benjamin Herrenschmidt
  2017-12-05 16:12         ` Cédric Le Goater
  0 siblings, 2 replies; 128+ messages in thread
From: David Gibson @ 2017-12-05  7:00 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On Fri, Dec 01, 2017 at 06:46:45PM +0100, Cédric Le Goater wrote:
> On 12/01/2017 05:01 AM, David Gibson wrote:
> > On Thu, Nov 23, 2017 at 02:29:49PM +0100, Cédric Le Goater wrote:
> >> A set of hypervisor calls is used to configure the interrupt sources
> >> and the event/notification queues of the guest:
> >>
> >>  - H_INT_GET_SOURCE_INFO
> >>
> >>    used to obtain the address of the MMIO page of the Event State
> >>    Buffer (PQ bits) entry associated with the source.
> >>
> >>  - H_INT_SET_SOURCE_CONFIG
> >>
> >>    assigns a source to a "target".
> >>
> >>  - H_INT_GET_SOURCE_CONFIG
> >>
> >>    determines which "target" and "priority" are assigned to a source
> >>
> >>  - H_INT_GET_QUEUE_INFO
> >>
> >>    returns the address of the notification management page associated
> >>    with the specified "target" and "priority".
> >>
> >>  - H_INT_SET_QUEUE_CONFIG
> >>
> >>    sets or resets the event queue for a given "target" and "priority".
> >>    It is also used to set the notification config associated with the
> >>    queue, only unconditional notification for the moment.  Reset is
> >>    performed with a queue size of 0 and queueing is disabled in that
> >>    case.
> >>
> >>  - H_INT_GET_QUEUE_CONFIG
> >>
> >>    returns the queue settings for a given "target" and "priority".
> >>
> >>  - H_INT_RESET
> >>
> >>    resets all of the partition's interrupt exploitation structures to
> >>    their initial state, losing all configuration set via the hcalls
> >>    H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
> >>
> >>  - H_INT_SYNC
> >>
> >>    issues a synchronisation on a source to make sure all
> >>    notifications have reached their queue.
> >>
> >> Calls that still need to be addressed :
> >>
> >>    H_INT_SET_OS_REPORTING_LINE
> >>    H_INT_GET_OS_REPORTING_LINE
> >>
> >> See the code for more documentation on each hcall.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/Makefile.objs       |   2 +-
> >>  hw/intc/spapr_xive_hcall.c  | 885 ++++++++++++++++++++++++++++++++++++++++++++
> >>  hw/ppc/spapr.c              |   2 +
> >>  include/hw/ppc/spapr.h      |  15 +-
> >>  include/hw/ppc/spapr_xive.h |   4 +
> >>  5 files changed, 906 insertions(+), 2 deletions(-)
> >>  create mode 100644 hw/intc/spapr_xive_hcall.c
> >>
> >> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> >> index 49e13e7aeeee..122e2ec77e8d 100644
> >> --- a/hw/intc/Makefile.objs
> >> +++ b/hw/intc/Makefile.objs
> >> @@ -35,7 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
> >>  obj-$(CONFIG_XICS) += xics.o
> >>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
> >>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> >> -obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
> >> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o spapr_xive_hcall.o
> >>  obj-$(CONFIG_POWERNV) += xics_pnv.o
> >>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
> >>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> >> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
> >> new file mode 100644
> >> index 000000000000..676fe0e2d5c7
> >> --- /dev/null
> >> +++ b/hw/intc/spapr_xive_hcall.c
> >> @@ -0,0 +1,885 @@
> >> +/*
> >> + * QEMU PowerPC sPAPR XIVE model
> >> + *
> >> + * Copyright (c) 2017, IBM Corporation.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License, version 2, as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +#include "qemu/osdep.h"
> >> +#include "qemu/log.h"
> >> +#include "qapi/error.h"
> >> +#include "cpu.h"
> >> +#include "hw/ppc/spapr.h"
> >> +#include "hw/ppc/spapr_xive.h"
> >> +#include "hw/ppc/fdt.h"
> >> +#include "monitor/monitor.h"
> >> +
> >> +#include "xive-internal.h"
> >> +
> >> +/* Priority ranges reserved by the hypervisor. The Linux driver is
> >> + * expected to choose priority 6.
> >> + */
> >> +static const uint32_t reserved_priorities[] = {
> >> +    7,    /* start */
> >> +    0xf8, /* count */
> >> +};
> >> +
> >> +static bool priority_is_valid(uint32_t priority)
> >> +{
> >> +    int i;
> >> +
> >> +    for (i = 0; i < ARRAY_SIZE(reserved_priorities) / 2; i++) {
> >> +        uint32_t base  = reserved_priorities[2 * i];
> >> +        uint32_t count = reserved_priorities[2 * i + 1];
> >> +
> >> +        if (priority >= base && priority < base + count) {
> >> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: priority %d is reserved\n",
> >> +                          __func__, priority);
> >> +            return false;
> >> +        }
> >> +    }
> >> +
> >> +    return true;
> >> +}
> > 
> > This seems like overkill.  Aren't there only 0..7 levels supported in
> > hardware, in which case a one byte bitmap will suffice to store the
> > reserved levels.
> 
> I was trying to use the same array that will be exposed in the device
> tree in the "ibm,plat-res-int-priorities" property, defined as 
> follows in PAPR:
> 
> 	property name that designates to the client program that the
> 	platform has reserved one or more interrupt priorities for its
> 	own use.
> 	
> 	prop-encoded-value: one or more (interrupt priority, range)
> 	pairs, where interrupt priority is a single cell hexadecimal
> 	number between 0x00 and 0xFF, and range is an integer encoded as
> 	with encode-int that represents the number of contiguous
> 	interrupt priorities that have been reserved by the platform for
> 	its internal use.
> 
> 
> But I agree, it's a bit overkill to check for 0..7 levels ...

Ok, I do see the point here.  Hmm.. not sure where best to go with
this.  One source of data is always good, and this is probably less
complex than deriving the DT list from a bitmap.

On the other hand I am wary these days of over-generalizing, since it
can lead to nightmares for migration consistency.
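
To make the trade-off concrete, here is a rough sketch of keeping the
(start, count) pairs as the single source of data and deriving the
one-byte bitmap from them. It assumes only levels 0..7 exist in
hardware; reserved_priority_bitmap() and priority_is_usable() are
hypothetical names. Unlike the loop in the patch, this rejects every
priority above 7, so the 0xff "reset" value would still need to be
handled before the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XIVE_NR_PRIORITIES 8 /* assumption: hardware levels 0..7 */

/* Same (start, count) pairs as reserved_priorities[] in the patch */
static const uint32_t reserved_priorities[] = {
    7,    /* start */
    0xf8, /* count */
};

/* Fold the device tree pairs into a bitmap: bit N set = level N reserved */
static uint8_t reserved_priority_bitmap(const uint32_t *pairs,
                                        size_t nr_cells)
{
    uint8_t bitmap = 0;
    size_t i;

    assert(nr_cells % 2 == 0);

    for (i = 0; i < nr_cells; i += 2) {
        uint32_t prio;

        for (prio = 0; prio < XIVE_NR_PRIORITIES; prio++) {
            if (prio >= pairs[i] && prio - pairs[i] < pairs[i + 1]) {
                bitmap |= (uint8_t)(1u << prio);
            }
        }
    }
    return bitmap;
}

/* A priority is usable if it is a hardware level and not reserved */
static bool priority_is_usable(uint8_t bitmap, uint32_t priority)
{
    return priority < XIVE_NR_PRIORITIES && !(bitmap & (1u << priority));
}
```

With the pairs above, only bit 7 ends up set, which matches exposing
0..6 to the guest.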

> > To check my understanding again, if you're running this with KVM, the
> > host kernel and qemu will need to agree on which are the reserved
> > levels, yes?
> 
> Hmm, these values are quite static. So I don't think there will be 
> any sort of exchange between KVM and QEMU to define the range to 
> expose to the guest. 
> 
> For the moment, Linux only uses one priority, the lowest, and Ben
> has introduced in OPAL an automatic interrupt escalation feature
> using queue 7 for all other queues (DD2.0 cpus). So we only expose 
> range 0..6 to the guest for this purpose.
> 
> So we agreed orally.

Ok, that's fine, just making sure I understand the situation.
> >> +static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
> >> +                                          sPAPRMachineState *spapr,
> >> +                                          target_ulong opcode,
> >> +                                          target_ulong *args)
> >> +{
> >> +    sPAPRXive *xive = spapr->xive;
> >> +    XiveIVE *ive;
> >> +    target_ulong flags  = args[0];
> >> +    target_ulong lisn   = args[1];
> >> +    uint64_t mmio_base;
> >> +
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> > 
> > Is H_FUNCTION required by the PAPR ACRs here?  
> 
> yes. quoting the specs :
> 
> 	/* H_Function: The calling OS is not in exploitation mode */
> 
> I need to review once more all of the return errors but, last time
> I checked they looked sane. 

Ok.

> > Usually we only use
> > H_FUNCTION if the hypercall doesn't exist at all, and if unavailable
> > for other reasons use H_AUTHORITY or something.
> > 
> >> +    }
> >> +
> >> +    if (flags) {
> >> +        return H_PARAMETER;
> >> +    }
> >> +
> >> +    /*
> >> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> >> +     * This is not needed when running the emulation under QEMU
> >> +     */
> >> +
> >> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
> >> +    if (!ive || !(ive->w & IVE_VALID)) {
> >> +        return H_P2;
> >> +    }
> >> +
> >> +    mmio_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * lisn;
> > 
> > Hrm.. why was xive->esb_base not already a u64?
> 
> It's an 'hwaddr'. Yes, I can remove the cast.

Right.  mmio_base should be a hwaddr too, so you shouldn't need a cast.
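
Something along these lines, as a sketch only. The typedef stands in
for QEMU's hwaddr so the fragment is self-contained, and
esb_page_addr() is a hypothetical helper name, not something from the
series.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr; /* stand-in for QEMU's hwaddr type */

/*
 * Address of the ESB page of a source: one page of size
 * (1 << esb_shift) per LISN, starting at esb_base. No cast is
 * needed once every operand is already a hwaddr.
 */
static hwaddr esb_page_addr(hwaddr esb_base, uint32_t esb_shift,
                            uint32_t lisn)
{
    assert(esb_shift < 64);
    return esb_base + ((hwaddr)lisn << esb_shift);
}
```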

> >> +    args[0] = 0;
> >> +    if (spapr_xive_irq_is_lsi(xive, lisn)) {
> >> +        args[0] |= XIVE_SRC_LSI;
> >> +    }
> >> +    if (xive->flags & XIVE_SRC_TRIGGER) {
> >> +        args[0] |= XIVE_SRC_TRIGGER;
> >> +    }
> >> +
> >> +    if (xive->flags & XIVE_SRC_H_INT_ESB) {
> 
> btw, this is why I have the ->flags field. Do you still want me to 
> remove it ? because I would like to keep the logic below. No big 
> deal if not.
> 
> >> +        args[1] = -1; /* never used in QEMU  */
> >> +        args[2] = -1;
> >> +    } else {
> >> +        args[1] = mmio_base;
> >> +        if (xive->flags & XIVE_SRC_TRIGGER) {
> >> +            args[2] = -1; /* No specific trigger page */
> >> +        } else {
> >> +            args[2] = -1; /* TODO: support for specific trigger page */
> >> +        }
> >> +    }
> > 
> > What does the availability of SRC_TRIGGER (and INT_ESB) depend on? 
> 
> The CPU revision. But we won't introduce XIVE exploitation mode on 
> anything other than DD2.0, which has full XIVE support, even STORE_EOI, 
> which we should be adding.

Hrm.  Host CPU?  That's a problem - if guest visible properties like
this vary with the host CPU, migration breaks.

> 
> > If it varies with host capabilities, that's going to be real pain for
> > migration.
> 
> Yes. I am not aware of any future extension but I agree this is
> something we need to keep an eye on.

I'm not talking about future extensions; I mean right now.

>  
> >> +
> >> +    args[3] = xive->esb_shift;
> >> +
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
> >> + * Interrupt Source to a target. The Logical Interrupt Source is
> >> + * designated with the "lisn" parameter and the target is designated
> >> + * with the "target" and "priority" parameters.  Upon return from the
> >> + * hcall(), no additional interrupts will be directed to the old EQ.
> >> + *
> >> + * TODO: The old EQ should be investigated for interrupts that
> >> + * occurred prior to or during the hcall().
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *      Bits 0-61: Reserved
> >> + *      Bit 62: set the "eisn" in the EA
> >> + *      Bit 63: masks the interrupt source in the hardware interrupt
> >> + *      control structure. An interrupt masked by this mechanism will
> >> + *      be dropped, but its source state bits will still be
> >> + *      set. There is no race-free way of unmasking and restoring the
> >> + *      source. Thus this should only be used in interrupts that are
> >> + *      also masked at the source, and only in cases where the
> >> + *      interrupt is not meant to be used for a large amount of time
> >> + *      because no valid target exists for it for example
> >> + * - "lisn" is per "interrupts", "interrupt-map", or
> >> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> >> + *      ibm,query-interrupt-source-number RTAS call, or as returned by
> >> + *      the H_ALLOCATE_VAS_WINDOW hcall
> >> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> >> + *      "ibm,ppc-interrupt-gserver#s"
> >> + * - "priority" is a valid priority not in
> >> + *      "ibm,plat-res-int-priorities"
> >> + * - "eisn" is the guest EISN associated with the "lisn"
> >> + *
> >> + * Output:
> >> + * - None
> >> + */
> >> +
> >> +#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
> >> +#define XIVE_SRC_MASK     (1ull << (63 - 63))
> > 
> > Aren't there already a bunch of macros you have for defining things in
> > terms of IBM bit numbers, so you can avoid open coding (63 - whatever)?
> 
> Yes. 
> 
> On that topic, could we include the PPC_BIT* macros somewhere under ppc ? 

Uh, sure, why not. target/ppc/cpu.h seems the logical place.
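
As a sketch of what that could look like: the macro bodies below are
an assumption modelled on the usual IBM bit-numbering helpers (bit 0
is the most significant bit of a 64-bit word), not a copy of an
existing QEMU header. The two flag definitions then replace the open
coded (63 - n) shifts.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit value */
#define PPC_BIT(bit)        (0x8000000000000000ULL >> (bit))
/* Mask covering IBM bits bs..be inclusive, with bs <= be */
#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* H_INT_SET_SOURCE_CONFIG flags, expressed with IBM bit numbers */
#define XIVE_SRC_SET_EISN   PPC_BIT(62)
#define XIVE_SRC_MASK       PPC_BIT(63)

/* Compile-time check that these match the open coded values */
static_assert(XIVE_SRC_SET_EISN == (1ull << (63 - 62)), "bit 62");
static_assert(XIVE_SRC_MASK == (1ull << (63 - 63)), "bit 63");
```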

> >> +
> >> +static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
> >> +                                            sPAPRMachineState *spapr,
> >> +                                            target_ulong opcode,
> >> +                                            target_ulong *args)
> >> +{
> >> +    XiveIVE *ive;
> >> +    uint64_t new_ive;
> >> +    target_ulong flags    = args[0];
> >> +    target_ulong lisn     = args[1];
> >> +    target_ulong target   = args[2];
> >> +    target_ulong priority = args[3];
> >> +    target_ulong eisn     = args[4];
> >> +    uint32_t eq_idx;
> >> +
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    if (flags & ~(XIVE_SRC_SET_EISN | XIVE_SRC_MASK)) {
> >> +        return H_PARAMETER;
> >> +    }
> >> +
> >> +    /*
> >> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> >> +     * This is not needed when running the emulation under QEMU
> >> +     */
> >> +
> >> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
> >> +    if (!ive || !(ive->w & IVE_VALID)) {
> >> +        return H_P2;
> >> +    }
> >> +
> >> +    /* priority 0xff is used to reset the IVE */
> >> +    if (priority == 0xff) {
> >> +        new_ive = IVE_VALID | IVE_MASKED;
> >> +        goto out;
> >> +    }
> >> +
> >> +    new_ive = ive->w;
> >> +
> >> +    if (flags & XIVE_SRC_MASK) {
> >> +        new_ive = ive->w | IVE_MASKED;
> >> +    } else {
> >> +        new_ive = ive->w & ~IVE_MASKED;
> >> +    }
> >> +
> >> +    if (!priority_is_valid(priority)) {
> >> +        return H_P4;
> >> +    }
> >> +
> >> +    /* TODO: If the partition thread count is greater than the
> >> +     * hardware thread count, validate the "target" has a
> >> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
> >> +     */
> > 
> > What's this about?  
> 
> That is from the specs and I haven't quite figured out what it meant.
> I need to ask.
> 
> > I thought the point of XIVE was you could set up
> > target queues for your vcpus regardless of mapping to physical cpus.
> 
> yes.
> 
> >> +    /* Validate that "target" is part of the list of threads allocated
> >> +     * to the partition. For that, find the EQ corresponding to the
> >> +     * target.
> >> +     */
> >> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
> >> +        return H_P3;
> >> +    }
> >> +
> >> +    new_ive = SETFIELD(IVE_EQ_BLOCK, new_ive, 0ul);
> >> +    new_ive = SETFIELD(IVE_EQ_INDEX, new_ive, eq_idx);
> >> +
> >> +    if (flags & XIVE_SRC_SET_EISN) {
> >> +        new_ive = SETFIELD(IVE_EQ_DATA, new_ive, eisn);
> >> +    }
> >> +
> >> +out:
> >> +    /* TODO: handle syncs ? */
> >> +
> >> +    /* And update */
> >> +    ive->w = new_ive;
> >> +
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine which
> >> + * target/priority pair is assigned to the specified Logical Interrupt
> >> + * Source.
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *      Bits 0-63 Reserved
> >> + * - "lisn" is per "interrupts", "interrupt-map", or
> >> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> >> + *      ibm,query-interrupt-source-number RTAS call, or as
> >> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
> >> + *
> >> + * Output:
> >> + * - R4: Target to which the specified Logical Interrupt Source is
> >> + *       assigned
> >> + * - R5: Priority to which the specified Logical Interrupt Source is
> >> + *       assigned
> >> + * - R6: EISN for the specified Logical Interrupt Source (this will be
> >> + *       equivalent to the LISN if not changed by H_INT_SET_SOURCE_CONFIG)
> >> + */
> >> +static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
> >> +                                            sPAPRMachineState *spapr,
> >> +                                            target_ulong opcode,
> >> +                                            target_ulong *args)
> >> +{
> >> +    target_ulong flags = args[0];
> >> +    target_ulong lisn = args[1];
> >> +    XiveIVE *ive;
> >> +    XiveEQ *eq;
> >> +    uint32_t eq_idx;
> >> +
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    if (flags) {
> >> +        return H_PARAMETER;
> >> +    }
> >> +
> >> +    /*
> >> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> >> +     * This is not needed when running the emulation under QEMU
> >> +     */
> >> +
> >> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
> >> +    if (!ive || !(ive->w & IVE_VALID)) {
> >> +        return H_P2;
> >> +    }
> >> +
> >> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
> >> +    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
> >> +    if (!eq) {
> >> +        return H_HARDWARE;
> >> +    }
> >> +
> >> +    args[0] = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
> >> +
> >> +    if (ive->w & IVE_MASKED) {
> >> +        args[1] = 0xff;
> >> +    } else {
> >> +        args[1] = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> >> +    }
> >> +
> >> +    args[2] = GETFIELD(IVE_EQ_DATA, ive->w);
> >> +
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_GET_QUEUE_INFO hcall() is used to get the logical real
> >> + * address of the notification management page associated with the
> >> + * specified target and priority.
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *       Bits 0-63 Reserved
> >> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> >> + *       "ibm,ppc-interrupt-gserver#s"
> >> + * - "priority" is a valid priority not in
> >> + *       "ibm,plat-res-int-priorities"
> >> + *
> >> + * Output:
> >> + * - R4: Logical real address of notification page
> >> + * - R5: Power of 2 page size of the notification page
> >> + */
> >> +static target_ulong h_int_get_queue_info(PowerPCCPU *cpu,
> >> +                                         sPAPRMachineState *spapr,
> >> +                                         target_ulong opcode,
> >> +                                         target_ulong *args)
> >> +{
> >> +    target_ulong flags    = args[0];
> >> +    target_ulong target   = args[1];
> >> +    target_ulong priority = args[2];
> >> +    uint32_t eq_idx;
> >> +    XiveEQ *eq;
> >> +
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    if (flags) {
> >> +        return H_PARAMETER;
> >> +    }
> >> +
> >> +    /*
> >> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> >> +     * This is not needed when running the emulation under QEMU
> >> +     */
> >> +
> >> +    if (!priority_is_valid(priority)) {
> >> +        return H_P3;
> >> +    }
> >> +
> >> +    /* Validate that "target" is part of the list of threads allocated
> >> +     * to the partition. For that, find the EQ corresponding to the
> >> +     * target.
> >> +     */
> >> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
> >> +        return H_P2;
> >> +    }
> >> +
> >> +    /* TODO: If the partition thread count is greater than the
> >> +     * hardware thread count, validate the "target" has a
> >> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
> >> +     */
> >> +
> >> +    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
> >> +    if (!eq)  {
> >> +        return H_HARDWARE;
> >> +    }
> >> +
> >> +    args[0] = -1; /* TODO: return ESn page */
> >> +    if (eq->w0 & EQ_W0_ENQUEUE) {
> >> +        args[1] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
> >> +    } else {
> >> +        args[1] = 0;
> >> +    }
> >> +
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_SET_QUEUE_CONFIG hcall() is used to set or reset an EQ for
> >> + * a given "target" and "priority".  It is also used to set the
> >> + * notification config associated with the EQ.  An EQ size of 0 is
> >> + * used to reset the EQ config for a given target and priority. If
> >> + * resetting the EQ config, the END associated with the given "target"
> >> + * and "priority" will be changed to disable queueing.
> >> + *
> >> + * Upon return from the hcall(), no additional interrupts will be
> >> + * directed to the old EQ (if one was set). The old EQ (if one was
> >> + * set) should be investigated for interrupts that occurred prior to
> >> + * or during the hcall().
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *      Bits 0-62: Reserved
> >> + *      Bit 63: Unconditional Notify (n) per the XIVE spec
> >> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> >> + *       "ibm,ppc-interrupt-gserver#s"
> >> + * - "priority" is a valid priority not in
> >> + *       "ibm,plat-res-int-priorities"
> >> + * - "eventQueue": The logical real address of the start of the EQ
> >> + * - "eventQueueSize": The power of 2 EQ size per "ibm,xive-eq-sizes"
> >> + *
> >> + * Output:
> >> + * - None
> >> + */
> >> +
> >> +#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))
> >> +
> >> +static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
> >> +                                           sPAPRMachineState *spapr,
> >> +                                           target_ulong opcode,
> >> +                                           target_ulong *args)
> >> +{
> >> +    target_ulong flags    = args[0];
> >> +    target_ulong target   = args[1];
> >> +    target_ulong priority = args[2];
> >> +    target_ulong qpage    = args[3];
> >> +    target_ulong qsize    = args[4];
> >> +    uint32_t eq_idx;
> >> +    XiveEQ *old_eq;
> >> +    XiveEQ eq;
> >> +    uint32_t qdata;
> >> +
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    if (flags & ~XIVE_EQ_ALWAYS_NOTIFY) {
> >> +        return H_PARAMETER;
> >> +    }
> >> +
> >> +    /*
> >> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> >> +     * This is not needed when running the emulation under QEMU
> >> +     */
> >> +
> >> +    if (!priority_is_valid(priority)) {
> >> +        return H_P3;
> >> +    }
> >> +
> >> +    /* Validate that "target" is part of the list of threads allocated
> >> +     * to the partition. For that, find the EQ corresponding to the
> >> +     * target.
> >> +     */
> >> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
> >> +        return H_P2;
> >> +    }
> >> +
> >> +    /* TODO: If the partition thread count is greater than the
> >> +     * hardware thread count, validate the "target" has a
> >> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
> >> +     */
> >> +
> >> +    old_eq = spapr_xive_get_eq(spapr->xive, eq_idx);
> >> +    if (!old_eq)  {
> >> +        return H_HARDWARE;
> >> +    }
> >> +
> >> +    eq = *old_eq;
> >> +
> >> +    switch (qsize) {
> >> +    case 12:
> >> +    case 16:
> >> +    case 21:
> >> +    case 24:
> >> +        eq.w3 = ((uint64_t)qpage) & 0xffffffff;
> >> +        eq.w2 = (((uint64_t)qpage) >> 32) & 0x0fffffff;
> >> +        eq.w0 |= EQ_W0_ENQUEUE;
> >> +        eq.w0 = SETFIELD(EQ_W0_QSIZE, eq.w0, qsize - 12);
> >> +        break;
> >> +    case 0:
> >> +        /* reset queue and disable queueing */
> >> +        eq.w2 = eq.w3 = 0;
> >> +        eq.w0 &= ~EQ_W0_ENQUEUE;
> >> +        break;
> >> +    default:
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid EQ size %"PRIx64"\n",
> >> +                      __func__, qsize);
> >> +        return H_P5;
> >> +    }
> >> +
> >> +    if (qsize) {
> >> +        /*
> >> +         * Let's validate the EQ address with a read of the first EQ
> >> +         * entry. We could also check that the full queue has been
> >> +         * zeroed by the OS.
> >> +         */
> >> +        if (address_space_read(&address_space_memory, qpage,
> >> +                               MEMTXATTRS_UNSPECIFIED,
> >> +                               (uint8_t *) &qdata, sizeof(qdata))) {
> >> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to read EQ data @0x%"
> >> +                          HWADDR_PRIx "\n", __func__, qpage);
> >> +            return H_P4;
> >> +        }
> >> +    }
> >> +
> >> +    /* Ensure the priority and target are correctly set (they will not
> >> +     * be right after allocation)
> >> +     */
> >> +    eq.w6 = SETFIELD(EQ_W6_NVT_BLOCK, 0ul, 0ul) |
> >> +        SETFIELD(EQ_W6_NVT_INDEX, 0ul, target);
> >> +    eq.w7 = SETFIELD(EQ_W7_F0_PRIORITY, 0ul, priority);
> >> +
> >> +    /* TODO: depends on notification page (ESn) from H_INT_GET_QUEUE_INFO */
> >> +    if (flags & XIVE_EQ_ALWAYS_NOTIFY) {
> >> +        eq.w0 |= EQ_W0_UCOND_NOTIFY;
> > 
> > Do you need to also clear if the flag is not set?  AFAICT eq.w0 is
> > inherited from the old queue and never reset from scratch.
> 
> True. It is always on if the EQ is not reset. I also need 
> to be more precise in spapr_xive_irq() when dealing with 
> reset EQs. The model has not fallen into that trap yet.
> 
> >> +    }
> >> +
> >> +    /* The generation bit for the EQ starts at 1 and the EQ page
> >> +     * offset counter starts at 0.
> >> +     */
> >> +    eq.w1 = EQ_W1_GENERATION | SETFIELD(EQ_W1_PAGE_OFF, 0ul, 0ul);
> >> +    eq.w0 |= EQ_W0_VALID;
> >> +
> >> +    /* TODO: issue syncs required to ensure all in-flight interrupts
> >> +     * are complete on the old EQ */
> >> +
> >> +    /* Update EQ */
> >> +    *old_eq = eq;
> > 
> > Hrm.  The BQL probably saves you, but in general do you need to make
> > sure the ENQUEUE bit is set after updating everything else?
> 
> There is a rather complex procedure to update the HW, cache and 
> memory. See xive_eqc_cache_update() in OPAL. I will need to dig 
> in for the PowerNV support ...
> 
> >> +
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_GET_QUEUE_CONFIG hcall() is used to get an EQ for a given
> >> + * target and priority.
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *      Bits 0-62: Reserved
> >> + *      Bit 63: Debug: Return debug data
> >> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> >> + *       "ibm,ppc-interrupt-gserver#s"
> >> + * - "priority" is a valid priority not in
> >> + *       "ibm,plat-res-int-priorities"
> >> + *
> >> + * Output:
> >> + * - R4: "flags":
> >> + *       Bits 0-62: Reserved
> >> + *       Bit 63: The value of Unconditional Notify (n) per the XIVE spec
> >> + * - R5: The logical real address of the start of the EQ
> >> + * - R6: The power of 2 EQ size per "ibm,xive-eq-sizes"
> >> + * - R7: The value of Event Queue Offset Counter per XIVE spec
> >> + *       if "Debug" = 1, else 0
> >> + *
> >> + */
> >> +
> >> +#define XIVE_EQ_DEBUG     (1ull << (63 - 63))
> >> +
> >> +static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
> >> +                                           sPAPRMachineState *spapr,
> >> +                                           target_ulong opcode,
> >> +                                           target_ulong *args)
> >> +{
> >> +    target_ulong flags    = args[0];
> >> +    target_ulong target   = args[1];
> >> +    target_ulong priority = args[2];
> >> +    uint32_t eq_idx;
> >> +    XiveEQ *eq;
> >> +
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    if (flags & ~XIVE_EQ_DEBUG) {
> >> +        return H_PARAMETER;
> >> +    }
> >> +
> >> +    /*
> >> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> >> +     * This is not needed when running the emulation under QEMU
> >> +     */
> >> +
> >> +    if (!priority_is_valid(priority)) {
> >> +        return H_P3;
> >> +    }
> >> +
> >> +    /* Validate that "target" is part of the list of threads allocated
> >> +     * to the partition. For that, find the EQ corresponding to the
> >> +     * target.
> >> +     */
> >> +    if (!spapr_xive_eq_for_server(spapr->xive, target, priority, &eq_idx)) {
> >> +        return H_P2;
> >> +    }
> >> +
> >> +    /* TODO: If the partition thread count is greater than the
> >> +     * hardware thread count, validate the "target" has a
> >> +     * corresponding hardware thread else return H_NOT_AVAILABLE.
> >> +     */
> >> +
> >> +    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
> >> +    if (!eq)  {
> >> +        return H_HARDWARE;
> >> +    }
> >> +
> >> +    args[0] = 0;
> >> +    if (eq->w0 & EQ_W0_UCOND_NOTIFY) {
> >> +        args[0] |= XIVE_EQ_ALWAYS_NOTIFY;
> >> +    }
> >> +
> >> +    if (eq->w0 & EQ_W0_ENQUEUE) {
> >> +        args[1] =
> >> +            (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
> >> +        args[2] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
> >> +    } else {
> >> +        args[1] = 0;
> >> +        args[2] = 0;
> >> +    }
> >> +
> >> +    /* TODO: do we need any locking on the EQ ? */
> > 
> > Probably not if you're designating it as protected by the BQL.
> 
> OK.
> 
> Thanks,
> 
> C. 
>  
> >> +    if (flags & XIVE_EQ_DEBUG) {
> >> +        /* Load the event queue generation number into the return flags */
> >> +        args[0] |= GETFIELD(EQ_W1_GENERATION, eq->w1);
> >> +
> >> +        /* Load R7 with the event queue offset counter */
> >> +        args[3] = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
> >> +    }
> >> +
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_SET_OS_REPORTING_LINE hcall() is used to set the
> >> + * reporting cache line pair for the calling thread.  The reporting
> >> + * cache lines will contain the OS interrupt context when the OS
> >> + * issues a CI store byte to @TIMA+0xC10 to acknowledge the OS
> >> + * interrupt. The reporting cache lines can be reset by inputting -1
> >> + * in "reportingLine".  Issuing the CI store byte without reporting
> >> + * cache lines registered will result in the data not being accessible
> >> + * to the OS.
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *      Bits 0-63: Reserved
> >> + * - "reportingLine": The logical real address of the reporting cache
> >> + *    line pair
> >> + *
> >> + * Output:
> >> + * - None
> >> + */
> >> +static target_ulong h_int_set_os_reporting_line(PowerPCCPU *cpu,
> >> +                                                sPAPRMachineState *spapr,
> >> +                                                target_ulong opcode,
> >> +                                                target_ulong *args)
> >> +{
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    /*
> >> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> >> +     * This is not needed when running the emulation under QEMU
> >> +     */
> >> +
> >> +    /* TODO: H_INT_SET_OS_REPORTING_LINE */
> >> +    return H_FUNCTION;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_GET_OS_REPORTING_LINE hcall() is used to get the logical
> >> + * real address of the reporting cache line pair set for the input
> >> + * "target".  If no reporting cache line pair has been set, -1 is
> >> + * returned.
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *      Bits 0-63: Reserved
> >> + * - "target" is per "ibm,ppc-interrupt-server#s" or
> >> + *       "ibm,ppc-interrupt-gserver#s"
> >> + * - "reportingLine": The logical real address of the reporting cache
> >> + *   line pair
> >> + *
> >> + * Output:
> >> + * - R4: The logical real address of the reporting line if set, else -1
> >> + */
> >> +static target_ulong h_int_get_os_reporting_line(PowerPCCPU *cpu,
> >> +                                                sPAPRMachineState *spapr,
> >> +                                                target_ulong opcode,
> >> +                                                target_ulong *args)
> >> +{
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    /*
> >> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> >> +     * This is not needed when running the emulation under QEMU
> >> +     */
> >> +
> >> +    /* TODO: H_INT_GET_OS_REPORTING_LINE */
> >> +    return H_FUNCTION;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_ESB hcall() is used to issue a load or store to the ESB
> >> + * page for the input "lisn".  This hcall is only supported for LISNs
> >> + * that have the ESB hcall flag set to 1 when returned from hcall()
> >> + * H_INT_GET_SOURCE_INFO.
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *      Bits 0-62: Reserved
> >> + *      bit 63: Store: Store=1, store operation, else load operation
> >> + * - "lisn" is per "interrupts", "interrupt-map", or
> >> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> >> + *      ibm,query-interrupt-source-number RTAS call, or as
> >> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
> >> + * - "esbOffset" is the offset into the ESB page for the load or store operation
> >> + * - "storeData" is the data to write for a store operation
> >> + *
> >> + * Output:
> >> + * - R4: The value of the load if load operation, else -1
> >> + */
> >> +
> >> +#define XIVE_ESB_STORE (1ull << (63 - 63))
> >> +
> >> +static target_ulong h_int_esb(PowerPCCPU *cpu,
> >> +                              sPAPRMachineState *spapr,
> >> +                              target_ulong opcode,
> >> +                              target_ulong *args)
> >> +{
> >> +    sPAPRXive *xive = spapr->xive;
> >> +    XiveIVE *ive;
> >> +    target_ulong flags   = args[0];
> >> +    target_ulong lisn    = args[1];
> >> +    target_ulong offset  = args[2];
> >> +    target_ulong data    = args[3];
> >> +    uint64_t esb_base;
> >> +
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    if (flags & ~XIVE_ESB_STORE) {
> >> +        return H_PARAMETER;
> >> +    }
> >> +
> >> +    ive = spapr_xive_get_ive(xive, lisn);
> >> +    if (!ive || !(ive->w & IVE_VALID)) {
> >> +        return H_P2;
> >> +    }
> >> +
> >> +    if (offset > (1ull << xive->esb_shift)) {
> >> +        return H_P3;
> >> +    }
> >> +
> >> +    esb_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * lisn;
> >> +    esb_base += offset;
> >> +
> >> +    if (dma_memory_rw(&address_space_memory, esb_base, &data, 8,
> >> +                      (flags & XIVE_ESB_STORE))) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to rw data @0x%"
> >> +                      HWADDR_PRIx "\n", __func__, esb_base);
> >> +        return H_HARDWARE;
> >> +    }
> >> +    args[0] = (flags & XIVE_ESB_STORE) ? -1 : data;
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_SYNC hcall() is used to issue hardware syncs that will
> >> + * ensure any in flight events for the input lisn are in the event
> >> + * queue.
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *      Bits 0-63: Reserved
> >> + * - "lisn" is per "interrupts", "interrupt-map", or
> >> + *      "ibm,xive-lisn-ranges" properties, or as returned by the
> >> + *      ibm,query-interrupt-source-number RTAS call, or as
> >> + *      returned by the H_ALLOCATE_VAS_WINDOW hcall
> >> + *
> >> + * Output:
> >> + * - None
> >> + */
> >> +static target_ulong h_int_sync(PowerPCCPU *cpu,
> >> +                               sPAPRMachineState *spapr,
> >> +                               target_ulong opcode,
> >> +                               target_ulong *args)
> >> +{
> >> +    XiveIVE *ive;
> >> +    target_ulong flags   = args[0];
> >> +    target_ulong lisn    = args[1];
> >> +
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    if (flags) {
> >> +        return H_PARAMETER;
> >> +    }
> >> +
> >> +    ive = spapr_xive_get_ive(spapr->xive, lisn);
> >> +    if (!ive || !(ive->w & IVE_VALID)) {
> >> +        return H_P2;
> >> +    }
> >> +
> >> +    /*
> >> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> >> +     * This is not needed when running the emulation under QEMU
> >> +     */
> >> +
> >> +    /* This is not real hardware. Nothing to be done */
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * The H_INT_RESET hcall() is used to reset all of the partition's
> >> + * interrupt exploitation structures to their initial state.  This
> >> + * means losing all previously set interrupt state set via
> >> + * H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
> >> + *
> >> + * Parameters:
> >> + * Input:
> >> + * - "flags"
> >> + *      Bits 0-63: Reserved
> >> + *
> >> + * Output:
> >> + * - None
> >> + */
> >> +static target_ulong h_int_reset(PowerPCCPU *cpu,
> >> +                                sPAPRMachineState *spapr,
> >> +                                target_ulong opcode,
> >> +                                target_ulong *args)
> >> +{
> >> +    target_ulong flags   = args[0];
> >> +
> >> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        return H_FUNCTION;
> >> +    }
> >> +
> >> +    if (flags) {
> >> +        return H_PARAMETER;
> >> +    }
> >> +
> >> +    spapr_xive_reset(spapr->xive);
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +void spapr_xive_hcall_init(sPAPRMachineState *spapr)
> >> +{
> >> +    spapr_register_hypercall(H_INT_GET_SOURCE_INFO, h_int_get_source_info);
> >> +    spapr_register_hypercall(H_INT_SET_SOURCE_CONFIG, h_int_set_source_config);
> >> +    spapr_register_hypercall(H_INT_GET_SOURCE_CONFIG, h_int_get_source_config);
> >> +    spapr_register_hypercall(H_INT_GET_QUEUE_INFO, h_int_get_queue_info);
> >> +    spapr_register_hypercall(H_INT_SET_QUEUE_CONFIG, h_int_set_queue_config);
> >> +    spapr_register_hypercall(H_INT_GET_QUEUE_CONFIG, h_int_get_queue_config);
> >> +    spapr_register_hypercall(H_INT_SET_OS_REPORTING_LINE,
> >> +                             h_int_set_os_reporting_line);
> >> +    spapr_register_hypercall(H_INT_GET_OS_REPORTING_LINE,
> >> +                             h_int_get_os_reporting_line);
> >> +    spapr_register_hypercall(H_INT_ESB, h_int_esb);
> >> +    spapr_register_hypercall(H_INT_SYNC, h_int_sync);
> >> +    spapr_register_hypercall(H_INT_RESET, h_int_reset);
> >> +}
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index ca4e72187f60..8b15c0b500d0 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -222,6 +222,8 @@ static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
> >>          goto error;
> >>      }
> >>  
> >> +    spapr_xive_hcall_init(spapr);
> >> +
> >>      return SPAPR_XIVE(obj);
> >>  error:
> >>      error_propagate(errp, local_err);
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index 90e2b0f6c678..a25e218b34e2 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -387,7 +387,20 @@ struct sPAPRMachineState {
> >>  #define H_INVALIDATE_PID        0x378
> >>  #define H_REGISTER_PROC_TBL     0x37C
> >>  #define H_SIGNAL_SYS_RESET      0x380
> >> -#define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
> >> +
> >> +#define H_INT_GET_SOURCE_INFO   0x3A8
> >> +#define H_INT_SET_SOURCE_CONFIG 0x3AC
> >> +#define H_INT_GET_SOURCE_CONFIG 0x3B0
> >> +#define H_INT_GET_QUEUE_INFO    0x3B4
> >> +#define H_INT_SET_QUEUE_CONFIG  0x3B8
> >> +#define H_INT_GET_QUEUE_CONFIG  0x3BC
> >> +#define H_INT_SET_OS_REPORTING_LINE 0x3C0
> >> +#define H_INT_GET_OS_REPORTING_LINE 0x3C4
> >> +#define H_INT_ESB               0x3C8
> >> +#define H_INT_SYNC              0x3CC
> >> +#define H_INT_RESET             0x3D0
> >> +
> >> +#define MAX_HCALL_OPCODE        H_INT_RESET
> >>  
> >>  /* The hcalls above are standardized in PAPR and implemented by pHyp
> >>   * as well.
> >> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> >> index 6e8a189e723f..3f822220647f 100644
> >> --- a/include/hw/ppc/spapr_xive.h
> >> +++ b/include/hw/ppc/spapr_xive.h
> >> @@ -79,4 +79,8 @@ bool spapr_xive_irq_unset(sPAPRXive *xive, uint32_t lisn);
> >>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> >>  void spapr_xive_icp_pic_print_info(sPAPRXiveICP *xicp, Monitor *mon);
> >>  
> >> +typedef struct sPAPRMachineState sPAPRMachineState;
> >> +
> >> +void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> >> +
> >>  #endif /* PPC_SPAPR_XIVE_H */
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue
  2017-12-04  1:20             ` David Gibson
@ 2017-12-05 10:58               ` Cédric Le Goater
  0 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-05 10:58 UTC (permalink / raw)
  To: David Gibson, Benjamin Herrenschmidt; +Cc: qemu-ppc, qemu-devel

On 12/04/2017 01:20 AM, David Gibson wrote:
> On Sat, Dec 02, 2017 at 08:46:19AM -0600, Benjamin Herrenschmidt wrote:
>> On Sat, 2017-12-02 at 08:45 -0600, Benjamin Herrenschmidt wrote:
>>> On Fri, 2017-12-01 at 15:10 +1100, David Gibson wrote:
>>>>
>>>> Hm, ok.  Guest endian (or at least, not definitively host-endian) data
>>>> in a plain uint32_t makes me uncomfortable.  Could we use char data[4]
>>>> instead, to make it clear it's a byte-ordered buffer, rather than a
>>>> number as far as the XIVE is concerned.
>>>>
>>>> Hm.. except that doesn't quite work, because the hardware must define
>>>> which end that generation bit ends up in...
>>>
>>> It also needs to be written atomically. Just say it's big endian.
>>
>> Also the guest reads it using be32_to_cpup...
> 
> Ok.  Definitely should be treated as BE and read/written with the be32
> DMA helper functions.
> 

Hmm, stl_be_dma() does not return errors, but dma_memory_write()
does. The helper is generated by this macro, which drops the result:

    static inline void st##_sname##_##_end##_dma(AddressSpace *as,      \
                                                 dma_addr_t addr,       \
                                                 uint##_bits##_t val)   \
    {                                                                   \
        val = cpu_to_##_end##_bits(val);                                \
        dma_memory_write(as, addr, &val, (_bits) / 8);                  \
    }

These macros seem to be only used by spapr_vio and nvram. I can probably 
change them.
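
A minimal sketch of what an error-returning big-endian store could look
like. It is self-contained and does not use the QEMU DMA API:
guest_ram, mock_dma_write() and store_be32_checked() are hypothetical
names, and the mock only stands in for a write that can fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory; not a QEMU API. */
#define GUEST_RAM_SIZE 64
static uint8_t guest_ram[GUEST_RAM_SIZE];

/* Mock of a DMA write that can fail: returns 0 on success, -1 on error. */
static int mock_dma_write(uint64_t addr, const void *buf, size_t len)
{
    if (addr > GUEST_RAM_SIZE || len > GUEST_RAM_SIZE - addr) {
        return -1;
    }
    memcpy(&guest_ram[addr], buf, len);
    return 0;
}

/*
 * Store a 32-bit value in big-endian byte order, independently of the
 * host endianness, and propagate the result of the write to the caller.
 */
static int store_be32_checked(uint64_t addr, uint32_t val)
{
    uint8_t be[4] = {
        (uint8_t)(val >> 24), (uint8_t)(val >> 16),
        (uint8_t)(val >> 8),  (uint8_t)val,
    };

    return mock_dma_write(addr, be, sizeof(be));
}
```

With a helper of this shape, the caller pushing an entry into the event
queue can log a guest error when the store fails instead of ignoring it.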

C.

* Re: [Qemu-devel] [PATCH 19/25] spapr: add hcalls support for the XIVE interrupt mode
  2017-12-05  7:00       ` David Gibson
@ 2017-12-05 14:50         ` Benjamin Herrenschmidt
  2017-12-06  9:20           ` David Gibson
  2017-12-05 16:12         ` Cédric Le Goater
  1 sibling, 1 reply; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-05 14:50 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater; +Cc: qemu-ppc, qemu-devel

On Tue, 2017-12-05 at 18:00 +1100, David Gibson wrote:
> > The CPU revision. But we won't introduce XIVE exploitation mode on 
> > anything else than DD2.0 which has full XIVE support. Even STORE_EOI 
> > that we should be adding.
> 
> Hrm.  Host CPU?  That's a problem - if guest visible properties like
> this vary with the host CPU, migration breaks.

I don't think this is going to be a problem in practice. The
availability of trigger comes from OPAL but in practice, all virtual
interrupts are going to support it always, only some of the HW
originated ones (PCIe MSIs for example and LSIs) won't. And we don't
migrate with PCIe devices passed through.

So the guest needs the info, but we should be ok with migration from P9
DD2.0 onwards. Nobody sane cares about P9 DD1.0.

Any future chip will have to ensure that we don't lose that property in
HW lest we lose migration, but that would break AIX too so I'm
reasonably confident the HW guys will get that right ;-)

> > 
> > > If it varies with host capabilities, that's going to be real pain for
> > > migration.
> > 
> > Yes. I am not aware of any future extension but I agree this is
> > something we need to keep an eye on.
> 
> I'm not talking about future extension, I'm meaning right now.

No, no issue right now.

Cheers,
Ben.

* Re: [Qemu-devel] [PATCH 19/25] spapr: add hcalls support for the XIVE interrupt mode
  2017-12-05  7:00       ` David Gibson
  2017-12-05 14:50         ` Benjamin Herrenschmidt
@ 2017-12-05 16:12         ` Cédric Le Goater
  1 sibling, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-05 16:12 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[ ... ] 

>>>> +static bool priority_is_valid(uint32_t priority)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < ARRAY_SIZE(reserved_priorities) / 2; i++) {
>>>> +        uint32_t base  = reserved_priorities[2 * i];
>>>> +        uint32_t count = reserved_priorities[2 * i + 1];
>>>> +
>>>> +        if (priority >= base && priority < base + count) {
>>>> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: priority %d is reserved\n",
>>>> +                          __func__, priority);
>>>> +            return false;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return true;
>>>> +}
>>>
>>> This seems like overkill.  Aren't there only 0..7 levels supported in
>>> hardware, in which case a one byte bitmap will suffice to store the
>>> reserved levels.
>>
>> I was trying to use the same array that will be exposed in the device
>> tree in the "ibm,plat-res-int-priorities" property, defined as 
>> follows in PAPR:
>>
>> 	property name that designates to the client program that the
>> 	platform has reserved one or more interrupt priorities for its
>> 	own use.
>> 	
>> 	prop-encoded-value: one or more (interrupt priority, range)
>> 	pairs, where interrupt priority is a single cell hexadecimal
>> 	number between 0x00 and 0xFF, and range is an integer encoded as
>> 	with encode-int that represents the number of contiguous
>> 	interrupt priorities that have been reserved by the platform for
>> 	its internal use.
>>
>>
>> But I agree, it's a bit overkill to check for 0..7 levels ...
> 
> Ok, I do see the point here.  Hmm.. not sure where best to go with
> this.  One source of data is always good, and this is probably less
> complex than deriving the DT list from a bitmap.
> 
> On the other hand I am wary these days of over-generalizing, since it
> can lead to nightmares for migration consistency.

I should be able to simplify by using one array of valid priorities
for the guest, and from that build the array of reserved priorities 
for the platform. 
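
A rough sketch of that simplification, assuming the valid priorities
are kept as a one-byte bitmap (bit N set means priority N is usable by
the guest). The names prio_usable() and build_reserved_ranges() and the
bitmap encoding are illustrative only; the output uses the (priority,
range) pair layout of "ibm,plat-res-int-priorities".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PRIORITIES 8   /* priorities 0..7, as discussed above */

/* Hypothetical encoding: bit N of valid_mask set => priority N is usable. */
static int prio_usable(uint8_t valid_mask, uint32_t priority)
{
    return priority < NR_PRIORITIES && ((valid_mask >> priority) & 1);
}

/*
 * Fill 'out' with (base, count) pairs describing each run of reserved
 * priorities. 'out' must have room for NR_PRIORITIES cells, which is the
 * worst case (four isolated reserved priorities). Returns the number of
 * cells written.
 */
static size_t build_reserved_ranges(uint8_t valid_mask, uint32_t *out)
{
    size_t n = 0;
    uint32_t prio = 0;

    while (prio < NR_PRIORITIES) {
        uint32_t base;

        if (prio_usable(valid_mask, prio)) {
            prio++;
            continue;
        }

        /* Start of a reserved run: extend it as far as it goes. */
        base = prio;
        while (prio < NR_PRIORITIES && !prio_usable(valid_mask, prio)) {
            prio++;
        }
        out[n++] = base;
        out[n++] = prio - base;
    }

    return n;
}
```

The hcall validity check then becomes a single bit test, and the device
tree property is generated from the same bitmap, so there is still only
one source of data.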
 
[ ... ]

>>>> +    mmio_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * lisn;
>>>
>>> Hrm.. why was xive->esb_base not already a u64?
>>
>> It's a 'hwaddr'. Yes, I can remove the cast.
> 
> Right.  mmio_base should be a hwaddr too, so you shouldn't need a cast.

yes.
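
A small sketch of the address computation with every operand already
64-bit wide, so that no cast is needed. The typedef is a local stand-in
for the hwaddr type, esb_addr() is a hypothetical helper, and rejecting
an offset equal to the page size is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t hwaddr;   /* local stand-in for the hwaddr type */

/*
 * Compute the address of 'offset' within the ESB page of 'lisn'.
 * Each source owns one page of (1 << esb_shift) bytes, laid out
 * contiguously from esb_base. Returns false if the offset falls
 * outside the page.
 */
static bool esb_addr(hwaddr esb_base, unsigned esb_shift, uint64_t lisn,
                     uint64_t offset, hwaddr *addr)
{
    if (offset >= (1ull << esb_shift)) {
        return false;
    }

    *addr = esb_base + (lisn << esb_shift) + offset;
    return true;
}
```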

[ ... ]

>>> What does the availability of SRC_TRIGGER (and INT_ESB) depend on? 
>>
>> The CPU revision. But we won't introduce XIVE exploitation mode on 
>> anything else than DD2.0 which has full XIVE support. Even STORE_EOI 
>> that we should be adding.
> 
> Hrm.  Host CPU?  That's a problem - if guest visible properties like
> this vary with the host CPU, migration breaks.
> 
>>
>>> If it varies with host capabilities, that's going to be real pain for
>>> migration.
>>
>> Yes. I am not aware of any future extension but I agree this is
>> something we need to keep an eye on.
> 
> I'm not talking about future extension, I'm meaning right now.

I think Ben has answered these questions.

[ ... ] 

>> On that topic, could we include the PPC_BIT* macros somewhere under ppc ? 
> 
> Uh, sure, why not. target/ppc/cpu.h seems the logical place.

I will send a preliminary patch for 2.12 then.
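
For illustration, macros of this kind could look like the sketch below.
The definitions are assumptions written for this example, not a copy of
an existing header; they follow the IBM convention where bit 0 is the
most significant bit of a 64-bit doubleword.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit doubleword. */
#define PPC_BIT(bit)        (0x8000000000000000ULL >> (bit))

/* Mask covering IBM bits bs..be inclusive, with bs <= be. */
#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* The hcall flag from the patch, rewritten with the macro. */
#define XIVE_EQ_DEBUG       PPC_BIT(63)

/* Return non-zero if any flag bit other than 'allowed' is set. */
static int flags_have_reserved_bits(uint64_t flags, uint64_t allowed)
{
    return (flags & ~allowed) != 0;
}
```

With this, XIVE_EQ_DEBUG has the same value as the open-coded
(1ull << (63 - 63)) while matching the bit numbers used in the hcall
documentation.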


Thanks,


C.

* Re: [Qemu-devel] [PATCH 19/25] spapr: add hcalls support for the XIVE interrupt mode
  2017-12-05 14:50         ` Benjamin Herrenschmidt
@ 2017-12-06  9:20           ` David Gibson
  2017-12-06 19:41             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 128+ messages in thread
From: David Gibson @ 2017-12-06  9:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Cédric Le Goater, qemu-ppc, qemu-devel

On Tue, Dec 05, 2017 at 08:50:26AM -0600, Benjamin Herrenschmidt wrote:
> On Tue, 2017-12-05 at 18:00 +1100, David Gibson wrote:
> > > The CPU revision. But we won't introduce XIVE exploitation mode on 
> > > anything else than DD2.0 which has full XIVE support. Even STORE_EOI 
> > > that we should be adding.
> > 
> > Hrm.  Host CPU?  That's a problem - if guest visible properties like
> > this vary with the host CPU, migration breaks.
> 
> I don't think this is going to be a problem in practice. The
> availability of trigger comes from OPAL but in practice, all virtual
> interrupts are going to support it always,

Ok.  It still makes me nervous to derive guest visible features from
the host.  I'd prefer to just hardwire the XIVE model to always/never
advertise it and simply fail if that isn't workable for the host kernel.

> only some of the HW
> originated ones (PCIe MSIs for example and LSIs) won't. And we don't
> migrate with PCIe devices passed through.

.. and cross that bridge when we come to it.

> So the guest needs the info, but we should be ok with migration from P9
> DD2.0 onwards. Nobody sane cares about P9 DD1.0.
> 
> Any future chip will have to ensure that we don't lose that property in
> HW lest we lose migration, but that would break AIX too so I'm
> reasonably confident the HW guys will get that right ;-)
> 
> > > 
> > > > If it varies with host capabilities, that's going to be real pain for
> > > > migration.
> > > 
> > > Yes. I am not aware of any future extension but I agree this is
> > > something we need to keep an eye on.
> > 
> > I'm not talking about future extension, I'm meaning right now.
> 
> No, no issue right now.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 19/25] spapr: add hcalls support for the XIVE interrupt mode
  2017-12-06  9:20           ` David Gibson
@ 2017-12-06 19:41             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 128+ messages in thread
From: Benjamin Herrenschmidt @ 2017-12-06 19:41 UTC (permalink / raw)
  To: David Gibson; +Cc: Cédric Le Goater, qemu-ppc, qemu-devel

On Wed, 2017-12-06 at 20:20 +1100, David Gibson wrote:
> On Tue, Dec 05, 2017 at 08:50:26AM -0600, Benjamin Herrenschmidt wrote:
> > On Tue, 2017-12-05 at 18:00 +1100, David Gibson wrote:
> > > > The CPU revision. But we won't introduce XIVE exploitation mode on 
> > > > anything else than DD2.0 which has full XIVE support. Even STORE_EOI 
> > > > that we should be adding.
> > > 
> > > Hrm.  Host CPU?  That's a problem - if guest visible properties like
> > > this vary with the host CPU, migration breaks.
> > 
> > I don't think this is going to be a problem in practice. The
> > availability of trigger comes from OPAL but in practice, all virtual
> > interrupts are going to support it always,
> 
> Ok.  It still makes me nervous to derive guest visible features from
> the host.  I'd prefer to just hardwire the XIVE model to always/never
> advertise it and simply fail if that isn't workable for the host kernel.

We could fail loudly if we see a migratable interrupt that doesn't
have the flag.

Cheers,
Ben.

* Re: [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged
  2017-11-30  5:00   ` David Gibson
  2017-11-30 16:17     ` Cédric Le Goater
  2017-12-02 14:40     ` Benjamin Herrenschmidt
@ 2017-12-07 11:55     ` Cédric Le Goater
  2 siblings, 0 replies; 128+ messages in thread
From: Cédric Le Goater @ 2017-12-07 11:55 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

>> +/* Convert a priority number to an Interrupt Pending Buffer (IPB)
>> + * register, which indicates a pending interrupt at the priority
>> + * corresponding to the bit number
>> + */
>> +static uint8_t priority_to_ipb(uint8_t priority)
>> +{
>> +    return priority > XIVE_PRIORITY_MAX ?
>> +        0 : 1 << (XIVE_PRIORITY_MAX - priority);
> 
> Does handling out of bounds values here make sense, or should you just
> assert() they're not passed in?

The priority can be above XIVE_PRIORITY_MAX in the TM_SPC_SET_OS_PENDING
command, so the out of bounds case has to be handled here.

>> +}
>> +
>> +/* Convert an Interrupt Pending Buffer (IPB) register to a Pending
>> + * Interrupt Priority Register (PIPR), which contains the priority of
>> + * the most favored pending notification.
>> + *
>> + * TODO:
>> + *
>> + *   PIPR is clamped to CPPR. So the value in the PIPR is:
>> + *
>> + *     v = leftmost_bit_of(ipb) (or 0xff);
>> + *     pipr = v < cppr ? v : cppr;
>> + *
>> + * Ben says: "which means it's never actually 0xff ... surprise !".
>> + * But, the CPPR can be set to 0xFF ... I am confused ...
> 
> A resolution to this would be nice..

The 'accept' sequence does :

  - store the PIPR in the CPPR
  - reset the pending buffer bit in the IPB
  - recompute the PIPR value from the new IPB
  - drop the exception bit for OS

Typical values are : 

  before  IPB=02 PIPR=06 CPPR=ff NSR=80
   after  IPB=00 PIPR=ff CPPR=06 NSR=00

So today, if the IPB becomes zero, the PIPR is adjusted to 0xFF.
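
The sequence can be sketched as below, under a few assumptions: OsRing
is a hypothetical structure holding the four registers, NSR_EO is taken
to be bit 0x80 (consistent with NSR=80 above), and no clamping of the
PIPR to the CPPR is applied.

```c
#include <assert.h>
#include <stdint.h>

#define XIVE_PRIORITY_MAX 7
#define NSR_EO            0x80   /* assumed position of the OS exception bit */

/* Hypothetical container for the OS ring registers used below. */
typedef struct {
    uint8_t nsr;
    uint8_t cppr;
    uint8_t ipb;
    uint8_t pipr;
} OsRing;

/* Priority 0 maps to the leftmost IPB bit; out of range gives 0. */
static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > XIVE_PRIORITY_MAX ?
        0 : (uint8_t)(1 << (XIVE_PRIORITY_MAX - priority));
}

/* Most favored pending priority, or 0xff when nothing is pending. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    uint8_t prio;

    for (prio = 0; prio <= XIVE_PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/* The four steps of the 'accept' sequence, in order. */
static uint8_t os_accept(OsRing *r)
{
    uint8_t accepted = r->pipr;

    r->cppr = accepted;                             /* PIPR -> CPPR         */
    r->ipb &= (uint8_t)~priority_to_ipb(accepted);  /* reset pending bit    */
    r->pipr = ipb_to_pipr(r->ipb);                  /* recompute PIPR       */
    r->nsr &= (uint8_t)~NSR_EO;                     /* drop OS exception    */

    return accepted;
}
```

Starting from the "before" values, os_accept() ends in the "after"
state listed above.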

But, if the PIPR is clamped to the CPPR (6), there is a disconnect
between the IPB and the PIPR values. Is that OK ? 

Then, the second effect is that as soon as the OS sets back the CPPR 
to 0xFF, we notify the CPU again and enter a loop. 

I think this is because a test on the EO bit of the NSR is missing 
in the routine setting the CPPR. This bit signals the presence of 
an exception for the O/S. I will add that. I misunderstood the specs 
on the topic.

C. 
 

end of thread, other threads:[~2017-12-07 11:55 UTC | newest]

Thread overview: 128+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-23 13:29 [Qemu-devel] [PATCH 00/25] spapr: Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 01/25] ppc/xics: introduce an icp_create() helper Cédric Le Goater
2017-11-24  2:51   ` David Gibson
2017-11-24  7:57     ` Cédric Le Goater
2017-11-24  9:55     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-11-27  7:20       ` David Gibson
2017-11-24  9:08   ` Greg Kurz
2017-11-23 13:29 ` [Qemu-devel] [PATCH 02/25] ppc/xics: assign of the CPU 'intc' pointer under the core Cédric Le Goater
2017-11-24  2:57   ` David Gibson
2017-11-24  9:21   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-11-23 13:29 ` [Qemu-devel] [PATCH 03/25] spapr: introduce a spapr_icp_create() helper Cédric Le Goater
2017-11-24 10:09   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-11-24 12:26     ` Cédric Le Goater
2017-11-28 10:56       ` Greg Kurz
2017-11-23 13:29 ` [Qemu-devel] [PATCH 04/25] spapr: move the IRQ allocation routines under the machine Cédric Le Goater
2017-11-24  3:13   ` David Gibson
2017-11-28 10:57   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-11-23 13:29 ` [Qemu-devel] [PATCH 05/25] spapr: introduce a spapr_irq_set() helper Cédric Le Goater
2017-11-24  3:16   ` David Gibson
2017-11-24  8:32     ` Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 06/25] spapr: introduce a spapr_irq_get_qirq() helper Cédric Le Goater
2017-11-24  3:18   ` David Gibson
2017-11-24  8:01     ` Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 07/25] migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller Cédric Le Goater
2017-11-28  5:40   ` David Gibson
2017-11-28 10:44     ` Cédric Le Goater
2017-11-29  4:47       ` David Gibson
2017-11-29 11:49   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-11-29 13:46     ` Cédric Le Goater
2017-11-29 15:51       ` Greg Kurz
2017-11-29 16:41         ` Cédric Le Goater
2017-11-30  4:23       ` David Gibson
2017-11-30  4:22     ` David Gibson
2017-11-23 13:29 ` [Qemu-devel] [PATCH 09/25] spapr: introduce handlers for XIVE interrupt sources Cédric Le Goater
2017-11-28  5:45   ` David Gibson
2017-11-28 18:18     ` Cédric Le Goater
2017-12-02 14:26       ` Benjamin Herrenschmidt
2017-11-23 13:29 ` [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the " Cédric Le Goater
2017-11-28  6:38   ` David Gibson
2017-11-28 18:33     ` Cédric Le Goater
2017-11-29  4:59       ` David Gibson
2017-11-29 13:56         ` Cédric Le Goater
2017-11-29 16:23           ` Cédric Le Goater
2017-11-30  4:28             ` David Gibson
2017-11-30 16:05               ` Cédric Le Goater
2017-12-02 14:33               ` Benjamin Herrenschmidt
2017-12-02 14:28             ` Benjamin Herrenschmidt
2017-12-02 14:47               ` Cédric Le Goater
2017-11-30  4:26           ` David Gibson
2017-11-30 15:40             ` Cédric Le Goater
2017-12-02 14:23     ` Benjamin Herrenschmidt
2017-11-23 13:29 ` [Qemu-devel] [PATCH 11/25] spapr: describe the XIVE interrupt source flags Cédric Le Goater
2017-11-28  6:40   ` David Gibson
2017-11-28 18:23     ` Cédric Le Goater
2017-12-02 14:24     ` Benjamin Herrenschmidt
2017-12-02 14:38       ` Cédric Le Goater
2017-12-02 14:48         ` Benjamin Herrenschmidt
2017-12-02 14:50           ` Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 12/25] spapr: introduce a XIVE interrupt presenter model Cédric Le Goater
2017-11-29  5:11   ` David Gibson
2017-11-29  9:55     ` Cédric Le Goater
2017-11-30  4:06       ` David Gibson
2017-11-30 13:44         ` Cédric Le Goater
2017-12-01  4:03           ` David Gibson
2017-12-01  8:02             ` Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues Cédric Le Goater
2017-11-23 20:31   ` Benjamin Herrenschmidt
2017-11-24  8:15     ` Cédric Le Goater
2017-11-26 21:52       ` Benjamin Herrenschmidt
2017-11-30  4:38   ` David Gibson
2017-11-30 14:06     ` Cédric Le Goater
2017-11-30 23:35       ` David Gibson
2017-12-01 16:36         ` Cédric Le Goater
2017-12-04  1:09           ` David Gibson
2017-12-04 16:31             ` Cédric Le Goater
2017-12-02 14:39     ` Benjamin Herrenschmidt
2017-12-02 14:41       ` Benjamin Herrenschmidt
2017-11-23 13:29 ` [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue Cédric Le Goater
2017-11-30  4:49   ` David Gibson
2017-11-30 14:16     ` Cédric Le Goater
2017-12-01  4:10       ` David Gibson
2017-12-01 16:43         ` Cédric Le Goater
2017-12-02 14:45         ` Benjamin Herrenschmidt
2017-12-02 14:46           ` Benjamin Herrenschmidt
2017-12-04  1:20             ` David Gibson
2017-12-05 10:58               ` Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged Cédric Le Goater
2017-11-30  5:00   ` David Gibson
2017-11-30 16:17     ` Cédric Le Goater
2017-12-02 14:40     ` Benjamin Herrenschmidt
2017-12-04  1:17       ` David Gibson
2017-12-04 16:09         ` Benjamin Herrenschmidt
2017-12-07 11:55     ` Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 16/25] spapr: add support for the SET_OS_PENDING command (XIVE) Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine Cédric Le Goater
2017-11-30  5:55   ` David Gibson
2017-11-30 15:15     ` Cédric Le Goater
2017-12-01  4:14       ` David Gibson
2017-12-01  8:10         ` Cédric Le Goater
2017-12-04  1:59           ` David Gibson
2017-12-04  8:32             ` Cédric Le Goater
2017-12-04  8:40               ` David Gibson
2017-11-30 15:38     ` Cédric Le Goater
2017-12-01  4:17       ` David Gibson
2017-11-23 13:29 ` [Qemu-devel] [PATCH 18/25] spapr: allocate IRQ numbers for the XIVE interrupt mode Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 19/25] spapr: add hcalls support " Cédric Le Goater
2017-12-01  4:01   ` David Gibson
2017-12-01 17:46     ` Cédric Le Goater
2017-12-05  7:00       ` David Gibson
2017-12-05 14:50         ` Benjamin Herrenschmidt
2017-12-06  9:20           ` David Gibson
2017-12-06 19:41             ` Benjamin Herrenschmidt
2017-12-05 16:12         ` Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 20/25] spapr: add device tree " Cédric Le Goater
2017-12-04  7:49   ` David Gibson
2017-12-04 16:19     ` Cédric Le Goater
2017-12-05  3:38       ` David Gibson
2017-11-23 13:29 ` [Qemu-devel] [PATCH 21/25] spapr: introduce a helper to map the XIVE memory regions Cédric Le Goater
2017-12-04  7:52   ` David Gibson
2017-12-04 15:30     ` Cédric Le Goater
2017-12-05  2:24       ` David Gibson
2017-11-23 13:29 ` [Qemu-devel] [PATCH 22/25] spapr: add XIVE support to spapr_irq_get_qirq() Cédric Le Goater
2017-12-04  7:52   ` David Gibson
2017-11-23 13:29 ` [Qemu-devel] [PATCH 23/25] spapr: toggle the ICP depending on the selected interrupt mode Cédric Le Goater
2017-12-04  7:56   ` David Gibson
2017-11-23 13:29 ` [Qemu-devel] [PATCH 24/25] spapr: add support to dump XIVE information Cédric Le Goater
2017-11-23 13:29 ` [Qemu-devel] [PATCH 25/25] spapr: advertise XIVE exploitation mode in CAS Cédric Le Goater
