* [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
@ 2017-09-11 17:12 Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller Cédric Le Goater
                   ` (21 more replies)
  0 siblings, 22 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with an
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode, the newer POWER9 interrupt model. This
patchset proposes to add XIVE support to the POWER9 sPAPR machine.

The series provides a model of the XIVE interrupt controller and
support for the hypervisor calls (hcalls) the guest uses to configure
its interrupt sources and its event notification queues. The last
patch integrates XIVE into the sPAPR machine.

Code is here:

  https://github.com/legoater/qemu/commits/xive

Caveats:

 - IRQ allocator: making progress

   The sPAPR machine makes use of the interrupt controller very early
   in the initialization sequence to allocate IRQ numbers and populate
   the device tree. Since CAS can switch the interrupt model after IRQ
   numbers have been allocated, both models need to share a common IRQ
   allocator.

   I have chosen to link the sPAPR XICS interrupt source into XIVE to
   share the ICSIRQState array which acts as an IRQ allocator. This
   can be improved.

 - Interrupt presenter:

   The register data is stored directly in the ICPState structure,
   which is shared with all the other sPAPR interrupt controller models.

 - KVM support: not addressed yet

   The guest needs to be run with kernel_irqchip=off on a POWER9 system.

 - LSI: lightly tested.

Thanks,

C.

Changes since RFC v1:

 - removed the initial complexity introduced by a tentative attempt
   to support PowerNV. This will come later.
 - removed specific XIVE interrupt source and presenter models
 - renamed files and typedefs
 - removed print_info() handler
 - introduced a CAS reset to rebuild the device tree
 - linked the XIVE model with the sPAPR XICS interrupt source to share
   the IRQ allocator
 - improved hcall support (some are still missing, but they are not
   used by Linux)
 - improved device tree
 - should have addressed the comments on the first RFC
 - and much more ... Next version should have a better changelog.


Cédric Le Goater (21):
  ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC
  ppc/xive: define the XIVE internal tables
  ppc/xive: provide a link to the sPAPR ICS object under XIVE
  ppc/xive: allocate IRQ numbers for the IPIs
  ppc/xive: introduce handlers for interrupt sources
  ppc/xive: add MMIO handlers for the XIVE interrupt sources
  ppc/xive: describe the XIVE interrupt source flags
  ppc/xive: extend the interrupt presenter model for XIVE
  ppc/xive: add MMIO handlers for the XIVE TIMA
  ppc/xive: push the EQ data in OS event queue
  ppc/xive: notify the CPU when interrupt priority is more privileged
  ppc/xive: handle interrupt acknowledgment by the O/S
  ppc/xive: add support for the SET_OS_PENDING command
  spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument
  spapr: add a XIVE object to the sPAPR machine
  ppc/xive: add hcalls support
  ppc/xive: add device tree support
  ppc/xive: introduce a helper to map the XIVE memory regions
  ppc/xics: introduce a qirq_get() helper in the XICSFabric
  spapr: activate XIVE exploitation mode

 default-configs/ppc64-softmmu.mak |   1 +
 hw/intc/Makefile.objs             |   1 +
 hw/intc/spapr_xive.c              | 821 +++++++++++++++++++++++++++++++++
 hw/intc/spapr_xive_hcall.c        | 930 ++++++++++++++++++++++++++++++++++++++
 hw/intc/xics.c                    |  11 +-
 hw/intc/xive-internal.h           | 189 ++++++++
 hw/ppc/spapr.c                    | 110 ++++-
 hw/ppc/spapr_hcall.c              |   6 +
 hw/ppc/spapr_pci.c                |   4 +-
 include/hw/pci-host/spapr.h       |   2 +-
 include/hw/ppc/spapr.h            |  17 +-
 include/hw/ppc/spapr_xive.h       |  75 +++
 include/hw/ppc/xics.h             |   7 +
 include/migration/vmstate.h       |  10 +
 14 files changed, 2169 insertions(+), 15 deletions(-)
 create mode 100644 hw/intc/spapr_xive.c
 create mode 100644 hw/intc/spapr_xive_hcall.c
 create mode 100644 hw/intc/xive-internal.h
 create mode 100644 include/hw/ppc/spapr_xive.h

-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  2:27   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 02/21] migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC Cédric Le Goater
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

Start with a couple of attributes for the XIVE sPAPR controller
model. The number of provisioned IRQs and the number of CPUs are both
needed to size the internal XIVE tables.
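
To make the dependency concrete, here is a minimal standalone sketch
(not QEMU code) of how the two properties could be validated and turned
into table sizes. `XiveConfig` and the helper names are hypothetical;
the sizing rules assume the layout introduced later in the series:
2-bit SBEs packed 4 per byte, one IVE per IRQ number and 8 EQs (one per
priority) per target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sPAPRXive properties */
typedef struct {
    uint32_t nr_irqs;     /* "nr-irqs": provisioned IRQ numbers */
    uint32_t nr_targets;  /* "nr-targets": CPUs that can be targeted */
} XiveConfig;

#define XIVE_EQ_PRIORITY_COUNT 8  /* 8 priorities per target */

/* Mirrors the checks in spapr_xive_realize(): at least one target and
 * at least one IRQ number per target, so that the IPIs fit. */
static bool xive_config_valid(const XiveConfig *c)
{
    return c->nr_targets > 0 && c->nr_irqs >= c->nr_targets;
}

/* SBE table: 2 bits per IRQ, 4 entries per byte, rounded up
 * (written without "+ 3" to avoid overflowing near UINT32_MAX) */
static uint32_t xive_sbe_size(const XiveConfig *c)
{
    return c->nr_irqs / 4 + (c->nr_irqs % 4 != 0);
}

/* IVT: one 64-bit entry per IRQ number */
static uint32_t xive_nr_ives(const XiveConfig *c)
{
    return c->nr_irqs;
}

/* EQ descriptor table: one EQ per priority for every target */
static uint32_t xive_nr_eqs(const XiveConfig *c)
{
    assert(xive_config_valid(c));
    return c->nr_targets * XIVE_EQ_PRIORITY_COUNT;
}
```

With nr-irqs=4096 and nr-targets=4, this gives a 1024-byte SBE table,
4096 IVEs and 32 EQs.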

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 default-configs/ppc64-softmmu.mak |  1 +
 hw/intc/Makefile.objs             |  1 +
 hw/intc/spapr_xive.c              | 76 +++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_xive.h       | 37 +++++++++++++++++++
 4 files changed, 115 insertions(+)
 create mode 100644 hw/intc/spapr_xive.c
 create mode 100644 include/hw/ppc/spapr_xive.h

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 46c95993217d..8294df31c0f5 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -56,6 +56,7 @@ CONFIG_SM501=y
 CONFIG_XICS=$(CONFIG_PSERIES)
 CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
+CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
 # For PReP
 CONFIG_SERIAL_ISA=y
 CONFIG_MC146818RTC=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 78426a7dafcd..2dae80bdf611 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
 obj-$(CONFIG_XICS) += xics.o
 obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
+obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
new file mode 100644
index 000000000000..c83796519586
--- /dev/null
+++ b/hw/intc/spapr_xive.c
@@ -0,0 +1,76 @@
+/*
+ * QEMU PowerPC sPAPR XIVE model
+ *
+ * Copyright (c) 2017, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "target/ppc/cpu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/dma.h"
+#include "monitor/monitor.h"
+#include "hw/ppc/xics.h"
+#include "hw/ppc/spapr_xive.h"
+
+
+/*
+ * Main XIVE object
+ */
+
+static void spapr_xive_realize(DeviceState *dev, Error **errp)
+{
+    sPAPRXive *xive = SPAPR_XIVE(dev);
+
+    if (!xive->nr_targets) {
+        error_setg(errp, "Number of interrupt targets needs to be greater 0");
+        return;
+    }
+
+    /* We need to be able to allocate at least the IPIs */
+    if (!xive->nr_irqs || xive->nr_irqs < xive->nr_targets) {
+        error_setg(errp, "Number of interrupts too small");
+        return;
+    }
+}
+
+static Property spapr_xive_properties[] = {
+    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
+    DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void spapr_xive_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = spapr_xive_realize;
+    dc->props = spapr_xive_properties;
+    dc->desc = "sPAPR XIVE interrupt controller";
+}
+
+static const TypeInfo spapr_xive_info = {
+    .name = TYPE_SPAPR_XIVE,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(sPAPRXive),
+    .class_init = spapr_xive_class_init,
+};
+
+static void spapr_xive_register_types(void)
+{
+    type_register_static(&spapr_xive_info);
+}
+
+type_init(spapr_xive_register_types)
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
new file mode 100644
index 000000000000..5b99f7fc2b81
--- /dev/null
+++ b/include/hw/ppc/spapr_xive.h
@@ -0,0 +1,37 @@
+/*
+ * QEMU PowerPC sPAPR XIVE model
+ *
+ * Copyright (c) 2017, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef PPC_SPAPR_XIVE_H
+#define PPC_SPAPR_XIVE_H
+
+#include <hw/sysbus.h>
+
+typedef struct sPAPRXive sPAPRXive;
+
+#define TYPE_SPAPR_XIVE "spapr-xive"
+#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
+
+struct sPAPRXive {
+    SysBusDevice parent;
+
+    /* Properties */
+    uint32_t     nr_targets;
+    uint32_t     nr_irqs;
+};
+
+#endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 02/21] migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 03/21] ppc/xive: define the XIVE internal tables Cédric Le Goater
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

This is needed to migrate the state of the internal tables of the XIVE
object.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/migration/vmstate.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 85e43da56868..4dfb1bf84b5e 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -560,6 +560,16 @@ extern const VMStateInfo vmstate_info_qtailq;
     .offset     = vmstate_offset_pointer(_state, _field, _type),     \
 }
 
+#define VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
+    .name       = (stringify(_field)),                               \
+    .version_id = (_version),                                        \
+    .vmsd       = &(_vmsd),                                          \
+    .num_offset = vmstate_offset_value(_state, _field_num, uint32_t), \
+    .size       = sizeof(_type),                                     \
+    .flags      = VMS_STRUCT|VMS_VARRAY_UINT32|VMS_ALLOC|VMS_POINTER, \
+    .offset     = vmstate_offset_pointer(_state, _field, _type),     \
+}
+
 #define VMSTATE_STATIC_BUFFER(_field, _state, _version, _test, _start, _size) { \
     .name         = (stringify(_field)),                             \
     .version_id   = (_version),                                      \
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 03/21] ppc/xive: define the XIVE internal tables
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 02/21] migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  2:39   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE Cédric Le Goater
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

The XIVE interrupt controller of the POWER9 uses a set of tables to
route events from the event sources to the CPU threads. We chose to
model the following ones:

 - the State Bit Entries (SBE), also known as Event State Buffer
   (ESB). This is a two-bit state machine for each event source, which
   is used to trigger events. The bits are named "P" (pending) and "Q"
   (queued) and can be controlled by MMIO.

 - the Interrupt Virtualization Entry (IVE) table, also known as Event
   Assignment Structure (EAS). This table is indexed by the IRQ number
   and is looked up to find the Event Queue associated with a
   triggered event.

 - the Event Queue Descriptor (EQD) table, also known as Event
   Notification Descriptor (END). The EQD contains fields that specify
   the Event Queue on which event data is posted (and later pulled by
   the OS) and also a target (or VPD) to notify.
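
A minimal sketch of the SBE storage, assuming the 2-bit entries are
packed four per byte with P as the high bit and Q as the low bit, and
that entry 0 occupies the most significant bits of byte 0 (the bit
order inside a byte is an assumption; the reset value 0x55 reads the
same either way). The `ESB_PQ_*` names are hypothetical labels for the
four PQ states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PQ values: P is the high bit, Q the low bit of each 2-bit entry */
#define ESB_PQ_RESET   0x0  /* 0b00 */
#define ESB_PQ_OFF     0x1  /* 0b01: "ints off", the reset value */
#define ESB_PQ_PENDING 0x2  /* 0b10 */
#define ESB_PQ_QUEUED  0x3  /* 0b11 */

/* Assumption: entry 0 sits in the two most significant bits of byte 0 */
static unsigned sbe_shift(uint32_t srcno)
{
    return (3 - (srcno % 4)) * 2;
}

static uint8_t sbe_get(const uint8_t *sbe, uint32_t srcno)
{
    return (sbe[srcno / 4] >> sbe_shift(srcno)) & 0x3;
}

static void sbe_set(uint8_t *sbe, uint32_t srcno, uint8_t pq)
{
    unsigned shift = sbe_shift(srcno);

    assert(pq <= 0x3);
    /* clear the two PQ bits of this entry, then store the new value */
    sbe[srcno / 4] = (uint8_t)((sbe[srcno / 4] & ~(0x3u << shift)) |
                               ((unsigned)pq << shift));
}

/* Put every entry in the "off" state, like memset(sbe, 0x55, ...) */
static void sbe_reset(uint8_t *sbe, size_t sbe_size)
{
    memset(sbe, 0x55, sbe_size);
}
```

Updating one entry only touches its own two bits, so neighbouring
sources sharing the same byte keep their state.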

One additional table was not modeled, although it might be needed to
support the H_INT_SET_OS_REPORTING_LINE hcall:

 - the Virtual Processor Descriptor (VPD) table, also known as
   Notification Virtual Target (NVT).

The XIVE object is expanded with the tables described above. The size
of each table depends on the number of provisioned IRQs and on the
maximum number of CPUs in the system. The EQ indexing is very basic
and might need to be improved.
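
One simple way to index the EQ descriptor table is a flat layout with
a block of 8 EQs per target, so that each (target, priority) pair gets
a distinct slot. The sketch below assumes, like the TODO in
spapr_xive_eq_for_target(), that target numbers start at 0 and are
contiguous; `eq_index_for_target()` and `eq_index_decode()` are
hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XIVE_EQ_PRIORITY_COUNT 8
#define XIVE_PRIORITY_MAX      (XIVE_EQ_PRIORITY_COUNT - 1)

/*
 * Flat EQ table: one block of XIVE_EQ_PRIORITY_COUNT entries per
 * target. Assumes target numbers start at 0 and are contiguous.
 */
static bool eq_index_for_target(uint32_t nr_targets, uint32_t target,
                                uint8_t priority, uint32_t *out_eq_idx)
{
    if (priority > XIVE_PRIORITY_MAX || target >= nr_targets) {
        return false;
    }
    if (out_eq_idx) {
        *out_eq_idx = target * XIVE_EQ_PRIORITY_COUNT + priority;
    }
    return true;
}

/* Inverse mapping, handy when dumping the EQ table */
static void eq_index_decode(uint32_t eq_idx, uint32_t *target,
                            uint8_t *priority)
{
    *target = eq_idx / XIVE_EQ_PRIORITY_COUNT;
    *priority = (uint8_t)(eq_idx % XIVE_EQ_PRIORITY_COUNT);
}
```

With 4 targets the valid indices run from 0 to 31, which is exactly
nr_targets * XIVE_EQ_PRIORITY_COUNT entries, and no two pairs share an
EQ.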

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 108 ++++++++++++++++++++++++++++++++++++++++++++
 hw/intc/xive-internal.h     | 105 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_xive.h |   9 ++++
 3 files changed, 222 insertions(+)
 create mode 100644 hw/intc/xive-internal.h

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index c83796519586..6d98528fae68 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -25,11 +25,34 @@
 #include "hw/ppc/xics.h"
 #include "hw/ppc/spapr_xive.h"
 
+#include "xive-internal.h"
 
 /*
  * Main XIVE object
  */
 
+void spapr_xive_reset(void *dev)
+{
+    sPAPRXive *xive = SPAPR_XIVE(dev);
+    int i;
+
+    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
+    memset(xive->sbe, 0x55, xive->sbe_size);
+
+    /* Validate all available IVEs in the IRQ number space. It would
+     * be more correct to validate only the allocated IRQs but this
+     * would require some callback routine from the spapr machine into
+     * XIVE. To be done later.
+     */
+    for (i = 0; i < xive->nr_irqs; i++) {
+        XiveIVE *ive = &xive->ivt[i];
+        ive->w = IVE_VALID | IVE_MASKED;
+    }
+
+    /* clear all EQs */
+    memset(xive->eqt, 0, xive->nr_eqs * sizeof(XiveEQ));
+}
+
 static void spapr_xive_realize(DeviceState *dev, Error **errp)
 {
     sPAPRXive *xive = SPAPR_XIVE(dev);
@@ -44,8 +67,64 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "Number of interrupts too small");
         return;
     }
+
+    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
+    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
+    xive->sbe = g_malloc0(xive->sbe_size);
+
+    /* Allocate the IVT (Interrupt Virtualization Table) */
+    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
+
+    /* Allocate the EQDT (Event Queue Descriptor Table), 8 priorities
+     * for each thread in the system */
+    xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;
+    xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));
+
+    qemu_register_reset(spapr_xive_reset, dev);
 }
 
+static const VMStateDescription vmstate_spapr_xive_ive = {
+    .name = "xive/ive",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(w, XiveIVE),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_xive_eq = {
+    .name = "xive/eq",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT32(w0, XiveEQ),
+        VMSTATE_UINT32(w1, XiveEQ),
+        VMSTATE_UINT32(w2, XiveEQ),
+        VMSTATE_UINT32(w3, XiveEQ),
+        VMSTATE_UINT32(w4, XiveEQ),
+        VMSTATE_UINT32(w5, XiveEQ),
+        VMSTATE_UINT32(w6, XiveEQ),
+        VMSTATE_UINT32(w7, XiveEQ),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_xive = {
+    .name = "xive",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_VARRAY_UINT32_ALLOC(sbe, sPAPRXive, sbe_size, 0,
+                                    vmstate_info_uint8, uint8_t),
+        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 0,
+                                    vmstate_spapr_xive_ive, XiveIVE),
+        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(eqt, sPAPRXive, nr_eqs, 0,
+                                    vmstate_spapr_xive_eq, XiveEQ),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static Property spapr_xive_properties[] = {
     DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
     DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),
@@ -59,6 +138,7 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
     dc->realize = spapr_xive_realize;
     dc->props = spapr_xive_properties;
     dc->desc = "sPAPR XIVE interrupt controller";
+    dc->vmsd = &vmstate_xive;
 }
 
 static const TypeInfo spapr_xive_info = {
@@ -74,3 +154,31 @@ static void spapr_xive_register_types(void)
 }
 
 type_init(spapr_xive_register_types)
+
+XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t idx)
+{
+    return idx < xive->nr_irqs ? &xive->ivt[idx] : NULL;
+}
+
+XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx)
+{
+    return idx < xive->nr_eqs ? &xive->eqt[idx] : NULL;
+}
+
+/* TODO: improve EQ indexing. This is very simple and relies on the
+ * fact that target (CPU) numbers start at 0 and are contiguous. It
+ * should be OK for sPAPR.
+ */
+bool spapr_xive_eq_for_target(sPAPRXive *xive, uint32_t target,
+                              uint8_t priority, uint32_t *out_eq_idx)
+{
+    if (priority > XIVE_PRIORITY_MAX || target >= xive->nr_targets) {
+        return false;
+    }
+
+    if (out_eq_idx) {
+        *out_eq_idx = target * XIVE_EQ_PRIORITY_COUNT + priority;
+    }
+
+    return true;
+}
diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
new file mode 100644
index 000000000000..95184bad5c1d
--- /dev/null
+++ b/hw/intc/xive-internal.h
@@ -0,0 +1,105 @@
+/*
+ * QEMU PowerPC XIVE model
+ *
+ * Copyright 2016,2017 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _INTC_XIVE_INTERNAL_H
+#define _INTC_XIVE_INTERNAL_H
+
+/* Utilities to manipulate these (originally from OPAL) */
+#define MASK_TO_LSH(m)          (__builtin_ffsll(m) - 1)
+#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
+#define SETFIELD(m, v, val)                             \
+        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
+
+#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
+#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
+#define PPC_BIT8(bit)           (0x80UL >> (bit))
+#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
+#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
+                                 PPC_BIT32(bs))
+
+/* IVE/EAS
+ *
+ * One per interrupt source. Targets that interrupt to a given EQ
+ * and provides the corresponding logical interrupt number (EQ data)
+ *
+ * We also map this structure to the escalation descriptor inside
+ * an EQ, though in that case the valid and masked bits are not used.
+ */
+typedef struct XiveIVE {
+        /* Use a single 64-bit definition to make it easier to
+         * perform atomic updates
+         */
+        uint64_t        w;
+#define IVE_VALID       PPC_BIT(0)
+#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
+#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
+#define IVE_MASKED      PPC_BIT(32)              /* Masked */
+#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
+} XiveIVE;
+
+/* EQ */
+typedef struct XiveEQ {
+        uint32_t        w0;
+#define EQ_W0_VALID             PPC_BIT32(0)
+#define EQ_W0_ENQUEUE           PPC_BIT32(1)
+#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
+#define EQ_W0_BACKLOG           PPC_BIT32(3)
+#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
+#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
+#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
+#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
+#define EQ_W0_SW0               PPC_BIT32(16)
+#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
+#define EQ_QSIZE_4K             0
+#define EQ_QSIZE_64K            4
+#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
+        uint32_t        w1;
+#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
+#define EQ_W1_ESn_P             PPC_BIT32(0)
+#define EQ_W1_ESn_Q             PPC_BIT32(1)
+#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
+#define EQ_W1_ESe_P             PPC_BIT32(2)
+#define EQ_W1_ESe_Q             PPC_BIT32(3)
+#define EQ_W1_GENERATION        PPC_BIT32(9)
+#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
+        uint32_t        w2;
+#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
+#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
+        uint32_t        w3;
+#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
+        uint32_t        w4;
+#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
+#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
+        uint32_t        w5;
+#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
+        uint32_t        w6;
+#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
+#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
+#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
+        uint32_t        w7;
+#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
+#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
+#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
+#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
+#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
+} XiveEQ;
+
+#define XIVE_EQ_PRIORITY_COUNT 8
+#define XIVE_PRIORITY_MAX  (XIVE_EQ_PRIORITY_COUNT - 1)
+
+void spapr_xive_reset(void *dev);
+XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t isn);
+XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
+
+bool spapr_xive_eq_for_target(sPAPRXive *xive, uint32_t target, uint8_t prio,
+                        uint32_t *out_eq_idx);
+
+
+#endif /* _INTC_XIVE_INTERNAL_H */
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 5b99f7fc2b81..b17dd4f17b0b 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -22,6 +22,8 @@
 #include <hw/sysbus.h>
 
 typedef struct sPAPRXive sPAPRXive;
+typedef struct XiveIVE XiveIVE;
+typedef struct XiveEQ XiveEQ;
 
 #define TYPE_SPAPR_XIVE "spapr-xive"
 #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
@@ -32,6 +34,13 @@ struct sPAPRXive {
     /* Properties */
     uint32_t     nr_targets;
     uint32_t     nr_irqs;
+
+    /* XIVE internal tables */
+    uint8_t      *sbe;
+    uint32_t     sbe_size;
+    XiveIVE      *ivt;
+    XiveEQ       *eqt;
+    uint32_t     nr_eqs;
 };
 
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (2 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 03/21] ppc/xive: define the XIVE internal tables Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-11 22:04   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-09-19  2:44   ` [Qemu-devel] " David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 05/21] ppc/xive: allocate IRQ numbers for the IPIs Cédric Le Goater
                   ` (17 subsequent siblings)
  21 siblings, 2 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

The sPAPR machine starts with the XICS interrupt model and, depending
on the guest capabilities, the XIVE exploitation mode is negotiated
during CAS. A reset is then needed to rebuild the device tree, but the
IRQ numbers allocated by the devices before the reset, while the XICS
model was operating, remain in use.

For this purpose, we need a common IRQ number allocator for both
interrupt models: XICS legacy or XIVE exploitation. This is what the
ICSIRQState array of the XICS interrupt source is used for. It also
contains the LSI/MSI flag of each interrupt, which we will need later on.

So, let's provide a link to the sPAPR ICS object under XIVE to make
use of it.
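
The idea can be sketched in standalone C, assuming a hypothetical
`IrqAllocator` holding one flags byte per IRQ number (standing in for
the ICSIRQState array) and two models that point at the same instance.
This is not the QEMU API: the names and flags are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRQ_FLAG_ALLOCATED 0x1  /* hypothetical, not XICS_FLAGS_* */
#define IRQ_FLAG_LSI       0x2

typedef struct {
    uint8_t flags;
} IrqState;

typedef struct {
    IrqState *irqs;
    uint32_t nr_irqs;
} IrqAllocator;

/* First-fit allocation of one IRQ number; returns -1 when exhausted */
static int irq_alloc(IrqAllocator *a, bool lsi)
{
    for (uint32_t i = 0; i < a->nr_irqs; i++) {
        if (!(a->irqs[i].flags & IRQ_FLAG_ALLOCATED)) {
            a->irqs[i].flags = IRQ_FLAG_ALLOCATED | (lsi ? IRQ_FLAG_LSI : 0);
            return (int)i;
        }
    }
    return -1;
}

/* Both models can query the LSI/MSI type of an already allocated IRQ */
static bool irq_is_lsi(const IrqAllocator *a, uint32_t srcno)
{
    return srcno < a->nr_irqs && (a->irqs[srcno].flags & IRQ_FLAG_LSI);
}

/* The two interrupt models only hold a pointer to the shared state */
typedef struct { IrqAllocator *alloc; } XicsModel;
typedef struct { IrqAllocator *alloc; } XiveModel;
```

Because XIVE reads the same array, IRQ numbers handed out to devices
while XICS was active, and their LSI/MSI type, are still valid after
CAS switches the machine to XIVE.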

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 12 ++++++++++++
 include/hw/ppc/spapr_xive.h |  4 ++++
 2 files changed, 16 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 6d98528fae68..1681affb0848 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -56,6 +56,8 @@ void spapr_xive_reset(void *dev)
 static void spapr_xive_realize(DeviceState *dev, Error **errp)
 {
     sPAPRXive *xive = SPAPR_XIVE(dev);
+    Object *obj;
+    Error *err = NULL;
 
     if (!xive->nr_targets) {
         error_setg(errp, "Number of interrupt targets needs to be greater 0");
@@ -68,6 +70,16 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /* Retrieve SPAPR ICS source to share the IRQ number allocator */
+    obj = object_property_get_link(OBJECT(dev), "ics", &err);
+    if (!obj) {
+        error_setg(errp, "%s: required link 'ics' not found: %s",
+                   __func__, error_get_pretty(err));
+        return;
+    }
+
+    xive->ics = ICS_BASE(obj);
+
     /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
     xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
     xive->sbe = g_malloc0(xive->sbe_size);
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index b17dd4f17b0b..29112589b37f 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -24,6 +24,7 @@
 typedef struct sPAPRXive sPAPRXive;
 typedef struct XiveIVE XiveIVE;
 typedef struct XiveEQ XiveEQ;
+typedef struct ICSState ICSState;
 
 #define TYPE_SPAPR_XIVE "spapr-xive"
 #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
@@ -35,6 +36,9 @@ struct sPAPRXive {
     uint32_t     nr_targets;
     uint32_t     nr_irqs;
 
+    /* IRQ */
+    ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
+
     /* XIVE internal tables */
     uint8_t      *sbe;
     uint32_t     sbe_size;
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 05/21] ppc/xive: allocate IRQ numbers for the IPIs
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (3 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  2:45   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 06/21] ppc/xive: introduce handlers for interrupt sources Cédric Le Goater
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

The number of IPIs is deduced from the maximum number of CPUs the
guest supports. The IRQ numbers for the IPIs are allocated from the
top of the IRQ number space to reduce conflicts with the IRQ numbers
allocated by the devices.
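
A minimal sketch of the IPI range computation. `ipi_base()` matches
the loop bounds used in spapr_xive_realize(); the one-IPI-per-target
mapping in `ipi_for_target()` and the `irq_is_ipi()` check are
hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First IRQ number reserved for the IPIs: the last nr_targets numbers */
static uint32_t ipi_base(uint32_t nr_irqs, uint32_t nr_targets)
{
    assert(nr_irqs >= nr_targets);  /* enforced at realize time */
    return nr_irqs - nr_targets;
}

/* Hypothetical mapping: target N uses the Nth reserved IRQ number */
static uint32_t ipi_for_target(uint32_t nr_irqs, uint32_t nr_targets,
                               uint32_t target)
{
    assert(target < nr_targets);
    return ipi_base(nr_irqs, nr_targets) + target;
}

static bool irq_is_ipi(uint32_t nr_irqs, uint32_t nr_targets, uint32_t irq)
{
    return irq >= ipi_base(nr_irqs, nr_targets) && irq < nr_irqs;
}
```

Device IRQ numbers are typically handed out from the bottom of the
space, so keeping the IPIs at the top leaves the two ranges apart
unless the space is nearly full.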

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 1681affb0848..52c32f588d6d 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -58,6 +58,7 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     sPAPRXive *xive = SPAPR_XIVE(dev);
     Object *obj;
     Error *err = NULL;
+    int i;
 
     if (!xive->nr_targets) {
         error_setg(errp, "Number of interrupt targets needs to be greater 0");
@@ -80,6 +81,11 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
 
     xive->ics = ICS_BASE(obj);
 
+    /* Allocate the last IRQ numbers for the IPIs */
+    for (i = xive->nr_irqs - xive->nr_targets; i < xive->nr_irqs; i++) {
+        ics_set_irq_type(xive->ics, i, false);
+    }
+
     /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
     xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
     xive->sbe = g_malloc0(xive->sbe_size);
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 06/21] ppc/xive: introduce handlers for interrupt sources
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (4 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 05/21] ppc/xive: allocate IRQ numbers for the IPIs Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  2:48   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE " Cédric Le Goater
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

These handlers are a simpler form of the XICS handlers. They use the
ICSIRQState array of the XICS interrupt source to differentiate MSIs
from LSIs. The spapr_xive_irq() routine, in charge of raising the CPU
interrupt line, is left empty for now and will be filled in later.
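The standalone sketch below illustrates the difference between the two paths. It replaces the XICS structures with a hypothetical struct source and replaces the empty spapr_xive_irq() with a counter. The EOI helper mirrors spapr_xive_source_eoi(), which a later patch in this series adds. Everything else is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for XICS_STATUS_ASSERTED / XICS_STATUS_SENT */
#define STATUS_ASSERTED 0x1u
#define STATUS_SENT     0x2u

struct source {
    bool lsi;            /* plays the role of XICS_FLAGS_IRQ_LSI */
    unsigned status;
    unsigned notified;   /* counts calls to the (stubbed) spapr_xive_irq() */
};

static void source_notify(struct source *s)
{
    s->notified++;
}

static void source_set_irq(struct source *s, int val)
{
    assert(val == 0 || val == 1);

    if (!s->lsi) {
        /* MSI: every assertion is an event */
        if (val) {
            source_notify(s);
        }
        return;
    }

    /* LSI: track the line level and notify once until the EOI */
    if (val) {
        s->status |= STATUS_ASSERTED;
    } else {
        s->status &= ~STATUS_ASSERTED;
    }
    if ((s->status & STATUS_ASSERTED) && !(s->status & STATUS_SENT)) {
        s->status |= STATUS_SENT;
        source_notify(s);
    }
}

/* EOI on an LSI clears SENT so that the next assertion is forwarded */
static void source_eoi(struct source *s)
{
    if (s->lsi) {
        s->status &= ~STATUS_SENT;
    }
}
```

While an LSI stays asserted, it is forwarded only once until the EOI. Each rising call on an MSI is forwarded.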

The next patch will introduce the MMIO handlers to interact with XIVE
interrupt sources.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_xive.h |  1 +
 2 files changed, 47 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 52c32f588d6d..1ed7b6a286e9 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -27,6 +27,50 @@
 
 #include "xive-internal.h"
 
+static void spapr_xive_irq(sPAPRXive *xive, int srcno)
+{
+
+}
+
+/*
+ * XIVE Interrupt Source
+ */
+static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
+{
+    if (val) {
+        spapr_xive_irq(xive, srcno);
+    }
+}
+
+static void spapr_xive_source_set_irq_lsi(sPAPRXive *xive, int srcno, int val)
+{
+    ICSIRQState *irq = &xive->ics->irqs[srcno];
+
+    if (val) {
+        irq->status |= XICS_STATUS_ASSERTED;
+    } else {
+        irq->status &= ~XICS_STATUS_ASSERTED;
+    }
+
+    if (irq->status & XICS_STATUS_ASSERTED
+        && !(irq->status & XICS_STATUS_SENT)) {
+        irq->status |= XICS_STATUS_SENT;
+        spapr_xive_irq(xive, srcno);
+    }
+}
+
+static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
+{
+    sPAPRXive *xive = SPAPR_XIVE(opaque);
+    ICSIRQState *irq = &xive->ics->irqs[srcno];
+
+    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
+        spapr_xive_source_set_irq_lsi(xive, srcno, val);
+    } else {
+        spapr_xive_source_set_irq_msi(xive, srcno, val);
+    }
+}
+
 /*
  * Main XIVE object
  */
@@ -80,6 +124,8 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     }
 
     xive->ics = ICS_BASE(obj);
+    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
+                                     xive->nr_irqs);
 
     /* Allocate the last IRQ numbers for the IPIs */
     for (i = xive->nr_irqs - xive->nr_targets; i < xive->nr_irqs; i++) {
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 29112589b37f..eab92c4c1bb8 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -38,6 +38,7 @@ struct sPAPRXive {
 
     /* IRQ */
     ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
+    qemu_irq     *qirqs;
 
     /* XIVE internal tables */
     uint8_t      *sbe;
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (5 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 06/21] ppc/xive: introduce handlers for interrupt sources Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  2:57   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 08/21] ppc/xive: describe the XIVE interrupt source flags Cédric Le Goater
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

Each interrupt source is associated with a two-bit state machine
called an Event State Buffer (ESB), which is controlled through MMIO
to trigger events. See the code for details on the states and
transitions.
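The sketch below is a standalone model of those transitions. The SBE table is packed as 2 bits per source and 4 sources per byte, as spapr_xive_pq_get() and spapr_xive_pq_set() do. The model follows the coalescing rule stated in the ESB comment (an interrupt occurs at most once in a queue), so a trigger on a source that is already pending only sets Q and is not forwarded. NR_IRQS and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* PQ values, same encoding as the XIVE_ESB_* macros of the patch */
#define ESB_RESET   0x0   /* P=0 Q=0: idle */
#define ESB_OFF     0x1   /* P=0 Q=1: source disabled */
#define ESB_PENDING 0x2   /* P=1 Q=0: in a queue, waiting for EOI */
#define ESB_QUEUED  0x3   /* P=1 Q=1: re-triggered while pending */

#define NR_IRQS 16                          /* hypothetical size */
static uint8_t sbe[(NR_IRQS + 3) / 4];      /* 2 bits per source */

static void sbe_reset(void)
{
    memset(sbe, 0, sizeof(sbe));
}

static uint8_t pq_get(uint32_t idx)
{
    assert(idx / 4 < sizeof(sbe));
    return (sbe[idx / 4] >> ((idx % 4) * 2)) & 0x3;
}

/* Store new PQ bits and return the previous ones */
static uint8_t pq_set(uint32_t idx, uint8_t pq)
{
    uint32_t bit = (idx % 4) * 2;
    uint8_t old = pq_get(idx);

    sbe[idx / 4] = (uint8_t)((sbe[idx / 4] & ~(0x3u << bit)) |
                             ((pq & 0x3u) << bit));
    return old;
}

/* Trigger: true if the event must be forwarded to an event queue */
static bool pq_trigger(uint32_t idx)
{
    switch (pq_get(idx)) {
    case ESB_RESET:
        pq_set(idx, ESB_PENDING);
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        pq_set(idx, ESB_QUEUED);    /* coalesce: only remember it */
        return false;
    default:                        /* ESB_OFF: dropped */
        return false;
    }
}

/* EOI: true if the source was re-triggered and must be replayed */
static bool pq_eoi(uint32_t idx)
{
    switch (pq_get(idx)) {
    case ESB_QUEUED:
        pq_set(idx, ESB_PENDING);
        return true;
    case ESB_PENDING:
        pq_set(idx, ESB_RESET);
        return false;
    default:                        /* RESET and OFF are unchanged */
        return false;
    }
}
```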

The MMIO space for the ESB translation is 512GB on bare-metal
(powernv) systems, and its BAR depends on the chip id. In our model
of the sPAPR machine, we only map a memory subregion covering the
provisioned IRQ numbers, and we use the chip 0 mapping address of a
real system. The OS gets the address of the ESB MMIO page associated
with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
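The following sketch shows the addressing scheme. It assumes one 64k ESB page per IRQ (ESB_SHIFT) and the base P9_MMIO_BASE | VC_BAR_DEFAULT = 0x0006010000000000 used by this patch. Under those assumptions, the ESB page of source n starts at base + (n << 16). The handlers split an offset within the region into the source number (high bits) and the operation (addr & 0xF00). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ESB_SHIFT         16       /* one 64k ESB page per IRQ, as in the patch */
#define ESB_SET_PQ_00     0xc00

/* Address of the ESB page of 'srcno', given the base of the ESB region */
static uint64_t esb_page_addr(uint64_t esb_base, uint32_t srcno)
{
    return esb_base + ((uint64_t)srcno << ESB_SHIFT);
}

/*
 * Split an offset within the ESB region into a source number and an
 * operation offset, as spapr_xive_esb_read()/write() do.
 */
static void esb_decode(uint64_t addr, uint32_t *srcno, uint32_t *op)
{
    *srcno = (uint32_t)(addr >> ESB_SHIFT);
    *op = (uint32_t)(addr & 0xF00);
}

/* For the SET_PQ loads (0xc00-0xf00), bits 8-9 carry the new PQ value */
static uint8_t esb_set_pq_value(uint32_t op)
{
    assert(op >= ESB_SET_PQ_00 && op <= 0xf00);
    return (op >> 8) & 0x3;
}
```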

For KVM support, we should find a way to map this QEMU memory region
in the host so that events can be triggered directly.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_xive.h |   6 ++
 2 files changed, 261 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 1ed7b6a286e9..8a85d64efc4c 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
 }
 
 /*
+ * "magic" Event State Buffer (ESB) MMIO offsets.
+ *
+ * Each interrupt source has a 2-bit state machine called ESB
+ * which can be controlled by MMIO. It's made of 2 bits, P and
+ * Q. P indicates that an interrupt is pending (has been sent
+ * to a queue and is waiting for an EOI). Q indicates that the
+ * interrupt has been triggered while pending.
+ *
+ * This acts as a coalescing mechanism in order to guarantee
+ * that a given interrupt only occurs at most once in a queue.
+ *
+ * When doing an EOI, the Q bit will indicate if the interrupt
+ * needs to be re-triggered.
+ *
+ * The following offsets into the ESB MMIO allow to read or
+ * manipulate the PQ bits. They must be used with an 8-bytes
+ * load instruction. They all return the previous state of the
+ * interrupt (atomically).
+ *
+ * Additionally, some ESB pages support doing an EOI via a
+ * store at 0 and some ESBs support doing a trigger via a
+ * separate trigger page.
+ */
+#define XIVE_ESB_GET            0x800
+#define XIVE_ESB_SET_PQ_00      0xc00
+#define XIVE_ESB_SET_PQ_01      0xd00
+#define XIVE_ESB_SET_PQ_10      0xe00
+#define XIVE_ESB_SET_PQ_11      0xf00
+
+#define XIVE_ESB_VAL_P          0x2
+#define XIVE_ESB_VAL_Q          0x1
+
+#define XIVE_ESB_RESET          0x0
+#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
+#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
+#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
+
+static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
+{
+    uint32_t byte = idx / 4;
+    uint32_t bit  = (idx % 4) * 2;
+
+    assert(byte < xive->sbe_size);
+
+    return (xive->sbe[byte] >> bit) & 0x3;
+}
+
+static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)
+{
+    uint32_t byte = idx / 4;
+    uint32_t bit  = (idx % 4) * 2;
+    uint8_t old, new;
+
+    assert(byte < xive->sbe_size);
+
+    old = xive->sbe[byte];
+
+    new = xive->sbe[byte] & ~(0x3 << bit);
+    new |= (pq & 0x3) << bit;
+
+    xive->sbe[byte] = new;
+
+    return (old >> bit) & 0x3;
+}
+
+static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)
+{
+    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
+        return false;
+    case XIVE_ESB_PENDING:
+        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
+        return false;
+    case XIVE_ESB_QUEUED:
+        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_OFF:
+        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)
+{
+    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_PENDING:
+        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
+        return false;
+    case XIVE_ESB_QUEUED:
+        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
+        return false;
+    case XIVE_ESB_OFF:
+        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+/*
+ * XIVE Interrupt Source MMIOs
+ */
+static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)
+{
+    ICSIRQState *irq = &xive->ics->irqs[srcno];
+
+    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
+        irq->status &= ~XICS_STATUS_SENT;
+    }
+}
+
+/* TODO: handle second page
+ *
+ * Some HW use a separate page for trigger. We only support the case
+ * in which the trigger can be done in the same page as the EOI.
+ */
+static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
+{
+    sPAPRXive *xive = SPAPR_XIVE(opaque);
+    uint32_t offset = addr & 0xF00;
+    uint32_t srcno = addr >> xive->esb_shift;
+    XiveIVE *ive;
+    uint64_t ret = -1;
+
+    ive = spapr_xive_get_ive(xive, srcno);
+    if (!ive || !(ive->w & IVE_VALID))  {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
+        goto out;
+    }
+
+    switch (offset) {
+    case 0:
+        spapr_xive_source_eoi(xive, srcno);
+
+        /* return TRUE or FALSE depending on PQ value */
+        ret = spapr_xive_pq_eoi(xive, srcno);
+        break;
+
+    case XIVE_ESB_GET:
+        ret = spapr_xive_pq_get(xive, srcno);
+        break;
+
+    case XIVE_ESB_SET_PQ_00:
+    case XIVE_ESB_SET_PQ_01:
+    case XIVE_ESB_SET_PQ_10:
+    case XIVE_ESB_SET_PQ_11:
+        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
+    }
+
+out:
+    return ret;
+}
+
+static void spapr_xive_esb_write(void *opaque, hwaddr addr,
+                           uint64_t value, unsigned size)
+{
+    sPAPRXive *xive = SPAPR_XIVE(opaque);
+    uint32_t offset = addr & 0xF00;
+    uint32_t srcno = addr >> xive->esb_shift;
+    XiveIVE *ive;
+    bool notify = false;
+
+    ive = spapr_xive_get_ive(xive, srcno);
+    if (!ive || !(ive->w & IVE_VALID))  {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
+        return;
+    }
+
+    switch (offset) {
+    case 0:
+        /* TODO: should we trigger even if the IVE is masked ? */
+        notify = spapr_xive_pq_trigger(xive, srcno);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
+                      offset);
+        return;
+    }
+
+    if (notify && !(ive->w & IVE_MASKED)) {
+        qemu_irq_pulse(xive->qirqs[srcno]);
+    }
+}
+
+static const MemoryRegionOps spapr_xive_esb_ops = {
+    .read = spapr_xive_esb_read,
+    .write = spapr_xive_esb_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+};
+
+/*
  * XIVE Interrupt Source
  */
 static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
@@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
 /*
  * Main XIVE object
  */
+#define P9_MMIO_BASE     0x006000000000000ull
+
+/* VC BAR contains set translations for the ESBs and the EQs. */
+#define VC_BAR_DEFAULT   0x10000000000ull
+#define VC_BAR_SIZE      0x08000000000ull
+#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
+
+static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
+                                            unsigned size)
+{
+    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
+                  __func__, offset, size);
+    return 0;
+}
+
+static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,
+                                         uint64_t value, unsigned size)
+{
+    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
+                  __func__, offset, value, size);
+}
+
+static const MemoryRegionOps spapr_xive_esb_default_ops = {
+    .read = spapr_xive_esb_default_read,
+    .write = spapr_xive_esb_default_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
 
 void spapr_xive_reset(void *dev)
 {
@@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;
     xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));
 
+    /* VC BAR. That's the full window but we will only map the
+     * subregions in use. */
+    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
+    xive->esb_shift = ESB_SHIFT;
+
+    /* Install default memory region handlers to log bogus access */
+    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,
+                          NULL, "xive.esb.full", VC_BAR_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);
+
+    /* Install the ESB memory region in the overall one */
+    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,
+                          xive, "xive.esb",
+                          (1ull << xive->esb_shift) * xive->nr_irqs);
+    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
+
     qemu_register_reset(spapr_xive_reset, dev);
 }
 
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index eab92c4c1bb8..0f516534d76a 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -46,6 +46,12 @@ struct sPAPRXive {
     XiveIVE      *ivt;
     XiveEQ       *eqt;
     uint32_t     nr_eqs;
+
+    /* ESB memory region */
+    uint32_t     esb_shift;
+    hwaddr       esb_base;
+    MemoryRegion esb_mr;
+    MemoryRegion esb_iomem;
 };
 
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 08/21] ppc/xive: describe the XIVE interrupt source flags
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (6 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE " Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 09/21] ppc/xive: extend the interrupt presenter model for XIVE Cédric Le Goater
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

The XIVE interrupt sources can have different characteristics
depending on their nature and the HW level in use. The PAPR
specification provides a set of flags to describe them:

 - XIVE_SRC_H_INT_ESB  the Event State Buffers are controlled with a
                       specific hcall H_INT_ESB and not with MMIO
 - XIVE_SRC_LSI        LSI or MSI source (ICSIRQState level)
 - XIVE_SRC_TRIGGER    the full function page supports trigger
 - XIVE_SRC_STORE_EOI  EOI can be done with a store.

Our QEMU emulation of XIVE for the sPAPR machine gathers all sources
under the same model and provides a common source type with the
XIVE_SRC_TRIGGER flag. The list above is therefore mostly
informative, apart from the XIVE_SRC_LSI flag, which will be deduced
from the XICS_FLAGS_IRQ_LSI flag of the ICSIRQState array when
needed.

The OS retrieves this information on the source with the
H_INT_GET_SOURCE_INFO hcall.
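The flags use the PAPR big-endian bit numbering, so bit 62 (XIVE_SRC_TRIGGER) has the value 0x2. The sketch below shows how the flags of a source could be computed under this model. source_info_flags() and PAPR_BIT() are hypothetical names. How the hcall handler actually fills its return registers is left to the later patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PAPR numbers bits from the MSB: bit n is 1ull << (63 - n) */
#define PAPR_BIT(n)           (1ull << (63 - (n)))

#define XIVE_SRC_H_INT_ESB    PAPR_BIT(60)
#define XIVE_SRC_LSI          PAPR_BIT(61)
#define XIVE_SRC_TRIGGER      PAPR_BIT(62)
#define XIVE_SRC_STORE_EOI    PAPR_BIT(63)

/*
 * Hypothetical helper: every source of this model gets XIVE_SRC_TRIGGER,
 * and XIVE_SRC_LSI is added when the XICS state marks the source as LSI.
 */
static uint64_t source_info_flags(bool is_lsi)
{
    uint64_t flags = XIVE_SRC_TRIGGER;

    if (is_lsi) {
        flags |= XIVE_SRC_LSI;
    }
    /* The model never uses the H_INT_ESB or store-EOI variants */
    assert(!(flags & (XIVE_SRC_H_INT_ESB | XIVE_SRC_STORE_EOI)));
    return flags;
}
```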

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 4 ++++
 include/hw/ppc/spapr_xive.h | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 8a85d64efc4c..a1ce993d2afa 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -371,6 +371,10 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
         ics_set_irq_type(xive->ics, i, false);
     }
 
+    /* All sources are emulated under the XIVE object and share the
+     * same characteristics */
+    xive->flags = XIVE_SRC_TRIGGER;
+
     /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
     xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
     xive->sbe = g_malloc0(xive->sbe_size);
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 0f516534d76a..b46e59319236 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -40,6 +40,13 @@ struct sPAPRXive {
     ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
     qemu_irq     *qirqs;
 
+    /* Interrupt source flags */
+#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
+#define XIVE_SRC_LSI           (1ull << (63 - 61))
+#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
+#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
+    uint32_t     flags;
+
     /* XIVE internal tables */
     uint8_t      *sbe;
     uint32_t     sbe_size;
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 09/21] ppc/xive: extend the interrupt presenter model for XIVE
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (7 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 08/21] ppc/xive: describe the XIVE interrupt source flags Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  7:36   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 10/21] ppc/xive: add MMIO handlers for the XIVE TIMA Cédric Le Goater
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

The XIVE interrupt presenter exposes a set of Thread Interrupt
Management Areas, also called rings, one per privilege level (four
in all). These areas are used for priority management and interrupt
acknowledgment, among other things.

We extend the ICPState object with a cache of the XIVE register
data. This makes the integration with the sPAPR machine much easier,
and we need a common framework to switch from one interrupt
controller model to the other: XICS <-> XIVE.
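The sketch below shows the register cache layout. It uses a hypothetical struct presenter reduced to the two new fields: four rings of 16 bytes, with tima_os pointing at QW1 (offset 0x10). Because the cache is a plain byte array, a reset can clear it with memset() and tima_os stays valid, which is what icp_reset() relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TM_RING_COUNT 4      /* QW0 user, QW1 OS, QW2 HV pool, QW3 HV phys */
#define TM_RING_SIZE  0x10   /* one quad word per ring */
#define TM_QW1_OS     0x10
#define TM_CPPR       0x1

/* Hypothetical cut-down presenter holding only the XIVE register cache */
struct presenter {
    uint8_t tima[TM_RING_COUNT * TM_RING_SIZE];
    uint8_t *tima_os;       /* shortcut to the OS ring, QW1 */
};

static void presenter_init(struct presenter *p)
{
    memset(p->tima, 0, sizeof(p->tima));
    p->tima_os = &p->tima[TM_QW1_OS];
}

/* Byte 'reg' of ring 'ring' in the cache */
static uint8_t *presenter_reg(struct presenter *p, unsigned ring, unsigned reg)
{
    assert(ring < TM_RING_COUNT && reg < TM_RING_SIZE);
    return &p->tima[ring * TM_RING_SIZE + reg];
}
```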

The next patch will introduce the MMIO handlers to interact with the
TIMA. Only the OS ring is handled, which is what the sPAPR machine
requires.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xics.c        | 4 ++++
 include/hw/ppc/xics.h | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index a84ba51ad8ff..927d4fec966a 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -274,6 +274,7 @@ static const VMStateDescription vmstate_icp_server = {
         VMSTATE_UINT32(xirr, ICPState),
         VMSTATE_UINT8(pending_priority, ICPState),
         VMSTATE_UINT8(mfrr, ICPState),
+        VMSTATE_UINT8_ARRAY(tima, ICPState, 0x40),
         VMSTATE_END_OF_LIST()
     },
 };
@@ -293,6 +294,7 @@ static void icp_reset(void *dev)
     if (icpc->reset) {
         icpc->reset(icp);
     }
+    memset(icp->tima, 0, sizeof(icp->tima));
 }
 
 static void icp_realize(DeviceState *dev, Error **errp)
@@ -343,6 +345,8 @@ static void icp_realize(DeviceState *dev, Error **errp)
         icpc->realize(icp, errp);
     }
 
+    icp->tima_os = &icp->tima[0x10];
+
     qemu_register_reset(icp_reset, dev);
     vmstate_register(NULL, icp->cs->cpu_index, &vmstate_icp_server, icp);
 }
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 28d248abad61..c835997303c4 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -83,6 +83,12 @@ struct ICPState {
     qemu_irq output;
 
     XICSFabric *xics;
+
+    /* XIVE section */
+#define XIVE_TM_RING_COUNT 4
+
+    uint8_t tima[XIVE_TM_RING_COUNT * 0x10];
+    uint8_t *tima_os;
 };
 
 #define ICP_PROP_XICS "xics"
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 10/21] ppc/xive: add MMIO handlers for the XIVE TIMA
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (8 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 09/21] ppc/xive: extend the interrupt presenter model for XIVE Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 11/21] ppc/xive: push the EQ data in OS event queue Cédric Le Goater
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

The Thread Interrupt Management Area for the OS is mostly used to
acknowledge interrupts and set the CPPR of the CPU.

The TIMA is mapped at the same address for each CPU. 'current_cpu'
is used to retrieve the interrupt presenter object of the accessing
CPU, which holds the cached register data.
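The dispatch done by the handlers can be sketched as follows. Offsets from 0x800 up are the "special" operations, and the OS ring is QW1. This sketch tests the 0x10-0x1f range explicitly, because a single-bit test on TM_QW1_OS also matches QW3 (0x30-0x3f) and any offset below 0x800 with bit 4 set. XIVE_PRIORITY_MAX = 7 (eight priorities) is an assumption here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TM_QW1_OS          0x010
#define TM_QW2_HV_POOL     0x020
#define TM_SPC_ACK_EBB     0x800   /* first "special" offset */
#define XIVE_PRIORITY_MAX  7       /* assumption: priorities 0..7 */

enum tm_access { TM_SPECIAL, TM_OS_RING, TM_OTHER };

/* Classify an offset in the TIMA page; the OS ring is the range 0x10-0x1f */
static enum tm_access tm_classify(uint64_t offset)
{
    if (offset >= TM_SPC_ACK_EBB) {
        return TM_SPECIAL;
    }
    if (offset >= TM_QW1_OS && offset < TM_QW2_HV_POOL) {
        return TM_OS_RING;
    }
    return TM_OTHER;
}

/* Out-of-range priorities become 0xff, as in spapr_xive_icp_set_cppr() */
static uint8_t tm_sanitize_cppr(uint8_t cppr)
{
    return cppr > XIVE_PRIORITY_MAX ? 0xff : cppr;
}
```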

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 161 ++++++++++++++++++++++++++++++++++++++++++++
 hw/intc/xive-internal.h     |  84 +++++++++++++++++++++++
 include/hw/ppc/spapr_xive.h |   5 ++
 3 files changed, 250 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index a1ce993d2afa..557a7e2535b5 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -27,6 +27,154 @@
 
 #include "xive-internal.h"
 
+
+static uint64_t spapr_xive_icp_accept(ICPState *icp)
+{
+    return 0;
+}
+
+static void spapr_xive_icp_set_cppr(ICPState *icp, uint8_t cppr)
+{
+    if (cppr > XIVE_PRIORITY_MAX) {
+        cppr = 0xff;
+    }
+
+    icp->tima_os[TM_CPPR] = cppr;
+}
+
+/*
+ * Thread Interrupt Management Area MMIO
+ */
+static uint64_t spapr_xive_tm_read_special(ICPState *icp, hwaddr offset,
+                                     unsigned size)
+{
+    uint64_t ret = -1;
+
+    if (offset == TM_SPC_ACK_OS_REG && size == 2) {
+        ret = spapr_xive_icp_accept(icp);
+    } else {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
+                      HWADDR_PRIx" size %d\n", offset, size);
+    }
+
+    return ret;
+}
+
+static uint64_t spapr_xive_tm_read(void *opaque, hwaddr offset, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    ICPState *icp = ICP(cpu->intc);
+    uint64_t ret = -1;
+    int i;
+
+    if (offset >= TM_SPC_ACK_EBB) {
+        return spapr_xive_tm_read_special(icp, offset, size);
+    }
+
+    if (offset >= TM_QW1_OS && offset < TM_QW2_HV_POOL) {
+        switch (size) {
+        case 1:
+        case 2:
+        case 4:
+        case 8:
+            if (QEMU_IS_ALIGNED(offset, size)) {
+                ret = 0;
+                for (i = 0; i < size; i++) {
+                    ret |= (uint64_t) icp->tima[offset + i] << (8 * i);
+                }
+            } else {
+                qemu_log_mask(LOG_GUEST_ERROR,
+                              "XIVE: invalid TIMA read alignment @%"
+                              HWADDR_PRIx" size %d\n", offset, size);
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: does not handle non-OS TIMA ring @%"
+                      HWADDR_PRIx"\n", offset);
+    }
+
+    return ret;
+}
+
+static bool spapr_xive_tm_is_readonly(uint8_t index)
+{
+    /* Let's be optimistic and prepare ground for HV mode support */
+    switch (index) {
+    case TM_QW1_OS + TM_CPPR:
+        return false;
+    default:
+        return true;
+    }
+}
+
+static void spapr_xive_tm_write_special(ICPState *icp, hwaddr offset,
+                                  uint64_t value, unsigned size)
+{
+    /* TODO: support TM_SPC_SET_OS_PENDING */
+
+    /* TODO: support TM_SPC_ACK_OS_EL */
+}
+
+static void spapr_xive_tm_write(void *opaque, hwaddr offset,
+                           uint64_t value, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    ICPState *icp = ICP(cpu->intc);
+    int i;
+
+    if (offset >= TM_SPC_ACK_EBB) {
+        spapr_xive_tm_write_special(icp, offset, value, size);
+        return;
+    }
+
+    if (offset >= TM_QW1_OS && offset < TM_QW2_HV_POOL) {
+        switch (size) {
+        case 1:
+            if (offset == TM_QW1_OS + TM_CPPR) {
+                spapr_xive_icp_set_cppr(icp, value & 0xff);
+            }
+            break;
+        case 4:
+        case 8:
+            if (QEMU_IS_ALIGNED(offset, size)) {
+                for (i = 0; i < size; i++) {
+                    if (!spapr_xive_tm_is_readonly(offset + i)) {
+                        icp->tima[offset + i] = (value >> (8 * i)) & 0xff;
+                    }
+                }
+            } else {
+                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
+                              HWADDR_PRIx" size %d\n", offset, size);
+            }
+            break;
+        default:
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
+                          HWADDR_PRIx" size %d\n", offset, size);
+        }
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: does not handle non-OS TIMA ring @%"
+                      HWADDR_PRIx"\n", offset);
+    }
+}
+
+
+static const MemoryRegionOps spapr_xive_tm_ops = {
+    .read = spapr_xive_tm_read,
+    .write = spapr_xive_tm_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+};
+
 static void spapr_xive_irq(sPAPRXive *xive, int srcno)
 {
 
@@ -293,6 +441,11 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
 #define VC_BAR_SIZE      0x08000000000ull
 #define ESB_SHIFT        16 /* One 64k page. OPAL has two */
 
+/* Thread Interrupt Management Area MMIO */
+#define TM_BAR_DEFAULT   0x30203180000ull
+#define TM_SHIFT         16
+#define TM_BAR_SIZE      (XIVE_TM_RING_COUNT * (1 << TM_SHIFT))
+
 static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
                                             unsigned size)
 {
@@ -403,6 +556,14 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
                           (1ull << xive->esb_shift) * xive->nr_irqs);
     memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
 
+    /* TM BAR. Same address for each chip */
+    xive->tm_base = (P9_MMIO_BASE | TM_BAR_DEFAULT);
+    xive->tm_shift = TM_SHIFT;
+
+    memory_region_init_io(&xive->tm_iomem, OBJECT(xive), &spapr_xive_tm_ops,
+                          xive, "xive.tm", TM_BAR_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_iomem);
+
     qemu_register_reset(spapr_xive_reset, dev);
 }
 
diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
index 95184bad5c1d..c6678ec7d161 100644
--- a/hw/intc/xive-internal.h
+++ b/hw/intc/xive-internal.h
@@ -24,6 +24,90 @@
 #define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
                                  PPC_BIT32(bs))
 
+/*
+ * Thread Management (aka "TM") registers
+ */
+
+/* TM register offsets */
+#define TM_QW0_USER             0x000 /* All rings */
+#define TM_QW1_OS               0x010 /* Ring 0..2 */
+#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
+#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
+
+/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
+#define TM_NSR                  0x0  /*  +   +   -   +  */
+#define TM_CPPR                 0x1  /*  -   +   -   +  */
+#define TM_IPB                  0x2  /*  -   +   +   +  */
+#define TM_LSMFB                0x3  /*  -   +   +   +  */
+#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
+#define TM_INC                  0x5  /*  -   +   -   +  */
+#define TM_AGE                  0x6  /*  -   +   -   +  */
+#define TM_PIPR                 0x7  /*  -   +   -   +  */
+
+#define TM_WORD0                0x0
+#define TM_WORD1                0x4
+
+/*
+ * QW word 2 contains the valid bit at the top and other fields
+ * depending on the QW.
+ */
+#define TM_WORD2                0x8
+#define   TM_QW0W2_VU           PPC_BIT32(0)
+#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
+#define   TM_QW1W2_VO           PPC_BIT32(0)
+#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
+#define   TM_QW2W2_VP           PPC_BIT32(0)
+#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
+#define   TM_QW3W2_VT           PPC_BIT32(0)
+#define   TM_QW3W2_LP           PPC_BIT32(6)
+#define   TM_QW3W2_LE           PPC_BIT32(7)
+#define   TM_QW3W2_T            PPC_BIT32(31)
+
+/*
+ * In addition to normal loads to "peek" and writes (only when invalid)
+ * using 4 and 8 bytes accesses, the above registers support these
+ * "special" byte operations:
+ *
+ *   - Byte load from QW0[NSR] - User level NSR (EBB)
+ *   - Byte store to QW0[NSR] - User level NSR (EBB)
+ *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
+ *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
+ *                                    otherwise VT||0000000
+ *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
+ *
+ * Then we have all these "special" CI ops at these offset that trigger
+ * all sorts of side effects:
+ */
+#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
+#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
+#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
+#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
+                                         * context */
+#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
+#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
+                                         * context to reg */
+#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
+                                         * context to reg*/
+#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
+#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
+                                         * line */
+#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
+#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
+                                         * line */
+#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
+/* XXX more... */
+
+/* NSR fields for the various QW ack types */
+#define TM_QW0_NSR_EB           PPC_BIT8(0)
+#define TM_QW1_NSR_EO           PPC_BIT8(0)
+#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
+#define  TM_QW3_NSR_HE_NONE     0
+#define  TM_QW3_NSR_HE_POOL     1
+#define  TM_QW3_NSR_HE_PHYS     2
+#define  TM_QW3_NSR_HE_LSI      3
+#define TM_QW3_NSR_I            PPC_BIT8(2)
+#define TM_QW3_NSR_GRP_LVL      PPC_BITMASK8(3, 7)
+
 /* IVE/EAS
  *
  * One per interrupt source. Targets that interrupt to a given EQ
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index b46e59319236..3af01a0a4b22 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -59,6 +59,11 @@ struct sPAPRXive {
     hwaddr       esb_base;
     MemoryRegion esb_mr;
     MemoryRegion esb_iomem;
+
+    /* TIMA memory region */
+    uint32_t     tm_shift;
+    hwaddr       tm_base;
+    MemoryRegion tm_iomem;
 };
 
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 11/21] ppc/xive: push the EQ data in OS event queue
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (9 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 10/21] ppc/xive: add MMIO handlers for the XIVE TIMA Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  7:45   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 12/21] ppc/xive: notify the CPU when interrupt priority is more privileged Cédric Le Goater
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

If a triggered event is let through, the Event Queue data defined in
the associated IVE is pushed into the in-memory event queue. The latter
is a circular buffer provided by the OS using the H_INT_SET_QUEUE_CONFIG
hcall, one per (target, priority) pair. It is composed of 4-byte Event
Queue entries: the first (most significant) bit is a 'generation' bit
and the following 31 bits hold the EQ Data field.

The EQ Data field provides a way to set an invariant logical event
source number for an IRQ. It is set with the H_INT_SET_SOURCE_CONFIG
hcall.
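The producer/consumer protocol described above can be sketched in
plain C. This is a simplified, hypothetical model (EqSketch,
eq_sketch_push() and eq_sketch_pop() are illustrative names, not QEMU
or Linux APIs): entries are kept host-endian, whereas the patch stores
them big-endian with cpu_to_be32() and dma_memory_write(). It assumes
the buffer starts zeroed and that the producer starts with generation
1, so a consumer expecting generation 1 sees nothing until the first
push; the generation flips each time the index wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one event queue (not QEMU's XiveEQ). */
typedef struct {
    uint32_t *entries;  /* guest-provided ring, zeroed initially */
    uint32_t nentries;  /* number of 4-byte entries */
    uint32_t index;     /* next slot the producer writes */
    uint32_t gen;       /* producer generation bit, starts at 1 */
} EqSketch;

/* Producer: generation bit in bit 31, EQ data in bits 30..0, then
 * advance the index and flip the generation on wrap-around. */
static void eq_sketch_push(EqSketch *eq, uint32_t data)
{
    assert(eq->nentries != 0);
    eq->entries[eq->index] = (eq->gen << 31) | (data & 0x7fffffff);
    eq->index = (eq->index + 1) % eq->nentries;
    if (eq->index == 0) {
        eq->gen ^= 1;
    }
}

/* Consumer: an entry is new only if its generation bit matches the
 * generation expected for the current pass over the ring. */
static bool eq_sketch_pop(const uint32_t *entries, uint32_t nentries,
                          uint32_t *cidx, uint32_t *cgen, uint32_t *data)
{
    uint32_t e = entries[*cidx];

    if ((e >> 31) != *cgen) {
        return false;               /* nothing new yet */
    }
    *data = e & 0x7fffffff;
    *cidx = (*cidx + 1) % nentries;
    if (*cidx == 0) {
        *cgen ^= 1;                 /* next pass expects the other gen */
    }
    return true;
}
```

The generation bit lets the consumer detect new entries without a
shared producer index, which is why the patch only updates
EQ_W1_PAGE_OFF and EQ_W1_GENERATION in the EQ after each write.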

Notification of the CPU will be done in the following patch.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 557a7e2535b5..4bc61cfda67a 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -175,9 +175,76 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
     },
 };
 
+static void spapr_xive_eq_push(XiveEQ *eq, uint32_t data)
+{
+    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
+    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
+    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
+    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
+
+    uint64_t qaddr = qaddr_base + (qindex << 2);
+    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
+    uint32_t qentries = 1 << (qsize + 10);
+
+    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
+                      HWADDR_PRIx "\n", __func__, qaddr);
+        return;
+    }
+
+    qindex = (qindex + 1) % qentries;
+    if (qindex == 0) {
+        qgen ^= 1;
+        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
+    }
+    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
+}
+
 static void spapr_xive_irq(sPAPRXive *xive, int srcno)
 {
+    XiveIVE *ive;
+    XiveEQ *eq;
+    uint32_t eq_idx;
+    uint32_t priority;
+
+    ive = spapr_xive_get_ive(xive, srcno);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
+        return;
+    }
+
+    if (ive->w & IVE_MASKED) {
+        return;
+    }
+
+    /* Find our XiveEQ */
+    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
+    eq = spapr_xive_get_eq(xive, eq_idx);
+    if (!eq) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No EQ for LISN %d\n", srcno);
+        return;
+    }
+
+    if (eq->w0 & EQ_W0_ENQUEUE) {
+        spapr_xive_eq_push(eq, GETFIELD(IVE_EQ_DATA, ive->w));
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: !ENQUEUE not implemented\n");
+    }
+
+    if (!(eq->w0 & EQ_W0_UCOND_NOTIFY)) {
+        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
+    }
+
+    if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
+        priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
 
+        /* The EQ is masked. Can this happen? */
+        if (priority == 0xff) {
+            return;
+        }
+    } else {
+        qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
+    }
 }
 
 /*
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 12/21] ppc/xive: notify the CPU when interrupt priority is more privileged
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (10 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 11/21] ppc/xive: push the EQ data in OS event queue Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  7:50   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 13/21] ppc/xive: handle interrupt acknowledgment by the O/S Cédric Le Goater
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

The Pending Interrupt Priority Register (PIPR) contains the priority
of the most favored pending notification. It is calculated from the
Interrupt Pending Buffer (IPB) which indicates a pending interrupt at
the priority corresponding to the bit number.

If the PIPR is more favored (1) than the Current Processor Priority
Register (CPPR), the CPU interrupt line can be raised and the EO bit
of the Notification Source Register is updated to notify the presence
of an exception for the O/S. The check needs to be done whenever the
PIPR or the CPPR is changed.

(1) numerically less than
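The priority/IPB/PIPR relationship and the notification check can be
illustrated with a small standalone sketch. The helper names below are
hypothetical; PRIO_MAX = 7 is an assumption mirroring
XIVE_PRIORITY_MAX, and the PIPR is computed with a portable loop where
the patch uses clz32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_MAX 7   /* assumed to match XIVE_PRIORITY_MAX */

/* Priority p (0 = most favored) sets IPB bit (7 - p), so priority 0
 * is the most significant bit of the byte. */
static uint8_t prio_to_ipb(uint8_t prio)
{
    return prio > PRIO_MAX ? 0 : (uint8_t)(1u << (7 - prio));
}

/* PIPR = position of the first set bit counted from the MSB, i.e.
 * the most favored pending priority; 0xff when nothing is pending. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    for (uint8_t p = 0; p <= PRIO_MAX; p++) {
        if (ipb & (0x80 >> p)) {
            return p;
        }
    }
    return 0xff;
}

/* The CPU is notified only when the pending priority is more
 * favored (numerically lower) than the current CPPR. */
static bool should_notify(uint8_t pipr, uint8_t cppr)
{
    return pipr < cppr;
}
```

For example, with priorities 5 and 3 pending the IPB is 0x14, the PIPR
is 3, and a CPU running at CPPR 5 is notified while one at CPPR 3 is
not.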

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 4bc61cfda67a..e5d4b723b7e0 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -28,11 +28,39 @@
 #include "xive-internal.h"
 
 
+/* Convert a priority number to an Interrupt Pending Buffer (IPB)
+ * register, which indicates a pending interrupt at the priority
+ * corresponding to the bit number
+ */
+static uint8_t priority_to_ipb(uint8_t priority)
+{
+    return priority > XIVE_PRIORITY_MAX ? 0 : 1 << (7 - priority);
+}
+
+/* Convert an Interrupt Pending Buffer (IPB) register to a Pending
+ * Interrupt Priority Register (PIPR), which contains the priority of
+ * the most favored pending notification.
+ *
+ * TODO: PIPR can never be 0xFF. Needs a fix.
+ */
+static uint8_t ipb_to_pipr(uint8_t ipb)
+{
+    return ipb ? clz32((uint32_t)ipb << 24) : 0xff;
+}
+
 static uint64_t spapr_xive_icp_accept(ICPState *icp)
 {
     return 0;
 }
 
+static void spapr_xive_icp_notify(ICPState *icp)
+{
+    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
+        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
+        qemu_irq_raise(ICP(icp)->output);
+    }
+}
+
 static void spapr_xive_icp_set_cppr(ICPState *icp, uint8_t cppr)
 {
     if (cppr > XIVE_PRIORITY_MAX) {
@@ -40,6 +68,10 @@ static void spapr_xive_icp_set_cppr(ICPState *icp, uint8_t cppr)
     }
 
     icp->tima_os[TM_CPPR] = cppr;
+
+    /* CPPR has changed, inform the ICP which might raise an
+     * exception */
+    spapr_xive_icp_notify(icp);
 }
 
 /*
@@ -206,6 +238,8 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
     XiveEQ *eq;
     uint32_t eq_idx;
     uint32_t priority;
+    uint32_t target;
+    ICPState *icp;
 
     ive = spapr_xive_get_ive(xive, srcno);
     if (!ive || !(ive->w & IVE_VALID)) {
@@ -235,6 +269,13 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
         qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
     }
 
+    target = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
+    icp = xics_icp_get(xive->ics->xics, target);
+    if (!icp) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for target %d\n", target);
+        return;
+    }
+
     if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
         priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
 
@@ -242,9 +283,18 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
         if (priority == 0xff) {
             return;
         }
+
+        /* Update the IPB (Interrupt Pending Buffer) with the priority
+         * of the new notification and inform the ICP, which will
+         * decide whether to raise the exception depending on the CPPR.
+         */
+        icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
+        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
     } else {
         qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
     }
+
+    spapr_xive_icp_notify(icp);
 }
 
 /*
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 13/21] ppc/xive: handle interrupt acknowledgment by the O/S
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (11 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 12/21] ppc/xive: notify the CPU when interrupt priority is more privileged Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  7:53   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 14/21] ppc/xive: add support for the SET_OS_PENDING command Cédric Le Goater
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

When an O/S exception is raised, the O/S acknowledges the interrupt
with a special read in the TIMA. If the EO bit of the Notification
Source Register (NSR) is set (as it should be), the Current Processor
Priority Register (CPPR) takes the value of the Pending Interrupt
Priority Register (PIPR), which contains the priority of the most
favored pending notification. The bit corresponding to the priority
of the pending interrupt is then cleared in the Interrupt Pending
Buffer (IPB), and so is the EO bit of the NSR.
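The acknowledge sequence can be modelled on a plain register struct.
This is a hypothetical sketch, not the QEMU code: OsRegsSketch,
pipr_of() and os_ack() are illustrative names, and NSR_EO = 0x80
assumes PPC_BIT8(0) denotes the most significant bit of the NSR byte.
As in spapr_xive_icp_accept(), the returned value packs the NSR
sampled before the ack in bits 15..8 and the resulting CPPR in bits
7..0.

```c
#include <assert.h>
#include <stdint.h>

#define NSR_EO 0x80   /* assumed value of TM_QW1_NSR_EO */

/* Hypothetical model of the OS ring registers used by the ack. */
typedef struct {
    uint8_t nsr, cppr, ipb, pipr;
} OsRegsSketch;

/* Most favored pending priority, 0xff when the IPB is empty. */
static uint8_t pipr_of(uint8_t ipb)
{
    for (uint8_t p = 0; p < 8; p++) {
        if (ipb & (0x80 >> p)) {
            return p;
        }
    }
    return 0xff;
}

static uint16_t os_ack(OsRegsSketch *r)
{
    uint8_t nsr = r->nsr;           /* value reported to the O/S */

    if (r->nsr & NSR_EO) {
        r->cppr = r->pipr;          /* take the pending priority */
        if (r->cppr < 8) {
            r->ipb &= (uint8_t)~(0x80u >> r->cppr);   /* clear its bit */
        }
        r->pipr = pipr_of(r->ipb);  /* next most favored, if any */
        r->nsr &= (uint8_t)~NSR_EO; /* exception consumed */
    }
    return (uint16_t)((nsr << 8) | r->cppr);
}
```

With priorities 3 and 5 pending (IPB 0x14, PIPR 3) and EO set, the ack
returns 0x8003, leaves CPPR at 3, and leaves priority 5 pending.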

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index e5d4b723b7e0..ad3ff91b13ea 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -50,7 +50,24 @@ static uint8_t ipb_to_pipr(uint8_t ibp)
 
 static uint64_t spapr_xive_icp_accept(ICPState *icp)
 {
-    return 0;
+    uint8_t nsr = icp->tima_os[TM_NSR];
+
+    qemu_irq_lower(icp->output);
+
+    if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
+        uint8_t cppr = icp->tima_os[TM_PIPR];
+
+        icp->tima_os[TM_CPPR] = cppr;
+
+        /* Reset the pending buffer bit */
+        icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
+        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
+
+        /* Drop Exception bit for OS */
+        icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
+    }
+
+    return (nsr << 8) | icp->tima_os[TM_CPPR];
 }
 
 static void spapr_xive_icp_notify(ICPState *icp)
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 14/21] ppc/xive: add support for the SET_OS_PENDING command
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (12 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 13/21] ppc/xive: handle interrupt acknowledgment by the O/S Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  7:55   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 15/21] spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument Cédric Le Goater
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

Adjusting the Interrupt Pending Buffer for the O/S would allow a CPU
to process event queues of other priorities during one physical
interrupt cycle. This is not currently used by the XIVE support for
sPAPR in Linux, but it is used by the hypervisor.
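The effect of a SET_OS_PENDING store can be sketched as follows. This
is a hypothetical model (OsRingSketch and set_os_pending() are
illustrative names); the "irq_line" flag stands in for the
qemu_irq_raise() call, and the NSR EO update performed by
spapr_xive_icp_notify() in the patch is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the OS ring state touched by the store. */
typedef struct {
    uint8_t cppr, ipb, pipr;
    bool irq_line;              /* stands in for the CPU output line */
} OsRingSketch;

/* Most favored pending priority, 0xff when the IPB is empty. */
static uint8_t sketch_pipr(uint8_t ipb)
{
    for (uint8_t p = 0; p < 8; p++) {
        if (ipb & (0x80 >> p)) {
            return p;
        }
    }
    return 0xff;
}

/* Store8 of priority 'prio': mark it pending in the IPB, recompute
 * the PIPR, then re-run the PIPR < CPPR notification check. */
static void set_os_pending(OsRingSketch *r, uint8_t prio)
{
    if (prio <= 7) {
        r->ipb |= (uint8_t)(0x80 >> prio);
    }
    r->pipr = sketch_pipr(r->ipb);
    if (r->pipr < r->cppr) {
        r->irq_line = true;
    }
}
```

Setting priority 6 pending while the CPU runs at CPPR 4 only records
it in the IPB; setting priority 2 afterwards makes it the PIPR and
raises the line.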

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index ad3ff91b13ea..ad3f03e37401 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -162,7 +162,14 @@ static bool spapr_xive_tm_is_readonly(uint8_t index)
 static void spapr_xive_tm_write_special(ICPState *icp, hwaddr offset,
                                   uint64_t value, unsigned size)
 {
-    /* TODO: support TM_SPC_SET_OS_PENDING */
+    if (offset == TM_SPC_SET_OS_PENDING && size == 1) {
+        icp->tima_os[TM_IPB] |= priority_to_ipb(value & 0xff);
+        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
+        spapr_xive_icp_notify(icp);
+    } else {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
+                      HWADDR_PRIx " size %u\n", offset, size);
+    }
 
     /* TODO: support TM_SPC_ACK_OS_EL */
 }
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 15/21] spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (13 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 14/21] ppc/xive: add support for the SET_OS_PENDING command Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  7:56   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 16/21] spapr: add a XIVE object to the sPAPR machine Cédric Le Goater
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

This adds some flexibility in the definition of the number of
available IRQs used in a sPAPR machine.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c              | 2 +-
 hw/ppc/spapr_pci.c          | 4 ++--
 include/hw/pci-host/spapr.h | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3e3ff1fbc988..5d69df928434 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1093,7 +1093,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     }
 
     QLIST_FOREACH(phb, &spapr->phbs, list) {
-        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
+        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, XICS_IRQS_SPAPR);
         if (ret < 0) {
             error_report("couldn't setup PCI devices in fdt");
             exit(1);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index d84abf1070a0..05b0a067458e 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -2073,7 +2073,7 @@ static void spapr_phb_pci_enumerate(sPAPRPHBState *phb)
 
 int spapr_populate_pci_dt(sPAPRPHBState *phb,
                           uint32_t xics_phandle,
-                          void *fdt)
+                          void *fdt, int nr_irqs)
 {
     int bus_off, i, j, ret;
     char nodename[FDT_NAME_MAX];
@@ -2142,7 +2142,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
     _FDT(fdt_setprop(fdt, bus_off, "ranges", &ranges, sizeof_ranges));
     _FDT(fdt_setprop(fdt, bus_off, "reg", &bus_reg, sizeof(bus_reg)));
     _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pci-config-space-type", 0x1));
-    _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", XICS_IRQS_SPAPR));
+    _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", nr_irqs));
 
     /* Dynamic DMA window */
     if (phb->ddw_enabled) {
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 38470b2f0e5c..40146f72c103 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -115,7 +115,7 @@ PCIHostState *spapr_create_phb(sPAPRMachineState *spapr, int index);
 
 int spapr_populate_pci_dt(sPAPRPHBState *phb,
                           uint32_t xics_phandle,
-                          void *fdt);
+                          void *fdt, int nr_irqs);
 
 void spapr_pci_rtas_init(void);
 
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 16/21] spapr: add a XIVE object to the sPAPR machine
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (14 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 15/21] spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  8:38   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 17/21] ppc/xive: add hcalls support Cédric Le Goater
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

If the machine supports XIVE (POWER9 CPU), create a XIVE object. The
CAS negotiation process will decide which model (legacy or XIVE) will
be used for the interrupt controller depending on the guest
capabilities.

Also extend the number of provisioned IRQs with the number of CPUs:
this is required for XIVE, which allocates one IRQ number for each IPI.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c         | 63 ++++++++++++++++++++++++++++++++++++++++++++++++--
 include/hw/ppc/spapr.h |  2 ++
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5d69df928434..b6577dbecdea 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -44,6 +44,7 @@
 #include "mmu-hash64.h"
 #include "mmu-book3s-v3.h"
 #include "qom/cpu.h"
+#include "target/ppc/cpu-models.h"
 
 #include "hw/boards.h"
 #include "hw/ppc/ppc.h"
@@ -54,6 +55,7 @@
 #include "hw/ppc/spapr_vio.h"
 #include "hw/pci-host/spapr.h"
 #include "hw/ppc/xics.h"
+#include "hw/ppc/spapr_xive.h"
 #include "hw/pci/msi.h"
 
 #include "hw/pci/pci.h"
@@ -202,6 +204,35 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
     }
 }
 
+static sPAPRXive *spapr_spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
+                               int nr_servers, Error **errp)
+{
+    Error *local_err = NULL;
+    Object *obj;
+
+    obj = object_new(TYPE_SPAPR_XIVE);
+    object_property_add_child(OBJECT(spapr), "xive", obj, &error_abort);
+    object_property_add_const_link(obj, "ics", OBJECT(spapr->ics),
+                                   &error_abort);
+    object_property_set_int(obj, nr_irqs, "nr-irqs", &local_err);
+    if (local_err) {
+        goto error;
+    }
+    object_property_set_int(obj, nr_servers, "nr-targets", &local_err);
+    if (local_err) {
+        goto error;
+    }
+    object_property_set_bool(obj, true, "realized", &local_err);
+    if (local_err) {
+        goto error;
+    }
+
+    return SPAPR_XIVE(obj);
+error:
+    error_propagate(errp, local_err);
+    return NULL;
+}
+
 static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
                                   int smt_threads)
 {
@@ -1093,7 +1124,8 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     }
 
     QLIST_FOREACH(phb, &spapr->phbs, list) {
-        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, XICS_IRQS_SPAPR);
+        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt,
+                                    XICS_IRQS_SPAPR + xics_max_server_number());
         if (ret < 0) {
             error_report("couldn't setup PCI devices in fdt");
             exit(1);
@@ -2140,6 +2172,16 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
     g_free(type);
 }
 
+/*
+ * Only POWER9 Processor chips support the XIVE interrupt controller
+ */
+static bool ppc_support_xive(MachineState *machine)
+{
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(first_cpu);
+
+    return pcc->pvr_match(pcc, CPU_POWERPC_POWER9_BASE);
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void ppc_spapr_init(MachineState *machine)
 {
@@ -2237,7 +2279,8 @@ static void ppc_spapr_init(MachineState *machine)
     load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
 
     /* Set up Interrupt Controller before we create the VCPUs */
-    xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal);
+    xics_system_init(machine, XICS_IRQS_SPAPR + xics_max_server_number(),
+                     &error_fatal);
 
     /* Set up containers for ibm,client-set-architecture negotiated options */
     spapr->ov5 = spapr_ovec_new();
@@ -2274,6 +2317,22 @@ static void ppc_spapr_init(MachineState *machine)
 
     spapr_init_cpus(spapr);
 
+    /* Set up XIVE. CAS will choose whether the guest runs in XICS
+     * (legacy mode) or XIVE Exploitation mode
+     *
+     * We don't have KVM support yet, so check for irqchip=on
+     */
+    if (ppc_support_xive(machine)) {
+        if (kvm_enabled() && machine_kernel_irqchip_required(machine)) {
+            error_report("kernel_irqchip requested: no XIVE support");
+        } else {
+            spapr->xive = spapr_spapr_xive_create(spapr,
+                               XICS_IRQS_SPAPR + xics_max_server_number(),
+                               xics_max_server_number(),
+                               &error_fatal);
+        }
+    }
+
     if (kvm_enabled()) {
         /* Enable H_LOGICAL_CI_* so SLOF can talk to in-kernel devices */
         kvmppc_enable_logical_ci_hcalls();
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 2a303a705c17..6cd5ab73c5dc 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -14,6 +14,7 @@ struct sPAPRNVRAM;
 typedef struct sPAPREventLogEntry sPAPREventLogEntry;
 typedef struct sPAPREventSource sPAPREventSource;
 typedef struct sPAPRPendingHPT sPAPRPendingHPT;
+typedef struct sPAPRXive sPAPRXive;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -127,6 +128,7 @@ struct sPAPRMachineState {
     MemoryHotplugState hotplug_memory;
 
     const char *icp_type;
+    sPAPRXive  *xive;
 };
 
 #define H_SUCCESS         0
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 17/21] ppc/xive: add hcalls support
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (15 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 16/21] spapr: add a XIVE object to the sPAPR machine Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support Cédric Le Goater
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

A set of hypervisor calls is used to configure the interrupt sources
and the event/notification queues of the guest:

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (PQ bits) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines the "target" and "priority" to which a source is assigned.

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification config associated with the
   queue, only unconditional notification for the moment.  Reset is
   performed with a queue size of 0 and queueing is disabled in that
   case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the partition's interrupt exploitation structures to
   their initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issues a synchronisation on a source to make sure all
   notifications have reached their queue.

Calls that still need to be addressed:

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE

See the code for more documentation on each hcall.
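The hcall flags documented in the code use IBM bit numbering, where
bit 0 is the most significant bit of the 64-bit register and bit 63
the least significant, which is why the patch writes
XIVE_SRC_SET_EISN as (1ull << (63 - 62)). A small sketch of that
convention follows; PPC_BIT_SKETCH and the SRC_* names are
hypothetical, chosen for illustration, and the bit positions are those
listed in the H_INT_GET_SOURCE_INFO and H_INT_SET_SOURCE_CONFIG
comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM numbering: bit 0 is the MSB, bit 63 the LSB. */
#define PPC_BIT_SKETCH(n)  (1ull << (63 - (n)))

/* H_INT_GET_SOURCE_INFO output flags (R4). */
#define SRC_H_INT_ESB   PPC_BIT_SKETCH(60)  /* use H_INT_ESB for ESB */
#define SRC_LSI         PPC_BIT_SKETCH(61)  /* 1 == LSI, 0 == MSI */
#define SRC_TRIGGER     PPC_BIT_SKETCH(62)  /* full page supports trigger */
#define SRC_STORE_EOI   PPC_BIT_SKETCH(63)  /* store EOI supported */

/* H_INT_SET_SOURCE_CONFIG input flags. */
#define SRC_SET_EISN    PPC_BIT_SKETCH(62)
#define SRC_MASK        PPC_BIT_SKETCH(63)

/* Any bit outside the two defined flags is reserved, which the hcall
 * reports as H_PARAMETER. */
static bool set_source_config_flags_ok(uint64_t flags)
{
    return (flags & ~(SRC_SET_EISN | SRC_MASK)) == 0;
}
```

For instance, an LSI source whose full function page supports trigger
reports R4 = SRC_LSI | SRC_TRIGGER = 0x6.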

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/Makefile.objs       |   2 +-
 hw/intc/spapr_xive_hcall.c  | 876 ++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr.c              |   2 +
 include/hw/ppc/spapr.h      |  15 +-
 include/hw/ppc/spapr_xive.h |   4 +
 5 files changed, 897 insertions(+), 2 deletions(-)
 create mode 100644 hw/intc/spapr_xive_hcall.c

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 2dae80bdf611..00a9aea2dd29 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -35,7 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
 obj-$(CONFIG_XICS) += xics.o
 obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
-obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
+obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o spapr_xive_hcall.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
new file mode 100644
index 000000000000..4c77b65683de
--- /dev/null
+++ b/hw/intc/spapr_xive_hcall.c
@@ -0,0 +1,876 @@
+/*
+ * QEMU PowerPC sPAPR XIVE model
+ *
+ * Copyright (c) 2017, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "cpu.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_xive.h"
+#include "hw/ppc/fdt.h"
+#include "monitor/monitor.h"
+
+#include "xive-internal.h"
+
+/*
+ * TODO: check the valid priorities from the ranges listed in the
+ * "ibm,plat-res-int-priorities" property. Be simple for the moment.
+ */
+static bool priority_is_valid(int priority)
+{
+    return priority >= 0 && priority < 8;
+}
+
+/*
+ * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
+ * real address of the MMIO page through which the Event State Buffer
+ * entry associated with the value of the "lisn" parameter is managed.
+ *
+ * Parameters:
+ * Input
+ * - "flags"
+ *       Bits 0-63 reserved
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as returned
+ *       by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output
+ * - R4: "flags"
+ *       Bits 0-59: Reserved
+ *       Bit 60: H_INT_ESB must be used for Event State Buffer
+ *               management
+ *       Bit 61: 1 == LSI  0 == MSI
+ *       Bit 62: the full function page supports trigger
+ *       Bit 63: Store EOI Supported
+ * - R5: Logical Real address of full function Event State Buffer
+ *       management page, -1 if ESB hcall flag is set to 1.
+ * - R6: Logical Real Address of trigger only Event State Buffer
+ *       management page or -1.
+ * - R7: Power of 2 page size for the ESB management pages returned in
+ *       R5 and R6.
+ */
+static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
+                                          sPAPRMachineState *spapr,
+                                          target_ulong opcode,
+                                          target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveIVE *ive;
+    target_ulong flags  = args[0];
+    target_ulong lisn   = args[1];
+    uint64_t mmio_base;
+    ICSIRQState *irq;
+    uint32_t srcno = lisn - spapr->ics->offset;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    ive = spapr_xive_get_ive(spapr->xive, srcno);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    mmio_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * srcno;
+    irq = &spapr->ics->irqs[srcno];
+
+    args[0] = 0;
+    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
+        args[0] |= XIVE_SRC_LSI;
+    }
+    if (xive->flags & XIVE_SRC_TRIGGER) {
+        args[0] |= XIVE_SRC_TRIGGER;
+    }
+
+    if (xive->flags & XIVE_SRC_H_INT_ESB) {
+        args[1] = -1; /* never used in QEMU */
+        args[2] = -1;
+    } else {
+        args[1] = mmio_base;
+        if (xive->flags & XIVE_SRC_TRIGGER) {
+            args[2] = -1; /* No specific trigger page */
+        } else {
+            args[2] = -1; /* TODO: support for specific trigger page */
+        }
+    }
+
+    args[3] = xive->esb_shift;
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
+ * Interrupt Source to a target. The Logical Interrupt Source is
+ * designated with the "lisn" parameter and the target is designated
+ * with the "target" and "priority" parameters.  Upon return from the
+ * hcall(), no additional interrupts will be directed to the old EQ.
+ *
+ * TODO: The old EQ should be investigated for interrupts that
+ * occurred prior to or during the hcall().
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-61: Reserved
+ *      Bit 62: set the "eisn" in the EA
+ *      Bit 63: masks the interrupt source in the hardware interrupt
+ *      control structure. An interrupt masked by this mechanism will
+ *      be dropped, but its source state bits will still be
+ *      set. There is no race-free way of unmasking and restoring the
+ *      source. Thus this should only be used on interrupts that are
+ *      also masked at the source, and only in cases where the
+ *      interrupt is not meant to be used for a long time, for
+ *      example because no valid target exists for it.
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as returned by
+ *      the H_ALLOCATE_VAS_WINDOW hcall
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *      "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *      "ibm,plat-res-int-priorities"
+ * - "eisn" is the guest EISN associated with the "lisn"
+ *
+ * Output:
+ * - None
+ */
+
+#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
+#define XIVE_SRC_MASK     (1ull << (63 - 63))
+
+static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    XiveIVE *ive;
+    uint64_t new_ive;
+    target_ulong flags    = args[0];
+    target_ulong lisn     = args[1];
+    target_ulong target   = args[2];
+    target_ulong priority = args[3];
+    target_ulong eisn     = args[4];
+    uint32_t eq_idx;
+    uint32_t srcno = lisn - spapr->ics->offset;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~(XIVE_SRC_SET_EISN | XIVE_SRC_MASK)) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    ive = spapr_xive_get_ive(spapr->xive, srcno);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    /* priority 0xff is used to reset the IVE */
+    if (priority == 0xff) {
+        new_ive = IVE_VALID | IVE_MASKED;
+        goto out;
+    }
+
+    new_ive = ive->w;
+
+    if (flags & XIVE_SRC_MASK) {
+        new_ive = ive->w | IVE_MASKED;
+    } else {
+        new_ive = ive->w & ~IVE_MASKED;
+    }
+
+    if (!priority_is_valid(priority)) {
+        return H_P4;
+    }
+
+    /* TODO: If the partition thread count is greater than the
+     * hardware thread count, validate the "target" has a
+     * corresponding hardware thread else return H_NOT_AVAILABLE.
+     */
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the EQ corresponding to the
+     * target.
+     */
+    if (!spapr_xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
+        return H_P3;
+    }
+
+    new_ive = SETFIELD(IVE_EQ_BLOCK, new_ive, 0ul);
+    new_ive = SETFIELD(IVE_EQ_INDEX, new_ive, eq_idx);
+
+    if (flags & XIVE_SRC_SET_EISN) {
+        new_ive = SETFIELD(IVE_EQ_DATA, new_ive, eisn);
+    }
+
+out:
+    /* TODO: handle syncs ? */
+
+    /* And update */
+    ive->w = new_ive;
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine which
+ * target/priority pair is assigned to the specified Logical Interrupt
+ * Source.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as
+ *      returned by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output:
+ * - R4: Target to which the specified Logical Interrupt Source is
+ *       assigned
+ * - R5: Priority to which the specified Logical Interrupt Source is
+ *       assigned
+ * - R6: EISN for the specified Logical Interrupt Source (this will be
+ *       equivalent to the LISN if not changed by H_INT_SET_SOURCE_CONFIG)
+ */
+static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong lisn = args[1];
+    XiveIVE *ive;
+    XiveEQ *eq;
+    uint32_t eq_idx;
+    uint32_t srcno = lisn - spapr->ics->offset;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    ive = spapr_xive_get_ive(spapr->xive, srcno);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
+    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
+    if (!eq) {
+        return H_HARDWARE;
+    }
+
+    args[0] = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
+
+    if (ive->w & IVE_MASKED) {
+        args[1] = 0xff;
+    } else {
+        args[1] = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
+    }
+
+    args[2] = GETFIELD(IVE_EQ_DATA, ive->w);
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_INFO hcall() is used to get the logical real
+ * address of the notification management page associated with the
+ * specified target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *       Bits 0-63: Reserved
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: Logical real address of notification page
+ * - R5: Power of 2 page size of the notification page
+ */
+static target_ulong h_int_get_queue_info(PowerPCCPU *cpu,
+                                         sPAPRMachineState *spapr,
+                                         target_ulong opcode,
+                                         target_ulong *args)
+{
+    target_ulong flags    = args[0];
+    target_ulong target   = args[1];
+    target_ulong priority = args[2];
+    uint32_t eq_idx;
+    XiveEQ *eq;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (!priority_is_valid(priority)) {
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the EQ corresponding to the
+     * target.
+     */
+    if (!spapr_xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
+        return H_P2;
+    }
+
+    /* TODO: If the partition thread count is greater than the
+     * hardware thread count, validate the "target" has a
+     * corresponding hardware thread else return H_NOT_AVAILABLE.
+     */
+
+    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
+    if (!eq)  {
+        return H_HARDWARE;
+    }
+
+    args[0] = -1; /* TODO: return ESn page */
+    if (eq->w0 & EQ_W0_ENQUEUE) {
+        args[1] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
+    } else {
+        args[1] = 0;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_QUEUE_CONFIG hcall() is used to set or reset an EQ for
+ * a given "target" and "priority".  It is also used to set the
+ * notification config associated with the EQ.  An EQ size of 0 is
+ * used to reset the EQ config for a given target and priority. If
+ * resetting the EQ config, the END associated with the given "target"
+ * and "priority" will be changed to disable queueing.
+ *
+ * Upon return from the hcall(), no additional interrupts will be
+ * directed to the old EQ (if one was set). The old EQ (if one was
+ * set) should be investigated for interrupts that occurred prior to
+ * or during the hcall().
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-62: Reserved
+ *      Bit 63: Unconditional Notify (n) per the XIVE spec
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ * - "eventQueue": The logical real address of the start of the EQ
+ * - "eventQueueSize": The power of 2 EQ size per "ibm,xive-eq-sizes"
+ *
+ * Output:
+ * - None
+ */
+
+#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))
+
+static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    target_ulong flags    = args[0];
+    target_ulong target   = args[1];
+    target_ulong priority = args[2];
+    target_ulong qpage    = args[3];
+    target_ulong qsize    = args[4];
+    uint32_t eq_idx;
+    XiveEQ *old_eq;
+    XiveEQ eq;
+    uint32_t qdata;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~XIVE_EQ_ALWAYS_NOTIFY) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (!priority_is_valid(priority)) {
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the EQ corresponding to the
+     * target.
+     */
+    if (!spapr_xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
+        return H_P2;
+    }
+
+    /* TODO: If the partition thread count is greater than the
+     * hardware thread count, validate the "target" has a
+     * corresponding hardware thread else return H_NOT_AVAILABLE.
+     */
+
+    old_eq = spapr_xive_get_eq(spapr->xive, eq_idx);
+    if (!old_eq)  {
+        return H_HARDWARE;
+    }
+
+    eq = *old_eq;
+
+    switch (qsize) {
+    case 12:
+    case 16:
+    case 21:
+    case 24:
+        eq.w3 = ((uint64_t)qpage) & 0xffffffff;
+        eq.w2 = (((uint64_t)qpage) >> 32) & 0x0fffffff;
+        eq.w0 |= EQ_W0_ENQUEUE;
+        eq.w0 = SETFIELD(EQ_W0_QSIZE, eq.w0, qsize - 12);
+        break;
+    case 0:
+        /* reset queue and disable queueing */
+        eq.w2 = eq.w3 = 0;
+        eq.w0 &= ~EQ_W0_ENQUEUE;
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid EQ size %"PRIx64"\n",
+                      __func__, qsize);
+        return H_P5;
+    }
+
+    if (qsize) {
+        /*
+         * Let's validate the EQ address with a read of the first EQ
+         * entry. We could also check that the full queue has been
+         * zeroed by the OS.
+         */
+        if (address_space_read(&address_space_memory, qpage,
+                               MEMTXATTRS_UNSPECIFIED,
+                               (uint8_t *) &qdata, sizeof(qdata))) {
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to read EQ data @0x%"
+                          HWADDR_PRIx "\n", __func__, qpage);
+            return H_P4;
+        }
+    }
+
+    /* Ensure the priority and target are correctly set (they will not
+     * be right after allocation)
+     */
+    eq.w6 = SETFIELD(EQ_W6_NVT_BLOCK, 0ul, 0ul) |
+        SETFIELD(EQ_W6_NVT_INDEX, 0ul, target);
+    eq.w7 = SETFIELD(EQ_W7_F0_PRIORITY, 0ul, priority);
+
+    /* TODO: depends on notification page (ESn) from H_INT_GET_QUEUE_INFO */
+    if (flags & XIVE_EQ_ALWAYS_NOTIFY) {
+        eq.w0 |= EQ_W0_UCOND_NOTIFY;
+    }
+
+    /* The generation bit for the EQ starts at 1 and the EQ page
+     * offset counter starts at 0.
+     */
+    eq.w1 = EQ_W1_GENERATION | SETFIELD(EQ_W1_PAGE_OFF, 0ul, 0ul);
+    eq.w0 |= EQ_W0_VALID;
+
+    /* TODO: issue syncs required to ensure all in-flight interrupts
+     * are complete on the old EQ */
+
+    /* Update EQ */
+    *old_eq = eq;
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_CONFIG hcall() is used to get an EQ for a given
+ * target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-62: Reserved
+ *      Bit 63: Debug: Return debug data
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: "flags":
+ *       Bits 0-62: Reserved
+ *       Bit 63: The value of Unconditional Notify (n) per the XIVE spec
+ * - R5: The logical real address of the start of the EQ
+ * - R6: The power of 2 EQ size per "ibm,xive-eq-sizes"
+ * - R7: The value of Event Queue Offset Counter per XIVE spec
+ *       if "Debug" = 1, else 0
+ *
+ */
+
+#define XIVE_EQ_DEBUG     (1ull << (63 - 63))
+
+static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    target_ulong flags    = args[0];
+    target_ulong target   = args[1];
+    target_ulong priority = args[2];
+    uint32_t eq_idx;
+    XiveEQ *eq;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~XIVE_EQ_DEBUG) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (!priority_is_valid(priority)) {
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the EQ corresponding to the
+     * target.
+     */
+    if (!spapr_xive_eq_for_target(spapr->xive, target, priority, &eq_idx)) {
+        return H_P2;
+    }
+
+    /* TODO: If the partition thread count is greater than the
+     * hardware thread count, validate the "target" has a
+     * corresponding hardware thread else return H_NOT_AVAILABLE.
+     */
+
+    eq = spapr_xive_get_eq(spapr->xive, eq_idx);
+    if (!eq)  {
+        return H_HARDWARE;
+    }
+
+    args[0] = 0;
+    if (eq->w0 & EQ_W0_UCOND_NOTIFY) {
+        args[0] |= XIVE_EQ_ALWAYS_NOTIFY;
+    }
+
+    if (eq->w0 & EQ_W0_ENQUEUE) {
+        args[1] =
+            (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
+        args[2] = GETFIELD(EQ_W0_QSIZE, eq->w0) + 12;
+    } else {
+        args[1] = 0;
+        args[2] = 0;
+    }
+
+    /* TODO: do we need any locking on the EQ ? */
+    args[3] = 0; /* R7 must be 0 unless "Debug" is set */
+    if (flags & XIVE_EQ_DEBUG) {
+        /* Load the event queue generation number into the return flags */
+        args[0] |= GETFIELD(EQ_W1_GENERATION, eq->w1);
+        /* Load R7 with the event queue offset counter */
+        args[3] = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_OS_REPORTING_LINE hcall() is used to set the
+ * reporting cache line pair for the calling thread.  The reporting
+ * cache lines will contain the OS interrupt context when the OS
+ * issues a CI store byte to @TIMA+0xC10 to acknowledge the OS
+ * interrupt. The reporting cache lines can be reset by inputting -1
+ * in "reportingLine".  Issuing the CI store byte without reporting
+ * cache lines registered will result in the data not being accessible
+ * to the OS.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "reportingLine": The logical real address of the reporting cache
+ *    line pair
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_set_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* TODO: H_INT_SET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_GET_OS_REPORTING_LINE hcall() is used to get the logical
+ * real address of the reporting cache line pair set for the input
+ * "target".  If no reporting cache line pair has been set, -1 is
+ * returned.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - "reportingLine": The logical real address of the reporting cache
+ *   line pair
+ *
+ * Output:
+ * - R4: The logical real address of the reporting line if set, else -1
+ */
+static target_ulong h_int_get_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* TODO: H_INT_GET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_ESB hcall() is used to issue a load or store to the ESB
+ * page for the input "lisn".  This hcall is only supported for LISNs
+ * that have the ESB hcall flag set to 1 when returned from hcall()
+ * H_INT_GET_SOURCE_INFO.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-62: Reserved
+ *      Bit 63: Store: Store=1, store operation, else load operation
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as
+ *      returned by the H_ALLOCATE_VAS_WINDOW hcall
+ * - "esbOffset" is the offset into the ESB page for the load or store operation
+ * - "storeData" is the data to write for a store operation
+ *
+ * Output:
+ * - R4: The value of the load if load operation, else -1
+ */
+
+#define XIVE_ESB_STORE (1ull << (63 - 63))
+
+static target_ulong h_int_esb(PowerPCCPU *cpu,
+                              sPAPRMachineState *spapr,
+                              target_ulong opcode,
+                              target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveIVE *ive;
+    target_ulong flags   = args[0];
+    target_ulong lisn    = args[1];
+    target_ulong offset  = args[2];
+    target_ulong data    = args[3];
+    uint64_t esb_base;
+    uint32_t srcno = lisn - spapr->ics->offset;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~XIVE_ESB_STORE) {
+        return H_PARAMETER;
+    }
+
+    ive = spapr_xive_get_ive(xive, srcno);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    if (offset >= (1ull << xive->esb_shift)) {
+        return H_P3;
+    }
+
+    srcno = lisn - spapr->ics->offset;
+    esb_base = (uint64_t)xive->esb_base + (1ull << xive->esb_shift) * srcno;
+    esb_base += offset;
+
+    if (dma_memory_rw(&address_space_memory, esb_base, &data, 8,
+                      (flags & XIVE_ESB_STORE))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to rw data @0x%"
+                      HWADDR_PRIx "\n", __func__, esb_base);
+        return H_HARDWARE;
+    }
+    args[0] = (flags & XIVE_ESB_STORE) ? -1 : data;
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SYNC hcall() is used to issue hardware syncs that will
+ * ensure any in-flight events for the input lisn are in the event
+ * queue.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ * - "lisn" is per "interrupts", "interrupt-map", or
+ *      "ibm,xive-lisn-ranges" properties, or as returned by the
+ *      ibm,query-interrupt-source-number RTAS call, or as
+ *      returned by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_sync(PowerPCCPU *cpu,
+                               sPAPRMachineState *spapr,
+                               target_ulong opcode,
+                               target_ulong *args)
+{
+    XiveIVE *ive;
+    target_ulong flags   = args[0];
+    target_ulong lisn    = args[1];
+    uint32_t srcno = lisn - spapr->ics->offset;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    ive = spapr_xive_get_ive(spapr->xive, srcno);
+    if (!ive || !(ive->w & IVE_VALID)) {
+        return H_P2;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* This is not real hardware. Nothing to be done */
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_RESET hcall() is used to reset all of the partition's
+ * interrupt exploitation structures to their initial state.  This
+ * means losing all interrupt state previously set via
+ * H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
+ *
+ * Parameters:
+ * Input:
+ * - "flags"
+ *      Bits 0-63: Reserved
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_reset(PowerPCCPU *cpu,
+                                sPAPRMachineState *spapr,
+                                target_ulong opcode,
+                                target_ulong *args)
+{
+    target_ulong flags   = args[0];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    spapr_xive_reset(spapr->xive);
+    return H_SUCCESS;
+}
+
+void spapr_xive_hcall_init(sPAPRMachineState *spapr)
+{
+    spapr_register_hypercall(H_INT_GET_SOURCE_INFO, h_int_get_source_info);
+    spapr_register_hypercall(H_INT_SET_SOURCE_CONFIG, h_int_set_source_config);
+    spapr_register_hypercall(H_INT_GET_SOURCE_CONFIG, h_int_get_source_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_INFO, h_int_get_queue_info);
+    spapr_register_hypercall(H_INT_SET_QUEUE_CONFIG, h_int_set_queue_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_CONFIG, h_int_get_queue_config);
+    spapr_register_hypercall(H_INT_SET_OS_REPORTING_LINE,
+                             h_int_set_os_reporting_line);
+    spapr_register_hypercall(H_INT_GET_OS_REPORTING_LINE,
+                             h_int_get_os_reporting_line);
+    spapr_register_hypercall(H_INT_ESB, h_int_esb);
+    spapr_register_hypercall(H_INT_SYNC, h_int_sync);
+    spapr_register_hypercall(H_INT_RESET, h_int_reset);
+}
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b6577dbecdea..c2011cb2dc72 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -227,6 +227,8 @@ static sPAPRXive *spapr_spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
         goto error;
     }
 
+    spapr_xive_hcall_init(spapr);
+
     return SPAPR_XIVE(obj);
 error:
     error_propagate(errp, local_err);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6cd5ab73c5dc..b7683fae6415 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -387,7 +387,20 @@ struct sPAPRMachineState {
 #define H_INVALIDATE_PID        0x378
 #define H_REGISTER_PROC_TBL     0x37C
 #define H_SIGNAL_SYS_RESET      0x380
-#define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
+
+#define H_INT_GET_SOURCE_INFO   0x3A8
+#define H_INT_SET_SOURCE_CONFIG 0x3AC
+#define H_INT_GET_SOURCE_CONFIG 0x3B0
+#define H_INT_GET_QUEUE_INFO    0x3B4
+#define H_INT_SET_QUEUE_CONFIG  0x3B8
+#define H_INT_GET_QUEUE_CONFIG  0x3BC
+#define H_INT_SET_OS_REPORTING_LINE 0x3C0
+#define H_INT_GET_OS_REPORTING_LINE 0x3C4
+#define H_INT_ESB               0x3C8
+#define H_INT_SYNC              0x3CC
+#define H_INT_RESET             0x3D0
+
+#define MAX_HCALL_OPCODE        H_INT_RESET
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 3af01a0a4b22..ae5ff89533c0 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -66,4 +66,8 @@ struct sPAPRXive {
     MemoryRegion tm_iomem;
 };
 
+typedef struct sPAPRMachineState sPAPRMachineState;
+
+void spapr_xive_hcall_init(sPAPRMachineState *spapr);
+
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.5
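The opcodes added to spapr.h are 4-byte aligned, and QEMU keeps sPAPR hcall handlers in a table indexed by opcode / 4 and bounded by MAX_HCALL_OPCODE, which is why the patch moves MAX_HCALL_OPCODE to H_INT_RESET. A minimal sketch of that table, assuming heavily simplified types: hcall_fn, register_hcall(), dispatch_hcall() and h_stub() are hypothetical, and registration returns an error code instead of aborting so the behavior is easy to check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the QEMU sPAPR types */
typedef int64_t (*hcall_fn)(uint64_t *args);

#define H_SUCCESS             0
#define H_FUNCTION            (-2)

#define H_INT_GET_SOURCE_INFO 0x3A8
#define H_INT_RESET           0x3D0
#define MAX_HCALL_OPCODE      H_INT_RESET  /* must cover the new opcodes */

/* One slot per 4-byte aligned opcode: the table is indexed by opcode / 4 */
static hcall_fn hcall_table[MAX_HCALL_OPCODE / 4 + 1];

/* Returns 0 on success, -1 for an out-of-range, misaligned or
 * already registered opcode. */
int register_hcall(uint64_t opcode, hcall_fn fn)
{
    if (opcode > MAX_HCALL_OPCODE || (opcode & 0x3) != 0) {
        return -1;
    }
    if (hcall_table[opcode / 4] != NULL) {
        return -1;
    }
    hcall_table[opcode / 4] = fn;
    return 0;
}

/* Unknown or unregistered opcodes fail with H_FUNCTION */
int64_t dispatch_hcall(uint64_t opcode, uint64_t *args)
{
    if (opcode <= MAX_HCALL_OPCODE && (opcode & 0x3) == 0 &&
        hcall_table[opcode / 4] != NULL) {
        return hcall_table[opcode / 4](args);
    }
    return H_FUNCTION;
}

/* Trivial handler used to exercise the table */
int64_t h_stub(uint64_t *args)
{
    (void)args;
    return H_SUCCESS;
}
```

With the old bound (H_SIGNAL_SYS_RESET, 0x380), registering any of the 0x3A8-0x3D0 opcodes in this sketch would be rejected as out of range.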

^ permalink raw reply related	[flat|nested] 90+ messages in thread
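h_int_set_queue_config() splits the guest's queue address across EQ words w3 (low 32 bits) and w2 (next 28 bits) and stores log2(size) - 12 in the QSIZE field; h_int_get_queue_config() reverses this. The sketch below models only those fields with a hypothetical EqModel struct (plain members instead of the real EQ_W0_* bitfields) to show the round trip under that assumption. Note that only the low 60 bits of the address survive the 0x0fffffff mask.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the EQ words used by the queue hcalls. The real
 * XiveEQ packs "enqueue" and "qsize" into w0 (EQ_W0_ENQUEUE, EQ_W0_QSIZE).
 */
typedef struct {
    bool     enqueue;  /* queueing enabled */
    uint32_t qsize;    /* log2(queue size in bytes) - 12 */
    uint32_t w2;       /* address bits 32-59 */
    uint32_t w3;       /* address bits 0-31 */
} EqModel;

/* Same size cases as h_int_set_queue_config(); false maps to H_P5 there */
bool eq_set_queue(EqModel *eq, uint64_t qpage, uint64_t qsize)
{
    switch (qsize) {
    case 12: /* 4K */
    case 16: /* 64K */
    case 21: /* 2M */
    case 24: /* 16M */
        eq->w3 = (uint32_t)(qpage & 0xffffffff);
        eq->w2 = (uint32_t)((qpage >> 32) & 0x0fffffff);
        eq->enqueue = true;
        eq->qsize = (uint32_t)(qsize - 12);
        return true;
    case 0: /* reset the queue and disable queueing */
        eq->w2 = eq->w3 = 0;
        eq->enqueue = false;
        return true;
    default:
        return false;
    }
}

/* Same decoding as h_int_get_queue_config() */
void eq_get_queue(const EqModel *eq, uint64_t *qpage, uint64_t *qsize)
{
    if (eq->enqueue) {
        *qpage = ((uint64_t)(eq->w2 & 0x0fffffff) << 32) | eq->w3;
        *qsize = eq->qsize + 12;
    } else {
        *qpage = 0;
        *qsize = 0;
    }
}
```

An invalid size leaves the model untouched, which matches the hcall returning H_P5 before updating the EQ.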

* [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (16 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 17/21] ppc/xive: add hcalls support Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  8:44   ` David Gibson
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 19/21] ppc/xive: introduce a helper to map the XIVE memory regions Cédric Le Goater
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

As with XICS, the XIVE interface for the guest is described in the
device tree under the "interrupt-controller" node. A couple of new
properties are specific to XIVE:

 - "reg"

   contains the base address and size of the thread interrupt
   management areas (TIMA), also called rings, for the User level and
   for the Guest OS level. Only the Guest OS level is taken into
   account today.

 - "ibm,xive-eq-sizes"

   the supported event queue sizes: one cell per supported size,
   each containing the log2 of the size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the interrupt number ranges assigned to the guest. These are
   allocated using a simple bitmap.

and also under the root node:

 - "ibm,plat-res-int-priorities"

   contains a list of (start, count) priority ranges that the
   hypervisor has reserved for its own use. QEMU simulates ranges as
   defined by the PowerVM Hypervisor.
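The comment in spapr_xive_populate() expects the (1, 2) and (7, 0xf8) ranges to leave Linux with priority 6. A minimal sketch of how a guest could arrive there, assuming it scans the eight priorities 0-7 (0xff being the masking/reset value, as in H_INT_SET_SOURCE_CONFIG) and keeps the highest one not covered by any (start, count) range. pick_max_priority() is hypothetical and is not the Linux driver's code; cells are in host order here, whereas the device tree stores them big-endian (hence cpu_to_be32() in the patch).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the highest priority in 0..7 not covered by any (start, count)
 * pair in "ranges", or 0xff if every priority is reserved.
 * "nr_cells" is the number of 32-bit cells (two per range).
 */
unsigned pick_max_priority(const uint32_t *ranges, size_t nr_cells)
{
    unsigned found = 0xff;

    for (unsigned prio = 0; prio < 8; prio++) {
        int reserved = 0;

        for (size_t i = 0; i + 1 < nr_cells; i += 2) {
            uint32_t start = ranges[i];
            uint32_t count = ranges[i + 1];

            /* prio - start cannot wrap since prio >= start here */
            if (prio >= start && prio - start < count) {
                reserved = 1;
            }
        }
        if (!reserved) {
            found = prio;
        }
    }
    return found;
}
```

With QEMU's values, priorities 1, 2 and 7 (through 254) are reserved, leaving 0 and 3 to 6, so the highest usable priority is 6.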

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive_hcall.c  | 54 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_xive.h |  1 +
 2 files changed, 55 insertions(+)

diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
index 4c77b65683de..7b19ea6373dd 100644
--- a/hw/intc/spapr_xive_hcall.c
+++ b/hw/intc/spapr_xive_hcall.c
@@ -874,3 +874,57 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_INT_SYNC, h_int_sync);
     spapr_register_hypercall(H_INT_RESET, h_int_reset);
 }
+
+void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle)
+{
+    int node;
+    uint64_t timas[2 * 2];
+    uint32_t lisn_ranges[] = {
+        cpu_to_be32(xive->nr_irqs - xive->nr_targets + xive->ics->offset),
+        cpu_to_be32(xive->nr_targets),
+    };
+    uint32_t eq_sizes[] = {
+        cpu_to_be32(12), /* 4K */
+        cpu_to_be32(16), /* 64K */
+        cpu_to_be32(21), /* 2M */
+        cpu_to_be32(24), /* 16M */
+    };
+
+    /* Use some ranges to exercise the Linux driver, which should
+     * result in Linux choosing priority 6. This is not strictly
+     * necessary.
+     */
+    uint32_t reserved_priorities[] = {
+        cpu_to_be32(1),  /* start */
+        cpu_to_be32(2),  /* count */
+        cpu_to_be32(7),  /* start */
+        cpu_to_be32(0xf8),  /* count */
+    };
+    int i;
+
+    /* Thread Interrupt Management Areas : User and OS */
+    for (i = 0; i < 2; i++) {
+        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
+        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
+    }
+
+    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
+
+    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));
+    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
+
+    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
+                     sizeof(eq_sizes)));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
+                     sizeof(lisn_ranges)));
+
+    /* For SLOF */
+    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
+    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
+
+    /* top properties */
+    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
+                     reserved_priorities, sizeof(reserved_priorities)));
+}
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index ae5ff89533c0..0a156f2d8591 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -69,5 +69,6 @@ struct sPAPRXive {
 typedef struct sPAPRMachineState sPAPRMachineState;
 
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
+void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle);
 
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 19/21] ppc/xive: introduce a helper to map the XIVE memory regions
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (17 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 20/21] ppc/xics: introduce a qirq_get() helper in the XICSFabric Cédric Le Goater
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

It will be used when the guest chooses the XIVE exploitation mode in CAS.
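spapr_xive_populate() advertises two TIMA pages in "reg" (User level first, then OS level), each 1 << tm_shift bytes long, while spapr_xive_mmio_map() only maps the OS page at tm_base + (1 << tm_shift). The sketch below, using a hypothetical TimaLayout struct and made-up base/shift values in its checks, computes both and shows that the mapped address is the second "reg" entry. It uses 64-bit shifts, whereas the patch shifts a plain int.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sPAPRXive TIMA fields */
typedef struct {
    uint64_t tm_base;   /* start of the TIMA region */
    uint32_t tm_shift;  /* log2 of the size of one TIMA page */
} TimaLayout;

/*
 * Fill the four "reg" values (User base, User size, OS base, OS size)
 * like spapr_xive_populate(), before the cpu_to_be64() conversion.
 */
void tima_reg_cells(const TimaLayout *t, uint64_t reg[4])
{
    for (int i = 0; i < 2; i++) {   /* 0 = User ring, 1 = OS ring */
        reg[i * 2] = t->tm_base + (uint64_t)i * (1ull << t->tm_shift);
        reg[i * 2 + 1] = 1ull << t->tm_shift;
    }
}

/* Address passed to sysbus_mmio_map() for the OS TIMA page */
uint64_t tima_os_base(const TimaLayout *t)
{
    return t->tm_base + (1ull << t->tm_shift);
}
```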

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c        | 12 ++++++++++++
 include/hw/ppc/spapr_xive.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index ad3f03e37401..adcbbc6ec245 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -807,3 +807,15 @@ bool spapr_xive_eq_for_target(sPAPRXive *xive, uint32_t target,
 
     return true;
 }
+
+void spapr_xive_mmio_map(sPAPRXive *xive)
+{
+    /* ESBs */
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->esb_base);
+
+    /* Thread Interrupt Management Areas */
+    /* TODO: Only map the OS TIMA for the moment. Mapping the whole
+     * region needs some rework in the handlers */
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1,
+                    xive->tm_base + (1 << xive->tm_shift));
+}
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 0a156f2d8591..13cf10f365d8 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -70,5 +70,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
 
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
 void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle);
+void spapr_xive_mmio_map(sPAPRXive *xive);
 
 #endif /* PPC_SPAPR_XIVE_H */
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 20/21] ppc/xics: introduce a qirq_get() helper in the XICSFabric
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (18 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 19/21] ppc/xive: introduce a helper to map the XIVE memory regions Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 21/21] spapr: activate XIVE exploitation mode Cédric Le Goater
  2017-09-19  8:20 ` [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) David Gibson
  21 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

It will be used to choose the appropriate set of qirqs when XIVE is
activated.
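After this change, xics_get_qirq() no longer looks up an ICSState itself but defers to a machine-provided qirq_get() hook, so the machine can hand back a different qirq array once XIVE is selected. A simplified model of that indirection follows; the Fabric, FabricClass and Source types and the xive_exploit switch are hypothetical (in this patch, spapr_qirq_get() still only consults spapr->ics).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the QEMU types */
typedef void *qemu_irq;

typedef struct Fabric Fabric;

typedef struct {
    qemu_irq (*qirq_get)(Fabric *xi, int irq);
} FabricClass;

typedef struct {
    int offset;        /* first IRQ number handled by the source */
    int nr_irqs;
    qemu_irq *qirqs;
} Source;

struct Fabric {
    const FabricClass *klass;
    bool xive_exploit;  /* hypothetical: set once CAS selects XIVE */
    Source xics_src;
    Source xive_src;
};

/* Same range check as ics_valid_irq() */
static bool source_valid_irq(const Source *s, int irq)
{
    return irq >= s->offset && irq < s->offset + s->nr_irqs;
}

/* Machine-side hook: picks the qirq array of the active controller */
qemu_irq machine_qirq_get(Fabric *xi, int irq)
{
    const Source *s = xi->xive_exploit ? &xi->xive_src : &xi->xics_src;

    if (!source_valid_irq(s, irq)) {
        return NULL;
    }
    return s->qirqs[irq - s->offset];
}

/* Generic entry point, like xics_get_qirq() after the patch */
qemu_irq fabric_get_qirq(Fabric *xi, int irq)
{
    return xi->klass->qirq_get(xi, irq);
}

const FabricClass machine_class = { .qirq_get = machine_qirq_get };
```

Callers keep using the generic entry point; only the machine's hook decides which controller's qirqs are returned.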

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xics.c        |  7 +------
 hw/ppc/spapr.c        | 12 ++++++++++++
 include/hw/ppc/xics.h |  1 +
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 927d4fec966a..7691492aa17b 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -685,13 +685,8 @@ static const TypeInfo xics_fabric_info = {
 qemu_irq xics_get_qirq(XICSFabric *xi, int irq)
 {
     XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
-    ICSState *ics = xic->ics_get(xi, irq);
 
-    if (ics) {
-        return ics->qirqs[irq - ics->offset];
-    }
-
-    return NULL;
+    return xic->qirq_get(xi, irq);
 }
 
 ICPState *xics_icp_get(XICSFabric *xi, int server)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c2011cb2dc72..d8b25be70cd8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3455,6 +3455,17 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
     *mmio64 = SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM64_WIN_SIZE;
 }
 
+static qemu_irq spapr_qirq_get(XICSFabric *dev, int irq)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(dev);
+
+    if (!ics_valid_irq(spapr->ics, irq)) {
+        return NULL;
+    }
+
+    return spapr->ics->qirqs[irq - spapr->ics->offset];
+}
+
 static ICSState *spapr_ics_get(XICSFabric *dev, int irq)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(dev);
@@ -3539,6 +3550,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     vhc->unmap_hptes = spapr_unmap_hptes;
     vhc->store_hpte = spapr_store_hpte;
     vhc->get_patbe = spapr_get_patbe;
+    xic->qirq_get = spapr_qirq_get;
     xic->ics_get = spapr_ics_get;
     xic->ics_resend = spapr_ics_resend;
     xic->icp_get = spapr_icp_get;
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index c835997303c4..46d2fc1ef2c1 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -178,6 +178,7 @@ struct XICSFabric {
 
 typedef struct XICSFabricClass {
     InterfaceClass parent;
+    qemu_irq (*qirq_get)(XICSFabric *xi, int irq);
     ICSState *(*ics_get)(XICSFabric *xi, int irq);
     void (*ics_resend)(XICSFabric *xi);
     ICPState *(*icp_get)(XICSFabric *xi, int server);
-- 
2.13.5

* [Qemu-devel] [RFC PATCH v2 21/21] spapr: activate XIVE exploitation mode
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (19 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 20/21] ppc/xics: introduce a qirq_get() helper in the XICSFabric Cédric Le Goater
@ 2017-09-11 17:12 ` Cédric Le Goater
  2017-09-19  8:20 ` [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) David Gibson
  21 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-11 17:12 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf
  Cc: Cédric Le Goater

A couple of adjustments are needed to activate XIVE exploitation
mode. First, the hypervisor should advertise support for both models,
XIVE legacy and XIVE exploitation, in "ibm,arch-vec-5-platform-support".

The sPAPR machine starts with the XICS interrupt model (the default
behavior could be changed later on for POWER9) and, depending on the
guest capabilities, the XIVE exploitation mode is negotiated during
CAS. A reset is then performed to rebuild the device tree with new
XIVE properties under the "interrupt-controller" node.

Finally, the MMIO regions for the ESB and TIMA should be mapped at
reset time and post_load when XIVE exploitation mode is on.
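The following is a simplified model of the flow described above. It uses a single option bit in place of the `spapr_ovec_*` bitmaps. `OPT_XIVE_EXPLOIT`, `SketchMachine`, `cas_apply()` and `irq_model()` are hypothetical names chosen for illustration. The patch itself tests `OV5_XIVE_EXPLOIT` in `ov5_cas` and sets `cas_reboot` in `h_client_architecture_support()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-bit stand-in for OV5_XIVE_EXPLOIT. */
#define OPT_XIVE_EXPLOIT  (1u << 0)

typedef enum { IRQ_MODEL_XICS, IRQ_MODEL_XIVE } IrqModel;

typedef struct SketchMachine {
    uint32_t cas_options;   /* options accepted during CAS (cf. ov5_cas) */
    bool cas_reboot;        /* a reset must follow CAS */
} SketchMachine;

/* End of CAS: record the negotiated options and request a reset when
 * XIVE exploitation was selected, so the device tree is rebuilt. */
void cas_apply(SketchMachine *m, uint32_t updates)
{
    assert(m);
    m->cas_options |= updates;
    if (!m->cas_reboot) {
        m->cas_reboot = (updates & OPT_XIVE_EXPLOIT) != 0;
    }
}

/* Device tree build, reset and post_load all key off the CAS result;
 * in the XIVE case the ESB and TIMA regions would also be mapped. */
IrqModel irq_model(const SketchMachine *m)
{
    return (m->cas_options & OPT_XIVE_EXPLOIT) ? IRQ_MODEL_XIVE
                                                : IRQ_MODEL_XICS;
}
```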

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c       | 33 ++++++++++++++++++++++++++++++---
 hw/ppc/spapr_hcall.c |  6 ++++++
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d8b25be70cd8..aaf1be7a50fe 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -942,7 +942,8 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
 /* Prepare ibm,arch-vec-5-platform-support, which indicates the MMU features
  * that the guest may request and thus the valid values for bytes 24..26 of
  * option vector 5: */
-static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
+static void spapr_dt_ov5_platform_support(sPAPRMachineState *spapr,
+                                          void *fdt, int chosen)
 {
     PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
 
@@ -961,6 +962,13 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
         } else {
             val[3] = 0x00; /* Hash */
         }
+
+        /* TODO: introduce a kvmppc_has_cap_xive() ? Works with
+         * irqchip=off for now
+         */
+        if (spapr->xive) {
+            val[1] = 0x80; /* OV5_XIVE_BOTH */
+        }
     } else {
         if (first_ppc_cpu->env.mmu_model & POWERPC_MMU_V3) {
             /* V3 MMU supports both hash and radix (with dynamic switching) */
@@ -969,6 +977,9 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
             /* Otherwise we can only do hash */
             val[3] = 0x00;
         }
+        if (spapr->xive) {
+            val[1] = 0x80;  /* OV5_XIVE_BOTH */
+        }
     }
     _FDT(fdt_setprop(fdt, chosen, "ibm,arch-vec-5-platform-support",
                      val, sizeof(val)));
@@ -1027,7 +1038,7 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, void *fdt)
         _FDT(fdt_setprop_string(fdt, chosen, "linux,stdout-path", stdout_path));
     }
 
-    spapr_dt_ov5_platform_support(fdt, chosen);
+    spapr_dt_ov5_platform_support(spapr, fdt, chosen);
 
     g_free(stdout_path);
     g_free(bootlist);
@@ -1106,7 +1117,13 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
     /* /interrupt controller */
-    spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
+    } else {
+        /* populate device tree for XIVE */
+        spapr_xive_populate(spapr->xive, fdt, PHANDLE_XICP);
+        spapr_xive_mmio_map(spapr->xive);
+    }
 
     ret = spapr_populate_memory(spapr, fdt);
     if (ret < 0) {
@@ -1552,6 +1569,10 @@ static int spapr_post_load(void *opaque, int version_id)
         }
     }
 
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        spapr_xive_mmio_map(spapr->xive);
+    }
+
     return err;
 }
 
@@ -2332,6 +2353,7 @@ static void ppc_spapr_init(MachineState *machine)
                                XICS_IRQS_SPAPR + xics_max_server_number(),
                                xics_max_server_number(),
                                &error_fatal);
+            spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
         }
     }
 
@@ -3463,6 +3485,11 @@ static qemu_irq spapr_qirq_get(XICSFabric *dev, int irq)
         return NULL;
     }
 
+    /* use XIVE qirqs when XIVE exploitation mode is on */
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return spapr->xive->qirqs[irq - spapr->ics->offset];
+    }
+
     return spapr->ics->qirqs[irq - spapr->ics->offset];
 }
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 92f1e21358b8..ba00b8d3fdd6 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1613,6 +1613,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
             (spapr_h_cas_compose_response(spapr, args[1], args[2],
                                           ov5_updates) != 0);
     }
+
+    /* We need to rebuild the device tree for XIVE, generate a reset */
+    if (!spapr->cas_reboot) {
+        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT);
+    }
+
     spapr_ovec_cleanup(ov5_updates);
 
     if (spapr->cas_reboot) {
-- 
2.13.5

* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE Cédric Le Goater
@ 2017-09-11 22:04   ` Greg Kurz
  2017-09-12  5:47     ` Cédric Le Goater
  2017-09-19  2:44   ` [Qemu-devel] " David Gibson
  1 sibling, 1 reply; 90+ messages in thread
From: Greg Kurz @ 2017-09-11 22:04 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, 11 Sep 2017 19:12:18 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> The sPAPR machine first starts with a XICS interrupt model and
> depending on the guest capabilities, the XIVE exploitation mode is
> negotiated during CAS. A reset should then be performed to rebuild the
> device tree but the same IRQ numbers which were allocated by the
> devices prior to reset, when the XICS model was operating, are still
> in use.
> 
> For this purpose, we need a common IRQ number allocator for both the
> interrupt models: XICS legacy or XIVE exploitation. This is what the
> ICSIRQState array of the XICS interrupt source is used for. It also
> contains the LSI/MSI flag of an interrupt which we will need later on.
> 
> So, let's provide a link to the sPAPR ICS object under XIVE to make
> use of it.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c        | 12 ++++++++++++
>  include/hw/ppc/spapr_xive.h |  4 ++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 6d98528fae68..1681affb0848 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -56,6 +56,8 @@ void spapr_xive_reset(void *dev)
>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
>  {
>      sPAPRXive *xive = SPAPR_XIVE(dev);
> +    Object *obj;
> +    Error *err = NULL;
>  
>      if (!xive->nr_targets) {
>          error_setg(errp, "Number of interrupt targets needs to be greater 0");
> @@ -68,6 +70,16 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    /* Retrieve SPAPR ICS source to share the IRQ number allocator */
> +    obj = object_property_get_link(OBJECT(dev), "ics", &err);
> +    if (!obj) {
> +        error_setg(errp, "%s: required link 'ics' not found: %s",
> +                   __func__, error_get_pretty(err));
> +        return;

err is leaked if you do it this way. Please do this instead:

        error_propagate(errp, err);
        error_prepend(errp, "required link 'ics' not found: ");

Note: I've just sent a patch to fix the same error in XICS :)
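To make the leak concrete, here is a toy model of the ownership rules involved: whoever holds a non-NULL error pointer must either free it or pass it on. `ToyError` and the `toy_*` functions are hypothetical. They only approximate the semantics of QEMU's `error_propagate()` and `error_prepend()` and are not the real implementation. If the code formats the message into a new error with `error_setg()` and then returns, the local error stays allocated. That is the leak described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy error object owning a heap-allocated message (hypothetical). */
typedef struct ToyError {
    char *msg;
} ToyError;

/* Duplicate a string with malloc (avoids relying on POSIX strdup). */
static char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    assert(p);
    memcpy(p, s, n);
    return p;
}

void toy_error_free(ToyError *err)
{
    if (err) {
        free(err->msg);
        free(err);
    }
}

/* Callee side: allocate an error and store it in *errp. */
void toy_error_setg(ToyError **errp, const char *msg)
{
    ToyError *err = malloc(sizeof(*err));

    assert(err);
    err->msg = toy_strdup(msg);
    if (errp && !*errp) {
        *errp = err;
    } else {
        toy_error_free(err);     /* nobody to hand it to */
    }
}

/* Transfer ownership of local_err to *dst_errp, or free it if the
 * caller passed NULL or already holds an error. */
void toy_error_propagate(ToyError **dst_errp, ToyError *local_err)
{
    if (!local_err) {
        return;
    }
    if (!dst_errp || *dst_errp) {
        toy_error_free(local_err);
        return;
    }
    *dst_errp = local_err;
}

/* Prefix the message of an error already stored in *errp. */
void toy_error_prepend(ToyError **errp, const char *prefix)
{
    if (!errp || !*errp) {
        return;
    }
    size_t plen = strlen(prefix);
    size_t mlen = strlen((*errp)->msg);
    char *msg = malloc(plen + mlen + 1);

    assert(msg);
    memcpy(msg, prefix, plen);
    memcpy(msg + plen, (*errp)->msg, mlen + 1);
    free((*errp)->msg);
    (*errp)->msg = msg;
}

/* Hypothetical failing lookup standing in for object_property_get_link(). */
void *toy_get_link(ToyError **errp)
{
    toy_error_setg(errp, "link not set");
    return NULL;
}

/* Realize-style caller using the propagate-then-prepend pattern. */
int toy_realize(ToyError **errp)
{
    ToyError *local_err = NULL;
    void *obj = toy_get_link(&local_err);

    if (!obj) {
        toy_error_propagate(errp, local_err);   /* ownership moves */
        toy_error_prepend(errp, "required link 'ics' not found: ");
        return -1;
    }
    return 0;
}
```

With propagate-then-prepend, the single allocated error ends up owned by the caller (or freed if the caller passed NULL). No path leaves it unreferenced.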

> +    }
> +
> +    xive->ics = ICS_BASE(obj);
> +
>      /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>      xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>      xive->sbe = g_malloc0(xive->sbe_size);
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index b17dd4f17b0b..29112589b37f 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -24,6 +24,7 @@
>  typedef struct sPAPRXive sPAPRXive;
>  typedef struct XiveIVE XiveIVE;
>  typedef struct XiveEQ XiveEQ;
> +typedef struct ICSState ICSState;
>  
>  #define TYPE_SPAPR_XIVE "spapr-xive"
>  #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
> @@ -35,6 +36,9 @@ struct sPAPRXive {
>      uint32_t     nr_targets;
>      uint32_t     nr_irqs;
>  
> +    /* IRQ */
> +    ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
> +
>      /* XIVE internal tables */
>      uint8_t      *sbe;
>      uint32_t     sbe_size;


* Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE
  2017-09-11 22:04   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2017-09-12  5:47     ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-12  5:47 UTC (permalink / raw)
  To: Greg Kurz
  Cc: qemu-ppc, qemu-devel, David Gibson, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/12/2017 12:04 AM, Greg Kurz wrote:
> On Mon, 11 Sep 2017 19:12:18 +0200
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>> The sPAPR machine first starts with a XICS interrupt model and
>> depending on the guest capabilities, the XIVE exploitation mode is
>> negotiated during CAS. A reset should then be performed to rebuild the
>> device tree but the same IRQ numbers which were allocated by the
>> devices prior to reset, when the XICS model was operating, are still
>> in use.
>>
>> For this purpose, we need a common IRQ number allocator for both the
>> interrupt models: XICS legacy or XIVE exploitation. This is what the
>> ICSIRQState array of the XICS interrupt source is used for. It also
>> contains the LSI/MSI flag of an interrupt which we will need later on.
>>
>> So, let's provide a link to the sPAPR ICS object under XIVE to make
>> use of it.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c        | 12 ++++++++++++
>>  include/hw/ppc/spapr_xive.h |  4 ++++
>>  2 files changed, 16 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 6d98528fae68..1681affb0848 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -56,6 +56,8 @@ void spapr_xive_reset(void *dev)
>>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>  {
>>      sPAPRXive *xive = SPAPR_XIVE(dev);
>> +    Object *obj;
>> +    Error *err = NULL;
>>  
>>      if (!xive->nr_targets) {
>>          error_setg(errp, "Number of interrupt targets needs to be greater 0");
>> @@ -68,6 +70,16 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>          return;
>>      }
>>  
>> +    /* Retrieve SPAPR ICS source to share the IRQ number allocator */
>> +    obj = object_property_get_link(OBJECT(dev), "ics", &err);
>> +    if (!obj) {
>> +        error_setg(errp, "%s: required link 'ics' not found: %s",
>> +                   __func__, error_get_pretty(err));
>> +        return;
> 
> err is leaked if you do it this way. Please do this instead:
> 
>         error_propagate(errp, err);
>         error_prepend(errp, "required link 'ics' not found: ");

ok. I will fix. 

C.

> Note: I've just sent a patch to fix the same error in XICS :)
>
>> +    }
>> +
>> +    xive->ics = ICS_BASE(obj);
>> +
>>      /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>>      xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>>      xive->sbe = g_malloc0(xive->sbe_size);
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index b17dd4f17b0b..29112589b37f 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -24,6 +24,7 @@
>>  typedef struct sPAPRXive sPAPRXive;
>>  typedef struct XiveIVE XiveIVE;
>>  typedef struct XiveEQ XiveEQ;
>> +typedef struct ICSState ICSState;
>>  
>>  #define TYPE_SPAPR_XIVE "spapr-xive"
>>  #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
>> @@ -35,6 +36,9 @@ struct sPAPRXive {
>>      uint32_t     nr_targets;
>>      uint32_t     nr_irqs;
>>  
>> +    /* IRQ */
>> +    ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
>> +
>>      /* XIVE internal tables */
>>      uint8_t      *sbe;
>>      uint32_t     sbe_size;
> 

* Re: [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller Cédric Le Goater
@ 2017-09-19  2:27   ` David Gibson
  2017-09-19 13:15     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  2:27 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:15PM +0200, Cédric Le Goater wrote:
> Start with a couple of attributes for the XIVE sPAPR controller
> model. The number of provisioned IRQs is necessary to size the
> different internal XIVE tables, as is the number of CPUs.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

[snip]

> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +
> +    if (!xive->nr_targets) {
> +        error_setg(errp, "Number of interrupt targets needs to be greater 0");
> +        return;
> +    }
> +    /* We need to be able to allocate at least the IPIs */
> +    if (!xive->nr_irqs || xive->nr_irqs < xive->nr_targets) {
> +        error_setg(errp, "Number of interrupts too small");
> +        return;
> +    }
> +}
> +
> +static Property spapr_xive_properties[] = {
> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
> +    DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),

I'm a bit uneasy about the number of targets having to be set in
advance: this can make life awkward when CPUs are hotplugged.  I know
there's something similar in xics, but it has caused some hassles, and
we're starting to move away from it.

Do you really need this?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v2 03/21] ppc/xive: define the XIVE internal tables
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 03/21] ppc/xive: define the XIVE internal tables Cédric Le Goater
@ 2017-09-19  2:39   ` David Gibson
  2017-09-19 13:46     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  2:39 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:17PM +0200, Cédric Le Goater wrote:
> The XIVE interrupt controller of the POWER9 uses a set of tables to
> redirect exceptions from event sources to CPU threads. Among these,
> we chose to model:
> 
>  - the State Bit Entries (SBE), also known as Event State Buffer
>    (ESB). This is a two bit state machine for each event source which
>    is used to trigger events. The bits are named "P" (pending) and "Q"
>    (queued) and can be controlled by MMIO.
> 
>  - the Interrupt Virtualization Entry (IVE) table, also known as Event
>    Assignment Structure (EAS). This table is indexed by the IRQ number
>    and is looked up to find the Event Queue associated with a
>    triggered event.

Both the above are one entry per irq source, yes?  What's the
rationale for having them as parallel tables, rather than bits in a
single per-source structure?
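For reference during this discussion: the patch keeps the SBEs in a separate packed byte array, with 2 bits per source and 4 sources per byte. It resets the array with `memset(..., 0x55, ...)`, so every entry reads 0b01 ("ints off"). The sketch below rests on two assumptions the patch does not state: entry 0 occupies the low bits of each byte, and P is the high bit and Q the low bit of an entry. `SbeTable` and the `sbe_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct SbeTable {
    uint8_t *sbe;
    uint32_t sbe_size;   /* bytes: DIV_ROUND_UP(nr_irqs, 4) */
    uint32_t nr_irqs;
} SbeTable;

/* Allocate and reset: 0x55 = 0b01010101, i.e. every entry is 0b01. */
int sbe_init(SbeTable *t, uint32_t nr_irqs)
{
    t->nr_irqs = nr_irqs;
    t->sbe_size = (nr_irqs + 3) / 4;
    t->sbe = malloc(t->sbe_size ? t->sbe_size : 1);
    if (!t->sbe) {
        return -1;
    }
    memset(t->sbe, 0x55, t->sbe_size);
    return 0;
}

/* Read the 2-bit PQ value of one source (assumed: entry 0 in low bits). */
uint8_t sbe_get(const SbeTable *t, uint32_t irq)
{
    assert(irq < t->nr_irqs);
    unsigned shift = (irq % 4) * 2;
    return (t->sbe[irq / 4] >> shift) & 0x3;
}

/* Overwrite the 2-bit PQ value of one source, leaving neighbours alone. */
void sbe_set(SbeTable *t, uint32_t irq, uint8_t pq)
{
    assert(irq < t->nr_irqs && pq <= 0x3);
    unsigned shift = (irq % 4) * 2;
    uint8_t *byte = &t->sbe[irq / 4];
    *byte = (uint8_t)((*byte & ~(0x3u << shift)) | ((unsigned)pq << shift));
}
```

Because the PQ bits must be updated without touching the neighbouring entries in the same byte, every write is a read-modify-write. A per-source structure would avoid that cost but would use more memory.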

>  - the Event Queue Descriptor (EQD) table, also known as Event
>    Notification Descriptor (END). The EQD contains fields that specify
>    the Event Queue on which event data is posted (and later pulled by
>    the OS) and also a target (or VPD) to notify.
> 
> An additional table was not modeled, but we might need it to support the
> H_INT_SET_OS_REPORTING_LINE hcall:
> 
>  - the Virtual Processor Descriptor (VPD) table, also known as
>    Notification Virtual Target (NVT).
> 
> The XIVE object is expanded with the tables described above. The size
> of each table depends on the number of provisioned IRQs and the maximum
> number of CPUs in the system. The indexing is very basic and might
> need to be improved for the EQs.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c        | 108 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/intc/xive-internal.h     | 105 ++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_xive.h |   9 ++++
>  3 files changed, 222 insertions(+)
>  create mode 100644 hw/intc/xive-internal.h
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index c83796519586..6d98528fae68 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -25,11 +25,34 @@
>  #include "hw/ppc/xics.h"
>  #include "hw/ppc/spapr_xive.h"
>  
> +#include "xive-internal.h"
>  
>  /*
>   * Main XIVE object
>   */
>  
> +void spapr_xive_reset(void *dev)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +    int i;
> +
> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
> +    memset(xive->sbe, 0x55, xive->sbe_size);
> +
> +    /* Validate all available IVEs in the IRQ number space. It would
> +     * be more correct to validate only the allocated IRQs but this
> +     * would require some callback routine from the spapr machine into
> +     * XIVE. To be done later.
> +     */
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        XiveIVE *ive = &xive->ivt[i];
> +        ive->w = IVE_VALID | IVE_MASKED;
> +    }
> +
> +    /* clear all EQs */
> +    memset(xive->eqt, 0, xive->nr_eqs * sizeof(XiveEQ));
> +}
> +
>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
>  {
>      sPAPRXive *xive = SPAPR_XIVE(dev);
> @@ -44,8 +67,64 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>          error_setg(errp, "Number of interrupts too small");
>          return;
>      }
> +
> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
> +    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
> +    xive->sbe = g_malloc0(xive->sbe_size);
> +
> +    /* Allocate the IVT (Interrupt Virtualization Table) */
> +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
> +
> +    /* Allocate the EQDT (Event Queue Descriptor Table), 8 priorities
> +     * for each thread in the system */
> +    xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;
> +    xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));
> +
> +    qemu_register_reset(spapr_xive_reset, dev);
>  }
>  
> +static const VMStateDescription vmstate_spapr_xive_ive = {
> +    .name = "xive/ive",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT64(w, XiveIVE),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_spapr_xive_eq = {
> +    .name = "xive/eq",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT32(w0, XiveEQ),
> +        VMSTATE_UINT32(w1, XiveEQ),
> +        VMSTATE_UINT32(w2, XiveEQ),
> +        VMSTATE_UINT32(w3, XiveEQ),
> +        VMSTATE_UINT32(w4, XiveEQ),
> +        VMSTATE_UINT32(w5, XiveEQ),
> +        VMSTATE_UINT32(w6, XiveEQ),
> +        VMSTATE_UINT32(w7, XiveEQ),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_xive = {
> +    .name = "xive",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_VARRAY_UINT32_ALLOC(sbe, sPAPRXive, sbe_size, 0,
> +                                    vmstate_info_uint8, uint8_t),

Since you're treating the SBE as a packed buffer of u8s anyway, it's
probably simpler to use VMSTATE_BUFFER().  I don't see that you need
the ALLOC - it should have already been allocated on the destination.

Might be worth having a VMSTATE_UINT32_EQUAL to sanity check that
sbe_size is equal at either end.
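The following plain-C illustration shows the two suggestions above. It does not use QEMU's VMState API. The buffer is copied as raw bytes into storage the destination has already allocated, so no ALLOC step is needed. The stream also carries the size, so the load can refuse a mismatch, which is the role a `VMSTATE_UINT32_EQUAL` field would play. `sbe_save()`, `sbe_load()` and the wire format (a host-order 32-bit size followed by the bytes) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Save: a 32-bit size in host byte order, then the raw SBE bytes.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t sbe_save(const uint8_t *sbe, uint32_t sbe_size,
                uint8_t *out, size_t out_len)
{
    assert(out);
    if (out_len < sizeof(sbe_size) + (size_t)sbe_size) {
        return 0;
    }
    memcpy(out, &sbe_size, sizeof(sbe_size));
    memcpy(out + sizeof(sbe_size), sbe, sbe_size);
    return sizeof(sbe_size) + sbe_size;
}

/* Load into a buffer the destination allocated at realize time.
 * The size check plays the role of an _EQUAL field: a mismatch means
 * the two sides were configured differently, so the load fails. */
int sbe_load(uint8_t *sbe, uint32_t sbe_size,
             const uint8_t *in, size_t in_len)
{
    uint32_t src_size;

    if (in_len < sizeof(src_size)) {
        return -1;
    }
    memcpy(&src_size, in, sizeof(src_size));
    if (src_size != sbe_size ||
        in_len < sizeof(src_size) + (size_t)src_size) {
        return -1;
    }
    memcpy(sbe, in + sizeof(src_size), sbe_size);
    return 0;
}
```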

> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 0,
> +                                    vmstate_spapr_xive_ive, XiveIVE),
> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(eqt, sPAPRXive, nr_eqs, 0,
> +                                    vmstate_spapr_xive_eq, XiveEQ),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static Property spapr_xive_properties[] = {
>      DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
>      DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),
> @@ -59,6 +138,7 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
>      dc->realize = spapr_xive_realize;
>      dc->props = spapr_xive_properties;
>      dc->desc = "sPAPR XIVE interrupt controller";
> +    dc->vmsd = &vmstate_xive;
>  }
>  
>  static const TypeInfo spapr_xive_info = {
> @@ -74,3 +154,31 @@ static void spapr_xive_register_types(void)
>  }
>  
>  type_init(spapr_xive_register_types)
> +
> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t idx)
> +{
> +    return idx < xive->nr_irqs ? &xive->ivt[idx] : NULL;
> +}
> +
> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx)
> +{
> +    return idx < xive->nr_eqs ? &xive->eqt[idx] : NULL;
> +}
> +
> +/* TODO: improve EQ indexing. This is very simple and relies on the
> + * fact that target (CPU) numbers start at 0 and are contiguous. It
> + * should be OK for sPAPR.
> + */
> +bool spapr_xive_eq_for_target(sPAPRXive *xive, uint32_t target,
> +                              uint8_t priority, uint32_t *out_eq_idx)
> +{
> +    if (priority > XIVE_PRIORITY_MAX || target >= xive->nr_targets) {
> +        return false;
> +    }
> +
> +    if (out_eq_idx) {
> +        *out_eq_idx = target + priority;

Don't you need to multiply target by XIVE_EQ_PRIORITY_COUNT?
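With `target + priority`, different (target, priority) pairs collide. For example, target 0 at priority 1 and target 1 at priority 0 both map to EQ 1. Also, no index above `nr_targets + 6` is ever used. Below is a sketch of the indexing the question suggests, which gives each target a contiguous block of `XIVE_EQ_PRIORITY_COUNT` EQs. Like the patch's TODO, it assumes that targets are numbered contiguously from 0. `eq_for_target()` is a hypothetical free-standing variant of `spapr_xive_eq_for_target()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XIVE_EQ_PRIORITY_COUNT 8
#define XIVE_PRIORITY_MAX      (XIVE_EQ_PRIORITY_COUNT - 1)

/* Each target owns EQs [target * 8, target * 8 + 7], one per priority,
 * which matches nr_eqs = nr_targets * XIVE_EQ_PRIORITY_COUNT. */
bool eq_for_target(uint32_t nr_targets, uint32_t target, uint8_t priority,
                   uint32_t *out_eq_idx)
{
    if (priority > XIVE_PRIORITY_MAX || target >= nr_targets) {
        return false;
    }
    if (out_eq_idx) {
        *out_eq_idx = target * XIVE_EQ_PRIORITY_COUNT + priority;
    }
    return true;
}
```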

> +    }
> +
> +    return true;
> +}
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> new file mode 100644
> index 000000000000..95184bad5c1d
> --- /dev/null
> +++ b/hw/intc/xive-internal.h
> @@ -0,0 +1,105 @@
> +/*
> + * QEMU PowerPC XIVE model
> + *
> + * Copyright 2016,2017 IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#ifndef _INTC_XIVE_INTERNAL_H
> +#define _INTC_XIVE_INTERNAL_H
> +
> +/* Utilities to manipulate these (originally from OPAL) */
> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
> +#define SETFIELD(m, v, val)                             \
> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
> +
> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
> +                                 PPC_BIT32(bs))
> +
> +/* IVE/EAS
> + *
> + * One per interrupt source. Targets that interrupt to a given EQ
> + * and provides the corresponding logical interrupt number (EQ data)
> + *
> + * We also map this structure to the escalation descriptor inside
> + * an EQ, though in that case the valid and masked bits are not used.
> + */
> +typedef struct XiveIVE {
> +        /* Use a single 64-bit definition to make it easier to
> +         * perform atomic updates
> +         */
> +        uint64_t        w;
> +#define IVE_VALID       PPC_BIT(0)
> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
> +} XiveIVE;
> +
> +/* EQ */
> +typedef struct XiveEQ {
> +        uint32_t        w0;

It'd be nice if IBM came up with better names for its fields than w0,
w1, etc.   Oh well.
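Because the words carry no field names, their meaning lives entirely in the masks. The following standalone sketch builds and decodes an IVE word with MSB-0 bit numbering (bit 0 is the most significant bit), reusing the `IVE_*` masks from this header. It replaces `__builtin_ffsl()` and `typeof` with a portable loop and plain `uint64_t` functions. As a result, `getfield()` and `setfield()` are adaptations rather than the header's exact `GETFIELD`/`SETFIELD` macros.

```c
#include <assert.h>
#include <stdint.h>

/* MSB-0 numbering: bit 0 is the most significant bit of the word. */
#define PPC_BIT(bit)        (0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* IVE field masks, as defined in the XiveIVE structure above. */
#define IVE_VALID       PPC_BIT(0)
#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)
#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)
#define IVE_MASKED      PPC_BIT(32)
#define IVE_EQ_DATA     PPC_BITMASK(33, 63)

/* Position of the lowest set bit of a non-zero mask. */
static unsigned mask_shift(uint64_t m)
{
    unsigned s = 0;

    assert(m != 0);
    while (!(m & 1)) {
        m >>= 1;
        s++;
    }
    return s;
}

/* Extract the field selected by mask from word. */
uint64_t getfield(uint64_t mask, uint64_t word)
{
    return (word & mask) >> mask_shift(mask);
}

/* Return word with the field selected by mask replaced by val. */
uint64_t setfield(uint64_t mask, uint64_t word, uint64_t val)
{
    return (word & ~mask) | ((val << mask_shift(mask)) & mask);
}
```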

> +#define EQ_W0_VALID             PPC_BIT32(0)
> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
> +#define EQ_W0_SW0               PPC_BIT32(16)
> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
> +#define EQ_QSIZE_4K             0
> +#define EQ_QSIZE_64K            4
> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
> +        uint32_t        w1;
> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
> +#define EQ_W1_ESn_P             PPC_BIT32(0)
> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
> +#define EQ_W1_ESe_P             PPC_BIT32(2)
> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
> +#define EQ_W1_GENERATION        PPC_BIT32(9)
> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
> +        uint32_t        w2;
> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
> +        uint32_t        w3;
> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
> +        uint32_t        w4;
> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
> +        uint32_t        w5;
> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
> +        uint32_t        w6;
> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
> +        uint32_t        w7;
> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> +} XiveEQ;
> +
> +#define XIVE_EQ_PRIORITY_COUNT 8
> +#define XIVE_PRIORITY_MAX  (XIVE_EQ_PRIORITY_COUNT - 1)
> +
> +void spapr_xive_reset(void *dev);
> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t isn);
> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
> +
> +bool spapr_xive_eq_for_target(sPAPRXive *xive, uint32_t target, uint8_t prio,
> +                        uint32_t *out_eq_idx);
> +
> +
> +#endif /* _INTC_XIVE_INTERNAL_H */
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 5b99f7fc2b81..b17dd4f17b0b 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -22,6 +22,8 @@
>  #include <hw/sysbus.h>
>  
>  typedef struct sPAPRXive sPAPRXive;
> +typedef struct XiveIVE XiveIVE;
> +typedef struct XiveEQ XiveEQ;
>  
>  #define TYPE_SPAPR_XIVE "spapr-xive"
>  #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
> @@ -32,6 +34,13 @@ struct sPAPRXive {
>      /* Properties */
>      uint32_t     nr_targets;
>      uint32_t     nr_irqs;
> +
> +    /* XIVE internal tables */
> +    uint8_t      *sbe;
> +    uint32_t     sbe_size;
> +    XiveIVE      *ivt;
> +    XiveEQ       *eqt;
> +    uint32_t     nr_eqs;
>  };
>  
>  #endif /* PPC_SPAPR_XIVE_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE Cédric Le Goater
  2017-09-11 22:04   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2017-09-19  2:44   ` David Gibson
  2017-09-19 14:46     ` Cédric Le Goater
  1 sibling, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  2:44 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:18PM +0200, Cédric Le Goater wrote:
> The sPAPR machine first starts with a XICS interrupt model and
> depending on the guest capabilities, the XIVE exploitation mode is
> negotiated during CAS. A reset should then be performed to rebuild the
> device tree but the same IRQ numbers which were allocated by the
> devices prior to reset, when the XICS model was operating, are still
> in use.
> 
> For this purpose, we need a common IRQ number allocator for both the
> interrupt models: XICS legacy or XIVE exploitation. This is what the
> ICSIRQState array of the XICS interrupt source is used for. It also
> contains the LSI/MSI flag of an interrupt, which we will need later on.
> 
> So, let's provide a link to the sPAPR ICS object under XIVE to make
> use of it.

Blech, please don't.  The XIVE code absolutely shouldn't be
referencing XICS objects, it's a recipe for trouble down the line.

If we have to have some sort of abstract "spapr interrupt source"
object that could map to either an ICS irq, or a XIVE source then we
can do that, but don't directly link XIVE and XICS.  *Especially* not
new-in-terms-of-old like this, rather than old-in-terms-of-new.


> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c        | 12 ++++++++++++
>  include/hw/ppc/spapr_xive.h |  4 ++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 6d98528fae68..1681affb0848 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -56,6 +56,8 @@ void spapr_xive_reset(void *dev)
>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
>  {
>      sPAPRXive *xive = SPAPR_XIVE(dev);
> +    Object *obj;
> +    Error *err = NULL;
>  
>      if (!xive->nr_targets) {
>          error_setg(errp, "Number of interrupt targets needs to be greater 0");
> @@ -68,6 +70,16 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    /* Retrieve SPAPR ICS source to share the IRQ number allocator */

This really suggests we need to move the irq number allocator out of
XICS and into the general spapr code.  Or get rid of it entirely
(using a more static irq mapping) if possible.
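
For illustration only: a machine-level allocator along the lines suggested here might look like the following minimal C sketch. It assumes a flat IRQ number space of fixed size and records the LSI/MSI flag alongside each number, which is the information patch 04 borrows from ICSIRQState. The type and function names (SpaprIrqMap, spapr_irq_alloc() and friends) are hypothetical, not existing QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model-neutral IRQ number allocator, owned by the
 * machine rather than by either the XICS or the XIVE model.
 * The size is an illustrative assumption. */
#define SKETCH_NR_IRQS 1024

typedef struct SpaprIrqMap {
    uint8_t used[SKETCH_NR_IRQS];  /* 1 if the IRQ number is allocated */
    uint8_t lsi[SKETCH_NR_IRQS];   /* 1 if level-sensitive, 0 if MSI   */
} SpaprIrqMap;

static void spapr_irq_map_init(SpaprIrqMap *map)
{
    memset(map, 0, sizeof(*map));
}

/* Claim a specific IRQ number; fails if out of range or already taken */
static bool spapr_irq_claim(SpaprIrqMap *map, uint32_t irq, bool lsi)
{
    if (irq >= SKETCH_NR_IRQS || map->used[irq]) {
        return false;
    }
    map->used[irq] = 1;
    map->lsi[irq] = lsi;
    return true;
}

/* First-fit allocation of a free IRQ number, or -1 if none is left */
static int spapr_irq_alloc(SpaprIrqMap *map, bool lsi)
{
    for (uint32_t irq = 0; irq < SKETCH_NR_IRQS; irq++) {
        if (spapr_irq_claim(map, irq, lsi)) {
            return (int)irq;
        }
    }
    return -1;
}

/* Query the trigger type of an allocated IRQ number */
static bool spapr_irq_is_lsi(const SpaprIrqMap *map, uint32_t irq)
{
    assert(irq < SKETCH_NR_IRQS && map->used[irq]);
    return map->lsi[irq];
}
```

Because such a map would belong to the machine rather than to either interrupt model, it would carry over unchanged when CAS switches from XICS to XIVE and the machine is reset.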

> +    obj = object_property_get_link(OBJECT(dev), "ics", &err);
> +    if (!obj) {
> +        error_setg(errp, "%s: required link 'ics' not found: %s",
> +                   __func__, error_get_pretty(err));
> +        return;
> +    }
> +
> +    xive->ics = ICS_BASE(obj);
> +
>      /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>      xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>      xive->sbe = g_malloc0(xive->sbe_size);
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index b17dd4f17b0b..29112589b37f 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -24,6 +24,7 @@
>  typedef struct sPAPRXive sPAPRXive;
>  typedef struct XiveIVE XiveIVE;
>  typedef struct XiveEQ XiveEQ;
> +typedef struct ICSState ICSState;
>  
>  #define TYPE_SPAPR_XIVE "spapr-xive"
>  #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
> @@ -35,6 +36,9 @@ struct sPAPRXive {
>      uint32_t     nr_targets;
>      uint32_t     nr_irqs;
>  
> +    /* IRQ */
> +    ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
> +
>      /* XIVE internal tables */
>      uint8_t      *sbe;
>      uint32_t     sbe_size;

* Re: [Qemu-devel] [RFC PATCH v2 05/21] ppc/xive: allocate IRQ numbers for the IPIs
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 05/21] ppc/xive: allocate IRQ numbers for the IPIs Cédric Le Goater
@ 2017-09-19  2:45   ` David Gibson
  2017-09-19 14:52     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  2:45 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:19PM +0200, Cédric Le Goater wrote:
> The number of IPIs is deduced from the max number of CPUs the guest
> supports and the IRQ numbers for the IPIs are allocated from the top
> of the IRQ number space to reduce conflict with other IRQ numbers
> allocated by the devices.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

This is more ick associated with implementing XIVE in terms of XICS.
We shouldn't need to "allocate" IRQs for the IPIs - they should just
be a fixed set.  And we certainly shouldn't need to set the XICS irq
type for XIVE irqs.
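
A fixed IPI set could be as simple as a pure function of the target number. The sketch below assumes, as the quoted loop does, that the IPIs occupy the last nr_targets numbers of the IRQ space; IrqLayout and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical static layout: the last nr_targets IRQ numbers are the
 * IPIs, one per target, i.e. the range [nr_irqs - nr_targets, nr_irqs)
 * used by the loop in the patch. */
typedef struct IrqLayout {
    uint32_t nr_irqs;
    uint32_t nr_targets;
} IrqLayout;

static uint32_t layout_ipi_base(const IrqLayout *l)
{
    assert(l->nr_targets <= l->nr_irqs);
    return l->nr_irqs - l->nr_targets;
}

/* IPI number for a target: a pure function, so there is nothing to
 * allocate and no allocator state to migrate. */
static uint32_t layout_ipi_irq(const IrqLayout *l, uint32_t target)
{
    assert(target < l->nr_targets);
    return layout_ipi_base(l) + target;
}

static bool layout_is_ipi(const IrqLayout *l, uint32_t irq)
{
    return irq >= layout_ipi_base(l) && irq < l->nr_irqs;
}
```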

> ---
>  hw/intc/spapr_xive.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 1681affb0848..52c32f588d6d 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -58,6 +58,7 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>      sPAPRXive *xive = SPAPR_XIVE(dev);
>      Object *obj;
>      Error *err = NULL;
> +    int i;
>  
>      if (!xive->nr_targets) {
>          error_setg(errp, "Number of interrupt targets needs to be greater 0");
> @@ -80,6 +81,11 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>  
>      xive->ics = ICS_BASE(obj);
>  
> +    /* Allocate the last IRQ numbers for the IPIs */
> +    for (i = xive->nr_irqs - xive->nr_targets; i < xive->nr_irqs; i++) {
> +        ics_set_irq_type(xive->ics, i, false);
> +    }
> +
>      /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>      xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>      xive->sbe = g_malloc0(xive->sbe_size);

* Re: [Qemu-devel] [RFC PATCH v2 06/21] ppc/xive: introduce handlers for interrupt sources
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 06/21] ppc/xive: introduce handlers for interrupt sources Cédric Le Goater
@ 2017-09-19  2:48   ` David Gibson
  2017-09-19 15:08     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  2:48 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:20PM +0200, Cédric Le Goater wrote:
> These are very similar to the XICS handlers in a simpler form. They
> make use of the ICSIRQState array of the XICS interrupt source to
> differentiate the MSI from the LSI interrupts. The spapr_xive_irq()
> routine in charge of triggering the CPU interrupt line will be filled
> later on.
> 
> The next patch will introduce the MMIO handlers to interact with XIVE
> interrupt sources.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_xive.h |  1 +
>  2 files changed, 47 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 52c32f588d6d..1ed7b6a286e9 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -27,6 +27,50 @@
>  
>  #include "xive-internal.h"
>  
> +static void spapr_xive_irq(sPAPRXive *xive, int srcno)
> +{
> +
> +}
> +
> +/*
> + * XIVE Interrupt Source
> + */
> +static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
> +{
> +    if (val) {
> +        spapr_xive_irq(xive, srcno);
> +    }
> +}

So in XICS "srcno" (vs "irq") indicates an offset within a single ICS
object, as opposed to a global irq number.  Does that concept even
exist in XIVE?

> +
> +static void spapr_xive_source_set_irq_lsi(sPAPRXive *xive, int srcno, int val)
> +{
> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
> +
> +    if (val) {
> +        irq->status |= XICS_STATUS_ASSERTED;
> +    } else {
> +        irq->status &= ~XICS_STATUS_ASSERTED;

More mangling a XICS specific object for XIVE operations.  Please
stop.

> +    }
> +
> +    if (irq->status & XICS_STATUS_ASSERTED
> +        && !(irq->status & XICS_STATUS_SENT)) {
> +        irq->status |= XICS_STATUS_SENT;
> +        spapr_xive_irq(xive, srcno);
> +    }
> +}
> +
> +static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
> +
> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
> +        spapr_xive_source_set_irq_lsi(xive, srcno, val);
> +    } else {
> +        spapr_xive_source_set_irq_msi(xive, srcno, val);
> +    }
> +}
> +
>  /*
>   * Main XIVE object
>   */
> @@ -80,6 +124,8 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>      }
>  
>      xive->ics = ICS_BASE(obj);
> +    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
> +                                     xive->nr_irqs);
>  
>      /* Allocate the last IRQ numbers for the IPIs */
>      for (i = xive->nr_irqs - xive->nr_targets; i < xive->nr_irqs; i++) {
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 29112589b37f..eab92c4c1bb8 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -38,6 +38,7 @@ struct sPAPRXive {
>  
>      /* IRQ */
>      ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
> +    qemu_irq     *qirqs;
>  
>      /* XIVE internal tables */
>      uint8_t      *sbe;

* Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE " Cédric Le Goater
@ 2017-09-19  2:57   ` David Gibson
  2017-09-20 12:54     ` Cédric Le Goater
  2017-09-20 13:05     ` Cédric Le Goater
  0 siblings, 2 replies; 90+ messages in thread
From: David Gibson @ 2017-09-19  2:57 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
> Each interrupt source is associated with a two-bit state machine
> called an Event State Buffer (ESB) which is controlled by MMIO to
> trigger events. See code for more details on the states and
> transitions.
> 
> The MMIO space for the ESB translation is 512GB large on baremetal
> (powernv) systems and the BAR depends on the chip id. In our model for
> the sPAPR machine, we choose to only map a sub memory region for the
> provisioned IRQ numbers and to use the mapping address of chip 0 on a
> real system. The OS will get the address of the MMIO page of the ESB
> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.

On bare metal, are the MMIOs for each irq source mapped contiguously?

> For KVM support, we should think of a way to map this QEMU memory
> region in the host to trigger events directly.

This would rely on being able to map them without mapping those for
any other VM or the host.  Does that mean allocating a contiguous (and
aligned) hunk of irqs for a guest?

We're going to need to be careful about irq allocation here.
Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
MMIO addresses, we need the MMIO addresses to be stable and
consistent, because we can't have them change across migration.  We
need to have this consistent between in-qemu and in-KVM XIVE
implementations as well.
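
As a rough illustration of the addressing involved, assuming one 2^esb_shift page per source as in this patch, the ESB page of a source is a fixed function of the region base and the IRQ number, and the MMIO handlers recover the source number and operation from the region-relative address. The helpers below are hypothetical; only the shift and mask arithmetic is taken from spapr_xive_esb_read() and spapr_xive_esb_write(), and whether H_INT_GET_SOURCE_INFO would report exactly esb_page_addr() is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define XIVE_ESB_GET        0x800
#define XIVE_ESB_SET_PQ_00  0xc00
#define XIVE_ESB_SET_PQ_11  0xf00

/* Guest-visible ESB page address of a source (hypothetical helper) */
static uint64_t esb_page_addr(uint64_t esb_base, unsigned esb_shift,
                              uint32_t srcno)
{
    return esb_base + ((uint64_t)srcno << esb_shift);
}

/* Region-relative address -> source number, as in the patch */
static uint32_t esb_srcno(uint64_t addr, unsigned esb_shift)
{
    return (uint32_t)(addr >> esb_shift);
}

/* Region-relative address -> operation offset (0, 0x800, 0xc00, ...) */
static uint32_t esb_offset(uint64_t addr)
{
    return (uint32_t)(addr & 0xF00);
}

/* For the SET_PQ_xx loads, the new PQ value is encoded in bits 8-9 */
static uint8_t esb_set_pq_value(uint32_t offset)
{
    assert(offset >= XIVE_ESB_SET_PQ_00 && offset <= XIVE_ESB_SET_PQ_11);
    return (offset >> 8) & 0x3;
}
```

With this scheme, keeping the addresses stable across migration reduces to keeping esb_base, esb_shift and the IRQ numbers themselves stable.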

> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_xive.h |   6 ++
>  2 files changed, 261 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 1ed7b6a286e9..8a85d64efc4c 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>  }
>  
>  /*
> + * "magic" Event State Buffer (ESB) MMIO offsets.
> + *
> + * Each interrupt source has a 2-bit state machine called ESB
> + * which can be controlled by MMIO. It's made of 2 bits, P and
> + * Q. P indicates that an interrupt is pending (has been sent
> + * to a queue and is waiting for an EOI). Q indicates that the
> + * interrupt has been triggered while pending.
> + *
> + * This acts as a coalescing mechanism in order to guarantee
> + * that a given interrupt only occurs at most once in a queue.
> + *
> + * When doing an EOI, the Q bit will indicate if the interrupt
> + * needs to be re-triggered.
> + *
> + * The following offsets into the ESB MMIO allow one to read or
> + * manipulate the PQ bits. They must be used with an 8-byte
> + * load instruction. They all return the previous state of the
> + * interrupt (atomically).
> + *
> + * Additionally, some ESB pages support doing an EOI via a
> + * store at 0 and some ESBs support doing a trigger via a
> + * separate trigger page.
> + */
> +#define XIVE_ESB_GET            0x800
> +#define XIVE_ESB_SET_PQ_00      0xc00
> +#define XIVE_ESB_SET_PQ_01      0xd00
> +#define XIVE_ESB_SET_PQ_10      0xe00
> +#define XIVE_ESB_SET_PQ_11      0xf00
> +
> +#define XIVE_ESB_VAL_P          0x2
> +#define XIVE_ESB_VAL_Q          0x1
> +
> +#define XIVE_ESB_RESET          0x0
> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
> +
> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
> +{
> +    uint32_t byte = idx / 4;
> +    uint32_t bit  = (idx % 4) * 2;
> +
> +    assert(byte < xive->sbe_size);
> +
> +    return (xive->sbe[byte] >> bit) & 0x3;
> +}
> +
> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)
> +{
> +    uint32_t byte = idx / 4;
> +    uint32_t bit  = (idx % 4) * 2;
> +    uint8_t old, new;
> +
> +    assert(byte < xive->sbe_size);
> +
> +    old = xive->sbe[byte];
> +
> +    new = xive->sbe[byte] & ~(0x3 << bit);
> +    new |= (pq & 0x3) << bit;
> +
> +    xive->sbe[byte] = new;
> +
> +    return (old >> bit) & 0x3;
> +}
> +
> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)
> +{
> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
> +
> +    switch (old_pq) {
> +    case XIVE_ESB_RESET:
> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
> +        return false;
> +    case XIVE_ESB_PENDING:
> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
> +        return false;
> +    case XIVE_ESB_QUEUED:
> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
> +        return true;
> +    case XIVE_ESB_OFF:
> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
> +        return false;
> +    default:
> +         g_assert_not_reached();
> +    }
> +}
> +
> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)
> +{
> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
> +
> +    switch (old_pq) {
> +    case XIVE_ESB_RESET:
> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
> +        return true;
> +    case XIVE_ESB_PENDING:
> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
> +        return true;
> +    case XIVE_ESB_QUEUED:
> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
> +        return true;
> +    case XIVE_ESB_OFF:
> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
> +        return false;
> +    default:
> +         g_assert_not_reached();
> +    }
> +}
> +
> +/*
> + * XIVE Interrupt Source MMIOs
> + */
> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)
> +{
> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
> +
> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
> +        irq->status &= ~XICS_STATUS_SENT;
> +    }
> +}
> +
> +/* TODO: handle second page
> + *
> + * Some HW use a separate page for trigger. We only support the case
> + * in which the trigger can be done in the same page as the EOI.
> + */
> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> +    uint32_t offset = addr & 0xF00;
> +    uint32_t srcno = addr >> xive->esb_shift;
> +    XiveIVE *ive;
> +    uint64_t ret = -1;
> +
> +    ive = spapr_xive_get_ive(xive, srcno);
> +    if (!ive || !(ive->w & IVE_VALID))  {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> +        goto out;

Since there's a whole (4k) page for each source, I wonder if we should
actually map each one as a separate MMIO region to allow us to tweak
the mappings more flexibly.
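
The PQ coalescing rules in the ESB comment above can be exercised in isolation with a small packed-SBE model. This is a minimal sketch with hypothetical names, assuming two PQ bits per source and four sources per byte, as in the patch. Note that, following the coalescing rule stated in that comment, pq_trigger() here does not forward a trigger that arrives in the 10 or 11 state, whereas the quoted spapr_xive_pq_trigger() returns true in both cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PQ_RESET    0x0   /* 00 */
#define PQ_OFF      0x1   /* 01: Q only, source off */
#define PQ_PENDING  0x2   /* 10: sent, waiting for EOI */
#define PQ_QUEUED   0x3   /* 11: re-triggered while pending */

/* Two PQ bits per source, four sources per byte */
typedef struct SbeSketch {
    uint8_t *bytes;
    uint32_t size;
} SbeSketch;

static void sbe_init(SbeSketch *sbe, uint32_t nr_irqs)
{
    sbe->size = (nr_irqs + 3) / 4;          /* DIV_ROUND_UP(nr_irqs, 4) */
    sbe->bytes = calloc(sbe->size, 1);
    assert(sbe->bytes != NULL);
}

static uint8_t pq_get(const SbeSketch *sbe, uint32_t idx)
{
    assert(idx / 4 < sbe->size);
    return (sbe->bytes[idx / 4] >> ((idx % 4) * 2)) & 0x3;
}

/* Store a new PQ value and return the previous one */
static uint8_t pq_set(SbeSketch *sbe, uint32_t idx, uint8_t pq)
{
    uint32_t shift = (idx % 4) * 2;
    uint8_t old = pq_get(sbe, idx);

    sbe->bytes[idx / 4] &= (uint8_t)~(0x3u << shift);
    sbe->bytes[idx / 4] |= (uint8_t)((pq & 0x3u) << shift);
    return old;
}

/* Trigger: returns true if an event must be forwarded to the queue */
static bool pq_trigger(SbeSketch *sbe, uint32_t idx)
{
    switch (pq_get(sbe, idx)) {
    case PQ_RESET:
        pq_set(sbe, idx, PQ_PENDING);
        return true;
    case PQ_PENDING:
    case PQ_QUEUED:
        pq_set(sbe, idx, PQ_QUEUED);    /* coalesce: remember it in Q */
        return false;
    default:                            /* PQ_OFF: drop the event */
        return false;
    }
}

/* EOI: returns true if the source must be re-triggered (Q was set) */
static bool pq_eoi(SbeSketch *sbe, uint32_t idx)
{
    switch (pq_get(sbe, idx)) {
    case PQ_QUEUED:
        pq_set(sbe, idx, PQ_PENDING);
        return true;
    case PQ_PENDING:
        pq_set(sbe, idx, PQ_RESET);
        return false;
    default:                            /* PQ_RESET and PQ_OFF unchanged */
        return false;
    }
}
```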

> +    }
> +
> +    switch (offset) {
> +    case 0:
> +        spapr_xive_source_eoi(xive, srcno);
> +
> +        /* return TRUE or FALSE depending on PQ value */
> +        ret = spapr_xive_pq_eoi(xive, srcno);
> +        break;
> +
> +    case XIVE_ESB_GET:
> +        ret = spapr_xive_pq_get(xive, srcno);
> +        break;
> +
> +    case XIVE_ESB_SET_PQ_00:
> +    case XIVE_ESB_SET_PQ_01:
> +    case XIVE_ESB_SET_PQ_10:
> +    case XIVE_ESB_SET_PQ_11:
> +        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
> +    }
> +
> +out:
> +    return ret;
> +}
> +
> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,
> +                           uint64_t value, unsigned size)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> +    uint32_t offset = addr & 0xF00;
> +    uint32_t srcno = addr >> xive->esb_shift;
> +    XiveIVE *ive;
> +    bool notify = false;
> +
> +    ive = spapr_xive_get_ive(xive, srcno);
> +    if (!ive || !(ive->w & IVE_VALID))  {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> +        return;
> +    }
> +
> +    switch (offset) {
> +    case 0:
> +        /* TODO: should we trigger even if the IVE is masked ? */
> +        notify = spapr_xive_pq_trigger(xive, srcno);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
> +                      offset);
> +        return;
> +    }
> +
> +    if (notify && !(ive->w & IVE_MASKED)) {
> +        qemu_irq_pulse(xive->qirqs[srcno]);
> +    }
> +}
> +
> +static const MemoryRegionOps spapr_xive_esb_ops = {
> +    .read = spapr_xive_esb_read,
> +    .write = spapr_xive_esb_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +};
> +
> +/*
>   * XIVE Interrupt Source
>   */
>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
> @@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
>  /*
>   * Main XIVE object
>   */
> +#define P9_MMIO_BASE     0x006000000000000ull
> +
> +/* VC BAR contains set translations for the ESBs and the EQs. */
> +#define VC_BAR_DEFAULT   0x10000000000ull
> +#define VC_BAR_SIZE      0x08000000000ull
> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
> +
> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
> +                                            unsigned size)
> +{
> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
> +                  __func__, offset, size);
> +    return 0;
> +}
> +
> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,
> +                                         uint64_t value, unsigned size)
> +{
> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
> +                  __func__, offset, value, size);
> +}
> +
> +static const MemoryRegionOps spapr_xive_esb_default_ops = {
> +    .read = spapr_xive_esb_default_read,
> +    .write = spapr_xive_esb_default_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +};
>  
>  void spapr_xive_reset(void *dev)
>  {
> @@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>      xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;
>      xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));
>  
> +    /* VC BAR. That's the full window but we will only map the
> +     * subregions in use. */
> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
> +    xive->esb_shift = ESB_SHIFT;
> +
> +    /* Install default memory region handlers to log bogus access */
> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,
> +                          NULL, "xive.esb.full", VC_BAR_SIZE);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);
> +
> +    /* Install the ESB memory region in the overall one */
> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,
> +                          xive, "xive.esb",
> +                          (1ull << xive->esb_shift) * xive->nr_irqs);
> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
> +
>      qemu_register_reset(spapr_xive_reset, dev);
>  }
>  
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index eab92c4c1bb8..0f516534d76a 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -46,6 +46,12 @@ struct sPAPRXive {
>      XiveIVE      *ivt;
>      XiveEQ       *eqt;
>      uint32_t     nr_eqs;
> +
> +    /* ESB memory region */
> +    uint32_t     esb_shift;
> +    hwaddr       esb_base;
> +    MemoryRegion esb_mr;
> +    MemoryRegion esb_iomem;
>  };
>  
>  #endif /* PPC_SPAPR_XIVE_H */
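
A quick check of the window arithmetic in spapr_xive_realize(), using the constants from this patch (the helper names are hypothetical): the full VC BAR is 512 GiB, and with one 64 KiB page per source the mapped xive.esb subregion grows by 64 KiB per IRQ, so the window can hold at most 2^23 sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the patch */
#define P9_MMIO_BASE     0x006000000000000ull
#define VC_BAR_DEFAULT   0x10000000000ull
#define VC_BAR_SIZE      0x08000000000ull   /* 512 GiB */
#define ESB_SHIFT        16                 /* one 64 KiB page per source */

/* Base of the ESB window, as computed for xive->esb_base */
static uint64_t esb_base(void)
{
    return P9_MMIO_BASE | VC_BAR_DEFAULT;
}

/* Size of the "xive.esb" subregion mapped for nr_irqs sources */
static uint64_t esb_window_size(uint32_t nr_irqs)
{
    return (1ull << ESB_SHIFT) * nr_irqs;
}

/* The subregion must fit inside the "xive.esb.full" VC BAR window */
static bool esb_window_fits(uint32_t nr_irqs)
{
    assert(ESB_SHIFT < 64);
    return esb_window_size(nr_irqs) <= VC_BAR_SIZE;
}
```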

* Re: [Qemu-devel] [RFC PATCH v2 09/21] ppc/xive: extend the interrupt presenter model for XIVE
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 09/21] ppc/xive: extend the interrupt presenter model for XIVE Cédric Le Goater
@ 2017-09-19  7:36   ` David Gibson
  2017-09-19 19:28     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  7:36 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:23PM +0200, Cédric Le Goater wrote:
> The XIVE interrupt presenter exposes a set of Thread Interrupt
> Management Areas, also called rings, one per different level of
> privilege (four in all). This area is used to handle priority
> management and interrupt acknowledgment among other things.
> 
> We extend the ICPState object with a cache of the register data for
> XIVE. The integration with the sPAPR machine is much easier and we
> need a common framework to switch from one controller model to
> another: XICS <-> XIVE.

This sounds like an even worse idea than referencing the ICS state.
The TIMA really needs to be managed by a different object than the ICP.
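
A separate per-CPU object for the thread context could keep the same register layout without touching ICPState. The sketch below is hypothetical (XiveTCtxSketch and tctx_ring() are illustrative names); it assumes the four 16-byte rings of the patch, with the OS ring at offset 0x10 as in icp->tima_os, and labels the rings QW0 to QW3 in the usual XIVE order (user, OS, pool, physical).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU thread interrupt context, kept apart from the
 * XICS ICPState. Layout follows the patch: four rings of 16 bytes. */
#define XIVE_TM_RING_COUNT 4
#define XIVE_TM_RING_SIZE  0x10

enum {
    TM_RING_USER = 0,   /* QW0 */
    TM_RING_OS   = 1,   /* QW1, the only ring sPAPR needs here */
    TM_RING_POOL = 2,   /* QW2 */
    TM_RING_PHYS = 3,   /* QW3 */
};

typedef struct XiveTCtxSketch {
    uint8_t regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
} XiveTCtxSketch;

/* Pointer to the 16-byte register block of one ring */
static uint8_t *tctx_ring(XiveTCtxSketch *tctx, int ring)
{
    assert(ring >= 0 && ring < XIVE_TM_RING_COUNT);
    return &tctx->regs[ring * XIVE_TM_RING_SIZE];
}

/* Reset all rings, as icp_reset() does for icp->tima in the patch */
static void tctx_reset(XiveTCtxSketch *tctx)
{
    memset(tctx->regs, 0, sizeof(tctx->regs));
}
```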

> The next patch will introduce the MMIO handlers to interact with the
> TIMA, OS only, which is required for the sPAPR support.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xics.c        | 4 ++++
>  include/hw/ppc/xics.h | 6 ++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index a84ba51ad8ff..927d4fec966a 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -274,6 +274,7 @@ static const VMStateDescription vmstate_icp_server = {
>          VMSTATE_UINT32(xirr, ICPState),
>          VMSTATE_UINT8(pending_priority, ICPState),
>          VMSTATE_UINT8(mfrr, ICPState),
> +        VMSTATE_UINT8_ARRAY(tima, ICPState, 0x40),
>          VMSTATE_END_OF_LIST()
>      },
>  };
> @@ -293,6 +294,7 @@ static void icp_reset(void *dev)
>      if (icpc->reset) {
>          icpc->reset(icp);
>      }
> +    memset(icp->tima, 0, sizeof(icp->tima));
>  }
>  
>  static void icp_realize(DeviceState *dev, Error **errp)
> @@ -343,6 +345,8 @@ static void icp_realize(DeviceState *dev, Error **errp)
>          icpc->realize(icp, errp);
>      }
>  
> +    icp->tima_os = &icp->tima[0x10];
> +
>      qemu_register_reset(icp_reset, dev);
>      vmstate_register(NULL, icp->cs->cpu_index, &vmstate_icp_server, icp);
>  }
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 28d248abad61..c835997303c4 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -83,6 +83,12 @@ struct ICPState {
>      qemu_irq output;
>  
>      XICSFabric *xics;
> +
> +    /* XIVE section */
> +#define XIVE_TM_RING_COUNT 4
> +
> +    uint8_t tima[XIVE_TM_RING_COUNT * 0x10];
> +    uint8_t *tima_os;
>  };
>  
>  #define ICP_PROP_XICS "xics"

* Re: [Qemu-devel] [RFC PATCH v2 11/21] ppc/xive: push the EQ data in OS event queue
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 11/21] ppc/xive: push the EQ data in OS event queue Cédric Le Goater
@ 2017-09-19  7:45   ` David Gibson
  2017-09-19 19:36     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  7:45 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:25PM +0200, Cédric Le Goater wrote:
> If a triggered event is let through, the Event Queue data defined in
> the associated IVE is pushed into the in-memory event queue. The latter
> is a circular buffer provided by the OS using the H_INT_SET_QUEUE_CONFIG
> hcall, one per (target, priority) pair. It is composed of Event
> Queue entries which are 4 bytes long, the first bit being a
> 'generation' bit and the 31 following bits the EQ Data field.
> 
> The EQ Data field provides a way to set an invariant logical event
> source number for an IRQ. It is set with the H_INT_SET_SOURCE_CONFIG
> hcall.
> 
> Notification of the CPU will be done in the following patch.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 67 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 557a7e2535b5..4bc61cfda67a 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -175,9 +175,76 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
>      },
>  };
>  
> +static void spapr_xive_eq_push(XiveEQ *eq, uint32_t data)
> +{
> +    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
> +    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
> +    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
> +    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
> +
> +    uint64_t qaddr = qaddr_base + (qindex << 2);
> +    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
> +    uint32_t qentries = 1 << (qsize + 10);
> +
> +    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
> +                      HWADDR_PRIx "\n", __func__, qaddr);
> +        return;
> +    }
> +
> +    qindex = (qindex + 1) % qentries;
> +    if (qindex == 0) {
> +        qgen ^= 1;
> +        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
> +    }
> +    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
> +}
> +
>  static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>  {
> +    XiveIVE *ive;
> +    XiveEQ *eq;
> +    uint32_t eq_idx;
> +    uint32_t priority;
> +
> +    ive = spapr_xive_get_ive(xive, srcno);
> +    if (!ive || !(ive->w & IVE_VALID)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> +        return;
> +    }
> +
> +    if (ive->w & IVE_MASKED) {
> +        return;
> +    }
> +
> +    /* Find our XiveEQ */
> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
> +    eq = spapr_xive_get_eq(xive, eq_idx);
> +    if (!eq) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No EQ for LISN %d\n", srcno);
> +        return;
> +    }
> +
> +    if (eq->w0 & EQ_W0_ENQUEUE) {
> +        spapr_xive_eq_push(eq, GETFIELD(IVE_EQ_DATA, ive->w));
> +    } else {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: !ENQUEUE not implemented\n");
> +    }
> +
> +    if (!(eq->w0 & EQ_W0_UCOND_NOTIFY)) {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
> +    }
> +
> +    if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
> +        priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
>  
> +        /* The EQ is masked. Can this happen ?  */
> +        if (priority == 0xff) {
> +            return;

How does the 8-bit priority field here interact with the 3-bit
priority which selects which EQ to use?

> +        }
> +    } else {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
> +    }
>  }
>  
>  /*
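
The circular-buffer behaviour described in the commit message, with the generation bit flipping on every wrap, can be modelled on a plain array. This is a minimal sketch, not the patch code: the names are hypothetical, the queue is a host array rather than guest memory written with dma_memory_write(), entries are kept in host byte order, and it assumes the producer starts at generation 1 over a zeroed queue so that the consumer can tell stale entries from new ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Event queue: 4-byte entries, bit 31 is the generation bit and
 * bits 0-30 carry the EQ data. No overflow check, as in the patch. */
typedef struct EqSketch {
    uint32_t *ring;
    uint32_t nr_entries;   /* 1 << (qsize + 10) in the patch */
    uint32_t index;        /* producer index, cf. EQ_W1_PAGE_OFF */
    uint32_t gen;          /* producer generation, cf. EQ_W1_GENERATION */
} EqSketch;

static void eq_push(EqSketch *eq, uint32_t data)
{
    assert(eq->index < eq->nr_entries);
    eq->ring[eq->index] = (eq->gen << 31) | (data & 0x7fffffff);
    eq->index = (eq->index + 1) % eq->nr_entries;
    if (eq->index == 0) {
        eq->gen ^= 1;      /* wrapped: flip the generation */
    }
}

/* Consumer state: next index to read and the generation it expects */
typedef struct EqReader {
    uint32_t index;
    uint32_t gen;
} EqReader;

/* Returns false when the next entry is stale, i.e. the queue is empty */
static bool eq_pop(const EqSketch *eq, EqReader *rd, uint32_t *data)
{
    uint32_t entry = eq->ring[rd->index];

    if ((entry >> 31) != rd->gen) {
        return false;
    }
    *data = entry & 0x7fffffff;
    rd->index = (rd->index + 1) % eq->nr_entries;
    if (rd->index == 0) {
        rd->gen ^= 1;
    }
    return true;
}
```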

* Re: [Qemu-devel] [RFC PATCH v2 12/21] ppc/xive: notify the CPU when interrupt priority is more privileged
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 12/21] ppc/xive: notify the CPU when interrupt priority is more privileged Cédric Le Goater
@ 2017-09-19  7:50   ` David Gibson
  0 siblings, 0 replies; 90+ messages in thread
From: David Gibson @ 2017-09-19  7:50 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:26PM +0200, Cédric Le Goater wrote:
> The Pending Interrupt Priority Register (PIPR) contains the priority
> of the most favored pending notification. It is calculated from the
> Interrupt Pending Buffer (IPB) which indicates a pending interrupt at
> the priority corresponding to the bit number.
> 
> If the PIPR is more favored (1) than the Current Processor Priority
> Register (CPPR), the CPU interrupt line can be raised and the EO bit
> of the Notification Source Register is updated to notify the presence
> of an exception for the O/S. The check needs to be done whenever the
> PIPR or the CPPR is changed.
> 
> (1) numerically less than
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 4bc61cfda67a..e5d4b723b7e0 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -28,11 +28,39 @@
>  #include "xive-internal.h"
>  
>  
> +/* Convert a priority number to an Interrupt Pending Buffer (IPB)
> + * register, which indicates a pending interrupt at the priority
> + * corresponding to the bit number
> + */
> +static uint8_t priority_to_ipb(uint8_t priority)
> +{
> +    return priority > XIVE_PRIORITY_MAX ? 0 :  1 << (7 - priority);
> +}
> +
> +/* Convert an Interrupt Pending Buffer (IPB) register to a Pending
> + * Interrupt Priority Register (PIPR), which contains the priority of
> + * the most favored pending notification.
> + *
> + * TODO: PIPR can never be OxFF. Needs a fix.
> + */
> +static uint8_t ipb_to_pipr(uint8_t ibp)
> +{
> +    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
> +}
> +
>  static uint64_t spapr_xive_icp_accept(ICPState *icp)
>  {
>      return 0;
>  }
>  
> +static void spapr_xive_icp_notify(ICPState *icp)
> +{
> +    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
> +        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
> +        qemu_irq_raise(ICP(icp)->output);

The CPU interrupt lines are effectively level sensitive, but you never
lower this, AFAICT.

> +    }
> +}
> +
>  static void spapr_xive_icp_set_cppr(ICPState *icp, uint8_t cppr)
>  {
>      if (cppr > XIVE_PRIORITY_MAX) {
> @@ -40,6 +68,10 @@ static void spapr_xive_icp_set_cppr(ICPState *icp, uint8_t cppr)
>      }
>  
>      icp->tima_os[TM_CPPR] = cppr;
> +
> +    /* CPPR has changed, inform the ICP which might raise an
> +     * exception */
> +    spapr_xive_icp_notify(icp);
>  }
>  
>  /*
> @@ -206,6 +238,8 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>      XiveEQ *eq;
>      uint32_t eq_idx;
>      uint32_t priority;
> +    uint32_t target;
> +    ICPState *icp;
>  
>      ive = spapr_xive_get_ive(xive, srcno);
>      if (!ive || !(ive->w & IVE_VALID)) {
> @@ -235,6 +269,13 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>          qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
>      }
>  
> +    target = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
> +    icp = xics_icp_get(xive->ics->xics, target);
> +    if (!icp) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for target %d\n", target);
> +        return;
> +    }
> +
>      if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
>          priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
>  
> @@ -242,9 +283,18 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>          if (priority == 0xff) {
>              return;
>          }
> +
> +        /* Update the IPB (Interrupt Pending Buffer) with the priority
> +         * of the new notification and inform the ICP, which will
> +         * decide to raise the exception, or not, depending the CPPR.
> +         */
> +        icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
> +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
>      } else {
>          qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
>      }
> +
> +    spapr_xive_icp_notify(icp);
>  }
>  
>  /*

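The concern above is that the CPU output line is level sensitive, so whatever raises it must also be able to lower it. A minimal sketch, assuming a stand-alone mock of the OS ring (the `MockICP` type, `icp_update_output()`, `icp_notify()`, `icp_set_cppr()` and the register offsets are hypothetical, not QEMU's): the line level is recomputed from PIPR and CPPR whenever either one changes, instead of only ever being raised.

```c
#include <stdint.h>

/* Hypothetical byte offsets into the OS ring; QEMU's TM_* values differ. */
enum { TM_NSR, TM_CPPR, TM_IPB, TM_PIPR, TM_OS_SIZE };

#define TM_QW1_NSR_EO      0x80 /* assumed: EO is the top bit of the NSR */
#define XIVE_PRIORITY_MAX  7

typedef struct MockICP {
    uint8_t tima_os[TM_OS_SIZE];
    int output;                 /* stands in for the qemu_irq line level */
} MockICP;

/* Priority 0 (most favored) maps to IPB bit 0x80, priority 7 to 0x01. */
static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > XIVE_PRIORITY_MAX ? 0 : (uint8_t)(1 << (7 - priority));
}

/* Most favored pending priority, or 0xff when nothing is pending. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    for (uint8_t prio = 0; prio <= XIVE_PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/* Level-sensitive update: drive the line from the current PIPR/CPPR. */
static void icp_update_output(MockICP *icp)
{
    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
        icp->output = 1;
    } else {
        icp->output = 0;  /* EO is left for the accept path to clear */
    }
}

/* A new notification at 'priority' sets its IPB bit and re-evaluates. */
static void icp_notify(MockICP *icp, uint8_t priority)
{
    icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
    icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
    icp_update_output(icp);
}

/* A CPPR write also re-evaluates, so it can lower the line as well. */
static void icp_set_cppr(MockICP *icp, uint8_t cppr)
{
    icp->tima_os[TM_CPPR] = cppr;
    icp_update_output(icp);
}
```

Dropping the line when a CPPR write masks the pending priority, as `icp_set_cppr()` does here, is one possible choice and should be checked against the XIVE documentation.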

* Re: [Qemu-devel] [RFC PATCH v2 13/21] ppc/xive: handle interrupt acknowledgment by the O/S
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 13/21] ppc/xive: handle interrupt acknowledgment by the O/S Cédric Le Goater
@ 2017-09-19  7:53   ` David Gibson
  2017-09-20  9:40     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  7:53 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:27PM +0200, Cédric Le Goater wrote:
> When an O/S Exception is raised, the O/S acknowledges the interrupt
> with a special read in the TIMA. If the EO bit of the Notification
> Source Register (NSR) is set (as it should be), the Current Processor
> Priority Register (CPPR) takes the value of the Pending Interrupt
> Priority Register (PIPR), which contains the priority of the most
> favored pending notification. The bit number corresponding to the
> priority of the pending interrupt is reset in the Interrupt Pending
> Buffer (IPB), and so is the EO bit of the NSR.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index e5d4b723b7e0..ad3ff91b13ea 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -50,7 +50,24 @@ static uint8_t ipb_to_pipr(uint8_t ibp)
>  
>  static uint64_t spapr_xive_icp_accept(ICPState *icp)
>  {
> -    return 0;
> +    uint8_t nsr = icp->tima_os[TM_NSR];
> +
> +    qemu_irq_lower(icp->output);

Ah, here's the lower.  This should not be in a different patch from
the matching raise.  Plus, this doesn't seem right.  Shouldn't this
recheck the CPPR against the PIPR, in case a higher priority irq has
been delivered since the one the cpu is acking?

> +    if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
> +        uint8_t cppr = icp->tima_os[TM_PIPR];
> +
> +        icp->tima_os[TM_CPPR] = cppr;
> +
> +        /* Reset the pending buffer bit */
> +        icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
> +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> +
> +        /* Drop Exception bit for OS */
> +        icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
> +    }
> +
> +    return (nsr << 8) | icp->tima_os[TM_CPPR];
>  }
>  
>  static void spapr_xive_icp_notify(ICPState *icp)

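The recheck suggested above can be sketched with the same kind of mock (hypothetical `MockICP` type, helpers and offsets, not QEMU's definitions): the accept path consumes the most favored pending priority exactly as in the patch, then derives the line level from PIPR and CPPR instead of lowering it unconditionally.

```c
#include <stdint.h>

/* Hypothetical OS-ring offsets and bit values, not QEMU's definitions. */
enum { TM_NSR, TM_CPPR, TM_IPB, TM_PIPR, TM_OS_SIZE };

#define TM_QW1_NSR_EO      0x80 /* assumed: EO is the top bit of the NSR */
#define XIVE_PRIORITY_MAX  7

typedef struct MockICP {
    uint8_t tima_os[TM_OS_SIZE];
    int output;                 /* stands in for the qemu_irq line level */
} MockICP;

static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > XIVE_PRIORITY_MAX ? 0 : (uint8_t)(1 << (7 - priority));
}

static uint8_t ipb_to_pipr(uint8_t ipb)
{
    for (uint8_t prio = 0; prio <= XIVE_PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

static void icp_update_output(MockICP *icp)
{
    if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
        icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
        icp->output = 1;
    } else {
        icp->output = 0;
    }
}

/* Acknowledge: consume the most favored pending priority, then recheck. */
static uint16_t icp_accept(MockICP *icp)
{
    uint8_t nsr = icp->tima_os[TM_NSR];

    if (nsr & TM_QW1_NSR_EO) {
        uint8_t cppr = icp->tima_os[TM_PIPR];

        icp->tima_os[TM_CPPR] = cppr;
        icp->tima_os[TM_IPB] &= (uint8_t)~priority_to_ipb(cppr);
        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
        icp->tima_os[TM_NSR] &= (uint8_t)~TM_QW1_NSR_EO;
    }

    /* Instead of an unconditional lower, re-evaluate the line level. */
    icp_update_output(icp);

    return (uint16_t)((nsr << 8) | icp->tima_os[TM_CPPR]);
}
```

In this sequential mock the line normally drops after an accept, since everything left in the IPB is less favored than the new CPPR; the point of the recheck is that the level is always derived from PIPR and CPPR rather than assumed.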

* Re: [Qemu-devel] [RFC PATCH v2 14/21] ppc/xive: add support for the SET_OS_PENDING command
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 14/21] ppc/xive: add support for the SET_OS_PENDING command Cédric Le Goater
@ 2017-09-19  7:55   ` David Gibson
  2017-09-20  9:47     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  7:55 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:28PM +0200, Cédric Le Goater wrote:
> Adjusting the Interrupt Pending Buffer for the O/S would allow a CPU
> to process event queues of other priorities during one physical
> interrupt cycle. This is not currently used by the XIVE support for
> sPAPR in Linux, but it is used by the hypervisor.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index ad3ff91b13ea..ad3f03e37401 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -162,7 +162,14 @@ static bool spapr_xive_tm_is_readonly(uint8_t index)
>  static void spapr_xive_tm_write_special(ICPState *icp, hwaddr offset,
>                                    uint64_t value, unsigned size)
>  {
> -    /* TODO: support TM_SPC_SET_OS_PENDING */
> +    if (offset == TM_SPC_SET_OS_PENDING && size == 1) {
> +        icp->tima_os[TM_IPB] |= priority_to_ipb(value & 0xff);
> +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);

This only lets the cpu raise bits in the IPB, never clear them.  Is
that right?  I don't see how you'd implement the handling of multiple
priorities without being able to clear bits here.

> +        spapr_xive_icp_notify(icp);
> +    } else {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
> +                      HWADDR_PRIx" size %d\n", offset, size);
> +    }
>  
>      /* TODO: support TM_SPC_ACK_OS_EL */
>  }

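The question above can be made concrete with a small model of the write path as the patch implements it. A sketch under stated assumptions: `MockOSRing` and `tm_write_special()` are hypothetical stand-ins for the ICPState fields and `spapr_xive_tm_write_special()`, and the value given to `TM_SPC_SET_OS_PENDING` is a placeholder. Because the IPB update is an OR, successive writes only accumulate pending priorities; no value written here can clear a bit.

```c
#include <stdint.h>

/* Placeholder value; the real offset comes from xive-internal.h. */
#define TM_SPC_SET_OS_PENDING  0x812
#define XIVE_PRIORITY_MAX      7

/* Hypothetical stand-in for the OS-ring IPB and PIPR bytes. */
typedef struct MockOSRing {
    uint8_t ipb;
    uint8_t pipr;
} MockOSRing;

static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > XIVE_PRIORITY_MAX ? 0 : (uint8_t)(1 << (7 - priority));
}

static uint8_t ipb_to_pipr(uint8_t ipb)
{
    for (uint8_t prio = 0; prio <= XIVE_PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/* Returns 0 if the write was handled, -1 for an invalid access. */
static int tm_write_special(MockOSRing *os, uint64_t offset,
                            uint64_t value, unsigned size)
{
    if (offset == TM_SPC_SET_OS_PENDING && size == 1) {
        /* OR-ing can only add pending priorities, never remove them. */
        os->ipb |= priority_to_ipb(value & 0xff);
        os->pipr = ipb_to_pipr(os->ipb);
        return 0;
    }
    return -1;
}
```

In the posted series, IPB bits are only cleared by the accept path of patch 13.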

* Re: [Qemu-devel] [RFC PATCH v2 15/21] spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 15/21] spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument Cédric Le Goater
@ 2017-09-19  7:56   ` David Gibson
  2017-09-20  9:49     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  7:56 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:29PM +0200, Cédric Le Goater wrote:
> This adds some flexibility in the definition of the number of
> available IRQs used in a sPAPR machine.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

This doesn't seem sensible.  You've already stated that the XIVE and
XICS need equivalent irq number spaces, so in particular they should
have the same number of irqs advertised.

> ---
>  hw/ppc/spapr.c              | 2 +-
>  hw/ppc/spapr_pci.c          | 4 ++--
>  include/hw/pci-host/spapr.h | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3e3ff1fbc988..5d69df928434 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1093,7 +1093,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>      }
>  
>      QLIST_FOREACH(phb, &spapr->phbs, list) {
> -        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
> +        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, XICS_IRQS_SPAPR);
>          if (ret < 0) {
>              error_report("couldn't setup PCI devices in fdt");
>              exit(1);
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index d84abf1070a0..05b0a067458e 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -2073,7 +2073,7 @@ static void spapr_phb_pci_enumerate(sPAPRPHBState *phb)
>  
>  int spapr_populate_pci_dt(sPAPRPHBState *phb,
>                            uint32_t xics_phandle,
> -                          void *fdt)
> +                          void *fdt, int nr_irqs)
>  {
>      int bus_off, i, j, ret;
>      char nodename[FDT_NAME_MAX];
> @@ -2142,7 +2142,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>      _FDT(fdt_setprop(fdt, bus_off, "ranges", &ranges, sizeof_ranges));
>      _FDT(fdt_setprop(fdt, bus_off, "reg", &bus_reg, sizeof(bus_reg)));
>      _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pci-config-space-type", 0x1));
> -    _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", XICS_IRQS_SPAPR));
> +    _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", nr_irqs));
>  
>      /* Dynamic DMA window */
>      if (phb->ddw_enabled) {
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 38470b2f0e5c..40146f72c103 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -115,7 +115,7 @@ PCIHostState *spapr_create_phb(sPAPRMachineState *spapr, int index);
>  
>  int spapr_populate_pci_dt(sPAPRPHBState *phb,
>                            uint32_t xics_phandle,
> -                          void *fdt);
> +                          void *fdt, int nr_irqs);
>  
>  void spapr_pci_rtas_init(void);
>  


* Re: [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
  2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (20 preceding siblings ...)
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 21/21] spapr: activate XIVE exploitation mode Cédric Le Goater
@ 2017-09-19  8:20 ` David Gibson
  2017-09-19  8:46   ` David Gibson
  21 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  8:20 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> negotiation process determines whether the guest operates with an
> interrupt controller using the XICS legacy model, as found on POWER8,
> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
> 
> Follows a model for the XIVE interrupt controller and support for the
> Hypervisor's calls which are used to configure the interrupt sources
> and the event/notification queues of the guest. The last patch
> integrates XIVE in the sPAPR machine.
> 
> Code is here:


An overall comment:

I note in several replies here that I think the way XICS objects are
re-used for XIVE is really ugly, and I think it will make future
maintenance pretty painful.

I'm thinking maybe trying to support the CAS negotiation of interrupt
controller from day 1 is warping the design.  A better approach might
be first to implement XIVE only when given a specific machine option -
guest gets one or the other and can't negotiate.

That should allow a more natural XIVE design to emerge, *then* we can
look at what's necessary to make boot-time negotiation possible.
> 
>   https://github.com/legoater/qemu/commits/xive
> 
> Caveats :
> 
>  - IRQ allocator : making progress
> 
>    The sPAPR machine makes use of the interrupt controller very early
>    in the initialization sequence to allocate IRQ numbers and populate
>    the device tree. CAS requires XIVE to be able to switch interrupt
>    model and consequently have the models share a common IRQ allocator.   
> 
>    I have chosen to link the sPAPR XICS interrupt source into XIVE to
>    share the ICSIRQState array which acts as an IRQ allocator. This
>    can be improved.
> 
>  - Interrupt presenter :
> 
>    The register data is directly stored under the ICPState structure
>    which is shared with all other sPAPR interrupt controller models.
> 
>  - KVM support : not addressed yet
> 
>    The guest needs to be run with kernel_irqchip=off on a POWER9 system.
> 
>  - LSI : lightly tested.
>    
> Thanks,
> 
> C.
> 
> Changes since RFC v1:
> 
>  - removed initial complexity due to a tentative try to support
>    PowerNV. This will come later.
>  - removed specific XIVE interrupt source and presenter models
>  - renamed files and typedefs
>  - removed print_info() handler
>  - introduced a CAS reset to rebuild the device tree
>  - linked the XIVE model with the sPAPR XICS interrupt source to share
>    the IRQ allocator   
>  - improved hcall support (still some missing but they are not used
>    under Linux)
>  - improved device tree
>  - should have addressed comments in first RFC
>  - and much more ... Next version should have a better changelog.
>  
> 
> Cédric Le Goater (21):
>   ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
>   migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC
>   ppc/xive: define the XIVE internal tables
>   ppc/xive: provide a link to the sPAPR ICS object under XIVE
>   ppc/xive: allocate IRQ numbers for the IPIs
>   ppc/xive: introduce handlers for interrupt sources
>   ppc/xive: add MMIO handlers for the XIVE interrupt sources
>   ppc/xive: describe the XIVE interrupt source flags
>   ppc/xive: extend the interrupt presenter model for XIVE
>   ppc/xive: add MMIO handlers for the XIVE TIMA
>   ppc/xive: push the EQ data in OS event queue
>   ppc/xive: notify the CPU when interrupt priority is more privileged
>   ppc/xive: handle interrupt acknowledgment by the O/S
>   ppc/xive: add support for the SET_OS_PENDING command
>   spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument
>   spapr: add a XIVE object to the sPAPR machine
>   ppc/xive: add hcalls support
>   ppc/xive: add device tree support
>   ppc/xive: introduce a helper to map the XIVE memory regions
>   ppc/xics: introduce a qirq_get() helper in the XICSFabric
>   spapr: activate XIVE exploitation mode
> 
>  default-configs/ppc64-softmmu.mak |   1 +
>  hw/intc/Makefile.objs             |   1 +
>  hw/intc/spapr_xive.c              | 821 +++++++++++++++++++++++++++++++++
>  hw/intc/spapr_xive_hcall.c        | 930 ++++++++++++++++++++++++++++++++++++++
>  hw/intc/xics.c                    |  11 +-
>  hw/intc/xive-internal.h           | 189 ++++++++
>  hw/ppc/spapr.c                    | 110 ++++-
>  hw/ppc/spapr_hcall.c              |   6 +
>  hw/ppc/spapr_pci.c                |   4 +-
>  include/hw/pci-host/spapr.h       |   2 +-
>  include/hw/ppc/spapr.h            |  17 +-
>  include/hw/ppc/spapr_xive.h       |  75 +++
>  include/hw/ppc/xics.h             |   7 +
>  include/migration/vmstate.h       |  10 +
>  14 files changed, 2169 insertions(+), 15 deletions(-)
>  create mode 100644 hw/intc/spapr_xive.c
>  create mode 100644 hw/intc/spapr_xive_hcall.c
>  create mode 100644 hw/intc/xive-internal.h
>  create mode 100644 include/hw/ppc/spapr_xive.h
> 


* Re: [Qemu-devel] [RFC PATCH v2 16/21] spapr: add a XIVE object to the sPAPR machine
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 16/21] spapr: add a XIVE object to the sPAPR machine Cédric Le Goater
@ 2017-09-19  8:38   ` David Gibson
  2017-09-20  9:51     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  8:38 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:30PM +0200, Cédric Le Goater wrote:
> If the machine supports XIVE (POWER9 CPU), create a XIVE object. The
> CAS negotiation process will decide which model (legacy or XIVE) will
> be used for the interrupt controller depending on the guest
> capabilities.
> 
> Also extend the number of provisioned IRQs by the number of CPUs;
> this is required for XIVE, which allocates one IRQ number for each IPI.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/ppc/spapr.c         | 63 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  include/hw/ppc/spapr.h |  2 ++
>  2 files changed, 63 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5d69df928434..b6577dbecdea 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -44,6 +44,7 @@
>  #include "mmu-hash64.h"
>  #include "mmu-book3s-v3.h"
>  #include "qom/cpu.h"
> +#include "target/ppc/cpu-models.h"
>  
>  #include "hw/boards.h"
>  #include "hw/ppc/ppc.h"
> @@ -54,6 +55,7 @@
>  #include "hw/ppc/spapr_vio.h"
>  #include "hw/pci-host/spapr.h"
>  #include "hw/ppc/xics.h"
> +#include "hw/ppc/spapr_xive.h"
>  #include "hw/pci/msi.h"
>  
>  #include "hw/pci/pci.h"
> @@ -202,6 +204,35 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
>      }
>  }
>  
> +static sPAPRXive *spapr_spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
> +                               int nr_servers, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    Object *obj;
> +
> +    obj = object_new(TYPE_SPAPR_XIVE);
> +    object_property_add_child(OBJECT(spapr), "xive", obj, &error_abort);
> +    object_property_add_const_link(obj, "ics", OBJECT(spapr->ics),
> +                                   &error_abort);
> +    object_property_set_int(obj, nr_irqs, "nr-irqs",  &local_err);
> +    if (local_err) {
> +        goto error;
> +    }
> +    object_property_set_int(obj, nr_servers, "nr-targets", &local_err);
> +    if (local_err) {
> +        goto error;
> +    }
> +    object_property_set_bool(obj, true, "realized", &local_err);
> +    if (local_err) {
> +        goto error;
> +    }
> +
> +    return SPAPR_XIVE(obj);
> +error:
> +    error_propagate(errp, local_err);
> +    return NULL;
> +}
> +
>  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
>                                    int smt_threads)
>  {
> @@ -1093,7 +1124,8 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>      }
>  
>      QLIST_FOREACH(phb, &spapr->phbs, list) {
> -        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, XICS_IRQS_SPAPR);
> +        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt,
> +                                    XICS_IRQS_SPAPR + xics_max_server_number());
>          if (ret < 0) {
>              error_report("couldn't setup PCI devices in fdt");
>              exit(1);
> @@ -2140,6 +2172,16 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
>      g_free(type);
>  }
>  
> +/*
> + * Only POWER9 Processor chips support the XIVE interrupt controller
> + */
> +static bool ppc_support_xive(MachineState *machine)
> +{
> +   PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(first_cpu);
> +
> +   return pcc->pvr_match(pcc, CPU_POWERPC_POWER9_BASE);
> +}
> +
>  /* pSeries LPAR / sPAPR hardware init */
>  static void ppc_spapr_init(MachineState *machine)
>  {
> @@ -2237,7 +2279,8 @@ static void ppc_spapr_init(MachineState *machine)
>      load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
>  
>      /* Set up Interrupt Controller before we create the VCPUs */
> -    xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal);
> +    xics_system_init(machine, XICS_IRQS_SPAPR + xics_max_server_number(),
> +                     &error_fatal);

Has this hunk leaked from another patch?  AFAICT it only affects XICS
with what you have so far, which doesn't seem like what you want.

>      /* Set up containers for ibm,client-set-architecture negotiated options */
>      spapr->ov5 = spapr_ovec_new();
> @@ -2274,6 +2317,22 @@ static void ppc_spapr_init(MachineState *machine)
>  
>      spapr_init_cpus(spapr);
>  
> +    /* Set up XIVE. CAS will choose whether the guest runs in XICS
> +     * (legacy mode) or XIVE Exploitation mode
> +     *
> +     * We don't have KVM support yet, so check for irqchip=on
> +     */
> +    if (ppc_support_xive(machine)) {
> +        if (kvm_enabled() && machine_kernel_irqchip_required(machine)) {
> +            error_report("kernel_irqchip requested. no XIVE support");
> +        } else {
> +            spapr->xive = spapr_spapr_xive_create(spapr,
> +                               XICS_IRQS_SPAPR + xics_max_server_number(),
> +                               xics_max_server_number(),
> +                               &error_fatal);
> +        }
> +    }
> +
>      if (kvm_enabled()) {
>          /* Enable H_LOGICAL_CI_* so SLOF can talk to in-kernel devices */
>          kvmppc_enable_logical_ci_hcalls();
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 2a303a705c17..6cd5ab73c5dc 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -14,6 +14,7 @@ struct sPAPRNVRAM;
>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>  typedef struct sPAPREventSource sPAPREventSource;
>  typedef struct sPAPRPendingHPT sPAPRPendingHPT;
> +typedef struct sPAPRXive sPAPRXive;
>  
>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
>  #define SPAPR_ENTRY_POINT       0x100
> @@ -127,6 +128,7 @@ struct sPAPRMachineState {
>      MemoryHotplugState hotplug_memory;
>  
>      const char *icp_type;
> +    sPAPRXive  *xive;
>  };
>  
>  #define H_SUCCESS         0

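Patches 15 and 16 compute `XICS_IRQS_SPAPR + xics_max_server_number()` in three places, while the review argues that XICS and XIVE must advertise the same IRQ number space. A minimal sketch, assuming the count is derived once (`spapr_nr_irqs()` and `spapr_xive_first_ipi()` are hypothetical helpers, and 1024 is the assumed value of `XICS_IRQS_SPAPR`): the same value would then feed the XICS init, the PCI `"ibm,pe-total-#msi"` property and the XIVE `nr-irqs` property, and the IPI range that patch 18 advertises in `"ibm,xive-lisn-ranges"` follows from it.

```c
#include <stdint.h>

/* Assumed value; see XICS_IRQS_SPAPR in include/hw/ppc/spapr.h. */
#define XICS_IRQS_SPAPR 1024

/*
 * Hypothetical helper: a single place that sizes the IRQ number space,
 * so XICS, XIVE and the device tree cannot disagree.
 */
static uint32_t spapr_nr_irqs(uint32_t nr_servers)
{
    /* XIVE reserves one extra IRQ number per server for its IPIs. */
    return XICS_IRQS_SPAPR + nr_servers;
}

/*
 * Hypothetical helper mirroring the "ibm,xive-lisn-ranges" start cell of
 * patch 18: the IPIs occupy the last nr_servers numbers of the space.
 */
static uint32_t spapr_xive_first_ipi(uint32_t nr_irqs, uint32_t nr_servers,
                                     uint32_t ics_offset)
{
    return nr_irqs - nr_servers + ics_offset;
}
```

With the count computed once, the IPIs land directly after the `XICS_IRQS_SPAPR` legacy numbers whatever the number of servers.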

* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support Cédric Le Goater
@ 2017-09-19  8:44   ` David Gibson
  2017-09-20 12:26     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  8:44 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Mon, Sep 11, 2017 at 07:12:32PM +0200, Cédric Le Goater wrote:
> Like for XICS, the XIVE interface for the guest is described in the
> device tree under the "interrupt-controller" node. A couple of new
> properties are specific to XIVE :
> 
>  - "reg"
> 
>    contains the base address and size of the thread interrupt
>    management areas (TIMA), also called rings, for the User level and
>    for the Guest OS level. Only the Guest OS level is taken into
>    account today.
> 
>  - "ibm,xive-eq-sizes"
> 
>    the size of the event queues. One cell per size supported, contains
>    log2 of size, in ascending order.
> 
>  - "ibm,xive-lisn-ranges"
> 
>    the interrupt number ranges assigned to the guest. These are
>    allocated using a simple bitmap.
> 
> and also under the root node :
> 
>  - "ibm,plat-res-int-priorities"
> 
>    contains a list of priorities that the hypervisor has reserved for
>    its own use. Simulate ranges as defined by the PowerVM Hypervisor.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive_hcall.c  | 54 +++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_xive.h |  1 +
>  2 files changed, 55 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
> index 4c77b65683de..7b19ea6373dd 100644
> --- a/hw/intc/spapr_xive_hcall.c
> +++ b/hw/intc/spapr_xive_hcall.c
> @@ -874,3 +874,57 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>  }
> +
> +void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle)
> +{
> +    int node;
> +    uint64_t timas[2 * 2];
> +    uint32_t lisn_ranges[] = {
> +        cpu_to_be32(xive->nr_irqs - xive->nr_targets + xive->ics->offset),
> +        cpu_to_be32(xive->nr_targets),
> +    };
> +    uint32_t eq_sizes[] = {
> +        cpu_to_be32(12), /* 4K */
> +        cpu_to_be32(16), /* 64K */
> +        cpu_to_be32(21), /* 2M */
> +        cpu_to_be32(24), /* 16M */
> +    };
> +
> +    /* Use some ranges to exercise the Linux driver, which should
> +     * result in Linux choosing priority 6. This is not strictly
> +     * necessary
> +     */
> +    uint32_t reserved_priorities[] = {
> +        cpu_to_be32(1),  /* start */
> +        cpu_to_be32(2),  /* count */
> +        cpu_to_be32(7),  /* start */
> +        cpu_to_be32(0xf8),  /* count */
> +    };
> +    int i;
> +
> +    /* Thread Interrupt Management Areas : User and OS */
> +    for (i = 0; i < 2; i++) {
> +        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
> +        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
> +    }
> +
> +    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
> +
> +    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));

Shouldn't need this - SLOF will figure it out from the node name above.

> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
> +
> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
> +                     sizeof(eq_sizes)));
> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
> +                     sizeof(lisn_ranges)));

I note this doesn't have the interrupt-controller or #interrupt-cells
properties.  So what acts as the interrupt parent for all the devices
in the tree with XIVE?

> +    /* For SLOF */
> +    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
> +    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
> +
> +    /* top properties */
> +    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
> +                     reserved_priorities, sizeof(reserved_priorities)));
> +}
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index ae5ff89533c0..0a156f2d8591 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -69,5 +69,6 @@ struct sPAPRXive {
>  typedef struct sPAPRMachineState sPAPRMachineState;
>  
>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> +void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle);
>  
>  #endif /* PPC_SPAPR_XIVE_H */

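The reserved-priority comment in the patch expects Linux to end up choosing priority 6. A minimal sketch, assuming a simple policy (take the least favored priority in 0..7 that no reserved range covers; this is not necessarily the Linux driver's exact algorithm), checks that the ranges {1, 2} and {7, 0xf8} leave 6 as that choice. It also shows the TIMA "reg" computation using 64-bit shifts. `PrioRange`, `pick_os_priority()` and `tima_reg()` are hypothetical names; values stay in host order, so the `cpu_to_be32()`/`cpu_to_be64()` conversions done in the patch are left out.

```c
#include <stdint.h>

#define XIVE_PRIORITY_MAX 7

/* A (start, count) pair, as laid out in "ibm,plat-res-int-priorities". */
typedef struct PrioRange {
    uint32_t start;
    uint32_t count;
} PrioRange;

/*
 * Hypothetical policy: pick the least favored priority in 0..7 that no
 * reserved range covers. Returns -1 if every priority is reserved.
 */
static int pick_os_priority(const PrioRange *res, int nr)
{
    for (int prio = XIVE_PRIORITY_MAX; prio >= 0; prio--) {
        int reserved = 0;

        for (int i = 0; i < nr; i++) {
            if ((uint32_t)prio >= res[i].start &&
                (uint32_t)prio - res[i].start < res[i].count) {
                reserved = 1;
                break;
            }
        }
        if (!reserved) {
            return prio;
        }
    }
    return -1;
}

/* TIMA "reg" cells in host order: (address, size) for User, then OS. */
static void tima_reg(uint64_t tm_base, unsigned tm_shift, uint64_t reg[4])
{
    for (int i = 0; i < 2; i++) {
        /* 1ull avoids the int overflow of "1 << tm_shift" for large shifts */
        reg[i * 2] = tm_base + (uint64_t)i * (1ull << tm_shift);
        reg[i * 2 + 1] = 1ull << tm_shift;
    }
}
```

The patch's `1 << xive->tm_shift` is an `int` expression, which is fine for a 64K page shift but would overflow for shifts of 31 or more; the 64-bit form sidesteps that.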

* Re: [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
  2017-09-19  8:20 ` [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) David Gibson
@ 2017-09-19  8:46   ` David Gibson
  2017-09-20 12:33     ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-19  8:46 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
> > On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> > negotiation process determines whether the guest operates with an
> > interrupt controller using the XICS legacy model, as found on POWER8,
> > or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> > patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
> > 
> > Follows a model for the XIVE interrupt controller and support for the
> > Hypervisor's calls which are used to configure the interrupt sources
> > and the event/notification queues of the guest. The last patch
> > integrates XIVE in the sPAPR machine.
> > 
> > Code is here:
> 
> 
> An overall comment:
> 
> I note in several replies here that I think the way XICS objects are
> re-used for XIVE is really ugly, and I think it will make future
> maintenance pretty painful.
> 
> I'm thinking maybe trying to support the CAS negotiation of interrupt
> controller from day 1 is warping the design.  A better approach might
> be first to implement XIVE only when given a specific machine option -
> guest gets one or the other and can't negotiate.
> 
> That should allow a more natural XIVE design to emerge, *then* we can
> look at what's necessary to make boot-time negotiation possible.

Actually, it just occurred to me that we might be making life hard for
ourselves by trying to actually switch between full XICS and XIVE
models.  Couldn't we have new machine types always construct the XIVE
infrastructure, but then implement the XICS RTAS and hcalls in terms
of the XIVE virtual hardware.  Since something more or less equivalent
has already been done in both OPAL and the host kernel, I'm guessing
this shouldn't be too hard at this point.
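One way to picture this suggestion is a small C sketch in which the XIVE tables are the only state, and an XICS-style ibm,set-xive request (server, priority, with 0xff meaning masked) is translated into an IVE update. The helper xics_set_xive_via_xive(), the XICS_EMU_XIVE_PRIO constant and the choice of a single XIVE priority for all XICS traffic are hypothetical assumptions, not code from this series; the IVE bit layout follows the xive-internal.h definitions in patch 03.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM (MSB 0) bit helpers and IVE fields, as in xive-internal.h (patch 03) */
#define PPC_BIT(bit)        (0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
#define IVE_VALID           PPC_BIT(0)
#define IVE_EQ_INDEX        PPC_BITMASK(8, 31)   /* bits 32..55 of the value */
#define IVE_MASKED          PPC_BIT(32)
#define IVE_EQ_DATA         PPC_BITMASK(33, 63)  /* bits 0..30 of the value */

#define XIVE_EQ_PRIORITY_COUNT 8
#define XICS_PRIO_MASKED    0xff  /* XICS: priority 0xff masks the source */
#define XICS_EMU_XIVE_PRIO  6     /* hypothetical: one XIVE priority for XICS */

typedef struct { uint64_t w; } XiveIVE;

/*
 * Hypothetical ibm,set-xive backend: the IVE is the only state. The
 * XICS priority only selects masked/unmasked; all unmasked sources of
 * a server share one EQ, and the EQ data carries the IRQ number so an
 * emulated XIRR could report it.
 */
static bool xics_set_xive_via_xive(XiveIVE *ive, uint32_t irq,
                                   uint32_t server, uint8_t priority,
                                   uint32_t nr_targets)
{
    uint64_t w = IVE_VALID;

    assert(ive);
    if (server >= nr_targets) {
        return false;
    }
    if (priority == XICS_PRIO_MASKED) {
        w |= IVE_MASKED;
    } else {
        uint64_t eq_idx = (uint64_t)server * XIVE_EQ_PRIORITY_COUNT
                          + XICS_EMU_XIVE_PRIO;

        w |= (eq_idx << 32) & IVE_EQ_INDEX;
        w |= (uint64_t)irq & IVE_EQ_DATA;
    }
    ive->w = w;
    return true;
}
```

Collapsing every XICS priority onto one XIVE priority is only one possible choice; it drops the ordering between XICS priorities, which a real implementation would have to weigh.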


* Re: [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  2017-09-19  2:27   ` David Gibson
@ 2017-09-19 13:15     ` Cédric Le Goater
  2017-09-22 11:00       ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-19 13:15 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 04:27 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:15PM +0200, Cédric Le Goater wrote:
>> Start with a couple of attributes for the XIVE sPAPR controller
>> model. The number of provisioned IRQs is necessary to size the
>> different internal XIVE tables, as is the number of CPUs.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> [snip]
> 
>> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(dev);
>> +
>> +    if (!xive->nr_targets) {
>> +        error_setg(errp, "Number of interrupt targets needs to be greater 0");
>> +        return;
>> +    }
>> +    /* We need to be able to allocate at least the IPIs */
>> +    if (!xive->nr_irqs || xive->nr_irqs < xive->nr_targets) {
>> +        error_setg(errp, "Number of interrupts too small");
>> +        return;
>> +    }
>> +}
>> +
>> +static Property spapr_xive_properties[] = {
>> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
>> +    DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),
> 
> I'm a bit uneasy about the number of targets having to be set in
> advance: this can make life awkward when CPUs are hotplugged.  I know
> there's something similar in xics, but it has caused some hassles, and
> we're starting to move away from it.
> 
> Do you really need this?
> 

Some of the internal table sizes depend on the number of CPUs
defined for the machine. When the sPAPRXive object is instantiated,
we use xics_max_server_number() to get the maximum number of
provisioned CPUs.
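To make the dependency concrete, here is a small C sketch of how the table sizes in patch 03 follow from the two properties: 2-bit SBEs (4 per byte), one 64-bit IVE per source, and one 32-byte EQ descriptor (eight 32-bit words) per target and priority. The XiveTableSizes type and xive_table_sizes() helper are hypothetical; the arithmetic mirrors spapr_xive_realize().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XIVE_EQ_PRIORITY_COUNT 8
#define DIV_ROUND_UP(n, d)     (((n) + (d) - 1) / (d))

/* Hypothetical summary of the memory used by the sPAPR XIVE tables */
typedef struct {
    size_t sbe_bytes;  /* 2-bit PQ state per source, 4 sources per byte */
    size_t ivt_bytes;  /* one 64-bit IVE per source */
    size_t nr_eqs;     /* one EQ per (target, priority) pair */
    size_t eqt_bytes;  /* eight 32-bit words per EQ descriptor */
} XiveTableSizes;

static XiveTableSizes xive_table_sizes(uint32_t nr_irqs, uint32_t nr_targets)
{
    XiveTableSizes s;

    /* Same preconditions as spapr_xive_realize() */
    assert(nr_targets > 0 && nr_irqs >= nr_targets);

    s.sbe_bytes = DIV_ROUND_UP((size_t)nr_irqs, 4);
    s.ivt_bytes = (size_t)nr_irqs * sizeof(uint64_t);
    s.nr_eqs    = (size_t)nr_targets * XIVE_EQ_PRIORITY_COUNT;
    s.eqt_bytes = s.nr_eqs * 8 * sizeof(uint32_t);
    return s;
}
```

With, say, 4096 IRQs and 64 targets, this gives 1 KiB of SBEs, 32 KiB of IVEs and 512 EQs (16 KiB). Among the tables, only the EQ table scales with the number of targets, so it is the part that would have to grow if CPUs were hotplugged beyond the value fixed at realize time.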

C.

* Re: [Qemu-devel] [RFC PATCH v2 03/21] ppc/xive: define the XIVE internal tables
  2017-09-19  2:39   ` David Gibson
@ 2017-09-19 13:46     ` Cédric Le Goater
  2017-09-20  4:33       ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-19 13:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 04:39 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:17PM +0200, Cédric Le Goater wrote:
>> The XIVE interrupt controller of the POWER9 uses a set of tables to
>> redirect exceptions from event sources to CPU threads. Among these, we
>> choose to model:
>>
>>  - the State Bit Entries (SBE), also known as Event State Buffer
>>    (ESB). This is a two bit state machine for each event source which
>>    is used to trigger events. The bits are named "P" (pending) and "Q"
>>    (queued) and can be controlled by MMIO.
>>
>>  - the Interrupt Virtualization Entry (IVE) table, also known as Event
>>    Assignment Structure (EAS). This table is indexed by the IRQ number
>>    and is looked up to find the Event Queue associated with a
>>    triggered event.
> 
> Both the above are one entry per irq source, yes?  What's the
> rationale for having them as parallel tables, rather than bits in a
> single per-source structure?

For the sPAPR machines, yes, we could use a struct to hold both
pieces of information. But these tables are defined in the HW specs
and are used as such by the PowerNV platform in skiboot. They are
registered by the firmware for the use of the XIVE interrupt
controller.

When we model XIVE for PowerNV, I think it would be preferable to
have common definitions for these tables.
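For reference, a minimal C sketch of reading the SBE/ESB array as a packed table of 2-bit PQ states, four sources per byte. The helper names and the choice of placing source 0 in the two most significant bits of byte 0 are assumptions for illustration; the patch itself only relies on 0x55 (0b01010101) setting every entry to PQ=0b01, the "ints off" state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XIVE_ESB_RESET  0x0   /* P=0 Q=0 */
#define XIVE_ESB_OFF    0x1   /* P=0 Q=1, "ints off" */

/* Hypothetical layout: source n lives in byte n/4, MSB pair first */
static unsigned sbe_shift(uint32_t srcno)
{
    return 6 - 2 * (srcno % 4);
}

static uint8_t sbe_get_pq(const uint8_t *sbe, uint32_t srcno)
{
    return (sbe[srcno / 4] >> sbe_shift(srcno)) & 0x3;
}

static void sbe_set_pq(uint8_t *sbe, uint32_t srcno, uint8_t pq)
{
    uint8_t *byte = &sbe[srcno / 4];

    assert(pq <= 0x3);
    /* Clear the 2-bit slot of this source, then store the new PQ */
    *byte = (uint8_t)((*byte & ~(0x3 << sbe_shift(srcno)))
                      | (pq << sbe_shift(srcno)));
}
```

Filling the array with 0x55, as spapr_xive_reset() does, makes sbe_get_pq() return XIVE_ESB_OFF for every source.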
 
>>  - the Event Queue Descriptor (EQD) table, also known as Event
>>    Notification Descriptor (END). The EQD contains fields that specify
>>    the Event Queue on which event data is posted (and later pulled by
>>    the OS) and also a target (or VPD) to notify.
>>
>> An additional table was not modeled, but we might need it to support the
>> H_INT_SET_OS_REPORTING_LINE hcall:
>>
>>  - the Virtual Processor Descriptor (VPD) table, also known as
>>    Notification Virtual Target (NVT).
>>
>> The XIVE object is expanded with the tables described above. The size
>> of each table depends on the number of provisioned IRQ and the maximum
>> number of CPUs in the system. The indexing is very basic and might
>> need to be improved for the EQs.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c        | 108 ++++++++++++++++++++++++++++++++++++++++++++
>>  hw/intc/xive-internal.h     | 105 ++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr_xive.h |   9 ++++
>>  3 files changed, 222 insertions(+)
>>  create mode 100644 hw/intc/xive-internal.h
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index c83796519586..6d98528fae68 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -25,11 +25,34 @@
>>  #include "hw/ppc/xics.h"
>>  #include "hw/ppc/spapr_xive.h"
>>  
>> +#include "xive-internal.h"
>>  
>>  /*
>>   * Main XIVE object
>>   */
>>  
>> +void spapr_xive_reset(void *dev)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(dev);
>> +    int i;
>> +
>> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
>> +    memset(xive->sbe, 0x55, xive->sbe_size);
>> +
>> +    /* Validate all available IVEs in the IRQ number space. It would
>> +     * be more correct to validate only the allocated IRQs but this
>> +     * would require some callback routine from the spapr machine into
>> +     * XIVE. To be done later.
>> +     */
>> +    for (i = 0; i < xive->nr_irqs; i++) {
>> +        XiveIVE *ive = &xive->ivt[i];
>> +        ive->w = IVE_VALID | IVE_MASKED;
>> +    }
>> +
>> +    /* clear all EQs */
>> +    memset(xive->eqt, 0, xive->nr_eqs * sizeof(XiveEQ));
>> +}
>> +
>>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>  {
>>      sPAPRXive *xive = SPAPR_XIVE(dev);
>> @@ -44,8 +67,64 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>          error_setg(errp, "Number of interrupts too small");
>>          return;
>>      }
>> +
>> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>> +    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>> +    xive->sbe = g_malloc0(xive->sbe_size);
>> +
>> +    /* Allocate the IVT (Interrupt Virtualization Table) */
>> +    xive->ivt = g_malloc0(xive->nr_irqs * sizeof(XiveIVE));
>> +
>> +    /* Allocate the EQDT (Event Queue Descriptor Table), 8 priorities
>> +     * for each thread in the system */
>> +    xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;
>> +    xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));
>> +
>> +    qemu_register_reset(spapr_xive_reset, dev);
>>  }
>>  
>> +static const VMStateDescription vmstate_spapr_xive_ive = {
>> +    .name = "xive/ive",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField []) {
>> +        VMSTATE_UINT64(w, XiveIVE),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static const VMStateDescription vmstate_spapr_xive_eq = {
>> +    .name = "xive/eq",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField []) {
>> +        VMSTATE_UINT32(w0, XiveEQ),
>> +        VMSTATE_UINT32(w1, XiveEQ),
>> +        VMSTATE_UINT32(w2, XiveEQ),
>> +        VMSTATE_UINT32(w3, XiveEQ),
>> +        VMSTATE_UINT32(w4, XiveEQ),
>> +        VMSTATE_UINT32(w5, XiveEQ),
>> +        VMSTATE_UINT32(w6, XiveEQ),
>> +        VMSTATE_UINT32(w7, XiveEQ),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static const VMStateDescription vmstate_xive = {
>> +    .name = "xive",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_VARRAY_UINT32_ALLOC(sbe, sPAPRXive, sbe_size, 0,
>> +                                    vmstate_info_uint8, uint8_t),
> 
> Since you're treating the SBE as a packed buffer of u8s anyway, it's
> probably simpler to use VMSTATE_BUFFER().  I don't see that you need
> the ALLOC - it should have already been allocated on the destination.
> 
> Might be worth having a VMSTATE_UINT32_EQUAL to sanity check that
> sbe_size is equal at either end.

OK. I will fix that.

> 
>> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 0,
>> +                                    vmstate_spapr_xive_ive, XiveIVE),
>> +        VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(eqt, sPAPRXive, nr_eqs, 0,
>> +                                    vmstate_spapr_xive_eq, XiveEQ),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  static Property spapr_xive_properties[] = {
>>      DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
>>      DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),
>> @@ -59,6 +138,7 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
>>      dc->realize = spapr_xive_realize;
>>      dc->props = spapr_xive_properties;
>>      dc->desc = "sPAPR XIVE interrupt controller";
>> +    dc->vmsd = &vmstate_xive;
>>  }
>>  
>>  static const TypeInfo spapr_xive_info = {
>> @@ -74,3 +154,31 @@ static void spapr_xive_register_types(void)
>>  }
>>  
>>  type_init(spapr_xive_register_types)
>> +
>> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t idx)
>> +{
>> +    return idx < xive->nr_irqs ? &xive->ivt[idx] : NULL;
>> +}
>> +
>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx)
>> +{
>> +    return idx < xive->nr_eqs ? &xive->eqt[idx] : NULL;
>> +}
>> +
>> +/* TODO: improve EQ indexing. This is very simple and relies on the
>> + * fact that target (CPU) numbers start at 0 and are contiguous. It
>> + * should be OK for sPAPR.
>> + */
>> +bool spapr_xive_eq_for_target(sPAPRXive *xive, uint32_t target,
>> +                              uint8_t priority, uint32_t *out_eq_idx)
>> +{
>> +    if (priority > XIVE_PRIORITY_MAX || target >= xive->nr_targets) {
>> +        return false;
>> +    }
>> +
>> +    if (out_eq_idx) {
>> +        *out_eq_idx = target + priority;
> 
> Don't you need to multiply target by XIVE_EQ_PRIORITY_COUNT?

Yes, this is a bug ... I was lucky to only use a single priority (7).
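A minimal sketch of the indexing with the multiplication David points out, next to the expression in the patch, shows why the bug stayed hidden while only one priority was used: with target + priority, (target 0, priority 1) and (target 1, priority 0) collide, whereas target * XIVE_EQ_PRIORITY_COUNT + priority gives every (target, priority) pair its own EQ. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XIVE_EQ_PRIORITY_COUNT 8
#define XIVE_PRIORITY_MAX      (XIVE_EQ_PRIORITY_COUNT - 1)

/* Index as written in the patch: ranges of adjacent targets overlap */
static uint32_t eq_idx_buggy(uint32_t target, uint8_t priority)
{
    return target + priority;
}

/* Corrected: one block of 8 EQs per target, indexed by priority */
static bool eq_for_target(uint32_t nr_targets, uint32_t target,
                          uint8_t priority, uint32_t *out_eq_idx)
{
    if (priority > XIVE_PRIORITY_MAX || target >= nr_targets) {
        return false;
    }
    if (out_eq_idx) {
        *out_eq_idx = target * XIVE_EQ_PRIORITY_COUNT + priority;
    }
    return true;
}
```

With 4 targets, the largest index is 3 * 8 + 7 = 31, the last entry of the 32-entry EQ table allocated in spapr_xive_realize().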

Thanks,

C. 

> 
>> +    }
>> +
>> +    return true;
>> +}
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> new file mode 100644
>> index 000000000000..95184bad5c1d
>> --- /dev/null
>> +++ b/hw/intc/xive-internal.h
>> @@ -0,0 +1,105 @@
>> +/*
>> + * QEMU PowerPC XIVE model
>> + *
>> + * Copyright 2016,2017 IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +#ifndef _INTC_XIVE_INTERNAL_H
>> +#define _INTC_XIVE_INTERNAL_H
>> +
>> +/* Utilities to manipulate these (originally from OPAL) */
>> +#define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
>> +#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
>> +#define SETFIELD(m, v, val)                             \
>> +        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
>> +
>> +#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
>> +#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
>> +#define PPC_BIT8(bit)           (0x80UL >> (bit))
>> +#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>> +#define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
>> +                                 PPC_BIT32(bs))
>> +
>> +/* IVE/EAS
>> + *
>> + * One per interrupt source. Targets that interrupt to a given EQ
>> + * and provides the corresponding logical interrupt number (EQ data)
>> + *
>> + * We also map this structure to the escalation descriptor inside
>> + * an EQ, though in that case the valid and masked bits are not used.
>> + */
>> +typedef struct XiveIVE {
>> +        /* Use a single 64-bit definition to make it easier to
>> +         * perform atomic updates
>> +         */
>> +        uint64_t        w;
>> +#define IVE_VALID       PPC_BIT(0)
>> +#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)        /* Destination EQ block# */
>> +#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)       /* Destination EQ index */
>> +#define IVE_MASKED      PPC_BIT(32)              /* Masked */
>> +#define IVE_EQ_DATA     PPC_BITMASK(33, 63)      /* Data written to the EQ */
>> +} XiveIVE;
>> +
>> +/* EQ */
>> +typedef struct XiveEQ {
>> +        uint32_t        w0;
> 
> It'd be nice if IBM came up with better names for its fields than w0,
> w1, etc.   Oh well.
> 
>> +#define EQ_W0_VALID             PPC_BIT32(0)
>> +#define EQ_W0_ENQUEUE           PPC_BIT32(1)
>> +#define EQ_W0_UCOND_NOTIFY      PPC_BIT32(2)
>> +#define EQ_W0_BACKLOG           PPC_BIT32(3)
>> +#define EQ_W0_PRECL_ESC_CTL     PPC_BIT32(4)
>> +#define EQ_W0_ESCALATE_CTL      PPC_BIT32(5)
>> +#define EQ_W0_END_OF_INTR       PPC_BIT32(6)
>> +#define EQ_W0_QSIZE             PPC_BITMASK32(12, 15)
>> +#define EQ_W0_SW0               PPC_BIT32(16)
>> +#define EQ_W0_FIRMWARE          EQ_W0_SW0 /* Owned by FW */
>> +#define EQ_QSIZE_4K             0
>> +#define EQ_QSIZE_64K            4
>> +#define EQ_W0_HWDEP             PPC_BITMASK32(24, 31)
>> +        uint32_t        w1;
>> +#define EQ_W1_ESn               PPC_BITMASK32(0, 1)
>> +#define EQ_W1_ESn_P             PPC_BIT32(0)
>> +#define EQ_W1_ESn_Q             PPC_BIT32(1)
>> +#define EQ_W1_ESe               PPC_BITMASK32(2, 3)
>> +#define EQ_W1_ESe_P             PPC_BIT32(2)
>> +#define EQ_W1_ESe_Q             PPC_BIT32(3)
>> +#define EQ_W1_GENERATION        PPC_BIT32(9)
>> +#define EQ_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
>> +        uint32_t        w2;
>> +#define EQ_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
>> +#define EQ_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
>> +        uint32_t        w3;
>> +#define EQ_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
>> +        uint32_t        w4;
>> +#define EQ_W4_ESC_EQ_BLOCK      PPC_BITMASK32(4, 7)
>> +#define EQ_W4_ESC_EQ_INDEX      PPC_BITMASK32(8, 31)
>> +        uint32_t        w5;
>> +#define EQ_W5_ESC_EQ_DATA       PPC_BITMASK32(1, 31)
>> +        uint32_t        w6;
>> +#define EQ_W6_FORMAT_BIT        PPC_BIT32(8)
>> +#define EQ_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
>> +#define EQ_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
>> +        uint32_t        w7;
>> +#define EQ_W7_F0_IGNORE         PPC_BIT32(0)
>> +#define EQ_W7_F0_BLK_GROUPING   PPC_BIT32(1)
>> +#define EQ_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
>> +#define EQ_W7_F1_WAKEZ          PPC_BIT32(0)
>> +#define EQ_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
>> +} XiveEQ;
>> +
>> +#define XIVE_EQ_PRIORITY_COUNT 8
>> +#define XIVE_PRIORITY_MAX  (XIVE_EQ_PRIORITY_COUNT - 1)
>> +
>> +void spapr_xive_reset(void *dev);
>> +XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t isn);
>> +XiveEQ *spapr_xive_get_eq(sPAPRXive *xive, uint32_t idx);
>> +
>> +bool spapr_xive_eq_for_target(sPAPRXive *xive, uint32_t target, uint8_t prio,
>> +                        uint32_t *out_eq_idx);
>> +
>> +
>> +#endif /* _INTC_XIVE_INTERNAL_H */
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 5b99f7fc2b81..b17dd4f17b0b 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -22,6 +22,8 @@
>>  #include <hw/sysbus.h>
>>  
>>  typedef struct sPAPRXive sPAPRXive;
>> +typedef struct XiveIVE XiveIVE;
>> +typedef struct XiveEQ XiveEQ;
>>  
>>  #define TYPE_SPAPR_XIVE "spapr-xive"
>>  #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
>> @@ -32,6 +34,13 @@ struct sPAPRXive {
>>      /* Properties */
>>      uint32_t     nr_targets;
>>      uint32_t     nr_irqs;
>> +
>> +    /* XIVE internal tables */
>> +    uint8_t      *sbe;
>> +    uint32_t     sbe_size;
>> +    XiveIVE      *ivt;
>> +    XiveEQ       *eqt;
>> +    uint32_t     nr_eqs;
>>  };
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
> 
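As a worked example of the IBM (MSB 0) bit numbering used by these macros: bit 0 is the most significant bit, so IVE_EQ_INDEX (bits 8..31) is the 24-bit field starting at bit 32 of the C value, and IVE_EQ_DATA (bits 33..63) is the low 31 bits. The sketch copies the helpers from xive-internal.h, switching UL/__builtin_ffsl to ULL/__builtin_ffsll and typeof to __typeof__ (assumptions so it also builds where long is 32 bits or in strict ISO C modes); ive_encode() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_TO_LSH(m)          (__builtin_ffsll(m) - 1)
#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
#define SETFIELD(m, v, val)                             \
        (((v) & ~(m)) | ((((__typeof__(v))(val)) << MASK_TO_LSH(m)) & (m)))

#define PPC_BIT(bit)            (0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

#define IVE_VALID       PPC_BIT(0)
#define IVE_EQ_BLOCK    PPC_BITMASK(4, 7)
#define IVE_EQ_INDEX    PPC_BITMASK(8, 31)
#define IVE_MASKED      PPC_BIT(32)
#define IVE_EQ_DATA     PPC_BITMASK(33, 63)

/* Hypothetical helper: build a valid, unmasked IVE routed to an EQ */
static uint64_t ive_encode(uint32_t eq_idx, uint32_t eq_data)
{
    uint64_t w = IVE_VALID;

    w = SETFIELD(IVE_EQ_INDEX, w, eq_idx);
    w = SETFIELD(IVE_EQ_DATA, w, eq_data);
    return w;
}
```

For example, ive_encode(61, 0x1234) yields 0x8000003D00001234: the valid bit at the top, 61 (0x3D) in the EQ index field and 0x1234 in the EQ data field.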

* Re: [Qemu-devel] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE
  2017-09-19  2:44   ` [Qemu-devel] " David Gibson
@ 2017-09-19 14:46     ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-19 14:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 04:44 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:18PM +0200, Cédric Le Goater wrote:
>> The sPAPR machine first starts with a XICS interrupt model and
>> depending on the guest capabilities, the XIVE exploitation mode is
>> negotiated during CAS. A reset should then be performed to rebuild the
>> device tree but the same IRQ numbers which were allocated by the
>> devices prior to reset, when the XICS model was operating, are still
>> in use.
>>
>> For this purpose, we need a common IRQ number allocator for both the
>> interrupt models: XICS legacy or XIVE exploitation. This is what the
>> ICSIRQState array of the XICS interrupt source is used for. It also
>> contains the LSI/MSI flag of an interrupt, which we will need later on.
>>
>> So, let's provide a link to the sPAPR ICS object under XIVE to make
>> use of it.
> 
> Blech, please don't.  The XIVE code absolutely shouldn't be
> referencing XICS objects, it's a recipe for trouble down the line.

I don't know about trouble, but it is a bit ugly, and this is why
this patchset is still an RFC.

> If we have to have some sort of abstract "spapr interrupt source"
> object that could map to either an ICS irq, or a XIVE source then we
> can do that, but don't directly link XIVE and XICS.  *Especially* not
> new-in-terms-of-old like this, rather than old-in-terms-of-new.

I agree with what you are saying about a common interrupt source,
but I just haven't found a way to do it yet. So I am trying to
confine the ugliness to obvious shortcuts. The purpose is to
identify what we need to support migration, hotplug, CAS reset, etc.

The current solution is practical for supporting CAS reset and, more
importantly, it does not break migration. We can't move an object, or
parts of an object, around without breaking migration, as my recent
changes on the XICS ICP have shown.

>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c        | 12 ++++++++++++
>>  include/hw/ppc/spapr_xive.h |  4 ++++
>>  2 files changed, 16 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 6d98528fae68..1681affb0848 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -56,6 +56,8 @@ void spapr_xive_reset(void *dev)
>>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>  {
>>      sPAPRXive *xive = SPAPR_XIVE(dev);
>> +    Object *obj;
>> +    Error *err = NULL;
>>  
>>      if (!xive->nr_targets) {
>>          error_setg(errp, "Number of interrupt targets needs to be greater 0");
>> @@ -68,6 +70,16 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>          return;
>>      }
>>  
>> +    /* Retrieve SPAPR ICS source to share the IRQ number allocator */
> 
> This really suggests we need to move the irq number allocator out of
> XICS and into the general spapr code.  Or get rid of it entirely
> (using a more static irq mapping) if possible.

I have some out-of-tree changes introducing a bitmap at the spapr
machine level. It is a nice cleanup of the spapr_ics_free() and
spapr_ics_alloc*() routines. But it is not migration-friendly, and
we still need to keep the ICSIRQState array, which also stores the
IRQ type (LSI/MSI). I need to check whether such a change would bring
some benefits for XIVE.
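A rough sketch of what such a machine-level allocator could look like, independent of both XICS and XIVE: one bit per IRQ number for allocation and one for the LSI/MSI type, which is the information ICSIRQState currently provides. Everything here (SpaprIrqMap, spapr_irq_claim(), the fixed 4096-entry IRQ space) is hypothetical, and the sketch does not address the migration concern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPAPR_NR_IRQS   4096          /* hypothetical fixed IRQ space */
#define BITS_PER_WORD   64

typedef struct {
    uint64_t allocated[SPAPR_NR_IRQS / BITS_PER_WORD];
    uint64_t lsi[SPAPR_NR_IRQS / BITS_PER_WORD];  /* set: LSI, clear: MSI */
} SpaprIrqMap;

static bool test_bit64(const uint64_t *map, uint32_t n)
{
    return map[n / BITS_PER_WORD] & (1ULL << (n % BITS_PER_WORD));
}

static void assign_bit64(uint64_t *map, uint32_t n, bool val)
{
    uint64_t bit = 1ULL << (n % BITS_PER_WORD);

    if (val) {
        map[n / BITS_PER_WORD] |= bit;
    } else {
        map[n / BITS_PER_WORD] &= ~bit;
    }
}

/* Claim a specific IRQ number; fails if out of range or already in use */
static bool spapr_irq_claim(SpaprIrqMap *m, uint32_t irq, bool lsi)
{
    if (irq >= SPAPR_NR_IRQS || test_bit64(m->allocated, irq)) {
        return false;
    }
    assign_bit64(m->allocated, irq, true);
    assign_bit64(m->lsi, irq, lsi);
    return true;
}

static void spapr_irq_free(SpaprIrqMap *m, uint32_t irq)
{
    assert(irq < SPAPR_NR_IRQS);
    assign_bit64(m->allocated, irq, false);
    assign_bit64(m->lsi, irq, false);
}

static bool spapr_irq_is_lsi(const SpaprIrqMap *m, uint32_t irq)
{
    return irq < SPAPR_NR_IRQS && test_bit64(m->lsi, irq);
}
```

Both interrupt models could consult the same map for the IRQ type, so neither would need to reach into the other's source state.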

C.

>> +    obj = object_property_get_link(OBJECT(dev), "ics", &err);
>> +    if (!obj) {
>> +        error_setg(errp, "%s: required link 'ics' not found: %s",
>> +                   __func__, error_get_pretty(err));
>> +        return;
>> +    }
>> +
>> +    xive->ics = ICS_BASE(obj);
>> +
>>      /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>>      xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>>      xive->sbe = g_malloc0(xive->sbe_size);
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index b17dd4f17b0b..29112589b37f 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -24,6 +24,7 @@
>>  typedef struct sPAPRXive sPAPRXive;
>>  typedef struct XiveIVE XiveIVE;
>>  typedef struct XiveEQ XiveEQ;
>> +typedef struct ICSState ICSState;
>>  
>>  #define TYPE_SPAPR_XIVE "spapr-xive"
>>  #define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
>> @@ -35,6 +36,9 @@ struct sPAPRXive {
>>      uint32_t     nr_targets;
>>      uint32_t     nr_irqs;
>>  
>> +    /* IRQ */
>> +    ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
>> +
>>      /* XIVE internal tables */
>>      uint8_t      *sbe;
>>      uint32_t     sbe_size;
> 

* Re: [Qemu-devel] [RFC PATCH v2 05/21] ppc/xive: allocate IRQ numbers for the IPIs
  2017-09-19  2:45   ` David Gibson
@ 2017-09-19 14:52     ` Cédric Le Goater
  2017-09-20  4:35       ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-19 14:52 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 04:45 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:19PM +0200, Cédric Le Goater wrote:
>> The number of IPIs is deduced from the max number of CPUs the guest
>> supports and the IRQ numbers for the IPIs are allocated from the top
>> of the IRQ number space to reduce conflict with other IRQ numbers
>> allocated by the devices.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> This is more ick associated with implementing XIVE in terms of XICS.
> We shouldn't need to "allocate" IRQs for the IPIs - they should just
> be a fixed set.  

They are allocated at the very beginning, so I suppose we can
consider them fixed.

> And we certainly shouldn't need to set the XICS irq type for XIVE irqs.

This is because, in this patchset, XIVE and XICS use the same IRQ
allocator, which happens to be the ICSIRQState array of XICS. Yes,
this is ugly, but we are identifying the different constraints.

With a common interrupt source, we would have to do the same thing,
that is, allocate some IRQ numbers for the IPIs.
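If the IPIs are treated as a fixed set rather than allocated, the IPI number of a target can simply be computed from the two properties, matching the range this patch reserves (the last nr_targets IRQ numbers). The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First IRQ number of the IPI range: the top nr_targets numbers */
static uint32_t xive_ipi_base(uint32_t nr_irqs, uint32_t nr_targets)
{
    assert(nr_irqs >= nr_targets);
    return nr_irqs - nr_targets;
}

/* Hypothetical fixed mapping: IPI of CPU target 'target' */
static uint32_t xive_ipi_irq(uint32_t nr_irqs, uint32_t nr_targets,
                             uint32_t target)
{
    assert(target < nr_targets);
    return xive_ipi_base(nr_irqs, nr_targets) + target;
}

static bool xive_irq_is_ipi(uint32_t nr_irqs, uint32_t nr_targets,
                            uint32_t irq)
{
    return irq >= xive_ipi_base(nr_irqs, nr_targets) && irq < nr_irqs;
}
```

A mapping like this needs no allocator entry at all, although it still ties the IPI range to nr-targets, the same hotplug concern raised about patch 01.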

C.


>> ---
>>  hw/intc/spapr_xive.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 1681affb0848..52c32f588d6d 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -58,6 +58,7 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>      sPAPRXive *xive = SPAPR_XIVE(dev);
>>      Object *obj;
>>      Error *err = NULL;
>> +    int i;
>>  
>>      if (!xive->nr_targets) {
>>          error_setg(errp, "Number of interrupt targets needs to be greater 0");
>> @@ -80,6 +81,11 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>  
>>      xive->ics = ICS_BASE(obj);
>>  
>> +    /* Allocate the last IRQ numbers for the IPIs */
>> +    for (i = xive->nr_irqs - xive->nr_targets; i < xive->nr_irqs; i++) {
>> +        ics_set_irq_type(xive->ics, i, false);
>> +    }
>> +
>>      /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>>      xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>>      xive->sbe = g_malloc0(xive->sbe_size);
> 

* Re: [Qemu-devel] [RFC PATCH v2 06/21] ppc/xive: introduce handlers for interrupt sources
  2017-09-19  2:48   ` David Gibson
@ 2017-09-19 15:08     ` Cédric Le Goater
  2017-09-20  4:38       ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-19 15:08 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 04:48 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:20PM +0200, Cédric Le Goater wrote:
>> These are very similar to the XICS handlers in a simpler form. They
>> make use of the ICSIRQState array of the XICS interrupt source to
>> differentiate the MSI from the LSI interrupts. The spapr_xive_irq()
>> routine in charge of triggering the CPU interrupt line will be filled
>> later on.
>>
>> The next patch will introduce the MMIO handlers to interact with XIVE
>> interrupt sources.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr_xive.h |  1 +
>>  2 files changed, 47 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 52c32f588d6d..1ed7b6a286e9 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -27,6 +27,50 @@
>>  
>>  #include "xive-internal.h"
>>  
>> +static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>> +{
>> +
>> +}
>> +
>> +/*
>> + * XIVE Interrupt Source
>> + */
>> +static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
>> +{
>> +    if (val) {
>> +        spapr_xive_irq(xive, srcno);
>> +    }
>> +}
> 
> So in XICS "srcno" (vs "irq") indicates an offset within a single ICS
> object, as opposed to a global irq number.  Does that concept even
> exist in XIVE?

We don't really care in the internals. 'srcno' is just an index in
the tables; maybe I should change the name. It could be the same in
XICS, but the XIRR is manipulated at a low level, so we need to
propagate the source offset in a couple of places.

That is to say, the 'irq' number is guest-level information which,
in this patchset, should only be used at the hcall level to identify
a source.
 
>> +
>> +static void spapr_xive_source_set_irq_lsi(sPAPRXive *xive, int srcno, int val)
>> +{
>> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
>> +
>> +    if (val) {
>> +        irq->status |= XICS_STATUS_ASSERTED;
>> +    } else {
>> +        irq->status &= ~XICS_STATUS_ASSERTED;
> 
> More mangling a XICS specific object for XIVE operations.  Please
> stop.

Ah! We will still need the same information, and that means
introducing a common source object. Today, the patchset just uses
the XICS ICSIRQState array as a common object.
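A minimal sketch of what such a common source object could hold, reproducing the LSI/MSI logic of spapr_xive_source_set_irq() without touching ICSIRQState. The SpaprIrqSource type, the SRC_* flags and the count_notify() example callback are hypothetical; the notify callback stands in for spapr_xive_irq(), and clearing of the "sent" state on EOI is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRC_FLAG_LSI         0x1  /* level-sensitive source (else MSI) */
#define SRC_STATUS_ASSERTED  0x1  /* LSI line currently asserted */
#define SRC_STATUS_SENT      0x2  /* notification sent, awaiting EOI */

typedef struct {
    uint8_t flags;
    uint8_t status;
} SpaprIrqSource;

typedef void (*SourceNotifyFn)(void *opaque, uint32_t srcno);

static void source_set_irq(SpaprIrqSource *srcs, uint32_t srcno, int val,
                           SourceNotifyFn notify, void *opaque)
{
    SpaprIrqSource *s = &srcs[srcno];

    if (!(s->flags & SRC_FLAG_LSI)) {
        /* MSI: every trigger is an event */
        if (val) {
            notify(opaque, srcno);
        }
        return;
    }

    /* LSI: track the line level, notify once until EOI clears SENT */
    if (val) {
        s->status |= SRC_STATUS_ASSERTED;
    } else {
        s->status &= (uint8_t)~SRC_STATUS_ASSERTED;
    }
    if ((s->status & SRC_STATUS_ASSERTED) && !(s->status & SRC_STATUS_SENT)) {
        s->status |= SRC_STATUS_SENT;
        notify(opaque, srcno);
    }
}

/* Example callback: counts notifications in an unsigned int */
static void count_notify(void *opaque, uint32_t srcno)
{
    (void)srcno;
    (*(unsigned *)opaque)++;
}
```

An asserted LSI notifies only once until the (not shown) EOI path clears SRC_STATUS_SENT, while each MSI trigger notifies again.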
 
>> +    }
>> +
>> +    if (irq->status & XICS_STATUS_ASSERTED
>> +        && !(irq->status & XICS_STATUS_SENT)) {
>> +        irq->status |= XICS_STATUS_SENT;
>> +        spapr_xive_irq(xive, srcno);
>> +    }
>> +}
>> +
>> +static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
>> +
>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
>> +        spapr_xive_source_set_irq_lsi(xive, srcno, val);
>> +    } else {
>> +        spapr_xive_source_set_irq_msi(xive, srcno, val);
>> +    }
>> +}
>> +
>>  /*
>>   * Main XIVE object
>>   */
>> @@ -80,6 +124,8 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>      }
>>  
>>      xive->ics = ICS_BASE(obj);
>> +    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
>> +                                     xive->nr_irqs);
>>  
>>      /* Allocate the last IRQ numbers for the IPIs */
>>      for (i = xive->nr_irqs - xive->nr_targets; i < xive->nr_irqs; i++) {
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 29112589b37f..eab92c4c1bb8 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -38,6 +38,7 @@ struct sPAPRXive {
>>  
>>      /* IRQ */
>>      ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
>> +    qemu_irq     *qirqs;
>>  
>>      /* XIVE internal tables */
>>      uint8_t      *sbe;
> 

* Re: [Qemu-devel] [RFC PATCH v2 09/21] ppc/xive: extend the interrupt presenter model for XIVE
  2017-09-19  7:36   ` David Gibson
@ 2017-09-19 19:28     ` Cédric Le Goater
  2017-09-22 10:58       ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-19 19:28 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 09:36 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:23PM +0200, Cédric Le Goater wrote:
>> The XIVE interrupt presenter exposes a set of Thread Interrupt
>> Management Areas, also called rings, one per different level of
>> privilege (four in all). This area is used to handle priority
>> management and interrupt acknowledgment among other things.
>>
>> We extend the ICPState object with a cache of the register data for
>> XIVE. The integration with the sPAPR machine is much easier and we
>> need a common framework to switch from one controller model to
>> another: XICS <-> XIVE.
> 
> This sounds like an even worse idea than referencing the ICS state.

ok ok.

> The TIMA really needs to be managed by a different object than the ICP.

Like an array under the machine, indexed by the CPU index?

At some point, we will need to:

    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
    ICPState *icp = ICP(cpu->intc);

and 

    icp = xics_icp_get(xive->ics->xics, target);


Isn't the cpu->intc pointer the best option to hold that information?
And it is migrated.
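For comparison, a sketch of the alternative David suggests: a small per-CPU XIVE presenter object that owns the TIMA register cache, using the same layout as this patch (four rings of 16 bytes, with the OS ring at offset 0x10). Nothing in this layout would prevent reaching it through a per-CPU pointer such as cpu->intc. The XivePresenter type and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XIVE_TM_RING_COUNT  4
#define XIVE_TM_RING_SIZE   0x10
#define XIVE_TM_OS_RING     1   /* rings: 0 user, 1 OS, 2 HV pool, 3 HV phys */

/* Hypothetical per-CPU XIVE presenter state, separate from ICPState */
typedef struct {
    uint32_t cpu_index;
    uint8_t  tima[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
} XivePresenter;

/* Return the 16-byte register window of one ring */
static uint8_t *xive_presenter_ring(XivePresenter *p, unsigned ring)
{
    assert(ring < XIVE_TM_RING_COUNT);
    return &p->tima[ring * XIVE_TM_RING_SIZE];
}

static void xive_presenter_reset(XivePresenter *p)
{
    memset(p->tima, 0, sizeof(p->tima));
}
```

Keeping the 0x40-byte TIMA in its own object would also give it its own VMState section, instead of adding a field to vmstate_icp_server.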

C. 


>> The next patch will introduce the MMIO handlers to interact with the
>> TIMA, OS only, which is required for the sPAPR support.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xics.c        | 4 ++++
>>  include/hw/ppc/xics.h | 6 ++++++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index a84ba51ad8ff..927d4fec966a 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -274,6 +274,7 @@ static const VMStateDescription vmstate_icp_server = {
>>          VMSTATE_UINT32(xirr, ICPState),
>>          VMSTATE_UINT8(pending_priority, ICPState),
>>          VMSTATE_UINT8(mfrr, ICPState),
>> +        VMSTATE_UINT8_ARRAY(tima, ICPState, 0x40),
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> @@ -293,6 +294,7 @@ static void icp_reset(void *dev)
>>      if (icpc->reset) {
>>          icpc->reset(icp);
>>      }
>> +    memset(icp->tima, 0, sizeof(icp->tima));
>>  }
>>  
>>  static void icp_realize(DeviceState *dev, Error **errp)
>> @@ -343,6 +345,8 @@ static void icp_realize(DeviceState *dev, Error **errp)
>>          icpc->realize(icp, errp);
>>      }
>>  
>> +    icp->tima_os = &icp->tima[0x10];
>> +
>>      qemu_register_reset(icp_reset, dev);
>>      vmstate_register(NULL, icp->cs->cpu_index, &vmstate_icp_server, icp);
>>  }
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index 28d248abad61..c835997303c4 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -83,6 +83,12 @@ struct ICPState {
>>      qemu_irq output;
>>  
>>      XICSFabric *xics;
>> +
>> +    /* XIVE section */
>> +#define XIVE_TM_RING_COUNT 4
>> +
>> +    uint8_t tima[XIVE_TM_RING_COUNT * 0x10];
>> +    uint8_t *tima_os;
>>  };
>>  
>>  #define ICP_PROP_XICS "xics"
> 

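David's objection is that the TIMA register data should not live inside ICPState. The sketch below shows the alternative Cédric mentions (a table under the machine, indexed by CPU index), using hypothetical names (`XiveTCTX`, `XiveTCTXTable`, `xive_tctx_get()`) that are not part of this patchset or of QEMU. It keeps the patch's layout of four 16-byte rings with the OS ring at offset 0x10; naming the rings QW0 to QW3 (user, OS, pool, physical) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed ring layout: QW0 user, QW1 OS, QW2 pool, QW3 physical. */
#define TM_RING_SIZE   0x10
#define TM_RING_COUNT  4
#define TM_QW1_OS      0x10   /* offset of the OS ring, as in tima[0x10] */

/* Hypothetical per-CPU thread context, kept separate from ICPState. */
typedef struct XiveTCTX {
    uint8_t regs[TM_RING_COUNT * TM_RING_SIZE];  /* 0x40 bytes */
} XiveTCTX;

/* Hypothetical machine-level table, indexed by CPU index. */
typedef struct XiveTCTXTable {
    XiveTCTX *tctx;
    int nr_cpus;
} XiveTCTXTable;

/* Clear every ring, as icp_reset() does for icp->tima in the patch. */
static void xive_tctx_reset(XiveTCTX *tctx)
{
    memset(tctx->regs, 0, sizeof(tctx->regs));
}

/* Pointer to the first byte of the ring starting at ring_offset. */
static uint8_t *xive_tctx_ring(XiveTCTX *tctx, unsigned ring_offset)
{
    return &tctx->regs[ring_offset];
}

/* Look up the context of a CPU; NULL for an out-of-range index. */
static XiveTCTX *xive_tctx_get(XiveTCTXTable *t, int cpu_index)
{
    if (cpu_index < 0 || cpu_index >= t->nr_cpus) {
        return NULL;
    }
    return &t->tctx[cpu_index];
}
```

Whether the lookup goes through such a table or through `cpu->intc` is the open question in this exchange; the sketch only shows that the register file itself does not have to be a member of ICPState.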

* Re: [Qemu-devel] [RFC PATCH v2 11/21] ppc/xive: push the EQ data in OS event queue
  2017-09-19  7:45   ` David Gibson
@ 2017-09-19 19:36     ` Cédric Le Goater
  2017-09-20  6:34       ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-19 19:36 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 09:45 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:25PM +0200, Cédric Le Goater wrote:
>> If a triggered event is let through, the Event Queue data defined in
>> the associated IVE is pushed into the in-memory event queue. The latter
>> is a circular buffer provided by the OS using the H_INT_SET_QUEUE_CONFIG
>> hcall, one per target and priority pair. It is composed of Event
>> Queue entries which are 4 bytes long, the first bit being a
>> 'generation' bit and the 31 following bits the EQ Data field.
>>
>> The EQ Data field provides a way to set an invariant logical event
>> source number for an IRQ. It is set with the H_INT_SET_SOURCE_CONFIG
>> hcall.
>>
>> Notification of the CPU will be done in the following patch.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 67 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 557a7e2535b5..4bc61cfda67a 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -175,9 +175,76 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
>>      },
>>  };
>>  
>> +static void spapr_xive_eq_push(XiveEQ *eq, uint32_t data)
>> +{
>> +    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
>> +    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
>> +    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
>> +    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
>> +
>> +    uint64_t qaddr = qaddr_base + (qindex << 2);
>> +    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
>> +    uint32_t qentries = 1 << (qsize + 10);
>> +
>> +    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
>> +                      HWADDR_PRIx "\n", __func__, qaddr);
>> +        return;
>> +    }
>> +
>> +    qindex = (qindex + 1) % qentries;
>> +    if (qindex == 0) {
>> +        qgen ^= 1;
>> +        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
>> +    }
>> +    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
>> +}
>> +
>>  static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>>  {
>> +    XiveIVE *ive;
>> +    XiveEQ *eq;
>> +    uint32_t eq_idx;
>> +    uint32_t priority;
>> +
>> +    ive = spapr_xive_get_ive(xive, srcno);
>> +    if (!ive || !(ive->w & IVE_VALID)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
>> +        return;
>> +    }
>> +
>> +    if (ive->w & IVE_MASKED) {
>> +        return;
>> +    }
>> +
>> +    /* Find our XiveEQ */
>> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
>> +    eq = spapr_xive_get_eq(xive, eq_idx);
>> +    if (!eq) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No EQ for LISN %d\n", srcno);
>> +        return;
>> +    }
>> +
>> +    if (eq->w0 & EQ_W0_ENQUEUE) {
>> +        spapr_xive_eq_push(eq, GETFIELD(IVE_EQ_DATA, ive->w));
>> +    } else {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: !ENQUEUE not implemented\n");
>> +    }
>> +
>> +    if (!(eq->w0 & EQ_W0_UCOND_NOTIFY)) {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
>> +    }
>> +
>> +    if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
>> +        priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
>>  
>> +        /* The EQ is masked. Can this happen ?  */
>> +        if (priority == 0xff) {
>> +            return;
> 
> How does the 8-bit priority field here interact with the 3-bit
> priority which selects which EQ to use?

Priority 0xFF is a special case reserved for masking; see the hcall
h_int_set_source_config. It should never reach the EQ lookup
routines, so maybe an assert would be better here.

C. 

> 
>> +        }
>> +    } else {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
>> +    }
>>  }
>>  
>>  /*
> 

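The generation bit is what lets the OS tell fresh entries from stale ones without sharing a consumer index with the producer. Below is a simplified model of the producer in spapr_xive_eq_push() together with a matching consumer. It assumes host-endian entries instead of big-endian guest memory, a producer generation that starts at 1 over zeroed queue memory, and a consumer that treats an entry as stale when its generation bit equals its own toggle; the consumer side and all names (`EventQueue`, `EqReader`, `eq_pop()`) are hypothetical. Note also that `qentries = 1 << (qsize + 10)` in the patch describes a queue of 2^(qsize+12) bytes, since each entry is 4 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified EQ: bit 31 is the generation bit, bits 0-30 the EQ data. */
typedef struct EventQueue {
    uint32_t *entries;    /* queue memory provided by the "OS" */
    uint32_t nr_entries;  /* power of two */
    uint32_t index;       /* producer index (EQ_W1_PAGE_OFF) */
    uint32_t gen;         /* producer generation (EQ_W1_GENERATION) */
} EventQueue;

/* Producer: same steps as spapr_xive_eq_push(), minus the DMA write. */
static void eq_push(EventQueue *eq, uint32_t data)
{
    eq->entries[eq->index] = (eq->gen << 31) | (data & 0x7fffffff);
    eq->index = (eq->index + 1) % eq->nr_entries;
    if (eq->index == 0) {
        eq->gen ^= 1;  /* flip the generation on wrap-around */
    }
}

/* Hypothetical consumer state kept by the OS. */
typedef struct EqReader {
    uint32_t index;
    uint32_t toggle;  /* entries whose generation equals this are stale */
} EqReader;

/* Returns true and stores the EQ data if a new entry is available. */
static bool eq_pop(const EventQueue *eq, EqReader *r, uint32_t *data)
{
    uint32_t cur = eq->entries[r->index];

    if ((cur >> 31) == r->toggle) {
        return false;  /* stale entry: the queue is empty */
    }
    *data = cur & 0x7fffffff;
    r->index = (r->index + 1) % eq->nr_entries;
    if (r->index == 0) {
        r->toggle ^= 1;  /* follow the producer's flip */
    }
    return true;
}
```

Because both sides flip their bit at the same wrap point, an entry left over from the previous lap always carries the "old" generation and is ignored without any extra bookkeeping.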

* Re: [Qemu-devel] [RFC PATCH v2 03/21] ppc/xive: define the XIVE internal tables
  2017-09-19 13:46     ` Cédric Le Goater
@ 2017-09-20  4:33       ` David Gibson
  0 siblings, 0 replies; 90+ messages in thread
From: David Gibson @ 2017-09-20  4:33 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Tue, Sep 19, 2017 at 03:46:20PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 04:39 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:17PM +0200, Cédric Le Goater wrote:
> >> The XIVE interrupt controller of the POWER9 uses a set of tables to
> >> redirect exceptions from event sources to CPU threads. Among them, we
> >> chose to model:
> >>
> >>  - the State Bit Entries (SBE), also known as Event State Buffer
> >>    (ESB). This is a two bit state machine for each event source which
> >>    is used to trigger events. The bits are named "P" (pending) and "Q"
> >>    (queued) and can be controlled by MMIO.
> >>
> >>  - the Interrupt Virtualization Entry (IVE) table, also known as Event
> >>    Assignment Structure (EAS). This table is indexed by the IRQ number
> >>    and is looked up to find the Event Queue associated with a
> >>    triggered event.
> > 
> > Both the above are one entry per irq source, yes?  What's the
> > rationale for having them as parallel tables, rather than bits in a
> > single per-source structure?
> 
> For the sPAPR machines, yes, we could use a struct to hold both
> pieces of information. But these tables are defined in the HW specs
> and are used as such by the PowerNV platform in skiboot. They are
> registered by the firmware for the use of the XIVE interrupt
> controller.
> 
> When we model XIVE for PowerNV, it would be preferable to have 
> common definitions for these tables I think.

Ok, that seems like a reasonable case.

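For reference, the two-bit P/Q state machine of the SBE/ESB described above can be sketched as follows. The encodings (P as bit 1, Q as bit 0), the packing of four sources per byte of the `sbe` table, and all helper names are assumptions for illustration, not taken from this patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PQ encodings: P is bit 1, Q is bit 0. */
#define ESB_RESET   0x0  /* P=0 Q=0: idle */
#define ESB_OFF     0x1  /* P=0 Q=1: source masked */
#define ESB_PENDING 0x2  /* P=1 Q=0: one event forwarded */
#define ESB_QUEUED  0x3  /* P=1 Q=1: another event arrived meanwhile */

/* Two bits per source, four sources per byte of the 'sbe' table. */
static uint8_t sbe_get(const uint8_t *sbe, uint32_t srcno)
{
    return (sbe[srcno / 4] >> ((srcno % 4) * 2)) & 0x3;
}

static void sbe_set(uint8_t *sbe, uint32_t srcno, uint8_t pq)
{
    unsigned shift = (srcno % 4) * 2;

    sbe[srcno / 4] = (uint8_t)((sbe[srcno / 4] & ~(0x3u << shift)) |
                               ((pq & 0x3u) << shift));
}

/* Trigger: returns true if the event must go on to the IVE/EQ lookup. */
static bool esb_trigger(uint8_t *sbe, uint32_t srcno)
{
    switch (sbe_get(sbe, srcno)) {
    case ESB_RESET:
        sbe_set(sbe, srcno, ESB_PENDING);
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        sbe_set(sbe, srcno, ESB_QUEUED);  /* coalesce into Q */
        return false;
    default:                              /* ESB_OFF: drop */
        return false;
    }
}

/* EOI: returns true if a queued event must be forwarded again. */
static bool esb_eoi(uint8_t *sbe, uint32_t srcno)
{
    switch (sbe_get(sbe, srcno)) {
    case ESB_QUEUED:
        sbe_set(sbe, srcno, ESB_PENDING);
        return true;
    case ESB_RESET:
    case ESB_PENDING:
        sbe_set(sbe, srcno, ESB_RESET);
        return false;
    default:                              /* ESB_OFF stays masked */
        return false;
    }
}
```

In this model a trigger only reaches the IVE lookup on the 00 to 10 transition; further triggers are coalesced into Q until the EOI replays one of them.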
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v2 05/21] ppc/xive: allocate IRQ numbers for the IPIs
  2017-09-19 14:52     ` Cédric Le Goater
@ 2017-09-20  4:35       ` David Gibson
  0 siblings, 0 replies; 90+ messages in thread
From: David Gibson @ 2017-09-20  4:35 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Tue, Sep 19, 2017 at 04:52:10PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 04:45 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:19PM +0200, Cédric Le Goater wrote:
> >> The number of IPIs is deduced from the max number of CPUs the guest
> >> supports and the IRQ numbers for the IPIs are allocated from the top
> >> of the IRQ number space to reduce conflicts with other IRQ numbers
> >> allocated by the devices.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > 
> > This is more ick associated with implementing XIVE in terms of XICS.
> > We shouldn't need to "allocate" IRQs for the IPIs - they should just
> > be a fixed set.  
> 
> They are allocated at the very beginning, so we can consider them
> fixed, I suppose.
> 
> > And we certainly shouldn't need to set the XICS irq type for XIVE irqs.
> 
> This is because, in this patchset, XIVE and XICS use the same IRQ
> allocator, which happens to be the ICSIRQState array of XICS. Yes,
> this is ugly, but we are identifying the different constraints.

Yeah, as I said in the other mail, I think trying to support both
immediately is making a mess of the XIVE design.  Let's get it working
as a machine option first, then worry about CAS and migration.

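David's suggestion that the IPIs be a fixed set rather than allocated reduces to arithmetic over the layout the patch already uses: the last `nr_targets` IRQ numbers, as in the loop in spapr_xive_realize(). A sketch with hypothetical helper names:

```c
/* First IRQ number of the IPI block: the top nr_targets numbers. */
static int ipi_irq_base(int nr_irqs, int nr_targets)
{
    return nr_irqs - nr_targets;
}

/* IRQ number of the IPI for a given target, or -1 if out of range. */
static int ipi_irq_for_target(int nr_irqs, int nr_targets, int target)
{
    if (target < 0 || target >= nr_targets) {
        return -1;
    }
    return ipi_irq_base(nr_irqs, nr_targets) + target;
}

/* True if irq falls inside the fixed IPI block. */
static int irq_is_ipi(int nr_irqs, int nr_targets, int irq)
{
    return irq >= ipi_irq_base(nr_irqs, nr_targets) && irq < nr_irqs;
}
```

With the space sized as XICS_IRQS_SPAPR + xics_max_server_number(), as in patch 16/21, the IPI block sits right after the device IRQ numbers and needs no allocator state at all.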

* Re: [Qemu-devel] [RFC PATCH v2 06/21] ppc/xive: introduce handlers for interrupt sources
  2017-09-19 15:08     ` Cédric Le Goater
@ 2017-09-20  4:38       ` David Gibson
  2017-09-21 14:11         ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-20  4:38 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Tue, Sep 19, 2017 at 05:08:21PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 04:48 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:20PM +0200, Cédric Le Goater wrote:
> >> These are very similar to the XICS handlers in a simpler form. They
> >> make use of the ICSIRQState array of the XICS interrupt source to
> >> differentiate the MSI from the LSI interrupts. The spapr_xive_irq()
> >> routine in charge of triggering the CPU interrupt line will be filled
> >> later on.
> >>
> >> The next patch will introduce the MMIO handlers to interact with XIVE
> >> interrupt sources.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr_xive.h |  1 +
> >>  2 files changed, 47 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 52c32f588d6d..1ed7b6a286e9 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -27,6 +27,50 @@
> >>  
> >>  #include "xive-internal.h"
> >>  
> >> +static void spapr_xive_irq(sPAPRXive *xive, int srcno)
> >> +{
> >> +
> >> +}
> >> +
> >> +/*
> >> + * XIVE Interrupt Source
> >> + */
> >> +static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
> >> +{
> >> +    if (val) {
> >> +        spapr_xive_irq(xive, srcno);
> >> +    }
> >> +}
> > 
> > So in XICS "srcno" (vs "irq") indicates an offset within a single ICS
> > object, as opposed to a global irq number.  Does that concept even
> > exist in XIVE?
> 
> We don't really care in the internals. 'srcno' is just an index in the
> tables; maybe I should change the name. It could be the same in XICS,
> but the xirr is manipulated at a low level, so we need to propagate
> the source offset in a couple of places.

Right.  My point is that the XICS code deliberately uses srcno vs. irq
names to identify which space we're talking about.  If we re-use the
srcno name in XIVE where it doesn't really apply that could be
misleading.

> This is to say that the 'irq' number is guest-level information which,
> in the patchset, should only be used at the hcall level to identify
> a source.

Right, and if there's no need to introduce a number space other than
the guest one, we should keep using that everywhere - and give it a
consistent name to avoid confusion.

>  
> >> +
> >> +static void spapr_xive_source_set_irq_lsi(sPAPRXive *xive, int srcno, int val)
> >> +{
> >> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
> >> +
> >> +    if (val) {
> >> +        irq->status |= XICS_STATUS_ASSERTED;
> >> +    } else {
> >> +        irq->status &= ~XICS_STATUS_ASSERTED;
> > 
> > More mangling a XICS specific object for XIVE operations.  Please
> > stop.
> 
> Ah! We will still need the same information, and that means introducing
> a common source object. The patchset today just uses the XICS ICSIRQState
> array as a common object.

It's not really the same information though.  For XICS irq->status is
*all* the information about the line's state, for XIVE, most of that
info is in the PQ bits which are elsewhere.  That makes at least some
of the information in ICSIRQState redundant, and therefore confusing
and misleading.

> >> +    }
> >> +
> >> +    if (irq->status & XICS_STATUS_ASSERTED
> >> +        && !(irq->status & XICS_STATUS_SENT)) {
> >> +        irq->status |= XICS_STATUS_SENT;
> >> +        spapr_xive_irq(xive, srcno);
> >> +    }
> >> +}
> >> +
> >> +static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> >> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
> >> +
> >> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
> >> +        spapr_xive_source_set_irq_lsi(xive, srcno, val);
> >> +    } else {
> >> +        spapr_xive_source_set_irq_msi(xive, srcno, val);
> >> +    }
> >> +}
> >> +
> >>  /*
> >>   * Main XIVE object
> >>   */
> >> @@ -80,6 +124,8 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >>      }
> >>  
> >>      xive->ics = ICS_BASE(obj);
> >> +    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
> >> +                                     xive->nr_irqs);
> >>  
> >>      /* Allocate the last IRQ numbers for the IPIs */
> >>      for (i = xive->nr_irqs - xive->nr_targets; i < xive->nr_irqs; i++) {
> >> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> >> index 29112589b37f..eab92c4c1bb8 100644
> >> --- a/include/hw/ppc/spapr_xive.h
> >> +++ b/include/hw/ppc/spapr_xive.h
> >> @@ -38,6 +38,7 @@ struct sPAPRXive {
> >>  
> >>      /* IRQ */
> >>      ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
> >> +    qemu_irq     *qirqs;
> >>  
> >>      /* XIVE internal tables */
> >>      uint8_t      *sbe;
> > 
> 

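David's objection is that the source handlers borrow ICSIRQState. The same MSI/LSI logic can be written over a source-owned state array, sketched below with hypothetical names (`XiveSource`, `SRC_FLAG_LSI`, ...) and an assumed EOI behaviour (clear SENT and resend while an LSI is still asserted) that this patch does not implement. In a full XIVE model the PQ bits would carry most of this state, as David notes; the sketch only shows that the handlers do not need the XICS structure.

```c
#include <stdint.h>

#define SRC_FLAG_LSI         0x1  /* level-sensitive source */
#define SRC_STATUS_ASSERTED  0x1  /* LSI line currently held high */
#define SRC_STATUS_SENT      0x2  /* event forwarded, awaiting EOI */

/* Hypothetical XIVE-owned source state, no ICSIRQState involved. */
typedef struct XiveSource {
    uint8_t *flags;                            /* per-source config */
    uint8_t *status;                           /* per-source LSI state */
    void (*notify)(void *opaque, int srcno);   /* stands in for spapr_xive_irq() */
    void *opaque;
} XiveSource;

static void xive_source_set_irq(XiveSource *s, int srcno, int val)
{
    if (!(s->flags[srcno] & SRC_FLAG_LSI)) {
        if (val) {
            s->notify(s->opaque, srcno);       /* MSI: each raise is an event */
        }
        return;
    }
    if (val) {
        s->status[srcno] |= SRC_STATUS_ASSERTED;
    } else {
        s->status[srcno] &= (uint8_t)~SRC_STATUS_ASSERTED;
    }
    if ((s->status[srcno] & SRC_STATUS_ASSERTED) &&
        !(s->status[srcno] & SRC_STATUS_SENT)) {
        s->status[srcno] |= SRC_STATUS_SENT;   /* LSI: one event per level */
        s->notify(s->opaque, srcno);
    }
}

/* Assumed EOI: clear SENT, resend if an LSI is still asserted. */
static void xive_source_eoi(XiveSource *s, int srcno)
{
    s->status[srcno] &= (uint8_t)~SRC_STATUS_SENT;
    if ((s->flags[srcno] & SRC_FLAG_LSI) &&
        (s->status[srcno] & SRC_STATUS_ASSERTED)) {
        s->status[srcno] |= SRC_STATUS_SENT;
        s->notify(s->opaque, srcno);
    }
}
```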

* Re: [Qemu-devel] [RFC PATCH v2 11/21] ppc/xive: push the EQ data in OS event queue
  2017-09-19 19:36     ` Cédric Le Goater
@ 2017-09-20  6:34       ` David Gibson
  2017-09-28  8:12         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-20  6:34 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Tue, Sep 19, 2017 at 09:36:08PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 09:45 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:25PM +0200, Cédric Le Goater wrote:
> >> If a triggered event is let through, the Event Queue data defined in
> >> the associated IVE is pushed into the in-memory event queue. The latter
> >> is a circular buffer provided by the OS using the H_INT_SET_QUEUE_CONFIG
> >> hcall, one per target and priority pair. It is composed of Event
> >> Queue entries which are 4 bytes long, the first bit being a
> >> 'generation' bit and the 31 following bits the EQ Data field.
> >>
> >> The EQ Data field provides a way to set an invariant logical event
> >> source number for an IRQ. It is set with the H_INT_SET_SOURCE_CONFIG
> >> hcall.
> >>
> >> Notification of the CPU will be done in the following patch.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 67 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 557a7e2535b5..4bc61cfda67a 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -175,9 +175,76 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
> >>      },
> >>  };
> >>  
> >> +static void spapr_xive_eq_push(XiveEQ *eq, uint32_t data)
> >> +{
> >> +    uint64_t qaddr_base = (((uint64_t)(eq->w2 & 0x0fffffff)) << 32) | eq->w3;
> >> +    uint32_t qsize = GETFIELD(EQ_W0_QSIZE, eq->w0);
> >> +    uint32_t qindex = GETFIELD(EQ_W1_PAGE_OFF, eq->w1);
> >> +    uint32_t qgen = GETFIELD(EQ_W1_GENERATION, eq->w1);
> >> +
> >> +    uint64_t qaddr = qaddr_base + (qindex << 2);
> >> +    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
> >> +    uint32_t qentries = 1 << (qsize + 10);
> >> +
> >> +    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to write EQ data @0x%"
> >> +                      HWADDR_PRIx "\n", __func__, qaddr);
> >> +        return;
> >> +    }
> >> +
> >> +    qindex = (qindex + 1) % qentries;
> >> +    if (qindex == 0) {
> >> +        qgen ^= 1;
> >> +        eq->w1 = SETFIELD(EQ_W1_GENERATION, eq->w1, qgen);
> >> +    }
> >> +    eq->w1 = SETFIELD(EQ_W1_PAGE_OFF, eq->w1, qindex);
> >> +}
> >> +
> >>  static void spapr_xive_irq(sPAPRXive *xive, int srcno)
> >>  {
> >> +    XiveIVE *ive;
> >> +    XiveEQ *eq;
> >> +    uint32_t eq_idx;
> >> +    uint32_t priority;
> >> +
> >> +    ive = spapr_xive_get_ive(xive, srcno);
> >> +    if (!ive || !(ive->w & IVE_VALID)) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> >> +        return;
> >> +    }
> >> +
> >> +    if (ive->w & IVE_MASKED) {
> >> +        return;
> >> +    }
> >> +
> >> +    /* Find our XiveEQ */
> >> +    eq_idx = GETFIELD(IVE_EQ_INDEX, ive->w);
> >> +    eq = spapr_xive_get_eq(xive, eq_idx);
> >> +    if (!eq) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No EQ for LISN %d\n", srcno);
> >> +        return;
> >> +    }
> >> +
> >> +    if (eq->w0 & EQ_W0_ENQUEUE) {
> >> +        spapr_xive_eq_push(eq, GETFIELD(IVE_EQ_DATA, ive->w));
> >> +    } else {
> >> +        qemu_log_mask(LOG_UNIMP, "XIVE: !ENQUEUE not implemented\n");
> >> +    }
> >> +
> >> +    if (!(eq->w0 & EQ_W0_UCOND_NOTIFY)) {
> >> +        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
> >> +    }
> >> +
> >> +    if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
> >> +        priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> >>  
> >> +        /* The EQ is masked. Can this happen ?  */
> >> +        if (priority == 0xff) {
> >> +            return;
> > 
> > How does the 8-bit priority field here interact with the 3-bit
> > priority which selects which EQ to use?
> 
> Priority 0xFF is a special case reserved for masking; see the hcall
> h_int_set_source_config. It should never reach the EQ lookup
> routines, so maybe an assert would be better here.

Ok, if this situation can't be guest triggered, only by a bug in the
rest of the XIVE code, then an assert() is better.

> 
> C. 
> 
> > 
> >> +        }
> >> +    } else {
> >> +        qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
> >> +    }
> >>  }
> >>  
> >>  /*
> > 
> 

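The distinction David draws (conditions a guest can trigger are logged, internal invariants are asserted) can be sketched as below. Mapping the 3-bit priority to one of eight EQs per target as `(target << 3) | priority` is a hypothetical layout chosen for illustration, and `fprintf()` stands in for `qemu_log_mask(LOG_GUEST_ERROR, ...)`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define XIVE_PRIORITY_MAX     7     /* 3-bit priority: eight EQs per target */
#define XIVE_PRIORITY_MASKED  0xff  /* filtered out by the hcall layer */

/* Hypothetical EQ index for (target, priority), or -1 on guest error. */
static int eq_index(uint32_t target, uint32_t nr_targets, uint8_t priority)
{
    /* Only a bug elsewhere in the model can get here with 0xff. */
    assert(priority != XIVE_PRIORITY_MASKED);

    /* Values that may come from the guest are validated and logged. */
    if (target >= nr_targets || priority > XIVE_PRIORITY_MAX) {
        fprintf(stderr, "XIVE: invalid target %u / priority %u\n",
                (unsigned)target, (unsigned)priority);
        return -1;
    }
    return (int)((target << 3) | priority);
}
```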

* Re: [Qemu-devel] [RFC PATCH v2 13/21] ppc/xive: handle interrupt acknowledgment by the O/S
  2017-09-19  7:53   ` David Gibson
@ 2017-09-20  9:40     ` Cédric Le Goater
  2017-09-28  8:14       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-20  9:40 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 09:53 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:27PM +0200, Cédric Le Goater wrote:
>> When an O/S Exception is raised, the O/S acknowledges the interrupt
>> with a special read in the TIMA. If the EO bit of the Notification
>> Source Register (NSR) is set (as it should be), the Current Processor
>> Priority Register (CPPR) takes the value of the Pending Interrupt
>> Priority Register (PIPR), which contains the priority of the most
>> favored pending notification. The bit corresponding to the
>> priority of the pending interrupt is cleared in the Interrupt Pending
>> Buffer (IPB), and so is the EO bit of the NSR.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c | 19 ++++++++++++++++++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index e5d4b723b7e0..ad3ff91b13ea 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -50,7 +50,24 @@ static uint8_t ipb_to_pipr(uint8_t ibp)
>>  
>>  static uint64_t spapr_xive_icp_accept(ICPState *icp)
>>  {
>> -    return 0;
>> +    uint8_t nsr = icp->tima_os[TM_NSR];
>> +
>> +    qemu_irq_lower(icp->output);
> 
> Ah, here's the lower.  This should not be in a different patch from
> the matching raise.  

OK, I can merge these.

> Plus, this doesn't seem right.  Shouldn't this
> recheck the CPPR against the PIPR, in case a higher priority irq has
> been delivered since the one the cpu is acking.

If a higher priority interrupt is delivered, it means that the CPPR was
more privileged and that we now have two bits set in the IPB by the time
the interrupt is acked. The higher priority PIPR will become the new
CPPR, and the IPB will be modified, keeping only the lower priority bit.

If the CPPR is then modified to the lower priority level, the
first interrupt will be delivered again.

I think this is fine.

C.


> 
>> +    if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
>> +        uint8_t cppr = icp->tima_os[TM_PIPR];
>> +
>> +        icp->tima_os[TM_CPPR] = cppr;
>> +
>> +        /* Reset the pending buffer bit */
>> +        icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
>> +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
>> +
>> +        /* Drop Exception bit for OS */
>> +        icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
>> +    }
>> +
>> +    return (nsr << 8) | icp->tima_os[TM_CPPR];
>>  }
>>  
>>  static void spapr_xive_icp_notify(ICPState *icp)
> 

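Cédric's argument can be checked on a plain byte array. The sketch repeats the steps of spapr_xive_icp_accept() with assumed register offsets within the OS ring (NSR 0x0, CPPR 0x1, IPB 0x2, PIPR 0x7, EO bit 0x80) and assumed definitions of priority_to_ipb() and ipb_to_pipr() in which IPB bit 7 stands for priority 0; it is not the patch's code.

```c
#include <stdint.h>

/* Assumed OS-ring byte offsets and NSR exception bit. */
#define TM_NSR            0x0
#define TM_CPPR           0x1
#define TM_IPB            0x2
#define TM_PIPR           0x7
#define TM_QW1_NSR_EO     0x80
#define XIVE_PRIORITY_MAX 7

/* IPB bit 7 (MSB) is priority 0, bit 0 is priority 7. */
static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > XIVE_PRIORITY_MAX ?
        0 : (uint8_t)(1u << (XIVE_PRIORITY_MAX - priority));
}

/* Most favored pending priority, or 0xff if nothing is pending. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    for (uint8_t prio = 0; prio <= XIVE_PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/* Same steps as spapr_xive_icp_accept(), on a 16-byte OS ring. */
static uint16_t os_accept(uint8_t *ring)
{
    uint8_t nsr = ring[TM_NSR];

    if (nsr & TM_QW1_NSR_EO) {
        uint8_t cppr = ring[TM_PIPR];

        ring[TM_CPPR] = cppr;                              /* take PIPR */
        ring[TM_IPB] &= (uint8_t)~priority_to_ipb(cppr);   /* ack that bit */
        ring[TM_PIPR] = ipb_to_pipr(ring[TM_IPB]);         /* next pending */
        ring[TM_NSR] &= (uint8_t)~TM_QW1_NSR_EO;           /* drop EO */
    }
    return (uint16_t)((nsr << 8) | ring[TM_CPPR]);
}
```

With priorities 3 and 5 both pending, acking takes priority 3 and leaves priority 5 in the IPB and PIPR, which is the situation described above; whether the output line should then be re-raised is the point David questions.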

* Re: [Qemu-devel] [RFC PATCH v2 14/21] ppc/xive: add support for the SET_OS_PENDING command
  2017-09-19  7:55   ` David Gibson
@ 2017-09-20  9:47     ` Cédric Le Goater
  2017-09-28  8:18       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-20  9:47 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 09:55 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:28PM +0200, Cédric Le Goater wrote:
>> Adjusting the Interrupt Pending Buffer for the O/S would allow a CPU
>> to process event queues of other priorities during one physical
>> interrupt cycle. This is not currently used by the XIVE support for
>> sPAPR in Linux, but it is used by the hypervisor.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index ad3ff91b13ea..ad3f03e37401 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -162,7 +162,14 @@ static bool spapr_xive_tm_is_readonly(uint8_t index)
>>  static void spapr_xive_tm_write_special(ICPState *icp, hwaddr offset,
>>                                    uint64_t value, unsigned size)
>>  {
>> -    /* TODO: support TM_SPC_SET_OS_PENDING */
>> +    if (offset == TM_SPC_SET_OS_PENDING && size == 1) {
>> +        icp->tima_os[TM_IPB] |= priority_to_ipb(value & 0xff);
>> +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> 
> This only lets the cpu raise bits in the IPB, never clear them.
> Is that right?

The clear is done when the OS acks the interrupt.

> I don't see how you'd implement the handling of multiple
> priorities without being able to clear bits here.

I am not sure how this command should be used from the OS. 
Currently, I only see KVM handling it in the XICS/XIVE glue.
I need to take a closer look.

C.
 


>> +        spapr_xive_icp_notify(icp);
>> +    } else {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA write @%"
>> +                      HWADDR_PRIx" size %d\n", offset, size);
>> +    }
>>  
>>      /* TODO: support TM_SPC_ACK_OS_EL */
>>  }
> 

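Reduced to its effect on the OS ring, the SET_OS_PENDING branch makes David's observation concrete: it can only OR bits into the IPB, and clearing happens when the OS acknowledges. The sketch assumes IPB at byte 0x2 and PIPR at byte 0x7 of the OS ring and an IPB bit order where bit 7 stands for priority 0; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed OS-ring byte offsets. */
#define TM_IPB            0x2
#define TM_PIPR           0x7
#define XIVE_PRIORITY_MAX 7

/* IPB bit 7 (MSB) is priority 0; out-of-range priorities map to no bit. */
static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > XIVE_PRIORITY_MAX ?
        0 : (uint8_t)(1u << (XIVE_PRIORITY_MAX - priority));
}

/* Most favored pending priority, or 0xff if nothing is pending. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    for (uint8_t prio = 0; prio <= XIVE_PRIORITY_MAX; prio++) {
        if (ipb & priority_to_ipb(prio)) {
            return prio;
        }
    }
    return 0xff;
}

/* Mirrors the TM_SPC_SET_OS_PENDING branch: set a bit, recompute PIPR. */
static void os_set_pending(uint8_t *ring, uint8_t priority)
{
    ring[TM_IPB] |= priority_to_ipb(priority);   /* never clears a bit */
    ring[TM_PIPR] = ipb_to_pipr(ring[TM_IPB]);
}
```

Marking a less favored priority pending leaves the PIPR on the most favored one, so the extra bit only matters once the current priority has been acked.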

* Re: [Qemu-devel] [RFC PATCH v2 15/21] spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument
  2017-09-19  7:56   ` David Gibson
@ 2017-09-20  9:49     ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-20  9:49 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 09:56 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:29PM +0200, Cédric Le Goater wrote:
>> This adds some flexibility in the definition of the number of
>> available IRQs used in a sPAPR machine.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> This doesn't seem sensible.  You've already stated that the XIVE and
> XICS need equivalent irq number spaces, so in particular they should
> have the same number of irqs advertised.

Yes. I am adding an argument to spapr_populate_pci_dt()
because it is using the XICS_IRQS_SPAPR define directly 
and I intend to extend that number of irqs with the 
number of cpus to have room for IPIs.

C. 

>> ---
>>  hw/ppc/spapr.c              | 2 +-
>>  hw/ppc/spapr_pci.c          | 4 ++--
>>  include/hw/pci-host/spapr.h | 2 +-
>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 3e3ff1fbc988..5d69df928434 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1093,7 +1093,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>>      }
>>  
>>      QLIST_FOREACH(phb, &spapr->phbs, list) {
>> -        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
>> +        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, XICS_IRQS_SPAPR);
>>          if (ret < 0) {
>>              error_report("couldn't setup PCI devices in fdt");
>>              exit(1);
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index d84abf1070a0..05b0a067458e 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -2073,7 +2073,7 @@ static void spapr_phb_pci_enumerate(sPAPRPHBState *phb)
>>  
>>  int spapr_populate_pci_dt(sPAPRPHBState *phb,
>>                            uint32_t xics_phandle,
>> -                          void *fdt)
>> +                          void *fdt, int nr_irqs)
>>  {
>>      int bus_off, i, j, ret;
>>      char nodename[FDT_NAME_MAX];
>> @@ -2142,7 +2142,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>>      _FDT(fdt_setprop(fdt, bus_off, "ranges", &ranges, sizeof_ranges));
>>      _FDT(fdt_setprop(fdt, bus_off, "reg", &bus_reg, sizeof(bus_reg)));
>>      _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pci-config-space-type", 0x1));
>> -    _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", XICS_IRQS_SPAPR));
>> +    _FDT(fdt_setprop_cell(fdt, bus_off, "ibm,pe-total-#msi", nr_irqs));
>>  
>>      /* Dynamic DMA window */
>>      if (phb->ddw_enabled) {
>> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
>> index 38470b2f0e5c..40146f72c103 100644
>> --- a/include/hw/pci-host/spapr.h
>> +++ b/include/hw/pci-host/spapr.h
>> @@ -115,7 +115,7 @@ PCIHostState *spapr_create_phb(sPAPRMachineState *spapr, int index);
>>  
>>  int spapr_populate_pci_dt(sPAPRPHBState *phb,
>>                            uint32_t xics_phandle,
>> -                          void *fdt);
>> +                          void *fdt, int nr_irqs);
>>  
>>  void spapr_pci_rtas_init(void);
>>  
> 


* Re: [Qemu-devel] [RFC PATCH v2 16/21] spapr: add a XIVE object to the sPAPR machine
  2017-09-19  8:38   ` David Gibson
@ 2017-09-20  9:51     ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-20  9:51 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 10:38 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:30PM +0200, Cédric Le Goater wrote:
>> If the machine supports XIVE (POWER9 CPU), create a XIVE object. The
>> CAS negotiation process will decide which model (legacy or XIVE) will
>> be used for the interrupt controller depending on the guest
>> capabilities.
>>
>> Also extend the number of provisioned IRQs with the number of CPUs,
>> this is required for XIVE which allocates one IRQ number for each IPI.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/ppc/spapr.c         | 63 ++++++++++++++++++++++++++++++++++++++++++++++++--
>>  include/hw/ppc/spapr.h |  2 ++
>>  2 files changed, 63 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 5d69df928434..b6577dbecdea 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -44,6 +44,7 @@
>>  #include "mmu-hash64.h"
>>  #include "mmu-book3s-v3.h"
>>  #include "qom/cpu.h"
>> +#include "target/ppc/cpu-models.h"
>>  
>>  #include "hw/boards.h"
>>  #include "hw/ppc/ppc.h"
>> @@ -54,6 +55,7 @@
>>  #include "hw/ppc/spapr_vio.h"
>>  #include "hw/pci-host/spapr.h"
>>  #include "hw/ppc/xics.h"
>> +#include "hw/ppc/spapr_xive.h"
>>  #include "hw/pci/msi.h"
>>  
>>  #include "hw/pci/pci.h"
>> @@ -202,6 +204,35 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
>>      }
>>  }
>>  
>> +static sPAPRXive *spapr_spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
>> +                               int nr_servers, Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +    Object *obj;
>> +
>> +    obj = object_new(TYPE_SPAPR_XIVE);
>> +    object_property_add_child(OBJECT(spapr), "xive", obj, &error_abort);
>> +    object_property_add_const_link(obj, "ics", OBJECT(spapr->ics),
>> +                                   &error_abort);
>> +    object_property_set_int(obj, nr_irqs, "nr-irqs",  &local_err);
>> +    if (local_err) {
>> +        goto error;
>> +    }
>> +    object_property_set_int(obj, nr_servers, "nr-targets", &local_err);
>> +    if (local_err) {
>> +        goto error;
>> +    }
>> +    object_property_set_bool(obj, true, "realized", &local_err);
>> +    if (local_err) {
>> +        goto error;
>> +    }
>> +
>> +    return SPAPR_XIVE(obj);
>> +error:
>> +    error_propagate(errp, local_err);
>> +    return NULL;
>> +}
>> +
>>  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
>>                                    int smt_threads)
>>  {
>> @@ -1093,7 +1124,8 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>>      }
>>  
>>      QLIST_FOREACH(phb, &spapr->phbs, list) {
>> -        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, XICS_IRQS_SPAPR);
>> +        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt,
>> +                                    XICS_IRQS_SPAPR + xics_max_server_number());
>>          if (ret < 0) {
>>              error_report("couldn't setup PCI devices in fdt");
>>              exit(1);
>> @@ -2140,6 +2172,16 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
>>      g_free(type);
>>  }
>>  
>> +/*
>> + * Only POWER9 Processor chips support the XIVE interrupt controller
>> + */
>> +static bool ppc_support_xive(MachineState *machine)
>> +{
>> +   PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(first_cpu);
>> +
>> +   return pcc->pvr_match(pcc, CPU_POWERPC_POWER9_BASE);
>> +}
>> +
>>  /* pSeries LPAR / sPAPR hardware init */
>>  static void ppc_spapr_init(MachineState *machine)
>>  {
>> @@ -2237,7 +2279,8 @@ static void ppc_spapr_init(MachineState *machine)
>>      load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
>>  
>>      /* Set up Interrupt Controller before we create the VCPUs */
>> -    xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal);
>> +    xics_system_init(machine, XICS_IRQS_SPAPR + xics_max_server_number(),
>> +                     &error_fatal);
> 
> Has this hunk leaked from another patch?  AFAICT it only affects XICS
> with what you have so far, which doesn't seem like what you want.

No. XIVE shares the ICSIRQState array of XICS, which is why the XICS IRQ count is increased here.

C. 
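The sizing can be made concrete with a small sketch. It assumes, as the hunk above and `spapr_xive_populate()` in patch 18 suggest, that the shared array holds `XICS_IRQS_SPAPR + nr_servers` entries, that "nr-targets" equals the number of servers, and that the top `nr_servers` numbers are the ones advertised in "ibm,xive-lisn-ranges". The values 1024 and 4096 and the `irq_layout()` helper are hypothetical, chosen for illustration only.

```c
#include <stdint.h>

/* Illustrative values (assumptions, not taken from the patch) */
#define EXAMPLE_XICS_IRQS_SPAPR 1024u /* legacy XICS IRQ space */
#define EXAMPLE_ICS_OFFSET      4096u /* first global IRQ number */

/* Hypothetical description of the shared ICSIRQState array */
typedef struct {
    uint32_t nr_irqs;    /* entries in the shared array */
    uint32_t lisn_start; /* first cell of "ibm,xive-lisn-ranges" */
    uint32_t lisn_count; /* second cell of "ibm,xive-lisn-ranges" */
} IrqLayout;

static IrqLayout irq_layout(uint32_t nr_servers)
{
    IrqLayout l;

    /* xics_system_init() now reserves one extra IRQ per server */
    l.nr_irqs = EXAMPLE_XICS_IRQS_SPAPR + nr_servers;

    /* Same formula as spapr_xive_populate(), with nr_targets == nr_servers */
    l.lisn_start = l.nr_irqs - nr_servers + EXAMPLE_ICS_OFFSET;
    l.lisn_count = nr_servers;
    return l;
}
```

With these assumed values, `irq_layout(8)` yields a 1032-entry array and a LISN range of 8 numbers starting at 5120, i.e. just above the legacy XICS range.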

>>      /* Set up containers for ibm,client-set-architecture negotiated options */
>>      spapr->ov5 = spapr_ovec_new();
>> @@ -2274,6 +2317,22 @@ static void ppc_spapr_init(MachineState *machine)
>>  
>>      spapr_init_cpus(spapr);
>>  
>> +    /* Set up XIVE. CAS will choose whether the guest runs in XICS
>> +     * (legacy mode) or XIVE Exploitation mode
>> +     *
>> +     * We don't have KVM support yet, so check for irqchip=on
>> +     */
>> +    if (ppc_support_xive(machine)) {
>> +        if (kvm_enabled() && machine_kernel_irqchip_required(machine)) {
>> +            error_report("kernel_irqchip requested. no XIVE support");
>> +        } else {
>> +            spapr->xive = spapr_spapr_xive_create(spapr,
>> +                               XICS_IRQS_SPAPR + xics_max_server_number(),
>> +                               xics_max_server_number(),
>> +                               &error_fatal);
>> +        }
>> +    }
>> +
>>      if (kvm_enabled()) {
>>          /* Enable H_LOGICAL_CI_* so SLOF can talk to in-kernel devices */
>>          kvmppc_enable_logical_ci_hcalls();
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 2a303a705c17..6cd5ab73c5dc 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -14,6 +14,7 @@ struct sPAPRNVRAM;
>>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>>  typedef struct sPAPREventSource sPAPREventSource;
>>  typedef struct sPAPRPendingHPT sPAPRPendingHPT;
>> +typedef struct sPAPRXive sPAPRXive;
>>  
>>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
>>  #define SPAPR_ENTRY_POINT       0x100
>> @@ -127,6 +128,7 @@ struct sPAPRMachineState {
>>      MemoryHotplugState hotplug_memory;
>>  
>>      const char *icp_type;
>> +    sPAPRXive  *xive;
>>  };
>>  
>>  #define H_SUCCESS         0
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-19  8:44   ` David Gibson
@ 2017-09-20 12:26     ` Cédric Le Goater
  2017-09-21  1:35       ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-20 12:26 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 10:44 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:32PM +0200, Cédric Le Goater wrote:
>> Like for XICS, the XIVE interface for the guest is described in the
>> device tree under the "interrupt-controller" node. A couple of new
>> properties are specific to XIVE :
>>
>>  - "reg"
>>
>>    contains the base address and size of the thread interrupt
>>    management areas (TIMA), also called rings, for the User level and
>>    for the Guest OS level. Only the Guest OS level is taken into
>>    account today.
>>
>>  - "ibm,xive-eq-sizes"
>>
>>    the size of the event queues. One cell per size supported, contains
>>    log2 of size, in ascending order.
>>
>>  - "ibm,xive-lisn-ranges"
>>
>>    the interrupt number ranges assigned to the guest. These are
>>    allocated using a simple bitmap.
>>
>> and also under the root node :
>>
>>  - "ibm,plat-res-int-priorities"
>>
>>    contains a list of priorities that the hypervisor has reserved for
>>    its own use. Simulate ranges as defined by the PowerVM Hypervisor.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive_hcall.c  | 54 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr_xive.h |  1 +
>>  2 files changed, 55 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
>> index 4c77b65683de..7b19ea6373dd 100644
>> --- a/hw/intc/spapr_xive_hcall.c
>> +++ b/hw/intc/spapr_xive_hcall.c
>> @@ -874,3 +874,57 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>>  }
>> +
>> +void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle)
>> +{
>> +    int node;
>> +    uint64_t timas[2 * 2];
>> +    uint32_t lisn_ranges[] = {
>> +        cpu_to_be32(xive->nr_irqs - xive->nr_targets + xive->ics->offset),
>> +        cpu_to_be32(xive->nr_targets),
>> +    };
>> +    uint32_t eq_sizes[] = {
>> +        cpu_to_be32(12), /* 4K */
>> +        cpu_to_be32(16), /* 64K */
>> +        cpu_to_be32(21), /* 2M */
>> +        cpu_to_be32(24), /* 16M */
>> +    };
>> +
>> +    /* Use some ranges to exercise the Linux driver, which should
>> +     * result in Linux choosing priority 6. This is not strictly
>> +     * necessary
>> +     */
>> +    uint32_t reserved_priorities[] = {
>> +        cpu_to_be32(1),  /* start */
>> +        cpu_to_be32(2),  /* count */
>> +        cpu_to_be32(7),  /* start */
>> +        cpu_to_be32(0xf8),  /* count */
>> +    };
>> +    int i;
>> +
>> +    /* Thread Interrupt Management Areas : User and OS */
>> +    for (i = 0; i < 2; i++) {
>> +        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
>> +        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
>> +    }
>> +
>> +    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
>> +
>> +    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));
> 
> Shouldn't need this - SLOF will figure it out from the node name above.

It is in the specs, and phyp has it, so we might as well keep it.

> 
>> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
>> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
>> +
>> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
>> +                     sizeof(eq_sizes)));
>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
>> +                     sizeof(lisn_ranges)));
> 
> I note this doesn't have the interrupt-controller or #interrupt-cells
> properties.  So what acts as the interrupt parent for all the devices
> in the tree with XIVE?

These properties are no longer in the specs for the interrupt-controller
node, and I don't think Linux makes use of them (even for XICS), so
it works fine without them.

C. 

>> +    /* For SLOF */
>> +    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
>> +    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
>> +
>> +    /* top properties */
>> +    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
>> +                     reserved_priorities, sizeof(reserved_priorities)));
>> +}
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index ae5ff89533c0..0a156f2d8591 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -69,5 +69,6 @@ struct sPAPRXive {
>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>  
>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>> +void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
> 
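The comment's claim that these ranges should steer Linux to priority 6 can be checked with a simplified model. The sketch assumes the guest simply picks the highest priority in 0..7 that is not covered by any (start, count) pair; that policy and the helper names are assumptions for illustration, not the Linux driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-endian copy of the (start, count) pairs in reserved_priorities[] */
static const uint32_t example_reserved[] = {
    1, 2,      /* priorities 1..2   */
    7, 0xf8,   /* priorities 7..254 */
};

/* Is prio inside any (start, count) range? */
static int prio_is_reserved(const uint32_t *ranges, size_t nr_cells,
                            uint32_t prio)
{
    for (size_t i = 0; i + 1 < nr_cells; i += 2) {
        uint32_t start = ranges[i];
        uint32_t count = ranges[i + 1];

        if (prio >= start && prio - start < count) {
            return 1;
        }
    }
    return 0;
}

/* Simplified guest policy (assumption): highest usable priority <= 7 */
static int pick_priority(const uint32_t *ranges, size_t nr_cells)
{
    for (int prio = 7; prio >= 0; prio--) {
        if (!prio_is_reserved(ranges, nr_cells, (uint32_t)prio)) {
            return prio;
        }
    }
    return -1; /* every priority is reserved */
}
```

With the ranges above, priorities 1 to 2 and 7 to 254 are reserved, so the model returns 6.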


* Re: [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
  2017-09-19  8:46   ` David Gibson
@ 2017-09-20 12:33     ` Cédric Le Goater
  2017-09-21  1:25       ` David Gibson
  2017-09-28  8:23       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-20 12:33 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 10:46 AM, David Gibson wrote:
> On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
>> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
>>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
>>> negotiation process determines whether the guest operates with an
>>> interrupt controller using the XICS legacy model, as found on POWER8,
>>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
>>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
>>>
>>> Follows a model for the XIVE interrupt controller and support for the
>>> Hypervisor's calls which are used to configure the interrupt sources
>>> and the event/notification queues of the guest. The last patch
>>> integrates XIVE in the sPAPR machine.
>>>
>>> Code is here:
>>
>>
>> An overall comment:
>>
>> I note in several replies here that I think the way XICS objects are
>> re-used for XIVE is really ugly, and I think it will make future
>> maintenance pretty painful.

I agree. That was one way to identify what we need for migration 
compatibility and CAS reset.   

>> I'm thinking maybe trying to support the CAS negotiation of interrupt
>> controller from day 1 is warping the design.  A better approach might
>> be first to implement XIVE only when given a specific machine option -
>> guest gets one or the other and can't negotiate.

ok. 

CAS is not the most complex problem; we mostly need to share 
the ICSIRQState array and the source offset. Migration from an older
machine is a problem. We are doomed to keep the existing XICS
framework available.

>> That should allow a more natural XIVE design to emerge, *then* we can
>> look at what's necessary to make boot-time negotiation possible.
> 
> Actually, it just occurred to me that we might be making life hard for
> ourselves by trying to actually switch between full XICS and XIVE
> models.  Couldn't we have new machine types always construct the XIVE
> infrastructure, 

yes.

> but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
> hardware.

ok but migration will not be supported.

> Since something more or less equivalent
> has already been done in both OPAL and the host kernel, I'm guessing
> this shouldn't be too hard at this point.

Indeed, that is how it currently works for P9 KVM guests: the XICS
hcalls are implemented on top of native XIVE.

Thanks,


C.


* Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources
  2017-09-19  2:57   ` David Gibson
@ 2017-09-20 12:54     ` Cédric Le Goater
  2017-09-22 10:58       ` David Gibson
  2017-09-28  8:27       ` Benjamin Herrenschmidt
  2017-09-20 13:05     ` Cédric Le Goater
  1 sibling, 2 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-20 12:54 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/19/2017 04:57 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
>> Each interrupt source is associated with a two bit state machine
>> called an Event State Buffer (ESB) which is controlled by MMIO to
>> trigger events. See code for more details on the states and
>> transitions.
>>
>> The MMIO space for the ESB translation is 512GB large on baremetal
>> (powernv) systems and the BAR depends on the chip id. In our model for
>> the sPAPR machine, we choose to only map a sub memory region for the
>> provisioned IRQ numbers and to use the mapping address of chip 0 on a
>> real system. The OS will get the address of the MMIO page of the ESB
>> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
> 
> On bare metal, are the MMIOs for each irq source mapped contiguously?

yes. 
 
>> For KVM support, we should think of a way to map this QEMU memory
>> region in the host to trigger events directly.
> 
> This would rely on being able to map them without mapping those for
> any other VM or the host.  Does that mean allocating a contiguous (and
> aligned) hunk of irqs for a guest?

I think so, yes. The IRQs and the memory regions are tied, and we also
need to be able to pass the MMIO region from the host to the guest, a bit
like VFIO does for the IOMMU regions, I suppose. But I haven't dug into
the problem much.

This is an important part in the overall design. 

> We're going to need to be careful about irq allocation here.
> Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
> MMIO addresses, 

GET_SOURCE_INFO only retrieves the address of the MMIO region for 
a 'lisn'. It is not dynamically mapped. In the KVM case, the initial
information on the address would come from OPAL, and then the host 
kernel would translate this information for the guest.

> we need the MMIO addresses to be stable and consistent, because 
> we can't have them change across migration.  

yes. I will catch my XIVE guru next week in Paris to clarify that
part. 

> We need to have this consistent between in-qemu and in-KVM XIVE
> implementations as well.

yes.

C.
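Since the patch maps one 64K ESB page per source at a fixed offset, the address handed out for a source is a pure function of its number. A minimal sketch using the patch's constants (`P9_MMIO_BASE`, `VC_BAR_DEFAULT`, `ESB_SHIFT`); the helper names are hypothetical, and `srcno` is assumed to be the index relative to the start of the "xive.esb" region, as in `spapr_xive_esb_read()`.

```c
#include <stdint.h>

/* Constants as defined in the patch */
#define P9_MMIO_BASE     0x006000000000000ull
#define VC_BAR_DEFAULT   0x10000000000ull
#define ESB_SHIFT        16 /* one 64K page per source */

#define EXAMPLE_ESB_BASE (P9_MMIO_BASE | VC_BAR_DEFAULT)

/* Hypothetical helper: guest physical address of the ESB page of a source */
static uint64_t esb_page_addr(uint32_t srcno)
{
    return EXAMPLE_ESB_BASE + ((uint64_t)srcno << ESB_SHIFT);
}

/* Decode an offset into the "xive.esb" region, as spapr_xive_esb_read() does */
static void esb_decode(uint64_t addr, uint32_t *srcno, uint32_t *offset)
{
    *srcno  = (uint32_t)(addr >> ESB_SHIFT);
    *offset = (uint32_t)(addr & 0xF00);
}
```

For example, source 5 sits at 0x0006010000050000, and a load at region offset 0x50c00 decodes to source 5 with the XIVE_ESB_SET_PQ_00 offset (0xc00). As long as the base, the shift and the IRQ numbering are fixed by the machine type, these addresses stay the same across migration, and an in-KVM implementation would need to reproduce the same layout.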

>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr_xive.h |   6 ++
>>  2 files changed, 261 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 1ed7b6a286e9..8a85d64efc4c 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>>  }
>>  
>>  /*
>> + * "magic" Event State Buffer (ESB) MMIO offsets.
>> + *
>> + * Each interrupt source has a 2-bit state machine called ESB
>> + * which can be controlled by MMIO. It's made of 2 bits, P and
>> + * Q. P indicates that an interrupt is pending (has been sent
>> + * to a queue and is waiting for an EOI). Q indicates that the
>> + * interrupt has been triggered while pending.
>> + *
>> + * This acts as a coalescing mechanism in order to guarantee
>> + * that a given interrupt only occurs at most once in a queue.
>> + *
>> + * When doing an EOI, the Q bit will indicate if the interrupt
>> + * needs to be re-triggered.
>> + *
>> + * The following offsets into the ESB MMIO allow to read or
>> + * manipulate the PQ bits. They must be used with an 8-bytes
>> + * load instruction. They all return the previous state of the
>> + * interrupt (atomically).
>> + *
>> + * Additionally, some ESB pages support doing an EOI via a
>> + * store at 0 and some ESBs support doing a trigger via a
>> + * separate trigger page.
>> + */
>> +#define XIVE_ESB_GET            0x800
>> +#define XIVE_ESB_SET_PQ_00      0xc00
>> +#define XIVE_ESB_SET_PQ_01      0xd00
>> +#define XIVE_ESB_SET_PQ_10      0xe00
>> +#define XIVE_ESB_SET_PQ_11      0xf00
>> +
>> +#define XIVE_ESB_VAL_P          0x2
>> +#define XIVE_ESB_VAL_Q          0x1
>> +
>> +#define XIVE_ESB_RESET          0x0
>> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
>> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
>> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
>> +
>> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
>> +{
>> +    uint32_t byte = idx / 4;
>> +    uint32_t bit  = (idx % 4) * 2;
>> +
>> +    assert(byte < xive->sbe_size);
>> +
>> +    return (xive->sbe[byte] >> bit) & 0x3;
>> +}
>> +
>> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)
>> +{
>> +    uint32_t byte = idx / 4;
>> +    uint32_t bit  = (idx % 4) * 2;
>> +    uint8_t old, new;
>> +
>> +    assert(byte < xive->sbe_size);
>> +
>> +    old = xive->sbe[byte];
>> +
>> +    new = xive->sbe[byte] & ~(0x3 << bit);
>> +    new |= (pq & 0x3) << bit;
>> +
>> +    xive->sbe[byte] = new;
>> +
>> +    return (old >> bit) & 0x3;
>> +}
>> +
>> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)
>> +{
>> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
>> +
>> +    switch (old_pq) {
>> +    case XIVE_ESB_RESET:
>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
>> +        return false;
>> +    case XIVE_ESB_PENDING:
>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
>> +        return false;
>> +    case XIVE_ESB_QUEUED:
>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
>> +        return true;
>> +    case XIVE_ESB_OFF:
>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
>> +        return false;
>> +    default:
>> +         g_assert_not_reached();
>> +    }
>> +}
>> +
>> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)
>> +{
>> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
>> +
>> +    switch (old_pq) {
>> +    case XIVE_ESB_RESET:
>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
>> +        return true;
>> +    case XIVE_ESB_PENDING:
>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
>> +        return true;
>> +    case XIVE_ESB_QUEUED:
>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
>> +        return true;
>> +    case XIVE_ESB_OFF:
>> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
>> +        return false;
>> +    default:
>> +         g_assert_not_reached();
>> +    }
>> +}
>> +
>> +/*
>> + * XIVE Interrupt Source MMIOs
>> + */
>> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)
>> +{
>> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
>> +
>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
>> +        irq->status &= ~XICS_STATUS_SENT;
>> +    }
>> +}
>> +
>> +/* TODO: handle second page
>> + *
>> + * Some HW use a separate page for trigger. We only support the case
>> + * in which the trigger can be done in the same page as the EOI.
>> + */
>> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +    uint32_t offset = addr & 0xF00;
>> +    uint32_t srcno = addr >> xive->esb_shift;
>> +    XiveIVE *ive;
>> +    uint64_t ret = -1;
>> +
>> +    ive = spapr_xive_get_ive(xive, srcno);
>> +    if (!ive || !(ive->w & IVE_VALID))  {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
>> +        goto out;
> 
> Since there's a whole (4k) page for each source, I wonder if we should
> actually map each one as a separate MMIO region to allow us to tweak
> the mappings more flexibly.
> 
>> +    }
>> +
>> +    switch (offset) {
>> +    case 0:
>> +        spapr_xive_source_eoi(xive, srcno);
>> +
>> +        /* return TRUE or FALSE depending on PQ value */
>> +        ret = spapr_xive_pq_eoi(xive, srcno);
>> +        break;
>> +
>> +    case XIVE_ESB_GET:
>> +        ret = spapr_xive_pq_get(xive, srcno);
>> +        break;
>> +
>> +    case XIVE_ESB_SET_PQ_00:
>> +    case XIVE_ESB_SET_PQ_01:
>> +    case XIVE_ESB_SET_PQ_10:
>> +    case XIVE_ESB_SET_PQ_11:
>> +        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
>> +    }
>> +
>> +out:
>> +    return ret;
>> +}
>> +
>> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,
>> +                           uint64_t value, unsigned size)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +    uint32_t offset = addr & 0xF00;
>> +    uint32_t srcno = addr >> xive->esb_shift;
>> +    XiveIVE *ive;
>> +    bool notify = false;
>> +
>> +    ive = spapr_xive_get_ive(xive, srcno);
>> +    if (!ive || !(ive->w & IVE_VALID))  {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
>> +        return;
>> +    }
>> +
>> +    switch (offset) {
>> +    case 0:
>> +        /* TODO: should we trigger even if the IVE is masked ? */
>> +        notify = spapr_xive_pq_trigger(xive, srcno);
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
>> +                      offset);
>> +        return;
>> +    }
>> +
>> +    if (notify && !(ive->w & IVE_MASKED)) {
>> +        qemu_irq_pulse(xive->qirqs[srcno]);
>> +    }
>> +}
>> +
>> +static const MemoryRegionOps spapr_xive_esb_ops = {
>> +    .read = spapr_xive_esb_read,
>> +    .write = spapr_xive_esb_write,
>> +    .endianness = DEVICE_BIG_ENDIAN,
>> +    .valid = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +    .impl = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +};
>> +
>> +/*
>>   * XIVE Interrupt Source
>>   */
>>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
>> @@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
>>  /*
>>   * Main XIVE object
>>   */
>> +#define P9_MMIO_BASE     0x006000000000000ull
>> +
>> +/* VC BAR contains set translations for the ESBs and the EQs. */
>> +#define VC_BAR_DEFAULT   0x10000000000ull
>> +#define VC_BAR_SIZE      0x08000000000ull
>> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
>> +
>> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
>> +                                            unsigned size)
>> +{
>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
>> +                  __func__, offset, size);
>> +    return 0;
>> +}
>> +
>> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,
>> +                                         uint64_t value, unsigned size)
>> +{
>> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
>> +                  __func__, offset, value, size);
>> +}
>> +
>> +static const MemoryRegionOps spapr_xive_esb_default_ops = {
>> +    .read = spapr_xive_esb_default_read,
>> +    .write = spapr_xive_esb_default_write,
>> +    .endianness = DEVICE_BIG_ENDIAN,
>> +};
>>  
>>  void spapr_xive_reset(void *dev)
>>  {
>> @@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>      xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;
>>      xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));
>>  
>> +    /* VC BAR. That's the full window but we will only map the
>> +     * subregions in use. */
>> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
>> +    xive->esb_shift = ESB_SHIFT;
>> +
>> +    /* Install default memory region handlers to log bogus access */
>> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,
>> +                          NULL, "xive.esb.full", VC_BAR_SIZE);
>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);
>> +
>> +    /* Install the ESB memory region in the overall one */
>> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,
>> +                          xive, "xive.esb",
>> +                          (1ull << xive->esb_shift) * xive->nr_irqs);
>> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
>> +
>>      qemu_register_reset(spapr_xive_reset, dev);
>>  }
>>  
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index eab92c4c1bb8..0f516534d76a 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -46,6 +46,12 @@ struct sPAPRXive {
>>      XiveIVE      *ivt;
>>      XiveEQ       *eqt;
>>      uint32_t     nr_eqs;
>> +
>> +    /* ESB memory region */
>> +    uint32_t     esb_shift;
>> +    hwaddr       esb_base;
>> +    MemoryRegion esb_mr;
>> +    MemoryRegion esb_iomem;
>>  };
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
> 
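The PQ transitions can be read as two small lookup tables. The standalone sketch below (hypothetical names, not the patch's code) keeps the patch's packing of four 2-bit entries per byte. Note one deliberate difference: it follows the coalescing rule stated in the header comment, so a trigger is forwarded only from the RESET state, whereas `spapr_xive_pq_trigger()` as posted also returns true from PENDING and QUEUED.

```c
#include <stdbool.h>
#include <stdint.h>

/* PQ encodings, same values as the XIVE_ESB_* macros in the patch */
enum { PQ_RESET = 0x0, PQ_OFF = 0x1, PQ_PENDING = 0x2, PQ_QUEUED = 0x3 };

/* Four 2-bit PQ entries per byte, as in spapr_xive_pq_get()/_set() */
static uint8_t pq_get(const uint8_t *sbe, uint32_t idx)
{
    return (sbe[idx / 4] >> ((idx % 4) * 2)) & 0x3;
}

/* Store a new PQ value and return the previous one */
static uint8_t pq_set(uint8_t *sbe, uint32_t idx, uint8_t pq)
{
    uint32_t bit = (idx % 4) * 2;
    uint8_t old = (sbe[idx / 4] >> bit) & 0x3;

    sbe[idx / 4] = (uint8_t)((sbe[idx / 4] & ~(0x3 << bit)) |
                             ((pq & 0x3) << bit));
    return old;
}

/* Trigger: only a RESET source is forwarded; a pending one just sets Q */
static bool pq_trigger(uint8_t *sbe, uint32_t idx)
{
    static const uint8_t next[4] = {
        [PQ_RESET] = PQ_PENDING, [PQ_OFF] = PQ_OFF,
        [PQ_PENDING] = PQ_QUEUED, [PQ_QUEUED] = PQ_QUEUED,
    };

    return pq_set(sbe, idx, next[pq_get(sbe, idx)]) == PQ_RESET;
}

/* EOI: clear P; if Q was set, go back to PENDING and ask for a re-trigger */
static bool pq_eoi(uint8_t *sbe, uint32_t idx)
{
    static const uint8_t next[4] = {
        [PQ_RESET] = PQ_RESET, [PQ_OFF] = PQ_OFF,
        [PQ_PENDING] = PQ_RESET, [PQ_QUEUED] = PQ_PENDING,
    };

    return pq_set(sbe, idx, next[pq_get(sbe, idx)]) == PQ_QUEUED;
}
```

With this rule, two triggers before an EOI produce a single notification; the EOI then reports the Q bit so the source can be re-triggered once.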


* Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources
  2017-09-19  2:57   ` David Gibson
  2017-09-20 12:54     ` Cédric Le Goater
@ 2017-09-20 13:05     ` Cédric Le Goater
  2017-09-28  8:29       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-20 13:05 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf


>> +/*
>> + * XIVE Interrupt Source MMIOs
>> + */
>> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +    uint32_t offset = addr & 0xF00;
>> +    uint32_t srcno = addr >> xive->esb_shift;
>> +    XiveIVE *ive;
>> +    uint64_t ret = -1;
>> +
>> +    ive = spapr_xive_get_ive(xive, srcno);
>> +    if (!ive || !(ive->w & IVE_VALID))  {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
>> +        goto out;
> 
> Since there's a whole (4k) page for each source, I wonder if we should
> actually map each one as a separate MMIO region to allow us to tweak
> the mappings more flexibly.

Yes, we could have a subregion for each source. In that case, 
we should also handle IVE_VALID properly. That will require 
a specific XIVE allocator, which was difficult to do while
keeping compatibility with XICS for migration and CAS.


C.


* Re: [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
  2017-09-20 12:33     ` Cédric Le Goater
@ 2017-09-21  1:25       ` David Gibson
  2017-09-21 14:18         ` Cédric Le Goater
  2017-09-28  8:23       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-21  1:25 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf


On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 10:46 AM, David Gibson wrote:
> > On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
> >> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
> >>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> >>> negotiation process determines whether the guest operates with an
> >>> interrupt controller using the XICS legacy model, as found on POWER8,
> >>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> >>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
> >>>
> >>> Follows a model for the XIVE interrupt controller and support for the
> >>> Hypervisor's calls which are used to configure the interrupt sources
> >>> and the event/notification queues of the guest. The last patch
> >>> integrates XIVE in the sPAPR machine.
> >>>
> >>> Code is here:
> >>
> >>
> >> An overall comment:
> >>
> >> I note in several replies here that I think the way XICS objects are
> >> re-used for XIVE is really ugly, and I think it will make future
> >> maintenance pretty painful.
> 
> I agree. That was one way to identify what we need for migration 
> compatibility and CAS reset.   
> 
> >> I'm thinking maybe trying to support the CAS negotiation of interrupt
> >> controller from day 1 is warping the design.  A better approach might
> >> be first to implement XIVE only when given a specific machine option -
> >> guest gets one or the other and can't negotiate.
> 
> ok. 
> 
> CAS is not the most complex problem, we mostly need to share 
> the ICSIRQState array and the source offset. migration from older
> machine is a problem.

Uh.. what?  Migration from an older machine isn't a thing.  We can
migrate from an older qemu, but the machine type (and version) has to
be identical at each end.  That's *why* we keep around the older
machine types on newer qemus.

> We are doomed to keep the existing XICS
> framework available.
> 
> >> That should allow a more natural XIVE design to emerge, *then* we can
> >> look at what's necessary to make boot-time negotiation possible.
> > 
> > Actually, it just occurred to me that we might be making life hard for
> > ourselves by trying to actually switch between full XICS and XIVE
> > models.  Couldn't we have new machine types always construct the XIVE
> > infrastructure, 
> 
> yes.
> 
> > but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
> > hardware.
> 
> ok but migration will not be supported.

Right, this would only be for newer machine types, and you can never
migrate between different machine types.

> > Since something more or less equivalent
> > has already been done in both OPAL and the host kernel, I'm guessing
> > this shouldn't be too hard at this point.
> 
> Indeed that is how it is working currently on P9 kvm guests. hcalls are
> implemented on top of XIVE native.
> 
> Thanks,
> 
> 
> C.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-20 12:26     ` Cédric Le Goater
@ 2017-09-21  1:35       ` David Gibson
  2017-09-21 11:21         ` Cédric Le Goater
  2017-09-28  8:31         ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 90+ messages in thread
From: David Gibson @ 2017-09-21  1:35 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf


On Wed, Sep 20, 2017 at 02:26:32PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 10:44 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:32PM +0200, Cédric Le Goater wrote:
> >> Like for XICS, the XIVE interface for the guest is described in the
> >> device tree under the "interrupt-controller" node. A couple of new
> >> properties are specific to XIVE :
> >>
> >>  - "reg"
> >>
> >>    contains the base address and size of the thread interrupt
> >>    management areas (TIMA), also called rings, for the User level and
> >>    for the Guest OS level. Only the Guest OS level is taken into
> >>    account today.
> >>
> >>  - "ibm,xive-eq-sizes"
> >>
> >>    the size of the event queues. One cell per size supported, contains
> >>    log2 of size, in ascending order.
> >>
> >>  - "ibm,xive-lisn-ranges"
> >>
> >>    the interrupt number ranges assigned to the guest. These are
> >>    allocated using a simple bitmap.
> >>
> >> and also under the root node :
> >>
> >>  - "ibm,plat-res-int-priorities"
> >>
> >>    contains a list of priorities that the hypervisor has reserved for
> >>    its own use. Simulate ranges as defined by the PowerVM Hypervisor.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive_hcall.c  | 54 +++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr_xive.h |  1 +
> >>  2 files changed, 55 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
> >> index 4c77b65683de..7b19ea6373dd 100644
> >> --- a/hw/intc/spapr_xive_hcall.c
> >> +++ b/hw/intc/spapr_xive_hcall.c
> >> @@ -874,3 +874,57 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
> >>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
> >>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
> >>  }
> >> +
> >> +void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle)
> >> +{
> >> +    int node;
> >> +    uint64_t timas[2 * 2];
> >> +    uint32_t lisn_ranges[] = {
> >> +        cpu_to_be32(xive->nr_irqs - xive->nr_targets + xive->ics->offset),
> >> +        cpu_to_be32(xive->nr_targets),
> >> +    };
> >> +    uint32_t eq_sizes[] = {
> >> +        cpu_to_be32(12), /* 4K */
> >> +        cpu_to_be32(16), /* 64K */
> >> +        cpu_to_be32(21), /* 2M */
> >> +        cpu_to_be32(24), /* 16M */
> >> +    };
> >> +
> >> +    /* Use some ranges to exercise the Linux driver, which should
> >> +     * result in Linux choosing priority 6. This is not strictly
> >> +     * necessary
> >> +     */
> >> +    uint32_t reserved_priorities[] = {
> >> +        cpu_to_be32(1),  /* start */
> >> +        cpu_to_be32(2),  /* count */
> >> +        cpu_to_be32(7),  /* start */
> >> +        cpu_to_be32(0xf8),  /* count */
> >> +    };
> >> +    int i;
> >> +
> >> +    /* Thread Interrupt Management Areas : User and OS */
> >> +    for (i = 0; i < 2; i++) {
> >> +        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
> >> +        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
> >> +    }
> >> +
> >> +    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
> >> +
> >> +    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));
> > 
> > Shouldn't need this - SLOF will figure it out from the node name above.
> 
> It is in the specs. phyp has it. we might as well keep it.

You misunderstand.  SLOF will *create* the name property based on the
node name.  Adding it here has *no effect*.

> >> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
> >> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
> >> +
> >> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
> >> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
> >> +                     sizeof(eq_sizes)));
> >> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
> >> +                     sizeof(lisn_ranges)));
> > 
> > I note this doesn't have the interrupt-controller or #interrupt-cells
> > properties.  So what acts as the interrupt parent for all the devices
> > in the tree with XIVE?
> 
> these properties are not in the specs anymore for the interrupt-controller
> node and I don't think Linux makes use of them (even for XICS). So 
> it just works fine.

Um.. what!?  Are you saying that the PAPR XIVE spec completely broke
how interrupt specifiers have worked in the device tree since forever?

And I'm pretty sure Linux does make use of them.  Without
#interrupt-cells, there's no way it can properly interpret the
interrupts properties in the device nodes.
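
To illustrate why the cell count matters: an `interrupts` property is only a flat array of big-endian 32-bit cells, and a parser can split it into per-interrupt specifiers only by using the `#interrupt-cells` value of the interrupt parent. A minimal sketch, assuming a parent with `#interrupt-cells = <2>` (interrupt number plus a sense cell); `irq_spec_number()` is a hypothetical helper, not a libfdt or Linux function.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit cell, as stored in a flattened device tree. */
static uint32_t fdt_cell_to_cpu(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: return the first cell (the interrupt number) of
 * specifier 'index' in an "interrupts" property of 'len' bytes, given the
 * parent's #interrupt-cells. Returns -1 if the index is out of range or
 * if the property length is not a multiple of the specifier size.
 */
static int64_t irq_spec_number(const uint8_t *prop, size_t len,
                               uint32_t interrupt_cells, size_t index)
{
    size_t spec_bytes = (size_t)interrupt_cells * 4;

    if (interrupt_cells == 0 || len % spec_bytes != 0 ||
        index >= len / spec_bytes) {
        return -1;
    }
    return fdt_cell_to_cpu(prop + index * spec_bytes);
}
```

With the wrong cell count the property is silently misread: decoding a two-specifier, two-cell property with a count of 1 returns the first specifier's sense cell as if it were the second interrupt number.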

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-21  1:35       ` David Gibson
@ 2017-09-21 11:21         ` Cédric Le Goater
  2017-09-22 10:54           ` David Gibson
  2017-09-28  8:43           ` Benjamin Herrenschmidt
  2017-09-28  8:31         ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-21 11:21 UTC
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/21/2017 03:35 AM, David Gibson wrote:
> On Wed, Sep 20, 2017 at 02:26:32PM +0200, Cédric Le Goater wrote:
>> On 09/19/2017 10:44 AM, David Gibson wrote:
>>> On Mon, Sep 11, 2017 at 07:12:32PM +0200, Cédric Le Goater wrote:
>>>> Like for XICS, the XIVE interface for the guest is described in the
>>>> device tree under the "interrupt-controller" node. A couple of new
>>>> properties are specific to XIVE :
>>>>
>>>>  - "reg"
>>>>
>>>>    contains the base address and size of the thread interrupt
>>>>    management areas (TIMA), also called rings, for the User level and
>>>>    for the Guest OS level. Only the Guest OS level is taken into
>>>>    account today.
>>>>
>>>>  - "ibm,xive-eq-sizes"
>>>>
>>>>    the size of the event queues. One cell per size supported, contains
>>>>    log2 of size, in ascending order.
>>>>
>>>>  - "ibm,xive-lisn-ranges"
>>>>
>>>>    the interrupt numbers ranges assigned to the guest. These are
>>>>    allocated using a simple bitmap.
>>>>
>>>> and also under the root node :
>>>>
>>>>  - "ibm,plat-res-int-priorities"
>>>>
>>>>    contains a list of priorities that the hypervisor has reserved for
>>>>    its own use. Simulate ranges as defined by the PowerVM Hypervisor.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/intc/spapr_xive_hcall.c  | 54 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/ppc/spapr_xive.h |  1 +
>>>>  2 files changed, 55 insertions(+)
>>>>
>>>> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
>>>> index 4c77b65683de..7b19ea6373dd 100644
>>>> --- a/hw/intc/spapr_xive_hcall.c
>>>> +++ b/hw/intc/spapr_xive_hcall.c
>>>> @@ -874,3 +874,57 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>>>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>>>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>>>>  }
>>>> +
>>>> +void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle)
>>>> +{
>>>> +    int node;
>>>> +    uint64_t timas[2 * 2];
>>>> +    uint32_t lisn_ranges[] = {
>>>> +        cpu_to_be32(xive->nr_irqs - xive->nr_targets + xive->ics->offset),
>>>> +        cpu_to_be32(xive->nr_targets),
>>>> +    };
>>>> +    uint32_t eq_sizes[] = {
>>>> +        cpu_to_be32(12), /* 4K */
>>>> +        cpu_to_be32(16), /* 64K */
>>>> +        cpu_to_be32(21), /* 2M */
>>>> +        cpu_to_be32(24), /* 16M */
>>>> +    };
>>>> +
>>>> +    /* Use some ranges to exercise the Linux driver, which should
>>>> +     * result in Linux choosing priority 6. This is not strictly
>>>> +     * necessary
>>>> +     */
>>>> +    uint32_t reserved_priorities[] = {
>>>> +        cpu_to_be32(1),  /* start */
>>>> +        cpu_to_be32(2),  /* count */
>>>> +        cpu_to_be32(7),  /* start */
>>>> +        cpu_to_be32(0xf8),  /* count */
>>>> +    };
>>>> +    int i;
>>>> +
>>>> +    /* Thread Interrupt Management Areas : User and OS */
>>>> +    for (i = 0; i < 2; i++) {
>>>> +        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
>>>> +        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
>>>> +    }
>>>> +
>>>> +    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
>>>> +
>>>> +    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));
>>>
>>> Shouldn't need this - SLOF will figure it out from the node name above.
>>
>> It is in the specs. phyp has it. we might as well keep it.
> 
> You misunderstand.  SLOF will *create* the name property based on the
> node name.  Adding it here has *no effect*.

ok. I was not aware of that. I will remove it then.
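
For reference, the `reg` value built by the `timas` loop in the patch is two (address, size) pairs of 64-bit big-endian cells, one per ring (User first, then OS), each ring being `1 << tm_shift` bytes and laid out back to back from `tm_base`. A standalone sketch of the same layout, assuming a 2-cell address and 2-cell size encoding; it shifts a 64-bit `1ULL`, since `1 << tm_shift` in the patch is an `int` shift that would overflow for shifts of 31 or more. `tima_reg_encode()` is a hypothetical name.

```c
#include <stdint.h>

/* Store a 64-bit value big-endian (device tree byte order) at 'out'. */
static void put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/*
 * Hypothetical helper mirroring the patch's TIMA "reg" layout:
 * ring 0 (User) at tm_base, ring 1 (OS) right after it, each
 * (1 << tm_shift) bytes. Fills 32 bytes: addr0, size0, addr1, size1.
 */
static void tima_reg_encode(uint8_t reg[32], uint64_t tm_base,
                            unsigned tm_shift)
{
    uint64_t ring_size = 1ULL << tm_shift;

    for (int i = 0; i < 2; i++) {
        put_be64(reg + 16 * i, tm_base + i * ring_size);
        put_be64(reg + 16 * i + 8, ring_size);
    }
}
```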
 
>>>> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
>>>> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
>>>> +
>>>> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
>>>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
>>>> +                     sizeof(eq_sizes)));
>>>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
>>>> +                     sizeof(lisn_ranges)));
>>>
>>> I note this doesn't have the interrupt-controller or #interrupt-cells
>>> properties.  So what acts as the interrupt parent for all the devices
>>> in the tree with XIVE?
>>
>> these properties are not in the specs anymore for the interrupt-controller
>> node and I don't think Linux makes use of them (even for XICS). So 
>> it just works fine.
> 
> Um.. what!?  Are you saying that the PAPR XIVE spec completely broke
> how interrupt specifiers have worked in the device tree since forever?

Let me be more precise. I am saying that the interrupt-controller 
and #interrupt-cells properties are not needed under the main interrupt 
controller node. They can be removed from the tree and the Linux guest 
kernel will boot perfectly well.
  
These properties are still needed under sub-nodes such as:

/proc/device-tree/vdevice/interrupt-controller
/proc/device-tree/event-sources/interrupt-controller

C.
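
On the `ibm,plat-res-int-priorities` property in the patch: it is a list of (start, count) pairs, so the values used reserve priorities 1-2 and 7-254, leaving 0 and 3-6 free within 0-7. A simplified model of how a guest could pick its priority from that list, assuming it takes the highest unreserved priority up to 7; this only illustrates the arithmetic behind the "Linux choosing priority 6" comment and is not the Linux driver's code. `pick_priority()` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model (not the Linux implementation): given reserved
 * (start, count) pairs in host byte order, return the highest priority
 * in 0..max_prio that no range covers, or -1 if all are reserved.
 */
static int pick_priority(const uint32_t *ranges, size_t nr_pairs,
                         int max_prio)
{
    for (int prio = max_prio; prio >= 0; prio--) {
        int reserved = 0;

        for (size_t i = 0; i < nr_pairs; i++) {
            uint64_t start = ranges[2 * i];
            uint64_t end = start + ranges[2 * i + 1]; /* exclusive */

            if ((uint64_t)prio >= start && (uint64_t)prio < end) {
                reserved = 1;
                break;
            }
        }
        if (!reserved) {
            return prio;
        }
    }
    return -1;
}
```

With the patch's values `{ 1, 2, 7, 0xf8 }`, priority 7 falls in the second range and 6 in neither, so the model returns 6.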


> And I'm pretty sure Linux does make use of them.  Without
> #interrupt-cells, there's no way it can properly interpret the
> interrupts properties in the device nodes.
> 

* Re: [Qemu-devel] [RFC PATCH v2 06/21] ppc/xive: introduce handlers for interrupt sources
  2017-09-20  4:38       ` David Gibson
@ 2017-09-21 14:11         ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-21 14:11 UTC
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/20/2017 06:38 AM, David Gibson wrote:
> On Tue, Sep 19, 2017 at 05:08:21PM +0200, Cédric Le Goater wrote:
>> On 09/19/2017 04:48 AM, David Gibson wrote:
>>> On Mon, Sep 11, 2017 at 07:12:20PM +0200, Cédric Le Goater wrote:
>>>> These are very similar to the XICS handlers in a simpler form. They
>>>> make use of the ICSIRQState array of the XICS interrupt source to
>>>> differentiate the MSI from the LSI interrupts. The spapr_xive_irq()
>>>> routine in charge of triggering the CPU interrupt line will be filled
>>>> later on.
>>>>
>>>> The next patch will introduce the MMIO handlers to interact with XIVE
>>>> interrupt sources.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/intc/spapr_xive.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/ppc/spapr_xive.h |  1 +
>>>>  2 files changed, 47 insertions(+)
>>>>
>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>> index 52c32f588d6d..1ed7b6a286e9 100644
>>>> --- a/hw/intc/spapr_xive.c
>>>> +++ b/hw/intc/spapr_xive.c
>>>> @@ -27,6 +27,50 @@
>>>>  
>>>>  #include "xive-internal.h"
>>>>  
>>>> +static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>>>> +{
>>>> +
>>>> +}
>>>> +
>>>> +/*
>>>> + * XIVE Interrupt Source
>>>> + */
>>>> +static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
>>>> +{
>>>> +    if (val) {
>>>> +        spapr_xive_irq(xive, srcno);
>>>> +    }
>>>> +}
>>>
>>> So in XICS "srcno" (vs "irq") indicates an offset within a single ICS
>>> object, as opposed to a global irq number.  Does that concept even
>>> exist in XIVE?
>>
>> We don't really care in the internals. 'srcno' is just an index in the 
>> tables, maybe I should change the name. It could be the same in XICS 
>> but the xirr is manipulated at low level and so we need to propagate 
>> the source offset in a couple of places. 
> 
> Right.  My point is that the XICS code deliberately uses srcno vs. irq
> names to identify which space we're talking about.  If we re-use the
> srcno name in XIVE where it doesn't really apply that could be
> misleading.

yes. ok I will be careful with the naming. 

>> This to say that the 'irq' number is a guest level information which
>> in the patchset should only be used at the hcall level to identify 
>> a source.
> 
> Right, and if there's no need to introduce a number space other than
> the guest one, we should keep using that everywhere - and give it a
> consistent name to avoid confusion.

yes. I agree. I think XICS could benefit from some cleanups.
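
One way to keep the two number spaces from being mixed up is to give each its own type, so that converting between a guest IRQ number and a source index is always an explicit call. A minimal sketch; the type and function names are hypothetical, and the arithmetic simply mirrors the patch (`irq = srcno + ics->offset`, with the IPIs taking the last `nr_targets` sources, which is what the `ibm,xive-lisn-ranges` cells are built from).

```c
#include <stdbool.h>
#include <stdint.h>

/* Distinct wrapper types so the compiler rejects mixing the two spaces. */
typedef struct { uint32_t v; } GuestIrq;   /* number seen by the guest */
typedef struct { uint32_t v; } SourceIdx;  /* index into source tables */

typedef struct {
    uint32_t offset;     /* first guest IRQ number of the source block */
    uint32_t nr_irqs;    /* number of sources */
    uint32_t nr_targets; /* IPIs, one per target, taken from the end */
} IrqSpace;

static bool irq_to_src(const IrqSpace *s, GuestIrq irq, SourceIdx *out)
{
    if (irq.v < s->offset || irq.v - s->offset >= s->nr_irqs) {
        return false;  /* not one of ours */
    }
    out->v = irq.v - s->offset;
    return true;
}

static GuestIrq src_to_irq(const IrqSpace *s, SourceIdx src)
{
    return (GuestIrq){ src.v + s->offset };
}

/* First guest IRQ of the IPI range, as the patch's lisn_ranges computes it. */
static GuestIrq ipi_range_start(const IrqSpace *s)
{
    SourceIdx first = { s->nr_irqs - s->nr_targets };
    return src_to_irq(s, first);
}
```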

>>>> +
>>>> +static void spapr_xive_source_set_irq_lsi(sPAPRXive *xive, int srcno, int val)
>>>> +{
>>>> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
>>>> +
>>>> +    if (val) {
>>>> +        irq->status |= XICS_STATUS_ASSERTED;
>>>> +    } else {
>>>> +        irq->status &= ~XICS_STATUS_ASSERTED;
>>>
>>> More mangling a XICS specific object for XIVE operations.  Please
>>> stop.
>>
>> ah ! we will still need the same information and that means introducing 
>> a common source object. The patchset today just uses the XICS ICSIRQState 
>> array as a common object.
> 
> It's not really the same information though.  For XICS irq->status is
> *all* the information about the line's state, for XIVE, most of that
> info is in the PQ bits which are elsewhere. 

This is true.

> That makes at least some of the information in ICSIRQState redundant,
> and therefore confusing and misleading.

I will respin the patchset in a different way to distinguish 
xive from xics clearly. I will keep CAS and migration for later. 
The source should not be too complex to handle, but I am less sure
about the ICP.

Thanks,

C.
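
The point that XIVE keeps most of a line's state in the PQ bits, rather than in an ICSIRQState-like status word, can be made concrete with the two-bit Event State Buffer. A sketch of the trigger and EOI transitions, assuming the common XIVE encoding (00 reset, 01 off, 10 pending, 11 queued); it is illustrative only and not code from this patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* PQ encodings: P is bit 1, Q is bit 0. */
enum {
    ESB_RESET   = 0x0, /* 00: idle, next trigger notifies */
    ESB_OFF     = 0x1, /* 01: masked, triggers are dropped */
    ESB_PENDING = 0x2, /* 10: notified, waiting for EOI */
    ESB_QUEUED  = 0x3, /* 11: notified and another trigger arrived */
};

/* Source triggers; returns true if an event must be forwarded. */
static bool esb_trigger(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        *pq = ESB_QUEUED;   /* coalesce: remember it, do not notify */
        return false;
    default:                /* ESB_OFF */
        return false;
    }
}

/* OS end-of-interrupt; returns true if a queued event must be replayed. */
static bool esb_eoi(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_PENDING:
        *pq = ESB_RESET;
        return false;
    case ESB_QUEUED:
        *pq = ESB_PENDING;  /* replay the event that arrived meanwhile */
        return true;
    default:                /* ESB_RESET or ESB_OFF: unchanged */
        return false;
    }
}
```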

>>>> +    }
>>>> +
>>>> +    if (irq->status & XICS_STATUS_ASSERTED
>>>> +        && !(irq->status & XICS_STATUS_SENT)) {
>>>> +        irq->status |= XICS_STATUS_SENT;
>>>> +        spapr_xive_irq(xive, srcno);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
>>>> +{
>>>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>>>> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
>>>> +
>>>> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
>>>> +        spapr_xive_source_set_irq_lsi(xive, srcno, val);
>>>> +    } else {
>>>> +        spapr_xive_source_set_irq_msi(xive, srcno, val);
>>>> +    }
>>>> +}
>>>> +
>>>>  /*
>>>>   * Main XIVE object
>>>>   */
>>>> @@ -80,6 +124,8 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>>>      }
>>>>  
>>>>      xive->ics = ICS_BASE(obj);
>>>> +    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
>>>> +                                     xive->nr_irqs);
>>>>  
>>>>      /* Allocate the last IRQ numbers for the IPIs */
>>>>      for (i = xive->nr_irqs - xive->nr_targets; i < xive->nr_irqs; i++) {
>>>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>>>> index 29112589b37f..eab92c4c1bb8 100644
>>>> --- a/include/hw/ppc/spapr_xive.h
>>>> +++ b/include/hw/ppc/spapr_xive.h
>>>> @@ -38,6 +38,7 @@ struct sPAPRXive {
>>>>  
>>>>      /* IRQ */
>>>>      ICSState     *ics;  /* XICS source inherited from the SPAPR machine */
>>>> +    qemu_irq     *qirqs;
>>>>  
>>>>      /* XIVE internal tables */
>>>>      uint8_t      *sbe;
>>>
>>
> 
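
Along the lines of the respin proposed above, the LSI/MSI distinction could live in a XIVE-specific source object instead of the XICS ICSIRQState array. A minimal sketch with hypothetical names: each source keeps only an LSI flag and the asserted level, and the "already notified" state is left to the ESB/PQ logic rather than to a XICS_STATUS_SENT bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical XIVE-only source state, independent of ICSIRQState. */
typedef struct {
    uint32_t nr_irqs;
    bool *is_lsi;    /* per source: level-sensitive (true) or MSI (false) */
    bool *asserted;  /* per source: current line level, LSIs only */
} XiveSrcSketch;

/*
 * Record a change on input 'idx'. Returns true when an event should be
 * handed to the ESB/PQ logic.
 */
static bool xive_src_set_irq(XiveSrcSketch *src, uint32_t idx, int val)
{
    if (idx >= src->nr_irqs) {
        return false;
    }
    if (!src->is_lsi[idx]) {
        return val != 0;          /* MSI: every trigger is an event */
    }

    bool was_asserted = src->asserted[idx];
    src->asserted[idx] = (val != 0);
    /* LSI: signal the rising edge; re-signalling a line that is still
     * asserted at EOI time would be done by the EOI path (not shown). */
    return !was_asserted && val != 0;
}
```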

* Re: [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
  2017-09-21  1:25       ` David Gibson
@ 2017-09-21 14:18         ` Cédric Le Goater
  2017-09-22 10:33           ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-21 14:18 UTC
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/21/2017 03:25 AM, David Gibson wrote:
> On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
>> On 09/19/2017 10:46 AM, David Gibson wrote:
>>> On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
>>>> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
>>>>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
>>>>> negotiation process determines whether the guest operates with an
>>>>> interrupt controller using the XICS legacy model, as found on POWER8,
>>>>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
>>>>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
>>>>>
>>>>> Follows a model for the XIVE interrupt controller and support for the
>>>>> Hypervisor's calls which are used to configure the interrupt sources
>>>>> and the event/notification queues of the guest. The last patch
>>>>> integrates XIVE in the sPAPR machine.
>>>>>
>>>>> Code is here:
>>>>
>>>>
>>>> An overall comment:
>>>>
>>>> I note in several replies here that I think the way XICS objects are
>>>> re-used for XIVE is really ugly, and I think it will make future
>>>> maintenance pretty painful.
>>
>> I agree. That was one way to identify what we need for migration 
>> compatibility and CAS reset.   
>>
>>>> I'm thinking maybe trying to support the CAS negotiation of interrupt
>>>> controller from day 1 is warping the design.  A better approach might
>>>> be first to implement XIVE only when given a specific machine option -
>>>> guest gets one or the other and can't negotiate.
>>
>> ok. 
>>
>> CAS is not the most complex problem, we mostly need to share 
>> the ICSIRQState array and the source offset. Migration from an older
>> machine is a problem.
> 
> Uh.. what?  Migration from an older machine isn't a thing.  We can
> migrate from an older qemu, but the machine type (and version) has to
> be identical at each end.  That's *why* we keep around the older
> machine types on newer qemus.

yes. I am just wondering how I am going to handle a xics-only 
machine migrating to a xics/xive machine. 

The xive machine option we are talking about will activate 
the xive interrupt mode and instantiate the objects behind it. 
So when we migrate from an older machine we will need to start 
the target machine with xive=off. I guess that is OK.   

Thanks for the insights and the time to review the code,

C. 

>> We are doomed to keep the existing XICS
>> framework available.
>>
>>>> That should allow a more natural XIVE design to emerge, *then* we can
>>>> look at what's necessary to make boot-time negotiation possible.
>>>
>>> Actually, it just occurred to me that we might be making life hard for
>>> ourselves by trying to actually switch between full XICS and XIVE
>>> models.  Couldn't we have new machine types always construct the XIVE
>>> infrastructure, 
>>
>> yes.
>>
>>> but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
>>> hardware.
>>
>> ok but migration will not be supported.
> 
> Right, this would only be for newer machine types, and you can never
> migrate between different machine types.
> 
>>> Since something more or less equivalent
>>> has already been done in both OPAL and the host kernel, I'm guessing
>>> this shouldn't be too hard at this point.
>>
>> Indeed that is how it is working currently on P9 kvm guests. hcalls are
>> implemented on top of XIVE native.
>>
>> Thanks,
>>
>>
>> C.
>>
> 

* Re: [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
  2017-09-21 14:18         ` Cédric Le Goater
@ 2017-09-22 10:33           ` David Gibson
  2017-09-22 12:32             ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-22 10:33 UTC
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Thu, Sep 21, 2017 at 04:18:33PM +0200, Cédric Le Goater wrote:
> On 09/21/2017 03:25 AM, David Gibson wrote:
> > On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
> >> On 09/19/2017 10:46 AM, David Gibson wrote:
> >>> On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
> >>>> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
> >>>>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> >>>>> negotiation process determines whether the guest operates with an
> >>>>> interrupt controller using the XICS legacy model, as found on POWER8,
> >>>>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> >>>>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
> >>>>>
> >>>>> Follows a model for the XIVE interrupt controller and support for the
> >>>>> Hypervisor's calls which are used to configure the interrupt sources
> >>>>> and the event/notification queues of the guest. The last patch
> >>>>> integrates XIVE in the sPAPR machine.
> >>>>>
> >>>>> Code is here:
> >>>>
> >>>>
> >>>> An overall comment:
> >>>>
> >>>> I note in several replies here that I think the way XICS objects are
> >>>> re-used for XIVE is really ugly, and I think it will make future
> >>>> maintenance pretty painful.
> >>
> >> I agree. That was one way to identify what we need for migration 
> >> compatibility and CAS reset.   
> >>
> >>>> I'm thinking maybe trying to support the CAS negotiation of interrupt
> >>>> controller from day 1 is warping the design.  A better approach might
> >>>> be first to implement XIVE only when given a specific machine option -
> >>>> guest gets one or the other and can't negotiate.
> >>
> >> ok. 
> >>
> >> CAS is not the most complex problem, we mostly need to share 
> >> the ICSIRQState array and the source offset. Migration from an older
> >> machine is a problem.
> > 
> > Uh.. what?  Migration from an older machine isn't a thing.  We can
> > migrate from an older qemu, but the machine type (and version) has to
> > be identical at each end.  That's *why* we keep around the older
> > machine types on newer qemus.
> 
> yes. I am just wondering how I am going to handle a xics-only 
> machine migrating to a xics/xive machine. 

Won't ever happen.  Older machine types will always be xics, newer
machine type will always be xive (at least with POWER9).

> The xive machine option we are talking about will activate 
> the xive interrupt mode and instantiate the objects behind it. 
> So when we migrate from an older machine we will need to start 
> the target machine with xive=off. I guess that is OK.

Again, we *don't* migrate from an older machine.  Ever.  We only ever
migrate from an older qemu version to a newer qemu using the older
machine type.
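
The rule can be stated as a small predicate. A sketch with hypothetical names and placeholder machine type strings: the interrupt mode is a fixed property of the machine type, and migration is only considered between identical machine types, from an older QEMU to a newer one, so the XICS/XIVE choice can never differ between the two ends.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: the interrupt controller is fixed by machine type. */
typedef struct {
    const char *machine_type; /* versioned machine type name */
    bool xive;                /* interrupt mode implied by that type */
} MachineTypeSketch;

typedef struct {
    unsigned qemu_version;    /* simplified: one increasing number */
    const MachineTypeSketch *mt;
} Endpoint;

/* Same machine type on both ends; the destination QEMU may be newer. */
static bool migration_supported(const Endpoint *src, const Endpoint *dst)
{
    return strcmp(src->mt->machine_type, dst->mt->machine_type) == 0 &&
           dst->qemu_version >= src->qemu_version;
}
```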
> 
> Thanks for the insights and the time to review the code,
> 
> C. 
> 
> >> We are doomed to keep the existing XICS
> >> framework available.
> >>
> >>>> That should allow a more natural XIVE design to emerge, *then* we can
> >>>> look at what's necessary to make boot-time negotiation possible.
> >>>
> >>> Actually, it just occurred to me that we might be making life hard for
> >>> ourselves by trying to actually switch between full XICS and XIVE
> >>> models.  Couldn't we have new machine types always construct the XIVE
> >>> infrastructure, 
> >>
> >> yes.
> >>
> >>> but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
> >>> hardware.
> >>
> >> ok but migration will not be supported.
> > 
> > Right, this would only be for newer machine types, and you can never
> > migrate between different machine types.
> > 
> >>> Since something more or less equivalent
> >>> has already been done in both OPAL and the host kernel, I'm guessing
> >>> this shouldn't be too hard at this point.
> >>
> >> Indeed that is how it is working currently on P9 kvm guests. hcalls are
> >> implemented on top of XIVE native.
> >>
> >> Thanks,
> >>
> >>
> >> C.
> >>
> > 
> 

* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-21 11:21         ` Cédric Le Goater
@ 2017-09-22 10:54           ` David Gibson
  2017-09-28  8:43           ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 90+ messages in thread
From: David Gibson @ 2017-09-22 10:54 UTC
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Thu, Sep 21, 2017 at 01:21:10PM +0200, Cédric Le Goater wrote:
> On 09/21/2017 03:35 AM, David Gibson wrote:
> > On Wed, Sep 20, 2017 at 02:26:32PM +0200, Cédric Le Goater wrote:
> >> On 09/19/2017 10:44 AM, David Gibson wrote:
> >>> On Mon, Sep 11, 2017 at 07:12:32PM +0200, Cédric Le Goater wrote:
> >>>> Like for XICS, the XIVE interface for the guest is described in the
> >>>> device tree under the "interrupt-controller" node. A couple of new
> >>>> properties are specific to XIVE :
> >>>>
> >>>>  - "reg"
> >>>>
> >>>>    contains the base address and size of the thread interrupt
> >>>>    management areas (TIMA), also called rings, for the User level and
> >>>>    for the Guest OS level. Only the Guest OS level is taken into
> >>>>    account today.
> >>>>
> >>>>  - "ibm,xive-eq-sizes"
> >>>>
> >>>>    the size of the event queues. One cell per size supported, contains
> >>>>    log2 of size, in ascending order.
> >>>>
> >>>>  - "ibm,xive-lisn-ranges"
> >>>>
> >>>>    the interrupt numbers ranges assigned to the guest. These are
> >>>>    allocated using a simple bitmap.
> >>>>
> >>>> and also under the root node :
> >>>>
> >>>>  - "ibm,plat-res-int-priorities"
> >>>>
> >>>>    contains a list of priorities that the hypervisor has reserved for
> >>>>    its own use. Simulate ranges as defined by the PowerVM Hypervisor.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>> ---
> >>>>  hw/intc/spapr_xive_hcall.c  | 54 +++++++++++++++++++++++++++++++++++++++++++++
> >>>>  include/hw/ppc/spapr_xive.h |  1 +
> >>>>  2 files changed, 55 insertions(+)
> >>>>
> >>>> diff --git a/hw/intc/spapr_xive_hcall.c b/hw/intc/spapr_xive_hcall.c
> >>>> index 4c77b65683de..7b19ea6373dd 100644
> >>>> --- a/hw/intc/spapr_xive_hcall.c
> >>>> +++ b/hw/intc/spapr_xive_hcall.c
> >>>> @@ -874,3 +874,57 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
> >>>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
> >>>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
> >>>>  }
> >>>> +
> >>>> +void spapr_xive_populate(sPAPRXive *xive, void *fdt, uint32_t phandle)
> >>>> +{
> >>>> +    int node;
> >>>> +    uint64_t timas[2 * 2];
> >>>> +    uint32_t lisn_ranges[] = {
> >>>> +        cpu_to_be32(xive->nr_irqs - xive->nr_targets + xive->ics->offset),
> >>>> +        cpu_to_be32(xive->nr_targets),
> >>>> +    };
> >>>> +    uint32_t eq_sizes[] = {
> >>>> +        cpu_to_be32(12), /* 4K */
> >>>> +        cpu_to_be32(16), /* 64K */
> >>>> +        cpu_to_be32(21), /* 2M */
> >>>> +        cpu_to_be32(24), /* 16M */
> >>>> +    };
> >>>> +
> >>>> +    /* Use some ranges to exercise the Linux driver, which should
> >>>> +     * result in Linux choosing priority 6. This is not strictly
> >>>> +     * necessary
> >>>> +     */
> >>>> +    uint32_t reserved_priorities[] = {
> >>>> +        cpu_to_be32(1),  /* start */
> >>>> +        cpu_to_be32(2),  /* count */
> >>>> +        cpu_to_be32(7),  /* start */
> >>>> +        cpu_to_be32(0xf8),  /* count */
> >>>> +    };
> >>>> +    int i;
> >>>> +
> >>>> +    /* Thread Interrupt Management Areas : User and OS */
> >>>> +    for (i = 0; i < 2; i++) {
> >>>> +        timas[i * 2] = cpu_to_be64(xive->tm_base + i * (1 << xive->tm_shift));
> >>>> +        timas[i * 2 + 1] = cpu_to_be64(1 << xive->tm_shift);
> >>>> +    }
> >>>> +
> >>>> +    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
> >>>> +
> >>>> +    _FDT(fdt_setprop_string(fdt, node, "name", "interrupt-controller"));
> >>>
> >>> Shouldn't need this - SLOF will figure it out from the node name above.
> >>
> >> It is in the specs. phyp has it. we might as well keep it.
> > 
> > You misunderstand.  SLOF will *create* the name property based on the
> > node name.  Adding it here has *no effect*.
> 
> ok. I was not aware of that. I will remove it then.

Historical aside: in traditional OF there aren't "node names" as such.
Each node has a 'name' and 'reg' property and the "node name"
displayed in listings is formed from those as name@unit-address - with
the tricky catch being that unit-address is encoded from 'reg' in a
bus specific manner (using what's essentially a method attached to the
parent node).

Obviously that's awkward in the flat tree world, since we can't have
methods.  So instead nodes have a real string name built into the
structure including both the name and unit address components.  'name'
properties are generally omitted and derived from that name.  'reg'
should match according to the bus's encoding conventions, but the
number of things that can actually verify that is relatively small.
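
In flat-tree terms, the implied 'name' value is the node name up to the '@', and whatever follows is the unit address, which should agree with the first 'reg' address under the bus's encoding. A minimal sketch with a hypothetical helper; it only splits the string and does not attempt the bus-specific 'reg' check.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split a flattened-tree node name such as "interrupt-controller@0" into
 * the part a 'name' property would hold and the unit address. Writes
 * NUL-terminated strings; unit is "" when there is no '@'. Returns 0 on
 * success, -1 if a buffer is too small.
 */
static int split_node_name(const char *node, char *name, size_t name_sz,
                           char *unit, size_t unit_sz)
{
    const char *at = strchr(node, '@');
    size_t name_len = at ? (size_t)(at - node) : strlen(node);
    const char *unit_src = at ? at + 1 : "";
    size_t unit_len = strlen(unit_src);

    if (name_len >= name_sz || unit_len >= unit_sz) {
        return -1;
    }
    memcpy(name, node, name_len);
    name[name_len] = '\0';
    memcpy(unit, unit_src, unit_len + 1);
    return 0;
}
```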

> >>>> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
> >>>> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
> >>>> +
> >>>> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
> >>>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
> >>>> +                     sizeof(eq_sizes)));
> >>>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
> >>>> +                     sizeof(lisn_ranges)));
> >>>
> >>> I note this doesn't have the interrupt-controller or #interrupt-cells
> >>> properties.  So what acts as the interrupt parent for all the devices
> >>> in the tree with XIVE?
> >>
> >> these properties are not in the specs anymore for the interrupt-controller
> >> node and I don't think Linux makes use of them (even for XICS). So 
> >> it just works fine.
> > 
> > Um.. what!?  Are you saying that the PAPR XIVE spec completely broke
> > how interrupt specifiers have worked in the device tree since forever?
> 
> Let me be more precise. I am saying that the interrupt-controller 
> and #interrupt-cells properties are not needed under the main interrupt 
> controller node. They can be removed from the tree and the Linux guest 
> kernel will boot perfectly well.
>   
> These properties are still needed under sub-nodes such as:
> 
> /proc/device-tree/vdevice/interrupt-controller
> /proc/device-tree/event-sources/interrupt-controller

Um.  This still makes no sense.  In order to have a common interrupt
space, those nodes must have an interrupt-parent pointing somewhere -
the top level interrupt controller, which needs interrupt-controller
and #interrupt-cells properties.  Note that that will be the "source"
side of the intc.  There could also be a presentation side of the
intc, which wouldn't need those properties.
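
The /proc/device-tree paths quoted above are the interrupt-controller property files of the /vdevice and /event-sources nodes, and the resolution described here can be modelled as a walk: from a device, follow its interrupt-parent phandle (or its tree parent if it has none) until reaching a node flagged interrupt-controller, whose #interrupt-cells then governs decoding. A sketch over a hypothetical in-memory node table, not libfdt or Linux's of_irq code; interrupt-map nexus nodes are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_NODE (-1)

/* Hypothetical flattened view of a few device tree properties. */
typedef struct {
    const char *path;
    int parent;                /* index of tree parent, NO_NODE for root */
    uint32_t phandle;          /* 0 if none */
    uint32_t interrupt_parent; /* phandle, 0 if property absent */
    bool interrupt_controller;
    uint32_t interrupt_cells;  /* #interrupt-cells, 0 if absent */
} DtNode;

static int find_by_phandle(const DtNode *n, size_t count, uint32_t ph)
{
    for (size_t i = 0; i < count; i++) {
        if (ph != 0 && n[i].phandle == ph) {
            return (int)i;
        }
    }
    return NO_NODE;
}

/* Index of the controller serving 'dev', or NO_NODE if none is found. */
static int find_irq_domain(const DtNode *n, size_t count, int dev)
{
    int cur = dev;

    for (size_t steps = 0; steps < count; steps++) {
        int next = n[cur].interrupt_parent
                       ? find_by_phandle(n, count, n[cur].interrupt_parent)
                       : n[cur].parent;
        if (next == NO_NODE) {
            return NO_NODE;
        }
        if (n[next].interrupt_controller) {
            return next;
        }
        cur = next;
    }
    return NO_NODE; /* bounded walk guards against phandle loops */
}
```

In this model a sub-controller such as /vdevice serves its children, but its own interrupts still resolve through its interrupt-parent to the top-level controller.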

* Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources
  2017-09-20 12:54     ` Cédric Le Goater
@ 2017-09-22 10:58       ` David Gibson
  2017-09-22 12:26         ` Cédric Le Goater
  2017-09-28  8:27       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-22 10:58 UTC
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Wed, Sep 20, 2017 at 02:54:31PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 04:57 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
> >> Each interrupt source is associated with a two-bit state machine
> >> called an Event State Buffer (ESB) which is controlled by MMIO to
> >> trigger events. See code for more details on the states and
> >> transitions.
> >>
> >> The MMIO space for the ESB translation is 512GB on bare-metal
> >> (powernv) systems and the BAR depends on the chip id. In our model for
> >> the sPAPR machine, we choose to only map a sub memory region for the
> >> provisioned IRQ numbers and to use the mapping address of chip 0 on a
> >> real system. The OS will get the address of the MMIO page of the ESB
> >> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
> > 
> > On bare metal, are the MMIOs for each irq source mapped contiguously?
> 
> yes. 
>  
> >> For KVM support, we should think of a way to map this QEMU memory
> >> region in the host to trigger events directly.
> > 
> > This would rely on being able to map them without mapping those for
> > any other VM or the host.  Does that mean allocating a contiguous (and
> > aligned) hunk of irqs for a guest?
> 
> I think so, yes. The IRQ numbers and the memory regions are tied, and we
> also need to be able to pass the MMIO region from the host to the guest,
> a bit like VFIO does for the IOMMU regions, I suppose. But I haven't dug
> into the problem much.
> 
> This is an important part in the overall design. 
> 
> > We're going to need to be careful about irq allocation here.
> > Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
> > MMIO addresses, 
> 
> GET_SOURCE_INFO only retrieves the address of the MMIO region for
> a 'lisn'. It is not dynamically mapped.

Ok... what's a "lisn"?


> In the KVM case, the initial
> information on the address would come from OPAL and then the host 
> kernel would translate this information for the guest.
> 
> > we need the MMIO addresses to be stable and consistent, because 
> > we can't have them change across migration.  
> 
> yes. I will catch my XIVE guru next week in Paris to clarify that
> part. 
> 
> > We need to have this consistent between in-qemu and in-KVM XIVE
> > implementations as well.
> 
> yes.
> 
> C.
> 
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr_xive.h |   6 ++
> >>  2 files changed, 261 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 1ed7b6a286e9..8a85d64efc4c 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
> >>  }
> >>  
> >>  /*
> >> + * "magic" Event State Buffer (ESB) MMIO offsets.
> >> + *
> >> + * Each interrupt source has a 2-bit state machine called ESB
> >> + * which can be controlled by MMIO. It's made of 2 bits, P and
> >> + * Q. P indicates that an interrupt is pending (has been sent
> >> + * to a queue and is waiting for an EOI). Q indicates that the
> >> + * interrupt has been triggered while pending.
> >> + *
> >> + * This acts as a coalescing mechanism in order to guarantee
> >> + * that a given interrupt only occurs at most once in a queue.
> >> + *
> >> + * When doing an EOI, the Q bit will indicate if the interrupt
> >> + * needs to be re-triggered.
> >> + *
> >> + * The following offsets into the ESB MMIO allow to read or
> >> + * manipulate the PQ bits. They must be used with an 8-bytes
> >> + * load instruction. They all return the previous state of the
> >> + * interrupt (atomically).
> >> + *
> >> + * Additionally, some ESB pages support doing an EOI via a
> >> + * store at 0 and some ESBs support doing a trigger via a
> >> + * separate trigger page.
> >> + */
> >> +#define XIVE_ESB_GET            0x800
> >> +#define XIVE_ESB_SET_PQ_00      0xc00
> >> +#define XIVE_ESB_SET_PQ_01      0xd00
> >> +#define XIVE_ESB_SET_PQ_10      0xe00
> >> +#define XIVE_ESB_SET_PQ_11      0xf00
> >> +
> >> +#define XIVE_ESB_VAL_P          0x2
> >> +#define XIVE_ESB_VAL_Q          0x1
> >> +
> >> +#define XIVE_ESB_RESET          0x0
> >> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
> >> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
> >> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
> >> +
> >> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
> >> +{
> >> +    uint32_t byte = idx / 4;
> >> +    uint32_t bit  = (idx % 4) * 2;
> >> +
> >> +    assert(byte < xive->sbe_size);
> >> +
> >> +    return (xive->sbe[byte] >> bit) & 0x3;
> >> +}
> >> +
> >> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)
> >> +{
> >> +    uint32_t byte = idx / 4;
> >> +    uint32_t bit  = (idx % 4) * 2;
> >> +    uint8_t old, new;
> >> +
> >> +    assert(byte < xive->sbe_size);
> >> +
> >> +    old = xive->sbe[byte];
> >> +
> >> +    new = xive->sbe[byte] & ~(0x3 << bit);
> >> +    new |= (pq & 0x3) << bit;
> >> +
> >> +    xive->sbe[byte] = new;
> >> +
> >> +    return (old >> bit) & 0x3;
> >> +}
> >> +
> >> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)
> >> +{
> >> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
> >> +
> >> +    switch (old_pq) {
> >> +    case XIVE_ESB_RESET:
> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
> >> +        return false;
> >> +    case XIVE_ESB_PENDING:
> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
> >> +        return false;
> >> +    case XIVE_ESB_QUEUED:
> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
> >> +        return true;
> >> +    case XIVE_ESB_OFF:
> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
> >> +        return false;
> >> +    default:
> >> +         g_assert_not_reached();
> >> +    }
> >> +}
> >> +
> >> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)
> >> +{
> >> +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
> >> +
> >> +    switch (old_pq) {
> >> +    case XIVE_ESB_RESET:
> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
> >> +        return true;
> >> +    case XIVE_ESB_PENDING:
> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
> >> +        return true;
> >> +    case XIVE_ESB_QUEUED:
> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
> >> +        return true;
> >> +    case XIVE_ESB_OFF:
> >> +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
> >> +        return false;
> >> +    default:
> >> +         g_assert_not_reached();
> >> +    }
> >> +}
> >> +
> >> +/*
> >> + * XIVE Interrupt Source MMIOs
> >> + */
> >> +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)
> >> +{
> >> +    ICSIRQState *irq = &xive->ics->irqs[srcno];
> >> +
> >> +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
> >> +        irq->status &= ~XICS_STATUS_SENT;
> >> +    }
> >> +}
> >> +
> >> +/* TODO: handle second page
> >> + *
> >> + * Some HW use a separate page for trigger. We only support the case
> >> + * in which the trigger can be done in the same page as the EOI.
> >> + */
> >> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> >> +    uint32_t offset = addr & 0xF00;
> >> +    uint32_t srcno = addr >> xive->esb_shift;
> >> +    XiveIVE *ive;
> >> +    uint64_t ret = -1;
> >> +
> >> +    ive = spapr_xive_get_ive(xive, srcno);
> >> +    if (!ive || !(ive->w & IVE_VALID))  {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> >> +        goto out;
> > 
> > Since there's a whole (4k) page for each source, I wonder if we should
> > actually map each one as a separate MMIO region to allow us to tweak
> > the mappings more flexibly.
> > 
> >> +    }
> >> +
> >> +    switch (offset) {
> >> +    case 0:
> >> +        spapr_xive_source_eoi(xive, srcno);
> >> +
> >> +        /* return TRUE or FALSE depending on PQ value */
> >> +        ret = spapr_xive_pq_eoi(xive, srcno);
> >> +        break;
> >> +
> >> +    case XIVE_ESB_GET:
> >> +        ret = spapr_xive_pq_get(xive, srcno);
> >> +        break;
> >> +
> >> +    case XIVE_ESB_SET_PQ_00:
> >> +    case XIVE_ESB_SET_PQ_01:
> >> +    case XIVE_ESB_SET_PQ_10:
> >> +    case XIVE_ESB_SET_PQ_11:
> >> +        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);
> >> +        break;
> >> +    default:
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
> >> +    }
> >> +
> >> +out:
> >> +    return ret;
> >> +}
> >> +
> >> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,
> >> +                           uint64_t value, unsigned size)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> >> +    uint32_t offset = addr & 0xF00;
> >> +    uint32_t srcno = addr >> xive->esb_shift;
> >> +    XiveIVE *ive;
> >> +    bool notify = false;
> >> +
> >> +    ive = spapr_xive_get_ive(xive, srcno);
> >> +    if (!ive || !(ive->w & IVE_VALID))  {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> >> +        return;
> >> +    }
> >> +
> >> +    switch (offset) {
> >> +    case 0:
> >> +        /* TODO: should we trigger even if the IVE is masked ? */
> >> +        notify = spapr_xive_pq_trigger(xive, srcno);
> >> +        break;
> >> +    default:
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
> >> +                      offset);
> >> +        return;
> >> +    }
> >> +
> >> +    if (notify && !(ive->w & IVE_MASKED)) {
> >> +        qemu_irq_pulse(xive->qirqs[srcno]);
> >> +    }
> >> +}
> >> +
> >> +static const MemoryRegionOps spapr_xive_esb_ops = {
> >> +    .read = spapr_xive_esb_read,
> >> +    .write = spapr_xive_esb_write,
> >> +    .endianness = DEVICE_BIG_ENDIAN,
> >> +    .valid = {
> >> +        .min_access_size = 8,
> >> +        .max_access_size = 8,
> >> +    },
> >> +    .impl = {
> >> +        .min_access_size = 8,
> >> +        .max_access_size = 8,
> >> +    },
> >> +};
> >> +
> >> +/*
> >>   * XIVE Interrupt Source
> >>   */
> >>  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
> >> @@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
> >>  /*
> >>   * Main XIVE object
> >>   */
> >> +#define P9_MMIO_BASE     0x006000000000000ull
> >> +
> >> +/* VC BAR contains set translations for the ESBs and the EQs. */
> >> +#define VC_BAR_DEFAULT   0x10000000000ull
> >> +#define VC_BAR_SIZE      0x08000000000ull
> >> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
> >> +
> >> +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
> >> +                                            unsigned size)
> >> +{
> >> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
> >> +                  __func__, offset, size);
> >> +    return 0;
> >> +}
> >> +
> >> +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,
> >> +                                         uint64_t value, unsigned size)
> >> +{
> >> +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
> >> +                  __func__, offset, value, size);
> >> +}
> >> +
> >> +static const MemoryRegionOps spapr_xive_esb_default_ops = {
> >> +    .read = spapr_xive_esb_default_read,
> >> +    .write = spapr_xive_esb_default_write,
> >> +    .endianness = DEVICE_BIG_ENDIAN,
> >> +};
> >>  
> >>  void spapr_xive_reset(void *dev)
> >>  {
> >> @@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >>      xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;
> >>      xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));
> >>  
> >> +    /* VC BAR. That's the full window but we will only map the
> >> +     * subregions in use. */
> >> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
> >> +    xive->esb_shift = ESB_SHIFT;
> >> +
> >> +    /* Install default memory region handlers to log bogus access */
> >> +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,
> >> +                          NULL, "xive.esb.full", VC_BAR_SIZE);
> >> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);
> >> +
> >> +    /* Install the ESB memory region in the overall one */
> >> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,
> >> +                          xive, "xive.esb",
> >> +                          (1ull << xive->esb_shift) * xive->nr_irqs);
> >> +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
> >> +
> >>      qemu_register_reset(spapr_xive_reset, dev);
> >>  }
> >>  
> >> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> >> index eab92c4c1bb8..0f516534d76a 100644
> >> --- a/include/hw/ppc/spapr_xive.h
> >> +++ b/include/hw/ppc/spapr_xive.h
> >> @@ -46,6 +46,12 @@ struct sPAPRXive {
> >>      XiveIVE      *ivt;
> >>      XiveEQ       *eqt;
> >>      uint32_t     nr_eqs;
> >> +
> >> +    /* ESB memory region */
> >> +    uint32_t     esb_shift;
> >> +    hwaddr       esb_base;
> >> +    MemoryRegion esb_mr;
> >> +    MemoryRegion esb_iomem;
> >>  };
> >>  
> >>  #endif /* PPC_SPAPR_XIVE_H */
> > 
> 
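
The PQ rules described in the patch comment can be exercised outside QEMU. The following C sketch is illustrative only (the helper names and `NR_SOURCES` are hypothetical). It packs two PQ bits per source, four sources per byte, as `spapr_xive_pq_get()` and `spapr_xive_pq_set()` do, and implements trigger and EOI following the comment's coalescing rule. Under that rule, a trigger arriving while P is set only records Q and does not notify again. The quoted `spapr_xive_pq_trigger()`, by contrast, returns true in the PENDING and QUEUED cases, which would let the same interrupt reach a queue more than once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* PQ encodings, same values as the XIVE_ESB_* macros in the patch. */
#define ESB_VAL_P   0x2
#define ESB_VAL_Q   0x1
#define ESB_RESET   0x0                     /* 00: idle */
#define ESB_PENDING ESB_VAL_P               /* 10: sent, waiting for EOI */
#define ESB_QUEUED  (ESB_VAL_P | ESB_VAL_Q) /* 11: fired again while pending */
#define ESB_OFF     ESB_VAL_Q               /* 01: source off */

#define NR_SOURCES 16 /* arbitrary size for the sketch */

/* Two PQ bits per source, four sources per byte. */
static uint8_t pq_bits[(NR_SOURCES + 3) / 4];

static void pq_reset_all(void)
{
    memset(pq_bits, 0, sizeof(pq_bits)); /* every source starts at 00 here */
}

static uint8_t pq_get(uint32_t srcno)
{
    assert(srcno < NR_SOURCES);
    return (pq_bits[srcno / 4] >> ((srcno % 4) * 2)) & 0x3;
}

/* Store a new PQ value and return the previous one. */
static uint8_t pq_set(uint32_t srcno, uint8_t pq)
{
    uint32_t shift = (srcno % 4) * 2;
    uint8_t old = pq_get(srcno);

    pq_bits[srcno / 4] = (uint8_t)((pq_bits[srcno / 4] & ~(0x3u << shift)) |
                                   ((pq & 0x3u) << shift));
    return old;
}

/* Trigger: true when the event must be forwarded to an event queue. */
static bool pq_trigger(uint32_t srcno)
{
    switch (pq_get(srcno)) {
    case ESB_RESET:
        pq_set(srcno, ESB_PENDING);
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        /* Already in a queue: only remember that it fired again. */
        pq_set(srcno, ESB_QUEUED);
        return false;
    default:
        /* ESB_OFF: the event is dropped. */
        return false;
    }
}

/* EOI: true when the source fired while pending and must be re-sent. */
static bool pq_eoi(uint32_t srcno)
{
    switch (pq_get(srcno)) {
    case ESB_QUEUED:
        pq_set(srcno, ESB_PENDING);
        return true;
    case ESB_OFF:
        return false;
    default:
        /* ESB_RESET or ESB_PENDING */
        pq_set(srcno, ESB_RESET);
        return false;
    }
}
```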


* Re: [Qemu-devel] [RFC PATCH v2 09/21] ppc/xive: extend the interrupt presenter model for XIVE
  2017-09-19 19:28     ` Cédric Le Goater
@ 2017-09-22 10:58       ` David Gibson
  2017-09-22 12:27         ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-22 10:58 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Tue, Sep 19, 2017 at 09:28:45PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 09:36 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:23PM +0200, Cédric Le Goater wrote:
> >> The XIVE interrupt presenter exposes a set of Thread Interrupt
> >> Management Areas, also called rings, one per privilege level
> >> (four in all). This area is used to handle priority
> >> management and interrupt acknowledgment among other things.
> >>
> >> We extend the ICPState object with a cache of the register data for
> >> XIVE. The integration with the sPAPR machine is much easier and we
> >> need a common framework to switch from one controller model to
> >> another: XICS <-> XIVE.
> > 
> > This sounds like an even worse idea than referencing the ICS state.
> 
> ok ok.
> 
> > The TIMA really needs to be managed by a different object than the ICP.
> 
> like an array under the machine, indexed by the cpu index?

Or individual TIMA objects which the cpus point to using their intc
pointers.

> at some point, we will need to:
> 
>     PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
>     ICPState *icp = ICP(cpu->intc);
> 
> and 
> 
>     icp = xics_icp_get(xive->ics->xics, target);
> 
> 
> isn't the cpu->intc pointer the best option to hold that information?
> and it is migrated.

No, it shouldn't be migrated.  It's set up during machine construction.
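
One way to picture the split David suggests, as a hypothetical C model rather than QEMU's QOM code: each CPU holds an `intc` pointer to its own presenter object, either a XICS ICP or a separate per-CPU TIMA object with four rings (one per privilege level, as in the patch description; the 16-byte ring size is an assumption). The machine wires the pointer when it builds the CPUs, so it is derived from the configuration rather than carried in the migration stream. Re-running the wiring when CAS selects the other model is an assumption of this sketch, not something settled in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical presenter kinds; QEMU itself would use QOM types. */
typedef enum { INTC_XICS_ICP, INTC_XIVE_TIMA } IntcKind;

/* Common header placed first in every presenter object. */
typedef struct {
    IntcKind kind;
} IntcHeader;

/* XICS presenter, simplified to a single register. */
typedef struct {
    IntcHeader hdr;
    uint32_t xirr;
} IcpState;

#define TIMA_RING_COUNT 4  /* one ring per privilege level */
#define TIMA_RING_SIZE  16 /* assumed ring size in bytes */

/* XIVE presenter: a separate per-CPU TIMA object. */
typedef struct {
    IntcHeader hdr;
    uint8_t regs[TIMA_RING_COUNT * TIMA_RING_SIZE];
} TimaState;

typedef struct {
    int cpu_index;
    IntcHeader *intc; /* wired by the machine, not part of migrated state */
} Cpu;

/* Point every CPU at the presenter object of the selected model. */
static void machine_wire_presenters(Cpu *cpus, size_t n, IcpState *icps,
                                    TimaState *timas, IntcKind kind)
{
    for (size_t i = 0; i < n; i++) {
        cpus[i].intc = (kind == INTC_XIVE_TIMA) ? &timas[i].hdr
                                                : &icps[i].hdr;
    }
}

/* Return the CPU's TIMA, or NULL when the CPU uses another presenter. */
static TimaState *cpu_get_tima(const Cpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->intc == NULL || cpu->intc->kind != INTC_XIVE_TIMA) {
        return NULL;
    }
    /* Valid because hdr is the first member of TimaState. */
    return (TimaState *)cpu->intc;
}
```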


* Re: [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  2017-09-19 13:15     ` Cédric Le Goater
@ 2017-09-22 11:00       ` David Gibson
  2017-09-22 12:42         ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: David Gibson @ 2017-09-22 11:00 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On Tue, Sep 19, 2017 at 03:15:44PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 04:27 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:15PM +0200, Cédric Le Goater wrote:
> >> Start with a couple of attributes for the XIVE sPAPR controller
> >> model. The number of provisioned IRQs is necessary to size the
> >> different internal XIVE tables, and so is the number of CPUs.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > 
> > [snip]
> > 
> >> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >> +{
> >> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> >> +
> >> +    if (!xive->nr_targets) {
> >> +        error_setg(errp, "Number of interrupt targets needs to be greater than 0");
> >> +        return;
> >> +    }
> >> +    /* We need to be able to allocate at least the IPIs */
> >> +    if (!xive->nr_irqs || xive->nr_irqs < xive->nr_targets) {
> >> +        error_setg(errp, "Number of interrupts too small");
> >> +        return;
> >> +    }
> >> +}
> >> +
> >> +static Property spapr_xive_properties[] = {
> >> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
> >> +    DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),
> > 
> > I'm a bit uneasy about the number of targets having to be set in
> > advance: this can make life awkward when CPUs are hotplugged.  I know
> > there's something similar in xics, but it has caused some hassles, and
> > we're starting to move away from it.
> > 
> > Do you really need this?
> > 
> 
> Some of the internal table sizes depend on the number of CPUs
> defined for the machine.

Which ones?  My impression was that there needed to be at least #cpus
* #priority-levels EQs, but there could be more than that, so it was
no longer as tightly bound to the number of "interrupt servers" as xics.

> When the sPAPRXive object is instantiated, 
> we use xics_max_server_number() to get the max number of cpus
> provisioned.
> 
> C.
> 
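
The sizing rule under discussion comes from realize, where `xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT`. A minimal C sketch of that flat layout (hypothetical names; eight priorities assumed for `XIVE_EQ_PRIORITY_COUNT`) shows why the property matters for hotplug. A target index at or beyond `nr_targets` has no EQ slot, so the table has to be sized for the maximum number of CPUs up front unless EQs are allocated some other way.

```c
#include <assert.h>
#include <stdint.h>

#define EQ_PRIORITY_COUNT 8u /* assumed: priorities 0..7 */

typedef struct {
    uint32_t nr_targets; /* fixed when the controller is realized */
    uint32_t nr_eqs;
} EqLayout;

static EqLayout eq_layout(uint32_t nr_targets)
{
    assert(nr_targets > 0);
    EqLayout l = { nr_targets, nr_targets * EQ_PRIORITY_COUNT };
    return l;
}

/* Flat index of the EQ serving (target, priority), or -1 if it has no slot. */
static int64_t eq_index(const EqLayout *l, uint32_t target, uint8_t prio)
{
    if (target >= l->nr_targets || prio >= EQ_PRIORITY_COUNT) {
        return -1;
    }
    return (int64_t)target * EQ_PRIORITY_COUNT + prio;
}
```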


* Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources
  2017-09-22 10:58       ` David Gibson
@ 2017-09-22 12:26         ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-22 12:26 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/22/2017 12:58 PM, David Gibson wrote:
> On Wed, Sep 20, 2017 at 02:54:31PM +0200, Cédric Le Goater wrote:
>> On 09/19/2017 04:57 AM, David Gibson wrote:
>>> On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
>>>> Each interrupt source is associated with a two-bit state machine
>>>> called an Event State Buffer (ESB) which is controlled by MMIO to
>>>> trigger events. See code for more details on the states and
>>>> transitions.
>>>>
>>>> The MMIO space for the ESB translation is 512GB on bare-metal
>>>> (powernv) systems and the BAR depends on the chip id. In our model for
>>>> the sPAPR machine, we choose to only map a sub memory region for the
>>>> provisioned IRQ numbers and to use the mapping address of chip 0 on a
>>>> real system. The OS will get the address of the MMIO page of the ESB
>>>> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
>>>
>>> On bare metal, are the MMIOs for each irq source mapped contiguously?
>>
>> yes. 
>>  
>>>> For KVM support, we should think of a way to map this QEMU memory
>>>> region in the host to trigger events directly.
>>>
>>> This would rely on being able to map them without mapping those for
>>> any other VM or the host.  Does that mean allocating a contiguous (and
>>> aligned) hunk of irqs for a guest?
>>
>> I think so, yes. The IRQ numbers and the memory regions are tied, and we
>> also need to be able to pass the MMIO region from the host to the guest,
>> a bit like VFIO does for the IOMMU regions, I suppose. But I haven't dug
>> into the problem much.
>>
>> This is an important part in the overall design. 
>>
>>> We're going to need to be careful about irq allocation here.
>>> Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
>>> MMIO addresses, 
>>
>> GET_SOURCE_INFO only retrieves the address of the MMIO region for
>> a 'lisn'. It is not dynamically mapped.
> 
> Ok... what's a "lisn"?

Logical Interrupt Source Number. This is the source number the guest OS
manipulates in the hcalls. The OS also registers an EISN, Effective Interrupt
Source Number, associated with the LISN, which is the value stored in the
event queue.

C. 
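
To show where the EISN ends up, here is a C sketch of an event queue modeled on the way the Linux guest driver consumes XIVE queues, as far as we know it. Each 32-bit entry carries a generation bit in bit 31 and the EISN in the low 31 bits, and the consumer stops at the first entry whose generation bit matches its current toggle, meaning the entry is left over from the previous pass. Queue size, names and the producer side are illustrative assumptions. The value 0 is used as the "empty" return, so EISN 0 is not allowed here.

```c
#include <assert.h>
#include <stdint.h>

#define EQ_ENTRIES   4u          /* power of two; tiny for the sketch */
#define EQ_EISN_MASK 0x7fffffffu /* low 31 bits carry the EISN */

typedef struct {
    uint32_t entries[EQ_ENTRIES]; /* host order here; real EQs are big-endian */
    uint32_t prod_idx, cons_idx;
    uint32_t prod_gen;    /* generation bit written by the producer */
    uint32_t cons_toggle; /* generation value that means "stale" */
} EventQueue;

static void eq_init(EventQueue *q)
{
    /* Zeroed entries have generation 0, so first-pass entries use 1. */
    *q = (EventQueue){ .prod_gen = 1, .cons_toggle = 0 };
}

/* Producer (interrupt controller): append an EISN, no overflow check. */
static void eq_push(EventQueue *q, uint32_t eisn)
{
    assert(eisn != 0 && eisn <= EQ_EISN_MASK);
    q->entries[q->prod_idx] = (q->prod_gen << 31) | eisn;
    q->prod_idx = (q->prod_idx + 1) % EQ_ENTRIES;
    if (q->prod_idx == 0) {
        q->prod_gen ^= 1; /* flip the generation bit on wrap-around */
    }
}

/* Consumer (guest): next EISN, or 0 when no new entry is present. */
static uint32_t eq_pop(EventQueue *q)
{
    uint32_t cur = q->entries[q->cons_idx];

    if ((cur >> 31) == q->cons_toggle) {
        return 0; /* entry left over from the previous pass */
    }
    q->cons_idx = (q->cons_idx + 1) % EQ_ENTRIES;
    if (q->cons_idx == 0) {
        q->cons_toggle ^= 1;
    }
    return cur & EQ_EISN_MASK;
}
```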

> 
> 
>> In the KVM case, the initial
>> information on the address would come from OPAL and then the host 
>> kernel would translate this information for the guest.
>>
>>> we need the MMIO addresses to be stable and consistent, because 
>>> we can't have them change across migration.  
>>
>> yes. I will catch my XIVE guru next week in Paris to clarify that
>> part. 
>>
>>> We need to have this consistent between in-qemu and in-KVM XIVE
>>> implementations as well.
>>
>> yes.
>>
>> C.
>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 09/21] ppc/xive: extend the interrupt presenter model for XIVE
  2017-09-22 10:58       ` David Gibson
@ 2017-09-22 12:27         ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-22 12:27 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/22/2017 12:58 PM, David Gibson wrote:
> On Tue, Sep 19, 2017 at 09:28:45PM +0200, Cédric Le Goater wrote:
>> On 09/19/2017 09:36 AM, David Gibson wrote:
>>> On Mon, Sep 11, 2017 at 07:12:23PM +0200, Cédric Le Goater wrote:
>>>> The XIVE interrupt presenter exposes a set of Thread Interrupt
>>>> Management Areas, also called rings, one per different level of
>>>> privilege (four in all). This area is used to handle priority
>>>> management and interrupt acknowledgment among other things.
>>>>
>>>> We extend the ICPState object with a cache of the register data for
>>>> XIVE. The integration with the sPAPR machine is much easier and we
>>>> need a common framework to switch from one controller model to
>>>> another: XICS <-> XIVE.
>>>
>>> This sounds like an even worse idea than referencing the ICS state.
>>
>> ok ok.
>>
>>> The TIMA really needs to be managed by a different object than the ICP.
>>
>> like an array under the machine indexed by the cpu index?
> 
> Or individual TIMA objects which the cpus point to using their intc
> pointers.

ah ok. We really are splitting the two worlds.

C.
 
>> At some point, we will need to:
>>
>>     PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
>>     ICPState *icp = ICP(cpu->intc);
>>
>> and 
>>
>>     icp = xics_icp_get(xive->ics->xics, target);
>>
>>
>> Isn't the cpu->intc pointer the best option to hold that information?
>> And it is migrated.
> 
> No, it shouldn't be migrated.  It's set up during machine construction.
> 
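David's alternative, a separate TIMA object per CPU reached through the CPU's `intc` pointer instead of XIVE registers cached in ICPState, can be sketched in plain C. This is an illustration under assumptions, not QEMU's QOM code: the `cpu`, `xive_tima` and `icp` structs, the `intc_kind` tag and `machine_wire_cpu()` are hypothetical, and the ring names (USER, OS, POOL, PHYS) and the 16-byte ring size are assumed for the four privilege levels described in patch 09.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed ring naming, one ring per privilege level. */
enum tm_ring_id { TM_RING_USER, TM_RING_OS, TM_RING_POOL, TM_RING_PHYS,
                  TM_NR_RINGS };
#define TM_RING_SIZE 16   /* assumed size of each ring's register block */

/* Hypothetical XIVE presenter state, one object per CPU. */
struct xive_tima {
    uint8_t regs[TM_NR_RINGS][TM_RING_SIZE];
};

/* Hypothetical XICS presenter state, untouched by XIVE. */
struct icp {
    uint32_t xirr;
    uint8_t mfrr;
};

enum intc_kind { INTC_XICS, INTC_XIVE };

/* The CPU only holds a tagged pointer to whichever presenter the
 * machine built for it; neither model embeds the other. */
struct cpu {
    enum intc_kind intc_kind;
    void *intc;
};

static struct xive_tima *cpu_tima(struct cpu *c)
{
    assert(c->intc_kind == INTC_XIVE);
    return c->intc;
}

static struct icp *cpu_icp(struct cpu *c)
{
    assert(c->intc_kind == INTC_XICS);
    return c->intc;
}

/* Machine construction: wire the CPU to a fresh, zeroed presenter. */
static void machine_wire_cpu(struct cpu *c, enum intc_kind kind)
{
    c->intc_kind = kind;
    c->intc = kind == INTC_XIVE ? calloc(1, sizeof(struct xive_tima))
                                : calloc(1, sizeof(struct icp));
}
```

As David notes, the pointer is wired once at machine construction, so it is not something the migration stream needs to carry.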


* Re: [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
  2017-09-22 10:33           ` David Gibson
@ 2017-09-22 12:32             ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-22 12:32 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/22/2017 12:33 PM, David Gibson wrote:
> On Thu, Sep 21, 2017 at 04:18:33PM +0200, Cédric Le Goater wrote:
>> On 09/21/2017 03:25 AM, David Gibson wrote:
>>> On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
>>>> On 09/19/2017 10:46 AM, David Gibson wrote:
>>>>> On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
>>>>>> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
>>>>>>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
>>>>>>> negotiation process determines whether the guest operates with an
>>>>>>> interrupt controller using the XICS legacy model, as found on POWER8,
>>>>>>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
>>>>>>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
>>>>>>>
>>>>>>> Follows a model for the XIVE interrupt controller and support for the
>>>>>>> Hypervisor's calls which are used to configure the interrupt sources
>>>>>>> and the event/notification queues of the guest. The last patch
>>>>>>> integrates XIVE in the sPAPR machine.
>>>>>>>
>>>>>>> Code is here:
>>>>>>
>>>>>>
>>>>>> An overall comment:
>>>>>>
>>>>>> I note in several replies here that I think the way XICS objects are
>>>>>> re-used for XIVE is really ugly, and I think it will make future
>>>>>> maintenance pretty painful.
>>>>
>>>> I agree. That was one way to identify what we need for migration 
>>>> compatibility and CAS reset.   
>>>>
>>>>>> I'm thinking maybe trying to support the CAS negotiation of interrupt
>>>>>> controller from day 1 is warping the design.  A better approach might
>>>>>> be first to implement XIVE only when given a specific machine option -
>>>>>> guest gets one or the other and can't negotiate.
>>>>
>>>> ok. 
>>>>
>>>> CAS is not the most complex problem; we mostly need to share
>>>> the ICSIRQState array and the source offset. Migration from an older
>>>> machine is a problem.
>>>
>>> Uh.. what?  Migration from an older machine isn't a thing.  We can
>>> migrate from an older qemu, but the machine type (and version) has to
>>> be identical at each end.  That's *why* we keep around the older
>>> machine types on newer qemus.
>>
>> yes. I am just wondering how I am going to handle a xics-only 
>> machine migrating to a xics/xive machine. 
> 
> Won't ever happen.  Older machine types will always be xics, newer
> machine type will always be xive (at least with POWER9).
> 
>> The xive machine option we are talking about will activate 
>> the xive interrupt mode and instantiate the objects behind it. 
>> So when we migrate from an older machine we will need to start 
>> the target machine with xive=off. I guess that is OK.
> 
> Again, we *don't* migrate from an older machine.  Ever.  We only ever
> migrate from an older qemu version to a newer qemu using the older
> machine type.

Sorry, I was talking about the QEMU version, not the machine version.
I still have to look at how both machine types will coexist in the
newer QEMU.

Thanks,

C. 


>>
>> Thanks for the insights and the time to review the code,
>>
>> C. 
>>
>>>> We are doomed to keep the existing XICS
>>>> framework available.
>>>>
>>>>>> That should allow a more natural XIVE design to emerge, *then* we can
>>>>>> look at what's necessary to make boot-time negotiation possible.
>>>>>
>>>>> Actually, it just occurred to me that we might be making life hard for
>>>>> ourselves by trying to actually switch between full XICS and XIVE
>>>>> models.  Couldn't we have new machine types always construct the XIVE
>>>>> infrastructure, 
>>>>
>>>> yes.
>>>>
>>>>> but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
>>>>> hardware.
>>>>
>>>> ok but migration will not be supported.
>>>
>>> Right, this would only be for newer machine types, and you can never
>>> migrate between different machine types.
>>>
>>>>> Since something more or less equivalent
>>>>> has already been done in both OPAL and the host kernel, I'm guessing
>>>>> this shouldn't be too hard at this point.
>>>>
>>>> Indeed that is how it is working currently on P9 kvm guests. hcalls are
>>>> implemented on top of XIVE native.
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> C.
>>>>
>>>
>>
> 


* Re: [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  2017-09-22 11:00       ` David Gibson
@ 2017-09-22 12:42         ` Cédric Le Goater
  2017-09-26  3:54           ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-22 12:42 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/22/2017 01:00 PM, David Gibson wrote:
> On Tue, Sep 19, 2017 at 03:15:44PM +0200, Cédric Le Goater wrote:
>> On 09/19/2017 04:27 AM, David Gibson wrote:
>>> On Mon, Sep 11, 2017 at 07:12:15PM +0200, Cédric Le Goater wrote:
>>>> Start with a couple of attributes for the XIVE sPAPR controller
>>>> model. The number of provisioned IRQs is necessary to size the
>>>> different internal XIVE tables, as is the number of CPUs.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>
>>> [snip]
>>>
>>>> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>>> +{
>>>> +    sPAPRXive *xive = SPAPR_XIVE(dev);
>>>> +
>>>> +    if (!xive->nr_targets) {
>>>> +        error_setg(errp, "Number of interrupt targets needs to be greater than 0");
>>>> +        return;
>>>> +    }
>>>> +    /* We need to be able to allocate at least the IPIs */
>>>> +    if (!xive->nr_irqs || xive->nr_irqs < xive->nr_targets) {
>>>> +        error_setg(errp, "Number of interrupts too small");
>>>> +        return;
>>>> +    }
>>>> +}
>>>> +
>>>> +static Property spapr_xive_properties[] = {
>>>> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
>>>> +    DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),
>>>
>>> I'm a bit uneasy about the number of targets having to be set in
>>> advance: this can make life awkward when CPUs are hotplugged.  I know
>>> there's something similar in xics, but it has caused some hassles, and
>>> we're starting to move away from it.
>>>
>>> Do you really need this?
>>>
>>
>> Some of the internal table sizes depend on the number of cpus
>> defined for the machine.
> 
> Which ones?  My impression was that there needed to be at least #cpus
> * #priority-levels EQs, but there could be more than that, 

Er, no, not in spapr mode at least. There are 8 queues per cpu.

> so it was no longer as tightly bound to the number of "interrupt servers"
> as xics.

ah. I think I see what you mean, that we could allocate them on the 
fly when needed by some hcalls ? 

The other place where I use the nr_targets is to provision the 
IRQ numbers for the IPIs but that could probably be done in some 
other way, especially if there is an IRQ allocator at the machine level.

C.  
>> When the sPAPRXive object is instantiated, 
>> we use xics_max_server_number() to get the max number of cpus
>> provisioned.
>>
>> C.
>>
> 


* Re: [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  2017-09-22 12:42         ` Cédric Le Goater
@ 2017-09-26  3:54           ` David Gibson
  2017-09-26  9:45             ` Benjamin Herrenschmidt
  2017-11-16 15:58             ` Cédric Le Goater
  0 siblings, 2 replies; 90+ messages in thread
From: David Gibson @ 2017-09-26  3:54 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf


On Fri, Sep 22, 2017 at 02:42:07PM +0200, Cédric Le Goater wrote:
> On 09/22/2017 01:00 PM, David Gibson wrote:
> > On Tue, Sep 19, 2017 at 03:15:44PM +0200, Cédric Le Goater wrote:
> >> On 09/19/2017 04:27 AM, David Gibson wrote:
> >>> On Mon, Sep 11, 2017 at 07:12:15PM +0200, Cédric Le Goater wrote:
> >>>> Start with a couple of attributes for the XIVE sPAPR controller
> >>>> model. The number of provisioned IRQs is necessary to size the
> >>>> different internal XIVE tables, as is the number of CPUs.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>
> >>> [snip]
> >>>
> >>>> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> >>>> +{
> >>>> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> >>>> +
> >>>> +    if (!xive->nr_targets) {
> >>>> +        error_setg(errp, "Number of interrupt targets needs to be greater than 0");
> >>>> +        return;
> >>>> +    }
> >>>> +    /* We need to be able to allocate at least the IPIs */
> >>>> +    if (!xive->nr_irqs || xive->nr_irqs < xive->nr_targets) {
> >>>> +        error_setg(errp, "Number of interrupts too small");
> >>>> +        return;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static Property spapr_xive_properties[] = {
> >>>> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
> >>>> +    DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),
> >>>
> >>> I'm a bit uneasy about the number of targets having to be set in
> >>> advance: this can make life awkward when CPUs are hotplugged.  I know
> >>> there's something similar in xics, but it has caused some hassles, and
> >>> we're starting to move away from it.
> >>>
> >>> Do you really need this?
> >>>
> >>
> >> Some of the internal table sizes depend on the number of cpus
> >> defined for the machine.
> > 
> > Which ones?  My impression was that there needed to be at least #cpus
> > * #priority-levels EQs, but there could be more than that, 
> 
> Er, no, not in spapr mode at least. There are 8 queues per cpu.

Ok.

> > so it was no longer as tightly bound to the number of "interrupt servers"
> > as xics.
> 
> ah. I think I see what you mean, that we could allocate them on the 
> fly when needed by some hcalls ?

Not at hcall time, no, but at cpu hot(un)plug time I was wondering if we
could (de)allocate them then.
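Allocating the queues at CPU (un)plug time, rather than sizing the whole table up front from `nr-targets`, might look roughly like this. It is a sketch only: the `eq` and `xive_eqs` types, `EQ_PER_CPU` and the plug/unplug hooks are hypothetical, not QEMU APIs, and the per-CPU block of 8 simply follows the "8 queues per cpu" stated above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EQ_PER_CPU 8   /* one EQ per priority */

/* Placeholder EQ contents; the real descriptor has many more fields. */
struct eq { uint64_t qaddr; uint32_t qsize; };

/* Per-CPU EQ blocks indexed by vcpu id; NULL while the CPU is absent. */
struct xive_eqs {
    struct eq **per_cpu;
    uint32_t max_cpus;   /* only this pointer array is sized up front */
};

static int xive_eqs_init(struct xive_eqs *x, uint32_t max_cpus)
{
    x->per_cpu = calloc(max_cpus, sizeof(*x->per_cpu));
    x->max_cpus = max_cpus;
    return x->per_cpu ? 0 : -1;
}

/* CPU hotplug: allocate this CPU's queues. */
static int xive_cpu_plug(struct xive_eqs *x, uint32_t cpu)
{
    assert(cpu < x->max_cpus && !x->per_cpu[cpu]);
    x->per_cpu[cpu] = calloc(EQ_PER_CPU, sizeof(struct eq));
    return x->per_cpu[cpu] ? 0 : -1;
}

/* CPU unplug: release them again. */
static void xive_cpu_unplug(struct xive_eqs *x, uint32_t cpu)
{
    assert(cpu < x->max_cpus);
    free(x->per_cpu[cpu]);
    x->per_cpu[cpu] = NULL;
}

/* Lookup used by the hcalls; fails for absent CPUs or bad priorities. */
static struct eq *xive_get_eq(struct xive_eqs *x, uint32_t cpu, uint8_t prio)
{
    if (cpu >= x->max_cpus || !x->per_cpu[cpu] || prio >= EQ_PER_CPU) {
        return NULL;
    }
    return &x->per_cpu[cpu][prio];
}
```

Only the small pointer array depends on the maximum CPU count; the queues themselves follow hotplug.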

> The other place where I use the nr_targets is to provision the 
> IRQ numbers for the IPIs but that could probably be done in some 
> other way, especially if there is an IRQ allocator at the machine
> level.

Hm, ok.

> 
> C.  
> >> When the sPAPRXive object is instantiated, 
> >> we use xics_max_server_number() to get the max number of cpus
> >> provisioned.
> >>
> >> C.
> >>
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  2017-09-26  3:54           ` David Gibson
@ 2017-09-26  9:45             ` Benjamin Herrenschmidt
  2017-11-16 16:48               ` Cédric Le Goater
  2017-11-16 15:58             ` Cédric Le Goater
  1 sibling, 1 reply; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-26  9:45 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Tue, 2017-09-26 at 13:54 +1000, David Gibson wrote:
> > > 
> > > Which ones?  My impression was that there needed to be at least #cpus
> > > * #priority-levels EQs, but there could be more than that, 
> > 
> > Er, no, not in spapr mode at least. There are 8 queues per cpu.
> 
> Ok.

By the way, there is a HW feature of XIVE in DD2.x, which I will start
exploiting soon, that sacrifices a queue; keep that in mind.

We should probably only expose priorities 0..6 to guests, not 0..7.
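With 8 queues per CPU, an EQ can be found by plain indexing, and the suggestion above amounts to rejecting priority 7 from the guest. A minimal sketch, assuming the EQ table is laid out as `target * 8 + priority` (consistent with `nr_eqs = nr_targets * XIVE_EQ_PRIORITY_COUNT` in patch 07, although the layout itself is an assumption); `eq_index()` and `GUEST_MAX_PRIO` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define EQ_PER_TARGET   8   /* one EQ per priority, 0..7 */
#define GUEST_MAX_PRIO  6   /* assumption: priority 7 kept back for the host */

/* Return the EQ index for (target, priority), or -1 when the target does
 * not exist or the guest asks for a priority it should not use. */
static int64_t eq_index(uint32_t nr_targets, uint32_t target, uint8_t prio)
{
    if (target >= nr_targets || prio > GUEST_MAX_PRIO) {
        return -1;
    }
    return (int64_t)target * EQ_PER_TARGET + prio;
}
```

Keeping the table at 8 entries per target while refusing priority 7 at the hcall level would leave room for the host to use that queue later.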

> > > so it was no longer as tightly bound to the number of "interrupt servers"
> > > as xics.
> > 
> > ah. I think I see what you mean, that we could allocate them on the 
> > fly when needed by some hcalls ?
> 
> Not at hcall time, no, but at cpu hot(un)plug time I was wondering if we
> could (de)allocate them then.
> 
> > The other place where I use the nr_targets is to provision the 
> > IRQ numbers for the IPIs but that could probably be done in some 
> > other way, especially if there is an IRQ allocator at the machine
> > level.
> 
> Hm, ok.
> 


* Re: [Qemu-devel] [RFC PATCH v2 11/21] ppc/xive: push the EQ data in OS event queue
  2017-09-20  6:34       ` David Gibson
@ 2017-09-28  8:12         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-28  8:12 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Wed, 2017-09-20 at 16:34 +1000, David Gibson wrote:
> > >> +    if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
> > >> +        priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> > >>  
> > >> +        /* The EQ is masked. Can this happen ?  */
> > >> +        if (priority == 0xff) {
> > >> +            return;
> > > 
> > > How does the 8-bit priority field here interact with the 3-bit
> > > priority which selects which EQ to use?
> > 
> > Priority 0xFF is a special case kept for masking, see the hcall
> > h_int_set_source_config. It should never reach the EQ lookup
> > routines. So maybe an assert would be better here.
> 
> Ok, if this situation can't be guest triggered, only by a bug in the
> rest of the XIVE code, then an assert() is better.

Note: this doesn't match HW. However, there's a mask bit in the EAS.

The problem when masking that way, of course, is that you lose triggers,
i.e. P gets set, the interrupt is lost, and nobody will clear P.

Cheers,
Ben.
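The lost-trigger problem can be reproduced with the coalescing PQ state machine described in the ESB comment of patch 07 (a trigger moves 00 to 10 and notifies, 10 to 11, 11 stays 11, 01 stays off). In this simplified model, which is not the QEMU code, `route()` stands in for the EQ lookup and drops the event when the priority is 0xff; `esb_trigger()`, `esb_eoi()`, `route()` and `fire()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PQ encodings, as in the patch: P = 0x2, Q = 0x1. */
enum { PQ_RESET = 0x0, PQ_OFF = 0x1, PQ_PENDING = 0x2, PQ_QUEUED = 0x3 };

/* Trigger: returns true only when the event must be forwarded. */
static bool esb_trigger(uint8_t *pq)
{
    assert(*pq <= PQ_QUEUED);          /* two-bit state */
    switch (*pq) {
    case PQ_RESET:   *pq = PQ_PENDING; return true;
    case PQ_PENDING: *pq = PQ_QUEUED;  return false;  /* coalesced */
    case PQ_QUEUED:                    return false;  /* coalesced */
    default:                           return false;  /* PQ_OFF */
    }
}

/* EOI: returns true when the source must be re-triggered (Q was set). */
static bool esb_eoi(uint8_t *pq)
{
    switch (*pq) {
    case PQ_PENDING: *pq = PQ_RESET;   return false;
    case PQ_QUEUED:  *pq = PQ_PENDING; return true;
    default:                           return false;  /* RESET, OFF */
    }
}

/* Hypothetical routing step: priority 0xff means "masked", so the event
 * never reaches a queue and the OS never sees it (or EOIs it). */
static bool route(uint8_t eq_priority)
{
    return eq_priority != 0xff;
}

/* One trigger through the ESB and the routing step. */
static bool fire(uint8_t *pq, uint8_t eq_priority)
{
    return esb_trigger(pq) && route(eq_priority);
}
```

After a single trigger while "masked", P stays set and every later trigger is coalesced into Q, even once a real priority is restored; only an EOI, which the guest never issues for an event it never saw, would unstick the source. A source parked in the 01 (off) state, by contrast, accumulates nothing.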


* Re: [Qemu-devel] [RFC PATCH v2 13/21] ppc/xive: handle interrupt acknowledgment by the O/S
  2017-09-20  9:40     ` Cédric Le Goater
@ 2017-09-28  8:14       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-28  8:14 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Wed, 2017-09-20 at 11:40 +0200, Cédric Le Goater wrote:
> > Plus, this doesn't seem right.  Shouldn't this
> > recheck the CPPR against the PIPR, in case a higher priority irq has
> > been delivered since the one the cpu is acking.
> 
> If a higher priority is delivered, it means that the CPPR was more 
> privileged and that we have now two bits set in the IPB by the time 
> the interrupt is acked. The high priority PIPR will become the new 
> CPPR and the IPB will be modified, keeping only the lower priority.
> 
> If the CPPR is modified to the lower priority level, then the
> first interrupt will be delivered again. 
> 
> I think this is fine.

Also remember the HW PIPR behaviour; it's a bit odd: it will be clamped
by the CPPR. So if CPPR is 0, PIPR will be 0.

If CPPR is 7, PIPR will be <= 7, etc.

Cheers,
Ben.
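The ack flow Cédric describes and the PIPR clamping Ben mentions can be modelled together in a few lines. This is a minimal sketch, not the QEMU or hardware logic: it assumes priority 0 is the most favored, that priority p occupies bit `0x80 >> p` of the 8-bit IPB, and it reads "clamped by the CPPR" as PIPR = min(IPB-derived priority, CPPR). The `tm_ring` struct and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PRIO_NONE 0xff   /* nothing pending */

/* Hypothetical ring state: only the registers discussed here. */
struct tm_ring {
    uint8_t cppr;   /* current processor priority */
    uint8_t ipb;    /* interrupt pending buffer: bit (0x80 >> p) = prio p */
    uint8_t pipr;   /* pending interrupt priority register */
};

static uint8_t prio_to_ipb(uint8_t prio)
{
    assert(prio <= 7);
    return 0x80 >> prio;
}

/* Most favored pending priority: the most significant set IPB bit. */
static uint8_t ipb_to_prio(uint8_t ipb)
{
    for (uint8_t p = 0; p <= 7; p++) {
        if (ipb & (0x80 >> p)) {
            return p;
        }
    }
    return PRIO_NONE;
}

/* Recompute PIPR, clamped by the CPPR (assumed interpretation). */
static void tm_update_pipr(struct tm_ring *r)
{
    uint8_t p = ipb_to_prio(r->ipb);
    r->pipr = p < r->cppr ? p : r->cppr;
}

/* An exception is signalled only if something beats the CPPR. */
static int tm_irq_signalled(const struct tm_ring *r)
{
    return ipb_to_prio(r->ipb) < r->cppr;
}

/* Ack: the pending priority becomes the CPPR and only its IPB bit is
 * cleared; less favored bits stay set. Returns PRIO_NONE if nothing
 * beats the current CPPR. */
static uint8_t tm_ack(struct tm_ring *r)
{
    uint8_t p = ipb_to_prio(r->ipb);
    if (p >= r->cppr) {
        return PRIO_NONE;
    }
    r->cppr = p;
    r->ipb &= (uint8_t)~prio_to_ipb(p);
    tm_update_pipr(r);
    return p;
}
```

Writing a less favored CPPR afterwards re-exposes the lower priority bit still held in the IPB, which is the redelivery Cédric describes.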


* Re: [Qemu-devel] [RFC PATCH v2 14/21] ppc/xive: add support for the SET_OS_PENDING command
  2017-09-20  9:47     ` Cédric Le Goater
@ 2017-09-28  8:18       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-28  8:18 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Wed, 2017-09-20 at 11:47 +0200, Cédric Le Goater wrote:
> > > @@ -162,7 +162,14 @@ static bool spapr_xive_tm_is_readonly(uint8_t index)
> > >   static void spapr_xive_tm_write_special(ICPState *icp, hwaddr offset,
> > >                                     uint64_t value, unsigned size)
> > >   {
> > > -    /* TODO: support TM_SPC_SET_OS_PENDING */
> > > +    if (offset == TM_SPC_SET_OS_PENDING && size == 1) {
> > > +        icp->tima_os[TM_IPB] |= priority_to_ipb(value & 0xff);
> > > +        icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> > 
> > This only lets the cpu raise bits in the IPB, never clear them.
> > Is that right?
> 
> The clear is done when the OS acks the interrupt.
> 
> > I don't see how you'd implement the handling of multiple
> > priorities without being able to clear bits here.
> 
> I am not sure how this command should be used from the OS. 
> Currently, I only see KVM handling it in the XICS/XIVE glue.
> I need to take a closer look.

It's a way to avoid the SW replay on EOI.

I.e., assume you have 2 interrupts in the queue. You take the exception,
ack the first one, process it, etc.

Then you EOI; the HW won't send a second notification. You need to look
at the queue and continue consuming until it's empty.

Today Linux checks the queue on EOI and uses a SW mechanism to
synthesize a new pseudo-external interrupt.

This MMIO command would allow the OS to instead set back the
corresponding priority bit to 1 in the IPB and cause the HW to re-emit
the interrupt instead of SW.

Linux doesn't use this today because DD1 didn't support it for the HV
level, but other OSes might and we also might use it when we do groups,
thus allowing redistribution.

Cheers,
Ben.
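The difference between the software replay and the SET_OS_PENDING approach can be modelled in a few lines. This is a hypothetical sketch, not QEMU or Linux code: the event queue is a plain array, `os_take_one()` folds the ack and the consumption of one entry together, and `os_eoi()` stands in for the guest's EOI path, which, when entries remain, ORs the priority's IPB bit back in (the effect of the `TM_SPC_SET_OS_PENDING` store in the quoted patch) instead of synthesizing a software interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QSIZE 8

/* Hypothetical OS-level state: one event queue for one priority and the
 * OS ring's IPB (bit 0x80 >> prio marks priority prio as pending). */
struct os_ctx {
    uint32_t queue[QSIZE];
    unsigned head, tail;   /* consumer / producer indexes */
    uint8_t ipb;
    uint8_t prio;          /* priority served by this queue */
};

static bool queue_empty(const struct os_ctx *c)
{
    return c->head == c->tail;
}

/* Hardware side: an event is enqueued and its priority marked pending. */
static void hw_notify(struct os_ctx *c, uint32_t lisn)
{
    assert(c->tail - c->head < QSIZE);
    c->queue[c->tail++ % QSIZE] = lisn;
    c->ipb |= 0x80 >> c->prio;
}

/* OS side: the exception is acked (IPB bit cleared), one entry consumed. */
static uint32_t os_take_one(struct os_ctx *c)
{
    assert(!queue_empty(c));
    c->ipb &= (uint8_t)~(0x80 >> c->prio);
    return c->queue[c->head++ % QSIZE];
}

/* EOI: the hardware will not renotify for entries already queued, so if
 * the queue is not empty the OS sets the pending bit back itself, as a
 * SET_OS_PENDING store with the queue's priority would. */
static void os_eoi(struct os_ctx *c)
{
    if (!queue_empty(c)) {
        c->ipb |= 0x80 >> c->prio;
    }
}
```

The re-raised IPB bit then makes the presenter signal the priority again, so the second entry is taken through the normal exception path.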


* Re: [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
  2017-09-20 12:33     ` Cédric Le Goater
  2017-09-21  1:25       ` David Gibson
@ 2017-09-28  8:23       ` Benjamin Herrenschmidt
  2017-09-28 13:17         ` David Gibson
  1 sibling, 1 reply; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-28  8:23 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Wed, 2017-09-20 at 14:33 +0200, Cédric Le Goater wrote:
> > > I'm thinking maybe trying to support the CAS negotiation of interrupt
> > > controller from day 1 is warping the design.  A better approach might
> > > be first to implement XIVE only when given a specific machine option -
> > > guest gets one or the other and can't negotiate.
> 
> ok. 
> 
> CAS is not the most complex problem; we mostly need to share
> the ICSIRQState array and the source offset. Migration from an older
> machine is a problem. We are doomed to keep the existing XICS
> framework available.

I don't like sharing anything. I'd rather we had separate objects
altogether. If needed, we can implement CAS by doing a partition reboot
like pHyp does, at least initially, until we add ways to tear down and
rebuild objects.

The main issue is whether we can keep a consistent number space so the
DT doesn't have to be completely rebuilt. If it does, then reboot will
be the only practical option I'm afraid.

> > > That should allow a more natural XIVE design to emerge, *then* we can
> > > look at what's necessary to make boot-time negotiation possible.
> > 
> > Actually, it just occurred to me that we might be making life hard for
> > ourselves by trying to actually switch between full XICS and XIVE
> > models.  Couldn't we have new machine types always construct the XIVE
> > infrastructure, 
> 
> yes.
> 
> > but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
> > hardware.

That's gross :-)

This is also exactly what KVM does with real XIVE HW, and there's also
such an emulation in OPAL. I'd be wary of creating a third one...

I'd much prefer if we managed to:

 - Split the source numbering from the various state tracking objects
so that the numbering can be common

 - Either delay the creation to after CAS or tear down & re-create the
state tracking objects at CAS time.
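Keeping the interrupt number space separate from the per-model state, so that it survives a switch at CAS time, could be organised like this. A hypothetical sketch only: `irq_space`, `intc_backend`, `irq_claim()` and `cas_switch()` are illustrative names, and the backends are stubs that just count the sources they would instantiate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_IRQS 64

/* Machine-level number space: allocated once and shared by both models,
 * so the numbers in the device tree never change. */
struct irq_space {
    bool claimed[NR_IRQS];
    bool lsi[NR_IRQS];
};

/* Per-model state, rebuilt from the number space. */
struct intc_backend {
    const char *name;
    uint32_t nr_sources;   /* stub: sources instantiated */
};

static int irq_claim(struct irq_space *s, uint32_t irq, bool lsi)
{
    if (irq >= NR_IRQS || s->claimed[irq]) {
        return -1;
    }
    s->claimed[irq] = true;
    s->lsi[irq] = lsi;
    return 0;
}

static void backend_build(struct intc_backend *b, const char *name,
                          const struct irq_space *s)
{
    b->name = name;
    b->nr_sources = 0;
    for (uint32_t i = 0; i < NR_IRQS; i++) {
        if (s->claimed[i]) {
            b->nr_sources++;   /* a real model would create source state */
        }
    }
}

static void backend_teardown(struct intc_backend *b)
{
    memset(b, 0, sizeof(*b));
}

/* CAS: drop the old model's state, rebuild the negotiated one from the
 * unchanged number space. */
static void cas_switch(struct intc_backend *b, const struct irq_space *s,
                       bool xive)
{
    backend_teardown(b);
    backend_build(b, xive ? "xive" : "xics", s);
}
```

Because the number space outlives both backends, the device tree could stay as it is across the switch, which is the condition for avoiding a full reboot.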

> ok but migration will not be supported.
> 
> > Since something more or less equivalent
> > has already been done in both OPAL and the host kernel, I'm guessing
> > this shouldn't be too hard at this point.

It would very much suck to have yet another one of these.

Also we need to understand how that would work in a KVM context, the
kernel will provide a "XICS" state even on top of XIVE unless we switch
the kernel object to native, but then the kernel will expect full
exploitation.

> Indeed that is how it is working currently on P9 kvm guests. hcalls are
> implemented on top of XIVE native.
> 
> Thanks,
> 
> 
> C.


* Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources
  2017-09-20 12:54     ` Cédric Le Goater
  2017-09-22 10:58       ` David Gibson
@ 2017-09-28  8:27       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-28  8:27 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Wed, 2017-09-20 at 14:54 +0200, Cédric Le Goater wrote:
> On 09/19/2017 04:57 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
> > > Each interrupt source is associated with a two bit state machine
> > > called an Event State Buffer (ESB) which is controlled by MMIO to
> > > trigger events. See code for more details on the states and
> > > transitions.
> > > 
> > > The MMIO space for the ESB translation is 512GB large on baremetal
> > > (powernv) systems and the BAR depends on the chip id. In our model for
> > > the sPAPR machine, we choose to only map a sub memory region for the
> > > provisioned IRQ numbers and to use the mapping address of chip 0 on a
> > > real system. The OS will get the address of the MMIO page of the ESB
> > > entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
> > 
> > On bare metal, are the MMIOs for each irq source mapped contiguously?
> 
> yes. 

Sort of...

There are several source "controllers" in the system. Each PHB gets a
range of numbers, and the XIVE itself has about half a million
generic sources (aka 'IPIs') which we use for IPIs and virtual device
interrupts.

Each of those "controllers" has its own MMIO area. So the MMIOs are only
mapped contiguously within a given controller.
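Finding the ESB page for a source is therefore a two-step lookup: find the controller whose range contains the number, then offset into that controller's MMIO area. The sketch below assumes one ESB page of `1 << esb_shift` bytes per source and reuses the decode from patch 07 (source = `addr >> esb_shift`, command = `addr & 0xF00`); the `src_ctrl` table, its fields and the bases and ranges in the usage are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source controller: a contiguous range of interrupt
 * numbers backed by its own ESB MMIO area. */
struct src_ctrl {
    uint32_t first;      /* first interrupt number of the range */
    uint32_t count;      /* number of sources in the range */
    uint64_t mmio_base;  /* start of this controller's ESB pages */
    uint32_t esb_shift;  /* log2 of the ESB page size */
};

/* ESB page address of `irq`, or false if no controller owns it. Pages
 * are contiguous only within one controller. */
static bool esb_page_addr(const struct src_ctrl *ctrls, size_t n,
                          uint32_t irq, uint64_t *addr)
{
    for (size_t i = 0; i < n; i++) {
        const struct src_ctrl *c = &ctrls[i];
        if (irq >= c->first && irq - c->first < c->count) {
            *addr = c->mmio_base +
                    ((uint64_t)(irq - c->first) << c->esb_shift);
            return true;
        }
    }
    return false;
}

/* Inverse, within one controller: split an MMIO offset into the source
 * index and the ESB command offset, as the patch's handlers do. */
static void esb_decode(uint64_t offset, uint32_t esb_shift,
                       uint32_t *srcno, uint32_t *cmd)
{
    assert(esb_shift >= 12);   /* the 0xF00 command mask needs >= 4k pages */
    *srcno = (uint32_t)(offset >> esb_shift);
    *cmd = (uint32_t)(offset & 0xF00);
}
```

For example, with a controller covering numbers 8192..10239 at base `0x200000000` and 4k pages, source 8193 maps to `0x200001000`, while a number outside every range has no ESB page at all.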

With pass-through, things are a bit more complex because a given
guest-visible source can become either an IPI (when not attached to the
host interrupt) or a real HW source. So we'll have to invalidate the
GPA->HVA mapping and remap. Tricky (but doable). I have some ideas about
how to plumb all that but haven't really fully thought it out.

> > > For KVM support, we should think of a way to map this QEMU memory
> > > region in the host to trigger events directly.
> > 
> > This would rely on being able to map them without mapping those for
> > any other VM or the host.  Does that mean allocating a contiguous (and
> > aligned) hunk of irqs for a guest?
> 
> I think so yes, the IRQ and the memory regions are tied, and also being 
> able to pass the MMIO region from the host to the guest, a bit like VFIO 
> for the IOMMU regions I suppose. But I haven't dug into the problem much.
> 
> This is an important part in the overall design. 

There are also MMIO regions associated with queues.

> > We're going to need to be careful about irq allocation here.
> > Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
> > MMIO addresses, 

> GET_SOURCE_INFO only retrieves the address of the MMIO region for 
> a 'lisn'.

An interrupt number as coming from the device-tree.

>  it is not dynamically mapped. In the KVM case, the initial
> information on the address would come from OPAL and then the host 
> kernel would translate this information for the guest.
> 
> > we need the MMIO addresses to be stable and consistent, because 
> > we can't have them change across migration.  
> 
> yes. I will catch my XIVE guru next week in Paris to clarify that
> part. 
> 
> > We need to have this consistent between in-qemu and in-KVM XIVE
> > implementations as well.
> 
> yes.
> 
> C.
> 
> > > 
> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > > ---
> > >  hw/intc/spapr_xive.c        | 255 ++++++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/ppc/spapr_xive.h |   6 ++
> > >  2 files changed, 261 insertions(+)
> > > 
> > > diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> > > index 1ed7b6a286e9..8a85d64efc4c 100644
> > > --- a/hw/intc/spapr_xive.c
> > > +++ b/hw/intc/spapr_xive.c
> > > @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
> > >  }
> > >  
> > >  /*
> > > + * "magic" Event State Buffer (ESB) MMIO offsets.
> > > + *
> > > + * Each interrupt source has a 2-bit state machine called ESB
> > > + * which can be controlled by MMIO. It's made of 2 bits, P and
> > > + * Q. P indicates that an interrupt is pending (has been sent
> > > + * to a queue and is waiting for an EOI). Q indicates that the
> > > + * interrupt has been triggered while pending.
> > > + *
> > > + * This acts as a coalescing mechanism in order to guarantee
> > > + * that a given interrupt only occurs at most once in a queue.
> > > + *
> > > + * When doing an EOI, the Q bit will indicate if the interrupt
> > > + * needs to be re-triggered.
> > > + *
> > > + * The following offsets into the ESB MMIO allow to read or
> > > + * manipulate the PQ bits. They must be used with an 8-bytes
> > > + * load instruction. They all return the previous state of the
> > > + * interrupt (atomically).
> > > + *
> > > + * Additionally, some ESB pages support doing an EOI via a
> > > + * store at 0 and some ESBs support doing a trigger via a
> > > + * separate trigger page.
> > > + */
> > > +#define XIVE_ESB_GET            0x800
> > > +#define XIVE_ESB_SET_PQ_00      0xc00
> > > +#define XIVE_ESB_SET_PQ_01      0xd00
> > > +#define XIVE_ESB_SET_PQ_10      0xe00
> > > +#define XIVE_ESB_SET_PQ_11      0xf00
> > > +
> > > +#define XIVE_ESB_VAL_P          0x2
> > > +#define XIVE_ESB_VAL_Q          0x1
> > > +
> > > +#define XIVE_ESB_RESET          0x0
> > > +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
> > > +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
> > > +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
> > > +
> > > +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
> > > +{
> > > +    uint32_t byte = idx / 4;
> > > +    uint32_t bit  = (idx % 4) * 2;
> > > +
> > > +    assert(byte < xive->sbe_size);
> > > +
> > > +    return (xive->sbe[byte] >> bit) & 0x3;
> > > +}
> > > +
> > > +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)
> > > +{
> > > +    uint32_t byte = idx / 4;
> > > +    uint32_t bit  = (idx % 4) * 2;
> > > +    uint8_t old, new;
> > > +
> > > +    assert(byte < xive->sbe_size);
> > > +
> > > +    old = xive->sbe[byte];
> > > +
> > > +    new = xive->sbe[byte] & ~(0x3 << bit);
> > > +    new |= (pq & 0x3) << bit;
> > > +
> > > +    xive->sbe[byte] = new;
> > > +
> > > +    return (old >> bit) & 0x3;
> > > +}
> > > +
> > > +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)
> > > +{
> > > +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
> > > +
> > > +    switch (old_pq) {
> > > +    case XIVE_ESB_RESET:
> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
> > > +        return false;
> > > +    case XIVE_ESB_PENDING:
> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
> > > +        return false;
> > > +    case XIVE_ESB_QUEUED:
> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
> > > +        return true;
> > > +    case XIVE_ESB_OFF:
> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
> > > +        return false;
> > > +    default:
> > > +         g_assert_not_reached();
> > > +    }
> > > +}
> > > +
> > > +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)
> > > +{
> > > +    uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
> > > +
> > > +    switch (old_pq) {
> > > +    case XIVE_ESB_RESET:
> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
> > > +        return true;
> > > +    case XIVE_ESB_PENDING:
> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
> > > +        return false;
> > > +    case XIVE_ESB_QUEUED:
> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
> > > +        return false;
> > > +    case XIVE_ESB_OFF:
> > > +        spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
> > > +        return false;
> > > +    default:
> > > +         g_assert_not_reached();
> > > +    }
> > > +}
> > > +
> > > +/*
> > > + * XIVE Interrupt Source MMIOs
> > > + */
> > > +static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)
> > > +{
> > > +    ICSIRQState *irq = &xive->ics->irqs[srcno];
> > > +
> > > +    if (irq->flags & XICS_FLAGS_IRQ_LSI) {
> > > +        irq->status &= ~XICS_STATUS_SENT;
> > > +    }
> > > +}
> > > +
> > > +/* TODO: handle second page
> > > + *
> > > + * Some HW use a separate page for trigger. We only support the case
> > > + * in which the trigger can be done in the same page as the EOI.
> > > + */
> > > +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
> > > +{
> > > +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> > > +    uint32_t offset = addr & 0xF00;
> > > +    uint32_t srcno = addr >> xive->esb_shift;
> > > +    XiveIVE *ive;
> > > +    uint64_t ret = -1;
> > > +
> > > +    ive = spapr_xive_get_ive(xive, srcno);
> > > +    if (!ive || !(ive->w & IVE_VALID))  {
> > > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> > > +        goto out;
> > 
> > Since there's a whole (4k) page for each source, I wonder if we should
> > actually map each one as a separate MMIO region to allow us to tweak
> > the mappings more flexibly.
> > 
> > > +    }
> > > +
> > > +    switch (offset) {
> > > +    case 0:
> > > +        spapr_xive_source_eoi(xive, srcno);
> > > +
> > > +        /* return TRUE or FALSE depending on PQ value */
> > > +        ret = spapr_xive_pq_eoi(xive, srcno);
> > > +        break;
> > > +
> > > +    case XIVE_ESB_GET:
> > > +        ret = spapr_xive_pq_get(xive, srcno);
> > > +        break;
> > > +
> > > +    case XIVE_ESB_SET_PQ_00:
> > > +    case XIVE_ESB_SET_PQ_01:
> > > +    case XIVE_ESB_SET_PQ_10:
> > > +    case XIVE_ESB_SET_PQ_11:
> > > +        ret = spapr_xive_pq_set(xive, srcno, (offset >> 8) & 0x3);
> > > +        break;
> > > +    default:
> > > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
> > > +    }
> > > +
> > > +out:
> > > +    return ret;
> > > +}
> > > +
> > > +static void spapr_xive_esb_write(void *opaque, hwaddr addr,
> > > +                           uint64_t value, unsigned size)
> > > +{
> > > +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> > > +    uint32_t offset = addr & 0xF00;
> > > +    uint32_t srcno = addr >> xive->esb_shift;
> > > +    XiveIVE *ive;
> > > +    bool notify = false;
> > > +
> > > +    ive = spapr_xive_get_ive(xive, srcno);
> > > +    if (!ive || !(ive->w & IVE_VALID))  {
> > > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> > > +        return;
> > > +    }
> > > +
> > > +    switch (offset) {
> > > +    case 0:
> > > +        /* TODO: should we trigger even if the IVE is masked ? */
> > > +        notify = spapr_xive_pq_trigger(xive, srcno);
> > > +        break;
> > > +    default:
> > > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
> > > +                      offset);
> > > +        return;
> > > +    }
> > > +
> > > +    if (notify && !(ive->w & IVE_MASKED)) {
> > > +        qemu_irq_pulse(xive->qirqs[srcno]);
> > > +    }
> > > +}
> > > +
> > > +static const MemoryRegionOps spapr_xive_esb_ops = {
> > > +    .read = spapr_xive_esb_read,
> > > +    .write = spapr_xive_esb_write,
> > > +    .endianness = DEVICE_BIG_ENDIAN,
> > > +    .valid = {
> > > +        .min_access_size = 8,
> > > +        .max_access_size = 8,
> > > +    },
> > > +    .impl = {
> > > +        .min_access_size = 8,
> > > +        .max_access_size = 8,
> > > +    },
> > > +};
> > > +
> > > +/*
> > >   * XIVE Interrupt Source
> > >   */
> > >  static void spapr_xive_source_set_irq_msi(sPAPRXive *xive, int srcno, int val)
> > > @@ -74,6 +286,33 @@ static void spapr_xive_source_set_irq(void *opaque, int srcno, int val)
> > >  /*
> > >   * Main XIVE object
> > >   */
> > > +#define P9_MMIO_BASE     0x006000000000000ull
> > > +
> > > +/* VC BAR contains set translations for the ESBs and the EQs. */
> > > +#define VC_BAR_DEFAULT   0x10000000000ull
> > > +#define VC_BAR_SIZE      0x08000000000ull
> > > +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
> > > +
> > > +static uint64_t spapr_xive_esb_default_read(void *p, hwaddr offset,
> > > +                                            unsigned size)
> > > +{
> > > +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " [%u]\n",
> > > +                  __func__, offset, size);
> > > +    return 0;
> > > +}
> > > +
> > > +static void spapr_xive_esb_default_write(void *opaque, hwaddr offset,
> > > +                                         uint64_t value, unsigned size)
> > > +{
> > > +    qemu_log_mask(LOG_UNIMP, "%s: 0x%" HWADDR_PRIx " <- 0x%" PRIx64 " [%u]\n",
> > > +                  __func__, offset, value, size);
> > > +}
> > > +
> > > +static const MemoryRegionOps spapr_xive_esb_default_ops = {
> > > +    .read = spapr_xive_esb_default_read,
> > > +    .write = spapr_xive_esb_default_write,
> > > +    .endianness = DEVICE_BIG_ENDIAN,
> > > +};
> > >  
> > >  void spapr_xive_reset(void *dev)
> > >  {
> > > @@ -144,6 +383,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
> > >      xive->nr_eqs = xive->nr_targets * XIVE_EQ_PRIORITY_COUNT;
> > >      xive->eqt = g_malloc0(xive->nr_eqs * sizeof(XiveEQ));
> > >  
> > > +    /* VC BAR. That's the full window but we will only map the
> > > +     * subregions in use. */
> > > +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
> > > +    xive->esb_shift = ESB_SHIFT;
> > > +
> > > +    /* Install default memory region handlers to log bogus access */
> > > +    memory_region_init_io(&xive->esb_mr, NULL, &spapr_xive_esb_default_ops,
> > > +                          NULL, "xive.esb.full", VC_BAR_SIZE);
> > > +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_mr);
> > > +
> > > +    /* Install the ESB memory region in the overall one */
> > > +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive), &spapr_xive_esb_ops,
> > > +                          xive, "xive.esb",
> > > +                          (1ull << xive->esb_shift) * xive->nr_irqs);
> > > +    memory_region_add_subregion(&xive->esb_mr, 0, &xive->esb_iomem);
> > > +
> > >      qemu_register_reset(spapr_xive_reset, dev);
> > >  }
> > >  
> > > diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> > > index eab92c4c1bb8..0f516534d76a 100644
> > > --- a/include/hw/ppc/spapr_xive.h
> > > +++ b/include/hw/ppc/spapr_xive.h
> > > @@ -46,6 +46,12 @@ struct sPAPRXive {
> > >      XiveIVE      *ivt;
> > >      XiveEQ       *eqt;
> > >      uint32_t     nr_eqs;
> > > +
> > > +    /* ESB memory region */
> > > +    uint32_t     esb_shift;
> > > +    hwaddr       esb_base;
> > > +    MemoryRegion esb_mr;
> > > +    MemoryRegion esb_iomem;
> > >  };
> > >  
> > >  #endif /* PPC_SPAPR_XIVE_H */
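For reference, here is a minimal standalone sketch of the PQ state machine behind the ESB trigger and EOI paths. It assumes the usual ESB encoding (00 reset, 01 off, 10 pending, 11 queued) and the coalescing semantics used by QEMU's later generic XIVE model. A trigger while P is already set only sets Q and does not forward a notification. An EOI with Q set moves back to pending and asks for a new notification. This differs from the spapr_xive_pq_trigger() quoted above, which returns true in the PENDING and QUEUED states. The names esb_trigger() and esb_eoi() are illustrative only and are not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PQ bits of an ESB: P (bit 1) = event pending, Q (bit 0) = event queued. */
enum {
    ESB_RESET   = 0x0, /* 00: ready to forward the next event     */
    ESB_OFF     = 0x1, /* 01: source off, events are dropped      */
    ESB_PENDING = 0x2, /* 10: one event forwarded, awaiting EOI   */
    ESB_QUEUED  = 0x3, /* 11: another event arrived before EOI    */
};

/* Trigger: returns true when a notification must be forwarded. */
static bool esb_trigger(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        *pq = ESB_QUEUED;   /* coalesce: remember the event, do not notify */
        return false;
    case ESB_OFF:
        return false;       /* source is off: drop the event */
    }
    assert(0 && "unreachable PQ value");
    return false;
}

/* EOI: returns true when a queued event must be re-notified. */
static bool esb_eoi(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_RESET:
    case ESB_PENDING:
        *pq = ESB_RESET;
        return false;
    case ESB_QUEUED:
        *pq = ESB_PENDING;  /* replay the event that arrived meanwhile */
        return true;
    case ESB_OFF:
        return false;
    }
    assert(0 && "unreachable PQ value");
    return false;
}
```

With these semantics, an event that arrives while another is still pending is recorded in Q instead of being lost, and the EOI replays it.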

* Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources
  2017-09-20 13:05     ` Cédric Le Goater
@ 2017-09-28  8:29       ` Benjamin Herrenschmidt
  2017-09-28 13:20         ` David Gibson
  0 siblings, 1 reply; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-28  8:29 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Wed, 2017-09-20 at 15:05 +0200, Cédric Le Goater wrote:
> > > +/*
> > > + * XIVE Interrupt Source MMIOs
> > > + */
> > > +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
> > > +{
> > > +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> > > +    uint32_t offset = addr & 0xF00;
> > > +    uint32_t srcno = addr >> xive->esb_shift;
> > > +    XiveIVE *ive;
> > > +    uint64_t ret = -1;
> > > +
> > > +    ive = spapr_xive_get_ive(xive, srcno);
> > > +    if (!ive || !(ive->w & IVE_VALID))  {
> > > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> > > +        goto out;
> > 
> > Since there's a whole (4k) page for each source, I wonder if we should
> > actually map each one as a separate MMIO region to allow us to tweak
> > the mappings more flexibly
> 
> yes we could have a subregion for each source. In that case, 
> we should also handle IVE_VALID properly. That will require 
> a specific XIVE allocator which was difficult to do while
> keeping the compatibility with XICS for migration and CAS.

That will be a serious bloat with lots of interrupts. We also cannot
possibly have a KVM mm region per interrupt or even a vma.

I'm thinking of some kind of /dev/xive (or some other KVM or irqfd
originated fd) that allows you to mmap a single big region whose content
is demand-faulted and invalidated by the kernel to map the various
interrupts.

So that it looks like a single VMA (and KVM memory block).
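To illustrate the single-region approach, here is a sketch of how an offset into one large ESB window maps back to a source and an operation. It follows the decoding in the quoted spapr_xive_esb_read(): the source number is addr >> esb_shift, and the operation is selected by addr & 0xF00. The esb_access struct and the esb_decode() helper are hypothetical. A demand-fault handler behind a single mmap'ed region would presumably need the same arithmetic to work out which source page to map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of an access into a single ESB window. */
struct esb_access {
    uint32_t srcno;   /* interrupt source (LISN) owning the page        */
    uint32_t op;      /* operation selector: bits 8-11 of the offset    */
    uint64_t page;    /* page-aligned offset of that source's ESB page  */
};

/*
 * Decode an offset into the ESB window. This assumes one ESB page of
 * (1 << esb_shift) bytes per source and an operation selected by
 * offset & 0xF00, as in the quoted spapr_xive_esb_read().
 */
static struct esb_access esb_decode(uint64_t offset, unsigned esb_shift)
{
    struct esb_access a;

    assert(esb_shift >= 12 && esb_shift < 32);
    a.srcno = (uint32_t)(offset >> esb_shift);
    a.op    = (uint32_t)(offset & 0xF00);
    a.page  = (uint64_t)a.srcno << esb_shift;
    return a;
}
```

For example, with 64 KiB pages (esb_shift = 16), offset 0x50c00 decodes to source 5, operation 0xc00, in the page starting at 0x50000.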

Ben.

> C.
> 
> 

* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-21  1:35       ` David Gibson
  2017-09-21 11:21         ` Cédric Le Goater
@ 2017-09-28  8:31         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-28  8:31 UTC (permalink / raw)
  To: David Gibson, Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Thu, 2017-09-21 at 11:35 +1000, David Gibson wrote:
> > >> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
> > >> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
> > >> +
> > >> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
> > >> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
> > >> +                     sizeof(eq_sizes)));
> > >> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
> > >> +                     sizeof(lisn_ranges)));
> > > 
> > > I note this doesn't have the interrupt-controller or #interrupt-cells
> > > properties.  So what acts as the interrupt parent for all the devices
> > > in the tree with XIVE?
> > 
> > these properties are not in the specs anymore for the interrupt-controller
> > node and I don't think Linux makes use of them (even for XICS). So 
> > it just works fine.
> 
> Um.. what!?  Are you saying that the PAPR XIVE spec completely broke
> how interrupt specifiers have worked in the device tree since forever?
> 
> And I'm pretty sure Linux does make use of them.  Without
> #interrupt-cells, there's no way it can properly interpret the
> interrupts properties in the device nodes.

Linux does make use of them and they are in the spec, but don't confuse
the nodes for the presentation controllers vs the node for the virtual
source controller which is the one that is the root of the interrupt
tree.

Cheers,
Ben.
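A small sketch of why #interrupt-cells matters. An `interrupts` property is a flat list of 32-bit cells, and a consumer can only split it into individual specifiers by dividing it by the interrupt parent's #interrupt-cells value. The helpers below are hypothetical and are not Linux's OF API. They assume the cells have already been converted to host byte order, and they use a value of 0 to stand for a missing #interrupt-cells property.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of specifiers in an "interrupts" property of ncells cells, given
 * the parent's #interrupt-cells. Returns -1 when the value is missing (0)
 * or does not divide the property length: the property cannot be parsed.
 */
static int count_specifiers(size_t ncells, uint32_t interrupt_cells)
{
    if (interrupt_cells == 0 || ncells % interrupt_cells != 0) {
        return -1;
    }
    return (int)(ncells / interrupt_cells);
}

/* Cell 'idx' of specifier 'n' within the property. */
static uint32_t specifier_cell(const uint32_t *cells, size_t ncells,
                               uint32_t interrupt_cells, size_t n, size_t idx)
{
    size_t pos = n * interrupt_cells + idx;

    assert(idx < interrupt_cells && pos < ncells);
    return cells[pos];
}
```

The same four cells read as two interrupts with #interrupt-cells = 2, but as four with #interrupt-cells = 1. That is the ambiguity David points out.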

* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-21 11:21         ` Cédric Le Goater
  2017-09-22 10:54           ` David Gibson
@ 2017-09-28  8:43           ` Benjamin Herrenschmidt
  2017-09-28  8:51             ` Cédric Le Goater
  1 sibling, 1 reply; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-28  8:43 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Thu, 2017-09-21 at 13:21 +0200, Cédric Le Goater wrote:
> Let me be more precise. I am saying that the interrupt-controller 
> and #interrupt-cells properties are not needed under the main interrupt 
> controller node. They can be removed from the tree and the Linux guest 
> kernel will boot perfectly well.

No, they are needed. They are the parents of PCI interrupts, for example.
There's something fishy here.

Do you have a DT snapshot from pHyp for me to look at ?

> These properties still are needed under the sub nodes like :
> 
> /proc/device-tree/vdevice/interrupt-controller
> /proc/device-tree/event-sources/interrupt-controller

* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-28  8:43           ` Benjamin Herrenschmidt
@ 2017-09-28  8:51             ` Cédric Le Goater
  2017-09-28 10:03               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-28  8:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On 09/28/2017 10:43 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2017-09-21 at 13:21 +0200, Cédric Le Goater wrote:
>> Let me be more precise. I am saying that the interrupt-controller 
>> and #interrupt-cells properties are not needed under the main interrupt 
>> controller node. They can be removed from the tree and the Linux guest 
>> kernel will boot perfectly well.
> 
> No, they are needed. They are the parents of PCI interrupts, for example.
> There's something fishy here.

probably, I just removed the properties under QEMU and could 
boot the guest, with disks and network.

 
> Do you have a DT snapshot from pHyp for me to look at ?


# lsprop /proc/device-tree/interrupt-controller\@200010000/
compatible       "ibm,power-ivpe"
device_type      "power-ivpe"
ibm,xive-eq-sizes
		 00000007 00000009 0000000c 0000000e
		 00000010 00000012 00000015 00000016
		 00000018
reg              00000002 00010000 00000000 00010000
		 00000002 00000000 00000000 00010000
linux,phandle    00dce438 (14476344)
ibm,xive-lisn-ranges
		 00094000 00000030
name             "interrupt-controller"
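The `reg` value above can be read as two (address, size) pairs if we assume the parent node uses #address-cells = 2 and #size-cells = 2. This is an assumption: the parent's values are not in this listing, although the unit address in the node name (200010000) is consistent with it. Under that assumption, the ranges are 0x200010000 and 0x200000000, each 0x10000 (64 KiB) long. Here is a sketch of that decoding. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine n (1 or 2) consecutive 32-bit cells, most significant first. */
static uint64_t join_cells(const uint32_t *cells, unsigned n)
{
    uint64_t v = 0;

    assert(n == 1 || n == 2);
    for (unsigned i = 0; i < n; i++) {
        v = (v << 32) | cells[i];
    }
    return v;
}

struct reg_entry {
    uint64_t addr;
    uint64_t size;
};

/*
 * Split a "reg" property into (addr, size) pairs. addr_cells and
 * size_cells come from the parent node. Returns the number of entries.
 */
static size_t decode_reg(const uint32_t *cells, size_t ncells,
                         unsigned addr_cells, unsigned size_cells,
                         struct reg_entry *out, size_t max)
{
    size_t stride = addr_cells + size_cells, n = 0;

    for (size_t pos = 0; pos + stride <= ncells && n < max;
         pos += stride, n++) {
        out[n].addr = join_cells(&cells[pos], addr_cells);
        out[n].size = join_cells(&cells[pos + addr_cells], size_cells);
    }
    return n;
}
```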


Cheers,

C. 

> 
>> These properties still are needed under the sub nodes like :
>>
>> /proc/device-tree/vdevice/interrupt-controller
>> /proc/device-tree/event-sources/interrupt-controller

* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-28  8:51             ` Cédric Le Goater
@ 2017-09-28 10:03               ` Benjamin Herrenschmidt
  2017-09-28 12:50                 ` Cédric Le Goater
  0 siblings, 1 reply; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-28 10:03 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On Thu, 2017-09-28 at 10:51 +0200, Cédric Le Goater wrote:
> probably, I just removed the properties under QEMU and could 
> boot the guest, with disks and network.

As long as you don't use LSIs...
>  
> > Do you have a DT snapshot from pHyp for me to look at ?
> 
> 
> # lsprop /proc/device-tree/interrupt-controller\@200010000/
> compatible       "ibm,power-ivpe"
> device_type      "power-ivpe"
> ibm,xive-eq-sizes
>                  00000007 00000009 0000000c 0000000e
>                  00000010 00000012 00000015 00000016
>                  00000018
> reg              00000002 00010000 00000000 00010000
>                  00000002 00000000 00000000 00010000
> linux,phandle    00dce438 (14476344)
> ibm,xive-lisn-ranges
>                  00094000 00000030
> name             "interrupt-controller"
> 
> 
> Cheers,

* Re: [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support
  2017-09-28 10:03               ` Benjamin Herrenschmidt
@ 2017-09-28 12:50                 ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-09-28 12:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On 09/28/2017 12:03 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2017-09-28 at 10:51 +0200, Cédric Le Goater wrote:
>> probably, I just removed the properties under QEMU and could 
>> boot the guest, with disks and network.
> 
> As long as you don't use LSIs...

That I didn't test much. Which devices could I use for
the guest?

Thanks,  

C.

* Re: [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9)
  2017-09-28  8:23       ` Benjamin Herrenschmidt
@ 2017-09-28 13:17         ` David Gibson
  0 siblings, 0 replies; 90+ messages in thread
From: David Gibson @ 2017-09-28 13:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, qemu-ppc, qemu-devel,
	Alexey Kardashevskiy, Alexander Graf

On Thu, Sep 28, 2017 at 10:23:22AM +0200, Benjamin Herrenschmidt wrote:
> On Wed, 2017-09-20 at 14:33 +0200, Cédric Le Goater wrote:
> > > > I'm thinking maybe trying to support the CAS negotiation of interrupt
> > > > controller from day 1 is warping the design.  A better approach might
> > > > be first to implement XIVE only when given a specific machine option -
> > > > guest gets one or the other and can't negotiate.
> > 
> > ok. 
> > 
> > CAS is not the most complex problem; we mostly need to share
> > the ICSIRQState array and the source offset. Migration from older
> > machines is a problem. We are doomed to keep the existing XICS
> > framework available.
> 
> I don't like sharing anything. I'd rather we had separate objects
> altogether. If needed, we can implement CAS by doing a partition reboot
> like pHyp does, at least initially, until we add ways to tear down and
> rebuild objects.

Right, I agree.  The difficulty isn't really CAS reboot or not, it's
more that altering the virtual hardware at runtime is.. awkward.. in
qemu.  And then there's the issue of migrating the state, which also
gets a bit complex.

As you've seen elsewhere, I think we need to get the XIVE model right
on its own first, then worry about those issues.

> The main issue is whether we can keep a consistent number space so the
> DT doesn't have to be completely rebuilt. If it does, then reboot will
> be the only practical option I'm afraid.

I think it should be possible to make a consistent number space.  At
present the irq allocation is kind of tied to xics, but I think that's
fixable.

> > > > That should allow a more natural XIVE design to emerge, *then* we can
> > > > look at what's necessary to make boot-time negotiation possible.
> > > 
> > > Actually, it just occurred to me that we might be making life hard for
> > > ourselves by trying to actually switch between full XICS and XIVE
> > > models.  Couldn't we have new machine types always construct the XIVE
> > > infrastructure, 
> > 
> > yes.
> > 
> > > but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
> > > hardware.
> 
> That's gross :-)
> 
> This is also exactly what KVM does with real XIVE HW and there's also
> such an emulation in OPAL. I'd be wary of creating a 3rd one...
> 
> I'd much prefer if we managed to:
> 
>  - Split the source numbering from the various state tracking objects
> so we can have that common
> 
>  - Either delay the creation to after CAS or tear down & re-create the
> state tracking objects at CAS time.
> 
> > ok but migration will not be supported.
> > 
> > > Since something more or less equivalent
> > > has already been done in both OPAL and the host kernel, I'm guessing
> > > this shouldn't be too hard at this point.
> 
> It would very much suck to have yet another one of these.

Hm, ok.

> Also we need to understand how that would work in a KVM context, the
> kernel will provide a "XICS" state even on top of XIVE unless we switch
> the kernel object to native, but then the kernel will expect full
> exploitation.
> 
> > Indeed, that is how it currently works on P9 KVM guests: the hcalls are
> > implemented on top of XIVE native.
> > 
> > Thanks,
> > 
> > 
> > C.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources
  2017-09-28  8:29       ` Benjamin Herrenschmidt
@ 2017-09-28 13:20         ` David Gibson
  0 siblings, 0 replies; 90+ messages in thread
From: David Gibson @ 2017-09-28 13:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Cédric Le Goater, qemu-ppc, qemu-devel,
	Alexey Kardashevskiy, Alexander Graf

On Thu, Sep 28, 2017 at 10:29:02AM +0200, Benjamin Herrenschmidt wrote:
> On Wed, 2017-09-20 at 15:05 +0200, Cédric Le Goater wrote:
> > > > +/*
> > > > + * XIVE Interrupt Source MMIOs
> > > > + */
> > > > +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
> > > > +{
> > > > +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> > > > +    uint32_t offset = addr & 0xF00;
> > > > +    uint32_t srcno = addr >> xive->esb_shift;
> > > > +    XiveIVE *ive;
> > > > +    uint64_t ret = -1;
> > > > +
> > > > +    ive = spapr_xive_get_ive(xive, srcno);
> > > > +    if (!ive || !(ive->w & IVE_VALID))  {
> > > > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> > > > +        goto out;
> > > 
> > > Since there's a whole (4k) page for each source, I wonder if we should
> > > actually map each one as a separate MMIO region to allow us to tweak
> > > the mappings more flexibly
> > 
> > yes we could have a subregion for each source. In that case, 
> > we should also handle IVE_VALID properly. That will require 
> > a specific XIVE allocator which was difficult to do while
> > keeping the compatibility with XICS for migration and CAS.
> 
> That will be a serious bloat with lots of interrupts. We also cannot
> possibly have a KVM mm region per interrupt or even a vma.

Yeah.  I'd been thinking in terms of thousands of sources, which
wouldn't be too unreasonable in terms of separate regions.  With all
the IPIs, it sounds like it could be more in the hundreds of thousands,
at which point that would get very nasty.

So I agree, I think we want to keep it one region.

AIUI, how we do that shouldn't affect the guest, though.
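Some rough arithmetic shows why a single window stays manageable. Using the constants from the patch (ESB_SHIFT = 16, i.e. one 64 KiB page per source, and VC_BAR_SIZE = 0x08000000000, i.e. 512 GiB), even 100,000 sources need only 6,553,600,000 bytes (about 6.1 GiB) of window, well inside the VC BAR. By contrast, one MemoryRegion per source would mean 100,000 separate objects. The esb_window_size() and esb_window_fits() helpers are hypothetical. They reproduce the (1ull << esb_shift) * nr_irqs sizing used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESB_SHIFT    16                 /* one 64 KiB ESB page per source */
#define VC_BAR_SIZE  0x08000000000ull   /* 512 GiB window, as in the patch */

/* Bytes of ESB window needed to back nr_irqs sources. */
static uint64_t esb_window_size(uint32_t nr_irqs, unsigned esb_shift)
{
    assert(esb_shift < 32);
    return (uint64_t)nr_irqs << esb_shift;
}

/* True when every source's ESB page fits inside the VC BAR window. */
static bool esb_window_fits(uint32_t nr_irqs, unsigned esb_shift)
{
    return esb_window_size(nr_irqs, esb_shift) <= VC_BAR_SIZE;
}
```

With these constants, the window can hold 2^23 (8,388,608) sources before it overflows the BAR.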

> I'm thinking of some kind of /dev/xive (or some other KVM or irqfd
> originated fd) that allows you to mmap a single big region whose content
> is demand-faulted and invalidated by the kernel to map the various
> interrupts.
> 
> So that it looks like a single VMA (and KVM memory block).
> 
> Ben.
> 
> > C.
> > 
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  2017-09-26  3:54           ` David Gibson
  2017-09-26  9:45             ` Benjamin Herrenschmidt
@ 2017-11-16 15:58             ` Cédric Le Goater
  1 sibling, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-11-16 15:58 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt,
	Alexey Kardashevskiy, Alexander Graf

On 09/26/2017 05:54 AM, David Gibson wrote:
> On Fri, Sep 22, 2017 at 02:42:07PM +0200, Cédric Le Goater wrote:
>> On 09/22/2017 01:00 PM, David Gibson wrote:
>>> On Tue, Sep 19, 2017 at 03:15:44PM +0200, Cédric Le Goater wrote:
>>>> On 09/19/2017 04:27 AM, David Gibson wrote:
>>>>> On Mon, Sep 11, 2017 at 07:12:15PM +0200, Cédric Le Goater wrote:
>>>>>> Start with a couple of attributes for the XIVE sPAPR controller
>>>>>> model. The number of provisioned IRQs is necessary to size the
>>>>>> different internal XIVE tables, as is the number of CPUs.
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>>>
>>>>> [snip]
>>>>>
>>>>>> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>>>>> +{
>>>>>> +    sPAPRXive *xive = SPAPR_XIVE(dev);
>>>>>> +
>>>>>> +    if (!xive->nr_targets) {
>>>>>> +        error_setg(errp, "Number of interrupt targets needs to be greater than 0");
>>>>>> +        return;
>>>>>> +    }
>>>>>> +    /* We need to be able to allocate at least the IPIs */
>>>>>> +    if (!xive->nr_irqs || xive->nr_irqs < xive->nr_targets) {
>>>>>> +        error_setg(errp, "Number of interrupts too small");
>>>>>> +        return;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static Property spapr_xive_properties[] = {
>>>>>> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
>>>>>> +    DEFINE_PROP_UINT32("nr-targets", sPAPRXive, nr_targets, 0),
>>>>>
>>>>> I'm a bit uneasy about the number of targets having to be set in
>>>>> advance: this can make life awkward when CPUs are hotplugged.  I know
>>>>> there's something similar in xics, but it has caused some hassles, and
>>>>> we're starting to move away from it.
>>>>>
>>>>> Do you really need this?
>>>>>
>>>>
>>>> Some of the internal table size depend on the number of cpus 
>>>> defined for the machine.
>>>
>>> Which ones?  My impression was that there needed to be at least #cpus
>>> * #priority-levels EQs, but there could be more than that, 
>>
>> euh no, not in spapr mode at least. There are 8 queues per cpu.
> 
> Ok.
> 
>>> so it was no longer as tightly bound to the number of "interrupt servers"
>>> as xics.
>>
>> ah. I think I see what you mean, that we could allocate them on the 
>> fly when needed by some hcalls ?
> 
> Not at hcall time, no, but at cpu hot(un)plug time I was wondering if we
> could (de)allocate them then.

Yes. I am currently reshuffling the ppc/xive patchset on top of the 
sPAPR allocator, also trying to take into account your comments and 
Ben's. And I think we can do that. 

I have introduced a specific XiveICPState (stored under the ->intc
pointer of the CPUState) but I think we can go a little further and
also store the XiveEQ array there, so that all the state related to
a CPU would be allocated at init time or at hot(un)plug time.

We could call the resulting compound object (ICP + EQs) a sPAPRXiveCore
or something like that. And we won't need to keep 'nr_targets' in
the sPAPR XIVE object anymore *if* we do one more thing. See below.
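Here is a rough sketch of the per-CPU compound object described above. The presenter state and the eight EQs are allocated together when a vCPU is plugged and freed on unplug, so no global nr_targets is needed to size an EQ table. The types, functions and fields (xive_core, eq_sketch, xive_core_plug(), and so on) are hypothetical placeholders. They are not the actual sPAPRXiveCore, XiveICPState or XiveEQ definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define XIVE_PRIORITY_COUNT 8   /* 8 queues per CPU in sPAPR mode */

/* Hypothetical stand-in for an event queue descriptor. */
struct eq_sketch {
    uint64_t qpage;     /* guest address of the event queue page */
    uint32_t qsize;     /* log2 of the queue size                */
    uint32_t qindex;    /* current producer index                */
};

/* Hypothetical per-CPU compound object: presenter state + its EQs. */
struct xive_core {
    uint32_t cpu_id;
    uint8_t  cppr;                              /* presenter priority */
    struct eq_sketch eqs[XIVE_PRIORITY_COUNT];  /* one EQ per priority */
};

/* vCPU plug: all per-CPU XIVE state in a single zeroed allocation. */
static struct xive_core *xive_core_plug(uint32_t cpu_id)
{
    struct xive_core *core = calloc(1, sizeof(*core));

    if (core) {
        core->cpu_id = cpu_id;
    }
    return core;
}

/* Look up the EQ used for a given priority on this CPU. */
static struct eq_sketch *xive_core_eq(struct xive_core *core, unsigned prio)
{
    assert(prio < XIVE_PRIORITY_COUNT);
    return &core->eqs[prio];
}

/* vCPU unplug: the EQs go away together with the presenter state. */
static void xive_core_unplug(struct xive_core *core)
{
    free(core);
}
```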

>> The other place where I use nr_targets is to provision the
>> IRQ numbers for the IPIs, but that could probably be done in some
>> other way, especially if there is an IRQ allocator at the machine
>> level.
> 
> Hm, ok.

The sPAPRXiveCore object above could also allocate the IPI for XIVE
when it is created. But the resulting IRQ number should be in a range
that does not overlap with other devices' IRQs.

That means that we should probably introduce the BUID concept in the
IRQ allocator. This concept was left out of the original design of
XICS, as it was using a single ICS, but it would now be useful to define
ranges for devices, for PHBs for instance.

We could start with 2K or 4K ranges. The first one would be for IPIs.

Dave, could you please take a quick look at the allocator patchset 
when you have some time. I don't think there is much to do in the API 
to introduce ranges. It is just a question of defining a 'start' value 
when looking for an empty slot.  
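Here is a minimal sketch of the 'start' idea. It assumes a flat bitmap of IRQ numbers and fixed-size per-device ranges (4K here), with range 0 reserved for the IPIs as proposed above. irq_alloc_range_first(), NR_IRQS and RANGE_SIZE are hypothetical. They are not the API of the sPAPR allocator patchset.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define NR_IRQS     (4 * 4096)   /* hypothetical total: four 4K ranges   */
#define RANGE_SIZE  4096         /* per-device (BUID-like) range size    */
#define LONG_BITS   (sizeof(unsigned long) * CHAR_BIT)

/* One bit per IRQ number; a set bit means the number is allocated. */
static unsigned long irq_map[NR_IRQS / LONG_BITS];

static int irq_test(uint32_t irq)
{
    return (irq_map[irq / LONG_BITS] >> (irq % LONG_BITS)) & 1ul;
}

static void irq_set(uint32_t irq)
{
    irq_map[irq / LONG_BITS] |= 1ul << (irq % LONG_BITS);
}

/*
 * Allocate the first free IRQ number in range 'range', i.e. search from
 * start = range * RANGE_SIZE. Returns -1 when that range is exhausted.
 */
static int irq_alloc_range_first(unsigned range)
{
    uint32_t start = range * RANGE_SIZE;

    assert(start + RANGE_SIZE <= NR_IRQS);
    for (uint32_t irq = start; irq < start + RANGE_SIZE; irq++) {
        if (!irq_test(irq)) {
            irq_set(irq);
            return (int)irq;
        }
    }
    return -1;
}
```

Because each search starts at its range's base, IPIs allocated in range 0 cannot collide with, say, PHB interrupts allocated in range 1, whatever the allocation order.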

Thanks,

C. 

>>
>> C.  
>>>> When the sPAPRXive object is instantiated, 
>>>> we use xics_max_server_number() to get the max number of cpus
>>>> provisioned.
>>>>
>>>> C.
>>>>
>>>
>>
> 

* Re: [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  2017-09-26  9:45             ` Benjamin Herrenschmidt
@ 2017-11-16 16:48               ` Cédric Le Goater
  0 siblings, 0 replies; 90+ messages in thread
From: Cédric Le Goater @ 2017-11-16 16:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson
  Cc: qemu-ppc, qemu-devel, Alexey Kardashevskiy, Alexander Graf

On 09/26/2017 11:45 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2017-09-26 at 13:54 +1000, David Gibson wrote:
>>>>
>>>> Which ones?  My impression was that there needed to be at least #cpus
>>>> * #priority-levels EQs, but there could be more than that, 
>>>
>>> euh no, not in spapr mode at least. There are 8 queues per cpu.
>>
>> Ok.
> 
> There's a HW feature of XIVE in DD2.x that I will start exploiting soon
> that sacrifices a queue btw, keep that in mind.
> 
> We should probably only expose 0...6 to guests, not 0...7.

yes. This is achieved using the "ibm,plat-res-int-priorities" property 
of the device tree. 

I don't think we need to check these priority ranges when allocating
the EQs as well. We can just allocate them all (8 priorities) and let the
hcalls do the checks. They should return H_P3 or H_P4 if the prio is invalid.
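Here is a sketch of that split: all eight EQs per CPU are allocated, and the hcall path rejects priorities the platform keeps for itself. The reserved set is modeled as (start, count) ranges, standing in for what "ibm,plat-res-int-priorities" advertises. That layout, and the choice of reserving priority 7 (following Ben's suggestion to expose only 0...6), are assumptions. xive_priority_is_valid() is hypothetical. A real hcall would map a failure to the appropriate PAPR code (H_P3 or H_P4, depending on which argument is invalid).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XIVE_PRIORITY_COUNT 8        /* EQs allocated per CPU: 0..7 */

struct prio_range {
    uint8_t start;
    uint8_t count;
};

/* Assumed reserved set: priority 7 kept by the platform, guests get 0..6. */
static const struct prio_range reserved[] = { { 7, 1 } };

/* Hypothetical check an hcall handler could run before using an EQ. */
static bool xive_priority_is_valid(uint32_t prio)
{
    if (prio >= XIVE_PRIORITY_COUNT) {
        return false;
    }
    for (size_t i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++) {
        if (prio >= reserved[i].start &&
            prio < (uint32_t)reserved[i].start + reserved[i].count) {
            return false;
        }
    }
    return true;
}
```

Keeping the check in the hcall path means the reserved set can change (for example, when a queue is sacrificed for a hardware feature) without touching EQ allocation.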

C.

end of thread, other threads:[~2017-11-16 16:49 UTC | newest]

Thread overview: 90+ messages
2017-09-11 17:12 [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 01/21] ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller Cédric Le Goater
2017-09-19  2:27   ` David Gibson
2017-09-19 13:15     ` Cédric Le Goater
2017-09-22 11:00       ` David Gibson
2017-09-22 12:42         ` Cédric Le Goater
2017-09-26  3:54           ` David Gibson
2017-09-26  9:45             ` Benjamin Herrenschmidt
2017-11-16 16:48               ` Cédric Le Goater
2017-11-16 15:58             ` Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 02/21] migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 03/21] ppc/xive: define the XIVE internal tables Cédric Le Goater
2017-09-19  2:39   ` David Gibson
2017-09-19 13:46     ` Cédric Le Goater
2017-09-20  4:33       ` David Gibson
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 04/21] ppc/xive: provide a link to the sPAPR ICS object under XIVE Cédric Le Goater
2017-09-11 22:04   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-09-12  5:47     ` Cédric Le Goater
2017-09-19  2:44   ` [Qemu-devel] " David Gibson
2017-09-19 14:46     ` Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 05/21] ppc/xive: allocate IRQ numbers for the IPIs Cédric Le Goater
2017-09-19  2:45   ` David Gibson
2017-09-19 14:52     ` Cédric Le Goater
2017-09-20  4:35       ` David Gibson
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 06/21] ppc/xive: introduce handlers for interrupt sources Cédric Le Goater
2017-09-19  2:48   ` David Gibson
2017-09-19 15:08     ` Cédric Le Goater
2017-09-20  4:38       ` David Gibson
2017-09-21 14:11         ` Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE " Cédric Le Goater
2017-09-19  2:57   ` David Gibson
2017-09-20 12:54     ` Cédric Le Goater
2017-09-22 10:58       ` David Gibson
2017-09-22 12:26         ` Cédric Le Goater
2017-09-28  8:27       ` Benjamin Herrenschmidt
2017-09-20 13:05     ` Cédric Le Goater
2017-09-28  8:29       ` Benjamin Herrenschmidt
2017-09-28 13:20         ` David Gibson
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 08/21] ppc/xive: describe the XIVE interrupt source flags Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 09/21] ppc/xive: extend the interrupt presenter model for XIVE Cédric Le Goater
2017-09-19  7:36   ` David Gibson
2017-09-19 19:28     ` Cédric Le Goater
2017-09-22 10:58       ` David Gibson
2017-09-22 12:27         ` Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 10/21] ppc/xive: add MMIO handlers for the XIVE TIMA Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 11/21] ppc/xive: push the EQ data in OS event queue Cédric Le Goater
2017-09-19  7:45   ` David Gibson
2017-09-19 19:36     ` Cédric Le Goater
2017-09-20  6:34       ` David Gibson
2017-09-28  8:12         ` Benjamin Herrenschmidt
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 12/21] ppc/xive: notify the CPU when interrupt priority is more privileged Cédric Le Goater
2017-09-19  7:50   ` David Gibson
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 13/21] ppc/xive: handle interrupt acknowledgment by the O/S Cédric Le Goater
2017-09-19  7:53   ` David Gibson
2017-09-20  9:40     ` Cédric Le Goater
2017-09-28  8:14       ` Benjamin Herrenschmidt
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 14/21] ppc/xive: add support for the SET_OS_PENDING command Cédric Le Goater
2017-09-19  7:55   ` David Gibson
2017-09-20  9:47     ` Cédric Le Goater
2017-09-28  8:18       ` Benjamin Herrenschmidt
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 15/21] spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument Cédric Le Goater
2017-09-19  7:56   ` David Gibson
2017-09-20  9:49     ` Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 16/21] spapr: add a XIVE object to the sPAPR machine Cédric Le Goater
2017-09-19  8:38   ` David Gibson
2017-09-20  9:51     ` Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 17/21] ppc/xive: add hcalls support Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 18/21] ppc/xive: add device tree support Cédric Le Goater
2017-09-19  8:44   ` David Gibson
2017-09-20 12:26     ` Cédric Le Goater
2017-09-21  1:35       ` David Gibson
2017-09-21 11:21         ` Cédric Le Goater
2017-09-22 10:54           ` David Gibson
2017-09-28  8:43           ` Benjamin Herrenschmidt
2017-09-28  8:51             ` Cédric Le Goater
2017-09-28 10:03               ` Benjamin Herrenschmidt
2017-09-28 12:50                 ` Cédric Le Goater
2017-09-28  8:31         ` Benjamin Herrenschmidt
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 19/21] ppc/xive: introduce a helper to map the XIVE memory regions Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 20/21] ppc/xics: introduce a qirq_get() helper in the XICSFabric Cédric Le Goater
2017-09-11 17:12 ` [Qemu-devel] [RFC PATCH v2 21/21] spapr: activate XIVE exploitation mode Cédric Le Goater
2017-09-19  8:20 ` [Qemu-devel] [RFC PATCH v2 00/21] Guest exploitation of the XIVE interrupt controller (POWER9) David Gibson
2017-09-19  8:46   ` David Gibson
2017-09-20 12:33     ` Cédric Le Goater
2017-09-21  1:25       ` David Gibson
2017-09-21 14:18         ` Cédric Le Goater
2017-09-22 10:33           ` David Gibson
2017-09-22 12:32             ` Cédric Le Goater
2017-09-28  8:23       ` Benjamin Herrenschmidt
2017-09-28 13:17         ` David Gibson
