All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9)
@ 2018-12-09 19:45 Cédric Le Goater
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 01/19] ppc/xive: add support for the END Event State Buffers Cédric Le Goater
                   ` (18 more replies)
  0 siblings, 19 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Hello,

Here is the version 7 of the QEMU models adding support for the XIVE
interrupt controller to the sPAPR machine, under TCG only this
time. KVM support will be proposed in an other patchset, along with
the KVM XIVE device patchset, and so will PowerNV.

The most important changes for sPAPR are the introduction of the 4.0
machines. The sPAPRXive model still inherits from the XiveRouter. It
is possible to change the class inheritance tree but it does not bring
much for now.

I am not sure how we should handle the machine definitions, so I
proposed both, XIVE only and dual interrupt mode. The impact on the
XICS machine is limited with TCG but KVM support of the 'dual' machine
will change things. Let me know how you want to proceed.

Thanks,

C.

Changes in v7 :

 Common XIVE models :
 
 - removed the 'chip-id' field from XiveRouter
 - introduced a 'block-id' field in XiveENDSource to lookup the XIVE
   END structure when doing a load in the MMIO ESB
 - removed reset XiveENDSource handler
 - introduced a xive_tctx_word2() helper to extract TM_WORD2 of a ring.
 - removed HW CAM line setting and use as it is only useful for PowerNV
 - made use of xive_tctx_word2() helper
 - made use of GETFIELD_BE32() to compare CAM lines
 - fixed initialization of XiveTCTXMatch

 sPAPR models :

 - simplified the prototypes of helpers
 - introduced an assert in set_nvt() method
 - introduced a fixed value for the controller block id value.
 - removed the hardwiring the HW CAM line. Back to v5 state.
 - removed patch "spapr: modify the irq backend 'init' method". It did
   not bring much
 - split the 'xive' sPAPR IRQ backend from the 'xive' machine
 - split the 'dual' sPAPR IRQ backend from the 'dual' machine
 - introduced 4.0* machines
 
 KVM :

 - hardly no changes 
 - will come later in a KVM patchset

 PowerNV:

 - will come later in a PowerNV patchset

Changes in v6 :

 https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00965.html

Changes in v5 :

 https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg03218.html

Changes in v4 :

 https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01672.html


= XIVE =================================================================


The POWER9 processor comes with a new interrupt controller, called
XIVE as "eXternal Interrupt Virtualization Engine".


* Overall architecture


             XIVE Interrupt Controller
             +------------------------------------+      IPIs
             | +---------+ +---------+ +--------+ |    +-------+
             | |VC       | |CQ       | |PC      |----> | CORES |
             | |     esb | |         | |        |----> |       |
             | |     eas | |  Bridge | |   tctx |----> |       |
             | |SC   end | |         | |    nvt | |    |       |
 +------+    | +---------+ +----+----+ +--------+ |    +-+-+-+-+
 | RAM  |    +------------------|-----------------+      | | |
 |      |                       |                        | | |
 |      |                       |                        | | |
 |      |  +--------------------v------------------------v-v-v--+    other
 |      <--+                     Power Bus                      +--> chips
 |  esb |  +---------+-----------------------+------------------+
 |  eas |            |                       |
 |  end |         +--|------+                |
 |  nvt |       +----+----+ |           +----+----+
 +------+       |SC       | |           |SC       |
                |         | |           |         |
                | PQ-bits | |           | PQ-bits |
                | local   |-+           |  in VC  |
                +---------+             +---------+
                   PCIe                 NX,NPU,CAPI

                  SC: Source Controller (aka. IVSE)
                  VC: Virtualization Controller (aka. IVRE)
                  PC: Presentation Controller (aka. IVPE)
                  CQ: Common Queue (Bridge)

             PQ-bits: 2 bits source state machine (P:pending Q:queued)
                 esb: Event State Buffer (Array of PQ bits in an IVSE)
                 eas: Event Assignment Structure
                 end: Event Notification Descriptor
                 nvt: Notification Virtual Target
                tctx: Thread interrupt Context


The XIVE IC is composed of three sub-engines :

  - Interrupt Virtualization Source Engine (IVSE), or Source
    Controller (SC). These are found in PCI PHBs, in the PSI host
    bridge controller, but also inside the main controller for the
    core IPIs and other sub-chips (NX, CAP, NPU) of the
    chip/processor. They are configured to feed the IVRE with events.

  - Interrupt Virtualization Routing Engine (IVRE) or Virtualization
    Controller (VC). Its job is to match an event source with an Event
    Notification Descriptor (END).

  - Interrupt Virtualization Presentation Engine (IVPE) or Presentation
    Controller (PC). It maintains the interrupt context state of each
    thread and handles the delivery of the external exception to the
    thread.


* XIVE internal tables

Each of the sub-engines uses a set of tables to redirect exceptions
from event sources to CPU threads.

                                          +-------+
  User or OS                              |  EQ   |
      or                          +------>|entries|
  Hypervisor                      |       |  ..   |
    Memory                        |       +-------+
                                  |           ^
                                  |           |
             +-------------------------------------------------+
                                  |           |
  Hypervisor      +------+    +---+--+    +---+--+   +------+
    Memory        | ESB  |    | EAT  |    | ENDT |   | NVTT |
   (skiboot)      +----+-+    +----+-+    +----+-+   +------+
                    ^  |        ^  |        ^  |       ^
                    |  |        |  |        |  |       |
             +-------------------------------------------------+
                    |  |        |  |        |  |       |
                    |  |        |  |        |  |       |
               +----|--|--------|--|--------|--|-+   +-|-----+    +------+
               |    |  |        |  |        |  | |   | | tctx|    |Thread|
   IPI or   ---+    +  v        +  v        +  v |---| +  .. |----->     |
  HW events    |                                 |   |       |    |      |
               |             IVRE                |   | IVPE  |    +------+
               +---------------------------------+   +-------+
            


The IVSE have a 2-bits state machine, P for pending and Q for queued,
for each source that allows events to be triggered. They are stored in
an Event State Buffer (ESB) array and can be controlled by MMIOs.

If the event is let through, the IVRE looks up in the Event Assignment
Structure (EAS) table for an Event Notification Descriptor (END)
configured for the source. Each Event Notification Descriptor defines
a notification path to a CPU and an in-memory Event Queue, in which
will be enqueued an EQ data for the OS to pull.

The IVPE determines if a Notification Virtual Target (NVT) can handle
the event by scanning the thread contexts of the VCPUs dispatched on
the processor HW threads. It maintains the interrupt context state of
each thread in a NVT table.


* Overview of the QEMU models for the XIVE sub-engines

The XiveSource models the IVSE in general, internal and external. It
handles the source ESBs and the MMIO interface to control them.

The XiveNotifier is a small helper interface interconnecting the
XiveSource to the XiveRouter.

The XiveRouter is an abstract model acting as a combined IVRE and
IVPE. It routes event notifications using the IVE and EQD tables to
the IVPE sub-engine which does a CAM scan to find a CPU to deliver the
exception. Storage should be provided by the inheriting classes.

XiveENDSource is a special source object. It exposes the EQ ESB MMIOs of
the Event Queues which are used for coalescing event notifications and
for escalation. Not used on the field, only to sync the EQ cache in
OPAL.

Finally, the XiveTCTX contains the interrupt state context of a thread,
four sets of registers, one for each exception that can be delivered
to a CPU. These contexts are scanned by the IVPE to find a matching VP
when a notification is triggered. It also models the Thread Interrupt
Management Area (TIMA), which exposes the thread context registers to
the CPU for interrupt management.


* XIVE for sPAPR

sPAPRXive models the XIVE interrupt controller of a sPAPR machine. It
inherits from the XiveRouter and provisions storage for the IVE and
END tables. The NVT table does not need a backend in sPAPR. It owns a
XiveSource object for the IPIs and the virtual device interrupts, a
memory region for the TIMA and a XiveENDSource to manage the END ESBs.
(not used by Linux).

These choices were made to have a sPAPR interrupt controller
consistent with the one found on baremetal and to facilitate KVM
support, the main difficulty being the host memory regions exposed to
the guest.

The NVT and tbe END indexing needs some care and a set of helpers are
defined to ease the conversion between the CPU id as seen by the guest
and the XIVE identifiers manipulated by the models. 


* Integration in the sPAPR machine, xive only and dual

A new sPAPR IRQ backend is defined for XIVE. It introduces a couple of
new operations to handle the differences in the creation of the device
tree and in the allocation of the CPU interrupt controller. A new
'xive' only pseries machine is defined using this XIVE backend.

Being able to support both interrupt mode in the same machine requires
some more changes. As the machine chooses the interrupt mode at CAS
time, it is activated after a reconfiguration done in a reset. This is
handled by a new 'dual' sPAPR IRQ backend which is built on top of the
XICS and XIVE backend. A new 'dual' pseries machine is defined using
this backend.


* KVM support

Support for KVM introduces a set of specific XIVE models, very much
like XICS does, which self-connect to their KVM counterparts in the
Linux kernel. Two host memory regions are exposed to the guest and
need special care at initialization :

  - ESB mmios
  - Thread Interrupt Management Area (TIMA)

The models uses KVM accessors to synchronize the QEMU state with
KVM. The states are :

  - the source configuration (EAT)
  - the END configuration (ENDT)
  - the OS EQ state (toggle bit and index)
  - the thread interrupt context registers.

Hybrid guest using KVM and an emulated irqchip (kernel_irqchip=off) is
supported. Migration under KVM is supported.

KVM support for the 'dual' machine required some more changes. Both
interrupt mode need to be initialized at the QEMU level to keep the
IRQ number space in sync and to allow switching from one mode to
another. At the KVM level, the whole initialization of the KVM device,
sources and presenters, needs to be done in the reset handler when the
interrupt mode is chosen. This is a major change in the KVM models.

KVM being initialized at reset, we loose the possiblity to fallback to
the QEMU emulated mode in case of failure and failures become fatal to
the machine.


* PowerNV models

The PnvXIVE model uses the XiveRouter abstract model just like
sPAPRXive. It provides accessors to the EAS, END and NVT tables which
are stored in the QEMU PowerNV machine and not in QEMU anymore. It
owns a set of memory regions for the IC registers, the ESBs, the END
ESBs, the TIMA, the notification MMIO.

Multichip is supported and the available IVSEs are the internal one
for the IPIS, the PSI host bridge controller and PHB4.

The next interesting step would be to add escalation events and model
the VCPU dispatching to support emulated KVM guests.


* GitHub trees
 
QEMU sPAPR:

  https://github.com/legoater/qemu/commits/xive-v7-3.1
  
QEMU PowerNV:

  https://github.com/legoater/qemu/commits/powernv-3.1

Linux/KVM:

  https://github.com/legoater/linux/commits/xive-4.20

OPAL:

  https://github.com/legoater/skiboot/commits/xive

Cédric Le Goater (19):
  ppc/xive: add support for the END Event State Buffers
  ppc/xive: introduce the XIVE interrupt thread context
  ppc/xive: introduce a simplified XIVE presenter
  ppc/xive: notify the CPU when the interrupt priority is more
    privileged
  spapr/xive: introduce a XIVE interrupt controller
  spapr/xive: use the VCPU id as a NVT identifier
  spapr: introduce a new machine IRQ backend for XIVE
  spapr: add hcalls support for the XIVE exploitation interrupt mode
  spapr: add device tree support for the XIVE exploitation mode
  spapr: allocate the interrupt thread context under the CPU core
  spapr: extend the sPAPR IRQ backend for XICS migration
  spapr: add a 'reset' method to the sPAPR IRQ backend
  spapr: add an extra OV5 field to the sPAPR IRQ backend
  spapr: set the interrupt presenter at reset
  spapr/xive: enable XIVE MMIOs at reset
  spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
  spapr: Add a pseries-4.0 machine type
  spapr: add a 'pseries-4.0-xive' machine type
  spapr: add a 'pseries-4.0-dual' machine type

 default-configs/ppc64-softmmu.mak |    1 +
 include/hw/compat.h               |    3 +
 include/hw/ppc/spapr.h            |   23 +-
 include/hw/ppc/spapr_cpu_core.h   |    2 +
 include/hw/ppc/spapr_irq.h        |   12 +
 include/hw/ppc/spapr_xive.h       |   53 ++
 include/hw/ppc/xics.h             |    4 +-
 include/hw/ppc/xive.h             |   80 ++
 include/hw/ppc/xive_regs.h        |  106 +++
 hw/intc/spapr_xive.c              | 1480 +++++++++++++++++++++++++++++
 hw/intc/xics_spapr.c              |    3 +-
 hw/intc/xive.c                    |  883 ++++++++++++++++-
 hw/ppc/spapr.c                    |  100 +-
 hw/ppc/spapr_cpu_core.c           |   31 +-
 hw/ppc/spapr_hcall.c              |   13 +
 hw/ppc/spapr_irq.c                |  350 +++++++
 hw/intc/Makefile.objs             |    1 +
 17 files changed, 3119 insertions(+), 26 deletions(-)
 create mode 100644 include/hw/ppc/spapr_xive.h
 create mode 100644 hw/intc/spapr_xive.c

-- 
2.17.2

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 01/19] ppc/xive: add support for the END Event State Buffers
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
@ 2018-12-09 19:45 ` Cédric Le Goater
  2018-12-10  4:16   ` David Gibson
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 02/19] ppc/xive: introduce the XIVE interrupt thread context Cédric Le Goater
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The Event Notification Descriptor (END) XIVE structure also contains
two Event State Buffers providing further coalescing of interrupts,
one for the notification event (ESn) and one for the escalation events
(ESe). A MMIO page is assigned for each to control the EOI through
loads only. Stores are not allowed.

The END ESBs are modeled through an object resembling the 'XiveSource'
It is stateless as the END state bits are backed into the XiveEND
structure under the XiveRouter and the MMIO accesses follow the same
rules as for the XiveSource ESBs.

END ESBs are not supported by the Linux drivers neither on OPAL nor on
sPAPR. Nevetherless, it provides a mean to study the question in the
future and validates a bit more the XIVE model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

 Changes since v6:

 - removed the 'chip-id' field from XiveRouter
 - introduced a 'block-id' field in XiveENDSource to lookup the XIVE
   END structure when doing a load in the MMIO ESB
 - removed reset XiveENDSource handler

 include/hw/ppc/xive.h |  21 ++++++
 hw/intc/xive.c        | 160 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 179 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 4851d3b3a41f..014f64aa98f6 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -336,6 +336,27 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
 int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
                           XiveEND *end, uint8_t word_number);
 
+/*
+ * XIVE END ESBs
+ */
+
+#define TYPE_XIVE_END_SOURCE "xive-end-source"
+#define XIVE_END_SOURCE(obj) \
+    OBJECT_CHECK(XiveENDSource, (obj), TYPE_XIVE_END_SOURCE)
+
+typedef struct XiveENDSource {
+    DeviceState parent;
+
+    uint32_t        nr_ends;
+    uint8_t         block_id;
+
+    /* ESB memory region */
+    uint32_t        esb_shift;
+    MemoryRegion    esb_mmio;
+
+    XiveRouter      *xrtr;
+} XiveENDSource;
+
 /*
  * For legacy compatibility, the exceptions define up to 256 different
  * priorities. P9 implements only 9 levels : 8 active levels [0 - 7]
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 41d8ba1540d0..2196ce8de059 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -612,8 +612,18 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
      * even futher coalescing in the Router
      */
     if (!xive_end_is_notify(&end)) {
-        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
-        return;
+        uint8_t pq = GETFIELD_BE32(END_W1_ESn, end.w1);
+        bool notify = xive_esb_trigger(&pq);
+
+        if (pq != GETFIELD_BE32(END_W1_ESn, end.w1)) {
+            end.w1 = SETFIELD_BE32(END_W1_ESn, end.w1, pq);
+            xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
+        }
+
+        /* ESn[Q]=1 : end of notification */
+        if (!notify) {
+            return;
+        }
     }
 
     /*
@@ -692,6 +702,151 @@ void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon)
                    (uint32_t) GETFIELD_BE64(EAS_END_DATA, eas->w));
 }
 
+/*
+ * END ESB MMIO loads
+ */
+static uint64_t xive_end_source_read(void *opaque, hwaddr addr, unsigned size)
+{
+    XiveENDSource *xsrc = XIVE_END_SOURCE(opaque);
+    uint32_t offset = addr & 0xFFF;
+    uint8_t end_blk;
+    uint32_t end_idx;
+    XiveEND end;
+    uint32_t end_esmask;
+    uint8_t pq;
+    uint64_t ret = -1;
+
+    end_blk = xsrc->block_id;
+    end_idx = addr >> (xsrc->esb_shift + 1);
+
+    if (xive_router_get_end(xsrc->xrtr, end_blk, end_idx, &end)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No END %x/%x\n", end_blk,
+                      end_idx);
+        return -1;
+    }
+
+    if (!xive_end_is_valid(&end)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: END %x/%x is invalid\n",
+                      end_blk, end_idx);
+        return -1;
+    }
+
+    end_esmask = addr_is_even(addr, xsrc->esb_shift) ? END_W1_ESn : END_W1_ESe;
+    pq = GETFIELD_BE32(end_esmask, end.w1);
+
+    switch (offset) {
+    case XIVE_ESB_LOAD_EOI ... XIVE_ESB_LOAD_EOI + 0x7FF:
+        ret = xive_esb_eoi(&pq);
+
+        /* Forward the source event notification for routing ?? */
+        break;
+
+    case XIVE_ESB_GET ... XIVE_ESB_GET + 0x3FF:
+        ret = pq;
+        break;
+
+    case XIVE_ESB_SET_PQ_00 ... XIVE_ESB_SET_PQ_00 + 0x0FF:
+    case XIVE_ESB_SET_PQ_01 ... XIVE_ESB_SET_PQ_01 + 0x0FF:
+    case XIVE_ESB_SET_PQ_10 ... XIVE_ESB_SET_PQ_10 + 0x0FF:
+    case XIVE_ESB_SET_PQ_11 ... XIVE_ESB_SET_PQ_11 + 0x0FF:
+        ret = xive_esb_set(&pq, (offset >> 8) & 0x3);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid END ESB load addr %d\n",
+                      offset);
+        return -1;
+    }
+
+    if (pq != GETFIELD_BE32(end_esmask, end.w1)) {
+        end.w1 = SETFIELD_BE32(end_esmask, end.w1, pq);
+        xive_router_write_end(xsrc->xrtr, end_blk, end_idx, &end, 1);
+    }
+
+    return ret;
+}
+
+/*
+ * END ESB MMIO stores are invalid
+ */
+static void xive_end_source_write(void *opaque, hwaddr addr,
+                                  uint64_t value, unsigned size)
+{
+    qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr 0x%"
+                  HWADDR_PRIx"\n", addr);
+}
+
+static const MemoryRegionOps xive_end_source_ops = {
+    .read = xive_end_source_read,
+    .write = xive_end_source_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+};
+
+static void xive_end_source_realize(DeviceState *dev, Error **errp)
+{
+    XiveENDSource *xsrc = XIVE_END_SOURCE(dev);
+    Object *obj;
+    Error *local_err = NULL;
+
+    obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
+    if (!obj) {
+        error_propagate(errp, local_err);
+        error_prepend(errp, "required link 'xive' not found: ");
+        return;
+    }
+
+    xsrc->xrtr = XIVE_ROUTER(obj);
+
+    if (!xsrc->nr_ends) {
+        error_setg(errp, "Number of interrupt needs to be greater than 0");
+        return;
+    }
+
+    if (xsrc->esb_shift != XIVE_ESB_4K &&
+        xsrc->esb_shift != XIVE_ESB_64K) {
+        error_setg(errp, "Invalid ESB shift setting");
+        return;
+    }
+
+    /*
+     * Each END is assigned an even/odd pair of MMIO pages, the even page
+     * manages the ESn field while the odd page manages the ESe field.
+     */
+    memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
+                          &xive_end_source_ops, xsrc, "xive.end",
+                          (1ull << (xsrc->esb_shift + 1)) * xsrc->nr_ends);
+}
+
+static Property xive_end_source_properties[] = {
+    DEFINE_PROP_UINT8("block-id", XiveENDSource, block_id, 0),
+    DEFINE_PROP_UINT32("nr-ends", XiveENDSource, nr_ends, 0),
+    DEFINE_PROP_UINT32("shift", XiveENDSource, esb_shift, XIVE_ESB_64K),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xive_end_source_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->desc    = "XIVE END Source";
+    dc->props   = xive_end_source_properties;
+    dc->realize = xive_end_source_realize;
+}
+
+static const TypeInfo xive_end_source_info = {
+    .name          = TYPE_XIVE_END_SOURCE,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(XiveENDSource),
+    .class_init    = xive_end_source_class_init,
+};
+
 /*
  * XIVE Fabric
  */
@@ -706,6 +861,7 @@ static void xive_register_types(void)
     type_register_static(&xive_source_info);
     type_register_static(&xive_fabric_info);
     type_register_static(&xive_router_info);
+    type_register_static(&xive_end_source_info);
 }
 
 type_init(xive_register_types)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 02/19] ppc/xive: introduce the XIVE interrupt thread context
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 01/19] ppc/xive: add support for the END Event State Buffers Cédric Le Goater
@ 2018-12-09 19:45 ` Cédric Le Goater
  2018-12-10  4:19   ` David Gibson
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 03/19] ppc/xive: introduce a simplified XIVE presenter Cédric Le Goater
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Each POWER9 processor chip has a XIVE presenter that can generate four
different exceptions to its threads:

  - hypervisor exception,
  - O/S exception
  - Event-Based Branch (EBB)
  - msgsnd (doorbell).

Each exception has a state independent from the others called a Thread
Interrupt Management context. This context is a set of registers which
lets the thread handle priority management and interrupt acknowledgment
among other things. The most important ones being :

  - Interrupt Priority Register  (PIPR)
  - Interrupt Pending Buffer     (IPB)
  - Current Processor Priority   (CPPR)
  - Notification Source Register (NSR)

These registers are accessible through a specific MMIO region, called
the Thread Interrupt Management Area (TIMA), four aligned pages, each
exposing a different view of the registers. First page (page address
ending in 0b00) gives access to the entire context and is reserved for
the ring 0 view for the physical thread context. The second (page
address ending in 0b01) is for the hypervisor, ring 1 view. The third
(page address ending in 0b10) is for the operating system, ring 2
view. The fourth (page address ending in 0b11) is for user level, ring
3 view.

The thread interrupt context is modeled with a XiveTCTX object
containing the values of the different exception registers. The TIMA
region is mapped at the same address for each CPU.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---

 Changes since v6

 - introduced a xive_tctx_word2() helper to extract TM_WORD2 of a ring.

 include/hw/ppc/xive.h      |  44 ++++
 include/hw/ppc/xive_regs.h |  82 +++++++
 hw/intc/xive.c             | 424 +++++++++++++++++++++++++++++++++++++
 3 files changed, 550 insertions(+)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 014f64aa98f6..1e823a4c64e9 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -367,4 +367,48 @@ typedef struct XiveENDSource {
 void xive_end_pic_print_info(XiveEND *end, uint32_t end_idx, Monitor *mon);
 void xive_end_queue_pic_print_info(XiveEND *end, uint32_t width, Monitor *mon);
 
+/*
+ * XIVE Thread interrupt Management (TM) context
+ */
+
+#define TYPE_XIVE_TCTX "xive-tctx"
+#define XIVE_TCTX(obj) OBJECT_CHECK(XiveTCTX, (obj), TYPE_XIVE_TCTX)
+
+/*
+ * XIVE Thread interrupt Management register rings :
+ *
+ *   QW-0  User       event-based exception state
+ *   QW-1  O/S        OS context for priority management, interrupt acks
+ *   QW-2  Pool       hypervisor pool context for virtual processors dispatched
+ *   QW-3  Physical   physical thread context and security context
+ */
+#define XIVE_TM_RING_COUNT      4
+#define XIVE_TM_RING_SIZE       0x10
+
+typedef struct XiveTCTX {
+    DeviceState parent_obj;
+
+    CPUState    *cs;
+    qemu_irq    output;
+
+    uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
+} XiveTCTX;
+
+/*
+ * XIVE Thread Interrupt Management Aera (TIMA)
+ *
+ * This region gives access to the registers of the thread interrupt
+ * management context. It is four page wide, each page providing a
+ * different view of the registers. The page with the lower offset is
+ * the most privileged and gives access to the entire context.
+ */
+#define XIVE_TM_HW_PAGE         0x0
+#define XIVE_TM_HV_PAGE         0x1
+#define XIVE_TM_OS_PAGE         0x2
+#define XIVE_TM_USER_PAGE       0x3
+
+extern const MemoryRegionOps xive_tm_ops;
+
+void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
+
 #endif /* PPC_XIVE_H */
diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
index 3c0ebad18b69..ede3d04c5eda 100644
--- a/include/hw/ppc/xive_regs.h
+++ b/include/hw/ppc/xive_regs.h
@@ -23,6 +23,88 @@
 #define XIVE_SRCNO_INDEX(srcno) ((srcno) & 0x0fffffff)
 #define XIVE_SRCNO(blk, idx)    ((uint32_t)(blk) << 28 | (idx))
 
+#define TM_SHIFT                16
+
+/* TM register offsets */
+#define TM_QW0_USER             0x000 /* All rings */
+#define TM_QW1_OS               0x010 /* Ring 0..2 */
+#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
+#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
+
+/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
+#define TM_NSR                  0x0  /*  +   +   -   +  */
+#define TM_CPPR                 0x1  /*  -   +   -   +  */
+#define TM_IPB                  0x2  /*  -   +   +   +  */
+#define TM_LSMFB                0x3  /*  -   +   +   +  */
+#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
+#define TM_INC                  0x5  /*  -   +   -   +  */
+#define TM_AGE                  0x6  /*  -   +   -   +  */
+#define TM_PIPR                 0x7  /*  -   +   -   +  */
+
+#define TM_WORD0                0x0
+#define TM_WORD1                0x4
+
+/*
+ * QW word 2 contains the valid bit at the top and other fields
+ * depending on the QW.
+ */
+#define TM_WORD2                0x8
+#define   TM_QW0W2_VU           PPC_BIT32(0)
+#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
+#define   TM_QW1W2_VO           PPC_BIT32(0)
+#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
+#define   TM_QW2W2_VP           PPC_BIT32(0)
+#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
+#define   TM_QW3W2_VT           PPC_BIT32(0)
+#define   TM_QW3W2_LP           PPC_BIT32(6)
+#define   TM_QW3W2_LE           PPC_BIT32(7)
+#define   TM_QW3W2_T            PPC_BIT32(31)
+
+/*
+ * In addition to normal loads to "peek" and writes (only when invalid)
+ * using 4 and 8 bytes accesses, the above registers support these
+ * "special" byte operations:
+ *
+ *   - Byte load from QW0[NSR] - User level NSR (EBB)
+ *   - Byte store to QW0[NSR] - User level NSR (EBB)
+ *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
+ *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
+ *                                    otherwise VT||0000000
+ *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
+ *
+ * Then we have all these "special" CI ops at these offset that trigger
+ * all sorts of side effects:
+ */
+#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
+#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
+#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
+#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
+                                         * context */
+#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
+#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
+                                         * context to reg */
+#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
+                                         * context to reg*/
+#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
+#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
+                                         * line */
+#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
+#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
+                                         * line */
+#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
+/* XXX more... */
+
+/* NSR fields for the various QW ack types */
+#define TM_QW0_NSR_EB           PPC_BIT8(0)
+#define TM_QW1_NSR_EO           PPC_BIT8(0)
+#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
+#define  TM_QW3_NSR_HE_NONE     0
+#define  TM_QW3_NSR_HE_POOL     1
+#define  TM_QW3_NSR_HE_PHYS     2
+#define  TM_QW3_NSR_HE_LSI      3
+#define TM_QW3_NSR_I            PPC_BIT8(2)
+#define TM_QW3_NSR_GRP_LVL      PPC_BIT8(3, 7)
+
 /* EAS (Event Assignment Structure)
  *
  * One per interrupt source. Targets an interrupt to a given Event
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 2196ce8de059..2615d16b7437 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -16,6 +16,429 @@
 #include "hw/qdev-properties.h"
 #include "monitor/monitor.h"
 #include "hw/ppc/xive.h"
+#include "hw/ppc/xive_regs.h"
+
+/*
+ * XIVE Thread Interrupt Management context
+ */
+
+static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring)
+{
+    return 0;
+}
+
+static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
+{
+    if (cppr > XIVE_PRIORITY_MAX) {
+        cppr = 0xff;
+    }
+
+    tctx->regs[ring + TM_CPPR] = cppr;
+}
+
+/*
+ * XIVE Thread Interrupt Management Area (TIMA)
+ */
+
+/*
+ * Define an access map for each page of the TIMA that we will use in
+ * the memory region ops to filter values when doing loads and stores
+ * of raw registers values
+ *
+ * Registers accessibility bits :
+ *
+ *    0x0 - no access
+ *    0x1 - write only
+ *    0x2 - read only
+ *    0x3 - read/write
+ */
+
+static const uint8_t xive_tm_hw_view[] = {
+    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-1 OS   */   3, 3, 3, 3,   3, 3, 0, 3,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-2 POOL */   0, 0, 3, 3,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-3 PHYS */   3, 3, 3, 3,   0, 3, 0, 3,   3, 0, 0, 3,   3, 3, 3, 0,
+};
+
+static const uint8_t xive_tm_hv_view[] = {
+    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-1 OS   */   3, 3, 3, 3,   3, 3, 0, 3,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-2 POOL */   0, 0, 3, 3,   0, 0, 0, 0,   0, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-3 PHYS */   3, 3, 3, 3,   0, 3, 0, 3,   3, 0, 0, 3,   0, 0, 0, 0,
+};
+
+static const uint8_t xive_tm_os_view[] = {
+    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-1 OS   */   2, 3, 2, 2,   2, 2, 0, 2,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-2 POOL */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-3 PHYS */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+};
+
+static const uint8_t xive_tm_user_view[] = {
+    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-1 OS   */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-2 POOL */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-3 PHYS */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+};
+
+/*
+ * Overall TIMA access map for the thread interrupt management context
+ * registers
+ */
+static const uint8_t *xive_tm_views[] = {
+    [XIVE_TM_HW_PAGE]   = xive_tm_hw_view,
+    [XIVE_TM_HV_PAGE]   = xive_tm_hv_view,
+    [XIVE_TM_OS_PAGE]   = xive_tm_os_view,
+    [XIVE_TM_USER_PAGE] = xive_tm_user_view,
+};
+
+/*
+ * Computes a register access mask for a given offset in the TIMA
+ */
+static uint64_t xive_tm_mask(hwaddr offset, unsigned size, bool write)
+{
+    uint8_t page_offset = (offset >> TM_SHIFT) & 0x3;
+    uint8_t reg_offset = offset & 0x3F;
+    uint8_t reg_mask = write ? 0x1 : 0x2;
+    uint64_t mask = 0x0;
+    int i;
+
+    for (i = 0; i < size; i++) {
+        if (xive_tm_views[page_offset][reg_offset + i] & reg_mask) {
+            mask |= (uint64_t) 0xff << (8 * (size - i - 1));
+        }
+    }
+
+    return mask;
+}
+
+static void xive_tm_raw_write(XiveTCTX *tctx, hwaddr offset, uint64_t value,
+                              unsigned size)
+{
+    uint8_t ring_offset = offset & 0x30;
+    uint8_t reg_offset = offset & 0x3F;
+    uint64_t mask = xive_tm_mask(offset, size, true);
+    int i;
+
+    /*
+     * Only 4 or 8 bytes stores are allowed and the User ring is
+     * excluded
+     */
+    if (size < 4 || !mask || ring_offset == TM_QW0_USER) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid write access at TIMA @%"
+                      HWADDR_PRIx"\n", offset);
+        return;
+    }
+
+    /*
+     * Use the register offset for the raw values and filter out
+     * reserved values
+     */
+    for (i = 0; i < size; i++) {
+        uint8_t byte_mask = (mask >> (8 * (size - i - 1)));
+        if (byte_mask) {
+            tctx->regs[reg_offset + i] = (value >> (8 * (size - i - 1))) &
+                byte_mask;
+        }
+    }
+}
+
+static uint64_t xive_tm_raw_read(XiveTCTX *tctx, hwaddr offset, unsigned size)
+{
+    uint8_t ring_offset = offset & 0x30;
+    uint8_t reg_offset = offset & 0x3F;
+    uint64_t mask = xive_tm_mask(offset, size, false);
+    uint64_t ret;
+    int i;
+
+    /*
+     * Only 4 or 8 bytes loads are allowed and the User ring is
+     * excluded
+     */
+    if (size < 4 || !mask || ring_offset == TM_QW0_USER) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid read access at TIMA @%"
+                      HWADDR_PRIx"\n", offset);
+        return -1;
+    }
+
+    /* Use the register offset for the raw values */
+    ret = 0;
+    for (i = 0; i < size; i++) {
+        ret |= (uint64_t) tctx->regs[reg_offset + i] << (8 * (size - i - 1));
+    }
+
+    /* filter out reserved values */
+    return ret & mask;
+}
+
+/*
+ * The TM context is mapped twice within each page. Stores and loads
+ * to the first mapping below 2K write and read the specified values
+ * without modification. The second mapping above 2K performs specific
+ * state changes (side effects) in addition to setting/returning the
+ * interrupt management area context of the processor thread.
+ */
+static uint64_t xive_tm_ack_os_reg(XiveTCTX *tctx, hwaddr offset, unsigned size)
+{
+    return xive_tctx_accept(tctx, TM_QW1_OS);
+}
+
+static void xive_tm_set_os_cppr(XiveTCTX *tctx, hwaddr offset,
+                                uint64_t value, unsigned size)
+{
+    xive_tctx_set_cppr(tctx, TM_QW1_OS, value & 0xff);
+}
+
+/*
+ * Define a mapping of "special" operations depending on the TIMA page
+ * offset and the size of the operation.
+ */
+typedef struct XiveTmOp {
+    uint8_t  page_offset;
+    uint32_t op_offset;
+    unsigned size;
+    void     (*write_handler)(XiveTCTX *tctx, hwaddr offset, uint64_t value,
+                              unsigned size);
+    uint64_t (*read_handler)(XiveTCTX *tctx, hwaddr offset, unsigned size);
+} XiveTmOp;
+
+static const XiveTmOp xive_tm_operations[] = {
+    /*
+     * MMIOs below 2K : raw values and special operations without side
+     * effects
+     */
+    { XIVE_TM_OS_PAGE, TM_QW1_OS + TM_CPPR,   1, xive_tm_set_os_cppr, NULL },
+
+    /* MMIOs above 2K : special operations with side effects */
+    { XIVE_TM_OS_PAGE, TM_SPC_ACK_OS_REG,     2, NULL, xive_tm_ack_os_reg },
+};
+
+static const XiveTmOp *xive_tm_find_op(hwaddr offset, unsigned size, bool write)
+{
+    uint8_t page_offset = (offset >> TM_SHIFT) & 0x3;
+    uint32_t op_offset = offset & 0xFFF;
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(xive_tm_operations); i++) {
+        const XiveTmOp *xto = &xive_tm_operations[i];
+
+        /* Accesses done from a more privileged TIMA page is allowed */
+        if (xto->page_offset >= page_offset &&
+            xto->op_offset == op_offset &&
+            xto->size == size &&
+            ((write && xto->write_handler) || (!write && xto->read_handler))) {
+            return xto;
+        }
+    }
+    return NULL;
+}
+
+/*
+ * TIMA MMIO handlers
+ */
+static void xive_tm_write(void *opaque, hwaddr offset,
+                          uint64_t value, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
+    const XiveTmOp *xto;
+
+    /*
+     * TODO: check V bit in Q[0-3]W2, check PTER bit associated with CPU
+     */
+
+    /*
+     * First, check for special operations in the 2K region
+     */
+    if (offset & 0x800) {
+        xto = xive_tm_find_op(offset, size, true);
+        if (!xto) {
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid write access at TIMA"
+                          "@%"HWADDR_PRIx"\n", offset);
+        } else {
+            xto->write_handler(tctx, offset, value, size);
+        }
+        return;
+    }
+
+    /*
+     * Then, for special operations in the region below 2K.
+     */
+    xto = xive_tm_find_op(offset, size, true);
+    if (xto) {
+        xto->write_handler(tctx, offset, value, size);
+        return;
+    }
+
+    /*
+     * Finish with raw access to the register values
+     */
+    xive_tm_raw_write(tctx, offset, value, size);
+}
+
+static uint64_t xive_tm_read(void *opaque, hwaddr offset, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
+    const XiveTmOp *xto;
+
+    /*
+     * TODO: check V bit in Q[0-3]W2, check PTER bit associated with CPU
+     */
+
+    /*
+     * First, check for special operations in the 2K region
+     */
+    if (offset & 0x800) {
+        xto = xive_tm_find_op(offset, size, false);
+        if (!xto) {
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid read access to TIMA"
+                          "@%"HWADDR_PRIx"\n", offset);
+            return -1;
+        }
+        return xto->read_handler(tctx, offset, size);
+    }
+
+    /*
+     * Then, for special operations in the region below 2K.
+     */
+    xto = xive_tm_find_op(offset, size, false);
+    if (xto) {
+        return xto->read_handler(tctx, offset, size);
+    }
+
+    /*
+     * Finish with raw access to the register values
+     */
+    return xive_tm_raw_read(tctx, offset, size);
+}
+
+const MemoryRegionOps xive_tm_ops = {
+    .read = xive_tm_read,
+    .write = xive_tm_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+};
+
+static inline uint32_t xive_tctx_word2(uint8_t *ring)
+{
+    return *((uint32_t *) &ring[TM_WORD2]);
+}
+
+static char *xive_tctx_ring_print(uint8_t *ring)
+{
+    uint32_t w2 = xive_tctx_word2(ring);
+
+    return g_strdup_printf("%02x   %02x  %02x    %02x   %02x  "
+                   "%02x  %02x   %02x  %08x",
+                   ring[TM_NSR], ring[TM_CPPR], ring[TM_IPB], ring[TM_LSMFB],
+                   ring[TM_ACK_CNT], ring[TM_INC], ring[TM_AGE], ring[TM_PIPR],
+                   be32_to_cpu(w2));
+}
+
+static const char * const xive_tctx_ring_names[] = {
+    "USER", "OS", "POOL", "PHYS",
+};
+
+void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
+{
+    int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
+    int i;
+
+    monitor_printf(mon, "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR"
+                   "  W2\n", cpu_index);
+
+    for (i = 0; i < XIVE_TM_RING_COUNT; i++) {
+        char *s = xive_tctx_ring_print(&tctx->regs[i * XIVE_TM_RING_SIZE]);
+        monitor_printf(mon, "CPU[%04x]: %4s    %s\n", cpu_index,
+                       xive_tctx_ring_names[i], s);
+        g_free(s);
+    }
+}
+
+static void xive_tctx_reset(void *dev)
+{
+    XiveTCTX *tctx = XIVE_TCTX(dev);
+
+    memset(tctx->regs, 0, sizeof(tctx->regs));
+
+    /* Set some defaults */
+    tctx->regs[TM_QW1_OS + TM_LSMFB] = 0xFF;
+    tctx->regs[TM_QW1_OS + TM_ACK_CNT] = 0xFF;
+    tctx->regs[TM_QW1_OS + TM_AGE] = 0xFF;
+}
+
+static void xive_tctx_realize(DeviceState *dev, Error **errp)
+{
+    XiveTCTX *tctx = XIVE_TCTX(dev);
+    PowerPCCPU *cpu;
+    CPUPPCState *env;
+    Object *obj;
+    Error *local_err = NULL;
+
+    obj = object_property_get_link(OBJECT(dev), "cpu", &local_err);
+    if (!obj) {
+        error_propagate(errp, local_err);
+        error_prepend(errp, "required link 'cpu' not found: ");
+        return;
+    }
+
+    cpu = POWERPC_CPU(obj);
+    tctx->cs = CPU(obj);
+
+    env = &cpu->env;
+    switch (PPC_INPUT(env)) {
+    case PPC_FLAGS_INPUT_POWER7:
+        tctx->output = env->irq_inputs[POWER7_INPUT_INT];
+        break;
+
+    default:
+        error_setg(errp, "XIVE interrupt controller does not support "
+                   "this CPU bus model");
+        return;
+    }
+
+    qemu_register_reset(xive_tctx_reset, dev);
+}
+
+static void xive_tctx_unrealize(DeviceState *dev, Error **errp)
+{
+    qemu_unregister_reset(xive_tctx_reset, dev);
+}
+
+static const VMStateDescription vmstate_xive_tctx = {
+    .name = TYPE_XIVE_TCTX,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_BUFFER(regs, XiveTCTX),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static void xive_tctx_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->desc = "XIVE Interrupt Thread Context";
+    dc->realize = xive_tctx_realize;
+    dc->unrealize = xive_tctx_unrealize;
+    dc->vmsd = &vmstate_xive_tctx;
+}
+
+static const TypeInfo xive_tctx_info = {
+    .name          = TYPE_XIVE_TCTX,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(XiveTCTX),
+    .class_init    = xive_tctx_class_init,
+};
 
 /*
  * XIVE ESB helpers
@@ -862,6 +1285,7 @@ static void xive_register_types(void)
     type_register_static(&xive_fabric_info);
     type_register_static(&xive_router_info);
     type_register_static(&xive_end_source_info);
+    type_register_static(&xive_tctx_info);
 }
 
 type_init(xive_register_types)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 03/19] ppc/xive: introduce a simplified XIVE presenter
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 01/19] ppc/xive: add support for the END Event State Buffers Cédric Le Goater
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 02/19] ppc/xive: introduce the XIVE interrupt thread context Cédric Le Goater
@ 2018-12-09 19:45 ` Cédric Le Goater
  2018-12-10  4:27   ` David Gibson
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 04/19] ppc/xive: notify the CPU when the interrupt priority is more privileged Cédric Le Goater
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The last sub-engine of the XIVE architecture is the Interrupt
Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
IVPE share elements, the Power Bus interface (CQ), the routing table
descriptors, and they can be combined in the same HW logic. We do the
same in QEMU and combine both engines in the XiveRouter for
simplicity.

When the IVRE has completed its job of matching an event source with a
Notification Virtual Target (NVT) to notify, it forwards the event
notification to the IVPE sub-engine. The IVPE scans the thread
interrupt contexts of the Notification Virtual Targets (NVT)
dispatched on the HW processor threads and if a match is found, it
signals the thread. If not, the IVPE escalates the notification to
some other targets and records the notification in a backlog queue.

The IVPE maintains the thread interrupt context state for each of its
NVTs not dispatched on HW processor threads in the Notification
Virtual Target table (NVTT).

The model currently only supports single NVT notifications.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

 Changes since v6 :

 - removed HW CAM line setting and use as it is only useful for PowerNV
 - made use of xive_tctx_word2() helper
 - made use of GETFIELD_BE32() to compare CAM lines
 - fixed initialization of XiveTCTXMatch

 include/hw/ppc/xive.h      |  14 +++
 include/hw/ppc/xive_regs.h |  24 +++++
 hw/intc/xive.c             | 185 +++++++++++++++++++++++++++++++++++++
 3 files changed, 223 insertions(+)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 1e823a4c64e9..19309d1d65d1 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -325,6 +325,10 @@ typedef struct XiveRouterClass {
                    XiveEND *end);
     int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
                      XiveEND *end, uint8_t word_number);
+    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                   XiveNVT *nvt);
+    int (*write_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                     XiveNVT *nvt, uint8_t word_number);
 } XiveRouterClass;
 
 void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
@@ -335,6 +339,11 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
                         XiveEND *end);
 int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
                           XiveEND *end, uint8_t word_number);
+int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                        XiveNVT *nvt);
+int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                          XiveNVT *nvt, uint8_t word_number);
+
 
 /*
  * XIVE END ESBs
@@ -411,4 +420,9 @@ extern const MemoryRegionOps xive_tm_ops;
 
 void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
 
+static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
+{
+    return (nvt_blk << 19) | nvt_idx;
+}
+
 #endif /* PPC_XIVE_H */
diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
index ede3d04c5eda..85557e730cd8 100644
--- a/include/hw/ppc/xive_regs.h
+++ b/include/hw/ppc/xive_regs.h
@@ -186,4 +186,28 @@ typedef struct XiveEND {
 #define GETFIELD_BE32(m, v)       GETFIELD(m, be32_to_cpu(v))
 #define SETFIELD_BE32(m, v, val)  cpu_to_be32(SETFIELD(m, be32_to_cpu(v), val))
 
+/* Notification Virtual Target (NVT) */
+typedef struct XiveNVT {
+        uint32_t        w0;
+#define NVT_W0_VALID             PPC_BIT32(0)
+        uint32_t        w1;
+        uint32_t        w2;
+        uint32_t        w3;
+        uint32_t        w4;
+        uint32_t        w5;
+        uint32_t        w6;
+        uint32_t        w7;
+        uint32_t        w8;
+#define NVT_W8_GRP_VALID         PPC_BIT32(0)
+        uint32_t        w9;
+        uint32_t        wa;
+        uint32_t        wb;
+        uint32_t        wc;
+        uint32_t        wd;
+        uint32_t        we;
+        uint32_t        wf;
+} XiveNVT;
+
+#define xive_nvt_is_valid(nvt)    (be32_to_cpu((nvt)->w0) & NVT_W0_VALID)
+
 #endif /* PPC_XIVE_REGS_H */
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 2615d16b7437..3eecffe99b3a 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -983,6 +983,183 @@ int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
    return xrc->write_end(xrtr, end_blk, end_idx, end, word_number);
 }
 
+int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                        XiveNVT *nvt)
+{
+   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
+
+   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
+}
+
+int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                        XiveNVT *nvt, uint8_t word_number)
+{
+   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
+
+   return xrc->write_nvt(xrtr, nvt_blk, nvt_idx, nvt, word_number);
+}
+
+/*
+ * The thread context register words are in big-endian format.
+ */
+static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
+                                     uint8_t nvt_blk, uint32_t nvt_idx,
+                                     bool cam_ignore, uint32_t logic_serv)
+{
+    uint32_t cam = xive_nvt_cam_line(nvt_blk, nvt_idx);
+    uint32_t qw2w2 = xive_tctx_word2(&tctx->regs[TM_QW2_HV_POOL]);
+    uint32_t qw1w2 = xive_tctx_word2(&tctx->regs[TM_QW1_OS]);
+    uint32_t qw0w2 = xive_tctx_word2(&tctx->regs[TM_QW0_USER]);
+
+    /* TODO (PowerNV): ignore mode. The low order bits of the NVT
+     * identifier are ignored in the "CAM" match.
+     */
+
+    if (format == 0) {
+        if (cam_ignore == true) {
+            /* F=0 & i=1: Logical server notification (bits ignored at
+             * the end of the NVT identifier)
+             */
+            qemu_log_mask(LOG_UNIMP, "XIVE: no support for LS NVT %x/%x\n",
+                          nvt_blk, nvt_idx);
+             return -1;
+        }
+
+        /* F=0 & i=0: Specific NVT notification */
+
+        /* TODO (PowerNV) : PHYS ring */
+
+        /* HV POOL ring */
+        if ((be32_to_cpu(qw2w2) & TM_QW2W2_VP) &&
+            cam == GETFIELD_BE32(TM_QW2W2_POOL_CAM, qw2w2)) {
+            return TM_QW2_HV_POOL;
+        }
+
+        /* OS ring */
+        if ((be32_to_cpu(qw1w2) & TM_QW1W2_VO) &&
+            cam == GETFIELD_BE32(TM_QW1W2_OS_CAM, qw1w2)) {
+            return TM_QW1_OS;
+        }
+    } else {
+        /* F=1 : User level Event-Based Branch (EBB) notification */
+
+        /* USER ring */
+        if  ((be32_to_cpu(qw1w2) & TM_QW1W2_VO) &&
+             (cam == GETFIELD_BE32(TM_QW1W2_OS_CAM, qw1w2)) &&
+             (be32_to_cpu(qw0w2) & TM_QW0W2_VU) &&
+             (logic_serv == GETFIELD_BE32(TM_QW0W2_LOGIC_SERV, qw0w2))) {
+            return TM_QW0_USER;
+        }
+    }
+    return -1;
+}
+
+typedef struct XiveTCTXMatch {
+    XiveTCTX *tctx;
+    uint8_t ring;
+} XiveTCTXMatch;
+
+static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
+                                 uint8_t nvt_blk, uint32_t nvt_idx,
+                                 bool cam_ignore, uint8_t priority,
+                                 uint32_t logic_serv, XiveTCTXMatch *match)
+{
+    CPUState *cs;
+
+    /* TODO (PowerNV): handle chip_id overwrite of block field for
+     * hardwired CAM compares */
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
+        int ring;
+
+        /*
+         * HW checks that the CPU is enabled in the Physical Thread
+         * Enable Register (PTER).
+         */
+
+        /*
+         * Check the thread context CAM lines and record matches. We
+         * will handle CPU exception delivery later
+         */
+        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
+                                         cam_ignore, logic_serv);
+        /*
+         * Save the context and follow on to catch duplicates, that we
+         * don't support yet.
+         */
+        if (ring != -1) {
+            if (match->tctx) {
+                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
+                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
+                return false;
+            }
+
+            match->ring = ring;
+            match->tctx = tctx;
+        }
+    }
+
+    if (!match->tctx) {
+        qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
+                      nvt_blk, nvt_idx);
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * This is our simple Xive Presenter Engine model. It is merged in the
+ * Router as it does not require an extra object.
+ *
+ * It receives notification requests sent by the IVRE to find one
+ * matching NVT (or more) dispatched on the processor threads. In case
+ * of a single NVT notification, the process is abreviated and the
+ * thread is signaled if a match is found. In case of a logical server
+ * notification (bits ignored at the end of the NVT identifier), the
+ * IVPE and IVRE select a winning thread using different filters. This
+ * involves 2 or 3 exchanges on the PowerBus that the model does not
+ * support.
+ *
+ * The parameters represent what is sent on the PowerBus
+ */
+static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
+                                  uint8_t nvt_blk, uint32_t nvt_idx,
+                                  bool cam_ignore, uint8_t priority,
+                                  uint32_t logic_serv)
+{
+    XiveNVT nvt;
+    XiveTCTXMatch match = { .tctx = NULL, .ring = 0 };
+    bool found;
+
+    /* NVT cache lookup */
+    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
+                      nvt_blk, nvt_idx);
+        return;
+    }
+
+    if (!xive_nvt_is_valid(&nvt)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
+                      nvt_blk, nvt_idx);
+        return;
+    }
+
+    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
+                                 priority, logic_serv, &match);
+    if (found) {
+        return;
+    }
+
+    /* If no matching NVT is dispatched on a HW thread :
+     * - update the NVT structure if backlog is activated
+     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
+     *   activated
+     */
+}
+
 /*
  * An END trigger can come from an event trigger (IPI or HW) or from
  * another chip. We don't model the PowerBus but the END trigger
@@ -1052,6 +1229,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
     /*
      * Follows IVPE notification
      */
+    xive_presenter_notify(xrtr, format,
+                          GETFIELD_BE32(END_W6_NVT_BLOCK, end.w6),
+                          GETFIELD_BE32(END_W6_NVT_INDEX, end.w6),
+                          GETFIELD_BE32(END_W7_F0_IGNORE, end.w7),
+                          priority,
+                          GETFIELD_BE32(END_W7_F1_LOG_SERVER_ID, end.w7));
+
+    /* TODO: Auto EOI. */
 }
 
 static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 04/19] ppc/xive: notify the CPU when the interrupt priority is more privileged
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (2 preceding siblings ...)
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 03/19] ppc/xive: introduce a simplified XIVE presenter Cédric Le Goater
@ 2018-12-09 19:45 ` Cédric Le Goater
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 05/19] spapr/xive: introduce a XIVE interrupt controller Cédric Le Goater
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

After the event data was enqueued in the O/S Event Queue, the IVPE
raises the bit corresponding to the priority of the pending interrupt
in the register IBP (Interrupt Pending Buffer) to indicate there is an
event pending in one of the 8 priority queues. The Pending Interrupt
Priority Register (PIPR) is also updated using the IPB. This register
represent the priority of the most favored pending notification.

The PIPR is then compared to the the Current Processor Priority
Register (CPPR). If it is more favored (numerically less than), the
CPU interrupt line is raised and the EO bit of the Notification Source
Register (NSR) is updated to notify the presence of an exception for
the O/S. The check needs to be done whenever the PIPR or the CPPR are
changed.

The O/S acknowledges the interrupt with a special load in the Thread
Interrupt Management Area. If the EO bit of the NSR is set, the CPPR
takes the value of PIPR. The bit number in the IBP corresponding to
the priority of the pending interrupt is reseted and so is the EO bit
of the NSR.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/intc/xive.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 3eecffe99b3a..ea5385ff7784 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -22,9 +22,73 @@
  * XIVE Thread Interrupt Management context
  */
 
+/* Convert a priority number to an Interrupt Pending Buffer (IPB)
+ * register, which indicates a pending interrupt at the priority
+ * corresponding to the bit number
+ */
+static uint8_t priority_to_ipb(uint8_t priority)
+{
+    return priority > XIVE_PRIORITY_MAX ?
+        0 : 1 << (XIVE_PRIORITY_MAX - priority);
+}
+
+/* Convert an Interrupt Pending Buffer (IPB) register to a Pending
+ * Interrupt Priority Register (PIPR), which contains the priority of
+ * the most favored pending notification.
+ */
+static uint8_t ipb_to_pipr(uint8_t ibp)
+{
+    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
+}
+
+static void ipb_update(uint8_t *regs, uint8_t priority)
+{
+    regs[TM_IPB] |= priority_to_ipb(priority);
+    regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]);
+}
+
+static uint8_t exception_mask(uint8_t ring)
+{
+    switch (ring) {
+    case TM_QW1_OS:
+        return TM_QW1_NSR_EO;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring)
 {
-    return 0;
+    uint8_t *regs = &tctx->regs[ring];
+    uint8_t nsr = regs[TM_NSR];
+    uint8_t mask = exception_mask(ring);
+
+    qemu_irq_lower(tctx->output);
+
+    if (regs[TM_NSR] & mask) {
+        uint8_t cppr = regs[TM_PIPR];
+
+        regs[TM_CPPR] = cppr;
+
+        /* Reset the pending buffer bit */
+        regs[TM_IPB] &= ~priority_to_ipb(cppr);
+        regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]);
+
+        /* Drop Exception bit */
+        regs[TM_NSR] &= ~mask;
+    }
+
+    return (nsr << 8) | regs[TM_CPPR];
+}
+
+static void xive_tctx_notify(XiveTCTX *tctx, uint8_t ring)
+{
+    uint8_t *regs = &tctx->regs[ring];
+
+    if (regs[TM_PIPR] < regs[TM_CPPR]) {
+        regs[TM_NSR] |= exception_mask(ring);
+        qemu_irq_raise(tctx->output);
+    }
 }
 
 static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
@@ -34,6 +98,9 @@ static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
     }
 
     tctx->regs[ring + TM_CPPR] = cppr;
+
+    /* CPPR has changed, check if we need to raise a pending exception */
+    xive_tctx_notify(tctx, ring);
 }
 
 /*
@@ -189,6 +256,17 @@ static void xive_tm_set_os_cppr(XiveTCTX *tctx, hwaddr offset,
     xive_tctx_set_cppr(tctx, TM_QW1_OS, value & 0xff);
 }
 
+/*
+ * Adjust the IPB to allow a CPU to process event queues of other
+ * priorities during one physical interrupt cycle.
+ */
+static void xive_tm_set_os_pending(XiveTCTX *tctx, hwaddr offset,
+                                   uint64_t value, unsigned size)
+{
+    ipb_update(&tctx->regs[TM_QW1_OS], value & 0xff);
+    xive_tctx_notify(tctx, TM_QW1_OS);
+}
+
 /*
  * Define a mapping of "special" operations depending on the TIMA page
  * offset and the size of the operation.
@@ -211,6 +289,7 @@ static const XiveTmOp xive_tm_operations[] = {
 
     /* MMIOs above 2K : special operations with side effects */
     { XIVE_TM_OS_PAGE, TM_SPC_ACK_OS_REG,     2, NULL, xive_tm_ack_os_reg },
+    { XIVE_TM_OS_PAGE, TM_SPC_SET_OS_PENDING, 1, xive_tm_set_os_pending, NULL },
 };
 
 static const XiveTmOp *xive_tm_find_op(hwaddr offset, unsigned size, bool write)
@@ -373,6 +452,13 @@ static void xive_tctx_reset(void *dev)
     tctx->regs[TM_QW1_OS + TM_LSMFB] = 0xFF;
     tctx->regs[TM_QW1_OS + TM_ACK_CNT] = 0xFF;
     tctx->regs[TM_QW1_OS + TM_AGE] = 0xFF;
+
+    /*
+     * Initialize PIPR to 0xFF to avoid phantom interrupts when the
+     * CPPR is first set.
+     */
+    tctx->regs[TM_QW1_OS + TM_PIPR] =
+        ipb_to_pipr(tctx->regs[TM_QW1_OS + TM_IPB]);
 }
 
 static void xive_tctx_realize(DeviceState *dev, Error **errp)
@@ -1150,9 +1236,15 @@ static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
     found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
                                  priority, logic_serv, &match);
     if (found) {
+        ipb_update(&match.tctx->regs[match.ring], priority);
+        xive_tctx_notify(match.tctx, match.ring);
         return;
     }
 
+    /* Record the IPB in the associated NVT structure */
+    ipb_update((uint8_t *) &nvt.w4, priority);
+    xive_router_write_nvt(xrtr, nvt_blk, nvt_idx, &nvt, 4);
+
     /* If no matching NVT is dispatched on a HW thread :
      * - update the NVT structure if backlog is activated
      * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 05/19] spapr/xive: introduce a XIVE interrupt controller
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (3 preceding siblings ...)
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 04/19] ppc/xive: notify the CPU when the interrupt priority is more privileged Cédric Le Goater
@ 2018-12-09 19:45 ` Cédric Le Goater
  2018-12-10  4:36   ` David Gibson
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 06/19] spapr/xive: use the VCPU id as a NVT identifier Cédric Le Goater
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

sPAPRXive models the XIVE interrupt controller of the sPAPR machine.
It inherits from the XiveRouter and provisions storage for the routing
tables :

  - Event Assignment Structure (EAS)
  - Event Notification Descriptor (END)

The sPAPRXive model incorporates an internal XiveSource for the IPIs
and for the interrupts of the virtual devices of the guest. This model
is consistent with XIVE architecture which also incorporates an
internal IVSE for IPIs and accelerator interrupts in the IVRE
sub-engine.

The sPAPRXive model exports two memory regions, one for the ESB
trigger and management pages used to control the sources and one for
the TIMA pages. They are mapped by default at the addresses found on
chip 0 of a baremetal system. This is also consistent with the XIVE
architecture which defines a Virtualization Controller BAR for the
internal IVSE ESB pages and a Thread Managment BAR for the TIMA.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 default-configs/ppc64-softmmu.mak |   1 +
 include/hw/ppc/spapr_xive.h       |  45 ++++
 hw/intc/spapr_xive.c              | 366 ++++++++++++++++++++++++++++++
 hw/intc/Makefile.objs             |   1 +
 4 files changed, 413 insertions(+)
 create mode 100644 include/hw/ppc/spapr_xive.h
 create mode 100644 hw/intc/spapr_xive.c

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 2d1e7c5c4668..7f34ad0528ed 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_XICS=$(CONFIG_PSERIES)
 CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_XIVE=$(CONFIG_PSERIES)
+CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
 CONFIG_MEM_DEVICE=y
 CONFIG_DIMM=y
 CONFIG_SPAPR_RNG=y
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
new file mode 100644
index 000000000000..f087959b9924
--- /dev/null
+++ b/include/hw/ppc/spapr_xive.h
@@ -0,0 +1,45 @@
+/*
+ * QEMU PowerPC sPAPR XIVE interrupt controller model
+ *
+ * Copyright (c) 2017-2018, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PPC_SPAPR_XIVE_H
+#define PPC_SPAPR_XIVE_H
+
+#include "hw/ppc/xive.h"
+
+#define TYPE_SPAPR_XIVE "spapr-xive"
+#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
+
+typedef struct sPAPRXive {
+    XiveRouter    parent;
+
+    /* Internal interrupt source for IPIs and virtual devices */
+    XiveSource    source;
+    hwaddr        vc_base;
+
+    /* END ESB MMIOs */
+    XiveENDSource end_source;
+    hwaddr        end_base;
+
+    /* Routing table */
+    XiveEAS       *eat;
+    uint32_t      nr_irqs;
+    XiveEND       *endt;
+    uint32_t      nr_ends;
+
+    /* TIMA mapping address */
+    hwaddr        tm_base;
+    MemoryRegion  tm_mmio;
+} sPAPRXive;
+
+bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
+bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
+void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
+qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
+
+#endif /* PPC_SPAPR_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
new file mode 100644
index 000000000000..eef5830d45c6
--- /dev/null
+++ b/hw/intc/spapr_xive.c
@@ -0,0 +1,366 @@
+/*
+ * QEMU PowerPC sPAPR XIVE interrupt controller model
+ *
+ * Copyright (c) 2017-2018, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "target/ppc/cpu.h"
+#include "sysemu/cpus.h"
+#include "monitor/monitor.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_xive.h"
+#include "hw/ppc/xive.h"
+#include "hw/ppc/xive_regs.h"
+
+/*
+ * XIVE Virtualization Controller BAR and Thread Managment BAR that we
+ * use for the ESB pages and the TIMA pages
+ */
+#define SPAPR_XIVE_VC_BASE   0x0006010000000000ull
+#define SPAPR_XIVE_TM_BASE   0x0006030203180000ull
+
+/*
+ * On sPAPR machines, use a simplified output for the XIVE END
+ * structure dumping only the information related to the OS EQ.
+ */
+static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end,
+                                          Monitor *mon)
+{
+    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
+    uint32_t qgen = GETFIELD_BE32(END_W1_GENERATION, end->w1);
+    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
+    uint32_t qentries = 1 << (qsize + 10);
+    uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
+    uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
+
+    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt,
+                   priority, qindex, qentries, qgen);
+
+    xive_end_queue_pic_print_info(end, 6, mon);
+    monitor_printf(mon, "]");
+}
+
+void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
+{
+    XiveSource *xsrc = &xive->source;
+    int i;
+
+    monitor_printf(mon, "  LSIN         PQ    EISN     CPU/PRIO EQ\n");
+
+    for (i = 0; i < xive->nr_irqs; i++) {
+        uint8_t pq = xive_source_esb_get(xsrc, i);
+        XiveEAS *eas = &xive->eat[i];
+
+        if (!xive_eas_is_valid(eas)) {
+            continue;
+        }
+
+        monitor_printf(mon, "  %08x %s %c%c%c %s %08x ", i,
+                       xive_source_irq_is_lsi(xsrc, i) ? "LSI" : "MSI",
+                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
+                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
+                       xsrc->status[i] & XIVE_STATUS_ASSERTED ? 'A' : ' ',
+                       xive_eas_is_masked(eas) ? "M" : " ",
+                       (int) GETFIELD_BE64(EAS_END_DATA, eas->w));
+
+        if (!xive_eas_is_masked(eas)) {
+            uint32_t end_idx = GETFIELD_BE64(EAS_END_INDEX, eas->w);
+            XiveEND *end;
+
+            assert(end_idx < xive->nr_ends);
+            end = &xive->endt[end_idx];
+
+            if (xive_end_is_valid(end)) {
+                spapr_xive_end_pic_print_info(xive, end, mon);
+            }
+        }
+        monitor_printf(mon, "\n");
+    }
+}
+
+static void spapr_xive_map_mmio(sPAPRXive *xive)
+{
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->vc_base);
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1, xive->end_base);
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
+}
+
+static void spapr_xive_end_reset(XiveEND *end)
+{
+    memset(end, 0, sizeof(*end));
+
+    /* switch off the escalation and notification ESBs */
+    end->w1 = cpu_to_be32(END_W1_ESe_Q | END_W1_ESn_Q);
+}
+
+static void spapr_xive_reset(void *dev)
+{
+    sPAPRXive *xive = SPAPR_XIVE(dev);
+    int i;
+
+    /*
+     * The XiveSource has its own reset handler, which mask off all
+     * IRQs (!P|Q)
+     */
+
+    /* Mask all valid EASs in the IRQ number space. */
+    for (i = 0; i < xive->nr_irqs; i++) {
+        XiveEAS *eas = &xive->eat[i];
+        if (xive_eas_is_valid(eas)) {
+            eas->w = cpu_to_be64(EAS_VALID | EAS_MASKED);
+        } else {
+            eas->w = 0;
+        }
+    }
+
+    /* Clear all ENDs */
+    for (i = 0; i < xive->nr_ends; i++) {
+        spapr_xive_end_reset(&xive->endt[i]);
+    }
+}
+
+static void spapr_xive_instance_init(Object *obj)
+{
+    sPAPRXive *xive = SPAPR_XIVE(obj);
+
+    object_initialize(&xive->source, sizeof(xive->source), TYPE_XIVE_SOURCE);
+    object_property_add_child(obj, "source", OBJECT(&xive->source), NULL);
+
+    object_initialize(&xive->end_source, sizeof(xive->end_source),
+                      TYPE_XIVE_END_SOURCE);
+    object_property_add_child(obj, "end_source", OBJECT(&xive->end_source),
+                              NULL);
+}
+
+static void spapr_xive_realize(DeviceState *dev, Error **errp)
+{
+    sPAPRXive *xive = SPAPR_XIVE(dev);
+    XiveSource *xsrc = &xive->source;
+    XiveENDSource *end_xsrc = &xive->end_source;
+    Error *local_err = NULL;
+
+    if (!xive->nr_irqs) {
+        error_setg(errp, "Number of interrupt needs to be greater 0");
+        return;
+    }
+
+    if (!xive->nr_ends) {
+        error_setg(errp, "Number of interrupt needs to be greater 0");
+        return;
+    }
+
+    /*
+     * Initialize the internal sources, for IPIs and virtual devices.
+     */
+    object_property_set_int(OBJECT(xsrc), xive->nr_irqs, "nr-irqs",
+                            &error_fatal);
+    object_property_add_const_link(OBJECT(xsrc), "xive", OBJECT(xive),
+                                   &error_fatal);
+    object_property_set_bool(OBJECT(xsrc), true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /*
+     * Initialize the END ESB source
+     */
+    object_property_set_int(OBJECT(end_xsrc), xive->nr_irqs, "nr-ends",
+                            &error_fatal);
+    object_property_add_const_link(OBJECT(end_xsrc), "xive", OBJECT(xive),
+                                   &error_fatal);
+    object_property_set_bool(OBJECT(end_xsrc), true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Set the mapping address of the END ESB pages after the source ESBs */
+    xive->end_base = xive->vc_base + (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
+
+    /*
+     * Allocate the routing tables
+     */
+    xive->eat = g_new0(XiveEAS, xive->nr_irqs);
+    xive->endt = g_new0(XiveEND, xive->nr_ends);
+
+    /* TIMA initialization */
+    memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
+                          "xive.tima", 4ull << TM_SHIFT);
+
+    /* Define all XIVE MMIO regions on SysBus */
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
+
+    /* Map all regions */
+    spapr_xive_map_mmio(xive);
+
+    qemu_register_reset(spapr_xive_reset, dev);
+}
+
+static int spapr_xive_get_eas(XiveRouter *xrtr, uint8_t eas_blk,
+                              uint32_t eas_idx, XiveEAS *eas)
+{
+    sPAPRXive *xive = SPAPR_XIVE(xrtr);
+
+    if (eas_idx >= xive->nr_irqs) {
+        return -1;
+    }
+
+    *eas = xive->eat[eas_idx];
+    return 0;
+}
+
+static int spapr_xive_get_end(XiveRouter *xrtr,
+                              uint8_t end_blk, uint32_t end_idx, XiveEND *end)
+{
+    sPAPRXive *xive = SPAPR_XIVE(xrtr);
+
+    if (end_idx >= xive->nr_ends) {
+        return -1;
+    }
+
+    memcpy(end, &xive->endt[end_idx], sizeof(XiveEND));
+    return 0;
+}
+
+static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk,
+                                uint32_t end_idx, XiveEND *end,
+                                uint8_t word_number)
+{
+    sPAPRXive *xive = SPAPR_XIVE(xrtr);
+
+    if (end_idx >= xive->nr_ends) {
+        return -1;
+    }
+
+    memcpy(&xive->endt[end_idx], end, sizeof(XiveEND));
+    return 0;
+}
+
+static const VMStateDescription vmstate_spapr_xive_end = {
+    .name = TYPE_SPAPR_XIVE "/end",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT32(w0, XiveEND),
+        VMSTATE_UINT32(w1, XiveEND),
+        VMSTATE_UINT32(w2, XiveEND),
+        VMSTATE_UINT32(w3, XiveEND),
+        VMSTATE_UINT32(w4, XiveEND),
+        VMSTATE_UINT32(w5, XiveEND),
+        VMSTATE_UINT32(w6, XiveEND),
+        VMSTATE_UINT32(w7, XiveEND),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_xive_eas = {
+    .name = TYPE_SPAPR_XIVE "/eas",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(w, XiveEAS),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_xive = {
+    .name = TYPE_SPAPR_XIVE,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
+        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(eat, sPAPRXive, nr_irqs,
+                                     vmstate_spapr_xive_eas, XiveEAS),
+        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(endt, sPAPRXive, nr_ends,
+                                             vmstate_spapr_xive_end, XiveEND),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static Property spapr_xive_properties[] = {
+    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
+    DEFINE_PROP_UINT32("nr-ends", sPAPRXive, nr_ends, 0),
+    DEFINE_PROP_UINT64("vc-base", sPAPRXive, vc_base, SPAPR_XIVE_VC_BASE),
+    DEFINE_PROP_UINT64("tm-base", sPAPRXive, tm_base, SPAPR_XIVE_TM_BASE),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void spapr_xive_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    XiveRouterClass *xrc = XIVE_ROUTER_CLASS(klass);
+
+    dc->desc    = "sPAPR XIVE Interrupt Controller";
+    dc->props   = spapr_xive_properties;
+    dc->realize = spapr_xive_realize;
+    dc->vmsd    = &vmstate_spapr_xive;
+
+    xrc->get_eas = spapr_xive_get_eas;
+    xrc->get_end = spapr_xive_get_end;
+    xrc->write_end = spapr_xive_write_end;
+}
+
+static const TypeInfo spapr_xive_info = {
+    .name = TYPE_SPAPR_XIVE,
+    .parent = TYPE_XIVE_ROUTER,
+    .instance_init = spapr_xive_instance_init,
+    .instance_size = sizeof(sPAPRXive),
+    .class_init = spapr_xive_class_init,
+};
+
+static void spapr_xive_register_types(void)
+{
+    type_register_static(&spapr_xive_info);
+}
+
+type_init(spapr_xive_register_types)
+
+bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi)
+{
+    XiveSource *xsrc = &xive->source;
+
+    if (lisn >= xive->nr_irqs) {
+        return false;
+    }
+
+    xive->eat[lisn].w |= cpu_to_be64(EAS_VALID);
+    xive_source_irq_set(xsrc, lisn, lsi);
+    return true;
+}
+
+bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn)
+{
+    XiveSource *xsrc = &xive->source;
+
+    if (lisn >= xive->nr_irqs) {
+        return false;
+    }
+
+    xive->eat[lisn].w &= cpu_to_be64(~EAS_VALID);
+    xive_source_irq_set(xsrc, lisn, false);
+    return true;
+}
+
+qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn)
+{
+    XiveSource *xsrc = &xive->source;
+
+    if (lisn >= xive->nr_irqs) {
+        return NULL;
+    }
+
+    /* The sPAPR machine/device should have claimed the IRQ before */
+    assert(xive_eas_is_valid(&xive->eat[lisn]));
+
+    return xive_source_qirq(xsrc, lisn);
+}
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 72a46ed91c31..301a8e972d91 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -38,6 +38,7 @@ obj-$(CONFIG_XICS) += xics.o
 obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
 obj-$(CONFIG_XIVE) += xive.o
+obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 06/19] spapr/xive: use the VCPU id as a NVT identifier
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (4 preceding siblings ...)
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 05/19] spapr/xive: introduce a XIVE interrupt controller Cédric Le Goater
@ 2018-12-09 19:45 ` Cédric Le Goater
  2018-12-10  4:42   ` David Gibson
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 07/19] spapr: introduce a new machine IRQ backend for XIVE Cédric Le Goater
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The IVPE scans the O/S CAM line of the XIVE thread interrupt contexts
to find a matching Notification Virtual Target (NVT) among the NVTs
dispatched on the HW processor threads.

On a real system, the thread interrupt contexts are updated by the
hypervisor when a Virtual Processor is scheduled to run on a HW
thread. Under QEMU, the model will emulate the same behavior by
hardwiring the NVT identifier in the thread context registers at
reset.

The NVT identifier used by the sPAPRXive model is the VCPU id. The END
identifier is also derived from the VCPU id. A set of helpers doing
the conversion between identifiers are provided for the hcalls
configuring the sources and the ENDs.

The model does not need a NVT table but the XiveRouter NVT operations
are provided to perform some extra checks in the routing algorithm.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

 Changes since v6:

 - simplified the prototypes of helpers
 - introduced an assert in set_nvt() method

 hw/intc/spapr_xive.c | 56 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index eef5830d45c6..3ade419fdbb1 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -26,6 +26,26 @@
 #define SPAPR_XIVE_VC_BASE   0x0006010000000000ull
 #define SPAPR_XIVE_TM_BASE   0x0006030203180000ull
 
+/*
+ * The allocation of VP blocks is a complex operation in OPAL and the
+ * VP identifiers have a relation with the number of HW chips, the
+ * size of the VP blocks, VP grouping, etc. The QEMU sPAPR XIVE
+ * controller model does not have the same constraints and can use a
+ * simple mapping scheme of the CPU vcpu_id
+ *
+ * These identifiers are never returned to the OS.
+ */
+
+#define SPAPR_XIVE_NVT_BASE 0x400
+
+/*
+ * sPAPR NVT and END indexing helpers
+ */
+static uint32_t spapr_xive_nvt_to_target(uint8_t nvt_blk, uint32_t nvt_idx)
+{
+    return nvt_idx - SPAPR_XIVE_NVT_BASE;
+}
+
 /*
  * On sPAPR machines, use a simplified output for the XIVE END
  * structure dumping only the information related to the OS EQ.
@@ -40,7 +60,8 @@ static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end,
     uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
     uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
 
-    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt,
+    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d",
+                   spapr_xive_nvt_to_target(0, nvt),
                    priority, qindex, qentries, qgen);
 
     xive_end_queue_pic_print_info(end, 6, mon);
@@ -246,6 +267,37 @@ static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk,
     return 0;
 }
 
+static int spapr_xive_get_nvt(XiveRouter *xrtr,
+                              uint8_t nvt_blk, uint32_t nvt_idx, XiveNVT *nvt)
+{
+    uint32_t vcpu_id = spapr_xive_nvt_to_target(nvt_blk, nvt_idx);
+    PowerPCCPU *cpu = spapr_find_cpu(vcpu_id);
+
+    if (!cpu) {
+        /* TODO: should we assert() if we can find a NVT ? */
+        return -1;
+    }
+
+    /*
+     * sPAPR does not maintain a NVT table. Return that the NVT is
+     * valid if we have found a matching CPU
+     */
+    nvt->w0 = cpu_to_be32(NVT_W0_VALID);
+    return 0;
+}
+
+static int spapr_xive_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk,
+                                uint32_t nvt_idx, XiveNVT *nvt,
+                                uint8_t word_number)
+{
+    /*
+     * We don't need to write back to the NVTs because the sPAPR
+     * machine should never hit a non-scheduled NVT. It should never
+     * get called.
+     */
+    g_assert_not_reached();
+}
+
 static const VMStateDescription vmstate_spapr_xive_end = {
     .name = TYPE_SPAPR_XIVE "/end",
     .version_id = 1,
@@ -308,6 +360,8 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
     xrc->get_eas = spapr_xive_get_eas;
     xrc->get_end = spapr_xive_get_end;
     xrc->write_end = spapr_xive_write_end;
+    xrc->get_nvt = spapr_xive_get_nvt;
+    xrc->write_nvt = spapr_xive_write_nvt;
 }
 
 static const TypeInfo spapr_xive_info = {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 07/19] spapr: introduce a new machine IRQ backend for XIVE
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (5 preceding siblings ...)
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 06/19] spapr/xive: use the VCPU id as a NVT identifier Cédric Le Goater
@ 2018-12-09 19:45 ` Cédric Le Goater
  2018-12-10  4:45   ` David Gibson
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 08/19] spapr: add hcalls support for the XIVE exploitation interrupt mode Cédric Le Goater
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The XIVE IRQ backend uses the same layout as the new XICS backend but
covers the full range of the IRQ number space. The IRQ numbers for the
CPU IPIs are allocated at the bottom of this space, below 4K, to
preserve compatibility with XICS which does not use that range.

This should be enough given that the maximum number of CPUs is 1024
for the sPAPR machine under QEMU. For the record, the biggest POWER8
or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192
cores, SMT8).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr.h     |   2 +
 include/hw/ppc/spapr_irq.h |   2 +
 hw/ppc/spapr_irq.c         | 113 +++++++++++++++++++++++++++++++++++++
 3 files changed, 117 insertions(+)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 198764066dc9..cb3082d319af 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -16,6 +16,7 @@ typedef struct sPAPREventLogEntry sPAPREventLogEntry;
 typedef struct sPAPREventSource sPAPREventSource;
 typedef struct sPAPRPendingHPT sPAPRPendingHPT;
 typedef struct ICSState ICSState;
+typedef struct sPAPRXive sPAPRXive;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -175,6 +176,7 @@ struct sPAPRMachineState {
     const char *icp_type;
     int32_t irq_map_nr;
     unsigned long *irq_map;
+    sPAPRXive  *xive;
 
     bool cmd_line_caps[SPAPR_CAP_NUM];
     sPAPRCapabilities def, eff, mig;
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index bd7301e6d9c6..23cdb51b879e 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -13,6 +13,7 @@
 /*
  * IRQ range offsets per device type
  */
+#define SPAPR_IRQ_IPI        0x0
 #define SPAPR_IRQ_EPOW       0x1000  /* XICS_IRQ_BASE offset */
 #define SPAPR_IRQ_HOTPLUG    0x1001
 #define SPAPR_IRQ_VIO        0x1100  /* 256 VIO devices */
@@ -42,6 +43,7 @@ typedef struct sPAPRIrq {
 
 extern sPAPRIrq spapr_irq_xics;
 extern sPAPRIrq spapr_irq_xics_legacy;
+extern sPAPRIrq spapr_irq_xive;
 
 void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index f8b651de0ec9..0bf47ff9fa26 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -12,6 +12,7 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_xive.h"
 #include "hw/ppc/xics.h"
 #include "sysemu/kvm.h"
 
@@ -205,6 +206,118 @@ sPAPRIrq spapr_irq_xics = {
     .print_info  = spapr_irq_print_info_xics,
 };
 
+/*
+ * XIVE IRQ backend.
+ */
+static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
+                                    int nr_servers, Error **errp)
+{
+    sPAPRXive *xive;
+    Error *local_err = NULL;
+    Object *obj;
+    uint32_t nr_ends = nr_servers << 3; /* 8 priority ENDs per CPU */
+    int i;
+
+    /* TODO : use qdev_create() ? */
+    obj = object_new(TYPE_SPAPR_XIVE);
+    object_property_set_int(obj, nr_irqs, "nr-irqs", &error_abort);
+    object_property_set_int(obj, nr_ends, "nr-ends", &error_abort);
+    object_property_set_bool(obj, true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+    qdev_set_parent_bus(DEVICE(obj), sysbus_get_default());
+    xive = SPAPR_XIVE(obj);
+
+    /* Enable the CPU IPIs */
+    for (i = 0; i < nr_servers; ++i) {
+        spapr_xive_irq_claim(xive, SPAPR_IRQ_IPI + i, false);
+    }
+
+    return xive;
+}
+
+static void spapr_irq_init_xive(sPAPRMachineState *spapr, Error **errp)
+{
+    MachineState *machine = MACHINE(spapr);
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    int nr_irqs = smc->irq->nr_irqs;
+    Error *local_err = NULL;
+
+    /* KVM XIVE device not yet available */
+    if (kvm_enabled()) {
+        if (machine_kernel_irqchip_required(machine)) {
+            error_setg(errp, "kernel_irqchip requested. no KVM XIVE support");
+            return;
+        }
+    }
+
+    spapr->xive = spapr_xive_create(spapr, nr_irqs,
+                                    spapr_max_server_number(spapr), &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+static int spapr_irq_claim_xive(sPAPRMachineState *spapr, int irq, bool lsi,
+                                Error **errp)
+{
+    if (!spapr_xive_irq_claim(spapr->xive, irq, lsi)) {
+        error_setg(errp, "IRQ %d is invalid", irq);
+        return -1;
+    }
+    return 0;
+}
+
+static void spapr_irq_free_xive(sPAPRMachineState *spapr, int irq, int num)
+{
+    int i;
+
+    for (i = irq; i < irq + num; ++i) {
+        spapr_xive_irq_free(spapr->xive, i);
+    }
+}
+
+static qemu_irq spapr_qirq_xive(sPAPRMachineState *spapr, int irq)
+{
+    return spapr_xive_qirq(spapr->xive, irq);
+}
+
+static void spapr_irq_print_info_xive(sPAPRMachineState *spapr,
+                                      Monitor *mon)
+{
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        xive_tctx_pic_print_info(XIVE_TCTX(cpu->intc), mon);
+    }
+
+    spapr_xive_pic_print_info(spapr->xive, mon);
+}
+
+/*
+ * XIVE uses the full IRQ number space. Set it to 8K to be compatible
+ * with XICS.
+ */
+
+#define SPAPR_IRQ_XIVE_NR_IRQS     0x2000
+#define SPAPR_IRQ_XIVE_NR_MSIS     (SPAPR_IRQ_XIVE_NR_IRQS - SPAPR_IRQ_MSI)
+
+sPAPRIrq spapr_irq_xive = {
+    .nr_irqs     = SPAPR_IRQ_XIVE_NR_IRQS,
+    .nr_msis     = SPAPR_IRQ_XIVE_NR_MSIS,
+
+    .init        = spapr_irq_init_xive,
+    .claim       = spapr_irq_claim_xive,
+    .free        = spapr_irq_free_xive,
+    .qirq        = spapr_qirq_xive,
+    .print_info  = spapr_irq_print_info_xive,
+};
+
 /*
  * sPAPR IRQ frontend routines for devices
  */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 08/19] spapr: add hcalls support for the XIVE exploitation interrupt mode
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (6 preceding siblings ...)
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 07/19] spapr: introduce a new machine IRQ backend for XIVE Cédric Le Goater
@ 2018-12-09 19:45 ` Cédric Le Goater
  2018-12-10  6:34   ` David Gibson
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode Cédric Le Goater
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The different XIVE virtualization structures (sources and event queues)
are configured with a set of Hypervisor calls :

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (ESB) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification configuration associated
   with the queue, only unconditional notification is supported for
   the moment. Reset is performed with a queue size of 0 and queueing
   is disabled in that case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the guest's internal interrupt structures to their
   initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure all notifications
   have reached their queue.

Calls that still need to be addressed :

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE

See the code for more documentation on each hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

 Changes since v6:

 - simplified the prototypes of helpers
 - introduced a fixed value for the controller block id value.
 
 include/hw/ppc/spapr.h      |  15 +-
 include/hw/ppc/spapr_xive.h |   4 +
 hw/intc/spapr_xive.c        | 963 ++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_irq.c          |   2 +
 4 files changed, 983 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index cb3082d319af..6bf028a02fe2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -452,7 +452,20 @@ struct sPAPRMachineState {
 #define H_INVALIDATE_PID        0x378
 #define H_REGISTER_PROC_TBL     0x37C
 #define H_SIGNAL_SYS_RESET      0x380
-#define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
+
+#define H_INT_GET_SOURCE_INFO   0x3A8
+#define H_INT_SET_SOURCE_CONFIG 0x3AC
+#define H_INT_GET_SOURCE_CONFIG 0x3B0
+#define H_INT_GET_QUEUE_INFO    0x3B4
+#define H_INT_SET_QUEUE_CONFIG  0x3B8
+#define H_INT_GET_QUEUE_CONFIG  0x3BC
+#define H_INT_SET_OS_REPORTING_LINE 0x3C0
+#define H_INT_GET_OS_REPORTING_LINE 0x3C4
+#define H_INT_ESB               0x3C8
+#define H_INT_SYNC              0x3CC
+#define H_INT_RESET             0x3D0
+
+#define MAX_HCALL_OPCODE        H_INT_RESET
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index f087959b9924..9506a8f4d10a 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -42,4 +42,8 @@ bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
 qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
 
+typedef struct sPAPRMachineState sPAPRMachineState;
+
+void spapr_xive_hcall_init(sPAPRMachineState *spapr);
+
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 3ade419fdbb1..982ac6e17051 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -38,6 +38,13 @@
 
 #define SPAPR_XIVE_NVT_BASE 0x400
 
+/*
+ * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
+ * to the controller block id value. It can nevertheless be changed
+ * for testing purpose.
+ */
+#define SPAPR_XIVE_BLOCK_ID 0x0
+
 /*
  * sPAPR NVT and END indexing helpers
  */
@@ -46,6 +53,64 @@ static uint32_t spapr_xive_nvt_to_target(uint8_t nvt_blk, uint32_t nvt_idx)
     return nvt_idx - SPAPR_XIVE_NVT_BASE;
 }
 
+static void spapr_xive_cpu_to_nvt(PowerPCCPU *cpu,
+                                  uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
+{
+    assert(cpu);
+
+    if (out_nvt_blk) {
+        *out_nvt_blk = SPAPR_XIVE_BLOCK_ID;
+    }
+
+    if (out_nvt_blk) {
+        *out_nvt_idx = SPAPR_XIVE_NVT_BASE + cpu->vcpu_id;
+    }
+}
+
+static int spapr_xive_target_to_nvt(uint32_t target,
+                                    uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
+{
+    PowerPCCPU *cpu = spapr_find_cpu(target);
+
+    if (!cpu) {
+        return -1;
+    }
+
+    spapr_xive_cpu_to_nvt(cpu, out_nvt_blk, out_nvt_idx);
+    return 0;
+}
+
+/*
+ * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
+ * priorities per CPU
+ */
+static void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
+                                  uint8_t *out_end_blk, uint32_t *out_end_idx)
+{
+    assert(cpu);
+
+    if (out_end_blk) {
+        *out_end_blk = SPAPR_XIVE_BLOCK_ID;
+    }
+
+    if (out_end_idx) {
+        *out_end_idx = (cpu->vcpu_id << 3) + prio;
+    }
+}
+
+static int spapr_xive_target_to_end(uint32_t target, uint8_t prio,
+                                    uint8_t *out_end_blk, uint32_t *out_end_idx)
+{
+    PowerPCCPU *cpu = spapr_find_cpu(target);
+
+    if (!cpu) {
+        return -1;
+    }
+
+    spapr_xive_cpu_to_end(cpu, prio, out_end_blk, out_end_idx);
+    return 0;
+}
+
 /*
  * On sPAPR machines, use a simplified output for the XIVE END
  * structure dumping only the information related to the OS EQ.
@@ -418,3 +483,901 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn)
 
     return xive_source_qirq(xsrc, lisn);
 }
+
+/*
+ * XIVE hcalls
+ *
+ * The terminology used by the XIVE hcalls is the following :
+ *
+ *   TARGET vCPU number
+ *   EQ     Event Queue assigned by OS to receive event data
+ *   ESB    page for source interrupt management
+ *   LISN   Logical Interrupt Source Number identifying a source in the
+ *          machine
+ *   EISN   Effective Interrupt Source Number used by guest OS to
+ *          identify source in the guest
+ *
+ * The EAS, END, NVT structures are not exposed.
+ */
+
+/*
+ * Linux hosts under OPAL reserve priority 7 for their own escalation
+ * interrupts (DD2.X POWER9). So we only allow the guest to use
+ * priorities [0..6].
+ */
+static bool spapr_xive_priority_is_reserved(uint8_t priority)
+{
+    switch (priority) {
+    case 0 ... 6:
+        return false;
+    case 7: /* OPAL escalation queue */
+    default:
+        return true;
+    }
+}
+
+/*
+ * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
+ * real address of the MMIO page through which the Event State Buffer
+ * entry associated with the value of the "lisn" parameter is managed.
+ *
+ * Parameters:
+ * Input
+ * - R4: "flags"
+ *         Bits 0-63 reserved
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as returned
+ *       by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output
+ * - R4: "flags"
+ *         Bits 0-59: Reserved
+ *         Bit 60: H_INT_ESB must be used for Event State Buffer
+ *                 management
+ *         Bit 61: 1 == LSI  0 == MSI
+ *         Bit 62: the full function page supports trigger
+ *         Bit 63: Store EOI Supported
+ * - R5: Logical Real address of full function Event State Buffer
+ *       management page, -1 if H_INT_ESB hcall flag is set to 1.
+ * - R6: Logical Real Address of trigger only Event State Buffer
+ *       management page or -1.
+ * - R7: Power of 2 page size for the ESB management pages returned in
+ *       R5 and R6.
+ */
+
+#define SPAPR_XIVE_SRC_H_INT_ESB     PPC_BIT(60) /* ESB manage with H_INT_ESB */
+#define SPAPR_XIVE_SRC_LSI           PPC_BIT(61) /* Virtual LSI type */
+#define SPAPR_XIVE_SRC_TRIGGER       PPC_BIT(62) /* Trigger and management
+                                                    on same page */
+#define SPAPR_XIVE_SRC_STORE_EOI     PPC_BIT(63) /* Store EOI support */
+
+static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
+                                          sPAPRMachineState *spapr,
+                                          target_ulong opcode,
+                                          target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveSource *xsrc = &xive->source;
+    target_ulong flags  = args[0];
+    target_ulong lisn   = args[1];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    if (!xive_eas_is_valid(&xive->eat[lisn])) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    /* All sources are emulated under the main XIVE object and share
+     * the same characteristics.
+     */
+    args[0] = 0;
+    if (!xive_source_esb_has_2page(xsrc)) {
+        args[0] |= SPAPR_XIVE_SRC_TRIGGER;
+    }
+    if (xsrc->esb_flags & XIVE_SRC_STORE_EOI) {
+        args[0] |= SPAPR_XIVE_SRC_STORE_EOI;
+    }
+
+    /*
+     * Force the use of the H_INT_ESB hcall in case of an LSI
+     * interrupt. This is necessary under KVM to re-trigger the
+     * interrupt if the level is still asserted
+     */
+    if (xive_source_irq_is_lsi(xsrc, lisn)) {
+        args[0] |= SPAPR_XIVE_SRC_H_INT_ESB | SPAPR_XIVE_SRC_LSI;
+    }
+
+    if (!(args[0] & SPAPR_XIVE_SRC_H_INT_ESB)) {
+        args[1] = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn);
+    } else {
+        args[1] = -1;
+    }
+
+    if (xive_source_esb_has_2page(xsrc) &&
+        !(args[0] & SPAPR_XIVE_SRC_H_INT_ESB)) {
+        args[2] = xive->vc_base + xive_source_esb_page(xsrc, lisn);
+    } else {
+        args[2] = -1;
+    }
+
+    if (xive_source_esb_has_2page(xsrc)) {
+        args[3] = xsrc->esb_shift - 1;
+    } else {
+        args[3] = xsrc->esb_shift;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
+ * Interrupt Source to a target. The Logical Interrupt Source is
+ * designated with the "lisn" parameter and the target is designated
+ * with the "target" and "priority" parameters.  Upon return from the
+ * hcall(), no additional interrupts will be directed to the old EQ.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-61: Reserved
+ *         Bit 62: set the "eisn" in the EAS
+ *         Bit 63: masks the interrupt source in the hardware interrupt
+ *       control structure. An interrupt masked by this mechanism will
+ *       be dropped, but it's source state bits will still be
+ *       set. There is no race-free way of unmasking and restoring the
+ *       source. Thus this should only be used in interrupts that are
+ *       also masked at the source, and only in cases where the
+ *       interrupt is not meant to be used for a large amount of time
+ *       because no valid target exists for it for example
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as returned by
+ *       the H_ALLOCATE_VAS_WINDOW hcall
+ * - R6: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R7: "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ * - R8: "eisn" is the guest EISN associated with the "lisn"
+ *
+ * Output:
+ * - None
+ */
+
+#define SPAPR_XIVE_SRC_SET_EISN PPC_BIT(62)
+#define SPAPR_XIVE_SRC_MASK     PPC_BIT(63)
+
+static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveEAS eas, new_eas;
+    target_ulong flags    = args[0];
+    target_ulong lisn     = args[1];
+    target_ulong target   = args[2];
+    target_ulong priority = args[3];
+    target_ulong eisn     = args[4];
+    uint8_t end_blk;
+    uint32_t end_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~(SPAPR_XIVE_SRC_SET_EISN | SPAPR_XIVE_SRC_MASK)) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    eas = xive->eat[lisn];
+    if (!xive_eas_is_valid(&eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    /* priority 0xff is used to reset the EAS */
+    if (priority == 0xff) {
+        new_eas.w = cpu_to_be64(EAS_VALID | EAS_MASKED);
+        goto out;
+    }
+
+    if (flags & SPAPR_XIVE_SRC_MASK) {
+        new_eas.w = eas.w | cpu_to_be64(EAS_MASKED);
+    } else {
+        new_eas.w = eas.w & cpu_to_be64(~EAS_MASKED);
+    }
+
+    if (spapr_xive_priority_is_reserved(priority)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
+                      priority);
+        return H_P4;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the END corresponding to the
+     * target.
+     */
+    if (spapr_xive_target_to_end(target, priority, &end_blk, &end_idx)) {
+        return H_P3;
+    }
+
+    new_eas.w = SETFIELD_BE64(EAS_END_BLOCK, new_eas.w, end_blk);
+    new_eas.w = SETFIELD_BE64(EAS_END_INDEX, new_eas.w, end_idx);
+
+    if (flags & SPAPR_XIVE_SRC_SET_EISN) {
+        new_eas.w = SETFIELD_BE64(EAS_END_DATA, new_eas.w, eisn);
+    }
+
+out:
+    xive->eat[lisn] = new_eas;
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine to which
+ * target/priority pair is assigned to the specified Logical Interrupt
+ * Source.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63 Reserved
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as
+ *       returned by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output:
+ * - R4: Target to which the specified Logical Interrupt Source is
+ *       assigned
+ * - R5: Priority to which the specified Logical Interrupt Source is
+ *       assigned
+ * - R6: EISN for the specified Logical Interrupt Source (this will be
+ *       equivalent to the LISN if not changed by H_INT_SET_SOURCE_CONFIG)
+ */
+static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    target_ulong flags = args[0];
+    target_ulong lisn = args[1];
+    XiveEAS eas;
+    XiveEND *end;
+    uint8_t nvt_blk;
+    uint32_t end_idx, nvt_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    eas = xive->eat[lisn];
+    if (!xive_eas_is_valid(&eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    /* EAS_END_BLOCK is unused on sPAPR */
+    end_idx = GETFIELD_BE64(EAS_END_INDEX, eas.w);
+
+    assert(end_idx < xive->nr_ends);
+    end = &xive->endt[end_idx];
+
+    nvt_blk = GETFIELD_BE32(END_W6_NVT_BLOCK, end->w6);
+    nvt_idx = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
+    args[0] = spapr_xive_nvt_to_target(nvt_blk, nvt_idx);
+
+    if (xive_eas_is_masked(&eas)) {
+        args[1] = 0xff;
+    } else {
+        args[1] = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
+    }
+
+    args[2] = GETFIELD_BE64(EAS_END_DATA, eas.w);
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_INFO hcall() is used to get the logical real
+ * address of the notification management page associated with the
+ * specified target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63 Reserved
+ * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R6: "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: Logical real address of notification page
+ * - R5: Power of 2 page size of the notification page
+ */
+static target_ulong h_int_get_queue_info(PowerPCCPU *cpu,
+                                         sPAPRMachineState *spapr,
+                                         target_ulong opcode,
+                                         target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveENDSource *end_xsrc = &xive->end_source;
+    target_ulong flags = args[0];
+    target_ulong target = args[1];
+    target_ulong priority = args[2];
+    XiveEND *end;
+    uint8_t end_blk;
+    uint32_t end_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (spapr_xive_priority_is_reserved(priority)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
+                      priority);
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the END corresponding to the
+     * target.
+     */
+    if (spapr_xive_target_to_end(target, priority, &end_blk, &end_idx)) {
+        return H_P2;
+    }
+
+    assert(end_idx < xive->nr_ends);
+    end = &xive->endt[end_idx];
+
+    args[0] = xive->end_base + (1ull << (end_xsrc->esb_shift + 1)) * end_idx;
+    if (xive_end_is_enqueue(end)) {
+        args[1] = GETFIELD_BE32(END_W0_QSIZE, end->w0) + 12;
+    } else {
+        args[1] = 0;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_QUEUE_CONFIG hcall() is used to set or reset a EQ for
+ * a given "target" and "priority".  It is also used to set the
+ * notification config associated with the EQ.  An EQ size of 0 is
+ * used to reset the EQ config for a given target and priority. If
+ * resetting the EQ config, the END associated with the given "target"
+ * and "priority" will be changed to disable queueing.
+ *
+ * Upon return from the hcall(), no additional interrupts will be
+ * directed to the old EQ (if one was set). The old EQ (if one was
+ * set) should be investigated for interrupts that occurred prior to
+ * or during the hcall().
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-62: Reserved
+ *         Bit 63: Unconditional Notify (n) per the XIVE spec
+ * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R6: "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ * - R7: "eventQueue": The logical real address of the start of the EQ
+ * - R8: "eventQueueSize": The power of 2 EQ size per "ibm,xive-eq-sizes"
+ *
+ * Output:
+ * - None
+ */
+
+#define SPAPR_XIVE_END_ALWAYS_NOTIFY PPC_BIT(63)
+
+static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    target_ulong flags = args[0];
+    target_ulong target = args[1];
+    target_ulong priority = args[2];
+    target_ulong qpage = args[3];
+    target_ulong qsize = args[4];
+    XiveEND end;
+    uint8_t end_blk, nvt_blk;
+    uint32_t end_idx, nvt_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~SPAPR_XIVE_END_ALWAYS_NOTIFY) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (spapr_xive_priority_is_reserved(priority)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
+                      priority);
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the END corresponding to the
+     * target.
+     */
+
+    if (spapr_xive_target_to_end(target, priority, &end_blk, &end_idx)) {
+        return H_P2;
+    }
+
+    assert(end_idx < xive->nr_ends);
+    memcpy(&end, &xive->endt[end_idx], sizeof(XiveEND));
+
+    switch (qsize) {
+    case 12:
+    case 16:
+    case 21:
+    case 24:
+        end.w2 = cpu_to_be32((qpage >> 32) & 0x0fffffff);
+        end.w3 = cpu_to_be32(qpage & 0xffffffff);
+        end.w0 |= cpu_to_be32(END_W0_ENQUEUE);
+        end.w0 = SETFIELD_BE32(END_W0_QSIZE, end.w0, qsize - 12);
+        break;
+    case 0:
+        /* reset queue and disable queueing */
+        spapr_xive_end_reset(&end);
+        goto out;
+
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid EQ size %"PRIx64"\n",
+                      qsize);
+        return H_P5;
+    }
+
+    if (qsize) {
+        hwaddr plen = 1 << qsize;
+        void *eq;
+
+        /*
+         * Validate the guest EQ. We should also check that the queue
+         * has been zeroed by the OS.
+         */
+        eq = address_space_map(CPU(cpu)->as, qpage, &plen, true,
+                               MEMTXATTRS_UNSPECIFIED);
+        if (plen != 1 << qsize) {
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to map EQ @0x%"
+                          HWADDR_PRIx "\n", qpage);
+            return H_P4;
+        }
+        address_space_unmap(CPU(cpu)->as, eq, plen, true, plen);
+    }
+
+    /* "target" should have been validated above */
+    if (spapr_xive_target_to_nvt(target, &nvt_blk, &nvt_idx)) {
+        g_assert_not_reached();
+    }
+
+    /* Ensure the priority and target are correctly set (they will not
+     * be right after allocation)
+     */
+    end.w6 = SETFIELD_BE32(END_W6_NVT_BLOCK, 0ul, nvt_blk) |
+        SETFIELD_BE32(END_W6_NVT_INDEX, 0ul, nvt_idx);
+    end.w7 = SETFIELD_BE32(END_W7_F0_PRIORITY, 0ul, priority);
+
+    if (flags & SPAPR_XIVE_END_ALWAYS_NOTIFY) {
+        end.w0 |= cpu_to_be32(END_W0_UCOND_NOTIFY);
+    } else {
+        end.w0 &= cpu_to_be32((uint32_t)~END_W0_UCOND_NOTIFY);
+    }
+
+    /* The generation bit for the END starts at 1 and The END page
+     * offset counter starts at 0.
+     */
+    end.w1 = cpu_to_be32(END_W1_GENERATION) |
+        SETFIELD_BE32(END_W1_PAGE_OFF, 0ul, 0ul);
+    end.w0 |= cpu_to_be32(END_W0_VALID);
+
+    /* TODO: issue syncs required to ensure all in-flight interrupts
+     * are complete on the old END */
+
+out:
+    /* Update END */
+    memcpy(&xive->endt[end_idx], &end, sizeof(XiveEND));
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_CONFIG hcall() is used to get a EQ for a given
+ * target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-62: Reserved
+ *         Bit 63: Debug: Return debug data
+ * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R6: "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: "flags":
+ *       Bits 0-61: Reserved
+ *       Bit 62: The value of Event Queue Generation Number (g) per
+ *              the XIVE spec if "Debug" = 1
+ *       Bit 63: The value of Unconditional Notify (n) per the XIVE spec
+ * - R5: The logical real address of the start of the EQ
+ * - R6: The power of 2 EQ size per "ibm,xive-eq-sizes"
+ * - R7: The value of Event Queue Offset Counter per XIVE spec
+ *       if "Debug" = 1, else 0
+ *
+ */
+
+#define SPAPR_XIVE_END_DEBUG     PPC_BIT(63)
+
+static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    target_ulong flags = args[0];
+    target_ulong target = args[1];
+    target_ulong priority = args[2];
+    XiveEND *end;
+    uint8_t end_blk;
+    uint32_t end_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~SPAPR_XIVE_END_DEBUG) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (spapr_xive_priority_is_reserved(priority)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
+                      priority);
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the END corresponding to the
+     * target.
+     */
+    if (spapr_xive_target_to_end(target, priority, &end_blk, &end_idx)) {
+        return H_P2;
+    }
+
+    assert(end_idx < xive->nr_ends);
+    end = &xive->endt[end_idx];
+
+    args[0] = 0;
+    if (xive_end_is_notify(end)) {
+        args[0] |= SPAPR_XIVE_END_ALWAYS_NOTIFY;
+    }
+
+    if (xive_end_is_enqueue(end)) {
+        args[1] = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32
+            | be32_to_cpu(end->w3);
+        args[2] = GETFIELD_BE32(END_W0_QSIZE, end->w0) + 12;
+    } else {
+        args[1] = 0;
+        args[2] = 0;
+    }
+
+    /* TODO: do we need any locking on the END ? */
+    if (flags & SPAPR_XIVE_END_DEBUG) {
+        /* Load the event queue generation number into the return flags */
+        args[0] |= (uint64_t)GETFIELD_BE32(END_W1_GENERATION, end->w1) << 62;
+
+        /* Load R7 with the event queue offset counter */
+        args[3] = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
+    } else {
+        args[3] = 0;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_OS_REPORTING_LINE hcall() is used to set the
+ * reporting cache line pair for the calling thread.  The reporting
+ * cache lines will contain the OS interrupt context when the OS
+ * issues a CI store byte to @TIMA+0xC10 to acknowledge the OS
+ * interrupt. The reporting cache lines can be reset by inputting -1
+ * in "reportingLine".  Issuing the CI store byte without reporting
+ * cache lines registered will result in the data not being accessible
+ * to the OS.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63: Reserved
+ * - R5: "reportingLine": The logical real address of the reporting cache
+ *       line pair
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_set_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* TODO: H_INT_SET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_GET_OS_REPORTING_LINE hcall() is used to get the logical
+ * real address of the reporting cache line pair set for the input
+ * "target".  If no reporting cache line pair has been set, -1 is
+ * returned.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63: Reserved
+ * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R6: "reportingLine": The logical real address of the reporting
+ *        cache line pair
+ *
+ * Output:
+ * - R4: The logical real address of the reporting line if set, else -1
+ */
+static target_ulong h_int_get_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* TODO: H_INT_GET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_ESB hcall() is used to issue a load or store to the ESB
+ * page for the input "lisn".  This hcall is only supported for LISNs
+ * that have the ESB hcall flag set to 1 when returned from hcall()
+ * H_INT_GET_SOURCE_INFO.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-62: Reserved
+ *         bit 63: Store: Store=1, store operation, else load operation
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as
+ *       returned by the H_ALLOCATE_VAS_WINDOW hcall
+ * - R6: "esbOffset" is the offset into the ESB page for the load or
+ *       store operation
+ * - R7: "storeData" is the data to write for a store operation
+ *
+ * Output:
+ * - R4: The value of the load if load operation, else -1
+ */
+
+#define SPAPR_XIVE_ESB_STORE PPC_BIT(63)
+
+static target_ulong h_int_esb(PowerPCCPU *cpu,
+                              sPAPRMachineState *spapr,
+                              target_ulong opcode,
+                              target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveEAS eas;
+    target_ulong flags  = args[0];
+    target_ulong lisn   = args[1];
+    target_ulong offset = args[2];
+    target_ulong data   = args[3];
+    hwaddr mmio_addr;
+    XiveSource *xsrc = &xive->source;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~SPAPR_XIVE_ESB_STORE) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    eas = xive->eat[lisn];
+    if (!xive_eas_is_valid(&eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    if (offset > (1ull << xsrc->esb_shift)) {
+        return H_P3;
+    }
+
+    mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
+
+    if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
+                      (flags & SPAPR_XIVE_ESB_STORE))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
+                      HWADDR_PRIx "\n", mmio_addr);
+        return H_HARDWARE;
+    }
+    args[0] = (flags & SPAPR_XIVE_ESB_STORE) ? -1 : data;
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SYNC hcall() is used to issue hardware syncs that will
+ * ensure any in flight events for the input lisn are in the event
+ * queue.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63: Reserved
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as
+ *       returned by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_sync(PowerPCCPU *cpu,
+                               sPAPRMachineState *spapr,
+                               target_ulong opcode,
+                               target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveEAS eas;
+    target_ulong flags = args[0];
+    target_ulong lisn = args[1];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    eas = xive->eat[lisn];
+    if (!xive_eas_is_valid(&eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* This is not real hardware. Nothing to be done */
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_RESET hcall() is used to reset all of the partition's
+ * interrupt exploitation structures to their initial state.  This
+ * means losing all previously set interrupt state set via
+ * H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63: Reserved
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_reset(PowerPCCPU *cpu,
+                                sPAPRMachineState *spapr,
+                                target_ulong opcode,
+                                target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    target_ulong flags   = args[0];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    device_reset(DEVICE(xive));
+    return H_SUCCESS;
+}
+
+void spapr_xive_hcall_init(sPAPRMachineState *spapr)
+{
+    spapr_register_hypercall(H_INT_GET_SOURCE_INFO, h_int_get_source_info);
+    spapr_register_hypercall(H_INT_SET_SOURCE_CONFIG, h_int_set_source_config);
+    spapr_register_hypercall(H_INT_GET_SOURCE_CONFIG, h_int_get_source_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_INFO, h_int_get_queue_info);
+    spapr_register_hypercall(H_INT_SET_QUEUE_CONFIG, h_int_set_queue_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_CONFIG, h_int_get_queue_config);
+    spapr_register_hypercall(H_INT_SET_OS_REPORTING_LINE,
+                             h_int_set_os_reporting_line);
+    spapr_register_hypercall(H_INT_GET_OS_REPORTING_LINE,
+                             h_int_get_os_reporting_line);
+    spapr_register_hypercall(H_INT_ESB, h_int_esb);
+    spapr_register_hypercall(H_INT_SYNC, h_int_sync);
+    spapr_register_hypercall(H_INT_RESET, h_int_reset);
+}
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 0bf47ff9fa26..d6768d0936f9 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -259,6 +259,8 @@ static void spapr_irq_init_xive(sPAPRMachineState *spapr, Error **errp)
         error_propagate(errp, local_err);
         return;
     }
+
+    spapr_xive_hcall_init(spapr);
 }
 
 static int spapr_irq_claim_xive(sPAPRMachineState *spapr, int irq, bool lsi,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (7 preceding siblings ...)
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 08/19] spapr: add hcalls support for the XIVE exploitation interrupt mode Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-10  6:39   ` David Gibson
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 10/19] spapr: allocate the interrupt thread context under the CPU core Cédric Le Goater
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The XIVE interface for the guest is described in the device tree under
the "interrupt-controller" node. A couple of new properties are
specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   managnement areas (TIMA), for the User level and for the Guest OS
   level. Only the Guest OS level is taken into account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the IRQ interrupt number ranges assigned to the guest for the IPIs.

and also under the root node :

 - "ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use. OPAL uses the priority 7 queue to automatically
   escalate interrupts for all other queues (DD2.X POWER9). So only
   priorities [0..6] are allowed for the guest.

Extend the sPAPR IRQ backend with a new handler to populate the DT
with the appropriate "interrupt-controller" node.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h  |  2 ++
 include/hw/ppc/spapr_xive.h |  2 ++
 include/hw/ppc/xics.h       |  4 +--
 hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
 hw/intc/xics_spapr.c        |  3 +-
 hw/ppc/spapr.c              |  3 +-
 hw/ppc/spapr_irq.c          |  3 ++
 7 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 23cdb51b879e..e51e9f052f63 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
     void (*free)(sPAPRMachineState *spapr, int irq, int num);
     qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
     void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
+    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
+                        void *fdt, uint32_t phandle);
 } sPAPRIrq;
 
 extern sPAPRIrq spapr_irq_xics;
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 9506a8f4d10a..728a5e8dc163 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
 typedef struct sPAPRMachineState sPAPRMachineState;
 
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
+void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
+                   uint32_t phandle);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 9958443d1984..14afda198cdb 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
     ICPState *(*icp_get)(XICSFabric *xi, int server);
 } XICSFabricClass;
 
-void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
-
 ICPState *xics_icp_get(XICSFabric *xi, int server);
 
 /* Internal XICS interfaces */
@@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
 
 typedef struct sPAPRMachineState sPAPRMachineState;
 
+void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
+                   uint32_t phandle);
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
 void xics_spapr_init(sPAPRMachineState *spapr);
 
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 982ac6e17051..a6d854b07690 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -14,6 +14,7 @@
 #include "target/ppc/cpu.h"
 #include "sysemu/cpus.h"
 #include "monitor/monitor.h"
+#include "hw/ppc/fdt.h"
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_xive.h"
 #include "hw/ppc/xive.h"
@@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_INT_SYNC, h_int_sync);
     spapr_register_hypercall(H_INT_RESET, h_int_reset);
 }
+
+void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
+                   uint32_t phandle)
+{
+    sPAPRXive *xive = spapr->xive;
+    int node;
+    uint64_t timas[2 * 2];
+    /* Interrupt number ranges for the IPIs */
+    uint32_t lisn_ranges[] = {
+        cpu_to_be32(0),
+        cpu_to_be32(nr_servers),
+    };
+    uint32_t eq_sizes[] = {
+        cpu_to_be32(12), /* 4K */
+        cpu_to_be32(16), /* 64K */
+        cpu_to_be32(21), /* 2M */
+        cpu_to_be32(24), /* 16M */
+    };
+    /* The following array is in sync with the reserved priorities
+     * defined by the 'spapr_xive_priority_is_reserved' routine.
+     */
+    uint32_t plat_res_int_priorities[] = {
+        cpu_to_be32(7),    /* start */
+        cpu_to_be32(0xf8), /* count */
+    };
+    gchar *nodename;
+
+    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
+    timas[0] = cpu_to_be64(xive->tm_base +
+                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
+    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
+    timas[2] = cpu_to_be64(xive->tm_base +
+                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
+    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
+
+    nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
+                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
+    _FDT(node = fdt_add_subnode(fdt, 0, nodename));
+    g_free(nodename);
+
+    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
+
+    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
+                     sizeof(eq_sizes)));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
+                     sizeof(lisn_ranges)));
+
+    /* For Linux to link the LSIs to the interrupt controller. */
+    _FDT(fdt_setprop(fdt, node, "interrupt-controller", NULL, 0));
+    _FDT(fdt_setprop_cell(fdt, node, "#interrupt-cells", 2));
+
+    /* For SLOF */
+    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
+    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
+
+    /* The "ibm,plat-res-int-priorities" property defines the priority
+     * ranges reserved by the hypervisor
+     */
+    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
+                     plat_res_int_priorities, sizeof(plat_res_int_priorities)));
+}
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index 2e27b92b871a..f67d3c80bf3a 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -244,7 +244,8 @@ void xics_spapr_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_IPOLL, h_ipoll);
 }
 
-void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle)
+void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
+                   uint32_t phandle)
 {
     uint32_t interrupt_server_ranges_prop[] = {
         0, cpu_to_be32(nr_servers),
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3f9fc73f7f59..8ff22cdb79d8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1268,7 +1268,8 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
     /* /interrupt controller */
-    spapr_dt_xics(spapr_max_server_number(spapr), fdt, PHANDLE_XICP);
+    smc->irq->dt_populate(spapr, spapr_max_server_number(spapr), fdt,
+                          PHANDLE_XICP);
 
     ret = spapr_populate_memory(spapr, fdt);
     if (ret < 0) {
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index d6768d0936f9..38ea2da7a094 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -204,6 +204,7 @@ sPAPRIrq spapr_irq_xics = {
     .free        = spapr_irq_free_xics,
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
+    .dt_populate = spapr_dt_xics,
 };
 
 /*
@@ -318,6 +319,7 @@ sPAPRIrq spapr_irq_xive = {
     .free        = spapr_irq_free_xive,
     .qirq        = spapr_qirq_xive,
     .print_info  = spapr_irq_print_info_xive,
+    .dt_populate = spapr_dt_xive,
 };
 
 /*
@@ -422,4 +424,5 @@ sPAPRIrq spapr_irq_xics_legacy = {
     .free        = spapr_irq_free_xics,
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
+    .dt_populate = spapr_dt_xics,
 };
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 10/19] spapr: allocate the interrupt thread context under the CPU core
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (8 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 11/19] spapr: extend the sPAPR IRQ backend for XICS migration Cédric Le Goater
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Each interrupt mode has its own specific interrupt presenter object,
that we store under the CPU object, one for XICS and one for XIVE.

Extend the sPAPR IRQ backend with a new handler to support them both.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---

 Changes since v6:

 - removed the hardwiring the HW CAM line. Back to v5 state.

 include/hw/ppc/spapr_irq.h |  2 ++
 include/hw/ppc/xive.h      |  1 +
 hw/intc/xive.c             | 22 ++++++++++++++++++++++
 hw/ppc/spapr_cpu_core.c    |  5 ++---
 hw/ppc/spapr_irq.c         | 15 +++++++++++++++
 5 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index e51e9f052f63..13db0428ab51 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -41,6 +41,8 @@ typedef struct sPAPRIrq {
     void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
     void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
                         void *fdt, uint32_t phandle);
+    Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
+                               Error **errp);
 } sPAPRIrq;
 
 extern sPAPRIrq spapr_irq_xics;
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 19309d1d65d1..18cd114eb244 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -419,6 +419,7 @@ typedef struct XiveTCTX {
 extern const MemoryRegionOps xive_tm_ops;
 
 void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
+Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp);
 
 static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
 {
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index ea5385ff7784..53d2f191e8a3 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -526,6 +526,28 @@ static const TypeInfo xive_tctx_info = {
     .class_init    = xive_tctx_class_init,
 };
 
+Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
+{
+    Error *local_err = NULL;
+    Object *obj;
+
+    obj = object_new(TYPE_XIVE_TCTX);
+    object_property_add_child(cpu, TYPE_XIVE_TCTX, obj, &error_abort);
+    object_unref(obj);
+    object_property_add_const_link(obj, "cpu", cpu, &error_abort);
+    object_property_set_bool(obj, true, "realized", &local_err);
+    if (local_err) {
+        goto error;
+    }
+
+    return obj;
+
+error:
+    object_unparent(obj);
+    error_propagate(errp, local_err);
+    return NULL;
+}
+
 /*
  * XIVE ESB helpers
  */
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 2398ce62c0e7..1811cd48db90 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -11,7 +11,6 @@
 #include "hw/ppc/spapr_cpu_core.h"
 #include "target/ppc/cpu.h"
 #include "hw/ppc/spapr.h"
-#include "hw/ppc/xics.h" /* for icp_create() - to be removed */
 #include "hw/boards.h"
 #include "qapi/error.h"
 #include "sysemu/cpus.h"
@@ -215,6 +214,7 @@ static void spapr_cpu_core_unrealize(DeviceState *dev, Error **errp)
 static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                sPAPRCPUCore *sc, Error **errp)
 {
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
     CPUPPCState *env = &cpu->env;
     CPUState *cs = CPU(cpu);
     Error *local_err = NULL;
@@ -233,8 +233,7 @@ static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     qemu_register_reset(spapr_cpu_reset, cpu);
     spapr_cpu_reset(cpu);
 
-    cpu->intc = icp_create(OBJECT(cpu), spapr->icp_type, XICS_FABRIC(spapr),
-                           &local_err);
+    cpu->intc = smc->irq->cpu_intc_create(spapr, OBJECT(cpu), &local_err);
     if (local_err) {
         goto error_unregister;
     }
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 38ea2da7a094..5efe33826967 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -191,6 +191,12 @@ static void spapr_irq_print_info_xics(sPAPRMachineState *spapr, Monitor *mon)
     ics_pic_print_info(spapr->ics, mon);
 }
 
+static Object *spapr_irq_cpu_intc_create_xics(sPAPRMachineState *spapr,
+                                              Object *cpu, Error **errp)
+{
+    return icp_create(cpu, spapr->icp_type, XICS_FABRIC(spapr), errp);
+}
+
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
 #define SPAPR_IRQ_XICS_NR_MSIS     \
     (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
@@ -205,6 +211,7 @@ sPAPRIrq spapr_irq_xics = {
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
+    .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
 };
 
 /*
@@ -302,6 +309,12 @@ static void spapr_irq_print_info_xive(sPAPRMachineState *spapr,
     spapr_xive_pic_print_info(spapr->xive, mon);
 }
 
+static Object *spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
+                                              Object *cpu, Error **errp)
+{
+    return xive_tctx_create(cpu, XIVE_ROUTER(spapr->xive), errp);
+}
+
 /*
  * XIVE uses the full IRQ number space. Set it to 8K to be compatible
  * with XICS.
@@ -320,6 +333,7 @@ sPAPRIrq spapr_irq_xive = {
     .qirq        = spapr_qirq_xive,
     .print_info  = spapr_irq_print_info_xive,
     .dt_populate = spapr_dt_xive,
+    .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
 };
 
 /*
@@ -425,4 +439,5 @@ sPAPRIrq spapr_irq_xics_legacy = {
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
+    .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
 };
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 11/19] spapr: extend the sPAPR IRQ backend for XICS migration
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (9 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 10/19] spapr: allocate the interrupt thread context under the CPU core Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 12/19] spapr: add a 'reset' method to the sPAPR IRQ backend Cédric Le Goater
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Introduce a new sPAPR IRQ handler to handle resend after migration
when the machine is using a KVM XICS interrupt controller model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/hw/ppc/spapr_irq.h |  2 ++
 hw/ppc/spapr.c             | 13 +++++--------
 hw/ppc/spapr_irq.c         | 27 +++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 13db0428ab51..84a25ffb6c65 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -43,6 +43,7 @@ typedef struct sPAPRIrq {
                         void *fdt, uint32_t phandle);
     Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
                                Error **errp);
+    int (*post_load)(sPAPRMachineState *spapr, int version_id);
 } sPAPRIrq;
 
 extern sPAPRIrq spapr_irq_xics;
@@ -53,6 +54,7 @@ void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
 void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
 qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
+int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id);
 
 /*
  * XICS legacy routines
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8ff22cdb79d8..8cea4cad1732 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1730,14 +1730,6 @@ static int spapr_post_load(void *opaque, int version_id)
         return err;
     }
 
-    if (!object_dynamic_cast(OBJECT(spapr->ics), TYPE_ICS_KVM)) {
-        CPUState *cs;
-        CPU_FOREACH(cs) {
-            PowerPCCPU *cpu = POWERPC_CPU(cs);
-            icp_resend(ICP(cpu->intc));
-        }
-    }
-
     /* In earlier versions, there was no separate qdev for the PAPR
      * RTC, so the RTC offset was stored directly in sPAPREnvironment.
      * So when migrating from those versions, poke the incoming offset
@@ -1758,6 +1750,11 @@ static int spapr_post_load(void *opaque, int version_id)
         }
     }
 
+    err = spapr_irq_post_load(spapr, version_id);
+    if (err) {
+        return err;
+    }
+
     return err;
 }
 
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 5efe33826967..35a067cad3f8 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -197,6 +197,18 @@ static Object *spapr_irq_cpu_intc_create_xics(sPAPRMachineState *spapr,
     return icp_create(cpu, spapr->icp_type, XICS_FABRIC(spapr), errp);
 }
 
+static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
+{
+    if (!object_dynamic_cast(OBJECT(spapr->ics), TYPE_ICS_KVM)) {
+        CPUState *cs;
+        CPU_FOREACH(cs) {
+            PowerPCCPU *cpu = POWERPC_CPU(cs);
+            icp_resend(ICP(cpu->intc));
+        }
+    }
+    return 0;
+}
+
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
 #define SPAPR_IRQ_XICS_NR_MSIS     \
     (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
@@ -212,6 +224,7 @@ sPAPRIrq spapr_irq_xics = {
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
+    .post_load   = spapr_irq_post_load_xics,
 };
 
 /*
@@ -315,6 +328,11 @@ static Object *spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
     return xive_tctx_create(cpu, XIVE_ROUTER(spapr->xive), errp);
 }
 
+static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
+{
+    return 0;
+}
+
 /*
  * XIVE uses the full IRQ number space. Set it to 8K to be compatible
  * with XICS.
@@ -334,6 +352,7 @@ sPAPRIrq spapr_irq_xive = {
     .print_info  = spapr_irq_print_info_xive,
     .dt_populate = spapr_dt_xive,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
+    .post_load   = spapr_irq_post_load_xive,
 };
 
 /*
@@ -372,6 +391,13 @@ qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq)
     return smc->irq->qirq(spapr, irq);
 }
 
+int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+
+    return smc->irq->post_load(spapr, version_id);
+}
+
 /*
  * XICS legacy routines - to deprecate one day
  */
@@ -440,4 +466,5 @@ sPAPRIrq spapr_irq_xics_legacy = {
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
+    .post_load   = spapr_irq_post_load_xics,
 };
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 12/19] spapr: add a 'reset' method to the sPAPR IRQ backend
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (10 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 11/19] spapr: extend the sPAPR IRQ backend for XICS migration Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-10  6:42   ` David Gibson
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 13/19] spapr: add an extra OV5 field " Cédric Le Goater
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

For the time being, the XIVE reset handler updates the OS CAM line of
the vCPU as it is done under a real hypervisor when a vCPU is
scheduled to run on a HW thread.

This handler will become even more useful when we introduce the
machine supporting both interrupt modes, XIVE and XICS. In this
machine, the interrupt mode is chosen by the CAS negotiation process
and activated after a reset.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h  |  2 ++
 include/hw/ppc/spapr_xive.h |  1 +
 hw/intc/spapr_xive.c        | 24 ++++++++++++++++++++++++
 hw/ppc/spapr.c              |  5 +++++
 hw/ppc/spapr_irq.c          | 24 ++++++++++++++++++++++++
 5 files changed, 56 insertions(+)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 84a25ffb6c65..63061a009b4c 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -44,6 +44,7 @@ typedef struct sPAPRIrq {
     Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
                                Error **errp);
     int (*post_load)(sPAPRMachineState *spapr, int version_id);
+    void (*reset)(sPAPRMachineState *spapr, Error **errp);
 } sPAPRIrq;
 
 extern sPAPRIrq spapr_irq_xics;
@@ -55,6 +56,7 @@ int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
 void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
 qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
 int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id);
+void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp);
 
 /*
  * XICS legacy routines
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 728a5e8dc163..7244a6231ce6 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -47,5 +47,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
 void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle);
+void spapr_xive_reset_tctx(sPAPRXive *xive);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index a6d854b07690..560d8d031f74 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -179,6 +179,30 @@ static void spapr_xive_map_mmio(sPAPRXive *xive)
     sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
 }
 
+/*
+ * When a Virtual Processor is scheduled to run on a HW thread, the
+ * hypervisor pushes its identifier in the OS CAM line. Emulate the
+ * same behavior under QEMU.
+ */
+void spapr_xive_reset_tctx(sPAPRXive *xive)
+{
+    CPUState *cs;
+    uint8_t  nvt_blk;
+    uint32_t nvt_idx;
+    uint32_t nvt_cam;
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
+
+        spapr_xive_cpu_to_nvt(cpu, &nvt_blk, &nvt_idx);
+
+        nvt_cam = cpu_to_be32(TM_QW1W2_VO |
+                              xive_nvt_cam_line(nvt_blk, nvt_idx));
+        memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &nvt_cam, 4);
+    }
+}
+
 static void spapr_xive_end_reset(XiveEND *end)
 {
     memset(end, 0, sizeof(*end));
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8cea4cad1732..98d69f09e080 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1619,6 +1619,11 @@ static void spapr_machine_reset(void)
 
     qemu_devices_reset();
 
+    /* This is fixing some of the default configuration of the XIVE
+     * devices. To be called after the reset of the machine devices.
+     */
+    spapr_irq_reset(spapr, &error_fatal);
+
     /* DRC reset may cause a device to be unplugged. This will cause troubles
      * if this device is used by another device (eg, a running vhost backend
      * will crash QEMU if the DIMM holding the vring goes away). To avoid such
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 35a067cad3f8..04f5c9665550 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -209,6 +209,10 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
     return 0;
 }
 
+static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
+{
+}
+
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
 #define SPAPR_IRQ_XICS_NR_MSIS     \
     (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
@@ -225,6 +229,7 @@ sPAPRIrq spapr_irq_xics = {
     .dt_populate = spapr_dt_xics,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
     .post_load   = spapr_irq_post_load_xics,
+    .reset       = spapr_irq_reset_xics,
 };
 
 /*
@@ -333,6 +338,15 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
     return 0;
 }
 
+static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
+{
+    /*
+     * Set the OS CAM line of the cpu interrupt thread context. Needs
+     * to come after the XiveTCTX reset handlers.
+     */
+    spapr_xive_reset_tctx(spapr->xive);
+}
+
 /*
  * XIVE uses the full IRQ number space. Set it to 8K to be compatible
  * with XICS.
@@ -353,6 +367,7 @@ sPAPRIrq spapr_irq_xive = {
     .dt_populate = spapr_dt_xive,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
     .post_load   = spapr_irq_post_load_xive,
+    .reset       = spapr_irq_reset_xive,
 };
 
 /*
@@ -398,6 +413,15 @@ int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id)
     return smc->irq->post_load(spapr, version_id);
 }
 
+void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+
+    if (smc->irq->reset) {
+        smc->irq->reset(spapr, errp);
+    }
+}
+
 /*
  * XICS legacy routines - to deprecate one day
  */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 13/19] spapr: add an extra OV5 field to the sPAPR IRQ backend
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (11 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 12/19] spapr: add a 'reset' method to the sPAPR IRQ backend Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 14/19] spapr: set the interrupt presenter at reset Cédric Le Goater
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

This field defines the interrupt modes supported by the hypervisor in
the "ibm,arch-vec-5-platform-support" property. The CAS negotiation
process will select which mode to use.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr.h     |  6 ++++++
 include/hw/ppc/spapr_irq.h |  1 +
 hw/ppc/spapr.c             | 23 ++++++++++++++++++-----
 hw/ppc/spapr_irq.c         |  3 +++
 4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6bf028a02fe2..daced428a42c 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -824,5 +824,11 @@ int spapr_caps_post_migration(sPAPRMachineState *spapr);
 
 void spapr_check_pagesize(sPAPRMachineState *spapr, hwaddr pagesize,
                           Error **errp);
+/*
+ * XIVE definitions
+ */
+#define SPAPR_OV5_XIVE_LEGACY   0x0
+#define SPAPR_OV5_XIVE_EXPLOIT  0x40
+#define SPAPR_OV5_XIVE_BOTH     0x80
 
 #endif /* HW_SPAPR_H */
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 63061a009b4c..b34d5a00381b 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -33,6 +33,7 @@ void spapr_irq_msi_reset(sPAPRMachineState *spapr);
 typedef struct sPAPRIrq {
     uint32_t    nr_irqs;
     uint32_t    nr_msis;
+    uint8_t     ov5;
 
     void (*init)(sPAPRMachineState *spapr, Error **errp);
     int (*claim)(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 98d69f09e080..5ef87a00f68b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1095,12 +1095,14 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
     spapr_dt_rtas_tokens(fdt, rtas);
 }
 
-/* Prepare ibm,arch-vec-5-platform-support, which indicates the MMU features
- * that the guest may request and thus the valid values for bytes 24..26 of
- * option vector 5: */
-static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
+/* Prepare ibm,arch-vec-5-platform-support, which indicates the MMU
+ * and the XIVE features that the guest may request and thus the valid
+ * values for bytes 23..26 of option vector 5: */
+static void spapr_dt_ov5_platform_support(sPAPRMachineState *spapr, void *fdt,
+                                          int chosen)
 {
     PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 
     char val[2 * 4] = {
         23, 0x00, /* Xive mode, filled in below. */
@@ -1121,7 +1123,13 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
         } else {
             val[3] = 0x00; /* Hash */
         }
+        /* If the KVM XIVE device is not available, the machine can
+         * still operate with kernel_irqchip=off
+         */
+        val[1] = smc->irq->ov5;
     } else {
+        val[1] = smc->irq->ov5;
+
         /* V3 MMU supports both hash and radix in tcg (with dynamic switching) */
         val[3] = 0xC0;
     }
@@ -1189,7 +1197,7 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, void *fdt)
         _FDT(fdt_setprop_string(fdt, chosen, "stdout-path", stdout_path));
     }
 
-    spapr_dt_ov5_platform_support(fdt, chosen);
+    spapr_dt_ov5_platform_support(spapr, fdt, chosen);
 
     g_free(stdout_path);
     g_free(bootlist);
@@ -2622,6 +2630,11 @@ static void spapr_machine_init(MachineState *machine)
     /* advertise support for ibm,dyamic-memory-v2 */
     spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
 
+    /* advertise XIVE */
+    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT) {
+        spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
+    }
+
     /* init CPUs */
     spapr_init_cpus(spapr);
 
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 04f5c9665550..7a0d4f529763 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -220,6 +220,7 @@ static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
 sPAPRIrq spapr_irq_xics = {
     .nr_irqs     = SPAPR_IRQ_XICS_NR_IRQS,
     .nr_msis     = SPAPR_IRQ_XICS_NR_MSIS,
+    .ov5         = SPAPR_OV5_XIVE_LEGACY,
 
     .init        = spapr_irq_init_xics,
     .claim       = spapr_irq_claim_xics,
@@ -358,6 +359,7 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
 sPAPRIrq spapr_irq_xive = {
     .nr_irqs     = SPAPR_IRQ_XIVE_NR_IRQS,
     .nr_msis     = SPAPR_IRQ_XIVE_NR_MSIS,
+    .ov5         = SPAPR_OV5_XIVE_EXPLOIT,
 
     .init        = spapr_irq_init_xive,
     .claim       = spapr_irq_claim_xive,
@@ -482,6 +484,7 @@ int spapr_irq_find(sPAPRMachineState *spapr, int num, bool align, Error **errp)
 sPAPRIrq spapr_irq_xics_legacy = {
     .nr_irqs     = SPAPR_IRQ_XICS_LEGACY_NR_IRQS,
     .nr_msis     = SPAPR_IRQ_XICS_LEGACY_NR_IRQS,
+    .ov5         = SPAPR_OV5_XIVE_LEGACY,
 
     .init        = spapr_irq_init_xics,
     .claim       = spapr_irq_claim_xics,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 14/19] spapr: set the interrupt presenter at reset
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (12 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 13/19] spapr: add an extra OV5 field " Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-11  1:46   ` David Gibson
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 15/19] spapr/xive: enable XIVE MMIOs " Cédric Le Goater
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Currently, the interrupt presenter of the vCPU is set at realize
time. Setting it at reset will become useful when the new machine
supporting both interrupt modes is introduced. In this machine, the
interrupt mode is chosen at CAS time and activated after a reset.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_cpu_core.h |  2 ++
 hw/ppc/spapr_cpu_core.c         | 26 ++++++++++++++++++++++++++
 hw/ppc/spapr_irq.c              | 12 ++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index 9e2821e4b31f..fc8ea9021656 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -53,4 +53,6 @@ static inline sPAPRCPUState *spapr_cpu_state(PowerPCCPU *cpu)
     return (sPAPRCPUState *)cpu->machine_data;
 }
 
+void spapr_cpu_core_set_intc(PowerPCCPU *cpu, const char *intc_type);
+
 #endif
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 1811cd48db90..529de0b6b9c8 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -398,3 +398,29 @@ static const TypeInfo spapr_cpu_core_type_infos[] = {
 };
 
 DEFINE_TYPES(spapr_cpu_core_type_infos)
+
+typedef struct ForeachFindIntCArgs {
+    const char *intc_type;
+    Object *intc;
+} ForeachFindIntCArgs;
+
+static int spapr_cpu_core_find_intc(Object *child, void *opaque)
+{
+    ForeachFindIntCArgs *args = opaque;
+
+    if (object_dynamic_cast(child, args->intc_type)) {
+        args->intc = child;
+    }
+
+    return args->intc != NULL;
+}
+
+void spapr_cpu_core_set_intc(PowerPCCPU *cpu, const char *intc_type)
+{
+    ForeachFindIntCArgs args = { intc_type, NULL };
+
+    object_child_foreach(OBJECT(cpu), spapr_cpu_core_find_intc, &args);
+    g_assert(args.intc);
+
+    cpu->intc = args.intc;
+}
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 7a0d4f529763..b423cee30e2c 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -12,6 +12,7 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/spapr_xive.h"
 #include "hw/ppc/xics.h"
 #include "sysemu/kvm.h"
@@ -211,6 +212,11 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
 
 static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
 {
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
+    }
 }
 
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
@@ -341,6 +347,12 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
 {
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        spapr_cpu_core_set_intc(POWERPC_CPU(cs), TYPE_XIVE_TCTX);
+    }
+
     /*
      * Set the OS CAM line of the cpu interrupt thread context. Needs
      * to come after the XiveTCTX reset handlers.
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 15/19] spapr/xive: enable XIVE MMIOs at reset
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (13 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 14/19] spapr: set the interrupt presenter at reset Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-11  1:47   ` David Gibson
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS Cédric Le Goater
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Depending on the interrupt mode chosen, enable or disable the XIVE
MMIOs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h | 1 +
 hw/intc/spapr_xive.c        | 9 +++++++++
 hw/ppc/spapr_irq.c          | 8 ++++++++
 3 files changed, 18 insertions(+)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 7244a6231ce6..308afb61a666 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -48,5 +48,6 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr);
 void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle);
 void spapr_xive_reset_tctx(sPAPRXive *xive);
+void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 560d8d031f74..c6dbb2e8cfc7 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -179,6 +179,15 @@ static void spapr_xive_map_mmio(sPAPRXive *xive)
     sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
 }
 
+void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable)
+{
+    memory_region_set_enabled(&xive->source.esb_mmio, enable);
+    memory_region_set_enabled(&xive->tm_mmio, enable);
+
+    /* Disable the END ESBs until a guest OS makes use of them */
+    memory_region_set_enabled(&xive->end_source.esb_mmio, false);
+}
+
 /*
  * When a Virtual Processor is scheduled to run on a HW thread, the
  * hypervisor pushes its identifier in the OS CAM line. Emulate the
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index b423cee30e2c..a8e50725397c 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -217,6 +217,11 @@ static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
     CPU_FOREACH(cs) {
         spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
     }
+
+    /* Deactivate the XIVE MMIOs */
+    if (spapr->xive) {
+        spapr_xive_enable_mmio(spapr->xive, false);
+    }
 }
 
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
@@ -358,6 +363,9 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
      * to come after the XiveTCTX reset handlers.
      */
     spapr_xive_reset_tctx(spapr->xive);
+
+    /* Activate the XIVE MMIOs */
+    spapr_xive_enable_mmio(spapr->xive, true);
 }
 
 /*
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (14 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 15/19] spapr/xive: enable XIVE MMIOs " Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-11  2:03   ` David Gibson
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type Cédric Le Goater
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The interrupt mode is chosen by the CAS negotiation process and
activated after a reset to take into account the required changes in
the machine. These impact the device tree layout, the interrupt
presenter object and the exposed MMIO regions in the case of XIVE.

This default interrupt mode for the machine is XICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h |   1 +
 hw/ppc/spapr.c             |   3 +-
 hw/ppc/spapr_hcall.c       |  13 ++++
 hw/ppc/spapr_irq.c         | 143 +++++++++++++++++++++++++++++++++++++
 4 files changed, 159 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index b34d5a00381b..29936498dbc8 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -51,6 +51,7 @@ typedef struct sPAPRIrq {
 extern sPAPRIrq spapr_irq_xics;
 extern sPAPRIrq spapr_irq_xics_legacy;
 extern sPAPRIrq spapr_irq_xive;
+extern sPAPRIrq spapr_irq_dual;
 
 void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5ef87a00f68b..fa41927d95dd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2631,7 +2631,8 @@ static void spapr_machine_init(MachineState *machine)
     spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
 
     /* advertise XIVE */
-    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT) {
+    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT ||
+        smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH) {
         spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
     }
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index ae913d070f50..186b6a65543f 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1654,6 +1654,19 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
             (spapr_h_cas_compose_response(spapr, args[1], args[2],
                                           ov5_updates) != 0);
     }
+
+    /*
+     * Generate a machine reset when we have an update of the
+     * interrupt mode. Only required on the machine supporting both
+     * mode.
+     */
+    if (!spapr->cas_reboot) {
+        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+
+        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)
+            && smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH;
+    }
+
     spapr_ovec_cleanup(ov5_updates);
 
     if (spapr->cas_reboot) {
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index a8e50725397c..7c34939f774a 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -392,6 +392,149 @@ sPAPRIrq spapr_irq_xive = {
     .reset       = spapr_irq_reset_xive,
 };
 
+/*
+ * Dual XIVE and XICS IRQ backend.
+ *
+ * Both interrupt mode, XIVE and XICS, objects are created but the
+ * machine starts in legacy interrupt mode (XICS). It can be changed
+ * by the CAS negotiation process and, in that case, the new mode is
+ * activated after extra machine reset.
+ */
+
+/*
+ * Returns the sPAPR IRQ backend negotiated by CAS. XICS is the
+ * default.
+ */
+static sPAPRIrq *spapr_irq_current(sPAPRMachineState *spapr)
+{
+    return spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT) ?
+        &spapr_irq_xive : &spapr_irq_xics;
+}
+
+static void spapr_irq_init_dual(sPAPRMachineState *spapr, Error **errp)
+{
+    MachineState *machine = MACHINE(spapr);
+    Error *local_err = NULL;
+
+    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
+        error_setg(errp, "No KVM support for the 'dual' machine");
+        return;
+    }
+
+    spapr_irq_xics.init(spapr, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    spapr_irq_xive.init(spapr, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+static int spapr_irq_claim_dual(sPAPRMachineState *spapr, int irq, bool lsi,
+                                Error **errp)
+{
+    int ret;
+    Error *local_err = NULL;
+
+    ret = spapr_irq_xive.claim(spapr, irq, lsi, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return ret;
+    }
+
+    ret = spapr_irq_xics.claim(spapr, irq, lsi, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+
+    return ret;
+}
+
+static void spapr_irq_free_dual(sPAPRMachineState *spapr, int irq, int num)
+{
+    spapr_irq_xive.free(spapr, irq, num);
+    spapr_irq_xics.free(spapr, irq, num);
+}
+
+static qemu_irq spapr_qirq_dual(sPAPRMachineState *spapr, int irq)
+{
+    return spapr_irq_current(spapr)->qirq(spapr, irq);
+}
+
+static void spapr_irq_print_info_dual(sPAPRMachineState *spapr, Monitor *mon)
+{
+    spapr_irq_current(spapr)->print_info(spapr, mon);
+}
+
+static void spapr_irq_dt_populate_dual(sPAPRMachineState *spapr,
+                                       uint32_t nr_servers, void *fdt,
+                                       uint32_t phandle)
+{
+    spapr_irq_current(spapr)->dt_populate(spapr, nr_servers, fdt, phandle);
+}
+
+static Object *spapr_irq_cpu_intc_create_dual(sPAPRMachineState *spapr,
+                                              Object *cpu, Error **errp)
+{
+    Error *local_err = NULL;
+
+    spapr_irq_xive.cpu_intc_create(spapr, cpu, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    /* Default to XICS interrupt mode */
+    return spapr_irq_xics.cpu_intc_create(spapr, cpu, errp);
+}
+
+static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
+{
+    /*
+     * Force a reset of the XIVE backend after migration. The machine
+     * defaults to XICS at startup.
+     */
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        spapr_irq_xive.reset(spapr, &error_fatal);
+    }
+
+    return spapr_irq_current(spapr)->post_load(spapr, version_id);
+}
+
+static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
+{
+    /*
+     * Reset the interrupt mode selected by CAS.
+     */
+    spapr_irq_current(spapr)->reset(spapr, errp);
+}
+
+/*
+ * Define values in sync with the XIVE and XICS backend
+ */
+#define SPAPR_IRQ_DUAL_NR_IRQS     0x2000
+#define SPAPR_IRQ_DUAL_NR_MSIS     (SPAPR_IRQ_DUAL_NR_IRQS - SPAPR_IRQ_MSI)
+
+sPAPRIrq spapr_irq_dual = {
+    .nr_irqs     = SPAPR_IRQ_DUAL_NR_IRQS,
+    .nr_msis     = SPAPR_IRQ_DUAL_NR_MSIS,
+    .ov5         = SPAPR_OV5_XIVE_BOTH,
+
+    .init        = spapr_irq_init_dual,
+    .claim       = spapr_irq_claim_dual,
+    .free        = spapr_irq_free_dual,
+    .qirq        = spapr_qirq_dual,
+    .print_info  = spapr_irq_print_info_dual,
+    .dt_populate = spapr_irq_dt_populate_dual,
+    .cpu_intc_create = spapr_irq_cpu_intc_create_dual,
+    .post_load   = spapr_irq_post_load_dual,
+    .reset       = spapr_irq_reset_dual,
+};
+
 /*
  * sPAPR IRQ frontend routines for devices
  */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (15 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-09 22:05   ` Benjamin Herrenschmidt
  2018-12-10  6:45   ` David Gibson
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' " Cédric Le Goater
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 19/19] spapr: add a 'pseries-4.0-dual' " Cédric Le Goater
  18 siblings, 2 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/compat.h |  3 +++
 hw/ppc/spapr.c      | 25 ++++++++++++++++++++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/hw/compat.h b/include/hw/compat.h
index 6f4d5fc64704..70958328fe7a 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -1,6 +1,9 @@
 #ifndef HW_COMPAT_H
 #define HW_COMPAT_H
 
+#define HW_COMPAT_3_1 \
+    /* empty */
+
 #define HW_COMPAT_3_0 \
     /* empty */
 
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index fa41927d95dd..4012ebd794a4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3971,19 +3971,38 @@ static const TypeInfo spapr_machine_info = {
     }                                                                \
     type_init(spapr_machine_register_##suffix)
 
- /*
+/*
+ * pseries-4.0
+ */
+static void spapr_machine_4_0_instance_options(MachineState *machine)
+{
+}
+
+static void spapr_machine_4_0_class_options(MachineClass *mc)
+{
+    /* Defaults for the latest behaviour inherited from the base class */
+}
+
+DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
+
+/*
  * pseries-3.1
  */
+#define SPAPR_COMPAT_3_1                                              \
+    HW_COMPAT_3_1
+
 static void spapr_machine_3_1_instance_options(MachineState *machine)
 {
+    spapr_machine_4_0_instance_options(machine);
 }
 
 static void spapr_machine_3_1_class_options(MachineClass *mc)
 {
-    /* Defaults for the latest behaviour inherited from the base class */
+    spapr_machine_4_0_class_options(mc);
+    SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_3_1);
 }
 
-DEFINE_SPAPR_MACHINE(3_1, "3.1", true);
+DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
 
 /*
  * pseries-3.0
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' machine type
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (16 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  2018-12-10 22:17   ` Cédric Le Goater
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 19/19] spapr: add a 'pseries-4.0-dual' " Cédric Le Goater
  18 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

This pseries machine makes use of a new sPAPR IRQ backend supporting
the XIVE interrupt mode.

The guest OS is required to have support for the XIVE exploitation
mode of the POWER9 interrupt controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4012ebd794a4..3cc134a0b673 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3985,6 +3985,21 @@ static void spapr_machine_4_0_class_options(MachineClass *mc)
 
 DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
 
+static void spapr_machine_4_0_xive_instance_options(MachineState *machine)
+{
+    spapr_machine_4_0_instance_options(machine);
+}
+
+static void spapr_machine_4_0_xive_class_options(MachineClass *mc)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
+    spapr_machine_4_0_class_options(mc);
+    smc->irq = &spapr_irq_xive;
+}
+
+DEFINE_SPAPR_MACHINE(4_0_xive, "4.0-xive", false);
+
 /*
  * pseries-3.1
  */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH v7 19/19] spapr: add a 'pseries-4.0-dual' machine type
  2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (17 preceding siblings ...)
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' " Cédric Le Goater
@ 2018-12-09 19:46 ` Cédric Le Goater
  18 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-09 19:46 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

This pseries machine makes use of a new sPAPR IRQ backend supporting
both interrupt modes : XIVE and XICS, the default being XICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3cc134a0b673..d9fd4851824e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4000,6 +4000,21 @@ static void spapr_machine_4_0_xive_class_options(MachineClass *mc)
 
 DEFINE_SPAPR_MACHINE(4_0_xive, "4.0-xive", false);
 
+static void spapr_machine_4_0_dual_instance_options(MachineState *machine)
+{
+    spapr_machine_4_0_instance_options(machine);
+}
+
+static void spapr_machine_4_0_dual_class_options(MachineClass *mc)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
+    spapr_machine_4_0_class_options(mc);
+    smc->irq = &spapr_irq_dual;
+}
+
+DEFINE_SPAPR_MACHINE(4_0_dual, "4.0-dual", false);
+
 /*
  * pseries-3.1
  */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type Cédric Le Goater
@ 2018-12-09 22:05   ` Benjamin Herrenschmidt
  2018-12-10  3:41     ` David Gibson
  2018-12-10  6:45   ` David Gibson
  1 sibling, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2018-12-09 22:05 UTC (permalink / raw)
  To: Cédric Le Goater, David Gibson; +Cc: qemu-ppc, qemu-devel

On Sun, 2018-12-09 at 20:46 +0100, Cédric Le Goater wrote:
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---

If you're going to do that, can we include large decrementer in there
too ? (patches from Suraj in my tree but they night need a bit of
massaging).

>  include/hw/compat.h |  3 +++
>  hw/ppc/spapr.c      | 25 ++++++++++++++++++++++---
>  2 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index 6f4d5fc64704..70958328fe7a 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -1,6 +1,9 @@
>  #ifndef HW_COMPAT_H
>  #define HW_COMPAT_H
>  
> +#define HW_COMPAT_3_1 \
> +    /* empty */
> +
>  #define HW_COMPAT_3_0 \
>      /* empty */
>  
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index fa41927d95dd..4012ebd794a4 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3971,19 +3971,38 @@ static const TypeInfo spapr_machine_info = {
>      }                                                                \
>      type_init(spapr_machine_register_##suffix)
>  
> - /*
> +/*
> + * pseries-4.0
> + */
> +static void spapr_machine_4_0_instance_options(MachineState *machine)
> +{
> +}
> +
> +static void spapr_machine_4_0_class_options(MachineClass *mc)
> +{
> +    /* Defaults for the latest behaviour inherited from the base class */
> +}
> +
> +DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
> +
> +/*
>   * pseries-3.1
>   */
> +#define SPAPR_COMPAT_3_1                                              \
> +    HW_COMPAT_3_1
> +
>  static void spapr_machine_3_1_instance_options(MachineState *machine)
>  {
> +    spapr_machine_4_0_instance_options(machine);
>  }
>  
>  static void spapr_machine_3_1_class_options(MachineClass *mc)
>  {
> -    /* Defaults for the latest behaviour inherited from the base class */
> +    spapr_machine_4_0_class_options(mc);
> +    SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_3_1);
>  }
>  
> -DEFINE_SPAPR_MACHINE(3_1, "3.1", true);
> +DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>  
>  /*
>   * pseries-3.0

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type
  2018-12-09 22:05   ` Benjamin Herrenschmidt
@ 2018-12-10  3:41     ` David Gibson
  2018-12-10  7:09       ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-10  3:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Cédric Le Goater, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2918 bytes --]

On Mon, Dec 10, 2018 at 09:05:06AM +1100, Benjamin Herrenschmidt wrote:
> On Sun, 2018-12-09 at 20:46 +0100, Cédric Le Goater wrote:
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> 
> If you're going to do that, can we include large decrementer in there
> too ? (patches from Suraj in my tree but they night need a bit of
> massaging).

We don't need to worry about that here.  The machine type's not
considered finalized until the release, so as long as you get the
large dec stuff in before the 4.0 release, it's fine.

Looks like Eduardo and others are probably doing a big batch machine
type update via the machine tree.  That will probably conflict, but it
should be a fairly easy one for me to sort out when the time comes.

> 
> >  include/hw/compat.h |  3 +++
> >  hw/ppc/spapr.c      | 25 ++++++++++++++++++++++---
> >  2 files changed, 25 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/hw/compat.h b/include/hw/compat.h
> > index 6f4d5fc64704..70958328fe7a 100644
> > --- a/include/hw/compat.h
> > +++ b/include/hw/compat.h
> > @@ -1,6 +1,9 @@
> >  #ifndef HW_COMPAT_H
> >  #define HW_COMPAT_H
> >  
> > +#define HW_COMPAT_3_1 \
> > +    /* empty */
> > +
> >  #define HW_COMPAT_3_0 \
> >      /* empty */
> >  
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index fa41927d95dd..4012ebd794a4 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -3971,19 +3971,38 @@ static const TypeInfo spapr_machine_info = {
> >      }                                                                \
> >      type_init(spapr_machine_register_##suffix)
> >  
> > - /*
> > +/*
> > + * pseries-4.0
> > + */
> > +static void spapr_machine_4_0_instance_options(MachineState *machine)
> > +{
> > +}
> > +
> > +static void spapr_machine_4_0_class_options(MachineClass *mc)
> > +{
> > +    /* Defaults for the latest behaviour inherited from the base class */
> > +}
> > +
> > +DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
> > +
> > +/*
> >   * pseries-3.1
> >   */
> > +#define SPAPR_COMPAT_3_1                                              \
> > +    HW_COMPAT_3_1
> > +
> >  static void spapr_machine_3_1_instance_options(MachineState *machine)
> >  {
> > +    spapr_machine_4_0_instance_options(machine);
> >  }
> >  
> >  static void spapr_machine_3_1_class_options(MachineClass *mc)
> >  {
> > -    /* Defaults for the latest behaviour inherited from the base class */
> > +    spapr_machine_4_0_class_options(mc);
> > +    SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_3_1);
> >  }
> >  
> > -DEFINE_SPAPR_MACHINE(3_1, "3.1", true);
> > +DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >  
> >  /*
> >   * pseries-3.0
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 01/19] ppc/xive: add support for the END Event State Buffers
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 01/19] ppc/xive: add support for the END Event State Buffers Cédric Le Goater
@ 2018-12-10  4:16   ` David Gibson
  2018-12-10  7:11     ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-10  4:16 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 1749 bytes --]

On Sun, Dec 09, 2018 at 08:45:52PM +0100, Cédric Le Goater wrote:
> The Event Notification Descriptor (END) XIVE structure also contains
> two Event State Buffers providing further coalescing of interrupts,
> one for the notification event (ESn) and one for the escalation events
> (ESe). A MMIO page is assigned for each to control the EOI through
> loads only. Stores are not allowed.
> 
> The END ESBs are modeled through an object resembling the 'XiveSource'
> It is stateless as the END state bits are backed into the XiveEND
> structure under the XiveRouter and the MMIO accesses follow the same
> rules as for the XiveSource ESBs.
> 
> END ESBs are not supported by the Linux drivers neither on OPAL nor on
> sPAPR. Nevetherless, it provides a mean to study the question in the
> future and validates a bit more the XIVE model.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
> 
>  Changes since v6:
> 
>  - removed the 'chip-id' field from XiveRouter
>  - introduced a 'block-id' field in XiveENDSource to lookup the XIVE
>    END structure when doing a load in the MMIO ESB
>  - removed reset XiveENDSource handler
> 
>  include/hw/ppc/xive.h |  21 ++++++
>  hw/intc/xive.c        | 160 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 179 insertions(+), 2 deletions(-)

Applied to ppc-for-4.0.

I had some thoughts about maybe-nicer arrangements of things here, but
nothing important enough to delay this (the things I'm mulling over
wouldn't break migration, so it's fixable later).

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 02/19] ppc/xive: introduce the XIVE interrupt thread context
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 02/19] ppc/xive: introduce the XIVE interrupt thread context Cédric Le Goater
@ 2018-12-10  4:19   ` David Gibson
  0 siblings, 0 replies; 61+ messages in thread
From: David Gibson @ 2018-12-10  4:19 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 23169 bytes --]

On Sun, Dec 09, 2018 at 08:45:53PM +0100, Cédric Le Goater wrote:
> Each POWER9 processor chip has a XIVE presenter that can generate four
> different exceptions to its threads:
> 
>   - hypervisor exception,
>   - O/S exception
>   - Event-Based Branch (EBB)
>   - msgsnd (doorbell).
> 
> Each exception has a state independent from the others called a Thread
> Interrupt Management context. This context is a set of registers which
> lets the thread handle priority management and interrupt acknowledgment
> among other things. The most important ones being :
> 
>   - Interrupt Priority Register  (PIPR)
>   - Interrupt Pending Buffer     (IPB)
>   - Current Processor Priority   (CPPR)
>   - Notification Source Register (NSR)
> 
> These registers are accessible through a specific MMIO region, called
> the Thread Interrupt Management Area (TIMA), four aligned pages, each
> exposing a different view of the registers. First page (page address
> ending in 0b00) gives access to the entire context and is reserved for
> the ring 0 view for the physical thread context. The second (page
> address ending in 0b01) is for the hypervisor, ring 1 view. The third
> (page address ending in 0b10) is for the operating system, ring 2
> view. The fourth (page address ending in 0b11) is for user level, ring
> 3 view.
> 
> The thread interrupt context is modeled with a XiveTCTX object
> containing the values of the different exception registers. The TIMA
> region is mapped at the same address for each CPU.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Applied.

> ---
> 
>  Changes since v6
> 
>  - introduced a xive_tctx_word2() helper to extract TM_WORD2 of a ring.
> 
>  include/hw/ppc/xive.h      |  44 ++++
>  include/hw/ppc/xive_regs.h |  82 +++++++
>  hw/intc/xive.c             | 424 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 550 insertions(+)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 014f64aa98f6..1e823a4c64e9 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -367,4 +367,48 @@ typedef struct XiveENDSource {
>  void xive_end_pic_print_info(XiveEND *end, uint32_t end_idx, Monitor *mon);
>  void xive_end_queue_pic_print_info(XiveEND *end, uint32_t width, Monitor *mon);
>  
> +/*
> + * XIVE Thread interrupt Management (TM) context
> + */
> +
> +#define TYPE_XIVE_TCTX "xive-tctx"
> +#define XIVE_TCTX(obj) OBJECT_CHECK(XiveTCTX, (obj), TYPE_XIVE_TCTX)
> +
> +/*
> + * XIVE Thread interrupt Management register rings :
> + *
> + *   QW-0  User       event-based exception state
> + *   QW-1  O/S        OS context for priority management, interrupt acks
> + *   QW-2  Pool       hypervisor pool context for virtual processors dispatched
> + *   QW-3  Physical   physical thread context and security context
> + */
> +#define XIVE_TM_RING_COUNT      4
> +#define XIVE_TM_RING_SIZE       0x10
> +
> +typedef struct XiveTCTX {
> +    DeviceState parent_obj;
> +
> +    CPUState    *cs;
> +    qemu_irq    output;
> +
> +    uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
> +} XiveTCTX;
> +
> +/*
> + * XIVE Thread Interrupt Management Aera (TIMA)
> + *
> + * This region gives access to the registers of the thread interrupt
> + * management context. It is four page wide, each page providing a
> + * different view of the registers. The page with the lower offset is
> + * the most privileged and gives access to the entire context.
> + */
> +#define XIVE_TM_HW_PAGE         0x0
> +#define XIVE_TM_HV_PAGE         0x1
> +#define XIVE_TM_OS_PAGE         0x2
> +#define XIVE_TM_USER_PAGE       0x3
> +
> +extern const MemoryRegionOps xive_tm_ops;
> +
> +void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
> +
>  #endif /* PPC_XIVE_H */
> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> index 3c0ebad18b69..ede3d04c5eda 100644
> --- a/include/hw/ppc/xive_regs.h
> +++ b/include/hw/ppc/xive_regs.h
> @@ -23,6 +23,88 @@
>  #define XIVE_SRCNO_INDEX(srcno) ((srcno) & 0x0fffffff)
>  #define XIVE_SRCNO(blk, idx)    ((uint32_t)(blk) << 28 | (idx))
>  
> +#define TM_SHIFT                16
> +
> +/* TM register offsets */
> +#define TM_QW0_USER             0x000 /* All rings */
> +#define TM_QW1_OS               0x010 /* Ring 0..2 */
> +#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
> +#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
> +
> +/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
> +#define TM_NSR                  0x0  /*  +   +   -   +  */
> +#define TM_CPPR                 0x1  /*  -   +   -   +  */
> +#define TM_IPB                  0x2  /*  -   +   +   +  */
> +#define TM_LSMFB                0x3  /*  -   +   +   +  */
> +#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
> +#define TM_INC                  0x5  /*  -   +   -   +  */
> +#define TM_AGE                  0x6  /*  -   +   -   +  */
> +#define TM_PIPR                 0x7  /*  -   +   -   +  */
> +
> +#define TM_WORD0                0x0
> +#define TM_WORD1                0x4
> +
> +/*
> + * QW word 2 contains the valid bit at the top and other fields
> + * depending on the QW.
> + */
> +#define TM_WORD2                0x8
> +#define   TM_QW0W2_VU           PPC_BIT32(0)
> +#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
> +#define   TM_QW1W2_VO           PPC_BIT32(0)
> +#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
> +#define   TM_QW2W2_VP           PPC_BIT32(0)
> +#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
> +#define   TM_QW3W2_VT           PPC_BIT32(0)
> +#define   TM_QW3W2_LP           PPC_BIT32(6)
> +#define   TM_QW3W2_LE           PPC_BIT32(7)
> +#define   TM_QW3W2_T            PPC_BIT32(31)
> +
> +/*
> + * In addition to normal loads to "peek" and writes (only when invalid)
> + * using 4 and 8 bytes accesses, the above registers support these
> + * "special" byte operations:
> + *
> + *   - Byte load from QW0[NSR] - User level NSR (EBB)
> + *   - Byte store to QW0[NSR] - User level NSR (EBB)
> + *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
> + *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
> + *                                    otherwise VT||0000000
> + *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
> + *
> + * Then we have all these "special" CI ops at these offset that trigger
> + * all sorts of side effects:
> + */
> +#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
> +#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
> +#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
> +#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
> +                                         * context */
> +#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
> +#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
> +                                         * context to reg */
> +#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
> +                                         * context to reg*/
> +#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
> +#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
> +                                         * line */
> +#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
> +#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
> +                                         * line */
> +#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
> +/* XXX more... */
> +
> +/* NSR fields for the various QW ack types */
> +#define TM_QW0_NSR_EB           PPC_BIT8(0)
> +#define TM_QW1_NSR_EO           PPC_BIT8(0)
> +#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
> +#define  TM_QW3_NSR_HE_NONE     0
> +#define  TM_QW3_NSR_HE_POOL     1
> +#define  TM_QW3_NSR_HE_PHYS     2
> +#define  TM_QW3_NSR_HE_LSI      3
> +#define TM_QW3_NSR_I            PPC_BIT8(2)
> +#define TM_QW3_NSR_GRP_LVL      PPC_BIT8(3, 7)
> +
>  /* EAS (Event Assignment Structure)
>   *
>   * One per interrupt source. Targets an interrupt to a given Event
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 2196ce8de059..2615d16b7437 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -16,6 +16,429 @@
>  #include "hw/qdev-properties.h"
>  #include "monitor/monitor.h"
>  #include "hw/ppc/xive.h"
> +#include "hw/ppc/xive_regs.h"
> +
> +/*
> + * XIVE Thread Interrupt Management context
> + */
> +
> +static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring)
> +{
> +    return 0;
> +}
> +
> +static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
> +{
> +    if (cppr > XIVE_PRIORITY_MAX) {
> +        cppr = 0xff;
> +    }
> +
> +    tctx->regs[ring + TM_CPPR] = cppr;
> +}
> +
> +/*
> + * XIVE Thread Interrupt Management Area (TIMA)
> + */
> +
> +/*
> + * Define an access map for each page of the TIMA that we will use in
> + * the memory region ops to filter values when doing loads and stores
> + * of raw registers values
> + *
> + * Registers accessibility bits :
> + *
> + *    0x0 - no access
> + *    0x1 - write only
> + *    0x2 - read only
> + *    0x3 - read/write
> + */
> +
> +static const uint8_t xive_tm_hw_view[] = {
> +    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-1 OS   */   3, 3, 3, 3,   3, 3, 0, 3,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-2 POOL */   0, 0, 3, 3,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-3 PHYS */   3, 3, 3, 3,   0, 3, 0, 3,   3, 0, 0, 3,   3, 3, 3, 0,
> +};
> +
> +static const uint8_t xive_tm_hv_view[] = {
> +    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-1 OS   */   3, 3, 3, 3,   3, 3, 0, 3,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-2 POOL */   0, 0, 3, 3,   0, 0, 0, 0,   0, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-3 PHYS */   3, 3, 3, 3,   0, 3, 0, 3,   3, 0, 0, 3,   0, 0, 0, 0,
> +};
> +
> +static const uint8_t xive_tm_os_view[] = {
> +    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-1 OS   */   2, 3, 2, 2,   2, 2, 0, 2,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-2 POOL */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-3 PHYS */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +};
> +
> +static const uint8_t xive_tm_user_view[] = {
> +    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-1 OS   */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-2 POOL */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-3 PHYS */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +};
> +
> +/*
> + * Overall TIMA access map for the thread interrupt management context
> + * registers
> + */
> +static const uint8_t *xive_tm_views[] = {
> +    [XIVE_TM_HW_PAGE]   = xive_tm_hw_view,
> +    [XIVE_TM_HV_PAGE]   = xive_tm_hv_view,
> +    [XIVE_TM_OS_PAGE]   = xive_tm_os_view,
> +    [XIVE_TM_USER_PAGE] = xive_tm_user_view,
> +};
> +
> +/*
> + * Computes a register access mask for a given offset in the TIMA
> + */
> +static uint64_t xive_tm_mask(hwaddr offset, unsigned size, bool write)
> +{
> +    uint8_t page_offset = (offset >> TM_SHIFT) & 0x3;
> +    uint8_t reg_offset = offset & 0x3F;
> +    uint8_t reg_mask = write ? 0x1 : 0x2;
> +    uint64_t mask = 0x0;
> +    int i;
> +
> +    for (i = 0; i < size; i++) {
> +        if (xive_tm_views[page_offset][reg_offset + i] & reg_mask) {
> +            mask |= (uint64_t) 0xff << (8 * (size - i - 1));
> +        }
> +    }
> +
> +    return mask;
> +}
> +
> +static void xive_tm_raw_write(XiveTCTX *tctx, hwaddr offset, uint64_t value,
> +                              unsigned size)
> +{
> +    uint8_t ring_offset = offset & 0x30;
> +    uint8_t reg_offset = offset & 0x3F;
> +    uint64_t mask = xive_tm_mask(offset, size, true);
> +    int i;
> +
> +    /*
> +     * Only 4 or 8 bytes stores are allowed and the User ring is
> +     * excluded
> +     */
> +    if (size < 4 || !mask || ring_offset == TM_QW0_USER) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid write access at TIMA @%"
> +                      HWADDR_PRIx"\n", offset);
> +        return;
> +    }
> +
> +    /*
> +     * Use the register offset for the raw values and filter out
> +     * reserved values
> +     */
> +    for (i = 0; i < size; i++) {
> +        uint8_t byte_mask = (mask >> (8 * (size - i - 1)));
> +        if (byte_mask) {
> +            tctx->regs[reg_offset + i] = (value >> (8 * (size - i - 1))) &
> +                byte_mask;
> +        }
> +    }
> +}
> +
> +static uint64_t xive_tm_raw_read(XiveTCTX *tctx, hwaddr offset, unsigned size)
> +{
> +    uint8_t ring_offset = offset & 0x30;
> +    uint8_t reg_offset = offset & 0x3F;
> +    uint64_t mask = xive_tm_mask(offset, size, false);
> +    uint64_t ret;
> +    int i;
> +
> +    /*
> +     * Only 4 or 8 bytes loads are allowed and the User ring is
> +     * excluded
> +     */
> +    if (size < 4 || !mask || ring_offset == TM_QW0_USER) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid read access at TIMA @%"
> +                      HWADDR_PRIx"\n", offset);
> +        return -1;
> +    }
> +
> +    /* Use the register offset for the raw values */
> +    ret = 0;
> +    for (i = 0; i < size; i++) {
> +        ret |= (uint64_t) tctx->regs[reg_offset + i] << (8 * (size - i - 1));
> +    }
> +
> +    /* filter out reserved values */
> +    return ret & mask;
> +}
> +
> +/*
> + * The TM context is mapped twice within each page. Stores and loads
> + * to the first mapping below 2K write and read the specified values
> + * without modification. The second mapping above 2K performs specific
> + * state changes (side effects) in addition to setting/returning the
> + * interrupt management area context of the processor thread.
> + */
> +static uint64_t xive_tm_ack_os_reg(XiveTCTX *tctx, hwaddr offset, unsigned size)
> +{
> +    return xive_tctx_accept(tctx, TM_QW1_OS);
> +}
> +
> +static void xive_tm_set_os_cppr(XiveTCTX *tctx, hwaddr offset,
> +                                uint64_t value, unsigned size)
> +{
> +    xive_tctx_set_cppr(tctx, TM_QW1_OS, value & 0xff);
> +}
> +
> +/*
> + * Define a mapping of "special" operations depending on the TIMA page
> + * offset and the size of the operation.
> + */
> +typedef struct XiveTmOp {
> +    uint8_t  page_offset;
> +    uint32_t op_offset;
> +    unsigned size;
> +    void     (*write_handler)(XiveTCTX *tctx, hwaddr offset, uint64_t value,
> +                              unsigned size);
> +    uint64_t (*read_handler)(XiveTCTX *tctx, hwaddr offset, unsigned size);
> +} XiveTmOp;
> +
> +static const XiveTmOp xive_tm_operations[] = {
> +    /*
> +     * MMIOs below 2K : raw values and special operations without side
> +     * effects
> +     */
> +    { XIVE_TM_OS_PAGE, TM_QW1_OS + TM_CPPR,   1, xive_tm_set_os_cppr, NULL },
> +
> +    /* MMIOs above 2K : special operations with side effects */
> +    { XIVE_TM_OS_PAGE, TM_SPC_ACK_OS_REG,     2, NULL, xive_tm_ack_os_reg },
> +};
> +
> +static const XiveTmOp *xive_tm_find_op(hwaddr offset, unsigned size, bool write)
> +{
> +    uint8_t page_offset = (offset >> TM_SHIFT) & 0x3;
> +    uint32_t op_offset = offset & 0xFFF;
> +    int i;
> +
> +    for (i = 0; i < ARRAY_SIZE(xive_tm_operations); i++) {
> +        const XiveTmOp *xto = &xive_tm_operations[i];
> +
> +        /* Accesses done from a more privileged TIMA page is allowed */
> +        if (xto->page_offset >= page_offset &&
> +            xto->op_offset == op_offset &&
> +            xto->size == size &&
> +            ((write && xto->write_handler) || (!write && xto->read_handler))) {
> +            return xto;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +/*
> + * TIMA MMIO handlers
> + */
> +static void xive_tm_write(void *opaque, hwaddr offset,
> +                          uint64_t value, unsigned size)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> +    XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> +    const XiveTmOp *xto;
> +
> +    /*
> +     * TODO: check V bit in Q[0-3]W2, check PTER bit associated with CPU
> +     */
> +
> +    /*
> +     * First, check for special operations in the 2K region
> +     */
> +    if (offset & 0x800) {
> +        xto = xive_tm_find_op(offset, size, true);
> +        if (!xto) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid write access at TIMA"
> +                          "@%"HWADDR_PRIx"\n", offset);
> +        } else {
> +            xto->write_handler(tctx, offset, value, size);
> +        }
> +        return;
> +    }
> +
> +    /*
> +     * Then, for special operations in the region below 2K.
> +     */
> +    xto = xive_tm_find_op(offset, size, true);
> +    if (xto) {
> +        xto->write_handler(tctx, offset, value, size);
> +        return;
> +    }
> +
> +    /*
> +     * Finish with raw access to the register values
> +     */
> +    xive_tm_raw_write(tctx, offset, value, size);
> +}
> +
> +static uint64_t xive_tm_read(void *opaque, hwaddr offset, unsigned size)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> +    XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> +    const XiveTmOp *xto;
> +
> +    /*
> +     * TODO: check V bit in Q[0-3]W2, check PTER bit associated with CPU
> +     */
> +
> +    /*
> +     * First, check for special operations in the 2K region
> +     */
> +    if (offset & 0x800) {
> +        xto = xive_tm_find_op(offset, size, false);
> +        if (!xto) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid read access to TIMA"
> +                          "@%"HWADDR_PRIx"\n", offset);
> +            return -1;
> +        }
> +        return xto->read_handler(tctx, offset, size);
> +    }
> +
> +    /*
> +     * Then, for special operations in the region below 2K.
> +     */
> +    xto = xive_tm_find_op(offset, size, false);
> +    if (xto) {
> +        return xto->read_handler(tctx, offset, size);
> +    }
> +
> +    /*
> +     * Finish with raw access to the register values
> +     */
> +    return xive_tm_raw_read(tctx, offset, size);
> +}
> +
> +const MemoryRegionOps xive_tm_ops = {
> +    .read = xive_tm_read,
> +    .write = xive_tm_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    },
> +};
> +
> +static inline uint32_t xive_tctx_word2(uint8_t *ring)
> +{
> +    return *((uint32_t *) &ring[TM_WORD2]);
> +}
> +
> +static char *xive_tctx_ring_print(uint8_t *ring)
> +{
> +    uint32_t w2 = xive_tctx_word2(ring);
> +
> +    return g_strdup_printf("%02x   %02x  %02x    %02x   %02x  "
> +                   "%02x  %02x   %02x  %08x",
> +                   ring[TM_NSR], ring[TM_CPPR], ring[TM_IPB], ring[TM_LSMFB],
> +                   ring[TM_ACK_CNT], ring[TM_INC], ring[TM_AGE], ring[TM_PIPR],
> +                   be32_to_cpu(w2));
> +}
> +
> +static const char * const xive_tctx_ring_names[] = {
> +    "USER", "OS", "POOL", "PHYS",
> +};
> +
> +void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
> +{
> +    int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
> +    int i;
> +
> +    monitor_printf(mon, "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR"
> +                   "  W2\n", cpu_index);
> +
> +    for (i = 0; i < XIVE_TM_RING_COUNT; i++) {
> +        char *s = xive_tctx_ring_print(&tctx->regs[i * XIVE_TM_RING_SIZE]);
> +        monitor_printf(mon, "CPU[%04x]: %4s    %s\n", cpu_index,
> +                       xive_tctx_ring_names[i], s);
> +        g_free(s);
> +    }
> +}
> +
> +static void xive_tctx_reset(void *dev)
> +{
> +    XiveTCTX *tctx = XIVE_TCTX(dev);
> +
> +    memset(tctx->regs, 0, sizeof(tctx->regs));
> +
> +    /* Set some defaults */
> +    tctx->regs[TM_QW1_OS + TM_LSMFB] = 0xFF;
> +    tctx->regs[TM_QW1_OS + TM_ACK_CNT] = 0xFF;
> +    tctx->regs[TM_QW1_OS + TM_AGE] = 0xFF;
> +}
> +
> +static void xive_tctx_realize(DeviceState *dev, Error **errp)
> +{
> +    XiveTCTX *tctx = XIVE_TCTX(dev);
> +    PowerPCCPU *cpu;
> +    CPUPPCState *env;
> +    Object *obj;
> +    Error *local_err = NULL;
> +
> +    obj = object_property_get_link(OBJECT(dev), "cpu", &local_err);
> +    if (!obj) {
> +        error_propagate(errp, local_err);
> +        error_prepend(errp, "required link 'cpu' not found: ");
> +        return;
> +    }
> +
> +    cpu = POWERPC_CPU(obj);
> +    tctx->cs = CPU(obj);
> +
> +    env = &cpu->env;
> +    switch (PPC_INPUT(env)) {
> +    case PPC_FLAGS_INPUT_POWER7:
> +        tctx->output = env->irq_inputs[POWER7_INPUT_INT];
> +        break;
> +
> +    default:
> +        error_setg(errp, "XIVE interrupt controller does not support "
> +                   "this CPU bus model");
> +        return;
> +    }
> +
> +    qemu_register_reset(xive_tctx_reset, dev);
> +}
> +
> +static void xive_tctx_unrealize(DeviceState *dev, Error **errp)
> +{
> +    qemu_unregister_reset(xive_tctx_reset, dev);
> +}
> +
> +static const VMStateDescription vmstate_xive_tctx = {
> +    .name = TYPE_XIVE_TCTX,
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_BUFFER(regs, XiveTCTX),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static void xive_tctx_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->desc = "XIVE Interrupt Thread Context";
> +    dc->realize = xive_tctx_realize;
> +    dc->unrealize = xive_tctx_unrealize;
> +    dc->vmsd = &vmstate_xive_tctx;
> +}
> +
> +static const TypeInfo xive_tctx_info = {
> +    .name          = TYPE_XIVE_TCTX,
> +    .parent        = TYPE_DEVICE,
> +    .instance_size = sizeof(XiveTCTX),
> +    .class_init    = xive_tctx_class_init,
> +};
>  
>  /*
>   * XIVE ESB helpers
> @@ -862,6 +1285,7 @@ static void xive_register_types(void)
>      type_register_static(&xive_fabric_info);
>      type_register_static(&xive_router_info);
>      type_register_static(&xive_end_source_info);
> +    type_register_static(&xive_tctx_info);
>  }
>  
>  type_init(xive_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/19] ppc/xive: introduce a simplified XIVE presenter
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 03/19] ppc/xive: introduce a simplified XIVE presenter Cédric Le Goater
@ 2018-12-10  4:27   ` David Gibson
  2018-12-10  7:15     ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-10  4:27 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 12572 bytes --]

On Sun, Dec 09, 2018 at 08:45:54PM +0100, Cédric Le Goater wrote:
> The last sub-engine of the XIVE architecture is the Interrupt
> Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
> IVPE share elements, the Power Bus interface (CQ), the routing table
> descriptors, and they can be combined in the same HW logic. We do the
> same in QEMU and combine both engines in the XiveRouter for
> simplicity.
> 
> When the IVRE has completed its job of matching an event source with a
> Notification Virtual Target (NVT) to notify, it forwards the event
> notification to the IVPE sub-engine. The IVPE scans the thread
> interrupt contexts of the Notification Virtual Targets (NVT)
> dispatched on the HW processor threads and if a match is found, it
> signals the thread. If not, the IVPE escalates the notification to
> some other targets and records the notification in a backlog queue.
> 
> The IVPE maintains the thread interrupt context state for each of its
> NVTs not dispatched on HW processor threads in the Notification
> Virtual Target table (NVTT).
> 
> The model currently only supports single NVT notifications.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Applied.

I think the tctx_word2() should have the byteswap, rather than having
it in the callers, but that can be fixed later.

> ---
> 
>  Changes since v6 :
> 
>  - removed HW CAM line setting and use as it is only useful for PowerNV
>  - made use of xive_tctx_word2() helper
>  - made use of GETFIELD_BE32() to compare CAM lines
>  - fixed initialization of XiveTCTXMatch
> 
>  include/hw/ppc/xive.h      |  14 +++
>  include/hw/ppc/xive_regs.h |  24 +++++
>  hw/intc/xive.c             | 185 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 223 insertions(+)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 1e823a4c64e9..19309d1d65d1 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -325,6 +325,10 @@ typedef struct XiveRouterClass {
>                     XiveEND *end);
>      int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                       XiveEND *end, uint8_t word_number);
> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                   XiveNVT *nvt);
> +    int (*write_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                     XiveNVT *nvt, uint8_t word_number);
>  } XiveRouterClass;
>  
>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
> @@ -335,6 +339,11 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                          XiveEND *end);
>  int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                            XiveEND *end, uint8_t word_number);
> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt);
> +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                          XiveNVT *nvt, uint8_t word_number);
> +
>  
>  /*
>   * XIVE END ESBs
> @@ -411,4 +420,9 @@ extern const MemoryRegionOps xive_tm_ops;
>  
>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
>  
> +static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
> +{
> +    return (nvt_blk << 19) | nvt_idx;
> +}
> +
>  #endif /* PPC_XIVE_H */
> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> index ede3d04c5eda..85557e730cd8 100644
> --- a/include/hw/ppc/xive_regs.h
> +++ b/include/hw/ppc/xive_regs.h
> @@ -186,4 +186,28 @@ typedef struct XiveEND {
>  #define GETFIELD_BE32(m, v)       GETFIELD(m, be32_to_cpu(v))
>  #define SETFIELD_BE32(m, v, val)  cpu_to_be32(SETFIELD(m, be32_to_cpu(v), val))
>  
> +/* Notification Virtual Target (NVT) */
> +typedef struct XiveNVT {
> +        uint32_t        w0;
> +#define NVT_W0_VALID             PPC_BIT32(0)
> +        uint32_t        w1;
> +        uint32_t        w2;
> +        uint32_t        w3;
> +        uint32_t        w4;
> +        uint32_t        w5;
> +        uint32_t        w6;
> +        uint32_t        w7;
> +        uint32_t        w8;
> +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
> +        uint32_t        w9;
> +        uint32_t        wa;
> +        uint32_t        wb;
> +        uint32_t        wc;
> +        uint32_t        wd;
> +        uint32_t        we;
> +        uint32_t        wf;
> +} XiveNVT;
> +
> +#define xive_nvt_is_valid(nvt)    (be32_to_cpu((nvt)->w0) & NVT_W0_VALID)
> +
>  #endif /* PPC_XIVE_REGS_H */
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 2615d16b7437..3eecffe99b3a 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -983,6 +983,183 @@ int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>     return xrc->write_end(xrtr, end_blk, end_idx, end, word_number);
>  }
>  
> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt)
> +{
> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> +
> +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> +}
> +
> +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt, uint8_t word_number)
> +{
> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> +
> +   return xrc->write_nvt(xrtr, nvt_blk, nvt_idx, nvt, word_number);
> +}
> +
> +/*
> + * The thread context register words are in big-endian format.
> + */
> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
> +                                     bool cam_ignore, uint32_t logic_serv)
> +{
> +    uint32_t cam = xive_nvt_cam_line(nvt_blk, nvt_idx);
> +    uint32_t qw2w2 = xive_tctx_word2(&tctx->regs[TM_QW2_HV_POOL]);
> +    uint32_t qw1w2 = xive_tctx_word2(&tctx->regs[TM_QW1_OS]);
> +    uint32_t qw0w2 = xive_tctx_word2(&tctx->regs[TM_QW0_USER]);
> +
> +    /* TODO (PowerNV): ignore mode. The low order bits of the NVT
> +     * identifier are ignored in the "CAM" match.
> +     */
> +
> +    if (format == 0) {
> +        if (cam_ignore == true) {
> +            /* F=0 & i=1: Logical server notification (bits ignored at
> +             * the end of the NVT identifier)
> +             */
> +            qemu_log_mask(LOG_UNIMP, "XIVE: no support for LS NVT %x/%x\n",
> +                          nvt_blk, nvt_idx);
> +             return -1;
> +        }
> +
> +        /* F=0 & i=0: Specific NVT notification */
> +
> +        /* TODO (PowerNV) : PHYS ring */
> +
> +        /* HV POOL ring */
> +        if ((be32_to_cpu(qw2w2) & TM_QW2W2_VP) &&
> +            cam == GETFIELD_BE32(TM_QW2W2_POOL_CAM, qw2w2)) {
> +            return TM_QW2_HV_POOL;
> +        }
> +
> +        /* OS ring */
> +        if ((be32_to_cpu(qw1w2) & TM_QW1W2_VO) &&
> +            cam == GETFIELD_BE32(TM_QW1W2_OS_CAM, qw1w2)) {
> +            return TM_QW1_OS;
> +        }
> +    } else {
> +        /* F=1 : User level Event-Based Branch (EBB) notification */
> +
> +        /* USER ring */
> +        if  ((be32_to_cpu(qw1w2) & TM_QW1W2_VO) &&
> +             (cam == GETFIELD_BE32(TM_QW1W2_OS_CAM, qw1w2)) &&
> +             (be32_to_cpu(qw0w2) & TM_QW0W2_VU) &&
> +             (logic_serv == GETFIELD_BE32(TM_QW0W2_LOGIC_SERV, qw0w2))) {
> +            return TM_QW0_USER;
> +        }
> +    }
> +    return -1;
> +}
> +
> +typedef struct XiveTCTXMatch {
> +    XiveTCTX *tctx;
> +    uint8_t ring;
> +} XiveTCTXMatch;
> +
> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> +                                 bool cam_ignore, uint8_t priority,
> +                                 uint32_t logic_serv, XiveTCTXMatch *match)
> +{
> +    CPUState *cs;
> +
> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
> +     * hardwired CAM compares */
> +
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> +        int ring;
> +
> +        /*
> +         * HW checks that the CPU is enabled in the Physical Thread
> +         * Enable Register (PTER).
> +         */
> +
> +        /*
> +         * Check the thread context CAM lines and record matches. We
> +         * will handle CPU exception delivery later
> +         */
> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
> +                                         cam_ignore, logic_serv);
> +        /*
> +         * Save the context and follow on to catch duplicates, that we
> +         * don't support yet.
> +         */
> +        if (ring != -1) {
> +            if (match->tctx) {
> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
> +                return false;
> +            }
> +
> +            match->ring = ring;
> +            match->tctx = tctx;
> +        }
> +    }
> +
> +    if (!match->tctx) {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
> +                      nvt_blk, nvt_idx);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +/*
> + * This is our simple Xive Presenter Engine model. It is merged in the
> + * Router as it does not require an extra object.
> + *
> + * It receives notification requests sent by the IVRE to find one
> + * matching NVT (or more) dispatched on the processor threads. In case
> + * of a single NVT notification, the process is abreviated and the
> + * thread is signaled if a match is found. In case of a logical server
> + * notification (bits ignored at the end of the NVT identifier), the
> + * IVPE and IVRE select a winning thread using different filters. This
> + * involves 2 or 3 exchanges on the PowerBus that the model does not
> + * support.
> + *
> + * The parameters represent what is sent on the PowerBus
> + */
> +static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
> +                                  uint8_t nvt_blk, uint32_t nvt_idx,
> +                                  bool cam_ignore, uint8_t priority,
> +                                  uint32_t logic_serv)
> +{
> +    XiveNVT nvt;
> +    XiveTCTXMatch match = { .tctx = NULL, .ring = 0 };
> +    bool found;
> +
> +    /* NVT cache lookup */
> +    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
> +                      nvt_blk, nvt_idx);
> +        return;
> +    }
> +
> +    if (!xive_nvt_is_valid(&nvt)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
> +                      nvt_blk, nvt_idx);
> +        return;
> +    }
> +
> +    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
> +                                 priority, logic_serv, &match);
> +    if (found) {
> +        return;
> +    }
> +
> +    /* If no matching NVT is dispatched on a HW thread :
> +     * - update the NVT structure if backlog is activated
> +     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
> +     *   activated
> +     */
> +}
> +
>  /*
>   * An END trigger can come from an event trigger (IPI or HW) or from
>   * another chip. We don't model the PowerBus but the END trigger
> @@ -1052,6 +1229,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>      /*
>       * Follows IVPE notification
>       */
> +    xive_presenter_notify(xrtr, format,
> +                          GETFIELD_BE32(END_W6_NVT_BLOCK, end.w6),
> +                          GETFIELD_BE32(END_W6_NVT_INDEX, end.w6),
> +                          GETFIELD_BE32(END_W7_F0_IGNORE, end.w7),
> +                          priority,
> +                          GETFIELD_BE32(END_W7_F1_LOG_SERVER_ID, end.w7));
> +
> +    /* TODO: Auto EOI. */
>  }
>  
>  static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 05/19] spapr/xive: introduce a XIVE interrupt controller
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 05/19] spapr/xive: introduce a XIVE interrupt controller Cédric Le Goater
@ 2018-12-10  4:36   ` David Gibson
  0 siblings, 0 replies; 61+ messages in thread
From: David Gibson @ 2018-12-10  4:36 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 16805 bytes --]

On Sun, Dec 09, 2018 at 08:45:56PM +0100, Cédric Le Goater wrote:
> sPAPRXive models the XIVE interrupt controller of the sPAPR machine.
> It inherits from the XiveRouter and provisions storage for the routing
> tables :
> 
>   - Event Assignment Structure (EAS)
>   - Event Notification Descriptor (END)
> 
> The sPAPRXive model incorporates an internal XiveSource for the IPIs
> and for the interrupts of the virtual devices of the guest. This model
> is consistent with XIVE architecture which also incorporates an
> internal IVSE for IPIs and accelerator interrupts in the IVRE
> sub-engine.
> 
> The sPAPRXive model exports two memory regions, one for the ESB
> trigger and management pages used to control the sources and one for
> the TIMA pages. They are mapped by default at the addresses found on
> chip 0 of a baremetal system. This is also consistent with the XIVE
> architecture which defines a Virtualization Controller BAR for the
> internal IVSE ESB pages and a Thread Managment BAR for the TIMA.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Applied.

> ---
>  default-configs/ppc64-softmmu.mak |   1 +
>  include/hw/ppc/spapr_xive.h       |  45 ++++
>  hw/intc/spapr_xive.c              | 366 ++++++++++++++++++++++++++++++
>  hw/intc/Makefile.objs             |   1 +
>  4 files changed, 413 insertions(+)
>  create mode 100644 include/hw/ppc/spapr_xive.h
>  create mode 100644 hw/intc/spapr_xive.c
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 2d1e7c5c4668..7f34ad0528ed 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -17,6 +17,7 @@ CONFIG_XICS=$(CONFIG_PSERIES)
>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
>  CONFIG_XIVE=$(CONFIG_PSERIES)
> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_MEM_DEVICE=y
>  CONFIG_DIMM=y
>  CONFIG_SPAPR_RNG=y
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> new file mode 100644
> index 000000000000..f087959b9924
> --- /dev/null
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -0,0 +1,45 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE interrupt controller model
> + *
> + * Copyright (c) 2017-2018, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef PPC_SPAPR_XIVE_H
> +#define PPC_SPAPR_XIVE_H
> +
> +#include "hw/ppc/xive.h"
> +
> +#define TYPE_SPAPR_XIVE "spapr-xive"
> +#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
> +
> +typedef struct sPAPRXive {
> +    XiveRouter    parent;
> +
> +    /* Internal interrupt source for IPIs and virtual devices */
> +    XiveSource    source;
> +    hwaddr        vc_base;
> +
> +    /* END ESB MMIOs */
> +    XiveENDSource end_source;
> +    hwaddr        end_base;
> +
> +    /* Routing table */
> +    XiveEAS       *eat;
> +    uint32_t      nr_irqs;
> +    XiveEND       *endt;
> +    uint32_t      nr_ends;
> +
> +    /* TIMA mapping address */
> +    hwaddr        tm_base;
> +    MemoryRegion  tm_mmio;
> +} sPAPRXive;
> +
> +bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
> +bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> +qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
> +
> +#endif /* PPC_SPAPR_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> new file mode 100644
> index 000000000000..eef5830d45c6
> --- /dev/null
> +++ b/hw/intc/spapr_xive.c
> @@ -0,0 +1,366 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE interrupt controller model
> + *
> + * Copyright (c) 2017-2018, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "target/ppc/cpu.h"
> +#include "sysemu/cpus.h"
> +#include "monitor/monitor.h"
> +#include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_xive.h"
> +#include "hw/ppc/xive.h"
> +#include "hw/ppc/xive_regs.h"
> +
> +/*
> + * XIVE Virtualization Controller BAR and Thread Managment BAR that we
> + * use for the ESB pages and the TIMA pages
> + */
> +#define SPAPR_XIVE_VC_BASE   0x0006010000000000ull
> +#define SPAPR_XIVE_TM_BASE   0x0006030203180000ull
> +
> +/*
> + * On sPAPR machines, use a simplified output for the XIVE END
> + * structure dumping only the information related to the OS EQ.
> + */
> +static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end,
> +                                          Monitor *mon)
> +{
> +    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
> +    uint32_t qgen = GETFIELD_BE32(END_W1_GENERATION, end->w1);
> +    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
> +    uint32_t qentries = 1 << (qsize + 10);
> +    uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
> +    uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
> +
> +    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt,
> +                   priority, qindex, qentries, qgen);
> +
> +    xive_end_queue_pic_print_info(end, 6, mon);
> +    monitor_printf(mon, "]");
> +}
> +
> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> +{
> +    XiveSource *xsrc = &xive->source;
> +    int i;
> +
> +    monitor_printf(mon, "  LSIN         PQ    EISN     CPU/PRIO EQ\n");
> +
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        uint8_t pq = xive_source_esb_get(xsrc, i);
> +        XiveEAS *eas = &xive->eat[i];
> +
> +        if (!xive_eas_is_valid(eas)) {
> +            continue;
> +        }
> +
> +        monitor_printf(mon, "  %08x %s %c%c%c %s %08x ", i,
> +                       xive_source_irq_is_lsi(xsrc, i) ? "LSI" : "MSI",
> +                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
> +                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
> +                       xsrc->status[i] & XIVE_STATUS_ASSERTED ? 'A' : ' ',
> +                       xive_eas_is_masked(eas) ? "M" : " ",
> +                       (int) GETFIELD_BE64(EAS_END_DATA, eas->w));
> +
> +        if (!xive_eas_is_masked(eas)) {
> +            uint32_t end_idx = GETFIELD_BE64(EAS_END_INDEX, eas->w);
> +            XiveEND *end;
> +
> +            assert(end_idx < xive->nr_ends);
> +            end = &xive->endt[end_idx];
> +
> +            if (xive_end_is_valid(end)) {
> +                spapr_xive_end_pic_print_info(xive, end, mon);
> +            }
> +        }
> +        monitor_printf(mon, "\n");
> +    }
> +}
> +
> +static void spapr_xive_map_mmio(sPAPRXive *xive)
> +{
> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->vc_base);
> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1, xive->end_base);
> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
> +}
> +
> +static void spapr_xive_end_reset(XiveEND *end)
> +{
> +    memset(end, 0, sizeof(*end));
> +
> +    /* switch off the escalation and notification ESBs */
> +    end->w1 = cpu_to_be32(END_W1_ESe_Q | END_W1_ESn_Q);
> +}
> +
> +static void spapr_xive_reset(void *dev)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +    int i;
> +
> +    /*
> +     * The XiveSource has its own reset handler, which mask off all
> +     * IRQs (!P|Q)
> +     */
> +
> +    /* Mask all valid EASs in the IRQ number space. */
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        XiveEAS *eas = &xive->eat[i];
> +        if (xive_eas_is_valid(eas)) {
> +            eas->w = cpu_to_be64(EAS_VALID | EAS_MASKED);
> +        } else {
> +            eas->w = 0;
> +        }
> +    }
> +
> +    /* Clear all ENDs */
> +    for (i = 0; i < xive->nr_ends; i++) {
> +        spapr_xive_end_reset(&xive->endt[i]);
> +    }
> +}
> +
> +static void spapr_xive_instance_init(Object *obj)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(obj);
> +
> +    object_initialize(&xive->source, sizeof(xive->source), TYPE_XIVE_SOURCE);
> +    object_property_add_child(obj, "source", OBJECT(&xive->source), NULL);
> +
> +    object_initialize(&xive->end_source, sizeof(xive->end_source),
> +                      TYPE_XIVE_END_SOURCE);
> +    object_property_add_child(obj, "end_source", OBJECT(&xive->end_source),
> +                              NULL);
> +}
> +
> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +    XiveSource *xsrc = &xive->source;
> +    XiveENDSource *end_xsrc = &xive->end_source;
> +    Error *local_err = NULL;
> +
> +    if (!xive->nr_irqs) {
> +        error_setg(errp, "Number of interrupt needs to be greater 0");
> +        return;
> +    }
> +
> +    if (!xive->nr_ends) {
> +        error_setg(errp, "Number of interrupt needs to be greater 0");
> +        return;
> +    }
> +
> +    /*
> +     * Initialize the internal sources, for IPIs and virtual devices.
> +     */
> +    object_property_set_int(OBJECT(xsrc), xive->nr_irqs, "nr-irqs",
> +                            &error_fatal);
> +    object_property_add_const_link(OBJECT(xsrc), "xive", OBJECT(xive),
> +                                   &error_fatal);
> +    object_property_set_bool(OBJECT(xsrc), true, "realized", &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /*
> +     * Initialize the END ESB source
> +     */
> +    object_property_set_int(OBJECT(end_xsrc), xive->nr_irqs, "nr-ends",
> +                            &error_fatal);
> +    object_property_add_const_link(OBJECT(end_xsrc), "xive", OBJECT(xive),
> +                                   &error_fatal);
> +    object_property_set_bool(OBJECT(end_xsrc), true, "realized", &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /* Set the mapping address of the END ESB pages after the source ESBs */
> +    xive->end_base = xive->vc_base + (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
> +
> +    /*
> +     * Allocate the routing tables
> +     */
> +    xive->eat = g_new0(XiveEAS, xive->nr_irqs);
> +    xive->endt = g_new0(XiveEND, xive->nr_ends);
> +
> +    /* TIMA initialization */
> +    memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
> +                          "xive.tima", 4ull << TM_SHIFT);
> +
> +    /* Define all XIVE MMIO regions on SysBus */
> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
> +
> +    /* Map all regions */
> +    spapr_xive_map_mmio(xive);
> +
> +    qemu_register_reset(spapr_xive_reset, dev);
> +}
> +
> +static int spapr_xive_get_eas(XiveRouter *xrtr, uint8_t eas_blk,
> +                              uint32_t eas_idx, XiveEAS *eas)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(xrtr);
> +
> +    if (eas_idx >= xive->nr_irqs) {
> +        return -1;
> +    }
> +
> +    *eas = xive->eat[eas_idx];
> +    return 0;
> +}
> +
> +static int spapr_xive_get_end(XiveRouter *xrtr,
> +                              uint8_t end_blk, uint32_t end_idx, XiveEND *end)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(xrtr);
> +
> +    if (end_idx >= xive->nr_ends) {
> +        return -1;
> +    }
> +
> +    memcpy(end, &xive->endt[end_idx], sizeof(XiveEND));
> +    return 0;
> +}
> +
> +static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk,
> +                                uint32_t end_idx, XiveEND *end,
> +                                uint8_t word_number)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(xrtr);
> +
> +    if (end_idx >= xive->nr_ends) {
> +        return -1;
> +    }
> +
> +    memcpy(&xive->endt[end_idx], end, sizeof(XiveEND));
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_spapr_xive_end = {
> +    .name = TYPE_SPAPR_XIVE "/end",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT32(w0, XiveEND),
> +        VMSTATE_UINT32(w1, XiveEND),
> +        VMSTATE_UINT32(w2, XiveEND),
> +        VMSTATE_UINT32(w3, XiveEND),
> +        VMSTATE_UINT32(w4, XiveEND),
> +        VMSTATE_UINT32(w5, XiveEND),
> +        VMSTATE_UINT32(w6, XiveEND),
> +        VMSTATE_UINT32(w7, XiveEND),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_spapr_xive_eas = {
> +    .name = TYPE_SPAPR_XIVE "/eas",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT64(w, XiveEAS),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_spapr_xive = {
> +    .name = TYPE_SPAPR_XIVE,
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
> +        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(eat, sPAPRXive, nr_irqs,
> +                                     vmstate_spapr_xive_eas, XiveEAS),
> +        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(endt, sPAPRXive, nr_ends,
> +                                             vmstate_spapr_xive_end, XiveEND),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static Property spapr_xive_properties[] = {
> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
> +    DEFINE_PROP_UINT32("nr-ends", sPAPRXive, nr_ends, 0),
> +    DEFINE_PROP_UINT64("vc-base", sPAPRXive, vc_base, SPAPR_XIVE_VC_BASE),
> +    DEFINE_PROP_UINT64("tm-base", sPAPRXive, tm_base, SPAPR_XIVE_TM_BASE),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void spapr_xive_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    XiveRouterClass *xrc = XIVE_ROUTER_CLASS(klass);
> +
> +    dc->desc    = "sPAPR XIVE Interrupt Controller";
> +    dc->props   = spapr_xive_properties;
> +    dc->realize = spapr_xive_realize;
> +    dc->vmsd    = &vmstate_spapr_xive;
> +
> +    xrc->get_eas = spapr_xive_get_eas;
> +    xrc->get_end = spapr_xive_get_end;
> +    xrc->write_end = spapr_xive_write_end;
> +}
> +
> +static const TypeInfo spapr_xive_info = {
> +    .name = TYPE_SPAPR_XIVE,
> +    .parent = TYPE_XIVE_ROUTER,
> +    .instance_init = spapr_xive_instance_init,
> +    .instance_size = sizeof(sPAPRXive),
> +    .class_init = spapr_xive_class_init,
> +};
> +
> +static void spapr_xive_register_types(void)
> +{
> +    type_register_static(&spapr_xive_info);
> +}
> +
> +type_init(spapr_xive_register_types)
> +
> +bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi)
> +{
> +    XiveSource *xsrc = &xive->source;
> +
> +    if (lisn >= xive->nr_irqs) {
> +        return false;
> +    }
> +
> +    xive->eat[lisn].w |= cpu_to_be64(EAS_VALID);
> +    xive_source_irq_set(xsrc, lisn, lsi);
> +    return true;
> +}
> +
> +bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn)
> +{
> +    XiveSource *xsrc = &xive->source;
> +
> +    if (lisn >= xive->nr_irqs) {
> +        return false;
> +    }
> +
> +    xive->eat[lisn].w &= cpu_to_be64(~EAS_VALID);
> +    xive_source_irq_set(xsrc, lisn, false);
> +    return true;
> +}
> +
> +qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn)
> +{
> +    XiveSource *xsrc = &xive->source;
> +
> +    if (lisn >= xive->nr_irqs) {
> +        return NULL;
> +    }
> +
> +    /* The sPAPR machine/device should have claimed the IRQ before */
> +    assert(xive_eas_is_valid(&xive->eat[lisn]));
> +
> +    return xive_source_qirq(xsrc, lisn);
> +}
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index 72a46ed91c31..301a8e972d91 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -38,6 +38,7 @@ obj-$(CONFIG_XICS) += xics.o
>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
>  obj-$(CONFIG_XIVE) += xive.o
> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>  obj-$(CONFIG_S390_FLIC) += s390_flic.o

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 06/19] spapr/xive: use the VCPU id as a NVT identifier
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 06/19] spapr/xive: use the VCPU id as a NVT identifier Cédric Le Goater
@ 2018-12-10  4:42   ` David Gibson
  0 siblings, 0 replies; 61+ messages in thread
From: David Gibson @ 2018-12-10  4:42 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 4744 bytes --]

On Sun, Dec 09, 2018 at 08:45:57PM +0100, Cédric Le Goater wrote:
> The IVPE scans the O/S CAM line of the XIVE thread interrupt contexts
> to find a matching Notification Virtual Target (NVT) among the NVTs
> dispatched on the HW processor threads.
> 
> On a real system, the thread interrupt contexts are updated by the
> hypervisor when a Virtual Processor is scheduled to run on a HW
> thread. Under QEMU, the model will emulate the same behavior by
> hardwiring the NVT identifier in the thread context registers at
> reset.
> 
> The NVT identifier used by the sPAPRXive model is the VCPU id. The END
> identifier is also derived from the VCPU id. A set of helpers doing
> the conversion between identifiers are provided for the hcalls
> configuring the sources and the ENDs.
> 
> The model does not need a NVT table but the XiveRouter NVT operations
> are provided to perform some extra checks in the routing algorithm.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Applied.

> ---
> 
>  Changes since v6:
> 
>  - simplified the prototypes of helpers
>  - introduced an assert in set_nvt() method
> 
>  hw/intc/spapr_xive.c | 56 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index eef5830d45c6..3ade419fdbb1 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -26,6 +26,26 @@
>  #define SPAPR_XIVE_VC_BASE   0x0006010000000000ull
>  #define SPAPR_XIVE_TM_BASE   0x0006030203180000ull
>  
> +/*
> + * The allocation of VP blocks is a complex operation in OPAL and the
> + * VP identifiers have a relation with the number of HW chips, the
> + * size of the VP blocks, VP grouping, etc. The QEMU sPAPR XIVE
> + * controller model does not have the same constraints and can use a
> + * simple mapping scheme of the CPU vcpu_id
> + *
> + * These identifiers are never returned to the OS.
> + */
> +
> +#define SPAPR_XIVE_NVT_BASE 0x400
> +
> +/*
> + * sPAPR NVT and END indexing helpers
> + */
> +static uint32_t spapr_xive_nvt_to_target(uint8_t nvt_blk, uint32_t nvt_idx)
> +{
> +    return nvt_idx - SPAPR_XIVE_NVT_BASE;
> +}
> +
>  /*
>   * On sPAPR machines, use a simplified output for the XIVE END
>   * structure dumping only the information related to the OS EQ.
> @@ -40,7 +60,8 @@ static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end,
>      uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
>      uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
>  
> -    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt,
> +    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d",
> +                   spapr_xive_nvt_to_target(0, nvt),
>                     priority, qindex, qentries, qgen);
>  
>      xive_end_queue_pic_print_info(end, 6, mon);
> @@ -246,6 +267,37 @@ static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk,
>      return 0;
>  }
>  
> +static int spapr_xive_get_nvt(XiveRouter *xrtr,
> +                              uint8_t nvt_blk, uint32_t nvt_idx, XiveNVT *nvt)
> +{
> +    uint32_t vcpu_id = spapr_xive_nvt_to_target(nvt_blk, nvt_idx);
> +    PowerPCCPU *cpu = spapr_find_cpu(vcpu_id);
> +
> +    if (!cpu) {
> +        /* TODO: should we assert() if we can find a NVT ? */
> +        return -1;
> +    }
> +
> +    /*
> +     * sPAPR does not maintain a NVT table. Return that the NVT is
> +     * valid if we have found a matching CPU
> +     */
> +    nvt->w0 = cpu_to_be32(NVT_W0_VALID);
> +    return 0;
> +}
> +
> +static int spapr_xive_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk,
> +                                uint32_t nvt_idx, XiveNVT *nvt,
> +                                uint8_t word_number)
> +{
> +    /*
> +     * We don't need to write back to the NVTs because the sPAPR
> +     * machine should never hit a non-scheduled NVT. It should never
> +     * get called.
> +     */
> +    g_assert_not_reached();
> +}
> +
>  static const VMStateDescription vmstate_spapr_xive_end = {
>      .name = TYPE_SPAPR_XIVE "/end",
>      .version_id = 1,
> @@ -308,6 +360,8 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
>      xrc->get_eas = spapr_xive_get_eas;
>      xrc->get_end = spapr_xive_get_end;
>      xrc->write_end = spapr_xive_write_end;
> +    xrc->get_nvt = spapr_xive_get_nvt;
> +    xrc->write_nvt = spapr_xive_write_nvt;
>  }
>  
>  static const TypeInfo spapr_xive_info = {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 07/19] spapr: introduce a new machine IRQ backend for XIVE
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 07/19] spapr: introduce a new machine IRQ backend for XIVE Cédric Le Goater
@ 2018-12-10  4:45   ` David Gibson
  0 siblings, 0 replies; 61+ messages in thread
From: David Gibson @ 2018-12-10  4:45 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 6830 bytes --]

On Sun, Dec 09, 2018 at 08:45:58PM +0100, Cédric Le Goater wrote:
> The XIVE IRQ backend uses the same layout as the new XICS backend but
> covers the full range of the IRQ number space. The IRQ numbers for the
> CPU IPIs are allocated at the bottom of this space, below 4K, to
> preserve compatibility with XICS which does not use that range.
> 
> This should be enough given that the maximum number of CPUs is 1024
> for the sPAPR machine under QEMU. For the record, the biggest POWER8
> or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192
> cores, SMT8).
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

WIth the exception of the TODO noted below.

> ---
>  include/hw/ppc/spapr.h     |   2 +
>  include/hw/ppc/spapr_irq.h |   2 +
>  hw/ppc/spapr_irq.c         | 113 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 117 insertions(+)
> 
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 198764066dc9..cb3082d319af 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -16,6 +16,7 @@ typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>  typedef struct sPAPREventSource sPAPREventSource;
>  typedef struct sPAPRPendingHPT sPAPRPendingHPT;
>  typedef struct ICSState ICSState;
> +typedef struct sPAPRXive sPAPRXive;
>  
>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
>  #define SPAPR_ENTRY_POINT       0x100
> @@ -175,6 +176,7 @@ struct sPAPRMachineState {
>      const char *icp_type;
>      int32_t irq_map_nr;
>      unsigned long *irq_map;
> +    sPAPRXive  *xive;
>  
>      bool cmd_line_caps[SPAPR_CAP_NUM];
>      sPAPRCapabilities def, eff, mig;
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index bd7301e6d9c6..23cdb51b879e 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -13,6 +13,7 @@
>  /*
>   * IRQ range offsets per device type
>   */
> +#define SPAPR_IRQ_IPI        0x0
>  #define SPAPR_IRQ_EPOW       0x1000  /* XICS_IRQ_BASE offset */
>  #define SPAPR_IRQ_HOTPLUG    0x1001
>  #define SPAPR_IRQ_VIO        0x1100  /* 256 VIO devices */
> @@ -42,6 +43,7 @@ typedef struct sPAPRIrq {
>  
>  extern sPAPRIrq spapr_irq_xics;
>  extern sPAPRIrq spapr_irq_xics_legacy;
> +extern sPAPRIrq spapr_irq_xive;
>  
>  void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
>  int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index f8b651de0ec9..0bf47ff9fa26 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -12,6 +12,7 @@
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
>  #include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_xive.h"
>  #include "hw/ppc/xics.h"
>  #include "sysemu/kvm.h"
>  
> @@ -205,6 +206,118 @@ sPAPRIrq spapr_irq_xics = {
>      .print_info  = spapr_irq_print_info_xics,
>  };
>  
> +/*
> + * XIVE IRQ backend.
> + */
> +static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
> +                                    int nr_servers, Error **errp)
> +{
> +    sPAPRXive *xive;
> +    Error *local_err = NULL;
> +    Object *obj;
> +    uint32_t nr_ends = nr_servers << 3; /* 8 priority ENDs per CPU */
> +    int i;
> +
> +    /* TODO : use qdev_create() ? */

Ok, still waiting on this todo.

> +    obj = object_new(TYPE_SPAPR_XIVE);
> +    object_property_set_int(obj, nr_irqs, "nr-irqs", &error_abort);
> +    object_property_set_int(obj, nr_ends, "nr-ends", &error_abort);
> +    object_property_set_bool(obj, true, "realized", &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +    qdev_set_parent_bus(DEVICE(obj), sysbus_get_default());
> +    xive = SPAPR_XIVE(obj);
> +
> +    /* Enable the CPU IPIs */
> +    for (i = 0; i < nr_servers; ++i) {
> +        spapr_xive_irq_claim(xive, SPAPR_IRQ_IPI + i, false);
> +    }
> +
> +    return xive;
> +}
> +
> +static void spapr_irq_init_xive(sPAPRMachineState *spapr, Error **errp)
> +{
> +    MachineState *machine = MACHINE(spapr);
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +    int nr_irqs = smc->irq->nr_irqs;
> +    Error *local_err = NULL;
> +
> +    /* KVM XIVE device not yet available */
> +    if (kvm_enabled()) {
> +        if (machine_kernel_irqchip_required(machine)) {
> +            error_setg(errp, "kernel_irqchip requested. no KVM XIVE support");
> +            return;
> +        }
> +    }
> +
> +    spapr->xive = spapr_xive_create(spapr, nr_irqs,
> +                                    spapr_max_server_number(spapr), &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +}
> +
> +static int spapr_irq_claim_xive(sPAPRMachineState *spapr, int irq, bool lsi,
> +                                Error **errp)
> +{
> +    if (!spapr_xive_irq_claim(spapr->xive, irq, lsi)) {
> +        error_setg(errp, "IRQ %d is invalid", irq);
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +static void spapr_irq_free_xive(sPAPRMachineState *spapr, int irq, int num)
> +{
> +    int i;
> +
> +    for (i = irq; i < irq + num; ++i) {
> +        spapr_xive_irq_free(spapr->xive, i);
> +    }
> +}
> +
> +static qemu_irq spapr_qirq_xive(sPAPRMachineState *spapr, int irq)
> +{
> +    return spapr_xive_qirq(spapr->xive, irq);
> +}
> +
> +static void spapr_irq_print_info_xive(sPAPRMachineState *spapr,
> +                                      Monitor *mon)
> +{
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +        xive_tctx_pic_print_info(XIVE_TCTX(cpu->intc), mon);
> +    }
> +
> +    spapr_xive_pic_print_info(spapr->xive, mon);
> +}
> +
> +/*
> + * XIVE uses the full IRQ number space. Set it to 8K to be compatible
> + * with XICS.
> + */
> +
> +#define SPAPR_IRQ_XIVE_NR_IRQS     0x2000
> +#define SPAPR_IRQ_XIVE_NR_MSIS     (SPAPR_IRQ_XIVE_NR_IRQS - SPAPR_IRQ_MSI)
> +
> +sPAPRIrq spapr_irq_xive = {
> +    .nr_irqs     = SPAPR_IRQ_XIVE_NR_IRQS,
> +    .nr_msis     = SPAPR_IRQ_XIVE_NR_MSIS,
> +
> +    .init        = spapr_irq_init_xive,
> +    .claim       = spapr_irq_claim_xive,
> +    .free        = spapr_irq_free_xive,
> +    .qirq        = spapr_qirq_xive,
> +    .print_info  = spapr_irq_print_info_xive,
> +};
> +
>  /*
>   * sPAPR IRQ frontend routines for devices
>   */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 08/19] spapr: add hcalls support for the XIVE exploitation interrupt mode
  2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 08/19] spapr: add hcalls support for the XIVE exploitation interrupt mode Cédric Le Goater
@ 2018-12-10  6:34   ` David Gibson
  0 siblings, 0 replies; 61+ messages in thread
From: David Gibson @ 2018-12-10  6:34 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 38932 bytes --]

On Sun, Dec 09, 2018 at 08:45:59PM +0100, Cédric Le Goater wrote:
> The different XIVE virtualization structures (sources and event queues)
> are configured with a set of Hypervisor calls :
> 
>  - H_INT_GET_SOURCE_INFO
> 
>    used to obtain the address of the MMIO page of the Event State
>    Buffer (ESB) entry associated with the source.
> 
>  - H_INT_SET_SOURCE_CONFIG
> 
>    assigns a source to a "target".
> 
>  - H_INT_GET_SOURCE_CONFIG
> 
>    determines which "target" and "priority" is assigned to a source
> 
>  - H_INT_GET_QUEUE_INFO
> 
>    returns the address of the notification management page associated
>    with the specified "target" and "priority".
> 
>  - H_INT_SET_QUEUE_CONFIG
> 
>    sets or resets the event queue for a given "target" and "priority".
>    It is also used to set the notification configuration associated
>    with the queue, only unconditional notification is supported for
>    the moment. Reset is performed with a queue size of 0 and queueing
>    is disabled in that case.
> 
>  - H_INT_GET_QUEUE_CONFIG
> 
>    returns the queue settings for a given "target" and "priority".
> 
>  - H_INT_RESET
> 
>    resets all of the guest's internal interrupt structures to their
>    initial state, losing all configuration set via the hcalls
>    H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
> 
>  - H_INT_SYNC
> 
>    issue a synchronisation on a source to make sure all notifications
>    have reached their queue.
> 
> Calls that still need to be addressed :
> 
>    H_INT_SET_OS_REPORTING_LINE
>    H_INT_GET_OS_REPORTING_LINE
> 
> See the code for more documentation on each hcall.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> 
>  Changes since v6:
> 
>  - simplified the prototypes of helpers
>  - introduced a fixed value for the controller block id value.
>  
>  include/hw/ppc/spapr.h      |  15 +-
>  include/hw/ppc/spapr_xive.h |   4 +
>  hw/intc/spapr_xive.c        | 963 ++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_irq.c          |   2 +
>  4 files changed, 983 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index cb3082d319af..6bf028a02fe2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -452,7 +452,20 @@ struct sPAPRMachineState {
>  #define H_INVALIDATE_PID        0x378
>  #define H_REGISTER_PROC_TBL     0x37C
>  #define H_SIGNAL_SYS_RESET      0x380
> -#define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
> +
> +#define H_INT_GET_SOURCE_INFO   0x3A8
> +#define H_INT_SET_SOURCE_CONFIG 0x3AC
> +#define H_INT_GET_SOURCE_CONFIG 0x3B0
> +#define H_INT_GET_QUEUE_INFO    0x3B4
> +#define H_INT_SET_QUEUE_CONFIG  0x3B8
> +#define H_INT_GET_QUEUE_CONFIG  0x3BC
> +#define H_INT_SET_OS_REPORTING_LINE 0x3C0
> +#define H_INT_GET_OS_REPORTING_LINE 0x3C4
> +#define H_INT_ESB               0x3C8
> +#define H_INT_SYNC              0x3CC
> +#define H_INT_RESET             0x3D0
> +
> +#define MAX_HCALL_OPCODE        H_INT_RESET
>  
>  /* The hcalls above are standardized in PAPR and implemented by pHyp
>   * as well.
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index f087959b9924..9506a8f4d10a 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -42,4 +42,8 @@ bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>  qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
>  
> +typedef struct sPAPRMachineState sPAPRMachineState;
> +
> +void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> +
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 3ade419fdbb1..982ac6e17051 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -38,6 +38,13 @@
>  
>  #define SPAPR_XIVE_NVT_BASE 0x400
>  
> +/*
> + * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
> + * to the controller block id value. It can nevertheless be changed
> + * for testing purpose.
> + */
> +#define SPAPR_XIVE_BLOCK_ID 0x0
> +
>  /*
>   * sPAPR NVT and END indexing helpers
>   */
> @@ -46,6 +53,64 @@ static uint32_t spapr_xive_nvt_to_target(uint8_t nvt_blk, uint32_t nvt_idx)
>      return nvt_idx - SPAPR_XIVE_NVT_BASE;
>  }
>  
> +static void spapr_xive_cpu_to_nvt(PowerPCCPU *cpu,
> +                                  uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
> +{
> +    assert(cpu);
> +
> +    if (out_nvt_blk) {
> +        *out_nvt_blk = SPAPR_XIVE_BLOCK_ID;
> +    }
> +
> +    if (out_nvt_blk) {
> +        *out_nvt_idx = SPAPR_XIVE_NVT_BASE + cpu->vcpu_id;
> +    }
> +}
> +
> +static int spapr_xive_target_to_nvt(uint32_t target,
> +                                    uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
> +{
> +    PowerPCCPU *cpu = spapr_find_cpu(target);
> +
> +    if (!cpu) {
> +        return -1;
> +    }
> +
> +    spapr_xive_cpu_to_nvt(cpu, out_nvt_blk, out_nvt_idx);
> +    return 0;
> +}
> +
> +/*
> + * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
> + * priorities per CPU
> + */
> +static void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
> +                                  uint8_t *out_end_blk, uint32_t *out_end_idx)
> +{
> +    assert(cpu);
> +
> +    if (out_end_blk) {
> +        *out_end_blk = SPAPR_XIVE_BLOCK_ID;
> +    }
> +
> +    if (out_end_idx) {
> +        *out_end_idx = (cpu->vcpu_id << 3) + prio;
> +    }
> +}
> +
> +static int spapr_xive_target_to_end(uint32_t target, uint8_t prio,
> +                                    uint8_t *out_end_blk, uint32_t *out_end_idx)
> +{
> +    PowerPCCPU *cpu = spapr_find_cpu(target);
> +
> +    if (!cpu) {
> +        return -1;
> +    }
> +
> +    spapr_xive_cpu_to_end(cpu, prio, out_end_blk, out_end_idx);
> +    return 0;
> +}
> +
>  /*
>   * On sPAPR machines, use a simplified output for the XIVE END
>   * structure dumping only the information related to the OS EQ.
> @@ -418,3 +483,901 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn)
>  
>      return xive_source_qirq(xsrc, lisn);
>  }
> +
> +/*
> + * XIVE hcalls
> + *
> + * The terminology used by the XIVE hcalls is the following :
> + *
> + *   TARGET vCPU number
> + *   EQ     Event Queue assigned by OS to receive event data
> + *   ESB    page for source interrupt management
> + *   LISN   Logical Interrupt Source Number identifying a source in the
> + *          machine
> + *   EISN   Effective Interrupt Source Number used by guest OS to
> + *          identify source in the guest
> + *
> + * The EAS, END, NVT structures are not exposed.
> + */
> +
> +/*
> + * Linux hosts under OPAL reserve priority 7 for their own escalation
> + * interrupts (DD2.X POWER9). So we only allow the guest to use
> + * priorities [0..6].
> + */
> +static bool spapr_xive_priority_is_reserved(uint8_t priority)
> +{
> +    switch (priority) {
> +    case 0 ... 6:
> +        return false;
> +    case 7: /* OPAL escalation queue */
> +    default:
> +        return true;
> +    }
> +}
> +
> +/*
> + * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
> + * real address of the MMIO page through which the Event State Buffer
> + * entry associated with the value of the "lisn" parameter is managed.
> + *
> + * Parameters:
> + * Input
> + * - R4: "flags"
> + *         Bits 0-63 reserved
> + * - R5: "lisn" is per "interrupts", "interrupt-map", or
> + *       "ibm,xive-lisn-ranges" properties, or as returned by the
> + *       ibm,query-interrupt-source-number RTAS call, or as returned
> + *       by the H_ALLOCATE_VAS_WINDOW hcall
> + *
> + * Output
> + * - R4: "flags"
> + *         Bits 0-59: Reserved
> + *         Bit 60: H_INT_ESB must be used for Event State Buffer
> + *                 management
> + *         Bit 61: 1 == LSI  0 == MSI
> + *         Bit 62: the full function page supports trigger
> + *         Bit 63: Store EOI Supported
> + * - R5: Logical Real address of full function Event State Buffer
> + *       management page, -1 if H_INT_ESB hcall flag is set to 1.
> + * - R6: Logical Real Address of trigger only Event State Buffer
> + *       management page or -1.
> + * - R7: Power of 2 page size for the ESB management pages returned in
> + *       R5 and R6.
> + */
> +
> +#define SPAPR_XIVE_SRC_H_INT_ESB     PPC_BIT(60) /* ESB manage with H_INT_ESB */
> +#define SPAPR_XIVE_SRC_LSI           PPC_BIT(61) /* Virtual LSI type */
> +#define SPAPR_XIVE_SRC_TRIGGER       PPC_BIT(62) /* Trigger and management
> +                                                    on same page */
> +#define SPAPR_XIVE_SRC_STORE_EOI     PPC_BIT(63) /* Store EOI support */
> +
> +static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
> +                                          sPAPRMachineState *spapr,
> +                                          target_ulong opcode,
> +                                          target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    XiveSource *xsrc = &xive->source;
> +    target_ulong flags  = args[0];
> +    target_ulong lisn   = args[1];
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (lisn >= xive->nr_irqs) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    if (!xive_eas_is_valid(&xive->eat[lisn])) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    /* All sources are emulated under the main XIVE object and share
> +     * the same characteristics.
> +     */
> +    args[0] = 0;
> +    if (!xive_source_esb_has_2page(xsrc)) {
> +        args[0] |= SPAPR_XIVE_SRC_TRIGGER;
> +    }
> +    if (xsrc->esb_flags & XIVE_SRC_STORE_EOI) {
> +        args[0] |= SPAPR_XIVE_SRC_STORE_EOI;
> +    }
> +
> +    /*
> +     * Force the use of the H_INT_ESB hcall in case of an LSI
> +     * interrupt. This is necessary under KVM to re-trigger the
> +     * interrupt if the level is still asserted
> +     */
> +    if (xive_source_irq_is_lsi(xsrc, lisn)) {
> +        args[0] |= SPAPR_XIVE_SRC_H_INT_ESB | SPAPR_XIVE_SRC_LSI;
> +    }
> +
> +    if (!(args[0] & SPAPR_XIVE_SRC_H_INT_ESB)) {
> +        args[1] = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn);
> +    } else {
> +        args[1] = -1;
> +    }
> +
> +    if (xive_source_esb_has_2page(xsrc) &&
> +        !(args[0] & SPAPR_XIVE_SRC_H_INT_ESB)) {
> +        args[2] = xive->vc_base + xive_source_esb_page(xsrc, lisn);
> +    } else {
> +        args[2] = -1;
> +    }
> +
> +    if (xive_source_esb_has_2page(xsrc)) {
> +        args[3] = xsrc->esb_shift - 1;
> +    } else {
> +        args[3] = xsrc->esb_shift;
> +    }
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
> + * Interrupt Source to a target. The Logical Interrupt Source is
> + * designated with the "lisn" parameter and the target is designated
> + * with the "target" and "priority" parameters.  Upon return from the
> + * hcall(), no additional interrupts will be directed to the old EQ.
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-61: Reserved
> + *         Bit 62: set the "eisn" in the EAS
> + *         Bit 63: masks the interrupt source in the hardware interrupt
> + *       control structure. An interrupt masked by this mechanism will
> + *       be dropped, but it's source state bits will still be
> + *       set. There is no race-free way of unmasking and restoring the
> + *       source. Thus this should only be used in interrupts that are
> + *       also masked at the source, and only in cases where the
> + *       interrupt is not meant to be used for a large amount of time
> + *       because no valid target exists for it for example
> + * - R5: "lisn" is per "interrupts", "interrupt-map", or
> + *       "ibm,xive-lisn-ranges" properties, or as returned by the
> + *       ibm,query-interrupt-source-number RTAS call, or as returned by
> + *       the H_ALLOCATE_VAS_WINDOW hcall
> + * - R6: "target" is per "ibm,ppc-interrupt-server#s" or
> + *       "ibm,ppc-interrupt-gserver#s"
> + * - R7: "priority" is a valid priority not in
> + *       "ibm,plat-res-int-priorities"
> + * - R8: "eisn" is the guest EISN associated with the "lisn"
> + *
> + * Output:
> + * - None
> + */
> +
> +#define SPAPR_XIVE_SRC_SET_EISN PPC_BIT(62)
> +#define SPAPR_XIVE_SRC_MASK     PPC_BIT(63)
> +
> +static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
> +                                            sPAPRMachineState *spapr,
> +                                            target_ulong opcode,
> +                                            target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    XiveEAS eas, new_eas;
> +    target_ulong flags    = args[0];
> +    target_ulong lisn     = args[1];
> +    target_ulong target   = args[2];
> +    target_ulong priority = args[3];
> +    target_ulong eisn     = args[4];
> +    uint8_t end_blk;
> +    uint32_t end_idx;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags & ~(SPAPR_XIVE_SRC_SET_EISN | SPAPR_XIVE_SRC_MASK)) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (lisn >= xive->nr_irqs) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    eas = xive->eat[lisn];
> +    if (!xive_eas_is_valid(&eas)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    /* priority 0xff is used to reset the EAS */
> +    if (priority == 0xff) {
> +        new_eas.w = cpu_to_be64(EAS_VALID | EAS_MASKED);
> +        goto out;
> +    }
> +
> +    if (flags & SPAPR_XIVE_SRC_MASK) {
> +        new_eas.w = eas.w | cpu_to_be64(EAS_MASKED);
> +    } else {
> +        new_eas.w = eas.w & cpu_to_be64(~EAS_MASKED);
> +    }
> +
> +    if (spapr_xive_priority_is_reserved(priority)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
> +                      priority);
> +        return H_P4;
> +    }
> +
> +    /* Validate that "target" is part of the list of threads allocated
> +     * to the partition. For that, find the END corresponding to the
> +     * target.
> +     */
> +    if (spapr_xive_target_to_end(target, priority, &end_blk, &end_idx)) {
> +        return H_P3;
> +    }
> +
> +    new_eas.w = SETFIELD_BE64(EAS_END_BLOCK, new_eas.w, end_blk);
> +    new_eas.w = SETFIELD_BE64(EAS_END_INDEX, new_eas.w, end_idx);
> +
> +    if (flags & SPAPR_XIVE_SRC_SET_EISN) {
> +        new_eas.w = SETFIELD_BE64(EAS_END_DATA, new_eas.w, eisn);
> +    }
> +
> +out:
> +    xive->eat[lisn] = new_eas;
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine to which
> + * target/priority pair is assigned to the specified Logical Interrupt
> + * Source.
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-63 Reserved
> + * - R5: "lisn" is per "interrupts", "interrupt-map", or
> + *       "ibm,xive-lisn-ranges" properties, or as returned by the
> + *       ibm,query-interrupt-source-number RTAS call, or as
> + *       returned by the H_ALLOCATE_VAS_WINDOW hcall
> + *
> + * Output:
> + * - R4: Target to which the specified Logical Interrupt Source is
> + *       assigned
> + * - R5: Priority to which the specified Logical Interrupt Source is
> + *       assigned
> + * - R6: EISN for the specified Logical Interrupt Source (this will be
> + *       equivalent to the LISN if not changed by H_INT_SET_SOURCE_CONFIG)
> + */
> +static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
> +                                            sPAPRMachineState *spapr,
> +                                            target_ulong opcode,
> +                                            target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    target_ulong flags = args[0];
> +    target_ulong lisn = args[1];
> +    XiveEAS eas;
> +    XiveEND *end;
> +    uint8_t nvt_blk;
> +    uint32_t end_idx, nvt_idx;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (lisn >= xive->nr_irqs) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    eas = xive->eat[lisn];
> +    if (!xive_eas_is_valid(&eas)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    /* EAS_END_BLOCK is unused on sPAPR */
> +    end_idx = GETFIELD_BE64(EAS_END_INDEX, eas.w);
> +
> +    assert(end_idx < xive->nr_ends);
> +    end = &xive->endt[end_idx];
> +
> +    nvt_blk = GETFIELD_BE32(END_W6_NVT_BLOCK, end->w6);
> +    nvt_idx = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
> +    args[0] = spapr_xive_nvt_to_target(nvt_blk, nvt_idx);
> +
> +    if (xive_eas_is_masked(&eas)) {
> +        args[1] = 0xff;
> +    } else {
> +        args[1] = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
> +    }
> +
> +    args[2] = GETFIELD_BE64(EAS_END_DATA, eas.w);
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_GET_QUEUE_INFO hcall() is used to get the logical real
> + * address of the notification management page associated with the
> + * specified target and priority.
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-63 Reserved
> + * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
> + *       "ibm,ppc-interrupt-gserver#s"
> + * - R6: "priority" is a valid priority not in
> + *       "ibm,plat-res-int-priorities"
> + *
> + * Output:
> + * - R4: Logical real address of notification page
> + * - R5: Power of 2 page size of the notification page
> + */
> +static target_ulong h_int_get_queue_info(PowerPCCPU *cpu,
> +                                         sPAPRMachineState *spapr,
> +                                         target_ulong opcode,
> +                                         target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    XiveENDSource *end_xsrc = &xive->end_source;
> +    target_ulong flags = args[0];
> +    target_ulong target = args[1];
> +    target_ulong priority = args[2];
> +    XiveEND *end;
> +    uint8_t end_blk;
> +    uint32_t end_idx;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    if (spapr_xive_priority_is_reserved(priority)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
> +                      priority);
> +        return H_P3;
> +    }
> +
> +    /* Validate that "target" is part of the list of threads allocated
> +     * to the partition. For that, find the END corresponding to the
> +     * target.
> +     */
> +    if (spapr_xive_target_to_end(target, priority, &end_blk, &end_idx)) {
> +        return H_P2;
> +    }
> +
> +    assert(end_idx < xive->nr_ends);
> +    end = &xive->endt[end_idx];
> +
> +    args[0] = xive->end_base + (1ull << (end_xsrc->esb_shift + 1)) * end_idx;
> +    if (xive_end_is_enqueue(end)) {
> +        args[1] = GETFIELD_BE32(END_W0_QSIZE, end->w0) + 12;
> +    } else {
> +        args[1] = 0;
> +    }
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_SET_QUEUE_CONFIG hcall() is used to set or reset a EQ for
> + * a given "target" and "priority".  It is also used to set the
> + * notification config associated with the EQ.  An EQ size of 0 is
> + * used to reset the EQ config for a given target and priority. If
> + * resetting the EQ config, the END associated with the given "target"
> + * and "priority" will be changed to disable queueing.
> + *
> + * Upon return from the hcall(), no additional interrupts will be
> + * directed to the old EQ (if one was set). The old EQ (if one was
> + * set) should be investigated for interrupts that occurred prior to
> + * or during the hcall().
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-62: Reserved
> + *         Bit 63: Unconditional Notify (n) per the XIVE spec
> + * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
> + *       "ibm,ppc-interrupt-gserver#s"
> + * - R6: "priority" is a valid priority not in
> + *       "ibm,plat-res-int-priorities"
> + * - R7: "eventQueue": The logical real address of the start of the EQ
> + * - R8: "eventQueueSize": The power of 2 EQ size per "ibm,xive-eq-sizes"
> + *
> + * Output:
> + * - None
> + */
> +
> +#define SPAPR_XIVE_END_ALWAYS_NOTIFY PPC_BIT(63)
> +
> +static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
> +                                           sPAPRMachineState *spapr,
> +                                           target_ulong opcode,
> +                                           target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    target_ulong flags = args[0];
> +    target_ulong target = args[1];
> +    target_ulong priority = args[2];
> +    target_ulong qpage = args[3];
> +    target_ulong qsize = args[4];
> +    XiveEND end;
> +    uint8_t end_blk, nvt_blk;
> +    uint32_t end_idx, nvt_idx;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags & ~SPAPR_XIVE_END_ALWAYS_NOTIFY) {
> +        return H_PARAMETER;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    if (spapr_xive_priority_is_reserved(priority)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
> +                      priority);
> +        return H_P3;
> +    }
> +
> +    /* Validate that "target" is part of the list of threads allocated
> +     * to the partition. For that, find the END corresponding to the
> +     * target.
> +     */
> +
> +    if (spapr_xive_target_to_end(target, priority, &end_blk, &end_idx)) {
> +        return H_P2;
> +    }
> +
> +    assert(end_idx < xive->nr_ends);
> +    memcpy(&end, &xive->endt[end_idx], sizeof(XiveEND));
> +
> +    switch (qsize) {
> +    case 12:
> +    case 16:
> +    case 21:
> +    case 24:
> +        end.w2 = cpu_to_be32((qpage >> 32) & 0x0fffffff);
> +        end.w3 = cpu_to_be32(qpage & 0xffffffff);
> +        end.w0 |= cpu_to_be32(END_W0_ENQUEUE);
> +        end.w0 = SETFIELD_BE32(END_W0_QSIZE, end.w0, qsize - 12);
> +        break;
> +    case 0:
> +        /* reset queue and disable queueing */
> +        spapr_xive_end_reset(&end);
> +        goto out;
> +
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid EQ size %"PRIx64"\n",
> +                      qsize);
> +        return H_P5;
> +    }
> +
> +    if (qsize) {
> +        hwaddr plen = 1 << qsize;
> +        void *eq;
> +
> +        /*
> +         * Validate the guest EQ. We should also check that the queue
> +         * has been zeroed by the OS.
> +         */
> +        eq = address_space_map(CPU(cpu)->as, qpage, &plen, true,
> +                               MEMTXATTRS_UNSPECIFIED);
> +        if (plen != 1 << qsize) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to map EQ @0x%"
> +                          HWADDR_PRIx "\n", qpage);
> +            return H_P4;
> +        }
> +        address_space_unmap(CPU(cpu)->as, eq, plen, true, plen);
> +    }
> +
> +    /* "target" should have been validated above */
> +    if (spapr_xive_target_to_nvt(target, &nvt_blk, &nvt_idx)) {
> +        g_assert_not_reached();
> +    }
> +
> +    /* Ensure the priority and target are correctly set (they will not
> +     * be right after allocation)
> +     */
> +    end.w6 = SETFIELD_BE32(END_W6_NVT_BLOCK, 0ul, nvt_blk) |
> +        SETFIELD_BE32(END_W6_NVT_INDEX, 0ul, nvt_idx);
> +    end.w7 = SETFIELD_BE32(END_W7_F0_PRIORITY, 0ul, priority);
> +
> +    if (flags & SPAPR_XIVE_END_ALWAYS_NOTIFY) {
> +        end.w0 |= cpu_to_be32(END_W0_UCOND_NOTIFY);
> +    } else {
> +        end.w0 &= cpu_to_be32((uint32_t)~END_W0_UCOND_NOTIFY);
> +    }
> +
> +    /* The generation bit for the END starts at 1 and The END page
> +     * offset counter starts at 0.
> +     */
> +    end.w1 = cpu_to_be32(END_W1_GENERATION) |
> +        SETFIELD_BE32(END_W1_PAGE_OFF, 0ul, 0ul);
> +    end.w0 |= cpu_to_be32(END_W0_VALID);
> +
> +    /* TODO: issue syncs required to ensure all in-flight interrupts
> +     * are complete on the old END */
> +
> +out:
> +    /* Update END */
> +    memcpy(&xive->endt[end_idx], &end, sizeof(XiveEND));
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_GET_QUEUE_CONFIG hcall() is used to get a EQ for a given
> + * target and priority.
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-62: Reserved
> + *         Bit 63: Debug: Return debug data
> + * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
> + *       "ibm,ppc-interrupt-gserver#s"
> + * - R6: "priority" is a valid priority not in
> + *       "ibm,plat-res-int-priorities"
> + *
> + * Output:
> + * - R4: "flags":
> + *       Bits 0-61: Reserved
> + *       Bit 62: The value of Event Queue Generation Number (g) per
> + *              the XIVE spec if "Debug" = 1
> + *       Bit 63: The value of Unconditional Notify (n) per the XIVE spec
> + * - R5: The logical real address of the start of the EQ
> + * - R6: The power of 2 EQ size per "ibm,xive-eq-sizes"
> + * - R7: The value of Event Queue Offset Counter per XIVE spec
> + *       if "Debug" = 1, else 0
> + *
> + */
> +
> +#define SPAPR_XIVE_END_DEBUG     PPC_BIT(63)
> +
> +static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
> +                                           sPAPRMachineState *spapr,
> +                                           target_ulong opcode,
> +                                           target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    target_ulong flags = args[0];
> +    target_ulong target = args[1];
> +    target_ulong priority = args[2];
> +    XiveEND *end;
> +    uint8_t end_blk;
> +    uint32_t end_idx;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags & ~SPAPR_XIVE_END_DEBUG) {
> +        return H_PARAMETER;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    if (spapr_xive_priority_is_reserved(priority)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
> +                      priority);
> +        return H_P3;
> +    }
> +
> +    /* Validate that "target" is part of the list of threads allocated
> +     * to the partition. For that, find the END corresponding to the
> +     * target.
> +     */
> +    if (spapr_xive_target_to_end(target, priority, &end_blk, &end_idx)) {
> +        return H_P2;
> +    }
> +
> +    assert(end_idx < xive->nr_ends);
> +    end = &xive->endt[end_idx];
> +
> +    args[0] = 0;
> +    if (xive_end_is_notify(end)) {
> +        args[0] |= SPAPR_XIVE_END_ALWAYS_NOTIFY;
> +    }
> +
> +    if (xive_end_is_enqueue(end)) {
> +        args[1] = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32
> +            | be32_to_cpu(end->w3);
> +        args[2] = GETFIELD_BE32(END_W0_QSIZE, end->w0) + 12;
> +    } else {
> +        args[1] = 0;
> +        args[2] = 0;
> +    }
> +
> +    /* TODO: do we need any locking on the END ? */
> +    if (flags & SPAPR_XIVE_END_DEBUG) {
> +        /* Load the event queue generation number into the return flags */
> +        args[0] |= (uint64_t)GETFIELD_BE32(END_W1_GENERATION, end->w1) << 62;
> +
> +        /* Load R7 with the event queue offset counter */
> +        args[3] = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
> +    } else {
> +        args[3] = 0;
> +    }
> +
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_SET_OS_REPORTING_LINE hcall() is used to set the
> + * reporting cache line pair for the calling thread.  The reporting
> + * cache lines will contain the OS interrupt context when the OS
> + * issues a CI store byte to @TIMA+0xC10 to acknowledge the OS
> + * interrupt. The reporting cache lines can be reset by inputting -1
> + * in "reportingLine".  Issuing the CI store byte without reporting
> + * cache lines registered will result in the data not being accessible
> + * to the OS.
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-63: Reserved
> + * - R5: "reportingLine": The logical real address of the reporting cache
> + *       line pair
> + *
> + * Output:
> + * - None
> + */
> +static target_ulong h_int_set_os_reporting_line(PowerPCCPU *cpu,
> +                                                sPAPRMachineState *spapr,
> +                                                target_ulong opcode,
> +                                                target_ulong *args)
> +{
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    /* TODO: H_INT_SET_OS_REPORTING_LINE */
> +    return H_FUNCTION;
> +}
> +
> +/*
> + * The H_INT_GET_OS_REPORTING_LINE hcall() is used to get the logical
> + * real address of the reporting cache line pair set for the input
> + * "target".  If no reporting cache line pair has been set, -1 is
> + * returned.
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-63: Reserved
> + * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
> + *       "ibm,ppc-interrupt-gserver#s"
> + * - R6: "reportingLine": The logical real address of the reporting
> + *        cache line pair
> + *
> + * Output:
> + * - R4: The logical real address of the reporting line if set, else -1
> + */
> +static target_ulong h_int_get_os_reporting_line(PowerPCCPU *cpu,
> +                                                sPAPRMachineState *spapr,
> +                                                target_ulong opcode,
> +                                                target_ulong *args)
> +{
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    /* TODO: H_INT_GET_OS_REPORTING_LINE */
> +    return H_FUNCTION;
> +}
> +
> +/*
> + * The H_INT_ESB hcall() is used to issue a load or store to the ESB
> + * page for the input "lisn".  This hcall is only supported for LISNs
> + * that have the ESB hcall flag set to 1 when returned from hcall()
> + * H_INT_GET_SOURCE_INFO.
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-62: Reserved
> + *         bit 63: Store: Store=1, store operation, else load operation
> + * - R5: "lisn" is per "interrupts", "interrupt-map", or
> + *       "ibm,xive-lisn-ranges" properties, or as returned by the
> + *       ibm,query-interrupt-source-number RTAS call, or as
> + *       returned by the H_ALLOCATE_VAS_WINDOW hcall
> + * - R6: "esbOffset" is the offset into the ESB page for the load or
> + *       store operation
> + * - R7: "storeData" is the data to write for a store operation
> + *
> + * Output:
> + * - R4: The value of the load if load operation, else -1
> + */
> +
> +#define SPAPR_XIVE_ESB_STORE PPC_BIT(63)
> +
> +static target_ulong h_int_esb(PowerPCCPU *cpu,
> +                              sPAPRMachineState *spapr,
> +                              target_ulong opcode,
> +                              target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    XiveEAS eas;
> +    target_ulong flags  = args[0];
> +    target_ulong lisn   = args[1];
> +    target_ulong offset = args[2];
> +    target_ulong data   = args[3];
> +    hwaddr mmio_addr;
> +    XiveSource *xsrc = &xive->source;
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags & ~SPAPR_XIVE_ESB_STORE) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (lisn >= xive->nr_irqs) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    eas = xive->eat[lisn];
> +    if (!xive_eas_is_valid(&eas)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    if (offset > (1ull << xsrc->esb_shift)) {
> +        return H_P3;
> +    }
> +
> +    mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
> +
> +    if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
> +                      (flags & SPAPR_XIVE_ESB_STORE))) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
> +                      HWADDR_PRIx "\n", mmio_addr);
> +        return H_HARDWARE;
> +    }
> +    args[0] = (flags & SPAPR_XIVE_ESB_STORE) ? -1 : data;
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_SYNC hcall() is used to issue hardware syncs that will
> + * ensure any in flight events for the input lisn are in the event
> + * queue.
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-63: Reserved
> + * - R5: "lisn" is per "interrupts", "interrupt-map", or
> + *       "ibm,xive-lisn-ranges" properties, or as returned by the
> + *       ibm,query-interrupt-source-number RTAS call, or as
> + *       returned by the H_ALLOCATE_VAS_WINDOW hcall
> + *
> + * Output:
> + * - None
> + */
> +static target_ulong h_int_sync(PowerPCCPU *cpu,
> +                               sPAPRMachineState *spapr,
> +                               target_ulong opcode,
> +                               target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    XiveEAS eas;
> +    target_ulong flags = args[0];
> +    target_ulong lisn = args[1];
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (lisn >= xive->nr_irqs) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    eas = xive->eat[lisn];
> +    if (!xive_eas_is_valid(&eas)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
> +        return H_P2;
> +    }
> +
> +    /*
> +     * H_STATE should be returned if a H_INT_RESET is in progress.
> +     * This is not needed when running the emulation under QEMU
> +     */
> +
> +    /* This is not real hardware. Nothing to be done */
> +    return H_SUCCESS;
> +}
> +
> +/*
> + * The H_INT_RESET hcall() is used to reset all of the partition's
> + * interrupt exploitation structures to their initial state.  This
> + * means losing all previously set interrupt state set via
> + * H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
> + *
> + * Parameters:
> + * Input:
> + * - R4: "flags"
> + *         Bits 0-63: Reserved
> + *
> + * Output:
> + * - None
> + */
> +static target_ulong h_int_reset(PowerPCCPU *cpu,
> +                                sPAPRMachineState *spapr,
> +                                target_ulong opcode,
> +                                target_ulong *args)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    target_ulong flags   = args[0];
> +
> +    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (flags) {
> +        return H_PARAMETER;
> +    }
> +
> +    device_reset(DEVICE(xive));
> +    return H_SUCCESS;
> +}
> +
> +void spapr_xive_hcall_init(sPAPRMachineState *spapr)
> +{
> +    spapr_register_hypercall(H_INT_GET_SOURCE_INFO, h_int_get_source_info);
> +    spapr_register_hypercall(H_INT_SET_SOURCE_CONFIG, h_int_set_source_config);
> +    spapr_register_hypercall(H_INT_GET_SOURCE_CONFIG, h_int_get_source_config);
> +    spapr_register_hypercall(H_INT_GET_QUEUE_INFO, h_int_get_queue_info);
> +    spapr_register_hypercall(H_INT_SET_QUEUE_CONFIG, h_int_set_queue_config);
> +    spapr_register_hypercall(H_INT_GET_QUEUE_CONFIG, h_int_get_queue_config);
> +    spapr_register_hypercall(H_INT_SET_OS_REPORTING_LINE,
> +                             h_int_set_os_reporting_line);
> +    spapr_register_hypercall(H_INT_GET_OS_REPORTING_LINE,
> +                             h_int_get_os_reporting_line);
> +    spapr_register_hypercall(H_INT_ESB, h_int_esb);
> +    spapr_register_hypercall(H_INT_SYNC, h_int_sync);
> +    spapr_register_hypercall(H_INT_RESET, h_int_reset);
> +}
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 0bf47ff9fa26..d6768d0936f9 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -259,6 +259,8 @@ static void spapr_irq_init_xive(sPAPRMachineState *spapr, Error **errp)
>          error_propagate(errp, local_err);
>          return;
>      }
> +
> +    spapr_xive_hcall_init(spapr);
>  }
>  
>  static int spapr_irq_claim_xive(sPAPRMachineState *spapr, int irq, bool lsi,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode Cédric Le Goater
@ 2018-12-10  6:39   ` David Gibson
  2018-12-10  7:53     ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-10  6:39 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 9372 bytes --]

On Sun, Dec 09, 2018 at 08:46:00PM +0100, Cédric Le Goater wrote:
> The XIVE interface for the guest is described in the device tree under
> the "interrupt-controller" node. A couple of new properties are
> specific to XIVE :
> 
>  - "reg"
> 
>    contains the base address and size of the thread interrupt
>    managnement areas (TIMA), for the User level and for the Guest OS
>    level. Only the Guest OS level is taken into account today.
> 
>  - "ibm,xive-eq-sizes"
> 
>    the size of the event queues. One cell per size supported, contains
>    log2 of size, in ascending order.
> 
>  - "ibm,xive-lisn-ranges"
> 
>    the IRQ interrupt number ranges assigned to the guest for the IPIs.
> 
> and also under the root node :
> 
>  - "ibm,plat-res-int-priorities"
> 
>    contains a list of priorities that the hypervisor has reserved for
>    its own use. OPAL uses the priority 7 queue to automatically
>    escalate interrupts for all other queues (DD2.X POWER9). So only
>    priorities [0..6] are allowed for the guest.
> 
> Extend the sPAPR IRQ backend with a new handler to populate the DT
> with the appropriate "interrupt-controller" node.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/spapr_irq.h  |  2 ++
>  include/hw/ppc/spapr_xive.h |  2 ++
>  include/hw/ppc/xics.h       |  4 +--
>  hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
>  hw/intc/xics_spapr.c        |  3 +-
>  hw/ppc/spapr.c              |  3 +-
>  hw/ppc/spapr_irq.c          |  3 ++
>  7 files changed, 77 insertions(+), 4 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index 23cdb51b879e..e51e9f052f63 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
>      void (*free)(sPAPRMachineState *spapr, int irq, int num);
>      qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
>      void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
> +    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
> +                        void *fdt, uint32_t phandle);
>  } sPAPRIrq;
>  
>  extern sPAPRIrq spapr_irq_xics;
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 9506a8f4d10a..728a5e8dc163 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
>  typedef struct sPAPRMachineState sPAPRMachineState;
>  
>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> +                   uint32_t phandle);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 9958443d1984..14afda198cdb 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
>      ICPState *(*icp_get)(XICSFabric *xi, int server);
>  } XICSFabricClass;
>  
> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
> -
>  ICPState *xics_icp_get(XICSFabric *xi, int server);
>  
>  /* Internal XICS interfaces */
> @@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
>  
>  typedef struct sPAPRMachineState sPAPRMachineState;
>  
> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> +                   uint32_t phandle);
>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>  void xics_spapr_init(sPAPRMachineState *spapr);
>  
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 982ac6e17051..a6d854b07690 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -14,6 +14,7 @@
>  #include "target/ppc/cpu.h"
>  #include "sysemu/cpus.h"
>  #include "monitor/monitor.h"
> +#include "hw/ppc/fdt.h"
>  #include "hw/ppc/spapr.h"
>  #include "hw/ppc/spapr_xive.h"
>  #include "hw/ppc/xive.h"
> @@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>  }
> +
> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> +                   uint32_t phandle)
> +{
> +    sPAPRXive *xive = spapr->xive;
> +    int node;
> +    uint64_t timas[2 * 2];
> +    /* Interrupt number ranges for the IPIs */
> +    uint32_t lisn_ranges[] = {
> +        cpu_to_be32(0),
> +        cpu_to_be32(nr_servers),
> +    };
> +    uint32_t eq_sizes[] = {
> +        cpu_to_be32(12), /* 4K */
> +        cpu_to_be32(16), /* 64K */
> +        cpu_to_be32(21), /* 2M */
> +        cpu_to_be32(24), /* 16M */

For KVM, are we going to need to clamp this list based on the
pagesizes the guest can use?

> +    };
> +    /* The following array is in sync with the reserved priorities
> +     * defined by the 'spapr_xive_priority_is_reserved' routine.
> +     */
> +    uint32_t plat_res_int_priorities[] = {
> +        cpu_to_be32(7),    /* start */
> +        cpu_to_be32(0xf8), /* count */
> +    };
> +    gchar *nodename;
> +
> +    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
> +    timas[0] = cpu_to_be64(xive->tm_base +
> +                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
> +    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
> +    timas[2] = cpu_to_be64(xive->tm_base +
> +                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
> +    timas[3] = cpu_to_be64(1ull << TM_SHIFT);

So this gives the MMIO address of the TIMA.  Where does the guest get
the MMIO address for the ESB pages from?

> +    nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
> +                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
> +    _FDT(node = fdt_add_subnode(fdt, 0, nodename));
> +    g_free(nodename);
> +
> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
> +
> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
> +                     sizeof(eq_sizes)));
> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
> +                     sizeof(lisn_ranges)));
> +
> +    /* For Linux to link the LSIs to the interrupt controller. */
> +    _FDT(fdt_setprop(fdt, node, "interrupt-controller", NULL, 0));
> +    _FDT(fdt_setprop_cell(fdt, node, "#interrupt-cells", 2));
> +
> +    /* For SLOF */
> +    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
> +    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
> +
> +    /* The "ibm,plat-res-int-priorities" property defines the priority
> +     * ranges reserved by the hypervisor
> +     */
> +    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
> +                     plat_res_int_priorities, sizeof(plat_res_int_priorities)));
> +}
> diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
> index 2e27b92b871a..f67d3c80bf3a 100644
> --- a/hw/intc/xics_spapr.c
> +++ b/hw/intc/xics_spapr.c
> @@ -244,7 +244,8 @@ void xics_spapr_init(sPAPRMachineState *spapr)
>      spapr_register_hypercall(H_IPOLL, h_ipoll);
>  }
>  
> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle)
> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> +                   uint32_t phandle)
>  {
>      uint32_t interrupt_server_ranges_prop[] = {
>          0, cpu_to_be32(nr_servers),
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3f9fc73f7f59..8ff22cdb79d8 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1268,7 +1268,8 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>      _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
>  
>      /* /interrupt controller */
> -    spapr_dt_xics(spapr_max_server_number(spapr), fdt, PHANDLE_XICP);
> +    smc->irq->dt_populate(spapr, spapr_max_server_number(spapr), fdt,
> +                          PHANDLE_XICP);
>  
>      ret = spapr_populate_memory(spapr, fdt);
>      if (ret < 0) {
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index d6768d0936f9..38ea2da7a094 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -204,6 +204,7 @@ sPAPRIrq spapr_irq_xics = {
>      .free        = spapr_irq_free_xics,
>      .qirq        = spapr_qirq_xics,
>      .print_info  = spapr_irq_print_info_xics,
> +    .dt_populate = spapr_dt_xics,
>  };
>  
>  /*
> @@ -318,6 +319,7 @@ sPAPRIrq spapr_irq_xive = {
>      .free        = spapr_irq_free_xive,
>      .qirq        = spapr_qirq_xive,
>      .print_info  = spapr_irq_print_info_xive,
> +    .dt_populate = spapr_dt_xive,
>  };
>  
>  /*
> @@ -422,4 +424,5 @@ sPAPRIrq spapr_irq_xics_legacy = {
>      .free        = spapr_irq_free_xics,
>      .qirq        = spapr_qirq_xics,
>      .print_info  = spapr_irq_print_info_xics,
> +    .dt_populate = spapr_dt_xics,
>  };

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 12/19] spapr: add a 'reset' method to the sPAPR IRQ backend
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 12/19] spapr: add a 'reset' method to the sPAPR IRQ backend Cédric Le Goater
@ 2018-12-10  6:42   ` David Gibson
  2018-12-10  7:30     ` Cédric Le Goater
  2018-12-11 10:55     ` Cédric Le Goater
  0 siblings, 2 replies; 61+ messages in thread
From: David Gibson @ 2018-12-10  6:42 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 6515 bytes --]

On Sun, Dec 09, 2018 at 08:46:03PM +0100, Cédric Le Goater wrote:
> For the time being, the XIVE reset handler updates the OS CAM line of
> the vCPU as it is done under a real hypervisor when a vCPU is
> scheduled to run on a HW thread.
> 
> This handler will become even more useful when we introduce the
> machine supporting both interrupt modes, XIVE and XICS. In this
> machine, the interrupt mode is chosen by the CAS negotiation process
> and activated after a reset.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/spapr_irq.h  |  2 ++
>  include/hw/ppc/spapr_xive.h |  1 +
>  hw/intc/spapr_xive.c        | 24 ++++++++++++++++++++++++
>  hw/ppc/spapr.c              |  5 +++++
>  hw/ppc/spapr_irq.c          | 24 ++++++++++++++++++++++++
>  5 files changed, 56 insertions(+)
> 
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index 84a25ffb6c65..63061a009b4c 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -44,6 +44,7 @@ typedef struct sPAPRIrq {
>      Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
>                                 Error **errp);
>      int (*post_load)(sPAPRMachineState *spapr, int version_id);
> +    void (*reset)(sPAPRMachineState *spapr, Error **errp);
>  } sPAPRIrq;
>  
>  extern sPAPRIrq spapr_irq_xics;
> @@ -55,6 +56,7 @@ int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
>  void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
>  qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
>  int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id);
> +void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp);
>  
>  /*
>   * XICS legacy routines
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 728a5e8dc163..7244a6231ce6 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -47,5 +47,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>  void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>                     uint32_t phandle);
> +void spapr_xive_reset_tctx(sPAPRXive *xive);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index a6d854b07690..560d8d031f74 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -179,6 +179,30 @@ static void spapr_xive_map_mmio(sPAPRXive *xive)
>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
>  }
>  
> +/*
> + * When a Virtual Processor is scheduled to run on a HW thread, the
> + * hypervisor pushes its identifier in the OS CAM line. Emulate the
> + * same behavior under QEMU.
> + */
> +void spapr_xive_reset_tctx(sPAPRXive *xive)
> +{
> +    CPUState *cs;
> +    uint8_t  nvt_blk;
> +    uint32_t nvt_idx;
> +    uint32_t nvt_cam;
> +
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> +
> +        spapr_xive_cpu_to_nvt(cpu, &nvt_blk, &nvt_idx);
> +
> +        nvt_cam = cpu_to_be32(TM_QW1W2_VO |
> +                              xive_nvt_cam_line(nvt_blk, nvt_idx));
> +        memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &nvt_cam, 4);
> +    }
> +}
> +
>  static void spapr_xive_end_reset(XiveEND *end)
>  {
>      memset(end, 0, sizeof(*end));
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 8cea4cad1732..98d69f09e080 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1619,6 +1619,11 @@ static void spapr_machine_reset(void)
>  
>      qemu_devices_reset();
>  
> +    /* This is fixing some of the default configuration of the XIVE
> +     * devices. To be called after the reset of the machine devices.
> +     */
> +    spapr_irq_reset(spapr, &error_fatal);
> +
>      /* DRC reset may cause a device to be unplugged. This will cause troubles
>       * if this device is used by another device (eg, a running vhost backend
>       * will crash QEMU if the DIMM holding the vring goes away). To avoid such
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 35a067cad3f8..04f5c9665550 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -209,6 +209,10 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
>      return 0;
>  }
>  
> +static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
> +{
> +}

You already have a check for a NULL reset hook in spapr_irq_reset() so
you could omit this empty function.

> +
>  #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
>  #define SPAPR_IRQ_XICS_NR_MSIS     \
>      (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
> @@ -225,6 +229,7 @@ sPAPRIrq spapr_irq_xics = {
>      .dt_populate = spapr_dt_xics,
>      .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
>      .post_load   = spapr_irq_post_load_xics,
> +    .reset       = spapr_irq_reset_xics,
>  };
>  
>  /*
> @@ -333,6 +338,15 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>      return 0;
>  }
>  
> +static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
> +{
> +    /*
> +     * Set the OS CAM line of the cpu interrupt thread context. Needs
> +     * to come after the XiveTCTX reset handlers.
> +     */
> +    spapr_xive_reset_tctx(spapr->xive);
> +}
> +
>  /*
>   * XIVE uses the full IRQ number space. Set it to 8K to be compatible
>   * with XICS.
> @@ -353,6 +367,7 @@ sPAPRIrq spapr_irq_xive = {
>      .dt_populate = spapr_dt_xive,
>      .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
>      .post_load   = spapr_irq_post_load_xive,
> +    .reset       = spapr_irq_reset_xive,
>  };
>  
>  /*
> @@ -398,6 +413,15 @@ int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id)
>      return smc->irq->post_load(spapr, version_id);
>  }
>  
> +void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp)
> +{
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +
> +    if (smc->irq->reset) {
> +        smc->irq->reset(spapr, errp);
> +    }
> +}
> +
>  /*
>   * XICS legacy routines - to deprecate one day
>   */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type Cédric Le Goater
  2018-12-09 22:05   ` Benjamin Herrenschmidt
@ 2018-12-10  6:45   ` David Gibson
  1 sibling, 0 replies; 61+ messages in thread
From: David Gibson @ 2018-12-10  6:45 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 2373 bytes --]

On Sun, Dec 09, 2018 at 08:46:08PM +0100, Cédric Le Goater wrote:
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Applied, since we'll need something like this sooner or later anyway.
I may have conflicts to resolve since I think a patch including a
similar chage is in someone else's tree, but it shouldn't be too hard
to deal with.

> ---
>  include/hw/compat.h |  3 +++
>  hw/ppc/spapr.c      | 25 ++++++++++++++++++++++---
>  2 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index 6f4d5fc64704..70958328fe7a 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -1,6 +1,9 @@
>  #ifndef HW_COMPAT_H
>  #define HW_COMPAT_H
>  
> +#define HW_COMPAT_3_1 \
> +    /* empty */
> +
>  #define HW_COMPAT_3_0 \
>      /* empty */
>  
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index fa41927d95dd..4012ebd794a4 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3971,19 +3971,38 @@ static const TypeInfo spapr_machine_info = {
>      }                                                                \
>      type_init(spapr_machine_register_##suffix)
>  
> - /*
> +/*
> + * pseries-4.0
> + */
> +static void spapr_machine_4_0_instance_options(MachineState *machine)
> +{
> +}
> +
> +static void spapr_machine_4_0_class_options(MachineClass *mc)
> +{
> +    /* Defaults for the latest behaviour inherited from the base class */
> +}
> +
> +DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
> +
> +/*
>   * pseries-3.1
>   */
> +#define SPAPR_COMPAT_3_1                                              \
> +    HW_COMPAT_3_1
> +
>  static void spapr_machine_3_1_instance_options(MachineState *machine)
>  {
> +    spapr_machine_4_0_instance_options(machine);
>  }
>  
>  static void spapr_machine_3_1_class_options(MachineClass *mc)
>  {
> -    /* Defaults for the latest behaviour inherited from the base class */
> +    spapr_machine_4_0_class_options(mc);
> +    SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_3_1);
>  }
>  
> -DEFINE_SPAPR_MACHINE(3_1, "3.1", true);
> +DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>  
>  /*
>   * pseries-3.0

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type
  2018-12-10  3:41     ` David Gibson
@ 2018-12-10  7:09       ` Cédric Le Goater
  0 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-10  7:09 UTC (permalink / raw)
  To: David Gibson, Benjamin Herrenschmidt; +Cc: qemu-ppc, qemu-devel

On 12/10/18 4:41 AM, David Gibson wrote:
> On Mon, Dec 10, 2018 at 09:05:06AM +1100, Benjamin Herrenschmidt wrote:
>> On Sun, 2018-12-09 at 20:46 +0100, Cédric Le Goater wrote:
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>> ---
>>
>> If you're going to do that, can we include large decrementer in there
>> too ? (patches from Suraj in my tree but they night need a bit of
>> massaging).
> 
> We don't need to worry about that here.  The machine type's not
> considered finalized until the release, so as long as you get the
> large dec stuff in before the 4.0 release, it's fine.

Are we talking about these 5 patches ? 

  target/ppc: Implement large decrementer support for TCG 
  https://github.com/legoater/qemu/commit/9b3131ae25aa1ee630c48a0489d7194b3046031a

  target/ppc: Implement large decrementer support for KVM 
  https://github.com/legoater/qemu/commit/eceb9fe2c77ba40230621af56dd20090a282e2f1

  target/ppc: Implement migration support for large decrementer 
  https://github.com/legoater/qemu/commit/8da02805dfa39b888df530a6f00a59e6b2fbe34b
 
  target/ppc: Enable the large decrementer for TCG and KVM guests 
  https://github.com/legoater/qemu/commit/0cff350c80e19553c35a3fc8a9859533d606c3e8

  target/ppc: Add cmd line option to disable the large decrementer 
  https://github.com/legoater/qemu/commit/7136bfa944d8dc405150d0bc281c3df5cab98ab1

The PowerNV POWER9 will need the TCG part. 

> Looks like Eduardo and others are probably doing a big batch machine
> type update via the machine tree.  That will probably conflict, but it
> should be a fairly easy one for me to sort out when the time comes.

I think you can possibly just drop this patch if someone adds the 
4.0 machine before or just drop the include/hw/compat.h changes

C.

>>
>>>  include/hw/compat.h |  3 +++
>>>  hw/ppc/spapr.c      | 25 ++++++++++++++++++++++---
>>>  2 files changed, 25 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/hw/compat.h b/include/hw/compat.h
>>> index 6f4d5fc64704..70958328fe7a 100644
>>> --- a/include/hw/compat.h
>>> +++ b/include/hw/compat.h
>>> @@ -1,6 +1,9 @@
>>>  #ifndef HW_COMPAT_H
>>>  #define HW_COMPAT_H
>>>  
>>> +#define HW_COMPAT_3_1 \
>>> +    /* empty */
>>> +
>>>  #define HW_COMPAT_3_0 \
>>>      /* empty */
>>>  
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index fa41927d95dd..4012ebd794a4 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -3971,19 +3971,38 @@ static const TypeInfo spapr_machine_info = {
>>>      }                                                                \
>>>      type_init(spapr_machine_register_##suffix)
>>>  
>>> - /*
>>> +/*
>>> + * pseries-4.0
>>> + */
>>> +static void spapr_machine_4_0_instance_options(MachineState *machine)
>>> +{
>>> +}
>>> +
>>> +static void spapr_machine_4_0_class_options(MachineClass *mc)
>>> +{
>>> +    /* Defaults for the latest behaviour inherited from the base class */
>>> +}
>>> +
>>> +DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
>>> +
>>> +/*
>>>   * pseries-3.1
>>>   */
>>> +#define SPAPR_COMPAT_3_1                                              \
>>> +    HW_COMPAT_3_1
>>> +
>>>  static void spapr_machine_3_1_instance_options(MachineState *machine)
>>>  {
>>> +    spapr_machine_4_0_instance_options(machine);
>>>  }
>>>  
>>>  static void spapr_machine_3_1_class_options(MachineClass *mc)
>>>  {
>>> -    /* Defaults for the latest behaviour inherited from the base class */
>>> +    spapr_machine_4_0_class_options(mc);
>>> +    SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_3_1);
>>>  }
>>>  
>>> -DEFINE_SPAPR_MACHINE(3_1, "3.1", true);
>>> +DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>>>  
>>>  /*
>>>   * pseries-3.0
>>
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 01/19] ppc/xive: add support for the END Event State Buffers
  2018-12-10  4:16   ` David Gibson
@ 2018-12-10  7:11     ` Cédric Le Goater
  0 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-10  7:11 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/10/18 5:16 AM, David Gibson wrote:
> On Sun, Dec 09, 2018 at 08:45:52PM +0100, Cédric Le Goater wrote:
>> The Event Notification Descriptor (END) XIVE structure also contains
>> two Event State Buffers providing further coalescing of interrupts,
>> one for the notification event (ESn) and one for the escalation events
>> (ESe). A MMIO page is assigned for each to control the EOI through
>> loads only. Stores are not allowed.
>>
>> The END ESBs are modeled through an object resembling the 'XiveSource'
>> It is stateless as the END state bits are backed into the XiveEND
>> structure under the XiveRouter and the MMIO accesses follow the same
>> rules as for the XiveSource ESBs.
>>
>> END ESBs are not supported by the Linux drivers neither on OPAL nor on
>> sPAPR. Nevetherless, it provides a mean to study the question in the
>> future and validates a bit more the XIVE model.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>
>>  Changes since v6:
>>
>>  - removed the 'chip-id' field from XiveRouter
>>  - introduced a 'block-id' field in XiveENDSource to lookup the XIVE
>>    END structure when doing a load in the MMIO ESB
>>  - removed reset XiveENDSource handler
>>
>>  include/hw/ppc/xive.h |  21 ++++++
>>  hw/intc/xive.c        | 160 +++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 179 insertions(+), 2 deletions(-)
> 
> Applied to ppc-for-4.0.
> 
> I had some thoughts about maybe-nicer arrangements of things here, but
> nothing important enough to delay this (the things I'm mulling over
> wouldn't break migration, so it's fixable later).

OK. No problem for me to do it afterwards. 

It's a bit of pain to maintain a pile of 30/40 patches and changing stuff   
in the first ones. 

C.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/19] ppc/xive: introduce a simplified XIVE presenter
  2018-12-10  4:27   ` David Gibson
@ 2018-12-10  7:15     ` Cédric Le Goater
  2018-12-11  1:37       ` David Gibson
  0 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-10  7:15 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/10/18 5:27 AM, David Gibson wrote:
> On Sun, Dec 09, 2018 at 08:45:54PM +0100, Cédric Le Goater wrote:
>> The last sub-engine of the XIVE architecture is the Interrupt
>> Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
>> IVPE share elements, the Power Bus interface (CQ), the routing table
>> descriptors, and they can be combined in the same HW logic. We do the
>> same in QEMU and combine both engines in the XiveRouter for
>> simplicity.
>>
>> When the IVRE has completed its job of matching an event source with a
>> Notification Virtual Target (NVT) to notify, it forwards the event
>> notification to the IVPE sub-engine. The IVPE scans the thread
>> interrupt contexts of the Notification Virtual Targets (NVT)
>> dispatched on the HW processor threads and if a match is found, it
>> signals the thread. If not, the IVPE escalates the notification to
>> some other targets and records the notification in a backlog queue.
>>
>> The IVPE maintains the thread interrupt context state for each of its
>> NVTs not dispatched on HW processor threads in the Notification
>> Virtual Target table (NVTT).
>>
>> The model currently only supports single NVT notifications.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> Applied.
> 
> I think the tctx_word2() should have the byteswap, rather than having
> it in the callers, but that can be fixed later.

I thought it was better to explicitly show in the code where the 
byteswaps were needed. Anyway, this is very localized, so, yes, 
we can change it later on.

C.

> 
>> ---
>>
>>  Changes since v6 :
>>
>>  - removed HW CAM line setting and use as it is only useful for PowerNV
>>  - made use of xive_tctx_word2() helper
>>  - made use of GETFIELD_BE32() to compare CAM lines
>>  - fixed initialization of XiveTCTXMatch
>>
>>  include/hw/ppc/xive.h      |  14 +++
>>  include/hw/ppc/xive_regs.h |  24 +++++
>>  hw/intc/xive.c             | 185 +++++++++++++++++++++++++++++++++++++
>>  3 files changed, 223 insertions(+)
>>
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 1e823a4c64e9..19309d1d65d1 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -325,6 +325,10 @@ typedef struct XiveRouterClass {
>>                     XiveEND *end);
>>      int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                       XiveEND *end, uint8_t word_number);
>> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                   XiveNVT *nvt);
>> +    int (*write_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                     XiveNVT *nvt, uint8_t word_number);
>>  } XiveRouterClass;
>>  
>>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
>> @@ -335,6 +339,11 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                          XiveEND *end);
>>  int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                            XiveEND *end, uint8_t word_number);
>> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt);
>> +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                          XiveNVT *nvt, uint8_t word_number);
>> +
>>  
>>  /*
>>   * XIVE END ESBs
>> @@ -411,4 +420,9 @@ extern const MemoryRegionOps xive_tm_ops;
>>  
>>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
>>  
>> +static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
>> +{
>> +    return (nvt_blk << 19) | nvt_idx;
>> +}
>> +
>>  #endif /* PPC_XIVE_H */
>> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
>> index ede3d04c5eda..85557e730cd8 100644
>> --- a/include/hw/ppc/xive_regs.h
>> +++ b/include/hw/ppc/xive_regs.h
>> @@ -186,4 +186,28 @@ typedef struct XiveEND {
>>  #define GETFIELD_BE32(m, v)       GETFIELD(m, be32_to_cpu(v))
>>  #define SETFIELD_BE32(m, v, val)  cpu_to_be32(SETFIELD(m, be32_to_cpu(v), val))
>>  
>> +/* Notification Virtual Target (NVT) */
>> +typedef struct XiveNVT {
>> +        uint32_t        w0;
>> +#define NVT_W0_VALID             PPC_BIT32(0)
>> +        uint32_t        w1;
>> +        uint32_t        w2;
>> +        uint32_t        w3;
>> +        uint32_t        w4;
>> +        uint32_t        w5;
>> +        uint32_t        w6;
>> +        uint32_t        w7;
>> +        uint32_t        w8;
>> +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
>> +        uint32_t        w9;
>> +        uint32_t        wa;
>> +        uint32_t        wb;
>> +        uint32_t        wc;
>> +        uint32_t        wd;
>> +        uint32_t        we;
>> +        uint32_t        wf;
>> +} XiveNVT;
>> +
>> +#define xive_nvt_is_valid(nvt)    (be32_to_cpu((nvt)->w0) & NVT_W0_VALID)
>> +
>>  #endif /* PPC_XIVE_REGS_H */
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 2615d16b7437..3eecffe99b3a 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -983,6 +983,183 @@ int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>     return xrc->write_end(xrtr, end_blk, end_idx, end, word_number);
>>  }
>>  
>> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt)
>> +{
>> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
>> +
>> +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
>> +}
>> +
>> +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt, uint8_t word_number)
>> +{
>> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
>> +
>> +   return xrc->write_nvt(xrtr, nvt_blk, nvt_idx, nvt, word_number);
>> +}
>> +
>> +/*
>> + * The thread context register words are in big-endian format.
>> + */
>> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
>> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                     bool cam_ignore, uint32_t logic_serv)
>> +{
>> +    uint32_t cam = xive_nvt_cam_line(nvt_blk, nvt_idx);
>> +    uint32_t qw2w2 = xive_tctx_word2(&tctx->regs[TM_QW2_HV_POOL]);
>> +    uint32_t qw1w2 = xive_tctx_word2(&tctx->regs[TM_QW1_OS]);
>> +    uint32_t qw0w2 = xive_tctx_word2(&tctx->regs[TM_QW0_USER]);
>> +
>> +    /* TODO (PowerNV): ignore mode. The low order bits of the NVT
>> +     * identifier are ignored in the "CAM" match.
>> +     */
>> +
>> +    if (format == 0) {
>> +        if (cam_ignore == true) {
>> +            /* F=0 & i=1: Logical server notification (bits ignored at
>> +             * the end of the NVT identifier)
>> +             */
>> +            qemu_log_mask(LOG_UNIMP, "XIVE: no support for LS NVT %x/%x\n",
>> +                          nvt_blk, nvt_idx);
>> +             return -1;
>> +        }
>> +
>> +        /* F=0 & i=0: Specific NVT notification */
>> +
>> +        /* TODO (PowerNV) : PHYS ring */
>> +
>> +        /* HV POOL ring */
>> +        if ((be32_to_cpu(qw2w2) & TM_QW2W2_VP) &&
>> +            cam == GETFIELD_BE32(TM_QW2W2_POOL_CAM, qw2w2)) {
>> +            return TM_QW2_HV_POOL;
>> +        }
>> +
>> +        /* OS ring */
>> +        if ((be32_to_cpu(qw1w2) & TM_QW1W2_VO) &&
>> +            cam == GETFIELD_BE32(TM_QW1W2_OS_CAM, qw1w2)) {
>> +            return TM_QW1_OS;
>> +        }
>> +    } else {
>> +        /* F=1 : User level Event-Based Branch (EBB) notification */
>> +
>> +        /* USER ring */
>> +        if  ((be32_to_cpu(qw1w2) & TM_QW1W2_VO) &&
>> +             (cam == GETFIELD_BE32(TM_QW1W2_OS_CAM, qw1w2)) &&
>> +             (be32_to_cpu(qw0w2) & TM_QW0W2_VU) &&
>> +             (logic_serv == GETFIELD_BE32(TM_QW0W2_LOGIC_SERV, qw0w2))) {
>> +            return TM_QW0_USER;
>> +        }
>> +    }
>> +    return -1;
>> +}
>> +
>> +typedef struct XiveTCTXMatch {
>> +    XiveTCTX *tctx;
>> +    uint8_t ring;
>> +} XiveTCTXMatch;
>> +
>> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
>> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                 bool cam_ignore, uint8_t priority,
>> +                                 uint32_t logic_serv, XiveTCTXMatch *match)
>> +{
>> +    CPUState *cs;
>> +
>> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
>> +     * hardwired CAM compares */
>> +
>> +    CPU_FOREACH(cs) {
>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
>> +        int ring;
>> +
>> +        /*
>> +         * HW checks that the CPU is enabled in the Physical Thread
>> +         * Enable Register (PTER).
>> +         */
>> +
>> +        /*
>> +         * Check the thread context CAM lines and record matches. We
>> +         * will handle CPU exception delivery later
>> +         */
>> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
>> +                                         cam_ignore, logic_serv);
>> +        /*
>> +         * Save the context and follow on to catch duplicates, that we
>> +         * don't support yet.
>> +         */
>> +        if (ring != -1) {
>> +            if (match->tctx) {
>> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
>> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
>> +                return false;
>> +            }
>> +
>> +            match->ring = ring;
>> +            match->tctx = tctx;
>> +        }
>> +    }
>> +
>> +    if (!match->tctx) {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
>> +                      nvt_blk, nvt_idx);
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>> +/*
>> + * This is our simple Xive Presenter Engine model. It is merged in the
>> + * Router as it does not require an extra object.
>> + *
>> + * It receives notification requests sent by the IVRE to find one
>> + * matching NVT (or more) dispatched on the processor threads. In case
>> + * of a single NVT notification, the process is abreviated and the
>> + * thread is signaled if a match is found. In case of a logical server
>> + * notification (bits ignored at the end of the NVT identifier), the
>> + * IVPE and IVRE select a winning thread using different filters. This
>> + * involves 2 or 3 exchanges on the PowerBus that the model does not
>> + * support.
>> + *
>> + * The parameters represent what is sent on the PowerBus
>> + */
>> +static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
>> +                                  uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                  bool cam_ignore, uint8_t priority,
>> +                                  uint32_t logic_serv)
>> +{
>> +    XiveNVT nvt;
>> +    XiveTCTXMatch match = { .tctx = NULL, .ring = 0 };
>> +    bool found;
>> +
>> +    /* NVT cache lookup */
>> +    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
>> +                      nvt_blk, nvt_idx);
>> +        return;
>> +    }
>> +
>> +    if (!xive_nvt_is_valid(&nvt)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
>> +                      nvt_blk, nvt_idx);
>> +        return;
>> +    }
>> +
>> +    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
>> +                                 priority, logic_serv, &match);
>> +    if (found) {
>> +        return;
>> +    }
>> +
>> +    /* If no matching NVT is dispatched on a HW thread :
>> +     * - update the NVT structure if backlog is activated
>> +     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
>> +     *   activated
>> +     */
>> +}
>> +
>>  /*
>>   * An END trigger can come from an event trigger (IPI or HW) or from
>>   * another chip. We don't model the PowerBus but the END trigger
>> @@ -1052,6 +1229,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>>      /*
>>       * Follows IVPE notification
>>       */
>> +    xive_presenter_notify(xrtr, format,
>> +                          GETFIELD_BE32(END_W6_NVT_BLOCK, end.w6),
>> +                          GETFIELD_BE32(END_W6_NVT_INDEX, end.w6),
>> +                          GETFIELD_BE32(END_W7_F0_IGNORE, end.w7),
>> +                          priority,
>> +                          GETFIELD_BE32(END_W7_F1_LOG_SERVER_ID, end.w7));
>> +
>> +    /* TODO: Auto EOI. */
>>  }
>>  
>>  static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 12/19] spapr: add a 'reset' method to the sPAPR IRQ backend
  2018-12-10  6:42   ` David Gibson
@ 2018-12-10  7:30     ` Cédric Le Goater
  2018-12-11 10:55     ` Cédric Le Goater
  1 sibling, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-10  7:30 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/10/18 7:42 AM, David Gibson wrote:
> On Sun, Dec 09, 2018 at 08:46:03PM +0100, Cédric Le Goater wrote:
>> For the time being, the XIVE reset handler updates the OS CAM line of
>> the vCPU as it is done under a real hypervisor when a vCPU is
>> scheduled to run on a HW thread.
>>
>> This handler will become even more useful when we introduce the
>> machine supporting both interrupt modes, XIVE and XICS. In this
>> machine, the interrupt mode is chosen by the CAS negotiation process
>> and activated after a reset.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/spapr_irq.h  |  2 ++
>>  include/hw/ppc/spapr_xive.h |  1 +
>>  hw/intc/spapr_xive.c        | 24 ++++++++++++++++++++++++
>>  hw/ppc/spapr.c              |  5 +++++
>>  hw/ppc/spapr_irq.c          | 24 ++++++++++++++++++++++++
>>  5 files changed, 56 insertions(+)
>>
>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>> index 84a25ffb6c65..63061a009b4c 100644
>> --- a/include/hw/ppc/spapr_irq.h
>> +++ b/include/hw/ppc/spapr_irq.h
>> @@ -44,6 +44,7 @@ typedef struct sPAPRIrq {
>>      Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
>>                                 Error **errp);
>>      int (*post_load)(sPAPRMachineState *spapr, int version_id);
>> +    void (*reset)(sPAPRMachineState *spapr, Error **errp);
>>  } sPAPRIrq;
>>  
>>  extern sPAPRIrq spapr_irq_xics;
>> @@ -55,6 +56,7 @@ int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
>>  void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
>>  qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
>>  int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id);
>> +void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp);
>>  
>>  /*
>>   * XICS legacy routines
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 728a5e8dc163..7244a6231ce6 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -47,5 +47,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>>  void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>                     uint32_t phandle);
>> +void spapr_xive_reset_tctx(sPAPRXive *xive);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index a6d854b07690..560d8d031f74 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -179,6 +179,30 @@ static void spapr_xive_map_mmio(sPAPRXive *xive)
>>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
>>  }
>>  
>> +/*
>> + * When a Virtual Processor is scheduled to run on a HW thread, the
>> + * hypervisor pushes its identifier in the OS CAM line. Emulate the
>> + * same behavior under QEMU.
>> + */
>> +void spapr_xive_reset_tctx(sPAPRXive *xive)
>> +{
>> +    CPUState *cs;
>> +    uint8_t  nvt_blk;
>> +    uint32_t nvt_idx;
>> +    uint32_t nvt_cam;
>> +
>> +    CPU_FOREACH(cs) {
>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
>> +
>> +        spapr_xive_cpu_to_nvt(cpu, &nvt_blk, &nvt_idx);
>> +
>> +        nvt_cam = cpu_to_be32(TM_QW1W2_VO |
>> +                              xive_nvt_cam_line(nvt_blk, nvt_idx));
>> +        memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &nvt_cam, 4);
>> +    }
>> +}
>> +
>>  static void spapr_xive_end_reset(XiveEND *end)
>>  {
>>      memset(end, 0, sizeof(*end));
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 8cea4cad1732..98d69f09e080 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1619,6 +1619,11 @@ static void spapr_machine_reset(void)
>>  
>>      qemu_devices_reset();
>>  
>> +    /* This is fixing some of the default configuration of the XIVE
>> +     * devices. To be called after the reset of the machine devices.
>> +     */
>> +    spapr_irq_reset(spapr, &error_fatal);
>> +
>>      /* DRC reset may cause a device to be unplugged. This will cause troubles
>>       * if this device is used by another device (eg, a running vhost backend
>>       * will crash QEMU if the DIMM holding the vring goes away). To avoid such
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 35a067cad3f8..04f5c9665550 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -209,6 +209,10 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
>>      return 0;
>>  }
>>  
>> +static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
>> +{
>> +}
> 
> You already have a check for a NULL reset hook in spapr_irq_reset() so
> you could omit this empty function.

It's being used in patch 14 and 15. But I can add the XICS reset handler
at that time.

C.

>> +
>>  #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
>>  #define SPAPR_IRQ_XICS_NR_MSIS     \
>>      (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
>> @@ -225,6 +229,7 @@ sPAPRIrq spapr_irq_xics = {
>>      .dt_populate = spapr_dt_xics,
>>      .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
>>      .post_load   = spapr_irq_post_load_xics,
>> +    .reset       = spapr_irq_reset_xics,
>>  };
>>  
>>  /*
>> @@ -333,6 +338,15 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>>      return 0;
>>  }
>>  
>> +static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
>> +{
>> +    /*
>> +     * Set the OS CAM line of the cpu interrupt thread context. Needs
>> +     * to come after the XiveTCTX reset handlers.
>> +     */
>> +    spapr_xive_reset_tctx(spapr->xive);
>> +}
>> +
>>  /*
>>   * XIVE uses the full IRQ number space. Set it to 8K to be compatible
>>   * with XICS.
>> @@ -353,6 +367,7 @@ sPAPRIrq spapr_irq_xive = {
>>      .dt_populate = spapr_dt_xive,
>>      .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
>>      .post_load   = spapr_irq_post_load_xive,
>> +    .reset       = spapr_irq_reset_xive,
>>  };
>>  
>>  /*
>> @@ -398,6 +413,15 @@ int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id)
>>      return smc->irq->post_load(spapr, version_id);
>>  }
>>  
>> +void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp)
>> +{
>> +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>> +
>> +    if (smc->irq->reset) {
>> +        smc->irq->reset(spapr, errp);
>> +    }
>> +}
>> +
>>  /*
>>   * XICS legacy routines - to deprecate one day
>>   */
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode
  2018-12-10  6:39   ` David Gibson
@ 2018-12-10  7:53     ` Cédric Le Goater
  2018-12-11  0:38       ` David Gibson
  0 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-10  7:53 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/10/18 7:39 AM, David Gibson wrote:
> On Sun, Dec 09, 2018 at 08:46:00PM +0100, Cédric Le Goater wrote:
>> The XIVE interface for the guest is described in the device tree under
>> the "interrupt-controller" node. A couple of new properties are
>> specific to XIVE :
>>
>>  - "reg"
>>
>>    contains the base address and size of the thread interrupt
>>    managnement areas (TIMA), for the User level and for the Guest OS
>>    level. Only the Guest OS level is taken into account today.
>>
>>  - "ibm,xive-eq-sizes"
>>
>>    the size of the event queues. One cell per size supported, contains
>>    log2 of size, in ascending order.
>>
>>  - "ibm,xive-lisn-ranges"
>>
>>    the IRQ interrupt number ranges assigned to the guest for the IPIs.
>>
>> and also under the root node :
>>
>>  - "ibm,plat-res-int-priorities"
>>
>>    contains a list of priorities that the hypervisor has reserved for
>>    its own use. OPAL uses the priority 7 queue to automatically
>>    escalate interrupts for all other queues (DD2.X POWER9). So only
>>    priorities [0..6] are allowed for the guest.
>>
>> Extend the sPAPR IRQ backend with a new handler to populate the DT
>> with the appropriate "interrupt-controller" node.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/spapr_irq.h  |  2 ++
>>  include/hw/ppc/spapr_xive.h |  2 ++
>>  include/hw/ppc/xics.h       |  4 +--
>>  hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
>>  hw/intc/xics_spapr.c        |  3 +-
>>  hw/ppc/spapr.c              |  3 +-
>>  hw/ppc/spapr_irq.c          |  3 ++
>>  7 files changed, 77 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>> index 23cdb51b879e..e51e9f052f63 100644
>> --- a/include/hw/ppc/spapr_irq.h
>> +++ b/include/hw/ppc/spapr_irq.h
>> @@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
>>      void (*free)(sPAPRMachineState *spapr, int irq, int num);
>>      qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
>>      void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
>> +    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
>> +                        void *fdt, uint32_t phandle);
>>  } sPAPRIrq;
>>  
>>  extern sPAPRIrq spapr_irq_xics;
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 9506a8f4d10a..728a5e8dc163 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>  
>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>> +                   uint32_t phandle);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index 9958443d1984..14afda198cdb 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
>>      ICPState *(*icp_get)(XICSFabric *xi, int server);
>>  } XICSFabricClass;
>>  
>> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>> -
>>  ICPState *xics_icp_get(XICSFabric *xi, int server);
>>  
>>  /* Internal XICS interfaces */
>> @@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
>>  
>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>  
>> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>> +                   uint32_t phandle);
>>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>>  void xics_spapr_init(sPAPRMachineState *spapr);
>>  
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 982ac6e17051..a6d854b07690 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -14,6 +14,7 @@
>>  #include "target/ppc/cpu.h"
>>  #include "sysemu/cpus.h"
>>  #include "monitor/monitor.h"
>> +#include "hw/ppc/fdt.h"
>>  #include "hw/ppc/spapr.h"
>>  #include "hw/ppc/spapr_xive.h"
>>  #include "hw/ppc/xive.h"
>> @@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>>  }
>> +
>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>> +                   uint32_t phandle)
>> +{
>> +    sPAPRXive *xive = spapr->xive;
>> +    int node;
>> +    uint64_t timas[2 * 2];
>> +    /* Interrupt number ranges for the IPIs */
>> +    uint32_t lisn_ranges[] = {
>> +        cpu_to_be32(0),
>> +        cpu_to_be32(nr_servers),
>> +    };
>> +    uint32_t eq_sizes[] = {
>> +        cpu_to_be32(12), /* 4K */
>> +        cpu_to_be32(16), /* 64K */
>> +        cpu_to_be32(21), /* 2M */
>> +        cpu_to_be32(24), /* 16M */
> 
> For KVM, are we going to need to clamp this list based on the
> pagesizes the guest can use?

I would say so. Is there a KVM service for that ?

Today, the OS scans the list and picks the size fitting its current 
PAGE_SIZE/SHIFT. But I suppose it would be better to advertise 
only the page sizes that QEMU/KVM supports. Or should we play safe 
and only export : 4K, 64K ? 


>> +    };
>> +    /* The following array is in sync with the reserved priorities
>> +     * defined by the 'spapr_xive_priority_is_reserved' routine.
>> +     */
>> +    uint32_t plat_res_int_priorities[] = {
>> +        cpu_to_be32(7),    /* start */
>> +        cpu_to_be32(0xf8), /* count */
>> +    };
>> +    gchar *nodename;
>> +
>> +    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
>> +    timas[0] = cpu_to_be64(xive->tm_base +
>> +                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
>> +    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
>> +    timas[2] = cpu_to_be64(xive->tm_base +
>> +                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
>> +    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
> 
> So this gives the MMIO address of the TIMA.  

Yes. It is considered to be a fixed "address" 

> Where does the guest get the MMIO address for the ESB pages from?

with the hcall H_INT_GET_SOURCE_INFO.

C.

>> +    nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
>> +                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
>> +    _FDT(node = fdt_add_subnode(fdt, 0, nodename));
>> +    g_free(nodename);
>> +
>> +    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
>> +    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
>> +
>> +    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
>> +                     sizeof(eq_sizes)));
>> +    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
>> +                     sizeof(lisn_ranges)));
>> +
>> +    /* For Linux to link the LSIs to the interrupt controller. */
>> +    _FDT(fdt_setprop(fdt, node, "interrupt-controller", NULL, 0));
>> +    _FDT(fdt_setprop_cell(fdt, node, "#interrupt-cells", 2));
>> +
>> +    /* For SLOF */
>> +    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
>> +    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
>> +
>> +    /* The "ibm,plat-res-int-priorities" property defines the priority
>> +     * ranges reserved by the hypervisor
>> +     */
>> +    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
>> +                     plat_res_int_priorities, sizeof(plat_res_int_priorities)));
>> +}
>> diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
>> index 2e27b92b871a..f67d3c80bf3a 100644
>> --- a/hw/intc/xics_spapr.c
>> +++ b/hw/intc/xics_spapr.c
>> @@ -244,7 +244,8 @@ void xics_spapr_init(sPAPRMachineState *spapr)
>>      spapr_register_hypercall(H_IPOLL, h_ipoll);
>>  }
>>  
>> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle)
>> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>> +                   uint32_t phandle)
>>  {
>>      uint32_t interrupt_server_ranges_prop[] = {
>>          0, cpu_to_be32(nr_servers),
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 3f9fc73f7f59..8ff22cdb79d8 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1268,7 +1268,8 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>>      _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
>>  
>>      /* /interrupt controller */
>> -    spapr_dt_xics(spapr_max_server_number(spapr), fdt, PHANDLE_XICP);
>> +    smc->irq->dt_populate(spapr, spapr_max_server_number(spapr), fdt,
>> +                          PHANDLE_XICP);
>>  
>>      ret = spapr_populate_memory(spapr, fdt);
>>      if (ret < 0) {
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index d6768d0936f9..38ea2da7a094 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -204,6 +204,7 @@ sPAPRIrq spapr_irq_xics = {
>>      .free        = spapr_irq_free_xics,
>>      .qirq        = spapr_qirq_xics,
>>      .print_info  = spapr_irq_print_info_xics,
>> +    .dt_populate = spapr_dt_xics,
>>  };
>>  
>>  /*
>> @@ -318,6 +319,7 @@ sPAPRIrq spapr_irq_xive = {
>>      .free        = spapr_irq_free_xive,
>>      .qirq        = spapr_qirq_xive,
>>      .print_info  = spapr_irq_print_info_xive,
>> +    .dt_populate = spapr_dt_xive,
>>  };
>>  
>>  /*
>> @@ -422,4 +424,5 @@ sPAPRIrq spapr_irq_xics_legacy = {
>>      .free        = spapr_irq_free_xics,
>>      .qirq        = spapr_qirq_xics,
>>      .print_info  = spapr_irq_print_info_xics,
>> +    .dt_populate = spapr_dt_xics,
>>  };
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' machine type
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' " Cédric Le Goater
@ 2018-12-10 22:17   ` Cédric Le Goater
  2018-12-11  2:06     ` David Gibson
  0 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-10 22:17 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/9/18 8:46 PM, Cédric Le Goater wrote:
> This pseries machine makes use of a new sPAPR IRQ backend supporting
> the XIVE interrupt mode.
> 
> The guest OS is required to have support for the XIVE exploitation
> mode of the POWER9 interrupt controller.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/ppc/spapr.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 4012ebd794a4..3cc134a0b673 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3985,6 +3985,21 @@ static void spapr_machine_4_0_class_options(MachineClass *mc)
>  
>  DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
>  
> +static void spapr_machine_4_0_xive_instance_options(MachineState *machine)
> +{
> +    spapr_machine_4_0_instance_options(machine);
> +}
> +
> +static void spapr_machine_4_0_xive_class_options(MachineClass *mc)
> +{
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> +
> +    spapr_machine_4_0_class_options(mc);> +    smc->irq = &spapr_irq_xive;

I have been adding checks on the CPU model to export the XIVE capability 
only on POWER9 processors but it breaks some of the tests.

I was wondering if we could add a default POWER9 CPU to the -xive machine : 

  + mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");

and if we could change tests/cpu-plug-test.c with :

  @@ -198,8 +198,13 @@ static void add_pseries_test_case(const
       }
       data = g_new(PlugTestData, 1);
       data->machine = g_strdup(mname);
  -    data->cpu_model = "power8_v2.0";
  -    data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
  +    if (g_str_has_suffix(mname, "xive")) {
  +        data->cpu_model = "power9_v2.0";
  +        data->device_model = g_strdup("power9_v2.0-spapr-cpu-core");
  +    } else {
  +        data->cpu_model = "power8_v2.0";
  +        data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
  +    }
       data->sockets = 2;
       data->cores = 3;
       data->threads = 1;

or if there is a better way ? 

Thanks,

C.

> +}
> +
> +DEFINE_SPAPR_MACHINE(4_0_xive, "4.0-xive", false);
> +
>  /*
>   * pseries-3.1
>   */
> 
 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode
  2018-12-10  7:53     ` Cédric Le Goater
@ 2018-12-11  0:38       ` David Gibson
  2018-12-11  9:06         ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-11  0:38 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 8043 bytes --]

On Mon, Dec 10, 2018 at 08:53:17AM +0100, Cédric Le Goater wrote:
> On 12/10/18 7:39 AM, David Gibson wrote:
> > On Sun, Dec 09, 2018 at 08:46:00PM +0100, Cédric Le Goater wrote:
> >> The XIVE interface for the guest is described in the device tree under
> >> the "interrupt-controller" node. A couple of new properties are
> >> specific to XIVE :
> >>
> >>  - "reg"
> >>
> >>    contains the base address and size of the thread interrupt
> >>    managnement areas (TIMA), for the User level and for the Guest OS
> >>    level. Only the Guest OS level is taken into account today.
> >>
> >>  - "ibm,xive-eq-sizes"
> >>
> >>    the size of the event queues. One cell per size supported, contains
> >>    log2 of size, in ascending order.
> >>
> >>  - "ibm,xive-lisn-ranges"
> >>
> >>    the IRQ interrupt number ranges assigned to the guest for the IPIs.
> >>
> >> and also under the root node :
> >>
> >>  - "ibm,plat-res-int-priorities"
> >>
> >>    contains a list of priorities that the hypervisor has reserved for
> >>    its own use. OPAL uses the priority 7 queue to automatically
> >>    escalate interrupts for all other queues (DD2.X POWER9). So only
> >>    priorities [0..6] are allowed for the guest.
> >>
> >> Extend the sPAPR IRQ backend with a new handler to populate the DT
> >> with the appropriate "interrupt-controller" node.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  include/hw/ppc/spapr_irq.h  |  2 ++
> >>  include/hw/ppc/spapr_xive.h |  2 ++
> >>  include/hw/ppc/xics.h       |  4 +--
> >>  hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
> >>  hw/intc/xics_spapr.c        |  3 +-
> >>  hw/ppc/spapr.c              |  3 +-
> >>  hw/ppc/spapr_irq.c          |  3 ++
> >>  7 files changed, 77 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> >> index 23cdb51b879e..e51e9f052f63 100644
> >> --- a/include/hw/ppc/spapr_irq.h
> >> +++ b/include/hw/ppc/spapr_irq.h
> >> @@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
> >>      void (*free)(sPAPRMachineState *spapr, int irq, int num);
> >>      qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
> >>      void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
> >> +    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
> >> +                        void *fdt, uint32_t phandle);
> >>  } sPAPRIrq;
> >>  
> >>  extern sPAPRIrq spapr_irq_xics;
> >> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> >> index 9506a8f4d10a..728a5e8dc163 100644
> >> --- a/include/hw/ppc/spapr_xive.h
> >> +++ b/include/hw/ppc/spapr_xive.h
> >> @@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
> >>  typedef struct sPAPRMachineState sPAPRMachineState;
> >>  
> >>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> >> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> >> +                   uint32_t phandle);
> >>  
> >>  #endif /* PPC_SPAPR_XIVE_H */
> >> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> >> index 9958443d1984..14afda198cdb 100644
> >> --- a/include/hw/ppc/xics.h
> >> +++ b/include/hw/ppc/xics.h
> >> @@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
> >>      ICPState *(*icp_get)(XICSFabric *xi, int server);
> >>  } XICSFabricClass;
> >>  
> >> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
> >> -
> >>  ICPState *xics_icp_get(XICSFabric *xi, int server);
> >>  
> >>  /* Internal XICS interfaces */
> >> @@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
> >>  
> >>  typedef struct sPAPRMachineState sPAPRMachineState;
> >>  
> >> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> >> +                   uint32_t phandle);
> >>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
> >>  void xics_spapr_init(sPAPRMachineState *spapr);
> >>  
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 982ac6e17051..a6d854b07690 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -14,6 +14,7 @@
> >>  #include "target/ppc/cpu.h"
> >>  #include "sysemu/cpus.h"
> >>  #include "monitor/monitor.h"
> >> +#include "hw/ppc/fdt.h"
> >>  #include "hw/ppc/spapr.h"
> >>  #include "hw/ppc/spapr_xive.h"
> >>  #include "hw/ppc/xive.h"
> >> @@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
> >>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
> >>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
> >>  }
> >> +
> >> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> >> +                   uint32_t phandle)
> >> +{
> >> +    sPAPRXive *xive = spapr->xive;
> >> +    int node;
> >> +    uint64_t timas[2 * 2];
> >> +    /* Interrupt number ranges for the IPIs */
> >> +    uint32_t lisn_ranges[] = {
> >> +        cpu_to_be32(0),
> >> +        cpu_to_be32(nr_servers),
> >> +    };
> >> +    uint32_t eq_sizes[] = {
> >> +        cpu_to_be32(12), /* 4K */
> >> +        cpu_to_be32(16), /* 64K */
> >> +        cpu_to_be32(21), /* 2M */
> >> +        cpu_to_be32(24), /* 16M */
> > 
> > For KVM, are we going to need to clamp this list based on the
> > pagesizes the guest can use?
> 
> I would say so. Is there a KVM service for that ?

I'm not sure what you mean by a KVM service here.

> Today, the OS scans the list and picks the size fitting its current 
> PAGE_SIZE/SHIFT. But I suppose it would be better to advertise 
> only the page sizes that QEMU/KVM supports. Or should we play safe 
> and only export : 4K, 64K ? 

So, I'm guessing the limitation here is that when we use the KVM XIVE
acceleration, the EQ will need to be host-contiguous?  So, we can't
have EQs larger than the backing pagesize for guest memory.

We try to avoid having guest visible differences based on whether
we're using KVM or not, so we don't want to change this list depending
on whether KVM is active etc. - or on whether we're backing the guest
with hugepages.

We could reuse the cap-hpt-max-page-size machine parameter which also
relates to backing page size, but that might be confusing since it's
explicitly about HPT only whereas this limitation would apply to RPT
guests too, IIUC.

I think our best bet is probably to limit to 64kiB queues
unconditionally.  AIUI, that still gives us 8192 slots in the queue,
which is enough to store one of every possible irq, which should be
enough.

There's still an issue if we have a 4kiB pagesize host kernel, but we
could just disable kernel_irqchip in that situation, since we don't
expect people to be using that much in practice.

> >> +    };
> >> +    /* The following array is in sync with the reserved priorities
> >> +     * defined by the 'spapr_xive_priority_is_reserved' routine.
> >> +     */
> >> +    uint32_t plat_res_int_priorities[] = {
> >> +        cpu_to_be32(7),    /* start */
> >> +        cpu_to_be32(0xf8), /* count */
> >> +    };
> >> +    gchar *nodename;
> >> +
> >> +    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
> >> +    timas[0] = cpu_to_be64(xive->tm_base +
> >> +                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
> >> +    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
> >> +    timas[2] = cpu_to_be64(xive->tm_base +
> >> +                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
> >> +    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
> > 
> > So this gives the MMIO address of the TIMA.  
> 
> Yes. It is considered to be a fixed "address" 
> 
> > Where does the guest get the MMIO address for the ESB pages from?
> 
> with the hcall H_INT_GET_SOURCE_INFO.

Ah, right.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/19] ppc/xive: introduce a simplified XIVE presenter
  2018-12-10  7:15     ` Cédric Le Goater
@ 2018-12-11  1:37       ` David Gibson
  2018-12-11 10:43         ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-11  1:37 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 2419 bytes --]

On Mon, Dec 10, 2018 at 08:15:40AM +0100, Cédric Le Goater wrote:
> On 12/10/18 5:27 AM, David Gibson wrote:
> > On Sun, Dec 09, 2018 at 08:45:54PM +0100, Cédric Le Goater wrote:
> >> The last sub-engine of the XIVE architecture is the Interrupt
> >> Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
> >> IVPE share elements, the Power Bus interface (CQ), the routing table
> >> descriptors, and they can be combined in the same HW logic. We do the
> >> same in QEMU and combine both engines in the XiveRouter for
> >> simplicity.
> >>
> >> When the IVRE has completed its job of matching an event source with a
> >> Notification Virtual Target (NVT) to notify, it forwards the event
> >> notification to the IVPE sub-engine. The IVPE scans the thread
> >> interrupt contexts of the Notification Virtual Targets (NVT)
> >> dispatched on the HW processor threads and if a match is found, it
> >> signals the thread. If not, the IVPE escalates the notification to
> >> some other targets and records the notification in a backlog queue.
> >>
> >> The IVPE maintains the thread interrupt context state for each of its
> >> NVTs not dispatched on HW processor threads in the Notification
> >> Virtual Target table (NVTT).
> >>
> >> The model currently only supports single NVT notifications.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > 
> > Applied.
> > 
> > I think the tctx_word2() should have the byteswap, rather than having
> > it in the callers, but that can be fixed later.
> 
> I thought it was better to explicitly show in the code where the 
> byteswaps were needed. Anyway, this is very localized, so, yes, 
> we can change it later on.

To clarify my thinking here; the important thing is not knowing where
the byteswaps are, but being able to tell as easily as possible what
endianness a given piece of data is right now.

The convention I'm aiming for here - which is one I try to use most
places is that structures - at least structures which map to specific
in-memory things - are in the required endianness for that stuff in
memory.  However bare integers - uint32_t or uint64_t or whatever -
are, well, numbers, in native endian.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 14/19] spapr: set the interrupt presenter at reset
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 14/19] spapr: set the interrupt presenter at reset Cédric Le Goater
@ 2018-12-11  1:46   ` David Gibson
  2018-12-11 10:58     ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-11  1:46 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 3611 bytes --]

On Sun, Dec 09, 2018 at 08:46:05PM +0100, Cédric Le Goater wrote:
> Currently, the interrupt presenter of the vCPU is set at realize
> time. Setting it at reset will become useful when the new machine
> supporting both interrupt modes is introduced. In this machine, the
> interrupt mode is chosen at CAS time and activated after a reset.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Shouldn't this also remove the code which sets cpu->intc at realize
time, in order to avoid confusion?

> ---
>  include/hw/ppc/spapr_cpu_core.h |  2 ++
>  hw/ppc/spapr_cpu_core.c         | 26 ++++++++++++++++++++++++++
>  hw/ppc/spapr_irq.c              | 12 ++++++++++++
>  3 files changed, 40 insertions(+)
> 
> diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
> index 9e2821e4b31f..fc8ea9021656 100644
> --- a/include/hw/ppc/spapr_cpu_core.h
> +++ b/include/hw/ppc/spapr_cpu_core.h
> @@ -53,4 +53,6 @@ static inline sPAPRCPUState *spapr_cpu_state(PowerPCCPU *cpu)
>      return (sPAPRCPUState *)cpu->machine_data;
>  }
>  
> +void spapr_cpu_core_set_intc(PowerPCCPU *cpu, const char *intc_type);
> +
>  #endif
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 1811cd48db90..529de0b6b9c8 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -398,3 +398,29 @@ static const TypeInfo spapr_cpu_core_type_infos[] = {
>  };
>  
>  DEFINE_TYPES(spapr_cpu_core_type_infos)
> +
> +typedef struct ForeachFindIntCArgs {
> +    const char *intc_type;
> +    Object *intc;
> +} ForeachFindIntCArgs;
> +
> +static int spapr_cpu_core_find_intc(Object *child, void *opaque)
> +{
> +    ForeachFindIntCArgs *args = opaque;
> +
> +    if (object_dynamic_cast(child, args->intc_type)) {
> +        args->intc = child;
> +    }
> +
> +    return args->intc != NULL;
> +}
> +
> +void spapr_cpu_core_set_intc(PowerPCCPU *cpu, const char *intc_type)
> +{
> +    ForeachFindIntCArgs args = { intc_type, NULL };
> +
> +    object_child_foreach(OBJECT(cpu), spapr_cpu_core_find_intc, &args);
> +    g_assert(args.intc);
> +
> +    cpu->intc = args.intc;
> +}
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 7a0d4f529763..b423cee30e2c 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -12,6 +12,7 @@
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
>  #include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_cpu_core.h"
>  #include "hw/ppc/spapr_xive.h"
>  #include "hw/ppc/xics.h"
>  #include "sysemu/kvm.h"
> @@ -211,6 +212,11 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
>  
>  static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
>  {
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
> +    }
>  }
>  
>  #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
> @@ -341,6 +347,12 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>  
>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
>  {
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        spapr_cpu_core_set_intc(POWERPC_CPU(cs), TYPE_XIVE_TCTX);
> +    }
> +
>      /*
>       * Set the OS CAM line of the cpu interrupt thread context. Needs
>       * to come after the XiveTCTX reset handlers.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 15/19] spapr/xive: enable XIVE MMIOs at reset
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 15/19] spapr/xive: enable XIVE MMIOs " Cédric Le Goater
@ 2018-12-11  1:47   ` David Gibson
  2018-12-11 10:14     ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-11  1:47 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 2916 bytes --]

On Sun, Dec 09, 2018 at 08:46:06PM +0100, Cédric Le Goater wrote:
> Depending on the interrupt mode chosen, enable or disable the XIVE
> MMIOs.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/spapr_xive.h | 1 +
>  hw/intc/spapr_xive.c        | 9 +++++++++
>  hw/ppc/spapr_irq.c          | 8 ++++++++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 7244a6231ce6..308afb61a666 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -48,5 +48,6 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>  void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>                     uint32_t phandle);
>  void spapr_xive_reset_tctx(sPAPRXive *xive);
> +void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 560d8d031f74..c6dbb2e8cfc7 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -179,6 +179,15 @@ static void spapr_xive_map_mmio(sPAPRXive *xive)
>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
>  }
>  
> +void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable)

The logic looks fine, but I dislike this name - it's called
..._enable() when it can also be used to disable.

> +{
> +    memory_region_set_enabled(&xive->source.esb_mmio, enable);
> +    memory_region_set_enabled(&xive->tm_mmio, enable);
> +
> +    /* Disable the END ESBs until a guest OS makes use of them */
> +    memory_region_set_enabled(&xive->end_source.esb_mmio, false);
> +}
> +
>  /*
>   * When a Virtual Processor is scheduled to run on a HW thread, the
>   * hypervisor pushes its identifier in the OS CAM line. Emulate the
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index b423cee30e2c..a8e50725397c 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -217,6 +217,11 @@ static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
>      CPU_FOREACH(cs) {
>          spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
>      }
> +
> +    /* Deactivate the XIVE MMIOs */
> +    if (spapr->xive) {
> +        spapr_xive_enable_mmio(spapr->xive, false);
> +    }
>  }
>  
>  #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
> @@ -358,6 +363,9 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
>       * to come after the XiveTCTX reset handlers.
>       */
>      spapr_xive_reset_tctx(spapr->xive);
> +
> +    /* Activate the XIVE MMIOs */
> +    spapr_xive_enable_mmio(spapr->xive, true);
>  }
>  
>  /*

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
  2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS Cédric Le Goater
@ 2018-12-11  2:03   ` David Gibson
  2018-12-11 10:19     ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-11  2:03 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 8579 bytes --]

On Sun, Dec 09, 2018 at 08:46:07PM +0100, Cédric Le Goater wrote:
> The interrupt mode is chosen by the CAS negotiation process and
> activated after a reset to take into account the required changes in
> the machine. These impact the device tree layout, the interrupt
> presenter object and the exposed MMIO regions in the case of XIVE.
> 
> This default interrupt mode for the machine is XICS.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/spapr_irq.h |   1 +
>  hw/ppc/spapr.c             |   3 +-
>  hw/ppc/spapr_hcall.c       |  13 ++++
>  hw/ppc/spapr_irq.c         | 143 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 159 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index b34d5a00381b..29936498dbc8 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -51,6 +51,7 @@ typedef struct sPAPRIrq {
>  extern sPAPRIrq spapr_irq_xics;
>  extern sPAPRIrq spapr_irq_xics_legacy;
>  extern sPAPRIrq spapr_irq_xive;
> +extern sPAPRIrq spapr_irq_dual;
>  
>  void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
>  int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5ef87a00f68b..fa41927d95dd 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2631,7 +2631,8 @@ static void spapr_machine_init(MachineState *machine)
>      spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
>  
>      /* advertise XIVE */
> -    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT) {
> +    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT ||
> +        smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH) {
>          spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
>      }
>  
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index ae913d070f50..186b6a65543f 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1654,6 +1654,19 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>              (spapr_h_cas_compose_response(spapr, args[1], args[2],
>                                            ov5_updates) != 0);
>      }
> +
> +    /*
> +     * Generate a machine reset when we have an update of the
> +     * interrupt mode. Only required on the machine supporting both
> +     * mode.
> +     */
> +    if (!spapr->cas_reboot) {
> +        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +
> +        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)
> +            && smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH;
> +    }
> +
>      spapr_ovec_cleanup(ov5_updates);
>  
>      if (spapr->cas_reboot) {
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index a8e50725397c..7c34939f774a 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -392,6 +392,149 @@ sPAPRIrq spapr_irq_xive = {
>      .reset       = spapr_irq_reset_xive,
>  };
>  
> +/*
> + * Dual XIVE and XICS IRQ backend.
> + *
> + * Both interrupt mode, XIVE and XICS, objects are created but the
> + * machine starts in legacy interrupt mode (XICS). It can be changed
> + * by the CAS negotiation process and, in that case, the new mode is
> + * activated after extra machine reset.
> + */
> +
> +/*
> + * Returns the sPAPR IRQ backend negotiated by CAS. XICS is the
> + * default.
> + */
> +static sPAPRIrq *spapr_irq_current(sPAPRMachineState *spapr)
> +{
> +    return spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT) ?
> +        &spapr_irq_xive : &spapr_irq_xics;
> +}
> +
> +static void spapr_irq_init_dual(sPAPRMachineState *spapr, Error **errp)
> +{
> +    MachineState *machine = MACHINE(spapr);
> +    Error *local_err = NULL;
> +
> +    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
> +        error_setg(errp, "No KVM support for the 'dual' machine");
> +        return;
> +    }
> +
> +    spapr_irq_xics.init(spapr, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    spapr_irq_xive.init(spapr, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +}
> +
> +static int spapr_irq_claim_dual(sPAPRMachineState *spapr, int irq, bool lsi,
> +                                Error **errp)
> +{
> +    int ret;
> +    Error *local_err = NULL;
> +
> +    ret = spapr_irq_xive.claim(spapr, irq, lsi, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return ret;
> +    }
> +
> +    ret = spapr_irq_xics.claim(spapr, irq, lsi, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +    }
> +
> +    return ret;
> +}
> +
> +static void spapr_irq_free_dual(sPAPRMachineState *spapr, int irq, int num)
> +{
> +    spapr_irq_xive.free(spapr, irq, num);
> +    spapr_irq_xics.free(spapr, irq, num);
> +}
> +
> +static qemu_irq spapr_qirq_dual(sPAPRMachineState *spapr, int irq)
> +{
> +    return spapr_irq_current(spapr)->qirq(spapr, irq);

Urgh... I don't think this is going to work.  IIRC the various devices
(PHB, VIO, etc.)  are wired up to their qirqs at realize() time, so if
you reboot from a XIVE guest to XICS guest (or maybe the other way
around) the peripherals won't be able to signal irqs in the new
scheme.

I think instead we need a common set of qirqs, whose set_irq routine
looks at whether to signal XICS or XIVE.  FOr now I think the easiest
approach is to layer those on top of the existing XICS or XIVE
specific qirqs.  Later we might want to remove the (input) qirqs
entirely from the XICS and XIVE subsystems, instead having just
explicit trigger functions.  Then spapr will always supply the qirqs
which call into one or the other.

> +}
> +
> +static void spapr_irq_print_info_dual(sPAPRMachineState *spapr, Monitor *mon)
> +{
> +    spapr_irq_current(spapr)->print_info(spapr, mon);
> +}
> +
> +static void spapr_irq_dt_populate_dual(sPAPRMachineState *spapr,
> +                                       uint32_t nr_servers, void *fdt,
> +                                       uint32_t phandle)
> +{
> +    spapr_irq_current(spapr)->dt_populate(spapr, nr_servers, fdt, phandle);
> +}
> +
> +static Object *spapr_irq_cpu_intc_create_dual(sPAPRMachineState *spapr,
> +                                              Object *cpu, Error **errp)
> +{
> +    Error *local_err = NULL;
> +
> +    spapr_irq_xive.cpu_intc_create(spapr, cpu, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +
> +    /* Default to XICS interrupt mode */
> +    return spapr_irq_xics.cpu_intc_create(spapr, cpu, errp);
> +}
> +
> +static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
> +{
> +    /*
> +     * Force a reset of the XIVE backend after migration. The machine
> +     * defaults to XICS at startup.
> +     */
> +    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        spapr_irq_xive.reset(spapr, &error_fatal);
> +    }
> +
> +    return spapr_irq_current(spapr)->post_load(spapr, version_id);
> +}
> +
> +static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
> +{
> +    /*
> +     * Reset the interrupt mode selected by CAS.
> +     */
> +    spapr_irq_current(spapr)->reset(spapr, errp);
> +}
> +
> +/*
> + * Define values in sync with the XIVE and XICS backend
> + */
> +#define SPAPR_IRQ_DUAL_NR_IRQS     0x2000
> +#define SPAPR_IRQ_DUAL_NR_MSIS     (SPAPR_IRQ_DUAL_NR_IRQS - SPAPR_IRQ_MSI)
> +
> +sPAPRIrq spapr_irq_dual = {
> +    .nr_irqs     = SPAPR_IRQ_DUAL_NR_IRQS,
> +    .nr_msis     = SPAPR_IRQ_DUAL_NR_MSIS,
> +    .ov5         = SPAPR_OV5_XIVE_BOTH,
> +
> +    .init        = spapr_irq_init_dual,
> +    .claim       = spapr_irq_claim_dual,
> +    .free        = spapr_irq_free_dual,
> +    .qirq        = spapr_qirq_dual,
> +    .print_info  = spapr_irq_print_info_dual,
> +    .dt_populate = spapr_irq_dt_populate_dual,
> +    .cpu_intc_create = spapr_irq_cpu_intc_create_dual,
> +    .post_load   = spapr_irq_post_load_dual,
> +    .reset       = spapr_irq_reset_dual,
> +};
> +
>  /*
>   * sPAPR IRQ frontend routines for devices
>   */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' machine type
  2018-12-10 22:17   ` Cédric Le Goater
@ 2018-12-11  2:06     ` David Gibson
  2018-12-11 10:42       ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-11  2:06 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 2948 bytes --]

On Mon, Dec 10, 2018 at 11:17:33PM +0100, Cédric Le Goater wrote:
> On 12/9/18 8:46 PM, Cédric Le Goater wrote:
> > This pseries machine makes use of a new sPAPR IRQ backend supporting
> > the XIVE interrupt mode.
> > 
> > The guest OS is required to have support for the XIVE exploitation
> > mode of the POWER9 interrupt controller.
> > 
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  hw/ppc/spapr.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 4012ebd794a4..3cc134a0b673 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -3985,6 +3985,21 @@ static void spapr_machine_4_0_class_options(MachineClass *mc)
> >  
> >  DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
> >  
> > +static void spapr_machine_4_0_xive_instance_options(MachineState *machine)
> > +{
> > +    spapr_machine_4_0_instance_options(machine);
> > +}
> > +
> > +static void spapr_machine_4_0_xive_class_options(MachineClass *mc)
> > +{
> > +    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> > +
> > +    spapr_machine_4_0_class_options(mc);> +    smc->irq = &spapr_irq_xive;
> 
> I have been adding checks on the CPU model to export the XIVE capability 
> only on POWER9 processors but it breaks some of the tests.
> 
> I was wondering if we could add a default POWER9 CPU to the -xive machine : 
> 
>   + mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
> 
> and if we could change tests/cpu-plug-test.c with :
> 
>   @@ -198,8 +198,13 @@ static void add_pseries_test_case(const
>        }
>        data = g_new(PlugTestData, 1);
>        data->machine = g_strdup(mname);
>   -    data->cpu_model = "power8_v2.0";
>   -    data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
>   +    if (g_str_has_suffix(mname, "xive")) {
>   +        data->cpu_model = "power9_v2.0";
>   +        data->device_model = g_strdup("power9_v2.0-spapr-cpu-core");
>   +    } else {
>   +        data->cpu_model = "power8_v2.0";
>   +        data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
>   +    }
>        data->sockets = 2;
>        data->cores = 3;
>        data->threads = 1;
> 
> or if there is a better way ?

So, I'd actually prefer a machine option, rather than wholly separate
machine types to select xics/xive/dual.  Machine types was fine while
prototyping this, but I don't think we want to actually merge new
machine types for it.

So, instead I think we want a machine option which can be set to
xics/xive/dual, with xics being the default for earlier machine types
and dual the default for 4.0 onwards.  We can make POWER9 the default
cpu for 4.0 onwards as well, if you want.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode
  2018-12-11  0:38       ` David Gibson
@ 2018-12-11  9:06         ` Cédric Le Goater
  2018-12-12  0:19           ` David Gibson
  0 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-11  9:06 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/11/18 1:38 AM, David Gibson wrote:
> On Mon, Dec 10, 2018 at 08:53:17AM +0100, Cédric Le Goater wrote:
>> On 12/10/18 7:39 AM, David Gibson wrote:
>>> On Sun, Dec 09, 2018 at 08:46:00PM +0100, Cédric Le Goater wrote:
>>>> The XIVE interface for the guest is described in the device tree under
>>>> the "interrupt-controller" node. A couple of new properties are
>>>> specific to XIVE :
>>>>
>>>>  - "reg"
>>>>
>>>>    contains the base address and size of the thread interrupt
>>>>    managnement areas (TIMA), for the User level and for the Guest OS
>>>>    level. Only the Guest OS level is taken into account today.
>>>>
>>>>  - "ibm,xive-eq-sizes"
>>>>
>>>>    the size of the event queues. One cell per size supported, contains
>>>>    log2 of size, in ascending order.
>>>>
>>>>  - "ibm,xive-lisn-ranges"
>>>>
>>>>    the IRQ interrupt number ranges assigned to the guest for the IPIs.
>>>>
>>>> and also under the root node :
>>>>
>>>>  - "ibm,plat-res-int-priorities"
>>>>
>>>>    contains a list of priorities that the hypervisor has reserved for
>>>>    its own use. OPAL uses the priority 7 queue to automatically
>>>>    escalate interrupts for all other queues (DD2.X POWER9). So only
>>>>    priorities [0..6] are allowed for the guest.
>>>>
>>>> Extend the sPAPR IRQ backend with a new handler to populate the DT
>>>> with the appropriate "interrupt-controller" node.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  include/hw/ppc/spapr_irq.h  |  2 ++
>>>>  include/hw/ppc/spapr_xive.h |  2 ++
>>>>  include/hw/ppc/xics.h       |  4 +--
>>>>  hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
>>>>  hw/intc/xics_spapr.c        |  3 +-
>>>>  hw/ppc/spapr.c              |  3 +-
>>>>  hw/ppc/spapr_irq.c          |  3 ++
>>>>  7 files changed, 77 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>>>> index 23cdb51b879e..e51e9f052f63 100644
>>>> --- a/include/hw/ppc/spapr_irq.h
>>>> +++ b/include/hw/ppc/spapr_irq.h
>>>> @@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
>>>>      void (*free)(sPAPRMachineState *spapr, int irq, int num);
>>>>      qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
>>>>      void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
>>>> +    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
>>>> +                        void *fdt, uint32_t phandle);
>>>>  } sPAPRIrq;
>>>>  
>>>>  extern sPAPRIrq spapr_irq_xics;
>>>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>>>> index 9506a8f4d10a..728a5e8dc163 100644
>>>> --- a/include/hw/ppc/spapr_xive.h
>>>> +++ b/include/hw/ppc/spapr_xive.h
>>>> @@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
>>>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>>>  
>>>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>> +                   uint32_t phandle);
>>>>  
>>>>  #endif /* PPC_SPAPR_XIVE_H */
>>>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>>>> index 9958443d1984..14afda198cdb 100644
>>>> --- a/include/hw/ppc/xics.h
>>>> +++ b/include/hw/ppc/xics.h
>>>> @@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
>>>>      ICPState *(*icp_get)(XICSFabric *xi, int server);
>>>>  } XICSFabricClass;
>>>>  
>>>> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>>>> -
>>>>  ICPState *xics_icp_get(XICSFabric *xi, int server);
>>>>  
>>>>  /* Internal XICS interfaces */
>>>> @@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
>>>>  
>>>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>>>  
>>>> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>> +                   uint32_t phandle);
>>>>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>>>>  void xics_spapr_init(sPAPRMachineState *spapr);
>>>>  
>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>> index 982ac6e17051..a6d854b07690 100644
>>>> --- a/hw/intc/spapr_xive.c
>>>> +++ b/hw/intc/spapr_xive.c
>>>> @@ -14,6 +14,7 @@
>>>>  #include "target/ppc/cpu.h"
>>>>  #include "sysemu/cpus.h"
>>>>  #include "monitor/monitor.h"
>>>> +#include "hw/ppc/fdt.h"
>>>>  #include "hw/ppc/spapr.h"
>>>>  #include "hw/ppc/spapr_xive.h"
>>>>  #include "hw/ppc/xive.h"
>>>> @@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>>>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>>>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>>>>  }
>>>> +
>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>> +                   uint32_t phandle)
>>>> +{
>>>> +    sPAPRXive *xive = spapr->xive;
>>>> +    int node;
>>>> +    uint64_t timas[2 * 2];
>>>> +    /* Interrupt number ranges for the IPIs */
>>>> +    uint32_t lisn_ranges[] = {
>>>> +        cpu_to_be32(0),
>>>> +        cpu_to_be32(nr_servers),
>>>> +    };
>>>> +    uint32_t eq_sizes[] = {
>>>> +        cpu_to_be32(12), /* 4K */
>>>> +        cpu_to_be32(16), /* 64K */
>>>> +        cpu_to_be32(21), /* 2M */
>>>> +        cpu_to_be32(24), /* 16M */
>>>
>>> For KVM, are we going to need to clamp this list based on the
>>> pagesizes the guest can use?
>>
>> I would say so. Is there a KVM service for that ?
> 
> I'm not sure what you mean by a KVM service here.

I meant a routine giving a list of the possible backing pagesizes.
 
>> Today, the OS scans the list and picks the size fitting its current 
>> PAGE_SIZE/SHIFT. But I suppose it would be better to advertise 
>> only the page sizes that QEMU/KVM supports. Or should we play safe 
>> and only export : 4K, 64K ? 
> 
> So, I'm guessing the limitation here is that when we use the KVM XIVE
> acceleration, the EQ will need to be host-contiguous?  

The EQ should be a single page (from the specs). So we don't have 
a contiguity problem. 

> So, we can't have EQs larger than the backing pagesize for guest 
> memory.
>
 > We try to avoid having guest visible differences based on whether
> we're using KVM or not, so we don't want to change this list depending
> on whether KVM is active etc. - or on whether we're backing the guest
> with hugepages.
> 
> We could reuse the cap-hpt-max-page-size machine parameter which also
> relates to backing page size, but that might be confusing since it's
> explicitly about HPT only whereas this limitation would apply to RPT
> guests too, IIUC.
> 
> I think our best bet is probably to limit to 64kiB queues
> unconditionally.  

OK. That's the default. We can refine this list of page sizes later on
when we introduce KVM support if needed.

> AIUI, that still gives us 8192 slots in the queue,

The entries are 32bits. So that's 16384 entries. 

> which is enough to store one of every possible irq, which should be
> enough.

yes. 

I don't think Linux measures how much entries are dequeued at once.
It would give us a feel of how much pressure these EQs can sustain.

C.
 
> There's still an issue if we have a 4kiB pagesize host kernel, but we
> could just disable kernel_irqchip in that situation, since we don't
> expect people to be using that much in practice.
> 
>>>> +    };
>>>> +    /* The following array is in sync with the reserved priorities
>>>> +     * defined by the 'spapr_xive_priority_is_reserved' routine.
>>>> +     */
>>>> +    uint32_t plat_res_int_priorities[] = {
>>>> +        cpu_to_be32(7),    /* start */
>>>> +        cpu_to_be32(0xf8), /* count */
>>>> +    };
>>>> +    gchar *nodename;
>>>> +
>>>> +    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
>>>> +    timas[0] = cpu_to_be64(xive->tm_base +
>>>> +                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
>>>> +    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
>>>> +    timas[2] = cpu_to_be64(xive->tm_base +
>>>> +                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
>>>> +    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
>>>
>>> So this gives the MMIO address of the TIMA.  
>>
>> Yes. It is considered to be a fixed "address" 
>>
>>> Where does the guest get the MMIO address for the ESB pages from?
>>
>> with the hcall H_INT_GET_SOURCE_INFO.
> 
> Ah, right.
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 15/19] spapr/xive: enable XIVE MMIOs at reset
  2018-12-11  1:47   ` David Gibson
@ 2018-12-11 10:14     ` Cédric Le Goater
  2018-12-12  0:32       ` David Gibson
  0 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-11 10:14 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/11/18 2:47 AM, David Gibson wrote:
> On Sun, Dec 09, 2018 at 08:46:06PM +0100, Cédric Le Goater wrote:
>> Depending on the interrupt mode chosen, enable or disable the XIVE
>> MMIOs.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/spapr_xive.h | 1 +
>>  hw/intc/spapr_xive.c        | 9 +++++++++
>>  hw/ppc/spapr_irq.c          | 8 ++++++++
>>  3 files changed, 18 insertions(+)
>>
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 7244a6231ce6..308afb61a666 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -48,5 +48,6 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>>  void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>                     uint32_t phandle);
>>  void spapr_xive_reset_tctx(sPAPRXive *xive);
>> +void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 560d8d031f74..c6dbb2e8cfc7 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -179,6 +179,15 @@ static void spapr_xive_map_mmio(sPAPRXive *xive)
>>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
>>  }
>>  
>> +void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable)
> 
> The logic looks fine, but I dislike this name - it's called
> ..._enable() when it can also be used to disable.

ok. Let's call it spapr_xive_mmio_set_enabled() then

C.
 
>> +{
>> +    memory_region_set_enabled(&xive->source.esb_mmio, enable);
>> +    memory_region_set_enabled(&xive->tm_mmio, enable);
>> +
>> +    /* Disable the END ESBs until a guest OS makes use of them */
>> +    memory_region_set_enabled(&xive->end_source.esb_mmio, false);
>> +}
>> +
>>  /*
>>   * When a Virtual Processor is scheduled to run on a HW thread, the
>>   * hypervisor pushes its identifier in the OS CAM line. Emulate the
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index b423cee30e2c..a8e50725397c 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -217,6 +217,11 @@ static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
>>      CPU_FOREACH(cs) {
>>          spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
>>      }
>> +
>> +    /* Deactivate the XIVE MMIOs */
>> +    if (spapr->xive) {
>> +        spapr_xive_enable_mmio(spapr->xive, false);
>> +    }
>>  }
>>  
>>  #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
>> @@ -358,6 +363,9 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
>>       * to come after the XiveTCTX reset handlers.
>>       */
>>      spapr_xive_reset_tctx(spapr->xive);
>> +
>> +    /* Activate the XIVE MMIOs */
>> +    spapr_xive_enable_mmio(spapr->xive, true);
>>  }
>>  
>>  /*
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
  2018-12-11  2:03   ` David Gibson
@ 2018-12-11 10:19     ` Cédric Le Goater
  2018-12-12  0:54       ` David Gibson
  0 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-11 10:19 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/11/18 3:03 AM, David Gibson wrote:
> On Sun, Dec 09, 2018 at 08:46:07PM +0100, Cédric Le Goater wrote:
>> The interrupt mode is chosen by the CAS negotiation process and
>> activated after a reset to take into account the required changes in
>> the machine. These impact the device tree layout, the interrupt
>> presenter object and the exposed MMIO regions in the case of XIVE.
>>
>> This default interrupt mode for the machine is XICS.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/spapr_irq.h |   1 +
>>  hw/ppc/spapr.c             |   3 +-
>>  hw/ppc/spapr_hcall.c       |  13 ++++
>>  hw/ppc/spapr_irq.c         | 143 +++++++++++++++++++++++++++++++++++++
>>  4 files changed, 159 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>> index b34d5a00381b..29936498dbc8 100644
>> --- a/include/hw/ppc/spapr_irq.h
>> +++ b/include/hw/ppc/spapr_irq.h
>> @@ -51,6 +51,7 @@ typedef struct sPAPRIrq {
>>  extern sPAPRIrq spapr_irq_xics;
>>  extern sPAPRIrq spapr_irq_xics_legacy;
>>  extern sPAPRIrq spapr_irq_xive;
>> +extern sPAPRIrq spapr_irq_dual;
>>  
>>  void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
>>  int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 5ef87a00f68b..fa41927d95dd 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2631,7 +2631,8 @@ static void spapr_machine_init(MachineState *machine)
>>      spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
>>  
>>      /* advertise XIVE */
>> -    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT) {
>> +    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT ||
>> +        smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH) {
>>          spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
>>      }
>>  
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index ae913d070f50..186b6a65543f 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -1654,6 +1654,19 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>>              (spapr_h_cas_compose_response(spapr, args[1], args[2],
>>                                            ov5_updates) != 0);
>>      }
>> +
>> +    /*
>> +     * Generate a machine reset when we have an update of the
>> +     * interrupt mode. Only required on the machine supporting both
>> +     * mode.
>> +     */
>> +    if (!spapr->cas_reboot) {
>> +        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>> +
>> +        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)
>> +            && smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH;
>> +    }
>> +
>>      spapr_ovec_cleanup(ov5_updates);
>>  
>>      if (spapr->cas_reboot) {
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index a8e50725397c..7c34939f774a 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -392,6 +392,149 @@ sPAPRIrq spapr_irq_xive = {
>>      .reset       = spapr_irq_reset_xive,
>>  };
>>  
>> +/*
>> + * Dual XIVE and XICS IRQ backend.
>> + *
>> + * Both interrupt mode, XIVE and XICS, objects are created but the
>> + * machine starts in legacy interrupt mode (XICS). It can be changed
>> + * by the CAS negotiation process and, in that case, the new mode is
>> + * activated after extra machine reset.
>> + */
>> +
>> +/*
>> + * Returns the sPAPR IRQ backend negotiated by CAS. XICS is the
>> + * default.
>> + */
>> +static sPAPRIrq *spapr_irq_current(sPAPRMachineState *spapr)
>> +{
>> +    return spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT) ?
>> +        &spapr_irq_xive : &spapr_irq_xics;
>> +}
>> +
>> +static void spapr_irq_init_dual(sPAPRMachineState *spapr, Error **errp)
>> +{
>> +    MachineState *machine = MACHINE(spapr);
>> +    Error *local_err = NULL;
>> +
>> +    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
>> +        error_setg(errp, "No KVM support for the 'dual' machine");
>> +        return;
>> +    }
>> +
>> +    spapr_irq_xics.init(spapr, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    spapr_irq_xive.init(spapr, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +}
>> +
>> +static int spapr_irq_claim_dual(sPAPRMachineState *spapr, int irq, bool lsi,
>> +                                Error **errp)
>> +{
>> +    int ret;
>> +    Error *local_err = NULL;
>> +
>> +    ret = spapr_irq_xive.claim(spapr, irq, lsi, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return ret;
>> +    }
>> +
>> +    ret = spapr_irq_xics.claim(spapr, irq, lsi, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static void spapr_irq_free_dual(sPAPRMachineState *spapr, int irq, int num)
>> +{
>> +    spapr_irq_xive.free(spapr, irq, num);
>> +    spapr_irq_xics.free(spapr, irq, num);
>> +}
>> +
>> +static qemu_irq spapr_qirq_dual(sPAPRMachineState *spapr, int irq)
>> +{
>> +    return spapr_irq_current(spapr)->qirq(spapr, irq);
> 
> Urgh... I don't think this is going to work.  IIRC the various devices
> (PHB, VIO, etc.)  are wired up to their qirqs at realize() time, so if
> you reboot from a XIVE guest to XICS guest (or maybe the other way
> around) the peripherals won't be able to signal irqs in the new
> scheme.

It does. The IRQ numbers are claimed in both backends. 

This is the problem since the very beginning. For reset and migration
to work, we need to keep in sync the IRQ number space of the machine 
and the different interrupt controllers. 

C. 


> I think instead we need a common set of qirqs, whose set_irq routine
> looks at whether to signal XICS or XIVE.  FOr now I think the easiest
> approach is to layer those on top of the existing XICS or XIVE
> specific qirqs.  Later we might want to remove the (input) qirqs
> entirely from the XICS and XIVE subsystems, instead having just
> explicit trigger functions.  Then spapr will always supply the qirqs
> which call into one or the other.
> 
>> +}
>> +
>> +static void spapr_irq_print_info_dual(sPAPRMachineState *spapr, Monitor *mon)
>> +{
>> +    spapr_irq_current(spapr)->print_info(spapr, mon);
>> +}
>> +
>> +static void spapr_irq_dt_populate_dual(sPAPRMachineState *spapr,
>> +                                       uint32_t nr_servers, void *fdt,
>> +                                       uint32_t phandle)
>> +{
>> +    spapr_irq_current(spapr)->dt_populate(spapr, nr_servers, fdt, phandle);
>> +}
>> +
>> +static Object *spapr_irq_cpu_intc_create_dual(sPAPRMachineState *spapr,
>> +                                              Object *cpu, Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +
>> +    spapr_irq_xive.cpu_intc_create(spapr, cpu, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return NULL;
>> +    }
>> +
>> +    /* Default to XICS interrupt mode */
>> +    return spapr_irq_xics.cpu_intc_create(spapr, cpu, errp);
>> +}
>> +
>> +static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
>> +{
>> +    /*
>> +     * Force a reset of the XIVE backend after migration. The machine
>> +     * defaults to XICS at startup.
>> +     */
>> +    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>> +        spapr_irq_xive.reset(spapr, &error_fatal);
>> +    }
>> +
>> +    return spapr_irq_current(spapr)->post_load(spapr, version_id);
>> +}
>> +
>> +static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
>> +{
>> +    /*
>> +     * Reset the interrupt mode selected by CAS.
>> +     */
>> +    spapr_irq_current(spapr)->reset(spapr, errp);
>> +}
>> +
>> +/*
>> + * Define values in sync with the XIVE and XICS backend
>> + */
>> +#define SPAPR_IRQ_DUAL_NR_IRQS     0x2000
>> +#define SPAPR_IRQ_DUAL_NR_MSIS     (SPAPR_IRQ_DUAL_NR_IRQS - SPAPR_IRQ_MSI)
>> +
>> +sPAPRIrq spapr_irq_dual = {
>> +    .nr_irqs     = SPAPR_IRQ_DUAL_NR_IRQS,
>> +    .nr_msis     = SPAPR_IRQ_DUAL_NR_MSIS,
>> +    .ov5         = SPAPR_OV5_XIVE_BOTH,
>> +
>> +    .init        = spapr_irq_init_dual,
>> +    .claim       = spapr_irq_claim_dual,
>> +    .free        = spapr_irq_free_dual,
>> +    .qirq        = spapr_qirq_dual,
>> +    .print_info  = spapr_irq_print_info_dual,
>> +    .dt_populate = spapr_irq_dt_populate_dual,
>> +    .cpu_intc_create = spapr_irq_cpu_intc_create_dual,
>> +    .post_load   = spapr_irq_post_load_dual,
>> +    .reset       = spapr_irq_reset_dual,
>> +};
>> +
>>  /*
>>   * sPAPR IRQ frontend routines for devices
>>   */
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' machine type
  2018-12-11  2:06     ` David Gibson
@ 2018-12-11 10:42       ` Cédric Le Goater
  2018-12-11 16:44         ` Cédric Le Goater
  2018-12-12  0:34         ` David Gibson
  0 siblings, 2 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-11 10:42 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/11/18 3:06 AM, David Gibson wrote:
> On Mon, Dec 10, 2018 at 11:17:33PM +0100, Cédric Le Goater wrote:
>> On 12/9/18 8:46 PM, Cédric Le Goater wrote:
>>> This pseries machine makes use of a new sPAPR IRQ backend supporting
>>> the XIVE interrupt mode.
>>>
>>> The guest OS is required to have support for the XIVE exploitation
>>> mode of the POWER9 interrupt controller.
>>>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>> ---
>>>  hw/ppc/spapr.c | 15 +++++++++++++++
>>>  1 file changed, 15 insertions(+)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 4012ebd794a4..3cc134a0b673 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -3985,6 +3985,21 @@ static void spapr_machine_4_0_class_options(MachineClass *mc)
>>>  
>>>  DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
>>>  
>>> +static void spapr_machine_4_0_xive_instance_options(MachineState *machine)
>>> +{
>>> +    spapr_machine_4_0_instance_options(machine);
>>> +}
>>> +
>>> +static void spapr_machine_4_0_xive_class_options(MachineClass *mc)
>>> +{
>>> +    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
>>> +
>>> +    spapr_machine_4_0_class_options(mc);> +    smc->irq = &spapr_irq_xive;
>>
>> I have been adding checks on the CPU model to export the XIVE capability 
>> only on POWER9 processors but it breaks some of the tests.
>>
>> I was wondering if we could add a default POWER9 CPU to the -xive machine : 
>>
>>   + mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
>>
>> and if we could change tests/cpu-plug-test.c with :
>>
>>   @@ -198,8 +198,13 @@ static void add_pseries_test_case(const
>>        }
>>        data = g_new(PlugTestData, 1);
>>        data->machine = g_strdup(mname);
>>   -    data->cpu_model = "power8_v2.0";
>>   -    data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
>>   +    if (g_str_has_suffix(mname, "xive")) {
>>   +        data->cpu_model = "power9_v2.0";
>>   +        data->device_model = g_strdup("power9_v2.0-spapr-cpu-core");
>>   +    } else {
>>   +        data->cpu_model = "power8_v2.0";
>>   +        data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
>>   +    }
>>        data->sockets = 2;
>>        data->cores = 3;
>>        data->threads = 1;
>>
>> or if there is a better way ?
> 
> So, I'd actually prefer a machine option, rather than wholly separate
> machine types to select xics/xive/dual.  Machine types was fine while
> prototyping this, but I don't think we want to actually merge new
> machine types for it.

I agree. 

> So, instead I think we want a machine option which can be set to
> xics/xive/dual, with xics being the default for earlier machine types
> and dual the default for 4.0 onwards.

I will revive an old patch doing just that. 

The question now is how to link the sPAPRMachineState instance to 
the selected sPAPR IRQ backend. 

I don't think we can move 'smc->irq' to sPAPRMachineState. So we will
need an helper returning the appropriate backend depending on the machine 
option and 'smc->irq' should disappear.

> We can make POWER9 the default cpu for 4.0 onwards as well, if you want.

OK.

C.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/19] ppc/xive: introduce a simplified XIVE presenter
  2018-12-11  1:37       ` David Gibson
@ 2018-12-11 10:43         ` Cédric Le Goater
  0 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-11 10:43 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/11/18 2:37 AM, David Gibson wrote:
> On Mon, Dec 10, 2018 at 08:15:40AM +0100, Cédric Le Goater wrote:
>> On 12/10/18 5:27 AM, David Gibson wrote:
>>> On Sun, Dec 09, 2018 at 08:45:54PM +0100, Cédric Le Goater wrote:
>>>> The last sub-engine of the XIVE architecture is the Interrupt
>>>> Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
>>>> IVPE share elements, the Power Bus interface (CQ), the routing table
>>>> descriptors, and they can be combined in the same HW logic. We do the
>>>> same in QEMU and combine both engines in the XiveRouter for
>>>> simplicity.
>>>>
>>>> When the IVRE has completed its job of matching an event source with a
>>>> Notification Virtual Target (NVT) to notify, it forwards the event
>>>> notification to the IVPE sub-engine. The IVPE scans the thread
>>>> interrupt contexts of the Notification Virtual Targets (NVT)
>>>> dispatched on the HW processor threads and if a match is found, it
>>>> signals the thread. If not, the IVPE escalates the notification to
>>>> some other targets and records the notification in a backlog queue.
>>>>
>>>> The IVPE maintains the thread interrupt context state for each of its
>>>> NVTs not dispatched on HW processor threads in the Notification
>>>> Virtual Target table (NVTT).
>>>>
>>>> The model currently only supports single NVT notifications.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>
>>> Applied.
>>>
>>> I think the tctx_word2() should have the byteswap, rather than having
>>> it in the callers, but that can be fixed later.
>>
>> I thought it was better to explicitly show in the code where the 
>> byteswaps were needed. Anyway, this is very localized, so, yes, 
>> we can change it later on.
> 
> To clarify my thinking here; the important thing is not knowing where
> the byteswaps are, but being able to tell as easily as possible what
> endianness a given piece of data is right now.
> 
> The convention I'm aiming for here - which is one I try to use most
> places is that structures - at least structures which map to specific
> in-memory things - are in the required endianness for that stuff in
> memory.  However bare integers - uint32_t or uint64_t or whatever -
> are, well, numbers, in native endian.
> 

ok. Adding that to the TODO list.

C.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 12/19] spapr: add a 'reset' method to the sPAPR IRQ backend
  2018-12-10  6:42   ` David Gibson
  2018-12-10  7:30     ` Cédric Le Goater
@ 2018-12-11 10:55     ` Cédric Le Goater
  1 sibling, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-11 10:55 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/10/18 7:42 AM, David Gibson wrote:
> On Sun, Dec 09, 2018 at 08:46:03PM +0100, Cédric Le Goater wrote:
>> For the time being, the XIVE reset handler updates the OS CAM line of
>> the vCPU as it is done under a real hypervisor when a vCPU is
>> scheduled to run on a HW thread.
>>
>> This handler will become even more useful when we introduce the
>> machine supporting both interrupt modes, XIVE and XICS. In this
>> machine, the interrupt mode is chosen by the CAS negotiation process
>> and activated after a reset.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/spapr_irq.h  |  2 ++
>>  include/hw/ppc/spapr_xive.h |  1 +
>>  hw/intc/spapr_xive.c        | 24 ++++++++++++++++++++++++
>>  hw/ppc/spapr.c              |  5 +++++
>>  hw/ppc/spapr_irq.c          | 24 ++++++++++++++++++++++++
>>  5 files changed, 56 insertions(+)
>>
>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>> index 84a25ffb6c65..63061a009b4c 100644
>> --- a/include/hw/ppc/spapr_irq.h
>> +++ b/include/hw/ppc/spapr_irq.h
>> @@ -44,6 +44,7 @@ typedef struct sPAPRIrq {
>>      Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
>>                                 Error **errp);
>>      int (*post_load)(sPAPRMachineState *spapr, int version_id);
>> +    void (*reset)(sPAPRMachineState *spapr, Error **errp);
>>  } sPAPRIrq;
>>  
>>  extern sPAPRIrq spapr_irq_xics;
>> @@ -55,6 +56,7 @@ int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
>>  void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
>>  qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
>>  int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id);
>> +void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp);
>>  
>>  /*
>>   * XICS legacy routines
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 728a5e8dc163..7244a6231ce6 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -47,5 +47,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>>  void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>                     uint32_t phandle);
>> +void spapr_xive_reset_tctx(sPAPRXive *xive);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index a6d854b07690..560d8d031f74 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -179,6 +179,30 @@ static void spapr_xive_map_mmio(sPAPRXive *xive)
>>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
>>  }
>>  
>> +/*
>> + * When a Virtual Processor is scheduled to run on a HW thread, the
>> + * hypervisor pushes its identifier in the OS CAM line. Emulate the
>> + * same behavior under QEMU.
>> + */
>> +void spapr_xive_reset_tctx(sPAPRXive *xive)
>> +{
>> +    CPUState *cs;
>> +    uint8_t  nvt_blk;
>> +    uint32_t nvt_idx;
>> +    uint32_t nvt_cam;
>> +
>> +    CPU_FOREACH(cs) {
>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
>> +
>> +        spapr_xive_cpu_to_nvt(cpu, &nvt_blk, &nvt_idx);
>> +
>> +        nvt_cam = cpu_to_be32(TM_QW1W2_VO |
>> +                              xive_nvt_cam_line(nvt_blk, nvt_idx));
>> +        memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &nvt_cam, 4);
>> +    }
>> +}
>> +

I will need to rework that patch because the OS CAM line needs to be 
reseted when a CPU is hot plugged also.

C. 

>>  static void spapr_xive_end_reset(XiveEND *end)
>>  {
>>      memset(end, 0, sizeof(*end));
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 8cea4cad1732..98d69f09e080 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1619,6 +1619,11 @@ static void spapr_machine_reset(void)
>>  
>>      qemu_devices_reset();
>>  
>> +    /* This is fixing some of the default configuration of the XIVE
>> +     * devices. To be called after the reset of the machine devices.
>> +     */
>> +    spapr_irq_reset(spapr, &error_fatal);
>> +
>>      /* DRC reset may cause a device to be unplugged. This will cause troubles
>>       * if this device is used by another device (eg, a running vhost backend
>>       * will crash QEMU if the DIMM holding the vring goes away). To avoid such
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 35a067cad3f8..04f5c9665550 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -209,6 +209,10 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
>>      return 0;
>>  }
>>  
>> +static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
>> +{
>> +}
> 
> You already have a check for a NULL reset hook in spapr_irq_reset() so
> you could omit this empty function.
> 
>> +
>>  #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
>>  #define SPAPR_IRQ_XICS_NR_MSIS     \
>>      (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
>> @@ -225,6 +229,7 @@ sPAPRIrq spapr_irq_xics = {
>>      .dt_populate = spapr_dt_xics,
>>      .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
>>      .post_load   = spapr_irq_post_load_xics,
>> +    .reset       = spapr_irq_reset_xics,
>>  };
>>  
>>  /*
>> @@ -333,6 +338,15 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>>      return 0;
>>  }
>>  
>> +static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
>> +{
>> +    /*
>> +     * Set the OS CAM line of the cpu interrupt thread context. Needs
>> +     * to come after the XiveTCTX reset handlers.
>> +     */
>> +    spapr_xive_reset_tctx(spapr->xive);
>> +}
>> +
>>  /*
>>   * XIVE uses the full IRQ number space. Set it to 8K to be compatible
>>   * with XICS.
>> @@ -353,6 +367,7 @@ sPAPRIrq spapr_irq_xive = {
>>      .dt_populate = spapr_dt_xive,
>>      .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
>>      .post_load   = spapr_irq_post_load_xive,
>> +    .reset       = spapr_irq_reset_xive,
>>  };
>>  
>>  /*
>> @@ -398,6 +413,15 @@ int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id)
>>      return smc->irq->post_load(spapr, version_id);
>>  }
>>  
>> +void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp)
>> +{
>> +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>> +
>> +    if (smc->irq->reset) {
>> +        smc->irq->reset(spapr, errp);
>> +    }
>> +}
>> +
>>  /*
>>   * XICS legacy routines - to deprecate one day
>>   */
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 14/19] spapr: set the interrupt presenter at reset
  2018-12-11  1:46   ` David Gibson
@ 2018-12-11 10:58     ` Cédric Le Goater
  0 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-11 10:58 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/11/18 2:46 AM, David Gibson wrote:
> On Sun, Dec 09, 2018 at 08:46:05PM +0100, Cédric Le Goater wrote:
>> Currently, the interrupt presenter of the vCPU is set at realize
>> time. Setting it at reset will become useful when the new machine
>> supporting both interrupt modes is introduced. In this machine, the
>> interrupt mode is chosen at CAS time and activated after a reset.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> Shouldn't this also remove the code which sets cpu->intc at realize
> time, in order to avoid confusion?

yes. Which also raises the question of the parenthood when a CPU is 
hot-unplugged (spapr_unrealize_vcpu)

C.

> 
>> ---
>>  include/hw/ppc/spapr_cpu_core.h |  2 ++
>>  hw/ppc/spapr_cpu_core.c         | 26 ++++++++++++++++++++++++++
>>  hw/ppc/spapr_irq.c              | 12 ++++++++++++
>>  3 files changed, 40 insertions(+)
>>
>> diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
>> index 9e2821e4b31f..fc8ea9021656 100644
>> --- a/include/hw/ppc/spapr_cpu_core.h
>> +++ b/include/hw/ppc/spapr_cpu_core.h
>> @@ -53,4 +53,6 @@ static inline sPAPRCPUState *spapr_cpu_state(PowerPCCPU *cpu)
>>      return (sPAPRCPUState *)cpu->machine_data;
>>  }
>>  
>> +void spapr_cpu_core_set_intc(PowerPCCPU *cpu, const char *intc_type);
>> +
>>  #endif
>> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
>> index 1811cd48db90..529de0b6b9c8 100644
>> --- a/hw/ppc/spapr_cpu_core.c
>> +++ b/hw/ppc/spapr_cpu_core.c
>> @@ -398,3 +398,29 @@ static const TypeInfo spapr_cpu_core_type_infos[] = {
>>  };
>>  
>>  DEFINE_TYPES(spapr_cpu_core_type_infos)
>> +
>> +typedef struct ForeachFindIntCArgs {
>> +    const char *intc_type;
>> +    Object *intc;
>> +} ForeachFindIntCArgs;
>> +
>> +static int spapr_cpu_core_find_intc(Object *child, void *opaque)
>> +{
>> +    ForeachFindIntCArgs *args = opaque;
>> +
>> +    if (object_dynamic_cast(child, args->intc_type)) {
>> +        args->intc = child;
>> +    }
>> +
>> +    return args->intc != NULL;
>> +}
>> +
>> +void spapr_cpu_core_set_intc(PowerPCCPU *cpu, const char *intc_type)
>> +{
>> +    ForeachFindIntCArgs args = { intc_type, NULL };
>> +
>> +    object_child_foreach(OBJECT(cpu), spapr_cpu_core_find_intc, &args);
>> +    g_assert(args.intc);
>> +
>> +    cpu->intc = args.intc;
>> +}
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 7a0d4f529763..b423cee30e2c 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -12,6 +12,7 @@
>>  #include "qemu/error-report.h"
>>  #include "qapi/error.h"
>>  #include "hw/ppc/spapr.h"
>> +#include "hw/ppc/spapr_cpu_core.h"
>>  #include "hw/ppc/spapr_xive.h"
>>  #include "hw/ppc/xics.h"
>>  #include "sysemu/kvm.h"
>> @@ -211,6 +212,11 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
>>  
>>  static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
>>  {
>> +    CPUState *cs;
>> +
>> +    CPU_FOREACH(cs) {
>> +        spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
>> +    }
>>  }
>>  
>>  #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
>> @@ -341,6 +347,12 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>>  
>>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
>>  {
>> +    CPUState *cs;
>> +
>> +    CPU_FOREACH(cs) {
>> +        spapr_cpu_core_set_intc(POWERPC_CPU(cs), TYPE_XIVE_TCTX);
>> +    }
>> +
>>      /*
>>       * Set the OS CAM line of the cpu interrupt thread context. Needs
>>       * to come after the XiveTCTX reset handlers.
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' machine type
  2018-12-11 10:42       ` Cédric Le Goater
@ 2018-12-11 16:44         ` Cédric Le Goater
  2018-12-15  8:03           ` David Gibson
  2018-12-12  0:34         ` David Gibson
  1 sibling, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-11 16:44 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/11/18 11:42 AM, Cédric Le Goater wrote:
> On 12/11/18 3:06 AM, David Gibson wrote:
>> On Mon, Dec 10, 2018 at 11:17:33PM +0100, Cédric Le Goater wrote:
>>> On 12/9/18 8:46 PM, Cédric Le Goater wrote:
>>>> This pseries machine makes use of a new sPAPR IRQ backend supporting
>>>> the XIVE interrupt mode.
>>>>
>>>> The guest OS is required to have support for the XIVE exploitation
>>>> mode of the POWER9 interrupt controller.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/ppc/spapr.c | 15 +++++++++++++++
>>>>  1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 4012ebd794a4..3cc134a0b673 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -3985,6 +3985,21 @@ static void spapr_machine_4_0_class_options(MachineClass *mc)
>>>>  
>>>>  DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
>>>>  
>>>> +static void spapr_machine_4_0_xive_instance_options(MachineState *machine)
>>>> +{
>>>> +    spapr_machine_4_0_instance_options(machine);
>>>> +}
>>>> +
>>>> +static void spapr_machine_4_0_xive_class_options(MachineClass *mc)
>>>> +{
>>>> +    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
>>>> +
>>>> +    spapr_machine_4_0_class_options(mc);> +    smc->irq = &spapr_irq_xive;
>>>
>>> I have been adding checks on the CPU model to export the XIVE capability 
>>> only on POWER9 processors but it breaks some of the tests.
>>>
>>> I was wondering if we could add a default POWER9 CPU to the -xive machine : 
>>>
>>>   + mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
>>>
>>> and if we could change tests/cpu-plug-test.c with :
>>>
>>>   @@ -198,8 +198,13 @@ static void add_pseries_test_case(const
>>>        }
>>>        data = g_new(PlugTestData, 1);
>>>        data->machine = g_strdup(mname);
>>>   -    data->cpu_model = "power8_v2.0";
>>>   -    data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
>>>   +    if (g_str_has_suffix(mname, "xive")) {
>>>   +        data->cpu_model = "power9_v2.0";
>>>   +        data->device_model = g_strdup("power9_v2.0-spapr-cpu-core");
>>>   +    } else {
>>>   +        data->cpu_model = "power8_v2.0";
>>>   +        data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
>>>   +    }
>>>        data->sockets = 2;
>>>        data->cores = 3;
>>>        data->threads = 1;
>>>
>>> or if there is a better way ?
>>
>> So, I'd actually prefer a machine option, rather than wholly separate
>> machine types to select xics/xive/dual.  Machine types was fine while
>> prototyping this, but I don't think we want to actually merge new
>> machine types for it.
> 
> I agree. 
> 
>> So, instead I think we want a machine option which can be set to
>> xics/xive/dual, with xics being the default for earlier machine types
>> and dual the default for 4.0 onwards.
> 
> I will revive an old patch doing just that. 
> 
> The question now is how to link the sPAPRMachineState instance to 
> the selected sPAPR IRQ backend. 

Would something like below be acceptable ? If so I will change the 
remaining patches to use 'spapr->irq' and not 'smc->irq' anymore.
 
Thanks,

C.

Index: qemu-xive.git/include/hw/ppc/spapr.h
===================================================================
--- qemu-xive.git.orig/include/hw/ppc/spapr.h
+++ qemu-xive.git/include/hw/ppc/spapr.h
@@ -177,6 +177,7 @@ struct sPAPRMachineState {
     int32_t irq_map_nr;
     unsigned long *irq_map;
     sPAPRXive  *xive;
+    sPAPRIrq *irq;
 
     bool cmd_line_caps[SPAPR_CAP_NUM];
     sPAPRCapabilities def, eff, mig;
Index: qemu-xive.git/hw/ppc/spapr.c
===================================================================
--- qemu-xive.git.orig/hw/ppc/spapr.c
+++ qemu-xive.git/hw/ppc/spapr.c
@@ -1302,7 +1301,7 @@ static void *spapr_build_fdt(sPAPRMachin
     }
 
     QLIST_FOREACH(phb, &spapr->phbs, list) {
-        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, smc->irq->nr_msis);
+        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, spapr->irq->nr_msis);
         if (ret < 0) {
             error_report("couldn't setup PCI devices in fdt");
             exit(1);
@@ -3056,9 +3055,38 @@ static void spapr_set_vsmt(Object *obj,
     visit_type_uint32(v, name, (uint32_t *)opaque, errp);
 }
 
+static char *spapr_get_irq(Object *obj, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    if (spapr->irq == &spapr_irq_xics_legacy) {
+        return g_strdup("legacy");
+    } else if (spapr->irq == &spapr_irq_xics) {
+        return g_strdup("xics");
+    } else if (spapr->irq == &spapr_irq_xive) {
+        return g_strdup("xive");
+    }
+    g_assert_not_reached();
+}
+
+static void spapr_set_irq(Object *obj, const char *value, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    /* We don't want to set the legacy IRQ backend */
+    if (strcmp(value, "xics") == 0) {
+        spapr->irq = &spapr_irq_xics;
+    } else if (strcmp(value, "xive") == 0) {
+        spapr->irq = &spapr_irq_xive;
+    } else {
+        error_setg(errp, "Bad value for \"irq\" property");
+    }
+}
+
 static void spapr_instance_init(Object *obj)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 
     spapr->htab_fd = -1;
     spapr->use_hotplug_event_source = true;
@@ -3092,6 +3120,13 @@ static void spapr_instance_init(Object *
                                     " the host's SMT mode", &error_abort);
     object_property_add_bool(obj, "vfio-no-msix-emulation",
                              spapr_get_msix_emulation, NULL, NULL);
+
+    /* Get the default from the machine class */
+    spapr->irq = smc->irq;
+    object_property_add_str(obj, "irq", spapr_get_irq, spapr_set_irq, NULL);
+    object_property_set_description(obj, "irq",
+                 "Specifies the interrupt controller mode (xics, xive)",
+                 NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -3814,9 +3849,8 @@ static void spapr_pic_print_info(Interru
                                  Monitor *mon)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 
-    smc->irq->print_info(spapr, mon);
+    spapr->irq->print_info(spapr, mon);
 }
 
 int spapr_get_vcpu_id(PowerPCCPU *cpu)
Index: qemu-xive.git/hw/ppc/spapr_irq.c
===================================================================
--- qemu-xive.git.orig/hw/ppc/spapr_irq.c
+++ qemu-xive.git/hw/ppc/spapr_irq.c
@@ -94,8 +94,7 @@ error:
 static void spapr_irq_init_xics(sPAPRMachineState *spapr, Error **errp)
 {
     MachineState *machine = MACHINE(spapr);
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-    int nr_irqs = smc->irq->nr_irqs;
+    int nr_irqs = spapr->irq->nr_irqs;
     Error *local_err = NULL;
 
     if (kvm_enabled()) {
@@ -234,7 +233,6 @@ sPAPRIrq spapr_irq_xics = {
 static void spapr_irq_init_xive(sPAPRMachineState *spapr, Error **errp)
 {
     MachineState *machine = MACHINE(spapr);
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
     uint32_t nr_servers = spapr_max_server_number(spapr);
     DeviceState *dev;
     int i;
@@ -248,7 +246,7 @@ static void spapr_irq_init_xive(sPAPRMac
     }
 
     dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
-    qdev_prop_set_uint32(dev, "nr-irqs", smc->irq->nr_irqs);
+    qdev_prop_set_uint32(dev, "nr-irqs", spapr->irq->nr_irqs);
     /*
      * 8 XIVE END structures per CPU. One for each available priority
      */
@@ -353,50 +351,38 @@ sPAPRIrq spapr_irq_xive = {
  */
 void spapr_irq_init(sPAPRMachineState *spapr, Error **errp)
 {
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-
     /* Initialize the MSI IRQ allocator. */
     if (!SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
-        spapr_irq_msi_init(spapr, smc->irq->nr_msis);
+        spapr_irq_msi_init(spapr, spapr->irq->nr_msis);
     }
 
-    smc->irq->init(spapr, errp);
+    spapr->irq->init(spapr, errp);
 }
 
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp)
 {
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-
-    return smc->irq->claim(spapr, irq, lsi, errp);
+    return spapr->irq->claim(spapr, irq, lsi, errp);
 }
 
 void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
 {
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-
-    smc->irq->free(spapr, irq, num);
+    spapr->irq->free(spapr, irq, num);
 }
 
 qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq)
 {
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-
-    return smc->irq->qirq(spapr, irq);
+    return spapr->irq->qirq(spapr, irq);
 }
 
 int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id)
 {
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-
-    return smc->irq->post_load(spapr, version_id);
+    return spapr->irq->post_load(spapr, version_id);
 }
 
 void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp)
 {
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-
-    if (smc->irq->reset) {
-        smc->irq->reset(spapr, errp);
+    if (spapr->irq->reset) {
+        spapr->irq->reset(spapr, errp);
     }
 }
 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode
  2018-12-11  9:06         ` Cédric Le Goater
@ 2018-12-12  0:19           ` David Gibson
  2018-12-12  7:37             ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-12  0:19 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 9445 bytes --]

On Tue, Dec 11, 2018 at 10:06:46AM +0100, Cédric Le Goater wrote:
> On 12/11/18 1:38 AM, David Gibson wrote:
> > On Mon, Dec 10, 2018 at 08:53:17AM +0100, Cédric Le Goater wrote:
> >> On 12/10/18 7:39 AM, David Gibson wrote:
> >>> On Sun, Dec 09, 2018 at 08:46:00PM +0100, Cédric Le Goater wrote:
> >>>> The XIVE interface for the guest is described in the device tree under
> >>>> the "interrupt-controller" node. A couple of new properties are
> >>>> specific to XIVE :
> >>>>
> >>>>  - "reg"
> >>>>
> >>>>    contains the base address and size of the thread interrupt
> >>>>    managnement areas (TIMA), for the User level and for the Guest OS
> >>>>    level. Only the Guest OS level is taken into account today.
> >>>>
> >>>>  - "ibm,xive-eq-sizes"
> >>>>
> >>>>    the size of the event queues. One cell per size supported, contains
> >>>>    log2 of size, in ascending order.
> >>>>
> >>>>  - "ibm,xive-lisn-ranges"
> >>>>
> >>>>    the IRQ interrupt number ranges assigned to the guest for the IPIs.
> >>>>
> >>>> and also under the root node :
> >>>>
> >>>>  - "ibm,plat-res-int-priorities"
> >>>>
> >>>>    contains a list of priorities that the hypervisor has reserved for
> >>>>    its own use. OPAL uses the priority 7 queue to automatically
> >>>>    escalate interrupts for all other queues (DD2.X POWER9). So only
> >>>>    priorities [0..6] are allowed for the guest.
> >>>>
> >>>> Extend the sPAPR IRQ backend with a new handler to populate the DT
> >>>> with the appropriate "interrupt-controller" node.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>> ---
> >>>>  include/hw/ppc/spapr_irq.h  |  2 ++
> >>>>  include/hw/ppc/spapr_xive.h |  2 ++
> >>>>  include/hw/ppc/xics.h       |  4 +--
> >>>>  hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
> >>>>  hw/intc/xics_spapr.c        |  3 +-
> >>>>  hw/ppc/spapr.c              |  3 +-
> >>>>  hw/ppc/spapr_irq.c          |  3 ++
> >>>>  7 files changed, 77 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> >>>> index 23cdb51b879e..e51e9f052f63 100644
> >>>> --- a/include/hw/ppc/spapr_irq.h
> >>>> +++ b/include/hw/ppc/spapr_irq.h
> >>>> @@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
> >>>>      void (*free)(sPAPRMachineState *spapr, int irq, int num);
> >>>>      qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
> >>>>      void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
> >>>> +    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
> >>>> +                        void *fdt, uint32_t phandle);
> >>>>  } sPAPRIrq;
> >>>>  
> >>>>  extern sPAPRIrq spapr_irq_xics;
> >>>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> >>>> index 9506a8f4d10a..728a5e8dc163 100644
> >>>> --- a/include/hw/ppc/spapr_xive.h
> >>>> +++ b/include/hw/ppc/spapr_xive.h
> >>>> @@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
> >>>>  typedef struct sPAPRMachineState sPAPRMachineState;
> >>>>  
> >>>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> >>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> >>>> +                   uint32_t phandle);
> >>>>  
> >>>>  #endif /* PPC_SPAPR_XIVE_H */
> >>>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> >>>> index 9958443d1984..14afda198cdb 100644
> >>>> --- a/include/hw/ppc/xics.h
> >>>> +++ b/include/hw/ppc/xics.h
> >>>> @@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
> >>>>      ICPState *(*icp_get)(XICSFabric *xi, int server);
> >>>>  } XICSFabricClass;
> >>>>  
> >>>> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
> >>>> -
> >>>>  ICPState *xics_icp_get(XICSFabric *xi, int server);
> >>>>  
> >>>>  /* Internal XICS interfaces */
> >>>> @@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
> >>>>  
> >>>>  typedef struct sPAPRMachineState sPAPRMachineState;
> >>>>  
> >>>> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> >>>> +                   uint32_t phandle);
> >>>>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
> >>>>  void xics_spapr_init(sPAPRMachineState *spapr);
> >>>>  
> >>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >>>> index 982ac6e17051..a6d854b07690 100644
> >>>> --- a/hw/intc/spapr_xive.c
> >>>> +++ b/hw/intc/spapr_xive.c
> >>>> @@ -14,6 +14,7 @@
> >>>>  #include "target/ppc/cpu.h"
> >>>>  #include "sysemu/cpus.h"
> >>>>  #include "monitor/monitor.h"
> >>>> +#include "hw/ppc/fdt.h"
> >>>>  #include "hw/ppc/spapr.h"
> >>>>  #include "hw/ppc/spapr_xive.h"
> >>>>  #include "hw/ppc/xive.h"
> >>>> @@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
> >>>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
> >>>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
> >>>>  }
> >>>> +
> >>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> >>>> +                   uint32_t phandle)
> >>>> +{
> >>>> +    sPAPRXive *xive = spapr->xive;
> >>>> +    int node;
> >>>> +    uint64_t timas[2 * 2];
> >>>> +    /* Interrupt number ranges for the IPIs */
> >>>> +    uint32_t lisn_ranges[] = {
> >>>> +        cpu_to_be32(0),
> >>>> +        cpu_to_be32(nr_servers),
> >>>> +    };
> >>>> +    uint32_t eq_sizes[] = {
> >>>> +        cpu_to_be32(12), /* 4K */
> >>>> +        cpu_to_be32(16), /* 64K */
> >>>> +        cpu_to_be32(21), /* 2M */
> >>>> +        cpu_to_be32(24), /* 16M */
> >>>
> >>> For KVM, are we going to need to clamp this list based on the
> >>> pagesizes the guest can use?
> >>
> >> I would say so. Is there a KVM service for that ?
> > 
> > I'm not sure what you mean by a KVM service here.
> 
> I meant a routine giving a list of the possible backing pagesizes.

I'm still not really sure what you mean by that.

> >> Today, the OS scans the list and picks the size fitting its current 
> >> PAGE_SIZE/SHIFT. But I suppose it would be better to advertise 
> >> only the page sizes that QEMU/KVM supports. Or should we play safe 
> >> and only export : 4K, 64K ? 
> > 
> > So, I'm guessing the limitation here is that when we use the KVM XIVE
> > acceleration, the EQ will need to be host-contiguous?  
> 
> The EQ should be a single page (from the specs). So we don't have 
> a contiguity problem.

A single page according to whom, though?  A RPT guest can use 2MiB
pages, even if it is backed with smaller pages on the host, but I'm
guessing it would not work to have its EQ be a 2MiB page, since it
won't be host-contiguous.

> > So, we can't have EQs larger than the backing pagesize for guest 
> > memory.
> >
>  > We try to avoid having guest visible differences based on whether
> > we're using KVM or not, so we don't want to change this list depending
> > on whether KVM is active etc. - or on whether we're backing the guest
> > with hugepages.
> > 
> > We could reuse the cap-hpt-max-page-size machine parameter which also
> > relates to backing page size, but that might be confusing since it's
> > explicitly about HPT only whereas this limitation would apply to RPT
> > guests too, IIUC.
> > 
> > I think our best bet is probably to limit to 64kiB queues
> > unconditionally.  
> 
> OK. That's the default. We can refine this list of page sizes later on
> when we introduce KVM support if needed.
> 
> > AIUI, that still gives us 8192 slots in the queue,
> 
> The entries are 32bits. So that's 16384 entries. 

Even better.

> > which is enough to store one of every possible irq, which should be
> > enough.
> 
> yes. 
> 
> I don't think Linux measures how much entries are dequeued at once.
> It would give us a feel of how much pressure these EQs can sustain.
> 
> C.
>  
> > There's still an issue if we have a 4kiB pagesize host kernel, but we
> > could just disable kernel_irqchip in that situation, since we don't
> > expect people to be using that much in practice.
> > 
> >>>> +    };
> >>>> +    /* The following array is in sync with the reserved priorities
> >>>> +     * defined by the 'spapr_xive_priority_is_reserved' routine.
> >>>> +     */
> >>>> +    uint32_t plat_res_int_priorities[] = {
> >>>> +        cpu_to_be32(7),    /* start */
> >>>> +        cpu_to_be32(0xf8), /* count */
> >>>> +    };
> >>>> +    gchar *nodename;
> >>>> +
> >>>> +    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
> >>>> +    timas[0] = cpu_to_be64(xive->tm_base +
> >>>> +                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
> >>>> +    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
> >>>> +    timas[2] = cpu_to_be64(xive->tm_base +
> >>>> +                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
> >>>> +    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
> >>>
> >>> So this gives the MMIO address of the TIMA.  
> >>
> >> Yes. It is considered to be a fixed "address" 
> >>
> >>> Where does the guest get the MMIO address for the ESB pages from?
> >>
> >> with the hcall H_INT_GET_SOURCE_INFO.
> > 
> > Ah, right.
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 15/19] spapr/xive: enable XIVE MMIOs at reset
  2018-12-11 10:14     ` Cédric Le Goater
@ 2018-12-12  0:32       ` David Gibson
  0 siblings, 0 replies; 61+ messages in thread
From: David Gibson @ 2018-12-12  0:32 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 3343 bytes --]

On Tue, Dec 11, 2018 at 11:14:41AM +0100, Cédric Le Goater wrote:
> On 12/11/18 2:47 AM, David Gibson wrote:
> > On Sun, Dec 09, 2018 at 08:46:06PM +0100, Cédric Le Goater wrote:
> >> Depending on the interrupt mode chosen, enable or disable the XIVE
> >> MMIOs.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  include/hw/ppc/spapr_xive.h | 1 +
> >>  hw/intc/spapr_xive.c        | 9 +++++++++
> >>  hw/ppc/spapr_irq.c          | 8 ++++++++
> >>  3 files changed, 18 insertions(+)
> >>
> >> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> >> index 7244a6231ce6..308afb61a666 100644
> >> --- a/include/hw/ppc/spapr_xive.h
> >> +++ b/include/hw/ppc/spapr_xive.h
> >> @@ -48,5 +48,6 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr);
> >>  void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> >>                     uint32_t phandle);
> >>  void spapr_xive_reset_tctx(sPAPRXive *xive);
> >> +void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable);
> >>  
> >>  #endif /* PPC_SPAPR_XIVE_H */
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 560d8d031f74..c6dbb2e8cfc7 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -179,6 +179,15 @@ static void spapr_xive_map_mmio(sPAPRXive *xive)
> >>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
> >>  }
> >>  
> >> +void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable)
> > 
> > The logic looks fine, but I dislike this name - it's called
> > ..._enable() when it can also be used to disable.
> 
> ok. Let's call it spapr_xive_mmio_set_enabled() then

Sounds good.

> 
> C.
>  
> >> +{
> >> +    memory_region_set_enabled(&xive->source.esb_mmio, enable);
> >> +    memory_region_set_enabled(&xive->tm_mmio, enable);
> >> +
> >> +    /* Disable the END ESBs until a guest OS makes use of them */
> >> +    memory_region_set_enabled(&xive->end_source.esb_mmio, false);
> >> +}
> >> +
> >>  /*
> >>   * When a Virtual Processor is scheduled to run on a HW thread, the
> >>   * hypervisor pushes its identifier in the OS CAM line. Emulate the
> >> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> >> index b423cee30e2c..a8e50725397c 100644
> >> --- a/hw/ppc/spapr_irq.c
> >> +++ b/hw/ppc/spapr_irq.c
> >> @@ -217,6 +217,11 @@ static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
> >>      CPU_FOREACH(cs) {
> >>          spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
> >>      }
> >> +
> >> +    /* Deactivate the XIVE MMIOs */
> >> +    if (spapr->xive) {
> >> +        spapr_xive_enable_mmio(spapr->xive, false);
> >> +    }
> >>  }
> >>  
> >>  #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
> >> @@ -358,6 +363,9 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
> >>       * to come after the XiveTCTX reset handlers.
> >>       */
> >>      spapr_xive_reset_tctx(spapr->xive);
> >> +
> >> +    /* Activate the XIVE MMIOs */
> >> +    spapr_xive_enable_mmio(spapr->xive, true);
> >>  }
> >>  
> >>  /*
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' machine type
  2018-12-11 10:42       ` Cédric Le Goater
  2018-12-11 16:44         ` Cédric Le Goater
@ 2018-12-12  0:34         ` David Gibson
  2018-12-12  7:26           ` Cédric Le Goater
  1 sibling, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-12  0:34 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 3712 bytes --]

On Tue, Dec 11, 2018 at 11:42:03AM +0100, Cédric Le Goater wrote:
> On 12/11/18 3:06 AM, David Gibson wrote:
> > On Mon, Dec 10, 2018 at 11:17:33PM +0100, Cédric Le Goater wrote:
> >> On 12/9/18 8:46 PM, Cédric Le Goater wrote:
> >>> This pseries machine makes use of a new sPAPR IRQ backend supporting
> >>> the XIVE interrupt mode.
> >>>
> >>> The guest OS is required to have support for the XIVE exploitation
> >>> mode of the POWER9 interrupt controller.
> >>>
> >>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>> ---
> >>>  hw/ppc/spapr.c | 15 +++++++++++++++
> >>>  1 file changed, 15 insertions(+)
> >>>
> >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>> index 4012ebd794a4..3cc134a0b673 100644
> >>> --- a/hw/ppc/spapr.c
> >>> +++ b/hw/ppc/spapr.c
> >>> @@ -3985,6 +3985,21 @@ static void spapr_machine_4_0_class_options(MachineClass *mc)
> >>>  
> >>>  DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
> >>>  
> >>> +static void spapr_machine_4_0_xive_instance_options(MachineState *machine)
> >>> +{
> >>> +    spapr_machine_4_0_instance_options(machine);
> >>> +}
> >>> +
> >>> +static void spapr_machine_4_0_xive_class_options(MachineClass *mc)
> >>> +{
> >>> +    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> >>> +
> >>> +    spapr_machine_4_0_class_options(mc);> +    smc->irq = &spapr_irq_xive;
> >>
> >> I have been adding checks on the CPU model to export the XIVE capability 
> >> only on POWER9 processors but it breaks some of the tests.
> >>
> >> I was wondering if we could add a default POWER9 CPU to the -xive machine : 
> >>
> >>   + mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
> >>
> >> and if we could change tests/cpu-plug-test.c with :
> >>
> >>   @@ -198,8 +198,13 @@ static void add_pseries_test_case(const
> >>        }
> >>        data = g_new(PlugTestData, 1);
> >>        data->machine = g_strdup(mname);
> >>   -    data->cpu_model = "power8_v2.0";
> >>   -    data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
> >>   +    if (g_str_has_suffix(mname, "xive")) {
> >>   +        data->cpu_model = "power9_v2.0";
> >>   +        data->device_model = g_strdup("power9_v2.0-spapr-cpu-core");
> >>   +    } else {
> >>   +        data->cpu_model = "power8_v2.0";
> >>   +        data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
> >>   +    }
> >>        data->sockets = 2;
> >>        data->cores = 3;
> >>        data->threads = 1;
> >>
> >> or if there is a better way ?
> > 
> > So, I'd actually prefer a machine option, rather than wholly separate
> > machine types to select xics/xive/dual.  Machine types was fine while
> > prototyping this, but I don't think we want to actually merge new
> > machine types for it.
> 
> I agree. 
> 
> > So, instead I think we want a machine option which can be set to
> > xics/xive/dual, with xics being the default for earlier machine types
> > and dual the default for 4.0 onwards.
> 
> I will revive an old patch doing just that. 
> 
> The question now is how to link the sPAPRMachineState instance to 
> the selected sPAPR IRQ backend. 
> 
> I don't think we can move 'smc->irq' to sPAPRMachineState.

I think you could..

> So we will
> need an helper returning the appropriate backend depending on the machine 
> option and 'smc->irq' should disappear.

..but this approach might be easier.

> 
> > We can make POWER9 the default cpu for 4.0 onwards as well, if you want.
> 
> OK.
> 
> C.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
  2018-12-11 10:19     ` Cédric Le Goater
@ 2018-12-12  0:54       ` David Gibson
  2018-12-12  9:13         ` Cédric Le Goater
  0 siblings, 1 reply; 61+ messages in thread
From: David Gibson @ 2018-12-12  0:54 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 9984 bytes --]

On Tue, Dec 11, 2018 at 11:19:01AM +0100, Cédric Le Goater wrote:
> On 12/11/18 3:03 AM, David Gibson wrote:
> > On Sun, Dec 09, 2018 at 08:46:07PM +0100, Cédric Le Goater wrote:
> >> The interrupt mode is chosen by the CAS negotiation process and
> >> activated after a reset to take into account the required changes in
> >> the machine. These impact the device tree layout, the interrupt
> >> presenter object and the exposed MMIO regions in the case of XIVE.
> >>
> >> This default interrupt mode for the machine is XICS.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  include/hw/ppc/spapr_irq.h |   1 +
> >>  hw/ppc/spapr.c             |   3 +-
> >>  hw/ppc/spapr_hcall.c       |  13 ++++
> >>  hw/ppc/spapr_irq.c         | 143 +++++++++++++++++++++++++++++++++++++
> >>  4 files changed, 159 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> >> index b34d5a00381b..29936498dbc8 100644
> >> --- a/include/hw/ppc/spapr_irq.h
> >> +++ b/include/hw/ppc/spapr_irq.h
> >> @@ -51,6 +51,7 @@ typedef struct sPAPRIrq {
> >>  extern sPAPRIrq spapr_irq_xics;
> >>  extern sPAPRIrq spapr_irq_xics_legacy;
> >>  extern sPAPRIrq spapr_irq_xive;
> >> +extern sPAPRIrq spapr_irq_dual;
> >>  
> >>  void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
> >>  int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 5ef87a00f68b..fa41927d95dd 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2631,7 +2631,8 @@ static void spapr_machine_init(MachineState *machine)
> >>      spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
> >>  
> >>      /* advertise XIVE */
> >> -    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT) {
> >> +    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT ||
> >> +        smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH) {
> >>          spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
> >>      }
> >>  
> >> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> >> index ae913d070f50..186b6a65543f 100644
> >> --- a/hw/ppc/spapr_hcall.c
> >> +++ b/hw/ppc/spapr_hcall.c
> >> @@ -1654,6 +1654,19 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >>              (spapr_h_cas_compose_response(spapr, args[1], args[2],
> >>                                            ov5_updates) != 0);
> >>      }
> >> +
> >> +    /*
> >> +     * Generate a machine reset when we have an update of the
> >> +     * interrupt mode. Only required on the machine supporting both
> >> +     * mode.
> >> +     */
> >> +    if (!spapr->cas_reboot) {
> >> +        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> >> +
> >> +        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)
> >> +            && smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH;
> >> +    }
> >> +
> >>      spapr_ovec_cleanup(ov5_updates);
> >>  
> >>      if (spapr->cas_reboot) {
> >> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> >> index a8e50725397c..7c34939f774a 100644
> >> --- a/hw/ppc/spapr_irq.c
> >> +++ b/hw/ppc/spapr_irq.c
> >> @@ -392,6 +392,149 @@ sPAPRIrq spapr_irq_xive = {
> >>      .reset       = spapr_irq_reset_xive,
> >>  };
> >>  
> >> +/*
> >> + * Dual XIVE and XICS IRQ backend.
> >> + *
> >> + * Both interrupt mode, XIVE and XICS, objects are created but the
> >> + * machine starts in legacy interrupt mode (XICS). It can be changed
> >> + * by the CAS negotiation process and, in that case, the new mode is
> >> + * activated after extra machine reset.
> >> + */
> >> +
> >> +/*
> >> + * Returns the sPAPR IRQ backend negotiated by CAS. XICS is the
> >> + * default.
> >> + */
> >> +static sPAPRIrq *spapr_irq_current(sPAPRMachineState *spapr)
> >> +{
> >> +    return spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT) ?
> >> +        &spapr_irq_xive : &spapr_irq_xics;
> >> +}
> >> +
> >> +static void spapr_irq_init_dual(sPAPRMachineState *spapr, Error **errp)
> >> +{
> >> +    MachineState *machine = MACHINE(spapr);
> >> +    Error *local_err = NULL;
> >> +
> >> +    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
> >> +        error_setg(errp, "No KVM support for the 'dual' machine");
> >> +        return;
> >> +    }
> >> +
> >> +    spapr_irq_xics.init(spapr, &local_err);
> >> +    if (local_err) {
> >> +        error_propagate(errp, local_err);
> >> +        return;
> >> +    }
> >> +
> >> +    spapr_irq_xive.init(spapr, &local_err);
> >> +    if (local_err) {
> >> +        error_propagate(errp, local_err);
> >> +        return;
> >> +    }
> >> +}
> >> +
> >> +static int spapr_irq_claim_dual(sPAPRMachineState *spapr, int irq, bool lsi,
> >> +                                Error **errp)
> >> +{
> >> +    int ret;
> >> +    Error *local_err = NULL;
> >> +
> >> +    ret = spapr_irq_xive.claim(spapr, irq, lsi, &local_err);
> >> +    if (local_err) {
> >> +        error_propagate(errp, local_err);
> >> +        return ret;
> >> +    }
> >> +
> >> +    ret = spapr_irq_xics.claim(spapr, irq, lsi, &local_err);
> >> +    if (local_err) {
> >> +        error_propagate(errp, local_err);
> >> +    }
> >> +
> >> +    return ret;
> >> +}
> >> +
> >> +static void spapr_irq_free_dual(sPAPRMachineState *spapr, int irq, int num)
> >> +{
> >> +    spapr_irq_xive.free(spapr, irq, num);
> >> +    spapr_irq_xics.free(spapr, irq, num);
> >> +}
> >> +
> >> +static qemu_irq spapr_qirq_dual(sPAPRMachineState *spapr, int irq)
> >> +{
> >> +    return spapr_irq_current(spapr)->qirq(spapr, irq);
> > 
> > Urgh... I don't think this is going to work.  IIRC the various devices
> > (PHB, VIO, etc.)  are wired up to their qirqs at realize() time, so if
> > you reboot from a XIVE guest to XICS guest (or maybe the other way
> > around) the peripherals won't be able to signal irqs in the new
> > scheme.
> 
> It does. The IRQ numbers are claimed in both backends.

Yes, I realize that, but the two backends still have their own set of
qirqs, which have their own set_irq routines associated with them.

> This is the problem since the very beginning. For reset and migration
> to work, we need to keep in sync the IRQ number space of the machine 
> and the different interrupt controllers.

Sure, we have the numbers in sync, but that won't help if when the
peripherals do a qemu_irq_pulse() it goes to the wrong backend's
trigger routine.


> 
> C. 
> 
> 
> > I think instead we need a common set of qirqs, whose set_irq routine
> > looks at whether to signal XICS or XIVE.  FOr now I think the easiest
> > approach is to layer those on top of the existing XICS or XIVE
> > specific qirqs.  Later we might want to remove the (input) qirqs
> > entirely from the XICS and XIVE subsystems, instead having just
> > explicit trigger functions.  Then spapr will always supply the qirqs
> > which call into one or the other.
> > 
> >> +}
> >> +
> >> +static void spapr_irq_print_info_dual(sPAPRMachineState *spapr, Monitor *mon)
> >> +{
> >> +    spapr_irq_current(spapr)->print_info(spapr, mon);
> >> +}
> >> +
> >> +static void spapr_irq_dt_populate_dual(sPAPRMachineState *spapr,
> >> +                                       uint32_t nr_servers, void *fdt,
> >> +                                       uint32_t phandle)
> >> +{
> >> +    spapr_irq_current(spapr)->dt_populate(spapr, nr_servers, fdt, phandle);
> >> +}
> >> +
> >> +static Object *spapr_irq_cpu_intc_create_dual(sPAPRMachineState *spapr,
> >> +                                              Object *cpu, Error **errp)
> >> +{
> >> +    Error *local_err = NULL;
> >> +
> >> +    spapr_irq_xive.cpu_intc_create(spapr, cpu, &local_err);
> >> +    if (local_err) {
> >> +        error_propagate(errp, local_err);
> >> +        return NULL;
> >> +    }
> >> +
> >> +    /* Default to XICS interrupt mode */
> >> +    return spapr_irq_xics.cpu_intc_create(spapr, cpu, errp);
> >> +}
> >> +
> >> +static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
> >> +{
> >> +    /*
> >> +     * Force a reset of the XIVE backend after migration. The machine
> >> +     * defaults to XICS at startup.
> >> +     */
> >> +    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> >> +        spapr_irq_xive.reset(spapr, &error_fatal);
> >> +    }
> >> +
> >> +    return spapr_irq_current(spapr)->post_load(spapr, version_id);
> >> +}
> >> +
> >> +static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
> >> +{
> >> +    /*
> >> +     * Reset the interrupt mode selected by CAS.
> >> +     */
> >> +    spapr_irq_current(spapr)->reset(spapr, errp);
> >> +}
> >> +
> >> +/*
> >> + * Define values in sync with the XIVE and XICS backend
> >> + */
> >> +#define SPAPR_IRQ_DUAL_NR_IRQS     0x2000
> >> +#define SPAPR_IRQ_DUAL_NR_MSIS     (SPAPR_IRQ_DUAL_NR_IRQS - SPAPR_IRQ_MSI)
> >> +
> >> +sPAPRIrq spapr_irq_dual = {
> >> +    .nr_irqs     = SPAPR_IRQ_DUAL_NR_IRQS,
> >> +    .nr_msis     = SPAPR_IRQ_DUAL_NR_MSIS,
> >> +    .ov5         = SPAPR_OV5_XIVE_BOTH,
> >> +
> >> +    .init        = spapr_irq_init_dual,
> >> +    .claim       = spapr_irq_claim_dual,
> >> +    .free        = spapr_irq_free_dual,
> >> +    .qirq        = spapr_qirq_dual,
> >> +    .print_info  = spapr_irq_print_info_dual,
> >> +    .dt_populate = spapr_irq_dt_populate_dual,
> >> +    .cpu_intc_create = spapr_irq_cpu_intc_create_dual,
> >> +    .post_load   = spapr_irq_post_load_dual,
> >> +    .reset       = spapr_irq_reset_dual,
> >> +};
> >> +
> >>  /*
> >>   * sPAPR IRQ frontend routines for devices
> >>   */
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' machine type
  2018-12-12  0:34         ` David Gibson
@ 2018-12-12  7:26           ` Cédric Le Goater
  0 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-12  7:26 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[ ... ]

>>> So, instead I think we want a machine option which can be set to
>>> xics/xive/dual, with xics being the default for earlier machine types
>>> and dual the default for 4.0 onwards.
>>
>> I will revive an old patch doing just that. 
>>
>> The question now is how to link the sPAPRMachineState instance to 
>> the selected sPAPR IRQ backend. 
>>
>> I don't think we can move 'smc->irq' to sPAPRMachineState.
> 
> I think you could..
> 
>> So we will
>> need an helper returning the appropriate backend depending on the machine 
>> option and 'smc->irq' should disappear.
> 
> ..but this approach might be easier.

I proposed the first approach in v8. We can add the missing wrappers 
in a second round and move then under spapr_irq.h. These are :
 
ops :

   spapr->irq->dt_populate   spapr->irq->print_info
   spapr->irq->cpu_intc_create (this name is too long)

constants :

   spapr->irq->ov5
   spapr->irq->nr_msis

C.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode
  2018-12-12  0:19           ` David Gibson
@ 2018-12-12  7:37             ` Cédric Le Goater
  0 siblings, 0 replies; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-12  7:37 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/12/18 1:19 AM, David Gibson wrote:
> On Tue, Dec 11, 2018 at 10:06:46AM +0100, Cédric Le Goater wrote:
>> On 12/11/18 1:38 AM, David Gibson wrote:
>>> On Mon, Dec 10, 2018 at 08:53:17AM +0100, Cédric Le Goater wrote:
>>>> On 12/10/18 7:39 AM, David Gibson wrote:
>>>>> On Sun, Dec 09, 2018 at 08:46:00PM +0100, Cédric Le Goater wrote:
>>>>>> The XIVE interface for the guest is described in the device tree under
>>>>>> the "interrupt-controller" node. A couple of new properties are
>>>>>> specific to XIVE :
>>>>>>
>>>>>>  - "reg"
>>>>>>
>>>>>>    contains the base address and size of the thread interrupt
>>>>>>    managnement areas (TIMA), for the User level and for the Guest OS
>>>>>>    level. Only the Guest OS level is taken into account today.
>>>>>>
>>>>>>  - "ibm,xive-eq-sizes"
>>>>>>
>>>>>>    the size of the event queues. One cell per size supported, contains
>>>>>>    log2 of size, in ascending order.
>>>>>>
>>>>>>  - "ibm,xive-lisn-ranges"
>>>>>>
>>>>>>    the IRQ interrupt number ranges assigned to the guest for the IPIs.
>>>>>>
>>>>>> and also under the root node :
>>>>>>
>>>>>>  - "ibm,plat-res-int-priorities"
>>>>>>
>>>>>>    contains a list of priorities that the hypervisor has reserved for
>>>>>>    its own use. OPAL uses the priority 7 queue to automatically
>>>>>>    escalate interrupts for all other queues (DD2.X POWER9). So only
>>>>>>    priorities [0..6] are allowed for the guest.
>>>>>>
>>>>>> Extend the sPAPR IRQ backend with a new handler to populate the DT
>>>>>> with the appropriate "interrupt-controller" node.
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>>>> ---
>>>>>>  include/hw/ppc/spapr_irq.h  |  2 ++
>>>>>>  include/hw/ppc/spapr_xive.h |  2 ++
>>>>>>  include/hw/ppc/xics.h       |  4 +--
>>>>>>  hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
>>>>>>  hw/intc/xics_spapr.c        |  3 +-
>>>>>>  hw/ppc/spapr.c              |  3 +-
>>>>>>  hw/ppc/spapr_irq.c          |  3 ++
>>>>>>  7 files changed, 77 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>>>>>> index 23cdb51b879e..e51e9f052f63 100644
>>>>>> --- a/include/hw/ppc/spapr_irq.h
>>>>>> +++ b/include/hw/ppc/spapr_irq.h
>>>>>> @@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
>>>>>>      void (*free)(sPAPRMachineState *spapr, int irq, int num);
>>>>>>      qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
>>>>>>      void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
>>>>>> +    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
>>>>>> +                        void *fdt, uint32_t phandle);
>>>>>>  } sPAPRIrq;
>>>>>>  
>>>>>>  extern sPAPRIrq spapr_irq_xics;
>>>>>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>>>>>> index 9506a8f4d10a..728a5e8dc163 100644
>>>>>> --- a/include/hw/ppc/spapr_xive.h
>>>>>> +++ b/include/hw/ppc/spapr_xive.h
>>>>>> @@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
>>>>>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>>>>>  
>>>>>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>>>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>>>> +                   uint32_t phandle);
>>>>>>  
>>>>>>  #endif /* PPC_SPAPR_XIVE_H */
>>>>>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>>>>>> index 9958443d1984..14afda198cdb 100644
>>>>>> --- a/include/hw/ppc/xics.h
>>>>>> +++ b/include/hw/ppc/xics.h
>>>>>> @@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
>>>>>>      ICPState *(*icp_get)(XICSFabric *xi, int server);
>>>>>>  } XICSFabricClass;
>>>>>>  
>>>>>> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>>>>>> -
>>>>>>  ICPState *xics_icp_get(XICSFabric *xi, int server);
>>>>>>  
>>>>>>  /* Internal XICS interfaces */
>>>>>> @@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
>>>>>>  
>>>>>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>>>>>  
>>>>>> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>>>> +                   uint32_t phandle);
>>>>>>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>>>>>>  void xics_spapr_init(sPAPRMachineState *spapr);
>>>>>>  
>>>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>>>> index 982ac6e17051..a6d854b07690 100644
>>>>>> --- a/hw/intc/spapr_xive.c
>>>>>> +++ b/hw/intc/spapr_xive.c
>>>>>> @@ -14,6 +14,7 @@
>>>>>>  #include "target/ppc/cpu.h"
>>>>>>  #include "sysemu/cpus.h"
>>>>>>  #include "monitor/monitor.h"
>>>>>> +#include "hw/ppc/fdt.h"
>>>>>>  #include "hw/ppc/spapr.h"
>>>>>>  #include "hw/ppc/spapr_xive.h"
>>>>>>  #include "hw/ppc/xive.h"
>>>>>> @@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>>>>>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>>>>>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>>>>>>  }
>>>>>> +
>>>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>>>> +                   uint32_t phandle)
>>>>>> +{
>>>>>> +    sPAPRXive *xive = spapr->xive;
>>>>>> +    int node;
>>>>>> +    uint64_t timas[2 * 2];
>>>>>> +    /* Interrupt number ranges for the IPIs */
>>>>>> +    uint32_t lisn_ranges[] = {
>>>>>> +        cpu_to_be32(0),
>>>>>> +        cpu_to_be32(nr_servers),
>>>>>> +    };
>>>>>> +    uint32_t eq_sizes[] = {
>>>>>> +        cpu_to_be32(12), /* 4K */
>>>>>> +        cpu_to_be32(16), /* 64K */
>>>>>> +        cpu_to_be32(21), /* 2M */
>>>>>> +        cpu_to_be32(24), /* 16M */
>>>>>
>>>>> For KVM, are we going to need to clamp this list based on the
>>>>> pagesizes the guest can use?
>>>>
>>>> I would say so. Is there a KVM service for that ?
>>>
>>> I'm not sure what you mean by a KVM service here.
>>
>> I meant a routine giving a list of the possible backing pagesizes.
> 
> I'm still not really sure what you mean by that.
> 
>>>> Today, the OS scans the list and picks the size fitting its current 
>>>> PAGE_SIZE/SHIFT. But I suppose it would be better to advertise 
>>>> only the page sizes that QEMU/KVM supports. Or should we play safe 
>>>> and only export : 4K, 64K ? 
>>>
>>> So, I'm guessing the limitation here is that when we use the KVM XIVE
>>> acceleration, the EQ will need to be host-contiguous?  
>>
>> The EQ should be a single page (from the specs). So we don't have 
>> a contiguity problem.
> 
> A single page according to whom, though?  A RPT guest can use 2MiB
> pages, even if it is backed with smaller pages on the host, but I'm
> guessing it would not work to have its EQ be a 2MiB page, since it
> won't be host-contiguous.

ok. yes. the HW won't be able to handle that.

The specs are not very clear on that topic. The Qsize field is 4bits, 
page size being (2^(n+12)), so that's a lot of different page sizes but 
the implementation only refers to 4K and 64K. 
 
C.

>>> So, we can't have EQs larger than the backing pagesize for guest 
>>> memory.
>>>
>>> We try to avoid having guest visible differences based on whether
>>> we're using KVM or not, so we don't want to change this list depending
>>> on whether KVM is active etc. - or on whether we're backing the guest
>>> with hugepages.
>>>
>>> We could reuse the cap-hpt-max-page-size machine parameter which also
>>> relates to backing page size, but that might be confusing since it's
>>> explicitly about HPT only whereas this limitation would apply to RPT
>>> guests too, IIUC.
>>>
>>> I think our best bet is probably to limit to 64kiB queues
>>> unconditionally.  
>>
>> OK. That's the default. We can refine this list of page sizes later on
>> when we introduce KVM support if needed.
>>
>>> AIUI, that still gives us 8192 slots in the queue,
>>
>> The entries are 32bits. So that's 16384 entries. 
> 
> Even better.
> 
>>> which is enough to store one of every possible irq, which should be
>>> enough.
>>
>> yes. 
>>
>> I don't think Linux measures how much entries are dequeued at once.
>> It would give us a feel of how much pressure these EQs can sustain.
>>
>> C.
>>  
>>> There's still an issue if we have a 4kiB pagesize host kernel, but we
>>> could just disable kernel_irqchip in that situation, since we don't
>>> expect people to be using that much in practice.
>>>
>>>>>> +    };
>>>>>> +    /* The following array is in sync with the reserved priorities
>>>>>> +     * defined by the 'spapr_xive_priority_is_reserved' routine.
>>>>>> +     */
>>>>>> +    uint32_t plat_res_int_priorities[] = {
>>>>>> +        cpu_to_be32(7),    /* start */
>>>>>> +        cpu_to_be32(0xf8), /* count */
>>>>>> +    };
>>>>>> +    gchar *nodename;
>>>>>> +
>>>>>> +    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
>>>>>> +    timas[0] = cpu_to_be64(xive->tm_base +
>>>>>> +                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
>>>>>> +    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
>>>>>> +    timas[2] = cpu_to_be64(xive->tm_base +
>>>>>> +                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
>>>>>> +    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
>>>>>
>>>>> So this gives the MMIO address of the TIMA.  
>>>>
>>>> Yes. It is considered to be a fixed "address" 
>>>>
>>>>> Where does the guest get the MMIO address for the ESB pages from?
>>>>
>>>> with the hcall H_INT_GET_SOURCE_INFO.
>>>
>>> Ah, right.
>>>
>>
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
  2018-12-12  0:54       ` David Gibson
@ 2018-12-12  9:13         ` Cédric Le Goater
  2018-12-15  8:09           ` David Gibson
  0 siblings, 1 reply; 61+ messages in thread
From: Cédric Le Goater @ 2018-12-12  9:13 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[ ... ] 


>>>> +
>>>> +static qemu_irq spapr_qirq_dual(sPAPRMachineState *spapr, int irq)
>>>> +{
>>>> +    return spapr_irq_current(spapr)->qirq(spapr, irq);
>>>
>>> Urgh... I don't think this is going to work.  IIRC the various devices
>>> (PHB, VIO, etc.)  are wired up to their qirqs at realize() time, so if
>>> you reboot from a XIVE guest to XICS guest (or maybe the other way
>>> around) the peripherals won't be able to signal irqs in the new
>>> scheme.
>>
>> It does. The IRQ numbers are claimed in both backends.
> 
> Yes, I realize that, but the two backends still have their own set of
> qirqs, which have their own set_irq routines associated with them.
> 
>> This is the problem since the very beginning. For reset and migration
>> to work, we need to keep in sync the IRQ number space of the machine 
>> and the different interrupt controllers.
> 
> Sure, we have the numbers in sync, but that won't help if when the
> peripherals do a qemu_irq_pulse() it goes to the wrong backend's
> trigger routine.

Ah. You mean that the devices could bypass the sPAPR IRQ layer and 
trigger the IRQ directly from the IRQ controller model ? it's not 
the case today. 

I agree that having a common set of qirqs and a qemu_irq handler 
doing the dispatch on the selected interrupt contoller model is 
good idea. I didn't go in that direction because KVM brings additional 
head aches.

Where would put the qirqs ? at the machine level I suppose.

Thanks,

C.    
 
> 
>>
>> C. 
>>
>>
>>> I think instead we need a common set of qirqs, whose set_irq routine
>>> looks at whether to signal XICS or XIVE.  FOr now I think the easiest
>>> approach is to layer those on top of the existing XICS or XIVE
>>> specific qirqs.  Later we might want to remove the (input) qirqs
>>> entirely from the XICS and XIVE subsystems, instead having just
>>> explicit trigger functions.  Then spapr will always supply the qirqs
>>> which call into one or the other.
>>>
>>>> +}
>>>> +
>>>> +static void spapr_irq_print_info_dual(sPAPRMachineState *spapr, Monitor *mon)
>>>> +{
>>>> +    spapr_irq_current(spapr)->print_info(spapr, mon);
>>>> +}
>>>> +
>>>> +static void spapr_irq_dt_populate_dual(sPAPRMachineState *spapr,
>>>> +                                       uint32_t nr_servers, void *fdt,
>>>> +                                       uint32_t phandle)
>>>> +{
>>>> +    spapr_irq_current(spapr)->dt_populate(spapr, nr_servers, fdt, phandle);
>>>> +}
>>>> +
>>>> +static Object *spapr_irq_cpu_intc_create_dual(sPAPRMachineState *spapr,
>>>> +                                              Object *cpu, Error **errp)
>>>> +{
>>>> +    Error *local_err = NULL;
>>>> +
>>>> +    spapr_irq_xive.cpu_intc_create(spapr, cpu, &local_err);
>>>> +    if (local_err) {
>>>> +        error_propagate(errp, local_err);
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    /* Default to XICS interrupt mode */
>>>> +    return spapr_irq_xics.cpu_intc_create(spapr, cpu, errp);
>>>> +}
>>>> +
>>>> +static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
>>>> +{
>>>> +    /*
>>>> +     * Force a reset of the XIVE backend after migration. The machine
>>>> +     * defaults to XICS at startup.
>>>> +     */
>>>> +    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
>>>> +        spapr_irq_xive.reset(spapr, &error_fatal);
>>>> +    }
>>>> +
>>>> +    return spapr_irq_current(spapr)->post_load(spapr, version_id);
>>>> +}
>>>> +
>>>> +static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
>>>> +{
>>>> +    /*
>>>> +     * Reset the interrupt mode selected by CAS.
>>>> +     */
>>>> +    spapr_irq_current(spapr)->reset(spapr, errp);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Define values in sync with the XIVE and XICS backend
>>>> + */
>>>> +#define SPAPR_IRQ_DUAL_NR_IRQS     0x2000
>>>> +#define SPAPR_IRQ_DUAL_NR_MSIS     (SPAPR_IRQ_DUAL_NR_IRQS - SPAPR_IRQ_MSI)
>>>> +
>>>> +sPAPRIrq spapr_irq_dual = {
>>>> +    .nr_irqs     = SPAPR_IRQ_DUAL_NR_IRQS,
>>>> +    .nr_msis     = SPAPR_IRQ_DUAL_NR_MSIS,
>>>> +    .ov5         = SPAPR_OV5_XIVE_BOTH,
>>>> +
>>>> +    .init        = spapr_irq_init_dual,
>>>> +    .claim       = spapr_irq_claim_dual,
>>>> +    .free        = spapr_irq_free_dual,
>>>> +    .qirq        = spapr_qirq_dual,
>>>> +    .print_info  = spapr_irq_print_info_dual,
>>>> +    .dt_populate = spapr_irq_dt_populate_dual,
>>>> +    .cpu_intc_create = spapr_irq_cpu_intc_create_dual,
>>>> +    .post_load   = spapr_irq_post_load_dual,
>>>> +    .reset       = spapr_irq_reset_dual,
>>>> +};
>>>> +
>>>>  /*
>>>>   * sPAPR IRQ frontend routines for devices
>>>>   */
>>>
>>
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' machine type
  2018-12-11 16:44         ` Cédric Le Goater
@ 2018-12-15  8:03           ` David Gibson
  0 siblings, 0 replies; 61+ messages in thread
From: David Gibson @ 2018-12-15  8:03 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 10283 bytes --]

On Tue, Dec 11, 2018 at 05:44:26PM +0100, Cédric Le Goater wrote:
> On 12/11/18 11:42 AM, Cédric Le Goater wrote:
> > On 12/11/18 3:06 AM, David Gibson wrote:
> >> On Mon, Dec 10, 2018 at 11:17:33PM +0100, Cédric Le Goater wrote:
> >>> On 12/9/18 8:46 PM, Cédric Le Goater wrote:
> >>>> This pseries machine makes use of a new sPAPR IRQ backend supporting
> >>>> the XIVE interrupt mode.
> >>>>
> >>>> The guest OS is required to have support for the XIVE exploitation
> >>>> mode of the POWER9 interrupt controller.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>> ---
> >>>>  hw/ppc/spapr.c | 15 +++++++++++++++
> >>>>  1 file changed, 15 insertions(+)
> >>>>
> >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>> index 4012ebd794a4..3cc134a0b673 100644
> >>>> --- a/hw/ppc/spapr.c
> >>>> +++ b/hw/ppc/spapr.c
> >>>> @@ -3985,6 +3985,21 @@ static void spapr_machine_4_0_class_options(MachineClass *mc)
> >>>>  
> >>>>  DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
> >>>>  
> >>>> +static void spapr_machine_4_0_xive_instance_options(MachineState *machine)
> >>>> +{
> >>>> +    spapr_machine_4_0_instance_options(machine);
> >>>> +}
> >>>> +
> >>>> +static void spapr_machine_4_0_xive_class_options(MachineClass *mc)
> >>>> +{
> >>>> +    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> >>>> +
> >>>> +    spapr_machine_4_0_class_options(mc);> +    smc->irq = &spapr_irq_xive;
> >>>
> >>> I have been adding checks on the CPU model to export the XIVE capability 
> >>> only on POWER9 processors but it breaks some of the tests.
> >>>
> >>> I was wondering if we could add a default POWER9 CPU to the -xive machine : 
> >>>
> >>>   + mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
> >>>
> >>> and if we could change tests/cpu-plug-test.c with :
> >>>
> >>>   @@ -198,8 +198,13 @@ static void add_pseries_test_case(const
> >>>        }
> >>>        data = g_new(PlugTestData, 1);
> >>>        data->machine = g_strdup(mname);
> >>>   -    data->cpu_model = "power8_v2.0";
> >>>   -    data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
> >>>   +    if (g_str_has_suffix(mname, "xive")) {
> >>>   +        data->cpu_model = "power9_v2.0";
> >>>   +        data->device_model = g_strdup("power9_v2.0-spapr-cpu-core");
> >>>   +    } else {
> >>>   +        data->cpu_model = "power8_v2.0";
> >>>   +        data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
> >>>   +    }
> >>>        data->sockets = 2;
> >>>        data->cores = 3;
> >>>        data->threads = 1;
> >>>
> >>> or if there is a better way ?
> >>
> >> So, I'd actually prefer a machine option, rather than wholly separate
> >> machine types to select xics/xive/dual.  Machine types was fine while
> >> prototyping this, but I don't think we want to actually merge new
> >> machine types for it.
> > 
> > I agree. 
> > 
> >> So, instead I think we want a machine option which can be set to
> >> xics/xive/dual, with xics being the default for earlier machine types
> >> and dual the default for 4.0 onwards.
> > 
> > I will revive an old patch doing just that. 
> > 
> > The question now is how to link the sPAPRMachineState instance to 
> > the selected sPAPR IRQ backend. 
> 
> Would something like below be acceptable ? If so I will change the 
> remaining patches to use 'spapr->irq' and not 'smc->irq' anymore.

Yeah, I think that looks ok.

>  
> Thanks,
> 
> C.
> 
> Index: qemu-xive.git/include/hw/ppc/spapr.h
> ===================================================================
> --- qemu-xive.git.orig/include/hw/ppc/spapr.h
> +++ qemu-xive.git/include/hw/ppc/spapr.h
> @@ -177,6 +177,7 @@ struct sPAPRMachineState {
>      int32_t irq_map_nr;
>      unsigned long *irq_map;
>      sPAPRXive  *xive;
> +    sPAPRIrq *irq;
>  
>      bool cmd_line_caps[SPAPR_CAP_NUM];
>      sPAPRCapabilities def, eff, mig;
> Index: qemu-xive.git/hw/ppc/spapr.c
> ===================================================================
> --- qemu-xive.git.orig/hw/ppc/spapr.c
> +++ qemu-xive.git/hw/ppc/spapr.c
> @@ -1302,7 +1301,7 @@ static void *spapr_build_fdt(sPAPRMachin
>      }
>  
>      QLIST_FOREACH(phb, &spapr->phbs, list) {
> -        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, smc->irq->nr_msis);
> +        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, spapr->irq->nr_msis);
>          if (ret < 0) {
>              error_report("couldn't setup PCI devices in fdt");
>              exit(1);
> @@ -3056,9 +3055,38 @@ static void spapr_set_vsmt(Object *obj,
>      visit_type_uint32(v, name, (uint32_t *)opaque, errp);
>  }
>  
> +static char *spapr_get_irq(Object *obj, Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    if (spapr->irq == &spapr_irq_xics_legacy) {
> +        return g_strdup("legacy");
> +    } else if (spapr->irq == &spapr_irq_xics) {
> +        return g_strdup("xics");
> +    } else if (spapr->irq == &spapr_irq_xive) {
> +        return g_strdup("xive");
> +    }
> +    g_assert_not_reached();
> +}
> +
> +static void spapr_set_irq(Object *obj, const char *value, Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    /* We don't want to set the legacy IRQ backend */
> +    if (strcmp(value, "xics") == 0) {
> +        spapr->irq = &spapr_irq_xics;
> +    } else if (strcmp(value, "xive") == 0) {
> +        spapr->irq = &spapr_irq_xive;
> +    } else {
> +        error_setg(errp, "Bad value for \"irq\" property");
> +    }
> +}
> +
>  static void spapr_instance_init(Object *obj)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>  
>      spapr->htab_fd = -1;
>      spapr->use_hotplug_event_source = true;
> @@ -3092,6 +3120,13 @@ static void spapr_instance_init(Object *
>                                      " the host's SMT mode", &error_abort);
>      object_property_add_bool(obj, "vfio-no-msix-emulation",
>                               spapr_get_msix_emulation, NULL, NULL);
> +
> +    /* Get the default from the machine class */
> +    spapr->irq = smc->irq;
> +    object_property_add_str(obj, "irq", spapr_get_irq, spapr_set_irq, NULL);
> +    object_property_set_description(obj, "irq",
> +                 "Specifies the interrupt controller mode (xics, xive)",
> +                 NULL);
>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> @@ -3814,9 +3849,8 @@ static void spapr_pic_print_info(Interru
>                                   Monitor *mon)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>  
> -    smc->irq->print_info(spapr, mon);
> +    spapr->irq->print_info(spapr, mon);
>  }
>  
>  int spapr_get_vcpu_id(PowerPCCPU *cpu)
> Index: qemu-xive.git/hw/ppc/spapr_irq.c
> ===================================================================
> --- qemu-xive.git.orig/hw/ppc/spapr_irq.c
> +++ qemu-xive.git/hw/ppc/spapr_irq.c
> @@ -94,8 +94,7 @@ error:
>  static void spapr_irq_init_xics(sPAPRMachineState *spapr, Error **errp)
>  {
>      MachineState *machine = MACHINE(spapr);
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> -    int nr_irqs = smc->irq->nr_irqs;
> +    int nr_irqs = spapr->irq->nr_irqs;
>      Error *local_err = NULL;
>  
>      if (kvm_enabled()) {
> @@ -234,7 +233,6 @@ sPAPRIrq spapr_irq_xics = {
>  static void spapr_irq_init_xive(sPAPRMachineState *spapr, Error **errp)
>  {
>      MachineState *machine = MACHINE(spapr);
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>      uint32_t nr_servers = spapr_max_server_number(spapr);
>      DeviceState *dev;
>      int i;
> @@ -248,7 +246,7 @@ static void spapr_irq_init_xive(sPAPRMac
>      }
>  
>      dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
> -    qdev_prop_set_uint32(dev, "nr-irqs", smc->irq->nr_irqs);
> +    qdev_prop_set_uint32(dev, "nr-irqs", spapr->irq->nr_irqs);
>      /*
>       * 8 XIVE END structures per CPU. One for each available priority
>       */
> @@ -353,50 +351,38 @@ sPAPRIrq spapr_irq_xive = {
>   */
>  void spapr_irq_init(sPAPRMachineState *spapr, Error **errp)
>  {
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> -
>      /* Initialize the MSI IRQ allocator. */
>      if (!SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
> -        spapr_irq_msi_init(spapr, smc->irq->nr_msis);
> +        spapr_irq_msi_init(spapr, spapr->irq->nr_msis);
>      }
>  
> -    smc->irq->init(spapr, errp);
> +    spapr->irq->init(spapr, errp);
>  }
>  
>  int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp)
>  {
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> -
> -    return smc->irq->claim(spapr, irq, lsi, errp);
> +    return spapr->irq->claim(spapr, irq, lsi, errp);
>  }
>  
>  void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
>  {
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> -
> -    smc->irq->free(spapr, irq, num);
> +    spapr->irq->free(spapr, irq, num);
>  }
>  
>  qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq)
>  {
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> -
> -    return smc->irq->qirq(spapr, irq);
> +    return spapr->irq->qirq(spapr, irq);
>  }
>  
>  int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id)
>  {
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> -
> -    return smc->irq->post_load(spapr, version_id);
> +    return spapr->irq->post_load(spapr, version_id);
>  }
>  
>  void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp)
>  {
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> -
> -    if (smc->irq->reset) {
> -        smc->irq->reset(spapr, errp);
> +    if (spapr->irq->reset) {
> +        spapr->irq->reset(spapr, errp);
>      }
>  }
>  
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
  2018-12-12  9:13         ` Cédric Le Goater
@ 2018-12-15  8:09           ` David Gibson
  0 siblings, 0 replies; 61+ messages in thread
From: David Gibson @ 2018-12-15  8:09 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 2206 bytes --]

On Wed, Dec 12, 2018 at 10:13:29AM +0100, Cédric Le Goater wrote:
> [ ... ] 
> 
> 
> >>>> +
> >>>> +static qemu_irq spapr_qirq_dual(sPAPRMachineState *spapr, int irq)
> >>>> +{
> >>>> +    return spapr_irq_current(spapr)->qirq(spapr, irq);
> >>>
> >>> Urgh... I don't think this is going to work.  IIRC the various devices
> >>> (PHB, VIO, etc.)  are wired up to their qirqs at realize() time, so if
> >>> you reboot from a XIVE guest to XICS guest (or maybe the other way
> >>> around) the peripherals won't be able to signal irqs in the new
> >>> scheme.
> >>
> >> It does. The IRQ numbers are claimed in both backends.
> > 
> > Yes, I realize that, but the two backends still have their own set of
> > qirqs, which have their own set_irq routines associated with them.
> > 
> >> This is the problem since the very beginning. For reset and migration
> >> to work, we need to keep in sync the IRQ number space of the machine 
> >> and the different interrupt controllers.
> > 
> > Sure, we have the numbers in sync, but that won't help if when the
> > peripherals do a qemu_irq_pulse() it goes to the wrong backend's
> > trigger routine.
> 
> Ah. You mean that the devices could bypass the sPAPR IRQ layer and 
> trigger the IRQ directly from the IRQ controller model ? it's not 
> the case today.

Ah, right, sorry.  We're saved by the fact that basically the only
root level devices for spapr are the PHB and VIO devices, and it so
happens that those look up the relevant qirqs from spapr at interrupt
time, rather than in advance.

I think it's fragile to rely on that - qirqs are supposed to be stable
objects that devices can keep a reference to and use later.

> I agree that having a common set of qirqs and a qemu_irq handler 
> doing the dispatch on the selected interrupt contoller model is 
> good idea. I didn't go in that direction because KVM brings additional 
> head aches.
> 
> Where would put the qirqs ? at the machine level I suppose.

Yes.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2018-12-15  8:16 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-09 19:45 [Qemu-devel] [PATCH v7 00/19] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 01/19] ppc/xive: add support for the END Event State Buffers Cédric Le Goater
2018-12-10  4:16   ` David Gibson
2018-12-10  7:11     ` Cédric Le Goater
2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 02/19] ppc/xive: introduce the XIVE interrupt thread context Cédric Le Goater
2018-12-10  4:19   ` David Gibson
2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 03/19] ppc/xive: introduce a simplified XIVE presenter Cédric Le Goater
2018-12-10  4:27   ` David Gibson
2018-12-10  7:15     ` Cédric Le Goater
2018-12-11  1:37       ` David Gibson
2018-12-11 10:43         ` Cédric Le Goater
2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 04/19] ppc/xive: notify the CPU when the interrupt priority is more privileged Cédric Le Goater
2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 05/19] spapr/xive: introduce a XIVE interrupt controller Cédric Le Goater
2018-12-10  4:36   ` David Gibson
2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 06/19] spapr/xive: use the VCPU id as a NVT identifier Cédric Le Goater
2018-12-10  4:42   ` David Gibson
2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 07/19] spapr: introduce a new machine IRQ backend for XIVE Cédric Le Goater
2018-12-10  4:45   ` David Gibson
2018-12-09 19:45 ` [Qemu-devel] [PATCH v7 08/19] spapr: add hcalls support for the XIVE exploitation interrupt mode Cédric Le Goater
2018-12-10  6:34   ` David Gibson
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode Cédric Le Goater
2018-12-10  6:39   ` David Gibson
2018-12-10  7:53     ` Cédric Le Goater
2018-12-11  0:38       ` David Gibson
2018-12-11  9:06         ` Cédric Le Goater
2018-12-12  0:19           ` David Gibson
2018-12-12  7:37             ` Cédric Le Goater
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 10/19] spapr: allocate the interrupt thread context under the CPU core Cédric Le Goater
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 11/19] spapr: extend the sPAPR IRQ backend for XICS migration Cédric Le Goater
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 12/19] spapr: add a 'reset' method to the sPAPR IRQ backend Cédric Le Goater
2018-12-10  6:42   ` David Gibson
2018-12-10  7:30     ` Cédric Le Goater
2018-12-11 10:55     ` Cédric Le Goater
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 13/19] spapr: add an extra OV5 field " Cédric Le Goater
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 14/19] spapr: set the interrupt presenter at reset Cédric Le Goater
2018-12-11  1:46   ` David Gibson
2018-12-11 10:58     ` Cédric Le Goater
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 15/19] spapr/xive: enable XIVE MMIOs " Cédric Le Goater
2018-12-11  1:47   ` David Gibson
2018-12-11 10:14     ` Cédric Le Goater
2018-12-12  0:32       ` David Gibson
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 16/19] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS Cédric Le Goater
2018-12-11  2:03   ` David Gibson
2018-12-11 10:19     ` Cédric Le Goater
2018-12-12  0:54       ` David Gibson
2018-12-12  9:13         ` Cédric Le Goater
2018-12-15  8:09           ` David Gibson
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 17/19] spapr: Add a pseries-4.0 machine type Cédric Le Goater
2018-12-09 22:05   ` Benjamin Herrenschmidt
2018-12-10  3:41     ` David Gibson
2018-12-10  7:09       ` Cédric Le Goater
2018-12-10  6:45   ` David Gibson
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 18/19] spapr: add a 'pseries-4.0-xive' " Cédric Le Goater
2018-12-10 22:17   ` Cédric Le Goater
2018-12-11  2:06     ` David Gibson
2018-12-11 10:42       ` Cédric Le Goater
2018-12-11 16:44         ` Cédric Le Goater
2018-12-15  8:03           ` David Gibson
2018-12-12  0:34         ` David Gibson
2018-12-12  7:26           ` Cédric Le Goater
2018-12-09 19:46 ` [Qemu-devel] [PATCH v7 19/19] spapr: add a 'pseries-4.0-dual' " Cédric Le Goater

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.