All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9)
@ 2018-12-05 23:22 Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 01/37] ppc/xive: introduce a XIVE interrupt source model Cédric Le Goater
                   ` (37 more replies)
  0 siblings, 38 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Hello,

Here is the version 6 of the QEMU models adding support for the XIVE
interrupt controller to the sPAPR machine, under TCG and KVM. Support
for the PowerNV POWER9 machine will be proposed in a PowerNV patchset
sometime next year now.

The most important changes for sPAPR are the removal of the SysBusDevice
inheritance, the removal of the KVM classes, support for XIVE structures
in big-endian only format and the introduction of a VM change state handler
for KVM migration.

Thanks,

C.


Changes in v6 :

 Common XIVE models :

 - included documentation in xive.h
 - removed SysBusDevice inheritance from Xive Sources
 - set ASSERTED bit in xive_source_lsi_trigger()
 - renamed XiveFabric in XiveNotifier
 - reworked XIVE tables accessors, introduce a _write method for words
 - introduced the source number encoding on PowerBUS in accessors
 - used fixed big-endian format for XIVE structures
 - reworked the presenter matching routine
 - renamed *cam_line helpers

 sPAPR models :

 - reworked the 'info pic output
 - moved the END reset at the sPAPR level
 - renamed the spapr_xive_irq_enable/disable routine in claim/free
 - removed the reset_tctx hook
 - renamed xics_max_server_number() and fixed spapr_irq_init() prototype 
 - removed the use of the xive_router routines in the sPAPR XIVE hcalls
 - used address_space_map() to validate the EQ
 - introduced a spapr_xive_reset_tctx() to set the OS CAM line at reset
 - introduced OV5 defines for the XIVE mode
 - removed the XIVE classes
 - enable/disable the XIVE MMIOs depending on the mode
 - introduced a spapr_rtas_unregister() helper
 - mixed enhancements

 KVM :

 - removed the KVM XIVE models and reworked KVM support with helpers
 - introduced a VM change state handler to quiesce XIVE before
   transferring the EQ pages
 - improved KVM support for the dual machine (removed extra cleanups)

 PowerNV:

 - postponed for a PowerNV patchset only

Changes in v5 :

 Common XIVE models :

 - renamed the XIVE structures to fit the changes of the XIVE
   architecture documents: IVE, EQD, VPD -> EAS, END, NVT   
 - reworked the monitor ouput to print the EQ contents

 sPAPR models :

 - introduced a XIVE Router 'reset' method for the Xive Thread Context
   to set the OS CAM line of the VCPU
 - introduced a spapr_irq_init() routine to the sPAPR IRQ backend
   and reworked the XIVE-only machine to fit mainline QEMU
 - introduced a reset() method to the sPAPR IRQ backend to handle
   changes in the interrupt mode after machine reset
 - introduced a 'dual' machine supporting both interrupt mode

 KVM :

 - introduced some more sPAPR NVT and END indexing helpers for KVM support
 - fixed the virtual LSIs in KVM by using the H_INT_ESB source flag
 - improved the KVM support with better common classes and cleaner
   QEMU<->KVM interfaces
 - improved KVM migration with a better control on the capture sequence.
   Still some issues with 'ceded' VCPUs 
 - introduced KVM support for the 'dual' machine

 PowerNV:

 - introduced address spaces for the IPI and END set translation
   tables


Changes in v4 :

   See https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01672.html


= XIVE =================================================================


The POWER9 processor comes with a new interrupt controller, called
XIVE as "eXternal Interrupt Virtualization Engine".


* Overall architecture


             XIVE Interrupt Controller
             +------------------------------------+      IPIs
             | +---------+ +---------+ +--------+ |    +-------+
             | |VC       | |CQ       | |PC      |----> | CORES |
             | |     esb | |         | |        |----> |       |
             | |     eas | |  Bridge | |   tctx |----> |       |
             | |SC   end | |         | |    nvt | |    |       |
 +------+    | +---------+ +----+----+ +--------+ |    +-+-+-+-+
 | RAM  |    +------------------|-----------------+      | | |
 |      |                       |                        | | |
 |      |                       |                        | | |
 |      |  +--------------------v------------------------v-v-v--+    other
 |      <--+                     Power Bus                      +--> chips
 |  esb |  +---------+-----------------------+------------------+
 |  eas |            |                       |
 |  end |         +--|------+                |
 |  nvt |       +----+----+ |           +----+----+
 +------+       |SC       | |           |SC       |
                |         | |           |         |
                | PQ-bits | |           | PQ-bits |
                | local   |-+           |  in VC  |
                +---------+             +---------+
                   PCIe                 NX,NPU,CAPI

                  SC: Source Controller (aka. IVSE)
                  VC: Virtualization Controller (aka. IVRE)
                  PC: Presentation Controller (aka. IVPE)
                  CQ: Common Queue (Bridge)

             PQ-bits: 2 bits source state machine (P:pending Q:queued)
                 esb: Event State Buffer (Array of PQ bits in an IVSE)
                 eas: Event Assignment Structure
                 end: Event Notification Descriptor
                 nvt: Notification Virtual Target
                tctx: Thread interrupt Context


The XIVE IC is composed of three sub-engines :

  - Interrupt Virtualization Source Engine (IVSE), or Source
    Controller (SC). These are found in PCI PHBs, in the PSI host
    bridge controller, but also inside the main controller for the
    core IPIs and other sub-chips (NX, CAP, NPU) of the
    chip/processor. They are configured to feed the IVRE with events.

  - Interrupt Virtualization Routing Engine (IVRE) or Virtualization
    Controller (VC). Its job is to match an event source with an Event
    Notification Descriptor (END).

  - Interrupt Virtualization Presentation Engine (IVPE) or Presentation
    Controller (PC). It maintains the interrupt context state of each
    thread and handles the delivery of the external exception to the
    thread.


* XIVE internal tables

Each of the sub-engines uses a set of tables to redirect exceptions
from event sources to CPU threads.

                                          +-------+
  User or OS                              |  EQ   |
      or                          +------>|entries|
  Hypervisor                      |       |  ..   |
    Memory                        |       +-------+
                                  |           ^
                                  |           |
             +-------------------------------------------------+
                                  |           |
  Hypervisor      +------+    +---+--+    +---+--+   +------+
    Memory        | ESB  |    | EAT  |    | ENDT |   | NVTT |
   (skiboot)      +----+-+    +----+-+    +----+-+   +------+
                    ^  |        ^  |        ^  |       ^
                    |  |        |  |        |  |       |
             +-------------------------------------------------+
                    |  |        |  |        |  |       |
                    |  |        |  |        |  |       |
               +----|--|--------|--|--------|--|-+   +-|-----+    +------+
               |    |  |        |  |        |  | |   | | tctx|    |Thread|
   IPI or   ---+    +  v        +  v        +  v |---| +  .. |----->     |
  HW events    |                                 |   |       |    |      |
               |             IVRE                |   | IVPE  |    +------+
               +---------------------------------+   +-------+
            


The IVSE have a 2-bits state machine, P for pending and Q for queued,
for each source that allows events to be triggered. They are stored in
an Event State Buffer (ESB) array and can be controlled by MMIOs.

If the event is let through, the IVRE looks up in the Event Assignment
Structure (EAS) table for an Event Notification Descriptor (END)
configured for the source. Each Event Notification Descriptor defines
a notification path to a CPU and an in-memory Event Queue, in which
will be enqueued an EQ data for the OS to pull.

The IVPE determines if a Notification Virtual Target (NVT) can handle
the event by scanning the thread contexts of the VCPUs dispatched on
the processor HW threads. It maintains the interrupt context state of
each thread in a NVT table.


* Overview of the QEMU models for the XIVE sub-engines

The XiveSource models the IVSE in general, internal and external. It
handles the source ESBs and the MMIO interface to control them.

The XiveNotifier is a small helper interface interconnecting the
XiveSource to the XiveRouter.

The XiveRouter is an abstract model acting as a combined IVRE and
IVPE. It routes event notifications using the IVE and EQD tables to
the IVPE sub-engine which does a CAM scan to find a CPU to deliver the
exception. Storage should be provided by the inheriting classes.

XiveENDSource is a special source object. It exposes the EQ ESB MMIOs of
the Event Queues which are used for coalescing event notifications and
for escalation. Not used on the field, only to sync the EQ cache in
OPAL.

Finally, the XiveTCTX contains the interrupt state context of a thread,
four sets of registers, one for each exception that can be delivered
to a CPU. These contexts are scanned by the IVPE to find a matching VP
when a notification is triggered. It also models the Thread Interrupt
Management Area (TIMA), which exposes the thread context registers to
the CPU for interrupt management.


* XIVE for sPAPR

sPAPRXive models the XIVE interrupt controller of a sPAPR machine. It
inherits from the XiveRouter and provisions storage for the IVE and
END tables. The NVT table does not need a backend in sPAPR. It owns a
XiveSource object for the IPIs and the virtual device interrupts, a
memory region for the TIMA and a XiveENDSource to manage the END ESBs.
(not used by Linux).

These choices were made to have a sPAPR interrupt controller
consistent with the one found on baremetal and to facilitate KVM
support, the main difficulty being the host memory regions exposed to
the guest.

The NVT and tbe END indexing needs some care and a set of helpers are
defined to ease the conversion between the CPU id as seen by the guest
and the XIVE identifiers manipulated by the models. 


* Integration in the sPAPR machine, xive only and dual

A new sPAPR IRQ backend is defined for XIVE. It introduces a couple of
new operations to handle the differences in the creation of the device
tree and in the allocation of the CPU interrupt controller. A new
'xive' only pseries machine is defined using this XIVE backend.

Being able to support both interrupt mode in the same machine requires
some more changes. As the machine chooses the interrupt mode at CAS
time, it is activated after a reconfiguration done in a reset. This is
handled by a new 'dual' sPAPR IRQ backend which is built on top of the
XICS and XIVE backend. A new 'dual' pseries machine is defined using
this backend.


* KVM support

Support for KVM introduces a set of specific XIVE models, very much
like XICS does, which self-connect to their KVM counterparts in the
Linux kernel. Two host memory regions are exposed to the guest and
need special care at initialization :

  - ESB mmios
  - Thread Interrupt Management Area (TIMA)

The models uses KVM accessors to synchronize the QEMU state with
KVM. The states are :

  - the source configuration (EAT)
  - the END configuration (ENDT)
  - the OS EQ state (toggle bit and index)
  - the thread interrupt context registers.

Hybrid guest using KVM and an emulated irqchip (kernel_irqchip=off) is
supported. Migration under KVM is supported.

KVM support for the 'dual' machine required some more changes. Both
interrupt mode need to be initialized at the QEMU level to keep the
IRQ number space in sync and to allow switching from one mode to
another. At the KVM level, the whole initialization of the KVM device,
sources and presenters, needs to be done in the reset handler when the
interrupt mode is chosen. This is a major change in the KVM models.

KVM being initialized at reset, we loose the possiblity to fallback to
the QEMU emulated mode in case of failure and failures become fatal to
the machine.


* PowerNV models

The PnvXIVE model uses the XiveRouter abstract model just like
sPAPRXive. It provides accessors to the EAS, END and NVT tables which
are stored in the QEMU PowerNV machine and not in QEMU anymore. It
owns a set of memory regions for the IC registers, the ESBs, the END
ESBs, the TIMA, the notification MMIO.

Multichip is supported and the available IVSEs are the internal one
for the IPIS, the PSI host bridge controller and PHB4.

The next interesting step would be to add escalation events and model
the VCPU dispatching to support emulated KVM guests.


* GitHub trees
 
QEMU sPAPR:

  https://github.com/legoater/qemu/commits/xive-v6-3.1
  
QEMU PowerNV:

  https://github.com/legoater/qemu/commits/powernv-3.1

Linux/KVM:

  https://github.com/legoater/linux/commits/xive-4.20

OPAL:

  https://github.com/legoater/skiboot/commits/xive

Cédric Le Goater (37):
  ppc/xive: introduce a XIVE interrupt source model
  ppc/xive: add support for the LSI interrupt sources
  ppc/xive: introduce the XiveNotifier interface
  ppc/xive: introduce the XiveRouter model
  ppc/xive: introduce the XIVE Event Notification Descriptors
  ppc/xive: add support for the END Event State buffers
  ppc/xive: introduce the XIVE interrupt thread context
  ppc/xive: introduce a simplified XIVE presenter
  ppc/xive: notify the CPU when the interrupt priority is more
    privileged
  spapr/xive: introduce a XIVE interrupt controller
  spapr/xive: use the VCPU id as a NVT identifier
  spapr: initialize VSMT before initializing the IRQ backend
  spapr: introduce a spapr_irq_init() routine
  spapr: modify the irq backend 'init' method
  spapr: export and rename the xics_max_server_number() routine
  spapr: introdude a new machine IRQ backend for XIVE
  spapr: add hcalls support for the XIVE exploitation interrupt mode
  spapr: add device tree support for the XIVE exploitation mode
  spapr: allocate the interrupt thread context under the CPU core
  spapr: extend the sPAPR IRQ backend for XICS migration
  spapr: add a 'reset' method to the sPAPR IRQ backend
  spapr: add a 'pseries-3.1-xive' machine type
  linux-headers: update to 4.20-rc5
  spapr/xive: add KVM support
  spapr/xive: add state synchronization with KVM
  spapr/xive: introduce a VM state change handler
  spapr/xive: add migration support for KVM
  spapr/xive: fix migration of the XiveTCTX under TCG
  spapr: set the interrupt presenter at reset
  spapr/xive: enable XIVE MMIOs at reset
  spapr: add a 'pseries-3.1-dual' machine type
  ppc/xics: introduce a icp_kvm_connect() routine
  spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers
  sysbus: add a sysbus_mmio_unmap() helper
  spapr: introduce routines to delete the KVM IRQ device
  spapr: check for KVM IRQ device activation
  spapr: add KVM support to the 'dual' machine

 default-configs/ppc64-softmmu.mak |    3 +
 include/hw/ppc/spapr.h            |   28 +-
 include/hw/ppc/spapr_cpu_core.h   |    2 +
 include/hw/ppc/spapr_irq.h        |   15 +-
 include/hw/ppc/spapr_xive.h       |   79 ++
 include/hw/ppc/xics.h             |    5 +-
 include/hw/ppc/xive.h             |  453 ++++++++
 include/hw/ppc/xive_regs.h        |  213 ++++
 include/hw/sysbus.h               |    1 +
 linux-headers/asm-powerpc/kvm.h   |   46 +
 linux-headers/linux/kvm.h         |    6 +
 target/ppc/kvm_ppc.h              |    6 +
 hw/core/sysbus.c                  |   10 +
 hw/intc/spapr_xive.c              | 1519 ++++++++++++++++++++++++++
 hw/intc/spapr_xive_kvm.c          |  860 +++++++++++++++
 hw/intc/xics_kvm.c                |  136 ++-
 hw/intc/xics_spapr.c              |    3 +-
 hw/intc/xive.c                    | 1686 +++++++++++++++++++++++++++++
 hw/ppc/spapr.c                    |   92 +-
 hw/ppc/spapr_cpu_core.c           |   31 +-
 hw/ppc/spapr_hcall.c              |   13 +
 hw/ppc/spapr_irq.c                |  434 +++++++-
 hw/ppc/spapr_rtas.c               |    2 +-
 target/ppc/kvm.c                  |    7 +
 hw/intc/Makefile.objs             |    3 +
 25 files changed, 5588 insertions(+), 65 deletions(-)
 create mode 100644 include/hw/ppc/spapr_xive.h
 create mode 100644 include/hw/ppc/xive.h
 create mode 100644 include/hw/ppc/xive_regs.h
 create mode 100644 hw/intc/spapr_xive.c
 create mode 100644 hw/intc/spapr_xive_kvm.c
 create mode 100644 hw/intc/xive.c

-- 
2.17.2

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 01/37] ppc/xive: introduce a XIVE interrupt source model
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 02/37] ppc/xive: add support for the LSI interrupt sources Cédric Le Goater
                   ` (36 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The first sub-engine of the overall XIVE architecture is the Interrupt
Virtualization Source Engine (IVSE). An IVSE can be integrated into
another logic, like in a PCI PHB or in the main interrupt controller
to manage IPIs.

Each IVSE instance is associated with an Event State Buffer (ESB) that
contains a two bit state entry for each possible event source. When an
event is signaled to the IVSE, by MMIO or some other means, the
associated interrupt state bits are fetched from the ESB and
modified. Depending on the resulting ESB state, the event is forwarded
to the IVRE sub-engine of the controller doing the routing.

Each supported ESB entry is associated with either a single or a
even/odd pair of pages which provides commands to manage the source:
to EOI, to turn off the source for instance.

On a sPAPR machine, the O/S will obtain the page address of the ESB
entry associated with a source and its characteristic using the
H_INT_GET_SOURCE_INFO hcall. On PowerNV, a similar OPAL call is used.

The xive_source_notify() routine is in charge forwarding the source
event notification to the routing engine. It will be filled later on.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 default-configs/ppc64-softmmu.mak |   1 +
 include/hw/ppc/xive.h             | 260 ++++++++++++++++++++
 hw/intc/xive.c                    | 382 ++++++++++++++++++++++++++++++
 hw/intc/Makefile.objs             |   1 +
 4 files changed, 644 insertions(+)
 create mode 100644 include/hw/ppc/xive.h
 create mode 100644 hw/intc/xive.c

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index aec2855750d6..2d1e7c5c4668 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -16,6 +16,7 @@ CONFIG_VIRTIO_VGA=y
 CONFIG_XICS=$(CONFIG_PSERIES)
 CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
+CONFIG_XIVE=$(CONFIG_PSERIES)
 CONFIG_MEM_DEVICE=y
 CONFIG_DIMM=y
 CONFIG_SPAPR_RNG=y
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
new file mode 100644
index 000000000000..7aa2e3801222
--- /dev/null
+++ b/include/hw/ppc/xive.h
@@ -0,0 +1,260 @@
+/*
+ * QEMU PowerPC XIVE interrupt controller model
+ *
+ *
+ * The POWER9 processor comes with a new interrupt controller, called
+ * XIVE as "eXternal Interrupt Virtualization Engine".
+ *
+ * = Overall architecture
+ *
+ *
+ *              XIVE Interrupt Controller
+ *              +------------------------------------+      IPIs
+ *              | +---------+ +---------+ +--------+ |    +-------+
+ *              | |VC       | |CQ       | |PC      |----> | CORES |
+ *              | |     esb | |         | |        |----> |       |
+ *              | |     eas | |  Bridge | |   tctx |----> |       |
+ *              | |SC   end | |         | |    nvt | |    |       |
+ *  +------+    | +---------+ +----+----+ +--------+ |    +-+-+-+-+
+ *  | RAM  |    +------------------|-----------------+      | | |
+ *  |      |                       |                        | | |
+ *  |      |                       |                        | | |
+ *  |      |  +--------------------v------------------------v-v-v--+    other
+ *  |      <--+                     Power Bus                      +--> chips
+ *  |  esb |  +---------+-----------------------+------------------+
+ *  |  eas |            |                       |
+ *  |  end |         +--|------+                |
+ *  |  nvt |       +----+----+ |           +----+----+
+ *  +------+       |SC       | |           |SC       |
+ *                 |         | |           |         |
+ *                 | PQ-bits | |           | PQ-bits |
+ *                 | local   |-+           |  in VC  |
+ *                 +---------+             +---------+
+ *                    PCIe                 NX,NPU,CAPI
+ *
+ *                   SC: Source Controller (aka. IVSE)
+ *                   VC: Virtualization Controller (aka. IVRE)
+ *                   PC: Presentation Controller (aka. IVPE)
+ *                   CQ: Common Queue (Bridge)
+ *
+ *              PQ-bits: 2 bits source state machine (P:pending Q:queued)
+ *                  esb: Event State Buffer (Array of PQ bits in an IVSE)
+ *                  eas: Event Assignment Structure
+ *                  end: Event Notification Descriptor
+ *                  nvt: Notification Virtual Target
+ *                 tctx: Thread interrupt Context
+ *
+ *
+ * The XIVE IC is composed of three sub-engines :
+ *
+ * - Interrupt Virtualization Source Engine (IVSE), or Source
+ *   Controller (SC). These are found in PCI PHBs, in the PSI host
+ *   bridge controller, but also inside the main controller for the
+ *   core IPIs and other sub-chips (NX, CAP, NPU) of the
+ *   chip/processor. They are configured to feed the IVRE with events.
+ *
+ * - Interrupt Virtualization Routing Engine (IVRE) or Virtualization
+ *   Controller (VC). Its job is to match an event source with an
+ *   Event Notification Descriptor (END).
+ *
+ * - Interrupt Virtualization Presentation Engine (IVPE) or
+ *   Presentation Controller (PC). It maintains the interrupt context
+ *   state of each thread and handles the delivery of the external
+ *   exception to the thread.
+ *
+ * In XIVE 1.0, the sub-engines used to be referred as:
+ *
+ *   SC     Source Controller
+ *   VC     Virtualization Controller
+ *   PC     Presentation Controller
+ *   CQ     Common Queue (PowerBUS Bridge)
+ *
+ *
+ * = XIVE internal tables
+ *
+ * Each of the sub-engines uses a set of tables to redirect exceptions
+ * from event sources to CPU threads.
+ *
+ *                                           +-------+
+ *   User or OS                              |  EQ   |
+ *       or                          +------>|entries|
+ *   Hypervisor                      |       |  ..   |
+ *     Memory                        |       +-------+
+ *                                   |           ^
+ *                                   |           |
+ *              +-------------------------------------------------+
+ *                                   |           |
+ *   Hypervisor      +------+    +---+--+    +---+--+   +------+
+ *     Memory        | ESB  |    | EAT  |    | ENDT |   | NVTT |
+ *    (skiboot)      +----+-+    +----+-+    +----+-+   +------+
+ *                     ^  |        ^  |        ^  |       ^
+ *                     |  |        |  |        |  |       |
+ *              +-------------------------------------------------+
+ *                     |  |        |  |        |  |       |
+ *                     |  |        |  |        |  |       |
+ *                +----|--|--------|--|--------|--|-+   +-|-----+    +------+
+ *                |    |  |        |  |        |  | |   | | tctx|    |Thread|
+ *   IPI or   --> |    +  v        +  v        +  v |---| +  .. |----->     |
+ *  HW events --> |                                 |   |       |    |      |
+ *    IVSE        |             IVRE                |   | IVPE  |    +------+
+ *                +---------------------------------+   +-------+
+ *
+ *
+ *
+ * The IVSE have a 2-bits state machine, P for pending and Q for queued,
+ * for each source that allows events to be triggered. They are stored in
+ * an Event State Buffer (ESB) array and can be controlled by MMIOs.
+ *
+ * If the event is let through, the IVRE looks up in the Event Assignment
+ * Structure (EAS) table for an Event Notification Descriptor (END)
+ * configured for the source. Each Event Notification Descriptor defines
+ * a notification path to a CPU and an in-memory Event Queue, in which
+ * will be enqueued an EQ data for the OS to pull.
+ *
+ * The IVPE determines if a Notification Virtual Target (NVT) can
+ * handle the event by scanning the thread contexts of the VCPUs
+ * dispatched on the processor HW threads. It maintains the state of
+ * the thread interrupt context (TCTX) of each thread in a NVT table.
+ *
+ * = Acronyms
+ *
+ *          Description                     In XIVE 1.0, used to be referred as
+ *
+ *   EAS    Event Assignment Structure      IVE   Interrupt Virt. Entry
+ *   EAT    Event Assignment Table          IVT   Interrupt Virt. Table
+ *   ENDT   Event Notif. Descriptor Table   EQDT  Event Queue Desc. Table
+ *   EQ     Event Queue                     same
+ *   ESB    Event State Buffer              SBE   State Bit Entry
+ *   NVT    Notif. Virtual Target           VPD   Virtual Processor Desc.
+ *   NVTT   Notif. Virtual Target Table     VPDT  Virtual Processor Desc. Table
+ *   TCTX   Thread interrupt Context
+ *
+ *
+ * Copyright (c) 2017-2018, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef PPC_XIVE_H
+#define PPC_XIVE_H
+
+#include "hw/qdev-core.h"
+
+/*
+ * XIVE Interrupt Source
+ */
+
+#define TYPE_XIVE_SOURCE "xive-source"
+#define XIVE_SOURCE(obj) OBJECT_CHECK(XiveSource, (obj), TYPE_XIVE_SOURCE)
+
+/*
+ * XIVE Interrupt Source characteristics, which define how the ESB are
+ * controlled.
+ */
+#define XIVE_SRC_H_INT_ESB     0x1 /* ESB managed with hcall H_INT_ESB */
+#define XIVE_SRC_STORE_EOI     0x2 /* Store EOI supported */
+
+typedef struct XiveSource {
+    DeviceState parent;
+
+    /* IRQs */
+    uint32_t        nr_irqs;
+    qemu_irq        *qirqs;
+
+    /* PQ bits */
+    uint8_t         *status;
+
+    /* ESB memory region */
+    uint64_t        esb_flags;
+    uint32_t        esb_shift;
+    MemoryRegion    esb_mmio;
+} XiveSource;
+
+/*
+ * ESB MMIO setting. Can be one page, for both source triggering and
+ * source management, or two different pages. See below for magic
+ * values.
+ */
+#define XIVE_ESB_4K          12 /* PSI HB only */
+#define XIVE_ESB_4K_2PAGE    13
+#define XIVE_ESB_64K         16
+#define XIVE_ESB_64K_2PAGE   17
+
+static inline bool xive_source_esb_has_2page(XiveSource *xsrc)
+{
+    return xsrc->esb_shift == XIVE_ESB_64K_2PAGE ||
+        xsrc->esb_shift == XIVE_ESB_4K_2PAGE;
+}
+
+/* The trigger page is always the first/even page */
+static inline hwaddr xive_source_esb_page(XiveSource *xsrc, uint32_t srcno)
+{
+    assert(srcno < xsrc->nr_irqs);
+    return (1ull << xsrc->esb_shift) * srcno;
+}
+
+/* In a two pages ESB MMIO setting, the odd page is for management */
+static inline hwaddr xive_source_esb_mgmt(XiveSource *xsrc, int srcno)
+{
+    hwaddr addr = xive_source_esb_page(xsrc, srcno);
+
+    if (xive_source_esb_has_2page(xsrc)) {
+        addr += (1 << (xsrc->esb_shift - 1));
+    }
+
+    return addr;
+}
+
+/*
+ * Each interrupt source has a 2-bit state machine which can be
+ * controlled by MMIO. P indicates that an interrupt is pending (has
+ * been sent to a queue and is waiting for an EOI). Q indicates that
+ * the interrupt has been triggered while pending.
+ *
+ * This acts as a coalescing mechanism in order to guarantee that a
+ * given interrupt only occurs at most once in a queue.
+ *
+ * When doing an EOI, the Q bit will indicate if the interrupt
+ * needs to be re-triggered.
+ */
+#define XIVE_ESB_VAL_P        0x2
+#define XIVE_ESB_VAL_Q        0x1
+
+#define XIVE_ESB_RESET        0x0
+#define XIVE_ESB_PENDING      XIVE_ESB_VAL_P
+#define XIVE_ESB_QUEUED       (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
+#define XIVE_ESB_OFF          XIVE_ESB_VAL_Q
+
+/*
+ * "magic" Event State Buffer (ESB) MMIO offsets.
+ *
+ * The following offsets into the ESB MMIO allow to read or manipulate
+ * the PQ bits. They must be used with an 8-byte load instruction.
+ * They all return the previous state of the interrupt (atomically).
+ *
+ * Additionally, some ESB pages support doing an EOI via a store and
+ * some ESBs support doing a trigger via a separate trigger page.
+ */
+#define XIVE_ESB_STORE_EOI      0x400 /* Store */
+#define XIVE_ESB_LOAD_EOI       0x000 /* Load */
+#define XIVE_ESB_GET            0x800 /* Load */
+#define XIVE_ESB_SET_PQ_00      0xc00 /* Load */
+#define XIVE_ESB_SET_PQ_01      0xd00 /* Load */
+#define XIVE_ESB_SET_PQ_10      0xe00 /* Load */
+#define XIVE_ESB_SET_PQ_11      0xf00 /* Load */
+
+uint8_t xive_source_esb_get(XiveSource *xsrc, uint32_t srcno);
+uint8_t xive_source_esb_set(XiveSource *xsrc, uint32_t srcno, uint8_t pq);
+
+void xive_source_pic_print_info(XiveSource *xsrc, uint32_t offset,
+                                Monitor *mon);
+
+static inline qemu_irq xive_source_qirq(XiveSource *xsrc, uint32_t srcno)
+{
+    assert(srcno < xsrc->nr_irqs);
+    return xsrc->qirqs[srcno];
+}
+
+#endif /* PPC_XIVE_H */
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
new file mode 100644
index 000000000000..6389bd832371
--- /dev/null
+++ b/hw/intc/xive.c
@@ -0,0 +1,382 @@
+/*
+ * QEMU PowerPC XIVE interrupt controller model
+ *
+ * Copyright (c) 2017-2018, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "target/ppc/cpu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/dma.h"
+#include "hw/qdev-properties.h"
+#include "monitor/monitor.h"
+#include "hw/ppc/xive.h"
+
+/*
+ * XIVE ESB helpers
+ */
+
+static uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
+{
+    uint8_t old_pq = *pq & 0x3;
+
+    *pq &= ~0x3;
+    *pq |= value & 0x3;
+
+    return old_pq;
+}
+
+static bool xive_esb_trigger(uint8_t *pq)
+{
+    uint8_t old_pq = *pq & 0x3;
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        xive_esb_set(pq, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_PENDING:
+    case XIVE_ESB_QUEUED:
+        xive_esb_set(pq, XIVE_ESB_QUEUED);
+        return false;
+    case XIVE_ESB_OFF:
+        xive_esb_set(pq, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+static bool xive_esb_eoi(uint8_t *pq)
+{
+    uint8_t old_pq = *pq & 0x3;
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+    case XIVE_ESB_PENDING:
+        xive_esb_set(pq, XIVE_ESB_RESET);
+        return false;
+    case XIVE_ESB_QUEUED:
+        xive_esb_set(pq, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_OFF:
+        xive_esb_set(pq, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+/*
+ * XIVE Interrupt Source (or IVSE)
+ */
+
+uint8_t xive_source_esb_get(XiveSource *xsrc, uint32_t srcno)
+{
+    assert(srcno < xsrc->nr_irqs);
+
+    return xsrc->status[srcno] & 0x3;
+}
+
+uint8_t xive_source_esb_set(XiveSource *xsrc, uint32_t srcno, uint8_t pq)
+{
+    assert(srcno < xsrc->nr_irqs);
+
+    return xive_esb_set(&xsrc->status[srcno], pq);
+}
+
+/*
+ * Returns whether the event notification should be forwarded.
+ */
+static bool xive_source_esb_trigger(XiveSource *xsrc, uint32_t srcno)
+{
+    assert(srcno < xsrc->nr_irqs);
+
+    return xive_esb_trigger(&xsrc->status[srcno]);
+}
+
+/*
+ * Returns whether the event notification should be forwarded.
+ */
+static bool xive_source_esb_eoi(XiveSource *xsrc, uint32_t srcno)
+{
+    assert(srcno < xsrc->nr_irqs);
+
+    return xive_esb_eoi(&xsrc->status[srcno]);
+}
+
+/*
+ * Forward the source event notification to the Router
+ */
+static void xive_source_notify(XiveSource *xsrc, int srcno)
+{
+
+}
+
+/*
+ * In a two pages ESB MMIO setting, even page is the trigger page, odd
+ * page is for management
+ */
+static inline bool addr_is_even(hwaddr addr, uint32_t shift)
+{
+    return !((addr >> shift) & 1);
+}
+
+static inline bool xive_source_is_trigger_page(XiveSource *xsrc, hwaddr addr)
+{
+    return xive_source_esb_has_2page(xsrc) &&
+        addr_is_even(addr, xsrc->esb_shift - 1);
+}
+
+/*
+ * ESB MMIO loads
+ *                      Trigger page    Management/EOI page
+ *
+ * ESB MMIO setting     2 pages         1 or 2 pages
+ *
+ * 0x000 .. 0x3FF       -1              EOI and return 0|1
+ * 0x400 .. 0x7FF       -1              EOI and return 0|1
+ * 0x800 .. 0xBFF       -1              return PQ
+ * 0xC00 .. 0xCFF       -1              return PQ and atomically PQ=00
+ * 0xD00 .. 0xDFF       -1              return PQ and atomically PQ=01
+ * 0xE00 .. 0xDFF       -1              return PQ and atomically PQ=10
+ * 0xF00 .. 0xDFF       -1              return PQ and atomically PQ=11
+ */
+static uint64_t xive_source_esb_read(void *opaque, hwaddr addr, unsigned size)
+{
+    XiveSource *xsrc = XIVE_SOURCE(opaque);
+    uint32_t offset = addr & 0xFFF;
+    uint32_t srcno = addr >> xsrc->esb_shift;
+    uint64_t ret = -1;
+
+    /* In a two pages ESB MMIO setting, trigger page should not be read */
+    if (xive_source_is_trigger_page(xsrc, addr)) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "XIVE: invalid load on IRQ %d trigger page at "
+                      "0x%"HWADDR_PRIx"\n", srcno, addr);
+        return -1;
+    }
+
+    switch (offset) {
+    case XIVE_ESB_LOAD_EOI ... XIVE_ESB_LOAD_EOI + 0x7FF:
+        ret = xive_source_esb_eoi(xsrc, srcno);
+
+        /* Forward the source event notification for routing */
+        if (ret) {
+            xive_source_notify(xsrc, srcno);
+        }
+        break;
+
+    case XIVE_ESB_GET ... XIVE_ESB_GET + 0x3FF:
+        ret = xive_source_esb_get(xsrc, srcno);
+        break;
+
+    case XIVE_ESB_SET_PQ_00 ... XIVE_ESB_SET_PQ_00 + 0x0FF:
+    case XIVE_ESB_SET_PQ_01 ... XIVE_ESB_SET_PQ_01 + 0x0FF:
+    case XIVE_ESB_SET_PQ_10 ... XIVE_ESB_SET_PQ_10 + 0x0FF:
+    case XIVE_ESB_SET_PQ_11 ... XIVE_ESB_SET_PQ_11 + 0x0FF:
+        ret = xive_source_esb_set(xsrc, srcno, (offset >> 8) & 0x3);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB load addr %x\n",
+                      offset);
+    }
+
+    return ret;
+}
+
+/*
+ * ESB MMIO stores
+ *                      Trigger page    Management/EOI page
+ *
+ * ESB MMIO setting     2 pages         1 or 2 pages
+ *
+ * 0x000 .. 0x3FF       Trigger         Trigger
+ * 0x400 .. 0x7FF       Trigger         EOI
+ * 0x800 .. 0xBFF       Trigger         undefined
+ * 0xC00 .. 0xCFF       Trigger         PQ=00
+ * 0xD00 .. 0xDFF       Trigger         PQ=01
+ * 0xE00 .. 0xDFF       Trigger         PQ=10
+ * 0xF00 .. 0xDFF       Trigger         PQ=11
+ */
+static void xive_source_esb_write(void *opaque, hwaddr addr,
+                                  uint64_t value, unsigned size)
+{
+    XiveSource *xsrc = XIVE_SOURCE(opaque);
+    uint32_t offset = addr & 0xFFF;
+    uint32_t srcno = addr >> xsrc->esb_shift;
+    bool notify = false;
+
+    /* In a two pages ESB MMIO setting, trigger page only triggers */
+    if (xive_source_is_trigger_page(xsrc, addr)) {
+        notify = xive_source_esb_trigger(xsrc, srcno);
+        goto out;
+    }
+
+    switch (offset) {
+    case 0 ... 0x3FF:
+        notify = xive_source_esb_trigger(xsrc, srcno);
+        break;
+
+    case XIVE_ESB_STORE_EOI ... XIVE_ESB_STORE_EOI + 0x3FF:
+        if (!(xsrc->esb_flags & XIVE_SRC_STORE_EOI)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "XIVE: invalid Store EOI for IRQ %d\n", srcno);
+            return;
+        }
+
+        notify = xive_source_esb_eoi(xsrc, srcno);
+        break;
+
+    case XIVE_ESB_SET_PQ_00 ... XIVE_ESB_SET_PQ_00 + 0x0FF:
+    case XIVE_ESB_SET_PQ_01 ... XIVE_ESB_SET_PQ_01 + 0x0FF:
+    case XIVE_ESB_SET_PQ_10 ... XIVE_ESB_SET_PQ_10 + 0x0FF:
+    case XIVE_ESB_SET_PQ_11 ... XIVE_ESB_SET_PQ_11 + 0x0FF:
+        xive_source_esb_set(xsrc, srcno, (offset >> 8) & 0x3);
+        break;
+
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %x\n",
+                      offset);
+        return;
+    }
+
+out:
+    /* Forward the source event notification for routing */
+    if (notify) {
+        xive_source_notify(xsrc, srcno);
+    }
+}
+
+static const MemoryRegionOps xive_source_esb_ops = {
+    .read = xive_source_esb_read,
+    .write = xive_source_esb_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+};
+
+static void xive_source_set_irq(void *opaque, int srcno, int val)
+{
+    XiveSource *xsrc = XIVE_SOURCE(opaque);
+    bool notify = false;
+
+    if (val) {
+        notify = xive_source_esb_trigger(xsrc, srcno);
+    }
+
+    /* Forward the source event notification for routing */
+    if (notify) {
+        xive_source_notify(xsrc, srcno);
+    }
+}
+
+void xive_source_pic_print_info(XiveSource *xsrc, uint32_t offset, Monitor *mon)
+{
+    int i;
+
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        uint8_t pq = xive_source_esb_get(xsrc, i);
+
+        if (pq == XIVE_ESB_OFF) {
+            continue;
+        }
+
+        monitor_printf(mon, "  %08x %c%c\n", i + offset,
+                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
+                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-');
+    }
+}
+
+static void xive_source_reset(void *dev)
+{
+    XiveSource *xsrc = XIVE_SOURCE(dev);
+
+    /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
+    memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
+}
+
+static void xive_source_realize(DeviceState *dev, Error **errp)
+{
+    XiveSource *xsrc = XIVE_SOURCE(dev);
+
+    if (!xsrc->nr_irqs) {
+        error_setg(errp, "Number of interrupt needs to be greater than 0");
+        return;
+    }
+
+    if (xsrc->esb_shift != XIVE_ESB_4K &&
+        xsrc->esb_shift != XIVE_ESB_4K_2PAGE &&
+        xsrc->esb_shift != XIVE_ESB_64K &&
+        xsrc->esb_shift != XIVE_ESB_64K_2PAGE) {
+        error_setg(errp, "Invalid ESB shift setting");
+        return;
+    }
+
+    xsrc->status = g_malloc0(xsrc->nr_irqs);
+
+    memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
+                          &xive_source_esb_ops, xsrc, "xive.esb",
+                          (1ull << xsrc->esb_shift) * xsrc->nr_irqs);
+
+    xsrc->qirqs = qemu_allocate_irqs(xive_source_set_irq, xsrc,
+                                     xsrc->nr_irqs);
+
+    qemu_register_reset(xive_source_reset, dev);
+}
+
+static const VMStateDescription vmstate_xive_source = {
+    .name = TYPE_XIVE_SOURCE,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32_EQUAL(nr_irqs, XiveSource, NULL),
+        VMSTATE_VBUFFER_UINT32(status, XiveSource, 1, NULL, nr_irqs),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+/*
+ * The default XIVE interrupt source setting for the ESB MMIOs is two
+ * 64k pages without Store EOI, to be in sync with KVM.
+ */
+static Property xive_source_properties[] = {
+    DEFINE_PROP_UINT64("flags", XiveSource, esb_flags, 0),
+    DEFINE_PROP_UINT32("nr-irqs", XiveSource, nr_irqs, 0),
+    DEFINE_PROP_UINT32("shift", XiveSource, esb_shift, XIVE_ESB_64K_2PAGE),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xive_source_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->desc    = "XIVE Interrupt Source";
+    dc->props   = xive_source_properties;
+    dc->realize = xive_source_realize;
+    dc->vmsd    = &vmstate_xive_source;
+}
+
+static const TypeInfo xive_source_info = {
+    .name          = TYPE_XIVE_SOURCE,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(XiveSource),
+    .class_init    = xive_source_class_init,
+};
+
+static void xive_register_types(void)
+{
+    type_register_static(&xive_source_info);
+}
+
+type_init(xive_register_types)
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 0e9963f5eecc..72a46ed91c31 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -37,6 +37,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
 obj-$(CONFIG_XICS) += xics.o
 obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
+obj-$(CONFIG_XIVE) += xive.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 02/37] ppc/xive: add support for the LSI interrupt sources
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 01/37] ppc/xive: introduce a XIVE interrupt source model Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 03/37] ppc/xive: introduce the XiveNotifier interface Cédric Le Goater
                   ` (35 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The 'sent' status of the LSI interrupt source is modeled with the 'P'
bit of the ESB and the assertion status of the source is maintained
with an extra bit under the main XiveSource object. The type of the
source is stored in the same array for practical reasons.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/xive.h | 19 ++++++++++++-
 hw/intc/xive.c        | 66 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 7aa2e3801222..7cebc32eba4c 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -162,8 +162,9 @@ typedef struct XiveSource {
     /* IRQs */
     uint32_t        nr_irqs;
     qemu_irq        *qirqs;
+    unsigned long   *lsi_map;
 
-    /* PQ bits */
+    /* PQ bits and LSI assertion bit */
     uint8_t         *status;
 
     /* ESB memory region */
@@ -219,6 +220,7 @@ static inline hwaddr xive_source_esb_mgmt(XiveSource *xsrc, int srcno)
  * When doing an EOI, the Q bit will indicate if the interrupt
  * needs to be re-triggered.
  */
+#define XIVE_STATUS_ASSERTED  0x4  /* Extra bit for LSI */
 #define XIVE_ESB_VAL_P        0x2
 #define XIVE_ESB_VAL_Q        0x1
 
@@ -257,4 +259,19 @@ static inline qemu_irq xive_source_qirq(XiveSource *xsrc, uint32_t srcno)
     return xsrc->qirqs[srcno];
 }
 
+static inline bool xive_source_irq_is_lsi(XiveSource *xsrc, uint32_t srcno)
+{
+    assert(srcno < xsrc->nr_irqs);
+    return test_bit(srcno, xsrc->lsi_map);
+}
+
+static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
+                                       bool lsi)
+{
+    assert(srcno < xsrc->nr_irqs);
+    if (lsi) {
+        bitmap_set(xsrc->lsi_map, srcno, 1);
+    }
+}
+
 #endif /* PPC_XIVE_H */
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 6389bd832371..11c7aac962de 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -89,14 +89,42 @@ uint8_t xive_source_esb_set(XiveSource *xsrc, uint32_t srcno, uint8_t pq)
     return xive_esb_set(&xsrc->status[srcno], pq);
 }
 
+/*
+ * Returns whether the event notification should be forwarded.
+ */
+static bool xive_source_lsi_trigger(XiveSource *xsrc, uint32_t srcno)
+{
+    uint8_t old_pq = xive_source_esb_get(xsrc, srcno);
+
+    xsrc->status[srcno] |= XIVE_STATUS_ASSERTED;
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        xive_source_esb_set(xsrc, srcno, XIVE_ESB_PENDING);
+        return true;
+    default:
+        return false;
+    }
+}
+
 /*
  * Returns whether the event notification should be forwarded.
  */
 static bool xive_source_esb_trigger(XiveSource *xsrc, uint32_t srcno)
 {
+    bool ret;
+
     assert(srcno < xsrc->nr_irqs);
 
-    return xive_esb_trigger(&xsrc->status[srcno]);
+    ret = xive_esb_trigger(&xsrc->status[srcno]);
+
+    if (xive_source_irq_is_lsi(xsrc, srcno) &&
+        xive_source_esb_get(xsrc, srcno) == XIVE_ESB_QUEUED) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "XIVE: queued an event on LSI IRQ %d\n", srcno);
+    }
+
+    return ret;
 }
 
 /*
@@ -104,9 +132,22 @@ static bool xive_source_esb_trigger(XiveSource *xsrc, uint32_t srcno)
  */
 static bool xive_source_esb_eoi(XiveSource *xsrc, uint32_t srcno)
 {
+    bool ret;
+
     assert(srcno < xsrc->nr_irqs);
 
-    return xive_esb_eoi(&xsrc->status[srcno]);
+    ret = xive_esb_eoi(&xsrc->status[srcno]);
+
+    /* LSI sources do not set the Q bit but they can still be
+     * asserted, in which case we should forward a new event
+     * notification
+     */
+    if (xive_source_irq_is_lsi(xsrc, srcno) &&
+        xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
+        ret = xive_source_lsi_trigger(xsrc, srcno);
+    }
+
+    return ret;
 }
 
 /*
@@ -271,8 +312,16 @@ static void xive_source_set_irq(void *opaque, int srcno, int val)
     XiveSource *xsrc = XIVE_SOURCE(opaque);
     bool notify = false;
 
-    if (val) {
-        notify = xive_source_esb_trigger(xsrc, srcno);
+    if (xive_source_irq_is_lsi(xsrc, srcno)) {
+        if (val) {
+            notify = xive_source_lsi_trigger(xsrc, srcno);
+        } else {
+            xsrc->status[srcno] &= ~XIVE_STATUS_ASSERTED;
+        }
+    } else {
+        if (val) {
+            notify = xive_source_esb_trigger(xsrc, srcno);
+        }
     }
 
     /* Forward the source event notification for routing */
@@ -292,9 +341,11 @@ void xive_source_pic_print_info(XiveSource *xsrc, uint32_t offset, Monitor *mon)
             continue;
         }
 
-        monitor_printf(mon, "  %08x %c%c\n", i + offset,
+        monitor_printf(mon, "  %08x %s %c%c%c\n", i + offset,
+                       xive_source_irq_is_lsi(xsrc, i) ? "LSI" : "MSI",
                        pq & XIVE_ESB_VAL_P ? 'P' : '-',
-                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-');
+                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
+                       xsrc->status[i] & XIVE_STATUS_ASSERTED ? 'A' : ' ');
     }
 }
 
@@ -302,6 +353,8 @@ static void xive_source_reset(void *dev)
 {
     XiveSource *xsrc = XIVE_SOURCE(dev);
 
+    /* Do not clear the LSI bitmap */
+
     /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
     memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
 }
@@ -324,6 +377,7 @@ static void xive_source_realize(DeviceState *dev, Error **errp)
     }
 
     xsrc->status = g_malloc0(xsrc->nr_irqs);
+    xsrc->lsi_map = bitmap_new(xsrc->nr_irqs);
 
     memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
                           &xive_source_esb_ops, xsrc, "xive.esb",
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 03/37] ppc/xive: introduce the XiveNotifier interface
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 01/37] ppc/xive: introduce a XIVE interrupt source model Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 02/37] ppc/xive: add support for the LSI interrupt sources Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-06  3:25   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model Cédric Le Goater
                   ` (34 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The XiveNotifier offers a simple interface, between the XiveSource
object and the main interrupt controller of the machine. It will
forward event notifications to the XIVE Interrupt Virtualization
Routing Engine (IVRE).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/xive.h | 23 +++++++++++++++++++++++
 hw/intc/xive.c        | 25 +++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 7cebc32eba4c..6770cffec67d 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -142,6 +142,27 @@
 
 #include "hw/qdev-core.h"
 
+/*
+ * XIVE Fabric (Interface between Source and Router)
+ */
+
+typedef struct XiveNotifier {
+    Object parent;
+} XiveNotifier;
+
+#define TYPE_XIVE_NOTIFIER "xive-fabric"
+#define XIVE_NOTIFIER(obj)                                     \
+    OBJECT_CHECK(XiveNotifier, (obj), TYPE_XIVE_NOTIFIER)
+#define XIVE_NOTIFIER_CLASS(klass)                                     \
+    OBJECT_CLASS_CHECK(XiveNotifierClass, (klass), TYPE_XIVE_NOTIFIER)
+#define XIVE_NOTIFIER_GET_CLASS(obj)                                   \
+    OBJECT_GET_CLASS(XiveNotifierClass, (obj), TYPE_XIVE_NOTIFIER)
+
+typedef struct XiveNotifierClass {
+    InterfaceClass parent;
+    void (*notify)(XiveNotifier *xn, uint32_t lisn);
+} XiveNotifierClass;
+
 /*
  * XIVE Interrupt Source
  */
@@ -171,6 +192,8 @@ typedef struct XiveSource {
     uint64_t        esb_flags;
     uint32_t        esb_shift;
     MemoryRegion    esb_mmio;
+
+    XiveNotifier    *xive;
 } XiveSource;
 
 /*
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 11c7aac962de..79238eb57fae 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -155,7 +155,11 @@ static bool xive_source_esb_eoi(XiveSource *xsrc, uint32_t srcno)
  */
 static void xive_source_notify(XiveSource *xsrc, int srcno)
 {
+    XiveNotifierClass *xnc = XIVE_NOTIFIER_GET_CLASS(xsrc->xive);
 
+    if (xnc->notify) {
+        xnc->notify(xsrc->xive, srcno);
+    }
 }
 
 /*
@@ -362,6 +366,17 @@ static void xive_source_reset(void *dev)
 static void xive_source_realize(DeviceState *dev, Error **errp)
 {
     XiveSource *xsrc = XIVE_SOURCE(dev);
+    Object *obj;
+    Error *local_err = NULL;
+
+    obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
+    if (!obj) {
+        error_propagate(errp, local_err);
+        error_prepend(errp, "required link 'xive' not found: ");
+        return;
+    }
+
+    xsrc->xive = XIVE_NOTIFIER(obj);
 
     if (!xsrc->nr_irqs) {
         error_setg(errp, "Number of interrupt needs to be greater than 0");
@@ -428,9 +443,19 @@ static const TypeInfo xive_source_info = {
     .class_init    = xive_source_class_init,
 };
 
+/*
+ * XIVE Fabric
+ */
+static const TypeInfo xive_fabric_info = {
+    .name = TYPE_XIVE_NOTIFIER,
+    .parent = TYPE_INTERFACE,
+    .class_size = sizeof(XiveNotifierClass),
+};
+
 static void xive_register_types(void)
 {
     type_register_static(&xive_source_info);
+    type_register_static(&xive_fabric_info);
 }
 
 type_init(xive_register_types)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (2 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 03/37] ppc/xive: introduce the XiveNotifier interface Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-06  3:41   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 05/37] ppc/xive: introduce the XIVE Event Notification Descriptors Cédric Le Goater
                   ` (33 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The XiveRouter models the second sub-engine of the XIVE architecture :
the Interrupt Virtualization Routing Engine (IVRE).

The IVRE handles event notifications of the IVSE and performs the
interrupt routing process. For this purpose, it uses a set of tables
stored in system memory, the first of which being the Event Assignment
Structure (EAS) table.

The EAT associates an interrupt source number with an Event Notification
Descriptor (END) which will be used in a second phase of the routing
process to identify a Notification Virtual Target.

The XiveRouter is an abstract class which needs to be inherited from
to define a storage for the EAT, and other upcoming tables.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/xive.h      | 31 ++++++++++++++++
 include/hw/ppc/xive_regs.h | 50 +++++++++++++++++++++++++
 hw/intc/xive.c             | 76 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 157 insertions(+)
 create mode 100644 include/hw/ppc/xive_regs.h

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 6770cffec67d..57ec9f84f527 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -141,6 +141,8 @@
 #define PPC_XIVE_H
 
 #include "hw/qdev-core.h"
+#include "hw/sysbus.h"
+#include "hw/ppc/xive_regs.h"
 
 /*
  * XIVE Fabric (Interface between Source and Router)
@@ -297,4 +299,33 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
     }
 }
 
+/*
+ * XIVE Router
+ */
+
+typedef struct XiveRouter {
+    SysBusDevice    parent;
+} XiveRouter;
+
+#define TYPE_XIVE_ROUTER "xive-router"
+#define XIVE_ROUTER(obj)                                \
+    OBJECT_CHECK(XiveRouter, (obj), TYPE_XIVE_ROUTER)
+#define XIVE_ROUTER_CLASS(klass)                                        \
+    OBJECT_CLASS_CHECK(XiveRouterClass, (klass), TYPE_XIVE_ROUTER)
+#define XIVE_ROUTER_GET_CLASS(obj)                              \
+    OBJECT_GET_CLASS(XiveRouterClass, (obj), TYPE_XIVE_ROUTER)
+
+typedef struct XiveRouterClass {
+    SysBusDeviceClass parent;
+
+    /* XIVE table accessors */
+    int (*get_eas)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
+                   XiveEAS *eas);
+} XiveRouterClass;
+
+void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
+
+int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
+                        XiveEAS *eas);
+
 #endif /* PPC_XIVE_H */
diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
new file mode 100644
index 000000000000..15f2470ed9cc
--- /dev/null
+++ b/include/hw/ppc/xive_regs.h
@@ -0,0 +1,50 @@
+/*
+ * QEMU PowerPC XIVE internal structure definitions
+ *
+ *
+ * The XIVE structures are accessed by the HW and their format is
+ * architected to be big-endian. Some macros are provided to ease
+ * access to the different fields.
+ *
+ *
+ * Copyright (c) 2016-2018, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PPC_XIVE_REGS_H
+#define PPC_XIVE_REGS_H
+
+/*
+ * Interrupt source number encoding on PowerBUS
+ */
+#define XIVE_SRCNO_BLOCK(srcno) (((srcno) >> 28) & 0xf)
+#define XIVE_SRCNO_INDEX(srcno) ((srcno) & 0x0fffffff)
+#define XIVE_SRCNO(blk, idx)    ((uint32_t)(blk) << 28 | (idx))
+
+/* EAS (Event Assignment Structure)
+ *
+ * One per interrupt source. Targets an interrupt to a given Event
+ * Notification Descriptor (END) and provides the corresponding
+ * logical interrupt number (END data)
+ */
+typedef struct XiveEAS {
+        /* Use a single 64-bit definition to make it easier to
+         * perform atomic updates
+         */
+        uint64_t        w;
+#define EAS_VALID       PPC_BIT(0)
+#define EAS_END_BLOCK   PPC_BITMASK(4, 7)        /* Destination END block# */
+#define EAS_END_INDEX   PPC_BITMASK(8, 31)       /* Destination END index */
+#define EAS_MASKED      PPC_BIT(32)              /* Masked */
+#define EAS_END_DATA    PPC_BITMASK(33, 63)      /* Data written to the END */
+} XiveEAS;
+
+#define xive_eas_is_valid(eas)   (be64_to_cpu((eas)->w) & EAS_VALID)
+#define xive_eas_is_masked(eas)  (be64_to_cpu((eas)->w) & EAS_MASKED)
+
+#define GETFIELD_BE64(m, v)      GETFIELD(m, be64_to_cpu(v))
+#define SETFIELD_BE64(m, v, val) cpu_to_be64(SETFIELD(m, be64_to_cpu(v), val))
+
+#endif /* PPC_XIVE_REGS_H */
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 79238eb57fae..d21df6674d8c 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -443,6 +443,81 @@ static const TypeInfo xive_source_info = {
     .class_init    = xive_source_class_init,
 };
 
+/*
+ * XIVE Router (aka. Virtualization Controller or IVRE)
+ */
+
+int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
+                        XiveEAS *eas)
+{
+    XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
+
+    return xrc->get_eas(xrtr, eas_blk, eas_idx, eas);
+}
+
+static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
+{
+    XiveRouter *xrtr = XIVE_ROUTER(xn);
+    uint8_t eas_blk = XIVE_SRCNO_BLOCK(lisn);
+    uint32_t eas_idx = XIVE_SRCNO_INDEX(lisn);
+    XiveEAS eas;
+
+    /* EAS cache lookup */
+    if (xive_router_get_eas(xrtr, eas_blk, eas_idx, &eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %x\n", lisn);
+        return;
+    }
+
+    /* The IVRE checks the State Bit Cache at this point. We skip the
+     * SBC lookup because the state bits of the sources are modeled
+     * internally in QEMU.
+     */
+
+    if (!xive_eas_is_valid(&eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %x\n", lisn);
+        return;
+    }
+
+    if (xive_eas_is_masked(&eas)) {
+        /* Notification completed */
+        return;
+    }
+}
+
+static void xive_router_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    XiveNotifierClass *xnc = XIVE_NOTIFIER_CLASS(klass);
+
+    dc->desc    = "XIVE Router Engine";
+    xnc->notify = xive_router_notify;
+}
+
+static const TypeInfo xive_router_info = {
+    .name          = TYPE_XIVE_ROUTER,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .abstract      = true,
+    .class_size    = sizeof(XiveRouterClass),
+    .class_init    = xive_router_class_init,
+    .interfaces    = (InterfaceInfo[]) {
+        { TYPE_XIVE_NOTIFIER },
+        { }
+    }
+};
+
+void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon)
+{
+    if (!xive_eas_is_valid(eas)) {
+        return;
+    }
+
+    monitor_printf(mon, "  %08x %s end:%02x/%04x data:%08x\n",
+                   lisn, xive_eas_is_masked(eas) ? "M" : " ",
+                   (uint8_t)  GETFIELD_BE64(EAS_END_BLOCK, eas->w),
+                   (uint32_t) GETFIELD_BE64(EAS_END_INDEX, eas->w),
+                   (uint32_t) GETFIELD_BE64(EAS_END_DATA, eas->w));
+}
+
 /*
  * XIVE Fabric
  */
@@ -456,6 +531,7 @@ static void xive_register_types(void)
 {
     type_register_static(&xive_source_info);
     type_register_static(&xive_fabric_info);
+    type_register_static(&xive_router_info);
 }
 
 type_init(xive_register_types)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 05/37] ppc/xive: introduce the XIVE Event Notification Descriptors
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (3 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-06  3:56   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 06/37] ppc/xive: add support for the END Event State buffers Cédric Le Goater
                   ` (32 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

To complete the event routing, the IVRE sub-engine uses a second table
containing Event Notification Descriptor (END) structures.

An END specifies on which Event Queue (EQ) the event notification
data, defined in the associated EAS, should be posted when an
exception occurs. It also defines which Notification Virtual Target
(NVT) should be notified.

The Event Queue is a memory page provided by the O/S defining a
circular buffer, one per server and priority couple, containing Event
Queue entries. These are 4 bytes long, the first bit being a
'generation' bit and the 31 following bits the END Data field. They
are pulled by the O/S when the exception occurs.

The END Data field is a way to set an invariant logical event source
number for an IRQ. On sPAPR machines, it is set with the
H_INT_SET_SOURCE_CONFIG hcall when the EISN flag is used.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/xive.h      |  18 ++++
 include/hw/ppc/xive_regs.h |  57 ++++++++++++
 hw/intc/xive.c             | 174 +++++++++++++++++++++++++++++++++++++
 3 files changed, 249 insertions(+)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 57ec9f84f527..d1b4c6c78ec5 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -321,11 +321,29 @@ typedef struct XiveRouterClass {
     /* XIVE table accessors */
     int (*get_eas)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
                    XiveEAS *eas);
+    int (*get_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
+                   XiveEND *end);
+    int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
+                     XiveEND *end, uint8_t word_number);
 } XiveRouterClass;
 
 void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
 
 int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
                         XiveEAS *eas);
+int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
+                        XiveEND *end);
+int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
+                          XiveEND *end, uint8_t word_number);
+
+/*
+ * For legacy compatibility, the exceptions define up to 256 different
+ * priorities. P9 implements only 9 levels : 8 active levels [0 - 7]
+ * and the least favored level 0xFF.
+ */
+#define XIVE_PRIORITY_MAX  7
+
+void xive_end_pic_print_info(XiveEND *end, uint32_t end_idx, Monitor *mon);
+void xive_end_queue_pic_print_info(XiveEND *end, uint32_t width, Monitor *mon);
 
 #endif /* PPC_XIVE_H */
diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
index 15f2470ed9cc..3c0ebad18b69 100644
--- a/include/hw/ppc/xive_regs.h
+++ b/include/hw/ppc/xive_regs.h
@@ -47,4 +47,61 @@ typedef struct XiveEAS {
 #define GETFIELD_BE64(m, v)      GETFIELD(m, be64_to_cpu(v))
 #define SETFIELD_BE64(m, v, val) cpu_to_be64(SETFIELD(m, be64_to_cpu(v), val))
 
+/* Event Notification Descriptor (END) */
+typedef struct XiveEND {
+        uint32_t        w0;
+#define END_W0_VALID             PPC_BIT32(0) /* "v" bit */
+#define END_W0_ENQUEUE           PPC_BIT32(1) /* "q" bit */
+#define END_W0_UCOND_NOTIFY      PPC_BIT32(2) /* "n" bit */
+#define END_W0_BACKLOG           PPC_BIT32(3) /* "b" bit */
+#define END_W0_PRECL_ESC_CTL     PPC_BIT32(4) /* "p" bit */
+#define END_W0_ESCALATE_CTL      PPC_BIT32(5) /* "e" bit */
+#define END_W0_UNCOND_ESCALATE   PPC_BIT32(6) /* "u" bit - DD2.0 */
+#define END_W0_SILENT_ESCALATE   PPC_BIT32(7) /* "s" bit - DD2.0 */
+#define END_W0_QSIZE             PPC_BITMASK32(12, 15)
+#define END_W0_SW0               PPC_BIT32(16)
+#define END_W0_FIRMWARE          END_W0_SW0 /* Owned by FW */
+#define END_QSIZE_4K             0
+#define END_QSIZE_64K            4
+#define END_W0_HWDEP             PPC_BITMASK32(24, 31)
+        uint32_t        w1;
+#define END_W1_ESn               PPC_BITMASK32(0, 1)
+#define END_W1_ESn_P             PPC_BIT32(0)
+#define END_W1_ESn_Q             PPC_BIT32(1)
+#define END_W1_ESe               PPC_BITMASK32(2, 3)
+#define END_W1_ESe_P             PPC_BIT32(2)
+#define END_W1_ESe_Q             PPC_BIT32(3)
+#define END_W1_GENERATION        PPC_BIT32(9)
+#define END_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
+        uint32_t        w2;
+#define END_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
+#define END_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
+        uint32_t        w3;
+#define END_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
+        uint32_t        w4;
+#define END_W4_ESC_END_BLOCK     PPC_BITMASK32(4, 7)
+#define END_W4_ESC_END_INDEX     PPC_BITMASK32(8, 31)
+        uint32_t        w5;
+#define END_W5_ESC_END_DATA      PPC_BITMASK32(1, 31)
+        uint32_t        w6;
+#define END_W6_FORMAT_BIT        PPC_BIT32(8)
+#define END_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
+#define END_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
+        uint32_t        w7;
+#define END_W7_F0_IGNORE         PPC_BIT32(0)
+#define END_W7_F0_BLK_GROUPING   PPC_BIT32(1)
+#define END_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
+#define END_W7_F1_WAKEZ          PPC_BIT32(0)
+#define END_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
+} XiveEND;
+
+#define xive_end_is_valid(end)    (be32_to_cpu((end)->w0) & END_W0_VALID)
+#define xive_end_is_enqueue(end)  (be32_to_cpu((end)->w0) & END_W0_ENQUEUE)
+#define xive_end_is_notify(end)   (be32_to_cpu((end)->w0) & END_W0_UCOND_NOTIFY)
+#define xive_end_is_backlog(end)  (be32_to_cpu((end)->w0) & END_W0_BACKLOG)
+#define xive_end_is_escalate(end) (be32_to_cpu((end)->w0) & END_W0_ESCALATE_CTL)
+
+#define GETFIELD_BE32(m, v)       GETFIELD(m, be32_to_cpu(v))
+#define SETFIELD_BE32(m, v, val)  cpu_to_be32(SETFIELD(m, be32_to_cpu(v), val))
+
 #endif /* PPC_XIVE_REGS_H */
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index d21df6674d8c..41d8ba1540d0 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -443,6 +443,95 @@ static const TypeInfo xive_source_info = {
     .class_init    = xive_source_class_init,
 };
 
+/*
+ * XiveEND helpers
+ */
+
+void xive_end_queue_pic_print_info(XiveEND *end, uint32_t width, Monitor *mon)
+{
+    uint64_t qaddr_base = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32
+        | be32_to_cpu(end->w3);
+    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
+    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
+    uint32_t qentries = 1 << (qsize + 10);
+    int i;
+
+    /*
+     * print out the [ (qindex - (width - 1)) .. (qindex + 1)] window
+     */
+    monitor_printf(mon, " [ ");
+    qindex = (qindex - (width - 1)) & (qentries - 1);
+    for (i = 0; i < width; i++) {
+        uint64_t qaddr = qaddr_base + (qindex << 2);
+        uint32_t qdata = -1;
+
+        if (dma_memory_read(&address_space_memory, qaddr, &qdata,
+                            sizeof(qdata))) {
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to read EQ @0x%"
+                          HWADDR_PRIx "\n", qaddr);
+            return;
+        }
+        monitor_printf(mon, "%s%08x ", i == width - 1 ? "^" : "",
+                       be32_to_cpu(qdata));
+        qindex = (qindex + 1) & (qentries - 1);
+    }
+}
+
+void xive_end_pic_print_info(XiveEND *end, uint32_t end_idx, Monitor *mon)
+{
+    uint64_t qaddr_base = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32
+        | be32_to_cpu(end->w3);
+    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
+    uint32_t qgen = GETFIELD_BE32(END_W1_GENERATION, end->w1);
+    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
+    uint32_t qentries = 1 << (qsize + 10);
+
+    uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
+    uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
+
+    if (!xive_end_is_valid(end)) {
+        return;
+    }
+
+    monitor_printf(mon, "  %08x %c%c%c%c%c prio:%d nvt:%04x eq:@%08"PRIx64
+                   "% 6d/%5d ^%d", end_idx,
+                   xive_end_is_valid(end)    ? 'v' : '-',
+                   xive_end_is_enqueue(end)  ? 'q' : '-',
+                   xive_end_is_notify(end)   ? 'n' : '-',
+                   xive_end_is_backlog(end)  ? 'b' : '-',
+                   xive_end_is_escalate(end) ? 'e' : '-',
+                   priority, nvt, qaddr_base, qindex, qentries, qgen);
+
+    xive_end_queue_pic_print_info(end, 6, mon);
+    monitor_printf(mon, "]\n");
+}
+
+static void xive_end_enqueue(XiveEND *end, uint32_t data)
+{
+    uint64_t qaddr_base = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32
+        | be32_to_cpu(end->w3);
+    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
+    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
+    uint32_t qgen = GETFIELD_BE32(END_W1_GENERATION, end->w1);
+
+    uint64_t qaddr = qaddr_base + (qindex << 2);
+    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
+    uint32_t qentries = 1 << (qsize + 10);
+
+    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to write END data @0x%"
+                      HWADDR_PRIx "\n", qaddr);
+        return;
+    }
+
+    qindex = (qindex + 1) & (qentries - 1);
+    if (qindex == 0) {
+        qgen ^= 1;
+        end->w1 = SETFIELD_BE32(END_W1_GENERATION, end->w1, qgen);
+    }
+    end->w1 = SETFIELD_BE32(END_W1_PAGE_OFF, end->w1, qindex);
+}
+
 /*
  * XIVE Router (aka. Virtualization Controller or IVRE)
  */
@@ -455,6 +544,83 @@ int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
     return xrc->get_eas(xrtr, eas_blk, eas_idx, eas);
 }
 
+int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
+                        XiveEND *end)
+{
+   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
+
+   return xrc->get_end(xrtr, end_blk, end_idx, end);
+}
+
+int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
+                          XiveEND *end, uint8_t word_number)
+{
+   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
+
+   return xrc->write_end(xrtr, end_blk, end_idx, end, word_number);
+}
+
+/*
+ * An END trigger can come from an event trigger (IPI or HW) or from
+ * another chip. We don't model the PowerBus but the END trigger
+ * message has the same parameters than in the function below.
+ */
+static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
+                                   uint32_t end_idx, uint32_t end_data)
+{
+    XiveEND end;
+    uint8_t priority;
+    uint8_t format;
+
+    /* END cache lookup */
+    if (xive_router_get_end(xrtr, end_blk, end_idx, &end)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No END %x/%x\n", end_blk,
+                      end_idx);
+        return;
+    }
+
+    if (!xive_end_is_valid(&end)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: END %x/%x is invalid\n",
+                      end_blk, end_idx);
+        return;
+    }
+
+    if (xive_end_is_enqueue(&end)) {
+        xive_end_enqueue(&end, end_data);
+        /* Enqueuing event data modifies the EQ toggle and index */
+        xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
+    }
+
+    /*
+     * The W7 format depends on the F bit in W6. It defines the type
+     * of the notification :
+     *
+     *   F=0 : single or multiple NVT notification
+     *   F=1 : User level Event-Based Branch (EBB) notification, no
+     *         priority
+     */
+    format = GETFIELD_BE32(END_W6_FORMAT_BIT, end.w6);
+    priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end.w7);
+
+    /* The END is masked */
+    if (format == 0 && priority == 0xff) {
+        return;
+    }
+
+    /*
+     * Check the END ESn (Event State Buffer for notification) for
+     * even futher coalescing in the Router
+     */
+    if (!xive_end_is_notify(&end)) {
+        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
+        return;
+    }
+
+    /*
+     * Follows IVPE notification
+     */
+}
+
 static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
 {
     XiveRouter *xrtr = XIVE_ROUTER(xn);
@@ -482,6 +648,14 @@ static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
         /* Notification completed */
         return;
     }
+
+    /*
+     * The event trigger becomes an END trigger
+     */
+    xive_router_end_notify(xrtr,
+                           GETFIELD_BE64(EAS_END_BLOCK, eas.w),
+                           GETFIELD_BE64(EAS_END_INDEX, eas.w),
+                           GETFIELD_BE64(EAS_END_DATA,  eas.w));
 }
 
 static void xive_router_class_init(ObjectClass *klass, void *data)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 06/37] ppc/xive: add support for the END Event State buffers
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (4 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 05/37] ppc/xive: introduce the XIVE Event Notification Descriptors Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-06  4:09   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 07/37] ppc/xive: introduce the XIVE interrupt thread context Cédric Le Goater
                   ` (31 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The Event Notification Descriptor (END) XIVE structure also contains
two Event State Buffers providing further coalescing of interrupts,
one for the notification event (ESn) and one for the escalation events
(ESe). A MMIO page is assigned for each to control the EOI through
loads only. Stores are not allowed.

The END ESBs are modeled through an object resembling the 'XiveSource'
It is stateless as the END state bits are backed into the XiveEND
structure under the XiveRouter and the MMIO accesses follow the same
rules as for the standard source ESBs.

END ESBs are not supported by the Linux drivers neither on OPAL nor on
sPAPR. Nevetherless, it provides a mean to study the question in the
future and validates a bit more the XIVE model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/xive.h |  22 ++++++
 hw/intc/xive.c        | 173 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 193 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index d1b4c6c78ec5..d67b0785df7c 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -305,6 +305,8 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
 
 typedef struct XiveRouter {
     SysBusDevice    parent;
+
+    uint32_t       chip_id;
 } XiveRouter;
 
 #define TYPE_XIVE_ROUTER "xive-router"
@@ -336,6 +338,26 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
 int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
                           XiveEND *end, uint8_t word_number);
 
+/*
+ * XIVE END ESBs
+ */
+
+#define TYPE_XIVE_END_SOURCE "xive-end-source"
+#define XIVE_END_SOURCE(obj) \
+    OBJECT_CHECK(XiveENDSource, (obj), TYPE_XIVE_END_SOURCE)
+
+typedef struct XiveENDSource {
+    DeviceState parent;
+
+    uint32_t        nr_ends;
+
+    /* ESB memory region */
+    uint32_t        esb_shift;
+    MemoryRegion    esb_mmio;
+
+    XiveRouter      *xrtr;
+} XiveENDSource;
+
 /*
  * For legacy compatibility, the exceptions define up to 256 different
  * priorities. P9 implements only 9 levels : 8 active levels [0 - 7]
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 41d8ba1540d0..83686e260df5 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -612,8 +612,18 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
      * even futher coalescing in the Router
      */
     if (!xive_end_is_notify(&end)) {
-        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
-        return;
+        uint8_t pq = GETFIELD_BE32(END_W1_ESn, end.w1);
+        bool notify = xive_esb_trigger(&pq);
+
+        if (pq != GETFIELD_BE32(END_W1_ESn, end.w1)) {
+            end.w1 = SETFIELD_BE32(END_W1_ESn, end.w1, pq);
+            xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
+        }
+
+        /* ESn[Q]=1 : end of notification */
+        if (!notify) {
+            return;
+        }
     }
 
     /*
@@ -658,12 +668,18 @@ static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
                            GETFIELD_BE64(EAS_END_DATA,  eas.w));
 }
 
+static Property xive_router_properties[] = {
+    DEFINE_PROP_UINT32("chip-id", XiveRouter, chip_id, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void xive_router_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     XiveNotifierClass *xnc = XIVE_NOTIFIER_CLASS(klass);
 
     dc->desc    = "XIVE Router Engine";
+    dc->props   = xive_router_properties;
     xnc->notify = xive_router_notify;
 }
 
@@ -692,6 +708,158 @@ void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon)
                    (uint32_t) GETFIELD_BE64(EAS_END_DATA, eas->w));
 }
 
+/*
+ * END ESB MMIO loads
+ */
+static uint64_t xive_end_source_read(void *opaque, hwaddr addr, unsigned size)
+{
+    XiveENDSource *xsrc = XIVE_END_SOURCE(opaque);
+    XiveRouter *xrtr = xsrc->xrtr;
+    uint32_t offset = addr & 0xFFF;
+    uint8_t end_blk;
+    uint32_t end_idx;
+    XiveEND end;
+    uint32_t end_esmask;
+    uint8_t pq;
+    uint64_t ret = -1;
+
+    end_blk = xrtr->chip_id;
+    end_idx = addr >> (xsrc->esb_shift + 1);
+
+    if (xive_router_get_end(xrtr, end_blk, end_idx, &end)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No END %x/%x\n", end_blk,
+                      end_idx);
+        return -1;
+    }
+
+    if (!xive_end_is_valid(&end)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: END %x/%x is invalid\n",
+                      end_blk, end_idx);
+        return -1;
+    }
+
+    end_esmask = addr_is_even(addr, xsrc->esb_shift) ? END_W1_ESn : END_W1_ESe;
+    pq = GETFIELD_BE32(end_esmask, end.w1);
+
+    switch (offset) {
+    case XIVE_ESB_LOAD_EOI ... XIVE_ESB_LOAD_EOI + 0x7FF:
+        ret = xive_esb_eoi(&pq);
+
+        /* Forward the source event notification for routing ?? */
+        break;
+
+    case XIVE_ESB_GET ... XIVE_ESB_GET + 0x3FF:
+        ret = pq;
+        break;
+
+    case XIVE_ESB_SET_PQ_00 ... XIVE_ESB_SET_PQ_00 + 0x0FF:
+    case XIVE_ESB_SET_PQ_01 ... XIVE_ESB_SET_PQ_01 + 0x0FF:
+    case XIVE_ESB_SET_PQ_10 ... XIVE_ESB_SET_PQ_10 + 0x0FF:
+    case XIVE_ESB_SET_PQ_11 ... XIVE_ESB_SET_PQ_11 + 0x0FF:
+        ret = xive_esb_set(&pq, (offset >> 8) & 0x3);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid END ESB load addr %d\n",
+                      offset);
+        return -1;
+    }
+
+    if (pq != GETFIELD_BE32(end_esmask, end.w1)) {
+        end.w1 = SETFIELD_BE32(end_esmask, end.w1, pq);
+        xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
+    }
+
+    return ret;
+}
+
+/*
+ * END ESB MMIO stores are invalid
+ */
+static void xive_end_source_write(void *opaque, hwaddr addr,
+                                  uint64_t value, unsigned size)
+{
+    qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr 0x%"
+                  HWADDR_PRIx"\n", addr);
+}
+
+static const MemoryRegionOps xive_end_source_ops = {
+    .read = xive_end_source_read,
+    .write = xive_end_source_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+};
+
+static void xive_source_end_reset(void *dev)
+{
+    /* TODO: Loop on all ENDs and mask off the ESn and ESe */
+}
+
+static void xive_end_source_realize(DeviceState *dev, Error **errp)
+{
+    XiveENDSource *xsrc = XIVE_END_SOURCE(dev);
+    Object *obj;
+    Error *local_err = NULL;
+
+    obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
+    if (!obj) {
+        error_propagate(errp, local_err);
+        error_prepend(errp, "required link 'xive' not found: ");
+        return;
+    }
+
+    xsrc->xrtr = XIVE_ROUTER(obj);
+
+    if (!xsrc->nr_ends) {
+        error_setg(errp, "Number of interrupt needs to be greater than 0");
+        return;
+    }
+
+    if (xsrc->esb_shift != XIVE_ESB_4K &&
+        xsrc->esb_shift != XIVE_ESB_64K) {
+        error_setg(errp, "Invalid ESB shift setting");
+        return;
+    }
+
+    /*
+     * Each END is assigned an even/odd pair of MMIO pages, the even page
+     * manages the ESn field while the odd page manages the ESe field.
+     */
+    memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
+                          &xive_end_source_ops, xsrc, "xive.end",
+                          (1ull << (xsrc->esb_shift + 1)) * xsrc->nr_ends);
+
+    qemu_register_reset(xive_source_end_reset, dev);
+}
+
+static Property xive_end_source_properties[] = {
+    DEFINE_PROP_UINT32("nr-ends", XiveENDSource, nr_ends, 0),
+    DEFINE_PROP_UINT32("shift", XiveENDSource, esb_shift, XIVE_ESB_64K),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xive_end_source_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->desc    = "XIVE END Source";
+    dc->props   = xive_end_source_properties;
+    dc->realize = xive_end_source_realize;
+}
+
+static const TypeInfo xive_end_source_info = {
+    .name          = TYPE_XIVE_END_SOURCE,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(XiveENDSource),
+    .class_init    = xive_end_source_class_init,
+};
+
 /*
  * XIVE Fabric
  */
@@ -706,6 +874,7 @@ static void xive_register_types(void)
     type_register_static(&xive_source_info);
     type_register_static(&xive_fabric_info);
     type_register_static(&xive_router_info);
+    type_register_static(&xive_end_source_info);
 }
 
 type_init(xive_register_types)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 07/37] ppc/xive: introduce the XIVE interrupt thread context
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (5 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 06/37] ppc/xive: add support for the END Event State buffers Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-06  4:31   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 08/37] ppc/xive: introduce a simplified XIVE presenter Cédric Le Goater
                   ` (30 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Each POWER9 processor chip has a XIVE presenter that can generate four
different exceptions to its threads:

  - hypervisor exception,
  - O/S exception
  - Event-Based Branch (EBB)
  - msgsnd (doorbell).

Each exception has a state independent from the others called a Thread
Interrupt Management context. This context is a set of registers which
lets the thread handle priority management and interrupt acknowledgment
among other things. The most important ones being :

  - Interrupt Priority Register  (PIPR)
  - Interrupt Pending Buffer     (IPB)
  - Current Processor Priority   (CPPR)
  - Notification Source Register (NSR)

These registers are accessible through a specific MMIO region, called
the Thread Interrupt Management Area (TIMA), four aligned pages, each
exposing a different view of the registers. First page (page address
ending in 0b00) gives access to the entire context and is reserved for
the ring 0 view for the physical thread context. The second (page
address ending in 0b01) is for the hypervisor, ring 1 view. The third
(page address ending in 0b10) is for the operating system, ring 2
view. The fourth (page address ending in 0b11) is for user level, ring
3 view.

The thread interrupt context is modeled with a XiveTCTX object
containing the values of the different exception registers. The TIMA
region is mapped at the same address for each CPU.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/xive.h      |  44 ++++
 include/hw/ppc/xive_regs.h |  82 ++++++++
 hw/intc/xive.c             | 419 +++++++++++++++++++++++++++++++++++++
 3 files changed, 545 insertions(+)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index d67b0785df7c..74b547707b17 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -368,4 +368,48 @@ typedef struct XiveENDSource {
 void xive_end_pic_print_info(XiveEND *end, uint32_t end_idx, Monitor *mon);
 void xive_end_queue_pic_print_info(XiveEND *end, uint32_t width, Monitor *mon);
 
+/*
+ * XIVE Thread interrupt Management (TM) context
+ */
+
+#define TYPE_XIVE_TCTX "xive-tctx"
+#define XIVE_TCTX(obj) OBJECT_CHECK(XiveTCTX, (obj), TYPE_XIVE_TCTX)
+
+/*
+ * XIVE Thread interrupt Management register rings :
+ *
+ *   QW-0  User       event-based exception state
+ *   QW-1  O/S        OS context for priority management, interrupt acks
+ *   QW-2  Pool       hypervisor pool context for virtual processors dispatched
+ *   QW-3  Physical   physical thread context and security context
+ */
+#define XIVE_TM_RING_COUNT      4
+#define XIVE_TM_RING_SIZE       0x10
+
+typedef struct XiveTCTX {
+    DeviceState parent_obj;
+
+    CPUState    *cs;
+    qemu_irq    output;
+
+    uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
+} XiveTCTX;
+
+/*
+ * XIVE Thread Interrupt Management Aera (TIMA)
+ *
+ * This region gives access to the registers of the thread interrupt
+ * management context. It is four page wide, each page providing a
+ * different view of the registers. The page with the lower offset is
+ * the most privileged and gives access to the entire context.
+ */
+#define XIVE_TM_HW_PAGE         0x0
+#define XIVE_TM_HV_PAGE         0x1
+#define XIVE_TM_OS_PAGE         0x2
+#define XIVE_TM_USER_PAGE       0x3
+
+extern const MemoryRegionOps xive_tm_ops;
+
+void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
+
 #endif /* PPC_XIVE_H */
diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
index 3c0ebad18b69..ede3d04c5eda 100644
--- a/include/hw/ppc/xive_regs.h
+++ b/include/hw/ppc/xive_regs.h
@@ -23,6 +23,88 @@
 #define XIVE_SRCNO_INDEX(srcno) ((srcno) & 0x0fffffff)
 #define XIVE_SRCNO(blk, idx)    ((uint32_t)(blk) << 28 | (idx))
 
+#define TM_SHIFT                16
+
+/* TM register offsets */
+#define TM_QW0_USER             0x000 /* All rings */
+#define TM_QW1_OS               0x010 /* Ring 0..2 */
+#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
+#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
+
+/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
+#define TM_NSR                  0x0  /*  +   +   -   +  */
+#define TM_CPPR                 0x1  /*  -   +   -   +  */
+#define TM_IPB                  0x2  /*  -   +   +   +  */
+#define TM_LSMFB                0x3  /*  -   +   +   +  */
+#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
+#define TM_INC                  0x5  /*  -   +   -   +  */
+#define TM_AGE                  0x6  /*  -   +   -   +  */
+#define TM_PIPR                 0x7  /*  -   +   -   +  */
+
+#define TM_WORD0                0x0
+#define TM_WORD1                0x4
+
+/*
+ * QW word 2 contains the valid bit at the top and other fields
+ * depending on the QW.
+ */
+#define TM_WORD2                0x8
+#define   TM_QW0W2_VU           PPC_BIT32(0)
+#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
+#define   TM_QW1W2_VO           PPC_BIT32(0)
+#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
+#define   TM_QW2W2_VP           PPC_BIT32(0)
+#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
+#define   TM_QW3W2_VT           PPC_BIT32(0)
+#define   TM_QW3W2_LP           PPC_BIT32(6)
+#define   TM_QW3W2_LE           PPC_BIT32(7)
+#define   TM_QW3W2_T            PPC_BIT32(31)
+
+/*
+ * In addition to normal loads to "peek" and writes (only when invalid)
+ * using 4 and 8 bytes accesses, the above registers support these
+ * "special" byte operations:
+ *
+ *   - Byte load from QW0[NSR] - User level NSR (EBB)
+ *   - Byte store to QW0[NSR] - User level NSR (EBB)
+ *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
+ *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
+ *                                    otherwise VT||0000000
+ *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
+ *
+ * Then we have all these "special" CI ops at these offset that trigger
+ * all sorts of side effects:
+ */
+#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
+#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
+#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
+#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
+                                         * context */
+#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
+#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
+                                         * context to reg */
+#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
+                                         * context to reg*/
+#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
+#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
+                                         * line */
+#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
+#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
+                                         * line */
+#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
+/* XXX more... */
+
+/* NSR fields for the various QW ack types */
+#define TM_QW0_NSR_EB           PPC_BIT8(0)
+#define TM_QW1_NSR_EO           PPC_BIT8(0)
+#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
+#define  TM_QW3_NSR_HE_NONE     0
+#define  TM_QW3_NSR_HE_POOL     1
+#define  TM_QW3_NSR_HE_PHYS     2
+#define  TM_QW3_NSR_HE_LSI      3
+#define TM_QW3_NSR_I            PPC_BIT8(2)
+#define TM_QW3_NSR_GRP_LVL      PPC_BIT8(3, 7)
+
 /* EAS (Event Assignment Structure)
  *
  * One per interrupt source. Targets an interrupt to a given Event
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 83686e260df5..80a965c14200 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -16,6 +16,424 @@
 #include "hw/qdev-properties.h"
 #include "monitor/monitor.h"
 #include "hw/ppc/xive.h"
+#include "hw/ppc/xive_regs.h"
+
+/*
+ * XIVE Thread Interrupt Management context
+ */
+
+static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring)
+{
+    return 0;
+}
+
+static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
+{
+    if (cppr > XIVE_PRIORITY_MAX) {
+        cppr = 0xff;
+    }
+
+    tctx->regs[ring + TM_CPPR] = cppr;
+}
+
+/*
+ * XIVE Thread Interrupt Management Area (TIMA)
+ */
+
+/*
+ * Define an access map for each page of the TIMA that we will use in
+ * the memory region ops to filter values when doing loads and stores
+ * of raw registers values
+ *
+ * Registers accessibility bits :
+ *
+ *    0x0 - no access
+ *    0x1 - write only
+ *    0x2 - read only
+ *    0x3 - read/write
+ */
+
+static const uint8_t xive_tm_hw_view[] = {
+    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-1 OS   */   3, 3, 3, 3,   3, 3, 0, 3,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-2 POOL */   0, 0, 3, 3,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-3 PHYS */   3, 3, 3, 3,   0, 3, 0, 3,   3, 0, 0, 3,   3, 3, 3, 0,
+};
+
+static const uint8_t xive_tm_hv_view[] = {
+    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-1 OS   */   3, 3, 3, 3,   3, 3, 0, 3,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-2 POOL */   0, 0, 3, 3,   0, 0, 0, 0,   0, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-3 PHYS */   3, 3, 3, 3,   0, 3, 0, 3,   3, 0, 0, 3,   0, 0, 0, 0,
+};
+
+static const uint8_t xive_tm_os_view[] = {
+    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
+    /* QW-1 OS   */   2, 3, 2, 2,   2, 2, 0, 2,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-2 POOL */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-3 PHYS */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+};
+
+static const uint8_t xive_tm_user_view[] = {
+    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-1 OS   */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-2 POOL */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+    /* QW-3 PHYS */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
+};
+
+/*
+ * Overall TIMA access map for the thread interrupt management context
+ * registers
+ */
+static const uint8_t *xive_tm_views[] = {
+    [XIVE_TM_HW_PAGE]   = xive_tm_hw_view,
+    [XIVE_TM_HV_PAGE]   = xive_tm_hv_view,
+    [XIVE_TM_OS_PAGE]   = xive_tm_os_view,
+    [XIVE_TM_USER_PAGE] = xive_tm_user_view,
+};
+
+/*
+ * Computes a register access mask for a given offset in the TIMA
+ */
+static uint64_t xive_tm_mask(hwaddr offset, unsigned size, bool write)
+{
+    uint8_t page_offset = (offset >> TM_SHIFT) & 0x3;
+    uint8_t reg_offset = offset & 0x3F;
+    uint8_t reg_mask = write ? 0x1 : 0x2;
+    uint64_t mask = 0x0;
+    int i;
+
+    for (i = 0; i < size; i++) {
+        if (xive_tm_views[page_offset][reg_offset + i] & reg_mask) {
+            mask |= (uint64_t) 0xff << (8 * (size - i - 1));
+        }
+    }
+
+    return mask;
+}
+
+static void xive_tm_raw_write(XiveTCTX *tctx, hwaddr offset, uint64_t value,
+                              unsigned size)
+{
+    uint8_t ring_offset = offset & 0x30;
+    uint8_t reg_offset = offset & 0x3F;
+    uint64_t mask = xive_tm_mask(offset, size, true);
+    int i;
+
+    /*
+     * Only 4 or 8 bytes stores are allowed and the User ring is
+     * excluded
+     */
+    if (size < 4 || !mask || ring_offset == TM_QW0_USER) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid write access at TIMA @%"
+                      HWADDR_PRIx"\n", offset);
+        return;
+    }
+
+    /*
+     * Use the register offset for the raw values and filter out
+     * reserved values
+     */
+    for (i = 0; i < size; i++) {
+        uint8_t byte_mask = (mask >> (8 * (size - i - 1)));
+        if (byte_mask) {
+            tctx->regs[reg_offset + i] = (value >> (8 * (size - i - 1))) &
+                byte_mask;
+        }
+    }
+}
+
+static uint64_t xive_tm_raw_read(XiveTCTX *tctx, hwaddr offset, unsigned size)
+{
+    uint8_t ring_offset = offset & 0x30;
+    uint8_t reg_offset = offset & 0x3F;
+    uint64_t mask = xive_tm_mask(offset, size, false);
+    uint64_t ret;
+    int i;
+
+    /*
+     * Only 4 or 8 bytes loads are allowed and the User ring is
+     * excluded
+     */
+    if (size < 4 || !mask || ring_offset == TM_QW0_USER) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid read access at TIMA @%"
+                      HWADDR_PRIx"\n", offset);
+        return -1;
+    }
+
+    /* Use the register offset for the raw values */
+    ret = 0;
+    for (i = 0; i < size; i++) {
+        ret |= (uint64_t) tctx->regs[reg_offset + i] << (8 * (size - i - 1));
+    }
+
+    /* filter out reserved values */
+    return ret & mask;
+}
+
+/*
+ * The TM context is mapped twice within each page. Stores and loads
+ * to the first mapping below 2K write and read the specified values
+ * without modification. The second mapping above 2K performs specific
+ * state changes (side effects) in addition to setting/returning the
+ * interrupt management area context of the processor thread.
+ */
+static uint64_t xive_tm_ack_os_reg(XiveTCTX *tctx, hwaddr offset, unsigned size)
+{
+    return xive_tctx_accept(tctx, TM_QW1_OS);
+}
+
+static void xive_tm_set_os_cppr(XiveTCTX *tctx, hwaddr offset,
+                                uint64_t value, unsigned size)
+{
+    xive_tctx_set_cppr(tctx, TM_QW1_OS, value & 0xff);
+}
+
+/*
+ * Define a mapping of "special" operations depending on the TIMA page
+ * offset and the size of the operation.
+ */
+typedef struct XiveTmOp {
+    uint8_t  page_offset;
+    uint32_t op_offset;
+    unsigned size;
+    void     (*write_handler)(XiveTCTX *tctx, hwaddr offset, uint64_t value,
+                              unsigned size);
+    uint64_t (*read_handler)(XiveTCTX *tctx, hwaddr offset, unsigned size);
+} XiveTmOp;
+
+static const XiveTmOp xive_tm_operations[] = {
+    /*
+     * MMIOs below 2K : raw values and special operations without side
+     * effects
+     */
+    { XIVE_TM_OS_PAGE, TM_QW1_OS + TM_CPPR,   1, xive_tm_set_os_cppr, NULL },
+
+    /* MMIOs above 2K : special operations with side effects */
+    { XIVE_TM_OS_PAGE, TM_SPC_ACK_OS_REG,     2, NULL, xive_tm_ack_os_reg },
+};
+
+static const XiveTmOp *xive_tm_find_op(hwaddr offset, unsigned size, bool write)
+{
+    uint8_t page_offset = (offset >> TM_SHIFT) & 0x3;
+    uint32_t op_offset = offset & 0xFFF;
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(xive_tm_operations); i++) {
+        const XiveTmOp *xto = &xive_tm_operations[i];
+
+        /* Accesses done from a more privileged TIMA page is allowed */
+        if (xto->page_offset >= page_offset &&
+            xto->op_offset == op_offset &&
+            xto->size == size &&
+            ((write && xto->write_handler) || (!write && xto->read_handler))) {
+            return xto;
+        }
+    }
+    return NULL;
+}
+
+/*
+ * TIMA MMIO handlers
+ */
+static void xive_tm_write(void *opaque, hwaddr offset,
+                          uint64_t value, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
+    const XiveTmOp *xto;
+
+    /*
+     * TODO: check V bit in Q[0-3]W2, check PTER bit associated with CPU
+     */
+
+    /*
+     * First, check for special operations in the 2K region
+     */
+    if (offset & 0x800) {
+        xto = xive_tm_find_op(offset, size, true);
+        if (!xto) {
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid write access at TIMA"
+                          "@%"HWADDR_PRIx"\n", offset);
+        } else {
+            xto->write_handler(tctx, offset, value, size);
+        }
+        return;
+    }
+
+    /*
+     * Then, for special operations in the region below 2K.
+     */
+    xto = xive_tm_find_op(offset, size, true);
+    if (xto) {
+        xto->write_handler(tctx, offset, value, size);
+        return;
+    }
+
+    /*
+     * Finish with raw access to the register values
+     */
+    xive_tm_raw_write(tctx, offset, value, size);
+}
+
+static uint64_t xive_tm_read(void *opaque, hwaddr offset, unsigned size)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
+    XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
+    const XiveTmOp *xto;
+
+    /*
+     * TODO: check V bit in Q[0-3]W2, check PTER bit associated with CPU
+     */
+
+    /*
+     * First, check for special operations in the 2K region
+     */
+    if (offset & 0x800) {
+        xto = xive_tm_find_op(offset, size, false);
+        if (!xto) {
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid read access to TIMA"
+                          "@%"HWADDR_PRIx"\n", offset);
+            return -1;
+        }
+        return xto->read_handler(tctx, offset, size);
+    }
+
+    /*
+     * Then, for special operations in the region below 2K.
+     */
+    xto = xive_tm_find_op(offset, size, false);
+    if (xto) {
+        return xto->read_handler(tctx, offset, size);
+    }
+
+    /*
+     * Finish with raw access to the register values
+     */
+    return xive_tm_raw_read(tctx, offset, size);
+}
+
+const MemoryRegionOps xive_tm_ops = {
+    .read = xive_tm_read,
+    .write = xive_tm_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+};
+
+static char *xive_tctx_ring_print(uint8_t *ring)
+{
+    uint32_t w2 = be32_to_cpu(*((uint32_t *) &ring[TM_WORD2]));
+
+    return g_strdup_printf("%02x   %02x  %02x    %02x   %02x  "
+                   "%02x  %02x   %02x  %08x",
+                   ring[TM_NSR], ring[TM_CPPR], ring[TM_IPB], ring[TM_LSMFB],
+                   ring[TM_ACK_CNT], ring[TM_INC], ring[TM_AGE], ring[TM_PIPR],
+                   w2);
+}
+
+static const char * const xive_tctx_ring_names[] = {
+    "USER", "OS", "POOL", "PHYS",
+};
+
+void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
+{
+    int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
+    int i;
+
+    monitor_printf(mon, "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR"
+                   "  W2\n", cpu_index);
+
+    for (i = 0; i < XIVE_TM_RING_COUNT; i++) {
+        char *s = xive_tctx_ring_print(&tctx->regs[i * XIVE_TM_RING_SIZE]);
+        monitor_printf(mon, "CPU[%04x]: %4s    %s\n", cpu_index,
+                       xive_tctx_ring_names[i], s);
+        g_free(s);
+    }
+}
+
+static void xive_tctx_reset(void *dev)
+{
+    XiveTCTX *tctx = XIVE_TCTX(dev);
+
+    memset(tctx->regs, 0, sizeof(tctx->regs));
+
+    /* Set some defaults */
+    tctx->regs[TM_QW1_OS + TM_LSMFB] = 0xFF;
+    tctx->regs[TM_QW1_OS + TM_ACK_CNT] = 0xFF;
+    tctx->regs[TM_QW1_OS + TM_AGE] = 0xFF;
+}
+
+static void xive_tctx_realize(DeviceState *dev, Error **errp)
+{
+    XiveTCTX *tctx = XIVE_TCTX(dev);
+    PowerPCCPU *cpu;
+    CPUPPCState *env;
+    Object *obj;
+    Error *local_err = NULL;
+
+    obj = object_property_get_link(OBJECT(dev), "cpu", &local_err);
+    if (!obj) {
+        error_propagate(errp, local_err);
+        error_prepend(errp, "required link 'cpu' not found: ");
+        return;
+    }
+
+    cpu = POWERPC_CPU(obj);
+    tctx->cs = CPU(obj);
+
+    env = &cpu->env;
+    switch (PPC_INPUT(env)) {
+    case PPC_FLAGS_INPUT_POWER7:
+        tctx->output = env->irq_inputs[POWER7_INPUT_INT];
+        break;
+
+    default:
+        error_setg(errp, "XIVE interrupt controller does not support "
+                   "this CPU bus model");
+        return;
+    }
+
+    qemu_register_reset(xive_tctx_reset, dev);
+}
+
+static void xive_tctx_unrealize(DeviceState *dev, Error **errp)
+{
+    qemu_unregister_reset(xive_tctx_reset, dev);
+}
+
+static const VMStateDescription vmstate_xive_tctx = {
+    .name = TYPE_XIVE_TCTX,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_BUFFER(regs, XiveTCTX),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static void xive_tctx_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->desc = "XIVE Interrupt Thread Context";
+    dc->realize = xive_tctx_realize;
+    dc->unrealize = xive_tctx_unrealize;
+    dc->vmsd = &vmstate_xive_tctx;
+}
+
+static const TypeInfo xive_tctx_info = {
+    .name          = TYPE_XIVE_TCTX,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(XiveTCTX),
+    .class_init    = xive_tctx_class_init,
+};
 
 /*
  * XIVE ESB helpers
@@ -875,6 +1293,7 @@ static void xive_register_types(void)
     type_register_static(&xive_fabric_info);
     type_register_static(&xive_router_info);
     type_register_static(&xive_end_source_info);
+    type_register_static(&xive_tctx_info);
 }
 
 type_init(xive_register_types)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 08/37] ppc/xive: introduce a simplified XIVE presenter
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (6 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 07/37] ppc/xive: introduce the XIVE interrupt thread context Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-07  3:10   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 09/37] ppc/xive: notify the CPU when the interrupt priority is more privileged Cédric Le Goater
                   ` (29 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The last sub-engine of the XIVE architecture is the Interrupt
Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
IVPE share elements, the Power Bus interface (CQ), the routing table
descriptors, and they can be combined in the same HW logic. We do the
same in QEMU and combine both engines in the XiveRouter for
simplicity.

When the IVRE has completed its job of matching an event source with a
Notification Virtual Target (NVT) to notify, it forwards the event
notification to the IVPE sub-engine. The IVPE scans the thread
interrupt contexts of the Notification Virtual Targets (NVT)
dispatched on the HW processor threads and if a match is found, it
signals the thread. If not, the IVPE escalates the notification to
some other targets and records the notification in a backlog queue.

The IVPE maintains the thread interrupt context state for each of its
NVTs not dispatched on HW processor threads in the Notification
Virtual Target table (NVTT).

The model currently only supports single NVT notifications.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/xive.h      |  15 +++
 include/hw/ppc/xive_regs.h |  24 ++++
 hw/intc/xive.c             | 227 +++++++++++++++++++++++++++++++++++++
 3 files changed, 266 insertions(+)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 74b547707b17..e9b06e75fc1c 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -327,6 +327,10 @@ typedef struct XiveRouterClass {
                    XiveEND *end);
     int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
                      XiveEND *end, uint8_t word_number);
+    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                   XiveNVT *nvt);
+    int (*write_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                     XiveNVT *nvt, uint8_t word_number);
 } XiveRouterClass;
 
 void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
@@ -337,6 +341,11 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
                         XiveEND *end);
 int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
                           XiveEND *end, uint8_t word_number);
+int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                        XiveNVT *nvt);
+int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                          XiveNVT *nvt, uint8_t word_number);
+
 
 /*
  * XIVE END ESBs
@@ -393,6 +402,7 @@ typedef struct XiveTCTX {
     qemu_irq    output;
 
     uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
+    uint32_t    hw_cam;
 } XiveTCTX;
 
 /*
@@ -412,4 +422,9 @@ extern const MemoryRegionOps xive_tm_ops;
 
 void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
 
+static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
+{
+    return (nvt_blk << 19) | nvt_idx;
+}
+
 #endif /* PPC_XIVE_H */
diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
index ede3d04c5eda..85557e730cd8 100644
--- a/include/hw/ppc/xive_regs.h
+++ b/include/hw/ppc/xive_regs.h
@@ -186,4 +186,28 @@ typedef struct XiveEND {
 #define GETFIELD_BE32(m, v)       GETFIELD(m, be32_to_cpu(v))
 #define SETFIELD_BE32(m, v, val)  cpu_to_be32(SETFIELD(m, be32_to_cpu(v), val))
 
+/* Notification Virtual Target (NVT) */
+typedef struct XiveNVT {
+        uint32_t        w0;
+#define NVT_W0_VALID             PPC_BIT32(0)
+        uint32_t        w1;
+        uint32_t        w2;
+        uint32_t        w3;
+        uint32_t        w4;
+        uint32_t        w5;
+        uint32_t        w6;
+        uint32_t        w7;
+        uint32_t        w8;
+#define NVT_W8_GRP_VALID         PPC_BIT32(0)
+        uint32_t        w9;
+        uint32_t        wa;
+        uint32_t        wb;
+        uint32_t        wc;
+        uint32_t        wd;
+        uint32_t        we;
+        uint32_t        wf;
+} XiveNVT;
+
+#define xive_nvt_is_valid(nvt)    (be32_to_cpu((nvt)->w0) & NVT_W0_VALID)
+
 #endif /* PPC_XIVE_REGS_H */
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 80a965c14200..891542920683 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -358,6 +358,25 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
     }
 }
 
+/* The HW CAM (23bits) is hardwired to :
+ *
+ *   0x000||0b1||4Bit chip number||7Bit Thread number.
+ *
+ * and when the block grouping extension is enabled :
+ *
+ *   4Bit chip number||0x001||7Bit Thread number.
+ */
+static uint32_t hw_cam_line(uint8_t chip_id, uint8_t tid)
+{
+    bool block_group = false; /* TODO (PowerNV) */
+
+    if (block_group) {
+        return 1 << 11 | (chip_id & 0xf) << 7 | (tid & 0x7f);
+    } else {
+        return (chip_id & 0xf) << 11 | 1 << 7 | (tid & 0x7f);
+    }
+}
+
 static void xive_tctx_reset(void *dev)
 {
     XiveTCTX *tctx = XIVE_TCTX(dev);
@@ -388,6 +407,12 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
     cpu = POWERPC_CPU(obj);
     tctx->cs = CPU(obj);
 
+    if (!tctx->hw_cam) {
+        error_setg(errp, "XIVE: HW CAM is not set for CPU %d",
+                   tctx->cs->cpu_index);
+        return;
+    }
+
     env = &cpu->env;
     switch (PPC_INPUT(env)) {
     case PPC_FLAGS_INPUT_POWER7:
@@ -418,11 +443,17 @@ static const VMStateDescription vmstate_xive_tctx = {
     },
 };
 
+static Property  xive_tctx_properties[] = {
+    DEFINE_PROP_UINT32("hw-cam", XiveTCTX, hw_cam, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void xive_tctx_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->desc = "XIVE Interrupt Thread Context";
+    dc->props = xive_tctx_properties;
     dc->realize = xive_tctx_realize;
     dc->unrealize = xive_tctx_unrealize;
     dc->vmsd = &vmstate_xive_tctx;
@@ -978,6 +1009,194 @@ int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
    return xrc->write_end(xrtr, end_blk, end_idx, end, word_number);
 }
 
+int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                        XiveNVT *nvt)
+{
+   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
+
+   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
+}
+
+int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
+                        XiveNVT *nvt, uint8_t word_number)
+{
+   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
+
+   return xrc->write_nvt(xrtr, nvt_blk, nvt_idx, nvt, word_number);
+}
+
+/*
+ * The thread context register words are in big-endian format.
+ */
+static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
+                                     uint8_t nvt_blk, uint32_t nvt_idx,
+                                     bool cam_ignore, uint32_t logic_serv)
+{
+    uint32_t cam = xive_nvt_cam_line(nvt_blk, nvt_idx);
+    uint8_t *regs;
+    uint32_t qw3w2;
+    uint32_t qw2w2;
+    uint32_t qw1w2;
+    uint32_t qw0w2;
+
+    /* TODO (PowerNV): ignore low order bits of nvt id */
+
+    regs = &tctx->regs[TM_QW3_HV_PHYS];
+    qw3w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
+    regs = &tctx->regs[TM_QW2_HV_POOL];
+    qw2w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
+    regs = &tctx->regs[TM_QW1_OS];
+    qw1w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
+    regs = &tctx->regs[TM_QW0_USER];
+    qw0w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
+
+    if (format == 0) {
+        /* F=0 & i=1: Logical server notification */
+        if (cam_ignore == true) {
+            qemu_log_mask(LOG_UNIMP, "XIVE: no support for LS NVT %x/%x\n",
+                          nvt_blk, nvt_idx);
+             return -1;
+        }
+
+        /* F=0 & i=0: Specific NVT notification */
+
+        /* PHYS ring */
+        if ((qw3w2 & TM_QW3W2_VT) &&
+            tctx->hw_cam == hw_cam_line(nvt_blk, nvt_idx)) {
+            return TM_QW3_HV_PHYS;
+        }
+
+        /* HV POOL ring */
+        if ((qw2w2 & TM_QW2W2_VP) &&
+            cam == GETFIELD(TM_QW2W2_POOL_CAM, qw2w2)) {
+            return TM_QW2_HV_POOL;
+        }
+
+        /* OS ring */
+        if ((qw1w2 & TM_QW1W2_VO) &&
+            cam == GETFIELD(TM_QW1W2_OS_CAM, qw1w2)) {
+            return TM_QW1_OS;
+        }
+    } else {
+        /* F=1 : User level Event-Based Branch (EBB) notification */
+
+        /* USER ring */
+        if  ((qw1w2 & TM_QW1W2_VO) &&
+             (cam == GETFIELD(TM_QW1W2_OS_CAM, qw1w2)) &&
+             (qw0w2 & TM_QW0W2_VU) &&
+             (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, qw0w2))) {
+            return TM_QW0_USER;
+        }
+    }
+    return -1;
+}
+
+typedef struct XiveTCTXMatch {
+    XiveTCTX *tctx;
+    uint8_t ring;
+} XiveTCTXMatch;
+
+static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
+                                 uint8_t nvt_blk, uint32_t nvt_idx,
+                                 bool cam_ignore, uint8_t priority,
+                                 uint32_t logic_serv, XiveTCTXMatch *match)
+{
+    CPUState *cs;
+
+    /* TODO (PowerNV): handle chip_id overwrite of block field for
+     * hardwired CAM compares */
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
+        int ring;
+
+        /*
+         * HW checks that the CPU is enabled in the Physical Thread
+         * Enable Register (PTER).
+         */
+
+        /*
+         * Check the thread context CAM lines and record matches. We
+         * will handle CPU exception delivery later
+         */
+        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
+                                         cam_ignore, logic_serv);
+        /*
+         * Save the context and follow on to catch duplicates, that we
+         * don't support yet.
+         */
+        if (ring != -1) {
+            if (match->tctx) {
+                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
+                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
+                return false;
+            }
+
+            match->ring = ring;
+            match->tctx = tctx;
+        }
+    }
+
+    if (!match->tctx) {
+        qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
+                      nvt_blk, nvt_idx);
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * This is our simple Xive Presenter Engine model. It is merged in the
+ * Router as it does not require an extra object.
+ *
+ * It receives notification requests sent by the IVRE to find one
+ * matching NVT (or more) dispatched on the processor threads. In case
+ * of a single NVT notification, the process is abreviated and the
+ * thread is signaled if a match is found. In case of a logical server
+ * notification (bits ignored at the end of the NVT identifier), the
+ * IVPE and IVRE select a winning thread using different filters. This
+ * involves 2 or 3 exchanges on the PowerBus that the model does not
+ * support.
+ *
+ * The parameters represent what is sent on the PowerBus
+ */
+static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
+                                  uint8_t nvt_blk, uint32_t nvt_idx,
+                                  bool cam_ignore, uint8_t priority,
+                                  uint32_t logic_serv)
+{
+    XiveNVT nvt;
+    XiveTCTXMatch match = { 0 };
+    bool found;
+
+    /* NVT cache lookup */
+    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
+                      nvt_blk, nvt_idx);
+        return;
+    }
+
+    if (!xive_nvt_is_valid(&nvt)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
+                      nvt_blk, nvt_idx);
+        return;
+    }
+
+    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
+                                 priority, logic_serv, &match);
+    if (found) {
+        return;
+    }
+
+    /* If no matching NVT is dispatched on a HW thread :
+     * - update the NVT structure if backlog is activated
+     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
+     *   activated
+     */
+}
+
 /*
  * An END trigger can come from an event trigger (IPI or HW) or from
  * another chip. We don't model the PowerBus but the END trigger
@@ -1047,6 +1266,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
     /*
      * Follows IVPE notification
      */
+    xive_presenter_notify(xrtr, format,
+                          GETFIELD_BE32(END_W6_NVT_BLOCK, end.w6),
+                          GETFIELD_BE32(END_W6_NVT_INDEX, end.w6),
+                          GETFIELD_BE32(END_W7_F0_IGNORE, end.w7),
+                          priority,
+                          GETFIELD_BE32(END_W7_F1_LOG_SERVER_ID, end.w7));
+
+    /* TODO: Auto EOI. */
 }
 
 static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 09/37] ppc/xive: notify the CPU when the interrupt priority is more privileged
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (7 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 08/37] ppc/xive: introduce a simplified XIVE presenter Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-07  3:27   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 10/37] spapr/xive: introduce a XIVE interrupt controller Cédric Le Goater
                   ` (28 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

After the event data was enqueued in the O/S Event Queue, the IVPE
raises the bit corresponding to the priority of the pending interrupt
in the register IBP (Interrupt Pending Buffer) to indicate there is an
event pending in one of the 8 priority queues. The Pending Interrupt
Priority Register (PIPR) is also updated using the IPB. This register
represent the priority of the most favored pending notification.

The PIPR is then compared to the the Current Processor Priority
Register (CPPR). If it is more favored (numerically less than), the
CPU interrupt line is raised and the EO bit of the Notification Source
Register (NSR) is updated to notify the presence of an exception for
the O/S. The check needs to be done whenever the PIPR or the CPPR are
changed.

The O/S acknowledges the interrupt with a special load in the Thread
Interrupt Management Area. If the EO bit of the NSR is set, the CPPR
takes the value of PIPR. The bit number in the IBP corresponding to
the priority of the pending interrupt is reseted and so is the EO bit
of the NSR.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xive.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 891542920683..0db77107ab15 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -22,9 +22,73 @@
  * XIVE Thread Interrupt Management context
  */
 
+/* Convert a priority number to an Interrupt Pending Buffer (IPB)
+ * register, which indicates a pending interrupt at the priority
+ * corresponding to the bit number
+ */
+static uint8_t priority_to_ipb(uint8_t priority)
+{
+    return priority > XIVE_PRIORITY_MAX ?
+        0 : 1 << (XIVE_PRIORITY_MAX - priority);
+}
+
+/* Convert an Interrupt Pending Buffer (IPB) register to a Pending
+ * Interrupt Priority Register (PIPR), which contains the priority of
+ * the most favored pending notification.
+ */
+static uint8_t ipb_to_pipr(uint8_t ibp)
+{
+    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
+}
+
+static void ipb_update(uint8_t *regs, uint8_t priority)
+{
+    regs[TM_IPB] |= priority_to_ipb(priority);
+    regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]);
+}
+
+static uint8_t exception_mask(uint8_t ring)
+{
+    switch (ring) {
+    case TM_QW1_OS:
+        return TM_QW1_NSR_EO;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring)
 {
-    return 0;
+    uint8_t *regs = &tctx->regs[ring];
+    uint8_t nsr = regs[TM_NSR];
+    uint8_t mask = exception_mask(ring);
+
+    qemu_irq_lower(tctx->output);
+
+    if (regs[TM_NSR] & mask) {
+        uint8_t cppr = regs[TM_PIPR];
+
+        regs[TM_CPPR] = cppr;
+
+        /* Reset the pending buffer bit */
+        regs[TM_IPB] &= ~priority_to_ipb(cppr);
+        regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]);
+
+        /* Drop Exception bit */
+        regs[TM_NSR] &= ~mask;
+    }
+
+    return (nsr << 8) | regs[TM_CPPR];
+}
+
+static void xive_tctx_notify(XiveTCTX *tctx, uint8_t ring)
+{
+    uint8_t *regs = &tctx->regs[ring];
+
+    if (regs[TM_PIPR] < regs[TM_CPPR]) {
+        regs[TM_NSR] |= exception_mask(ring);
+        qemu_irq_raise(tctx->output);
+    }
 }
 
 static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
@@ -34,6 +98,9 @@ static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
     }
 
     tctx->regs[ring + TM_CPPR] = cppr;
+
+    /* CPPR has changed, check if we need to raise a pending exception */
+    xive_tctx_notify(tctx, ring);
 }
 
 /*
@@ -189,6 +256,17 @@ static void xive_tm_set_os_cppr(XiveTCTX *tctx, hwaddr offset,
     xive_tctx_set_cppr(tctx, TM_QW1_OS, value & 0xff);
 }
 
+/*
+ * Adjust the IPB to allow a CPU to process event queues of other
+ * priorities during one physical interrupt cycle.
+ */
+static void xive_tm_set_os_pending(XiveTCTX *tctx, hwaddr offset,
+                                   uint64_t value, unsigned size)
+{
+    ipb_update(&tctx->regs[TM_QW1_OS], value & 0xff);
+    xive_tctx_notify(tctx, TM_QW1_OS);
+}
+
 /*
  * Define a mapping of "special" operations depending on the TIMA page
  * offset and the size of the operation.
@@ -211,6 +289,7 @@ static const XiveTmOp xive_tm_operations[] = {
 
     /* MMIOs above 2K : special operations with side effects */
     { XIVE_TM_OS_PAGE, TM_SPC_ACK_OS_REG,     2, NULL, xive_tm_ack_os_reg },
+    { XIVE_TM_OS_PAGE, TM_SPC_SET_OS_PENDING, 1, xive_tm_set_os_pending, NULL },
 };
 
 static const XiveTmOp *xive_tm_find_op(hwaddr offset, unsigned size, bool write)
@@ -387,6 +466,13 @@ static void xive_tctx_reset(void *dev)
     tctx->regs[TM_QW1_OS + TM_LSMFB] = 0xFF;
     tctx->regs[TM_QW1_OS + TM_ACK_CNT] = 0xFF;
     tctx->regs[TM_QW1_OS + TM_AGE] = 0xFF;
+
+    /*
+     * Initialize PIPR to 0xFF to avoid phantom interrupts when the
+     * CPPR is first set.
+     */
+    tctx->regs[TM_QW1_OS + TM_PIPR] =
+        ipb_to_pipr(tctx->regs[TM_QW1_OS + TM_IPB]);
 }
 
 static void xive_tctx_realize(DeviceState *dev, Error **errp)
@@ -1187,9 +1273,15 @@ static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
     found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
                                  priority, logic_serv, &match);
     if (found) {
+        ipb_update(&match.tctx->regs[match.ring], priority);
+        xive_tctx_notify(match.tctx, match.ring);
         return;
     }
 
+    /* Record the IPB in the associated NVT structure */
+    ipb_update((uint8_t *) &nvt.w4, priority);
+    xive_router_write_nvt(xrtr, nvt_blk, nvt_idx, &nvt, 4);
+
     /* If no matching NVT is dispatched on a HW thread :
      * - update the NVT structure if backlog is activated
      * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 10/37] spapr/xive: introduce a XIVE interrupt controller
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (8 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 09/37] ppc/xive: notify the CPU when the interrupt priority is more privileged Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-07  3:39   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 11/37] spapr/xive: use the VCPU id as a NVT identifier Cédric Le Goater
                   ` (27 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

sPAPRXive models the XIVE interrupt controller of the sPAPR machine.
It inherits from the XiveRouter and provisions storage for the routing
tables :

  - Event Assignment Structure (EAS)
  - Event Notification Descriptor (END)

The sPAPRXive model incorporates an internal XiveSource for the IPIs
and for the interrupts of the virtual devices of the guest. This model
is consistent with XIVE architecture which also incorporates an
internal IVSE for IPIs and accelerator interrupts in the IVRE
sub-engine.

The sPAPRXive model exports two memory regions, one for the ESB
trigger and management pages used to control the sources and one for
the TIMA pages. They are mapped by default at the addresses found on
chip 0 of a baremetal system. This is also consistent with the XIVE
architecture which defines a Virtualization Controller BAR for the
internal IVSE ESB pages and a Thread Managment BAR for the TIMA.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 default-configs/ppc64-softmmu.mak |   1 +
 include/hw/ppc/spapr_xive.h       |  45 ++++
 hw/intc/spapr_xive.c              | 366 ++++++++++++++++++++++++++++++
 hw/intc/Makefile.objs             |   1 +
 4 files changed, 413 insertions(+)
 create mode 100644 include/hw/ppc/spapr_xive.h
 create mode 100644 hw/intc/spapr_xive.c

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 2d1e7c5c4668..7f34ad0528ed 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_XICS=$(CONFIG_PSERIES)
 CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_XIVE=$(CONFIG_PSERIES)
+CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
 CONFIG_MEM_DEVICE=y
 CONFIG_DIMM=y
 CONFIG_SPAPR_RNG=y
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
new file mode 100644
index 000000000000..f087959b9924
--- /dev/null
+++ b/include/hw/ppc/spapr_xive.h
@@ -0,0 +1,45 @@
+/*
+ * QEMU PowerPC sPAPR XIVE interrupt controller model
+ *
+ * Copyright (c) 2017-2018, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PPC_SPAPR_XIVE_H
+#define PPC_SPAPR_XIVE_H
+
+#include "hw/ppc/xive.h"
+
+#define TYPE_SPAPR_XIVE "spapr-xive"
+#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
+
+typedef struct sPAPRXive {
+    XiveRouter    parent;
+
+    /* Internal interrupt source for IPIs and virtual devices */
+    XiveSource    source;
+    hwaddr        vc_base;
+
+    /* END ESB MMIOs */
+    XiveENDSource end_source;
+    hwaddr        end_base;
+
+    /* Routing table */
+    XiveEAS       *eat;
+    uint32_t      nr_irqs;
+    XiveEND       *endt;
+    uint32_t      nr_ends;
+
+    /* TIMA mapping address */
+    hwaddr        tm_base;
+    MemoryRegion  tm_mmio;
+} sPAPRXive;
+
+bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
+bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
+void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
+qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
+
+#endif /* PPC_SPAPR_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
new file mode 100644
index 000000000000..eef5830d45c6
--- /dev/null
+++ b/hw/intc/spapr_xive.c
@@ -0,0 +1,366 @@
+/*
+ * QEMU PowerPC sPAPR XIVE interrupt controller model
+ *
+ * Copyright (c) 2017-2018, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "target/ppc/cpu.h"
+#include "sysemu/cpus.h"
+#include "monitor/monitor.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_xive.h"
+#include "hw/ppc/xive.h"
+#include "hw/ppc/xive_regs.h"
+
+/*
+ * XIVE Virtualization Controller BAR and Thread Managment BAR that we
+ * use for the ESB pages and the TIMA pages
+ */
+#define SPAPR_XIVE_VC_BASE   0x0006010000000000ull
+#define SPAPR_XIVE_TM_BASE   0x0006030203180000ull
+
+/*
+ * On sPAPR machines, use a simplified output for the XIVE END
+ * structure dumping only the information related to the OS EQ.
+ */
+static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end,
+                                          Monitor *mon)
+{
+    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
+    uint32_t qgen = GETFIELD_BE32(END_W1_GENERATION, end->w1);
+    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
+    uint32_t qentries = 1 << (qsize + 10);
+    uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
+    uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
+
+    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt,
+                   priority, qindex, qentries, qgen);
+
+    xive_end_queue_pic_print_info(end, 6, mon);
+    monitor_printf(mon, "]");
+}
+
+void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
+{
+    XiveSource *xsrc = &xive->source;
+    int i;
+
+    monitor_printf(mon, "  LSIN         PQ    EISN     CPU/PRIO EQ\n");
+
+    for (i = 0; i < xive->nr_irqs; i++) {
+        uint8_t pq = xive_source_esb_get(xsrc, i);
+        XiveEAS *eas = &xive->eat[i];
+
+        if (!xive_eas_is_valid(eas)) {
+            continue;
+        }
+
+        monitor_printf(mon, "  %08x %s %c%c%c %s %08x ", i,
+                       xive_source_irq_is_lsi(xsrc, i) ? "LSI" : "MSI",
+                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
+                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
+                       xsrc->status[i] & XIVE_STATUS_ASSERTED ? 'A' : ' ',
+                       xive_eas_is_masked(eas) ? "M" : " ",
+                       (int) GETFIELD_BE64(EAS_END_DATA, eas->w));
+
+        if (!xive_eas_is_masked(eas)) {
+            uint32_t end_idx = GETFIELD_BE64(EAS_END_INDEX, eas->w);
+            XiveEND *end;
+
+            assert(end_idx < xive->nr_ends);
+            end = &xive->endt[end_idx];
+
+            if (xive_end_is_valid(end)) {
+                spapr_xive_end_pic_print_info(xive, end, mon);
+            }
+        }
+        monitor_printf(mon, "\n");
+    }
+}
+
+static void spapr_xive_map_mmio(sPAPRXive *xive)
+{
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->vc_base);
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1, xive->end_base);
+    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
+}
+
+static void spapr_xive_end_reset(XiveEND *end)
+{
+    memset(end, 0, sizeof(*end));
+
+    /* switch off the escalation and notification ESBs */
+    end->w1 = cpu_to_be32(END_W1_ESe_Q | END_W1_ESn_Q);
+}
+
+static void spapr_xive_reset(void *dev)
+{
+    sPAPRXive *xive = SPAPR_XIVE(dev);
+    int i;
+
+    /*
+     * The XiveSource has its own reset handler, which mask off all
+     * IRQs (!P|Q)
+     */
+
+    /* Mask all valid EASs in the IRQ number space. */
+    for (i = 0; i < xive->nr_irqs; i++) {
+        XiveEAS *eas = &xive->eat[i];
+        if (xive_eas_is_valid(eas)) {
+            eas->w = cpu_to_be64(EAS_VALID | EAS_MASKED);
+        } else {
+            eas->w = 0;
+        }
+    }
+
+    /* Clear all ENDs */
+    for (i = 0; i < xive->nr_ends; i++) {
+        spapr_xive_end_reset(&xive->endt[i]);
+    }
+}
+
+static void spapr_xive_instance_init(Object *obj)
+{
+    sPAPRXive *xive = SPAPR_XIVE(obj);
+
+    object_initialize(&xive->source, sizeof(xive->source), TYPE_XIVE_SOURCE);
+    object_property_add_child(obj, "source", OBJECT(&xive->source), NULL);
+
+    object_initialize(&xive->end_source, sizeof(xive->end_source),
+                      TYPE_XIVE_END_SOURCE);
+    object_property_add_child(obj, "end_source", OBJECT(&xive->end_source),
+                              NULL);
+}
+
+static void spapr_xive_realize(DeviceState *dev, Error **errp)
+{
+    sPAPRXive *xive = SPAPR_XIVE(dev);
+    XiveSource *xsrc = &xive->source;
+    XiveENDSource *end_xsrc = &xive->end_source;
+    Error *local_err = NULL;
+
+    if (!xive->nr_irqs) {
+        error_setg(errp, "Number of interrupt needs to be greater 0");
+        return;
+    }
+
+    if (!xive->nr_ends) {
+        error_setg(errp, "Number of interrupt needs to be greater 0");
+        return;
+    }
+
+    /*
+     * Initialize the internal sources, for IPIs and virtual devices.
+     */
+    object_property_set_int(OBJECT(xsrc), xive->nr_irqs, "nr-irqs",
+                            &error_fatal);
+    object_property_add_const_link(OBJECT(xsrc), "xive", OBJECT(xive),
+                                   &error_fatal);
+    object_property_set_bool(OBJECT(xsrc), true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /*
+     * Initialize the END ESB source
+     */
+    object_property_set_int(OBJECT(end_xsrc), xive->nr_irqs, "nr-ends",
+                            &error_fatal);
+    object_property_add_const_link(OBJECT(end_xsrc), "xive", OBJECT(xive),
+                                   &error_fatal);
+    object_property_set_bool(OBJECT(end_xsrc), true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Set the mapping address of the END ESB pages after the source ESBs */
+    xive->end_base = xive->vc_base + (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
+
+    /*
+     * Allocate the routing tables
+     */
+    xive->eat = g_new0(XiveEAS, xive->nr_irqs);
+    xive->endt = g_new0(XiveEND, xive->nr_ends);
+
+    /* TIMA initialization */
+    memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
+                          "xive.tima", 4ull << TM_SHIFT);
+
+    /* Define all XIVE MMIO regions on SysBus */
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
+
+    /* Map all regions */
+    spapr_xive_map_mmio(xive);
+
+    qemu_register_reset(spapr_xive_reset, dev);
+}
+
+static int spapr_xive_get_eas(XiveRouter *xrtr, uint8_t eas_blk,
+                              uint32_t eas_idx, XiveEAS *eas)
+{
+    sPAPRXive *xive = SPAPR_XIVE(xrtr);
+
+    if (eas_idx >= xive->nr_irqs) {
+        return -1;
+    }
+
+    *eas = xive->eat[eas_idx];
+    return 0;
+}
+
+static int spapr_xive_get_end(XiveRouter *xrtr,
+                              uint8_t end_blk, uint32_t end_idx, XiveEND *end)
+{
+    sPAPRXive *xive = SPAPR_XIVE(xrtr);
+
+    if (end_idx >= xive->nr_ends) {
+        return -1;
+    }
+
+    memcpy(end, &xive->endt[end_idx], sizeof(XiveEND));
+    return 0;
+}
+
+static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk,
+                                uint32_t end_idx, XiveEND *end,
+                                uint8_t word_number)
+{
+    sPAPRXive *xive = SPAPR_XIVE(xrtr);
+
+    if (end_idx >= xive->nr_ends) {
+        return -1;
+    }
+
+    memcpy(&xive->endt[end_idx], end, sizeof(XiveEND));
+    return 0;
+}
+
+static const VMStateDescription vmstate_spapr_xive_end = {
+    .name = TYPE_SPAPR_XIVE "/end",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT32(w0, XiveEND),
+        VMSTATE_UINT32(w1, XiveEND),
+        VMSTATE_UINT32(w2, XiveEND),
+        VMSTATE_UINT32(w3, XiveEND),
+        VMSTATE_UINT32(w4, XiveEND),
+        VMSTATE_UINT32(w5, XiveEND),
+        VMSTATE_UINT32(w6, XiveEND),
+        VMSTATE_UINT32(w7, XiveEND),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_xive_eas = {
+    .name = TYPE_SPAPR_XIVE "/eas",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(w, XiveEAS),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_xive = {
+    .name = TYPE_SPAPR_XIVE,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
+        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(eat, sPAPRXive, nr_irqs,
+                                     vmstate_spapr_xive_eas, XiveEAS),
+        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(endt, sPAPRXive, nr_ends,
+                                             vmstate_spapr_xive_end, XiveEND),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static Property spapr_xive_properties[] = {
+    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
+    DEFINE_PROP_UINT32("nr-ends", sPAPRXive, nr_ends, 0),
+    DEFINE_PROP_UINT64("vc-base", sPAPRXive, vc_base, SPAPR_XIVE_VC_BASE),
+    DEFINE_PROP_UINT64("tm-base", sPAPRXive, tm_base, SPAPR_XIVE_TM_BASE),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void spapr_xive_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    XiveRouterClass *xrc = XIVE_ROUTER_CLASS(klass);
+
+    dc->desc    = "sPAPR XIVE Interrupt Controller";
+    dc->props   = spapr_xive_properties;
+    dc->realize = spapr_xive_realize;
+    dc->vmsd    = &vmstate_spapr_xive;
+
+    xrc->get_eas = spapr_xive_get_eas;
+    xrc->get_end = spapr_xive_get_end;
+    xrc->write_end = spapr_xive_write_end;
+}
+
+static const TypeInfo spapr_xive_info = {
+    .name = TYPE_SPAPR_XIVE,
+    .parent = TYPE_XIVE_ROUTER,
+    .instance_init = spapr_xive_instance_init,
+    .instance_size = sizeof(sPAPRXive),
+    .class_init = spapr_xive_class_init,
+};
+
+static void spapr_xive_register_types(void)
+{
+    type_register_static(&spapr_xive_info);
+}
+
+type_init(spapr_xive_register_types)
+
+bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi)
+{
+    XiveSource *xsrc = &xive->source;
+
+    if (lisn >= xive->nr_irqs) {
+        return false;
+    }
+
+    xive->eat[lisn].w |= cpu_to_be64(EAS_VALID);
+    xive_source_irq_set(xsrc, lisn, lsi);
+    return true;
+}
+
+bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn)
+{
+    XiveSource *xsrc = &xive->source;
+
+    if (lisn >= xive->nr_irqs) {
+        return false;
+    }
+
+    xive->eat[lisn].w &= cpu_to_be64(~EAS_VALID);
+    xive_source_irq_set(xsrc, lisn, false);
+    return true;
+}
+
+qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn)
+{
+    XiveSource *xsrc = &xive->source;
+
+    if (lisn >= xive->nr_irqs) {
+        return NULL;
+    }
+
+    /* The sPAPR machine/device should have claimed the IRQ before */
+    assert(xive_eas_is_valid(&xive->eat[lisn]));
+
+    return xive_source_qirq(xsrc, lisn);
+}
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 72a46ed91c31..301a8e972d91 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -38,6 +38,7 @@ obj-$(CONFIG_XICS) += xics.o
 obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
 obj-$(CONFIG_XIVE) += xive.o
+obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 11/37] spapr/xive: use the VCPU id as a NVT identifier
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (9 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 10/37] spapr/xive: introduce a XIVE interrupt controller Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-07  3:46   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 12/37] spapr: initialize VSMT before initializing the IRQ backend Cédric Le Goater
                   ` (26 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The IVPE scans the O/S CAM line of the XIVE thread interrupt contexts
to find a matching Notification Virtual Target (NVT) among the NVTs
dispatched on the HW processor threads.

On a real system, the thread interrupt contexts are updated by the
hypervisor when a Virtual Processor is scheduled to run on a HW
thread. Under QEMU, the model will emulate the same behavior by
hardwiring the NVT identifier in the thread context registers at
reset.

The NVT identifier used by the sPAPRXive model is the VCPU id. The END
identifier is also derived from the VCPU id. A set of helpers doing
the conversion between identifiers are provided for the hcalls
configuring the sources and the ENDs.

The model does not need a NVT table but The XiveRouter NVT operations
are provided to perform some extra checks in the routing algorithm.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c | 53 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index eef5830d45c6..8da7a8bee949 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -26,6 +26,27 @@
 #define SPAPR_XIVE_VC_BASE   0x0006010000000000ull
 #define SPAPR_XIVE_TM_BASE   0x0006030203180000ull
 
+/*
+ * The allocation of VP blocks is a complex operation in OPAL and the
+ * VP identifiers have a relation with the number of HW chips, the
+ * size of the VP blocks, VP grouping, etc. The QEMU sPAPR XIVE
+ * controller model does not have the same constraints and can use a
+ * simple mapping scheme of the CPU vcpu_id
+ *
+ * These identifiers are never returned to the OS.
+ */
+
+#define SPAPR_XIVE_NVT_BASE 0x400
+
+/*
+ * sPAPR NVT and END indexing helpers
+ */
+static uint32_t spapr_xive_nvt_to_target(sPAPRXive *xive, uint8_t nvt_blk,
+                                  uint32_t nvt_idx)
+{
+    return nvt_idx - SPAPR_XIVE_NVT_BASE;
+}
+
 /*
  * On sPAPR machines, use a simplified output for the XIVE END
  * structure dumping only the information related to the OS EQ.
@@ -40,7 +61,8 @@ static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end,
     uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
     uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
 
-    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt,
+    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d",
+                   spapr_xive_nvt_to_target(xive, 0, nvt),
                    priority, qindex, qentries, qgen);
 
     xive_end_queue_pic_print_info(end, 6, mon);
@@ -246,6 +268,33 @@ static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk,
     return 0;
 }
 
+static int spapr_xive_get_nvt(XiveRouter *xrtr,
+                              uint8_t nvt_blk, uint32_t nvt_idx, XiveNVT *nvt)
+{
+    sPAPRXive *xive = SPAPR_XIVE(xrtr);
+    uint32_t vcpu_id = spapr_xive_nvt_to_target(xive, nvt_blk, nvt_idx);
+    PowerPCCPU *cpu = spapr_find_cpu(vcpu_id);
+
+    if (!cpu) {
+        return -1;
+    }
+
+    /*
+     * sPAPR does not maintain a NVT table. Return that the NVT is
+     * valid if we have found a matching CPU
+     */
+    nvt->w0 = cpu_to_be32(NVT_W0_VALID);
+    return 0;
+}
+
+static int spapr_xive_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk,
+                                uint32_t nvt_idx, XiveNVT *nvt,
+                                uint8_t word_number)
+{
+    /* no NVT table */
+    return 0;
+}
+
 static const VMStateDescription vmstate_spapr_xive_end = {
     .name = TYPE_SPAPR_XIVE "/end",
     .version_id = 1,
@@ -308,6 +357,8 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
     xrc->get_eas = spapr_xive_get_eas;
     xrc->get_end = spapr_xive_get_end;
     xrc->write_end = spapr_xive_write_end;
+    xrc->get_nvt = spapr_xive_get_nvt;
+    xrc->write_nvt = spapr_xive_write_nvt;
 }
 
 static const TypeInfo spapr_xive_info = {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 12/37] spapr: initialize VSMT before initializing the IRQ backend
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (10 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 11/37] spapr/xive: use the VCPU id as a NVT identifier Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-07  3:59   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 13/37] spapr: introduce a spapr_irq_init() routine Cédric Le Goater
                   ` (25 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

We will need to use xics_max_server_number() to create the sPAPRXive
object modeling the interrupt controller of the machine which is
created before the CPUs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7afd1a175bf2..50cb9f9f4a02 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2466,11 +2466,6 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
         boot_cores_nr = possible_cpus->len;
     }
 
-    /* VSMT must be set in order to be able to compute VCPU ids, ie to
-     * call xics_max_server_number() or spapr_vcpu_id().
-     */
-    spapr_set_vsmt_mode(spapr, &error_fatal);
-
     if (smc->pre_2_10_has_unused_icps) {
         int i;
 
@@ -2593,6 +2588,11 @@ static void spapr_machine_init(MachineState *machine)
     /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
     load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
 
+    /* VSMT must be set in order to be able to compute VCPU ids, ie to
+     * call xics_max_server_number() or spapr_vcpu_id().
+     */
+    spapr_set_vsmt_mode(spapr, &error_fatal);
+
     /* Set up Interrupt Controller before we create the VCPUs */
     smc->irq->init(spapr, &error_fatal);
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 13/37] spapr: introduce a spapr_irq_init() routine
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (11 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 12/37] spapr: initialize VSMT before initializing the IRQ backend Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-07  4:04   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 14/37] spapr: modify the irq backend 'init' method Cédric Le Goater
                   ` (24 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Initialize the MSI bitmap from it as this will be necessary for the
sPAPR IRQ backend for XIVE.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/hw/ppc/spapr_irq.h |  1 +
 hw/ppc/spapr.c             |  2 +-
 hw/ppc/spapr_irq.c         | 16 +++++++++++-----
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index a467ce696ee4..bd7301e6d9c6 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -43,6 +43,7 @@ typedef struct sPAPRIrq {
 extern sPAPRIrq spapr_irq_xics;
 extern sPAPRIrq spapr_irq_xics_legacy;
 
+void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
 void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
 qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 50cb9f9f4a02..e470efe7993c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2594,7 +2594,7 @@ static void spapr_machine_init(MachineState *machine)
     spapr_set_vsmt_mode(spapr, &error_fatal);
 
     /* Set up Interrupt Controller before we create the VCPUs */
-    smc->irq->init(spapr, &error_fatal);
+    spapr_irq_init(spapr, &error_fatal);
 
     /* Set up containers for ibm,client-architecture-support negotiated options
      */
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index e77b94cc685e..f8b651de0ec9 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -97,11 +97,6 @@ static void spapr_irq_init_xics(sPAPRMachineState *spapr, Error **errp)
     int nr_irqs = smc->irq->nr_irqs;
     Error *local_err = NULL;
 
-    /* Initialize the MSI IRQ allocator. */
-    if (!SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
-        spapr_irq_msi_init(spapr, smc->irq->nr_msis);
-    }
-
     if (kvm_enabled()) {
         if (machine_kernel_irqchip_allowed(machine) &&
             !xics_kvm_init(spapr, &local_err)) {
@@ -213,6 +208,17 @@ sPAPRIrq spapr_irq_xics = {
 /*
  * sPAPR IRQ frontend routines for devices
  */
+void spapr_irq_init(sPAPRMachineState *spapr, Error **errp)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+
+    /* Initialize the MSI IRQ allocator. */
+    if (!SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
+        spapr_irq_msi_init(spapr, smc->irq->nr_msis);
+    }
+
+    smc->irq->init(spapr, errp);
+}
 
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp)
 {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 14/37] spapr: modify the irq backend 'init' method
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (12 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 13/37] spapr: introduce a spapr_irq_init() routine Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 15/37] spapr: export and rename the xics_max_server_number() routine Cédric Le Goater
                   ` (23 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Add a 'nr_irqs' parameter to the 'init' method to remove the use of
the machine class. This will be useful when we introduce the machine
supporting the two sPAPR IRQ backends : XICS and XIVE.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h | 2 +-
 hw/ppc/spapr_irq.c         | 7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index bd7301e6d9c6..0e9229bf219e 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -33,7 +33,7 @@ typedef struct sPAPRIrq {
     uint32_t    nr_irqs;
     uint32_t    nr_msis;
 
-    void (*init)(sPAPRMachineState *spapr, Error **errp);
+    void (*init)(sPAPRMachineState *spapr, int nr_irqs, Error **errp);
     int (*claim)(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
     void (*free)(sPAPRMachineState *spapr, int irq, int num);
     qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index f8b651de0ec9..bac450ffff23 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -90,11 +90,10 @@ error:
     return NULL;
 }
 
-static void spapr_irq_init_xics(sPAPRMachineState *spapr, Error **errp)
+static void spapr_irq_init_xics(sPAPRMachineState *spapr, int nr_irqs,
+                                Error **errp)
 {
     MachineState *machine = MACHINE(spapr);
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-    int nr_irqs = smc->irq->nr_irqs;
     Error *local_err = NULL;
 
     if (kvm_enabled()) {
@@ -217,7 +216,7 @@ void spapr_irq_init(sPAPRMachineState *spapr, Error **errp)
         spapr_irq_msi_init(spapr, smc->irq->nr_msis);
     }
 
-    smc->irq->init(spapr, errp);
+    smc->irq->init(spapr, smc->irq->nr_irqs, errp);
 }
 
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 15/37] spapr: export and rename the xics_max_server_number() routine
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (13 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 14/37] spapr: modify the irq backend 'init' method Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-07  4:07   ` David Gibson
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 16/37] spapr: introdude a new machine IRQ backend for XIVE Cédric Le Goater
                   ` (22 subsequent siblings)
  37 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The XIVE sPAPR IRQ backend will use it to define the number of ENDs of
the IC controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr.h | 1 +
 hw/ppc/spapr.c         | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6279711fe8f7..198764066dc9 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -737,6 +737,7 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
 void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
                           Error **errp);
 void spapr_clear_pending_events(sPAPRMachineState *spapr);
+int spapr_max_server_number(sPAPRMachineState *spapr);
 
 /* CPU and LMB DRC release callbacks. */
 void spapr_core_release(DeviceState *dev);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e470efe7993c..a689f853e020 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -150,7 +150,7 @@ static void pre_2_10_vmstate_unregister_dummy_icp(int i)
                        (void *)(uintptr_t) i);
 }
 
-static int xics_max_server_number(sPAPRMachineState *spapr)
+int spapr_max_server_number(sPAPRMachineState *spapr)
 {
     assert(spapr->vsmt);
     return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads);
@@ -1270,7 +1270,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
     /* /interrupt controller */
-    spapr_dt_xics(xics_max_server_number(spapr), fdt, PHANDLE_XICP);
+    spapr_dt_xics(spapr_max_server_number(spapr), fdt, PHANDLE_XICP);
 
     ret = spapr_populate_memory(spapr, fdt);
     if (ret < 0) {
@@ -2469,7 +2469,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
     if (smc->pre_2_10_has_unused_icps) {
         int i;
 
-        for (i = 0; i < xics_max_server_number(spapr); i++) {
+        for (i = 0; i < spapr_max_server_number(spapr); i++) {
             /* Dummy entries get deregistered when real ICPState objects
              * are registered during CPU core hotplug.
              */
@@ -2589,7 +2589,7 @@ static void spapr_machine_init(MachineState *machine)
     load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
 
     /* VSMT must be set in order to be able to compute VCPU ids, ie to
-     * call xics_max_server_number() or spapr_vcpu_id().
+     * call spapr_max_server_number() or spapr_vcpu_id().
      */
     spapr_set_vsmt_mode(spapr, &error_fatal);
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 16/37] spapr: introdude a new machine IRQ backend for XIVE
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (14 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 15/37] spapr: export and rename the xics_max_server_number() routine Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 17/37] spapr: add hcalls support for the XIVE exploitation interrupt mode Cédric Le Goater
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The XIVE IRQ backend uses the same layout as the new XICS backend but
covers the full range of the IRQ number space. The IRQ numbers for the
CPU IPIs are allocated at the bottom of this space, below 4K, to
preserve compatibility with XICS which does not use that range.

This should be enough given that the maximum number of CPUs is 1024
for the sPAPR machine under QEMU. For the record, the biggest POWER8
or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192
cores, SMT8).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr.h     |   2 +
 include/hw/ppc/spapr_irq.h |   2 +
 hw/ppc/spapr_irq.c         | 112 +++++++++++++++++++++++++++++++++++++
 3 files changed, 116 insertions(+)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 198764066dc9..cb3082d319af 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -16,6 +16,7 @@ typedef struct sPAPREventLogEntry sPAPREventLogEntry;
 typedef struct sPAPREventSource sPAPREventSource;
 typedef struct sPAPRPendingHPT sPAPRPendingHPT;
 typedef struct ICSState ICSState;
+typedef struct sPAPRXive sPAPRXive;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -175,6 +176,7 @@ struct sPAPRMachineState {
     const char *icp_type;
     int32_t irq_map_nr;
     unsigned long *irq_map;
+    sPAPRXive  *xive;
 
     bool cmd_line_caps[SPAPR_CAP_NUM];
     sPAPRCapabilities def, eff, mig;
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 0e9229bf219e..eec3159cd8d8 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -13,6 +13,7 @@
 /*
  * IRQ range offsets per device type
  */
+#define SPAPR_IRQ_IPI        0x0
 #define SPAPR_IRQ_EPOW       0x1000  /* XICS_IRQ_BASE offset */
 #define SPAPR_IRQ_HOTPLUG    0x1001
 #define SPAPR_IRQ_VIO        0x1100  /* 256 VIO devices */
@@ -42,6 +43,7 @@ typedef struct sPAPRIrq {
 
 extern sPAPRIrq spapr_irq_xics;
 extern sPAPRIrq spapr_irq_xics_legacy;
+extern sPAPRIrq spapr_irq_xive;
 
 void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index bac450ffff23..f05aa5a94959 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -12,6 +12,7 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_xive.h"
 #include "hw/ppc/xics.h"
 #include "sysemu/kvm.h"
 
@@ -204,6 +205,117 @@ sPAPRIrq spapr_irq_xics = {
     .print_info  = spapr_irq_print_info_xics,
 };
 
+/*
+ * XIVE IRQ backend.
+ */
+static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
+                                    int nr_servers, Error **errp)
+{
+    sPAPRXive *xive;
+    Error *local_err = NULL;
+    Object *obj;
+    uint32_t nr_ends = nr_servers << 3; /* 8 priority ENDs per CPU */
+    int i;
+
+    /* TODO : use qdev_create() ? */
+    obj = object_new(TYPE_SPAPR_XIVE);
+    object_property_set_int(obj, nr_irqs, "nr-irqs", &error_abort);
+    object_property_set_int(obj, nr_ends, "nr-ends", &error_abort);
+    object_property_set_bool(obj, true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+    qdev_set_parent_bus(DEVICE(obj), sysbus_get_default());
+    xive = SPAPR_XIVE(obj);
+
+    /* Enable the CPU IPIs */
+    for (i = 0; i < nr_servers; ++i) {
+        spapr_xive_irq_claim(xive, SPAPR_IRQ_IPI + i, false);
+    }
+
+    return xive;
+}
+
+static void spapr_irq_init_xive(sPAPRMachineState *spapr, int nr_irqs,
+                                Error **errp)
+{
+    MachineState *machine = MACHINE(spapr);
+    Error *local_err = NULL;
+
+    /* No KVM support */
+    if (kvm_enabled()) {
+        if (machine_kernel_irqchip_required(machine)) {
+            error_setg(errp, "kernel_irqchip requested. no XIVE support");
+            return;
+        }
+    }
+
+    spapr->xive = spapr_xive_create(spapr, nr_irqs,
+                                    spapr_max_server_number(spapr), &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+static int spapr_irq_claim_xive(sPAPRMachineState *spapr, int irq, bool lsi,
+                                Error **errp)
+{
+    if (!spapr_xive_irq_claim(spapr->xive, irq, lsi)) {
+        error_setg(errp, "IRQ %d is invalid", irq);
+        return -1;
+    }
+    return 0;
+}
+
+static void spapr_irq_free_xive(sPAPRMachineState *spapr, int irq, int num)
+{
+    int i;
+
+    for (i = irq; i < irq + num; ++i) {
+        spapr_xive_irq_free(spapr->xive, i);
+    }
+}
+
+static qemu_irq spapr_qirq_xive(sPAPRMachineState *spapr, int irq)
+{
+    return spapr_xive_qirq(spapr->xive, irq);
+}
+
+static void spapr_irq_print_info_xive(sPAPRMachineState *spapr,
+                                      Monitor *mon)
+{
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        xive_tctx_pic_print_info(XIVE_TCTX(cpu->intc), mon);
+    }
+
+    spapr_xive_pic_print_info(spapr->xive, mon);
+}
+
+/*
+ * XIVE uses the full IRQ number space. Set it to 8K to be compatible
+ * with XICS.
+ */
+
+#define SPAPR_IRQ_XIVE_NR_IRQS     0x2000
+#define SPAPR_IRQ_XIVE_NR_MSIS     (SPAPR_IRQ_XIVE_NR_IRQS - SPAPR_IRQ_MSI)
+
+sPAPRIrq spapr_irq_xive = {
+    .nr_irqs     = SPAPR_IRQ_XIVE_NR_IRQS,
+    .nr_msis     = SPAPR_IRQ_XIVE_NR_MSIS,
+
+    .init        = spapr_irq_init_xive,
+    .claim       = spapr_irq_claim_xive,
+    .free        = spapr_irq_free_xive,
+    .qirq        = spapr_qirq_xive,
+    .print_info  = spapr_irq_print_info_xive,
+};
+
 /*
  * sPAPR IRQ frontend routines for devices
  */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 17/37] spapr: add hcalls support for the XIVE exploitation interrupt mode
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (15 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 16/37] spapr: introdude a new machine IRQ backend for XIVE Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 18/37] spapr: add device tree support for the XIVE exploitation mode Cédric Le Goater
                   ` (20 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The different XIVE virtualization structures (sources and event queues)
are configured with a set of Hypervisor calls :

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (ESB) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification configuration associated
   with the queue, only unconditional notification is supported for
   the moment. Reset is performed with a queue size of 0 and queueing
   is disabled in that case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the guest's internal interrupt structures to their
   initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure all notifications
   have reached their queue.

Calls that still need to be addressed :

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE

See the code for more documentation on each hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr.h      |  15 +-
 include/hw/ppc/spapr_xive.h |   4 +
 hw/intc/spapr_xive.c        | 964 ++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_irq.c          |   2 +
 4 files changed, 984 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index cb3082d319af..6bf028a02fe2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -452,7 +452,20 @@ struct sPAPRMachineState {
 #define H_INVALIDATE_PID        0x378
 #define H_REGISTER_PROC_TBL     0x37C
 #define H_SIGNAL_SYS_RESET      0x380
-#define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
+
+#define H_INT_GET_SOURCE_INFO   0x3A8
+#define H_INT_SET_SOURCE_CONFIG 0x3AC
+#define H_INT_GET_SOURCE_CONFIG 0x3B0
+#define H_INT_GET_QUEUE_INFO    0x3B4
+#define H_INT_SET_QUEUE_CONFIG  0x3B8
+#define H_INT_GET_QUEUE_CONFIG  0x3BC
+#define H_INT_SET_OS_REPORTING_LINE 0x3C0
+#define H_INT_GET_OS_REPORTING_LINE 0x3C4
+#define H_INT_ESB               0x3C8
+#define H_INT_SYNC              0x3CC
+#define H_INT_RESET             0x3D0
+
+#define MAX_HCALL_OPCODE        H_INT_RESET
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index f087959b9924..9506a8f4d10a 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -42,4 +42,8 @@ bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
 qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
 
+typedef struct sPAPRMachineState sPAPRMachineState;
+
+void spapr_xive_hcall_init(sPAPRMachineState *spapr);
+
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 8da7a8bee949..f54100b175a5 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -47,6 +47,72 @@ static uint32_t spapr_xive_nvt_to_target(sPAPRXive *xive, uint8_t nvt_blk,
     return nvt_idx - SPAPR_XIVE_NVT_BASE;
 }
 
+static void spapr_xive_cpu_to_nvt(sPAPRXive *xive, PowerPCCPU *cpu,
+                                  uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
+{
+    XiveRouter *xrtr = XIVE_ROUTER(xive);
+
+    assert(cpu);
+
+    if (out_nvt_blk) {
+        /* For testing purpose, we could use 0 for nvt_blk */
+        *out_nvt_blk = xrtr->chip_id;
+    }
+
+    if (out_nvt_blk) {
+        *out_nvt_idx = SPAPR_XIVE_NVT_BASE + cpu->vcpu_id;
+    }
+}
+
+static int spapr_xive_target_to_nvt(sPAPRXive *xive, uint32_t target,
+                                    uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
+{
+    PowerPCCPU *cpu = spapr_find_cpu(target);
+
+    if (!cpu) {
+        return -1;
+    }
+
+    spapr_xive_cpu_to_nvt(xive, cpu, out_nvt_blk, out_nvt_idx);
+    return 0;
+}
+
+/*
+ * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
+ * priorities per CPU
+ */
+static void spapr_xive_cpu_to_end(sPAPRXive *xive, PowerPCCPU *cpu,
+                                  uint8_t prio, uint8_t *out_end_blk,
+                                  uint32_t *out_end_idx)
+{
+    XiveRouter *xrtr = XIVE_ROUTER(xive);
+
+    assert(cpu);
+
+    if (out_end_blk) {
+        /* For testing purpose, we could use 0 for nvt_blk */
+        *out_end_blk = xrtr->chip_id;
+    }
+
+    if (out_end_idx) {
+        *out_end_idx = (cpu->vcpu_id << 3) + prio;
+    }
+}
+
+static int spapr_xive_target_to_end(sPAPRXive *xive,
+                                    uint32_t target, uint8_t prio,
+                                    uint8_t *out_end_blk, uint32_t *out_end_idx)
+{
+   PowerPCCPU *cpu = spapr_find_cpu(target);
+
+    if (!cpu) {
+        return -1;
+    }
+
+    spapr_xive_cpu_to_end(xive, cpu, prio, out_end_blk, out_end_idx);
+    return 0;
+}
+
 /*
  * On sPAPR machines, use a simplified output for the XIVE END
  * structure dumping only the information related to the OS EQ.
@@ -415,3 +481,901 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn)
 
     return xive_source_qirq(xsrc, lisn);
 }
+
+/*
+ * XIVE hcalls
+ *
+ * The terminology used by the XIVE hcalls is the following :
+ *
+ *   TARGET vCPU number
+ *   EQ     Event Queue assigned by OS to receive event data
+ *   ESB    page for source interrupt management
+ *   LISN   Logical Interrupt Source Number identifying a source in the
+ *          machine
+ *   EISN   Effective Interrupt Source Number used by guest OS to
+ *          identify source in the guest
+ *
+ * The EAS, END, NVT structures are not exposed.
+ */
+
+/*
+ * Linux hosts under OPAL reserve priority 7 for their own escalation
+ * interrupts (DD2.X POWER9). So we only allow the guest to use
+ * priorities [0..6].
+ */
+static bool spapr_xive_priority_is_reserved(uint8_t priority)
+{
+    switch (priority) {
+    case 0 ... 6:
+        return false;
+    case 7: /* OPAL escalation queue */
+    default:
+        return true;
+    }
+}
+
+/*
+ * The H_INT_GET_SOURCE_INFO hcall() is used to obtain the logical
+ * real address of the MMIO page through which the Event State Buffer
+ * entry associated with the value of the "lisn" parameter is managed.
+ *
+ * Parameters:
+ * Input
+ * - R4: "flags"
+ *         Bits 0-63 reserved
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as returned
+ *       by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output
+ * - R4: "flags"
+ *         Bits 0-59: Reserved
+ *         Bit 60: H_INT_ESB must be used for Event State Buffer
+ *                 management
+ *         Bit 61: 1 == LSI  0 == MSI
+ *         Bit 62: the full function page supports trigger
+ *         Bit 63: Store EOI Supported
+ * - R5: Logical Real address of full function Event State Buffer
+ *       management page, -1 if H_INT_ESB hcall flag is set to 1.
+ * - R6: Logical Real Address of trigger only Event State Buffer
+ *       management page or -1.
+ * - R7: Power of 2 page size for the ESB management pages returned in
+ *       R5 and R6.
+ */
+
+#define SPAPR_XIVE_SRC_H_INT_ESB     PPC_BIT(60) /* ESB manage with H_INT_ESB */
+#define SPAPR_XIVE_SRC_LSI           PPC_BIT(61) /* Virtual LSI type */
+#define SPAPR_XIVE_SRC_TRIGGER       PPC_BIT(62) /* Trigger and management
+                                                    on same page */
+#define SPAPR_XIVE_SRC_STORE_EOI     PPC_BIT(63) /* Store EOI support */
+
+static target_ulong h_int_get_source_info(PowerPCCPU *cpu,
+                                          sPAPRMachineState *spapr,
+                                          target_ulong opcode,
+                                          target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveSource *xsrc = &xive->source;
+    target_ulong flags  = args[0];
+    target_ulong lisn   = args[1];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    if (!xive_eas_is_valid(&xive->eat[lisn])) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    /* All sources are emulated under the main XIVE object and share
+     * the same characteristics.
+     */
+    args[0] = 0;
+    if (!xive_source_esb_has_2page(xsrc)) {
+        args[0] |= SPAPR_XIVE_SRC_TRIGGER;
+    }
+    if (xsrc->esb_flags & XIVE_SRC_STORE_EOI) {
+        args[0] |= SPAPR_XIVE_SRC_STORE_EOI;
+    }
+
+    /*
+     * Force the use of the H_INT_ESB hcall in case of an LSI
+     * interrupt. This is necessary under KVM to re-trigger the
+     * interrupt if the level is still asserted
+     */
+    if (xive_source_irq_is_lsi(xsrc, lisn)) {
+        args[0] |= SPAPR_XIVE_SRC_H_INT_ESB | SPAPR_XIVE_SRC_LSI;
+    }
+
+    if (!(args[0] & SPAPR_XIVE_SRC_H_INT_ESB)) {
+        args[1] = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn);
+    } else {
+        args[1] = -1;
+    }
+
+    if (xive_source_esb_has_2page(xsrc) &&
+        !(args[0] & SPAPR_XIVE_SRC_H_INT_ESB)) {
+        args[2] = xive->vc_base + xive_source_esb_page(xsrc, lisn);
+    } else {
+        args[2] = -1;
+    }
+
+    if (xive_source_esb_has_2page(xsrc)) {
+        args[3] = xsrc->esb_shift - 1;
+    } else {
+        args[3] = xsrc->esb_shift;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_SOURCE_CONFIG hcall() is used to assign a Logical
+ * Interrupt Source to a target. The Logical Interrupt Source is
+ * designated with the "lisn" parameter and the target is designated
+ * with the "target" and "priority" parameters.  Upon return from the
+ * hcall(), no additional interrupts will be directed to the old EQ.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-61: Reserved
+ *         Bit 62: set the "eisn" in the EAS
+ *         Bit 63: masks the interrupt source in the hardware interrupt
+ *       control structure. An interrupt masked by this mechanism will
+ *       be dropped, but it's source state bits will still be
+ *       set. There is no race-free way of unmasking and restoring the
+ *       source. Thus this should only be used in interrupts that are
+ *       also masked at the source, and only in cases where the
+ *       interrupt is not meant to be used for a large amount of time
+ *       because no valid target exists for it for example
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as returned by
+ *       the H_ALLOCATE_VAS_WINDOW hcall
+ * - R6: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R7: "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ * - R8: "eisn" is the guest EISN associated with the "lisn"
+ *
+ * Output:
+ * - None
+ */
+
+#define SPAPR_XIVE_SRC_SET_EISN PPC_BIT(62)
+#define SPAPR_XIVE_SRC_MASK     PPC_BIT(63)
+
+static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveEAS eas, new_eas;
+    target_ulong flags    = args[0];
+    target_ulong lisn     = args[1];
+    target_ulong target   = args[2];
+    target_ulong priority = args[3];
+    target_ulong eisn     = args[4];
+    uint8_t end_blk;
+    uint32_t end_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~(SPAPR_XIVE_SRC_SET_EISN | SPAPR_XIVE_SRC_MASK)) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    eas = xive->eat[lisn];
+    if (!xive_eas_is_valid(&eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    /* priority 0xff is used to reset the EAS */
+    if (priority == 0xff) {
+        new_eas.w = cpu_to_be64(EAS_VALID | EAS_MASKED);
+        goto out;
+    }
+
+    if (flags & SPAPR_XIVE_SRC_MASK) {
+        new_eas.w = eas.w | cpu_to_be64(EAS_MASKED);
+    } else {
+        new_eas.w = eas.w & cpu_to_be64(~EAS_MASKED);
+    }
+
+    if (spapr_xive_priority_is_reserved(priority)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
+                      priority);
+        return H_P4;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the END corresponding to the
+     * target.
+     */
+    if (spapr_xive_target_to_end(xive, target, priority, &end_blk, &end_idx)) {
+        return H_P3;
+    }
+
+    new_eas.w = SETFIELD_BE64(EAS_END_BLOCK, new_eas.w, end_blk);
+    new_eas.w = SETFIELD_BE64(EAS_END_INDEX, new_eas.w, end_idx);
+
+    if (flags & SPAPR_XIVE_SRC_SET_EISN) {
+        new_eas.w = SETFIELD_BE64(EAS_END_DATA, new_eas.w, eisn);
+    }
+
+out:
+    xive->eat[lisn] = new_eas;
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_SOURCE_CONFIG hcall() is used to determine to which
+ * target/priority pair is assigned to the specified Logical Interrupt
+ * Source.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63 Reserved
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as
+ *       returned by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output:
+ * - R4: Target to which the specified Logical Interrupt Source is
+ *       assigned
+ * - R5: Priority to which the specified Logical Interrupt Source is
+ *       assigned
+ * - R6: EISN for the specified Logical Interrupt Source (this will be
+ *       equivalent to the LISN if not changed by H_INT_SET_SOURCE_CONFIG)
+ */
+static target_ulong h_int_get_source_config(PowerPCCPU *cpu,
+                                            sPAPRMachineState *spapr,
+                                            target_ulong opcode,
+                                            target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    target_ulong flags = args[0];
+    target_ulong lisn = args[1];
+    XiveEAS eas;
+    XiveEND *end;
+    uint8_t nvt_blk;
+    uint32_t end_idx, nvt_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    eas = xive->eat[lisn];
+    if (!xive_eas_is_valid(&eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    /* EAS_END_BLOCK is unused on sPAPR */
+    end_idx = GETFIELD_BE64(EAS_END_INDEX, eas.w);
+
+    assert(end_idx < xive->nr_ends);
+    end = &xive->endt[end_idx];
+
+    nvt_blk = GETFIELD_BE32(END_W6_NVT_BLOCK, end->w6);
+    nvt_idx = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
+    args[0] = spapr_xive_nvt_to_target(xive, nvt_blk, nvt_idx);
+
+    if (xive_eas_is_masked(&eas)) {
+        args[1] = 0xff;
+    } else {
+        args[1] = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
+    }
+
+    args[2] = GETFIELD_BE64(EAS_END_DATA, eas.w);
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_INFO hcall() is used to get the logical real
+ * address of the notification management page associated with the
+ * specified target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63 Reserved
+ * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R6: "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: Logical real address of notification page
+ * - R5: Power of 2 page size of the notification page
+ */
+static target_ulong h_int_get_queue_info(PowerPCCPU *cpu,
+                                         sPAPRMachineState *spapr,
+                                         target_ulong opcode,
+                                         target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveENDSource *end_xsrc = &xive->end_source;
+    target_ulong flags = args[0];
+    target_ulong target = args[1];
+    target_ulong priority = args[2];
+    XiveEND *end;
+    uint8_t end_blk;
+    uint32_t end_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (spapr_xive_priority_is_reserved(priority)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
+                      priority);
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the END corresponding to the
+     * target.
+     */
+    if (spapr_xive_target_to_end(xive, target, priority, &end_blk, &end_idx)) {
+        return H_P2;
+    }
+
+    assert(end_idx < xive->nr_ends);
+    end = &xive->endt[end_idx];
+
+    args[0] = xive->end_base + (1ull << (end_xsrc->esb_shift + 1)) * end_idx;
+    if (xive_end_is_enqueue(end)) {
+        args[1] = GETFIELD_BE32(END_W0_QSIZE, end->w0) + 12;
+    } else {
+        args[1] = 0;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_QUEUE_CONFIG hcall() is used to set or reset a EQ for
+ * a given "target" and "priority".  It is also used to set the
+ * notification config associated with the EQ.  An EQ size of 0 is
+ * used to reset the EQ config for a given target and priority. If
+ * resetting the EQ config, the END associated with the given "target"
+ * and "priority" will be changed to disable queueing.
+ *
+ * Upon return from the hcall(), no additional interrupts will be
+ * directed to the old EQ (if one was set). The old EQ (if one was
+ * set) should be investigated for interrupts that occurred prior to
+ * or during the hcall().
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-62: Reserved
+ *         Bit 63: Unconditional Notify (n) per the XIVE spec
+ * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R6: "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ * - R7: "eventQueue": The logical real address of the start of the EQ
+ * - R8: "eventQueueSize": The power of 2 EQ size per "ibm,xive-eq-sizes"
+ *
+ * Output:
+ * - None
+ */
+
+#define SPAPR_XIVE_END_ALWAYS_NOTIFY PPC_BIT(63)
+
+static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    target_ulong flags = args[0];
+    target_ulong target = args[1];
+    target_ulong priority = args[2];
+    target_ulong qpage = args[3];
+    target_ulong qsize = args[4];
+    XiveEND end;
+    uint8_t end_blk, nvt_blk;
+    uint32_t end_idx, nvt_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~SPAPR_XIVE_END_ALWAYS_NOTIFY) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (spapr_xive_priority_is_reserved(priority)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
+                      priority);
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the END corresponding to the
+     * target.
+     */
+
+    if (spapr_xive_target_to_end(xive, target, priority, &end_blk, &end_idx)) {
+        return H_P2;
+    }
+
+    assert(end_idx < xive->nr_ends);
+    memcpy(&end, &xive->endt[end_idx], sizeof(XiveEND));
+
+    switch (qsize) {
+    case 12:
+    case 16:
+    case 21:
+    case 24:
+        end.w2 = cpu_to_be32((qpage >> 32) & 0x0fffffff);
+        end.w3 = cpu_to_be32(qpage & 0xffffffff);
+        end.w0 |= cpu_to_be32(END_W0_ENQUEUE);
+        end.w0 = SETFIELD_BE32(END_W0_QSIZE, end.w0, qsize - 12);
+        break;
+    case 0:
+        /* reset queue and disable queueing */
+        spapr_xive_end_reset(&end);
+        goto out;
+
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid EQ size %"PRIx64"\n",
+                      qsize);
+        return H_P5;
+    }
+
+    if (qsize) {
+        hwaddr plen = 1 << qsize;
+        void *eq;
+
+        /*
+         * Validate the guest EQ. We should also check that the queue
+         * has been zeroed by the OS.
+         */
+        eq = address_space_map(CPU(cpu)->as, qpage, &plen, true,
+                               MEMTXATTRS_UNSPECIFIED);
+        if (plen != 1 << qsize) {
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to map EQ @0x%"
+                          HWADDR_PRIx "\n", qpage);
+            return H_P4;
+        }
+        address_space_unmap(CPU(cpu)->as, eq, plen, true, plen);
+    }
+
+    /* "target" should have been validated above */
+    if (spapr_xive_target_to_nvt(xive, target, &nvt_blk, &nvt_idx)) {
+        g_assert_not_reached();
+    }
+
+    /* Ensure the priority and target are correctly set (they will not
+     * be right after allocation)
+     */
+    end.w6 = SETFIELD_BE32(END_W6_NVT_BLOCK, 0ul, nvt_blk) |
+        SETFIELD_BE32(END_W6_NVT_INDEX, 0ul, nvt_idx);
+    end.w7 = SETFIELD_BE32(END_W7_F0_PRIORITY, 0ul, priority);
+
+    if (flags & SPAPR_XIVE_END_ALWAYS_NOTIFY) {
+        end.w0 |= cpu_to_be32(END_W0_UCOND_NOTIFY);
+    } else {
+        end.w0 &= cpu_to_be32((uint32_t)~END_W0_UCOND_NOTIFY);
+    }
+
+    /* The generation bit for the END starts at 1 and The END page
+     * offset counter starts at 0.
+     */
+    end.w1 = cpu_to_be32(END_W1_GENERATION) |
+        SETFIELD_BE32(END_W1_PAGE_OFF, 0ul, 0ul);
+    end.w0 |= cpu_to_be32(END_W0_VALID);
+
+    /* TODO: issue syncs required to ensure all in-flight interrupts
+     * are complete on the old END */
+
+out:
+    /* Update END */
+    memcpy(&xive->endt[end_idx], &end, sizeof(XiveEND));
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_GET_QUEUE_CONFIG hcall() is used to get a EQ for a given
+ * target and priority.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-62: Reserved
+ *         Bit 63: Debug: Return debug data
+ * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R6: "priority" is a valid priority not in
+ *       "ibm,plat-res-int-priorities"
+ *
+ * Output:
+ * - R4: "flags":
+ *       Bits 0-61: Reserved
+ *       Bit 62: The value of Event Queue Generation Number (g) per
+ *              the XIVE spec if "Debug" = 1
+ *       Bit 63: The value of Unconditional Notify (n) per the XIVE spec
+ * - R5: The logical real address of the start of the EQ
+ * - R6: The power of 2 EQ size per "ibm,xive-eq-sizes"
+ * - R7: The value of Event Queue Offset Counter per XIVE spec
+ *       if "Debug" = 1, else 0
+ *
+ */
+
+#define SPAPR_XIVE_END_DEBUG     PPC_BIT(63)
+
+static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
+                                           sPAPRMachineState *spapr,
+                                           target_ulong opcode,
+                                           target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    target_ulong flags = args[0];
+    target_ulong target = args[1];
+    target_ulong priority = args[2];
+    XiveEND *end;
+    uint8_t end_blk;
+    uint32_t end_idx;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~SPAPR_XIVE_END_DEBUG) {
+        return H_PARAMETER;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    if (spapr_xive_priority_is_reserved(priority)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: priority %ld is reserved\n",
+                      priority);
+        return H_P3;
+    }
+
+    /* Validate that "target" is part of the list of threads allocated
+     * to the partition. For that, find the END corresponding to the
+     * target.
+     */
+    if (spapr_xive_target_to_end(xive, target, priority, &end_blk, &end_idx)) {
+        return H_P2;
+    }
+
+    assert(end_idx < xive->nr_ends);
+    end = &xive->endt[end_idx];
+
+    args[0] = 0;
+    if (xive_end_is_notify(end)) {
+        args[0] |= SPAPR_XIVE_END_ALWAYS_NOTIFY;
+    }
+
+    if (xive_end_is_enqueue(end)) {
+        args[1] = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32
+            | be32_to_cpu(end->w3);
+        args[2] = GETFIELD_BE32(END_W0_QSIZE, end->w0) + 12;
+    } else {
+        args[1] = 0;
+        args[2] = 0;
+    }
+
+    /* TODO: do we need any locking on the END ? */
+    if (flags & SPAPR_XIVE_END_DEBUG) {
+        /* Load the event queue generation number into the return flags */
+        args[0] |= (uint64_t)GETFIELD_BE32(END_W1_GENERATION, end->w1) << 62;
+
+        /* Load R7 with the event queue offset counter */
+        args[3] = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
+    } else {
+        args[3] = 0;
+    }
+
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SET_OS_REPORTING_LINE hcall() is used to set the
+ * reporting cache line pair for the calling thread.  The reporting
+ * cache lines will contain the OS interrupt context when the OS
+ * issues a CI store byte to @TIMA+0xC10 to acknowledge the OS
+ * interrupt. The reporting cache lines can be reset by inputting -1
+ * in "reportingLine".  Issuing the CI store byte without reporting
+ * cache lines registered will result in the data not being accessible
+ * to the OS.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63: Reserved
+ * - R5: "reportingLine": The logical real address of the reporting cache
+ *       line pair
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_set_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* TODO: H_INT_SET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_GET_OS_REPORTING_LINE hcall() is used to get the logical
+ * real address of the reporting cache line pair set for the input
+ * "target".  If no reporting cache line pair has been set, -1 is
+ * returned.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63: Reserved
+ * - R5: "target" is per "ibm,ppc-interrupt-server#s" or
+ *       "ibm,ppc-interrupt-gserver#s"
+ * - R6: "reportingLine": The logical real address of the reporting
+ *        cache line pair
+ *
+ * Output:
+ * - R4: The logical real address of the reporting line if set, else -1
+ */
+static target_ulong h_int_get_os_reporting_line(PowerPCCPU *cpu,
+                                                sPAPRMachineState *spapr,
+                                                target_ulong opcode,
+                                                target_ulong *args)
+{
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* TODO: H_INT_GET_OS_REPORTING_LINE */
+    return H_FUNCTION;
+}
+
+/*
+ * The H_INT_ESB hcall() is used to issue a load or store to the ESB
+ * page for the input "lisn".  This hcall is only supported for LISNs
+ * that have the ESB hcall flag set to 1 when returned from hcall()
+ * H_INT_GET_SOURCE_INFO.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-62: Reserved
+ *         bit 63: Store: Store=1, store operation, else load operation
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as
+ *       returned by the H_ALLOCATE_VAS_WINDOW hcall
+ * - R6: "esbOffset" is the offset into the ESB page for the load or
+ *       store operation
+ * - R7: "storeData" is the data to write for a store operation
+ *
+ * Output:
+ * - R4: The value of the load if load operation, else -1
+ */
+
+#define SPAPR_XIVE_ESB_STORE PPC_BIT(63)
+
+static target_ulong h_int_esb(PowerPCCPU *cpu,
+                              sPAPRMachineState *spapr,
+                              target_ulong opcode,
+                              target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveEAS eas;
+    target_ulong flags  = args[0];
+    target_ulong lisn   = args[1];
+    target_ulong offset = args[2];
+    target_ulong data   = args[3];
+    hwaddr mmio_addr;
+    XiveSource *xsrc = &xive->source;
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags & ~SPAPR_XIVE_ESB_STORE) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    eas = xive->eat[lisn];
+    if (!xive_eas_is_valid(&eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    if (offset > (1ull << xsrc->esb_shift)) {
+        return H_P3;
+    }
+
+    mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
+
+    if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
+                      (flags & SPAPR_XIVE_ESB_STORE))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
+                      HWADDR_PRIx "\n", mmio_addr);
+        return H_HARDWARE;
+    }
+    args[0] = (flags & SPAPR_XIVE_ESB_STORE) ? -1 : data;
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_SYNC hcall() is used to issue hardware syncs that will
+ * ensure any in flight events for the input lisn are in the event
+ * queue.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63: Reserved
+ * - R5: "lisn" is per "interrupts", "interrupt-map", or
+ *       "ibm,xive-lisn-ranges" properties, or as returned by the
+ *       ibm,query-interrupt-source-number RTAS call, or as
+ *       returned by the H_ALLOCATE_VAS_WINDOW hcall
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_sync(PowerPCCPU *cpu,
+                               sPAPRMachineState *spapr,
+                               target_ulong opcode,
+                               target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    XiveEAS eas;
+    target_ulong flags = args[0];
+    target_ulong lisn = args[1];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    if (lisn >= xive->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    eas = xive->eat[lisn];
+    if (!xive_eas_is_valid(&eas)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Invalid LISN %lx\n", lisn);
+        return H_P2;
+    }
+
+    /*
+     * H_STATE should be returned if a H_INT_RESET is in progress.
+     * This is not needed when running the emulation under QEMU
+     */
+
+    /* This is not real hardware. Nothing to be done */
+    return H_SUCCESS;
+}
+
+/*
+ * The H_INT_RESET hcall() is used to reset all of the partition's
+ * interrupt exploitation structures to their initial state.  This
+ * means losing all previously set interrupt state set via
+ * H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
+ *
+ * Parameters:
+ * Input:
+ * - R4: "flags"
+ *         Bits 0-63: Reserved
+ *
+ * Output:
+ * - None
+ */
+static target_ulong h_int_reset(PowerPCCPU *cpu,
+                                sPAPRMachineState *spapr,
+                                target_ulong opcode,
+                                target_ulong *args)
+{
+    sPAPRXive *xive = spapr->xive;
+    target_ulong flags   = args[0];
+
+    if (!spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        return H_FUNCTION;
+    }
+
+    if (flags) {
+        return H_PARAMETER;
+    }
+
+    device_reset(DEVICE(xive));
+    return H_SUCCESS;
+}
+
+void spapr_xive_hcall_init(sPAPRMachineState *spapr)
+{
+    spapr_register_hypercall(H_INT_GET_SOURCE_INFO, h_int_get_source_info);
+    spapr_register_hypercall(H_INT_SET_SOURCE_CONFIG, h_int_set_source_config);
+    spapr_register_hypercall(H_INT_GET_SOURCE_CONFIG, h_int_get_source_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_INFO, h_int_get_queue_info);
+    spapr_register_hypercall(H_INT_SET_QUEUE_CONFIG, h_int_set_queue_config);
+    spapr_register_hypercall(H_INT_GET_QUEUE_CONFIG, h_int_get_queue_config);
+    spapr_register_hypercall(H_INT_SET_OS_REPORTING_LINE,
+                             h_int_set_os_reporting_line);
+    spapr_register_hypercall(H_INT_GET_OS_REPORTING_LINE,
+                             h_int_get_os_reporting_line);
+    spapr_register_hypercall(H_INT_ESB, h_int_esb);
+    spapr_register_hypercall(H_INT_SYNC, h_int_sync);
+    spapr_register_hypercall(H_INT_RESET, h_int_reset);
+}
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index f05aa5a94959..7d2b7f425118 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -257,6 +257,8 @@ static void spapr_irq_init_xive(sPAPRMachineState *spapr, int nr_irqs,
         error_propagate(errp, local_err);
         return;
     }
+
+    spapr_xive_hcall_init(spapr);
 }
 
 static int spapr_irq_claim_xive(sPAPRMachineState *spapr, int irq, bool lsi,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 18/37] spapr: add device tree support for the XIVE exploitation mode
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (16 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 17/37] spapr: add hcalls support for the XIVE exploitation interrupt mode Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 19/37] spapr: allocate the interrupt thread context under the CPU core Cédric Le Goater
                   ` (19 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The XIVE interface for the guest is described in the device tree under
the "interrupt-controller" node. A couple of new properties are
specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   managnement areas (TIMA), for the User level and for the Guest OS
   level. Only the Guest OS level is taken into account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the IRQ interrupt number ranges assigned to the guest for the IPIs.

and also under the root node :

 - "ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use. OPAL uses the priority 7 queue to automatically
   escalate interrupts for all other queues (DD2.X POWER9). So only
   priorities [0..6] are allowed for the guest.

Extend the sPAPR IRQ backend with a new handler to populate the DT
with the appropriate "interrupt-controller" node.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h  |  2 ++
 include/hw/ppc/spapr_xive.h |  2 ++
 include/hw/ppc/xics.h       |  4 +--
 hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
 hw/intc/xics_spapr.c        |  3 +-
 hw/ppc/spapr.c              |  3 +-
 hw/ppc/spapr_irq.c          |  3 ++
 7 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index eec3159cd8d8..457239826b8f 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
     void (*free)(sPAPRMachineState *spapr, int irq, int num);
     qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
     void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
+    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
+                        void *fdt, uint32_t phandle);
 } sPAPRIrq;
 
 extern sPAPRIrq spapr_irq_xics;
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 9506a8f4d10a..728a5e8dc163 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
 typedef struct sPAPRMachineState sPAPRMachineState;
 
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
+void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
+                   uint32_t phandle);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 9958443d1984..14afda198cdb 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
     ICPState *(*icp_get)(XICSFabric *xi, int server);
 } XICSFabricClass;
 
-void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
-
 ICPState *xics_icp_get(XICSFabric *xi, int server);
 
 /* Internal XICS interfaces */
@@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
 
 typedef struct sPAPRMachineState sPAPRMachineState;
 
+void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
+                   uint32_t phandle);
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
 void xics_spapr_init(sPAPRMachineState *spapr);
 
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index f54100b175a5..fd02dc6b91e4 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -14,6 +14,7 @@
 #include "target/ppc/cpu.h"
 #include "sysemu/cpus.h"
 #include "monitor/monitor.h"
+#include "hw/ppc/fdt.h"
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_xive.h"
 #include "hw/ppc/xive.h"
@@ -1379,3 +1380,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_INT_SYNC, h_int_sync);
     spapr_register_hypercall(H_INT_RESET, h_int_reset);
 }
+
+void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
+                   uint32_t phandle)
+{
+    sPAPRXive *xive = spapr->xive;
+    int node;
+    uint64_t timas[2 * 2];
+    /* Interrupt number ranges for the IPIs */
+    uint32_t lisn_ranges[] = {
+        cpu_to_be32(0),
+        cpu_to_be32(nr_servers),
+    };
+    uint32_t eq_sizes[] = {
+        cpu_to_be32(12), /* 4K */
+        cpu_to_be32(16), /* 64K */
+        cpu_to_be32(21), /* 2M */
+        cpu_to_be32(24), /* 16M */
+    };
+    /* The following array is in sync with the reserved priorities
+     * defined by the 'spapr_xive_priority_is_reserved' routine.
+     */
+    uint32_t plat_res_int_priorities[] = {
+        cpu_to_be32(7),    /* start */
+        cpu_to_be32(0xf8), /* count */
+    };
+    gchar *nodename;
+
+    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
+    timas[0] = cpu_to_be64(xive->tm_base +
+                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
+    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
+    timas[2] = cpu_to_be64(xive->tm_base +
+                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
+    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
+
+    nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
+                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
+    _FDT(node = fdt_add_subnode(fdt, 0, nodename));
+    g_free(nodename);
+
+    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
+
+    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
+                     sizeof(eq_sizes)));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
+                     sizeof(lisn_ranges)));
+
+    /* For Linux to link the LSIs to the interrupt controller. */
+    _FDT(fdt_setprop(fdt, node, "interrupt-controller", NULL, 0));
+    _FDT(fdt_setprop_cell(fdt, node, "#interrupt-cells", 2));
+
+    /* For SLOF */
+    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
+    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
+
+    /* The "ibm,plat-res-int-priorities" property defines the priority
+     * ranges reserved by the hypervisor
+     */
+    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
+                     plat_res_int_priorities, sizeof(plat_res_int_priorities)));
+}
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index 2e27b92b871a..f67d3c80bf3a 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -244,7 +244,8 @@ void xics_spapr_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_IPOLL, h_ipoll);
 }
 
-void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle)
+void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
+                   uint32_t phandle)
 {
     uint32_t interrupt_server_ranges_prop[] = {
         0, cpu_to_be32(nr_servers),
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a689f853e020..4dae32049d0a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1270,7 +1270,8 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
     /* /interrupt controller */
-    spapr_dt_xics(spapr_max_server_number(spapr), fdt, PHANDLE_XICP);
+    smc->irq->dt_populate(spapr, spapr_max_server_number(spapr), fdt,
+                          PHANDLE_XICP);
 
     ret = spapr_populate_memory(spapr, fdt);
     if (ret < 0) {
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 7d2b7f425118..8401c75fdbe4 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -203,6 +203,7 @@ sPAPRIrq spapr_irq_xics = {
     .free        = spapr_irq_free_xics,
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
+    .dt_populate = spapr_dt_xics,
 };
 
 /*
@@ -316,6 +317,7 @@ sPAPRIrq spapr_irq_xive = {
     .free        = spapr_irq_free_xive,
     .qirq        = spapr_qirq_xive,
     .print_info  = spapr_irq_print_info_xive,
+    .dt_populate = spapr_dt_xive,
 };
 
 /*
@@ -420,4 +422,5 @@ sPAPRIrq spapr_irq_xics_legacy = {
     .free        = spapr_irq_free_xics,
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
+    .dt_populate = spapr_dt_xics,
 };
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 19/37] spapr: allocate the interrupt thread context under the CPU core
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (17 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 18/37] spapr: add device tree support for the XIVE exploitation mode Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 20/37] spapr: extend the sPAPR IRQ backend for XICS migration Cédric Le Goater
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Each interrupt mode has its own specific interrupt presenter object,
that we store under the CPU object, one for XICS and one for XIVE.
The XIVE model hardwires the NVT identifier in the thread context
model to emulate the push/pull of hypervisor when a vCPU is dispatched
on a HW thread.

The sPAPR IRQ backend is extended with a new handler to support them
both.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---

 Changes since v5:

 - hardwires the NVT identifier in the thread context
 
 include/hw/ppc/spapr_irq.h |  2 ++
 include/hw/ppc/xive.h      |  1 +
 hw/intc/xive.c             | 31 +++++++++++++++++++++++++++++++
 hw/ppc/spapr_cpu_core.c    |  5 ++---
 hw/ppc/spapr_irq.c         | 15 +++++++++++++++
 5 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 457239826b8f..689176455e51 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -41,6 +41,8 @@ typedef struct sPAPRIrq {
     void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
     void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
                         void *fdt, uint32_t phandle);
+    Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
+                               Error **errp);
 } sPAPRIrq;
 
 extern sPAPRIrq spapr_irq_xics;
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index e9b06e75fc1c..60c335ce0e1e 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -421,6 +421,7 @@ typedef struct XiveTCTX {
 extern const MemoryRegionOps xive_tm_ops;
 
 void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
+Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp);
 
 static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
 {
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 0db77107ab15..7638592da20f 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -552,6 +552,37 @@ static const TypeInfo xive_tctx_info = {
     .class_init    = xive_tctx_class_init,
 };
 
+Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
+{
+
+    CPUPPCState *env = &POWERPC_CPU(cpu)->env;
+    uint32_t pir = env->spr_cb[SPR_PIR].default_value;
+    uint32_t hw_cam = hw_cam_line((pir >> 8) & 0xf, pir & 0x7f);
+    Error *local_err = NULL;
+    Object *obj;
+
+    obj = object_new(TYPE_XIVE_TCTX);
+    object_property_add_child(cpu, TYPE_XIVE_TCTX, obj, &error_abort);
+    object_unref(obj);
+    object_property_add_const_link(obj, "cpu", cpu, &error_abort);
+    object_property_add_const_link(obj, "xive", OBJECT(xrtr), &error_abort);
+    object_property_set_int(obj, hw_cam, "hw-cam", &local_err);
+    if (local_err) {
+        goto error;
+    }
+    object_property_set_bool(obj, true, "realized", &local_err);
+    if (local_err) {
+        goto error;
+    }
+
+    return obj;
+
+error:
+    object_unparent(obj);
+    error_propagate(errp, local_err);
+    return NULL;
+}
+
 /*
  * XIVE ESB helpers
  */
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 2398ce62c0e7..1811cd48db90 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -11,7 +11,6 @@
 #include "hw/ppc/spapr_cpu_core.h"
 #include "target/ppc/cpu.h"
 #include "hw/ppc/spapr.h"
-#include "hw/ppc/xics.h" /* for icp_create() - to be removed */
 #include "hw/boards.h"
 #include "qapi/error.h"
 #include "sysemu/cpus.h"
@@ -215,6 +214,7 @@ static void spapr_cpu_core_unrealize(DeviceState *dev, Error **errp)
 static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                sPAPRCPUCore *sc, Error **errp)
 {
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
     CPUPPCState *env = &cpu->env;
     CPUState *cs = CPU(cpu);
     Error *local_err = NULL;
@@ -233,8 +233,7 @@ static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     qemu_register_reset(spapr_cpu_reset, cpu);
     spapr_cpu_reset(cpu);
 
-    cpu->intc = icp_create(OBJECT(cpu), spapr->icp_type, XICS_FABRIC(spapr),
-                           &local_err);
+    cpu->intc = smc->irq->cpu_intc_create(spapr, OBJECT(cpu), &local_err);
     if (local_err) {
         goto error_unregister;
     }
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 8401c75fdbe4..e16265f29d74 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -190,6 +190,12 @@ static void spapr_irq_print_info_xics(sPAPRMachineState *spapr, Monitor *mon)
     ics_pic_print_info(spapr->ics, mon);
 }
 
+static Object *spapr_irq_cpu_intc_create_xics(sPAPRMachineState *spapr,
+                                              Object *cpu, Error **errp)
+{
+    return icp_create(cpu, spapr->icp_type, XICS_FABRIC(spapr), errp);
+}
+
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
 #define SPAPR_IRQ_XICS_NR_MSIS     \
     (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
@@ -204,6 +210,7 @@ sPAPRIrq spapr_irq_xics = {
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
+    .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
 };
 
 /*
@@ -300,6 +307,12 @@ static void spapr_irq_print_info_xive(sPAPRMachineState *spapr,
     spapr_xive_pic_print_info(spapr->xive, mon);
 }
 
+static Object *spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
+                                              Object *cpu, Error **errp)
+{
+    return xive_tctx_create(cpu, XIVE_ROUTER(spapr->xive), errp);
+}
+
 /*
  * XIVE uses the full IRQ number space. Set it to 8K to be compatible
  * with XICS.
@@ -318,6 +331,7 @@ sPAPRIrq spapr_irq_xive = {
     .qirq        = spapr_qirq_xive,
     .print_info  = spapr_irq_print_info_xive,
     .dt_populate = spapr_dt_xive,
+    .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
 };
 
 /*
@@ -423,4 +437,5 @@ sPAPRIrq spapr_irq_xics_legacy = {
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
+    .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
 };
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 20/37] spapr: extend the sPAPR IRQ backend for XICS migration
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (18 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 19/37] spapr: allocate the interrupt thread context under the CPU core Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 21/37] spapr: add a 'reset' method to the sPAPR IRQ backend Cédric Le Goater
                   ` (17 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Introduce a new sPAPR IRQ handler to handle resend after migration
when the machine is using a KVM XICS interrupt controller model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h |  2 ++
 hw/ppc/spapr.c             | 13 +++++--------
 hw/ppc/spapr_irq.c         | 27 +++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 689176455e51..91ac5784919c 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -43,6 +43,7 @@ typedef struct sPAPRIrq {
                         void *fdt, uint32_t phandle);
     Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
                                Error **errp);
+    int (*post_load)(sPAPRMachineState *spapr, int version_id);
 } sPAPRIrq;
 
 extern sPAPRIrq spapr_irq_xics;
@@ -53,6 +54,7 @@ void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
 void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
 qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
+int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id);
 
 /*
  * XICS legacy routines
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4dae32049d0a..8911465e32cf 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1732,14 +1732,6 @@ static int spapr_post_load(void *opaque, int version_id)
         return err;
     }
 
-    if (!object_dynamic_cast(OBJECT(spapr->ics), TYPE_ICS_KVM)) {
-        CPUState *cs;
-        CPU_FOREACH(cs) {
-            PowerPCCPU *cpu = POWERPC_CPU(cs);
-            icp_resend(ICP(cpu->intc));
-        }
-    }
-
     /* In earlier versions, there was no separate qdev for the PAPR
      * RTC, so the RTC offset was stored directly in sPAPREnvironment.
      * So when migrating from those versions, poke the incoming offset
@@ -1760,6 +1752,11 @@ static int spapr_post_load(void *opaque, int version_id)
         }
     }
 
+    err = spapr_irq_post_load(spapr, version_id);
+    if (err) {
+        return err;
+    }
+
     return err;
 }
 
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index e16265f29d74..8943e28fc11b 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -196,6 +196,18 @@ static Object *spapr_irq_cpu_intc_create_xics(sPAPRMachineState *spapr,
     return icp_create(cpu, spapr->icp_type, XICS_FABRIC(spapr), errp);
 }
 
+static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
+{
+    if (!object_dynamic_cast(OBJECT(spapr->ics), TYPE_ICS_KVM)) {
+        CPUState *cs;
+        CPU_FOREACH(cs) {
+            PowerPCCPU *cpu = POWERPC_CPU(cs);
+            icp_resend(ICP(cpu->intc));
+        }
+    }
+    return 0;
+}
+
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
 #define SPAPR_IRQ_XICS_NR_MSIS     \
     (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
@@ -211,6 +223,7 @@ sPAPRIrq spapr_irq_xics = {
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
+    .post_load   = spapr_irq_post_load_xics,
 };
 
 /*
@@ -313,6 +326,11 @@ static Object *spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
     return xive_tctx_create(cpu, XIVE_ROUTER(spapr->xive), errp);
 }
 
+static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
+{
+    return 0;
+}
+
 /*
  * XIVE uses the full IRQ number space. Set it to 8K to be compatible
  * with XICS.
@@ -332,6 +350,7 @@ sPAPRIrq spapr_irq_xive = {
     .print_info  = spapr_irq_print_info_xive,
     .dt_populate = spapr_dt_xive,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
+    .post_load   = spapr_irq_post_load_xive,
 };
 
 /*
@@ -370,6 +389,13 @@ qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq)
     return smc->irq->qirq(spapr, irq);
 }
 
+int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+
+    return smc->irq->post_load(spapr, version_id);
+}
+
 /*
  * XICS legacy routines - to deprecate one day
  */
@@ -438,4 +464,5 @@ sPAPRIrq spapr_irq_xics_legacy = {
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
+    .post_load   = spapr_irq_post_load_xics,
 };
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 21/37] spapr: add a 'reset' method to the sPAPR IRQ backend
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (19 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 20/37] spapr: extend the sPAPR IRQ backend for XICS migration Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 22/37] spapr: add a 'pseries-3.1-xive' machine type Cédric Le Goater
                   ` (16 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

For the time being, the XIVE reset handler updates the OS CAM line of
the vCPU as it is done under a real hypervisor when a vCPU is
scheduled to run on a HW thread.

This method will become even more useful when the machine supporting
both interrupt modes, XIVE and XICS, is introduced. In this machine,
the interrupt mode is chosen by the CAS negotiation process and
activated after a reset.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h  |  2 ++
 include/hw/ppc/spapr_xive.h |  1 +
 hw/intc/spapr_xive.c        | 24 ++++++++++++++++++++++++
 hw/ppc/spapr.c              |  5 +++++
 hw/ppc/spapr_irq.c          | 25 +++++++++++++++++++++++++
 5 files changed, 57 insertions(+)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 91ac5784919c..bdb1c66125c9 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -44,6 +44,7 @@ typedef struct sPAPRIrq {
     Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
                                Error **errp);
     int (*post_load)(sPAPRMachineState *spapr, int version_id);
+    void (*reset)(sPAPRMachineState *spapr, Error **errp);
 } sPAPRIrq;
 
 extern sPAPRIrq spapr_irq_xics;
@@ -55,6 +56,7 @@ int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
 void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
 qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
 int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id);
+void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp);
 
 /*
  * XICS legacy routines
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 728a5e8dc163..7244a6231ce6 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -47,5 +47,6 @@ typedef struct sPAPRMachineState sPAPRMachineState;
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
 void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle);
+void spapr_xive_reset_tctx(sPAPRXive *xive);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index fd02dc6b91e4..3cddc9332acb 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -181,6 +181,30 @@ static void spapr_xive_map_mmio(sPAPRXive *xive)
     sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
 }
 
+/*
+ * When a Virtual Processor is scheduled to run on a HW thread, the
+ * hypervisor pushes its identifier in the OS CAM line. Emulate the
+ * same behavior under QEMU.
+ */
+void spapr_xive_reset_tctx(sPAPRXive *xive)
+{
+    CPUState *cs;
+    uint8_t  nvt_blk;
+    uint32_t nvt_idx;
+    uint32_t nvt_cam;
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
+
+        spapr_xive_cpu_to_nvt(xive, cpu, &nvt_blk, &nvt_idx);
+
+        nvt_cam = cpu_to_be32(TM_QW1W2_VO |
+                              xive_nvt_cam_line(nvt_blk, nvt_idx));
+        memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &nvt_cam, 4);
+    }
+}
+
 static void spapr_xive_end_reset(XiveEND *end)
 {
     memset(end, 0, sizeof(*end));
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8911465e32cf..530aee8d143d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1621,6 +1621,11 @@ static void spapr_machine_reset(void)
 
     qemu_devices_reset();
 
+    /* This is fixing some of the default configuration of the XIVE
+     * devices. To be called after the reset of the machine devices.
+     */
+    spapr_irq_reset(spapr, &error_fatal);
+
     /* DRC reset may cause a device to be unplugged. This will cause troubles
      * if this device is used by another device (eg, a running vhost backend
      * will crash QEMU if the DIMM holding the vring goes away). To avoid such
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 8943e28fc11b..58ce124c1501 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -13,6 +13,7 @@
 #include "qapi/error.h"
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_xive.h"
+#include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/xics.h"
 #include "sysemu/kvm.h"
 
@@ -208,6 +209,10 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
     return 0;
 }
 
+static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
+{
+}
+
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
 #define SPAPR_IRQ_XICS_NR_MSIS     \
     (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
@@ -224,6 +229,7 @@ sPAPRIrq spapr_irq_xics = {
     .dt_populate = spapr_dt_xics,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
     .post_load   = spapr_irq_post_load_xics,
+    .reset       = spapr_irq_reset_xics,
 };
 
 /*
@@ -331,6 +337,15 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
     return 0;
 }
 
+static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
+{
+    /*
+     * Set the OS CAM line of the cpu interrupt thread context. Needs
+     * to come after the XiveTCTX reset handlers.
+     */
+    spapr_xive_reset_tctx(spapr->xive);
+}
+
 /*
  * XIVE uses the full IRQ number space. Set it to 8K to be compatible
  * with XICS.
@@ -351,6 +366,7 @@ sPAPRIrq spapr_irq_xive = {
     .dt_populate = spapr_dt_xive,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
     .post_load   = spapr_irq_post_load_xive,
+    .reset       = spapr_irq_reset_xive,
 };
 
 /*
@@ -396,6 +412,15 @@ int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id)
     return smc->irq->post_load(spapr, version_id);
 }
 
+void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+
+    if (smc->irq->reset) {
+        smc->irq->reset(spapr, errp);
+    }
+}
+
 /*
  * XICS legacy routines - to deprecate one day
  */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 22/37] spapr: add a 'pseries-3.1-xive' machine type
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (20 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 21/37] spapr: add a 'reset' method to the sPAPR IRQ backend Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 23/37] linux-headers: update to 4.20-rc5 Cédric Le Goater
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The interrupt mode is statically defined to XIVE only for this machine.
The guest OS is required to have support for the XIVE exploitation
mode of the POWER9 interrupt controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr.h     |  6 ++++++
 include/hw/ppc/spapr_irq.h |  1 +
 hw/ppc/spapr.c             | 36 +++++++++++++++++++++++++++++++-----
 hw/ppc/spapr_irq.c         |  3 +++
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6bf028a02fe2..daced428a42c 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -824,5 +824,11 @@ int spapr_caps_post_migration(sPAPRMachineState *spapr);
 
 void spapr_check_pagesize(sPAPRMachineState *spapr, hwaddr pagesize,
                           Error **errp);
+/*
+ * XIVE definitions
+ */
+#define SPAPR_OV5_XIVE_LEGACY   0x0
+#define SPAPR_OV5_XIVE_EXPLOIT  0x40
+#define SPAPR_OV5_XIVE_BOTH     0x80
 
 #endif /* HW_SPAPR_H */
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index bdb1c66125c9..26727a7263a5 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -33,6 +33,7 @@ void spapr_irq_msi_reset(sPAPRMachineState *spapr);
 typedef struct sPAPRIrq {
     uint32_t    nr_irqs;
     uint32_t    nr_msis;
+    uint8_t     ov5;
 
     void (*init)(sPAPRMachineState *spapr, int nr_irqs, Error **errp);
     int (*claim)(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 530aee8d143d..817dd1b2c442 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1097,12 +1097,14 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
     spapr_dt_rtas_tokens(fdt, rtas);
 }
 
-/* Prepare ibm,arch-vec-5-platform-support, which indicates the MMU features
- * that the guest may request and thus the valid values for bytes 24..26 of
- * option vector 5: */
-static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
+/* Prepare ibm,arch-vec-5-platform-support, which indicates the MMU
+ * and the XIVE features that the guest may request and thus the valid
+ * values for bytes 23..26 of option vector 5: */
+static void spapr_dt_ov5_platform_support(sPAPRMachineState *spapr, void *fdt,
+                                          int chosen)
 {
     PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 
     char val[2 * 4] = {
         23, 0x00, /* Xive mode, filled in below. */
@@ -1123,7 +1125,11 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen)
         } else {
             val[3] = 0x00; /* Hash */
         }
+        /* No KVM support */
+        val[1] = SPAPR_OV5_XIVE_LEGACY;
     } else {
+        val[1] = smc->irq->ov5;
+
         /* V3 MMU supports both hash and radix in tcg (with dynamic switching) */
         val[3] = 0xC0;
     }
@@ -1191,7 +1197,7 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, void *fdt)
         _FDT(fdt_setprop_string(fdt, chosen, "stdout-path", stdout_path));
     }
 
-    spapr_dt_ov5_platform_support(fdt, chosen);
+    spapr_dt_ov5_platform_support(spapr, fdt, chosen);
 
     g_free(stdout_path);
     g_free(bootlist);
@@ -2624,6 +2630,11 @@ static void spapr_machine_init(MachineState *machine)
     /* advertise support for ibm,dyamic-memory-v2 */
     spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
 
+    /* advertise XIVE */
+    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT) {
+        spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
+    }
+
     /* init CPUs */
     spapr_init_cpus(spapr);
 
@@ -3973,6 +3984,21 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
 
 DEFINE_SPAPR_MACHINE(3_1, "3.1", true);
 
+static void spapr_machine_3_1_xive_instance_options(MachineState *machine)
+{
+    spapr_machine_3_1_instance_options(machine);
+}
+
+static void spapr_machine_3_1_xive_class_options(MachineClass *mc)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
+    spapr_machine_3_1_class_options(mc);
+    smc->irq = &spapr_irq_xive;
+}
+
+DEFINE_SPAPR_MACHINE(3_1_xive, "3.1-xive", false);
+
 /*
  * pseries-3.0
  */
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 58ce124c1501..8eead17c8f36 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -220,6 +220,7 @@ static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
 sPAPRIrq spapr_irq_xics = {
     .nr_irqs     = SPAPR_IRQ_XICS_NR_IRQS,
     .nr_msis     = SPAPR_IRQ_XICS_NR_MSIS,
+    .ov5         = SPAPR_OV5_XIVE_LEGACY,
 
     .init        = spapr_irq_init_xics,
     .claim       = spapr_irq_claim_xics,
@@ -357,6 +358,7 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
 sPAPRIrq spapr_irq_xive = {
     .nr_irqs     = SPAPR_IRQ_XIVE_NR_IRQS,
     .nr_msis     = SPAPR_IRQ_XIVE_NR_MSIS,
+    .ov5         = SPAPR_OV5_XIVE_EXPLOIT,
 
     .init        = spapr_irq_init_xive,
     .claim       = spapr_irq_claim_xive,
@@ -481,6 +483,7 @@ int spapr_irq_find(sPAPRMachineState *spapr, int num, bool align, Error **errp)
 sPAPRIrq spapr_irq_xics_legacy = {
     .nr_irqs     = SPAPR_IRQ_XICS_LEGACY_NR_IRQS,
     .nr_msis     = SPAPR_IRQ_XICS_LEGACY_NR_IRQS,
+    .ov5         = SPAPR_OV5_XIVE_LEGACY,
 
     .init        = spapr_irq_init_xics,
     .claim       = spapr_irq_claim_xics,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 23/37] linux-headers: update to 4.20-rc5
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (21 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 22/37] spapr: add a 'pseries-3.1-xive' machine type Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 24/37] spapr/xive: add KVM support Cédric Le Goater
                   ` (14 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

These changes provide the initial interface with the KVM device
implementing the XIVE native exploitation interrupt mode. Also used to
retrieve the state of the KVM device for the monitor usage and for
migration.

Available from :

  https://github.com/legoater/linux/commits/xive-4.20

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 linux-headers/asm-powerpc/kvm.h | 46 +++++++++++++++++++++++++++++++++
 linux-headers/linux/kvm.h       |  6 +++++
 2 files changed, 52 insertions(+)

diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 8c876c166ef2..10fe86c21e8f 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -480,6 +480,8 @@ struct kvm_ppc_cpu_char {
 #define  KVM_REG_PPC_ICP_PPRI_SHIFT	16	/* pending irq priority */
 #define  KVM_REG_PPC_ICP_PPRI_MASK	0xff
 
+#define KVM_REG_PPC_NVT_STATE	(KVM_REG_PPC | KVM_REG_SIZE_U256 | 0x8d)
+
 /* Device control API: PPC-specific devices */
 #define KVM_DEV_MPIC_GRP_MISC		1
 #define   KVM_DEV_MPIC_BASE_ADDR	0	/* 64-bit */
@@ -675,4 +677,48 @@ struct kvm_ppc_cpu_char {
 #define  KVM_XICS_PRESENTED		(1ULL << 43)
 #define  KVM_XICS_QUEUED		(1ULL << 44)
 
+/* POWER9 XIVE Native Interrupt Controller */
+#define KVM_DEV_XIVE_GRP_CTRL		1
+#define   KVM_DEV_XIVE_GET_ESB_FD	1
+#define   KVM_DEV_XIVE_GET_TIMA_FD	2
+#define   KVM_DEV_XIVE_VC_BASE		3
+#define   KVM_DEV_XIVE_SAVE_EQ_PAGES	4
+#define KVM_DEV_XIVE_GRP_SOURCES	2	/* 64-bit source attributes */
+#define KVM_DEV_XIVE_GRP_SYNC		3	/* 64-bit source attributes */
+#define KVM_DEV_XIVE_GRP_EAS		4	/* 64-bit eas attributes */
+#define KVM_DEV_XIVE_GRP_EQ		5	/* 64-bit eq attributes */
+
+/* Layout of 64-bit XIVE source attribute values */
+#define KVM_XIVE_LEVEL_SENSITIVE	(1ULL << 0)
+#define KVM_XIVE_LEVEL_ASSERTED		(1ULL << 1)
+
+/* Layout of 64-bit eas attribute values */
+#define KVM_XIVE_EAS_PRIORITY_SHIFT	0
+#define KVM_XIVE_EAS_PRIORITY_MASK	0x7
+#define KVM_XIVE_EAS_SERVER_SHIFT	3
+#define KVM_XIVE_EAS_SERVER_MASK	0xfffffff8ULL
+#define KVM_XIVE_EAS_MASK_SHIFT		32
+#define KVM_XIVE_EAS_MASK_MASK		0x100000000ULL
+#define KVM_XIVE_EAS_EISN_SHIFT		33
+#define KVM_XIVE_EAS_EISN_MASK		0xfffffffe00000000ULL
+
+/* Layout of 64-bit eq attribute */
+#define KVM_XIVE_EQ_PRIORITY_SHIFT	0
+#define KVM_XIVE_EQ_PRIORITY_MASK	0x7
+#define KVM_XIVE_EQ_SERVER_SHIFT	3
+#define KVM_XIVE_EQ_SERVER_MASK		0xfffffff8ULL
+
+/* Layout of 64-bit eq attribute values */
+struct kvm_ppc_xive_eq {
+	__u32 flags;
+	__u32 qsize;
+	__u64 qpage;
+	__u32 qtoggle;
+	__u32 qindex;
+};
+
+#define KVM_XIVE_EQ_FLAG_ENABLED	0x00000001
+#define KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY	0x00000002
+#define KVM_XIVE_EQ_FLAG_ESCALATE	0x00000004
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index f11a7eb49cfa..b7a74c58d0db 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -965,6 +965,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_COALESCED_PIO 162
 #define KVM_CAP_HYPERV_ENLIGHTENED_VMCS 163
 #define KVM_CAP_EXCEPTION_PAYLOAD 164
+#define KVM_CAP_ARM_VM_IPA_SIZE 165
+#define KVM_CAP_PPC_IRQ_XIVE 166
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1188,6 +1190,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_ARM_VGIC_V3	KVM_DEV_TYPE_ARM_VGIC_V3
 	KVM_DEV_TYPE_ARM_VGIC_ITS,
 #define KVM_DEV_TYPE_ARM_VGIC_ITS	KVM_DEV_TYPE_ARM_VGIC_ITS
+	KVM_DEV_TYPE_XIVE,
+#define KVM_DEV_TYPE_XIVE		KVM_DEV_TYPE_XIVE
 	KVM_DEV_TYPE_MAX,
 };
 
@@ -1305,6 +1309,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xe2, struct kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xe3, struct kvm_device_attr)
 
+#define KVM_DESTROY_DEVICE	  _IOWR(KVMIO,  0xf0, struct kvm_create_device)
+
 /*
  * ioctls for vcpu fds
  */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 24/37] spapr/xive: add KVM support
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (22 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 23/37] linux-headers: update to 4.20-rc5 Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 25/37] spapr/xive: add state synchronization with KVM Cédric Le Goater
                   ` (13 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

This introduces a set of helpers to activate the KVM XIVE device when
KVM is in use.

They handle the initialization of the TIMA and the source ESB memory
regions which have a different type under KVM. These are 'ram device'
memory mappings, similarly to VFIO, exposed to the guest and the
associated VMAs on the host are populated dynamically with the
appropriate pages using a fault handler.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 default-configs/ppc64-softmmu.mak |   1 +
 include/hw/ppc/spapr_xive.h       |  10 ++
 include/hw/ppc/xive.h             |  20 +++
 target/ppc/kvm_ppc.h              |   6 +
 hw/intc/spapr_xive.c              |  31 ++--
 hw/intc/spapr_xive_kvm.c          | 253 ++++++++++++++++++++++++++++++
 hw/intc/xive.c                    |  30 +++-
 hw/ppc/spapr.c                    |   7 +-
 hw/ppc/spapr_irq.c                |   9 --
 target/ppc/kvm.c                  |   7 +
 hw/intc/Makefile.objs             |   1 +
 11 files changed, 349 insertions(+), 26 deletions(-)
 create mode 100644 hw/intc/spapr_xive_kvm.c

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 7f34ad0528ed..c1bf5cd951f5 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -18,6 +18,7 @@ CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_XIVE=$(CONFIG_PSERIES)
 CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
+CONFIG_XIVE_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_MEM_DEVICE=y
 CONFIG_DIMM=y
 CONFIG_SPAPR_RNG=y
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 7244a6231ce6..ced187ee49e5 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -35,6 +35,10 @@ typedef struct sPAPRXive {
     /* TIMA mapping address */
     hwaddr        tm_base;
     MemoryRegion  tm_mmio;
+
+    /* KVM support */
+    int           fd;
+    void          *tm_mmap;
 } sPAPRXive;
 
 bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
@@ -48,5 +52,11 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr);
 void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle);
 void spapr_xive_reset_tctx(sPAPRXive *xive);
+void spapr_xive_map_mmio(sPAPRXive *xive);
+
+/*
+ * KVM XIVE device helpers
+ */
+void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 60c335ce0e1e..3684d8e4f6be 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -140,6 +140,7 @@
 #ifndef PPC_XIVE_H
 #define PPC_XIVE_H
 
+#include "sysemu/kvm.h"
 #include "hw/qdev-core.h"
 #include "hw/sysbus.h"
 #include "hw/ppc/xive_regs.h"
@@ -195,6 +196,9 @@ typedef struct XiveSource {
     uint32_t        esb_shift;
     MemoryRegion    esb_mmio;
 
+    /* KVM support */
+    void            *esb_mmap;
+
     XiveNotifier    *xive;
 } XiveSource;
 
@@ -428,4 +432,20 @@ static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
     return (nvt_blk << 19) | nvt_idx;
 }
 
+/*
+ * KVM XIVE device helpers
+ */
+
+/* Keep inlined to discard compile of KVM code sections */
+static inline bool kvmppc_xive_enabled(void)
+{
+    MachineState *machine = MACHINE(qdev_get_machine());
+
+    return kvm_enabled() && machine_kernel_irqchip_allowed(machine);
+}
+
+void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
+void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
+void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
+
 #endif /* PPC_XIVE_H */
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index bdfaa4e70a83..d2159660f9f2 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -59,6 +59,7 @@ bool kvmppc_has_cap_fixup_hcalls(void);
 bool kvmppc_has_cap_htm(void);
 bool kvmppc_has_cap_mmu_radix(void);
 bool kvmppc_has_cap_mmu_hash_v3(void);
+bool kvmppc_has_cap_xive(void);
 int kvmppc_get_cap_safe_cache(void);
 int kvmppc_get_cap_safe_bounds_check(void);
 int kvmppc_get_cap_safe_indirect_branch(void);
@@ -307,6 +308,11 @@ static inline bool kvmppc_has_cap_mmu_hash_v3(void)
     return false;
 }
 
+static inline bool kvmppc_has_cap_xive(void)
+{
+    return false;
+}
+
 static inline int kvmppc_get_cap_safe_cache(void)
 {
     return 0;
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 3cddc9332acb..256108914001 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -174,7 +174,7 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
     }
 }
 
-static void spapr_xive_map_mmio(sPAPRXive *xive)
+void spapr_xive_map_mmio(sPAPRXive *xive)
 {
     sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->vc_base);
     sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1, xive->end_base);
@@ -250,6 +250,9 @@ static void spapr_xive_instance_init(Object *obj)
                       TYPE_XIVE_END_SOURCE);
     object_property_add_child(obj, "end_source", OBJECT(&xive->end_source),
                               NULL);
+
+    /* Not connected to the KVM XIVE device */
+    xive->fd = -1;
 }
 
 static void spapr_xive_realize(DeviceState *dev, Error **errp)
@@ -304,17 +307,25 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     xive->eat = g_new0(XiveEAS, xive->nr_irqs);
     xive->endt = g_new0(XiveEND, xive->nr_ends);
 
-    /* TIMA initialization */
-    memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
-                          "xive.tima", 4ull << TM_SHIFT);
+    if (kvmppc_xive_enabled()) {
+        kvmppc_xive_connect(xive, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    } else {
+        /* TIMA initialization */
+        memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
+                              "xive.tima", 4ull << TM_SHIFT);
 
-    /* Define all XIVE MMIO regions on SysBus */
-    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
-    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
-    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
+        /* Define all XIVE MMIO regions on SysBus */
+        sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
+        sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
+        sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
 
-    /* Map all regions */
-    spapr_xive_map_mmio(xive);
+        /* Map all regions */
+        spapr_xive_map_mmio(xive);
+    }
 
     qemu_register_reset(spapr_xive_reset, dev);
 }
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
new file mode 100644
index 000000000000..8f773646aa3a
--- /dev/null
+++ b/hw/intc/spapr_xive_kvm.c
@@ -0,0 +1,253 @@
+/*
+ * QEMU PowerPC sPAPR XIVE interrupt controller model
+ *
+ * Copyright (c) 2017-2018, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "target/ppc/cpu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/kvm.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_xive.h"
+#include "hw/ppc/xive.h"
+#include "kvm_ppc.h"
+
+#include <sys/ioctl.h>
+
+/*
+ * Helpers for CPU hotplug
+ *
+ * TODO: make a common KVMEnabledCPU layer for XICS and XIVE
+ */
+typedef struct KVMEnabledCPU {
+    unsigned long vcpu_id;
+    QLIST_ENTRY(KVMEnabledCPU) node;
+} KVMEnabledCPU;
+
+static QLIST_HEAD(, KVMEnabledCPU)
+    kvm_enabled_cpus = QLIST_HEAD_INITIALIZER(&kvm_enabled_cpus);
+
+static bool kvm_cpu_is_enabled(CPUState *cs)
+{
+    KVMEnabledCPU *enabled_cpu;
+    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
+
+    QLIST_FOREACH(enabled_cpu, &kvm_enabled_cpus, node) {
+        if (enabled_cpu->vcpu_id == vcpu_id) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static void kvm_cpu_enable(CPUState *cs)
+{
+    KVMEnabledCPU *enabled_cpu;
+    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
+
+    enabled_cpu = g_malloc(sizeof(*enabled_cpu));
+    enabled_cpu->vcpu_id = vcpu_id;
+    QLIST_INSERT_HEAD(&kvm_enabled_cpus, enabled_cpu, node);
+}
+
+/*
+ * XIVE Thread Interrupt Management context (KVM)
+ */
+
+void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
+{
+    sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
+    unsigned long vcpu_id;
+    int ret;
+
+    /* Check if CPU was hot unplugged and replugged. */
+    if (kvm_cpu_is_enabled(tctx->cs)) {
+        return;
+    }
+
+    vcpu_id = kvm_arch_vcpu_id(tctx->cs);
+
+    ret = kvm_vcpu_enable_cap(tctx->cs, KVM_CAP_PPC_IRQ_XIVE, 0, xive->fd,
+                              vcpu_id, 0);
+    if (ret < 0) {
+        error_setg(errp, "Unable to connect CPU%ld to KVM XIVE device: %s",
+                   vcpu_id, strerror(errno));
+        return;
+    }
+
+    kvm_cpu_enable(tctx->cs);
+}
+
+/*
+ * XIVE Interrupt Source (KVM)
+ */
+
+/*
+ * At reset, the interrupt sources are simply created and MASKED. We
+ * only need to inform the KVM XIVE device about their type: LSI or
+ * MSI.
+ */
+void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
+{
+    sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
+    int i;
+
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        Error *local_err = NULL;
+        uint64_t state = 0;
+
+        if (xive_source_irq_is_lsi(xsrc, i)) {
+            state |= KVM_XIVE_LEVEL_SENSITIVE;
+            if (xsrc->status[i] & XIVE_STATUS_ASSERTED) {
+                state |= KVM_XIVE_LEVEL_ASSERTED;
+            }
+        }
+
+        kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCES, i, &state,
+                          true, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
+{
+    XiveSource *xsrc = opaque;
+    struct kvm_irq_level args;
+    int rc;
+
+    args.irq = srcno;
+    if (!xive_source_irq_is_lsi(xsrc, srcno)) {
+        if (!val) {
+            return;
+        }
+        args.level = KVM_INTERRUPT_SET;
+    } else {
+        if (val) {
+            xsrc->status[srcno] |= XIVE_STATUS_ASSERTED;
+            args.level = KVM_INTERRUPT_SET_LEVEL;
+        } else {
+            xsrc->status[srcno] &= ~XIVE_STATUS_ASSERTED;
+            args.level = KVM_INTERRUPT_UNSET;
+        }
+    }
+    rc = kvm_vm_ioctl(kvm_state, KVM_IRQ_LINE, &args);
+    if (rc < 0) {
+        error_report("kvm_irq_line() failed : %s", strerror(errno));
+    }
+}
+
+/*
+ * sPAPR XIVE interrupt controller (KVM)
+ */
+
+static void *kvmppc_xive_mmap(sPAPRXive *xive, int ctrl, size_t len,
+                                 Error **errp)
+{
+    Error *local_err = NULL;
+    void *addr;
+    int fd;
+
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL, ctrl, &fd, false,
+                      &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    addr = mmap(NULL, len, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
+    close(fd);
+    if (addr == MAP_FAILED) {
+        error_setg_errno(errp, errno, "Unable to set XIVE mmaping");
+        return NULL;
+    }
+
+    return addr;
+}
+
+/*
+ * All the XIVE memory regions are now backed by mappings from the KVM
+ * XIVE device.
+ */
+void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
+{
+    XiveSource *xsrc = &xive->source;
+    XiveENDSource *end_xsrc = &xive->end_source;
+    Error *local_err = NULL;
+    size_t esb_len;
+    size_t tima_len;
+
+    if (!kvm_enabled() || !kvmppc_has_cap_xive()) {
+        error_setg(errp,
+                   "IRQ_XIVE capability must be present for KVM XIVE device");
+        return;
+    }
+
+    /* First, create the KVM XIVE device */
+    xive->fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_XIVE, false);
+    if (xive->fd < 0) {
+        error_setg_errno(errp, -xive->fd, "error creating KVM XIVE device");
+        return;
+    }
+
+    /* Source ESBs KVM mapping
+     *
+     * Inform KVM where we will map the ESB pages. This is needed by
+     * the H_INT_GET_SOURCE_INFO hcall which returns the source
+     * characteristics, among which the ESB page address.
+     */
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL, KVM_DEV_XIVE_VC_BASE,
+                      &xive->vc_base, true, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
+    xsrc->esb_mmap = kvmppc_xive_mmap(xive, KVM_DEV_XIVE_GET_ESB_FD,
+                                      esb_len, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    memory_region_init_ram_device_ptr(&xsrc->esb_mmio, OBJECT(xsrc),
+                                      "xive.esb", esb_len, xsrc->esb_mmap);
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
+
+    /* END ESBs mapping (No KVM) */
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
+
+    /* TIMA KVM mapping
+     *
+     * We could also inform KVM where the TIMA will be mapped but as
+     * this is a fixed MMIO address for the system it does not seem
+     * necessary to provide a KVM ioctl to change it.
+     */
+    tima_len = 4ull << TM_SHIFT;
+    xive->tm_mmap = kvmppc_xive_mmap(xive, KVM_DEV_XIVE_GET_TIMA_FD,
+                                     tima_len, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    memory_region_init_ram_device_ptr(&xive->tm_mmio, OBJECT(xive),
+                                      "xive.tima", tima_len, xive->tm_mmap);
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
+
+    kvm_kernel_irqchip = true;
+    kvm_msi_via_irqfd_allowed = true;
+    kvm_gsi_direct_mapping = true;
+
+    /* Map all regions */
+    spapr_xive_map_mmio(xive);
+}
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 7638592da20f..2788f9210144 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -15,6 +15,7 @@
 #include "sysemu/dma.h"
 #include "hw/qdev-properties.h"
 #include "monitor/monitor.h"
+#include "hw/boards.h"
 #include "hw/ppc/xive.h"
 #include "hw/ppc/xive_regs.h"
 
@@ -511,6 +512,15 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /* Connect the presenter to the VCPU (required for CPU hotplug) */
+    if (kvmppc_xive_enabled()) {
+        kvmppc_xive_cpu_connect(tctx, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
     qemu_register_reset(xive_tctx_reset, dev);
 }
 
@@ -927,6 +937,10 @@ static void xive_source_reset(void *dev)
 
     /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
     memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
+
+    if (kvmppc_xive_enabled()) {
+        kvmppc_xive_source_reset(xsrc, &error_fatal);
+    }
 }
 
 static void xive_source_realize(DeviceState *dev, Error **errp)
@@ -934,6 +948,7 @@ static void xive_source_realize(DeviceState *dev, Error **errp)
     XiveSource *xsrc = XIVE_SOURCE(dev);
     Object *obj;
     Error *local_err = NULL;
+    qemu_irq_handler irq_handler;
 
     obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
     if (!obj) {
@@ -960,12 +975,17 @@ static void xive_source_realize(DeviceState *dev, Error **errp)
     xsrc->status = g_malloc0(xsrc->nr_irqs);
     xsrc->lsi_map = bitmap_new(xsrc->nr_irqs);
 
-    memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
-                          &xive_source_esb_ops, xsrc, "xive.esb",
-                          (1ull << xsrc->esb_shift) * xsrc->nr_irqs);
+    if (kvmppc_xive_enabled()) {
+        irq_handler = kvmppc_xive_source_set_irq;
+    } else {
+        irq_handler = xive_source_set_irq;
+
+        memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
+                              &xive_source_esb_ops, xsrc, "xive.esb",
+                              (1ull << xsrc->esb_shift) * xsrc->nr_irqs);
+    }
 
-    xsrc->qirqs = qemu_allocate_irqs(xive_source_set_irq, xsrc,
-                                     xsrc->nr_irqs);
+    xsrc->qirqs = qemu_allocate_irqs(irq_handler, xsrc, xsrc->nr_irqs);
 
     qemu_register_reset(xive_source_reset, dev);
 }
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 817dd1b2c442..3cdc66484f42 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1125,8 +1125,11 @@ static void spapr_dt_ov5_platform_support(sPAPRMachineState *spapr, void *fdt,
         } else {
             val[3] = 0x00; /* Hash */
         }
-        /* No KVM support */
-        val[1] = SPAPR_OV5_XIVE_LEGACY;
+        if (kvmppc_has_cap_xive()) {
+            val[1] = smc->irq->ov5;
+        } else {
+            val[1] = SPAPR_OV5_XIVE_LEGACY;
+        }
     } else {
         val[1] = smc->irq->ov5;
 
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 8eead17c8f36..94ee3ec6a9f4 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -268,17 +268,8 @@ static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
 static void spapr_irq_init_xive(sPAPRMachineState *spapr, int nr_irqs,
                                 Error **errp)
 {
-    MachineState *machine = MACHINE(spapr);
     Error *local_err = NULL;
 
-    /* No KVM support */
-    if (kvm_enabled()) {
-        if (machine_kernel_irqchip_required(machine)) {
-            error_setg(errp, "kernel_irqchip requested. no XIVE support");
-            return;
-        }
-    }
-
     spapr->xive = spapr_xive_create(spapr, nr_irqs,
                                     spapr_max_server_number(spapr), &local_err);
     if (local_err) {
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index f81327d6cd47..3b7cf106242b 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -86,6 +86,7 @@ static int cap_fixup_hcalls;
 static int cap_htm;             /* Hardware transactional memory support */
 static int cap_mmu_radix;
 static int cap_mmu_hash_v3;
+static int cap_xive;
 static int cap_resize_hpt;
 static int cap_ppc_pvr_compat;
 static int cap_ppc_safe_cache;
@@ -149,6 +150,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_htm = kvm_vm_check_extension(s, KVM_CAP_PPC_HTM);
     cap_mmu_radix = kvm_vm_check_extension(s, KVM_CAP_PPC_MMU_RADIX);
     cap_mmu_hash_v3 = kvm_vm_check_extension(s, KVM_CAP_PPC_MMU_HASH_V3);
+    cap_xive = kvm_vm_check_extension(s, KVM_CAP_PPC_IRQ_XIVE);
     cap_resize_hpt = kvm_vm_check_extension(s, KVM_CAP_SPAPR_RESIZE_HPT);
     kvmppc_get_cpu_characteristics(s);
     cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
@@ -2385,6 +2387,11 @@ static int parse_cap_ppc_safe_indirect_branch(struct kvm_ppc_cpu_char c)
     return 0;
 }
 
+bool kvmppc_has_cap_xive(void)
+{
+    return cap_xive;
+}
+
 static void kvmppc_get_cpu_characteristics(KVMState *s)
 {
     struct kvm_ppc_cpu_char c;
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 301a8e972d91..23126c199178 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -39,6 +39,7 @@ obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
 obj-$(CONFIG_XIVE) += xive.o
 obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
+obj-$(CONFIG_XIVE_KVM) += spapr_xive_kvm.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 25/37] spapr/xive: add state synchronization with KVM
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (23 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 24/37] spapr/xive: add KVM support Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 26/37] spapr/xive: introduce a VM state change handler Cédric Le Goater
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

This extends the KVM XIVE device backend with 'synchronize_state'
methods used to retrieve the state from KVM. The HW state of the
sources, the KVM device and the thread interrupt contexts are
collected for the monitor usage and also migration.

These get operations rely on their KVM counterpart in the host kernel
which acts as a proxy for OPAL, the host firmware. The set operations
will be added for migration support later.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |   9 ++
 include/hw/ppc/xive.h       |   1 +
 hw/intc/spapr_xive.c        |  20 ++--
 hw/intc/spapr_xive_kvm.c    | 198 ++++++++++++++++++++++++++++++++++++
 hw/intc/xive.c              |   4 +
 5 files changed, 223 insertions(+), 9 deletions(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index ced187ee49e5..bd81bb4d7608 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -45,6 +45,14 @@ bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
 bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
 qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
+bool spapr_xive_priority_is_reserved(uint8_t priority);
+
+void spapr_xive_cpu_to_nvt(sPAPRXive *xive, PowerPCCPU *cpu,
+                           uint8_t *out_nvt_blk, uint32_t *out_nvt_idx);
+void spapr_xive_cpu_to_end(sPAPRXive *xive, PowerPCCPU *cpu, uint8_t prio,
+                           uint8_t *out_end_blk, uint32_t *out_end_idx);
+int spapr_xive_target_to_end(sPAPRXive *xive, uint32_t target, uint8_t prio,
+                             uint8_t *out_end_blk, uint32_t *out_end_idx);
 
 typedef struct sPAPRMachineState sPAPRMachineState;
 
@@ -58,5 +66,6 @@ void spapr_xive_map_mmio(sPAPRXive *xive);
  * KVM XIVE device helpers
  */
 void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
+void kvmppc_xive_synchronize_state(sPAPRXive *xive);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 3684d8e4f6be..7330c11d31c8 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -447,5 +447,6 @@ static inline bool kvmppc_xive_enabled(void)
 void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
 void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
+void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx);
 
 #endif /* PPC_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 256108914001..87f60dd4e453 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -48,8 +48,8 @@ static uint32_t spapr_xive_nvt_to_target(sPAPRXive *xive, uint8_t nvt_blk,
     return nvt_idx - SPAPR_XIVE_NVT_BASE;
 }
 
-static void spapr_xive_cpu_to_nvt(sPAPRXive *xive, PowerPCCPU *cpu,
-                                  uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
+void spapr_xive_cpu_to_nvt(sPAPRXive *xive, PowerPCCPU *cpu,
+                           uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
 {
     XiveRouter *xrtr = XIVE_ROUTER(xive);
 
@@ -82,9 +82,8 @@ static int spapr_xive_target_to_nvt(sPAPRXive *xive, uint32_t target,
  * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
  * priorities per CPU
  */
-static void spapr_xive_cpu_to_end(sPAPRXive *xive, PowerPCCPU *cpu,
-                                  uint8_t prio, uint8_t *out_end_blk,
-                                  uint32_t *out_end_idx)
+void spapr_xive_cpu_to_end(sPAPRXive *xive, PowerPCCPU *cpu, uint8_t prio,
+                           uint8_t *out_end_blk, uint32_t *out_end_idx)
 {
     XiveRouter *xrtr = XIVE_ROUTER(xive);
 
@@ -100,9 +99,8 @@ static void spapr_xive_cpu_to_end(sPAPRXive *xive, PowerPCCPU *cpu,
     }
 }
 
-static int spapr_xive_target_to_end(sPAPRXive *xive,
-                                    uint32_t target, uint8_t prio,
-                                    uint8_t *out_end_blk, uint32_t *out_end_idx)
+int spapr_xive_target_to_end(sPAPRXive *xive, uint32_t target, uint8_t prio,
+                             uint8_t *out_end_blk, uint32_t *out_end_idx)
 {
    PowerPCCPU *cpu = spapr_find_cpu(target);
 
@@ -141,6 +139,10 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
     XiveSource *xsrc = &xive->source;
     int i;
 
+    if (kvmppc_xive_enabled()) {
+        kvmppc_xive_synchronize_state(xive);
+    }
+
     monitor_printf(mon, "  LSIN         PQ    EISN     CPU/PRIO EQ\n");
 
     for (i = 0; i < xive->nr_irqs; i++) {
@@ -539,7 +541,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn)
  * interrupts (DD2.X POWER9). So we only allow the guest to use
  * priorities [0..6].
  */
-static bool spapr_xive_priority_is_reserved(uint8_t priority)
+bool spapr_xive_priority_is_reserved(uint8_t priority)
 {
     switch (priority) {
     case 0 ... 6:
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 8f773646aa3a..7cdf08ca368c 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -60,6 +60,39 @@ static void kvm_cpu_enable(CPUState *cs)
 /*
  * XIVE Thread Interrupt Management context (KVM)
  */
+static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
+{
+    uint64_t state[4] = { 0 };
+    int ret;
+
+    ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
+    if (ret != 0) {
+        error_setg_errno(errp, errno, "Could capture KVM XIVE CPU %ld state",
+                         kvm_arch_vcpu_id(tctx->cs));
+        return;
+    }
+
+    /* word0 and word1 of the OS ring. */
+    *((uint64_t *) &tctx->regs[TM_QW1_OS]) = state[0];
+
+    /*
+     * KVM also returns word2 containing the OS CAM line which is
+     * interesting to print out in the QEMU monitor.
+     */
+    *((uint64_t *) &tctx->regs[TM_QW1_OS + TM_WORD2]) = state[1];
+}
+
+static void kvmppc_xive_cpu_do_synchronize_state(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    kvmppc_xive_cpu_get_state(arg.host_ptr, &error_fatal);
+}
+
+void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx)
+{
+    run_on_cpu(tctx->cs, kvmppc_xive_cpu_do_synchronize_state,
+               RUN_ON_CPU_HOST_PTR(tctx));
+}
 
 void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
 {
@@ -119,6 +152,34 @@ void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
     }
 }
 
+/*
+ * This is used to perform the magic loads on the ESB pages, described
+ * in xive.h.
+ */
+static uint8_t xive_esb_read(XiveSource *xsrc, int srcno, uint32_t offset)
+{
+    unsigned long addr = (unsigned long) xsrc->esb_mmap +
+        xive_source_esb_mgmt(xsrc, srcno) + offset;
+
+    /* Prevent the compiler from optimizing away the load */
+    volatile uint64_t value = *((uint64_t *) addr);
+
+    return be64_to_cpu(value) & 0x3;
+}
+
+static void kvmppc_xive_source_get_state(XiveSource *xsrc)
+{
+    int i;
+
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        /* Perform a load without side effect to retrieve the PQ bits */
+        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
+
+        /* and save PQ locally */
+        xive_source_esb_set(xsrc, i, pq);
+    }
+}
+
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
 {
     XiveSource *xsrc = opaque;
@@ -149,6 +210,143 @@ void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
 /*
  * sPAPR XIVE interrupt controller (KVM)
  */
+static int kvmppc_xive_get_eq_state(sPAPRXive *xive, CPUState *cs,
+                                       Error **errp)
+{
+    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
+    int ret;
+    int i;
+
+    for (i = 0; i < XIVE_PRIORITY_MAX + 1; i++) {
+        Error *local_err = NULL;
+        struct kvm_ppc_xive_eq kvm_eq = { 0 };
+        uint64_t kvm_eq_idx;
+        XiveEND end = { 0 };
+        uint8_t end_blk, nvt_blk;
+        uint32_t end_idx, nvt_idx;
+
+        /* Skip priorities reserved for the hypervisor */
+        if (spapr_xive_priority_is_reserved(i)) {
+            continue;
+        }
+
+        /* Encode the tuple (server, prio) as a KVM EQ index */
+        kvm_eq_idx = i << KVM_XIVE_EQ_PRIORITY_SHIFT &
+            KVM_XIVE_EQ_PRIORITY_MASK;
+        kvm_eq_idx |= vcpu_id << KVM_XIVE_EQ_SERVER_SHIFT &
+            KVM_XIVE_EQ_SERVER_MASK;
+
+        ret = kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EQ, kvm_eq_idx,
+                                &kvm_eq, false, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return ret;
+        }
+
+        if (!(kvm_eq.flags & KVM_XIVE_EQ_FLAG_ENABLED)) {
+            continue;
+        }
+
+        /* Update the local END structure with the KVM input */
+        if (kvm_eq.flags & KVM_XIVE_EQ_FLAG_ENABLED) {
+            end.w0 |= cpu_to_be32(END_W0_VALID | END_W0_ENQUEUE);
+        }
+        if (kvm_eq.flags & KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY) {
+            end.w0 |= cpu_to_be32(END_W0_UCOND_NOTIFY);
+        }
+        if (kvm_eq.flags & KVM_XIVE_EQ_FLAG_ESCALATE) {
+            end.w0 |= cpu_to_be32(END_W0_ESCALATE_CTL);
+        }
+        end.w0 |= SETFIELD_BE32(END_W0_QSIZE, 0ul, kvm_eq.qsize - 12);
+
+        end.w1 = SETFIELD_BE32(END_W1_GENERATION, 0ul, kvm_eq.qtoggle) |
+            SETFIELD_BE32(END_W1_PAGE_OFF, 0ul, kvm_eq.qindex);
+        end.w2 = cpu_to_be32((kvm_eq.qpage >> 32) & 0x0fffffff);
+        end.w3 = cpu_to_be32(kvm_eq.qpage & 0xffffffff);
+        end.w4 = 0;
+        end.w5 = 0;
+
+        spapr_xive_cpu_to_nvt(xive, POWERPC_CPU(cs), &nvt_blk, &nvt_idx);
+
+        end.w6 = SETFIELD_BE32(END_W6_NVT_BLOCK, 0ul, nvt_blk) |
+            SETFIELD_BE32(END_W6_NVT_INDEX, 0ul, nvt_idx);
+        end.w7 = SETFIELD_BE32(END_W7_F0_PRIORITY, 0ul, i);
+
+        spapr_xive_cpu_to_end(xive, POWERPC_CPU(cs), i, &end_blk, &end_idx);
+
+        assert(end_idx < xive->nr_ends);
+        memcpy(&xive->endt[end_idx], &end, sizeof(XiveEND));
+    }
+
+    return 0;
+}
+
+static void kvmppc_xive_get_eas_state(sPAPRXive *xive, Error **errp)
+{
+    XiveSource *xsrc = &xive->source;
+    int i;
+
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        XiveEAS *eas = &xive->eat[i];
+        XiveEAS new_eas;
+        uint64_t kvm_eas;
+        uint8_t priority;
+        uint32_t server;
+        uint32_t end_idx;
+        uint8_t end_blk;
+        uint32_t eisn;
+        Error *local_err = NULL;
+
+        if (!xive_eas_is_valid(eas)) {
+            continue;
+        }
+
+        kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EAS, i, &kvm_eas, false,
+                          &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+
+        priority = (kvm_eas & KVM_XIVE_EAS_PRIORITY_MASK) >>
+            KVM_XIVE_EAS_PRIORITY_SHIFT;
+        server = (kvm_eas & KVM_XIVE_EAS_SERVER_MASK) >>
+            KVM_XIVE_EAS_SERVER_SHIFT;
+        eisn = (kvm_eas & KVM_XIVE_EAS_EISN_MASK) >> KVM_XIVE_EAS_EISN_SHIFT;
+
+        if (spapr_xive_target_to_end(xive, server, priority, &end_blk,
+                                     &end_idx)) {
+            error_setg(errp, "XIVE: invalid tuple CPU %d priority %d", server,
+                       priority);
+            return;
+        }
+
+        new_eas.w = cpu_to_be64(EAS_VALID);
+        if (kvm_eas & KVM_XIVE_EAS_MASK_MASK) {
+            new_eas.w |= cpu_to_be64(EAS_MASKED);
+        }
+
+        new_eas.w = SETFIELD_BE64(EAS_END_INDEX, new_eas.w, end_idx);
+        new_eas.w = SETFIELD_BE64(EAS_END_BLOCK, new_eas.w, end_blk);
+        new_eas.w = SETFIELD_BE64(EAS_END_DATA, new_eas.w, eisn);
+
+        *eas = new_eas;
+    }
+}
+
+void kvmppc_xive_synchronize_state(sPAPRXive *xive)
+{
+    XiveSource *xsrc = &xive->source;
+    CPUState *cs;
+
+    kvmppc_xive_source_get_state(xsrc);
+
+    kvmppc_xive_get_eas_state(xive, &error_fatal);
+
+    CPU_FOREACH(cs) {
+        kvmppc_xive_get_eq_state(xive, cs, &error_fatal);
+    }
+}
 
 static void *kvmppc_xive_mmap(sPAPRXive *xive, int ctrl, size_t len,
                                  Error **errp)
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 2788f9210144..12a931081cfc 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -427,6 +427,10 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
     int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
     int i;
 
+    if (kvmppc_xive_enabled()) {
+        kvmppc_xive_cpu_synchronize_state(tctx);
+    }
+
     monitor_printf(mon, "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR"
                    "  W2\n", cpu_index);
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 26/37] spapr/xive: introduce a VM state change handler
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (24 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 25/37] spapr/xive: add state synchronization with KVM Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 27/37] spapr/xive: add migration support for KVM Cédric Le Goater
                   ` (11 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

This handler is in charge of stabilizing the flow of event notifications
in the XIVE controller before migrating a guest. This is a requirement
before transferring the guest EQ pages to a destination.

When the VM is stopped, the handler masks the sources (PQ=01) to stop
the flow of events and saves their previous state. The XIVE controller
is then synced through KVM to flush any in-flight event notification
and to stabilize the EQs. At this stage, the EQ pages are marked dirty
to make sure the EQ pages are transferred if a migration sequence is
in progress.

The previous configuration of the sources is restored when the VM
resumes, after a migration or a stop.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |   1 +
 hw/intc/spapr_xive_kvm.c    | 111 +++++++++++++++++++++++++++++++++++-
 2 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index bd81bb4d7608..c447534b6b29 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -39,6 +39,7 @@ typedef struct sPAPRXive {
     /* KVM support */
     int           fd;
     void          *tm_mmap;
+    VMChangeStateEntry *change;
 } sPAPRXive;
 
 bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 7cdf08ca368c..5cb7461e9743 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -334,12 +334,118 @@ static void kvmppc_xive_get_eas_state(sPAPRXive *xive, Error **errp)
     }
 }
 
+/*
+ * Sync the XIVE controller through KVM to flush any in-flight event
+ * notification and stabilize the EQs.
+ */
+ static void kvmppc_xive_sync_all(sPAPRXive *xive, Error **errp)
+{
+    XiveSource *xsrc = &xive->source;
+    Error *local_err = NULL;
+    int i;
+
+    /* Sync the KVM source. This reaches the XIVE HW through OPAL */
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        XiveEAS *eas = &xive->eat[i];
+
+        if (!xive_eas_is_valid(eas)) {
+            continue;
+        }
+
+        kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SYNC, i, NULL, true,
+                          &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+/*
+ * The primary goal of the XIVE VM change handler is to mark the EQ
+ * pages dirty when all XIVE event notifications have stopped.
+ *
+ * Whenever the VM is stopped, the VM change handler masks the sources
+ * (PQ=01) to stop the flow of events and saves the previous state in
+ * anticipation of a migration. The XIVE controller is then synced
+ * through KVM to flush any in-flight event notification and stabilize
+ * the EQs.
+ *
+ * At this stage, we can mark the EQ page dirty and let a migration
+ * sequence transfer the EQ pages to the destination, which is done
+ * just after the stop state.
+ *
+ * The previous configuration of the sources is restored when the VM
+ * runs again.
+ */
+static void kvmppc_xive_change_state_handler(void *opaque, int running,
+                                             RunState state)
+{
+    sPAPRXive *xive = opaque;
+    XiveSource *xsrc = &xive->source;
+    Error *local_err = NULL;
+    int i;
+
+    /*
+     * Restore the sources to their initial state. This is called when
+     * the VM resumes after a stop or a migration.
+     */
+    if (running) {
+        for (i = 0; i < xsrc->nr_irqs; i++) {
+            uint8_t pq = xive_source_esb_get(xsrc, i);
+            if (xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8)) != 0x1) {
+                error_report("XIVE: IRQ %d has an invalid state", i);
+            }
+        }
+
+        return;
+    }
+
+    /*
+     * Mask the sources, to stop the flow of event notifications, and
+     * save the PQs locally in the XiveSource object. The XiveSource
+     * state will be collected later on by its vmstate handler if a
+     * migration is in progress.
+     */
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_01);
+        xive_source_esb_set(xsrc, i, pq);
+    }
+
+    /*
+     * Sync the XIVE controller in KVM, to flush in-flight event
+     * notification that should be enqueued in the EQs.
+     */
+    kvmppc_xive_sync_all(xive, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return;
+    }
+
+    /*
+     * Mark the XIVE EQ pages dirty to collect all updates.
+     */
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL,
+                      KVM_DEV_XIVE_SAVE_EQ_PAGES, NULL, true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+}
+
 void kvmppc_xive_synchronize_state(sPAPRXive *xive)
 {
     XiveSource *xsrc = &xive->source;
     CPUState *cs;
 
-    kvmppc_xive_source_get_state(xsrc);
+    /*
+     * When the VM is stopped, the sources are masked and the previous
+     * state is saved in anticipation of a migration. We should not
+     * synchronize the source state in that case else we will override
+     * the saved state.
+     */
+    if (runstate_is_running()) {
+        kvmppc_xive_source_get_state(xsrc);
+    }
 
     kvmppc_xive_get_eas_state(xive, &error_fatal);
 
@@ -442,6 +548,9 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
                                       "xive.tima", tima_len, xive->tm_mmap);
     sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
 
+    xive->change = qemu_add_vm_change_state_handler(
+        kvmppc_xive_change_state_handler, xive);
+
     kvm_kernel_irqchip = true;
     kvm_msi_via_irqfd_allowed = true;
     kvm_gsi_direct_mapping = true;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 27/37] spapr/xive: add migration support for KVM
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (25 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 26/37] spapr/xive: introduce a VM state change handler Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 28/37] spapr/xive: fix migration of the XiveTCTX under TCG Cédric Le Goater
                   ` (10 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

When the VM is stopped, the VM state handler stabilizes the XIVE IC
and marks the EQ pages dirty. These are then transfered to destination
before the transfer of the device vmstates starts.

The sPAPRXive interrupt controller model captures the XIVE internal
tables, EAT and ENDT and the XiveTCTX model does the same for the
thread interrupt context registers.

At restart, the sPAPRXive 'post_load' method restores all the XIVE
states. It is called by the sPAPR machine 'post_load' method, when all
XIVE states have been transferred and loaded.

Finally, the source states are restored in the VM change state handler
when the machine reaches the running state.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |   5 +
 include/hw/ppc/xive.h       |   1 +
 hw/intc/spapr_xive.c        |  34 +++++++
 hw/intc/spapr_xive_kvm.c    | 187 +++++++++++++++++++++++++++++++++++-
 hw/intc/xive.c              |  17 ++++
 hw/ppc/spapr_irq.c          |   2 +-
 6 files changed, 244 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index c447534b6b29..277771eeb4d5 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -47,6 +47,7 @@ bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
 qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
 bool spapr_xive_priority_is_reserved(uint8_t priority);
+int spapr_xive_post_load(sPAPRXive *xive, int version_id);
 
 void spapr_xive_cpu_to_nvt(sPAPRXive *xive, PowerPCCPU *cpu,
                            uint8_t *out_nvt_blk, uint32_t *out_nvt_idx);
@@ -54,6 +55,8 @@ void spapr_xive_cpu_to_end(sPAPRXive *xive, PowerPCCPU *cpu, uint8_t prio,
                            uint8_t *out_end_blk, uint32_t *out_end_idx);
 int spapr_xive_target_to_end(sPAPRXive *xive, uint32_t target, uint8_t prio,
                              uint8_t *out_end_blk, uint32_t *out_end_idx);
+int spapr_xive_end_to_target(sPAPRXive *xive, uint8_t end_blk, uint32_t end_idx,
+                             uint32_t *out_server, uint8_t *out_prio);
 
 typedef struct sPAPRMachineState sPAPRMachineState;
 
@@ -68,5 +71,7 @@ void spapr_xive_map_mmio(sPAPRXive *xive);
  */
 void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
 void kvmppc_xive_synchronize_state(sPAPRXive *xive);
+int kvmppc_xive_pre_save(sPAPRXive *xive);
+int kvmppc_xive_post_load(sPAPRXive *xive, int version_id);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 7330c11d31c8..d06bcae28e9d 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -448,5 +448,6 @@ void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
 void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
 void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx);
+void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp);
 
 #endif /* PPC_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 87f60dd4e453..b030cfe7f136 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -82,6 +82,19 @@ static int spapr_xive_target_to_nvt(sPAPRXive *xive, uint32_t target,
  * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
  * priorities per CPU
  */
+int spapr_xive_end_to_target(sPAPRXive *xive, uint8_t end_blk, uint32_t end_idx,
+                             uint32_t *out_server, uint8_t *out_prio)
+{
+    if (out_server) {
+        *out_server = end_idx >> 3;
+    }
+
+    if (out_prio) {
+        *out_prio = end_idx & 0x7;
+    }
+    return 0;
+}
+
 void spapr_xive_cpu_to_end(sPAPRXive *xive, PowerPCCPU *cpu, uint8_t prio,
                            uint8_t *out_end_blk, uint32_t *out_end_idx)
 {
@@ -426,10 +439,31 @@ static const VMStateDescription vmstate_spapr_xive_eas = {
     },
 };
 
+static int vmstate_spapr_xive_pre_save(void *opaque)
+{
+    if (kvmppc_xive_enabled()) {
+        return kvmppc_xive_pre_save(SPAPR_XIVE(opaque));
+    }
+
+    return 0;
+}
+
+/* Called by the sPAPR machine 'post_load' method */
+int spapr_xive_post_load(sPAPRXive *xive, int version_id)
+{
+    if (kvmppc_xive_enabled()) {
+        return kvmppc_xive_post_load(xive, version_id);
+    }
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_spapr_xive = {
     .name = TYPE_SPAPR_XIVE,
     .version_id = 1,
     .minimum_version_id = 1,
+    .pre_save = vmstate_spapr_xive_pre_save,
+    .post_load = NULL, /* handled at the machine level */
     .fields = (VMStateField[]) {
         VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
         VMSTATE_STRUCT_VARRAY_POINTER_UINT32(eat, sPAPRXive, nr_irqs,
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 5cb7461e9743..04f997479e8f 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -60,7 +60,29 @@ static void kvm_cpu_enable(CPUState *cs)
 /*
  * XIVE Thread Interrupt Management context (KVM)
  */
-static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
+
+static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, Error **errp)
+{
+    uint64_t state[4];
+    int ret;
+
+    /* word0 and word1 of the OS ring. */
+    state[0] = *((uint64_t *) &tctx->regs[TM_QW1_OS]);
+
+    /*
+     * OS CAM line. Used by KVM to print out the VP identifier. This
+     * is for debug only.
+     */
+    state[1] = *((uint64_t *) &tctx->regs[TM_QW1_OS + TM_WORD2]);
+
+    ret = kvm_set_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
+    if (ret != 0) {
+        error_setg_errno(errp, errno, "Could restore KVM XIVE CPU %ld state",
+                         kvm_arch_vcpu_id(tctx->cs));
+    }
+}
+
+void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
 {
     uint64_t state[4] = { 0 };
     int ret;
@@ -210,6 +232,59 @@ void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
 /*
  * sPAPR XIVE interrupt controller (KVM)
  */
+
+static int kvmppc_xive_set_eq_state(sPAPRXive *xive, CPUState *cs,
+                                       Error **errp)
+{
+    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
+    int ret;
+    int i;
+
+    for (i = 0; i < XIVE_PRIORITY_MAX + 1; i++) {
+        Error *local_err = NULL;
+        XiveEND *end;
+        uint8_t end_blk;
+        uint32_t end_idx;
+        struct kvm_ppc_xive_eq kvm_eq = { 0 };
+        uint64_t kvm_eq_idx;
+
+        if (spapr_xive_priority_is_reserved(i)) {
+            continue;
+        }
+
+        spapr_xive_cpu_to_end(xive, POWERPC_CPU(cs), i, &end_blk, &end_idx);
+        assert(end_idx < xive->nr_ends);
+        end = &xive->endt[end_idx];
+
+        if (!xive_end_is_valid(end)) {
+            continue;
+        }
+
+        /* Build the KVM state from the local END structure */
+        kvm_eq.flags   = KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY;
+        kvm_eq.qsize   = GETFIELD_BE32(END_W0_QSIZE, end->w0) + 12;
+        kvm_eq.qpage   = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32 |
+            be32_to_cpu(end->w3);
+        kvm_eq.qtoggle = GETFIELD_BE32(END_W1_GENERATION, end->w1);
+        kvm_eq.qindex  = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
+
+        /* Encode the tuple (server, prio) as a KVM EQ index */
+        kvm_eq_idx = i << KVM_XIVE_EQ_PRIORITY_SHIFT &
+            KVM_XIVE_EQ_PRIORITY_MASK;
+        kvm_eq_idx |= vcpu_id << KVM_XIVE_EQ_SERVER_SHIFT &
+            KVM_XIVE_EQ_SERVER_MASK;
+
+        ret = kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EQ, kvm_eq_idx,
+                                &kvm_eq, true, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 static int kvmppc_xive_get_eq_state(sPAPRXive *xive, CPUState *cs,
                                        Error **errp)
 {
@@ -281,6 +356,48 @@ static int kvmppc_xive_get_eq_state(sPAPRXive *xive, CPUState *cs,
     return 0;
 }
 
+static void kvmppc_xive_set_eas_state(sPAPRXive *xive, Error **errp)
+{
+    XiveSource *xsrc = &xive->source;
+    int i;
+
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        XiveEAS *eas = &xive->eat[i];
+        uint32_t end_idx;
+        uint32_t end_blk;
+        uint32_t eisn;
+        uint8_t priority;
+        uint32_t server;
+        uint64_t kvm_eas;
+        Error *local_err = NULL;
+
+        /* No need to set MASKED EAS, this is the default state after reset */
+        if (!xive_eas_is_valid(eas) || xive_eas_is_masked(eas)) {
+            continue;
+        }
+
+        end_idx = GETFIELD_BE64(EAS_END_INDEX, eas->w);
+        end_blk = GETFIELD_BE64(EAS_END_BLOCK, eas->w);
+        eisn = GETFIELD_BE64(EAS_END_DATA, eas->w);
+
+        spapr_xive_end_to_target(xive, end_blk, end_idx, &server, &priority);
+
+        kvm_eas = priority << KVM_XIVE_EAS_PRIORITY_SHIFT &
+            KVM_XIVE_EAS_PRIORITY_MASK;
+        kvm_eas |= server << KVM_XIVE_EAS_SERVER_SHIFT &
+            KVM_XIVE_EAS_SERVER_MASK;
+        kvm_eas |= ((uint64_t)eisn << KVM_XIVE_EAS_EISN_SHIFT) &
+            KVM_XIVE_EAS_EISN_MASK;
+
+        kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EAS, i, &kvm_eas, true,
+                          &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
 static void kvmppc_xive_get_eas_state(sPAPRXive *xive, Error **errp)
 {
     XiveSource *xsrc = &xive->source;
@@ -432,6 +549,74 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
     }
 }
 
+int kvmppc_xive_pre_save(sPAPRXive *xive)
+{
+    Error *local_err = NULL;
+    CPUState *cs;
+
+    /* Grab the EAT */
+    kvmppc_xive_get_eas_state(xive, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -1;
+    }
+
+    /*
+     * Grab the ENDT. The EQ index and the toggle bit are what we want
+     * to capture.
+     */
+    CPU_FOREACH(cs) {
+        kvmppc_xive_get_eq_state(xive, cs, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * The sPAPRXive 'post_load' method is called by the sPAPR machine
+ * 'post_load' method, when all XIVE states have been transferred and
+ * loaded.
+ */
+int kvmppc_xive_post_load(sPAPRXive *xive, int version_id)
+{
+    Error *local_err = NULL;
+    CPUState *cs;
+
+    /* Restore the ENDT first. The targetting depends on it. */
+    CPU_FOREACH(cs) {
+        kvmppc_xive_set_eq_state(xive, cs, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return -1;
+        }
+    }
+
+    /* Restore the EAT */
+    kvmppc_xive_set_eas_state(xive, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -1;
+    }
+
+    /* Restore the thread interrupt contexts */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        kvmppc_xive_cpu_set_state(XIVE_TCTX(cpu->intc), &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return -1;
+        }
+    }
+
+    /* The source states will be restored when the machine starts running */
+    return 0;
+}
+
 void kvmppc_xive_synchronize_state(sPAPRXive *xive)
 {
     XiveSource *xsrc = &xive->source;
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 12a931081cfc..cde46b5ca161 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -533,10 +533,27 @@ static void xive_tctx_unrealize(DeviceState *dev, Error **errp)
     qemu_unregister_reset(xive_tctx_reset, dev);
 }
 
+static int vmstate_xive_tctx_pre_save(void *opaque)
+{
+    Error *local_err = NULL;
+
+    if (kvmppc_xive_enabled()) {
+        kvmppc_xive_cpu_get_state(XIVE_TCTX(opaque), &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_xive_tctx = {
     .name = TYPE_XIVE_TCTX,
     .version_id = 1,
     .minimum_version_id = 1,
+    .pre_save = vmstate_xive_tctx_pre_save,
+    .post_load = NULL, /* handled by the sPAPRxive model */
     .fields = (VMStateField[]) {
         VMSTATE_BUFFER(regs, XiveTCTX),
         VMSTATE_END_OF_LIST()
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 94ee3ec6a9f4..7b401dc1d47c 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -326,7 +326,7 @@ static Object *spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
 
 static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 {
-    return 0;
+    return spapr_xive_post_load(spapr->xive, version_id);
 }
 
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 28/37] spapr/xive: fix migration of the XiveTCTX under TCG
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (26 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 27/37] spapr/xive: add migration support for KVM Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 29/37] spapr: set the interrupt presenter at reset Cédric Le Goater
                   ` (9 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

When the thread interrupt management state is retrieved from the KVM
VCPU, word2 is saved under the QEMU XIVE thread context to print out
the OS CAM line under the QEMU monitor.

This breaks the migration on a TCG guest (or on KVM with
kernel_irqchip=off) because the matching algorithm of the presenter
relies on the OS CAM value. Fix with an extra reset of the thread
contexts to restore the expected value.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr_irq.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 7b401dc1d47c..951d4ff1350a 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -326,7 +326,25 @@ static Object *spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
 
 static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 {
-    return spapr_xive_post_load(spapr->xive, version_id);
+    int ret;
+
+    ret = spapr_xive_post_load(spapr->xive, version_id);
+    if (ret) {
+        return ret;
+    }
+
+    /*
+     * When the states are collected from the KVM XIVE device, word2
+     * of the XiveTCTX is set to print out the OS CAM line under the
+     * QEMU monitor.
+     *
+     * This breaks the migration on a TCG guest (or on KVM with
+     * kernel_irqchip=off) because the matching algorithm of the
+     * presenter relies on the OS CAM value. Fix with an extra reset
+     * of the thread contexts to restore the expected value.
+     */
+    spapr_xive_reset_tctx(spapr->xive);
+    return 0;
 }
 
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 29/37] spapr: set the interrupt presenter at reset
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (27 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 28/37] spapr/xive: fix migration of the XiveTCTX under TCG Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 30/37] spapr/xive: enable XIVE MMIOs " Cédric Le Goater
                   ` (8 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Currently, the interrupt presenter of the vCPU is set at realize
time. Setting it at reset will become useful when the new machine
supporting both interrupt modes is introduced. In this machine, the
interrupt mode is chosen at CAS time and activated after a reset.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_cpu_core.h |  2 ++
 hw/ppc/spapr_cpu_core.c         | 26 ++++++++++++++++++++++++++
 hw/ppc/spapr_irq.c              | 11 +++++++++++
 3 files changed, 39 insertions(+)

diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index 9e2821e4b31f..fc8ea9021656 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -53,4 +53,6 @@ static inline sPAPRCPUState *spapr_cpu_state(PowerPCCPU *cpu)
     return (sPAPRCPUState *)cpu->machine_data;
 }
 
+void spapr_cpu_core_set_intc(PowerPCCPU *cpu, const char *intc_type);
+
 #endif
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 1811cd48db90..529de0b6b9c8 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -398,3 +398,29 @@ static const TypeInfo spapr_cpu_core_type_infos[] = {
 };
 
 DEFINE_TYPES(spapr_cpu_core_type_infos)
+
+typedef struct ForeachFindIntCArgs {
+    const char *intc_type;
+    Object *intc;
+} ForeachFindIntCArgs;
+
+static int spapr_cpu_core_find_intc(Object *child, void *opaque)
+{
+    ForeachFindIntCArgs *args = opaque;
+
+    if (object_dynamic_cast(child, args->intc_type)) {
+        args->intc = child;
+    }
+
+    return args->intc != NULL;
+}
+
+void spapr_cpu_core_set_intc(PowerPCCPU *cpu, const char *intc_type)
+{
+    ForeachFindIntCArgs args = { intc_type, NULL };
+
+    object_child_foreach(OBJECT(cpu), spapr_cpu_core_find_intc, &args);
+    g_assert(args.intc);
+
+    cpu->intc = args.intc;
+}
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 951d4ff1350a..f9b5564b271c 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -211,6 +211,11 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
 
 static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
 {
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
+    }
 }
 
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
@@ -349,6 +354,12 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
 {
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        spapr_cpu_core_set_intc(POWERPC_CPU(cs), TYPE_XIVE_TCTX);
+    }
+
     /*
      * Set the OS CAM line of the cpu interrupt thread context. Needs
      * to come after the XiveTCTX reset handlers.
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 30/37] spapr/xive: enable XIVE MMIOs at reset
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (28 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 29/37] spapr: set the interrupt presenter at reset Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 31/37] spapr: add a 'pseries-3.1-dual' machine type Cédric Le Goater
                   ` (7 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Depending on the interrupt mode chosen, enable or disable the XIVE
MMIOs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h | 1 +
 hw/intc/spapr_xive.c        | 9 +++++++++
 hw/ppc/spapr_irq.c          | 8 ++++++++
 3 files changed, 18 insertions(+)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 277771eeb4d5..735e916d3844 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -65,6 +65,7 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle);
 void spapr_xive_reset_tctx(sPAPRXive *xive);
 void spapr_xive_map_mmio(sPAPRXive *xive);
+void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable);
 
 /*
  * KVM XIVE device helpers
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index b030cfe7f136..68d2a6fd8177 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -196,6 +196,15 @@ void spapr_xive_map_mmio(sPAPRXive *xive)
     sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
 }
 
+void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable)
+{
+    memory_region_set_enabled(&xive->source.esb_mmio, enable);
+    memory_region_set_enabled(&xive->tm_mmio, enable);
+
+    /* Disable the END ESBs until a guest OS makes use of them */
+    memory_region_set_enabled(&xive->end_source.esb_mmio, false);
+}
+
 /*
  * When a Virtual Processor is scheduled to run on a HW thread, the
  * hypervisor pushes its identifier in the OS CAM line. Emulate the
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index f9b5564b271c..94bb4d27758a 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -216,6 +216,11 @@ static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
     CPU_FOREACH(cs) {
         spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
     }
+
+    /* Deactivate the XIVE MMIOs */
+    if (spapr->xive) {
+        spapr_xive_enable_mmio(spapr->xive, false);
+    }
 }
 
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
@@ -365,6 +370,9 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
      * to come after the XiveTCTX reset handlers.
      */
     spapr_xive_reset_tctx(spapr->xive);
+
+    /* Activate the XIVE MMIOs */
+    spapr_xive_enable_mmio(spapr->xive, true);
 }
 
 /*
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 31/37] spapr: add a 'pseries-3.1-dual' machine type
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (29 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 30/37] spapr/xive: enable XIVE MMIOs " Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 32/37] ppc/xics: introduce a icp_kvm_connect() routine Cédric Le Goater
                   ` (6 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

This pseries machine makes use of a new sPAPR IRQ backend supporting
both interrupt modes : XIVE and XICS, the default being XICS.

The interrupt mode is chosen by the CAS negotiation process and
activated after a reset to take into account the required changes in
the machine. These impact the device tree layout, the interrupt
presenter object and the exposed MMIO regions in the case of XIVE.

KVM is not yet supported.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h |   1 +
 hw/ppc/spapr.c             |  18 ++++-
 hw/ppc/spapr_hcall.c       |  13 ++++
 hw/ppc/spapr_irq.c         | 142 +++++++++++++++++++++++++++++++++++++
 4 files changed, 173 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 26727a7263a5..af429148ce2d 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -51,6 +51,7 @@ typedef struct sPAPRIrq {
 extern sPAPRIrq spapr_irq_xics;
 extern sPAPRIrq spapr_irq_xics_legacy;
 extern sPAPRIrq spapr_irq_xive;
+extern sPAPRIrq spapr_irq_dual;
 
 void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3cdc66484f42..232956116518 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2634,7 +2634,8 @@ static void spapr_machine_init(MachineState *machine)
     spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
 
     /* advertise XIVE */
-    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT) {
+    if (smc->irq->ov5 == SPAPR_OV5_XIVE_EXPLOIT ||
+        smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH) {
         spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
     }
 
@@ -4002,6 +4003,21 @@ static void spapr_machine_3_1_xive_class_options(MachineClass *mc)
 
 DEFINE_SPAPR_MACHINE(3_1_xive, "3.1-xive", false);
 
+static void spapr_machine_3_1_dual_instance_options(MachineState *machine)
+{
+    spapr_machine_3_1_instance_options(machine);
+}
+
+static void spapr_machine_3_1_dual_class_options(MachineClass *mc)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
+    spapr_machine_3_1_class_options(mc);
+    smc->irq = &spapr_irq_dual;
+}
+
+DEFINE_SPAPR_MACHINE(3_1_dual, "3.1-dual", false);
+
 /*
  * pseries-3.0
  */
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index ae913d070f50..186b6a65543f 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1654,6 +1654,19 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
             (spapr_h_cas_compose_response(spapr, args[1], args[2],
                                           ov5_updates) != 0);
     }
+
+    /*
+     * Generate a machine reset when we have an update of the
+     * interrupt mode. Only required on the machine supporting both
+     * mode.
+     */
+    if (!spapr->cas_reboot) {
+        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+
+        spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)
+            && smc->irq->ov5 == SPAPR_OV5_XIVE_BOTH;
+    }
+
     spapr_ovec_cleanup(ov5_updates);
 
     if (spapr->cas_reboot) {
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 94bb4d27758a..157c335f6f8d 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -399,6 +399,148 @@ sPAPRIrq spapr_irq_xive = {
     .reset       = spapr_irq_reset_xive,
 };
 
+/*
+ * Dual XIVE and XICS IRQ backend.
+ *
+ * Both interrupt mode, XIVE and XICS, objects are created but the
+ * machine starts in legacy interrupt mode (XICS). It can be changed
+ * by the CAS negotiation process and, in that case, the new mode is
+ * activated after extra machine reset.
+ */
+
+/*
+ * Returns the sPAPR IRQ backend negotiated by CAS. XICS is the
+ * default.
+ */
+static sPAPRIrq *spapr_irq_current(sPAPRMachineState *spapr)
+{
+    return spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT) ?
+        &spapr_irq_xive : &spapr_irq_xics;
+}
+
+static void spapr_irq_init_dual(sPAPRMachineState *spapr, int nr_irqs,
+                                Error **errp)
+{
+    MachineState *machine = MACHINE(spapr);
+    Error *local_err = NULL;
+
+    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
+        error_setg(errp, "No KVM support for the 'dual' machine");
+        return;
+    }
+
+    spapr_irq_xics.init(spapr, spapr_irq_xics.nr_irqs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    spapr_irq_xive.init(spapr, spapr_irq_xive.nr_irqs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+static int spapr_irq_claim_dual(sPAPRMachineState *spapr, int irq, bool lsi,
+                                Error **errp)
+{
+    int ret;
+    Error *local_err = NULL;
+
+    ret = spapr_irq_xive.claim(spapr, irq, lsi, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return ret;
+    }
+
+    ret = spapr_irq_xics.claim(spapr, irq, lsi, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+
+    return ret;
+}
+
+static void spapr_irq_free_dual(sPAPRMachineState *spapr, int irq, int num)
+{
+    spapr_irq_xive.free(spapr, irq, num);
+    spapr_irq_xics.free(spapr, irq, num);
+}
+
+static qemu_irq spapr_qirq_dual(sPAPRMachineState *spapr, int irq)
+{
+    return spapr_irq_current(spapr)->qirq(spapr, irq);
+}
+
+static void spapr_irq_print_info_dual(sPAPRMachineState *spapr, Monitor *mon)
+{
+    spapr_irq_current(spapr)->print_info(spapr, mon);
+}
+
+static void spapr_irq_dt_populate_dual(sPAPRMachineState *spapr,
+                                       uint32_t nr_servers, void *fdt,
+                                       uint32_t phandle)
+{
+    spapr_irq_current(spapr)->dt_populate(spapr, nr_servers, fdt, phandle);
+}
+
+static Object *spapr_irq_cpu_intc_create_dual(sPAPRMachineState *spapr,
+                                              Object *cpu, Error **errp)
+{
+    Error *local_err = NULL;
+
+    spapr_irq_xive.cpu_intc_create(spapr, cpu, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    /* Default to XICS interrupt mode */
+    return spapr_irq_xics.cpu_intc_create(spapr, cpu, errp);
+}
+
+static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
+{
+    /*
+     * Force a reset of the XIVE backend after migration. The machine
+     * defaults to XICS at startup.
+     */
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        spapr_irq_xive.reset(spapr, &error_fatal);
+    }
+
+    return spapr_irq_current(spapr)->post_load(spapr, version_id);
+}
+
+static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
+{
+    /*
+     * Reset the selected interrupt mode to reconnect to KVM. At startup,
+     * the machine uses XICS as it is the default interrupt mode.
+     */
+    spapr_irq_current(spapr)->reset(spapr, errp);
+}
+
+#define SPAPR_IRQ_DUAL_NR_IRQS     0x2000
+#define SPAPR_IRQ_DUAL_NR_MSIS     (SPAPR_IRQ_DUAL_NR_IRQS - SPAPR_IRQ_MSI)
+
+sPAPRIrq spapr_irq_dual = {
+    .nr_irqs     = SPAPR_IRQ_DUAL_NR_IRQS,
+    .nr_msis     = SPAPR_IRQ_DUAL_NR_MSIS,
+    .ov5         = SPAPR_OV5_XIVE_BOTH,
+
+    .init        = spapr_irq_init_dual,
+    .claim       = spapr_irq_claim_dual,
+    .free        = spapr_irq_free_dual,
+    .qirq        = spapr_qirq_dual,
+    .print_info  = spapr_irq_print_info_dual,
+    .dt_populate = spapr_irq_dt_populate_dual,
+    .cpu_intc_create = spapr_irq_cpu_intc_create_dual,
+    .post_load   = spapr_irq_post_load_dual,
+    .reset       = spapr_irq_reset_dual,
+};
+
 /*
  * sPAPR IRQ frontend routines for devices
  */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 32/37] ppc/xics: introduce a icp_kvm_connect() routine
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (30 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 31/37] spapr: add a 'pseries-3.1-dual' machine type Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 33/37] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers Cédric Le Goater
                   ` (5 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

This routine gathers all the KVM initialization of the XICS KVM
presenter. It will be useful when the initialization of the KVM XICS
device is moved to a global routine.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xics_kvm.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index e8fa9a53aeba..de86e1d0b1ab 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -123,11 +123,8 @@ static void icp_kvm_reset(DeviceState *dev)
     icp_set_kvm_state(ICP(dev), 1);
 }
 
-static void icp_kvm_realize(DeviceState *dev, Error **errp)
+static void icp_kvm_connect(ICPState *icp, Error **errp)
 {
-    ICPState *icp = ICP(dev);
-    ICPStateClass *icpc = ICP_GET_CLASS(icp);
-    Error *local_err = NULL;
     CPUState *cs;
     KVMEnabledICP *enabled_icp;
     unsigned long vcpu_id;
@@ -135,11 +132,6 @@ static void icp_kvm_realize(DeviceState *dev, Error **errp)
 
     if (kernel_xics_fd == -1) {
         abort();
-    }
-
-    icpc->parent_realize(dev, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
         return;
     }
 
@@ -168,6 +160,25 @@ static void icp_kvm_realize(DeviceState *dev, Error **errp)
     QLIST_INSERT_HEAD(&kvm_enabled_icps, enabled_icp, node);
 }
 
+static void icp_kvm_realize(DeviceState *dev, Error **errp)
+{
+    ICPStateClass *icpc = ICP_GET_CLASS(dev);
+    Error *local_err = NULL;
+
+    icpc->parent_realize(dev, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Connect the presenter to the VCPU (required for CPU hotplug) */
+    icp_kvm_connect(ICP(dev), &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
 static void icp_kvm_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 33/37] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (31 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 32/37] ppc/xics: introduce a icp_kvm_connect() routine Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 34/37] sysbus: add a sysbus_mmio_unmap() helper Cédric Le Goater
                   ` (4 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

Removing RTAS handlers will become necessary when the new pseries
machine supporting multiple interrupt mode is introduced.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr.h | 4 ++++
 hw/ppc/spapr_rtas.c    | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index daced428a42c..ca38b9d9c046 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -649,6 +649,10 @@ typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPRMachineState *sm,
                               uint32_t nargs, target_ulong args,
                               uint32_t nret, target_ulong rets);
 void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn);
+static inline void spapr_rtas_unregister(int token)
+{
+    spapr_rtas_register(token, NULL, NULL);
+}
 target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPRMachineState *sm,
                              uint32_t token, uint32_t nargs, target_ulong args,
                              uint32_t nret, target_ulong rets);
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index d6a0952154ac..e005d5d08151 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -404,7 +404,7 @@ void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn)
 
     token -= RTAS_TOKEN_BASE;
 
-    assert(!rtas_table[token].name);
+    assert(!name || !rtas_table[token].name);
 
     rtas_table[token].name = name;
     rtas_table[token].fn = fn;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 34/37] sysbus: add a sysbus_mmio_unmap() helper
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (32 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 33/37] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 35/37] spapr: introduce routines to delete the KVM IRQ device Cédric Le Goater
                   ` (3 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

This will be used to remove the MMIO regions of the POWER9 XIVE
interrupt controller when the sPAPR machine is reseted.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/hw/sysbus.h |  1 +
 hw/core/sysbus.c    | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/hw/sysbus.h b/include/hw/sysbus.h
index 0b59a3b8d605..bc641984b5da 100644
--- a/include/hw/sysbus.h
+++ b/include/hw/sysbus.h
@@ -92,6 +92,7 @@ qemu_irq sysbus_get_connected_irq(SysBusDevice *dev, int n);
 void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr);
 void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
                              int priority);
+void sysbus_mmio_unmap(SysBusDevice *dev, int n);
 void sysbus_add_io(SysBusDevice *dev, hwaddr addr,
                    MemoryRegion *mem);
 MemoryRegion *sysbus_address_space(SysBusDevice *dev);
diff --git a/hw/core/sysbus.c b/hw/core/sysbus.c
index 7ac36ad3e707..09f202167dcb 100644
--- a/hw/core/sysbus.c
+++ b/hw/core/sysbus.c
@@ -153,6 +153,16 @@ static void sysbus_mmio_map_common(SysBusDevice *dev, int n, hwaddr addr,
     }
 }
 
+void sysbus_mmio_unmap(SysBusDevice *dev, int n)
+{
+    assert(n >= 0 && n < dev->num_mmio);
+
+    if (dev->mmio[n].addr != (hwaddr)-1) {
+        memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory);
+        dev->mmio[n].addr = (hwaddr)-1;
+    }
+}
+
 void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
 {
     sysbus_mmio_map_common(dev, n, addr, false, 0);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 35/37] spapr: introduce routines to delete the KVM IRQ device
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (33 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 34/37] sysbus: add a sysbus_mmio_unmap() helper Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 36/37] spapr: check for KVM IRQ device activation Cédric Le Goater
                   ` (2 subsequent siblings)
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

If a new interrupt mode is chosen by CAS, the machine generates a
reset to reconfigure. At this point, the connection with the previous
KVM device needs to be closed and a new connection needs to opened
with the KVM device operating the chosen interrupt mode.

New routines are introduced to destroy the XICS and the XIVE KVM
devices. They make use of a new KVM device ioctl which destroys the
device and also disconnects the IRQ presenters from the vCPUs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |  1 +
 include/hw/ppc/xics.h       |  1 +
 hw/intc/spapr_xive_kvm.c    | 61 +++++++++++++++++++++++++++++++++++++
 hw/intc/xics_kvm.c          | 57 ++++++++++++++++++++++++++++++++++
 4 files changed, 120 insertions(+)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 735e916d3844..250de7fdc943 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -71,6 +71,7 @@ void spapr_xive_enable_mmio(sPAPRXive *xive, bool enable);
  * KVM XIVE device helpers
  */
 void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
+void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp);
 void kvmppc_xive_synchronize_state(sPAPRXive *xive);
 int kvmppc_xive_pre_save(sPAPRXive *xive);
 int kvmppc_xive_post_load(sPAPRXive *xive, int version_id);
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 14afda198cdb..f7e5f8f9b5d7 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -205,6 +205,7 @@ typedef struct sPAPRMachineState sPAPRMachineState;
 void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle);
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
+int xics_kvm_disconnect(sPAPRMachineState *spapr, Error **errp);
 void xics_spapr_init(sPAPRMachineState *spapr);
 
 Object *icp_create(Object *cpu, const char *type, XICSFabric *xi,
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 04f997479e8f..dba3344831c6 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -57,6 +57,16 @@ static void kvm_cpu_enable(CPUState *cs)
     QLIST_INSERT_HEAD(&kvm_enabled_cpus, enabled_cpu, node);
 }
 
+static void kvm_cpu_disable_all(void)
+{
+    KVMEnabledCPU *enabled_cpu, *next;
+
+    QLIST_FOREACH_SAFE(enabled_cpu, &kvm_enabled_cpus, node, next) {
+        QLIST_REMOVE(enabled_cpu, node);
+        g_free(enabled_cpu);
+    }
+}
+
 /*
  * XIVE Thread Interrupt Management context (KVM)
  */
@@ -743,3 +753,54 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
     /* Map all regions */
     spapr_xive_map_mmio(xive);
 }
+
+void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp)
+{
+    XiveSource *xsrc;
+    struct kvm_create_device xive_destroy_device = { 0 };
+    size_t esb_len;
+    int rc;
+
+    if (!kvm_enabled() || !kvmppc_has_cap_xive()) {
+        error_setg(errp,
+                   "IRQ_XIVE capability must be present for KVM XIVE device");
+        return;
+    }
+
+    /* The KVM XIVE device is not in use */
+    if (!xive || xive->fd == -1) {
+        return;
+    }
+
+    /* Clear the KVM mapping */
+    xsrc = &xive->source;
+    esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
+
+    sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 0);
+    munmap(xsrc->esb_mmap, esb_len);
+
+    sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 1);
+
+    sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 2);
+    munmap(xive->tm_mmap, 4ull << TM_SHIFT);
+
+    /* Destroy the KVM device. This also clears the VCPU presenters */
+    xive_destroy_device.fd = xive->fd;
+    xive_destroy_device.type = KVM_DEV_TYPE_XIVE;
+    rc = kvm_vm_ioctl(kvm_state, KVM_DESTROY_DEVICE, &xive_destroy_device);
+    if (rc < 0) {
+        error_setg_errno(errp, -rc, "Error on KVM_DESTROY_DEVICE for XIVE");
+    }
+    close(xive->fd);
+    xive->fd = -1;
+
+    kvm_kernel_irqchip = false;
+    kvm_msi_via_irqfd_allowed = false;
+    kvm_gsi_direct_mapping = false;
+
+    /* Clear the local list of presenter (hotplug) */
+    kvm_cpu_disable_all();
+
+    /* VM Change state handler is not needed anymore */
+    qemu_del_vm_change_state_handler(xive->change);
+}
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index de86e1d0b1ab..2a60ae71730b 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -50,6 +50,16 @@ typedef struct KVMEnabledICP {
 static QLIST_HEAD(, KVMEnabledICP)
     kvm_enabled_icps = QLIST_HEAD_INITIALIZER(&kvm_enabled_icps);
 
+static void kvm_disable_icps(void)
+{
+    KVMEnabledICP *enabled_icp, *next;
+
+    QLIST_FOREACH_SAFE(enabled_icp, &kvm_enabled_icps, node, next) {
+        QLIST_REMOVE(enabled_icp, node);
+        g_free(enabled_icp);
+    }
+}
+
 /*
  * ICP-KVM
  */
@@ -456,6 +466,53 @@ fail:
     return -1;
 }
 
+int xics_kvm_disconnect(sPAPRMachineState *spapr, Error **errp)
+{
+    int rc;
+    struct kvm_create_device xics_create_device = {
+        .fd = kernel_xics_fd,
+        .type = KVM_DEV_TYPE_XICS,
+        .flags = 0,
+    };
+
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return 0;
+    }
+
+    if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
+        error_setg(errp,
+                   "KVM and IRQ_XICS capability must be present for KVM XICS device");
+        return -1;
+    }
+
+    rc = kvm_vm_ioctl(kvm_state, KVM_DESTROY_DEVICE, &xics_create_device);
+    if (rc < 0) {
+        error_setg_errno(errp, -rc, "Error on KVM_DESTROY_DEVICE for XICS");
+    }
+    close(kernel_xics_fd);
+    kernel_xics_fd = -1;
+
+    spapr_rtas_register(RTAS_IBM_SET_XIVE, NULL, 0);
+    spapr_rtas_register(RTAS_IBM_GET_XIVE, NULL, 0);
+    spapr_rtas_register(RTAS_IBM_INT_OFF, NULL, 0);
+    spapr_rtas_register(RTAS_IBM_INT_ON, NULL, 0);
+
+    kvmppc_define_rtas_kernel_token(0, "ibm,set-xive");
+    kvmppc_define_rtas_kernel_token(0, "ibm,get-xive");
+    kvmppc_define_rtas_kernel_token(0, "ibm,int-on");
+    kvmppc_define_rtas_kernel_token(0, "ibm,int-off");
+
+    kvm_kernel_irqchip = false;
+    kvm_msi_via_irqfd_allowed = false;
+    kvm_gsi_direct_mapping = false;
+
+    /* Clear the presenter from the VCPUs */
+    kvm_disable_icps();
+
+    return rc;
+}
+
 static void xics_kvm_register_types(void)
 {
     type_register_static(&ics_kvm_info);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 36/37] spapr: check for KVM IRQ device activation
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (34 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 35/37] spapr: introduce routines to delete the KVM IRQ device Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 37/37] spapr: add KVM support to the 'dual' machine Cédric Le Goater
  2018-12-06  1:10 ` [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) no-reply
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The KVM IRQ device activation will depend on the interrupt mode chosen
at CAS time by the machine and some methods used at reset or by the
migration need to be protected.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive_kvm.c | 28 ++++++++++++++++++++++++++++
 hw/intc/xics_kvm.c       | 25 ++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index dba3344831c6..6135b8c11e63 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -94,9 +94,15 @@ static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, Error **errp)
 
 void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
 {
+    sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
     uint64_t state[4] = { 0 };
     int ret;
 
+    /* The KVM XIVE device is not in use */
+    if (xive->fd == -1) {
+        return;
+    }
+
     ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
     if (ret != 0) {
         error_setg_errno(errp, errno, "Could capture KVM XIVE CPU %ld state",
@@ -132,6 +138,11 @@ void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
     unsigned long vcpu_id;
     int ret;
 
+    /* The KVM XIVE device is not in use */
+    if (xive->fd == -1) {
+        return;
+    }
+
     /* Check if CPU was hot unplugged and replugged. */
     if (kvm_cpu_is_enabled(tctx->cs)) {
         return;
@@ -215,9 +226,13 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc)
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
 {
     XiveSource *xsrc = opaque;
+    sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
     struct kvm_irq_level args;
     int rc;
 
+    /* The KVM XIVE device should be in use */
+    assert(xive->fd != -1);
+
     args.irq = srcno;
     if (!xive_source_irq_is_lsi(xsrc, srcno)) {
         if (!val) {
@@ -564,6 +579,11 @@ int kvmppc_xive_pre_save(sPAPRXive *xive)
     Error *local_err = NULL;
     CPUState *cs;
 
+    /* The KVM XIVE device is not in use */
+    if (xive->fd == -1) {
+        return 0;
+    }
+
     /* Grab the EAT */
     kvmppc_xive_get_eas_state(xive, &local_err);
     if (local_err) {
@@ -596,6 +616,9 @@ int kvmppc_xive_post_load(sPAPRXive *xive, int version_id)
     Error *local_err = NULL;
     CPUState *cs;
 
+    /* The KVM XIVE device should be in use */
+    assert(xive->fd != -1);
+
     /* Restore the ENDT first. The targetting depends on it. */
     CPU_FOREACH(cs) {
         kvmppc_xive_set_eq_state(xive, cs, &local_err);
@@ -632,6 +655,11 @@ void kvmppc_xive_synchronize_state(sPAPRXive *xive)
     XiveSource *xsrc = &xive->source;
     CPUState *cs;
 
+    /* The KVM XIVE device is not in use */
+    if (xive->fd == -1) {
+        return;
+    }
+
     /*
      * When the VM is stopped, the sources are masked and the previous
      * state is saved in anticipation of a migration. We should not
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 2a60ae71730b..4355c9d12160 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -68,6 +68,11 @@ static void icp_get_kvm_state(ICPState *icp)
     uint64_t state;
     int ret;
 
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return;
+    }
+
     /* ICP for this CPU thread is not in use, exiting */
     if (!icp->cs) {
         return;
@@ -104,6 +109,11 @@ static int icp_set_kvm_state(ICPState *icp, int version_id)
     uint64_t state;
     int ret;
 
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return 0;
+    }
+
     /* ICP for this CPU thread is not in use, exiting */
     if (!icp->cs) {
         return 0;
@@ -140,8 +150,8 @@ static void icp_kvm_connect(ICPState *icp, Error **errp)
     unsigned long vcpu_id;
     int ret;
 
+    /* The KVM XICS device is not in use */
     if (kernel_xics_fd == -1) {
-        abort();
         return;
     }
 
@@ -220,6 +230,11 @@ static void ics_get_kvm_state(ICSState *ics)
     uint64_t state;
     int i;
 
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return;
+    }
+
     for (i = 0; i < ics->nr_irqs; i++) {
         ICSIRQState *irq = &ics->irqs[i];
 
@@ -279,6 +294,11 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
     int i;
     Error *local_err = NULL;
 
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return 0;
+    }
+
     for (i = 0; i < ics->nr_irqs; i++) {
         ICSIRQState *irq = &ics->irqs[i];
         int ret;
@@ -325,6 +345,9 @@ static void ics_kvm_set_irq(void *opaque, int srcno, int val)
     struct kvm_irq_level args;
     int rc;
 
+    /* The KVM XICS device should be in use */
+    assert(kernel_xics_fd != -1);
+
     args.irq = srcno + ics->offset;
     if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MSI) {
         if (!val) {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [PATCH v6 37/37] spapr: add KVM support to the 'dual' machine
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (35 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 36/37] spapr: check for KVM IRQ device activation Cédric Le Goater
@ 2018-12-05 23:22 ` Cédric Le Goater
  2018-12-06  1:10 ` [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) no-reply
  37 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-05 23:22 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt, Cédric Le Goater

The interrupt mode is chosen by the CAS negotiation process and
activated after a reset to take into account the required changes in
the machine. This brings new constraints on how the associated KVM IRQ
device is initialized.

Currently, each model takes care of the initialization of the KVM
device in their realize method but this is not possible anymore as the
initialization needs to be done globaly when the interrupt mode is
known, i.e. when machine is reseted. It also means that we need a way
to delete a KVM device when another mode is chosen.

Also, to support migration, the QEMU objects holding the state to
transfer should always be available but not necessarily activated.

The overall approach of this proposal is to initialize both interrupt
mode at the QEMU level and keep the IRQ number space in sync to allow
switching from one mode to another. For the KVM side of things, the
whole initialization of the KVM device, sources and presenters, is
grouped in a single routine. The XICS and XIVE sPAPR IRQ reset
handlers are modified accordingly to handle the init and the delete
sequences of the KVM device. The post_load handlers also are, to take
into account a possible change of interrupt mode after transfer.

As KVM is now initialized at reset, we loose the possiblity to
fallback to the QEMU emulated mode in case of failure and failures
become fatal to the machine.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive.c     |  8 +----
 hw/intc/spapr_xive_kvm.c | 26 +++++++++++++++
 hw/intc/xics_kvm.c       | 25 +++++++++++++++
 hw/intc/xive.c           |  4 ---
 hw/ppc/spapr_irq.c       | 68 +++++++++++++++++++++++++++-------------
 5 files changed, 98 insertions(+), 33 deletions(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 68d2a6fd8177..cdbcf27f9544 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -331,13 +331,7 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     xive->eat = g_new0(XiveEAS, xive->nr_irqs);
     xive->endt = g_new0(XiveEND, xive->nr_ends);
 
-    if (kvmppc_xive_enabled()) {
-        kvmppc_xive_connect(xive, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
-            return;
-        }
-    } else {
+    if (!kvmppc_xive_enabled()) {
         /* TIMA initialization */
         memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
                               "xive.tima", 4ull << TM_SHIFT);
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 6135b8c11e63..d7d499db1b5d 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -712,6 +712,14 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
     Error *local_err = NULL;
     size_t esb_len;
     size_t tima_len;
+    CPUState *cs;
+
+    /* The KVM XIVE device already in use. This is the case when
+     * rebooting XIVE -> XIVE
+     */
+    if (xive->fd != -1) {
+        return;
+    }
 
     if (!kvm_enabled() || !kvmppc_has_cap_xive()) {
         error_setg(errp,
@@ -774,6 +782,24 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
     xive->change = qemu_add_vm_change_state_handler(
         kvmppc_xive_change_state_handler, xive);
 
+    /* Connect the presenters to the initial VCPUs of the machine */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        kvmppc_xive_cpu_connect(XIVE_TCTX(cpu->intc), &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
+    /* Update the KVM sources */
+    kvmppc_xive_source_reset(xsrc, &local_err);
+    if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+    }
+
     kvm_kernel_irqchip = true;
     kvm_msi_via_irqfd_allowed = true;
     kvm_gsi_direct_mapping = true;
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 4355c9d12160..393e4e0bd79c 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -431,6 +431,15 @@ static void rtas_dummy(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
 {
     int rc;
+    CPUState *cs;
+    Error *local_err = NULL;
+
+    /* The KVM XICS device already in use. This is the case when
+     * rebooting XICS -> XICS
+     */
+    if (kernel_xics_fd != -1) {
+        return 0;
+    }
 
     if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
         error_setg(errp,
@@ -479,6 +488,22 @@ int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
     kvm_msi_via_irqfd_allowed = true;
     kvm_gsi_direct_mapping = true;
 
+    /* Connect the presenters to the initial VCPUs of the machine */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        ICPState *icp = ICP(cpu->intc);
+
+        icp_kvm_connect(icp, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            goto fail;
+        }
+        icp_set_kvm_state(icp, 1);
+    }
+
+    /* Update the KVM sources */
+    ics_set_kvm_state(ICS_KVM(spapr->ics), 1);
+
     return 0;
 
 fail:
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index cde46b5ca161..cb07a7f36f8c 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -958,10 +958,6 @@ static void xive_source_reset(void *dev)
 
     /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
     memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
-
-    if (kvmppc_xive_enabled()) {
-        kvmppc_xive_source_reset(xsrc, &error_fatal);
-    }
 }
 
 static void xive_source_realize(DeviceState *dev, Error **errp)
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 157c335f6f8d..0b5b3469b899 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -98,20 +98,14 @@ static void spapr_irq_init_xics(sPAPRMachineState *spapr, int nr_irqs,
     MachineState *machine = MACHINE(spapr);
     Error *local_err = NULL;
 
-    if (kvm_enabled()) {
-        if (machine_kernel_irqchip_allowed(machine) &&
-            !xics_kvm_init(spapr, &local_err)) {
-            spapr->icp_type = TYPE_KVM_ICP;
-            spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs,
-                                          &local_err);
-        }
-        if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
-            error_prepend(&local_err,
-                          "kernel_irqchip requested but unavailable: ");
-            goto error;
+    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
+        spapr->icp_type = TYPE_KVM_ICP;
+        spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs,
+                                      &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
         }
-        error_free(local_err);
-        local_err = NULL;
     }
 
     if (!spapr->ics) {
@@ -119,10 +113,11 @@ static void spapr_irq_init_xics(sPAPRMachineState *spapr, int nr_irqs,
         spapr->icp_type = TYPE_ICP;
         spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs,
                                       &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
     }
-
-error:
-    error_propagate(errp, local_err);
 }
 
 #define ICS_IRQ_FREE(ics, srcno)   \
@@ -211,7 +206,9 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id)
 
 static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
 {
+    MachineState *machine = MACHINE(spapr);
     CPUState *cs;
+    Error *local_err = NULL;
 
     CPU_FOREACH(cs) {
         spapr_cpu_core_set_intc(POWERPC_CPU(cs), spapr->icp_type);
@@ -221,6 +218,22 @@ static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
     if (spapr->xive) {
         spapr_xive_enable_mmio(spapr->xive, false);
     }
+
+    /* Get rid of the KVM XIVE device and activate XICS */
+    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
+        kvmppc_xive_disconnect(spapr->xive, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            error_prepend(errp, "KVM XIVE disconnect failed: ");
+            return;
+        }
+        xics_kvm_init(spapr, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            error_prepend(errp, "KVM XICS connect failed: ");
+            return;
+        }
+    }
 }
 
 #define SPAPR_IRQ_XICS_NR_IRQS     0x1000
@@ -360,6 +373,7 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
 {
     CPUState *cs;
+    Error *local_err = NULL;
 
     CPU_FOREACH(cs) {
         spapr_cpu_core_set_intc(POWERPC_CPU(cs), TYPE_XIVE_TCTX);
@@ -371,6 +385,22 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
      */
     spapr_xive_reset_tctx(spapr->xive);
 
+    /* Get rid of the KVM XICS device and activate XIVE */
+    if (kvmppc_xive_enabled()) {
+        xics_kvm_disconnect(spapr, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            error_prepend(errp, "KVM XICS disconnect failed: ");
+            return;
+        }
+        kvmppc_xive_connect(spapr->xive, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            error_prepend(errp, "KVM XIVE connect failed: ");
+            return;
+        }
+    }
+
     /* Activate the XIVE MMIOs */
     spapr_xive_enable_mmio(spapr->xive, true);
 }
@@ -421,14 +451,8 @@ static sPAPRIrq *spapr_irq_current(sPAPRMachineState *spapr)
 static void spapr_irq_init_dual(sPAPRMachineState *spapr, int nr_irqs,
                                 Error **errp)
 {
-    MachineState *machine = MACHINE(spapr);
     Error *local_err = NULL;
 
-    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
-        error_setg(errp, "No KVM support for the 'dual' machine");
-        return;
-    }
-
     spapr_irq_xics.init(spapr, spapr_irq_xics.nr_irqs, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9)
  2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
                   ` (36 preceding siblings ...)
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 37/37] spapr: add KVM support to the 'dual' machine Cédric Le Goater
@ 2018-12-06  1:10 ` no-reply
  2018-12-06  6:14   ` Cédric Le Goater
  37 siblings, 1 reply; 66+ messages in thread
From: no-reply @ 2018-12-06  1:10 UTC (permalink / raw)
  To: clg; +Cc: famz, david, qemu-ppc, qemu-devel

Patchew URL: https://patchew.org/QEMU/20181205232251.10446-1-clg@kaod.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20181205232251.10446-1-clg@kaod.org
Subject: [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9)
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
b29ce00 spapr: add KVM support to the 'dual' machine
957ca7b spapr: check for KVM IRQ device activation
20c8779 spapr: introduce routines to delete the KVM IRQ device
6577bbe sysbus: add a sysbus_mmio_unmap() helper
9ae96ee spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers
643bf4b ppc/xics: introduce a icp_kvm_connect() routine
f727182 spapr: add a 'pseries-3.1-dual' machine type
d69c736 spapr/xive: enable XIVE MMIOs at reset
114495a spapr: set the interrupt presenter at reset
d28fbd6 spapr/xive: fix migration of the XiveTCTX under TCG
af33d19 spapr/xive: add migration support for KVM
ca0d71d spapr/xive: introduce a VM state change handler
6b44f52 spapr/xive: add state synchronization with KVM
543c65b spapr/xive: add KVM support
11b9fc0 linux-headers: update to 4.20-rc5
de7e220 spapr: add a 'pseries-3.1-xive' machine type
9a3ad0b spapr: add a 'reset' method to the sPAPR IRQ backend
e414def spapr: extend the sPAPR IRQ backend for XICS migration
7ee593a spapr: allocate the interrupt thread context under the CPU core
5830c26 spapr: add device tree support for the XIVE exploitation mode
7c6ed94 spapr: add hcalls support for the XIVE exploitation interrupt mode
8b0265d spapr: introdude a new machine IRQ backend for XIVE
7438740 spapr: export and rename the xics_max_server_number() routine
6c85b1a spapr: modify the irq backend 'init' method
cbdc5f2 spapr: introduce a spapr_irq_init() routine
7b96ab4 spapr: initialize VSMT before initializing the IRQ backend
99ed715 spapr/xive: use the VCPU id as a NVT identifier
12a7762 spapr/xive: introduce a XIVE interrupt controller
043a1e5 ppc/xive: notify the CPU when the interrupt priority is more privileged
4e2626f ppc/xive: introduce a simplified XIVE presenter
58d0fa5 ppc/xive: introduce the XIVE interrupt thread context
28efbe5 ppc/xive: add support for the END Event State buffers
3f3bd6c ppc/xive: introduce the XIVE Event Notification Descriptors
871c908 ppc/xive: introduce the XiveRouter model
caf164f ppc/xive: introduce the XiveNotifier interface
eef5a2f ppc/xive: add support for the LSI interrupt sources
7c9d387 ppc/xive: introduce a XIVE interrupt source model

=== OUTPUT BEGIN ===
Checking PATCH 1/37: ppc/xive: introduce a XIVE interrupt source model...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#60: 
new file mode 100644

total: 0 errors, 1 warnings, 656 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 2/37: ppc/xive: add support for the LSI interrupt sources...
Checking PATCH 3/37: ppc/xive: introduce the XiveNotifier interface...
Checking PATCH 4/37: ppc/xive: introduce the XiveRouter model...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#169: 
new file mode 100644

total: 0 errors, 1 warnings, 179 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 5/37: ppc/xive: introduce the XIVE Event Notification Descriptors...
Checking PATCH 6/37: ppc/xive: add support for the END Event State buffers...
Checking PATCH 7/37: ppc/xive: introduce the XIVE interrupt thread context...
Checking PATCH 8/37: ppc/xive: introduce a simplified XIVE presenter...
Checking PATCH 9/37: ppc/xive: notify the CPU when the interrupt priority is more privileged...
Checking PATCH 10/37: spapr/xive: introduce a XIVE interrupt controller...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#57: 
new file mode 100644

total: 0 errors, 1 warnings, 425 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 11/37: spapr/xive: use the VCPU id as a NVT identifier...
Checking PATCH 12/37: spapr: initialize VSMT before initializing the IRQ backend...
Checking PATCH 13/37: spapr: introduce a spapr_irq_init() routine...
Checking PATCH 14/37: spapr: modify the irq backend 'init' method...
Checking PATCH 15/37: spapr: export and rename the xics_max_server_number() routine...
Checking PATCH 16/37: spapr: introdude a new machine IRQ backend for XIVE...
Checking PATCH 17/37: spapr: add hcalls support for the XIVE exploitation interrupt mode...
Checking PATCH 18/37: spapr: add device tree support for the XIVE exploitation mode...
Checking PATCH 19/37: spapr: allocate the interrupt thread context under the CPU core...
Checking PATCH 20/37: spapr: extend the sPAPR IRQ backend for XICS migration...
Checking PATCH 21/37: spapr: add a 'reset' method to the sPAPR IRQ backend...
Checking PATCH 22/37: spapr: add a 'pseries-3.1-xive' machine type...
Checking PATCH 23/37: linux-headers: update to 4.20-rc5...
Checking PATCH 24/37: spapr/xive: add KVM support...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#104: 
new file mode 100644

total: 0 errors, 1 warnings, 509 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 25/37: spapr/xive: add state synchronization with KVM...
Checking PATCH 26/37: spapr/xive: introduce a VM state change handler...
ERROR: spaces required around that '*' (ctx:WxV)
#38: FILE: hw/intc/spapr_xive_kvm.c:341:
+ static void kvmppc_xive_sync_all(sPAPRXive *xive, Error **errp)
                                             ^

total: 1 errors, 0 warnings, 135 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 27/37: spapr/xive: add migration support for KVM...
Checking PATCH 28/37: spapr/xive: fix migration of the XiveTCTX under TCG...
Checking PATCH 29/37: spapr: set the interrupt presenter at reset...
Checking PATCH 30/37: spapr/xive: enable XIVE MMIOs at reset...
Checking PATCH 31/37: spapr: add a 'pseries-3.1-dual' machine type...
Checking PATCH 32/37: ppc/xics: introduce a icp_kvm_connect() routine...
Checking PATCH 33/37: spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers...
Checking PATCH 34/37: sysbus: add a sysbus_mmio_unmap() helper...
Checking PATCH 35/37: spapr: introduce routines to delete the KVM IRQ device...
Checking PATCH 36/37: spapr: check for KVM IRQ device activation...
Checking PATCH 37/37: spapr: add KVM support to the 'dual' machine...
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20181205232251.10446-1-clg@kaod.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 03/37] ppc/xive: introduce the XiveNotifier interface
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 03/37] ppc/xive: introduce the XiveNotifier interface Cédric Le Goater
@ 2018-12-06  3:25   ` David Gibson
  2018-12-06  6:17     ` Cédric Le Goater
  0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2018-12-06  3:25 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 3753 bytes --]

On Thu, Dec 06, 2018 at 12:22:17AM +0100, Cédric Le Goater wrote:
> The XiveNotifier offers a simple interface, between the XiveSource
> object and the main interrupt controller of the machine. It will
> forward event notifications to the XIVE Interrupt Virtualization
> Routing Engine (IVRE).
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/xive.h | 23 +++++++++++++++++++++++
>  hw/intc/xive.c        | 25 +++++++++++++++++++++++++
>  2 files changed, 48 insertions(+)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 7cebc32eba4c..6770cffec67d 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -142,6 +142,27 @@
>  
>  #include "hw/qdev-core.h"
>  
> +/*
> + * XIVE Fabric (Interface between Source and Router)
> + */
> +
> +typedef struct XiveNotifier {
> +    Object parent;
> +} XiveNotifier;
> +
> +#define TYPE_XIVE_NOTIFIER "xive-fabric"

I'm applying this, but changing the string here from "xive-fabric" to
"xive-notifier".


> +#define XIVE_NOTIFIER(obj)                                     \
> +    OBJECT_CHECK(XiveNotifier, (obj), TYPE_XIVE_NOTIFIER)
> +#define XIVE_NOTIFIER_CLASS(klass)                                     \
> +    OBJECT_CLASS_CHECK(XiveNotifierClass, (klass), TYPE_XIVE_NOTIFIER)
> +#define XIVE_NOTIFIER_GET_CLASS(obj)                                   \
> +    OBJECT_GET_CLASS(XiveNotifierClass, (obj), TYPE_XIVE_NOTIFIER)
> +
> +typedef struct XiveNotifierClass {
> +    InterfaceClass parent;
> +    void (*notify)(XiveNotifier *xn, uint32_t lisn);
> +} XiveNotifierClass;
> +
>  /*
>   * XIVE Interrupt Source
>   */
> @@ -171,6 +192,8 @@ typedef struct XiveSource {
>      uint64_t        esb_flags;
>      uint32_t        esb_shift;
>      MemoryRegion    esb_mmio;
> +
> +    XiveNotifier    *xive;
>  } XiveSource;
>  
>  /*
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 11c7aac962de..79238eb57fae 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -155,7 +155,11 @@ static bool xive_source_esb_eoi(XiveSource *xsrc, uint32_t srcno)
>   */
>  static void xive_source_notify(XiveSource *xsrc, int srcno)
>  {
> +    XiveNotifierClass *xnc = XIVE_NOTIFIER_GET_CLASS(xsrc->xive);
>  
> +    if (xnc->notify) {
> +        xnc->notify(xsrc->xive, srcno);
> +    }
>  }
>  
>  /*
> @@ -362,6 +366,17 @@ static void xive_source_reset(void *dev)
>  static void xive_source_realize(DeviceState *dev, Error **errp)
>  {
>      XiveSource *xsrc = XIVE_SOURCE(dev);
> +    Object *obj;
> +    Error *local_err = NULL;
> +
> +    obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
> +    if (!obj) {
> +        error_propagate(errp, local_err);
> +        error_prepend(errp, "required link 'xive' not found: ");
> +        return;
> +    }
> +
> +    xsrc->xive = XIVE_NOTIFIER(obj);
>  
>      if (!xsrc->nr_irqs) {
>          error_setg(errp, "Number of interrupt needs to be greater than 0");
> @@ -428,9 +443,19 @@ static const TypeInfo xive_source_info = {
>      .class_init    = xive_source_class_init,
>  };
>  
> +/*
> + * XIVE Fabric
> + */
> +static const TypeInfo xive_fabric_info = {
> +    .name = TYPE_XIVE_NOTIFIER,
> +    .parent = TYPE_INTERFACE,
> +    .class_size = sizeof(XiveNotifierClass),
> +};
> +
>  static void xive_register_types(void)
>  {
>      type_register_static(&xive_source_info);
> +    type_register_static(&xive_fabric_info);
>  }
>  
>  type_init(xive_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model Cédric Le Goater
@ 2018-12-06  3:41   ` David Gibson
  2018-12-06  6:22     ` Cédric Le Goater
  0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2018-12-06  3:41 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 8263 bytes --]

On Thu, Dec 06, 2018 at 12:22:18AM +0100, Cédric Le Goater wrote:
> The XiveRouter models the second sub-engine of the XIVE architecture :
> the Interrupt Virtualization Routing Engine (IVRE).
> 
> The IVRE handles event notifications of the IVSE and performs the
> interrupt routing process. For this purpose, it uses a set of tables
> stored in system memory, the first of which being the Event Assignment
> Structure (EAS) table.
> 
> The EAT associates an interrupt source number with an Event Notification
> Descriptor (END) which will be used in a second phase of the routing
> process to identify a Notification Virtual Target.
> 
> The XiveRouter is an abstract class which needs to be inherited from
> to define a storage for the EAT, and other upcoming tables.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/xive.h      | 31 ++++++++++++++++
>  include/hw/ppc/xive_regs.h | 50 +++++++++++++++++++++++++
>  hw/intc/xive.c             | 76 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 157 insertions(+)
>  create mode 100644 include/hw/ppc/xive_regs.h
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 6770cffec67d..57ec9f84f527 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -141,6 +141,8 @@
>  #define PPC_XIVE_H
>  
>  #include "hw/qdev-core.h"
> +#include "hw/sysbus.h"
> +#include "hw/ppc/xive_regs.h"
>  
>  /*
>   * XIVE Fabric (Interface between Source and Router)
> @@ -297,4 +299,33 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
>      }
>  }
>  
> +/*
> + * XIVE Router
> + */
> +
> +typedef struct XiveRouter {
> +    SysBusDevice    parent;

I thought the plan was to make XiveRouter as well as XiveSource a
TYPE_DEVICE descendent rather than a SysBusDevice?

> +} XiveRouter;
> +
> +#define TYPE_XIVE_ROUTER "xive-router"
> +#define XIVE_ROUTER(obj)                                \
> +    OBJECT_CHECK(XiveRouter, (obj), TYPE_XIVE_ROUTER)
> +#define XIVE_ROUTER_CLASS(klass)                                        \
> +    OBJECT_CLASS_CHECK(XiveRouterClass, (klass), TYPE_XIVE_ROUTER)
> +#define XIVE_ROUTER_GET_CLASS(obj)                              \
> +    OBJECT_GET_CLASS(XiveRouterClass, (obj), TYPE_XIVE_ROUTER)
> +
> +typedef struct XiveRouterClass {
> +    SysBusDeviceClass parent;
> +
> +    /* XIVE table accessors */
> +    int (*get_eas)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
> +                   XiveEAS *eas);
> +} XiveRouterClass;
> +
> +void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
> +
> +int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
> +                        XiveEAS *eas);
> +
>  #endif /* PPC_XIVE_H */
> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> new file mode 100644
> index 000000000000..15f2470ed9cc
> --- /dev/null
> +++ b/include/hw/ppc/xive_regs.h
> @@ -0,0 +1,50 @@
> +/*
> + * QEMU PowerPC XIVE internal structure definitions
> + *
> + *
> + * The XIVE structures are accessed by the HW and their format is
> + * architected to be big-endian. Some macros are provided to ease
> + * access to the different fields.
> + *
> + *
> + * Copyright (c) 2016-2018, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef PPC_XIVE_REGS_H
> +#define PPC_XIVE_REGS_H
> +
> +/*
> + * Interrupt source number encoding on PowerBUS
> + */
> +#define XIVE_SRCNO_BLOCK(srcno) (((srcno) >> 28) & 0xf)
> +#define XIVE_SRCNO_INDEX(srcno) ((srcno) & 0x0fffffff)
> +#define XIVE_SRCNO(blk, idx)    ((uint32_t)(blk) << 28 | (idx))
> +
> +/* EAS (Event Assignment Structure)
> + *
> + * One per interrupt source. Targets an interrupt to a given Event
> + * Notification Descriptor (END) and provides the corresponding
> + * logical interrupt number (END data)
> + */
> +typedef struct XiveEAS {
> +        /* Use a single 64-bit definition to make it easier to
> +         * perform atomic updates
> +         */
> +        uint64_t        w;
> +#define EAS_VALID       PPC_BIT(0)
> +#define EAS_END_BLOCK   PPC_BITMASK(4, 7)        /* Destination END block# */
> +#define EAS_END_INDEX   PPC_BITMASK(8, 31)       /* Destination END index */
> +#define EAS_MASKED      PPC_BIT(32)              /* Masked */
> +#define EAS_END_DATA    PPC_BITMASK(33, 63)      /* Data written to the END */
> +} XiveEAS;
> +
> +#define xive_eas_is_valid(eas)   (be64_to_cpu((eas)->w) & EAS_VALID)
> +#define xive_eas_is_masked(eas)  (be64_to_cpu((eas)->w) & EAS_MASKED)
> +
> +#define GETFIELD_BE64(m, v)      GETFIELD(m, be64_to_cpu(v))
> +#define SETFIELD_BE64(m, v, val) cpu_to_be64(SETFIELD(m, be64_to_cpu(v), val))
> +
> +#endif /* PPC_XIVE_REGS_H */
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 79238eb57fae..d21df6674d8c 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -443,6 +443,81 @@ static const TypeInfo xive_source_info = {
>      .class_init    = xive_source_class_init,
>  };
>  
> +/*
> + * XIVE Router (aka. Virtualization Controller or IVRE)
> + */
> +
> +int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
> +                        XiveEAS *eas)
> +{
> +    XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> +
> +    return xrc->get_eas(xrtr, eas_blk, eas_idx, eas);
> +}

Didn't spot this before, but it probably makes sense for such a
trivial wrapper to be a static inline in the header.

> +
> +static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
> +{
> +    XiveRouter *xrtr = XIVE_ROUTER(xn);
> +    uint8_t eas_blk = XIVE_SRCNO_BLOCK(lisn);
> +    uint32_t eas_idx = XIVE_SRCNO_INDEX(lisn);
> +    XiveEAS eas;
> +
> +    /* EAS cache lookup */
> +    if (xive_router_get_eas(xrtr, eas_blk, eas_idx, &eas)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %x\n", lisn);
> +        return;
> +    }
> +
> +    /* The IVRE checks the State Bit Cache at this point. We skip the
> +     * SBC lookup because the state bits of the sources are modeled
> +     * internally in QEMU.
> +     */
> +
> +    if (!xive_eas_is_valid(&eas)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %x\n", lisn);
> +        return;
> +    }
> +
> +    if (xive_eas_is_masked(&eas)) {
> +        /* Notification completed */
> +        return;
> +    }
> +}
> +
> +static void xive_router_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    XiveNotifierClass *xnc = XIVE_NOTIFIER_CLASS(klass);
> +
> +    dc->desc    = "XIVE Router Engine";
> +    xnc->notify = xive_router_notify;
> +}
> +
> +static const TypeInfo xive_router_info = {
> +    .name          = TYPE_XIVE_ROUTER,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .abstract      = true,
> +    .class_size    = sizeof(XiveRouterClass),
> +    .class_init    = xive_router_class_init,
> +    .interfaces    = (InterfaceInfo[]) {
> +        { TYPE_XIVE_NOTIFIER },
> +        { }
> +    }
> +};
> +
> +void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon)
> +{
> +    if (!xive_eas_is_valid(eas)) {
> +        return;
> +    }
> +
> +    monitor_printf(mon, "  %08x %s end:%02x/%04x data:%08x\n",
> +                   lisn, xive_eas_is_masked(eas) ? "M" : " ",
> +                   (uint8_t)  GETFIELD_BE64(EAS_END_BLOCK, eas->w),
> +                   (uint32_t) GETFIELD_BE64(EAS_END_INDEX, eas->w),
> +                   (uint32_t) GETFIELD_BE64(EAS_END_DATA, eas->w));
> +}
> +
>  /*
>   * XIVE Fabric
>   */
> @@ -456,6 +531,7 @@ static void xive_register_types(void)
>  {
>      type_register_static(&xive_source_info);
>      type_register_static(&xive_fabric_info);
> +    type_register_static(&xive_router_info);
>  }
>  
>  type_init(xive_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 05/37] ppc/xive: introduce the XIVE Event Notification Descriptors
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 05/37] ppc/xive: introduce the XIVE Event Notification Descriptors Cédric Le Goater
@ 2018-12-06  3:56   ` David Gibson
  0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2018-12-06  3:56 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 14028 bytes --]

On Thu, Dec 06, 2018 at 12:22:19AM +0100, Cédric Le Goater wrote:
> To complete the event routing, the IVRE sub-engine uses a second table
> containing Event Notification Descriptor (END) structures.
> 
> An END specifies on which Event Queue (EQ) the event notification
> data, defined in the associated EAS, should be posted when an
> exception occurs. It also defines which Notification Virtual Target
> (NVT) should be notified.
> 
> The Event Queue is a memory page provided by the O/S defining a
> circular buffer, one per server and priority couple, containing Event
> Queue entries. These are 4 bytes long, the first bit being a
> 'generation' bit and the 31 following bits the END Data field. They
> are pulled by the O/S when the exception occurs.
> 
> The END Data field is a way to set an invariant logical event source
> number for an IRQ. On sPAPR machines, it is set with the
> H_INT_SET_SOURCE_CONFIG hcall when the EISN flag is used.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/hw/ppc/xive.h      |  18 ++++
>  include/hw/ppc/xive_regs.h |  57 ++++++++++++
>  hw/intc/xive.c             | 174 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 249 insertions(+)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 57ec9f84f527..d1b4c6c78ec5 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -321,11 +321,29 @@ typedef struct XiveRouterClass {
>      /* XIVE table accessors */
>      int (*get_eas)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
>                     XiveEAS *eas);
> +    int (*get_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> +                   XiveEND *end);
> +    int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> +                     XiveEND *end, uint8_t word_number);

I'm not sure if this is the best interface long term, but it doesn't
impact on public or migration interfaces, so I'm happy to run with it
for the time being.

>  } XiveRouterClass;
>  
>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
>  
>  int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
>                          XiveEAS *eas);
> +int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> +                        XiveEND *end);
> +int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> +                          XiveEND *end, uint8_t word_number);
> +
> +/*
> + * For legacy compatibility, the exceptions define up to 256 different
> + * priorities. P9 implements only 9 levels : 8 active levels [0 - 7]
> + * and the least favored level 0xFF.
> + */
> +#define XIVE_PRIORITY_MAX  7
> +
> +void xive_end_pic_print_info(XiveEND *end, uint32_t end_idx, Monitor *mon);
> +void xive_end_queue_pic_print_info(XiveEND *end, uint32_t width, Monitor *mon);
>  
>  #endif /* PPC_XIVE_H */
> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> index 15f2470ed9cc..3c0ebad18b69 100644
> --- a/include/hw/ppc/xive_regs.h
> +++ b/include/hw/ppc/xive_regs.h
> @@ -47,4 +47,61 @@ typedef struct XiveEAS {
>  #define GETFIELD_BE64(m, v)      GETFIELD(m, be64_to_cpu(v))
>  #define SETFIELD_BE64(m, v, val) cpu_to_be64(SETFIELD(m, be64_to_cpu(v), val))
>  
> +/* Event Notification Descriptor (END) */
> +typedef struct XiveEND {
> +        uint32_t        w0;
> +#define END_W0_VALID             PPC_BIT32(0) /* "v" bit */
> +#define END_W0_ENQUEUE           PPC_BIT32(1) /* "q" bit */
> +#define END_W0_UCOND_NOTIFY      PPC_BIT32(2) /* "n" bit */
> +#define END_W0_BACKLOG           PPC_BIT32(3) /* "b" bit */
> +#define END_W0_PRECL_ESC_CTL     PPC_BIT32(4) /* "p" bit */
> +#define END_W0_ESCALATE_CTL      PPC_BIT32(5) /* "e" bit */
> +#define END_W0_UNCOND_ESCALATE   PPC_BIT32(6) /* "u" bit - DD2.0 */
> +#define END_W0_SILENT_ESCALATE   PPC_BIT32(7) /* "s" bit - DD2.0 */
> +#define END_W0_QSIZE             PPC_BITMASK32(12, 15)
> +#define END_W0_SW0               PPC_BIT32(16)
> +#define END_W0_FIRMWARE          END_W0_SW0 /* Owned by FW */
> +#define END_QSIZE_4K             0
> +#define END_QSIZE_64K            4
> +#define END_W0_HWDEP             PPC_BITMASK32(24, 31)
> +        uint32_t        w1;
> +#define END_W1_ESn               PPC_BITMASK32(0, 1)
> +#define END_W1_ESn_P             PPC_BIT32(0)
> +#define END_W1_ESn_Q             PPC_BIT32(1)
> +#define END_W1_ESe               PPC_BITMASK32(2, 3)
> +#define END_W1_ESe_P             PPC_BIT32(2)
> +#define END_W1_ESe_Q             PPC_BIT32(3)
> +#define END_W1_GENERATION        PPC_BIT32(9)
> +#define END_W1_PAGE_OFF          PPC_BITMASK32(10, 31)
> +        uint32_t        w2;
> +#define END_W2_MIGRATION_REG     PPC_BITMASK32(0, 3)
> +#define END_W2_OP_DESC_HI        PPC_BITMASK32(4, 31)
> +        uint32_t        w3;
> +#define END_W3_OP_DESC_LO        PPC_BITMASK32(0, 31)
> +        uint32_t        w4;
> +#define END_W4_ESC_END_BLOCK     PPC_BITMASK32(4, 7)
> +#define END_W4_ESC_END_INDEX     PPC_BITMASK32(8, 31)
> +        uint32_t        w5;
> +#define END_W5_ESC_END_DATA      PPC_BITMASK32(1, 31)
> +        uint32_t        w6;
> +#define END_W6_FORMAT_BIT        PPC_BIT32(8)
> +#define END_W6_NVT_BLOCK         PPC_BITMASK32(9, 12)
> +#define END_W6_NVT_INDEX         PPC_BITMASK32(13, 31)
> +        uint32_t        w7;
> +#define END_W7_F0_IGNORE         PPC_BIT32(0)
> +#define END_W7_F0_BLK_GROUPING   PPC_BIT32(1)
> +#define END_W7_F0_PRIORITY       PPC_BITMASK32(8, 15)
> +#define END_W7_F1_WAKEZ          PPC_BIT32(0)
> +#define END_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> +} XiveEND;
> +
> +#define xive_end_is_valid(end)    (be32_to_cpu((end)->w0) & END_W0_VALID)
> +#define xive_end_is_enqueue(end)  (be32_to_cpu((end)->w0) & END_W0_ENQUEUE)
> +#define xive_end_is_notify(end)   (be32_to_cpu((end)->w0) & END_W0_UCOND_NOTIFY)
> +#define xive_end_is_backlog(end)  (be32_to_cpu((end)->w0) & END_W0_BACKLOG)
> +#define xive_end_is_escalate(end) (be32_to_cpu((end)->w0) & END_W0_ESCALATE_CTL)
> +
> +#define GETFIELD_BE32(m, v)       GETFIELD(m, be32_to_cpu(v))
> +#define SETFIELD_BE32(m, v, val)  cpu_to_be32(SETFIELD(m, be32_to_cpu(v), val))
> +
>  #endif /* PPC_XIVE_REGS_H */
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index d21df6674d8c..41d8ba1540d0 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -443,6 +443,95 @@ static const TypeInfo xive_source_info = {
>      .class_init    = xive_source_class_init,
>  };
>  
> +/*
> + * XiveEND helpers
> + */
> +
> +void xive_end_queue_pic_print_info(XiveEND *end, uint32_t width, Monitor *mon)
> +{
> +    uint64_t qaddr_base = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32
> +        | be32_to_cpu(end->w3);
> +    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
> +    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
> +    uint32_t qentries = 1 << (qsize + 10);
> +    int i;
> +
> +    /*
> +     * print out the [ (qindex - (width - 1)) .. (qindex + 1)] window
> +     */
> +    monitor_printf(mon, " [ ");
> +    qindex = (qindex - (width - 1)) & (qentries - 1);
> +    for (i = 0; i < width; i++) {
> +        uint64_t qaddr = qaddr_base + (qindex << 2);
> +        uint32_t qdata = -1;
> +
> +        if (dma_memory_read(&address_space_memory, qaddr, &qdata,
> +                            sizeof(qdata))) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to read EQ @0x%"
> +                          HWADDR_PRIx "\n", qaddr);
> +            return;
> +        }
> +        monitor_printf(mon, "%s%08x ", i == width - 1 ? "^" : "",
> +                       be32_to_cpu(qdata));
> +        qindex = (qindex + 1) & (qentries - 1);
> +    }
> +}
> +
> +void xive_end_pic_print_info(XiveEND *end, uint32_t end_idx, Monitor *mon)
> +{
> +    uint64_t qaddr_base = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32
> +        | be32_to_cpu(end->w3);
> +    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
> +    uint32_t qgen = GETFIELD_BE32(END_W1_GENERATION, end->w1);
> +    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
> +    uint32_t qentries = 1 << (qsize + 10);
> +
> +    uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
> +    uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
> +
> +    if (!xive_end_is_valid(end)) {
> +        return;
> +    }
> +
> +    monitor_printf(mon, "  %08x %c%c%c%c%c prio:%d nvt:%04x eq:@%08"PRIx64
> +                   "% 6d/%5d ^%d", end_idx,
> +                   xive_end_is_valid(end)    ? 'v' : '-',
> +                   xive_end_is_enqueue(end)  ? 'q' : '-',
> +                   xive_end_is_notify(end)   ? 'n' : '-',
> +                   xive_end_is_backlog(end)  ? 'b' : '-',
> +                   xive_end_is_escalate(end) ? 'e' : '-',
> +                   priority, nvt, qaddr_base, qindex, qentries, qgen);
> +
> +    xive_end_queue_pic_print_info(end, 6, mon);
> +    monitor_printf(mon, "]\n");
> +}
> +
> +static void xive_end_enqueue(XiveEND *end, uint32_t data)
> +{
> +    uint64_t qaddr_base = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32
> +        | be32_to_cpu(end->w3);
> +    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
> +    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
> +    uint32_t qgen = GETFIELD_BE32(END_W1_GENERATION, end->w1);
> +
> +    uint64_t qaddr = qaddr_base + (qindex << 2);
> +    uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fffffff));
> +    uint32_t qentries = 1 << (qsize + 10);
> +
> +    if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to write END data @0x%"
> +                      HWADDR_PRIx "\n", qaddr);
> +        return;
> +    }
> +
> +    qindex = (qindex + 1) & (qentries - 1);
> +    if (qindex == 0) {
> +        qgen ^= 1;
> +        end->w1 = SETFIELD_BE32(END_W1_GENERATION, end->w1, qgen);
> +    }
> +    end->w1 = SETFIELD_BE32(END_W1_PAGE_OFF, end->w1, qindex);
> +}
> +
>  /*
>   * XIVE Router (aka. Virtualization Controller or IVRE)
>   */
> @@ -455,6 +544,83 @@ int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
>      return xrc->get_eas(xrtr, eas_blk, eas_idx, eas);
>  }
>  
> +int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> +                        XiveEND *end)
> +{
> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> +
> +   return xrc->get_end(xrtr, end_blk, end_idx, end);
> +}
> +
> +int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> +                          XiveEND *end, uint8_t word_number)
> +{
> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> +
> +   return xrc->write_end(xrtr, end_blk, end_idx, end, word_number);
> +}
> +
> +/*
> + * An END trigger can come from an event trigger (IPI or HW) or from
> + * another chip. We don't model the PowerBus but the END trigger
> + * message has the same parameters than in the function below.
> + */
> +static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
> +                                   uint32_t end_idx, uint32_t end_data)
> +{
> +    XiveEND end;
> +    uint8_t priority;
> +    uint8_t format;
> +
> +    /* END cache lookup */
> +    if (xive_router_get_end(xrtr, end_blk, end_idx, &end)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No END %x/%x\n", end_blk,
> +                      end_idx);
> +        return;
> +    }
> +
> +    if (!xive_end_is_valid(&end)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: END %x/%x is invalid\n",
> +                      end_blk, end_idx);
> +        return;
> +    }
> +
> +    if (xive_end_is_enqueue(&end)) {
> +        xive_end_enqueue(&end, end_data);
> +        /* Enqueuing event data modifies the EQ toggle and index */
> +        xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
> +    }
> +
> +    /*
> +     * The W7 format depends on the F bit in W6. It defines the type
> +     * of the notification :
> +     *
> +     *   F=0 : single or multiple NVT notification
> +     *   F=1 : User level Event-Based Branch (EBB) notification, no
> +     *         priority
> +     */
> +    format = GETFIELD_BE32(END_W6_FORMAT_BIT, end.w6);
> +    priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end.w7);
> +
> +    /* The END is masked */
> +    if (format == 0 && priority == 0xff) {
> +        return;
> +    }
> +
> +    /*
> +     * Check the END ESn (Event State Buffer for notification) for
> +     * even futher coalescing in the Router
> +     */
> +    if (!xive_end_is_notify(&end)) {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
> +        return;
> +    }
> +
> +    /*
> +     * Follows IVPE notification
> +     */
> +}
> +
>  static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
>  {
>      XiveRouter *xrtr = XIVE_ROUTER(xn);
> @@ -482,6 +648,14 @@ static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
>          /* Notification completed */
>          return;
>      }
> +
> +    /*
> +     * The event trigger becomes an END trigger
> +     */
> +    xive_router_end_notify(xrtr,
> +                           GETFIELD_BE64(EAS_END_BLOCK, eas.w),
> +                           GETFIELD_BE64(EAS_END_INDEX, eas.w),
> +                           GETFIELD_BE64(EAS_END_DATA,  eas.w));
>  }
>  
>  static void xive_router_class_init(ObjectClass *klass, void *data)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 06/37] ppc/xive: add support for the END Event State buffers
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 06/37] ppc/xive: add support for the END Event State buffers Cédric Le Goater
@ 2018-12-06  4:09   ` David Gibson
  2018-12-06  6:30     ` Cédric Le Goater
  0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2018-12-06  4:09 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 10262 bytes --]

On Thu, Dec 06, 2018 at 12:22:20AM +0100, Cédric Le Goater wrote:
> The Event Notification Descriptor (END) XIVE structure also contains
> two Event State Buffers providing further coalescing of interrupts,
> one for the notification event (ESn) and one for the escalation events
> (ESe). A MMIO page is assigned for each to control the EOI through
> loads only. Stores are not allowed.
> 
> The END ESBs are modeled through an object resembling the 'XiveSource'
> It is stateless as the END state bits are backed into the XiveEND
> structure under the XiveRouter and the MMIO accesses follow the same
> rules as for the standard source ESBs.
> 
> END ESBs are not supported by the Linux drivers neither on OPAL nor on
> sPAPR. Nevetherless, it provides a mean to study the question in the
> future and validates a bit more the XIVE model.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/xive.h |  22 ++++++
>  hw/intc/xive.c        | 173 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 193 insertions(+), 2 deletions(-)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index d1b4c6c78ec5..d67b0785df7c 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -305,6 +305,8 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
>  
>  typedef struct XiveRouter {
>      SysBusDevice    parent;
> +
> +    uint32_t       chip_id;

I still don't think you need this..

>  } XiveRouter;
>  
>  #define TYPE_XIVE_ROUTER "xive-router"
> @@ -336,6 +338,26 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>  int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                            XiveEND *end, uint8_t word_number);
>  
> +/*
> + * XIVE END ESBs
> + */
> +
> +#define TYPE_XIVE_END_SOURCE "xive-end-source"
> +#define XIVE_END_SOURCE(obj) \
> +    OBJECT_CHECK(XiveENDSource, (obj), TYPE_XIVE_END_SOURCE)
> +
> +typedef struct XiveENDSource {
> +    DeviceState parent;
> +
> +    uint32_t        nr_ends;
> +
> +    /* ESB memory region */
> +    uint32_t        esb_shift;
> +    MemoryRegion    esb_mmio;
> +
> +    XiveRouter      *xrtr;

..or this..

> +} XiveENDSource;
> +
>  /*
>   * For legacy compatibility, the exceptions define up to 256 different
>   * priorities. P9 implements only 9 levels : 8 active levels [0 - 7]
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 41d8ba1540d0..83686e260df5 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -612,8 +612,18 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>       * even futher coalescing in the Router
>       */
>      if (!xive_end_is_notify(&end)) {
> -        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
> -        return;
> +        uint8_t pq = GETFIELD_BE32(END_W1_ESn, end.w1);
> +        bool notify = xive_esb_trigger(&pq);
> +
> +        if (pq != GETFIELD_BE32(END_W1_ESn, end.w1)) {
> +            end.w1 = SETFIELD_BE32(END_W1_ESn, end.w1, pq);
> +            xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
> +        }
> +
> +        /* ESn[Q]=1 : end of notification */
> +        if (!notify) {
> +            return;
> +        }
>      }
>  
>      /*
> @@ -658,12 +668,18 @@ static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
>                             GETFIELD_BE64(EAS_END_DATA,  eas.w));
>  }
>  
> +static Property xive_router_properties[] = {
> +    DEFINE_PROP_UINT32("chip-id", XiveRouter, chip_id, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
>  static void xive_router_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      XiveNotifierClass *xnc = XIVE_NOTIFIER_CLASS(klass);
>  
>      dc->desc    = "XIVE Router Engine";
> +    dc->props   = xive_router_properties;
>      xnc->notify = xive_router_notify;
>  }
>  
> @@ -692,6 +708,158 @@ void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon)
>                     (uint32_t) GETFIELD_BE64(EAS_END_DATA, eas->w));
>  }
>  
> +/*
> + * END ESB MMIO loads
> + */
> +static uint64_t xive_end_source_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    XiveENDSource *xsrc = XIVE_END_SOURCE(opaque);
> +    XiveRouter *xrtr = xsrc->xrtr;
> +    uint32_t offset = addr & 0xFFF;
> +    uint8_t end_blk;
> +    uint32_t end_idx;
> +    XiveEND end;
> +    uint32_t end_esmask;
> +    uint8_t pq;
> +    uint64_t ret = -1;
> +
> +    end_blk = xrtr->chip_id;

.. instead I think it makes more sense to just configure the end_blk
directly on the end_source, rather than reaching into another object
to 

> +    end_idx = addr >> (xsrc->esb_shift + 1);
> +
> +    if (xive_router_get_end(xrtr, end_blk, end_idx, &end)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No END %x/%x\n", end_blk,
> +                      end_idx);
> +        return -1;
> +    }
> +
> +    if (!xive_end_is_valid(&end)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: END %x/%x is invalid\n",
> +                      end_blk, end_idx);
> +        return -1;
> +    }
> +
> +    end_esmask = addr_is_even(addr, xsrc->esb_shift) ? END_W1_ESn : END_W1_ESe;
> +    pq = GETFIELD_BE32(end_esmask, end.w1);
> +
> +    switch (offset) {
> +    case XIVE_ESB_LOAD_EOI ... XIVE_ESB_LOAD_EOI + 0x7FF:
> +        ret = xive_esb_eoi(&pq);
> +
> +        /* Forward the source event notification for routing ?? */
> +        break;
> +
> +    case XIVE_ESB_GET ... XIVE_ESB_GET + 0x3FF:
> +        ret = pq;
> +        break;
> +
> +    case XIVE_ESB_SET_PQ_00 ... XIVE_ESB_SET_PQ_00 + 0x0FF:
> +    case XIVE_ESB_SET_PQ_01 ... XIVE_ESB_SET_PQ_01 + 0x0FF:
> +    case XIVE_ESB_SET_PQ_10 ... XIVE_ESB_SET_PQ_10 + 0x0FF:
> +    case XIVE_ESB_SET_PQ_11 ... XIVE_ESB_SET_PQ_11 + 0x0FF:
> +        ret = xive_esb_set(&pq, (offset >> 8) & 0x3);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid END ESB load addr %d\n",
> +                      offset);
> +        return -1;
> +    }
> +
> +    if (pq != GETFIELD_BE32(end_esmask, end.w1)) {
> +        end.w1 = SETFIELD_BE32(end_esmask, end.w1, pq);
> +        xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
> +    }
> +
> +    return ret;
> +}
> +
> +/*
> + * END ESB MMIO stores are invalid
> + */
> +static void xive_end_source_write(void *opaque, hwaddr addr,
> +                                  uint64_t value, unsigned size)
> +{
> +    qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr 0x%"
> +                  HWADDR_PRIx"\n", addr);
> +}
> +
> +static const MemoryRegionOps xive_end_source_ops = {
> +    .read = xive_end_source_read,
> +    .write = xive_end_source_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +};
> +
> +static void xive_source_end_reset(void *dev)
> +{
> +    /* TODO: Loop on all ENDs and mask off the ESn and ESe */

IIUC you're talking about actually writing the (potentially in memory)
END structures.  I don't think that makes sense for the END source
hardware model.  I'm guessing you want this for PAPR, where the ENDs
are in virtual hardware rather than guest memory, but in that case I
think the reset handling should be in the PAPR specific Xive object,
not the end_source itself.

> +}
> +
> +static void xive_end_source_realize(DeviceState *dev, Error **errp)
> +{
> +    XiveENDSource *xsrc = XIVE_END_SOURCE(dev);
> +    Object *obj;
> +    Error *local_err = NULL;
> +
> +    obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
> +    if (!obj) {
> +        error_propagate(errp, local_err);
> +        error_prepend(errp, "required link 'xive' not found: ");
> +        return;
> +    }
> +
> +    xsrc->xrtr = XIVE_ROUTER(obj);
> +
> +    if (!xsrc->nr_ends) {
> +        error_setg(errp, "Number of interrupt needs to be greater than 0");
> +        return;
> +    }
> +
> +    if (xsrc->esb_shift != XIVE_ESB_4K &&
> +        xsrc->esb_shift != XIVE_ESB_64K) {
> +        error_setg(errp, "Invalid ESB shift setting");
> +        return;
> +    }
> +
> +    /*
> +     * Each END is assigned an even/odd pair of MMIO pages, the even page
> +     * manages the ESn field while the odd page manages the ESe field.
> +     */
> +    memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
> +                          &xive_end_source_ops, xsrc, "xive.end",
> +                          (1ull << (xsrc->esb_shift + 1)) * xsrc->nr_ends);
> +
> +    qemu_register_reset(xive_source_end_reset, dev);
> +}
> +
> +static Property xive_end_source_properties[] = {
> +    DEFINE_PROP_UINT32("nr-ends", XiveENDSource, nr_ends, 0),
> +    DEFINE_PROP_UINT32("shift", XiveENDSource, esb_shift, XIVE_ESB_64K),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void xive_end_source_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->desc    = "XIVE END Source";
> +    dc->props   = xive_end_source_properties;
> +    dc->realize = xive_end_source_realize;
> +}
> +
> +static const TypeInfo xive_end_source_info = {
> +    .name          = TYPE_XIVE_END_SOURCE,
> +    .parent        = TYPE_DEVICE,
> +    .instance_size = sizeof(XiveENDSource),
> +    .class_init    = xive_end_source_class_init,
> +};
> +
>  /*
>   * XIVE Fabric
>   */
> @@ -706,6 +874,7 @@ static void xive_register_types(void)
>      type_register_static(&xive_source_info);
>      type_register_static(&xive_fabric_info);
>      type_register_static(&xive_router_info);
> +    type_register_static(&xive_end_source_info);
>  }
>  
>  type_init(xive_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 07/37] ppc/xive: introduce the XIVE interrupt thread context
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 07/37] ppc/xive: introduce the XIVE interrupt thread context Cédric Le Goater
@ 2018-12-06  4:31   ` David Gibson
  0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2018-12-06  4:31 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 22937 bytes --]

On Thu, Dec 06, 2018 at 12:22:21AM +0100, Cédric Le Goater wrote:
> Each POWER9 processor chip has a XIVE presenter that can generate four
> different exceptions to its threads:
> 
>   - hypervisor exception,
>   - O/S exception
>   - Event-Based Branch (EBB)
>   - msgsnd (doorbell).
> 
> Each exception has a state independent from the others called a Thread
> Interrupt Management context. This context is a set of registers which
> lets the thread handle priority management and interrupt acknowledgment
> among other things. The most important ones being :
> 
>   - Interrupt Priority Register  (PIPR)
>   - Interrupt Pending Buffer     (IPB)
>   - Current Processor Priority   (CPPR)
>   - Notification Source Register (NSR)
> 
> These registers are accessible through a specific MMIO region, called
> the Thread Interrupt Management Area (TIMA), four aligned pages, each
> exposing a different view of the registers. First page (page address
> ending in 0b00) gives access to the entire context and is reserved for
> the ring 0 view for the physical thread context. The second (page
> address ending in 0b01) is for the hypervisor, ring 1 view. The third
> (page address ending in 0b10) is for the operating system, ring 2
> view. The fourth (page address ending in 0b11) is for user level, ring
> 3 view.
> 
> The thread interrupt context is modeled with a XiveTCTX object
> containing the values of the different exception registers. The TIMA
> region is mapped at the same address for each CPU.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/hw/ppc/xive.h      |  44 ++++
>  include/hw/ppc/xive_regs.h |  82 ++++++++
>  hw/intc/xive.c             | 419 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 545 insertions(+)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index d67b0785df7c..74b547707b17 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -368,4 +368,48 @@ typedef struct XiveENDSource {
>  void xive_end_pic_print_info(XiveEND *end, uint32_t end_idx, Monitor *mon);
>  void xive_end_queue_pic_print_info(XiveEND *end, uint32_t width, Monitor *mon);
>  
> +/*
> + * XIVE Thread interrupt Management (TM) context
> + */
> +
> +#define TYPE_XIVE_TCTX "xive-tctx"
> +#define XIVE_TCTX(obj) OBJECT_CHECK(XiveTCTX, (obj), TYPE_XIVE_TCTX)
> +
> +/*
> + * XIVE Thread interrupt Management register rings :
> + *
> + *   QW-0  User       event-based exception state
> + *   QW-1  O/S        OS context for priority management, interrupt acks
> + *   QW-2  Pool       hypervisor pool context for virtual processors dispatched
> + *   QW-3  Physical   physical thread context and security context
> + */
> +#define XIVE_TM_RING_COUNT      4
> +#define XIVE_TM_RING_SIZE       0x10
> +
> +typedef struct XiveTCTX {
> +    DeviceState parent_obj;
> +
> +    CPUState    *cs;
> +    qemu_irq    output;
> +
> +    uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
> +} XiveTCTX;
> +
> +/*
> + * XIVE Thread Interrupt Management Aera (TIMA)
> + *
> + * This region gives access to the registers of the thread interrupt
> + * management context. It is four page wide, each page providing a
> + * different view of the registers. The page with the lower offset is
> + * the most privileged and gives access to the entire context.
> + */
> +#define XIVE_TM_HW_PAGE         0x0
> +#define XIVE_TM_HV_PAGE         0x1
> +#define XIVE_TM_OS_PAGE         0x2
> +#define XIVE_TM_USER_PAGE       0x3
> +
> +extern const MemoryRegionOps xive_tm_ops;
> +
> +void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
> +
>  #endif /* PPC_XIVE_H */
> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> index 3c0ebad18b69..ede3d04c5eda 100644
> --- a/include/hw/ppc/xive_regs.h
> +++ b/include/hw/ppc/xive_regs.h
> @@ -23,6 +23,88 @@
>  #define XIVE_SRCNO_INDEX(srcno) ((srcno) & 0x0fffffff)
>  #define XIVE_SRCNO(blk, idx)    ((uint32_t)(blk) << 28 | (idx))
>  
> +#define TM_SHIFT                16
> +
> +/* TM register offsets */
> +#define TM_QW0_USER             0x000 /* All rings */
> +#define TM_QW1_OS               0x010 /* Ring 0..2 */
> +#define TM_QW2_HV_POOL          0x020 /* Ring 0..1 */
> +#define TM_QW3_HV_PHYS          0x030 /* Ring 0..1 */
> +
> +/* Byte offsets inside a QW             QW0 QW1 QW2 QW3 */
> +#define TM_NSR                  0x0  /*  +   +   -   +  */
> +#define TM_CPPR                 0x1  /*  -   +   -   +  */
> +#define TM_IPB                  0x2  /*  -   +   +   +  */
> +#define TM_LSMFB                0x3  /*  -   +   +   +  */
> +#define TM_ACK_CNT              0x4  /*  -   +   -   -  */
> +#define TM_INC                  0x5  /*  -   +   -   +  */
> +#define TM_AGE                  0x6  /*  -   +   -   +  */
> +#define TM_PIPR                 0x7  /*  -   +   -   +  */
> +
> +#define TM_WORD0                0x0
> +#define TM_WORD1                0x4
> +
> +/*
> + * QW word 2 contains the valid bit at the top and other fields
> + * depending on the QW.
> + */
> +#define TM_WORD2                0x8
> +#define   TM_QW0W2_VU           PPC_BIT32(0)
> +#define   TM_QW0W2_LOGIC_SERV   PPC_BITMASK32(1, 31) /* XX 2,31 ? */
> +#define   TM_QW1W2_VO           PPC_BIT32(0)
> +#define   TM_QW1W2_OS_CAM       PPC_BITMASK32(8, 31)
> +#define   TM_QW2W2_VP           PPC_BIT32(0)
> +#define   TM_QW2W2_POOL_CAM     PPC_BITMASK32(8, 31)
> +#define   TM_QW3W2_VT           PPC_BIT32(0)
> +#define   TM_QW3W2_LP           PPC_BIT32(6)
> +#define   TM_QW3W2_LE           PPC_BIT32(7)
> +#define   TM_QW3W2_T            PPC_BIT32(31)
> +
> +/*
> + * In addition to normal loads to "peek" and writes (only when invalid)
> + * using 4 and 8 bytes accesses, the above registers support these
> + * "special" byte operations:
> + *
> + *   - Byte load from QW0[NSR] - User level NSR (EBB)
> + *   - Byte store to QW0[NSR] - User level NSR (EBB)
> + *   - Byte load/store to QW1[CPPR] and QW3[CPPR] - CPPR access
> + *   - Byte load from QW3[TM_WORD2] - Read VT||00000||LP||LE on thrd 0
> + *                                    otherwise VT||0000000
> + *   - Byte store to QW3[TM_WORD2] - Set VT bit (and LP/LE if present)
> + *
> + * Then we have all these "special" CI ops at these offset that trigger
> + * all sorts of side effects:
> + */
> +#define TM_SPC_ACK_EBB          0x800   /* Load8 ack EBB to reg*/
> +#define TM_SPC_ACK_OS_REG       0x810   /* Load16 ack OS irq to reg */
> +#define TM_SPC_PUSH_USR_CTX     0x808   /* Store32 Push/Validate user context */
> +#define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user
> +                                         * context */
> +#define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit */
> +#define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS
> +                                         * context to reg */
> +#define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool
> +                                         * context to reg*/
> +#define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg */
> +#define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd
> +                                         * line */
> +#define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line */
> +#define TM_SPC_ACK_HV_POOL_EL   0xc20   /* Store8 ack HV evt pool to even
> +                                         * line */
> +#define TM_SPC_ACK_HV_EL        0xc30   /* Store8 ack HV irq to even line */
> +/* XXX more... */
> +
> +/* NSR fields for the various QW ack types */
> +#define TM_QW0_NSR_EB           PPC_BIT8(0)
> +#define TM_QW1_NSR_EO           PPC_BIT8(0)
> +#define TM_QW3_NSR_HE           PPC_BITMASK8(0, 1)
> +#define  TM_QW3_NSR_HE_NONE     0
> +#define  TM_QW3_NSR_HE_POOL     1
> +#define  TM_QW3_NSR_HE_PHYS     2
> +#define  TM_QW3_NSR_HE_LSI      3
> +#define TM_QW3_NSR_I            PPC_BIT8(2)
> +#define TM_QW3_NSR_GRP_LVL      PPC_BIT8(3, 7)
> +
>  /* EAS (Event Assignment Structure)
>   *
>   * One per interrupt source. Targets an interrupt to a given Event
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 83686e260df5..80a965c14200 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -16,6 +16,424 @@
>  #include "hw/qdev-properties.h"
>  #include "monitor/monitor.h"
>  #include "hw/ppc/xive.h"
> +#include "hw/ppc/xive_regs.h"
> +
> +/*
> + * XIVE Thread Interrupt Management context
> + */
> +
> +static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring)
> +{
> +    return 0;
> +}
> +
> +static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
> +{
> +    if (cppr > XIVE_PRIORITY_MAX) {
> +        cppr = 0xff;
> +    }
> +
> +    tctx->regs[ring + TM_CPPR] = cppr;
> +}
> +
> +/*
> + * XIVE Thread Interrupt Management Area (TIMA)
> + */
> +
> +/*
> + * Define an access map for each page of the TIMA that we will use in
> + * the memory region ops to filter values when doing loads and stores
> + * of raw registers values
> + *
> + * Registers accessibility bits :
> + *
> + *    0x0 - no access
> + *    0x1 - write only
> + *    0x2 - read only
> + *    0x3 - read/write
> + */
> +
> +static const uint8_t xive_tm_hw_view[] = {
> +    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-1 OS   */   3, 3, 3, 3,   3, 3, 0, 3,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-2 POOL */   0, 0, 3, 3,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-3 PHYS */   3, 3, 3, 3,   0, 3, 0, 3,   3, 0, 0, 3,   3, 3, 3, 0,
> +};
> +
> +static const uint8_t xive_tm_hv_view[] = {
> +    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-1 OS   */   3, 3, 3, 3,   3, 3, 0, 3,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-2 POOL */   0, 0, 3, 3,   0, 0, 0, 0,   0, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-3 PHYS */   3, 3, 3, 3,   0, 3, 0, 3,   3, 0, 0, 3,   0, 0, 0, 0,
> +};
> +
> +static const uint8_t xive_tm_os_view[] = {
> +    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   3, 3, 3, 3,   0, 0, 0, 0,
> +    /* QW-1 OS   */   2, 3, 2, 2,   2, 2, 0, 2,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-2 POOL */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-3 PHYS */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +};
> +
> +static const uint8_t xive_tm_user_view[] = {
> +    /* QW-0 User */   3, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-1 OS   */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-2 POOL */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +    /* QW-3 PHYS */   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0,
> +};
> +
> +/*
> + * Overall TIMA access map for the thread interrupt management context
> + * registers
> + */
> +static const uint8_t *xive_tm_views[] = {
> +    [XIVE_TM_HW_PAGE]   = xive_tm_hw_view,
> +    [XIVE_TM_HV_PAGE]   = xive_tm_hv_view,
> +    [XIVE_TM_OS_PAGE]   = xive_tm_os_view,
> +    [XIVE_TM_USER_PAGE] = xive_tm_user_view,
> +};
> +
> +/*
> + * Computes a register access mask for a given offset in the TIMA
> + */
> +static uint64_t xive_tm_mask(hwaddr offset, unsigned size, bool write)
> +{
> +    uint8_t page_offset = (offset >> TM_SHIFT) & 0x3;
> +    uint8_t reg_offset = offset & 0x3F;
> +    uint8_t reg_mask = write ? 0x1 : 0x2;
> +    uint64_t mask = 0x0;
> +    int i;
> +
> +    for (i = 0; i < size; i++) {
> +        if (xive_tm_views[page_offset][reg_offset + i] & reg_mask) {
> +            mask |= (uint64_t) 0xff << (8 * (size - i - 1));
> +        }
> +    }
> +
> +    return mask;
> +}
> +
> +static void xive_tm_raw_write(XiveTCTX *tctx, hwaddr offset, uint64_t value,
> +                              unsigned size)
> +{
> +    uint8_t ring_offset = offset & 0x30;
> +    uint8_t reg_offset = offset & 0x3F;
> +    uint64_t mask = xive_tm_mask(offset, size, true);
> +    int i;
> +
> +    /*
> +     * Only 4 or 8 bytes stores are allowed and the User ring is
> +     * excluded
> +     */
> +    if (size < 4 || !mask || ring_offset == TM_QW0_USER) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid write access at TIMA @%"
> +                      HWADDR_PRIx"\n", offset);
> +        return;
> +    }
> +
> +    /*
> +     * Use the register offset for the raw values and filter out
> +     * reserved values
> +     */
> +    for (i = 0; i < size; i++) {
> +        uint8_t byte_mask = (mask >> (8 * (size - i - 1)));
> +        if (byte_mask) {
> +            tctx->regs[reg_offset + i] = (value >> (8 * (size - i - 1))) &
> +                byte_mask;
> +        }
> +    }
> +}
> +
> +static uint64_t xive_tm_raw_read(XiveTCTX *tctx, hwaddr offset, unsigned size)
> +{
> +    uint8_t ring_offset = offset & 0x30;
> +    uint8_t reg_offset = offset & 0x3F;
> +    uint64_t mask = xive_tm_mask(offset, size, false);
> +    uint64_t ret;
> +    int i;
> +
> +    /*
> +     * Only 4 or 8 bytes loads are allowed and the User ring is
> +     * excluded
> +     */
> +    if (size < 4 || !mask || ring_offset == TM_QW0_USER) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid read access at TIMA @%"
> +                      HWADDR_PRIx"\n", offset);
> +        return -1;
> +    }
> +
> +    /* Use the register offset for the raw values */
> +    ret = 0;
> +    for (i = 0; i < size; i++) {
> +        ret |= (uint64_t) tctx->regs[reg_offset + i] << (8 * (size - i - 1));
> +    }
> +
> +    /* filter out reserved values */
> +    return ret & mask;
> +}
> +
> +/*
> + * The TM context is mapped twice within each page. Stores and loads
> + * to the first mapping below 2K write and read the specified values
> + * without modification. The second mapping above 2K performs specific
> + * state changes (side effects) in addition to setting/returning the
> + * interrupt management area context of the processor thread.
> + */
> +static uint64_t xive_tm_ack_os_reg(XiveTCTX *tctx, hwaddr offset, unsigned size)
> +{
> +    return xive_tctx_accept(tctx, TM_QW1_OS);
> +}
> +
> +static void xive_tm_set_os_cppr(XiveTCTX *tctx, hwaddr offset,
> +                                uint64_t value, unsigned size)
> +{
> +    xive_tctx_set_cppr(tctx, TM_QW1_OS, value & 0xff);
> +}
> +
> +/*
> + * Define a mapping of "special" operations depending on the TIMA page
> + * offset and the size of the operation.
> + */
> +typedef struct XiveTmOp {
> +    uint8_t  page_offset;
> +    uint32_t op_offset;
> +    unsigned size;
> +    void     (*write_handler)(XiveTCTX *tctx, hwaddr offset, uint64_t value,
> +                              unsigned size);
> +    uint64_t (*read_handler)(XiveTCTX *tctx, hwaddr offset, unsigned size);
> +} XiveTmOp;
> +
> +static const XiveTmOp xive_tm_operations[] = {
> +    /*
> +     * MMIOs below 2K : raw values and special operations without side
> +     * effects
> +     */
> +    { XIVE_TM_OS_PAGE, TM_QW1_OS + TM_CPPR,   1, xive_tm_set_os_cppr, NULL },
> +
> +    /* MMIOs above 2K : special operations with side effects */
> +    { XIVE_TM_OS_PAGE, TM_SPC_ACK_OS_REG,     2, NULL, xive_tm_ack_os_reg },
> +};
> +
> +static const XiveTmOp *xive_tm_find_op(hwaddr offset, unsigned size, bool write)
> +{
> +    uint8_t page_offset = (offset >> TM_SHIFT) & 0x3;
> +    uint32_t op_offset = offset & 0xFFF;
> +    int i;
> +
> +    for (i = 0; i < ARRAY_SIZE(xive_tm_operations); i++) {
> +        const XiveTmOp *xto = &xive_tm_operations[i];
> +
> +        /* Accesses done from a more privileged TIMA page is allowed */
> +        if (xto->page_offset >= page_offset &&
> +            xto->op_offset == op_offset &&
> +            xto->size == size &&
> +            ((write && xto->write_handler) || (!write && xto->read_handler))) {
> +            return xto;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +/*
> + * TIMA MMIO handlers
> + */
> +static void xive_tm_write(void *opaque, hwaddr offset,
> +                          uint64_t value, unsigned size)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> +    XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> +    const XiveTmOp *xto;
> +
> +    /*
> +     * TODO: check V bit in Q[0-3]W2, check PTER bit associated with CPU
> +     */
> +
> +    /*
> +     * First, check for special operations in the 2K region
> +     */
> +    if (offset & 0x800) {
> +        xto = xive_tm_find_op(offset, size, true);
> +        if (!xto) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid write access at TIMA"
> +                          "@%"HWADDR_PRIx"\n", offset);
> +        } else {
> +            xto->write_handler(tctx, offset, value, size);
> +        }
> +        return;
> +    }
> +
> +    /*
> +     * Then, for special operations in the region below 2K.
> +     */
> +    xto = xive_tm_find_op(offset, size, true);
> +    if (xto) {
> +        xto->write_handler(tctx, offset, value, size);
> +        return;
> +    }
> +
> +    /*
> +     * Finish with raw access to the register values
> +     */
> +    xive_tm_raw_write(tctx, offset, value, size);
> +}
> +
> +static uint64_t xive_tm_read(void *opaque, hwaddr offset, unsigned size)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
> +    XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> +    const XiveTmOp *xto;
> +
> +    /*
> +     * TODO: check V bit in Q[0-3]W2, check PTER bit associated with CPU
> +     */
> +
> +    /*
> +     * First, check for special operations in the 2K region
> +     */
> +    if (offset & 0x800) {
> +        xto = xive_tm_find_op(offset, size, false);
> +        if (!xto) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid read access to TIMA"
> +                          "@%"HWADDR_PRIx"\n", offset);
> +            return -1;
> +        }
> +        return xto->read_handler(tctx, offset, size);
> +    }
> +
> +    /*
> +     * Then, for special operations in the region below 2K.
> +     */
> +    xto = xive_tm_find_op(offset, size, false);
> +    if (xto) {
> +        return xto->read_handler(tctx, offset, size);
> +    }
> +
> +    /*
> +     * Finish with raw access to the register values
> +     */
> +    return xive_tm_raw_read(tctx, offset, size);
> +}
> +
> +const MemoryRegionOps xive_tm_ops = {
> +    .read = xive_tm_read,
> +    .write = xive_tm_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    },
> +};
> +
> +static char *xive_tctx_ring_print(uint8_t *ring)
> +{
> +    uint32_t w2 = be32_to_cpu(*((uint32_t *) &ring[TM_WORD2]));
> +
> +    return g_strdup_printf("%02x   %02x  %02x    %02x   %02x  "
> +                   "%02x  %02x   %02x  %08x",
> +                   ring[TM_NSR], ring[TM_CPPR], ring[TM_IPB], ring[TM_LSMFB],
> +                   ring[TM_ACK_CNT], ring[TM_INC], ring[TM_AGE], ring[TM_PIPR],
> +                   w2);
> +}
> +
> +static const char * const xive_tctx_ring_names[] = {
> +    "USER", "OS", "POOL", "PHYS",
> +};
> +
> +void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
> +{
> +    int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
> +    int i;
> +
> +    monitor_printf(mon, "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR"
> +                   "  W2\n", cpu_index);
> +
> +    for (i = 0; i < XIVE_TM_RING_COUNT; i++) {
> +        char *s = xive_tctx_ring_print(&tctx->regs[i * XIVE_TM_RING_SIZE]);
> +        monitor_printf(mon, "CPU[%04x]: %4s    %s\n", cpu_index,
> +                       xive_tctx_ring_names[i], s);
> +        g_free(s);
> +    }
> +}
> +
> +static void xive_tctx_reset(void *dev)
> +{
> +    XiveTCTX *tctx = XIVE_TCTX(dev);
> +
> +    memset(tctx->regs, 0, sizeof(tctx->regs));
> +
> +    /* Set some defaults */
> +    tctx->regs[TM_QW1_OS + TM_LSMFB] = 0xFF;
> +    tctx->regs[TM_QW1_OS + TM_ACK_CNT] = 0xFF;
> +    tctx->regs[TM_QW1_OS + TM_AGE] = 0xFF;
> +}
> +
> +static void xive_tctx_realize(DeviceState *dev, Error **errp)
> +{
> +    XiveTCTX *tctx = XIVE_TCTX(dev);
> +    PowerPCCPU *cpu;
> +    CPUPPCState *env;
> +    Object *obj;
> +    Error *local_err = NULL;
> +
> +    obj = object_property_get_link(OBJECT(dev), "cpu", &local_err);
> +    if (!obj) {
> +        error_propagate(errp, local_err);
> +        error_prepend(errp, "required link 'cpu' not found: ");
> +        return;
> +    }
> +
> +    cpu = POWERPC_CPU(obj);
> +    tctx->cs = CPU(obj);
> +
> +    env = &cpu->env;
> +    switch (PPC_INPUT(env)) {
> +    case PPC_FLAGS_INPUT_POWER7:
> +        tctx->output = env->irq_inputs[POWER7_INPUT_INT];
> +        break;
> +
> +    default:
> +        error_setg(errp, "XIVE interrupt controller does not support "
> +                   "this CPU bus model");
> +        return;
> +    }
> +
> +    qemu_register_reset(xive_tctx_reset, dev);
> +}
> +
> +static void xive_tctx_unrealize(DeviceState *dev, Error **errp)
> +{
> +    qemu_unregister_reset(xive_tctx_reset, dev);
> +}
> +
> +static const VMStateDescription vmstate_xive_tctx = {
> +    .name = TYPE_XIVE_TCTX,
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_BUFFER(regs, XiveTCTX),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static void xive_tctx_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->desc = "XIVE Interrupt Thread Context";
> +    dc->realize = xive_tctx_realize;
> +    dc->unrealize = xive_tctx_unrealize;
> +    dc->vmsd = &vmstate_xive_tctx;
> +}
> +
> +static const TypeInfo xive_tctx_info = {
> +    .name          = TYPE_XIVE_TCTX,
> +    .parent        = TYPE_DEVICE,
> +    .instance_size = sizeof(XiveTCTX),
> +    .class_init    = xive_tctx_class_init,
> +};
>  
>  /*
>   * XIVE ESB helpers
> @@ -875,6 +1293,7 @@ static void xive_register_types(void)
>      type_register_static(&xive_fabric_info);
>      type_register_static(&xive_router_info);
>      type_register_static(&xive_end_source_info);
> +    type_register_static(&xive_tctx_info);
>  }
>  
>  type_init(xive_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9)
  2018-12-06  1:10 ` [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) no-reply
@ 2018-12-06  6:14   ` Cédric Le Goater
  2018-12-06  9:24     ` Fam Zheng
  0 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-06  6:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, david, qemu-ppc

Hello,

> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> Checking PATCH 25/37: spapr/xive: add state synchronization with KVM...
> Checking PATCH 26/37: spapr/xive: introduce a VM state change handler...
> ERROR: spaces required around that '*' (ctx:WxV)
> #38: FILE: hw/intc/spapr_xive_kvm.c:341:
> + static void kvmppc_xive_sync_all(sPAPRXive *xive, Error **errp)
>                                              ^
> 
> total: 1 errors, 0 warnings, 135 lines checked

This looks like a false positive.

C.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 03/37] ppc/xive: introduce the XiveNotifier interface
  2018-12-06  3:25   ` David Gibson
@ 2018-12-06  6:17     ` Cédric Le Goater
  2018-12-07  2:07       ` David Gibson
  0 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-06  6:17 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/6/18 4:25 AM, David Gibson wrote:
> On Thu, Dec 06, 2018 at 12:22:17AM +0100, Cédric Le Goater wrote:
>> The XiveNotifier offers a simple interface, between the XiveSource
>> object and the main interrupt controller of the machine. It will
>> forward event notifications to the XIVE Interrupt Virtualization
>> Routing Engine (IVRE).
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/xive.h | 23 +++++++++++++++++++++++
>>  hw/intc/xive.c        | 25 +++++++++++++++++++++++++
>>  2 files changed, 48 insertions(+)
>>
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 7cebc32eba4c..6770cffec67d 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -142,6 +142,27 @@
>>  
>>  #include "hw/qdev-core.h"
>>  
>> +/*
>> + * XIVE Fabric (Interface between Source and Router)
>> + */
>> +
>> +typedef struct XiveNotifier {
>> +    Object parent;
>> +} XiveNotifier;
>> +
>> +#define TYPE_XIVE_NOTIFIER "xive-fabric"
> 
> I'm applying this, but changing the string here from "xive-fabric" to
> "xive-notifier".

Ah yes. My sed command missed that.

Thanks,

C.

> 
> 
>> +#define XIVE_NOTIFIER(obj)                                     \
>> +    OBJECT_CHECK(XiveNotifier, (obj), TYPE_XIVE_NOTIFIER)
>> +#define XIVE_NOTIFIER_CLASS(klass)                                     \
>> +    OBJECT_CLASS_CHECK(XiveNotifierClass, (klass), TYPE_XIVE_NOTIFIER)
>> +#define XIVE_NOTIFIER_GET_CLASS(obj)                                   \
>> +    OBJECT_GET_CLASS(XiveNotifierClass, (obj), TYPE_XIVE_NOTIFIER)
>> +
>> +typedef struct XiveNotifierClass {
>> +    InterfaceClass parent;
>> +    void (*notify)(XiveNotifier *xn, uint32_t lisn);
>> +} XiveNotifierClass;
>> +
>>  /*
>>   * XIVE Interrupt Source
>>   */
>> @@ -171,6 +192,8 @@ typedef struct XiveSource {
>>      uint64_t        esb_flags;
>>      uint32_t        esb_shift;
>>      MemoryRegion    esb_mmio;
>> +
>> +    XiveNotifier    *xive;
>>  } XiveSource;
>>  
>>  /*
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 11c7aac962de..79238eb57fae 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -155,7 +155,11 @@ static bool xive_source_esb_eoi(XiveSource *xsrc, uint32_t srcno)
>>   */
>>  static void xive_source_notify(XiveSource *xsrc, int srcno)
>>  {
>> +    XiveNotifierClass *xnc = XIVE_NOTIFIER_GET_CLASS(xsrc->xive);
>>  
>> +    if (xnc->notify) {
>> +        xnc->notify(xsrc->xive, srcno);
>> +    }
>>  }
>>  
>>  /*
>> @@ -362,6 +366,17 @@ static void xive_source_reset(void *dev)
>>  static void xive_source_realize(DeviceState *dev, Error **errp)
>>  {
>>      XiveSource *xsrc = XIVE_SOURCE(dev);
>> +    Object *obj;
>> +    Error *local_err = NULL;
>> +
>> +    obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
>> +    if (!obj) {
>> +        error_propagate(errp, local_err);
>> +        error_prepend(errp, "required link 'xive' not found: ");
>> +        return;
>> +    }
>> +
>> +    xsrc->xive = XIVE_NOTIFIER(obj);
>>  
>>      if (!xsrc->nr_irqs) {
>>          error_setg(errp, "Number of interrupt needs to be greater than 0");
>> @@ -428,9 +443,19 @@ static const TypeInfo xive_source_info = {
>>      .class_init    = xive_source_class_init,
>>  };
>>  
>> +/*
>> + * XIVE Fabric
>> + */
>> +static const TypeInfo xive_fabric_info = {
>> +    .name = TYPE_XIVE_NOTIFIER,
>> +    .parent = TYPE_INTERFACE,
>> +    .class_size = sizeof(XiveNotifierClass),
>> +};
>> +
>>  static void xive_register_types(void)
>>  {
>>      type_register_static(&xive_source_info);
>> +    type_register_static(&xive_fabric_info);
>>  }
>>  
>>  type_init(xive_register_types)
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model
  2018-12-06  3:41   ` David Gibson
@ 2018-12-06  6:22     ` Cédric Le Goater
  2018-12-07  1:57       ` David Gibson
  0 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-06  6:22 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/6/18 4:41 AM, David Gibson wrote:
> On Thu, Dec 06, 2018 at 12:22:18AM +0100, Cédric Le Goater wrote:
>> The XiveRouter models the second sub-engine of the XIVE architecture :
>> the Interrupt Virtualization Routing Engine (IVRE).
>>
>> The IVRE handles event notifications of the IVSE and performs the
>> interrupt routing process. For this purpose, it uses a set of tables
>> stored in system memory, the first of which being the Event Assignment
>> Structure (EAS) table.
>>
>> The EAT associates an interrupt source number with an Event Notification
>> Descriptor (END) which will be used in a second phase of the routing
>> process to identify a Notification Virtual Target.
>>
>> The XiveRouter is an abstract class which needs to be inherited from
>> to define a storage for the EAT, and other upcoming tables.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/xive.h      | 31 ++++++++++++++++
>>  include/hw/ppc/xive_regs.h | 50 +++++++++++++++++++++++++
>>  hw/intc/xive.c             | 76 ++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 157 insertions(+)
>>  create mode 100644 include/hw/ppc/xive_regs.h
>>
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 6770cffec67d..57ec9f84f527 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -141,6 +141,8 @@
>>  #define PPC_XIVE_H
>>  
>>  #include "hw/qdev-core.h"
>> +#include "hw/sysbus.h"
>> +#include "hw/ppc/xive_regs.h"
>>  
>>  /*
>>   * XIVE Fabric (Interface between Source and Router)
>> @@ -297,4 +299,33 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
>>      }
>>  }
>>  
>> +/*
>> + * XIVE Router
>> + */
>> +
>> +typedef struct XiveRouter {
>> +    SysBusDevice    parent;
> 
> I thought the plan was to make XiveRouter as well as XiveSource a
> TYPE_DEVICE descendent rather than a SysBusDevice?

We start talking about that, indeed, but then :

	https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg06407.html

I thought we concluded that it was going to get too complex.

Also, sPAPRXive is a direct descendant of XiveRouter and we want sPAPRXive 
on SysBus.

C.

> 
>> +} XiveRouter;
>> +
>> +#define TYPE_XIVE_ROUTER "xive-router"
>> +#define XIVE_ROUTER(obj)                                \
>> +    OBJECT_CHECK(XiveRouter, (obj), TYPE_XIVE_ROUTER)
>> +#define XIVE_ROUTER_CLASS(klass)                                        \
>> +    OBJECT_CLASS_CHECK(XiveRouterClass, (klass), TYPE_XIVE_ROUTER)
>> +#define XIVE_ROUTER_GET_CLASS(obj)                              \
>> +    OBJECT_GET_CLASS(XiveRouterClass, (obj), TYPE_XIVE_ROUTER)
>> +
>> +typedef struct XiveRouterClass {
>> +    SysBusDeviceClass parent;
>> +
>> +    /* XIVE table accessors */
>> +    int (*get_eas)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
>> +                   XiveEAS *eas);
>> +} XiveRouterClass;
>> +
>> +void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
>> +
>> +int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
>> +                        XiveEAS *eas);
>> +
>>  #endif /* PPC_XIVE_H */
>> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
>> new file mode 100644
>> index 000000000000..15f2470ed9cc
>> --- /dev/null
>> +++ b/include/hw/ppc/xive_regs.h
>> @@ -0,0 +1,50 @@
>> +/*
>> + * QEMU PowerPC XIVE internal structure definitions
>> + *
>> + *
>> + * The XIVE structures are accessed by the HW and their format is
>> + * architected to be big-endian. Some macros are provided to ease
>> + * access to the different fields.
>> + *
>> + *
>> + * Copyright (c) 2016-2018, IBM Corporation.
>> + *
>> + * This code is licensed under the GPL version 2 or later. See the
>> + * COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef PPC_XIVE_REGS_H
>> +#define PPC_XIVE_REGS_H
>> +
>> +/*
>> + * Interrupt source number encoding on PowerBUS
>> + */
>> +#define XIVE_SRCNO_BLOCK(srcno) (((srcno) >> 28) & 0xf)
>> +#define XIVE_SRCNO_INDEX(srcno) ((srcno) & 0x0fffffff)
>> +#define XIVE_SRCNO(blk, idx)    ((uint32_t)(blk) << 28 | (idx))
>> +
>> +/* EAS (Event Assignment Structure)
>> + *
>> + * One per interrupt source. Targets an interrupt to a given Event
>> + * Notification Descriptor (END) and provides the corresponding
>> + * logical interrupt number (END data)
>> + */
>> +typedef struct XiveEAS {
>> +        /* Use a single 64-bit definition to make it easier to
>> +         * perform atomic updates
>> +         */
>> +        uint64_t        w;
>> +#define EAS_VALID       PPC_BIT(0)
>> +#define EAS_END_BLOCK   PPC_BITMASK(4, 7)        /* Destination END block# */
>> +#define EAS_END_INDEX   PPC_BITMASK(8, 31)       /* Destination END index */
>> +#define EAS_MASKED      PPC_BIT(32)              /* Masked */
>> +#define EAS_END_DATA    PPC_BITMASK(33, 63)      /* Data written to the END */
>> +} XiveEAS;
>> +
>> +#define xive_eas_is_valid(eas)   (be64_to_cpu((eas)->w) & EAS_VALID)
>> +#define xive_eas_is_masked(eas)  (be64_to_cpu((eas)->w) & EAS_MASKED)
>> +
>> +#define GETFIELD_BE64(m, v)      GETFIELD(m, be64_to_cpu(v))
>> +#define SETFIELD_BE64(m, v, val) cpu_to_be64(SETFIELD(m, be64_to_cpu(v), val))
>> +
>> +#endif /* PPC_XIVE_REGS_H */
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 79238eb57fae..d21df6674d8c 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -443,6 +443,81 @@ static const TypeInfo xive_source_info = {
>>      .class_init    = xive_source_class_init,
>>  };
>>  
>> +/*
>> + * XIVE Router (aka. Virtualization Controller or IVRE)
>> + */
>> +
>> +int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
>> +                        XiveEAS *eas)
>> +{
>> +    XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
>> +
>> +    return xrc->get_eas(xrtr, eas_blk, eas_idx, eas);
>> +}
> 
> Didn't spot this before, but it probably makes sense for such a
> trivial wrapper to be a static inline in the header.
> 
>> +
>> +static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
>> +{
>> +    XiveRouter *xrtr = XIVE_ROUTER(xn);
>> +    uint8_t eas_blk = XIVE_SRCNO_BLOCK(lisn);
>> +    uint32_t eas_idx = XIVE_SRCNO_INDEX(lisn);
>> +    XiveEAS eas;
>> +
>> +    /* EAS cache lookup */
>> +    if (xive_router_get_eas(xrtr, eas_blk, eas_idx, &eas)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %x\n", lisn);
>> +        return;
>> +    }
>> +
>> +    /* The IVRE checks the State Bit Cache at this point. We skip the
>> +     * SBC lookup because the state bits of the sources are modeled
>> +     * internally in QEMU.
>> +     */
>> +
>> +    if (!xive_eas_is_valid(&eas)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %x\n", lisn);
>> +        return;
>> +    }
>> +
>> +    if (xive_eas_is_masked(&eas)) {
>> +        /* Notification completed */
>> +        return;
>> +    }
>> +}
>> +
>> +static void xive_router_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    XiveNotifierClass *xnc = XIVE_NOTIFIER_CLASS(klass);
>> +
>> +    dc->desc    = "XIVE Router Engine";
>> +    xnc->notify = xive_router_notify;
>> +}
>> +
>> +static const TypeInfo xive_router_info = {
>> +    .name          = TYPE_XIVE_ROUTER,
>> +    .parent        = TYPE_SYS_BUS_DEVICE,
>> +    .abstract      = true,
>> +    .class_size    = sizeof(XiveRouterClass),
>> +    .class_init    = xive_router_class_init,
>> +    .interfaces    = (InterfaceInfo[]) {
>> +        { TYPE_XIVE_NOTIFIER },
>> +        { }
>> +    }
>> +};
>> +
>> +void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon)
>> +{
>> +    if (!xive_eas_is_valid(eas)) {
>> +        return;
>> +    }
>> +
>> +    monitor_printf(mon, "  %08x %s end:%02x/%04x data:%08x\n",
>> +                   lisn, xive_eas_is_masked(eas) ? "M" : " ",
>> +                   (uint8_t)  GETFIELD_BE64(EAS_END_BLOCK, eas->w),
>> +                   (uint32_t) GETFIELD_BE64(EAS_END_INDEX, eas->w),
>> +                   (uint32_t) GETFIELD_BE64(EAS_END_DATA, eas->w));
>> +}
>> +
>>  /*
>>   * XIVE Fabric
>>   */
>> @@ -456,6 +531,7 @@ static void xive_register_types(void)
>>  {
>>      type_register_static(&xive_source_info);
>>      type_register_static(&xive_fabric_info);
>> +    type_register_static(&xive_router_info);
>>  }
>>  
>>  type_init(xive_register_types)
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 06/37] ppc/xive: add support for the END Event State buffers
  2018-12-06  4:09   ` David Gibson
@ 2018-12-06  6:30     ` Cédric Le Goater
  2018-12-07  2:05       ` David Gibson
  0 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-06  6:30 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/6/18 5:09 AM, David Gibson wrote:
> On Thu, Dec 06, 2018 at 12:22:20AM +0100, Cédric Le Goater wrote:
>> The Event Notification Descriptor (END) XIVE structure also contains
>> two Event State Buffers providing further coalescing of interrupts,
>> one for the notification event (ESn) and one for the escalation events
>> (ESe). A MMIO page is assigned for each to control the EOI through
>> loads only. Stores are not allowed.
>>
>> The END ESBs are modeled through an object resembling the 'XiveSource'
>> It is stateless as the END state bits are backed into the XiveEND
>> structure under the XiveRouter and the MMIO accesses follow the same
>> rules as for the standard source ESBs.
>>
>> END ESBs are not supported by the Linux drivers neither on OPAL nor on
>> sPAPR. Nevetherless, it provides a mean to study the question in the
>> future and validates a bit more the XIVE model.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/xive.h |  22 ++++++
>>  hw/intc/xive.c        | 173 +++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 193 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index d1b4c6c78ec5..d67b0785df7c 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -305,6 +305,8 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
>>  
>>  typedef struct XiveRouter {
>>      SysBusDevice    parent;
>> +
>> +    uint32_t       chip_id;
> 
> I still don't think you need this..

I know :) 

> 
>>  } XiveRouter;
>>  
>>  #define TYPE_XIVE_ROUTER "xive-router"
>> @@ -336,6 +338,26 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>  int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                            XiveEND *end, uint8_t word_number);
>>  
>> +/*
>> + * XIVE END ESBs
>> + */
>> +
>> +#define TYPE_XIVE_END_SOURCE "xive-end-source"
>> +#define XIVE_END_SOURCE(obj) \
>> +    OBJECT_CHECK(XiveENDSource, (obj), TYPE_XIVE_END_SOURCE)
>> +
>> +typedef struct XiveENDSource {
>> +    DeviceState parent;
>> +
>> +    uint32_t        nr_ends;
>> +
>> +    /* ESB memory region */
>> +    uint32_t        esb_shift;
>> +    MemoryRegion    esb_mmio;
>> +
>> +    XiveRouter      *xrtr;
> 
> ..or this..
> 
>> +} XiveENDSource;
>> +
>>  /*
>>   * For legacy compatibility, the exceptions define up to 256 different
>>   * priorities. P9 implements only 9 levels : 8 active levels [0 - 7]
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 41d8ba1540d0..83686e260df5 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -612,8 +612,18 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>>       * even futher coalescing in the Router
>>       */
>>      if (!xive_end_is_notify(&end)) {
>> -        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
>> -        return;
>> +        uint8_t pq = GETFIELD_BE32(END_W1_ESn, end.w1);
>> +        bool notify = xive_esb_trigger(&pq);
>> +
>> +        if (pq != GETFIELD_BE32(END_W1_ESn, end.w1)) {
>> +            end.w1 = SETFIELD_BE32(END_W1_ESn, end.w1, pq);
>> +            xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
>> +        }
>> +
>> +        /* ESn[Q]=1 : end of notification */
>> +        if (!notify) {
>> +            return;
>> +        }
>>      }
>>  
>>      /*
>> @@ -658,12 +668,18 @@ static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
>>                             GETFIELD_BE64(EAS_END_DATA,  eas.w));
>>  }
>>  
>> +static Property xive_router_properties[] = {
>> +    DEFINE_PROP_UINT32("chip-id", XiveRouter, chip_id, 0),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>>  static void xive_router_class_init(ObjectClass *klass, void *data)
>>  {
>>      DeviceClass *dc = DEVICE_CLASS(klass);
>>      XiveNotifierClass *xnc = XIVE_NOTIFIER_CLASS(klass);
>>  
>>      dc->desc    = "XIVE Router Engine";
>> +    dc->props   = xive_router_properties;
>>      xnc->notify = xive_router_notify;
>>  }
>>  
>> @@ -692,6 +708,158 @@ void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon)
>>                     (uint32_t) GETFIELD_BE64(EAS_END_DATA, eas->w));
>>  }
>>  
>> +/*
>> + * END ESB MMIO loads
>> + */
>> +static uint64_t xive_end_source_read(void *opaque, hwaddr addr, unsigned size)
>> +{
>> +    XiveENDSource *xsrc = XIVE_END_SOURCE(opaque);
>> +    XiveRouter *xrtr = xsrc->xrtr;
>> +    uint32_t offset = addr & 0xFFF;
>> +    uint8_t end_blk;
>> +    uint32_t end_idx;
>> +    XiveEND end;
>> +    uint32_t end_esmask;
>> +    uint8_t pq;
>> +    uint64_t ret = -1;
>> +
>> +    end_blk = xrtr->chip_id;
> 
> .. instead I think it makes more sense to just configure the end_blk
> directly on the end_source, rather than reaching into another object
> to 

Ah. That's what I was asking in an email. I missed the answer maybe.
Let's drop it and sPAPRXive block will be 0. 

> 
>> +    end_idx = addr >> (xsrc->esb_shift + 1);
>> +
>> +    if (xive_router_get_end(xrtr, end_blk, end_idx, &end)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No END %x/%x\n", end_blk,
>> +                      end_idx);
>> +        return -1;
>> +    }
>> +
>> +    if (!xive_end_is_valid(&end)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: END %x/%x is invalid\n",
>> +                      end_blk, end_idx);
>> +        return -1;
>> +    }
>> +
>> +    end_esmask = addr_is_even(addr, xsrc->esb_shift) ? END_W1_ESn : END_W1_ESe;
>> +    pq = GETFIELD_BE32(end_esmask, end.w1);
>> +
>> +    switch (offset) {
>> +    case XIVE_ESB_LOAD_EOI ... XIVE_ESB_LOAD_EOI + 0x7FF:
>> +        ret = xive_esb_eoi(&pq);
>> +
>> +        /* Forward the source event notification for routing ?? */
>> +        break;
>> +
>> +    case XIVE_ESB_GET ... XIVE_ESB_GET + 0x3FF:
>> +        ret = pq;
>> +        break;
>> +
>> +    case XIVE_ESB_SET_PQ_00 ... XIVE_ESB_SET_PQ_00 + 0x0FF:
>> +    case XIVE_ESB_SET_PQ_01 ... XIVE_ESB_SET_PQ_01 + 0x0FF:
>> +    case XIVE_ESB_SET_PQ_10 ... XIVE_ESB_SET_PQ_10 + 0x0FF:
>> +    case XIVE_ESB_SET_PQ_11 ... XIVE_ESB_SET_PQ_11 + 0x0FF:
>> +        ret = xive_esb_set(&pq, (offset >> 8) & 0x3);
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid END ESB load addr %d\n",
>> +                      offset);
>> +        return -1;
>> +    }
>> +
>> +    if (pq != GETFIELD_BE32(end_esmask, end.w1)) {
>> +        end.w1 = SETFIELD_BE32(end_esmask, end.w1, pq);
>> +        xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +/*
>> + * END ESB MMIO stores are invalid
>> + */
>> +static void xive_end_source_write(void *opaque, hwaddr addr,
>> +                                  uint64_t value, unsigned size)
>> +{
>> +    qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr 0x%"
>> +                  HWADDR_PRIx"\n", addr);
>> +}
>> +
>> +static const MemoryRegionOps xive_end_source_ops = {
>> +    .read = xive_end_source_read,
>> +    .write = xive_end_source_write,
>> +    .endianness = DEVICE_BIG_ENDIAN,
>> +    .valid = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +    .impl = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +};
>> +
>> +static void xive_source_end_reset(void *dev)
>> +{
>> +    /* TODO: Loop on all ENDs and mask off the ESn and ESe */
> 
> IIUC you're talking about actually writing the (potentially in memory)
> END structures.  I don't think that makes sense for the END source
> hardware model.  I'm guessing you want this for PAPR, where the ENDs
> are in virtual hardware 

That's done already.

> rather than guest memory, but in that case I
> think the reset handling should be in the PAPR specific Xive object,
> not the end_source itself.

I will remove the TODO, it's obsolete.

Thanks,

C.
 
> 
>> +}
>> +
>> +static void xive_end_source_realize(DeviceState *dev, Error **errp)
>> +{
>> +    XiveENDSource *xsrc = XIVE_END_SOURCE(dev);
>> +    Object *obj;
>> +    Error *local_err = NULL;
>> +
>> +    obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
>> +    if (!obj) {
>> +        error_propagate(errp, local_err);
>> +        error_prepend(errp, "required link 'xive' not found: ");
>> +        return;
>> +    }
>> +
>> +    xsrc->xrtr = XIVE_ROUTER(obj);
>> +
>> +    if (!xsrc->nr_ends) {
>> +        error_setg(errp, "Number of interrupt needs to be greater than 0");
>> +        return;
>> +    }
>> +
>> +    if (xsrc->esb_shift != XIVE_ESB_4K &&
>> +        xsrc->esb_shift != XIVE_ESB_64K) {
>> +        error_setg(errp, "Invalid ESB shift setting");
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * Each END is assigned an even/odd pair of MMIO pages, the even page
>> +     * manages the ESn field while the odd page manages the ESe field.
>> +     */
>> +    memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
>> +                          &xive_end_source_ops, xsrc, "xive.end",
>> +                          (1ull << (xsrc->esb_shift + 1)) * xsrc->nr_ends);
>> +
>> +    qemu_register_reset(xive_source_end_reset, dev);
>> +}
>> +
>> +static Property xive_end_source_properties[] = {
>> +    DEFINE_PROP_UINT32("nr-ends", XiveENDSource, nr_ends, 0),
>> +    DEFINE_PROP_UINT32("shift", XiveENDSource, esb_shift, XIVE_ESB_64K),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void xive_end_source_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +
>> +    dc->desc    = "XIVE END Source";
>> +    dc->props   = xive_end_source_properties;
>> +    dc->realize = xive_end_source_realize;
>> +}
>> +
>> +static const TypeInfo xive_end_source_info = {
>> +    .name          = TYPE_XIVE_END_SOURCE,
>> +    .parent        = TYPE_DEVICE,
>> +    .instance_size = sizeof(XiveENDSource),
>> +    .class_init    = xive_end_source_class_init,
>> +};
>> +
>>  /*
>>   * XIVE Fabric
>>   */
>> @@ -706,6 +874,7 @@ static void xive_register_types(void)
>>      type_register_static(&xive_source_info);
>>      type_register_static(&xive_fabric_info);
>>      type_register_static(&xive_router_info);
>> +    type_register_static(&xive_end_source_info);
>>  }
>>  
>>  type_init(xive_register_types)
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9)
  2018-12-06  6:14   ` Cédric Le Goater
@ 2018-12-06  9:24     ` Fam Zheng
  0 siblings, 0 replies; 66+ messages in thread
From: Fam Zheng @ 2018-12-06  9:24 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-devel, qemu-ppc, famz, David Gibson



> On Dec 6, 2018, at 14:14, Cédric Le Goater <clg@kaod.org> wrote:
> 
> Hello,
> 
>> Your patch has style problems, please review.  If any of these errors
>> are false positives report them to the maintainer, see
>> CHECKPATCH in MAINTAINERS.
>> Checking PATCH 25/37: spapr/xive: add state synchronization with KVM...
>> Checking PATCH 26/37: spapr/xive: introduce a VM state change handler...
>> ERROR: spaces required around that '*' (ctx:WxV)
>> #38: FILE: hw/intc/spapr_xive_kvm.c:341:
>> + static void kvmppc_xive_sync_all(sPAPRXive *xive, Error **errp)
>>                                             ^
>> 
>> total: 1 errors, 0 warnings, 135 lines checked
> 
> This looks like a false positive.

Right. Without looking at the checkpatch code, I suspect this is due to sPAPRXive starts with a lower case letter which is not standard.

Fam

> 
> C.
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model
  2018-12-06  6:22     ` Cédric Le Goater
@ 2018-12-07  1:57       ` David Gibson
  2018-12-07  7:49         ` Cédric Le Goater
  0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2018-12-07  1:57 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 3023 bytes --]

On Thu, Dec 06, 2018 at 07:22:54AM +0100, Cédric Le Goater wrote:
> On 12/6/18 4:41 AM, David Gibson wrote:
> > On Thu, Dec 06, 2018 at 12:22:18AM +0100, Cédric Le Goater wrote:
> >> The XiveRouter models the second sub-engine of the XIVE architecture :
> >> the Interrupt Virtualization Routing Engine (IVRE).
> >>
> >> The IVRE handles event notifications of the IVSE and performs the
> >> interrupt routing process. For this purpose, it uses a set of tables
> >> stored in system memory, the first of which being the Event Assignment
> >> Structure (EAS) table.
> >>
> >> The EAT associates an interrupt source number with an Event Notification
> >> Descriptor (END) which will be used in a second phase of the routing
> >> process to identify a Notification Virtual Target.
> >>
> >> The XiveRouter is an abstract class which needs to be inherited from
> >> to define a storage for the EAT, and other upcoming tables.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  include/hw/ppc/xive.h      | 31 ++++++++++++++++
> >>  include/hw/ppc/xive_regs.h | 50 +++++++++++++++++++++++++
> >>  hw/intc/xive.c             | 76 ++++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 157 insertions(+)
> >>  create mode 100644 include/hw/ppc/xive_regs.h
> >>
> >> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> >> index 6770cffec67d..57ec9f84f527 100644
> >> --- a/include/hw/ppc/xive.h
> >> +++ b/include/hw/ppc/xive.h
> >> @@ -141,6 +141,8 @@
> >>  #define PPC_XIVE_H
> >>  
> >>  #include "hw/qdev-core.h"
> >> +#include "hw/sysbus.h"
> >> +#include "hw/ppc/xive_regs.h"
> >>  
> >>  /*
> >>   * XIVE Fabric (Interface between Source and Router)
> >> @@ -297,4 +299,33 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
> >>      }
> >>  }
> >>  
> >> +/*
> >> + * XIVE Router
> >> + */
> >> +
> >> +typedef struct XiveRouter {
> >> +    SysBusDevice    parent;
> > 
> > I thought the plan was to make XiveRouter as well as XiveSource a
> > TYPE_DEVICE descendent rather than a SysBusDevice?
> 
> We start talking about that, indeed, but then :
> 
> 	https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg06407.html
> 
> I thought we concluded that it was going to get too complex.
> 
> Also, sPAPRXive is a direct descendant of XiveRouter and we want sPAPRXive 
> on SysBus.

Ah, good point.  So, to clarify my thinking here - I think from a
theoretical point of view, having XiveRouter not be sysbus and
including it by composition is probably the "correct" approach.

But I can also see that that will be a bit of a pain in practice.  So
yes, keeping it as a SysBusDevice is ok, at least as long as any
migration stuff is in the "outermost" / most specific type, which I
believe it is.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 06/37] ppc/xive: add support for the END Event State buffers
  2018-12-06  6:30     ` Cédric Le Goater
@ 2018-12-07  2:05       ` David Gibson
  2018-12-07  7:48         ` Cédric Le Goater
  0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2018-12-07  2:05 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 2122 bytes --]

On Thu, Dec 06, 2018 at 07:30:34AM +0100, Cédric Le Goater wrote:
> On 12/6/18 5:09 AM, David Gibson wrote:
> > On Thu, Dec 06, 2018 at 12:22:20AM +0100, Cédric Le Goater wrote:
[snip]
> >> +/*
> >> + * END ESB MMIO loads
> >> + */
> >> +static uint64_t xive_end_source_read(void *opaque, hwaddr addr, unsigned size)
> >> +{
> >> +    XiveENDSource *xsrc = XIVE_END_SOURCE(opaque);
> >> +    XiveRouter *xrtr = xsrc->xrtr;
> >> +    uint32_t offset = addr & 0xFFF;
> >> +    uint8_t end_blk;
> >> +    uint32_t end_idx;
> >> +    XiveEND end;
> >> +    uint32_t end_esmask;
> >> +    uint8_t pq;
> >> +    uint64_t ret = -1;
> >> +
> >> +    end_blk = xrtr->chip_id;
> > 
> > .. instead I think it makes more sense to just configure the end_blk
> > directly on the end_source, rather than reaching into another object
> > to 
> 
> Ah. That's what I was asking in an email. I missed the answer maybe.
> Let's drop it and sPAPRXive block will be 0.

Actually, I think I might not have answered earlier, because it was
far enough into the thread that I'd forgotten the original context too
much to answer easily.

Anyway, proposed course of action sounds fine.

[snip]
> >> +static void xive_source_end_reset(void *dev)
> >> +{
> >> +    /* TODO: Loop on all ENDs and mask off the ESn and ESe */
> > 
> > IIUC you're talking about actually writing the (potentially in memory)
> > END structures.  I don't think that makes sense for the END source
> > hardware model.  I'm guessing you want this for PAPR, where the ENDs
> > are in virtual hardware 
> 
> That's done already.
> 
> > rather than guest memory, but in that case I
> > think the reset handling should be in the PAPR specific Xive object,
> > not the end_source itself.
> 
> I will remove the TODO, it's obsolete.

Ok, and the whole function with it, yes?  Or is there something else
that will really belong here?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 03/37] ppc/xive: introduce the XiveNotifier interface
  2018-12-06  6:17     ` Cédric Le Goater
@ 2018-12-07  2:07       ` David Gibson
  2018-12-07  9:08         ` Cédric Le Goater
  0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2018-12-07  2:07 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 1669 bytes --]

On Thu, Dec 06, 2018 at 07:17:47AM +0100, Cédric Le Goater wrote:
> On 12/6/18 4:25 AM, David Gibson wrote:
> > On Thu, Dec 06, 2018 at 12:22:17AM +0100, Cédric Le Goater wrote:
> >> The XiveNotifier offers a simple interface, between the XiveSource
> >> object and the main interrupt controller of the machine. It will
> >> forward event notifications to the XIVE Interrupt Virtualization
> >> Routing Engine (IVRE).
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  include/hw/ppc/xive.h | 23 +++++++++++++++++++++++
> >>  hw/intc/xive.c        | 25 +++++++++++++++++++++++++
> >>  2 files changed, 48 insertions(+)
> >>
> >> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> >> index 7cebc32eba4c..6770cffec67d 100644
> >> --- a/include/hw/ppc/xive.h
> >> +++ b/include/hw/ppc/xive.h
> >> @@ -142,6 +142,27 @@
> >>  
> >>  #include "hw/qdev-core.h"
> >>  
> >> +/*
> >> + * XIVE Fabric (Interface between Source and Router)
> >> + */
> >> +
> >> +typedef struct XiveNotifier {
> >> +    Object parent;
> >> +} XiveNotifier;
> >> +
> >> +#define TYPE_XIVE_NOTIFIER "xive-fabric"
> > 
> > I'm applying this, but changing the string here from "xive-fabric" to
> > "xive-notifier".
> 
> Ah yes. My sed command missed that.

So, I've now applied patches 1-5.  I think we've agreed to a change in
patch 6 which will require tweaks to a bunch further down the series,
so I think the rest will need a v7.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/37] ppc/xive: introduce a simplified XIVE presenter
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 08/37] ppc/xive: introduce a simplified XIVE presenter Cédric Le Goater
@ 2018-12-07  3:10   ` David Gibson
  2018-12-07  8:49     ` Cédric Le Goater
  0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2018-12-07  3:10 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 15296 bytes --]

On Thu, Dec 06, 2018 at 12:22:22AM +0100, Cédric Le Goater wrote:
> The last sub-engine of the XIVE architecture is the Interrupt
> Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
> IVPE share elements, the Power Bus interface (CQ), the routing table
> descriptors, and they can be combined in the same HW logic. We do the
> same in QEMU and combine both engines in the XiveRouter for
> simplicity.
> 
> When the IVRE has completed its job of matching an event source with a
> Notification Virtual Target (NVT) to notify, it forwards the event
> notification to the IVPE sub-engine. The IVPE scans the thread
> interrupt contexts of the Notification Virtual Targets (NVT)
> dispatched on the HW processor threads and if a match is found, it
> signals the thread. If not, the IVPE escalates the notification to
> some other targets and records the notification in a backlog queue.
> 
> The IVPE maintains the thread interrupt context state for each of its
> NVTs not dispatched on HW processor threads in the Notification
> Virtual Target table (NVTT).
> 
> The model currently only supports single NVT notifications.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/xive.h      |  15 +++
>  include/hw/ppc/xive_regs.h |  24 ++++
>  hw/intc/xive.c             | 227 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 266 insertions(+)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 74b547707b17..e9b06e75fc1c 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -327,6 +327,10 @@ typedef struct XiveRouterClass {
>                     XiveEND *end);
>      int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                       XiveEND *end, uint8_t word_number);
> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                   XiveNVT *nvt);
> +    int (*write_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                     XiveNVT *nvt, uint8_t word_number);
>  } XiveRouterClass;
>  
>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
> @@ -337,6 +341,11 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                          XiveEND *end);
>  int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                            XiveEND *end, uint8_t word_number);
> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt);
> +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                          XiveNVT *nvt, uint8_t word_number);
> +
>  
>  /*
>   * XIVE END ESBs
> @@ -393,6 +402,7 @@ typedef struct XiveTCTX {
>      qemu_irq    output;
>  
>      uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
> +    uint32_t    hw_cam;

I don't love having this as a separate field.  Since it also appears
within the register space, it's kind of redundant.  On the other hand,
I see that wiring up the property directly to the register space
doesn't really work.  Not sure how to deal with that one.

>  } XiveTCTX;
>  
>  /*
> @@ -412,4 +422,9 @@ extern const MemoryRegionOps xive_tm_ops;
>  
>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
>  
> +static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
> +{
> +    return (nvt_blk << 19) | nvt_idx;
> +}
> +
>  #endif /* PPC_XIVE_H */
> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> index ede3d04c5eda..85557e730cd8 100644
> --- a/include/hw/ppc/xive_regs.h
> +++ b/include/hw/ppc/xive_regs.h
> @@ -186,4 +186,28 @@ typedef struct XiveEND {
>  #define GETFIELD_BE32(m, v)       GETFIELD(m, be32_to_cpu(v))
>  #define SETFIELD_BE32(m, v, val)  cpu_to_be32(SETFIELD(m, be32_to_cpu(v), val))
>  
> +/* Notification Virtual Target (NVT) */
> +typedef struct XiveNVT {
> +        uint32_t        w0;
> +#define NVT_W0_VALID             PPC_BIT32(0)
> +        uint32_t        w1;
> +        uint32_t        w2;
> +        uint32_t        w3;
> +        uint32_t        w4;
> +        uint32_t        w5;
> +        uint32_t        w6;
> +        uint32_t        w7;
> +        uint32_t        w8;
> +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
> +        uint32_t        w9;
> +        uint32_t        wa;
> +        uint32_t        wb;
> +        uint32_t        wc;
> +        uint32_t        wd;
> +        uint32_t        we;
> +        uint32_t        wf;
> +} XiveNVT;
> +
> +#define xive_nvt_is_valid(nvt)    (be32_to_cpu((nvt)->w0) & NVT_W0_VALID)
> +
>  #endif /* PPC_XIVE_REGS_H */
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 80a965c14200..891542920683 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -358,6 +358,25 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
>      }
>  }
>  
> +/* The HW CAM (23bits) is hardwired to :
> + *
> + *   0x000||0b1||4Bit chip number||7Bit Thread number.
> + *
> + * and when the block grouping extension is enabled :
> + *
> + *   4Bit chip number||0x001||7Bit Thread number.
> + */
> +static uint32_t hw_cam_line(uint8_t chip_id, uint8_t tid)
> +{
> +    bool block_group = false; /* TODO (PowerNV) */
> +
> +    if (block_group) {
> +        return 1 << 11 | (chip_id & 0xf) << 7 | (tid & 0x7f);
> +    } else {
> +        return (chip_id & 0xf) << 11 | 1 << 7 | (tid & 0x7f);
> +    }
> +}
> +
>  static void xive_tctx_reset(void *dev)
>  {
>      XiveTCTX *tctx = XIVE_TCTX(dev);
> @@ -388,6 +407,12 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
>      cpu = POWERPC_CPU(obj);
>      tctx->cs = CPU(obj);
>  
> +    if (!tctx->hw_cam) {
> +        error_setg(errp, "XIVE: HW CAM is not set for CPU %d",
> +                   tctx->cs->cpu_index);

You could do this at realize, rather than reset, couldn't you?

> +        return;
> +    }
> +
>      env = &cpu->env;
>      switch (PPC_INPUT(env)) {
>      case PPC_FLAGS_INPUT_POWER7:
> @@ -418,11 +443,17 @@ static const VMStateDescription vmstate_xive_tctx = {
>      },
>  };
>  
> +static Property  xive_tctx_properties[] = {
> +    DEFINE_PROP_UINT32("hw-cam", XiveTCTX, hw_cam, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
>  static void xive_tctx_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
>  
>      dc->desc = "XIVE Interrupt Thread Context";
> +    dc->props = xive_tctx_properties;
>      dc->realize = xive_tctx_realize;
>      dc->unrealize = xive_tctx_unrealize;
>      dc->vmsd = &vmstate_xive_tctx;
> @@ -978,6 +1009,194 @@ int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>     return xrc->write_end(xrtr, end_blk, end_idx, end, word_number);
>  }
>  
> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt)
> +{
> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> +
> +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> +}
> +
> +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt, uint8_t word_number)
> +{
> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> +
> +   return xrc->write_nvt(xrtr, nvt_blk, nvt_idx, nvt, word_number);
> +}
> +
> +/*
> + * The thread context register words are in big-endian format.
> + */
> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
> +                                     bool cam_ignore, uint32_t logic_serv)
> +{
> +    uint32_t cam = xive_nvt_cam_line(nvt_blk, nvt_idx);
> +    uint8_t *regs;
> +    uint32_t qw3w2;
> +    uint32_t qw2w2;
> +    uint32_t qw1w2;
> +    uint32_t qw0w2;
> +
> +    /* TODO (PowerNV): ignore low order bits of nvt id */
> +
> +    regs = &tctx->regs[TM_QW3_HV_PHYS];
> +    qw3w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));

This is one of the main places we access regs and we have to do
horrible casting.  Would it make more sense for it to be a uint32_t
array?  Or at least for the local *regs to be.

> +    regs = &tctx->regs[TM_QW2_HV_POOL];
> +    qw2w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> +    regs = &tctx->regs[TM_QW1_OS];
> +    qw1w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> +    regs = &tctx->regs[TM_QW0_USER];
> +    qw0w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> +
> +    if (format == 0) {
> +        /* F=0 & i=1: Logical server notification */

I'm guessing the i=1 is the cam_ignore==true check?  Maybe put this
comment inside the if block to make that clearer.

> +        if (cam_ignore == true) {
> +            qemu_log_mask(LOG_UNIMP, "XIVE: no support for LS NVT %x/%x\n",
> +                          nvt_blk, nvt_idx);
> +             return -1;
> +        }
> +
> +        /* F=0 & i=0: Specific NVT notification */
> +
> +        /* PHYS ring */
> +        if ((qw3w2 & TM_QW3W2_VT) &&
> +            tctx->hw_cam == hw_cam_line(nvt_blk, nvt_idx)) {
> +            return TM_QW3_HV_PHYS;
> +        }
> +
> +        /* HV POOL ring */
> +        if ((qw2w2 & TM_QW2W2_VP) &&
> +            cam == GETFIELD(TM_QW2W2_POOL_CAM, qw2w2)) {

Does that need to be a GETFIELD_BE32?

> +            return TM_QW2_HV_POOL;
> +        }
> +
> +        /* OS ring */
> +        if ((qw1w2 & TM_QW1W2_VO) &&
> +            cam == GETFIELD(TM_QW1W2_OS_CAM, qw1w2)) {

And here.

> +            return TM_QW1_OS;
> +        }
> +    } else {
> +        /* F=1 : User level Event-Based Branch (EBB) notification */
> +
> +        /* USER ring */
> +        if  ((qw1w2 & TM_QW1W2_VO) &&
> +             (cam == GETFIELD(TM_QW1W2_OS_CAM, qw1w2)) &&

And here.

> +             (qw0w2 & TM_QW0W2_VU) &&
> +             (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, qw0w2))) {
> +            return TM_QW0_USER;
> +        }
> +    }
> +    return -1;
> +}
> +
> +typedef struct XiveTCTXMatch {
> +    XiveTCTX *tctx;
> +    uint8_t ring;
> +} XiveTCTXMatch;
> +
> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> +                                 bool cam_ignore, uint8_t priority,
> +                                 uint32_t logic_serv, XiveTCTXMatch *match)
> +{
> +    CPUState *cs;
> +
> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
> +     * hardwired CAM compares */
> +
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> +        int ring;
> +
> +        /*
> +         * HW checks that the CPU is enabled in the Physical Thread
> +         * Enable Register (PTER).
> +         */
> +
> +        /*
> +         * Check the thread context CAM lines and record matches. We
> +         * will handle CPU exception delivery later
> +         */
> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
> +                                         cam_ignore, logic_serv);
> +        /*
> +         * Save the context and follow on to catch duplicates, that we
> +         * don't support yet.
> +         */
> +        if (ring != -1) {
> +            if (match->tctx) {
> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
> +                return false;
> +            }
> +
> +            match->ring = ring;
> +            match->tctx = tctx;
> +        }
> +    }
> +
> +    if (!match->tctx) {
> +        qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
> +                      nvt_blk, nvt_idx);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +/*
> + * This is our simple Xive Presenter Engine model. It is merged in the
> + * Router as it does not require an extra object.
> + *
> + * It receives notification requests sent by the IVRE to find one
> + * matching NVT (or more) dispatched on the processor threads. In case
> + * of a single NVT notification, the process is abreviated and the
> + * thread is signaled if a match is found. In case of a logical server
> + * notification (bits ignored at the end of the NVT identifier), the
> + * IVPE and IVRE select a winning thread using different filters. This
> + * involves 2 or 3 exchanges on the PowerBus that the model does not
> + * support.
> + *
> + * The parameters represent what is sent on the PowerBus
> + */
> +static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
> +                                  uint8_t nvt_blk, uint32_t nvt_idx,
> +                                  bool cam_ignore, uint8_t priority,
> +                                  uint32_t logic_serv)
> +{
> +    XiveNVT nvt;
> +    XiveTCTXMatch match = { 0 };

IIUC that's initializing the tctx pointer field of match, so should be
NULL, not 0 (yes, technically they're equivalent in C, but using 0 for
a pointer is confusing).

> +    bool found;
> +
> +    /* NVT cache lookup */
> +    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
> +                      nvt_blk, nvt_idx);
> +        return;
> +    }
> +
> +    if (!xive_nvt_is_valid(&nvt)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
> +                      nvt_blk, nvt_idx);
> +        return;
> +    }
> +
> +    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
> +                                 priority, logic_serv, &match);
> +    if (found) {
> +        return;
> +    }
> +
> +    /* If no matching NVT is dispatched on a HW thread :
> +     * - update the NVT structure if backlog is activated
> +     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
> +     *   activated
> +     */
> +}
> +
>  /*
>   * An END trigger can come from an event trigger (IPI or HW) or from
>   * another chip. We don't model the PowerBus but the END trigger
> @@ -1047,6 +1266,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>      /*
>       * Follows IVPE notification
>       */
> +    xive_presenter_notify(xrtr, format,
> +                          GETFIELD_BE32(END_W6_NVT_BLOCK, end.w6),
> +                          GETFIELD_BE32(END_W6_NVT_INDEX, end.w6),
> +                          GETFIELD_BE32(END_W7_F0_IGNORE, end.w7),
> +                          priority,
> +                          GETFIELD_BE32(END_W7_F1_LOG_SERVER_ID, end.w7));
> +
> +    /* TODO: Auto EOI. */
>  }
>  
>  static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/37] ppc/xive: notify the CPU when the interrupt priority is more privileged
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 09/37] ppc/xive: notify the CPU when the interrupt priority is more privileged Cédric Le Goater
@ 2018-12-07  3:27   ` David Gibson
  0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2018-12-07  3:27 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 6634 bytes --]

On Thu, Dec 06, 2018 at 12:22:23AM +0100, Cédric Le Goater wrote:
> After the event data was enqueued in the O/S Event Queue, the IVPE
> raises the bit corresponding to the priority of the pending interrupt
> in the register IBP (Interrupt Pending Buffer) to indicate there is an
> event pending in one of the 8 priority queues. The Pending Interrupt
> Priority Register (PIPR) is also updated using the IPB. This register
> represent the priority of the most favored pending notification.
> 
> The PIPR is then compared to the the Current Processor Priority
> Register (CPPR). If it is more favored (numerically less than), the
> CPU interrupt line is raised and the EO bit of the Notification Source
> Register (NSR) is updated to notify the presence of an exception for
> the O/S. The check needs to be done whenever the PIPR or the CPPR are
> changed.
> 
> The O/S acknowledges the interrupt with a special load in the Thread
> Interrupt Management Area. If the EO bit of the NSR is set, the CPPR
> takes the value of PIPR. The bit number in the IBP corresponding to
> the priority of the pending interrupt is reseted and so is the EO bit
> of the NSR.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/intc/xive.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 891542920683..0db77107ab15 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -22,9 +22,73 @@
>   * XIVE Thread Interrupt Management context
>   */
>  
> +/* Convert a priority number to an Interrupt Pending Buffer (IPB)
> + * register, which indicates a pending interrupt at the priority
> + * corresponding to the bit number
> + */
> +static uint8_t priority_to_ipb(uint8_t priority)
> +{
> +    return priority > XIVE_PRIORITY_MAX ?
> +        0 : 1 << (XIVE_PRIORITY_MAX - priority);
> +}
> +
> +/* Convert an Interrupt Pending Buffer (IPB) register to a Pending
> + * Interrupt Priority Register (PIPR), which contains the priority of
> + * the most favored pending notification.
> + */
> +static uint8_t ipb_to_pipr(uint8_t ibp)
> +{
> +    return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
> +}
> +
> +static void ipb_update(uint8_t *regs, uint8_t priority)
> +{
> +    regs[TM_IPB] |= priority_to_ipb(priority);
> +    regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]);
> +}
> +
> +static uint8_t exception_mask(uint8_t ring)
> +{
> +    switch (ring) {
> +    case TM_QW1_OS:
> +        return TM_QW1_NSR_EO;
> +    default:
> +        g_assert_not_reached();
> +    }
> +}
> +
>  static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring)
>  {
> -    return 0;
> +    uint8_t *regs = &tctx->regs[ring];
> +    uint8_t nsr = regs[TM_NSR];
> +    uint8_t mask = exception_mask(ring);
> +
> +    qemu_irq_lower(tctx->output);
> +
> +    if (regs[TM_NSR] & mask) {
> +        uint8_t cppr = regs[TM_PIPR];
> +
> +        regs[TM_CPPR] = cppr;
> +
> +        /* Reset the pending buffer bit */
> +        regs[TM_IPB] &= ~priority_to_ipb(cppr);
> +        regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]);
> +
> +        /* Drop Exception bit */
> +        regs[TM_NSR] &= ~mask;
> +    }
> +
> +    return (nsr << 8) | regs[TM_CPPR];
> +}
> +
> +static void xive_tctx_notify(XiveTCTX *tctx, uint8_t ring)
> +{
> +    uint8_t *regs = &tctx->regs[ring];
> +
> +    if (regs[TM_PIPR] < regs[TM_CPPR]) {
> +        regs[TM_NSR] |= exception_mask(ring);
> +        qemu_irq_raise(tctx->output);
> +    }
>  }
>  
>  static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
> @@ -34,6 +98,9 @@ static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
>      }
>  
>      tctx->regs[ring + TM_CPPR] = cppr;
> +
> +    /* CPPR has changed, check if we need to raise a pending exception */
> +    xive_tctx_notify(tctx, ring);
>  }
>  
>  /*
> @@ -189,6 +256,17 @@ static void xive_tm_set_os_cppr(XiveTCTX *tctx, hwaddr offset,
>      xive_tctx_set_cppr(tctx, TM_QW1_OS, value & 0xff);
>  }
>  
> +/*
> + * Adjust the IPB to allow a CPU to process event queues of other
> + * priorities during one physical interrupt cycle.
> + */
> +static void xive_tm_set_os_pending(XiveTCTX *tctx, hwaddr offset,
> +                                   uint64_t value, unsigned size)
> +{
> +    ipb_update(&tctx->regs[TM_QW1_OS], value & 0xff);
> +    xive_tctx_notify(tctx, TM_QW1_OS);
> +}
> +
>  /*
>   * Define a mapping of "special" operations depending on the TIMA page
>   * offset and the size of the operation.
> @@ -211,6 +289,7 @@ static const XiveTmOp xive_tm_operations[] = {
>  
>      /* MMIOs above 2K : special operations with side effects */
>      { XIVE_TM_OS_PAGE, TM_SPC_ACK_OS_REG,     2, NULL, xive_tm_ack_os_reg },
> +    { XIVE_TM_OS_PAGE, TM_SPC_SET_OS_PENDING, 1, xive_tm_set_os_pending, NULL },
>  };
>  
>  static const XiveTmOp *xive_tm_find_op(hwaddr offset, unsigned size, bool write)
> @@ -387,6 +466,13 @@ static void xive_tctx_reset(void *dev)
>      tctx->regs[TM_QW1_OS + TM_LSMFB] = 0xFF;
>      tctx->regs[TM_QW1_OS + TM_ACK_CNT] = 0xFF;
>      tctx->regs[TM_QW1_OS + TM_AGE] = 0xFF;
> +
> +    /*
> +     * Initialize PIPR to 0xFF to avoid phantom interrupts when the
> +     * CPPR is first set.
> +     */
> +    tctx->regs[TM_QW1_OS + TM_PIPR] =
> +        ipb_to_pipr(tctx->regs[TM_QW1_OS + TM_IPB]);
>  }
>  
>  static void xive_tctx_realize(DeviceState *dev, Error **errp)
> @@ -1187,9 +1273,15 @@ static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
>      found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
>                                   priority, logic_serv, &match);
>      if (found) {
> +        ipb_update(&match.tctx->regs[match.ring], priority);
> +        xive_tctx_notify(match.tctx, match.ring);
>          return;
>      }
>  
> +    /* Record the IPB in the associated NVT structure */
> +    ipb_update((uint8_t *) &nvt.w4, priority);
> +    xive_router_write_nvt(xrtr, nvt_blk, nvt_idx, &nvt, 4);
> +
>      /* If no matching NVT is dispatched on a HW thread :
>       * - update the NVT structure if backlog is activated
>       * - escalate (ESe PQ bits and EAS in w4-5) if escalation is

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 10/37] spapr/xive: introduce a XIVE interrupt controller
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 10/37] spapr/xive: introduce a XIVE interrupt controller Cédric Le Goater
@ 2018-12-07  3:39   ` David Gibson
  0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2018-12-07  3:39 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 16803 bytes --]

On Thu, Dec 06, 2018 at 12:22:24AM +0100, Cédric Le Goater wrote:
65;5402;1c> sPAPRXive models the XIVE interrupt controller of the sPAPR machine.
> It inherits from the XiveRouter and provisions storage for the routing
> tables :
> 
>   - Event Assignment Structure (EAS)
>   - Event Notification Descriptor (END)
> 
> The sPAPRXive model incorporates an internal XiveSource for the IPIs
> and for the interrupts of the virtual devices of the guest. This model
> is consistent with XIVE architecture which also incorporates an
> internal IVSE for IPIs and accelerator interrupts in the IVRE
> sub-engine.
> 
> The sPAPRXive model exports two memory regions, one for the ESB
> trigger and management pages used to control the sources and one for
> the TIMA pages. They are mapped by default at the addresses found on
> chip 0 of a baremetal system. This is also consistent with the XIVE
> architecture which defines a Virtualization Controller BAR for the
> internal IVSE ESB pages and a Thread Managment BAR for the TIMA.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  default-configs/ppc64-softmmu.mak |   1 +
>  include/hw/ppc/spapr_xive.h       |  45 ++++
>  hw/intc/spapr_xive.c              | 366 ++++++++++++++++++++++++++++++
>  hw/intc/Makefile.objs             |   1 +
>  4 files changed, 413 insertions(+)
>  create mode 100644 include/hw/ppc/spapr_xive.h
>  create mode 100644 hw/intc/spapr_xive.c
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 2d1e7c5c4668..7f34ad0528ed 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -17,6 +17,7 @@ CONFIG_XICS=$(CONFIG_PSERIES)
>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
>  CONFIG_XIVE=$(CONFIG_PSERIES)
> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_MEM_DEVICE=y
>  CONFIG_DIMM=y
>  CONFIG_SPAPR_RNG=y
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> new file mode 100644
> index 000000000000..f087959b9924
> --- /dev/null
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -0,0 +1,45 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE interrupt controller model
> + *
> + * Copyright (c) 2017-2018, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef PPC_SPAPR_XIVE_H
> +#define PPC_SPAPR_XIVE_H
> +
> +#include "hw/ppc/xive.h"
> +
> +#define TYPE_SPAPR_XIVE "spapr-xive"
> +#define SPAPR_XIVE(obj) OBJECT_CHECK(sPAPRXive, (obj), TYPE_SPAPR_XIVE)
> +
> +typedef struct sPAPRXive {
> +    XiveRouter    parent;
> +
> +    /* Internal interrupt source for IPIs and virtual devices */
> +    XiveSource    source;
> +    hwaddr        vc_base;
> +
> +    /* END ESB MMIOs */
> +    XiveENDSource end_source;
> +    hwaddr        end_base;
> +
> +    /* Routing table */
> +    XiveEAS       *eat;
> +    uint32_t      nr_irqs;
> +    XiveEND       *endt;
> +    uint32_t      nr_ends;
> +
> +    /* TIMA mapping address */
> +    hwaddr        tm_base;
> +    MemoryRegion  tm_mmio;
> +} sPAPRXive;
> +
> +bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
> +bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> +qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
> +
> +#endif /* PPC_SPAPR_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> new file mode 100644
> index 000000000000..eef5830d45c6
> --- /dev/null
> +++ b/hw/intc/spapr_xive.c
> @@ -0,0 +1,366 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE interrupt controller model
> + *
> + * Copyright (c) 2017-2018, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "target/ppc/cpu.h"
> +#include "sysemu/cpus.h"
> +#include "monitor/monitor.h"
> +#include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_xive.h"
> +#include "hw/ppc/xive.h"
> +#include "hw/ppc/xive_regs.h"
> +
> +/*
> + * XIVE Virtualization Controller BAR and Thread Managment BAR that we
> + * use for the ESB pages and the TIMA pages
> + */
> +#define SPAPR_XIVE_VC_BASE   0x0006010000000000ull
> +#define SPAPR_XIVE_TM_BASE   0x0006030203180000ull
> +
> +/*
> + * On sPAPR machines, use a simplified output for the XIVE END
> + * structure dumping only the information related to the OS EQ.
> + */
> +static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end,
> +                                          Monitor *mon)
> +{
> +    uint32_t qindex = GETFIELD_BE32(END_W1_PAGE_OFF, end->w1);
> +    uint32_t qgen = GETFIELD_BE32(END_W1_GENERATION, end->w1);
> +    uint32_t qsize = GETFIELD_BE32(END_W0_QSIZE, end->w0);
> +    uint32_t qentries = 1 << (qsize + 10);
> +    uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
> +    uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
> +
> +    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt,
> +                   priority, qindex, qentries, qgen);
> +
> +    xive_end_queue_pic_print_info(end, 6, mon);
> +    monitor_printf(mon, "]");
> +}
> +
> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> +{
> +    XiveSource *xsrc = &xive->source;
> +    int i;
> +
> +    monitor_printf(mon, "  LSIN         PQ    EISN     CPU/PRIO EQ\n");
> +
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        uint8_t pq = xive_source_esb_get(xsrc, i);
> +        XiveEAS *eas = &xive->eat[i];
> +
> +        if (!xive_eas_is_valid(eas)) {
> +            continue;
> +        }
> +
> +        monitor_printf(mon, "  %08x %s %c%c%c %s %08x ", i,
> +                       xive_source_irq_is_lsi(xsrc, i) ? "LSI" : "MSI",
> +                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
> +                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
> +                       xsrc->status[i] & XIVE_STATUS_ASSERTED ? 'A' : ' ',
> +                       xive_eas_is_masked(eas) ? "M" : " ",
> +                       (int) GETFIELD_BE64(EAS_END_DATA, eas->w));
> +
> +        if (!xive_eas_is_masked(eas)) {
> +            uint32_t end_idx = GETFIELD_BE64(EAS_END_INDEX, eas->w);
> +            XiveEND *end;
> +
> +            assert(end_idx < xive->nr_ends);
> +            end = &xive->endt[end_idx];
> +
> +            if (xive_end_is_valid(end)) {
> +                spapr_xive_end_pic_print_info(xive, end, mon);
> +            }
> +        }
> +        monitor_printf(mon, "\n");
> +    }
> +}
> +
> +static void spapr_xive_map_mmio(sPAPRXive *xive)
> +{
> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->vc_base);
> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1, xive->end_base);
> +    sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base);
> +}
> +
> +static void spapr_xive_end_reset(XiveEND *end)
> +{
> +    memset(end, 0, sizeof(*end));
> +
> +    /* switch off the escalation and notification ESBs */
> +    end->w1 = cpu_to_be32(END_W1_ESe_Q | END_W1_ESn_Q);
> +}
> +
> +static void spapr_xive_reset(void *dev)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +    int i;
> +
> +    /*
> +     * The XiveSource has its own reset handler, which mask off all
> +     * IRQs (!P|Q)
> +     */
> +
> +    /* Mask all valid EASs in the IRQ number space. */
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        XiveEAS *eas = &xive->eat[i];
> +        if (xive_eas_is_valid(eas)) {
> +            eas->w = cpu_to_be64(EAS_VALID | EAS_MASKED);
> +        } else {
> +            eas->w = 0;
> +        }
> +    }
> +
> +    /* Clear all ENDs */
> +    for (i = 0; i < xive->nr_ends; i++) {
> +        spapr_xive_end_reset(&xive->endt[i]);
> +    }
> +}
> +
> +static void spapr_xive_instance_init(Object *obj)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(obj);
> +
> +    object_initialize(&xive->source, sizeof(xive->source), TYPE_XIVE_SOURCE);
> +    object_property_add_child(obj, "source", OBJECT(&xive->source), NULL);
> +
> +    object_initialize(&xive->end_source, sizeof(xive->end_source),
> +                      TYPE_XIVE_END_SOURCE);
> +    object_property_add_child(obj, "end_source", OBJECT(&xive->end_source),
> +                              NULL);
> +}
> +
> +static void spapr_xive_realize(DeviceState *dev, Error **errp)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(dev);
> +    XiveSource *xsrc = &xive->source;
> +    XiveENDSource *end_xsrc = &xive->end_source;
> +    Error *local_err = NULL;
> +
> +    if (!xive->nr_irqs) {
> +        error_setg(errp, "Number of interrupt needs to be greater 0");
> +        return;
> +    }
> +
> +    if (!xive->nr_ends) {
> +        error_setg(errp, "Number of interrupt needs to be greater 0");
> +        return;
> +    }
> +
> +    /*
> +     * Initialize the internal sources, for IPIs and virtual devices.
> +     */
> +    object_property_set_int(OBJECT(xsrc), xive->nr_irqs, "nr-irqs",
> +                            &error_fatal);
> +    object_property_add_const_link(OBJECT(xsrc), "xive", OBJECT(xive),
> +                                   &error_fatal);
> +    object_property_set_bool(OBJECT(xsrc), true, "realized", &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /*
> +     * Initialize the END ESB source
> +     */
> +    object_property_set_int(OBJECT(end_xsrc), xive->nr_irqs, "nr-ends",
> +                            &error_fatal);
> +    object_property_add_const_link(OBJECT(end_xsrc), "xive", OBJECT(xive),
> +                                   &error_fatal);
> +    object_property_set_bool(OBJECT(end_xsrc), true, "realized", &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /* Set the mapping address of the END ESB pages after the source ESBs */
> +    xive->end_base = xive->vc_base + (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
> +
> +    /*
> +     * Allocate the routing tables
> +     */
> +    xive->eat = g_new0(XiveEAS, xive->nr_irqs);
> +    xive->endt = g_new0(XiveEND, xive->nr_ends);
> +
> +    /* TIMA initialization */
> +    memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
> +                          "xive.tima", 4ull << TM_SHIFT);
> +
> +    /* Define all XIVE MMIO regions on SysBus */
> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
> +
> +    /* Map all regions */
> +    spapr_xive_map_mmio(xive);
> +
> +    qemu_register_reset(spapr_xive_reset, dev);
> +}
> +
> +static int spapr_xive_get_eas(XiveRouter *xrtr, uint8_t eas_blk,
> +                              uint32_t eas_idx, XiveEAS *eas)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(xrtr);
> +
> +    if (eas_idx >= xive->nr_irqs) {
> +        return -1;
> +    }
> +
> +    *eas = xive->eat[eas_idx];
> +    return 0;
> +}
> +
> +static int spapr_xive_get_end(XiveRouter *xrtr,
> +                              uint8_t end_blk, uint32_t end_idx, XiveEND *end)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(xrtr);
> +
> +    if (end_idx >= xive->nr_ends) {
> +        return -1;
> +    }
> +
> +    memcpy(end, &xive->endt[end_idx], sizeof(XiveEND));
> +    return 0;
> +}
> +
> +static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk,
> +                                uint32_t end_idx, XiveEND *end,
> +                                uint8_t word_number)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(xrtr);
> +
> +    if (end_idx >= xive->nr_ends) {
> +        return -1;
> +    }
> +
> +    memcpy(&xive->endt[end_idx], end, sizeof(XiveEND));
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_spapr_xive_end = {
> +    .name = TYPE_SPAPR_XIVE "/end",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT32(w0, XiveEND),
> +        VMSTATE_UINT32(w1, XiveEND),
> +        VMSTATE_UINT32(w2, XiveEND),
> +        VMSTATE_UINT32(w3, XiveEND),
> +        VMSTATE_UINT32(w4, XiveEND),
> +        VMSTATE_UINT32(w5, XiveEND),
> +        VMSTATE_UINT32(w6, XiveEND),
> +        VMSTATE_UINT32(w7, XiveEND),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_spapr_xive_eas = {
> +    .name = TYPE_SPAPR_XIVE "/eas",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT64(w, XiveEAS),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_spapr_xive = {
> +    .name = TYPE_SPAPR_XIVE,
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
> +        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(eat, sPAPRXive, nr_irqs,
> +                                     vmstate_spapr_xive_eas, XiveEAS),
> +        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(endt, sPAPRXive, nr_ends,
> +                                             vmstate_spapr_xive_end, XiveEND),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static Property spapr_xive_properties[] = {
> +    DEFINE_PROP_UINT32("nr-irqs", sPAPRXive, nr_irqs, 0),
> +    DEFINE_PROP_UINT32("nr-ends", sPAPRXive, nr_ends, 0),
> +    DEFINE_PROP_UINT64("vc-base", sPAPRXive, vc_base, SPAPR_XIVE_VC_BASE),
> +    DEFINE_PROP_UINT64("tm-base", sPAPRXive, tm_base, SPAPR_XIVE_TM_BASE),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void spapr_xive_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    XiveRouterClass *xrc = XIVE_ROUTER_CLASS(klass);
> +
> +    dc->desc    = "sPAPR XIVE Interrupt Controller";
> +    dc->props   = spapr_xive_properties;
> +    dc->realize = spapr_xive_realize;
> +    dc->vmsd    = &vmstate_spapr_xive;
> +
> +    xrc->get_eas = spapr_xive_get_eas;
> +    xrc->get_end = spapr_xive_get_end;
> +    xrc->write_end = spapr_xive_write_end;
> +}
> +
> +static const TypeInfo spapr_xive_info = {
> +    .name = TYPE_SPAPR_XIVE,
> +    .parent = TYPE_XIVE_ROUTER,
> +    .instance_init = spapr_xive_instance_init,
> +    .instance_size = sizeof(sPAPRXive),
> +    .class_init = spapr_xive_class_init,
> +};
> +
> +static void spapr_xive_register_types(void)
> +{
> +    type_register_static(&spapr_xive_info);
> +}
> +
> +type_init(spapr_xive_register_types)
> +
> +bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi)
> +{
> +    XiveSource *xsrc = &xive->source;
> +
> +    if (lisn >= xive->nr_irqs) {
> +        return false;
> +    }
> +
> +    xive->eat[lisn].w |= cpu_to_be64(EAS_VALID);
> +    xive_source_irq_set(xsrc, lisn, lsi);
> +    return true;
> +}
> +
> +bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn)
> +{
> +    XiveSource *xsrc = &xive->source;
> +
> +    if (lisn >= xive->nr_irqs) {
> +        return false;
> +    }
> +
> +    xive->eat[lisn].w &= cpu_to_be64(~EAS_VALID);
> +    xive_source_irq_set(xsrc, lisn, false);
> +    return true;
> +}
> +
> +qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn)
> +{
> +    XiveSource *xsrc = &xive->source;
> +
> +    if (lisn >= xive->nr_irqs) {
> +        return NULL;
> +    }
> +
> +    /* The sPAPR machine/device should have claimed the IRQ before */
> +    assert(xive_eas_is_valid(&xive->eat[lisn]));
> +
> +    return xive_source_qirq(xsrc, lisn);
> +}
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index 72a46ed91c31..301a8e972d91 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -38,6 +38,7 @@ obj-$(CONFIG_XICS) += xics.o
>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
>  obj-$(CONFIG_XIVE) += xive.o
> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>  obj-$(CONFIG_S390_FLIC) += s390_flic.o

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 11/37] spapr/xive: use the VCPU id as a NVT identifier
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 11/37] spapr/xive: use the VCPU id as a NVT identifier Cédric Le Goater
@ 2018-12-07  3:46   ` David Gibson
  2018-12-07  8:05     ` Cédric Le Goater
  0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2018-12-07  3:46 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 4716 bytes --]

On Thu, Dec 06, 2018 at 12:22:25AM +0100, Cédric Le Goater wrote:
> The IVPE scans the O/S CAM line of the XIVE thread interrupt contexts
> to find a matching Notification Virtual Target (NVT) among the NVTs
> dispatched on the HW processor threads.
> 
> On a real system, the thread interrupt contexts are updated by the
> hypervisor when a Virtual Processor is scheduled to run on a HW
> thread. Under QEMU, the model will emulate the same behavior by
> hardwiring the NVT identifier in the thread context registers at
> reset.
> 
> The NVT identifier used by the sPAPRXive model is the VCPU id. The END
> identifier is also derived from the VCPU id. A set of helpers doing
> the conversion between identifiers are provided for the hcalls
> configuring the sources and the ENDs.
> 
> The model does not need a NVT table but The XiveRouter NVT operations
> are provided to perform some extra checks in the routing algorithm.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive.c | 53 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 52 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index eef5830d45c6..8da7a8bee949 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -26,6 +26,27 @@
>  #define SPAPR_XIVE_VC_BASE   0x0006010000000000ull
>  #define SPAPR_XIVE_TM_BASE   0x0006030203180000ull
>  
> +/*
> + * The allocation of VP blocks is a complex operation in OPAL and the
> + * VP identifiers have a relation with the number of HW chips, the
> + * size of the VP blocks, VP grouping, etc. The QEMU sPAPR XIVE
> + * controller model does not have the same constraints and can use a
> + * simple mapping scheme of the CPU vcpu_id
> + *
> + * These identifiers are never returned to the OS.
> + */
> +
> +#define SPAPR_XIVE_NVT_BASE 0x400
> +
> +/*
> + * sPAPR NVT and END indexing helpers
> + */
> +static uint32_t spapr_xive_nvt_to_target(sPAPRXive *xive, uint8_t nvt_blk,
> +                                  uint32_t nvt_idx)
> +{
> +    return nvt_idx - SPAPR_XIVE_NVT_BASE;
> +}
> +
>  /*
>   * On sPAPR machines, use a simplified output for the XIVE END
>   * structure dumping only the information related to the OS EQ.
> @@ -40,7 +61,8 @@ static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end,
>      uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
>      uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
>  
> -    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt,
> +    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d",
> +                   spapr_xive_nvt_to_target(xive, 0, nvt),
>                     priority, qindex, qentries, qgen);
>  
>      xive_end_queue_pic_print_info(end, 6, mon);
> @@ -246,6 +268,33 @@ static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk,
>      return 0;
>  }
>  
> +static int spapr_xive_get_nvt(XiveRouter *xrtr,
> +                              uint8_t nvt_blk, uint32_t nvt_idx, XiveNVT *nvt)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(xrtr);
> +    uint32_t vcpu_id = spapr_xive_nvt_to_target(xive, nvt_blk, nvt_idx);
> +    PowerPCCPU *cpu = spapr_find_cpu(vcpu_id);
> +
> +    if (!cpu) {
> +        return -1;
> +    }
> +
> +    /*
> +     * sPAPR does not maintain a NVT table. Return that the NVT is
> +     * valid if we have found a matching CPU
> +     */
> +    nvt->w0 = cpu_to_be32(NVT_W0_VALID);
> +    return 0;
> +}
> +
> +static int spapr_xive_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk,
> +                                uint32_t nvt_idx, XiveNVT *nvt,
> +                                uint8_t word_number)
> +{
> +    /* no NVT table */
> +    return 0;

Should this ever get called.  IIUC, we don't need to write back to the
NVTs because the papr machine should never hit a non-scheduled NVT.
But in that case should this actually be a no-op, or should it be an
g_assert_not_reached()?

> +}
> +
>  static const VMStateDescription vmstate_spapr_xive_end = {
>      .name = TYPE_SPAPR_XIVE "/end",
>      .version_id = 1,
> @@ -308,6 +357,8 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
>      xrc->get_eas = spapr_xive_get_eas;
>      xrc->get_end = spapr_xive_get_end;
>      xrc->write_end = spapr_xive_write_end;
> +    xrc->get_nvt = spapr_xive_get_nvt;
> +    xrc->write_nvt = spapr_xive_write_nvt;
>  }
>  
>  static const TypeInfo spapr_xive_info = {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 12/37] spapr: initialize VSMT before initializing the IRQ backend
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 12/37] spapr: initialize VSMT before initializing the IRQ backend Cédric Le Goater
@ 2018-12-07  3:59   ` David Gibson
  0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2018-12-07  3:59 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 1788 bytes --]

On Thu, Dec 06, 2018 at 12:22:26AM +0100, Cédric Le Goater wrote:
> We will need to use xics_max_server_number() to create the sPAPRXive
> object modeling the interrupt controller of the machine which is
> created before the CPUs.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> Reviewed-by: Greg Kurz <groug@kaod.org>

This one stands on its own, so I've applied.

> ---
>  hw/ppc/spapr.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 7afd1a175bf2..50cb9f9f4a02 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2466,11 +2466,6 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
>          boot_cores_nr = possible_cpus->len;
>      }
>  
> -    /* VSMT must be set in order to be able to compute VCPU ids, ie to
> -     * call xics_max_server_number() or spapr_vcpu_id().
> -     */
> -    spapr_set_vsmt_mode(spapr, &error_fatal);
> -
>      if (smc->pre_2_10_has_unused_icps) {
>          int i;
>  
> @@ -2593,6 +2588,11 @@ static void spapr_machine_init(MachineState *machine)
>      /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
>      load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
>  
> +    /* VSMT must be set in order to be able to compute VCPU ids, ie to
> +     * call xics_max_server_number() or spapr_vcpu_id().
> +     */
> +    spapr_set_vsmt_mode(spapr, &error_fatal);
> +
>      /* Set up Interrupt Controller before we create the VCPUs */
>      smc->irq->init(spapr, &error_fatal);
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 13/37] spapr: introduce a spapr_irq_init() routine
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 13/37] spapr: introduce a spapr_irq_init() routine Cédric Le Goater
@ 2018-12-07  4:04   ` David Gibson
  0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2018-12-07  4:04 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 3098 bytes --]

On Thu, Dec 06, 2018 at 12:22:27AM +0100, Cédric Le Goater wrote:
> Initialize the MSI bitmap from it as this will be necessary for the
> sPAPR IRQ backend for XIVE.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

This also stands on its own, so I've applied it.

> ---
>  include/hw/ppc/spapr_irq.h |  1 +
>  hw/ppc/spapr.c             |  2 +-
>  hw/ppc/spapr_irq.c         | 16 +++++++++++-----
>  3 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index a467ce696ee4..bd7301e6d9c6 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -43,6 +43,7 @@ typedef struct sPAPRIrq {
>  extern sPAPRIrq spapr_irq_xics;
>  extern sPAPRIrq spapr_irq_xics_legacy;
>  
> +void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
>  int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
>  void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
>  qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 50cb9f9f4a02..e470efe7993c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2594,7 +2594,7 @@ static void spapr_machine_init(MachineState *machine)
>      spapr_set_vsmt_mode(spapr, &error_fatal);
>  
>      /* Set up Interrupt Controller before we create the VCPUs */
> -    smc->irq->init(spapr, &error_fatal);
> +    spapr_irq_init(spapr, &error_fatal);
>  
>      /* Set up containers for ibm,client-architecture-support negotiated options
>       */
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index e77b94cc685e..f8b651de0ec9 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -97,11 +97,6 @@ static void spapr_irq_init_xics(sPAPRMachineState *spapr, Error **errp)
>      int nr_irqs = smc->irq->nr_irqs;
>      Error *local_err = NULL;
>  
> -    /* Initialize the MSI IRQ allocator. */
> -    if (!SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
> -        spapr_irq_msi_init(spapr, smc->irq->nr_msis);
> -    }
> -
>      if (kvm_enabled()) {
>          if (machine_kernel_irqchip_allowed(machine) &&
>              !xics_kvm_init(spapr, &local_err)) {
> @@ -213,6 +208,17 @@ sPAPRIrq spapr_irq_xics = {
>  /*
>   * sPAPR IRQ frontend routines for devices
>   */
> +void spapr_irq_init(sPAPRMachineState *spapr, Error **errp)
> +{
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +
> +    /* Initialize the MSI IRQ allocator. */
> +    if (!SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
> +        spapr_irq_msi_init(spapr, smc->irq->nr_msis);
> +    }
> +
> +    smc->irq->init(spapr, errp);
> +}
>  
>  int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp)
>  {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 15/37] spapr: export and rename the xics_max_server_number() routine
  2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 15/37] spapr: export and rename the xics_max_server_number() routine Cédric Le Goater
@ 2018-12-07  4:07   ` David Gibson
  0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2018-12-07  4:07 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 2949 bytes --]

On Thu, Dec 06, 2018 at 12:22:29AM +0100, Cédric Le Goater wrote:
> The XIVE sPAPR IRQ backend will use it to define the number of ENDs of
> the IC controller.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Again, this makes sense on its own, so I've applied.

> ---
>  include/hw/ppc/spapr.h | 1 +
>  hw/ppc/spapr.c         | 8 ++++----
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 6279711fe8f7..198764066dc9 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -737,6 +737,7 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
>  void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
>                            Error **errp);
>  void spapr_clear_pending_events(sPAPRMachineState *spapr);
> +int spapr_max_server_number(sPAPRMachineState *spapr);
>  
>  /* CPU and LMB DRC release callbacks. */
>  void spapr_core_release(DeviceState *dev);
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e470efe7993c..a689f853e020 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -150,7 +150,7 @@ static void pre_2_10_vmstate_unregister_dummy_icp(int i)
>                         (void *)(uintptr_t) i);
>  }
>  
> -static int xics_max_server_number(sPAPRMachineState *spapr)
> +int spapr_max_server_number(sPAPRMachineState *spapr)
>  {
>      assert(spapr->vsmt);
>      return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads);
> @@ -1270,7 +1270,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>      _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
>  
>      /* /interrupt controller */
> -    spapr_dt_xics(xics_max_server_number(spapr), fdt, PHANDLE_XICP);
> +    spapr_dt_xics(spapr_max_server_number(spapr), fdt, PHANDLE_XICP);
>  
>      ret = spapr_populate_memory(spapr, fdt);
>      if (ret < 0) {
> @@ -2469,7 +2469,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
>      if (smc->pre_2_10_has_unused_icps) {
>          int i;
>  
> -        for (i = 0; i < xics_max_server_number(spapr); i++) {
> +        for (i = 0; i < spapr_max_server_number(spapr); i++) {
>              /* Dummy entries get deregistered when real ICPState objects
>               * are registered during CPU core hotplug.
>               */
> @@ -2589,7 +2589,7 @@ static void spapr_machine_init(MachineState *machine)
>      load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
>  
>      /* VSMT must be set in order to be able to compute VCPU ids, ie to
> -     * call xics_max_server_number() or spapr_vcpu_id().
> +     * call spapr_max_server_number() or spapr_vcpu_id().
>       */
>      spapr_set_vsmt_mode(spapr, &error_fatal);
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 06/37] ppc/xive: add support for the END Event State buffers
  2018-12-07  2:05       ` David Gibson
@ 2018-12-07  7:48         ` Cédric Le Goater
  0 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-07  7:48 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/7/18 3:05 AM, David Gibson wrote:
> On Thu, Dec 06, 2018 at 07:30:34AM +0100, Cédric Le Goater wrote:
>> On 12/6/18 5:09 AM, David Gibson wrote:
>>> On Thu, Dec 06, 2018 at 12:22:20AM +0100, Cédric Le Goater wrote:
> [snip]
>>>> +/*
>>>> + * END ESB MMIO loads
>>>> + */
>>>> +static uint64_t xive_end_source_read(void *opaque, hwaddr addr, unsigned size)
>>>> +{
>>>> +    XiveENDSource *xsrc = XIVE_END_SOURCE(opaque);
>>>> +    XiveRouter *xrtr = xsrc->xrtr;
>>>> +    uint32_t offset = addr & 0xFFF;
>>>> +    uint8_t end_blk;
>>>> +    uint32_t end_idx;
>>>> +    XiveEND end;
>>>> +    uint32_t end_esmask;
>>>> +    uint8_t pq;
>>>> +    uint64_t ret = -1;
>>>> +
>>>> +    end_blk = xrtr->chip_id;
>>>
>>> .. instead I think it makes more sense to just configure the end_blk
>>> directly on the end_source, rather than reaching into another object
>>> to 
>>
>> Ah. That's what I was asking in an email. I missed the answer maybe.
>> Let's drop it and sPAPRXive block will be 0.
> 
> Actually, I think I might not have answered earlier, because it was
> far enough into the thread that I'd forgotten the original context too
> much to answer easily.
>
 
> Anyway, proposed course of action sounds fine.

Let's be clear. 

I will add a "block" property to the XiveENDSource model, which will be set 
to a fixed value (0) on sPAPRXive and to the chip_id on PowerNV. ok ? 

> 
> [snip]
>>>> +static void xive_source_end_reset(void *dev)
>>>> +{
>>>> +    /* TODO: Loop on all ENDs and mask off the ESn and ESe */
>>>
>>> IIUC you're talking about actually writing the (potentially in memory)
>>> END structures.  I don't think that makes sense for the END source
>>> hardware model.  I'm guessing you want this for PAPR, where the ENDs
>>> are in virtual hardware 
>>
>> That's done already.
>>
>>> rather than guest memory, but in that case I
>>> think the reset handling should be in the PAPR specific Xive object,
>>> not the end_source itself.
>>
>> I will remove the TODO, it's obsolete.
> 
> Ok, and the whole function with it, yes?  Or is there something else
> that will really belong here?

Nothing I would say. 

The XiveENDSource are backed by the END structures, which are software 
entities. So it's up to the PowerNV firmware, or the sPAPR IC model to 
do the cleanup.

C.   

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model
  2018-12-07  1:57       ` David Gibson
@ 2018-12-07  7:49         ` Cédric Le Goater
  2018-12-10  3:07           ` David Gibson
  0 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-07  7:49 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/7/18 2:57 AM, David Gibson wrote:
> On Thu, Dec 06, 2018 at 07:22:54AM +0100, Cédric Le Goater wrote:
>> On 12/6/18 4:41 AM, David Gibson wrote:
>>> On Thu, Dec 06, 2018 at 12:22:18AM +0100, Cédric Le Goater wrote:
>>>> The XiveRouter models the second sub-engine of the XIVE architecture :
>>>> the Interrupt Virtualization Routing Engine (IVRE).
>>>>
>>>> The IVRE handles event notifications of the IVSE and performs the
>>>> interrupt routing process. For this purpose, it uses a set of tables
>>>> stored in system memory, the first of which being the Event Assignment
>>>> Structure (EAS) table.
>>>>
>>>> The EAT associates an interrupt source number with an Event Notification
>>>> Descriptor (END) which will be used in a second phase of the routing
>>>> process to identify a Notification Virtual Target.
>>>>
>>>> The XiveRouter is an abstract class which needs to be inherited from
>>>> to define a storage for the EAT, and other upcoming tables.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  include/hw/ppc/xive.h      | 31 ++++++++++++++++
>>>>  include/hw/ppc/xive_regs.h | 50 +++++++++++++++++++++++++
>>>>  hw/intc/xive.c             | 76 ++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 157 insertions(+)
>>>>  create mode 100644 include/hw/ppc/xive_regs.h
>>>>
>>>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>>>> index 6770cffec67d..57ec9f84f527 100644
>>>> --- a/include/hw/ppc/xive.h
>>>> +++ b/include/hw/ppc/xive.h
>>>> @@ -141,6 +141,8 @@
>>>>  #define PPC_XIVE_H
>>>>  
>>>>  #include "hw/qdev-core.h"
>>>> +#include "hw/sysbus.h"
>>>> +#include "hw/ppc/xive_regs.h"
>>>>  
>>>>  /*
>>>>   * XIVE Fabric (Interface between Source and Router)
>>>> @@ -297,4 +299,33 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
>>>>      }
>>>>  }
>>>>  
>>>> +/*
>>>> + * XIVE Router
>>>> + */
>>>> +
>>>> +typedef struct XiveRouter {
>>>> +    SysBusDevice    parent;
>>>
>>> I thought the plan was to make XiveRouter as well as XiveSource a
>>> TYPE_DEVICE descendent rather than a SysBusDevice?
>>
>> We start talking about that, indeed, but then :
>>
>> 	https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg06407.html
>>
>> I thought we concluded that it was going to get too complex.
>>
>> Also, sPAPRXive is a direct descendant of XiveRouter and we want sPAPRXive 
>> on SysBus.
> 
> Ah, good point.  So, to clarify my thinking here - I think from a
> theoretical point of view, having XiveRouter not be sysbus and
> including it by composition is probably the "correct" approach.

One possible solution would be to transform the XiveRouter in a QOM 
interface, this will be possible when I have removed the chip_id field,
and define the VST accessors as we do today. I am not sure how QOM 
interfaces are considered, but I think they are more in the composition 
pattern than inheritance. That way, we could have sPAPRXive directly 
inherit from SysBusDevice.

I can give it a try for v7, and you could merge the small XiveRouter 
changes in the current XiveRouter patch.

> But I can also see that that will be a bit of a pain in practice.  So
> yes, keeping it as a SysBusDevice is ok, at least as long as any
> migration stuff is in the "outermost" / most specific type, which I
> believe it is.

By this sentence, you mean that we don't rely on the XiveRouter model 
to capture the sPAPRXive state ? 

Thanks,

C. 
 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 11/37] spapr/xive: use the VCPU id as a NVT identifier
  2018-12-07  3:46   ` David Gibson
@ 2018-12-07  8:05     ` Cédric Le Goater
  0 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-07  8:05 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/7/18 4:46 AM, David Gibson wrote:
> On Thu, Dec 06, 2018 at 12:22:25AM +0100, Cédric Le Goater wrote:
>> The IVPE scans the O/S CAM line of the XIVE thread interrupt contexts
>> to find a matching Notification Virtual Target (NVT) among the NVTs
>> dispatched on the HW processor threads.
>>
>> On a real system, the thread interrupt contexts are updated by the
>> hypervisor when a Virtual Processor is scheduled to run on a HW
>> thread. Under QEMU, the model will emulate the same behavior by
>> hardwiring the NVT identifier in the thread context registers at
>> reset.
>>
>> The NVT identifier used by the sPAPRXive model is the VCPU id. The END
>> identifier is also derived from the VCPU id. A set of helpers doing
>> the conversion between identifiers are provided for the hcalls
>> configuring the sources and the ENDs.
>>
>> The model does not need a NVT table but The XiveRouter NVT operations
>> are provided to perform some extra checks in the routing algorithm.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive.c | 53 +++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 52 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index eef5830d45c6..8da7a8bee949 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -26,6 +26,27 @@
>>  #define SPAPR_XIVE_VC_BASE   0x0006010000000000ull
>>  #define SPAPR_XIVE_TM_BASE   0x0006030203180000ull
>>  
>> +/*
>> + * The allocation of VP blocks is a complex operation in OPAL and the
>> + * VP identifiers have a relation with the number of HW chips, the
>> + * size of the VP blocks, VP grouping, etc. The QEMU sPAPR XIVE
>> + * controller model does not have the same constraints and can use a
>> + * simple mapping scheme of the CPU vcpu_id
>> + *
>> + * These identifiers are never returned to the OS.
>> + */
>> +
>> +#define SPAPR_XIVE_NVT_BASE 0x400
>> +
>> +/*
>> + * sPAPR NVT and END indexing helpers
>> + */
>> +static uint32_t spapr_xive_nvt_to_target(sPAPRXive *xive, uint8_t nvt_blk,
>> +                                  uint32_t nvt_idx)
>> +{
>> +    return nvt_idx - SPAPR_XIVE_NVT_BASE;
>> +}
>> +
>>  /*
>>   * On sPAPR machines, use a simplified output for the XIVE END
>>   * structure dumping only the information related to the OS EQ.
>> @@ -40,7 +61,8 @@ static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end,
>>      uint32_t nvt = GETFIELD_BE32(END_W6_NVT_INDEX, end->w6);
>>      uint8_t priority = GETFIELD_BE32(END_W7_F0_PRIORITY, end->w7);
>>  
>> -    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt,
>> +    monitor_printf(mon, "%3d/%d % 6d/%5d ^%d",
>> +                   spapr_xive_nvt_to_target(xive, 0, nvt),
>>                     priority, qindex, qentries, qgen);
>>  
>>      xive_end_queue_pic_print_info(end, 6, mon);
>> @@ -246,6 +268,33 @@ static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk,
>>      return 0;
>>  }
>>  
>> +static int spapr_xive_get_nvt(XiveRouter *xrtr,
>> +                              uint8_t nvt_blk, uint32_t nvt_idx, XiveNVT *nvt)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(xrtr);
>> +    uint32_t vcpu_id = spapr_xive_nvt_to_target(xive, nvt_blk, nvt_idx);
>> +    PowerPCCPU *cpu = spapr_find_cpu(vcpu_id);
>> +
>> +    if (!cpu) {
>> +        return -1;
>> +    }
>> +
>> +    /*
>> +     * sPAPR does not maintain a NVT table. Return that the NVT is
>> +     * valid if we have found a matching CPU
>> +     */
>> +    nvt->w0 = cpu_to_be32(NVT_W0_VALID);
>> +    return 0;
>> +}
>> +
>> +static int spapr_xive_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk,
>> +                                uint32_t nvt_idx, XiveNVT *nvt,
>> +                                uint8_t word_number)
>> +{
>> +    /* no NVT table */
>> +    return 0;
> 
> Should this ever get called.  IIUC, we don't need to write back to the
> NVTs because the papr machine should never hit a non-scheduled NVT.

yes.

> But in that case should this actually be a no-op, or should it be an
> g_assert_not_reached()?

We should never reach it because the get_nvt() should have failed
before in xive_presenter_notify() :

    ...
    /* NVT cache lookup */
    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
                      nvt_blk, nvt_idx);
        return;
    }
    ...

Leaving the machine in a curious state.

so for _write_nvt(), I think g_assert_not_reached() is appropriate and 
I will add the comment you just made.

may be, I should add an 'assert(cpu)' in _get_nvt() also.
C. 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/37] ppc/xive: introduce a simplified XIVE presenter
  2018-12-07  3:10   ` David Gibson
@ 2018-12-07  8:49     ` Cédric Le Goater
  2018-12-10  3:05       ` David Gibson
  0 siblings, 1 reply; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-07  8:49 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/7/18 4:10 AM, David Gibson wrote:
> On Thu, Dec 06, 2018 at 12:22:22AM +0100, Cédric Le Goater wrote:
>> The last sub-engine of the XIVE architecture is the Interrupt
>> Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
>> IVPE share elements, the Power Bus interface (CQ), the routing table
>> descriptors, and they can be combined in the same HW logic. We do the
>> same in QEMU and combine both engines in the XiveRouter for
>> simplicity.
>>
>> When the IVRE has completed its job of matching an event source with a
>> Notification Virtual Target (NVT) to notify, it forwards the event
>> notification to the IVPE sub-engine. The IVPE scans the thread
>> interrupt contexts of the Notification Virtual Targets (NVT)
>> dispatched on the HW processor threads and if a match is found, it
>> signals the thread. If not, the IVPE escalates the notification to
>> some other targets and records the notification in a backlog queue.
>>
>> The IVPE maintains the thread interrupt context state for each of its
>> NVTs not dispatched on HW processor threads in the Notification
>> Virtual Target table (NVTT).
>>
>> The model currently only supports single NVT notifications.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/xive.h      |  15 +++
>>  include/hw/ppc/xive_regs.h |  24 ++++
>>  hw/intc/xive.c             | 227 +++++++++++++++++++++++++++++++++++++
>>  3 files changed, 266 insertions(+)
>>
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 74b547707b17..e9b06e75fc1c 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -327,6 +327,10 @@ typedef struct XiveRouterClass {
>>                     XiveEND *end);
>>      int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                       XiveEND *end, uint8_t word_number);
>> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                   XiveNVT *nvt);
>> +    int (*write_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                     XiveNVT *nvt, uint8_t word_number);
>>  } XiveRouterClass;
>>  
>>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
>> @@ -337,6 +341,11 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                          XiveEND *end);
>>  int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                            XiveEND *end, uint8_t word_number);
>> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt);
>> +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                          XiveNVT *nvt, uint8_t word_number);
>> +
>>  
>>  /*
>>   * XIVE END ESBs
>> @@ -393,6 +402,7 @@ typedef struct XiveTCTX {
>>      qemu_irq    output;
>>  
>>      uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
>> +    uint32_t    hw_cam;
> 
> I don't love having this as a separate field.  Since it also appears
> within the register space, it's kind of redundant. 

yes.

> On the other hand,
> I see that wiring up the property directly to the register space
> doesn't really work.  Not sure how to deal with that one.

We could use get/set properties for "hw-cam" to assign WORD2 of the 
physical ring and exclude it from reset, which makes some sense. The
test on the PHYS ring in xive_presenter_tctx_match() would also look 
like the other tests. I think this is better. 

On a related topic, WORD2 of the OS ring is assigned by the hypervisor. 
For the sPAPR machine, this is done when the sPAPR IRQ backend is 
reseted. See patch 21 in v6.
 
> 
>>  } XiveTCTX;
>>  
>>  /*
>> @@ -412,4 +422,9 @@ extern const MemoryRegionOps xive_tm_ops;
>>  
>>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
>>  
>> +static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
>> +{
>> +    return (nvt_blk << 19) | nvt_idx;
>> +}
>> +
>>  #endif /* PPC_XIVE_H */
>> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
>> index ede3d04c5eda..85557e730cd8 100644
>> --- a/include/hw/ppc/xive_regs.h
>> +++ b/include/hw/ppc/xive_regs.h
>> @@ -186,4 +186,28 @@ typedef struct XiveEND {
>>  #define GETFIELD_BE32(m, v)       GETFIELD(m, be32_to_cpu(v))
>>  #define SETFIELD_BE32(m, v, val)  cpu_to_be32(SETFIELD(m, be32_to_cpu(v), val))
>>  
>> +/* Notification Virtual Target (NVT) */
>> +typedef struct XiveNVT {
>> +        uint32_t        w0;
>> +#define NVT_W0_VALID             PPC_BIT32(0)
>> +        uint32_t        w1;
>> +        uint32_t        w2;
>> +        uint32_t        w3;
>> +        uint32_t        w4;
>> +        uint32_t        w5;
>> +        uint32_t        w6;
>> +        uint32_t        w7;
>> +        uint32_t        w8;
>> +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
>> +        uint32_t        w9;
>> +        uint32_t        wa;
>> +        uint32_t        wb;
>> +        uint32_t        wc;
>> +        uint32_t        wd;
>> +        uint32_t        we;
>> +        uint32_t        wf;
>> +} XiveNVT;
>> +
>> +#define xive_nvt_is_valid(nvt)    (be32_to_cpu((nvt)->w0) & NVT_W0_VALID)
>> +
>>  #endif /* PPC_XIVE_REGS_H */
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 80a965c14200..891542920683 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -358,6 +358,25 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
>>      }
>>  }
>>  
>> +/* The HW CAM (23bits) is hardwired to :
>> + *
>> + *   0x000||0b1||4Bit chip number||7Bit Thread number.
>> + *
>> + * and when the block grouping extension is enabled :
>> + *
>> + *   4Bit chip number||0x001||7Bit Thread number.
>> + */
>> +static uint32_t hw_cam_line(uint8_t chip_id, uint8_t tid)
>> +{
>> +    bool block_group = false; /* TODO (PowerNV) */
>> +
>> +    if (block_group) {
>> +        return 1 << 11 | (chip_id & 0xf) << 7 | (tid & 0x7f);
>> +    } else {
>> +        return (chip_id & 0xf) << 11 | 1 << 7 | (tid & 0x7f);
>> +    }
>> +}
>> +
>>  static void xive_tctx_reset(void *dev)
>>  {
>>      XiveTCTX *tctx = XIVE_TCTX(dev);
>> @@ -388,6 +407,12 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
>>      cpu = POWERPC_CPU(obj);
>>      tctx->cs = CPU(obj);
>>  
>> +    if (!tctx->hw_cam) {
>> +        error_setg(errp, "XIVE: HW CAM is not set for CPU %d",
>> +                   tctx->cs->cpu_index);
> 
> You could do this at realize, rather than reset, couldn't you?

yes but I will remove "hw-cam" I think.
 
>> +        return;
>> +    }
>> +
>>      env = &cpu->env;
>>      switch (PPC_INPUT(env)) {
>>      case PPC_FLAGS_INPUT_POWER7:
>> @@ -418,11 +443,17 @@ static const VMStateDescription vmstate_xive_tctx = {
>>      },
>>  };
>>  
>> +static Property  xive_tctx_properties[] = {
>> +    DEFINE_PROP_UINT32("hw-cam", XiveTCTX, hw_cam, 0),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>>  static void xive_tctx_class_init(ObjectClass *klass, void *data)
>>  {
>>      DeviceClass *dc = DEVICE_CLASS(klass);
>>  
>>      dc->desc = "XIVE Interrupt Thread Context";
>> +    dc->props = xive_tctx_properties;
>>      dc->realize = xive_tctx_realize;
>>      dc->unrealize = xive_tctx_unrealize;
>>      dc->vmsd = &vmstate_xive_tctx;
>> @@ -978,6 +1009,194 @@ int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>     return xrc->write_end(xrtr, end_blk, end_idx, end, word_number);
>>  }
>>  
>> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt)
>> +{
>> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
>> +
>> +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
>> +}
>> +
>> +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt, uint8_t word_number)
>> +{
>> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
>> +
>> +   return xrc->write_nvt(xrtr, nvt_blk, nvt_idx, nvt, word_number);
>> +}
>> +
>> +/*
>> + * The thread context register words are in big-endian format.
>> + */
>> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
>> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                     bool cam_ignore, uint32_t logic_serv)
>> +{
>> +    uint32_t cam = xive_nvt_cam_line(nvt_blk, nvt_idx);
>> +    uint8_t *regs;
>> +    uint32_t qw3w2;
>> +    uint32_t qw2w2;
>> +    uint32_t qw1w2;
>> +    uint32_t qw0w2;
>> +
>> +    /* TODO (PowerNV): ignore low order bits of nvt id */
>> +
>> +    regs = &tctx->regs[TM_QW3_HV_PHYS];
>> +    qw3w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> 
> This is one of the main places we access regs and we have to do
> horrible casting.  Would it make more sense for it to be a uint32_t
> array?  Or at least for the local *regs to be.

The register array is accessed by byte (patch 9) for the first two 
words and by word for WORD2. I don't see any good solution apart 
from a helper routine maybe : 

  static inline uint32_t xive_tctx_word2(int8_t *regs)
  {
      return be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
  }

which I need for xive_tctx_ring_print() also.
 
>> +    regs = &tctx->regs[TM_QW2_HV_POOL];
>> +    qw2w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
>> +    regs = &tctx->regs[TM_QW1_OS];
>> +    qw1w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
>> +    regs = &tctx->regs[TM_QW0_USER];
>> +    qw0w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
>> +
>> +    if (format == 0) {
>> +        /* F=0 & i=1: Logical server notification */
> 
> I'm guessing the i=1 is the cam_ignore==true check?  Maybe put this
> comment inside the if block to make that clearer.

yes. 

> 
>> +        if (cam_ignore == true) {
>> +            qemu_log_mask(LOG_UNIMP, "XIVE: no support for LS NVT %x/%x\n",
>> +                          nvt_blk, nvt_idx);
>> +             return -1;
>> +        }
>> +
>> +        /* F=0 & i=0: Specific NVT notification */
>> +
>> +        /* PHYS ring */
>> +        if ((qw3w2 & TM_QW3W2_VT) &&
>> +            tctx->hw_cam == hw_cam_line(nvt_blk, nvt_idx)) {
>> +            return TM_QW3_HV_PHYS;
>> +        }
>> +
>> +        /* HV POOL ring */
>> +        if ((qw2w2 & TM_QW2W2_VP) &&
>> +            cam == GETFIELD(TM_QW2W2_POOL_CAM, qw2w2)) {
> 
> Does that need to be a GETFIELD_BE32?

the qw[0123]w2 variables have been byteswapped already. But, that might
not be a good idea. in that case, we should byteswap the V[TPOU] bit value 
instead ? What's your opinion.

we would get rid of the be32_to_cpu() above 

> 
>> +            return TM_QW2_HV_POOL;
>> +        }
>> +
>> +        /* OS ring */
>> +        if ((qw1w2 & TM_QW1W2_VO) &&
>> +            cam == GETFIELD(TM_QW1W2_OS_CAM, qw1w2)) {
> 
> And here.
> 
>> +            return TM_QW1_OS;
>> +        }
>> +    } else {
>> +        /* F=1 : User level Event-Based Branch (EBB) notification */
>> +
>> +        /* USER ring */
>> +        if  ((qw1w2 & TM_QW1W2_VO) &&
>> +             (cam == GETFIELD(TM_QW1W2_OS_CAM, qw1w2)) &&
> 
> And here.
> 
>> +             (qw0w2 & TM_QW0W2_VU) &&
>> +             (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, qw0w2))) {
>> +            return TM_QW0_USER;
>> +        }
>> +    }
>> +    return -1;
>> +}
>> +
>> +typedef struct XiveTCTXMatch {
>> +    XiveTCTX *tctx;
>> +    uint8_t ring;
>> +} XiveTCTXMatch;
>> +
>> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
>> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                 bool cam_ignore, uint8_t priority,
>> +                                 uint32_t logic_serv, XiveTCTXMatch *match)
>> +{
>> +    CPUState *cs;
>> +
>> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
>> +     * hardwired CAM compares */
>> +
>> +    CPU_FOREACH(cs) {
>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
>> +        int ring;
>> +
>> +        /*
>> +         * HW checks that the CPU is enabled in the Physical Thread
>> +         * Enable Register (PTER).
>> +         */
>> +
>> +        /*
>> +         * Check the thread context CAM lines and record matches. We
>> +         * will handle CPU exception delivery later
>> +         */
>> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
>> +                                         cam_ignore, logic_serv);
>> +        /*
>> +         * Save the context and follow on to catch duplicates, that we
>> +         * don't support yet.
>> +         */
>> +        if (ring != -1) {
>> +            if (match->tctx) {
>> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
>> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
>> +                return false;
>> +            }
>> +
>> +            match->ring = ring;
>> +            match->tctx = tctx;
>> +        }
>> +    }
>> +
>> +    if (!match->tctx) {
>> +        qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
>> +                      nvt_blk, nvt_idx);
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>> +/*
>> + * This is our simple Xive Presenter Engine model. It is merged in the
>> + * Router as it does not require an extra object.
>> + *
>> + * It receives notification requests sent by the IVRE to find one
>> + * matching NVT (or more) dispatched on the processor threads. In case
>> + * of a single NVT notification, the process is abreviated and the
>> + * thread is signaled if a match is found. In case of a logical server
>> + * notification (bits ignored at the end of the NVT identifier), the
>> + * IVPE and IVRE select a winning thread using different filters. This
>> + * involves 2 or 3 exchanges on the PowerBus that the model does not
>> + * support.
>> + *
>> + * The parameters represent what is sent on the PowerBus
>> + */
>> +static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
>> +                                  uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                  bool cam_ignore, uint8_t priority,
>> +                                  uint32_t logic_serv)
>> +{
>> +    XiveNVT nvt;
>> +    XiveTCTXMatch match = { 0 };
> 
> IIUC that's initializing the tctx pointer field of match, so should be
> NULL, not 0 (yes, technically they're equivalent in C, but using 0 for
> a pointer is confusing).

OK. I will clarify.

> 
>> +    bool found;
>> +
>> +    /* NVT cache lookup */
>> +    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
>> +                      nvt_blk, nvt_idx);
>> +        return;
>> +    }
>> +
>> +    if (!xive_nvt_is_valid(&nvt)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
>> +                      nvt_blk, nvt_idx);
>> +        return;
>> +    }
>> +
>> +    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
>> +                                 priority, logic_serv, &match);
>> +    if (found) {
>> +        return;
>> +    }
>> +
>> +    /* If no matching NVT is dispatched on a HW thread :
>> +     * - update the NVT structure if backlog is activated
>> +     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
>> +     *   activated
>> +     */
>> +}
>> +
>>  /*
>>   * An END trigger can come from an event trigger (IPI or HW) or from
>>   * another chip. We don't model the PowerBus but the END trigger
>> @@ -1047,6 +1266,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>>      /*
>>       * Follows IVPE notification
>>       */
>> +    xive_presenter_notify(xrtr, format,
>> +                          GETFIELD_BE32(END_W6_NVT_BLOCK, end.w6),
>> +                          GETFIELD_BE32(END_W6_NVT_INDEX, end.w6),
>> +                          GETFIELD_BE32(END_W7_F0_IGNORE, end.w7),
>> +                          priority,
>> +                          GETFIELD_BE32(END_W7_F1_LOG_SERVER_ID, end.w7));
>> +
>> +    /* TODO: Auto EOI. */
>>  }
>>  
>>  static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 03/37] ppc/xive: introduce the XiveNotifier interface
  2018-12-07  2:07       ` David Gibson
@ 2018-12-07  9:08         ` Cédric Le Goater
  0 siblings, 0 replies; 66+ messages in thread
From: Cédric Le Goater @ 2018-12-07  9:08 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

On 12/7/18 3:07 AM, David Gibson wrote:
> On Thu, Dec 06, 2018 at 07:17:47AM +0100, Cédric Le Goater wrote:
>> On 12/6/18 4:25 AM, David Gibson wrote:
>>> On Thu, Dec 06, 2018 at 12:22:17AM +0100, Cédric Le Goater wrote:
>>>> The XiveNotifier offers a simple interface, between the XiveSource
>>>> object and the main interrupt controller of the machine. It will
>>>> forward event notifications to the XIVE Interrupt Virtualization
>>>> Routing Engine (IVRE).
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  include/hw/ppc/xive.h | 23 +++++++++++++++++++++++
>>>>  hw/intc/xive.c        | 25 +++++++++++++++++++++++++
>>>>  2 files changed, 48 insertions(+)
>>>>
>>>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>>>> index 7cebc32eba4c..6770cffec67d 100644
>>>> --- a/include/hw/ppc/xive.h
>>>> +++ b/include/hw/ppc/xive.h
>>>> @@ -142,6 +142,27 @@
>>>>  
>>>>  #include "hw/qdev-core.h"
>>>>  
>>>> +/*
>>>> + * XIVE Fabric (Interface between Source and Router)
>>>> + */
>>>> +
>>>> +typedef struct XiveNotifier {
>>>> +    Object parent;
>>>> +} XiveNotifier;
>>>> +
>>>> +#define TYPE_XIVE_NOTIFIER "xive-fabric"
>>>
>>> I'm applying this, but changing the string here from "xive-fabric" to
>>> "xive-notifier".
>>
>> Ah yes. My sed command missed that.
> 
> So, I've now applied patches 1-5.  I think we've agreed to a change in
> patch 6 which will require tweaks to a bunch further down the series,

yes.

> so I think the rest will need a v7.

Yes.

Here's my TODO list : 

- some asserts and comments
- introduce a helper to get the CAM line of a ring
- rework the physical CAM line setting.
- introduce a "block" property in XiveENDSource
- remove chip_id id from XiveRouter
- may be modify XiveRouter to be a QOM interface


The hcalls are the big remaining part to review before adding the 
XIVE-only machine.

I think I will move up in the patchset the TCG "dual" machine because 
it does not need much more changes and the XICS machine is not impacted.

KVM will come after.
 

Thanks,

C. 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/37] ppc/xive: introduce a simplified XIVE presenter
  2018-12-07  8:49     ` Cédric Le Goater
@ 2018-12-10  3:05       ` David Gibson
  0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2018-12-10  3:05 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 13719 bytes --]

On Fri, Dec 07, 2018 at 09:49:29AM +0100, Cédric Le Goater wrote:
> On 12/7/18 4:10 AM, David Gibson wrote:
> > On Thu, Dec 06, 2018 at 12:22:22AM +0100, Cédric Le Goater wrote:
> >> The last sub-engine of the XIVE architecture is the Interrupt
> >> Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
> >> IVPE share elements, the Power Bus interface (CQ), the routing table
> >> descriptors, and they can be combined in the same HW logic. We do the
> >> same in QEMU and combine both engines in the XiveRouter for
> >> simplicity.
> >>
> >> When the IVRE has completed its job of matching an event source with a
> >> Notification Virtual Target (NVT) to notify, it forwards the event
> >> notification to the IVPE sub-engine. The IVPE scans the thread
> >> interrupt contexts of the Notification Virtual Targets (NVT)
> >> dispatched on the HW processor threads and if a match is found, it
> >> signals the thread. If not, the IVPE escalates the notification to
> >> some other targets and records the notification in a backlog queue.
> >>
> >> The IVPE maintains the thread interrupt context state for each of its
> >> NVTs not dispatched on HW processor threads in the Notification
> >> Virtual Target table (NVTT).
> >>
> >> The model currently only supports single NVT notifications.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  include/hw/ppc/xive.h      |  15 +++
> >>  include/hw/ppc/xive_regs.h |  24 ++++
> >>  hw/intc/xive.c             | 227 +++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 266 insertions(+)
> >>
> >> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> >> index 74b547707b17..e9b06e75fc1c 100644
> >> --- a/include/hw/ppc/xive.h
> >> +++ b/include/hw/ppc/xive.h
> >> @@ -327,6 +327,10 @@ typedef struct XiveRouterClass {
> >>                     XiveEND *end);
> >>      int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >>                       XiveEND *end, uint8_t word_number);
> >> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                   XiveNVT *nvt);
> >> +    int (*write_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                     XiveNVT *nvt, uint8_t word_number);
> >>  } XiveRouterClass;
> >>  
> >>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
> >> @@ -337,6 +341,11 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >>                          XiveEND *end);
> >>  int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >>                            XiveEND *end, uint8_t word_number);
> >> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                        XiveNVT *nvt);
> >> +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                          XiveNVT *nvt, uint8_t word_number);
> >> +
> >>  
> >>  /*
> >>   * XIVE END ESBs
> >> @@ -393,6 +402,7 @@ typedef struct XiveTCTX {
> >>      qemu_irq    output;
> >>  
> >>      uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
> >> +    uint32_t    hw_cam;
> > 
> > I don't love having this as a separate field.  Since it also appears
> > within the register space, it's kind of redundant. 
> 
> yes.
> 
> > On the other hand,
> > I see that wiring up the property directly to the register space
> > doesn't really work.  Not sure how to deal with that one.
> 
> We could use get/set properties for "hw-cam" to assign WORD2 of the 
> physical ring and exclude it from reset, which makes some sense. The
> test on the PHYS ring in xive_presenter_tctx_match() would also look 
> like the other tests. I think this is better.

Ok sounds good.

> On a related topic, WORD2 of the OS ring is assigned by the hypervisor. 
> For the sPAPR machine, this is done when the sPAPR IRQ backend is 
> reseted. See patch 21 in v6.

Yes, I figured.

[snip]
> >> +/*
> >> + * The thread context register words are in big-endian format.
> >> + */
> >> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
> >> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                                     bool cam_ignore, uint32_t logic_serv)
> >> +{
> >> +    uint32_t cam = xive_nvt_cam_line(nvt_blk, nvt_idx);
> >> +    uint8_t *regs;
> >> +    uint32_t qw3w2;
> >> +    uint32_t qw2w2;
> >> +    uint32_t qw1w2;
> >> +    uint32_t qw0w2;
> >> +
> >> +    /* TODO (PowerNV): ignore low order bits of nvt id */
> >> +
> >> +    regs = &tctx->regs[TM_QW3_HV_PHYS];
> >> +    qw3w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> > 
> > This is one of the main places we access regs and we have to do
> > horrible casting.  Would it make more sense for it to be a uint32_t
> > array?  Or at least for the local *regs to be.
> 
> The register array is accessed by byte (patch 9) for the first two 
> words and by word for WORD2. I don't see any good solution apart 
> from a helper routine maybe : 
> 
>   static inline uint32_t xive_tctx_word2(int8_t *regs)
>   {
>       return be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
>   }
> 
> which I need for xive_tctx_ring_print() also.

Well, you could at least make the regs local variable a uint32_t *,
since you're only accessing the 32-bit parts of the ring in this
function.

Alternatively, you could represent the regs not with a plain array,
but a structure which has some u8 fields and some u32 fields.

>  
> >> +    regs = &tctx->regs[TM_QW2_HV_POOL];
> >> +    qw2w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> >> +    regs = &tctx->regs[TM_QW1_OS];
> >> +    qw1w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> >> +    regs = &tctx->regs[TM_QW0_USER];
> >> +    qw0w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> >> +
> >> +    if (format == 0) {
> >> +        /* F=0 & i=1: Logical server notification */
> > 
> > I'm guessing the i=1 is the cam_ignore==true check?  Maybe put this
> > comment inside the if block to make that clearer.
> 
> yes. 
> 
> > 
> >> +        if (cam_ignore == true) {
> >> +            qemu_log_mask(LOG_UNIMP, "XIVE: no support for LS NVT %x/%x\n",
> >> +                          nvt_blk, nvt_idx);
> >> +             return -1;
> >> +        }
> >> +
> >> +        /* F=0 & i=0: Specific NVT notification */
> >> +
> >> +        /* PHYS ring */
> >> +        if ((qw3w2 & TM_QW3W2_VT) &&
> >> +            tctx->hw_cam == hw_cam_line(nvt_blk, nvt_idx)) {
> >> +            return TM_QW3_HV_PHYS;
> >> +        }
> >> +
> >> +        /* HV POOL ring */
> >> +        if ((qw2w2 & TM_QW2W2_VP) &&
> >> +            cam == GETFIELD(TM_QW2W2_POOL_CAM, qw2w2)) {
> > 
> > Does that need to be a GETFIELD_BE32?
> 
> the qw[0123]w2 variables have been byteswapped already. But, that might
> not be a good idea. in that case, we should byteswap the V[TPOU] bit value 
> instead ? What's your opinion.

Actually I think it's fine as it is, I was just missing that the
locals were already byteswapped values.  As a rule I dislike
byteswapping constants rather than the variable part (at least partly
because it's a pattern that *only* works for bitwise operations).

> 
> we would get rid of the be32_to_cpu() above 
> 
> > 
> >> +            return TM_QW2_HV_POOL;
> >> +        }
> >> +
> >> +        /* OS ring */
> >> +        if ((qw1w2 & TM_QW1W2_VO) &&
> >> +            cam == GETFIELD(TM_QW1W2_OS_CAM, qw1w2)) {
> > 
> > And here.
> > 
> >> +            return TM_QW1_OS;
> >> +        }
> >> +    } else {
> >> +        /* F=1 : User level Event-Based Branch (EBB) notification */
> >> +
> >> +        /* USER ring */
> >> +        if  ((qw1w2 & TM_QW1W2_VO) &&
> >> +             (cam == GETFIELD(TM_QW1W2_OS_CAM, qw1w2)) &&
> > 
> > And here.
> > 
> >> +             (qw0w2 & TM_QW0W2_VU) &&
> >> +             (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, qw0w2))) {
> >> +            return TM_QW0_USER;
> >> +        }
> >> +    }
> >> +    return -1;
> >> +}
> >> +
> >> +typedef struct XiveTCTXMatch {
> >> +    XiveTCTX *tctx;
> >> +    uint8_t ring;
> >> +} XiveTCTXMatch;
> >> +
> >> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> >> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                                 bool cam_ignore, uint8_t priority,
> >> +                                 uint32_t logic_serv, XiveTCTXMatch *match)
> >> +{
> >> +    CPUState *cs;
> >> +
> >> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
> >> +     * hardwired CAM compares */
> >> +
> >> +    CPU_FOREACH(cs) {
> >> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> >> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> >> +        int ring;
> >> +
> >> +        /*
> >> +         * HW checks that the CPU is enabled in the Physical Thread
> >> +         * Enable Register (PTER).
> >> +         */
> >> +
> >> +        /*
> >> +         * Check the thread context CAM lines and record matches. We
> >> +         * will handle CPU exception delivery later
> >> +         */
> >> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
> >> +                                         cam_ignore, logic_serv);
> >> +        /*
> >> +         * Save the context and follow on to catch duplicates, that we
> >> +         * don't support yet.
> >> +         */
> >> +        if (ring != -1) {
> >> +            if (match->tctx) {
> >> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
> >> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
> >> +                return false;
> >> +            }
> >> +
> >> +            match->ring = ring;
> >> +            match->tctx = tctx;
> >> +        }
> >> +    }
> >> +
> >> +    if (!match->tctx) {
> >> +        qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
> >> +                      nvt_blk, nvt_idx);
> >> +        return false;
> >> +    }
> >> +
> >> +    return true;
> >> +}
> >> +
> >> +/*
> >> + * This is our simple Xive Presenter Engine model. It is merged in the
> >> + * Router as it does not require an extra object.
> >> + *
> >> + * It receives notification requests sent by the IVRE to find one
> >> + * matching NVT (or more) dispatched on the processor threads. In case
> >> + * of a single NVT notification, the process is abreviated and the
> >> + * thread is signaled if a match is found. In case of a logical server
> >> + * notification (bits ignored at the end of the NVT identifier), the
> >> + * IVPE and IVRE select a winning thread using different filters. This
> >> + * involves 2 or 3 exchanges on the PowerBus that the model does not
> >> + * support.
> >> + *
> >> + * The parameters represent what is sent on the PowerBus
> >> + */
> >> +static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
> >> +                                  uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                                  bool cam_ignore, uint8_t priority,
> >> +                                  uint32_t logic_serv)
> >> +{
> >> +    XiveNVT nvt;
> >> +    XiveTCTXMatch match = { 0 };
> > 
> > IIUC that's initializing the tctx pointer field of match, so should be
> > NULL, not 0 (yes, technically they're equivalent in C, but using 0 for
> > a pointer is confusing).
> 
> OK. I will clarify.
> 
> > 
> >> +    bool found;
> >> +
> >> +    /* NVT cache lookup */
> >> +    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
> >> +                      nvt_blk, nvt_idx);
> >> +        return;
> >> +    }
> >> +
> >> +    if (!xive_nvt_is_valid(&nvt)) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
> >> +                      nvt_blk, nvt_idx);
> >> +        return;
> >> +    }
> >> +
> >> +    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
> >> +                                 priority, logic_serv, &match);
> >> +    if (found) {
> >> +        return;
> >> +    }
> >> +
> >> +    /* If no matching NVT is dispatched on a HW thread :
> >> +     * - update the NVT structure if backlog is activated
> >> +     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
> >> +     *   activated
> >> +     */
> >> +}
> >> +
> >>  /*
> >>   * An END trigger can come from an event trigger (IPI or HW) or from
> >>   * another chip. We don't model the PowerBus but the END trigger
> >> @@ -1047,6 +1266,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
> >>      /*
> >>       * Follows IVPE notification
> >>       */
> >> +    xive_presenter_notify(xrtr, format,
> >> +                          GETFIELD_BE32(END_W6_NVT_BLOCK, end.w6),
> >> +                          GETFIELD_BE32(END_W6_NVT_INDEX, end.w6),
> >> +                          GETFIELD_BE32(END_W7_F0_IGNORE, end.w7),
> >> +                          priority,
> >> +                          GETFIELD_BE32(END_W7_F1_LOG_SERVER_ID, end.w7));
> >> +
> >> +    /* TODO: Auto EOI. */
> >>  }
> >>  
> >>  static void xive_router_notify(XiveNotifier *xn, uint32_t lisn)
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model
  2018-12-07  7:49         ` Cédric Le Goater
@ 2018-12-10  3:07           ` David Gibson
  0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2018-12-10  3:07 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-ppc, qemu-devel, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 4093 bytes --]

On Fri, Dec 07, 2018 at 08:49:21AM +0100, Cédric Le Goater wrote:
> On 12/7/18 2:57 AM, David Gibson wrote:
> > On Thu, Dec 06, 2018 at 07:22:54AM +0100, Cédric Le Goater wrote:
> >> On 12/6/18 4:41 AM, David Gibson wrote:
> >>> On Thu, Dec 06, 2018 at 12:22:18AM +0100, Cédric Le Goater wrote:
> >>>> The XiveRouter models the second sub-engine of the XIVE architecture :
> >>>> the Interrupt Virtualization Routing Engine (IVRE).
> >>>>
> >>>> The IVRE handles event notifications of the IVSE and performs the
> >>>> interrupt routing process. For this purpose, it uses a set of tables
> >>>> stored in system memory, the first of which being the Event Assignment
> >>>> Structure (EAS) table.
> >>>>
> >>>> The EAT associates an interrupt source number with an Event Notification
> >>>> Descriptor (END) which will be used in a second phase of the routing
> >>>> process to identify a Notification Virtual Target.
> >>>>
> >>>> The XiveRouter is an abstract class which needs to be inherited from
> >>>> to define a storage for the EAT, and other upcoming tables.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>> ---
> >>>>  include/hw/ppc/xive.h      | 31 ++++++++++++++++
> >>>>  include/hw/ppc/xive_regs.h | 50 +++++++++++++++++++++++++
> >>>>  hw/intc/xive.c             | 76 ++++++++++++++++++++++++++++++++++++++
> >>>>  3 files changed, 157 insertions(+)
> >>>>  create mode 100644 include/hw/ppc/xive_regs.h
> >>>>
> >>>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> >>>> index 6770cffec67d..57ec9f84f527 100644
> >>>> --- a/include/hw/ppc/xive.h
> >>>> +++ b/include/hw/ppc/xive.h
> >>>> @@ -141,6 +141,8 @@
> >>>>  #define PPC_XIVE_H
> >>>>  
> >>>>  #include "hw/qdev-core.h"
> >>>> +#include "hw/sysbus.h"
> >>>> +#include "hw/ppc/xive_regs.h"
> >>>>  
> >>>>  /*
> >>>>   * XIVE Fabric (Interface between Source and Router)
> >>>> @@ -297,4 +299,33 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
> >>>>      }
> >>>>  }
> >>>>  
> >>>> +/*
> >>>> + * XIVE Router
> >>>> + */
> >>>> +
> >>>> +typedef struct XiveRouter {
> >>>> +    SysBusDevice    parent;
> >>>
> >>> I thought the plan was to make XiveRouter as well as XiveSource a
> >>> TYPE_DEVICE descendent rather than a SysBusDevice?
> >>
> >> We start talking about that, indeed, but then :
> >>
> >> 	https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg06407.html
> >>
> >> I thought we concluded that it was going to get too complex.
> >>
> >> Also, sPAPRXive is a direct descendant of XiveRouter and we want sPAPRXive 
> >> on SysBus.
> > 
> > Ah, good point.  So, to clarify my thinking here - I think from a
> > theoretical point of view, having XiveRouter not be sysbus and
> > including it by composition is probably the "correct" approach.
> 
> One possible solution would be to transform the XiveRouter in a QOM 
> interface, this will be possible when I have removed the chip_id field,
> and define the VST accessors as we do today. I am not sure how QOM 
> interfaces are considered, but I think they are more in the composition 
> pattern than inheritance. That way, we could have sPAPRXive directly 
> inherit from SysBusDevice.
> 
> I can give it a try for v7, and you could merge the small XiveRouter 
> changes in the current XiveRouter patch.
> 
> > But I can also see that that will be a bit of a pain in practice.  So
> > yes, keeping it as a SysBusDevice is ok, at least as long as any
> > migration stuff is in the "outermost" / most specific type, which I
> > believe it is.
> 
> By this sentence, you mean that we don't rely on the XiveRouter model 
> to capture the sPAPRXive state ?

Yes.  Basically we should only have VMStateDecriptions registered by
the spapr specific objects, not the internal parts / superclasses
they're composed of.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2018-12-10  3:07 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-05 23:22 [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 01/37] ppc/xive: introduce a XIVE interrupt source model Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 02/37] ppc/xive: add support for the LSI interrupt sources Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 03/37] ppc/xive: introduce the XiveNotifier interface Cédric Le Goater
2018-12-06  3:25   ` David Gibson
2018-12-06  6:17     ` Cédric Le Goater
2018-12-07  2:07       ` David Gibson
2018-12-07  9:08         ` Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 04/37] ppc/xive: introduce the XiveRouter model Cédric Le Goater
2018-12-06  3:41   ` David Gibson
2018-12-06  6:22     ` Cédric Le Goater
2018-12-07  1:57       ` David Gibson
2018-12-07  7:49         ` Cédric Le Goater
2018-12-10  3:07           ` David Gibson
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 05/37] ppc/xive: introduce the XIVE Event Notification Descriptors Cédric Le Goater
2018-12-06  3:56   ` David Gibson
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 06/37] ppc/xive: add support for the END Event State buffers Cédric Le Goater
2018-12-06  4:09   ` David Gibson
2018-12-06  6:30     ` Cédric Le Goater
2018-12-07  2:05       ` David Gibson
2018-12-07  7:48         ` Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 07/37] ppc/xive: introduce the XIVE interrupt thread context Cédric Le Goater
2018-12-06  4:31   ` David Gibson
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 08/37] ppc/xive: introduce a simplified XIVE presenter Cédric Le Goater
2018-12-07  3:10   ` David Gibson
2018-12-07  8:49     ` Cédric Le Goater
2018-12-10  3:05       ` David Gibson
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 09/37] ppc/xive: notify the CPU when the interrupt priority is more privileged Cédric Le Goater
2018-12-07  3:27   ` David Gibson
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 10/37] spapr/xive: introduce a XIVE interrupt controller Cédric Le Goater
2018-12-07  3:39   ` David Gibson
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 11/37] spapr/xive: use the VCPU id as a NVT identifier Cédric Le Goater
2018-12-07  3:46   ` David Gibson
2018-12-07  8:05     ` Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 12/37] spapr: initialize VSMT before initializing the IRQ backend Cédric Le Goater
2018-12-07  3:59   ` David Gibson
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 13/37] spapr: introduce a spapr_irq_init() routine Cédric Le Goater
2018-12-07  4:04   ` David Gibson
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 14/37] spapr: modify the irq backend 'init' method Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 15/37] spapr: export and rename the xics_max_server_number() routine Cédric Le Goater
2018-12-07  4:07   ` David Gibson
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 16/37] spapr: introdude a new machine IRQ backend for XIVE Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 17/37] spapr: add hcalls support for the XIVE exploitation interrupt mode Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 18/37] spapr: add device tree support for the XIVE exploitation mode Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 19/37] spapr: allocate the interrupt thread context under the CPU core Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 20/37] spapr: extend the sPAPR IRQ backend for XICS migration Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 21/37] spapr: add a 'reset' method to the sPAPR IRQ backend Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 22/37] spapr: add a 'pseries-3.1-xive' machine type Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 23/37] linux-headers: update to 4.20-rc5 Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 24/37] spapr/xive: add KVM support Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 25/37] spapr/xive: add state synchronization with KVM Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 26/37] spapr/xive: introduce a VM state change handler Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 27/37] spapr/xive: add migration support for KVM Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 28/37] spapr/xive: fix migration of the XiveTCTX under TCG Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 29/37] spapr: set the interrupt presenter at reset Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 30/37] spapr/xive: enable XIVE MMIOs " Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 31/37] spapr: add a 'pseries-3.1-dual' machine type Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 32/37] ppc/xics: introduce a icp_kvm_connect() routine Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 33/37] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 34/37] sysbus: add a sysbus_mmio_unmap() helper Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 35/37] spapr: introduce routines to delete the KVM IRQ device Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 36/37] spapr: check for KVM IRQ device activation Cédric Le Goater
2018-12-05 23:22 ` [Qemu-devel] [PATCH v6 37/37] spapr: add KVM support to the 'dual' machine Cédric Le Goater
2018-12-06  1:10 ` [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9) no-reply
2018-12-06  6:14   ` Cédric Le Goater
2018-12-06  9:24     ` Fam Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.