* [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges
@ 2020-01-27 14:45 Cédric Le Goater
  2020-01-27 14:45 ` [PATCH 1/2] ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge Cédric Le Goater
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Cédric Le Goater @ 2020-01-27 14:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-ppc, Oliver O'Halloran, qemu-devel, Nicholas Piggin,
	Cédric Le Goater

Hello,

These are models for the PCIe Host Bridges, PHB3 and PHB4, as found on
POWER8 and POWER9 processors. They include the PowerBus logic interface
(PBCQ), IOMMU support, a single PCIe Gen.3/4 Root Complex, and support
for MSI and LSI interrupt sources as found on each system depending on
the interrupt controller: XICS or XIVE.

No default device layout is provided. PCI devices can be added on any
of the available PCIe Root Ports (pcie.0 .. 2) with address 0x0, as
the firmware (skiboot) only accepts a single device per root port. To
run a simple system with a network adapter and a storage adapter, use
command line options such as:

  -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
  -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0

  -device megasas,id=scsi0,bus=pcie.1,addr=0x0
  -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2

If more are needed, include a bridge.

Multi chip is supported, each chip adding its set of PHB controllers
and its PCI busses. The model doesn't emulate the EEH error handling
and cold plugging PHB devices still needs some work.

XICS requires some adjustment to support the PHB3 MSI. The changes are
provided in the PHB3 model but they could be decoupled in prereq patches.

Thanks,

C.

Benjamin Herrenschmidt (1):
  ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge

Cédric Le Goater (1):
  ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge

 include/hw/pci-host/pnv_phb3.h      |  164 +++
 include/hw/pci-host/pnv_phb3_regs.h |  450 +++++++++
 include/hw/pci-host/pnv_phb4.h      |  230 +++++
 include/hw/pci-host/pnv_phb4_regs.h |  553 ++++++++++
 include/hw/pci/pcie_port.h          |    1 +
 include/hw/ppc/pnv.h                |   11 +
 include/hw/ppc/pnv_xscom.h          |   20 +
 include/hw/ppc/xics.h               |    5 +
 hw/intc/xics.c                      |   14 +-
 hw/pci-host/pnv_phb3.c              | 1195 ++++++++++++++++++++++
 hw/pci-host/pnv_phb3_msi.c          |  349 +++++++
 hw/pci-host/pnv_phb3_pbcq.c         |  357 +++++++
 hw/pci-host/pnv_phb4.c              | 1438 +++++++++++++++++++++++++++
 hw/pci-host/pnv_phb4_pec.c          |  593 +++++++++++
 hw/ppc/pnv.c                        |  176 +++-
 hw/pci-host/Makefile.objs           |    2 +
 hw/ppc/Kconfig                      |    2 +
 17 files changed, 5557 insertions(+), 3 deletions(-)
 create mode 100644 include/hw/pci-host/pnv_phb3.h
 create mode 100644 include/hw/pci-host/pnv_phb3_regs.h
 create mode 100644 include/hw/pci-host/pnv_phb4.h
 create mode 100644 include/hw/pci-host/pnv_phb4_regs.h
 create mode 100644 hw/pci-host/pnv_phb3.c
 create mode 100644 hw/pci-host/pnv_phb3_msi.c
 create mode 100644 hw/pci-host/pnv_phb3_pbcq.c
 create mode 100644 hw/pci-host/pnv_phb4.c
 create mode 100644 hw/pci-host/pnv_phb4_pec.c

-- 
2.21.1




* [PATCH 1/2] ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge
  2020-01-27 14:45 [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges Cédric Le Goater
@ 2020-01-27 14:45 ` Cédric Le Goater
  2020-01-29  3:09   ` David Gibson
  2020-01-27 14:45 ` [PATCH 2/2] ppc/pnv: Add models for POWER8 PHB3 " Cédric Le Goater
  2020-01-29  6:31 ` [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges David Gibson
  2 siblings, 1 reply; 9+ messages in thread
From: Cédric Le Goater @ 2020-01-27 14:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, Nicholas Piggin, qemu-ppc, Cédric Le Goater,
	Oliver O'Halloran

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

These changes introduce models for the PCIe Host Bridge (PHB4) of the
POWER9 processor. They include the PowerBus logic interface (PBCQ),
IOMMU support, a single PCIe Gen.4 Root Complex, and support for MSI
and LSI interrupt sources as found on a POWER9 system using the XIVE
interrupt controller.

The POWER9 processor comes with 3 PHB4 PECs (PCI Express Controllers)
and each PEC can have several PHBs. By default,

  * PEC0 provides 1 PHB  (PHB0)
  * PEC1 provides 2 PHBs (PHB1 and PHB2)
  * PEC2 provides 3 PHBs (PHB3, PHB4 and PHB5)
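
As an illustration only (this is not code from the patch), the default
layout above can be sketched as a small lookup that turns a PEC index
and a stack number into a chip-wide PHB index. The names NUM_PECS,
pec_num_stacks and phb_index are hypothetical, and the sketch assumes
PHBs are numbered consecutively across PECs, as the list suggests.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PECS 3  /* hypothetical name: POWER9 has 3 PECs per chip */

/* Number of PHBs (stacks) behind each PEC, taken from the list above. */
static const uint32_t pec_num_stacks[NUM_PECS] = { 1, 2, 3 };

/*
 * Chip-wide PHB index for a given (PEC, stack) pair, assuming PHBs are
 * numbered consecutively: PEC0 -> PHB0, PEC1 -> PHB1..2, PEC2 -> PHB3..5.
 */
static inline uint32_t phb_index(uint32_t pec, uint32_t stack_no)
{
    uint32_t index = 0;

    assert(pec < NUM_PECS);
    assert(stack_no < pec_num_stacks[pec]);

    /* Skip over the PHBs owned by the preceding PECs. */
    for (uint32_t i = 0; i < pec; i++) {
        index += pec_num_stacks[i];
    }
    return index + stack_no;
}
```

With this numbering, stack 0 of PEC2 is PHB3 and stack 2 of PEC2 is PHB5.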

Each PEC has a set of "global" registers and some "per-stack" (per-PHB)
registers. Those are organized in two XSCOM ranges, the "Nest" range
and the "PCI" range; each range contains both some "PEC" registers and
some "per-stack" registers.
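
A minimal sketch of how the per-stack registers are laid out inside a
range, based on the PEC_STACK_OFFSET comment in pnv_phb4_regs.h: the
global registers sit at offset 0, each stack follows at a stride of
0x40, and the PHB pass-through window of the PCI range starts at 0x100
with the same stride. The helper names and PEC_PCI_PASSTHROUGH_BASE are
hypothetical and only serve the illustration; the model itself may
compute these offsets differently.

```c
#include <assert.h>
#include <stdint.h>

#define PEC_STACK_OFFSET          0x40   /* from pnv_phb4_regs.h */
#define PHB4_PEC_MAX_STACKS       3      /* from pnv_phb4.h */
#define PEC_PCI_PASSTHROUGH_BASE  0x100  /* hypothetical name; value taken
                                          * from the header comment */

/*
 * Offset of the per-stack registers of stack 'stack_no' from the base
 * of a Nest or PCI range. Offset 0 holds the PEC global registers, so
 * the first stack starts one stride in.
 */
static inline uint32_t pec_stack_regs_offset(uint32_t stack_no)
{
    assert(stack_no < PHB4_PEC_MAX_STACKS);
    return PEC_STACK_OFFSET * (stack_no + 1);
}

/*
 * Offset of the PHB pass-through SCOM window of stack 'stack_no' in
 * the PCI range. This one is 0 based: the first stack is at 0x100.
 */
static inline uint32_t pec_stack_phb_regs_offset(uint32_t stack_no)
{
    assert(stack_no < PHB4_PEC_MAX_STACKS);
    return PEC_PCI_PASSTHROUGH_BASE + PEC_STACK_OFFSET * stack_no;
}
```

For PEC2 this gives the per-stack registers of PHB3 at 0x40 and of PHB4
at 0x80, and their pass-through windows at 0x100 and 0x140.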

No default device layout is provided. PCI devices can be added on any
of the available PCIe Root Ports (pcie.0 .. 2 of a POWER9 chip) with
address 0x0, as the firmware (skiboot) only accepts a single device
per root port. To run a simple system with a network adapter and a
storage adapter, use command line options such as:

  -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
  -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0

  -device megasas,id=scsi0,bus=pcie.1,addr=0x0
  -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2

If more are needed, include a bridge.

Multi chip is supported, each chip adding its set of PHB4 controllers
and its PCI busses. The model doesn't emulate the EEH error handling.

This model is not ready for hotplug yet.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[ clg: - numerous cleanups
       - commit log
       - fix for broken LSI support
       - PHB pic printinfo
       - large QOM rework ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/pci-host/pnv_phb4.h      |  230 +++++
 include/hw/pci-host/pnv_phb4_regs.h |  553 ++++++++++
 include/hw/pci/pcie_port.h          |    1 +
 include/hw/ppc/pnv.h                |    7 +
 include/hw/ppc/pnv_xscom.h          |   11 +
 hw/pci-host/pnv_phb4.c              | 1438 +++++++++++++++++++++++++++
 hw/pci-host/pnv_phb4_pec.c          |  593 +++++++++++
 hw/ppc/pnv.c                        |  107 ++
 hw/pci-host/Makefile.objs           |    1 +
 hw/ppc/Kconfig                      |    2 +
 10 files changed, 2943 insertions(+)
 create mode 100644 include/hw/pci-host/pnv_phb4.h
 create mode 100644 include/hw/pci-host/pnv_phb4_regs.h
 create mode 100644 hw/pci-host/pnv_phb4.c
 create mode 100644 hw/pci-host/pnv_phb4_pec.c

diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
new file mode 100644
index 000000000000..c882bfd0aa23
--- /dev/null
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -0,0 +1,230 @@
+/*
+ * QEMU PowerPC PowerNV (POWER9) PHB4 model
+ *
+ * Copyright (c) 2018-2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PCI_HOST_PNV_PHB4_H
+#define PCI_HOST_PNV_PHB4_H
+
+#include "hw/pci/pcie_host.h"
+#include "hw/pci/pcie_port.h"
+#include "hw/ppc/xive.h"
+
+typedef struct PnvPhb4PecState PnvPhb4PecState;
+typedef struct PnvPhb4PecStack PnvPhb4PecStack;
+typedef struct PnvPHB4 PnvPHB4;
+typedef struct PnvChip PnvChip;
+
+/*
+ * We have one such address space wrapper per possible device under
+ * the PHB since they need to be assigned statically at qemu device
+ * creation time. The relationship to a PE is done later
+ * dynamically. This means we can potentially create a lot of these
+ * guys. Q35 stores them as some kind of radix tree but we never
+ * really need to do fast lookups so instead we simply keep a QLIST of
+ * them for now, we can add the radix if needed later on.
+ *
+ * We do cache the PE number to speed things up a bit though.
+ */
+typedef struct PnvPhb4DMASpace {
+    PCIBus *bus;
+    uint8_t devfn;
+    int pe_num;         /* Cached PE number */
+#define PHB_INVALID_PE (-1)
+    PnvPHB4 *phb;
+    AddressSpace dma_as;
+    IOMMUMemoryRegion dma_mr;
+    MemoryRegion msi32_mr;
+    MemoryRegion msi64_mr;
+    QLIST_ENTRY(PnvPhb4DMASpace) list;
+} PnvPhb4DMASpace;
+
+/*
+ * PHB4 PCIe Root port
+ */
+#define TYPE_PNV_PHB4_ROOT_BUS "pnv-phb4-root-bus"
+#define TYPE_PNV_PHB4_ROOT_PORT "pnv-phb4-root-port"
+
+typedef struct PnvPHB4RootPort {
+    PCIESlot parent_obj;
+} PnvPHB4RootPort;
+
+/*
+ * PHB4 PCIe Host Bridge for PowerNV machines (POWER9)
+ */
+#define TYPE_PNV_PHB4 "pnv-phb4"
+#define PNV_PHB4(obj) OBJECT_CHECK(PnvPHB4, (obj), TYPE_PNV_PHB4)
+
+#define PNV_PHB4_MAX_LSIs          8
+#define PNV_PHB4_MAX_INTs          4096
+#define PNV_PHB4_MAX_MIST          (PNV_PHB4_MAX_INTs >> 2)
+#define PNV_PHB4_MAX_MMIO_WINDOWS  32
+#define PNV_PHB4_MIN_MMIO_WINDOWS  16
+#define PNV_PHB4_NUM_REGS          (0x3000 >> 3)
+#define PNV_PHB4_MAX_PEs           512
+#define PNV_PHB4_MAX_TVEs          (PNV_PHB4_MAX_PEs * 2)
+#define PNV_PHB4_MAX_PEEVs         (PNV_PHB4_MAX_PEs / 64)
+#define PNV_PHB4_MAX_MBEs          (PNV_PHB4_MAX_MMIO_WINDOWS * 2)
+
+#define PNV_PHB4_VERSION           0x000000a400000002ull
+#define PNV_PHB4_DEVICE_ID         0x04c1
+
+#define PCI_MMIO_TOTAL_SIZE        (0x1ull << 60)
+
+struct PnvPHB4 {
+    PCIExpressHost parent_obj;
+
+    PnvPHB4RootPort root;
+
+    uint32_t chip_id;
+    uint32_t phb_id;
+
+    uint64_t version;
+    uint16_t device_id;
+
+    char bus_path[8];
+
+    /* Main register images */
+    uint64_t regs[PNV_PHB4_NUM_REGS];
+    MemoryRegion mr_regs;
+
+    /* Extra SCOM-only register */
+    uint64_t scom_hv_ind_addr_reg;
+
+    /*
+     * Geometry of the PHB. There are two types, small and big PHBs, a
+     * number of resources (number of PEs, windows etc...) are doubled
+     * for a big PHB
+     */
+    bool big_phb;
+
+    /* Memory regions for MMIO space */
+    MemoryRegion mr_mmio[PNV_PHB4_MAX_MMIO_WINDOWS];
+
+    /* PCI side space */
+    MemoryRegion pci_mmio;
+    MemoryRegion pci_io;
+
+    /* On-chip IODA tables */
+    uint64_t ioda_LIST[PNV_PHB4_MAX_LSIs];
+    uint64_t ioda_MIST[PNV_PHB4_MAX_MIST];
+    uint64_t ioda_TVT[PNV_PHB4_MAX_TVEs];
+    uint64_t ioda_MBT[PNV_PHB4_MAX_MBEs];
+    uint64_t ioda_MDT[PNV_PHB4_MAX_PEs];
+    uint64_t ioda_PEEV[PNV_PHB4_MAX_PEEVs];
+
+    /*
+     * The internal PESTA/B is 2 bits per PE split into two tables, we
+     * store them in a single array here to avoid wasting space.
+     */
+    uint8_t  ioda_PEST_AB[PNV_PHB4_MAX_PEs];
+
+    /* P9 Interrupt generation */
+    XiveSource xsrc;
+    qemu_irq *qirqs;
+
+    PnvPhb4PecStack *stack;
+
+    QLIST_HEAD(, PnvPhb4DMASpace) dma_spaces;
+};
+
+void pnv_phb4_pic_print_info(PnvPHB4 *phb, Monitor *mon);
+void pnv_phb4_update_regions(PnvPhb4PecStack *stack);
+extern const MemoryRegionOps pnv_phb4_xscom_ops;
+
+/*
+ * PHB4 PEC (PCI Express Controller)
+ */
+#define TYPE_PNV_PHB4_PEC "pnv-phb4-pec"
+#define PNV_PHB4_PEC(obj) \
+    OBJECT_CHECK(PnvPhb4PecState, (obj), TYPE_PNV_PHB4_PEC)
+
+#define TYPE_PNV_PHB4_PEC_STACK "pnv-phb4-pec-stack"
+#define PNV_PHB4_PEC_STACK(obj) \
+    OBJECT_CHECK(PnvPhb4PecStack, (obj), TYPE_PNV_PHB4_PEC_STACK)
+
+/* Per-stack data */
+struct PnvPhb4PecStack {
+    DeviceState parent;
+
+    /* My own stack number */
+    uint32_t stack_no;
+
+    /* Nest registers */
+#define PHB4_PEC_NEST_STK_REGS_COUNT  0x17
+    uint64_t nest_regs[PHB4_PEC_NEST_STK_REGS_COUNT];
+    MemoryRegion nest_regs_mr;
+
+    /* PCI registers (excluding pass-through) */
+#define PHB4_PEC_PCI_STK_REGS_COUNT  0xf
+    uint64_t pci_regs[PHB4_PEC_PCI_STK_REGS_COUNT];
+    MemoryRegion pci_regs_mr;
+
+    /* PHB pass-through XSCOM */
+    MemoryRegion phb_regs_mr;
+
+    /* Memory windows from PowerBus to PHB */
+    MemoryRegion mmbar0;
+    MemoryRegion mmbar1;
+    MemoryRegion phbbar;
+    MemoryRegion intbar;
+    uint64_t mmio0_base;
+    uint64_t mmio0_size;
+    uint64_t mmio1_base;
+    uint64_t mmio1_size;
+
+    /* The owner PEC */
+    PnvPhb4PecState *pec;
+
+    /* The actual PHB */
+    PnvPHB4 phb;
+};
+
+struct PnvPhb4PecState {
+    DeviceState parent;
+
+    /* PEC number in chip */
+    uint32_t index;
+    uint32_t chip_id;
+
+    MemoryRegion *system_memory;
+
+    /* Nest registers, excluding per-stack */
+#define PHB4_PEC_NEST_REGS_COUNT    0xf
+    uint64_t nest_regs[PHB4_PEC_NEST_REGS_COUNT];
+    MemoryRegion nest_regs_mr;
+
+    /* PCI registers, excluding per-stack */
+#define PHB4_PEC_PCI_REGS_COUNT     0x2
+    uint64_t pci_regs[PHB4_PEC_PCI_REGS_COUNT];
+    MemoryRegion pci_regs_mr;
+
+    /* Stacks */
+    #define PHB4_PEC_MAX_STACKS     3
+    uint32_t num_stacks;
+    PnvPhb4PecStack stacks[PHB4_PEC_MAX_STACKS];
+};
+
+#define PNV_PHB4_PEC_CLASS(klass) \
+     OBJECT_CLASS_CHECK(PnvPhb4PecClass, (klass), TYPE_PNV_PHB4_PEC)
+#define PNV_PHB4_PEC_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(PnvPhb4PecClass, (obj), TYPE_PNV_PHB4_PEC)
+
+typedef struct PnvPhb4PecClass {
+    DeviceClass parent_class;
+
+    uint32_t (*xscom_nest_base)(PnvPhb4PecState *pec);
+    uint32_t xscom_nest_size;
+    uint32_t (*xscom_pci_base)(PnvPhb4PecState *pec);
+    uint32_t xscom_pci_size;
+    const char *compat;
+    int compat_size;
+    const char *stk_compat;
+    int stk_compat_size;
+} PnvPhb4PecClass;
+
+#endif /* PCI_HOST_PNV_PHB4_H */
diff --git a/include/hw/pci-host/pnv_phb4_regs.h b/include/hw/pci-host/pnv_phb4_regs.h
new file mode 100644
index 000000000000..55df2c3e5ece
--- /dev/null
+++ b/include/hw/pci-host/pnv_phb4_regs.h
@@ -0,0 +1,553 @@
+/*
+ * QEMU PowerPC PowerNV (POWER9) PHB4 model
+ *
+ * Copyright (c) 2013-2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PCI_HOST_PNV_PHB4_REGS_H
+#define PCI_HOST_PNV_PHB4_REGS_H
+
+/*
+ * PEC XSCOM registers
+ *
+ * There are 3 PECs in P9. Each PEC can have several PHBs. Each PEC has some
+ * "global" registers and some "per-stack" (per-PHB) registers. Those are
+ * organized in two XSCOM ranges, the "Nest" range and the "PCI" range, each
+ * range contains both some "PEC" registers and some "per-stack" registers.
+ *
+ * Finally the PCI range also contains an additional range per stack that
+ * passes through to some of the PHB own registers.
+ *
+ * PEC0 can contain 1 PHB  (PHB0)
+ * PEC1 can contain 2 PHBs (PHB1 and PHB2)
+ * PEC2 can contain 3 PHBs (PHB3, PHB4 and PHB5)
+ */
+
+/*
+ * This is the "stack" offset, it's the offset from a given range base
+ * to the first "per-stack" registers and also the stride between
+ * stacks, thus for PEC2, the global registers are at offset 0, the
+ * PHB3 registers at offset 0x40, the PHB4 at offset 0x80 etc....
+ *
+ * It is *also* the offset to the pass-through SCOM region but in this case
+ * it is 0 based, i.e. PHB3 is at 0x100, PHB4 is at 0x140 etc.
+ */
+#define PEC_STACK_OFFSET        0x40
+
+/* XSCOM Nest global registers */
+#define PEC_NEST_PBCQ_HW_CONFIG         0x00
+#define PEC_NEST_DROP_PRIO_CTRL         0x01
+#define PEC_NEST_PBCQ_ERR_INJECT        0x02
+#define PEC_NEST_PCI_NEST_CLK_TRACE_CTL 0x03
+#define PEC_NEST_PBCQ_PMON_CTRL         0x04
+#define PEC_NEST_PBCQ_PBUS_ADDR_EXT     0x05
+#define PEC_NEST_PBCQ_PRED_VEC_TIMEOUT  0x06
+#define PEC_NEST_CAPP_CTRL              0x07
+#define PEC_NEST_PBCQ_READ_STK_OVR      0x08
+#define PEC_NEST_PBCQ_WRITE_STK_OVR     0x09
+#define PEC_NEST_PBCQ_STORE_STK_OVR     0x0a
+#define PEC_NEST_PBCQ_RETRY_BKOFF_CTRL  0x0b
+
+/* XSCOM Nest per-stack registers */
+#define PEC_NEST_STK_PCI_NEST_FIR       0x00
+#define PEC_NEST_STK_PCI_NEST_FIR_CLR   0x01
+#define PEC_NEST_STK_PCI_NEST_FIR_SET   0x02
+#define PEC_NEST_STK_PCI_NEST_FIR_MSK   0x03
+#define PEC_NEST_STK_PCI_NEST_FIR_MSKC  0x04
+#define PEC_NEST_STK_PCI_NEST_FIR_MSKS  0x05
+#define PEC_NEST_STK_PCI_NEST_FIR_ACT0  0x06
+#define PEC_NEST_STK_PCI_NEST_FIR_ACT1  0x07
+#define PEC_NEST_STK_PCI_NEST_FIR_WOF   0x08
+#define PEC_NEST_STK_ERR_REPORT_0       0x0a
+#define PEC_NEST_STK_ERR_REPORT_1       0x0b
+#define PEC_NEST_STK_PBCQ_GNRL_STATUS   0x0c
+#define PEC_NEST_STK_PBCQ_MODE          0x0d
+#define PEC_NEST_STK_MMIO_BAR0          0x0e
+#define PEC_NEST_STK_MMIO_BAR0_MASK     0x0f
+#define PEC_NEST_STK_MMIO_BAR1          0x10
+#define PEC_NEST_STK_MMIO_BAR1_MASK     0x11
+#define PEC_NEST_STK_PHB_REGS_BAR       0x12
+#define PEC_NEST_STK_INT_BAR            0x13
+#define PEC_NEST_STK_BAR_EN             0x14
+#define   PEC_NEST_STK_BAR_EN_MMIO0             PPC_BIT(0)
+#define   PEC_NEST_STK_BAR_EN_MMIO1             PPC_BIT(1)
+#define   PEC_NEST_STK_BAR_EN_PHB               PPC_BIT(2)
+#define   PEC_NEST_STK_BAR_EN_INT               PPC_BIT(3)
+#define PEC_NEST_STK_DATA_FRZ_TYPE      0x15
+#define PEC_NEST_STK_PBCQ_TUN_BAR       0x16
+
+/* XSCOM PCI global registers */
+#define PEC_PCI_PBAIB_HW_CONFIG         0x00
+#define PEC_PCI_PBAIB_READ_STK_OVR      0x02
+
+/* XSCOM PCI per-stack registers */
+#define PEC_PCI_STK_PCI_FIR             0x00
+#define PEC_PCI_STK_PCI_FIR_CLR         0x01
+#define PEC_PCI_STK_PCI_FIR_SET         0x02
+#define PEC_PCI_STK_PCI_FIR_MSK         0x03
+#define PEC_PCI_STK_PCI_FIR_MSKC        0x04
+#define PEC_PCI_STK_PCI_FIR_MSKS        0x05
+#define PEC_PCI_STK_PCI_FIR_ACT0        0x06
+#define PEC_PCI_STK_PCI_FIR_ACT1        0x07
+#define PEC_PCI_STK_PCI_FIR_WOF         0x08
+#define PEC_PCI_STK_ETU_RESET           0x0a
+#define PEC_PCI_STK_PBAIB_ERR_REPORT    0x0b
+#define PEC_PCI_STK_PBAIB_TX_CMD_CRED   0x0d
+#define PEC_PCI_STK_PBAIB_TX_DAT_CRED   0x0e
+
+/*
+ * PHB "SCOM" registers. This is accessed via the above window
+ * and provides a backdoor to the PHB when the AIB bus is not
+ * functional. Some of these directly map some of the PHB MMIO
+ * registers, some are specific and allow indirect access to a
+ * wider range of PHB registers
+ */
+#define PHB_SCOM_HV_IND_ADDR            0x00
+#define   PHB_SCOM_HV_IND_ADDR_VALID            PPC_BIT(0)
+#define   PHB_SCOM_HV_IND_ADDR_4B               PPC_BIT(1)
+#define   PHB_SCOM_HV_IND_ADDR_AUTOINC          PPC_BIT(2)
+#define   PHB_SCOM_HV_IND_ADDR_ADDR             PPC_BITMASK(51, 63)
+#define PHB_SCOM_HV_IND_DATA            0x01
+#define PHB_SCOM_ETU_LEM_FIR            0x08
+#define PHB_SCOM_ETU_LEM_FIR_AND        0x09
+#define PHB_SCOM_ETU_LEM_FIR_OR         0x0a
+#define PHB_SCOM_ETU_LEM_FIR_MSK        0x0b
+#define PHB_SCOM_ETU_LEM_ERR_MSK_AND    0x0c
+#define PHB_SCOM_ETU_LEM_ERR_MSK_OR     0x0d
+#define PHB_SCOM_ETU_LEM_ACT0           0x0e
+#define PHB_SCOM_ETU_LEM_ACT1           0x0f
+#define PHB_SCOM_ETU_LEM_WOF            0x10
+#define PHB_SCOM_ETU_PMON_CONFIG        0x17
+#define PHB_SCOM_ETU_PMON_CTR0          0x18
+#define PHB_SCOM_ETU_PMON_CTR1          0x19
+#define PHB_SCOM_ETU_PMON_CTR2          0x1a
+#define PHB_SCOM_ETU_PMON_CTR3          0x1b
+
+
+/*
+ * PHB MMIO registers
+ */
+
+/* PHB Fundamental register set A */
+#define PHB_LSI_SOURCE_ID               0x100
+#define   PHB_LSI_SRC_ID                PPC_BITMASK(4, 12)
+#define PHB_DMA_CHAN_STATUS             0x110
+#define   PHB_DMA_CHAN_ANY_ERR          PPC_BIT(27)
+#define   PHB_DMA_CHAN_ANY_ERR1         PPC_BIT(28)
+#define   PHB_DMA_CHAN_ANY_FREEZE       PPC_BIT(29)
+#define PHB_CPU_LOADSTORE_STATUS        0x120
+#define   PHB_CPU_LS_ANY_ERR            PPC_BIT(27)
+#define   PHB_CPU_LS_ANY_ERR1           PPC_BIT(28)
+#define   PHB_CPU_LS_ANY_FREEZE         PPC_BIT(29)
+#define PHB_CONFIG_DATA                 0x130
+#define PHB_LOCK0                       0x138
+#define PHB_CONFIG_ADDRESS              0x140
+#define   PHB_CA_ENABLE                 PPC_BIT(0)
+#define   PHB_CA_STATUS                 PPC_BITMASK(1, 3)
+#define     PHB_CA_STATUS_GOOD          0
+#define     PHB_CA_STATUS_UR            1
+#define     PHB_CA_STATUS_CRS           2
+#define     PHB_CA_STATUS_CA            4
+#define   PHB_CA_BUS                    PPC_BITMASK(4, 11)
+#define   PHB_CA_DEV                    PPC_BITMASK(12, 16)
+#define   PHB_CA_FUNC                   PPC_BITMASK(17, 19)
+#define   PHB_CA_BDFN                   PPC_BITMASK(4, 19) /* bus,dev,func */
+#define   PHB_CA_REG                    PPC_BITMASK(20, 31)
+#define   PHB_CA_PE                     PPC_BITMASK(39, 47)
+#define PHB_LOCK1                       0x148
+#define PHB_PHB4_CONFIG                 0x160
+#define   PHB_PHB4C_32BIT_MSI_EN        PPC_BIT(8)
+#define   PHB_PHB4C_64BIT_MSI_EN        PPC_BIT(14)
+#define PHB_RTT_BAR                     0x168
+#define   PHB_RTT_BAR_ENABLE            PPC_BIT(0)
+#define   PHB_RTT_BASE_ADDRESS_MASK     PPC_BITMASK(8, 46)
+#define PHB_PELTV_BAR                   0x188
+#define   PHB_PELTV_BAR_ENABLE          PPC_BIT(0)
+#define   PHB_PELTV_BASE_ADDRESS        PPC_BITMASK(8, 50)
+#define PHB_M32_START_ADDR              0x1a0
+#define PHB_PEST_BAR                    0x1a8
+#define   PHB_PEST_BAR_ENABLE           PPC_BIT(0)
+#define   PHB_PEST_BASE_ADDRESS         PPC_BITMASK(8, 51)
+#define PHB_ASN_CMPM                    0x1C0
+#define   PHB_ASN_CMPM_ENABLE           PPC_BIT(63)
+#define PHB_CAPI_CMPM                   0x1C8
+#define   PHB_CAPI_CMPM_ENABLE          PPC_BIT(63)
+#define PHB_M64_AOMASK                  0x1d0
+#define PHB_M64_UPPER_BITS              0x1f0
+#define PHB_NXLATE_PREFIX               0x1f8
+#define PHB_DMARD_SYNC                  0x200
+#define   PHB_DMARD_SYNC_START          PPC_BIT(0)
+#define   PHB_DMARD_SYNC_COMPLETE       PPC_BIT(1)
+#define PHB_RTC_INVALIDATE              0x208
+#define   PHB_RTC_INVALIDATE_ALL        PPC_BIT(0)
+#define   PHB_RTC_INVALIDATE_RID        PPC_BITMASK(16, 31)
+#define PHB_TCE_KILL                    0x210
+#define   PHB_TCE_KILL_ALL              PPC_BIT(0)
+#define   PHB_TCE_KILL_PE               PPC_BIT(1)
+#define   PHB_TCE_KILL_ONE              PPC_BIT(2)
+#define   PHB_TCE_KILL_PSEL             PPC_BIT(3)
+#define   PHB_TCE_KILL_64K              0x1000 /* Address override */
+#define   PHB_TCE_KILL_2M               0x2000 /* Address override */
+#define   PHB_TCE_KILL_1G               0x3000 /* Address override */
+#define   PHB_TCE_KILL_PENUM            PPC_BITMASK(55, 63)
+#define PHB_TCE_SPEC_CTL                0x218
+#define PHB_IODA_ADDR                   0x220
+#define   PHB_IODA_AD_AUTOINC           PPC_BIT(0)
+#define   PHB_IODA_AD_TSEL              PPC_BITMASK(11, 15)
+#define   PHB_IODA_AD_MIST_PWV          PPC_BITMASK(28, 31)
+#define   PHB_IODA_AD_TADR              PPC_BITMASK(54, 63)
+#define PHB_IODA_DATA0                  0x228
+#define PHB_PHB4_GEN_CAP                0x250
+#define PHB_PHB4_TCE_CAP                0x258
+#define PHB_PHB4_IRQ_CAP                0x260
+#define PHB_PHB4_EEH_CAP                0x268
+#define PHB_PAPR_ERR_INJ_CTL            0x2b0
+#define   PHB_PAPR_ERR_INJ_CTL_INB      PPC_BIT(0)
+#define   PHB_PAPR_ERR_INJ_CTL_OUTB     PPC_BIT(1)
+#define   PHB_PAPR_ERR_INJ_CTL_STICKY   PPC_BIT(2)
+#define   PHB_PAPR_ERR_INJ_CTL_CFG      PPC_BIT(3)
+#define   PHB_PAPR_ERR_INJ_CTL_RD       PPC_BIT(4)
+#define   PHB_PAPR_ERR_INJ_CTL_WR       PPC_BIT(5)
+#define   PHB_PAPR_ERR_INJ_CTL_FREEZE   PPC_BIT(6)
+#define PHB_PAPR_ERR_INJ_ADDR           0x2b8
+#define   PHB_PAPR_ERR_INJ_ADDR_MMIO            PPC_BITMASK(16, 63)
+#define PHB_PAPR_ERR_INJ_MASK           0x2c0
+#define   PHB_PAPR_ERR_INJ_MASK_CFG             PPC_BITMASK(4, 11)
+#define   PHB_PAPR_ERR_INJ_MASK_CFG_ALL         PPC_BITMASK(4, 19)
+#define   PHB_PAPR_ERR_INJ_MASK_MMIO            PPC_BITMASK(16, 63)
+#define PHB_ETU_ERR_SUMMARY             0x2c8
+#define PHB_INT_NOTIFY_ADDR             0x300
+#define PHB_INT_NOTIFY_INDEX            0x308
+
+/* Fundamental register set B */
+#define PHB_VERSION                     0x800
+#define PHB_CTRLR                       0x810
+#define   PHB_CTRLR_IRQ_PGSZ_64K        PPC_BIT(11)
+#define   PHB_CTRLR_IRQ_STORE_EOI       PPC_BIT(12)
+#define   PHB_CTRLR_MMIO_RD_STRICT      PPC_BIT(13)
+#define   PHB_CTRLR_MMIO_EEH_DISABLE    PPC_BIT(14)
+#define   PHB_CTRLR_CFG_EEH_BLOCK       PPC_BIT(15)
+#define   PHB_CTRLR_FENCE_LNKILL_DIS    PPC_BIT(16)
+#define   PHB_CTRLR_TVT_ADDR_SEL        PPC_BITMASK(17, 19)
+#define     TVT_DD1_1_PER_PE            0
+#define     TVT_DD1_2_PER_PE            1
+#define     TVT_DD1_4_PER_PE            2
+#define     TVT_DD1_8_PER_PE            3
+#define     TVT_DD1_16_PER_PE           4
+#define     TVT_2_PER_PE                0
+#define     TVT_4_PER_PE                1
+#define     TVT_8_PER_PE                2
+#define     TVT_16_PER_PE               3
+#define   PHB_CTRLR_DMA_RD_SPACING      PPC_BITMASK(28, 31)
+#define PHB_AIB_FENCE_CTRL              0x860
+#define PHB_TCE_TAG_ENABLE              0x868
+#define PHB_TCE_WATERMARK               0x870
+#define PHB_TIMEOUT_CTRL1               0x878
+#define PHB_TIMEOUT_CTRL2               0x880
+#define PHB_Q_DMA_R                     0x888
+#define   PHB_Q_DMA_R_QUIESCE_DMA       PPC_BIT(0)
+#define   PHB_Q_DMA_R_AUTORESET         PPC_BIT(1)
+#define   PHB_Q_DMA_R_DMA_RESP_STATUS   PPC_BIT(4)
+#define   PHB_Q_DMA_R_MMIO_RESP_STATUS  PPC_BIT(5)
+#define   PHB_Q_DMA_R_TCE_RESP_STATUS   PPC_BIT(6)
+#define   PHB_Q_DMA_R_TCE_KILL_STATUS   PPC_BIT(7)
+#define PHB_TCE_TAG_STATUS              0x908
+
+/* FIR & Error registers */
+#define PHB_LEM_FIR_ACCUM               0xc00
+#define PHB_LEM_FIR_AND_MASK            0xc08
+#define PHB_LEM_FIR_OR_MASK             0xc10
+#define PHB_LEM_ERROR_MASK              0xc18
+#define PHB_LEM_ERROR_AND_MASK          0xc20
+#define PHB_LEM_ERROR_OR_MASK           0xc28
+#define PHB_LEM_ACTION0                 0xc30
+#define PHB_LEM_ACTION1                 0xc38
+#define PHB_LEM_WOF                     0xc40
+#define PHB_ERR_STATUS                  0xc80
+#define PHB_ERR1_STATUS                 0xc88
+#define PHB_ERR_INJECT                  0xc90
+#define PHB_ERR_LEM_ENABLE              0xc98
+#define PHB_ERR_IRQ_ENABLE              0xca0
+#define PHB_ERR_FREEZE_ENABLE           0xca8
+#define PHB_ERR_AIB_FENCE_ENABLE        0xcb0
+#define PHB_ERR_LOG_0                   0xcc0
+#define PHB_ERR_LOG_1                   0xcc8
+#define PHB_ERR_STATUS_MASK             0xcd0
+#define PHB_ERR1_STATUS_MASK            0xcd8
+
+#define PHB_TXE_ERR_STATUS                      0xd00
+#define PHB_TXE_ERR1_STATUS                     0xd08
+#define PHB_TXE_ERR_INJECT                      0xd10
+#define PHB_TXE_ERR_LEM_ENABLE                  0xd18
+#define PHB_TXE_ERR_IRQ_ENABLE                  0xd20
+#define PHB_TXE_ERR_FREEZE_ENABLE               0xd28
+#define PHB_TXE_ERR_AIB_FENCE_ENABLE            0xd30
+#define PHB_TXE_ERR_LOG_0                       0xd40
+#define PHB_TXE_ERR_LOG_1                       0xd48
+#define PHB_TXE_ERR_STATUS_MASK                 0xd50
+#define PHB_TXE_ERR1_STATUS_MASK                0xd58
+
+#define PHB_RXE_ARB_ERR_STATUS                  0xd80
+#define PHB_RXE_ARB_ERR1_STATUS                 0xd88
+#define PHB_RXE_ARB_ERR_INJECT                  0xd90
+#define PHB_RXE_ARB_ERR_LEM_ENABLE              0xd98
+#define PHB_RXE_ARB_ERR_IRQ_ENABLE              0xda0
+#define PHB_RXE_ARB_ERR_FREEZE_ENABLE           0xda8
+#define PHB_RXE_ARB_ERR_AIB_FENCE_ENABLE        0xdb0
+#define PHB_RXE_ARB_ERR_LOG_0                   0xdc0
+#define PHB_RXE_ARB_ERR_LOG_1                   0xdc8
+#define PHB_RXE_ARB_ERR_STATUS_MASK             0xdd0
+#define PHB_RXE_ARB_ERR1_STATUS_MASK            0xdd8
+
+#define PHB_RXE_MRG_ERR_STATUS                  0xe00
+#define PHB_RXE_MRG_ERR1_STATUS                 0xe08
+#define PHB_RXE_MRG_ERR_INJECT                  0xe10
+#define PHB_RXE_MRG_ERR_LEM_ENABLE              0xe18
+#define PHB_RXE_MRG_ERR_IRQ_ENABLE              0xe20
+#define PHB_RXE_MRG_ERR_FREEZE_ENABLE           0xe28
+#define PHB_RXE_MRG_ERR_AIB_FENCE_ENABLE        0xe30
+#define PHB_RXE_MRG_ERR_LOG_0                   0xe40
+#define PHB_RXE_MRG_ERR_LOG_1                   0xe48
+#define PHB_RXE_MRG_ERR_STATUS_MASK             0xe50
+#define PHB_RXE_MRG_ERR1_STATUS_MASK            0xe58
+
+#define PHB_RXE_TCE_ERR_STATUS                  0xe80
+#define PHB_RXE_TCE_ERR1_STATUS                 0xe88
+#define PHB_RXE_TCE_ERR_INJECT                  0xe90
+#define PHB_RXE_TCE_ERR_LEM_ENABLE              0xe98
+#define PHB_RXE_TCE_ERR_IRQ_ENABLE              0xea0
+#define PHB_RXE_TCE_ERR_FREEZE_ENABLE           0xea8
+#define PHB_RXE_TCE_ERR_AIB_FENCE_ENABLE        0xeb0
+#define PHB_RXE_TCE_ERR_LOG_0                   0xec0
+#define PHB_RXE_TCE_ERR_LOG_1                   0xec8
+#define PHB_RXE_TCE_ERR_STATUS_MASK             0xed0
+#define PHB_RXE_TCE_ERR1_STATUS_MASK            0xed8
+
+/* Performance monitor & Debug registers */
+#define PHB_TRACE_CONTROL                       0xf80
+#define PHB_PERFMON_CONFIG                      0xf88
+#define PHB_PERFMON_CTR0                        0xf90
+#define PHB_PERFMON_CTR1                        0xf98
+#define PHB_PERFMON_CTR2                        0xfa0
+#define PHB_PERFMON_CTR3                        0xfa8
+
+/* Root complex config space memory mapped */
+#define PHB_RC_CONFIG_BASE                      0x1000
+#define   PHB_RC_CONFIG_SIZE                    0x800
+
+/* PHB4 REGB registers */
+
+/* PBL core */
+#define PHB_PBL_CONTROL                         0x1800
+#define PHB_PBL_TIMEOUT_CTRL                    0x1810
+#define PHB_PBL_NPTAG_ENABLE                    0x1820
+#define PHB_PBL_NBW_CMP_MASK                    0x1830
+#define   PHB_PBL_NBW_MASK_ENABLE               PPC_BIT(63)
+#define PHB_PBL_SYS_LINK_INIT                   0x1838
+#define PHB_PBL_BUF_STATUS                      0x1840
+#define PHB_PBL_ERR_STATUS                      0x1900
+#define PHB_PBL_ERR1_STATUS                     0x1908
+#define PHB_PBL_ERR_INJECT                      0x1910
+#define PHB_PBL_ERR_INF_ENABLE                  0x1920
+#define PHB_PBL_ERR_ERC_ENABLE                  0x1928
+#define PHB_PBL_ERR_FAT_ENABLE                  0x1930
+#define PHB_PBL_ERR_LOG_0                       0x1940
+#define PHB_PBL_ERR_LOG_1                       0x1948
+#define PHB_PBL_ERR_STATUS_MASK                 0x1950
+#define PHB_PBL_ERR1_STATUS_MASK                0x1958
+
+/* PCI-E stack */
+#define PHB_PCIE_SCR                    0x1A00
+#define   PHB_PCIE_SCR_SLOT_CAP         PPC_BIT(15)
+#define   PHB_PCIE_SCR_MAXLINKSPEED     PPC_BITMASK(32, 35)
+
+
+#define PHB_PCIE_CRESET                 0x1A10
+#define   PHB_PCIE_CRESET_CFG_CORE      PPC_BIT(0)
+#define   PHB_PCIE_CRESET_TLDLP         PPC_BIT(1)
+#define   PHB_PCIE_CRESET_PBL           PPC_BIT(2)
+#define   PHB_PCIE_CRESET_PERST_N       PPC_BIT(3)
+#define   PHB_PCIE_CRESET_PIPE_N        PPC_BIT(4)
+
+
+#define PHB_PCIE_HOTPLUG_STATUS         0x1A20
+#define   PHB_PCIE_HPSTAT_PRESENCE      PPC_BIT(10)
+
+#define PHB_PCIE_DLP_TRAIN_CTL          0x1A40
+#define   PHB_PCIE_DLP_LINK_WIDTH       PPC_BITMASK(30, 35)
+#define   PHB_PCIE_DLP_LINK_SPEED       PPC_BITMASK(36, 39)
+#define   PHB_PCIE_DLP_LTSSM_TRC        PPC_BITMASK(24, 27)
+#define     PHB_PCIE_DLP_LTSSM_RESET    0
+#define     PHB_PCIE_DLP_LTSSM_DETECT   1
+#define     PHB_PCIE_DLP_LTSSM_POLLING  2
+#define     PHB_PCIE_DLP_LTSSM_CONFIG   3
+#define     PHB_PCIE_DLP_LTSSM_L0       4
+#define     PHB_PCIE_DLP_LTSSM_REC      5
+#define     PHB_PCIE_DLP_LTSSM_L1       6
+#define     PHB_PCIE_DLP_LTSSM_L2       7
+#define     PHB_PCIE_DLP_LTSSM_HOTRESET 8
+#define     PHB_PCIE_DLP_LTSSM_DISABLED 9
+#define     PHB_PCIE_DLP_LTSSM_LOOPBACK 10
+#define   PHB_PCIE_DLP_TL_LINKACT       PPC_BIT(23)
+#define   PHB_PCIE_DLP_DL_PGRESET       PPC_BIT(22)
+#define   PHB_PCIE_DLP_TRAINING         PPC_BIT(20)
+#define   PHB_PCIE_DLP_INBAND_PRESENCE  PPC_BIT(19)
+
+#define PHB_PCIE_DLP_CTL                0x1A78
+#define   PHB_PCIE_DLP_CTL_BYPASS_PH2   PPC_BIT(4)
+#define   PHB_PCIE_DLP_CTL_BYPASS_PH3   PPC_BIT(5)
+
+#define PHB_PCIE_DLP_TRWCTL             0x1A80
+#define   PHB_PCIE_DLP_TRWCTL_EN        PPC_BIT(0)
+
+#define PHB_PCIE_DLP_ERRLOG1            0x1AA0
+#define PHB_PCIE_DLP_ERRLOG2            0x1AA8
+#define PHB_PCIE_DLP_ERR_STATUS         0x1AB0
+#define PHB_PCIE_DLP_ERR_COUNTERS       0x1AB8
+
+#define PHB_PCIE_LANE_EQ_CNTL0          0x1AD0
+#define PHB_PCIE_LANE_EQ_CNTL1          0x1AD8
+#define PHB_PCIE_LANE_EQ_CNTL2          0x1AE0
+#define PHB_PCIE_LANE_EQ_CNTL3          0x1AE8
+#define PHB_PCIE_LANE_EQ_CNTL20         0x1AF0
+#define PHB_PCIE_LANE_EQ_CNTL21         0x1AF8
+#define PHB_PCIE_LANE_EQ_CNTL22         0x1B00 /* DD1 only */
+#define PHB_PCIE_LANE_EQ_CNTL23         0x1B08 /* DD1 only */
+#define PHB_PCIE_TRACE_CTRL             0x1B20
+#define PHB_PCIE_MISC_STRAP             0x1B30
+
+/* Error */
+#define PHB_REGB_ERR_STATUS             0x1C00
+#define PHB_REGB_ERR1_STATUS            0x1C08
+#define PHB_REGB_ERR_INJECT             0x1C10
+#define PHB_REGB_ERR_INF_ENABLE         0x1C20
+#define PHB_REGB_ERR_ERC_ENABLE         0x1C28
+#define PHB_REGB_ERR_FAT_ENABLE         0x1C30
+#define PHB_REGB_ERR_LOG_0              0x1C40
+#define PHB_REGB_ERR_LOG_1              0x1C48
+#define PHB_REGB_ERR_STATUS_MASK        0x1C50
+#define PHB_REGB_ERR1_STATUS_MASK       0x1C58
+
+/*
+ * IODA3 on-chip tables
+ */
+
+#define IODA3_TBL_LIST          1
+#define IODA3_TBL_MIST          2
+#define IODA3_TBL_RCAM          5
+#define IODA3_TBL_MRT           6
+#define IODA3_TBL_PESTA         7
+#define IODA3_TBL_PESTB         8
+#define IODA3_TBL_TVT           9
+#define IODA3_TBL_TCR           10
+#define IODA3_TBL_TDR           11
+#define IODA3_TBL_MBT           16
+#define IODA3_TBL_MDT           17
+#define IODA3_TBL_PEEV          20
+
+/* LIST */
+#define IODA3_LIST_P                    PPC_BIT(6)
+#define IODA3_LIST_Q                    PPC_BIT(7)
+#define IODA3_LIST_STATE                PPC_BIT(14)
+
+/* MIST */
+#define IODA3_MIST_P3                   PPC_BIT(48 + 0)
+#define IODA3_MIST_Q3                   PPC_BIT(48 + 1)
+#define IODA3_MIST_PE3                  PPC_BITMASK(48 + 4, 48 + 15)
+
+/* TVT */
+#define IODA3_TVT_TABLE_ADDR            PPC_BITMASK(0, 47)
+#define IODA3_TVT_NUM_LEVELS            PPC_BITMASK(48, 50)
+#define   IODA3_TVE_1_LEVEL     0
+#define   IODA3_TVE_2_LEVELS    1
+#define   IODA3_TVE_3_LEVELS    2
+#define   IODA3_TVE_4_LEVELS    3
+#define   IODA3_TVE_5_LEVELS    4
+#define IODA3_TVT_TCE_TABLE_SIZE        PPC_BITMASK(51, 55)
+#define IODA3_TVT_NON_TRANSLATE_50      PPC_BIT(56)
+#define IODA3_TVT_IO_PSIZE              PPC_BITMASK(59, 63)
+
+/* PESTA */
+#define IODA3_PESTA_MMIO_FROZEN         PPC_BIT(0)
+#define IODA3_PESTA_TRANS_TYPE          PPC_BITMASK(5, 7)
+#define  IODA3_PESTA_TRANS_TYPE_MMIOLOAD 0x4
+#define IODA3_PESTA_CA_CMPLT_TMT        PPC_BIT(8)
+#define IODA3_PESTA_UR                  PPC_BIT(9)
+
+/* PESTB */
+#define IODA3_PESTB_DMA_STOPPED         PPC_BIT(0)
+
+/* MDT */
+/* FIXME: check the PE_A..PE_D field layout with Eric */
+#define IODA3_MDT_PE_A                  PPC_BITMASK(0, 15)
+#define IODA3_MDT_PE_B                  PPC_BITMASK(16, 31)
+#define IODA3_MDT_PE_C                  PPC_BITMASK(32, 47)
+#define IODA3_MDT_PE_D                  PPC_BITMASK(48, 63)
+
+/* MBT */
+#define IODA3_MBT0_ENABLE               PPC_BIT(0)
+#define IODA3_MBT0_TYPE                 PPC_BIT(1)
+#define   IODA3_MBT0_TYPE_M32           IODA3_MBT0_TYPE
+#define   IODA3_MBT0_TYPE_M64           0
+#define IODA3_MBT0_MODE                 PPC_BITMASK(2, 3)
+#define   IODA3_MBT0_MODE_PE_SEG        0
+#define   IODA3_MBT0_MODE_MDT           1
+#define   IODA3_MBT0_MODE_SINGLE_PE     2
+#define IODA3_MBT0_SEG_DIV              PPC_BITMASK(4, 5)
+#define   IODA3_MBT0_SEG_DIV_MAX        0
+#define   IODA3_MBT0_SEG_DIV_128        1
+#define   IODA3_MBT0_SEG_DIV_64         2
+#define   IODA3_MBT0_SEG_DIV_8          3
+#define IODA3_MBT0_MDT_COLUMN           PPC_BITMASK(4, 5)
+#define IODA3_MBT0_BASE_ADDR            PPC_BITMASK(8, 51)
+
+#define IODA3_MBT1_ENABLE               PPC_BIT(0)
+#define IODA3_MBT1_MASK                 PPC_BITMASK(8, 51)
+#define IODA3_MBT1_SEG_BASE             PPC_BITMASK(55, 63)
+#define IODA3_MBT1_SINGLE_PE_NUM        PPC_BITMASK(55, 63)
+
+/*
+ * IODA3 in-memory tables
+ */
+
+/*
+ * PEST
+ *
+ * 2x8 bytes entries, PEST0 and PEST1
+ */
+
+#define IODA3_PEST0_MMIO_CAUSE          PPC_BIT(2)
+#define IODA3_PEST0_CFG_READ            PPC_BIT(3)
+#define IODA3_PEST0_CFG_WRITE           PPC_BIT(4)
+#define IODA3_PEST0_TTYPE               PPC_BITMASK(5, 7)
+#define   PEST_TTYPE_DMA_WRITE          0
+#define   PEST_TTYPE_MSI                1
+#define   PEST_TTYPE_DMA_READ           2
+#define   PEST_TTYPE_DMA_READ_RESP      3
+#define   PEST_TTYPE_MMIO_LOAD          4
+#define   PEST_TTYPE_MMIO_STORE         5
+#define   PEST_TTYPE_OTHER              7
+#define IODA3_PEST0_CA_RETURN           PPC_BIT(8)
+#define IODA3_PEST0_UR_RETURN           PPC_BIT(9)
+#define IODA3_PEST0_PCIE_NONFATAL       PPC_BIT(10)
+#define IODA3_PEST0_PCIE_FATAL          PPC_BIT(11)
+#define IODA3_PEST0_PARITY_UE           PPC_BIT(13)
+#define IODA3_PEST0_PCIE_CORRECTABLE    PPC_BIT(14)
+#define IODA3_PEST0_PCIE_INTERRUPT      PPC_BIT(15)
+#define IODA3_PEST0_MMIO_XLATE          PPC_BIT(16)
+#define IODA3_PEST0_IODA3_ERROR         PPC_BIT(16) /* Same bit as MMIO xlate */
+#define IODA3_PEST0_TCE_PAGE_FAULT      PPC_BIT(18)
+#define IODA3_PEST0_TCE_ACCESS_FAULT    PPC_BIT(19)
+#define IODA3_PEST0_DMA_RESP_TIMEOUT    PPC_BIT(20)
+#define IODA3_PEST0_AIB_SIZE_INVALID    PPC_BIT(21)
+#define IODA3_PEST0_LEM_BIT             PPC_BITMASK(26, 31)
+#define IODA3_PEST0_RID                 PPC_BITMASK(32, 47)
+#define IODA3_PEST0_MSI_DATA            PPC_BITMASK(48, 63)
+
+#define IODA3_PEST1_FAIL_ADDR           PPC_BITMASK(3, 63)
+
+
+#endif /* PCI_HOST_PNV_PHB4_REGS_H */
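
The register fields above use IBM bit numbering, where bit 0 is the most significant bit of the 64-bit word. The following stand-alone sketch shows how a field such as PHB_PCIE_DLP_LTSSM_TRC can be set and read back. It assumes that PPC_BIT/PPC_BITMASK are defined as in QEMU's target/ppc/cpu.h. The example_* helpers are hypothetical names that mirror the GETFIELD/SETFIELD helpers of the model, and __builtin_ctzll (GCC/Clang) stands in for QEMU's ctz64().

```c
#include <assert.h>
#include <stdint.h>

/* IBM numbering: bit 0 is the MSB of a 64-bit word (assumed QEMU macros) */
#define PPC_BIT(bit)         (0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be)  ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* Copied from the header in this patch */
#define PHB_PCIE_DLP_LTSSM_TRC   PPC_BITMASK(24, 27)
#define PHB_PCIE_DLP_LTSSM_L0    4
#define PHB_PCIE_DLP_TL_LINKACT  PPC_BIT(23)

/* Hypothetical stand-in for ctz64(): index of the lowest set bit */
static inline unsigned example_ctz64(uint64_t v)
{
    assert(v != 0);
    return (unsigned)__builtin_ctzll(v);
}

/* Extract the field selected by 'mask', shifted down to bit 0 */
static inline uint64_t example_getfield(uint64_t mask, uint64_t word)
{
    return (word & mask) >> example_ctz64(mask);
}

/* Replace the field selected by 'mask' with 'value' */
static inline uint64_t example_setfield(uint64_t mask, uint64_t word,
                                        uint64_t value)
{
    return (word & ~mask) | ((value << example_ctz64(mask)) & mask);
}

/* Build a TRAIN_CTL value reporting an active link in the L0 state */
static inline uint64_t example_train_ctl_l0(void)
{
    uint64_t v = PHB_PCIE_DLP_TL_LINKACT;

    return example_setfield(PHB_PCIE_DLP_LTSSM_TRC, v, PHB_PCIE_DLP_LTSSM_L0);
}
```

With these definitions, PPC_BITMASK(24, 27) covers the conventional (LSB 0) bits 36 to 39, so the L0 state value 4 ends up shifted left by 36.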
diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h
index 75154300870f..4b3d254b0821 100644
--- a/include/hw/pci/pcie_port.h
+++ b/include/hw/pci/pcie_port.h
@@ -72,6 +72,7 @@ void pcie_chassis_del_slot(PCIESlot *s);
 typedef struct PCIERootPortClass {
     PCIDeviceClass parent_class;
     DeviceRealize parent_realize;
+    DeviceReset parent_reset;
 
     uint8_t (*aer_vector)(const PCIDevice *dev);
     int (*interrupts_init)(PCIDevice *dev, Error **errp);
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index f225f2f6bf67..805f9058f5d9 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -30,6 +30,7 @@
 #include "hw/ppc/pnv_homer.h"
 #include "hw/ppc/pnv_xive.h"
 #include "hw/ppc/pnv_core.h"
+#include "hw/pci-host/pnv_phb4.h"
 
 #define TYPE_PNV_CHIP "pnv-chip"
 #define PNV_CHIP(obj) OBJECT_CHECK(PnvChip, (obj), TYPE_PNV_CHIP)
@@ -52,6 +53,8 @@ typedef struct PnvChip {
     uint64_t     cores_mask;
     PnvCore      **cores;
 
+    uint32_t     num_phbs;
+
     MemoryRegion xscom_mmio;
     MemoryRegion xscom;
     AddressSpace xscom_as;
@@ -93,6 +96,9 @@ typedef struct Pnv9Chip {
 
     uint32_t     nr_quads;
     PnvQuad      *quads;
+
+#define PNV9_CHIP_MAX_PEC 3
+    PnvPhb4PecState pecs[PNV9_CHIP_MAX_PEC];
 } Pnv9Chip;
 
 /*
@@ -120,6 +126,7 @@ typedef struct PnvChipClass {
     /*< public >*/
     uint64_t     chip_cfam_id;
     uint64_t     cores_mask;
+    uint32_t     num_phbs;
 
     DeviceRealize parent_realize;
 
diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
index f74c81a980f3..0fc57b036753 100644
--- a/include/hw/ppc/pnv_xscom.h
+++ b/include/hw/ppc/pnv_xscom.h
@@ -94,6 +94,17 @@ typedef struct PnvXScomInterfaceClass {
 #define PNV9_XSCOM_XIVE_BASE      0x5013000
 #define PNV9_XSCOM_XIVE_SIZE      0x300
 
+#define PNV9_XSCOM_PEC_NEST_BASE  0x4010c00
+#define PNV9_XSCOM_PEC_NEST_SIZE  0x100
+
+#define PNV9_XSCOM_PEC_PCI_BASE   0xd010800
+#define PNV9_XSCOM_PEC_PCI_SIZE   0x200
+
+/* XSCOM PCI "pass-through" window to PHB SCOM */
+#define PNV9_XSCOM_PEC_PCI_STK0   0x100
+#define PNV9_XSCOM_PEC_PCI_STK1   0x140
+#define PNV9_XSCOM_PEC_PCI_STK2   0x180
+
 /*
  * Layout of the XSCOM PCB addresses (POWER 10)
  */
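
The three "pass-through" window offsets above are evenly spaced, so the offset for a given stack can be computed rather than looked up. The sketch below assumes that the 0x40 stride, which is only inferred from the three constants, is intentional. example_pec_pci_stack_offset() is a hypothetical helper, not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from pnv_xscom.h in this patch */
#define PNV9_XSCOM_PEC_PCI_STK0   0x100
#define PNV9_XSCOM_PEC_PCI_STK1   0x140
#define PNV9_XSCOM_PEC_PCI_STK2   0x180

/* Assumption: the windows are laid out with a constant stride */
#define EXAMPLE_PEC_PCI_STK_STRIDE \
    (PNV9_XSCOM_PEC_PCI_STK1 - PNV9_XSCOM_PEC_PCI_STK0)

/* Offset of the pass-through window of 'stack' (0..2) */
static inline uint32_t example_pec_pci_stack_offset(unsigned stack)
{
    assert(stack < 3);
    return PNV9_XSCOM_PEC_PCI_STK0 + stack * EXAMPLE_PEC_PCI_STK_STRIDE;
}
```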
diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
new file mode 100644
index 000000000000..3c54b02ec929
--- /dev/null
+++ b/hw/pci-host/pnv_phb4.c
@@ -0,0 +1,1438 @@
+/*
+ * QEMU PowerPC PowerNV (POWER9) PHB4 model
+ *
+ * Copyright (c) 2018-2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/visitor.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "monitor/monitor.h"
+#include "target/ppc/cpu.h"
+#include "hw/pci-host/pnv_phb4_regs.h"
+#include "hw/pci-host/pnv_phb4.h"
+#include "hw/pci/pcie_host.h"
+#include "hw/pci/pcie_port.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+
+#define phb_error(phb, fmt, ...)                                        \
+    qemu_log_mask(LOG_GUEST_ERROR, "phb4[%d:%d]: " fmt "\n",            \
+                  (phb)->chip_id, (phb)->phb_id, ## __VA_ARGS__)
+
+/*
+ * QEMU version of the GETFIELD/SETFIELD macros
+ *
+ * These are common with the PnvXive model.
+ */
+static inline uint64_t GETFIELD(uint64_t mask, uint64_t word)
+{
+    return (word & mask) >> ctz64(mask);
+}
+
+static inline uint64_t SETFIELD(uint64_t mask, uint64_t word,
+                                uint64_t value)
+{
+    return (word & ~mask) | ((value << ctz64(mask)) & mask);
+}
+
+static PCIDevice *pnv_phb4_find_cfg_dev(PnvPHB4 *phb)
+{
+    PCIHostState *pci = PCI_HOST_BRIDGE(phb);
+    uint64_t addr = phb->regs[PHB_CONFIG_ADDRESS >> 3];
+    uint8_t bus, devfn;
+
+    if (!(addr >> 63)) {
+        return NULL;
+    }
+    bus = (addr >> 52) & 0xff;
+    devfn = (addr >> 44) & 0xff;
+
+    /* We don't access the root complex this way */
+    if (bus == 0 && devfn == 0) {
+        return NULL;
+    }
+    return pci_find_device(pci->bus, bus, devfn);
+}
+
+/*
+ * The CONFIG_DATA register expects little endian accesses, but as the
+ * region is big endian, we have to swap the value.
+ */
+static void pnv_phb4_config_write(PnvPHB4 *phb, unsigned off,
+                                  unsigned size, uint64_t val)
+{
+    uint32_t cfg_addr, limit;
+    PCIDevice *pdev;
+
+    pdev = pnv_phb4_find_cfg_dev(phb);
+    if (!pdev) {
+        return;
+    }
+    cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
+    cfg_addr |= off;
+    limit = pci_config_size(pdev);
+    if (limit <= cfg_addr) {
+        /*
+         * A conventional PCI device can sit behind a PCIe-to-PCI bridge:
+         * accesses with 256 <= addr < 4K have no effect.
+         */
+        return;
+    }
+    switch (size) {
+    case 1:
+        break;
+    case 2:
+        val = bswap16(val);
+        break;
+    case 4:
+        val = bswap32(val);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    pci_host_config_write_common(pdev, cfg_addr, limit, val, size);
+}
+
+static uint64_t pnv_phb4_config_read(PnvPHB4 *phb, unsigned off,
+                                     unsigned size)
+{
+    uint32_t cfg_addr, limit;
+    PCIDevice *pdev;
+    uint64_t val;
+
+    pdev = pnv_phb4_find_cfg_dev(phb);
+    if (!pdev) {
+        return ~0ull;
+    }
+    cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
+    cfg_addr |= off;
+    limit = pci_config_size(pdev);
+    if (limit <= cfg_addr) {
+        /*
+         * A conventional PCI device can sit behind a PCIe-to-PCI bridge:
+         * accesses with 256 <= addr < 4K return all ones.
+         */
+        return ~0ull;
+    }
+    val = pci_host_config_read_common(pdev, cfg_addr, limit, size);
+    switch (size) {
+    case 1:
+        return val;
+    case 2:
+        return bswap16(val);
+    case 4:
+        return bswap32(val);
+    default:
+        g_assert_not_reached();
+    }
+}
+
+/*
+ * Root complex register accesses are memory mapped.
+ */
+static void pnv_phb4_rc_config_write(PnvPHB4 *phb, unsigned off,
+                                     unsigned size, uint64_t val)
+{
+    PCIHostState *pci = PCI_HOST_BRIDGE(phb);
+    PCIDevice *pdev;
+
+    if (size != 4) {
+        phb_error(phb, "rc_config_write invalid size %d", size);
+        return;
+    }
+
+    pdev = pci_find_device(pci->bus, 0, 0);
+    assert(pdev);
+
+    pci_host_config_write_common(pdev, off, PHB_RC_CONFIG_SIZE,
+                                 bswap32(val), 4);
+}
+
+static uint64_t pnv_phb4_rc_config_read(PnvPHB4 *phb, unsigned off,
+                                        unsigned size)
+{
+    PCIHostState *pci = PCI_HOST_BRIDGE(phb);
+    PCIDevice *pdev;
+    uint64_t val;
+
+    if (size != 4) {
+        phb_error(phb, "rc_config_read invalid size %d", size);
+        return ~0ull;
+    }
+
+    pdev = pci_find_device(pci->bus, 0, 0);
+    assert(pdev);
+
+    val = pci_host_config_read_common(pdev, off, PHB_RC_CONFIG_SIZE, 4);
+    return bswap32(val);
+}
+
+static void pnv_phb4_check_mbt(PnvPHB4 *phb, uint32_t index)
+{
+    uint64_t base, start, size, mbe0, mbe1;
+    MemoryRegion *parent;
+    char name[64];
+
+    /* Unmap first */
+    if (memory_region_is_mapped(&phb->mr_mmio[index])) {
+        /* Should we destroy it in an RCU-friendly way? */
+        memory_region_del_subregion(phb->mr_mmio[index].container,
+                                    &phb->mr_mmio[index]);
+    }
+
+    /* Get table entry */
+    mbe0 = phb->ioda_MBT[(index << 1)];
+    mbe1 = phb->ioda_MBT[(index << 1) + 1];
+
+    if (!(mbe0 & IODA3_MBT0_ENABLE)) {
+        return;
+    }
+
+    /* Grab geometry from registers */
+    base = GETFIELD(IODA3_MBT0_BASE_ADDR, mbe0) << 12;
+    size = GETFIELD(IODA3_MBT1_MASK, mbe1) << 12;
+    size |= 0xff00000000000000ull;
+    size = ~size + 1;
+
+    /* Calculate PCI side start address based on M32/M64 window type */
+    if (mbe0 & IODA3_MBT0_TYPE_M32) {
+        start = phb->regs[PHB_M32_START_ADDR >> 3];
+        if ((start + size) > 0x100000000ull) {
+            phb_error(phb, "M32 set beyond 4GB boundary !");
+            size = 0x100000000ull - start;
+        }
+    } else {
+        start = base | (phb->regs[PHB_M64_UPPER_BITS >> 3]);
+    }
+
+    /* TODO: Figure out how to implement/decode AOMASK */
+
+    /* Check if it matches an enabled MMIO region in the PEC stack */
+    if (memory_region_is_mapped(&phb->stack->mmbar0) &&
+        base >= phb->stack->mmio0_base &&
+        (base + size) <= (phb->stack->mmio0_base + phb->stack->mmio0_size)) {
+        parent = &phb->stack->mmbar0;
+        base -= phb->stack->mmio0_base;
+    } else if (memory_region_is_mapped(&phb->stack->mmbar1) &&
+        base >= phb->stack->mmio1_base &&
+        (base + size) <= (phb->stack->mmio1_base + phb->stack->mmio1_size)) {
+        parent = &phb->stack->mmbar1;
+        base -= phb->stack->mmio1_base;
+    } else {
+        phb_error(phb, "PHB MBAR %d out of parent bounds", index);
+        return;
+    }
+
+    /* Create alias (better name ?) */
+    snprintf(name, sizeof(name), "phb4-mbar%d", index);
+    memory_region_init_alias(&phb->mr_mmio[index], OBJECT(phb), name,
+                             &phb->pci_mmio, start, size);
+    memory_region_add_subregion(parent, base, &phb->mr_mmio[index]);
+}
+
+static void pnv_phb4_check_all_mbt(PnvPHB4 *phb)
+{
+    uint64_t i;
+    uint32_t num_windows = phb->big_phb ? PNV_PHB4_MAX_MMIO_WINDOWS :
+        PNV_PHB4_MIN_MMIO_WINDOWS;
+
+    for (i = 0; i < num_windows; i++) {
+        pnv_phb4_check_mbt(phb, i);
+    }
+}
+
+static uint64_t *pnv_phb4_ioda_access(PnvPHB4 *phb,
+                                      unsigned *out_table, unsigned *out_idx)
+{
+    uint64_t adreg = phb->regs[PHB_IODA_ADDR >> 3];
+    unsigned int index = GETFIELD(PHB_IODA_AD_TADR, adreg);
+    unsigned int table = GETFIELD(PHB_IODA_AD_TSEL, adreg);
+    unsigned int mask;
+    uint64_t *tptr = NULL;
+
+    switch (table) {
+    case IODA3_TBL_LIST:
+        tptr = phb->ioda_LIST;
+        mask = 7;
+        break;
+    case IODA3_TBL_MIST:
+        tptr = phb->ioda_MIST;
+        mask = phb->big_phb ? PNV_PHB4_MAX_MIST : (PNV_PHB4_MAX_MIST >> 1);
+        mask -= 1;
+        break;
+    case IODA3_TBL_RCAM:
+        mask = phb->big_phb ? 127 : 63;
+        break;
+    case IODA3_TBL_MRT:
+        mask = phb->big_phb ? 15 : 7;
+        break;
+    case IODA3_TBL_PESTA:
+    case IODA3_TBL_PESTB:
+        mask = phb->big_phb ? PNV_PHB4_MAX_PEs : (PNV_PHB4_MAX_PEs >> 1);
+        mask -= 1;
+        break;
+    case IODA3_TBL_TVT:
+        tptr = phb->ioda_TVT;
+        mask = phb->big_phb ? PNV_PHB4_MAX_TVEs : (PNV_PHB4_MAX_TVEs >> 1);
+        mask -= 1;
+        break;
+    case IODA3_TBL_TCR:
+    case IODA3_TBL_TDR:
+        mask = phb->big_phb ? 1023 : 511;
+        break;
+    case IODA3_TBL_MBT:
+        tptr = phb->ioda_MBT;
+        mask = phb->big_phb ? PNV_PHB4_MAX_MBEs : (PNV_PHB4_MAX_MBEs >> 1);
+        mask -= 1;
+        break;
+    case IODA3_TBL_MDT:
+        tptr = phb->ioda_MDT;
+        mask = phb->big_phb ? PNV_PHB4_MAX_PEs : (PNV_PHB4_MAX_PEs >> 1);
+        mask -= 1;
+        break;
+    case IODA3_TBL_PEEV:
+        tptr = phb->ioda_PEEV;
+        mask = phb->big_phb ? PNV_PHB4_MAX_PEEVs : (PNV_PHB4_MAX_PEEVs >> 1);
+        mask -= 1;
+        break;
+    default:
+        phb_error(phb, "invalid IODA table %d", table);
+        return NULL;
+    }
+    index &= mask;
+    if (out_idx) {
+        *out_idx = index;
+    }
+    if (out_table) {
+        *out_table = table;
+    }
+    if (tptr) {
+        tptr += index;
+    }
+    if (adreg & PHB_IODA_AD_AUTOINC) {
+        index = (index + 1) & mask;
+        adreg = SETFIELD(PHB_IODA_AD_TADR, adreg, index);
+    }
+
+    phb->regs[PHB_IODA_ADDR >> 3] = adreg;
+    return tptr;
+}
+
+static uint64_t pnv_phb4_ioda_read(PnvPHB4 *phb)
+{
+    unsigned table, idx;
+    uint64_t *tptr;
+
+    tptr = pnv_phb4_ioda_access(phb, &table, &idx);
+    if (!tptr) {
+        /* Special PESTA case */
+        if (table == IODA3_TBL_PESTA) {
+            return ((uint64_t)(phb->ioda_PEST_AB[idx] & 1)) << 63;
+        } else if (table == IODA3_TBL_PESTB) {
+            return ((uint64_t)(phb->ioda_PEST_AB[idx] & 2)) << 62;
+        }
+        /* Return 0 on unsupported tables, not ff's */
+        return 0;
+    }
+    return *tptr;
+}
+
+static void pnv_phb4_ioda_write(PnvPHB4 *phb, uint64_t val)
+{
+    unsigned table, idx;
+    uint64_t *tptr;
+
+    tptr = pnv_phb4_ioda_access(phb, &table, &idx);
+    if (!tptr) {
+        /* Special PESTA case */
+        if (table == IODA3_TBL_PESTA) {
+            phb->ioda_PEST_AB[idx] &= ~1;
+            phb->ioda_PEST_AB[idx] |= (val >> 63) & 1;
+        } else if (table == IODA3_TBL_PESTB) {
+            phb->ioda_PEST_AB[idx] &= ~2;
+            phb->ioda_PEST_AB[idx] |= (val >> 62) & 2;
+        }
+        return;
+    }
+
+    /* Handle side effects */
+    switch (table) {
+    case IODA3_TBL_LIST:
+        break;
+    case IODA3_TBL_MIST: {
+        /* Special mask for MIST partial write */
+        uint64_t adreg = phb->regs[PHB_IODA_ADDR >> 3];
+        uint32_t mmask = GETFIELD(PHB_IODA_AD_MIST_PWV, adreg);
+        uint64_t v = *tptr;
+        if (mmask == 0) {
+            mmask = 0xf;
+        }
+        if (mmask & 8) {
+            v &= 0x0000ffffffffffffull;
+            v |= 0xcfff000000000000ull & val;
+        }
+        if (mmask & 4) {
+            v &= 0xffff0000ffffffffull;
+            v |= 0x0000cfff00000000ull & val;
+        }
+        if (mmask & 2) {
+            v &= 0xffffffff0000ffffull;
+            v |= 0x00000000cfff0000ull & val;
+        }
+        if (mmask & 1) {
+            v &= 0xffffffffffff0000ull;
+            v |= 0x000000000000cfffull & val;
+        }
+        *tptr = v;
+        break;
+    }
+    case IODA3_TBL_MBT:
+        *tptr = val;
+
+        /* Copy across the valid bit to the other half */
+        phb->ioda_MBT[idx ^ 1] &= 0x7fffffffffffffffull;
+        phb->ioda_MBT[idx ^ 1] |= 0x8000000000000000ull & val;
+
+        /* Update mappings */
+        pnv_phb4_check_mbt(phb, idx >> 1);
+        break;
+    default:
+        *tptr = val;
+    }
+}
+
+static void pnv_phb4_rtc_invalidate(PnvPHB4 *phb, uint64_t val)
+{
+    PnvPhb4DMASpace *ds;
+
+    /* Always invalidate all for now ... */
+    QLIST_FOREACH(ds, &phb->dma_spaces, list) {
+        ds->pe_num = PHB_INVALID_PE;
+    }
+}
+
+static void pnv_phb4_update_msi_regions(PnvPhb4DMASpace *ds)
+{
+    uint64_t cfg = ds->phb->regs[PHB_PHB4_CONFIG >> 3];
+
+    if (cfg & PHB_PHB4C_32BIT_MSI_EN) {
+        if (!memory_region_is_mapped(MEMORY_REGION(&ds->msi32_mr))) {
+            memory_region_add_subregion(MEMORY_REGION(&ds->dma_mr),
+                                        0xffff0000, &ds->msi32_mr);
+        }
+    } else {
+        if (memory_region_is_mapped(MEMORY_REGION(&ds->msi32_mr))) {
+            memory_region_del_subregion(MEMORY_REGION(&ds->dma_mr),
+                                        &ds->msi32_mr);
+        }
+    }
+
+    if (cfg & PHB_PHB4C_64BIT_MSI_EN) {
+        if (!memory_region_is_mapped(MEMORY_REGION(&ds->msi64_mr))) {
+            memory_region_add_subregion(MEMORY_REGION(&ds->dma_mr),
+                                        (1ull << 60), &ds->msi64_mr);
+        }
+    } else {
+        if (memory_region_is_mapped(MEMORY_REGION(&ds->msi64_mr))) {
+            memory_region_del_subregion(MEMORY_REGION(&ds->dma_mr),
+                                        &ds->msi64_mr);
+        }
+    }
+}
+
+static void pnv_phb4_update_all_msi_regions(PnvPHB4 *phb)
+{
+    PnvPhb4DMASpace *ds;
+
+    QLIST_FOREACH(ds, &phb->dma_spaces, list) {
+        pnv_phb4_update_msi_regions(ds);
+    }
+}
+
+static void pnv_phb4_update_xsrc(PnvPHB4 *phb)
+{
+    int shift, flags, i, lsi_base;
+    XiveSource *xsrc = &phb->xsrc;
+
+    /* The XIVE source characteristics can be set at run time */
+    if (phb->regs[PHB_CTRLR >> 3] & PHB_CTRLR_IRQ_PGSZ_64K) {
+        shift = XIVE_ESB_64K;
+    } else {
+        shift = XIVE_ESB_4K;
+    }
+    if (phb->regs[PHB_CTRLR >> 3] & PHB_CTRLR_IRQ_STORE_EOI) {
+        flags = XIVE_SRC_STORE_EOI;
+    } else {
+        flags = 0;
+    }
+
+    phb->xsrc.esb_shift = shift;
+    phb->xsrc.esb_flags = flags;
+
+    lsi_base = GETFIELD(PHB_LSI_SRC_ID, phb->regs[PHB_LSI_SOURCE_ID >> 3]);
+    lsi_base <<= 3;
+
+    /* TODO: handle reset values of PHB_LSI_SRC_ID */
+    if (!lsi_base) {
+        return;
+    }
+
+    /* TODO: need a xive_source_irq_reset_lsi() */
+    bitmap_zero(xsrc->lsi_map, xsrc->nr_irqs);
+
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        bool msi = (i < lsi_base || i >= (lsi_base + 8));
+        if (!msi) {
+            xive_source_irq_set_lsi(xsrc, i);
+        }
+    }
+}
+
+static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
+                               unsigned size)
+{
+    PnvPHB4 *phb = PNV_PHB4(opaque);
+    bool changed;
+
+    /* Special case outbound configuration data */
+    if ((off & 0xfffc) == PHB_CONFIG_DATA) {
+        pnv_phb4_config_write(phb, off & 0x3, size, val);
+        return;
+    }
+
+    /* Special case RC configuration space */
+    if ((off & 0xf800) == PHB_RC_CONFIG_BASE) {
+        pnv_phb4_rc_config_write(phb, off & 0x7ff, size, val);
+        return;
+    }
+
+    /* Other registers are 64-bit only */
+    if (size != 8 || off & 0x7) {
+        phb_error(phb, "Invalid register access, offset: 0x%"PRIx64" size: %d",
+                   off, size);
+        return;
+    }
+
+    /* Handle masking */
+    switch (off) {
+    case PHB_LSI_SOURCE_ID:
+        val &= PHB_LSI_SRC_ID;
+        break;
+    case PHB_M64_UPPER_BITS:
+        val &= 0xff00000000000000ull;
+        break;
+    /* TCE Kill */
+    case PHB_TCE_KILL:
+        /* Clear top 3 bits which HW does to indicate successful queuing */
+        val &= ~(PHB_TCE_KILL_ALL | PHB_TCE_KILL_PE | PHB_TCE_KILL_ONE);
+        break;
+    case PHB_Q_DMA_R:
+        /*
+         * This is enough logic to make SW happy but we aren't
+         * actually quiescing the DMAs
+         */
+        if (val & PHB_Q_DMA_R_AUTORESET) {
+            val = 0;
+        } else {
+            val &= PHB_Q_DMA_R_QUIESCE_DMA;
+        }
+        break;
+    /* LEM stuff */
+    case PHB_LEM_FIR_AND_MASK:
+        phb->regs[PHB_LEM_FIR_ACCUM >> 3] &= val;
+        return;
+    case PHB_LEM_FIR_OR_MASK:
+        phb->regs[PHB_LEM_FIR_ACCUM >> 3] |= val;
+        return;
+    case PHB_LEM_ERROR_AND_MASK:
+        phb->regs[PHB_LEM_ERROR_MASK >> 3] &= val;
+        return;
+    case PHB_LEM_ERROR_OR_MASK:
+        phb->regs[PHB_LEM_ERROR_MASK >> 3] |= val;
+        return;
+    case PHB_LEM_WOF:
+        val = 0;
+        break;
+    /* TODO: More regs ..., maybe create a table with masks... */
+
+    /* Read only registers */
+    case PHB_CPU_LOADSTORE_STATUS:
+    case PHB_ETU_ERR_SUMMARY:
+    case PHB_PHB4_GEN_CAP:
+    case PHB_PHB4_TCE_CAP:
+    case PHB_PHB4_IRQ_CAP:
+    case PHB_PHB4_EEH_CAP:
+        return;
+    }
+
+    /* Record whether it changed */
+    changed = phb->regs[off >> 3] != val;
+
+    /* Store in register cache first */
+    phb->regs[off >> 3] = val;
+
+    /* Handle side effects */
+    switch (off) {
+    case PHB_PHB4_CONFIG:
+        if (changed) {
+            pnv_phb4_update_all_msi_regions(phb);
+        }
+        break;
+    case PHB_M32_START_ADDR:
+    case PHB_M64_UPPER_BITS:
+        if (changed) {
+            pnv_phb4_check_all_mbt(phb);
+        }
+        break;
+
+    /* IODA table accesses */
+    case PHB_IODA_DATA0:
+        pnv_phb4_ioda_write(phb, val);
+        break;
+
+    /* RTC invalidation */
+    case PHB_RTC_INVALIDATE:
+        pnv_phb4_rtc_invalidate(phb, val);
+        break;
+
+    /* PHB Control (Affects XIVE source) */
+    case PHB_CTRLR:
+    case PHB_LSI_SOURCE_ID:
+        pnv_phb4_update_xsrc(phb);
+        break;
+
+    /* Silent simple writes */
+    case PHB_ASN_CMPM:
+    case PHB_CONFIG_ADDRESS:
+    case PHB_IODA_ADDR:
+    case PHB_TCE_KILL:
+    case PHB_TCE_SPEC_CTL:
+    case PHB_PEST_BAR:
+    case PHB_PELTV_BAR:
+    case PHB_RTT_BAR:
+    case PHB_LEM_FIR_ACCUM:
+    case PHB_LEM_ERROR_MASK:
+    case PHB_LEM_ACTION0:
+    case PHB_LEM_ACTION1:
+    case PHB_TCE_TAG_ENABLE:
+    case PHB_INT_NOTIFY_ADDR:
+    case PHB_INT_NOTIFY_INDEX:
+    case PHB_DMARD_SYNC:
+        break;
+
+    /* Noise on anything else */
+    default:
+        qemu_log_mask(LOG_UNIMP, "phb4: reg_write 0x%"PRIx64"=%"PRIx64"\n",
+                      off, val);
+    }
+}
+
+static uint64_t pnv_phb4_reg_read(void *opaque, hwaddr off, unsigned size)
+{
+    PnvPHB4 *phb = PNV_PHB4(opaque);
+    uint64_t val;
+
+    if ((off & 0xfffc) == PHB_CONFIG_DATA) {
+        return pnv_phb4_config_read(phb, off & 0x3, size);
+    }
+
+    /* Special case RC configuration space */
+    if ((off & 0xf800) == PHB_RC_CONFIG_BASE) {
+        return pnv_phb4_rc_config_read(phb, off & 0x7ff, size);
+    }
+
+    /* Other registers are 64-bit only */
+    if (size != 8 || off & 0x7) {
+        phb_error(phb, "Invalid register access, offset: 0x%"PRIx64" size: %d",
+                   off, size);
+        return ~0ull;
+    }
+
+    /* Default read from cache */
+    val = phb->regs[off >> 3];
+
+    switch (off) {
+    case PHB_VERSION:
+        return phb->version;
+
+    /* Read-only */
+    case PHB_PHB4_GEN_CAP:
+        return 0xe4b8000000000000ull;
+    case PHB_PHB4_TCE_CAP:
+        return phb->big_phb ? 0x4008440000000400ull : 0x2008440000000200ull;
+    case PHB_PHB4_IRQ_CAP:
+        return phb->big_phb ? 0x0800000000001000ull : 0x0800000000000800ull;
+    case PHB_PHB4_EEH_CAP:
+        return phb->big_phb ? 0x2000000000000000ull : 0x1000000000000000ull;
+
+    /* IODA table accesses */
+    case PHB_IODA_DATA0:
+        return pnv_phb4_ioda_read(phb);
+
+    /* Link training always appears trained */
+    case PHB_PCIE_DLP_TRAIN_CTL:
+        /* TODO: Do something sensible with speed ? */
+        return PHB_PCIE_DLP_INBAND_PRESENCE | PHB_PCIE_DLP_TL_LINKACT;
+
+    /* DMA read sync: make it look like it's complete */
+    case PHB_DMARD_SYNC:
+        return PHB_DMARD_SYNC_COMPLETE;
+
+    /* Silent simple reads */
+    case PHB_LSI_SOURCE_ID:
+    case PHB_CPU_LOADSTORE_STATUS:
+    case PHB_ASN_CMPM:
+    case PHB_PHB4_CONFIG:
+    case PHB_M32_START_ADDR:
+    case PHB_CONFIG_ADDRESS:
+    case PHB_IODA_ADDR:
+    case PHB_RTC_INVALIDATE:
+    case PHB_TCE_KILL:
+    case PHB_TCE_SPEC_CTL:
+    case PHB_PEST_BAR:
+    case PHB_PELTV_BAR:
+    case PHB_RTT_BAR:
+    case PHB_M64_UPPER_BITS:
+    case PHB_CTRLR:
+    case PHB_LEM_FIR_ACCUM:
+    case PHB_LEM_ERROR_MASK:
+    case PHB_LEM_ACTION0:
+    case PHB_LEM_ACTION1:
+    case PHB_TCE_TAG_ENABLE:
+    case PHB_INT_NOTIFY_ADDR:
+    case PHB_INT_NOTIFY_INDEX:
+    case PHB_Q_DMA_R:
+    case PHB_ETU_ERR_SUMMARY:
+        break;
+
+    /* Noise on anything else */
+    default:
+        qemu_log_mask(LOG_UNIMP, "phb4: reg_read 0x%"PRIx64"=%"PRIx64"\n",
+                      off, val);
+    }
+    return val;
+}
+
+static const MemoryRegionOps pnv_phb4_reg_ops = {
+    .read = pnv_phb4_reg_read,
+    .write = pnv_phb4_reg_write,
+    .valid.min_access_size = 1,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 1,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static uint64_t pnv_phb4_xscom_read(void *opaque, hwaddr addr, unsigned size)
+{
+    PnvPHB4 *phb = PNV_PHB4(opaque);
+    uint32_t reg = addr >> 3;
+    uint64_t val;
+    hwaddr offset;
+
+    switch (reg) {
+    case PHB_SCOM_HV_IND_ADDR:
+        return phb->scom_hv_ind_addr_reg;
+
+    case PHB_SCOM_HV_IND_DATA:
+        if (!(phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_VALID)) {
+            phb_error(phb, "Invalid indirect address");
+            return ~0ull;
+        }
+        size = (phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_4B) ? 4 : 8;
+        offset = GETFIELD(PHB_SCOM_HV_IND_ADDR_ADDR, phb->scom_hv_ind_addr_reg);
+        val = pnv_phb4_reg_read(phb, offset, size);
+        if (phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_AUTOINC) {
+            offset += size;
+            offset &= 0x3fff;
+            phb->scom_hv_ind_addr_reg = SETFIELD(PHB_SCOM_HV_IND_ADDR_ADDR,
+                                                 phb->scom_hv_ind_addr_reg,
+                                                 offset);
+        }
+        return val;
+    case PHB_SCOM_ETU_LEM_FIR:
+    case PHB_SCOM_ETU_LEM_FIR_AND:
+    case PHB_SCOM_ETU_LEM_FIR_OR:
+    case PHB_SCOM_ETU_LEM_FIR_MSK:
+    case PHB_SCOM_ETU_LEM_ERR_MSK_AND:
+    case PHB_SCOM_ETU_LEM_ERR_MSK_OR:
+    case PHB_SCOM_ETU_LEM_ACT0:
+    case PHB_SCOM_ETU_LEM_ACT1:
+    case PHB_SCOM_ETU_LEM_WOF:
+        offset = ((reg - PHB_SCOM_ETU_LEM_FIR) << 3) + PHB_LEM_FIR_ACCUM;
+        return pnv_phb4_reg_read(phb, offset, size);
+    case PHB_SCOM_ETU_PMON_CONFIG:
+    case PHB_SCOM_ETU_PMON_CTR0:
+    case PHB_SCOM_ETU_PMON_CTR1:
+    case PHB_SCOM_ETU_PMON_CTR2:
+    case PHB_SCOM_ETU_PMON_CTR3:
+        offset = ((reg - PHB_SCOM_ETU_PMON_CONFIG) << 3) + PHB_PERFMON_CONFIG;
+        return pnv_phb4_reg_read(phb, offset, size);
+
+    default:
+        qemu_log_mask(LOG_UNIMP, "phb4: xscom_read 0x%"HWADDR_PRIx"\n", addr);
+        return ~0ull;
+    }
+}
+
+static void pnv_phb4_xscom_write(void *opaque, hwaddr addr,
+                                 uint64_t val, unsigned size)
+{
+    PnvPHB4 *phb = PNV_PHB4(opaque);
+    uint32_t reg = addr >> 3;
+    hwaddr offset;
+
+    switch (reg) {
+    case PHB_SCOM_HV_IND_ADDR:
+        phb->scom_hv_ind_addr_reg = val & 0xe000000000001fff;
+        break;
+    case PHB_SCOM_HV_IND_DATA:
+        if (!(phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_VALID)) {
+            phb_error(phb, "Invalid indirect address");
+            break;
+        }
+        size = (phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_4B) ? 4 : 8;
+        offset = GETFIELD(PHB_SCOM_HV_IND_ADDR_ADDR, phb->scom_hv_ind_addr_reg);
+        pnv_phb4_reg_write(phb, offset, val, size);
+        if (phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_AUTOINC) {
+            offset += size;
+            offset &= 0x3fff;
+            phb->scom_hv_ind_addr_reg = SETFIELD(PHB_SCOM_HV_IND_ADDR_ADDR,
+                                                 phb->scom_hv_ind_addr_reg,
+                                                 offset);
+        }
+        break;
+    case PHB_SCOM_ETU_LEM_FIR:
+    case PHB_SCOM_ETU_LEM_FIR_AND:
+    case PHB_SCOM_ETU_LEM_FIR_OR:
+    case PHB_SCOM_ETU_LEM_FIR_MSK:
+    case PHB_SCOM_ETU_LEM_ERR_MSK_AND:
+    case PHB_SCOM_ETU_LEM_ERR_MSK_OR:
+    case PHB_SCOM_ETU_LEM_ACT0:
+    case PHB_SCOM_ETU_LEM_ACT1:
+    case PHB_SCOM_ETU_LEM_WOF:
+        offset = ((reg - PHB_SCOM_ETU_LEM_FIR) << 3) + PHB_LEM_FIR_ACCUM;
+        pnv_phb4_reg_write(phb, offset, val, size);
+        break;
+    case PHB_SCOM_ETU_PMON_CONFIG:
+    case PHB_SCOM_ETU_PMON_CTR0:
+    case PHB_SCOM_ETU_PMON_CTR1:
+    case PHB_SCOM_ETU_PMON_CTR2:
+    case PHB_SCOM_ETU_PMON_CTR3:
+        offset = ((reg - PHB_SCOM_ETU_PMON_CONFIG) << 3) + PHB_PERFMON_CONFIG;
+        pnv_phb4_reg_write(phb, offset, val, size);
+        break;
+    default:
+        qemu_log_mask(LOG_UNIMP, "phb4: xscom_write 0x%"HWADDR_PRIx
+                      "=%"PRIx64"\n", addr, val);
+    }
+}
+
+const MemoryRegionOps pnv_phb4_xscom_ops = {
+    .read = pnv_phb4_xscom_read,
+    .write = pnv_phb4_xscom_write,
+    .valid.min_access_size = 8,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 8,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static int pnv_phb4_map_irq(PCIDevice *pci_dev, int irq_num)
+{
+    /* Check that out properly ... */
+    return irq_num & 3;
+}
+
+static void pnv_phb4_set_irq(void *opaque, int irq_num, int level)
+{
+    PnvPHB4 *phb = PNV_PHB4(opaque);
+    uint32_t lsi_base;
+
+    /* LSI only ... */
+    if (irq_num > 3) {
+        phb_error(phb, "IRQ %x is not an LSI", irq_num);
+    }
+    lsi_base = GETFIELD(PHB_LSI_SRC_ID, phb->regs[PHB_LSI_SOURCE_ID >> 3]);
+    lsi_base <<= 3;
+    qemu_set_irq(phb->qirqs[lsi_base + irq_num], level);
+}
+
+static bool pnv_phb4_resolve_pe(PnvPhb4DMASpace *ds)
+{
+    uint64_t rtt, addr;
+    uint16_t rte;
+    int bus_num;
+    int num_PEs;
+
+    /* Already resolved ? */
+    if (ds->pe_num != PHB_INVALID_PE) {
+        return true;
+    }
+
+    /* We need to lookup the RTT */
+    rtt = ds->phb->regs[PHB_RTT_BAR >> 3];
+    if (!(rtt & PHB_RTT_BAR_ENABLE)) {
+        phb_error(ds->phb, "DMA with RTT BAR disabled !");
+        /* Set error bits ? fence ? ... */
+        return false;
+    }
+
+    /* Read RTE */
+    bus_num = pci_bus_num(ds->bus);
+    addr = rtt & PHB_RTT_BASE_ADDRESS_MASK;
+    addr += 2 * ((bus_num << 8) | ds->devfn);
+    if (dma_memory_read(&address_space_memory, addr, &rte, sizeof(rte))) {
+        phb_error(ds->phb, "Failed to read RTT entry at 0x%"PRIx64, addr);
+        /* Set error bits ? fence ? ... */
+        return false;
+    }
+    rte = be16_to_cpu(rte);
+
+    /* Fail upon reading of invalid PE# */
+    num_PEs = ds->phb->big_phb ? PNV_PHB4_MAX_PEs : (PNV_PHB4_MAX_PEs >> 1);
+    if (rte >= num_PEs) {
+        phb_error(ds->phb, "RTE for RID 0x%x invalid (%04x)", ds->devfn, rte);
+        rte &= num_PEs - 1;
+    }
+    ds->pe_num = rte;
+    return true;
+}
+
+static void pnv_phb4_translate_tve(PnvPhb4DMASpace *ds, hwaddr addr,
+                                   bool is_write, uint64_t tve,
+                                   IOMMUTLBEntry *tlb)
+{
+    uint64_t tta = GETFIELD(IODA3_TVT_TABLE_ADDR, tve);
+    int32_t  lev = GETFIELD(IODA3_TVT_NUM_LEVELS, tve);
+    uint32_t tts = GETFIELD(IODA3_TVT_TCE_TABLE_SIZE, tve);
+    uint32_t tps = GETFIELD(IODA3_TVT_IO_PSIZE, tve);
+
+    /* Invalid levels */
+    if (lev > 4) {
+        phb_error(ds->phb, "Invalid #levels in TVE %d", lev);
+        return;
+    }
+
+    /* Invalid entry */
+    if (tts == 0) {
+        phb_error(ds->phb, "Access to invalid TVE");
+        return;
+    }
+
+    /* IO Page Size of 0 means untranslated, else use TCEs */
+    if (tps == 0) {
+        /* TODO: Handle boundaries */
+
+        /* Use 4k pages like q35 ... for now */
+        tlb->iova = addr & 0xfffffffffffff000ull;
+        tlb->translated_addr = addr & 0x0003fffffffff000ull;
+        tlb->addr_mask = 0xfffull;
+        tlb->perm = IOMMU_RW;
+    } else {
+        uint32_t tce_shift, tbl_shift, sh;
+        uint64_t base, taddr, tce, tce_mask;
+
+        /* Address bits per bottom level TCE entry */
+        tce_shift = tps + 11;
+
+        /* Address bits per table level */
+        tbl_shift = tts + 8;
+
+        /* Top level table base address */
+        base = tta << 12;
+
+        /* Total shift to first level */
+        sh = tbl_shift * lev + tce_shift;
+
+        /* TODO: Limit to supported IO page sizes */
+
+        /* TODO: Multi-level untested */
+        while ((lev--) >= 0) {
+            /* Grab the TCE address */
+            taddr = base | (((addr >> sh) & ((1ull << tbl_shift) - 1)) << 3);
+            if (dma_memory_read(&address_space_memory, taddr, &tce,
+                                sizeof(tce))) {
+                phb_error(ds->phb, "Failed to read TCE at 0x%"PRIx64, taddr);
+                return;
+            }
+            tce = be64_to_cpu(tce);
+
+            /* Check permission for indirect TCE */
+            if ((lev >= 0) && !(tce & 3)) {
+                phb_error(ds->phb, "Invalid indirect TCE at 0x%"PRIx64, taddr);
+                phb_error(ds->phb, " xlate %"PRIx64":%c TVE=%"PRIx64, addr,
+                           is_write ? 'W' : 'R', tve);
+                phb_error(ds->phb, " tta=%"PRIx64" lev=%d tts=%d tps=%d",
+                           tta, lev, tts, tps);
+                return;
+            }
+            sh -= tbl_shift;
+            base = tce & ~0xfffull;
+        }
+
+        /* We exit the loop with TCE being the final TCE */
+        tce_mask = ~((1ull << tce_shift) - 1);
+        tlb->iova = addr & tce_mask;
+        tlb->translated_addr = tce & tce_mask;
+        tlb->addr_mask = ~tce_mask;
+        tlb->perm = tce & 3;
+        if ((is_write && !(tce & 2)) || ((!is_write) && !(tce & 1))) {
+            phb_error(ds->phb, "TCE access fault at 0x%"PRIx64, taddr);
+            phb_error(ds->phb, " xlate %"PRIx64":%c TVE=%"PRIx64, addr,
+                       is_write ? 'W' : 'R', tve);
+            phb_error(ds->phb, " tta=%"PRIx64" lev=%d tts=%d tps=%d",
+                       tta, lev, tts, tps);
+        }
+    }
+}
+
+static IOMMUTLBEntry pnv_phb4_translate_iommu(IOMMUMemoryRegion *iommu,
+                                              hwaddr addr,
+                                              IOMMUAccessFlags flag,
+                                              int iommu_idx)
+{
+    PnvPhb4DMASpace *ds = container_of(iommu, PnvPhb4DMASpace, dma_mr);
+    int tve_sel;
+    uint64_t tve, cfg;
+    IOMMUTLBEntry ret = {
+        .target_as = &address_space_memory,
+        .iova = addr,
+        .translated_addr = 0,
+        .addr_mask = ~(hwaddr)0,
+        .perm = IOMMU_NONE,
+    };
+
+    /* Resolve PE# */
+    if (!pnv_phb4_resolve_pe(ds)) {
+        phb_error(ds->phb, "Failed to resolve PE# for bus @%p (%d) devfn 0x%x",
+                   ds->bus, pci_bus_num(ds->bus), ds->devfn);
+        return ret;
+    }
+
+    /* Check top bits */
+    switch (addr >> 60) {
+    case 00:
+        /* DMA or 32-bit MSI ? */
+        cfg = ds->phb->regs[PHB_PHB4_CONFIG >> 3];
+        if ((cfg & PHB_PHB4C_32BIT_MSI_EN) &&
+            ((addr & 0xffffffffffff0000ull) == 0xffff0000ull)) {
+            phb_error(ds->phb, "xlate on 32-bit MSI region");
+            return ret;
+        }
+        /* Choose TVE XXX Use PHB4 Control Register */
+        tve_sel = (addr >> 59) & 1;
+        tve = ds->phb->ioda_TVT[ds->pe_num * 2 + tve_sel];
+        pnv_phb4_translate_tve(ds, addr, flag & IOMMU_WO, tve, &ret);
+        break;
+    case 01:
+        phb_error(ds->phb, "xlate on 64-bit MSI region");
+        break;
+    default:
+        phb_error(ds->phb, "xlate on unsupported address 0x%"PRIx64, addr);
+    }
+    return ret;
+}
+
+#define TYPE_PNV_PHB4_IOMMU_MEMORY_REGION "pnv-phb4-iommu-memory-region"
+#define PNV_PHB4_IOMMU_MEMORY_REGION(obj) \
+    OBJECT_CHECK(IOMMUMemoryRegion, (obj), TYPE_PNV_PHB4_IOMMU_MEMORY_REGION)
+
+static void pnv_phb4_iommu_memory_region_class_init(ObjectClass *klass,
+                                                    void *data)
+{
+    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
+
+    imrc->translate = pnv_phb4_translate_iommu;
+}
+
+static const TypeInfo pnv_phb4_iommu_memory_region_info = {
+    .parent = TYPE_IOMMU_MEMORY_REGION,
+    .name = TYPE_PNV_PHB4_IOMMU_MEMORY_REGION,
+    .class_init = pnv_phb4_iommu_memory_region_class_init,
+};
+
+/*
+ * MSI/MSIX memory region implementation.
+ * The handler handles both MSI and MSIX.
+ */
+static void pnv_phb4_msi_write(void *opaque, hwaddr addr,
+                               uint64_t data, unsigned size)
+{
+    PnvPhb4DMASpace *ds = opaque;
+    PnvPHB4 *phb = ds->phb;
+
+    uint32_t src = ((addr >> 4) & 0xffff) | (data & 0x1f);
+
+    /* Resolve PE# */
+    if (!pnv_phb4_resolve_pe(ds)) {
+        phb_error(phb, "Failed to resolve PE# for bus @%p (%d) devfn 0x%x",
+                   ds->bus, pci_bus_num(ds->bus), ds->devfn);
+        return;
+    }
+
+    /* TODO: Check it doesn't collide with LSIs */
+    if (src >= phb->xsrc.nr_irqs) {
+        phb_error(phb, "MSI %d out of bounds", src);
+        return;
+    }
+
+    /* TODO: check PE/MSI assignment */
+
+    qemu_irq_pulse(phb->qirqs[src]);
+}
+
+/* Reads are undefined by the PCI spec: log an error and return all ones */
+static uint64_t pnv_phb4_msi_read(void *opaque, hwaddr addr, unsigned size)
+{
+    PnvPhb4DMASpace *ds = opaque;
+
+    phb_error(ds->phb, "Invalid MSI read @ 0x%" HWADDR_PRIx, addr);
+    return -1;
+}
+
+static const MemoryRegionOps pnv_phb4_msi_ops = {
+    .read = pnv_phb4_msi_read,
+    .write = pnv_phb4_msi_write,
+    .endianness = DEVICE_LITTLE_ENDIAN
+};
+
+static PnvPhb4DMASpace *pnv_phb4_dma_find(PnvPHB4 *phb, PCIBus *bus, int devfn)
+{
+    PnvPhb4DMASpace *ds;
+
+    QLIST_FOREACH(ds, &phb->dma_spaces, list) {
+        if (ds->bus == bus && ds->devfn == devfn) {
+            break;
+        }
+    }
+    return ds;
+}
+
+static AddressSpace *pnv_phb4_dma_iommu(PCIBus *bus, void *opaque, int devfn)
+{
+    PnvPHB4 *phb = opaque;
+    PnvPhb4DMASpace *ds;
+    char name[32];
+
+    ds = pnv_phb4_dma_find(phb, bus, devfn);
+
+    if (ds == NULL) {
+        ds = g_malloc0(sizeof(PnvPhb4DMASpace));
+        ds->bus = bus;
+        ds->devfn = devfn;
+        ds->pe_num = PHB_INVALID_PE;
+        ds->phb = phb;
+        snprintf(name, sizeof(name), "phb4-%d.%d-iommu", phb->chip_id,
+                 phb->phb_id);
+        memory_region_init_iommu(&ds->dma_mr, sizeof(ds->dma_mr),
+                                 TYPE_PNV_PHB4_IOMMU_MEMORY_REGION,
+                                 OBJECT(phb), name, UINT64_MAX);
+        address_space_init(&ds->dma_as, MEMORY_REGION(&ds->dma_mr),
+                           name);
+        memory_region_init_io(&ds->msi32_mr, OBJECT(phb), &pnv_phb4_msi_ops,
+                              ds, "msi32", 0x10000);
+        memory_region_init_io(&ds->msi64_mr, OBJECT(phb), &pnv_phb4_msi_ops,
+                              ds, "msi64", 0x100000);
+        pnv_phb4_update_msi_regions(ds);
+
+        QLIST_INSERT_HEAD(&phb->dma_spaces, ds, list);
+    }
+    return &ds->dma_as;
+}
+
+static void pnv_phb4_instance_init(Object *obj)
+{
+    PnvPHB4 *phb = PNV_PHB4(obj);
+
+    QLIST_INIT(&phb->dma_spaces);
+
+    /* XIVE interrupt source object */
+    object_initialize_child(obj, "source", &phb->xsrc, sizeof(XiveSource),
+                            TYPE_XIVE_SOURCE, &error_abort, NULL);
+
+    /* Root Port */
+    object_initialize_child(obj, "root", &phb->root, sizeof(phb->root),
+                            TYPE_PNV_PHB4_ROOT_PORT, &error_abort, NULL);
+
+    qdev_prop_set_int32(DEVICE(&phb->root), "addr", PCI_DEVFN(0, 0));
+    qdev_prop_set_bit(DEVICE(&phb->root), "multifunction", false);
+}
+
+static void pnv_phb4_realize(DeviceState *dev, Error **errp)
+{
+    PnvPHB4 *phb = PNV_PHB4(dev);
+    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
+    XiveSource *xsrc = &phb->xsrc;
+    Error *local_err = NULL;
+    int nr_irqs;
+    char name[32];
+
+    assert(phb->stack);
+
+    /* Set the "big_phb" flag */
+    phb->big_phb = phb->phb_id == 0 || phb->phb_id == 3;
+
+    /* Controller Registers */
+    snprintf(name, sizeof(name), "phb4-%d.%d-regs", phb->chip_id,
+             phb->phb_id);
+    memory_region_init_io(&phb->mr_regs, OBJECT(phb), &pnv_phb4_reg_ops, phb,
+                          name, 0x2000);
+
+    /*
+     * PHB4 doesn't support IO space. However, qemu gets very upset if
+     * we don't have an IO region to anchor IO BARs onto so we just
+     * initialize one which we never hook up to anything
+     */
+
+    snprintf(name, sizeof(name), "phb4-%d.%d-pci-io", phb->chip_id,
+             phb->phb_id);
+    memory_region_init(&phb->pci_io, OBJECT(phb), name, 0x10000);
+
+    snprintf(name, sizeof(name), "phb4-%d.%d-pci-mmio", phb->chip_id,
+             phb->phb_id);
+    memory_region_init(&phb->pci_mmio, OBJECT(phb), name,
+                       PCI_MMIO_TOTAL_SIZE);
+
+    pci->bus = pci_register_root_bus(dev, "root-bus",
+                                     pnv_phb4_set_irq, pnv_phb4_map_irq, phb,
+                                     &phb->pci_mmio, &phb->pci_io,
+                                     0, 4, TYPE_PNV_PHB4_ROOT_BUS);
+    pci_setup_iommu(pci->bus, pnv_phb4_dma_iommu, phb);
+
+    /* Add a single Root port */
+    qdev_prop_set_uint8(DEVICE(&phb->root), "chassis", phb->chip_id);
+    qdev_prop_set_uint16(DEVICE(&phb->root), "slot", phb->phb_id);
+    qdev_set_parent_bus(DEVICE(&phb->root), BUS(pci->bus));
+    qdev_init_nofail(DEVICE(&phb->root));
+
+    /* Setup XIVE Source */
+    if (phb->big_phb) {
+        nr_irqs = PNV_PHB4_MAX_INTs;
+    } else {
+        nr_irqs = PNV_PHB4_MAX_INTs >> 1;
+    }
+    object_property_set_int(OBJECT(xsrc), nr_irqs, "nr-irqs", &error_fatal);
+    object_property_set_link(OBJECT(xsrc), OBJECT(phb), "xive", &error_fatal);
+    object_property_set_bool(OBJECT(xsrc), true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    pnv_phb4_update_xsrc(phb);
+
+    phb->qirqs = qemu_allocate_irqs(xive_source_set_irq, xsrc, xsrc->nr_irqs);
+}
+
+static void pnv_phb4_reset(DeviceState *dev)
+{
+    PnvPHB4 *phb = PNV_PHB4(dev);
+    PCIDevice *root_dev = PCI_DEVICE(&phb->root);
+
+    /*
+     * Configure PCI device id at reset using a property.
+     */
+    pci_config_set_vendor_id(root_dev->config, PCI_VENDOR_ID_IBM);
+    pci_config_set_device_id(root_dev->config, phb->device_id);
+}
+
+static const char *pnv_phb4_root_bus_path(PCIHostState *host_bridge,
+                                          PCIBus *rootbus)
+{
+    PnvPHB4 *phb = PNV_PHB4(host_bridge);
+
+    snprintf(phb->bus_path, sizeof(phb->bus_path), "00%02x:%02x",
+             phb->chip_id, phb->phb_id);
+    return phb->bus_path;
+}
+
+static void pnv_phb4_xive_notify(XiveNotifier *xf, uint32_t srcno)
+{
+    PnvPHB4 *phb = PNV_PHB4(xf);
+    uint64_t notif_port = phb->regs[PHB_INT_NOTIFY_ADDR >> 3];
+    uint32_t offset = phb->regs[PHB_INT_NOTIFY_INDEX >> 3];
+    uint64_t data = XIVE_TRIGGER_PQ | offset | srcno;
+    MemTxResult result;
+
+    address_space_stq_be(&address_space_memory, notif_port, data,
+                         MEMTXATTRS_UNSPECIFIED, &result);
+    if (result != MEMTX_OK) {
+        phb_error(phb, "trigger failed @%"HWADDR_PRIx, notif_port);
+        return;
+    }
+}
+
+static Property pnv_phb4_properties[] = {
+        DEFINE_PROP_UINT32("index", PnvPHB4, phb_id, 0),
+        DEFINE_PROP_UINT32("chip-id", PnvPHB4, chip_id, 0),
+        DEFINE_PROP_UINT64("version", PnvPHB4, version, 0),
+        DEFINE_PROP_UINT16("device-id", PnvPHB4, device_id, 0),
+        DEFINE_PROP_LINK("stack", PnvPHB4, stack, TYPE_PNV_PHB4_PEC_STACK,
+                         PnvPhb4PecStack *),
+        DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pnv_phb4_class_init(ObjectClass *klass, void *data)
+{
+    PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    XiveNotifierClass *xfc = XIVE_NOTIFIER_CLASS(klass);
+
+    hc->root_bus_path   = pnv_phb4_root_bus_path;
+    dc->realize         = pnv_phb4_realize;
+    dc->props           = pnv_phb4_properties;
+    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+    dc->user_creatable  = true;
+    dc->reset           = pnv_phb4_reset;
+
+    xfc->notify         = pnv_phb4_xive_notify;
+}
+
+static const TypeInfo pnv_phb4_type_info = {
+    .name          = TYPE_PNV_PHB4,
+    .parent        = TYPE_PCIE_HOST_BRIDGE,
+    .instance_init = pnv_phb4_instance_init,
+    .instance_size = sizeof(PnvPHB4),
+    .class_init    = pnv_phb4_class_init,
+    .interfaces = (InterfaceInfo[]) {
+            { TYPE_XIVE_NOTIFIER },
+            { },
+    }
+};
+
+static void pnv_phb4_root_bus_class_init(ObjectClass *klass, void *data)
+{
+    BusClass *k = BUS_CLASS(klass);
+
+    /*
+     * PHB4 has only a single root complex. Enforce the limit on the
+     * parent bus
+     */
+    k->max_dev = 1;
+}
+
+static const TypeInfo pnv_phb4_root_bus_info = {
+    .name = TYPE_PNV_PHB4_ROOT_BUS,
+    .parent = TYPE_PCIE_BUS,
+    .class_init = pnv_phb4_root_bus_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { INTERFACE_PCIE_DEVICE },
+        { }
+    },
+};
+
+static void pnv_phb4_root_port_reset(DeviceState *dev)
+{
+    PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
+    PCIDevice *d = PCI_DEVICE(dev);
+    uint8_t *conf = d->config;
+
+    rpc->parent_reset(dev);
+
+    pci_byte_test_and_set_mask(conf + PCI_IO_BASE,
+                               PCI_IO_RANGE_MASK & 0xff);
+    pci_byte_test_and_clear_mask(conf + PCI_IO_LIMIT,
+                                 PCI_IO_RANGE_MASK & 0xff);
+    pci_set_word(conf + PCI_MEMORY_BASE, 0);
+    pci_set_word(conf + PCI_MEMORY_LIMIT, 0xfff0);
+    pci_set_word(conf + PCI_PREF_MEMORY_BASE, 0x1);
+    pci_set_word(conf + PCI_PREF_MEMORY_LIMIT, 0xfff1);
+    pci_set_long(conf + PCI_PREF_BASE_UPPER32, 0x1); /* Hack */
+    pci_set_long(conf + PCI_PREF_LIMIT_UPPER32, 0xffffffff);
+}
+
+static void pnv_phb4_root_port_realize(DeviceState *dev, Error **errp)
+{
+    PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
+    Error *local_err = NULL;
+
+    rpc->parent_realize(dev, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+static void pnv_phb4_root_port_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    PCIERootPortClass *rpc = PCIE_ROOT_PORT_CLASS(klass);
+
+    dc->desc     = "IBM PHB4 PCIE Root Port";
+
+    device_class_set_parent_realize(dc, pnv_phb4_root_port_realize,
+                                    &rpc->parent_realize);
+    device_class_set_parent_reset(dc, pnv_phb4_root_port_reset,
+                                  &rpc->parent_reset);
+
+    k->vendor_id = PCI_VENDOR_ID_IBM;
+    k->device_id = PNV_PHB4_DEVICE_ID;
+    k->revision  = 0;
+
+    rpc->exp_offset = 0x48;
+    rpc->aer_offset = 0x100;
+
+    dc->reset = &pnv_phb4_root_port_reset;
+}
+
+static const TypeInfo pnv_phb4_root_port_info = {
+    .name          = TYPE_PNV_PHB4_ROOT_PORT,
+    .parent        = TYPE_PCIE_ROOT_PORT,
+    .instance_size = sizeof(PnvPHB4RootPort),
+    .class_init    = pnv_phb4_root_port_class_init,
+};
+
+static void pnv_phb4_register_types(void)
+{
+    type_register_static(&pnv_phb4_root_bus_info);
+    type_register_static(&pnv_phb4_root_port_info);
+    type_register_static(&pnv_phb4_type_info);
+    type_register_static(&pnv_phb4_iommu_memory_region_info);
+}
+
+type_init(pnv_phb4_register_types);
+
+void pnv_phb4_update_regions(PnvPhb4PecStack *stack)
+{
+    PnvPHB4 *phb = &stack->phb;
+
+    /* Unmap first always */
+    if (memory_region_is_mapped(&phb->mr_regs)) {
+        memory_region_del_subregion(&stack->phbbar, &phb->mr_regs);
+    }
+    if (memory_region_is_mapped(&phb->xsrc.esb_mmio)) {
+        memory_region_del_subregion(&stack->intbar, &phb->xsrc.esb_mmio);
+    }
+
+    /* Map registers if enabled */
+    if (memory_region_is_mapped(&stack->phbbar)) {
+        memory_region_add_subregion(&stack->phbbar, 0, &phb->mr_regs);
+    }
+
+    /* Map ESB if enabled */
+    if (memory_region_is_mapped(&stack->intbar)) {
+        memory_region_add_subregion(&stack->intbar, 0, &phb->xsrc.esb_mmio);
+    }
+
+    /* Check/update m32 */
+    pnv_phb4_check_all_mbt(phb);
+}
+
+void pnv_phb4_pic_print_info(PnvPHB4 *phb, Monitor *mon)
+{
+    uint32_t offset = phb->regs[PHB_INT_NOTIFY_INDEX >> 3];
+
+    monitor_printf(mon, "PHB4[%x:%x] Source %08x .. %08x\n",
+                   phb->chip_id, phb->phb_id,
+                   offset, offset + phb->xsrc.nr_irqs - 1);
+    xive_source_pic_print_info(&phb->xsrc, 0, mon);
+}
diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c
new file mode 100644
index 000000000000..ea400bf6a1fb
--- /dev/null
+++ b/hw/pci-host/pnv_phb4_pec.c
@@ -0,0 +1,593 @@
+/*
+ * QEMU PowerPC PowerNV (POWER9) PHB4 model
+ *
+ * Copyright (c) 2018-2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "qemu/log.h"
+#include "target/ppc/cpu.h"
+#include "hw/ppc/fdt.h"
+#include "hw/pci-host/pnv_phb4_regs.h"
+#include "hw/pci-host/pnv_phb4.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/ppc/pnv.h"
+#include "hw/qdev-properties.h"
+
+#include <libfdt.h>
+
+#define phb_pec_error(pec, fmt, ...)                                    \
+    qemu_log_mask(LOG_GUEST_ERROR, "phb4_pec[%d:%d]: " fmt "\n",        \
+                  (pec)->chip_id, (pec)->index, ## __VA_ARGS__)
+
+
+static uint64_t pnv_pec_nest_xscom_read(void *opaque, hwaddr addr,
+                                        unsigned size)
+{
+    PnvPhb4PecState *pec = PNV_PHB4_PEC(opaque);
+    uint32_t reg = addr >> 3;
+
+    /* TODO: add list of allowed registers and error out if not */
+    return pec->nest_regs[reg];
+}
+
+static void pnv_pec_nest_xscom_write(void *opaque, hwaddr addr,
+                                     uint64_t val, unsigned size)
+{
+    PnvPhb4PecState *pec = PNV_PHB4_PEC(opaque);
+    uint32_t reg = addr >> 3;
+
+    switch (reg) {
+    case PEC_NEST_PBCQ_HW_CONFIG:
+    case PEC_NEST_DROP_PRIO_CTRL:
+    case PEC_NEST_PBCQ_ERR_INJECT:
+    case PEC_NEST_PCI_NEST_CLK_TRACE_CTL:
+    case PEC_NEST_PBCQ_PMON_CTRL:
+    case PEC_NEST_PBCQ_PBUS_ADDR_EXT:
+    case PEC_NEST_PBCQ_PRED_VEC_TIMEOUT:
+    case PEC_NEST_CAPP_CTRL:
+    case PEC_NEST_PBCQ_READ_STK_OVR:
+    case PEC_NEST_PBCQ_WRITE_STK_OVR:
+    case PEC_NEST_PBCQ_STORE_STK_OVR:
+    case PEC_NEST_PBCQ_RETRY_BKOFF_CTRL:
+        pec->nest_regs[reg] = val;
+        break;
+    default:
+        phb_pec_error(pec, "%s @0x%"HWADDR_PRIx"=%"PRIx64, __func__,
+                      addr, val);
+    }
+}
+
+static const MemoryRegionOps pnv_pec_nest_xscom_ops = {
+    .read = pnv_pec_nest_xscom_read,
+    .write = pnv_pec_nest_xscom_write,
+    .valid.min_access_size = 8,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 8,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static uint64_t pnv_pec_pci_xscom_read(void *opaque, hwaddr addr,
+                                       unsigned size)
+{
+    PnvPhb4PecState *pec = PNV_PHB4_PEC(opaque);
+    uint32_t reg = addr >> 3;
+
+    /* TODO: add list of allowed registers and error out if not */
+    return pec->pci_regs[reg];
+}
+
+static void pnv_pec_pci_xscom_write(void *opaque, hwaddr addr,
+                                    uint64_t val, unsigned size)
+{
+    PnvPhb4PecState *pec = PNV_PHB4_PEC(opaque);
+    uint32_t reg = addr >> 3;
+
+    switch (reg) {
+    case PEC_PCI_PBAIB_HW_CONFIG:
+    case PEC_PCI_PBAIB_READ_STK_OVR:
+        pec->pci_regs[reg] = val;
+        break;
+    default:
+        phb_pec_error(pec, "%s @0x%"HWADDR_PRIx"=%"PRIx64, __func__,
+                      addr, val);
+    }
+}
+
+static const MemoryRegionOps pnv_pec_pci_xscom_ops = {
+    .read = pnv_pec_pci_xscom_read,
+    .write = pnv_pec_pci_xscom_write,
+    .valid.min_access_size = 8,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 8,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static uint64_t pnv_pec_stk_nest_xscom_read(void *opaque, hwaddr addr,
+                                            unsigned size)
+{
+    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
+    uint32_t reg = addr >> 3;
+
+    /* TODO: add list of allowed registers and error out if not */
+    return stack->nest_regs[reg];
+}
+
+static void pnv_pec_stk_update_map(PnvPhb4PecStack *stack)
+{
+    PnvPhb4PecState *pec = stack->pec;
+    MemoryRegion *sysmem = pec->system_memory;
+    uint64_t bar_en = stack->nest_regs[PEC_NEST_STK_BAR_EN];
+    uint64_t bar, mask, size;
+    char name[64];
+
+    /*
+     * NOTE: This will really not work well if those are remapped
+     * after the PHB has created its sub regions. We could do better
+     * if we had a way to resize regions but we don't really care
+     * that much in practice as the stuff below really only happens
+     * once early during boot
+     */
+
+    /* Handle unmaps */
+    if (memory_region_is_mapped(&stack->mmbar0) &&
+        !(bar_en & PEC_NEST_STK_BAR_EN_MMIO0)) {
+        memory_region_del_subregion(sysmem, &stack->mmbar0);
+    }
+    if (memory_region_is_mapped(&stack->mmbar1) &&
+        !(bar_en & PEC_NEST_STK_BAR_EN_MMIO1)) {
+        memory_region_del_subregion(sysmem, &stack->mmbar1);
+    }
+    if (memory_region_is_mapped(&stack->phbbar) &&
+        !(bar_en & PEC_NEST_STK_BAR_EN_PHB)) {
+        memory_region_del_subregion(sysmem, &stack->phbbar);
+    }
+    if (memory_region_is_mapped(&stack->intbar) &&
+        !(bar_en & PEC_NEST_STK_BAR_EN_INT)) {
+        memory_region_del_subregion(sysmem, &stack->intbar);
+    }
+
+    /* Update PHB */
+    pnv_phb4_update_regions(stack);
+
+    /* Handle maps */
+    if (!memory_region_is_mapped(&stack->mmbar0) &&
+        (bar_en & PEC_NEST_STK_BAR_EN_MMIO0)) {
+        bar = stack->nest_regs[PEC_NEST_STK_MMIO_BAR0] >> 8;
+        mask = stack->nest_regs[PEC_NEST_STK_MMIO_BAR0_MASK];
+        size = ((~mask) >> 8) + 1;
+        snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-mmio0",
+                 pec->chip_id, pec->index, stack->stack_no);
+        memory_region_init(&stack->mmbar0, OBJECT(stack), name, size);
+        memory_region_add_subregion(sysmem, bar, &stack->mmbar0);
+        stack->mmio0_base = bar;
+        stack->mmio0_size = size;
+    }
+    if (!memory_region_is_mapped(&stack->mmbar1) &&
+        (bar_en & PEC_NEST_STK_BAR_EN_MMIO1)) {
+        bar = stack->nest_regs[PEC_NEST_STK_MMIO_BAR1] >> 8;
+        mask = stack->nest_regs[PEC_NEST_STK_MMIO_BAR1_MASK];
+        size = ((~mask) >> 8) + 1;
+        snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-mmio1",
+                 pec->chip_id, pec->index, stack->stack_no);
+        memory_region_init(&stack->mmbar1, OBJECT(stack), name, size);
+        memory_region_add_subregion(sysmem, bar, &stack->mmbar1);
+        stack->mmio1_base = bar;
+        stack->mmio1_size = size;
+    }
+    if (!memory_region_is_mapped(&stack->phbbar) &&
+        (bar_en & PEC_NEST_STK_BAR_EN_PHB)) {
+        bar = stack->nest_regs[PEC_NEST_STK_PHB_REGS_BAR] >> 8;
+        size = PNV_PHB4_NUM_REGS << 3;
+        snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-phb",
+                 pec->chip_id, pec->index, stack->stack_no);
+        memory_region_init(&stack->phbbar, OBJECT(stack), name, size);
+        memory_region_add_subregion(sysmem, bar, &stack->phbbar);
+    }
+    if (!memory_region_is_mapped(&stack->intbar) &&
+        (bar_en & PEC_NEST_STK_BAR_EN_INT)) {
+        bar = stack->nest_regs[PEC_NEST_STK_INT_BAR] >> 8;
+        size = PNV_PHB4_MAX_INTs << 16;
+        snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-int",
+                 stack->pec->chip_id, stack->pec->index, stack->stack_no);
+        memory_region_init(&stack->intbar, OBJECT(stack), name, size);
+        memory_region_add_subregion(sysmem, bar, &stack->intbar);
+    }
+
+    /* Update PHB */
+    pnv_phb4_update_regions(stack);
+}
+
+static void pnv_pec_stk_nest_xscom_write(void *opaque, hwaddr addr,
+                                         uint64_t val, unsigned size)
+{
+    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
+    PnvPhb4PecState *pec = stack->pec;
+    uint32_t reg = addr >> 3;
+
+    switch (reg) {
+    case PEC_NEST_STK_PCI_NEST_FIR:
+        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] = val;
+        break;
+    case PEC_NEST_STK_PCI_NEST_FIR_CLR:
+        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] &= val;
+        break;
+    case PEC_NEST_STK_PCI_NEST_FIR_SET:
+        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] |= val;
+        break;
+    case PEC_NEST_STK_PCI_NEST_FIR_MSK:
+        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] = val;
+        break;
+    case PEC_NEST_STK_PCI_NEST_FIR_MSKC:
+        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] &= val;
+        break;
+    case PEC_NEST_STK_PCI_NEST_FIR_MSKS:
+        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] |= val;
+        break;
+    case PEC_NEST_STK_PCI_NEST_FIR_ACT0:
+    case PEC_NEST_STK_PCI_NEST_FIR_ACT1:
+        stack->nest_regs[reg] = val;
+        break;
+    case PEC_NEST_STK_PCI_NEST_FIR_WOF:
+        stack->nest_regs[reg] = 0;
+        break;
+    case PEC_NEST_STK_ERR_REPORT_0:
+    case PEC_NEST_STK_ERR_REPORT_1:
+    case PEC_NEST_STK_PBCQ_GNRL_STATUS:
+        /* Flag error ? */
+        break;
+    case PEC_NEST_STK_PBCQ_MODE:
+        stack->nest_regs[reg] = val & 0xff00000000000000ull;
+        break;
+    case PEC_NEST_STK_MMIO_BAR0:
+    case PEC_NEST_STK_MMIO_BAR0_MASK:
+    case PEC_NEST_STK_MMIO_BAR1:
+    case PEC_NEST_STK_MMIO_BAR1_MASK:
+        if (stack->nest_regs[PEC_NEST_STK_BAR_EN] &
+            (PEC_NEST_STK_BAR_EN_MMIO0 |
+             PEC_NEST_STK_BAR_EN_MMIO1)) {
+            phb_pec_error(pec, "Changing enabled BAR unsupported");
+        }
+        stack->nest_regs[reg] = val & 0xffffffffff000000ull;
+        break;
+    case PEC_NEST_STK_PHB_REGS_BAR:
+        if (stack->nest_regs[PEC_NEST_STK_BAR_EN] & PEC_NEST_STK_BAR_EN_PHB) {
+            phb_pec_error(pec, "Changing enabled BAR unsupported");
+        }
+        stack->nest_regs[reg] = val & 0xffffffffffc00000ull;
+        break;
+    case PEC_NEST_STK_INT_BAR:
+        if (stack->nest_regs[PEC_NEST_STK_BAR_EN] & PEC_NEST_STK_BAR_EN_INT) {
+            phb_pec_error(pec, "Changing enabled BAR unsupported");
+        }
+        stack->nest_regs[reg] = val & 0xfffffff000000000ull;
+        break;
+    case PEC_NEST_STK_BAR_EN:
+        stack->nest_regs[reg] = val & 0xf000000000000000ull;
+        pnv_pec_stk_update_map(stack);
+        break;
+    case PEC_NEST_STK_DATA_FRZ_TYPE:
+    case PEC_NEST_STK_PBCQ_TUN_BAR:
+        /* Not used for now */
+        stack->nest_regs[reg] = val;
+        break;
+    default:
+        qemu_log_mask(LOG_UNIMP, "phb4_pec: nest_xscom_write 0x%"HWADDR_PRIx
+                      "=%"PRIx64"\n", addr, val);
+    }
+}
+
+static const MemoryRegionOps pnv_pec_stk_nest_xscom_ops = {
+    .read = pnv_pec_stk_nest_xscom_read,
+    .write = pnv_pec_stk_nest_xscom_write,
+    .valid.min_access_size = 8,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 8,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static uint64_t pnv_pec_stk_pci_xscom_read(void *opaque, hwaddr addr,
+                                           unsigned size)
+{
+    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
+    uint32_t reg = addr >> 3;
+
+    /* TODO: add list of allowed registers and error out if not */
+    return stack->pci_regs[reg];
+}
+
+static void pnv_pec_stk_pci_xscom_write(void *opaque, hwaddr addr,
+                                        uint64_t val, unsigned size)
+{
+    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
+    uint32_t reg = addr >> 3;
+
+    switch (reg) {
+    case PEC_PCI_STK_PCI_FIR:
+        stack->pci_regs[reg] = val;
+        break;
+    case PEC_PCI_STK_PCI_FIR_CLR:
+        stack->pci_regs[PEC_PCI_STK_PCI_FIR] &= val;
+        break;
+    case PEC_PCI_STK_PCI_FIR_SET:
+        stack->pci_regs[PEC_PCI_STK_PCI_FIR] |= val;
+        break;
+    case PEC_PCI_STK_PCI_FIR_MSK:
+        stack->pci_regs[reg] = val;
+        break;
+    case PEC_PCI_STK_PCI_FIR_MSKC:
+        stack->pci_regs[PEC_PCI_STK_PCI_FIR_MSK] &= val;
+        break;
+    case PEC_PCI_STK_PCI_FIR_MSKS:
+        stack->pci_regs[PEC_PCI_STK_PCI_FIR_MSK] |= val;
+        break;
+    case PEC_PCI_STK_PCI_FIR_ACT0:
+    case PEC_PCI_STK_PCI_FIR_ACT1:
+        stack->pci_regs[reg] = val;
+        break;
+    case PEC_PCI_STK_PCI_FIR_WOF:
+        stack->pci_regs[reg] = 0;
+        break;
+    case PEC_PCI_STK_ETU_RESET:
+        stack->pci_regs[reg] = val & 0x8000000000000000ull;
+        /* TODO: Implement reset */
+        break;
+    case PEC_PCI_STK_PBAIB_ERR_REPORT:
+        break;
+    case PEC_PCI_STK_PBAIB_TX_CMD_CRED:
+    case PEC_PCI_STK_PBAIB_TX_DAT_CRED:
+        stack->pci_regs[reg] = val;
+        break;
+    default:
+        qemu_log_mask(LOG_UNIMP, "phb4_pec_stk: pci_xscom_write 0x%"HWADDR_PRIx
+                      "=%"PRIx64"\n", addr, val);
+    }
+}
+
+static const MemoryRegionOps pnv_pec_stk_pci_xscom_ops = {
+    .read = pnv_pec_stk_pci_xscom_read,
+    .write = pnv_pec_stk_pci_xscom_write,
+    .valid.min_access_size = 8,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 8,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static void pnv_pec_instance_init(Object *obj)
+{
+    PnvPhb4PecState *pec = PNV_PHB4_PEC(obj);
+    int i;
+
+    for (i = 0; i < PHB4_PEC_MAX_STACKS; i++) {
+        object_initialize_child(obj, "stack[*]", &pec->stacks[i],
+                                sizeof(pec->stacks[i]), TYPE_PNV_PHB4_PEC_STACK,
+                                &error_abort, NULL);
+    }
+}
+
+static void pnv_pec_realize(DeviceState *dev, Error **errp)
+{
+    PnvPhb4PecState *pec = PNV_PHB4_PEC(dev);
+    Error *local_err = NULL;
+    char name[64];
+    int i;
+
+    assert(pec->system_memory);
+
+    /* Create stacks */
+    for (i = 0; i < pec->num_stacks; i++) {
+        PnvPhb4PecStack *stack = &pec->stacks[i];
+        Object *stk_obj = OBJECT(stack);
+
+        object_property_set_int(stk_obj, i, "stack-no", &error_abort);
+        object_property_set_link(stk_obj, OBJECT(pec), "pec", &error_abort);
+        object_property_set_bool(stk_obj, true, "realized", &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
+    /* Initialize the XSCOM regions for the PEC registers */
+    snprintf(name, sizeof(name), "xscom-pec-%d.%d-nest", pec->chip_id,
+             pec->index);
+    pnv_xscom_region_init(&pec->nest_regs_mr, OBJECT(dev),
+                          &pnv_pec_nest_xscom_ops, pec, name,
+                          PHB4_PEC_NEST_REGS_COUNT);
+
+    snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci", pec->chip_id,
+             pec->index);
+    pnv_xscom_region_init(&pec->pci_regs_mr, OBJECT(dev),
+                          &pnv_pec_pci_xscom_ops, pec, name,
+                          PHB4_PEC_PCI_REGS_COUNT);
+}
+
+static int pnv_pec_dt_xscom(PnvXScomInterface *dev, void *fdt,
+                            int xscom_offset)
+{
+    PnvPhb4PecState *pec = PNV_PHB4_PEC(dev);
+    PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(dev);
+    uint32_t nbase = pecc->xscom_nest_base(pec);
+    uint32_t pbase = pecc->xscom_pci_base(pec);
+    int offset, i;
+    char *name;
+    uint32_t reg[] = {
+        cpu_to_be32(nbase),
+        cpu_to_be32(pecc->xscom_nest_size),
+        cpu_to_be32(pbase),
+        cpu_to_be32(pecc->xscom_pci_size),
+    };
+
+    name = g_strdup_printf("pbcq@%x", nbase);
+    offset = fdt_add_subnode(fdt, xscom_offset, name);
+    _FDT(offset);
+    g_free(name);
+
+    _FDT((fdt_setprop(fdt, offset, "reg", reg, sizeof(reg))));
+
+    _FDT((fdt_setprop_cell(fdt, offset, "ibm,pec-index", pec->index)));
+    _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 1)));
+    _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0)));
+    _FDT((fdt_setprop(fdt, offset, "compatible", pecc->compat,
+                      pecc->compat_size)));
+
+    for (i = 0; i < pec->num_stacks; i++) {
+        PnvPhb4PecStack *stack = &pec->stacks[i];
+        PnvPHB4 *phb = &stack->phb;
+        int stk_offset;
+
+        name = g_strdup_printf("stack@%x", i);
+        stk_offset = fdt_add_subnode(fdt, offset, name);
+        _FDT(stk_offset);
+        g_free(name);
+        _FDT((fdt_setprop(fdt, stk_offset, "compatible", pecc->stk_compat,
+                          pecc->stk_compat_size)));
+        _FDT((fdt_setprop_cell(fdt, stk_offset, "reg", i)));
+        _FDT((fdt_setprop_cell(fdt, stk_offset, "ibm,phb-index", phb->phb_id)));
+    }
+
+    return 0;
+}
+
+static Property pnv_pec_properties[] = {
+    DEFINE_PROP_UINT32("index", PnvPhb4PecState, index, 0),
+    DEFINE_PROP_UINT32("num-stacks", PnvPhb4PecState, num_stacks, 0),
+    DEFINE_PROP_UINT32("chip-id", PnvPhb4PecState, chip_id, 0),
+    DEFINE_PROP_LINK("system-memory", PnvPhb4PecState, system_memory,
+                     TYPE_MEMORY_REGION, MemoryRegion *),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static uint32_t pnv_pec_xscom_pci_base(PnvPhb4PecState *pec)
+{
+    return PNV9_XSCOM_PEC_PCI_BASE + 0x1000000 * pec->index;
+}
+
+static uint32_t pnv_pec_xscom_nest_base(PnvPhb4PecState *pec)
+{
+    return PNV9_XSCOM_PEC_NEST_BASE + 0x400 * pec->index;
+}
+
+static void pnv_pec_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PnvXScomInterfaceClass *xdc = PNV_XSCOM_INTERFACE_CLASS(klass);
+    PnvPhb4PecClass *pecc = PNV_PHB4_PEC_CLASS(klass);
+    static const char compat[] = "ibm,power9-pbcq";
+    static const char stk_compat[] = "ibm,power9-phb-stack";
+
+    xdc->dt_xscom = pnv_pec_dt_xscom;
+
+    dc->realize = pnv_pec_realize;
+    dc->props = pnv_pec_properties;
+
+    pecc->xscom_nest_base = pnv_pec_xscom_nest_base;
+    pecc->xscom_pci_base  = pnv_pec_xscom_pci_base;
+    pecc->xscom_nest_size = PNV9_XSCOM_PEC_NEST_SIZE;
+    pecc->xscom_pci_size  = PNV9_XSCOM_PEC_PCI_SIZE;
+    pecc->compat = compat;
+    pecc->compat_size = sizeof(compat);
+    pecc->stk_compat = stk_compat;
+    pecc->stk_compat_size = sizeof(stk_compat);
+}
+
+static const TypeInfo pnv_pec_type_info = {
+    .name          = TYPE_PNV_PHB4_PEC,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(PnvPhb4PecState),
+    .instance_init = pnv_pec_instance_init,
+    .class_init    = pnv_pec_class_init,
+    .class_size    = sizeof(PnvPhb4PecClass),
+    .interfaces    = (InterfaceInfo[]) {
+        { TYPE_PNV_XSCOM_INTERFACE },
+        { }
+    }
+};
+
+static void pnv_pec_stk_instance_init(Object *obj)
+{
+    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(obj);
+
+    object_initialize_child(obj, "phb", &stack->phb, sizeof(stack->phb),
+                            TYPE_PNV_PHB4, &error_abort, NULL);
+}
+
+static void pnv_pec_stk_realize(DeviceState *dev, Error **errp)
+{
+    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(dev);
+    PnvPhb4PecState *pec = stack->pec;
+    char name[64];
+
+    assert(pec);
+
+    /* Initialize the XSCOM regions for the stack registers */
+    snprintf(name, sizeof(name), "xscom-pec-%d.%d-nest-stack-%d",
+             pec->chip_id, pec->index, stack->stack_no);
+    pnv_xscom_region_init(&stack->nest_regs_mr, OBJECT(stack),
+                          &pnv_pec_stk_nest_xscom_ops, stack, name,
+                          PHB4_PEC_NEST_STK_REGS_COUNT);
+
+    snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci-stack-%d",
+             pec->chip_id, pec->index, stack->stack_no);
+    pnv_xscom_region_init(&stack->pci_regs_mr, OBJECT(stack),
+                          &pnv_pec_stk_pci_xscom_ops, stack, name,
+                          PHB4_PEC_PCI_STK_REGS_COUNT);
+
+    /* PHB pass-through */
+    snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci-stack-%d-phb",
+             pec->chip_id, pec->index, stack->stack_no);
+    pnv_xscom_region_init(&stack->phb_regs_mr, OBJECT(&stack->phb),
+                          &pnv_phb4_xscom_ops, &stack->phb, name, 0x40);
+
+    /*
+     * Let the machine/chip realize the PHB object so that it can
+     * customize some fields more easily.
+     */
+}
+
+static Property pnv_pec_stk_properties[] = {
+        DEFINE_PROP_UINT32("stack-no", PnvPhb4PecStack, stack_no, 0),
+        DEFINE_PROP_LINK("pec", PnvPhb4PecStack, pec, TYPE_PNV_PHB4_PEC,
+                         PnvPhb4PecState *),
+        DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pnv_pec_stk_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->props = pnv_pec_stk_properties;
+    dc->realize = pnv_pec_stk_realize;
+
+    /* TODO: reset regs ? */
+}
+
+static const TypeInfo pnv_pec_stk_type_info = {
+    .name          = TYPE_PNV_PHB4_PEC_STACK,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(PnvPhb4PecStack),
+    .instance_init = pnv_pec_stk_instance_init,
+    .class_init    = pnv_pec_stk_class_init,
+    .interfaces    = (InterfaceInfo[]) {
+        { TYPE_PNV_XSCOM_INTERFACE },
+        { }
+    }
+};
+
+static void pnv_pec_register_types(void)
+{
+    type_register_static(&pnv_pec_type_info);
+    type_register_static(&pnv_pec_stk_type_info);
+}
+
+type_init(pnv_pec_register_types);
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index a4b073c2c529..44c74be81b66 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -40,6 +40,7 @@
 #include "hw/intc/intc.h"
 #include "hw/ipmi/ipmi.h"
 #include "target/ppc/mmu-hash64.h"
+#include "hw/pci/msi.h"
 
 #include "hw/ppc/xics.h"
 #include "hw/qdev-properties.h"
@@ -622,9 +623,17 @@ static void pnv_chip_power8_pic_print_info(PnvChip *chip, Monitor *mon)
 static void pnv_chip_power9_pic_print_info(PnvChip *chip, Monitor *mon)
 {
     Pnv9Chip *chip9 = PNV9_CHIP(chip);
+    int i, j;
 
     pnv_xive_pic_print_info(&chip9->xive, mon);
     pnv_psi_pic_print_info(&chip9->psi, mon);
+
+    for (i = 0; i < PNV9_CHIP_MAX_PEC; i++) {
+        PnvPhb4PecState *pec = &chip9->pecs[i];
+        for (j = 0; j < pec->num_stacks; j++) {
+            pnv_phb4_pic_print_info(&pec->stacks[j].phb, mon);
+        }
+    }
 }
 
 static uint64_t pnv_chip_power8_xscom_core_base(PnvChip *chip,
@@ -753,6 +762,9 @@ static void pnv_init(MachineState *machine)
         }
     }
 
+    /* MSIs are supported on this platform */
+    msi_nonbroken = true;
+
     /*
      * Check compatibility of the specified CPU with the machine
      * default.
@@ -1235,7 +1247,10 @@ static void pnv_chip_power8nvl_class_init(ObjectClass *klass, void *data)
 
 static void pnv_chip_power9_instance_init(Object *obj)
 {
+    PnvChip *chip = PNV_CHIP(obj);
     Pnv9Chip *chip9 = PNV9_CHIP(obj);
+    PnvChipClass *pcc = PNV_CHIP_GET_CLASS(obj);
+    int i;
 
     object_initialize_child(obj, "xive", &chip9->xive, sizeof(chip9->xive),
                             TYPE_PNV_XIVE, &error_abort, NULL);
@@ -1253,6 +1268,17 @@ static void pnv_chip_power9_instance_init(Object *obj)
 
     object_initialize_child(obj, "homer",  &chip9->homer, sizeof(chip9->homer),
                             TYPE_PNV9_HOMER, &error_abort, NULL);
+
+    for (i = 0; i < PNV9_CHIP_MAX_PEC; i++) {
+        object_initialize_child(obj, "pec[*]", &chip9->pecs[i],
+                                sizeof(chip9->pecs[i]), TYPE_PNV_PHB4_PEC,
+                                &error_abort, NULL);
+    }
+
+    /*
+     * Number of PHBs is the chip default
+     */
+    chip->num_phbs = pcc->num_phbs;
 }
 
 static void pnv_chip_quad_realize(Pnv9Chip *chip9, Error **errp)
@@ -1281,6 +1307,78 @@ static void pnv_chip_quad_realize(Pnv9Chip *chip9, Error **errp)
     }
 }
 
+static void pnv_chip_power9_phb_realize(PnvChip *chip, Error **errp)
+{
+    Pnv9Chip *chip9 = PNV9_CHIP(chip);
+    Error *local_err = NULL;
+    int i, j;
+    int phb_id = 0;
+
+    for (i = 0; i < PNV9_CHIP_MAX_PEC; i++) {
+        PnvPhb4PecState *pec = &chip9->pecs[i];
+        PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
+        uint32_t pec_nest_base;
+        uint32_t pec_pci_base;
+
+        object_property_set_int(OBJECT(pec), i, "index", &error_fatal);
+        /*
+         * PEC0 -> 1 stack
+         * PEC1 -> 2 stacks
+         * PEC2 -> 3 stacks
+         */
+        object_property_set_int(OBJECT(pec), i + 1, "num-stacks",
+                                &error_fatal);
+        object_property_set_int(OBJECT(pec), chip->chip_id, "chip-id",
+                                 &error_fatal);
+        object_property_set_link(OBJECT(pec), OBJECT(get_system_memory()),
+                                 "system-memory", &error_abort);
+        object_property_set_bool(OBJECT(pec), true, "realized", &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+
+        pec_nest_base = pecc->xscom_nest_base(pec);
+        pec_pci_base = pecc->xscom_pci_base(pec);
+
+        pnv_xscom_add_subregion(chip, pec_nest_base, &pec->nest_regs_mr);
+        pnv_xscom_add_subregion(chip, pec_pci_base, &pec->pci_regs_mr);
+
+        for (j = 0; j < pec->num_stacks && phb_id < chip->num_phbs;
+             j++, phb_id++) {
+            PnvPhb4PecStack *stack = &pec->stacks[j];
+            Object *obj = OBJECT(&stack->phb);
+
+            object_property_set_int(obj, phb_id, "index", &error_fatal);
+            object_property_set_int(obj, chip->chip_id, "chip-id",
+                                    &error_fatal);
+            object_property_set_int(obj, PNV_PHB4_VERSION, "version",
+                                    &error_fatal);
+            object_property_set_int(obj, PNV_PHB4_DEVICE_ID, "device-id",
+                                    &error_fatal);
+            object_property_set_link(obj, OBJECT(stack), "stack", &error_abort);
+            object_property_set_bool(obj, true, "realized", &local_err);
+            if (local_err) {
+                error_propagate(errp, local_err);
+                return;
+            }
+            qdev_set_parent_bus(DEVICE(obj), sysbus_get_default());
+
+            /* Populate the XSCOM address space. */
+            pnv_xscom_add_subregion(chip,
+                                   pec_nest_base + 0x40 * (stack->stack_no + 1),
+                                   &stack->nest_regs_mr);
+            pnv_xscom_add_subregion(chip,
+                                    pec_pci_base + 0x40 * (stack->stack_no + 1),
+                                    &stack->pci_regs_mr);
+            pnv_xscom_add_subregion(chip,
+                                    pec_pci_base + PNV9_XSCOM_PEC_PCI_STK0 +
+                                    0x40 * stack->stack_no,
+                                    &stack->phb_regs_mr);
+        }
+    }
+}
+
 static void pnv_chip_power9_realize(DeviceState *dev, Error **errp)
 {
     PnvChipClass *pcc = PNV_CHIP_GET_CLASS(dev);
@@ -1383,6 +1481,13 @@ static void pnv_chip_power9_realize(DeviceState *dev, Error **errp)
     /* Homer mmio region */
     memory_region_add_subregion(get_system_memory(), PNV9_HOMER_BASE(chip),
                                 &chip9->homer.regs);
+
+    /* PHBs */
+    pnv_chip_power9_phb_realize(chip, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
 }
 
 static uint32_t pnv_chip_power9_xscom_pcba(PnvChip *chip, uint64_t addr)
@@ -1409,6 +1514,7 @@ static void pnv_chip_power9_class_init(ObjectClass *klass, void *data)
     k->xscom_core_base = pnv_chip_power9_xscom_core_base;
     k->xscom_pcba = pnv_chip_power9_xscom_pcba;
     dc->desc = "PowerNV Chip POWER9";
+    k->num_phbs = 6;
 
     device_class_set_parent_realize(dc, pnv_chip_power9_realize,
                                     &k->parent_realize);
@@ -1613,6 +1719,7 @@ static Property pnv_chip_properties[] = {
     DEFINE_PROP_UINT32("nr-cores", PnvChip, nr_cores, 1),
     DEFINE_PROP_UINT64("cores-mask", PnvChip, cores_mask, 0x0),
     DEFINE_PROP_UINT32("nr-threads", PnvChip, nr_threads, 1),
+    DEFINE_PROP_UINT32("num-phbs", PnvChip, num_phbs, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/pci-host/Makefile.objs b/hw/pci-host/Makefile.objs
index 9c466fab0101..8a296e2f93b2 100644
--- a/hw/pci-host/Makefile.objs
+++ b/hw/pci-host/Makefile.objs
@@ -20,3 +20,4 @@ common-obj-$(CONFIG_PCI_EXPRESS_GENERIC_BRIDGE) += gpex.o
 common-obj-$(CONFIG_PCI_EXPRESS_XILINX) += xilinx-pcie.o
 
 common-obj-$(CONFIG_PCI_EXPRESS_DESIGNWARE) += designware.o
+obj-$(CONFIG_POWERNV) += pnv_phb4.o pnv_phb4_pec.o
diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index e27efe9a2459..354828bf132f 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -135,6 +135,8 @@ config XIVE_SPAPR
     default y
     depends on PSERIES
     select XIVE
+    select PCI
+    select PCIE_PORT
 
 config XIVE_KVM
     bool
-- 
2.21.1




* [PATCH 2/2] ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge
  2020-01-27 14:45 [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges Cédric Le Goater
  2020-01-27 14:45 ` [PATCH 1/2] ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge Cédric Le Goater
@ 2020-01-27 14:45 ` Cédric Le Goater
  2020-01-29  6:31 ` [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges David Gibson
  2 siblings, 0 replies; 9+ messages in thread
From: Cédric Le Goater @ 2020-01-27 14:45 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, Nicholas Piggin, qemu-ppc, Cédric Le Goater,
	Oliver O'Halloran

This is a model of the PCIe Host Bridge (PHB3) found on a POWER8
processor. It includes the PowerBus logic interface (PBCQ), IOMMU
support, a single PCIe Gen.3 Root Complex, and support for MSI and LSI
interrupt sources as found on a POWER8 system using the XICS interrupt
controller.

The POWER8 processor comes in different flavors: Venice, Murano,
Naples, each having a different number of PHBs. To make things simpler,
the model provides 3 PHB3s per chip. Some platforms, like the
Firestone, can also couple PHBs on the first chip to provide more
bandwidth, but this is too specific to model in QEMU.

XICS requires some adjustment to support the PHB3 MSI. The changes are
provided here but they could be decoupled in prereq patches.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
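Note on the register definitions: the field masks in pnv_phb3_regs.h
use IBM bit numbering, where bit 0 is the most significant bit of the
64-bit word, and the GETFIELD/SETFIELD helpers derive the shift from
the mask itself. The standalone sketch below illustrates that
convention outside of QEMU. It is only an illustration under stated
assumptions: BIT_BE(), BITMASK_BE(), ctz64_sketch(), getfield() and
setfield() are hypothetical local stand-ins for PPC_BIT(),
PPC_BITMASK(), ctz64(), GETFIELD() and SETFIELD(), and the two masks
merely reuse the bit ranges of PHB_CA_BUS and PHB_CA_DEV.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for PPC_BIT()/PPC_BITMASK(): bit 0 is the MSB. */
#define BIT_BE(bit)         (0x8000000000000000ULL >> (bit))
#define BITMASK_BE(bs, be)  ((BIT_BE(bs) - BIT_BE(be)) | BIT_BE(bs))

/* Portable stand-in for ctz64(); the mask must be non-zero. */
static inline unsigned ctz64_sketch(uint64_t mask)
{
    unsigned n = 0;

    while (!(mask & 1)) {
        mask >>= 1;
        n++;
    }
    return n;
}

/* Extract the field selected by mask, shifted down to bit 63 (LSB). */
static inline uint64_t getfield(uint64_t mask, uint64_t word)
{
    return (word & mask) >> ctz64_sketch(mask);
}

/* Return word with the field selected by mask replaced by value. */
static inline uint64_t setfield(uint64_t mask, uint64_t word, uint64_t value)
{
    return (word & ~mask) | ((value << ctz64_sketch(mask)) & mask);
}

/* Build a config-address-like word from a bus and a device number. */
static inline uint64_t make_config_address(uint8_t bus, uint8_t dev)
{
    uint64_t reg = 0;

    reg = setfield(BITMASK_BE(4, 11), reg, bus);   /* PHB_CA_BUS range */
    reg = setfield(BITMASK_BE(12, 16), reg, dev);  /* PHB_CA_DEV range */
    return reg;
}
```

Calling make_config_address(0x12, 0x1f) and reading the fields back
with getfield() returns the same bus and device numbers, since the two
masks do not overlap. Values wider than a field are truncated by the
final "& mask" in setfield().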
 include/hw/pci-host/pnv_phb3.h      |  164 ++++
 include/hw/pci-host/pnv_phb3_regs.h |  450 ++++++++++
 include/hw/ppc/pnv.h                |    4 +
 include/hw/ppc/pnv_xscom.h          |    9 +
 include/hw/ppc/xics.h               |    5 +
 hw/intc/xics.c                      |   14 +-
 hw/pci-host/pnv_phb3.c              | 1195 +++++++++++++++++++++++++++
 hw/pci-host/pnv_phb3_msi.c          |  349 ++++++++
 hw/pci-host/pnv_phb3_pbcq.c         |  357 ++++++++
 hw/ppc/pnv.c                        |   69 +-
 hw/pci-host/Makefile.objs           |    1 +
 11 files changed, 2614 insertions(+), 3 deletions(-)
 create mode 100644 include/hw/pci-host/pnv_phb3.h
 create mode 100644 include/hw/pci-host/pnv_phb3_regs.h
 create mode 100644 hw/pci-host/pnv_phb3.c
 create mode 100644 hw/pci-host/pnv_phb3_msi.c
 create mode 100644 hw/pci-host/pnv_phb3_pbcq.c

diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h
new file mode 100644
index 000000000000..75b787867a57
--- /dev/null
+++ b/include/hw/pci-host/pnv_phb3.h
@@ -0,0 +1,164 @@
+/*
+ * QEMU PowerPC PowerNV (POWER8) PHB3 model
+ *
+ * Copyright (c) 2014-2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PCI_HOST_PNV_PHB3_H
+#define PCI_HOST_PNV_PHB3_H
+
+#include "hw/pci/pcie_host.h"
+#include "hw/pci/pcie_port.h"
+#include "hw/ppc/xics.h"
+
+typedef struct PnvPHB3 PnvPHB3;
+
+/*
+ * PHB3 XICS Source for MSIs
+ */
+#define TYPE_PHB3_MSI "phb3-msi"
+#define PHB3_MSI(obj) OBJECT_CHECK(Phb3MsiState, (obj), TYPE_PHB3_MSI)
+
+#define PHB3_MAX_MSI     2048
+
+typedef struct Phb3MsiState {
+    ICSState ics;
+    qemu_irq *qirqs;
+
+    PnvPHB3 *phb;
+    uint64_t rba[PHB3_MAX_MSI / 64];
+    uint32_t rba_sum;
+} Phb3MsiState;
+
+void pnv_phb3_msi_update_config(Phb3MsiState *msis, uint32_t base,
+                                uint32_t count);
+void pnv_phb3_msi_send(Phb3MsiState *msis, uint64_t addr, uint16_t data,
+                       int32_t dev_pe);
+void pnv_phb3_msi_ffi(Phb3MsiState *msis, uint64_t val);
+void pnv_phb3_msi_pic_print_info(Phb3MsiState *msis, Monitor *mon);
+
+
+/*
+ * We have one such address space wrapper per possible device under
+ * the PHB since they need to be assigned statically at QEMU device
+ * creation time. The relationship to a PE is established later,
+ * dynamically. This means we can potentially create a lot of these.
+ * Q35 stores them in some kind of radix tree, but we never really
+ * need fast lookups, so for now we simply keep a QLIST of them. A
+ * radix tree can be added later if needed.
+ *
+ * We do cache the PE number to speed things up a bit though.
+ */
+typedef struct PnvPhb3DMASpace {
+    PCIBus *bus;
+    uint8_t devfn;
+    int pe_num;         /* Cached PE number */
+#define PHB_INVALID_PE (-1)
+    PnvPHB3 *phb;
+    AddressSpace dma_as;
+    IOMMUMemoryRegion dma_mr;
+    MemoryRegion msi32_mr;
+    MemoryRegion msi64_mr;
+    QLIST_ENTRY(PnvPhb3DMASpace) list;
+} PnvPhb3DMASpace;
+
+/*
+ * PHB3 Power Bus Common Queue
+ */
+#define TYPE_PNV_PBCQ "pnv-pbcq"
+#define PNV_PBCQ(obj) OBJECT_CHECK(PnvPBCQState, (obj), TYPE_PNV_PBCQ)
+
+typedef struct PnvPBCQState {
+    DeviceState parent;
+
+    uint32_t nest_xbase;
+    uint32_t spci_xbase;
+    uint32_t pci_xbase;
+#define PBCQ_NEST_REGS_COUNT    0x46
+#define PBCQ_PCI_REGS_COUNT     0x15
+#define PBCQ_SPCI_REGS_COUNT    0x5
+
+    uint64_t nest_regs[PBCQ_NEST_REGS_COUNT];
+    uint64_t spci_regs[PBCQ_SPCI_REGS_COUNT];
+    uint64_t pci_regs[PBCQ_PCI_REGS_COUNT];
+    MemoryRegion mmbar0;
+    MemoryRegion mmbar1;
+    MemoryRegion phbbar;
+    uint64_t mmio0_base;
+    uint64_t mmio0_size;
+    uint64_t mmio1_base;
+    uint64_t mmio1_size;
+    PnvPHB3 *phb;
+
+    MemoryRegion xscom_nest_regs;
+    MemoryRegion xscom_pci_regs;
+    MemoryRegion xscom_spci_regs;
+} PnvPBCQState;
+
+/*
+ * PHB3 PCIe Root port
+ */
+#define TYPE_PNV_PHB3_ROOT_BUS "pnv-phb3-root-bus"
+
+#define TYPE_PNV_PHB3_ROOT_PORT "pnv-phb3-root-port"
+
+typedef struct PnvPHB3RootPort {
+    PCIESlot parent_obj;
+} PnvPHB3RootPort;
+
+/*
+ * PHB3 PCIe Host Bridge for PowerNV machines (POWER8)
+ */
+#define TYPE_PNV_PHB3 "pnv-phb3"
+#define PNV_PHB3(obj) OBJECT_CHECK(PnvPHB3, (obj), TYPE_PNV_PHB3)
+
+#define PNV_PHB3_NUM_M64      16
+#define PNV_PHB3_NUM_REGS     (0x1000 >> 3)
+#define PNV_PHB3_NUM_LSI      8
+#define PNV_PHB3_NUM_PE       256
+
+#define PCI_MMIO_TOTAL_SIZE   (0x1ull << 60)
+
+struct PnvPHB3 {
+    PCIExpressHost parent_obj;
+
+    uint32_t chip_id;
+    uint32_t phb_id;
+    char bus_path[8];
+
+    uint64_t regs[PNV_PHB3_NUM_REGS];
+    MemoryRegion mr_regs;
+
+    MemoryRegion mr_m32;
+    MemoryRegion mr_m64[PNV_PHB3_NUM_M64];
+    MemoryRegion pci_mmio;
+    MemoryRegion pci_io;
+
+    uint64_t ioda_LIST[8];
+    uint64_t ioda_LXIVT[8];
+    uint64_t ioda_TVT[512];
+    uint64_t ioda_M64BT[16];
+    uint64_t ioda_MDT[256];
+    uint64_t ioda_PEEV[4];
+
+    uint32_t total_irq;
+    ICSState lsis;
+    qemu_irq *qirqs;
+    Phb3MsiState msis;
+
+    PnvPBCQState pbcq;
+
+    PnvPHB3RootPort root;
+
+    QLIST_HEAD(, PnvPhb3DMASpace) dma_spaces;
+};
+
+uint64_t pnv_phb3_reg_read(void *opaque, hwaddr off, unsigned size);
+void pnv_phb3_reg_write(void *opaque, hwaddr off, uint64_t val, unsigned size);
+void pnv_phb3_update_regions(PnvPHB3 *phb);
+void pnv_phb3_remap_irqs(PnvPHB3 *phb);
+
+#endif /* PCI_HOST_PNV_PHB3_H */
diff --git a/include/hw/pci-host/pnv_phb3_regs.h b/include/hw/pci-host/pnv_phb3_regs.h
new file mode 100644
index 000000000000..a174ef1f7045
--- /dev/null
+++ b/include/hw/pci-host/pnv_phb3_regs.h
@@ -0,0 +1,450 @@
+/*
+ * QEMU PowerPC PowerNV (POWER8) PHB3 model
+ *
+ * Copyright (c) 2013-2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PCI_HOST_PNV_PHB3_REGS_H
+#define PCI_HOST_PNV_PHB3_REGS_H
+
+#include "qemu/host-utils.h"
+
+/*
+ * QEMU version of the GETFIELD/SETFIELD macros
+ *
+ * These are common with the PnvXive model.
+ */
+static inline uint64_t GETFIELD(uint64_t mask, uint64_t word)
+{
+    return (word & mask) >> ctz64(mask);
+}
+
+static inline uint64_t SETFIELD(uint64_t mask, uint64_t word,
+                                uint64_t value)
+{
+    return (word & ~mask) | ((value << ctz64(mask)) & mask);
+}
+
+/*
+ * PBCQ XSCOM registers
+ */
+
+#define PBCQ_NEST_IRSN_COMPARE  0x1a
+#define   PBCQ_NEST_IRSN_COMP         PPC_BITMASK(0, 18)
+#define PBCQ_NEST_IRSN_MASK     0x1b
+#define PBCQ_NEST_LSI_SRC_ID    0x1f
+#define   PBCQ_NEST_LSI_SRC           PPC_BITMASK(0, 7)
+#define PBCQ_NEST_REGS_COUNT    0x46
+#define PBCQ_NEST_MMIO_BAR0     0x40
+#define PBCQ_NEST_MMIO_BAR1     0x41
+#define PBCQ_NEST_PHB_BAR       0x42
+#define PBCQ_NEST_MMIO_MASK0    0x43
+#define PBCQ_NEST_MMIO_MASK1    0x44
+#define PBCQ_NEST_BAR_EN        0x45
+#define   PBCQ_NEST_BAR_EN_MMIO0    PPC_BIT(0)
+#define   PBCQ_NEST_BAR_EN_MMIO1    PPC_BIT(1)
+#define   PBCQ_NEST_BAR_EN_PHB      PPC_BIT(2)
+#define   PBCQ_NEST_BAR_EN_IRSN_RX  PPC_BIT(3)
+#define   PBCQ_NEST_BAR_EN_IRSN_TX  PPC_BIT(4)
+
+#define PBCQ_PCI_REGS_COUNT     0x15
+#define PBCQ_PCI_BAR2           0x0b
+
+#define PBCQ_SPCI_REGS_COUNT    0x5
+#define PBCQ_SPCI_ASB_ADDR      0x0
+#define PBCQ_SPCI_ASB_STATUS    0x1
+#define PBCQ_SPCI_ASB_DATA      0x2
+#define PBCQ_SPCI_AIB_CAPP_EN   0x3
+#define PBCQ_SPCI_CAPP_SEC_TMR  0x4
+
+/*
+ * PHB MMIO registers
+ */
+
+/* PHB Fundamental register set A */
+#define PHB_LSI_SOURCE_ID               0x100
+#define   PHB_LSI_SRC_ID                PPC_BITMASK(5, 12)
+#define PHB_DMA_CHAN_STATUS             0x110
+#define   PHB_DMA_CHAN_ANY_ERR          PPC_BIT(27)
+#define   PHB_DMA_CHAN_ANY_ERR1         PPC_BIT(28)
+#define   PHB_DMA_CHAN_ANY_FREEZE       PPC_BIT(29)
+#define PHB_CPU_LOADSTORE_STATUS        0x120
+#define   PHB_CPU_LS_ANY_ERR            PPC_BIT(27)
+#define   PHB_CPU_LS_ANY_ERR1           PPC_BIT(28)
+#define   PHB_CPU_LS_ANY_FREEZE         PPC_BIT(29)
+#define PHB_DMA_MSI_NODE_ID             0x128
+#define   PHB_DMAMSI_NID_FIXED          PPC_BIT(0)
+#define   PHB_DMAMSI_NID                PPC_BITMASK(24, 31)
+#define PHB_CONFIG_DATA                 0x130
+#define PHB_LOCK0                       0x138
+#define PHB_CONFIG_ADDRESS              0x140
+#define   PHB_CA_ENABLE                 PPC_BIT(0)
+#define   PHB_CA_BUS                    PPC_BITMASK(4, 11)
+#define   PHB_CA_DEV                    PPC_BITMASK(12, 16)
+#define   PHB_CA_FUNC                   PPC_BITMASK(17, 19)
+#define   PHB_CA_REG                    PPC_BITMASK(20, 31)
+#define   PHB_CA_PE                     PPC_BITMASK(40, 47)
+#define PHB_LOCK1                       0x148
+#define PHB_IVT_BAR                     0x150
+#define   PHB_IVT_BAR_ENABLE            PPC_BIT(0)
+#define   PHB_IVT_BASE_ADDRESS_MASK     PPC_BITMASK(14, 48)
+#define   PHB_IVT_LENGTH_MASK           PPC_BITMASK(52, 63)
+#define PHB_RBA_BAR                     0x158
+#define   PHB_RBA_BAR_ENABLE            PPC_BIT(0)
+#define   PHB_RBA_BASE_ADDRESS          PPC_BITMASK(14, 55)
+#define PHB_PHB3_CONFIG                 0x160
+#define   PHB_PHB3C_64B_TCE_EN          PPC_BIT(2)
+#define   PHB_PHB3C_32BIT_MSI_EN        PPC_BIT(8)
+#define   PHB_PHB3C_64BIT_MSI_EN        PPC_BIT(14)
+#define   PHB_PHB3C_M32_EN              PPC_BIT(16)
+#define PHB_RTT_BAR                     0x168
+#define   PHB_RTT_BAR_ENABLE            PPC_BIT(0)
+#define   PHB_RTT_BASE_ADDRESS_MASK     PPC_BITMASK(14, 46)
+#define PHB_PELTV_BAR                   0x188
+#define   PHB_PELTV_BAR_ENABLE          PPC_BIT(0)
+#define   PHB_PELTV_BASE_ADDRESS        PPC_BITMASK(14, 50)
+#define PHB_M32_BASE_ADDR               0x190
+#define PHB_M32_BASE_MASK               0x198
+#define PHB_M32_START_ADDR              0x1a0
+#define PHB_PEST_BAR                    0x1a8
+#define   PHB_PEST_BAR_ENABLE           PPC_BIT(0)
+#define   PHB_PEST_BASE_ADDRESS         PPC_BITMASK(14, 51)
+#define PHB_M64_UPPER_BITS              0x1f0
+#define PHB_INTREP_TIMER                0x1f8
+#define PHB_DMARD_SYNC                  0x200
+#define   PHB_DMARD_SYNC_START          PPC_BIT(0)
+#define   PHB_DMARD_SYNC_COMPLETE       PPC_BIT(1)
+#define PHB_RTC_INVALIDATE              0x208
+#define   PHB_RTC_INVALIDATE_ALL        PPC_BIT(0)
+#define   PHB_RTC_INVALIDATE_RID        PPC_BITMASK(16, 31)
+#define PHB_TCE_KILL                    0x210
+#define   PHB_TCE_KILL_ALL              PPC_BIT(0)
+#define PHB_TCE_SPEC_CTL                0x218
+#define PHB_IODA_ADDR                   0x220
+#define   PHB_IODA_AD_AUTOINC           PPC_BIT(0)
+#define   PHB_IODA_AD_TSEL              PPC_BITMASK(11, 15)
+#define   PHB_IODA_AD_TADR              PPC_BITMASK(55, 63)
+#define PHB_IODA_DATA0                  0x228
+#define PHB_FFI_REQUEST                 0x238
+#define   PHB_FFI_LOCK_CLEAR            PPC_BIT(3)
+#define   PHB_FFI_REQUEST_ISN           PPC_BITMASK(49, 59)
+#define PHB_FFI_LOCK                    0x240
+#define   PHB_FFI_LOCK_STATE            PPC_BIT(0)
+#define PHB_XIVE_UPDATE                 0x248 /* Broken in DD1 */
+#define PHB_PHB3_GEN_CAP                0x250
+#define PHB_PHB3_TCE_CAP                0x258
+#define PHB_PHB3_IRQ_CAP                0x260
+#define PHB_PHB3_EEH_CAP                0x268
+#define PHB_IVC_INVALIDATE              0x2a0
+#define   PHB_IVC_INVALIDATE_ALL        PPC_BIT(0)
+#define   PHB_IVC_INVALIDATE_SID        PPC_BITMASK(16, 31)
+#define PHB_IVC_UPDATE                  0x2a8
+#define   PHB_IVC_UPDATE_ENABLE_P       PPC_BIT(0)
+#define   PHB_IVC_UPDATE_ENABLE_Q       PPC_BIT(1)
+#define   PHB_IVC_UPDATE_ENABLE_SERVER  PPC_BIT(2)
+#define   PHB_IVC_UPDATE_ENABLE_PRI     PPC_BIT(3)
+#define   PHB_IVC_UPDATE_ENABLE_GEN     PPC_BIT(4)
+#define   PHB_IVC_UPDATE_ENABLE_CON     PPC_BIT(5)
+#define   PHB_IVC_UPDATE_GEN_MATCH      PPC_BITMASK(6, 7)
+#define   PHB_IVC_UPDATE_SERVER         PPC_BITMASK(8, 23)
+#define   PHB_IVC_UPDATE_PRI            PPC_BITMASK(24, 31)
+#define   PHB_IVC_UPDATE_GEN            PPC_BITMASK(32, 33)
+#define   PHB_IVC_UPDATE_P              PPC_BITMASK(34, 34)
+#define   PHB_IVC_UPDATE_Q              PPC_BITMASK(35, 35)
+#define   PHB_IVC_UPDATE_SID            PPC_BITMASK(48, 63)
+#define PHB_PAPR_ERR_INJ_CTL            0x2b0
+#define   PHB_PAPR_ERR_INJ_CTL_INB      PPC_BIT(0)
+#define   PHB_PAPR_ERR_INJ_CTL_OUTB     PPC_BIT(1)
+#define   PHB_PAPR_ERR_INJ_CTL_STICKY   PPC_BIT(2)
+#define   PHB_PAPR_ERR_INJ_CTL_CFG      PPC_BIT(3)
+#define   PHB_PAPR_ERR_INJ_CTL_RD       PPC_BIT(4)
+#define   PHB_PAPR_ERR_INJ_CTL_WR       PPC_BIT(5)
+#define   PHB_PAPR_ERR_INJ_CTL_FREEZE   PPC_BIT(6)
+#define PHB_PAPR_ERR_INJ_ADDR           0x2b8
+#define   PHB_PAPR_ERR_INJ_ADDR_MMIO            PPC_BITMASK(16, 63)
+#define PHB_PAPR_ERR_INJ_MASK           0x2c0
+#define   PHB_PAPR_ERR_INJ_MASK_CFG             PPC_BITMASK(4, 11)
+#define   PHB_PAPR_ERR_INJ_MASK_MMIO            PPC_BITMASK(16, 63)
+#define PHB_ETU_ERR_SUMMARY             0x2c8
+
+/* UTL registers */
+#define UTL_SYS_BUS_CONTROL             0x400
+#define UTL_STATUS                      0x408
+#define UTL_SYS_BUS_AGENT_STATUS        0x410
+#define UTL_SYS_BUS_AGENT_ERR_SEVERITY  0x418
+#define UTL_SYS_BUS_AGENT_IRQ_EN        0x420
+#define UTL_SYS_BUS_BURST_SZ_CONF       0x440
+#define UTL_REVISION_ID                 0x448
+#define UTL_BCLK_DOMAIN_DBG1            0x460
+#define UTL_BCLK_DOMAIN_DBG2            0x468
+#define UTL_BCLK_DOMAIN_DBG3            0x470
+#define UTL_BCLK_DOMAIN_DBG4            0x478
+#define UTL_BCLK_DOMAIN_DBG5            0x480
+#define UTL_BCLK_DOMAIN_DBG6            0x488
+#define UTL_OUT_POST_HDR_BUF_ALLOC      0x4c0
+#define UTL_OUT_POST_DAT_BUF_ALLOC      0x4d0
+#define UTL_IN_POST_HDR_BUF_ALLOC       0x4e0
+#define UTL_IN_POST_DAT_BUF_ALLOC       0x4f0
+#define UTL_OUT_NP_BUF_ALLOC            0x500
+#define UTL_IN_NP_BUF_ALLOC             0x510
+#define UTL_PCIE_TAGS_ALLOC             0x520
+#define UTL_GBIF_READ_TAGS_ALLOC        0x530
+#define UTL_PCIE_PORT_CONTROL           0x540
+#define UTL_PCIE_PORT_STATUS            0x548
+#define UTL_PCIE_PORT_ERROR_SEV         0x550
+#define UTL_PCIE_PORT_IRQ_EN            0x558
+#define UTL_RC_STATUS                   0x560
+#define UTL_RC_ERR_SEVERITY             0x568
+#define UTL_RC_IRQ_EN                   0x570
+#define UTL_EP_STATUS                   0x578
+#define UTL_EP_ERR_SEVERITY             0x580
+#define UTL_EP_ERR_IRQ_EN               0x588
+#define UTL_PCI_PM_CTRL1                0x590
+#define UTL_PCI_PM_CTRL2                0x598
+#define UTL_GP_CTL1                     0x5a0
+#define UTL_GP_CTL2                     0x5a8
+#define UTL_PCLK_DOMAIN_DBG1            0x5b0
+#define UTL_PCLK_DOMAIN_DBG2            0x5b8
+#define UTL_PCLK_DOMAIN_DBG3            0x5c0
+#define UTL_PCLK_DOMAIN_DBG4            0x5c8
+
+/* PCI-E Stack registers */
+#define PHB_PCIE_SYSTEM_CONFIG          0x600
+#define PHB_PCIE_BUS_NUMBER             0x608
+#define PHB_PCIE_SYSTEM_TEST            0x618
+#define PHB_PCIE_LINK_MANAGEMENT        0x630
+#define   PHB_PCIE_LM_LINK_ACTIVE       PPC_BIT(8)
+#define PHB_PCIE_DLP_TRAIN_CTL          0x640
+#define   PHB_PCIE_DLP_TCTX_DISABLE     PPC_BIT(1)
+#define   PHB_PCIE_DLP_TCRX_DISABLED    PPC_BIT(16)
+#define   PHB_PCIE_DLP_INBAND_PRESENCE  PPC_BIT(19)
+#define   PHB_PCIE_DLP_TC_DL_LINKUP     PPC_BIT(21)
+#define   PHB_PCIE_DLP_TC_DL_PGRESET    PPC_BIT(22)
+#define   PHB_PCIE_DLP_TC_DL_LINKACT    PPC_BIT(23)
+#define PHB_PCIE_SLOP_LOOPBACK_STATUS   0x648
+#define PHB_PCIE_SYS_LINK_INIT          0x668
+#define PHB_PCIE_UTL_CONFIG             0x670
+#define PHB_PCIE_DLP_CONTROL            0x678
+#define PHB_PCIE_UTL_ERRLOG1            0x680
+#define PHB_PCIE_UTL_ERRLOG2            0x688
+#define PHB_PCIE_UTL_ERRLOG3            0x690
+#define PHB_PCIE_UTL_ERRLOG4            0x698
+#define PHB_PCIE_DLP_ERRLOG1            0x6a0
+#define PHB_PCIE_DLP_ERRLOG2            0x6a8
+#define PHB_PCIE_DLP_ERR_STATUS         0x6b0
+#define PHB_PCIE_DLP_ERR_COUNTERS       0x6b8
+#define PHB_PCIE_UTL_ERR_INJECT         0x6c0
+#define PHB_PCIE_TLDLP_ERR_INJECT       0x6c8
+#define PHB_PCIE_LANE_EQ_CNTL0          0x6d0
+#define PHB_PCIE_LANE_EQ_CNTL1          0x6d8
+#define PHB_PCIE_LANE_EQ_CNTL2          0x6e0
+#define PHB_PCIE_LANE_EQ_CNTL3          0x6e8
+#define PHB_PCIE_STRAPPING              0x700
+
+/* Fundamental register set B */
+#define PHB_VERSION                     0x800
+#define PHB_RESET                       0x808
+#define PHB_CONTROL                     0x810
+#define   PHB_CTRL_IVE_128_BYTES        PPC_BIT(24)
+#define PHB_AIB_RX_CRED_INIT_TIMER      0x818
+#define PHB_AIB_RX_CMD_CRED             0x820
+#define PHB_AIB_RX_DATA_CRED            0x828
+#define PHB_AIB_TX_CMD_CRED             0x830
+#define PHB_AIB_TX_DATA_CRED            0x838
+#define PHB_AIB_TX_CHAN_MAPPING         0x840
+#define PHB_AIB_TAG_ENABLE              0x858
+#define PHB_AIB_FENCE_CTRL              0x860
+#define PHB_TCE_TAG_ENABLE              0x868
+#define PHB_TCE_WATERMARK               0x870
+#define PHB_TIMEOUT_CTRL1               0x878
+#define PHB_TIMEOUT_CTRL2               0x880
+#define PHB_Q_DMA_R                     0x888
+#define   PHB_Q_DMA_R_QUIESCE_DMA       PPC_BIT(0)
+#define   PHB_Q_DMA_R_AUTORESET         PPC_BIT(1)
+#define   PHB_Q_DMA_R_DMA_RESP_STATUS   PPC_BIT(4)
+#define   PHB_Q_DMA_R_MMIO_RESP_STATUS  PPC_BIT(5)
+#define   PHB_Q_DMA_R_TCE_RESP_STATUS   PPC_BIT(6)
+#define PHB_AIB_TAG_STATUS              0x900
+#define PHB_TCE_TAG_STATUS              0x908
+
+/* FIR & Error registers */
+#define PHB_LEM_FIR_ACCUM               0xc00
+#define PHB_LEM_FIR_AND_MASK            0xc08
+#define PHB_LEM_FIR_OR_MASK             0xc10
+#define PHB_LEM_ERROR_MASK              0xc18
+#define PHB_LEM_ERROR_AND_MASK          0xc20
+#define PHB_LEM_ERROR_OR_MASK           0xc28
+#define PHB_LEM_ACTION0                 0xc30
+#define PHB_LEM_ACTION1                 0xc38
+#define PHB_LEM_WOF                     0xc40
+#define PHB_ERR_STATUS                  0xc80
+#define PHB_ERR1_STATUS                 0xc88
+#define PHB_ERR_INJECT                  0xc90
+#define PHB_ERR_LEM_ENABLE              0xc98
+#define PHB_ERR_IRQ_ENABLE              0xca0
+#define PHB_ERR_FREEZE_ENABLE           0xca8
+#define PHB_ERR_AIB_FENCE_ENABLE        0xcb0
+#define PHB_ERR_LOG_0                   0xcc0
+#define PHB_ERR_LOG_1                   0xcc8
+#define PHB_ERR_STATUS_MASK             0xcd0
+#define PHB_ERR1_STATUS_MASK            0xcd8
+
+#define PHB_OUT_ERR_STATUS              0xd00
+#define PHB_OUT_ERR1_STATUS             0xd08
+#define PHB_OUT_ERR_INJECT              0xd10
+#define PHB_OUT_ERR_LEM_ENABLE          0xd18
+#define PHB_OUT_ERR_IRQ_ENABLE          0xd20
+#define PHB_OUT_ERR_FREEZE_ENABLE       0xd28
+#define PHB_OUT_ERR_AIB_FENCE_ENABLE    0xd30
+#define PHB_OUT_ERR_LOG_0               0xd40
+#define PHB_OUT_ERR_LOG_1               0xd48
+#define PHB_OUT_ERR_STATUS_MASK         0xd50
+#define PHB_OUT_ERR1_STATUS_MASK        0xd58
+
+#define PHB_INA_ERR_STATUS              0xd80
+#define PHB_INA_ERR1_STATUS             0xd88
+#define PHB_INA_ERR_INJECT              0xd90
+#define PHB_INA_ERR_LEM_ENABLE          0xd98
+#define PHB_INA_ERR_IRQ_ENABLE          0xda0
+#define PHB_INA_ERR_FREEZE_ENABLE       0xda8
+#define PHB_INA_ERR_AIB_FENCE_ENABLE    0xdb0
+#define PHB_INA_ERR_LOG_0               0xdc0
+#define PHB_INA_ERR_LOG_1               0xdc8
+#define PHB_INA_ERR_STATUS_MASK         0xdd0
+#define PHB_INA_ERR1_STATUS_MASK        0xdd8
+
+#define PHB_INB_ERR_STATUS              0xe00
+#define PHB_INB_ERR1_STATUS             0xe08
+#define PHB_INB_ERR_INJECT              0xe10
+#define PHB_INB_ERR_LEM_ENABLE          0xe18
+#define PHB_INB_ERR_IRQ_ENABLE          0xe20
+#define PHB_INB_ERR_FREEZE_ENABLE       0xe28
+#define PHB_INB_ERR_AIB_FENCE_ENABLE    0xe30
+#define PHB_INB_ERR_LOG_0               0xe40
+#define PHB_INB_ERR_LOG_1               0xe48
+#define PHB_INB_ERR_STATUS_MASK         0xe50
+#define PHB_INB_ERR1_STATUS_MASK        0xe58
+
+/* Performance monitor & Debug registers */
+#define PHB_TRACE_CONTROL               0xf80
+#define PHB_PERFMON_CONFIG              0xf88
+#define PHB_PERFMON_CTR0                0xf90
+#define PHB_PERFMON_CTR1                0xf98
+#define PHB_PERFMON_CTR2                0xfa0
+#define PHB_PERFMON_CTR3                0xfa8
+#define PHB_HOTPLUG_OVERRIDE            0xfb0
+#define   PHB_HPOVR_FORCE_RESAMPLE      PPC_BIT(9)
+#define   PHB_HPOVR_PRESENCE_A          PPC_BIT(10)
+#define   PHB_HPOVR_PRESENCE_B          PPC_BIT(11)
+#define   PHB_HPOVR_LINK_ACTIVE         PPC_BIT(12)
+#define   PHB_HPOVR_LINK_BIFURCATED     PPC_BIT(13)
+#define   PHB_HPOVR_LINK_LANE_SWAPPED   PPC_BIT(14)
+
+/*
+ * IODA2 on-chip tables
+ */
+
+#define IODA2_TBL_LIST          1
+#define IODA2_TBL_LXIVT         2
+#define IODA2_TBL_IVC_CAM       3
+#define IODA2_TBL_RBA           4
+#define IODA2_TBL_RCAM          5
+#define IODA2_TBL_MRT           6
+#define IODA2_TBL_PESTA         7
+#define IODA2_TBL_PESTB         8
+#define IODA2_TBL_TVT           9
+#define IODA2_TBL_TCAM          10
+#define IODA2_TBL_TDR           11
+#define IODA2_TBL_M64BT         16
+#define IODA2_TBL_M32DT         17
+#define IODA2_TBL_PEEV          20
+
+/* LXIVT */
+#define IODA2_LXIVT_SERVER              PPC_BITMASK(8, 23)
+#define IODA2_LXIVT_PRIORITY            PPC_BITMASK(24, 31)
+#define IODA2_LXIVT_NODE_ID             PPC_BITMASK(56, 63)
+
+/* IVT */
+#define IODA2_IVT_SERVER                PPC_BITMASK(0, 23)
+#define IODA2_IVT_PRIORITY              PPC_BITMASK(24, 31)
+#define IODA2_IVT_GEN                   PPC_BITMASK(37, 38)
+#define IODA2_IVT_P                     PPC_BITMASK(39, 39)
+#define IODA2_IVT_Q                     PPC_BITMASK(47, 47)
+#define IODA2_IVT_PE                    PPC_BITMASK(48, 63)
+
+/* TVT */
+#define IODA2_TVT_TABLE_ADDR            PPC_BITMASK(0, 47)
+#define IODA2_TVT_NUM_LEVELS            PPC_BITMASK(48, 50)
+#define   IODA2_TVE_1_LEVEL     0
+#define   IODA2_TVE_2_LEVELS    1
+#define   IODA2_TVE_3_LEVELS    2
+#define   IODA2_TVE_4_LEVELS    3
+#define   IODA2_TVE_5_LEVELS    4
+#define IODA2_TVT_TCE_TABLE_SIZE        PPC_BITMASK(51, 55)
+#define IODA2_TVT_IO_PSIZE              PPC_BITMASK(59, 63)
+
+/* PESTA */
+#define IODA2_PESTA_MMIO_FROZEN         PPC_BIT(0)
+
+/* PESTB */
+#define IODA2_PESTB_DMA_STOPPED         PPC_BIT(0)
+
+/* M32DT */
+#define IODA2_M32DT_PE                  PPC_BITMASK(8, 15)
+
+/* M64BT */
+#define IODA2_M64BT_ENABLE              PPC_BIT(0)
+#define IODA2_M64BT_SINGLE_PE           PPC_BIT(1)
+#define IODA2_M64BT_BASE                PPC_BITMASK(2, 31)
+#define IODA2_M64BT_MASK                PPC_BITMASK(34, 63)
+#define IODA2_M64BT_SINGLE_BASE         PPC_BITMASK(2, 26)
+#define IODA2_M64BT_PE_HI               PPC_BITMASK(27, 31)
+#define IODA2_M64BT_SINGLE_MASK         PPC_BITMASK(34, 58)
+#define IODA2_M64BT_PE_LOW              PPC_BITMASK(59, 63)
+
+/*
+ * IODA2 in-memory tables
+ */
+
+/*
+ * PEST
+ *
+ * 2 x 8-byte entries, PEST0 and PEST1
+ */
+
+#define IODA2_PEST0_MMIO_CAUSE          PPC_BIT(2)
+#define IODA2_PEST0_CFG_READ            PPC_BIT(3)
+#define IODA2_PEST0_CFG_WRITE           PPC_BIT(4)
+#define IODA2_PEST0_TTYPE               PPC_BITMASK(5, 7)
+#define   PEST_TTYPE_DMA_WRITE          0
+#define   PEST_TTYPE_MSI                1
+#define   PEST_TTYPE_DMA_READ           2
+#define   PEST_TTYPE_DMA_READ_RESP      3
+#define   PEST_TTYPE_MMIO_LOAD          4
+#define   PEST_TTYPE_MMIO_STORE         5
+#define   PEST_TTYPE_OTHER              7
+#define IODA2_PEST0_CA_RETURN           PPC_BIT(8)
+#define IODA2_PEST0_UTL_RTOS_TIMEOUT    PPC_BIT(8) /* Same bit as CA return */
+#define IODA2_PEST0_UR_RETURN           PPC_BIT(9)
+#define IODA2_PEST0_UTL_NONFATAL        PPC_BIT(10)
+#define IODA2_PEST0_UTL_FATAL           PPC_BIT(11)
+#define IODA2_PEST0_PARITY_UE           PPC_BIT(13)
+#define IODA2_PEST0_UTL_CORRECTABLE     PPC_BIT(14)
+#define IODA2_PEST0_UTL_INTERRUPT       PPC_BIT(15)
+#define IODA2_PEST0_MMIO_XLATE          PPC_BIT(16)
+#define IODA2_PEST0_IODA2_ERROR         PPC_BIT(16) /* Same bit as MMIO xlate */
+#define IODA2_PEST0_TCE_PAGE_FAULT      PPC_BIT(18)
+#define IODA2_PEST0_TCE_ACCESS_FAULT    PPC_BIT(19)
+#define IODA2_PEST0_DMA_RESP_TIMEOUT    PPC_BIT(20)
+#define IODA2_PEST0_AIB_SIZE_INVALID    PPC_BIT(21)
+#define IODA2_PEST0_LEM_BIT             PPC_BITMASK(26, 31)
+#define IODA2_PEST0_RID                 PPC_BITMASK(32, 47)
+#define IODA2_PEST0_MSI_DATA            PPC_BITMASK(48, 63)
+
+#define IODA2_PEST1_FAIL_ADDR           PPC_BITMASK(3, 63)
+
+
+#endif /* PCI_HOST_PNV_PHB3_REGS_H */
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 805f9058f5d9..fb4d0c0234b3 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -30,6 +30,7 @@
 #include "hw/ppc/pnv_homer.h"
 #include "hw/ppc/pnv_xive.h"
 #include "hw/ppc/pnv_core.h"
+#include "hw/pci-host/pnv_phb3.h"
 #include "hw/pci-host/pnv_phb4.h"
 
 #define TYPE_PNV_CHIP "pnv-chip"
@@ -77,6 +78,9 @@ typedef struct Pnv8Chip {
     PnvOCC       occ;
     PnvHomer     homer;
 
+#define PNV8_CHIP_PHB3_MAX 4
+    PnvPHB3      phbs[PNV8_CHIP_PHB3_MAX];
+
     XICSFabric    *xics;
 } Pnv8Chip;
 
diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
index 0fc57b036753..09156a5a7a6f 100644
--- a/include/hw/ppc/pnv_xscom.h
+++ b/include/hw/ppc/pnv_xscom.h
@@ -71,6 +71,15 @@ typedef struct PnvXScomInterfaceClass {
 #define PNV_XSCOM_PBA_BASE        0x2013f00
 #define PNV_XSCOM_PBA_SIZE        0x40
 
+#define PNV_XSCOM_PBCQ_NEST_BASE  0x2012000
+#define PNV_XSCOM_PBCQ_NEST_SIZE  0x46
+
+#define PNV_XSCOM_PBCQ_PCI_BASE   0x9012000
+#define PNV_XSCOM_PBCQ_PCI_SIZE   0x15
+
+#define PNV_XSCOM_PBCQ_SPCI_BASE  0x9013c00
+#define PNV_XSCOM_PBCQ_SPCI_SIZE  0x5
+
 /*
  * Layout of the XSCOM PCB addresses (POWER 9)
  */
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 48a75aa4ab75..9ed58ec7e910 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -101,6 +101,10 @@ struct ICSStateClass {
     DeviceClass parent_class;
 
     DeviceRealize parent_realize;
+    DeviceReset parent_reset;
+
+    void (*reject)(ICSState *s, uint32_t irq);
+    void (*resend)(ICSState *s);
 };
 
 struct ICSState {
@@ -161,6 +165,7 @@ void icp_set_mfrr(ICPState *icp, uint8_t mfrr);
 uint32_t icp_accept(ICPState *ss);
 uint32_t icp_ipoll(ICPState *ss, uint32_t *mfrr);
 void icp_eoi(ICPState *icp, uint32_t xirr);
+void icp_irq(ICSState *ics, int server, int nr, uint8_t priority);
 void icp_reset(ICPState *icp);
 
 void ics_write_xive(ICSState *ics, int nr, int server,
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 1952009e6d22..917a1ecc38c2 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -217,7 +217,7 @@ void icp_eoi(ICPState *icp, uint32_t xirr)
     }
 }
 
-static void icp_irq(ICSState *ics, int server, int nr, uint8_t priority)
+void icp_irq(ICSState *ics, int server, int nr, uint8_t priority)
 {
     ICPState *icp = xics_icp_get(ics->xics, server);
 
@@ -512,8 +512,14 @@ void ics_write_xive(ICSState *ics, int srcno, int server,
 
 static void ics_reject(ICSState *ics, uint32_t nr)
 {
+    ICSStateClass *isc = ICS_GET_CLASS(ics);
     ICSIRQState *irq = ics->irqs + nr - ics->offset;
 
+    if (isc->reject) {
+        isc->reject(ics, nr);
+        return;
+    }
+
     trace_xics_ics_reject(nr, nr - ics->offset);
     if (irq->flags & XICS_FLAGS_IRQ_MSI) {
         irq->status |= XICS_STATUS_REJECTED;
@@ -524,8 +530,14 @@ static void ics_reject(ICSState *ics, uint32_t nr)
 
 void ics_resend(ICSState *ics)
 {
+    ICSStateClass *isc = ICS_GET_CLASS(ics);
     int i;
 
+    if (isc->resend) {
+        isc->resend(ics);
+        return;
+    }
+
     for (i = 0; i < ics->nr_irqs; i++) {
         /* FIXME: filter by server#? */
         if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
diff --git a/hw/pci-host/pnv_phb3.c b/hw/pci-host/pnv_phb3.c
new file mode 100644
index 000000000000..2a89796bdb42
--- /dev/null
+++ b/hw/pci-host/pnv_phb3.c
@@ -0,0 +1,1195 @@
+/*
+ * QEMU PowerPC PowerNV (POWER8) PHB3 model
+ *
+ * Copyright (c) 2014-2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/visitor.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "hw/pci-host/pnv_phb3_regs.h"
+#include "hw/pci-host/pnv_phb3.h"
+#include "hw/pci/pcie_host.h"
+#include "hw/pci/pcie_port.h"
+#include "hw/ppc/pnv.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+
+#define phb3_error(phb, fmt, ...)                                       \
+    qemu_log_mask(LOG_GUEST_ERROR, "phb3[%d:%d]: " fmt "\n",            \
+                  (phb)->chip_id, (phb)->phb_id, ## __VA_ARGS__)
+
+static PCIDevice *pnv_phb3_find_cfg_dev(PnvPHB3 *phb)
+{
+    PCIHostState *pci = PCI_HOST_BRIDGE(phb);
+    uint64_t addr = phb->regs[PHB_CONFIG_ADDRESS >> 3];
+    uint8_t bus, devfn;
+
+    if (!(addr >> 63)) {
+        return NULL;
+    }
+    bus = (addr >> 52) & 0xff;
+    devfn = (addr >> 44) & 0xff;
+
+    return pci_find_device(pci->bus, bus, devfn);
+}
+
+/*
+ * The CONFIG_DATA register expects little endian accesses, but as the
+ * region is big endian, we have to swap the value.
+ */
+static void pnv_phb3_config_write(PnvPHB3 *phb, unsigned off,
+                                  unsigned size, uint64_t val)
+{
+    uint32_t cfg_addr, limit;
+    PCIDevice *pdev;
+
+    pdev = pnv_phb3_find_cfg_dev(phb);
+    if (!pdev) {
+        return;
+    }
+    cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
+    cfg_addr |= off;
+    limit = pci_config_size(pdev);
+    if (limit <= cfg_addr) {
+        /*
+         * A conventional PCI device can sit behind a PCIe-to-PCI bridge.
+         * Accesses in the range 256 <= addr < 4K have no effect.
+         */
+        return;
+    }
+    switch (size) {
+    case 1:
+        break;
+    case 2:
+        val = bswap16(val);
+        break;
+    case 4:
+        val = bswap32(val);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    pci_host_config_write_common(pdev, cfg_addr, limit, val, size);
+}
+
+static uint64_t pnv_phb3_config_read(PnvPHB3 *phb, unsigned off,
+                                     unsigned size)
+{
+    uint32_t cfg_addr, limit;
+    PCIDevice *pdev;
+    uint64_t val;
+
+    pdev = pnv_phb3_find_cfg_dev(phb);
+    if (!pdev) {
+        return ~0ull;
+    }
+    cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
+    cfg_addr |= off;
+    limit = pci_config_size(pdev);
+    if (limit <= cfg_addr) {
+        /*
+         * A conventional PCI device can sit behind a PCIe-to-PCI bridge.
+         * Accesses in the range 256 <= addr < 4K have no effect.
+         */
+        return ~0ull;
+    }
+    val = pci_host_config_read_common(pdev, cfg_addr, limit, size);
+    switch (size) {
+    case 1:
+        return val;
+    case 2:
+        return bswap16(val);
+    case 4:
+        return bswap32(val);
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void pnv_phb3_check_m32(PnvPHB3 *phb)
+{
+    uint64_t base, start, size;
+    MemoryRegion *parent;
+    PnvPBCQState *pbcq = &phb->pbcq;
+
+    if (memory_region_is_mapped(&phb->mr_m32)) {
+        memory_region_del_subregion(phb->mr_m32.container, &phb->mr_m32);
+    }
+
+    if (!(phb->regs[PHB_PHB3_CONFIG >> 3] & PHB_PHB3C_M32_EN)) {
+        return;
+    }
+
+    /* Grab geometry from registers */
+    base = phb->regs[PHB_M32_BASE_ADDR >> 3];
+    start = phb->regs[PHB_M32_START_ADDR >> 3];
+    size = ~(phb->regs[PHB_M32_BASE_MASK >> 3] | 0xfffc000000000000ull) + 1;
+
+    /* Check if it matches an enabled MMIO region in the PBCQ */
+    if (memory_region_is_mapped(&pbcq->mmbar0) &&
+        base >= pbcq->mmio0_base &&
+        (base + size) <= (pbcq->mmio0_base + pbcq->mmio0_size)) {
+        parent = &pbcq->mmbar0;
+        base -= pbcq->mmio0_base;
+    } else if (memory_region_is_mapped(&pbcq->mmbar1) &&
+               base >= pbcq->mmio1_base &&
+               (base + size) <= (pbcq->mmio1_base + pbcq->mmio1_size)) {
+        parent = &pbcq->mmbar1;
+        base -= pbcq->mmio1_base;
+    } else {
+        return;
+    }
+
+    /* Create alias */
+    memory_region_init_alias(&phb->mr_m32, OBJECT(phb), "phb3-m32",
+                             &phb->pci_mmio, start, size);
+    memory_region_add_subregion(parent, base, &phb->mr_m32);
+}
+
+static void pnv_phb3_check_m64(PnvPHB3 *phb, uint32_t index)
+{
+    uint64_t base, start, size, m64;
+    MemoryRegion *parent;
+    PnvPBCQState *pbcq = &phb->pbcq;
+
+    if (memory_region_is_mapped(&phb->mr_m64[index])) {
+        /* Should we destroy it in an RCU-friendly way? */
+        memory_region_del_subregion(phb->mr_m64[index].container,
+                                    &phb->mr_m64[index]);
+    }
+
+    /* Get table entry */
+    m64 = phb->ioda_M64BT[index];
+
+    if (!(m64 & IODA2_M64BT_ENABLE)) {
+        return;
+    }
+
+    /* Grab geometry from registers */
+    base = GETFIELD(IODA2_M64BT_BASE, m64) << 20;
+    if (m64 & IODA2_M64BT_SINGLE_PE) {
+        base &= ~0x1ffffffull;
+    }
+    size = GETFIELD(IODA2_M64BT_MASK, m64) << 20;
+    size |= 0xfffc000000000000ull;
+    size = ~size + 1;
+    start = base | (phb->regs[PHB_M64_UPPER_BITS >> 3]);
+
+    /* Check if it matches an enabled MMIO region in the PBCQ */
+    if (memory_region_is_mapped(&pbcq->mmbar0) &&
+        base >= pbcq->mmio0_base &&
+        (base + size) <= (pbcq->mmio0_base + pbcq->mmio0_size)) {
+        parent = &pbcq->mmbar0;
+        base -= pbcq->mmio0_base;
+    } else if (memory_region_is_mapped(&pbcq->mmbar1) &&
+               base >= pbcq->mmio1_base &&
+               (base + size) <= (pbcq->mmio1_base + pbcq->mmio1_size)) {
+        parent = &pbcq->mmbar1;
+        base -= pbcq->mmio1_base;
+    } else {
+        return;
+    }
+
+    /* Create alias */
+    memory_region_init_alias(&phb->mr_m64[index], OBJECT(phb), "phb3-m64",
+                             &phb->pci_mmio, start, size);
+    memory_region_add_subregion(parent, base, &phb->mr_m64[index]);
+}
+
+static void pnv_phb3_check_all_m64s(PnvPHB3 *phb)
+{
+    uint64_t i;
+
+    for (i = 0; i < PNV_PHB3_NUM_M64; i++) {
+        pnv_phb3_check_m64(phb, i);
+    }
+}
+
+static void pnv_phb3_lxivt_write(PnvPHB3 *phb, unsigned idx, uint64_t val)
+{
+    uint8_t server, prio;
+
+    phb->ioda_LXIVT[idx] = val & (IODA2_LXIVT_SERVER |
+                                  IODA2_LXIVT_PRIORITY |
+                                  IODA2_LXIVT_NODE_ID);
+    server = GETFIELD(IODA2_LXIVT_SERVER, val);
+    prio = GETFIELD(IODA2_LXIVT_PRIORITY, val);
+
+    /*
+     * The low order 2 bits are the link pointer (Type II interrupts).
+     * Shift back to get a valid IRQ server.
+     */
+    server >>= 2;
+
+    ics_write_xive(&phb->lsis, idx, server, prio, prio);
+}
+
+static uint64_t *pnv_phb3_ioda_access(PnvPHB3 *phb,
+                                      unsigned *out_table, unsigned *out_idx)
+{
+    uint64_t adreg = phb->regs[PHB_IODA_ADDR >> 3];
+    unsigned int index = GETFIELD(PHB_IODA_AD_TADR, adreg);
+    unsigned int table = GETFIELD(PHB_IODA_AD_TSEL, adreg);
+    unsigned int mask;
+    uint64_t *tptr = NULL;
+
+    switch (table) {
+    case IODA2_TBL_LIST:
+        tptr = phb->ioda_LIST;
+        mask = 7;
+        break;
+    case IODA2_TBL_LXIVT:
+        tptr = phb->ioda_LXIVT;
+        mask = 7;
+        break;
+    case IODA2_TBL_IVC_CAM:
+    case IODA2_TBL_RBA:
+        mask = 31;
+        break;
+    case IODA2_TBL_RCAM:
+        mask = 63;
+        break;
+    case IODA2_TBL_MRT:
+        mask = 7;
+        break;
+    case IODA2_TBL_PESTA:
+    case IODA2_TBL_PESTB:
+        mask = 255;
+        break;
+    case IODA2_TBL_TVT:
+        tptr = phb->ioda_TVT;
+        mask = 511;
+        break;
+    case IODA2_TBL_TCAM:
+    case IODA2_TBL_TDR:
+        mask = 63;
+        break;
+    case IODA2_TBL_M64BT:
+        tptr = phb->ioda_M64BT;
+        mask = 15;
+        break;
+    case IODA2_TBL_M32DT:
+        tptr = phb->ioda_MDT;
+        mask = 255;
+        break;
+    case IODA2_TBL_PEEV:
+        tptr = phb->ioda_PEEV;
+        mask = 3;
+        break;
+    default:
+        phb3_error(phb, "invalid IODA table %d", table);
+        return NULL;
+    }
+    index &= mask;
+    if (out_idx) {
+        *out_idx = index;
+    }
+    if (out_table) {
+        *out_table = table;
+    }
+    if (tptr) {
+        tptr += index;
+    }
+    if (adreg & PHB_IODA_AD_AUTOINC) {
+        index = (index + 1) & mask;
+        adreg = SETFIELD(PHB_IODA_AD_TADR, adreg, index);
+    }
+    phb->regs[PHB_IODA_ADDR >> 3] = adreg;
+    return tptr;
+}
+
+static uint64_t pnv_phb3_ioda_read(PnvPHB3 *phb)
+{
+    unsigned table;
+    uint64_t *tptr;
+
+    tptr = pnv_phb3_ioda_access(phb, &table, NULL);
+    if (!tptr) {
+        /* Return 0 on unsupported tables, not ff's */
+        return 0;
+    }
+    return *tptr;
+}
+
+static void pnv_phb3_ioda_write(PnvPHB3 *phb, uint64_t val)
+{
+    unsigned table, idx;
+    uint64_t *tptr;
+
+    tptr = pnv_phb3_ioda_access(phb, &table, &idx);
+    if (!tptr) {
+        return;
+    }
+
+    /* Handle side effects */
+    switch (table) {
+    case IODA2_TBL_LXIVT:
+        pnv_phb3_lxivt_write(phb, idx, val);
+        break;
+    case IODA2_TBL_M64BT:
+        *tptr = val;
+        pnv_phb3_check_m64(phb, idx);
+        break;
+    default:
+        *tptr = val;
+    }
+}
+
+/*
+ * This is called whenever the PHB LSI, MSI source ID register or
+ * the PBCQ irq filters are written.
+ */
+void pnv_phb3_remap_irqs(PnvPHB3 *phb)
+{
+    ICSState *ics = &phb->lsis;
+    uint32_t local, global, count, mask, comp;
+    uint64_t baren;
+    PnvPBCQState *pbcq = &phb->pbcq;
+
+    /*
+     * First check if we are enabled. Unlike real HW, we don't separate
+     * TX and RX, so we enable only if both are set.
+     */
+    baren = pbcq->nest_regs[PBCQ_NEST_BAR_EN];
+    if (!(baren & PBCQ_NEST_BAR_EN_IRSN_RX) ||
+        !(baren & PBCQ_NEST_BAR_EN_IRSN_TX)) {
+        ics->offset = 0;
+        return;
+    }
+
+    /* Grab local LSI source ID */
+    local = GETFIELD(PHB_LSI_SRC_ID, phb->regs[PHB_LSI_SOURCE_ID >> 3]) << 3;
+
+    /* Grab global one and compare */
+    global = GETFIELD(PBCQ_NEST_LSI_SRC,
+                      pbcq->nest_regs[PBCQ_NEST_LSI_SRC_ID]) << 3;
+    if (global != local) {
+        /*
+         * This happens during initialization; come back when we are
+         * properly configured.
+         */
+        ics->offset = 0;
+        return;
+    }
+
+    /* Get the base on the powerbus */
+    comp = GETFIELD(PBCQ_NEST_IRSN_COMP,
+                    pbcq->nest_regs[PBCQ_NEST_IRSN_COMPARE]);
+    mask = GETFIELD(PBCQ_NEST_IRSN_COMP,
+                    pbcq->nest_regs[PBCQ_NEST_IRSN_MASK]);
+    count = ((~mask) + 1) & 0x7ffff;
+    phb->total_irq = count;
+
+    /* Sanity checks */
+    if ((global + PNV_PHB3_NUM_LSI) > count) {
+        phb3_error(phb, "LSIs out of reach: LSI base=%d total irq=%d", global,
+                   count);
+    }
+
+    if (count > 2048) {
+        phb3_error(phb, "More interrupts than supported: %d", count);
+    }
+
+    if ((comp & mask) != comp) {
+        phb3_error(phb, "IRQ compare bits not in mask: comp=0x%x mask=0x%x",
+                   comp, mask);
+        comp &= mask;
+    }
+    /* Setup LSI offset */
+    ics->offset = comp + global;
+
+    /* Setup MSI offset */
+    pnv_phb3_msi_update_config(&phb->msis, comp, count - PNV_PHB3_NUM_LSI);
+}
+
+static void pnv_phb3_lsi_src_id_write(PnvPHB3 *phb, uint64_t val)
+{
+    /* Sanitize content */
+    val &= PHB_LSI_SRC_ID;
+    phb->regs[PHB_LSI_SOURCE_ID >> 3] = val;
+    pnv_phb3_remap_irqs(phb);
+}
+
+static void pnv_phb3_rtc_invalidate(PnvPHB3 *phb, uint64_t val)
+{
+    PnvPhb3DMASpace *ds;
+
+    /* Always invalidate all for now ... */
+    QLIST_FOREACH(ds, &phb->dma_spaces, list) {
+        ds->pe_num = PHB_INVALID_PE;
+    }
+}
+
+
+static void pnv_phb3_update_msi_regions(PnvPhb3DMASpace *ds)
+{
+    uint64_t cfg = ds->phb->regs[PHB_PHB3_CONFIG >> 3];
+
+    if (cfg & PHB_PHB3C_32BIT_MSI_EN) {
+        if (!memory_region_is_mapped(&ds->msi32_mr)) {
+            memory_region_add_subregion(MEMORY_REGION(&ds->dma_mr),
+                                        0xffff0000, &ds->msi32_mr);
+        }
+    } else {
+        if (memory_region_is_mapped(&ds->msi32_mr)) {
+            memory_region_del_subregion(MEMORY_REGION(&ds->dma_mr),
+                                        &ds->msi32_mr);
+        }
+    }
+
+    if (cfg & PHB_PHB3C_64BIT_MSI_EN) {
+        if (!memory_region_is_mapped(&ds->msi64_mr)) {
+            memory_region_add_subregion(MEMORY_REGION(&ds->dma_mr),
+                                        (1ull << 60), &ds->msi64_mr);
+        }
+    } else {
+        if (memory_region_is_mapped(&ds->msi64_mr)) {
+            memory_region_del_subregion(MEMORY_REGION(&ds->dma_mr),
+                                        &ds->msi64_mr);
+        }
+    }
+}
+
+static void pnv_phb3_update_all_msi_regions(PnvPHB3 *phb)
+{
+    PnvPhb3DMASpace *ds;
+
+    QLIST_FOREACH(ds, &phb->dma_spaces, list) {
+        pnv_phb3_update_msi_regions(ds);
+    }
+}
+
+void pnv_phb3_reg_write(void *opaque, hwaddr off, uint64_t val, unsigned size)
+{
+    PnvPHB3 *phb = opaque;
+    bool changed;
+
+    /* Special case configuration data */
+    if ((off & 0xfffc) == PHB_CONFIG_DATA) {
+        pnv_phb3_config_write(phb, off & 0x3, size, val);
+        return;
+    }
+
+    /* Other registers are 64-bit only */
+    if (size != 8 || off & 0x7) {
+        phb3_error(phb, "Invalid register access, offset: 0x%"PRIx64" size: %d",
+                   off, size);
+        return;
+    }
+
+    /* Handle masking & filtering */
+    switch (off) {
+    case PHB_M64_UPPER_BITS:
+        val &= 0xfffc000000000000ull;
+        break;
+    case PHB_Q_DMA_R:
+        /*
+         * This is enough logic to make SW happy but we aren't actually
+         * quiescing the DMAs
+         */
+        if (val & PHB_Q_DMA_R_AUTORESET) {
+            val = 0;
+        } else {
+            val &= PHB_Q_DMA_R_QUIESCE_DMA;
+        }
+        break;
+    /* LEM stuff */
+    case PHB_LEM_FIR_AND_MASK:
+        phb->regs[PHB_LEM_FIR_ACCUM >> 3] &= val;
+        return;
+    case PHB_LEM_FIR_OR_MASK:
+        phb->regs[PHB_LEM_FIR_ACCUM >> 3] |= val;
+        return;
+    case PHB_LEM_ERROR_AND_MASK:
+        phb->regs[PHB_LEM_ERROR_MASK >> 3] &= val;
+        return;
+    case PHB_LEM_ERROR_OR_MASK:
+        phb->regs[PHB_LEM_ERROR_MASK >> 3] |= val;
+        return;
+    case PHB_LEM_WOF:
+        val = 0;
+        break;
+    }
+
+    /* Record whether it changed */
+    changed = phb->regs[off >> 3] != val;
+
+    /* Store in register cache first */
+    phb->regs[off >> 3] = val;
+
+    /* Handle side effects */
+    switch (off) {
+    case PHB_PHB3_CONFIG:
+        if (changed) {
+            pnv_phb3_update_all_msi_regions(phb);
+        }
+        /* fall through */
+    case PHB_M32_BASE_ADDR:
+    case PHB_M32_BASE_MASK:
+    case PHB_M32_START_ADDR:
+        if (changed) {
+            pnv_phb3_check_m32(phb);
+        }
+        break;
+    case PHB_M64_UPPER_BITS:
+        if (changed) {
+            pnv_phb3_check_all_m64s(phb);
+        }
+        break;
+    case PHB_LSI_SOURCE_ID:
+        if (changed) {
+            pnv_phb3_lsi_src_id_write(phb, val);
+        }
+        break;
+
+    /* IODA table accesses */
+    case PHB_IODA_DATA0:
+        pnv_phb3_ioda_write(phb, val);
+        break;
+
+    /* RTC invalidation */
+    case PHB_RTC_INVALIDATE:
+        pnv_phb3_rtc_invalidate(phb, val);
+        break;
+
+    /* FFI request */
+    case PHB_FFI_REQUEST:
+        pnv_phb3_msi_ffi(&phb->msis, val);
+        break;
+
+    /* Silent simple writes */
+    case PHB_CONFIG_ADDRESS:
+    case PHB_IODA_ADDR:
+    case PHB_TCE_KILL:
+    case PHB_TCE_SPEC_CTL:
+    case PHB_PEST_BAR:
+    case PHB_PELTV_BAR:
+    case PHB_RTT_BAR:
+    case PHB_RBA_BAR:
+    case PHB_IVT_BAR:
+    case PHB_FFI_LOCK:
+    case PHB_LEM_FIR_ACCUM:
+    case PHB_LEM_ERROR_MASK:
+    case PHB_LEM_ACTION0:
+    case PHB_LEM_ACTION1:
+        break;
+
+    /* Noise on anything else */
+    default:
+        qemu_log_mask(LOG_UNIMP, "phb3: reg_write 0x%"PRIx64"=%"PRIx64"\n",
+                      off, val);
+    }
+}
+
+uint64_t pnv_phb3_reg_read(void *opaque, hwaddr off, unsigned size)
+{
+    PnvPHB3 *phb = opaque;
+    PCIHostState *pci = PCI_HOST_BRIDGE(phb);
+    uint64_t val;
+
+    if ((off & 0xfffc) == PHB_CONFIG_DATA) {
+        return pnv_phb3_config_read(phb, off & 0x3, size);
+    }
+
+    /* Other registers are 64-bit only */
+    if (size != 8 || off & 0x7) {
+        phb3_error(phb, "Invalid register access, offset: 0x%"PRIx64" size: %d",
+                   off, size);
+        return ~0ull;
+    }
+
+    /* Default read from cache */
+    val = phb->regs[off >> 3];
+
+    switch (off) {
+    /* Simulate Venice DD2.0 */
+    case PHB_VERSION:
+        return 0x000000a300000005ull;
+    case PHB_PCIE_SYSTEM_CONFIG:
+        return 0x441100fc30000000;
+
+    /* IODA table accesses */
+    case PHB_IODA_DATA0:
+        return pnv_phb3_ioda_read(phb);
+
+    /* Link appears trained only if a device is found at bus 1, devfn 0 */
+    case PHB_PCIE_DLP_TRAIN_CTL:
+        if (!pci_find_device(pci->bus, 1, 0)) {
+            return 0;
+        }
+        return PHB_PCIE_DLP_INBAND_PRESENCE | PHB_PCIE_DLP_TC_DL_LINKACT;
+
+    /* FFI Lock */
+    case PHB_FFI_LOCK:
+        /* Set lock and return previous value */
+        phb->regs[off >> 3] |= PHB_FFI_LOCK_STATE;
+        return val;
+
+    /* DMA read sync: make it look like it's complete */
+    case PHB_DMARD_SYNC:
+        return PHB_DMARD_SYNC_COMPLETE;
+
+    /* Silent simple reads */
+    case PHB_PHB3_CONFIG:
+    case PHB_M32_BASE_ADDR:
+    case PHB_M32_BASE_MASK:
+    case PHB_M32_START_ADDR:
+    case PHB_CONFIG_ADDRESS:
+    case PHB_IODA_ADDR:
+    case PHB_RTC_INVALIDATE:
+    case PHB_TCE_KILL:
+    case PHB_TCE_SPEC_CTL:
+    case PHB_PEST_BAR:
+    case PHB_PELTV_BAR:
+    case PHB_RTT_BAR:
+    case PHB_RBA_BAR:
+    case PHB_IVT_BAR:
+    case PHB_M64_UPPER_BITS:
+    case PHB_LEM_FIR_ACCUM:
+    case PHB_LEM_ERROR_MASK:
+    case PHB_LEM_ACTION0:
+    case PHB_LEM_ACTION1:
+        break;
+
+    /* Noise on anything else */
+    default:
+        qemu_log_mask(LOG_UNIMP, "phb3: reg_read 0x%"PRIx64"=%"PRIx64"\n",
+                      off, val);
+    }
+    return val;
+}
+
+static const MemoryRegionOps pnv_phb3_reg_ops = {
+    .read = pnv_phb3_reg_read,
+    .write = pnv_phb3_reg_write,
+    .valid.min_access_size = 1,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 1,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static int pnv_phb3_map_irq(PCIDevice *pci_dev, int irq_num)
+{
+    /* Check that out properly ... */
+    return irq_num & 3;
+}
+
+static void pnv_phb3_set_irq(void *opaque, int irq_num, int level)
+{
+    PnvPHB3 *phb = opaque;
+    /* LSI only: drop anything that is not one of the four INTx lines */
+    if (irq_num > 3) {
+        phb3_error(phb, "Unknown IRQ to set %d", irq_num);
+        return;
+    }
+    qemu_set_irq(phb->qirqs[irq_num], level);
+}
+
+static bool pnv_phb3_resolve_pe(PnvPhb3DMASpace *ds)
+{
+    uint64_t rtt, addr;
+    uint16_t rte;
+    int bus_num;
+
+    /* Already resolved ? */
+    if (ds->pe_num != PHB_INVALID_PE) {
+        return true;
+    }
+
+    /* We need to lookup the RTT */
+    rtt = ds->phb->regs[PHB_RTT_BAR >> 3];
+    if (!(rtt & PHB_RTT_BAR_ENABLE)) {
+        phb3_error(ds->phb, "DMA with RTT BAR disabled !");
+        /* Set error bits ? fence ? ... */
+        return false;
+    }
+
+    /* Read RTE */
+    bus_num = pci_bus_num(ds->bus);
+    addr = rtt & PHB_RTT_BASE_ADDRESS_MASK;
+    addr += 2 * ((bus_num << 8) | ds->devfn);
+    if (dma_memory_read(&address_space_memory, addr, &rte, sizeof(rte))) {
+        phb3_error(ds->phb, "Failed to read RTT entry at 0x%"PRIx64, addr);
+        /* Set error bits ? fence ? ... */
+        return false;
+    }
+    rte = be16_to_cpu(rte);
+
+    /* Fail upon reading of invalid PE# */
+    if (rte >= PNV_PHB3_NUM_PE) {
+        phb3_error(ds->phb, "RTE for RID 0x%x invalid (%04x)", ds->devfn, rte);
+        /* Set error bits ? fence ? ... */
+        return false;
+    }
+    ds->pe_num = rte;
+    return true;
+}
+
+static void pnv_phb3_translate_tve(PnvPhb3DMASpace *ds, hwaddr addr,
+                                   bool is_write, uint64_t tve,
+                                   IOMMUTLBEntry *tlb)
+{
+    uint64_t tta = GETFIELD(IODA2_TVT_TABLE_ADDR, tve);
+    int32_t  lev = GETFIELD(IODA2_TVT_NUM_LEVELS, tve);
+    uint32_t tts = GETFIELD(IODA2_TVT_TCE_TABLE_SIZE, tve);
+    uint32_t tps = GETFIELD(IODA2_TVT_IO_PSIZE, tve);
+    PnvPHB3 *phb = ds->phb;
+
+    /* Invalid levels */
+    if (lev > 4) {
+        phb3_error(phb, "Invalid #levels in TVE %d", lev);
+        return;
+    }
+
+    /* IO Page Size of 0 means untranslated, else use TCEs */
+    if (tps == 0) {
+        /*
+         * We only support non-translate in top window.
+         *
+         * TODO: Venice/Murano support it on bottom window above 4G and
+         * Naples supports it on everything
+         */
+        if (!(tve & PPC_BIT(51))) {
+            phb3_error(phb, "xlate for invalid non-translate TVE");
+            return;
+        }
+        /* TODO: Handle boundaries */
+
+        /* Use 4k pages like q35 ... for now */
+        tlb->iova = addr & 0xfffffffffffff000ull;
+        tlb->translated_addr = addr & 0x0003fffffffff000ull;
+        tlb->addr_mask = 0xfffull;
+        tlb->perm = IOMMU_RW;
+    } else {
+        uint32_t tce_shift, tbl_shift, sh;
+        uint64_t base, taddr, tce, tce_mask;
+
+        /* TVE disabled ? */
+        if (tts == 0) {
+            phb3_error(phb, "xlate for invalid translated TVE");
+            return;
+        }
+
+        /* Address bits per bottom level TCE entry */
+        tce_shift = tps + 11;
+
+        /* Address bits per table level */
+        tbl_shift = tts + 8;
+
+        /* Top level table base address */
+        base = tta << 12;
+
+        /* Total shift to first level */
+        sh = tbl_shift * lev + tce_shift;
+
+        /* TODO: Multi-level untested */
+        while ((lev--) >= 0) {
+            /* Grab the TCE address */
+            taddr = base | (((addr >> sh) & ((1ull << tbl_shift) - 1)) << 3);
+            if (dma_memory_read(&address_space_memory, taddr, &tce,
+                                sizeof(tce))) {
+                phb3_error(phb, "Failed to read TCE at 0x%"PRIx64, taddr);
+                return;
+            }
+            tce = be64_to_cpu(tce);
+
+            /* Check permission for indirect TCE */
+            if ((lev >= 0) && !(tce & 3)) {
+                phb3_error(phb, "Invalid indirect TCE at 0x%"PRIx64, taddr);
+                phb3_error(phb, " xlate %"PRIx64":%c TVE=%"PRIx64, addr,
+                           is_write ? 'W' : 'R', tve);
+                phb3_error(phb, " tta=%"PRIx64" lev=%d tts=%d tps=%d",
+                           tta, lev, tts, tps);
+                return;
+            }
+            sh -= tbl_shift;
+            base = tce & ~0xfffull;
+        }
+
+        /* We exit the loop with TCE being the final TCE */
+        tce_mask = ~((1ull << tce_shift) - 1);
+        tlb->iova = addr & tce_mask;
+        tlb->translated_addr = tce & tce_mask;
+        tlb->addr_mask = ~tce_mask;
+        tlb->perm = tce & 3;
+        if ((is_write && !(tce & 2)) || ((!is_write) && !(tce & 1))) {
+            phb3_error(phb, "TCE access fault at 0x%"PRIx64, taddr);
+            phb3_error(phb, " xlate %"PRIx64":%c TVE=%"PRIx64, addr,
+                       is_write ? 'W' : 'R', tve);
+            phb3_error(phb, " tta=%"PRIx64" lev=%d tts=%d tps=%d",
+                       tta, lev, tts, tps);
+        }
+    }
+}
+
+static IOMMUTLBEntry pnv_phb3_translate_iommu(IOMMUMemoryRegion *iommu,
+                                              hwaddr addr,
+                                              IOMMUAccessFlags flag,
+                                              int iommu_idx)
+{
+    PnvPhb3DMASpace *ds = container_of(iommu, PnvPhb3DMASpace, dma_mr);
+    int tve_sel;
+    uint64_t tve, cfg;
+    IOMMUTLBEntry ret = {
+        .target_as = &address_space_memory,
+        .iova = addr,
+        .translated_addr = 0,
+        .addr_mask = ~(hwaddr)0,
+        .perm = IOMMU_NONE,
+    };
+    PnvPHB3 *phb = ds->phb;
+
+    /* Resolve PE# */
+    if (!pnv_phb3_resolve_pe(ds)) {
+        phb3_error(phb, "Failed to resolve PE# for bus @%p (%d) devfn 0x%x",
+                   ds->bus, pci_bus_num(ds->bus), ds->devfn);
+        return ret;
+    }
+
+    /* Check top bits */
+    switch (addr >> 60) {
+    case 00:
+        /* DMA or 32-bit MSI ? */
+        cfg = ds->phb->regs[PHB_PHB3_CONFIG >> 3];
+        if ((cfg & PHB_PHB3C_32BIT_MSI_EN) &&
+            ((addr & 0xffffffffffff0000ull) == 0xffff0000ull)) {
+            phb3_error(phb, "xlate on 32-bit MSI region");
+            return ret;
+        }
+        /* Choose TVE. XXX: this should use the PHB3 Control Register */
+        tve_sel = (addr >> 59) & 1;
+        tve = ds->phb->ioda_TVT[ds->pe_num * 2 + tve_sel];
+        pnv_phb3_translate_tve(ds, addr, flag & IOMMU_WO, tve, &ret);
+        break;
+    case 01:
+        phb3_error(phb, "xlate on 64-bit MSI region");
+        break;
+    default:
+        phb3_error(phb, "xlate on unsupported address 0x%"PRIx64, addr);
+    }
+    return ret;
+}
+
+#define TYPE_PNV_PHB3_IOMMU_MEMORY_REGION "pnv-phb3-iommu-memory-region"
+#define PNV_PHB3_IOMMU_MEMORY_REGION(obj) \
+    OBJECT_CHECK(IOMMUMemoryRegion, (obj), TYPE_PNV_PHB3_IOMMU_MEMORY_REGION)
+
+static void pnv_phb3_iommu_memory_region_class_init(ObjectClass *klass,
+                                                    void *data)
+{
+    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
+
+    imrc->translate = pnv_phb3_translate_iommu;
+}
+
+static const TypeInfo pnv_phb3_iommu_memory_region_info = {
+    .parent = TYPE_IOMMU_MEMORY_REGION,
+    .name = TYPE_PNV_PHB3_IOMMU_MEMORY_REGION,
+    .class_init = pnv_phb3_iommu_memory_region_class_init,
+};
+
+/*
+ * MSI/MSIX memory region implementation.
+ * The handler handles both MSI and MSIX.
+ */
+static void pnv_phb3_msi_write(void *opaque, hwaddr addr,
+                               uint64_t data, unsigned size)
+{
+    PnvPhb3DMASpace *ds = opaque;
+
+    /* Resolve PE# */
+    if (!pnv_phb3_resolve_pe(ds)) {
+        phb3_error(ds->phb, "Failed to resolve PE# for bus @%p (%d) devfn 0x%x",
+                   ds->bus, pci_bus_num(ds->bus), ds->devfn);
+        return;
+    }
+
+    pnv_phb3_msi_send(&ds->phb->msis, addr, data, ds->pe_num);
+}
+
+/* The read result is undefined by the PCI spec: log it and return all ones */
+static uint64_t pnv_phb3_msi_read(void *opaque, hwaddr addr, unsigned size)
+{
+    PnvPhb3DMASpace *ds = opaque;
+
+    phb3_error(ds->phb, "invalid read @ 0x%" HWADDR_PRIx, addr);
+    return -1;
+}
+
+static const MemoryRegionOps pnv_phb3_msi_ops = {
+    .read = pnv_phb3_msi_read,
+    .write = pnv_phb3_msi_write,
+    .endianness = DEVICE_LITTLE_ENDIAN
+};
+
+static AddressSpace *pnv_phb3_dma_iommu(PCIBus *bus, void *opaque, int devfn)
+{
+    PnvPHB3 *phb = opaque;
+    PnvPhb3DMASpace *ds;
+
+    QLIST_FOREACH(ds, &phb->dma_spaces, list) {
+        if (ds->bus == bus && ds->devfn == devfn) {
+            break;
+        }
+    }
+
+    if (ds == NULL) {
+        ds = g_malloc0(sizeof(PnvPhb3DMASpace));
+        ds->bus = bus;
+        ds->devfn = devfn;
+        ds->pe_num = PHB_INVALID_PE;
+        ds->phb = phb;
+        memory_region_init_iommu(&ds->dma_mr, sizeof(ds->dma_mr),
+                                 TYPE_PNV_PHB3_IOMMU_MEMORY_REGION,
+                                 OBJECT(phb), "phb3_iommu", UINT64_MAX);
+        address_space_init(&ds->dma_as, MEMORY_REGION(&ds->dma_mr),
+                           "phb3_iommu");
+        memory_region_init_io(&ds->msi32_mr, OBJECT(phb), &pnv_phb3_msi_ops,
+                              ds, "msi32", 0x10000);
+        memory_region_init_io(&ds->msi64_mr, OBJECT(phb), &pnv_phb3_msi_ops,
+                              ds, "msi64", 0x100000);
+        pnv_phb3_update_msi_regions(ds);
+
+        QLIST_INSERT_HEAD(&phb->dma_spaces, ds, list);
+    }
+    return &ds->dma_as;
+}
+
+static void pnv_phb3_instance_init(Object *obj)
+{
+    PnvPHB3 *phb = PNV_PHB3(obj);
+
+    QLIST_INIT(&phb->dma_spaces);
+
+    /* LSI sources */
+    object_initialize_child(obj, "lsi", &phb->lsis, sizeof(phb->lsis),
+                            TYPE_ICS, &error_abort, NULL);
+
+    /* Default init ... will be fixed by HW inits */
+    phb->lsis.offset = 0;
+
+    /* MSI sources */
+    object_initialize_child(obj, "msi", &phb->msis, sizeof(phb->msis),
+                            TYPE_PHB3_MSI, &error_abort, NULL);
+
+    /* Power Bus Common Queue */
+    object_initialize_child(obj, "pbcq", &phb->pbcq, sizeof(phb->pbcq),
+                            TYPE_PNV_PBCQ, &error_abort, NULL);
+
+    /* Root Port */
+    object_initialize_child(obj, "root", &phb->root, sizeof(phb->root),
+                            TYPE_PNV_PHB3_ROOT_PORT, &error_abort, NULL);
+    qdev_prop_set_int32(DEVICE(&phb->root), "addr", PCI_DEVFN(0, 0));
+    qdev_prop_set_bit(DEVICE(&phb->root), "multifunction", false);
+}
+
+static void pnv_phb3_realize(DeviceState *dev, Error **errp)
+{
+    PnvPHB3 *phb = PNV_PHB3(dev);
+    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
+    PnvMachineState *pnv = PNV_MACHINE(qdev_get_machine());
+    Error *local_err = NULL;
+    int i;
+
+    if (phb->phb_id >= PNV8_CHIP_PHB3_MAX) {
+        error_setg(errp, "invalid PHB index: %d", phb->phb_id);
+        return;
+    }
+
+    /* LSI sources */
+    object_property_set_link(OBJECT(&phb->lsis), OBJECT(pnv), "xics",
+                             &error_abort);
+    object_property_set_int(OBJECT(&phb->lsis), PNV_PHB3_NUM_LSI, "nr-irqs",
+                            &error_abort);
+    object_property_set_bool(OBJECT(&phb->lsis), true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    for (i = 0; i < phb->lsis.nr_irqs; i++) {
+        ics_set_irq_type(&phb->lsis, i, true);
+    }
+
+    phb->qirqs = qemu_allocate_irqs(ics_set_irq, &phb->lsis, phb->lsis.nr_irqs);
+
+    /* MSI sources */
+    object_property_set_link(OBJECT(&phb->msis), OBJECT(phb), "phb",
+                             &error_abort);
+    object_property_set_link(OBJECT(&phb->msis), OBJECT(pnv), "xics",
+                             &error_abort);
+    object_property_set_int(OBJECT(&phb->msis), PHB3_MAX_MSI, "nr-irqs",
+                            &error_abort);
+    object_property_set_bool(OBJECT(&phb->msis), true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Power Bus Common Queue */
+    object_property_set_link(OBJECT(&phb->pbcq), OBJECT(phb), "phb",
+                             &error_abort);
+    object_property_set_bool(OBJECT(&phb->pbcq), true, "realized", &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Controller Registers */
+    memory_region_init_io(&phb->mr_regs, OBJECT(phb), &pnv_phb3_reg_ops, phb,
+                          "phb3-regs", 0x1000);
+
+    /*
+     * PHB3 doesn't support IO space. However, QEMU gets very upset if
+     * we don't have an IO region to anchor IO BARs onto so we just
+     * initialize one which we never hook up to anything
+     */
+    memory_region_init(&phb->pci_io, OBJECT(phb), "pci-io", 0x10000);
+    memory_region_init(&phb->pci_mmio, OBJECT(phb), "pci-mmio",
+                       PCI_MMIO_TOTAL_SIZE);
+
+    pci->bus = pci_register_root_bus(dev, "root-bus",
+                                     pnv_phb3_set_irq, pnv_phb3_map_irq, phb,
+                                     &phb->pci_mmio, &phb->pci_io,
+                                     0, 4, TYPE_PNV_PHB3_ROOT_BUS);
+
+    pci_setup_iommu(pci->bus, pnv_phb3_dma_iommu, phb);
+
+    /* Add a single Root port */
+    qdev_prop_set_uint8(DEVICE(&phb->root), "chassis", phb->chip_id);
+    qdev_prop_set_uint16(DEVICE(&phb->root), "slot", phb->phb_id);
+    qdev_set_parent_bus(DEVICE(&phb->root), BUS(pci->bus));
+    qdev_init_nofail(DEVICE(&phb->root));
+}
+
+void pnv_phb3_update_regions(PnvPHB3 *phb)
+{
+    PnvPBCQState *pbcq = &phb->pbcq;
+
+    /* Unmap first always */
+    if (memory_region_is_mapped(&phb->mr_regs)) {
+        memory_region_del_subregion(&pbcq->phbbar, &phb->mr_regs);
+    }
+
+    /* Map registers if enabled */
+    if (memory_region_is_mapped(&pbcq->phbbar)) {
+        /* TODO: We should use the PHB BAR 2 register but we don't ... */
+        memory_region_add_subregion(&pbcq->phbbar, 0, &phb->mr_regs);
+    }
+
+    /* Check/update m32 */
+    if (memory_region_is_mapped(&phb->mr_m32)) {
+        pnv_phb3_check_m32(phb);
+    }
+    pnv_phb3_check_all_m64s(phb);
+}
+
+static const char *pnv_phb3_root_bus_path(PCIHostState *host_bridge,
+                                          PCIBus *rootbus)
+{
+    PnvPHB3 *phb = PNV_PHB3(host_bridge);
+
+    snprintf(phb->bus_path, sizeof(phb->bus_path), "00%02x:%02x",
+             phb->chip_id, phb->phb_id);
+    return phb->bus_path;
+}
+
+static Property pnv_phb3_properties[] = {
+        DEFINE_PROP_UINT32("index", PnvPHB3, phb_id, 0),
+        DEFINE_PROP_UINT32("chip-id", PnvPHB3, chip_id, 0),
+        DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pnv_phb3_class_init(ObjectClass *klass, void *data)
+{
+    PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    hc->root_bus_path = pnv_phb3_root_bus_path;
+    dc->realize = pnv_phb3_realize;
+    dc->props = pnv_phb3_properties;
+    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+}
+
+static const TypeInfo pnv_phb3_type_info = {
+    .name          = TYPE_PNV_PHB3,
+    .parent        = TYPE_PCIE_HOST_BRIDGE,
+    .instance_size = sizeof(PnvPHB3),
+    .class_init    = pnv_phb3_class_init,
+    .instance_init = pnv_phb3_instance_init,
+};
+
+static void pnv_phb3_root_bus_class_init(ObjectClass *klass, void *data)
+{
+    BusClass *k = BUS_CLASS(klass);
+
+    /*
+     * PHB3 has only a single root complex. Enforce the limit on the
+     * parent bus
+     */
+    k->max_dev = 1;
+}
+
+static const TypeInfo pnv_phb3_root_bus_info = {
+    .name = TYPE_PNV_PHB3_ROOT_BUS,
+    .parent = TYPE_PCIE_BUS,
+    .class_init = pnv_phb3_root_bus_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { INTERFACE_PCIE_DEVICE },
+        { }
+    },
+};
+
+static void pnv_phb3_root_port_realize(DeviceState *dev, Error **errp)
+{
+    PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
+    Error *local_err = NULL;
+
+    rpc->parent_realize(dev, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+static void pnv_phb3_root_port_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    PCIERootPortClass *rpc = PCIE_ROOT_PORT_CLASS(klass);
+
+    dc->desc     = "IBM PHB3 PCIE Root Port";
+
+    device_class_set_parent_realize(dc, pnv_phb3_root_port_realize,
+                                    &rpc->parent_realize);
+
+    k->vendor_id = PCI_VENDOR_ID_IBM;
+    k->device_id = 0x03dc;
+    k->revision  = 0;
+
+    rpc->exp_offset = 0x48;
+    rpc->aer_offset = 0x100;
+}
+
+static const TypeInfo pnv_phb3_root_port_info = {
+    .name          = TYPE_PNV_PHB3_ROOT_PORT,
+    .parent        = TYPE_PCIE_ROOT_PORT,
+    .instance_size = sizeof(PnvPHB3RootPort),
+    .class_init    = pnv_phb3_root_port_class_init,
+};
+
+static void pnv_phb3_register_types(void)
+{
+    type_register_static(&pnv_phb3_root_bus_info);
+    type_register_static(&pnv_phb3_root_port_info);
+    type_register_static(&pnv_phb3_type_info);
+    type_register_static(&pnv_phb3_iommu_memory_region_info);
+}
+
+type_init(pnv_phb3_register_types)
diff --git a/hw/pci-host/pnv_phb3_msi.c b/hw/pci-host/pnv_phb3_msi.c
new file mode 100644
index 000000000000..ecfc1b2c4e3d
--- /dev/null
+++ b/hw/pci-host/pnv_phb3_msi.c
@@ -0,0 +1,349 @@
+/*
+ * QEMU PowerPC PowerNV (POWER8) PHB3 model
+ *
+ * Copyright (c) 2014-2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "hw/pci-host/pnv_phb3_regs.h"
+#include "hw/pci-host/pnv_phb3.h"
+#include "hw/ppc/pnv.h"
+#include "hw/pci/msi.h"
+#include "monitor/monitor.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+#include "sysemu/reset.h"
+
+static uint64_t phb3_msi_ive_addr(PnvPHB3 *phb, int srcno)
+{
+    uint64_t ivtbar = phb->regs[PHB_IVT_BAR >> 3];
+    uint64_t phbctl = phb->regs[PHB_CONTROL >> 3];
+
+    if (!(ivtbar & PHB_IVT_BAR_ENABLE)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "Failed access to disabled IVT BAR !");
+        return 0;
+    }
+
+    if (srcno >= (ivtbar & PHB_IVT_LENGTH_MASK)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "MSI out of bounds (%d vs 0x%"PRIx64")",
+                      srcno, (uint64_t) (ivtbar & PHB_IVT_LENGTH_MASK));
+        return 0;
+    }
+
+    ivtbar &= PHB_IVT_BASE_ADDRESS_MASK;
+
+    if (phbctl & PHB_CTRL_IVE_128_BYTES) {
+        return ivtbar + 128 * srcno;
+    } else {
+        return ivtbar + 16 * srcno;
+    }
+}
+
+static bool phb3_msi_read_ive(PnvPHB3 *phb, int srcno, uint64_t *out_ive)
+{
+    uint64_t ive_addr, ive;
+
+    ive_addr = phb3_msi_ive_addr(phb, srcno);
+    if (!ive_addr) {
+        return false;
+    }
+
+    if (dma_memory_read(&address_space_memory, ive_addr, &ive, sizeof(ive))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "Failed to read IVE at 0x%" PRIx64,
+                      ive_addr);
+        return false;
+    }
+    *out_ive = be64_to_cpu(ive);
+
+    return true;
+}
+
+static void phb3_msi_set_p(Phb3MsiState *msi, int srcno, uint8_t gen)
+{
+    uint64_t ive_addr;
+    uint8_t p = 0x01 | (gen << 1);
+
+    ive_addr = phb3_msi_ive_addr(msi->phb, srcno);
+    if (!ive_addr) {
+        return;
+    }
+
+    if (dma_memory_write(&address_space_memory, ive_addr + 4, &p, 1)) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "Failed to write IVE (set P) at 0x%" PRIx64, ive_addr);
+    }
+}
+
+static void phb3_msi_set_q(Phb3MsiState *msi, int srcno)
+{
+    uint64_t ive_addr;
+    uint8_t q = 0x01;
+
+    ive_addr = phb3_msi_ive_addr(msi->phb, srcno);
+    if (!ive_addr) {
+        return;
+    }
+
+    if (dma_memory_write(&address_space_memory, ive_addr + 5, &q, 1)) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "Failed to write IVE (set Q) at 0x%" PRIx64, ive_addr);
+    }
+}
+
+static void phb3_msi_try_send(Phb3MsiState *msi, int srcno, bool force)
+{
+    ICSState *ics = ICS(msi);
+    uint64_t ive;
+    uint64_t server, prio, pq, gen;
+
+    if (!phb3_msi_read_ive(msi->phb, srcno, &ive)) {
+        return;
+    }
+
+    server = GETFIELD(IODA2_IVT_SERVER, ive);
+    prio = GETFIELD(IODA2_IVT_PRIORITY, ive);
+    if (!force) {
+        pq = GETFIELD(IODA2_IVT_Q, ive) | (GETFIELD(IODA2_IVT_P, ive) << 1);
+    } else {
+        pq = 0;
+    }
+    gen = GETFIELD(IODA2_IVT_GEN, ive);
+
+    /*
+     * The low order 2 bits are the link pointer (Type II interrupts).
+     * Shift back to get a valid IRQ server.
+     */
+    server >>= 2;
+
+    switch (pq) {
+    case 0: /* 00 */
+        if (prio == 0xff) {
+            /* Masked, set Q */
+            phb3_msi_set_q(msi, srcno);
+        } else {
+            /* Enabled, set P and send */
+            phb3_msi_set_p(msi, srcno, gen);
+            icp_irq(ics, server, srcno + ics->offset, prio);
+        }
+        break;
+    case 2: /* 10 */
+        /* Already pending, set Q */
+        phb3_msi_set_q(msi, srcno);
+        break;
+    case 1: /* 01 */
+    case 3: /* 11 */
+    default:
+        /* Just drop stuff if Q already set */
+        break;
+    }
+}
+
+static void phb3_msi_set_irq(void *opaque, int srcno, int val)
+{
+    Phb3MsiState *msi = PHB3_MSI(opaque);
+
+    if (val) {
+        phb3_msi_try_send(msi, srcno, false);
+    }
+}
+
+
+void pnv_phb3_msi_send(Phb3MsiState *msi, uint64_t addr, uint16_t data,
+                       int32_t dev_pe)
+{
+    ICSState *ics = ICS(msi);
+    uint64_t ive;
+    uint16_t pe;
+    uint32_t src = ((addr >> 4) & 0xffff) | (data & 0x1f);
+
+    if (src >= ics->nr_irqs) {
+        qemu_log_mask(LOG_GUEST_ERROR, "MSI %d out of bounds", src);
+        return;
+    }
+    if (dev_pe >= 0) {
+        if (!phb3_msi_read_ive(msi->phb, src, &ive)) {
+            return;
+        }
+        pe = GETFIELD(IODA2_IVT_PE, ive);
+        if (pe != dev_pe) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "MSI %d sent by PE#%d but assigned to PE#%d",
+                          src, dev_pe, pe);
+            return;
+        }
+    }
+    qemu_irq_pulse(msi->qirqs[src]);
+}
+
+void pnv_phb3_msi_ffi(Phb3MsiState *msi, uint64_t val)
+{
+    /* Emit interrupt */
+    pnv_phb3_msi_send(msi, val, 0, -1);
+
+    /* Clear FFI lock */
+    msi->phb->regs[PHB_FFI_LOCK >> 3] = 0;
+}
+
+static void phb3_msi_reject(ICSState *ics, uint32_t nr)
+{
+    Phb3MsiState *msi = PHB3_MSI(ics);
+    unsigned int srcno = nr - ics->offset;
+    unsigned int idx = srcno >> 6;
+    uint64_t bit = 1ull << (srcno & 0x3f);
+
+    assert(srcno < PHB3_MAX_MSI);
+
+    msi->rba[idx] |= bit;
+    msi->rba_sum |= (1u << idx);
+}
+
+static void phb3_msi_resend(ICSState *ics)
+{
+    Phb3MsiState *msi = PHB3_MSI(ics);
+    unsigned int i, j;
+
+    if (msi->rba_sum == 0) {
+        return;
+    }
+
+    for (i = 0; i < 32; i++) {
+        if ((msi->rba_sum & (1u << i)) == 0) {
+            continue;
+        }
+        msi->rba_sum &= ~(1u << i);
+        for (j = 0; j < 64; j++) {
+            if ((msi->rba[i] & (1ull << j)) == 0) {
+                continue;
+            }
+            msi->rba[i] &= ~(1ull << j);
+            phb3_msi_try_send(msi, i * 64 + j, true);
+        }
+    }
+}
+
+static void phb3_msi_reset(DeviceState *dev)
+{
+    Phb3MsiState *msi = PHB3_MSI(dev);
+    ICSStateClass *icsc = ICS_GET_CLASS(dev);
+
+    icsc->parent_reset(dev);
+
+    memset(msi->rba, 0, sizeof(msi->rba));
+    msi->rba_sum = 0;
+}
+
+static void phb3_msi_reset_handler(void *dev)
+{
+    phb3_msi_reset(dev);
+}
+
+void pnv_phb3_msi_update_config(Phb3MsiState *msi, uint32_t base,
+                                uint32_t count)
+{
+    ICSState *ics = ICS(msi);
+
+    if (count > PHB3_MAX_MSI) {
+        count = PHB3_MAX_MSI;
+    }
+    ics->nr_irqs = count;
+    ics->offset = base;
+}
+
+static void phb3_msi_realize(DeviceState *dev, Error **errp)
+{
+    Phb3MsiState *msi = PHB3_MSI(dev);
+    ICSState *ics = ICS(msi);
+    ICSStateClass *icsc = ICS_GET_CLASS(ics);
+    Error *local_err = NULL;
+
+    assert(msi->phb);
+
+    icsc->parent_realize(dev, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    msi->qirqs = qemu_allocate_irqs(phb3_msi_set_irq, msi, ics->nr_irqs);
+
+    qemu_register_reset(phb3_msi_reset_handler, dev);
+}
+
+static void phb3_msi_instance_init(Object *obj)
+{
+    Phb3MsiState *msi = PHB3_MSI(obj);
+    ICSState *ics = ICS(obj);
+
+    object_property_add_link(obj, "phb", TYPE_PNV_PHB3,
+                             (Object **)&msi->phb,
+                             object_property_allow_set_link,
+                             OBJ_PROP_LINK_STRONG,
+                             &error_abort);
+
+    /* Will be overridden later */
+    ics->offset = 0;
+}
+
+static void phb3_msi_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    ICSStateClass *isc = ICS_CLASS(klass);
+
+    device_class_set_parent_realize(dc, phb3_msi_realize,
+                                    &isc->parent_realize);
+    device_class_set_parent_reset(dc, phb3_msi_reset,
+                                  &isc->parent_reset);
+
+    isc->reject = phb3_msi_reject;
+    isc->resend = phb3_msi_resend;
+}
+
+static const TypeInfo phb3_msi_info = {
+    .name = TYPE_PHB3_MSI,
+    .parent = TYPE_ICS,
+    .instance_size = sizeof(Phb3MsiState),
+    .class_init = phb3_msi_class_init,
+    .class_size = sizeof(ICSStateClass),
+    .instance_init = phb3_msi_instance_init,
+};
+
+static void pnv_phb3_msi_register_types(void)
+{
+    type_register_static(&phb3_msi_info);
+}
+
+type_init(pnv_phb3_msi_register_types)
+
+void pnv_phb3_msi_pic_print_info(Phb3MsiState *msi, Monitor *mon)
+{
+    ICSState *ics = ICS(msi);
+    int i;
+
+    monitor_printf(mon, "ICS %4x..%4x %p\n",
+                   ics->offset, ics->offset + ics->nr_irqs - 1, ics);
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        uint64_t ive;
+
+        if (!phb3_msi_read_ive(msi->phb, i, &ive)) {
+            return;
+        }
+
+        if (GETFIELD(IODA2_IVT_PRIORITY, ive) == 0xff) {
+            continue;
+        }
+
+        monitor_printf(mon, "  %4x %c%c server=%04x prio=%02x gen=%d\n",
+                       ics->offset + i,
+                       GETFIELD(IODA2_IVT_P, ive) ? 'P' : '-',
+                       GETFIELD(IODA2_IVT_Q, ive) ? 'Q' : '-',
+                       (uint32_t) GETFIELD(IODA2_IVT_SERVER, ive) >> 2,
+                       (uint32_t) GETFIELD(IODA2_IVT_PRIORITY, ive),
+                       (uint32_t) GETFIELD(IODA2_IVT_GEN, ive));
+    }
+}
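
The P/Q handling in phb3_msi_try_send() above can be read as a small
decision table. The sketch below restates it as a pure function, under
the assumption that P and Q have already been extracted from the IVE as
single bits. The enum and function names are hypothetical and are not
part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the three outcomes of an MSI trigger. */
enum demo_msi_action {
    DEMO_MSI_DROP,      /* Q already set: nothing to do */
    DEMO_MSI_SET_Q,     /* masked or already pending: remember it in Q */
    DEMO_MSI_SEND,      /* idle and enabled: set P and deliver */
};

/*
 * Decide what to do with an incoming MSI from the P and Q bits and the
 * priority of its IVE. Priority 0xff means the source is masked.
 */
enum demo_msi_action demo_msi_decide(unsigned p, unsigned q, uint8_t prio)
{
    assert(p <= 1 && q <= 1);

    switch ((p << 1) | q) {
    case 0: /* 00: idle */
        return prio == 0xff ? DEMO_MSI_SET_Q : DEMO_MSI_SEND;
    case 2: /* 10: already pending */
        return DEMO_MSI_SET_Q;
    default: /* 01, 11: Q already set */
        return DEMO_MSI_DROP;
    }
}
```

In the patch, the resend path calls phb3_msi_try_send() with force set,
which is equivalent to evaluating this table with P = Q = 0.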
diff --git a/hw/pci-host/pnv_phb3_pbcq.c b/hw/pci-host/pnv_phb3_pbcq.c
new file mode 100644
index 000000000000..6f0c05be682a
--- /dev/null
+++ b/hw/pci-host/pnv_phb3_pbcq.c
@@ -0,0 +1,357 @@
+/*
+ * QEMU PowerPC PowerNV (POWER8) PHB3 model
+ *
+ * Copyright (c) 2014-2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "qemu/log.h"
+#include "target/ppc/cpu.h"
+#include "hw/ppc/fdt.h"
+#include "hw/pci-host/pnv_phb3_regs.h"
+#include "hw/pci-host/pnv_phb3.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_bus.h"
+
+#include <libfdt.h>
+
+#define phb3_pbcq_error(pbcq, fmt, ...)                                 \
+    qemu_log_mask(LOG_GUEST_ERROR, "phb3_pbcq[%d:%d]: " fmt "\n",       \
+                  (pbcq)->phb->chip_id, (pbcq)->phb->phb_id, ## __VA_ARGS__)
+
+static uint64_t pnv_pbcq_nest_xscom_read(void *opaque, hwaddr addr,
+                                         unsigned size)
+{
+    PnvPBCQState *pbcq = PNV_PBCQ(opaque);
+    uint32_t offset = addr >> 3;
+
+    return pbcq->nest_regs[offset];
+}
+
+static uint64_t pnv_pbcq_pci_xscom_read(void *opaque, hwaddr addr,
+                                        unsigned size)
+{
+    PnvPBCQState *pbcq = PNV_PBCQ(opaque);
+    uint32_t offset = addr >> 3;
+
+    return pbcq->pci_regs[offset];
+}
+
+static uint64_t pnv_pbcq_spci_xscom_read(void *opaque, hwaddr addr,
+                                         unsigned size)
+{
+    PnvPBCQState *pbcq = PNV_PBCQ(opaque);
+    uint32_t offset = addr >> 3;
+
+    if (offset == PBCQ_SPCI_ASB_DATA) {
+        return pnv_phb3_reg_read(pbcq->phb,
+                                 pbcq->spci_regs[PBCQ_SPCI_ASB_ADDR], 8);
+    }
+    return pbcq->spci_regs[offset];
+}
+
+static void pnv_pbcq_update_map(PnvPBCQState *pbcq)
+{
+    uint64_t bar_en = pbcq->nest_regs[PBCQ_NEST_BAR_EN];
+    uint64_t bar, mask, size;
+
+    /*
+     * NOTE: This will really not work well if those are remapped
+     * after the PHB has created its sub regions. We could do better
+     * if we had a way to resize regions but we don't really care
+     * that much in practice as the stuff below really only happens
+     * once early during boot
+     */
+
+    /* Handle unmaps */
+    if (memory_region_is_mapped(&pbcq->mmbar0) &&
+        !(bar_en & PBCQ_NEST_BAR_EN_MMIO0)) {
+        memory_region_del_subregion(get_system_memory(), &pbcq->mmbar0);
+    }
+    if (memory_region_is_mapped(&pbcq->mmbar1) &&
+        !(bar_en & PBCQ_NEST_BAR_EN_MMIO1)) {
+        memory_region_del_subregion(get_system_memory(), &pbcq->mmbar1);
+    }
+    if (memory_region_is_mapped(&pbcq->phbbar) &&
+        !(bar_en & PBCQ_NEST_BAR_EN_PHB)) {
+        memory_region_del_subregion(get_system_memory(), &pbcq->phbbar);
+    }
+
+    /* Update PHB */
+    pnv_phb3_update_regions(pbcq->phb);
+
+    /* Handle maps */
+    if (!memory_region_is_mapped(&pbcq->mmbar0) &&
+        (bar_en & PBCQ_NEST_BAR_EN_MMIO0)) {
+        bar = pbcq->nest_regs[PBCQ_NEST_MMIO_BAR0] >> 14;
+        mask = pbcq->nest_regs[PBCQ_NEST_MMIO_MASK0];
+        size = ((~mask) >> 14) + 1;
+        memory_region_init(&pbcq->mmbar0, OBJECT(pbcq), "pbcq-mmio0", size);
+        memory_region_add_subregion(get_system_memory(), bar, &pbcq->mmbar0);
+        pbcq->mmio0_base = bar;
+        pbcq->mmio0_size = size;
+    }
+    if (!memory_region_is_mapped(&pbcq->mmbar1) &&
+        (bar_en & PBCQ_NEST_BAR_EN_MMIO1)) {
+        bar = pbcq->nest_regs[PBCQ_NEST_MMIO_BAR1] >> 14;
+        mask = pbcq->nest_regs[PBCQ_NEST_MMIO_MASK1];
+        size = ((~mask) >> 14) + 1;
+        memory_region_init(&pbcq->mmbar1, OBJECT(pbcq), "pbcq-mmio1", size);
+        memory_region_add_subregion(get_system_memory(), bar, &pbcq->mmbar1);
+        pbcq->mmio1_base = bar;
+        pbcq->mmio1_size = size;
+    }
+    if (!memory_region_is_mapped(&pbcq->phbbar)
+        && (bar_en & PBCQ_NEST_BAR_EN_PHB)) {
+        bar = pbcq->nest_regs[PBCQ_NEST_PHB_BAR] >> 14;
+        size = 0x1000;
+        memory_region_init(&pbcq->phbbar, OBJECT(pbcq), "pbcq-phb", size);
+        memory_region_add_subregion(get_system_memory(), bar, &pbcq->phbbar);
+    }
+
+    /* Update PHB */
+    pnv_phb3_update_regions(pbcq->phb);
+}
+
+static void pnv_pbcq_nest_xscom_write(void *opaque, hwaddr addr,
+                                uint64_t val, unsigned size)
+{
+    PnvPBCQState *pbcq = PNV_PBCQ(opaque);
+    uint32_t reg = addr >> 3;
+
+    switch (reg) {
+    case PBCQ_NEST_MMIO_BAR0:
+    case PBCQ_NEST_MMIO_BAR1:
+    case PBCQ_NEST_MMIO_MASK0:
+    case PBCQ_NEST_MMIO_MASK1:
+        if (pbcq->nest_regs[PBCQ_NEST_BAR_EN] &
+            (PBCQ_NEST_BAR_EN_MMIO0 |
+             PBCQ_NEST_BAR_EN_MMIO1)) {
+            phb3_pbcq_error(pbcq, "Changing enabled BAR unsupported");
+        }
+        pbcq->nest_regs[reg] = val & 0xffffffffc0000000ull;
+        break;
+    case PBCQ_NEST_PHB_BAR:
+        if (pbcq->nest_regs[PBCQ_NEST_BAR_EN] & PBCQ_NEST_BAR_EN_PHB) {
+            phb3_pbcq_error(pbcq, "Changing enabled BAR unsupported");
+        }
+        pbcq->nest_regs[reg] = val & 0xfffffffffc000000ull;
+        break;
+    case PBCQ_NEST_BAR_EN:
+        pbcq->nest_regs[reg] = val & 0xf800000000000000ull;
+        pnv_pbcq_update_map(pbcq);
+        pnv_phb3_remap_irqs(pbcq->phb);
+        break;
+    case PBCQ_NEST_IRSN_COMPARE:
+    case PBCQ_NEST_IRSN_MASK:
+        pbcq->nest_regs[reg] = val & PBCQ_NEST_IRSN_COMP;
+        pnv_phb3_remap_irqs(pbcq->phb);
+        break;
+    case PBCQ_NEST_LSI_SRC_ID:
+        pbcq->nest_regs[reg] = val & PBCQ_NEST_LSI_SRC;
+        pnv_phb3_remap_irqs(pbcq->phb);
+        break;
+    default:
+        phb3_pbcq_error(pbcq, "%s @0x%"HWADDR_PRIx"=%"PRIx64, __func__,
+                        addr, val);
+    }
+}
+
+static void pnv_pbcq_pci_xscom_write(void *opaque, hwaddr addr,
+                                     uint64_t val, unsigned size)
+{
+    PnvPBCQState *pbcq = PNV_PBCQ(opaque);
+    uint32_t reg = addr >> 3;
+
+    switch (reg) {
+    case PBCQ_PCI_BAR2:
+        pbcq->pci_regs[reg] = val & 0xfffffffffc000000ull;
+        pnv_pbcq_update_map(pbcq);
+    default:
+        phb3_pbcq_error(pbcq, "%s @0x%"HWADDR_PRIx"=%"PRIx64, __func__,
+                        addr, val);
+    }
+}
+
+static void pnv_pbcq_spci_xscom_write(void *opaque, hwaddr addr,
+                                uint64_t val, unsigned size)
+{
+    PnvPBCQState *pbcq = PNV_PBCQ(opaque);
+    uint32_t reg = addr >> 3;
+
+    switch (reg) {
+    case PBCQ_SPCI_ASB_ADDR:
+        pbcq->spci_regs[reg] = val & 0xfff;
+        break;
+    case PBCQ_SPCI_ASB_STATUS:
+        pbcq->spci_regs[reg] &= ~val;
+        break;
+    case PBCQ_SPCI_ASB_DATA:
+        pnv_phb3_reg_write(pbcq->phb, pbcq->spci_regs[PBCQ_SPCI_ASB_ADDR],
+                           val, 8);
+        break;
+    case PBCQ_SPCI_AIB_CAPP_EN:
+    case PBCQ_SPCI_CAPP_SEC_TMR:
+        break;
+    default:
+        phb3_pbcq_error(pbcq, "%s @0x%"HWADDR_PRIx"=%"PRIx64, __func__,
+                        addr, val);
+    }
+}
+
+static const MemoryRegionOps pnv_pbcq_nest_xscom_ops = {
+    .read = pnv_pbcq_nest_xscom_read,
+    .write = pnv_pbcq_nest_xscom_write,
+    .valid.min_access_size = 8,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 8,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static const MemoryRegionOps pnv_pbcq_pci_xscom_ops = {
+    .read = pnv_pbcq_pci_xscom_read,
+    .write = pnv_pbcq_pci_xscom_write,
+    .valid.min_access_size = 8,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 8,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static const MemoryRegionOps pnv_pbcq_spci_xscom_ops = {
+    .read = pnv_pbcq_spci_xscom_read,
+    .write = pnv_pbcq_spci_xscom_write,
+    .valid.min_access_size = 8,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 8,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static void pnv_pbcq_default_bars(PnvPBCQState *pbcq)
+{
+    uint64_t mm0, mm1, reg;
+    PnvPHB3 *phb = pbcq->phb;
+
+    mm0 = 0x3d00000000000ull + 0x4000000000ull * phb->chip_id +
+            0x1000000000ull * phb->phb_id;
+    mm1 = 0x3ff8000000000ull + 0x0200000000ull * phb->chip_id +
+            0x0080000000ull * phb->phb_id;
+    reg = 0x3fffe40000000ull + 0x0000400000ull * phb->chip_id +
+            0x0000100000ull * phb->phb_id;
+
+    pbcq->nest_regs[PBCQ_NEST_MMIO_BAR0] = mm0 << 14;
+    pbcq->nest_regs[PBCQ_NEST_MMIO_BAR1] = mm1 << 14;
+    pbcq->nest_regs[PBCQ_NEST_PHB_BAR] = reg << 14;
+    pbcq->nest_regs[PBCQ_NEST_MMIO_MASK0] = 0x3fff000000000ull << 14;
+    pbcq->nest_regs[PBCQ_NEST_MMIO_MASK1] = 0x3ffff80000000ull << 14;
+    pbcq->pci_regs[PBCQ_PCI_BAR2] = reg << 14;
+}
+
+static void pnv_pbcq_realize(DeviceState *dev, Error **errp)
+{
+    PnvPBCQState *pbcq = PNV_PBCQ(dev);
+    PnvPHB3 *phb;
+    char name[32];
+
+    assert(pbcq->phb);
+    phb = pbcq->phb;
+
+    /* TODO: Fix OPAL to do that: establish default BAR values */
+    pnv_pbcq_default_bars(pbcq);
+
+    /* Initialize the XSCOM region for the PBCQ registers */
+    snprintf(name, sizeof(name), "xscom-pbcq-nest-%d.%d",
+             phb->chip_id, phb->phb_id);
+    pnv_xscom_region_init(&pbcq->xscom_nest_regs, OBJECT(dev),
+                          &pnv_pbcq_nest_xscom_ops, pbcq, name,
+                          PNV_XSCOM_PBCQ_NEST_SIZE);
+    snprintf(name, sizeof(name), "xscom-pbcq-pci-%d.%d",
+             phb->chip_id, phb->phb_id);
+    pnv_xscom_region_init(&pbcq->xscom_pci_regs, OBJECT(dev),
+                          &pnv_pbcq_pci_xscom_ops, pbcq, name,
+                          PNV_XSCOM_PBCQ_PCI_SIZE);
+    snprintf(name, sizeof(name), "xscom-pbcq-spci-%d.%d",
+             phb->chip_id, phb->phb_id);
+    pnv_xscom_region_init(&pbcq->xscom_spci_regs, OBJECT(dev),
+                          &pnv_pbcq_spci_xscom_ops, pbcq, name,
+                          PNV_XSCOM_PBCQ_SPCI_SIZE);
+}
+
+static int pnv_pbcq_dt_xscom(PnvXScomInterface *dev, void *fdt,
+                             int xscom_offset)
+{
+    const char compat[] = "ibm,power8-pbcq";
+    PnvPHB3 *phb = PNV_PBCQ(dev)->phb;
+    char *name;
+    int offset;
+    uint32_t lpc_pcba = PNV_XSCOM_PBCQ_NEST_BASE + 0x400 * phb->phb_id;
+    uint32_t reg[] = {
+        cpu_to_be32(lpc_pcba),
+        cpu_to_be32(PNV_XSCOM_PBCQ_NEST_SIZE),
+        cpu_to_be32(PNV_XSCOM_PBCQ_PCI_BASE + 0x400 * phb->phb_id),
+        cpu_to_be32(PNV_XSCOM_PBCQ_PCI_SIZE),
+        cpu_to_be32(PNV_XSCOM_PBCQ_SPCI_BASE + 0x040 * phb->phb_id),
+        cpu_to_be32(PNV_XSCOM_PBCQ_SPCI_SIZE)
+    };
+
+    name = g_strdup_printf("pbcq@%x", lpc_pcba);
+    offset = fdt_add_subnode(fdt, xscom_offset, name);
+    _FDT(offset);
+    g_free(name);
+
+    _FDT((fdt_setprop(fdt, offset, "reg", reg, sizeof(reg))));
+
+    _FDT((fdt_setprop_cell(fdt, offset, "ibm,phb-index", phb->phb_id)));
+    _FDT((fdt_setprop_cell(fdt, offset, "ibm,chip-id", phb->chip_id)));
+    _FDT((fdt_setprop(fdt, offset, "compatible", compat,
+                      sizeof(compat))));
+    return 0;
+}
+
+static void phb3_pbcq_instance_init(Object *obj)
+{
+    PnvPBCQState *pbcq = PNV_PBCQ(obj);
+
+    object_property_add_link(obj, "phb", TYPE_PNV_PHB3,
+                             (Object **)&pbcq->phb,
+                             object_property_allow_set_link,
+                             OBJ_PROP_LINK_STRONG,
+                             &error_abort);
+}
+
+static void pnv_pbcq_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PnvXScomInterfaceClass *xdc = PNV_XSCOM_INTERFACE_CLASS(klass);
+
+    xdc->dt_xscom = pnv_pbcq_dt_xscom;
+
+    dc->realize = pnv_pbcq_realize;
+}
+
+static const TypeInfo pnv_pbcq_type_info = {
+    .name          = TYPE_PNV_PBCQ,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(PnvPBCQState),
+    .instance_init = phb3_pbcq_instance_init,
+    .class_init    = pnv_pbcq_class_init,
+    .interfaces    = (InterfaceInfo[]) {
+        { TYPE_PNV_XSCOM_INTERFACE },
+        { }
+    }
+};
+
+static void pnv_pbcq_register_types(void)
+{
+    type_register_static(&pnv_pbcq_type_info);
+}
+
+type_init(pnv_pbcq_register_types)
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 44c74be81b66..960e97d96882 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -616,8 +616,13 @@ static ISABus *pnv_isa_create(PnvChip *chip, Error **errp)
 static void pnv_chip_power8_pic_print_info(PnvChip *chip, Monitor *mon)
 {
     Pnv8Chip *chip8 = PNV8_CHIP(chip);
+    int i;
 
     ics_pic_print_info(&chip8->psi.ics, mon);
+    for (i = 0; i < chip->num_phbs; i++) {
+        pnv_phb3_msi_pic_print_info(&chip8->phbs[i].msis, mon);
+        ics_pic_print_info(&chip8->phbs[i].lsis, mon);
+    }
 }
 
 static void pnv_chip_power9_pic_print_info(PnvChip *chip, Monitor *mon)
@@ -1031,7 +1036,10 @@ static void pnv_chip_power10_intc_print_info(PnvChip *chip, PowerPCCPU *cpu,
 
 static void pnv_chip_power8_instance_init(Object *obj)
 {
+    PnvChip *chip = PNV_CHIP(obj);
     Pnv8Chip *chip8 = PNV8_CHIP(obj);
+    PnvChipClass *pcc = PNV_CHIP_GET_CLASS(obj);
+    int i;
 
     object_property_add_link(obj, "xics", TYPE_XICS_FABRIC,
                              (Object **)&chip8->xics,
@@ -1050,6 +1058,17 @@ static void pnv_chip_power8_instance_init(Object *obj)
 
     object_initialize_child(obj, "homer",  &chip8->homer, sizeof(chip8->homer),
                             TYPE_PNV8_HOMER, &error_abort, NULL);
+
+    for (i = 0; i < pcc->num_phbs; i++) {
+        object_initialize_child(obj, "phb[*]", &chip8->phbs[i],
+                                sizeof(chip8->phbs[i]), TYPE_PNV_PHB3,
+                                &error_abort, NULL);
+    }
+
+    /*
+     * Number of PHBs is the chip default
+     */
+    chip->num_phbs = pcc->num_phbs;
 }
 
 static void pnv_chip_icp_realize(Pnv8Chip *chip8, Error **errp)
@@ -1088,6 +1107,7 @@ static void pnv_chip_power8_realize(DeviceState *dev, Error **errp)
     Pnv8Chip *chip8 = PNV8_CHIP(dev);
     Pnv8Psi *psi8 = &chip8->psi;
     Error *local_err = NULL;
+    int i;
 
     assert(chip8->xics);
 
@@ -1168,6 +1188,33 @@ static void pnv_chip_power8_realize(DeviceState *dev, Error **errp)
     /* Homer mmio region */
     memory_region_add_subregion(get_system_memory(), PNV_HOMER_BASE(chip),
                                 &chip8->homer.regs);
+
+    /* PHB3 controllers */
+    for (i = 0; i < chip->num_phbs; i++) {
+        PnvPHB3 *phb = &chip8->phbs[i];
+        PnvPBCQState *pbcq = &phb->pbcq;
+
+        object_property_set_int(OBJECT(phb), i, "index", &error_fatal);
+        object_property_set_int(OBJECT(phb), chip->chip_id, "chip-id",
+                                &error_fatal);
+        object_property_set_bool(OBJECT(phb), true, "realized", &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+        qdev_set_parent_bus(DEVICE(phb), sysbus_get_default());
+
+        /* Populate the XSCOM address space. */
+        pnv_xscom_add_subregion(chip,
+                                PNV_XSCOM_PBCQ_NEST_BASE + 0x400 * phb->phb_id,
+                                &pbcq->xscom_nest_regs);
+        pnv_xscom_add_subregion(chip,
+                                PNV_XSCOM_PBCQ_PCI_BASE + 0x400 * phb->phb_id,
+                                &pbcq->xscom_pci_regs);
+        pnv_xscom_add_subregion(chip,
+                                PNV_XSCOM_PBCQ_SPCI_BASE + 0x040 * phb->phb_id,
+                                &pbcq->xscom_spci_regs);
+    }
 }
 
 static uint32_t pnv_chip_power8_xscom_pcba(PnvChip *chip, uint64_t addr)
@@ -1183,6 +1230,7 @@ static void pnv_chip_power8e_class_init(ObjectClass *klass, void *data)
 
     k->chip_cfam_id = 0x221ef04980000000ull;  /* P8 Murano DD2.1 */
     k->cores_mask = POWER8E_CORE_MASK;
+    k->num_phbs = 3;
     k->core_pir = pnv_chip_core_pir_p8;
     k->intc_create = pnv_chip_power8_intc_create;
     k->intc_reset = pnv_chip_power8_intc_reset;
@@ -1206,6 +1254,7 @@ static void pnv_chip_power8_class_init(ObjectClass *klass, void *data)
 
     k->chip_cfam_id = 0x220ea04980000000ull; /* P8 Venice DD2.0 */
     k->cores_mask = POWER8_CORE_MASK;
+    k->num_phbs = 3;
     k->core_pir = pnv_chip_core_pir_p8;
     k->intc_create = pnv_chip_power8_intc_create;
     k->intc_reset = pnv_chip_power8_intc_reset;
@@ -1229,6 +1278,7 @@ static void pnv_chip_power8nvl_class_init(ObjectClass *klass, void *data)
 
     k->chip_cfam_id = 0x120d304980000000ull;  /* P8 Naples DD1.0 */
     k->cores_mask = POWER8_CORE_MASK;
+    k->num_phbs = 3;
     k->core_pir = pnv_chip_core_pir_p8;
     k->intc_create = pnv_chip_power8_intc_create;
     k->intc_reset = pnv_chip_power8_intc_reset;
@@ -1753,14 +1803,23 @@ PowerPCCPU *pnv_chip_find_cpu(PnvChip *chip, uint32_t pir)
 static ICSState *pnv_ics_get(XICSFabric *xi, int irq)
 {
     PnvMachineState *pnv = PNV_MACHINE(xi);
-    int i;
+    int i, j;
 
     for (i = 0; i < pnv->num_chips; i++) {
+        PnvChip *chip = pnv->chips[i];
         Pnv8Chip *chip8 = PNV8_CHIP(pnv->chips[i]);
 
         if (ics_valid_irq(&chip8->psi.ics, irq)) {
             return &chip8->psi.ics;
         }
+        for (j = 0; j < chip->num_phbs; j++) {
+            if (ics_valid_irq(&chip8->phbs[j].lsis, irq)) {
+                return &chip8->phbs[j].lsis;
+            }
+            if (ics_valid_irq(ICS(&chip8->phbs[j].msis), irq)) {
+                return ICS(&chip8->phbs[j].msis);
+            }
+        }
     }
     return NULL;
 }
@@ -1768,11 +1827,17 @@ static ICSState *pnv_ics_get(XICSFabric *xi, int irq)
 static void pnv_ics_resend(XICSFabric *xi)
 {
     PnvMachineState *pnv = PNV_MACHINE(xi);
-    int i;
+    int i, j;
 
     for (i = 0; i < pnv->num_chips; i++) {
+        PnvChip *chip = pnv->chips[i];
         Pnv8Chip *chip8 = PNV8_CHIP(pnv->chips[i]);
+
         ics_resend(&chip8->psi.ics);
+        for (j = 0; j < chip->num_phbs; j++) {
+            ics_resend(&chip8->phbs[j].lsis);
+            ics_resend(ICS(&chip8->phbs[j].msis));
+        }
     }
 }
 
diff --git a/hw/pci-host/Makefile.objs b/hw/pci-host/Makefile.objs
index 8a296e2f93b2..8c87e8494d3d 100644
--- a/hw/pci-host/Makefile.objs
+++ b/hw/pci-host/Makefile.objs
@@ -21,3 +21,4 @@ common-obj-$(CONFIG_PCI_EXPRESS_XILINX) += xilinx-pcie.o
 
 common-obj-$(CONFIG_PCI_EXPRESS_DESIGNWARE) += designware.o
 obj-$(CONFIG_POWERNV) += pnv_phb4.o pnv_phb4_pec.o
+obj-$(CONFIG_POWERNV) += pnv_phb3.o pnv_phb3_msi.o pnv_phb3_pbcq.o
-- 
2.21.1
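
A note on the BAR arithmetic in pnv_pbcq_update_map() and
pnv_pbcq_default_bars() above: the nest BAR and mask registers hold the
address shifted left by 14 bits, so the base is recovered with ">> 14" and
the window size with "((~mask) >> 14) + 1". The following is a minimal
standalone sketch of that arithmetic using the default values from the
patch. The helper names are made up for illustration and are not QEMU API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not QEMU API: they mirror the shifts in the patch. */

/* Base address encoded in a PBCQ nest BAR register. */
static uint64_t pbcq_bar_base(uint64_t bar_reg)
{
    return bar_reg >> 14;
}

/* Window size encoded in a PBCQ nest BAR mask register. */
static uint64_t pbcq_bar_size(uint64_t mask_reg)
{
    return ((~mask_reg) >> 14) + 1;
}

/* Default MMIO0 base, as computed by pnv_pbcq_default_bars() in the patch. */
static uint64_t pbcq_default_mmio0(uint32_t chip_id, uint32_t phb_id)
{
    return 0x3d00000000000ull + 0x4000000000ull * chip_id +
           0x1000000000ull * phb_id;
}
```

With the default masks, pbcq_bar_size() gives 0x1000000000 (64 GiB) for
MMIO0 and 0x80000000 (2 GiB) for MMIO1, which matches the per-PHB strides
used for the default bases.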




* Re: [PATCH 1/2] ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge
  2020-01-27 14:45 ` [PATCH 1/2] ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge Cédric Le Goater
@ 2020-01-29  3:09   ` David Gibson
  2020-01-29  3:54     ` Oliver O'Halloran
  0 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2020-01-29  3:09 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, Oliver O'Halloran, qemu-devel, Nicholas Piggin


On Mon, Jan 27, 2020 at 03:45:05PM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> These changes introduce models for the PCIe Host Bridge (PHB4) of the
> POWER9 processor. It includes the PowerBus logic interface (PBCQ),
> IOMMU support, a single PCIe Gen.4 Root Complex, and support for MSI
> and LSI interrupt sources as found on a POWER9 system using the XIVE
> interrupt controller.
> 
> The POWER9 processor comes with 3 PHB4 PECs (PCI Express Controllers),
> and each PEC can have several PHBs. By default,
> 
>   * PEC0 provides 1 PHB  (PHB0)
>   * PEC1 provides 2 PHBs (PHB1 and PHB2)
>   * PEC2 provides 3 PHBs (PHB3, PHB4 and PHB5)
> 
> Each PEC has a set of "global" registers and some "per-stack" (per-PHB)
> registers. Those are organized in two XSCOM ranges, the "Nest" range
> and the "PCI" range, each range contains both some "PEC" registers and
> some "per-stack" registers.
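
For reference, the default PEC/PHB layout listed above can be sketched as a
small lookup. This is only an illustration of the numbering in the commit
message (PHBs numbered consecutively across PECs); the table and helper
names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PECS 3

/* Stacks (PHBs) per PEC, from the commit message: PEC0=1, PEC1=2, PEC2=3. */
static const uint32_t pec_num_stacks[NUM_PECS] = { 1, 2, 3 };

/*
 * Hypothetical helper: chip-wide PHB number for a (PEC, stack) pair,
 * with PHBs numbered consecutively across PECs as in the list above.
 */
static uint32_t pec_stack_to_phb(uint32_t pec, uint32_t stack)
{
    uint32_t phb = stack;
    uint32_t i;

    assert(pec < NUM_PECS && stack < pec_num_stacks[pec]);
    for (i = 0; i < pec; i++) {
        phb += pec_num_stacks[i];
    }
    return phb;
}
```

For example, pec_stack_to_phb(2, 0) yields 3, i.e. the first stack of PEC2
is PHB3.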
> 
> No default device layout is provided and PCI devices can be added on
> any of the available PCIe Root Ports (pcie.0 .. 2 of a Power9 chip)
> with address 0x0, as the firmware (skiboot) only accepts a single
> device per root port. To run a simple system with a network adapter
> and a storage adapter, use command line options such as:
> 
>   -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
>   -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0
> 
>   -device megasas,id=scsi0,bus=pcie.1,addr=0x0
>   -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
>   -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
> 
> If more are needed, include a bridge.
> 
> Multi chip is supported, each chip adding its set of PHB4 controllers
> and its PCI busses. The model doesn't emulate the EEH error handling.
> 
> This model is not ready for hotplug yet.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Mostly LGTM, one query below.

> [ clg: - numerous cleanups
>        - commit log
>        - fix for broken LSI support
>        - PHB pic printinfo
>        - large QOM rework ]
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/pci-host/pnv_phb4.h      |  230 +++++
>  include/hw/pci-host/pnv_phb4_regs.h |  553 ++++++++++
>  include/hw/pci/pcie_port.h          |    1 +
>  include/hw/ppc/pnv.h                |    7 +
>  include/hw/ppc/pnv_xscom.h          |   11 +
>  hw/pci-host/pnv_phb4.c              | 1438 +++++++++++++++++++++++++++
>  hw/pci-host/pnv_phb4_pec.c          |  593 +++++++++++
>  hw/ppc/pnv.c                        |  107 ++
>  hw/pci-host/Makefile.objs           |    1 +
>  hw/ppc/Kconfig                      |    2 +
>  10 files changed, 2943 insertions(+)
>  create mode 100644 include/hw/pci-host/pnv_phb4.h
>  create mode 100644 include/hw/pci-host/pnv_phb4_regs.h
>  create mode 100644 hw/pci-host/pnv_phb4.c
>  create mode 100644 hw/pci-host/pnv_phb4_pec.c
> 
> diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
> new file mode 100644
> index 000000000000..c882bfd0aa23
> --- /dev/null
> +++ b/include/hw/pci-host/pnv_phb4.h
> @@ -0,0 +1,230 @@
> +/*
> + * QEMU PowerPC PowerNV (POWER9) PHB4 model
> + *
> + * Copyright (c) 2018-2020, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef PCI_HOST_PNV_PHB4_H
> +#define PCI_HOST_PNV_PHB4_H
> +
> +#include "hw/pci/pcie_host.h"
> +#include "hw/pci/pcie_port.h"
> +#include "hw/ppc/xive.h"
> +
> +typedef struct PnvPhb4PecState PnvPhb4PecState;
> +typedef struct PnvPhb4PecStack PnvPhb4PecStack;
> +typedef struct PnvPHB4 PnvPHB4;
> +typedef struct PnvChip PnvChip;
> +
> +/*
> + * We have one such address space wrapper per possible device under
> + * the PHB since they need to be assigned statically at qemu device
> + * creation time. The relationship to a PE is done later
> + * dynamically. This means we can potentially create a lot of these
> + * guys. Q35 stores them as some kind of radix tree but we never
> + * really need to do fast lookups so instead we simply keep a QLIST of
> + * them for now, we can add the radix if needed later on.
> + *
> + * We do cache the PE number to speed things up a bit though.
> + */
> +typedef struct PnvPhb4DMASpace {
> +    PCIBus *bus;
> +    uint8_t devfn;
> +    int pe_num;         /* Cached PE number */
> +#define PHB_INVALID_PE (-1)
> +    PnvPHB4 *phb;
> +    AddressSpace dma_as;
> +    IOMMUMemoryRegion dma_mr;
> +    MemoryRegion msi32_mr;
> +    MemoryRegion msi64_mr;
> +    QLIST_ENTRY(PnvPhb4DMASpace) list;
> +} PnvPhb4DMASpace;
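
To illustrate the trade-off described in the comment (a plain list with a
linear lookup rather than a radix tree), here is a simplified sketch. It
uses a bare next pointer instead of QLIST and a cut-down struct, so it is an
assumption-laden stand-in and not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHB_INVALID_PE (-1)

/* Simplified stand-in for PnvPhb4DMASpace; "next" replaces the QLIST entry. */
typedef struct DMASpace {
    const void *bus;        /* stands in for PCIBus * */
    uint8_t devfn;
    int pe_num;             /* cached PE number, PHB_INVALID_PE until known */
    struct DMASpace *next;
} DMASpace;

/* Linear search by (bus, devfn); returns NULL when no wrapper exists. */
static DMASpace *dma_space_find(DMASpace *head, const void *bus, uint8_t devfn)
{
    DMASpace *ds;

    for (ds = head; ds != NULL; ds = ds->next) {
        if (ds->bus == bus && ds->devfn == devfn) {
            return ds;
        }
    }
    return NULL;
}
```

The lookup is O(n) in the number of devices behind the PHB, which is
acceptable as long as it is not on a hot path; the cached pe_num avoids
repeating the PE resolution once a wrapper has been found.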
> +
> +/*
> + * PHB4 PCIe Root port
> + */
> +#define TYPE_PNV_PHB4_ROOT_BUS "pnv-phb4-root-bus"
> +#define TYPE_PNV_PHB4_ROOT_PORT "pnv-phb4-root-port"
> +
> +typedef struct PnvPHB4RootPort {
> +    PCIESlot parent_obj;
> +} PnvPHB4RootPort;
> +
> +/*
> + * PHB4 PCIe Host Bridge for PowerNV machines (POWER9)
> + */
> +#define TYPE_PNV_PHB4 "pnv-phb4"
> +#define PNV_PHB4(obj) OBJECT_CHECK(PnvPHB4, (obj), TYPE_PNV_PHB4)
> +
> +#define PNV_PHB4_MAX_LSIs          8
> +#define PNV_PHB4_MAX_INTs          4096
> +#define PNV_PHB4_MAX_MIST          (PNV_PHB4_MAX_INTs >> 2)
> +#define PNV_PHB4_MAX_MMIO_WINDOWS  32
> +#define PNV_PHB4_MIN_MMIO_WINDOWS  16
> +#define PNV_PHB4_NUM_REGS          (0x3000 >> 3)
> +#define PNV_PHB4_MAX_PEs           512
> +#define PNV_PHB4_MAX_TVEs          (PNV_PHB4_MAX_PEs * 2)
> +#define PNV_PHB4_MAX_PEEVs         (PNV_PHB4_MAX_PEs / 64)
> +#define PNV_PHB4_MAX_MBEs          (PNV_PHB4_MAX_MMIO_WINDOWS * 2)
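
As a quick sanity check of the derived sizes above, the macros can be
evaluated at compile time. The definitions below are copied from the quoted
header; the interpretation in the comments (8-byte registers, one PEEV bit
per PE) is an inference from the arithmetic, not something the patch states.

```c
#include <assert.h>

/* Copied from the quoted header. */
#define PNV_PHB4_MAX_INTs          4096
#define PNV_PHB4_MAX_MIST          (PNV_PHB4_MAX_INTs >> 2)
#define PNV_PHB4_MAX_MMIO_WINDOWS  32
#define PNV_PHB4_NUM_REGS          (0x3000 >> 3)
#define PNV_PHB4_MAX_PEs           512
#define PNV_PHB4_MAX_TVEs          (PNV_PHB4_MAX_PEs * 2)
#define PNV_PHB4_MAX_PEEVs         (PNV_PHB4_MAX_PEs / 64)
#define PNV_PHB4_MAX_MBEs          (PNV_PHB4_MAX_MMIO_WINDOWS * 2)

/* Compile-time checks of the derived values (C11 _Static_assert). */
_Static_assert(PNV_PHB4_MAX_MIST == 1024, "4096 interrupts, 4 per entry");
_Static_assert(PNV_PHB4_NUM_REGS == 1536, "0x3000 bytes of 8-byte registers");
_Static_assert(PNV_PHB4_MAX_TVEs == 1024, "two TVEs per PE");
_Static_assert(PNV_PHB4_MAX_PEEVs == 8, "512 PEs, 64 per 64-bit word");
_Static_assert(PNV_PHB4_MAX_MBEs == 64, "two MBEs per MMIO window");
```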
> +
> +#define PNV_PHB4_VERSION           0x000000a400000002ull
> +#define PNV_PHB4_DEVICE_ID         0x04c1
> +
> +#define PCI_MMIO_TOTAL_SIZE        (0x1ull << 60)
> +
> +struct PnvPHB4 {
> +    PCIExpressHost parent_obj;
> +
> +    PnvPHB4RootPort root;
> +
> +    uint32_t chip_id;
> +    uint32_t phb_id;
> +
> +    uint64_t version;
> +    uint16_t device_id;
> +
> +    char bus_path[8];
> +
> +    /* Main register images */
> +    uint64_t regs[PNV_PHB4_NUM_REGS];
> +    MemoryRegion mr_regs;
> +
> +    /* Extra SCOM-only register */
> +    uint64_t scom_hv_ind_addr_reg;
> +
> +    /*
> +     * Geometry of the PHB. There are two types, small and big PHBs, a
> +     * number of resources (number of PEs, windows etc...) are doubled
> +     * for a big PHB
> +     */
> +    bool big_phb;
> +
> +    /* Memory regions for MMIO space */
> +    MemoryRegion mr_mmio[PNV_PHB4_MAX_MMIO_WINDOWS];
> +
> +    /* PCI side space */
> +    MemoryRegion pci_mmio;
> +    MemoryRegion pci_io;
> +
> +    /* On-chip IODA tables */
> +    uint64_t ioda_LIST[PNV_PHB4_MAX_LSIs];
> +    uint64_t ioda_MIST[PNV_PHB4_MAX_MIST];
> +    uint64_t ioda_TVT[PNV_PHB4_MAX_TVEs];
> +    uint64_t ioda_MBT[PNV_PHB4_MAX_MBEs];
> +    uint64_t ioda_MDT[PNV_PHB4_MAX_PEs];
> +    uint64_t ioda_PEEV[PNV_PHB4_MAX_PEEVs];
> +
> +    /*
> +     * The internal PESTA/B is 2 bits per PE split into two tables, we
> +     * store them in a single array here to avoid wasting space.
> +     */
> +    uint8_t  ioda_PEST_AB[PNV_PHB4_MAX_PEs];
> +
> +    /* P9 Interrupt generation */
> +    XiveSource xsrc;
> +    qemu_irq *qirqs;
> +
> +    PnvPhb4PecStack *stack;
> +
> +    QLIST_HEAD(, PnvPhb4DMASpace) dma_spaces;
> +};
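
Regarding the ioda_PEST_AB comment in the struct above: one way to keep the
two per-PE PESTA/PESTB bits in a single byte array is sketched below. The
bit assignment (A in bit 0, B in bit 1) and the helper names are assumptions
for illustration; the excerpt does not show how the model encodes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PEs 512

/* Assumed bit assignment within each byte (not shown in the patch). */
#define PEST_A 0x1
#define PEST_B 0x2

/* One byte per PE holding both the PESTA and PESTB state bits. */
static uint8_t pest_ab[MAX_PEs];

/* Set or clear one of the two state bits for a PE. */
static void pest_set(uint32_t pe, uint8_t flag, bool on)
{
    assert(pe < MAX_PEs);
    if (on) {
        pest_ab[pe] |= flag;
    } else {
        pest_ab[pe] &= (uint8_t)~flag;
    }
}

/* Read back one of the two state bits for a PE. */
static bool pest_get(uint32_t pe, uint8_t flag)
{
    assert(pe < MAX_PEs);
    return (pest_ab[pe] & flag) != 0;
}
```

Compared with two separate arrays, this uses one 512-byte table for both
bits at the cost of a mask operation per access.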
> +
> +void pnv_phb4_pic_print_info(PnvPHB4 *phb, Monitor *mon);
> +void pnv_phb4_update_regions(PnvPhb4PecStack *stack);
> +extern const MemoryRegionOps pnv_phb4_xscom_ops;
> +
> +/*
> + * PHB4 PEC (PCI Express Controller)
> + */
> +#define TYPE_PNV_PHB4_PEC "pnv-phb4-pec"
> +#define PNV_PHB4_PEC(obj) \
> +    OBJECT_CHECK(PnvPhb4PecState, (obj), TYPE_PNV_PHB4_PEC)
> +
> +#define TYPE_PNV_PHB4_PEC_STACK "pnv-phb4-pec-stack"
> +#define PNV_PHB4_PEC_STACK(obj) \
> +    OBJECT_CHECK(PnvPhb4PecStack, (obj), TYPE_PNV_PHB4_PEC_STACK)
> +
> +/* Per-stack data */
> +struct PnvPhb4PecStack {
> +    DeviceState parent;
> +
> +    /* My own stack number */
> +    uint32_t stack_no;
> +
> +    /* Nest registers */
> +#define PHB4_PEC_NEST_STK_REGS_COUNT  0x17
> +    uint64_t nest_regs[PHB4_PEC_NEST_STK_REGS_COUNT];
> +    MemoryRegion nest_regs_mr;
> +
> +    /* PCI registers (excluding pass-through) */
> +#define PHB4_PEC_PCI_STK_REGS_COUNT  0xf
> +    uint64_t pci_regs[PHB4_PEC_PCI_STK_REGS_COUNT];
> +    MemoryRegion pci_regs_mr;
> +
> +    /* PHB pass-through XSCOM */
> +    MemoryRegion phb_regs_mr;
> +
> +    /* Memory windows from PowerBus to PHB */
> +    MemoryRegion mmbar0;
> +    MemoryRegion mmbar1;
> +    MemoryRegion phbbar;
> +    MemoryRegion intbar;
> +    uint64_t mmio0_base;
> +    uint64_t mmio0_size;
> +    uint64_t mmio1_base;
> +    uint64_t mmio1_size;
> +
> +    /* The owner PEC */
> +    PnvPhb4PecState *pec;
> +
> +    /* The actual PHB */
> +    PnvPHB4 phb;
> +};
> +
> +struct PnvPhb4PecState {
> +    DeviceState parent;
> +
> +    /* PEC number in chip */
> +    uint32_t index;
> +    uint32_t chip_id;
> +
> +    MemoryRegion *system_memory;
> +
> +    /* Nest registers, excluding per-stack */
> +#define PHB4_PEC_NEST_REGS_COUNT    0xf
> +    uint64_t nest_regs[PHB4_PEC_NEST_REGS_COUNT];
> +    MemoryRegion nest_regs_mr;
> +
> +    /* PCI registers, excluding per-stack */
> +#define PHB4_PEC_PCI_REGS_COUNT     0x2
> +    uint64_t pci_regs[PHB4_PEC_PCI_REGS_COUNT];
> +    MemoryRegion pci_regs_mr;
> +
> +    /* Stacks */
> +    #define PHB4_PEC_MAX_STACKS     3
> +    uint32_t num_stacks;
> +    PnvPhb4PecStack stacks[PHB4_PEC_MAX_STACKS];
> +};
> +
> +#define PNV_PHB4_PEC_CLASS(klass) \
> +     OBJECT_CLASS_CHECK(PnvPhb4PecClass, (klass), TYPE_PNV_PHB4_PEC)
> +#define PNV_PHB4_PEC_GET_CLASS(obj) \
> +     OBJECT_GET_CLASS(PnvPhb4PecClass, (obj), TYPE_PNV_PHB4_PEC)
> +
> +typedef struct PnvPhb4PecClass {
> +    DeviceClass parent_class;
> +
> +    uint32_t (*xscom_nest_base)(PnvPhb4PecState *pec);
> +    uint32_t xscom_nest_size;
> +    uint32_t (*xscom_pci_base)(PnvPhb4PecState *pec);
> +    uint32_t xscom_pci_size;
> +    const char *compat;
> +    int compat_size;
> +    const char *stk_compat;
> +    int stk_compat_size;
> +} PnvPhb4PecClass;
> +
> +#endif /* PCI_HOST_PNV_PHB4_H */
> diff --git a/include/hw/pci-host/pnv_phb4_regs.h b/include/hw/pci-host/pnv_phb4_regs.h
> new file mode 100644
> index 000000000000..55df2c3e5ece
> --- /dev/null
> +++ b/include/hw/pci-host/pnv_phb4_regs.h
> @@ -0,0 +1,553 @@
> +/*
> + * QEMU PowerPC PowerNV (POWER9) PHB4 model
> + *
> + * Copyright (c) 2013-2020, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef PCI_HOST_PNV_PHB4_REGS_H
> +#define PCI_HOST_PNV_PHB4_REGS_H
> +
> +/*
> + * PEC XSCOM registers
> + *
> + * There are 3 PECs in P9. Each PEC can have several PHBs. Each PEC has some
> + * "global" registers and some "per-stack" (per-PHB) registers. Those are
> + * organized in two XSCOM ranges, the "Nest" range and the "PCI" range, each
> + * range contains both some "PEC" registers and some "per-stack" registers.
> + *
> + * Finally the PCI range also contains an additional range per stack that
> + * passes through to some of the PHB's own registers.
> + *
> + * PEC0 can contain 1 PHB  (PHB0)
> + * PEC1 can contain 2 PHBs (PHB1 and PHB2)
> + * PEC2 can contain 3 PHBs (PHB3, PHB4 and PHB5)
> + */
> +
> +/*
> + * This is the "stack" offset, it's the offset from a given range base
> + * to the first "per-stack" registers and also the stride between
> + * stacks, thus for PEC2, the global registers are at offset 0, the
> + * PHB3 registers at offset 0x40, the PHB4 at offset 0x80 etc....
> + *
> + * It is *also* the offset to the pass-through SCOM region but in this case
> + * it is 0 based, i.e. PHB3 is at 0x100, PHB4 is at 0x140, etc.
> + */
> +#define PEC_STACK_OFFSET        0x40
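
The two uses of PEC_STACK_OFFSET described in the comment can be written
out as follows. This is a sketch derived from the examples in the comment:
the 0x100 pass-through base is taken from "PHB3 is at 0x100" for PEC2, and
the macro and function names other than PEC_STACK_OFFSET are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PEC_STACK_OFFSET 0x40

/* Hypothetical name; 0x100 comes from the "PHB3 is at 0x100" example. */
#define PEC_PASSTHROUGH_BASE 0x100

/* Offset of the per-stack registers: stack 0 follows the global block. */
static uint32_t pec_stack_regs_offset(uint32_t stack_no)
{
    return PEC_STACK_OFFSET * (stack_no + 1);
}

/* Offset of the pass-through SCOM window: 0 based, so stack 0 is at 0x100. */
static uint32_t pec_stack_passthrough_offset(uint32_t stack_no)
{
    return PEC_PASSTHROUGH_BASE + PEC_STACK_OFFSET * stack_no;
}
```

For PEC2 this gives 0x40 and 0x80 for the per-stack registers of PHB3 and
PHB4, and 0x100 and 0x140 for their pass-through windows, matching the
numbers in the comment.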
> +
> +/* XSCOM Nest global registers */
> +#define PEC_NEST_PBCQ_HW_CONFIG         0x00
> +#define PEC_NEST_DROP_PRIO_CTRL         0x01
> +#define PEC_NEST_PBCQ_ERR_INJECT        0x02
> +#define PEC_NEST_PCI_NEST_CLK_TRACE_CTL 0x03
> +#define PEC_NEST_PBCQ_PMON_CTRL         0x04
> +#define PEC_NEST_PBCQ_PBUS_ADDR_EXT     0x05
> +#define PEC_NEST_PBCQ_PRED_VEC_TIMEOUT  0x06
> +#define PEC_NEST_CAPP_CTRL              0x07
> +#define PEC_NEST_PBCQ_READ_STK_OVR      0x08
> +#define PEC_NEST_PBCQ_WRITE_STK_OVR     0x09
> +#define PEC_NEST_PBCQ_STORE_STK_OVR     0x0a
> +#define PEC_NEST_PBCQ_RETRY_BKOFF_CTRL  0x0b
> +
> +/* XSCOM Nest per-stack registers */
> +#define PEC_NEST_STK_PCI_NEST_FIR       0x00
> +#define PEC_NEST_STK_PCI_NEST_FIR_CLR   0x01
> +#define PEC_NEST_STK_PCI_NEST_FIR_SET   0x02
> +#define PEC_NEST_STK_PCI_NEST_FIR_MSK   0x03
> +#define PEC_NEST_STK_PCI_NEST_FIR_MSKC  0x04
> +#define PEC_NEST_STK_PCI_NEST_FIR_MSKS  0x05
> +#define PEC_NEST_STK_PCI_NEST_FIR_ACT0  0x06
> +#define PEC_NEST_STK_PCI_NEST_FIR_ACT1  0x07
> +#define PEC_NEST_STK_PCI_NEST_FIR_WOF   0x08
> +#define PEC_NEST_STK_ERR_REPORT_0       0x0a
> +#define PEC_NEST_STK_ERR_REPORT_1       0x0b
> +#define PEC_NEST_STK_PBCQ_GNRL_STATUS   0x0c
> +#define PEC_NEST_STK_PBCQ_MODE          0x0d
> +#define PEC_NEST_STK_MMIO_BAR0          0x0e
> +#define PEC_NEST_STK_MMIO_BAR0_MASK     0x0f
> +#define PEC_NEST_STK_MMIO_BAR1          0x10
> +#define PEC_NEST_STK_MMIO_BAR1_MASK     0x11
> +#define PEC_NEST_STK_PHB_REGS_BAR       0x12
> +#define PEC_NEST_STK_INT_BAR            0x13
> +#define PEC_NEST_STK_BAR_EN             0x14
> +#define   PEC_NEST_STK_BAR_EN_MMIO0             PPC_BIT(0)
> +#define   PEC_NEST_STK_BAR_EN_MMIO1             PPC_BIT(1)
> +#define   PEC_NEST_STK_BAR_EN_PHB               PPC_BIT(2)
> +#define   PEC_NEST_STK_BAR_EN_INT               PPC_BIT(3)
> +#define PEC_NEST_STK_DATA_FRZ_TYPE      0x15
> +#define PEC_NEST_STK_PBCQ_TUN_BAR       0x16
> +
> +/* XSCOM PCI global registers */
> +#define PEC_PCI_PBAIB_HW_CONFIG         0x00
> +#define PEC_PCI_PBAIB_READ_STK_OVR      0x02
> +
> +/* XSCOM PCI per-stack registers */
> +#define PEC_PCI_STK_PCI_FIR             0x00
> +#define PEC_PCI_STK_PCI_FIR_CLR         0x01
> +#define PEC_PCI_STK_PCI_FIR_SET         0x02
> +#define PEC_PCI_STK_PCI_FIR_MSK         0x03
> +#define PEC_PCI_STK_PCI_FIR_MSKC        0x04
> +#define PEC_PCI_STK_PCI_FIR_MSKS        0x05
> +#define PEC_PCI_STK_PCI_FIR_ACT0        0x06
> +#define PEC_PCI_STK_PCI_FIR_ACT1        0x07
> +#define PEC_PCI_STK_PCI_FIR_WOF         0x08
> +#define PEC_PCI_STK_ETU_RESET           0x0a
> +#define PEC_PCI_STK_PBAIB_ERR_REPORT    0x0b
> +#define PEC_PCI_STK_PBAIB_TX_CMD_CRED   0x0d
> +#define PEC_PCI_STK_PBAIB_TX_DAT_CRED   0x0e
> +
> +/*
> + * PHB "SCOM" registers. This is accessed via the above window
> + * and provides a backdoor to the PHB when the AIB bus is not
> + * functional. Some of these directly map some of the PHB MMIO
> + * registers, some are specific and allow indirect access to a
> + * wider range of PHB registers
> + */
> +#define PHB_SCOM_HV_IND_ADDR            0x00
> +#define   PHB_SCOM_HV_IND_ADDR_VALID            PPC_BIT(0)
> +#define   PHB_SCOM_HV_IND_ADDR_4B               PPC_BIT(1)
> +#define   PHB_SCOM_HV_IND_ADDR_AUTOINC          PPC_BIT(2)
> +#define   PHB_SCOM_HV_IND_ADDR_ADDR             PPC_BITMASK(51, 63)
> +#define PHB_SCOM_HV_IND_DATA            0x01
> +#define PHB_SCOM_ETU_LEM_FIR            0x08
> +#define PHB_SCOM_ETU_LEM_FIR_AND        0x09
> +#define PHB_SCOM_ETU_LEM_FIR_OR         0x0a
> +#define PHB_SCOM_ETU_LEM_FIR_MSK        0x0b
> +#define PHB_SCOM_ETU_LEM_ERR_MSK_AND    0x0c
> +#define PHB_SCOM_ETU_LEM_ERR_MSK_OR     0x0d
> +#define PHB_SCOM_ETU_LEM_ACT0           0x0e
> +#define PHB_SCOM_ETU_LEM_ACT1           0x0f
> +#define PHB_SCOM_ETU_LEM_WOF            0x10
> +#define PHB_SCOM_ETU_PMON_CONFIG        0x17
> +#define PHB_SCOM_ETU_PMON_CTR0          0x18
> +#define PHB_SCOM_ETU_PMON_CTR1          0x19
> +#define PHB_SCOM_ETU_PMON_CTR2          0x1a
> +#define PHB_SCOM_ETU_PMON_CTR3          0x1b
> +
> +
> +/*
> + * PHB MMIO registers
> + */
> +
> +/* PHB Fundamental register set A */
> +#define PHB_LSI_SOURCE_ID               0x100
> +#define   PHB_LSI_SRC_ID                PPC_BITMASK(4, 12)
> +#define PHB_DMA_CHAN_STATUS             0x110
> +#define   PHB_DMA_CHAN_ANY_ERR          PPC_BIT(27)
> +#define   PHB_DMA_CHAN_ANY_ERR1         PPC_BIT(28)
> +#define   PHB_DMA_CHAN_ANY_FREEZE       PPC_BIT(29)
> +#define PHB_CPU_LOADSTORE_STATUS        0x120
> +#define   PHB_CPU_LS_ANY_ERR            PPC_BIT(27)
> +#define   PHB_CPU_LS_ANY_ERR1           PPC_BIT(28)
> +#define   PHB_CPU_LS_ANY_FREEZE         PPC_BIT(29)
> +#define PHB_CONFIG_DATA                 0x130
> +#define PHB_LOCK0                       0x138
> +#define PHB_CONFIG_ADDRESS              0x140
> +#define   PHB_CA_ENABLE                 PPC_BIT(0)
> +#define   PHB_CA_STATUS                 PPC_BITMASK(1, 3)
> +#define     PHB_CA_STATUS_GOOD          0
> +#define     PHB_CA_STATUS_UR            1
> +#define     PHB_CA_STATUS_CRS           2
> +#define     PHB_CA_STATUS_CA            4
> +#define   PHB_CA_BUS                    PPC_BITMASK(4, 11)
> +#define   PHB_CA_DEV                    PPC_BITMASK(12, 16)
> +#define   PHB_CA_FUNC                   PPC_BITMASK(17, 19)
> +#define   PHB_CA_BDFN                   PPC_BITMASK(4, 19) /* bus,dev,func */
> +#define   PHB_CA_REG                    PPC_BITMASK(20, 31)
> +#define   PHB_CA_PE                     PPC_BITMASK(39, 47)
> +#define PHB_LOCK1                       0x148
> +#define PHB_PHB4_CONFIG                 0x160
> +#define   PHB_PHB4C_32BIT_MSI_EN        PPC_BIT(8)
> +#define   PHB_PHB4C_64BIT_MSI_EN        PPC_BIT(14)
> +#define PHB_RTT_BAR                     0x168
> +#define   PHB_RTT_BAR_ENABLE            PPC_BIT(0)
> +#define   PHB_RTT_BASE_ADDRESS_MASK     PPC_BITMASK(8, 46)
> +#define PHB_PELTV_BAR                   0x188
> +#define   PHB_PELTV_BAR_ENABLE          PPC_BIT(0)
> +#define   PHB_PELTV_BASE_ADDRESS        PPC_BITMASK(8, 50)
> +#define PHB_M32_START_ADDR              0x1a0
> +#define PHB_PEST_BAR                    0x1a8
> +#define   PHB_PEST_BAR_ENABLE           PPC_BIT(0)
> +#define   PHB_PEST_BASE_ADDRESS         PPC_BITMASK(8, 51)
> +#define PHB_ASN_CMPM                    0x1C0
> +#define   PHB_ASN_CMPM_ENABLE           PPC_BIT(63)
> +#define PHB_CAPI_CMPM                   0x1C8
> +#define   PHB_CAPI_CMPM_ENABLE          PPC_BIT(63)
> +#define PHB_M64_AOMASK                  0x1d0
> +#define PHB_M64_UPPER_BITS              0x1f0
> +#define PHB_NXLATE_PREFIX               0x1f8
> +#define PHB_DMARD_SYNC                  0x200
> +#define   PHB_DMARD_SYNC_START          PPC_BIT(0)
> +#define   PHB_DMARD_SYNC_COMPLETE       PPC_BIT(1)
> +#define PHB_RTC_INVALIDATE              0x208
> +#define   PHB_RTC_INVALIDATE_ALL        PPC_BIT(0)
> +#define   PHB_RTC_INVALIDATE_RID        PPC_BITMASK(16, 31)
> +#define PHB_TCE_KILL                    0x210
> +#define   PHB_TCE_KILL_ALL              PPC_BIT(0)
> +#define   PHB_TCE_KILL_PE               PPC_BIT(1)
> +#define   PHB_TCE_KILL_ONE              PPC_BIT(2)
> +#define   PHB_TCE_KILL_PSEL             PPC_BIT(3)
> +#define   PHB_TCE_KILL_64K              0x1000 /* Address override */
> +#define   PHB_TCE_KILL_2M               0x2000 /* Address override */
> +#define   PHB_TCE_KILL_1G               0x3000 /* Address override */
> +#define   PHB_TCE_KILL_PENUM            PPC_BITMASK(55, 63)
> +#define PHB_TCE_SPEC_CTL                0x218
> +#define PHB_IODA_ADDR                   0x220
> +#define   PHB_IODA_AD_AUTOINC           PPC_BIT(0)
> +#define   PHB_IODA_AD_TSEL              PPC_BITMASK(11, 15)
> +#define   PHB_IODA_AD_MIST_PWV          PPC_BITMASK(28, 31)
> +#define   PHB_IODA_AD_TADR              PPC_BITMASK(54, 63)
> +#define PHB_IODA_DATA0                  0x228
> +#define PHB_PHB4_GEN_CAP                0x250
> +#define PHB_PHB4_TCE_CAP                0x258
> +#define PHB_PHB4_IRQ_CAP                0x260
> +#define PHB_PHB4_EEH_CAP                0x268
> +#define PHB_PAPR_ERR_INJ_CTL            0x2b0
> +#define   PHB_PAPR_ERR_INJ_CTL_INB      PPC_BIT(0)
> +#define   PHB_PAPR_ERR_INJ_CTL_OUTB     PPC_BIT(1)
> +#define   PHB_PAPR_ERR_INJ_CTL_STICKY   PPC_BIT(2)
> +#define   PHB_PAPR_ERR_INJ_CTL_CFG      PPC_BIT(3)
> +#define   PHB_PAPR_ERR_INJ_CTL_RD       PPC_BIT(4)
> +#define   PHB_PAPR_ERR_INJ_CTL_WR       PPC_BIT(5)
> +#define   PHB_PAPR_ERR_INJ_CTL_FREEZE   PPC_BIT(6)
> +#define PHB_PAPR_ERR_INJ_ADDR           0x2b8
> +#define   PHB_PAPR_ERR_INJ_ADDR_MMIO            PPC_BITMASK(16, 63)
> +#define PHB_PAPR_ERR_INJ_MASK           0x2c0
> +#define   PHB_PAPR_ERR_INJ_MASK_CFG             PPC_BITMASK(4, 11)
> +#define   PHB_PAPR_ERR_INJ_MASK_CFG_ALL         PPC_BITMASK(4, 19)
> +#define   PHB_PAPR_ERR_INJ_MASK_MMIO            PPC_BITMASK(16, 63)
> +#define PHB_ETU_ERR_SUMMARY             0x2c8
> +#define PHB_INT_NOTIFY_ADDR             0x300
> +#define PHB_INT_NOTIFY_INDEX            0x308
> +
> +/* Fundamental register set B */
> +#define PHB_VERSION                     0x800
> +#define PHB_CTRLR                       0x810
> +#define   PHB_CTRLR_IRQ_PGSZ_64K        PPC_BIT(11)
> +#define   PHB_CTRLR_IRQ_STORE_EOI       PPC_BIT(12)
> +#define   PHB_CTRLR_MMIO_RD_STRICT      PPC_BIT(13)
> +#define   PHB_CTRLR_MMIO_EEH_DISABLE    PPC_BIT(14)
> +#define   PHB_CTRLR_CFG_EEH_BLOCK       PPC_BIT(15)
> +#define   PHB_CTRLR_FENCE_LNKILL_DIS    PPC_BIT(16)
> +#define   PHB_CTRLR_TVT_ADDR_SEL        PPC_BITMASK(17, 19)
> +#define     TVT_DD1_1_PER_PE            0
> +#define     TVT_DD1_2_PER_PE            1
> +#define     TVT_DD1_4_PER_PE            2
> +#define     TVT_DD1_8_PER_PE            3
> +#define     TVT_DD1_16_PER_PE           4
> +#define     TVT_2_PER_PE                0
> +#define     TVT_4_PER_PE                1
> +#define     TVT_8_PER_PE                2
> +#define     TVT_16_PER_PE               3
> +#define   PHB_CTRLR_DMA_RD_SPACING      PPC_BITMASK(28, 31)
> +#define PHB_AIB_FENCE_CTRL              0x860
> +#define PHB_TCE_TAG_ENABLE              0x868
> +#define PHB_TCE_WATERMARK               0x870
> +#define PHB_TIMEOUT_CTRL1               0x878
> +#define PHB_TIMEOUT_CTRL2               0x880
> +#define PHB_Q_DMA_R                     0x888
> +#define   PHB_Q_DMA_R_QUIESCE_DMA       PPC_BIT(0)
> +#define   PHB_Q_DMA_R_AUTORESET         PPC_BIT(1)
> +#define   PHB_Q_DMA_R_DMA_RESP_STATUS   PPC_BIT(4)
> +#define   PHB_Q_DMA_R_MMIO_RESP_STATUS  PPC_BIT(5)
> +#define   PHB_Q_DMA_R_TCE_RESP_STATUS   PPC_BIT(6)
> +#define   PHB_Q_DMA_R_TCE_KILL_STATUS   PPC_BIT(7)
> +#define PHB_TCE_TAG_STATUS              0x908
> +
> +/* FIR & Error registers */
> +#define PHB_LEM_FIR_ACCUM               0xc00
> +#define PHB_LEM_FIR_AND_MASK            0xc08
> +#define PHB_LEM_FIR_OR_MASK             0xc10
> +#define PHB_LEM_ERROR_MASK              0xc18
> +#define PHB_LEM_ERROR_AND_MASK          0xc20
> +#define PHB_LEM_ERROR_OR_MASK           0xc28
> +#define PHB_LEM_ACTION0                 0xc30
> +#define PHB_LEM_ACTION1                 0xc38
> +#define PHB_LEM_WOF                     0xc40
> +#define PHB_ERR_STATUS                  0xc80
> +#define PHB_ERR1_STATUS                 0xc88
> +#define PHB_ERR_INJECT                  0xc90
> +#define PHB_ERR_LEM_ENABLE              0xc98
> +#define PHB_ERR_IRQ_ENABLE              0xca0
> +#define PHB_ERR_FREEZE_ENABLE           0xca8
> +#define PHB_ERR_AIB_FENCE_ENABLE        0xcb0
> +#define PHB_ERR_LOG_0                   0xcc0
> +#define PHB_ERR_LOG_1                   0xcc8
> +#define PHB_ERR_STATUS_MASK             0xcd0
> +#define PHB_ERR1_STATUS_MASK            0xcd8
> +
> +#define PHB_TXE_ERR_STATUS                      0xd00
> +#define PHB_TXE_ERR1_STATUS                     0xd08
> +#define PHB_TXE_ERR_INJECT                      0xd10
> +#define PHB_TXE_ERR_LEM_ENABLE                  0xd18
> +#define PHB_TXE_ERR_IRQ_ENABLE                  0xd20
> +#define PHB_TXE_ERR_FREEZE_ENABLE               0xd28
> +#define PHB_TXE_ERR_AIB_FENCE_ENABLE            0xd30
> +#define PHB_TXE_ERR_LOG_0                       0xd40
> +#define PHB_TXE_ERR_LOG_1                       0xd48
> +#define PHB_TXE_ERR_STATUS_MASK                 0xd50
> +#define PHB_TXE_ERR1_STATUS_MASK                0xd58
> +
> +#define PHB_RXE_ARB_ERR_STATUS                  0xd80
> +#define PHB_RXE_ARB_ERR1_STATUS                 0xd88
> +#define PHB_RXE_ARB_ERR_INJECT                  0xd90
> +#define PHB_RXE_ARB_ERR_LEM_ENABLE              0xd98
> +#define PHB_RXE_ARB_ERR_IRQ_ENABLE              0xda0
> +#define PHB_RXE_ARB_ERR_FREEZE_ENABLE           0xda8
> +#define PHB_RXE_ARB_ERR_AIB_FENCE_ENABLE        0xdb0
> +#define PHB_RXE_ARB_ERR_LOG_0                   0xdc0
> +#define PHB_RXE_ARB_ERR_LOG_1                   0xdc8
> +#define PHB_RXE_ARB_ERR_STATUS_MASK             0xdd0
> +#define PHB_RXE_ARB_ERR1_STATUS_MASK            0xdd8
> +
> +#define PHB_RXE_MRG_ERR_STATUS                  0xe00
> +#define PHB_RXE_MRG_ERR1_STATUS                 0xe08
> +#define PHB_RXE_MRG_ERR_INJECT                  0xe10
> +#define PHB_RXE_MRG_ERR_LEM_ENABLE              0xe18
> +#define PHB_RXE_MRG_ERR_IRQ_ENABLE              0xe20
> +#define PHB_RXE_MRG_ERR_FREEZE_ENABLE           0xe28
> +#define PHB_RXE_MRG_ERR_AIB_FENCE_ENABLE        0xe30
> +#define PHB_RXE_MRG_ERR_LOG_0                   0xe40
> +#define PHB_RXE_MRG_ERR_LOG_1                   0xe48
> +#define PHB_RXE_MRG_ERR_STATUS_MASK             0xe50
> +#define PHB_RXE_MRG_ERR1_STATUS_MASK            0xe58
> +
> +#define PHB_RXE_TCE_ERR_STATUS                  0xe80
> +#define PHB_RXE_TCE_ERR1_STATUS                 0xe88
> +#define PHB_RXE_TCE_ERR_INJECT                  0xe90
> +#define PHB_RXE_TCE_ERR_LEM_ENABLE              0xe98
> +#define PHB_RXE_TCE_ERR_IRQ_ENABLE              0xea0
> +#define PHB_RXE_TCE_ERR_FREEZE_ENABLE           0xea8
> +#define PHB_RXE_TCE_ERR_AIB_FENCE_ENABLE        0xeb0
> +#define PHB_RXE_TCE_ERR_LOG_0                   0xec0
> +#define PHB_RXE_TCE_ERR_LOG_1                   0xec8
> +#define PHB_RXE_TCE_ERR_STATUS_MASK             0xed0
> +#define PHB_RXE_TCE_ERR1_STATUS_MASK            0xed8
> +
> +/* Performance monitor & Debug registers */
> +#define PHB_TRACE_CONTROL                       0xf80
> +#define PHB_PERFMON_CONFIG                      0xf88
> +#define PHB_PERFMON_CTR0                        0xf90
> +#define PHB_PERFMON_CTR1                        0xf98
> +#define PHB_PERFMON_CTR2                        0xfa0
> +#define PHB_PERFMON_CTR3                        0xfa8
> +
> +/* Root complex config space memory mapped */
> +#define PHB_RC_CONFIG_BASE                      0x1000
> +#define   PHB_RC_CONFIG_SIZE                    0x800
> +
> +/* PHB4 REGB registers */
> +
> +/* PBL core */
> +#define PHB_PBL_CONTROL                         0x1800
> +#define PHB_PBL_TIMEOUT_CTRL                    0x1810
> +#define PHB_PBL_NPTAG_ENABLE                    0x1820
> +#define PHB_PBL_NBW_CMP_MASK                    0x1830
> +#define   PHB_PBL_NBW_MASK_ENABLE               PPC_BIT(63)
> +#define PHB_PBL_SYS_LINK_INIT                   0x1838
> +#define PHB_PBL_BUF_STATUS                      0x1840
> +#define PHB_PBL_ERR_STATUS                      0x1900
> +#define PHB_PBL_ERR1_STATUS                     0x1908
> +#define PHB_PBL_ERR_INJECT                      0x1910
> +#define PHB_PBL_ERR_INF_ENABLE                  0x1920
> +#define PHB_PBL_ERR_ERC_ENABLE                  0x1928
> +#define PHB_PBL_ERR_FAT_ENABLE                  0x1930
> +#define PHB_PBL_ERR_LOG_0                       0x1940
> +#define PHB_PBL_ERR_LOG_1                       0x1948
> +#define PHB_PBL_ERR_STATUS_MASK                 0x1950
> +#define PHB_PBL_ERR1_STATUS_MASK                0x1958
> +
> +/* PCI-E stack */
> +#define PHB_PCIE_SCR                    0x1A00
> +#define   PHB_PCIE_SCR_SLOT_CAP         PPC_BIT(15)
> +#define   PHB_PCIE_SCR_MAXLINKSPEED     PPC_BITMASK(32, 35)
> +
> +
> +#define PHB_PCIE_CRESET                 0x1A10
> +#define   PHB_PCIE_CRESET_CFG_CORE      PPC_BIT(0)
> +#define   PHB_PCIE_CRESET_TLDLP         PPC_BIT(1)
> +#define   PHB_PCIE_CRESET_PBL           PPC_BIT(2)
> +#define   PHB_PCIE_CRESET_PERST_N       PPC_BIT(3)
> +#define   PHB_PCIE_CRESET_PIPE_N        PPC_BIT(4)
> +
> +
> +#define PHB_PCIE_HOTPLUG_STATUS         0x1A20
> +#define   PHB_PCIE_HPSTAT_PRESENCE      PPC_BIT(10)
> +
> +#define PHB_PCIE_DLP_TRAIN_CTL          0x1A40
> +#define   PHB_PCIE_DLP_LINK_WIDTH       PPC_BITMASK(30, 35)
> +#define   PHB_PCIE_DLP_LINK_SPEED       PPC_BITMASK(36, 39)
> +#define   PHB_PCIE_DLP_LTSSM_TRC        PPC_BITMASK(24, 27)
> +#define     PHB_PCIE_DLP_LTSSM_RESET    0
> +#define     PHB_PCIE_DLP_LTSSM_DETECT   1
> +#define     PHB_PCIE_DLP_LTSSM_POLLING  2
> +#define     PHB_PCIE_DLP_LTSSM_CONFIG   3
> +#define     PHB_PCIE_DLP_LTSSM_L0       4
> +#define     PHB_PCIE_DLP_LTSSM_REC      5
> +#define     PHB_PCIE_DLP_LTSSM_L1       6
> +#define     PHB_PCIE_DLP_LTSSM_L2       7
> +#define     PHB_PCIE_DLP_LTSSM_HOTRESET 8
> +#define     PHB_PCIE_DLP_LTSSM_DISABLED 9
> +#define     PHB_PCIE_DLP_LTSSM_LOOPBACK 10
> +#define   PHB_PCIE_DLP_TL_LINKACT       PPC_BIT(23)
> +#define   PHB_PCIE_DLP_DL_PGRESET       PPC_BIT(22)
> +#define   PHB_PCIE_DLP_TRAINING         PPC_BIT(20)
> +#define   PHB_PCIE_DLP_INBAND_PRESENCE  PPC_BIT(19)
> +
> +#define PHB_PCIE_DLP_CTL                0x1A78
> +#define   PHB_PCIE_DLP_CTL_BYPASS_PH2   PPC_BIT(4)
> +#define   PHB_PCIE_DLP_CTL_BYPASS_PH3   PPC_BIT(5)
> +
> +#define PHB_PCIE_DLP_TRWCTL             0x1A80
> +#define   PHB_PCIE_DLP_TRWCTL_EN        PPC_BIT(0)
> +
> +#define PHB_PCIE_DLP_ERRLOG1            0x1AA0
> +#define PHB_PCIE_DLP_ERRLOG2            0x1AA8
> +#define PHB_PCIE_DLP_ERR_STATUS         0x1AB0
> +#define PHB_PCIE_DLP_ERR_COUNTERS       0x1AB8
> +
> +#define PHB_PCIE_LANE_EQ_CNTL0          0x1AD0
> +#define PHB_PCIE_LANE_EQ_CNTL1          0x1AD8
> +#define PHB_PCIE_LANE_EQ_CNTL2          0x1AE0
> +#define PHB_PCIE_LANE_EQ_CNTL3          0x1AE8
> +#define PHB_PCIE_LANE_EQ_CNTL20         0x1AF0
> +#define PHB_PCIE_LANE_EQ_CNTL21         0x1AF8
> +#define PHB_PCIE_LANE_EQ_CNTL22         0x1B00 /* DD1 only */
> +#define PHB_PCIE_LANE_EQ_CNTL23         0x1B08 /* DD1 only */
> +#define PHB_PCIE_TRACE_CTRL             0x1B20
> +#define PHB_PCIE_MISC_STRAP             0x1B30
> +
> +/* Error */
> +#define PHB_REGB_ERR_STATUS             0x1C00
> +#define PHB_REGB_ERR1_STATUS            0x1C08
> +#define PHB_REGB_ERR_INJECT             0x1C10
> +#define PHB_REGB_ERR_INF_ENABLE         0x1C20
> +#define PHB_REGB_ERR_ERC_ENABLE         0x1C28
> +#define PHB_REGB_ERR_FAT_ENABLE         0x1C30
> +#define PHB_REGB_ERR_LOG_0              0x1C40
> +#define PHB_REGB_ERR_LOG_1              0x1C48
> +#define PHB_REGB_ERR_STATUS_MASK        0x1C50
> +#define PHB_REGB_ERR1_STATUS_MASK       0x1C58
> +
> +/*
> + * IODA3 on-chip tables
> + */
> +
> +#define IODA3_TBL_LIST          1
> +#define IODA3_TBL_MIST          2
> +#define IODA3_TBL_RCAM          5
> +#define IODA3_TBL_MRT           6
> +#define IODA3_TBL_PESTA         7
> +#define IODA3_TBL_PESTB         8
> +#define IODA3_TBL_TVT           9
> +#define IODA3_TBL_TCR           10
> +#define IODA3_TBL_TDR           11
> +#define IODA3_TBL_MBT           16
> +#define IODA3_TBL_MDT           17
> +#define IODA3_TBL_PEEV          20
> +
> +/* LIST */
> +#define IODA3_LIST_P                    PPC_BIT(6)
> +#define IODA3_LIST_Q                    PPC_BIT(7)
> +#define IODA3_LIST_STATE                PPC_BIT(14)
> +
> +/* MIST */
> +#define IODA3_MIST_P3                   PPC_BIT(48 + 0)
> +#define IODA3_MIST_Q3                   PPC_BIT(48 + 1)
> +#define IODA3_MIST_PE3                  PPC_BITMASK(48 + 4, 48 + 15)
> +
> +/* TVT */
> +#define IODA3_TVT_TABLE_ADDR            PPC_BITMASK(0, 47)
> +#define IODA3_TVT_NUM_LEVELS            PPC_BITMASK(48, 50)
> +#define   IODA3_TVE_1_LEVEL     0
> +#define   IODA3_TVE_2_LEVELS    1
> +#define   IODA3_TVE_3_LEVELS    2
> +#define   IODA3_TVE_4_LEVELS    3
> +#define   IODA3_TVE_5_LEVELS    4
> +#define IODA3_TVT_TCE_TABLE_SIZE        PPC_BITMASK(51, 55)
> +#define IODA3_TVT_NON_TRANSLATE_50      PPC_BIT(56)
> +#define IODA3_TVT_IO_PSIZE              PPC_BITMASK(59, 63)
> +
> +/* PESTA */
> +#define IODA3_PESTA_MMIO_FROZEN         PPC_BIT(0)
> +#define IODA3_PESTA_TRANS_TYPE          PPC_BITMASK(5, 7)
> +#define  IODA3_PESTA_TRANS_TYPE_MMIOLOAD 0x4
> +#define IODA3_PESTA_CA_CMPLT_TMT        PPC_BIT(8)
> +#define IODA3_PESTA_UR                  PPC_BIT(9)
> +
> +/* PESTB */
> +#define IODA3_PESTB_DMA_STOPPED         PPC_BIT(0)
> +
> +/* MDT */
> +/* FIXME: check this field with Eric and add a B, C and D */
> +#define IODA3_MDT_PE_A                  PPC_BITMASK(0, 15)
> +#define IODA3_MDT_PE_B                  PPC_BITMASK(16, 31)
> +#define IODA3_MDT_PE_C                  PPC_BITMASK(32, 47)
> +#define IODA3_MDT_PE_D                  PPC_BITMASK(48, 63)
> +
> +/* MBT */
> +#define IODA3_MBT0_ENABLE               PPC_BIT(0)
> +#define IODA3_MBT0_TYPE                 PPC_BIT(1)
> +#define   IODA3_MBT0_TYPE_M32           IODA3_MBT0_TYPE
> +#define   IODA3_MBT0_TYPE_M64           0
> +#define IODA3_MBT0_MODE                 PPC_BITMASK(2, 3)
> +#define   IODA3_MBT0_MODE_PE_SEG        0
> +#define   IODA3_MBT0_MODE_MDT           1
> +#define   IODA3_MBT0_MODE_SINGLE_PE     2
> +#define IODA3_MBT0_SEG_DIV              PPC_BITMASK(4, 5)
> +#define   IODA3_MBT0_SEG_DIV_MAX        0
> +#define   IODA3_MBT0_SEG_DIV_128        1
> +#define   IODA3_MBT0_SEG_DIV_64         2
> +#define   IODA3_MBT0_SEG_DIV_8          3
> +#define IODA3_MBT0_MDT_COLUMN           PPC_BITMASK(4, 5)
> +#define IODA3_MBT0_BASE_ADDR            PPC_BITMASK(8, 51)
> +
> +#define IODA3_MBT1_ENABLE               PPC_BIT(0)
> +#define IODA3_MBT1_MASK                 PPC_BITMASK(8, 51)
> +#define IODA3_MBT1_SEG_BASE             PPC_BITMASK(55, 63)
> +#define IODA3_MBT1_SINGLE_PE_NUM        PPC_BITMASK(55, 63)
> +
> +/*
> + * IODA3 in-memory tables
> + */
> +
> +/*
> + * PEST
> + *
> + * 2x8 bytes entries, PEST0 and PEST1
> + */
> +
> +#define IODA3_PEST0_MMIO_CAUSE          PPC_BIT(2)
> +#define IODA3_PEST0_CFG_READ            PPC_BIT(3)
> +#define IODA3_PEST0_CFG_WRITE           PPC_BIT(4)
> +#define IODA3_PEST0_TTYPE               PPC_BITMASK(5, 7)
> +#define   PEST_TTYPE_DMA_WRITE          0
> +#define   PEST_TTYPE_MSI                1
> +#define   PEST_TTYPE_DMA_READ           2
> +#define   PEST_TTYPE_DMA_READ_RESP      3
> +#define   PEST_TTYPE_MMIO_LOAD          4
> +#define   PEST_TTYPE_MMIO_STORE         5
> +#define   PEST_TTYPE_OTHER              7
> +#define IODA3_PEST0_CA_RETURN           PPC_BIT(8)
> +#define IODA3_PEST0_UR_RETURN           PPC_BIT(9)
> +#define IODA3_PEST0_PCIE_NONFATAL       PPC_BIT(10)
> +#define IODA3_PEST0_PCIE_FATAL          PPC_BIT(11)
> +#define IODA3_PEST0_PARITY_UE           PPC_BIT(13)
> +#define IODA3_PEST0_PCIE_CORRECTABLE    PPC_BIT(14)
> +#define IODA3_PEST0_PCIE_INTERRUPT      PPC_BIT(15)
> +#define IODA3_PEST0_MMIO_XLATE          PPC_BIT(16)
> +#define IODA3_PEST0_IODA3_ERROR         PPC_BIT(16) /* Same bit as MMIO xlate */
> +#define IODA3_PEST0_TCE_PAGE_FAULT      PPC_BIT(18)
> +#define IODA3_PEST0_TCE_ACCESS_FAULT    PPC_BIT(19)
> +#define IODA3_PEST0_DMA_RESP_TIMEOUT    PPC_BIT(20)
> +#define IODA3_PEST0_AIB_SIZE_INVALID    PPC_BIT(21)
> +#define IODA3_PEST0_LEM_BIT             PPC_BITMASK(26, 31)
> +#define IODA3_PEST0_RID                 PPC_BITMASK(32, 47)
> +#define IODA3_PEST0_MSI_DATA            PPC_BITMASK(48, 63)
> +
> +#define IODA3_PEST1_FAIL_ADDR           PPC_BITMASK(3, 63)
> +
> +
> +#endif /* PCI_HOST_PNV_PHB4_REGS_H */
> diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h
> index 75154300870f..4b3d254b0821 100644
> --- a/include/hw/pci/pcie_port.h
> +++ b/include/hw/pci/pcie_port.h
> @@ -72,6 +72,7 @@ void pcie_chassis_del_slot(PCIESlot *s);
>  typedef struct PCIERootPortClass {
>      PCIDeviceClass parent_class;
>      DeviceRealize parent_realize;
> +    DeviceReset parent_reset;
>  
>      uint8_t (*aer_vector)(const PCIDevice *dev);
>      int (*interrupts_init)(PCIDevice *dev, Error **errp);
> diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
> index f225f2f6bf67..805f9058f5d9 100644
> --- a/include/hw/ppc/pnv.h
> +++ b/include/hw/ppc/pnv.h
> @@ -30,6 +30,7 @@
>  #include "hw/ppc/pnv_homer.h"
>  #include "hw/ppc/pnv_xive.h"
>  #include "hw/ppc/pnv_core.h"
> +#include "hw/pci-host/pnv_phb4.h"
>  
>  #define TYPE_PNV_CHIP "pnv-chip"
>  #define PNV_CHIP(obj) OBJECT_CHECK(PnvChip, (obj), TYPE_PNV_CHIP)
> @@ -52,6 +53,8 @@ typedef struct PnvChip {
>      uint64_t     cores_mask;
>      PnvCore      **cores;
>  
> +    uint32_t     num_phbs;
> +
>      MemoryRegion xscom_mmio;
>      MemoryRegion xscom;
>      AddressSpace xscom_as;
> @@ -93,6 +96,9 @@ typedef struct Pnv9Chip {
>  
>      uint32_t     nr_quads;
>      PnvQuad      *quads;
> +
> +#define PNV9_CHIP_MAX_PEC 3
> +    PnvPhb4PecState pecs[PNV9_CHIP_MAX_PEC];
>  } Pnv9Chip;
>  
>  /*
> @@ -120,6 +126,7 @@ typedef struct PnvChipClass {
>      /*< public >*/
>      uint64_t     chip_cfam_id;
>      uint64_t     cores_mask;
> +    uint32_t     num_phbs;
>  
>      DeviceRealize parent_realize;
>  
> diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
> index f74c81a980f3..0fc57b036753 100644
> --- a/include/hw/ppc/pnv_xscom.h
> +++ b/include/hw/ppc/pnv_xscom.h
> @@ -94,6 +94,17 @@ typedef struct PnvXScomInterfaceClass {
>  #define PNV9_XSCOM_XIVE_BASE      0x5013000
>  #define PNV9_XSCOM_XIVE_SIZE      0x300
>  
> +#define PNV9_XSCOM_PEC_NEST_BASE  0x4010c00
> +#define PNV9_XSCOM_PEC_NEST_SIZE  0x100
> +
> +#define PNV9_XSCOM_PEC_PCI_BASE   0xd010800
> +#define PNV9_XSCOM_PEC_PCI_SIZE   0x200
> +
> +/* XSCOM PCI "pass-through" window to PHB SCOM */
> +#define PNV9_XSCOM_PEC_PCI_STK0   0x100
> +#define PNV9_XSCOM_PEC_PCI_STK1   0x140
> +#define PNV9_XSCOM_PEC_PCI_STK2   0x180
> +
>  /*
>   * Layout of the XSCOM PCB addresses (POWER 10)
>   */
> diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
> new file mode 100644
> index 000000000000..3c54b02ec929
> --- /dev/null
> +++ b/hw/pci-host/pnv_phb4.c
> @@ -0,0 +1,1438 @@
> +/*
> + * QEMU PowerPC PowerNV (POWER9) PHB4 model
> + *
> + * Copyright (c) 2018-2020, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qapi/visitor.h"
> +#include "qapi/error.h"
> +#include "qemu-common.h"
> +#include "monitor/monitor.h"
> +#include "target/ppc/cpu.h"
> +#include "hw/pci-host/pnv_phb4_regs.h"
> +#include "hw/pci-host/pnv_phb4.h"
> +#include "hw/pci/pcie_host.h"
> +#include "hw/pci/pcie_port.h"
> +#include "hw/ppc/pnv.h"
> +#include "hw/ppc/pnv_xscom.h"
> +#include "hw/irq.h"
> +#include "hw/qdev-properties.h"
> +
> +#define phb_error(phb, fmt, ...)                                        \
> +    qemu_log_mask(LOG_GUEST_ERROR, "phb4[%d:%d]: " fmt "\n",            \
> +                  (phb)->chip_id, (phb)->phb_id, ## __VA_ARGS__)
> +
> +/*
> + * QEMU version of the GETFIELD/SETFIELD macros
> + *
> + * These are common with the PnvXive model.
> + */
> +static inline uint64_t GETFIELD(uint64_t mask, uint64_t word)
> +{
> +    return (word & mask) >> ctz64(mask);
> +}
> +
> +static inline uint64_t SETFIELD(uint64_t mask, uint64_t word,
> +                                uint64_t value)
> +{
> +    return (word & ~mask) | ((value << ctz64(mask)) & mask);
> +}
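
For anyone cross-checking these helpers against the register header
earlier in the patch, here is a small standalone sketch of how I
understand them to combine with the PPC_BITMASK() field masks.  The
PPC_BIT()/PPC_BITMASK() definitions are reproduced as I understand them
from target/ppc/cpu.h; ctz64_sketch(), get_field(), set_field() and
ioda_addr_example() are stand-in names of mine, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of the word. */
#define PPC_BIT(bit)        (0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* Field masks copied from the PHB_IODA_ADDR definitions in the patch. */
#define PHB_IODA_AD_TSEL    PPC_BITMASK(11, 15)
#define PHB_IODA_AD_TADR    PPC_BITMASK(54, 63)

/* Stand-in for QEMU's ctz64(): trailing zero count, 64 for a zero input. */
static unsigned ctz64_sketch(uint64_t v)
{
    unsigned n = 0;

    if (!v) {
        return 64;
    }
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Same logic as GETFIELD() in the patch: extract and right-justify. */
static uint64_t get_field(uint64_t mask, uint64_t word)
{
    return (word & mask) >> ctz64_sketch(mask);
}

/* Same logic as SETFIELD() in the patch: replace only the masked bits. */
static uint64_t set_field(uint64_t mask, uint64_t word, uint64_t value)
{
    return (word & ~mask) | ((value << ctz64_sketch(mask)) & mask);
}

/* Hypothetical example: select table 9 (IODA3_TBL_TVT), entry 5. */
static uint64_t ioda_addr_example(void)
{
    uint64_t adreg = 0;

    adreg = set_field(PHB_IODA_AD_TSEL, adreg, 9);
    adreg = set_field(PHB_IODA_AD_TADR, adreg, 5);
    return adreg;
}
```

With these masks, TSEL occupies bits 48..52 counted from the least
significant end and TADR the low 10 bits, so the shift amount falls out
of the mask itself and callers never spell it out.
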
> +
> +static PCIDevice *pnv_phb4_find_cfg_dev(PnvPHB4 *phb)
> +{
> +    PCIHostState *pci = PCI_HOST_BRIDGE(phb);
> +    uint64_t addr = phb->regs[PHB_CONFIG_ADDRESS >> 3];
> +    uint8_t bus, devfn;
> +
> +    if (!(addr >> 63)) {
> +        return NULL;
> +    }
> +    bus = (addr >> 52) & 0xff;
> +    devfn = (addr >> 44) & 0xff;
> +
> +    /* We don't access the root complex this way */
> +    if (bus == 0 && devfn == 0) {
> +        return NULL;
> +    }
> +    return pci_find_device(pci->bus, bus, devfn);
> +}
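
The open-coded shifts here look consistent with the PHB_CA_* masks in the
register header, if I have the IBM bit numbering right.  A standalone
sketch of the decode as I read it follows; struct cfg_target and both
function names are made up for illustration, and the devfn byte is
assumed to be the DEV and FUNC fields taken together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded PHB_CONFIG_ADDRESS fields. */
struct cfg_target {
    bool     enabled;
    uint8_t  bus;
    uint8_t  devfn;
    uint32_t reg;
};

static struct cfg_target decode_config_address(uint64_t addr)
{
    struct cfg_target t;

    t.enabled = (addr >> 63) & 1;   /* PHB_CA_ENABLE: PPC_BIT(0)        */
    t.bus     = (addr >> 52) & 0xff; /* PHB_CA_BUS: PPC_BITMASK(4, 11)   */
    t.devfn   = (addr >> 44) & 0xff; /* PHB_CA_DEV|FUNC: PPC bits 12..19 */
    t.reg     = (addr >> 32) & 0xffc; /* PHB_CA_REG, 4-byte aligned      */
    return t;
}

/* Inverse of the above, used only to build a test value. */
static uint64_t encode_config_address(uint8_t bus, uint8_t devfn,
                                      uint32_t reg)
{
    return (1ULL << 63) |
           ((uint64_t)bus << 52) |
           ((uint64_t)devfn << 44) |
           ((uint64_t)(reg & 0xfff) << 32);
}
```

If that reading is correct, GETFIELD() with PHB_CA_ENABLE, PHB_CA_BUS
and the DEV/FUNC masks might make the intent more obvious than the raw
shifts, but that is only a readability thought.
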
> +
> +/*
> + * The CONFIG_DATA register expects little endian accesses, but as the
> + * region is big endian, we have to swap the value.
> + */
> +static void pnv_phb4_config_write(PnvPHB4 *phb, unsigned off,
> +                                  unsigned size, uint64_t val)
> +{
> +    uint32_t cfg_addr, limit;
> +    PCIDevice *pdev;
> +
> +    pdev = pnv_phb4_find_cfg_dev(phb);
> +    if (!pdev) {
> +        return;
> +    }
> +    cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
> +    cfg_addr |= off;
> +    limit = pci_config_size(pdev);
> +    if (limit <= cfg_addr) {
> +        /*
> +         * conventional pci device can be behind pcie-to-pci bridge.
> +         * 256 <= addr < 4K has no effects.
> +         */
> +        return;
> +    }
> +    switch (size) {
> +    case 1:
> +        break;
> +    case 2:
> +        val = bswap16(val);

I'm a little confused by these byteswaps.  As far as I can see below,
the device is set to big endian, so the values passed in here should
already be in host-native endian.  Why do you need the swap?  Are some
of the registers in the bank BE and some LE?
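
To check my reading of the comment above pnv_phb4_config_write(), here is
a standalone sketch of the situation I think it describes: the guest
stores the config data little endian, the region decodes the same bytes
as big endian, and the swap undoes that.  All function names are
hypothetical and nothing here uses the QEMU memory API.

```c
#include <assert.h>
#include <stdint.h>

/* Value a big-endian region would hand to the model for bytes b[0..3]. */
static uint32_t be_region_value(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Plain 32-bit byte swap, standing in for bswap32(). */
static uint32_t swap32_sketch(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* What the model sees when the guest stores v with a little-endian store. */
static uint32_t guest_le_store_seen_by_model(uint32_t v)
{
    uint8_t b[4] = {
        (uint8_t)(v & 0xff),
        (uint8_t)((v >> 8) & 0xff),
        (uint8_t)((v >> 16) & 0xff),
        (uint8_t)((v >> 24) & 0xff),
    };

    return be_region_value(b);
}
```

If that is the model, the swap makes sense for CONFIG_DATA alone, and it
would help to say so explicitly next to the region definition.
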

> +        break;
> +    case 4:
> +        val = bswap32(val);
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +    pci_host_config_write_common(pdev, cfg_addr, limit, val, size);
> +}
> +
> +static uint64_t pnv_phb4_config_read(PnvPHB4 *phb, unsigned off,
> +                                     unsigned size)
> +{
> +    uint32_t cfg_addr, limit;
> +    PCIDevice *pdev;
> +    uint64_t val;
> +
> +    pdev = pnv_phb4_find_cfg_dev(phb);
> +    if (!pdev) {
> +        return ~0ull;
> +    }
> +    cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
> +    cfg_addr |= off;
> +    limit = pci_config_size(pdev);
> +    if (limit <= cfg_addr) {
> +        /*
> +         * conventional pci device can be behind pcie-to-pci bridge.
> +         * 256 <= addr < 4K has no effects.
> +         */
> +        return ~0ull;
> +    }
> +    val = pci_host_config_read_common(pdev, cfg_addr, limit, size);
> +    switch (size) {
> +    case 1:
> +        return val;
> +    case 2:
> +        return bswap16(val);
> +    case 4:
> +        return bswap32(val);
> +    default:
> +        g_assert_not_reached();
> +    }
> +}
> +
> +/*
> + * Root complex register accesses are memory mapped.
> + */
> +static void pnv_phb4_rc_config_write(PnvPHB4 *phb, unsigned off,
> +                                     unsigned size, uint64_t val)
> +{
> +    PCIHostState *pci = PCI_HOST_BRIDGE(phb);
> +    PCIDevice *pdev;
> +
> +    if (size != 4) {
> +        phb_error(phb, "rc_config_write invalid size %d\n", size);
> +        return;
> +    }
> +
> +    pdev = pci_find_device(pci->bus, 0, 0);
> +    assert(pdev);
> +
> +    pci_host_config_write_common(pdev, off, PHB_RC_CONFIG_SIZE,
> +                                 bswap32(val), 4);
> +}
> +
> +static uint64_t pnv_phb4_rc_config_read(PnvPHB4 *phb, unsigned off,
> +                                        unsigned size)
> +{
> +    PCIHostState *pci = PCI_HOST_BRIDGE(phb);
> +    PCIDevice *pdev;
> +    uint64_t val;
> +
> +    if (size != 4) {
> +        phb_error(phb, "rc_config_read invalid size %d\n", size);
> +        return ~0ull;
> +    }
> +
> +    pdev = pci_find_device(pci->bus, 0, 0);
> +    assert(pdev);
> +
> +    val = pci_host_config_read_common(pdev, off, PHB_RC_CONFIG_SIZE, 4);
> +    return bswap32(val);
> +}
> +
> +static void pnv_phb4_check_mbt(PnvPHB4 *phb, uint32_t index)
> +{
> +    uint64_t base, start, size, mbe0, mbe1;
> +    MemoryRegion *parent;
> +    char name[64];
> +
> +    /* Unmap first */
> +    if (memory_region_is_mapped(&phb->mr_mmio[index])) {
> +        /* Should we destroy it in RCU friendly way... ? */
> +        memory_region_del_subregion(phb->mr_mmio[index].container,
> +                                    &phb->mr_mmio[index]);
> +    }
> +
> +    /* Get table entry */
> +    mbe0 = phb->ioda_MBT[(index << 1)];
> +    mbe1 = phb->ioda_MBT[(index << 1) + 1];
> +
> +    if (!(mbe0 & IODA3_MBT0_ENABLE)) {
> +        return;
> +    }
> +
> +    /* Grab geometry from registers */
> +    base = GETFIELD(IODA3_MBT0_BASE_ADDR, mbe0) << 12;
> +    size = GETFIELD(IODA3_MBT1_MASK, mbe1) << 12;
> +    size |= 0xff00000000000000ull;
> +    size = ~size + 1;
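
It took me a moment to follow the size computation, so here is the
arithmetic as I understand it, as a standalone sketch.  The function
names are mine, and I am assuming the mask field always has its upper
bits set contiguously, so that the window size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirror of the three statements in the patch: shift the mask field
 * back to address bits 12..55, force the top byte to ones, then take
 * the two's complement to turn the mask into a size.
 */
static uint64_t mbt_window_size(uint64_t mask_field)
{
    uint64_t size = mask_field << 12;

    size |= 0xff00000000000000ULL;
    return ~size + 1;
}

/*
 * Hypothetical inverse, for building test values only: the mask field
 * for a power-of-two window size of at least 4K.
 */
static uint64_t mbt_mask_field_for(uint64_t size)
{
    return (~(size - 1) & 0x00fffffffffff000ULL) >> 12;
}
```

For example a 2G window has every mask bit from bit 31 upwards set, and
negating that gives back 0x80000000.  A mask with holes in it would give
a size that is not a power of two, which may be worth a guest error.
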
> +
> +    /* Calculate PCI side start address based on M32/M64 window type */
> +    if (mbe0 & IODA3_MBT0_TYPE_M32) {
> +        start = phb->regs[PHB_M32_START_ADDR >> 3];
> +        if ((start + size) > 0x100000000ull) {
> +            phb_error(phb, "M32 set beyond 4GB boundary !");
> +            size = 0x100000000 - start;
> +        }
> +    } else {
> +        start = base | (phb->regs[PHB_M64_UPPER_BITS >> 3]);
> +    }
> +
> +    /* TODO: Figure out how to implement/decode AOMASK */
> +
> +    /* Check if it matches an enabled MMIO region in the PEC stack */
> +    if (memory_region_is_mapped(&phb->stack->mmbar0) &&
> +        base >= phb->stack->mmio0_base &&
> +        (base + size) <= (phb->stack->mmio0_base + phb->stack->mmio0_size)) {
> +        parent = &phb->stack->mmbar0;
> +        base -= phb->stack->mmio0_base;
> +    } else if (memory_region_is_mapped(&phb->stack->mmbar1) &&
> +        base >= phb->stack->mmio1_base &&
> +        (base + size) <= (phb->stack->mmio1_base + phb->stack->mmio1_size)) {
> +        parent = &phb->stack->mmbar1;
> +        base -= phb->stack->mmio1_base;
> +    } else {
> +        phb_error(phb, "PHB MBAR %d out of parent bounds", index);
> +        return;
> +    }
> +
> +    /* Create alias (better name ?) */
> +    snprintf(name, sizeof(name), "phb4-mbar%d", index);
> +    memory_region_init_alias(&phb->mr_mmio[index], OBJECT(phb), name,
> +                             &phb->pci_mmio, start, size);
> +    memory_region_add_subregion(parent, base, &phb->mr_mmio[index]);
> +}
> +
> +static void pnv_phb4_check_all_mbt(PnvPHB4 *phb)
> +{
> +    uint64_t i;
> +    uint32_t num_windows = phb->big_phb ? PNV_PHB4_MAX_MMIO_WINDOWS :
> +        PNV_PHB4_MIN_MMIO_WINDOWS;
> +
> +    for (i = 0; i < num_windows; i++) {
> +        pnv_phb4_check_mbt(phb, i);
> +    }
> +}
> +
> +static uint64_t *pnv_phb4_ioda_access(PnvPHB4 *phb,
> +                                      unsigned *out_table, unsigned *out_idx)
> +{
> +    uint64_t adreg = phb->regs[PHB_IODA_ADDR >> 3];
> +    unsigned int index = GETFIELD(PHB_IODA_AD_TADR, adreg);
> +    unsigned int table = GETFIELD(PHB_IODA_AD_TSEL, adreg);
> +    unsigned int mask;
> +    uint64_t *tptr = NULL;
> +
> +    switch (table) {
> +    case IODA3_TBL_LIST:
> +        tptr = phb->ioda_LIST;
> +        mask = 7;
> +        break;
> +    case IODA3_TBL_MIST:
> +        tptr = phb->ioda_MIST;
> +        mask = phb->big_phb ? PNV_PHB4_MAX_MIST : (PNV_PHB4_MAX_MIST >> 1);
> +        mask -= 1;
> +        break;
> +    case IODA3_TBL_RCAM:
> +        mask = phb->big_phb ? 127 : 63;
> +        break;
> +    case IODA3_TBL_MRT:
> +        mask = phb->big_phb ? 15 : 7;
> +        break;
> +    case IODA3_TBL_PESTA:
> +    case IODA3_TBL_PESTB:
> +        mask = phb->big_phb ? PNV_PHB4_MAX_PEs : (PNV_PHB4_MAX_PEs >> 1);
> +        mask -= 1;
> +        break;
> +    case IODA3_TBL_TVT:
> +        tptr = phb->ioda_TVT;
> +        mask = phb->big_phb ? PNV_PHB4_MAX_TVEs : (PNV_PHB4_MAX_TVEs >> 1);
> +        mask -= 1;
> +        break;
> +    case IODA3_TBL_TCR:
> +    case IODA3_TBL_TDR:
> +        mask = phb->big_phb ? 1023 : 511;
> +        break;
> +    case IODA3_TBL_MBT:
> +        tptr = phb->ioda_MBT;
> +        mask = phb->big_phb ? PNV_PHB4_MAX_MBEs : (PNV_PHB4_MAX_MBEs >> 1);
> +        mask -= 1;
> +        break;
> +    case IODA3_TBL_MDT:
> +        tptr = phb->ioda_MDT;
> +        mask = phb->big_phb ? PNV_PHB4_MAX_PEs : (PNV_PHB4_MAX_PEs >> 1);
> +        mask -= 1;
> +        break;
> +    case IODA3_TBL_PEEV:
> +        tptr = phb->ioda_PEEV;
> +        mask = phb->big_phb ? PNV_PHB4_MAX_PEEVs : (PNV_PHB4_MAX_PEEVs >> 1);
> +        mask -= 1;
> +        break;
> +    default:
> +        phb_error(phb, "invalid IODA table %d", table);
> +        return NULL;
> +    }
> +    index &= mask;
> +    if (out_idx) {
> +        *out_idx = index;
> +    }
> +    if (out_table) {
> +        *out_table = table;
> +    }
> +    if (tptr) {
> +        tptr += index;
> +    }
> +    if (adreg & PHB_IODA_AD_AUTOINC) {
> +        index = (index + 1) & mask;
> +        adreg = SETFIELD(PHB_IODA_AD_TADR, adreg, index);
> +    }
> +
> +    phb->regs[PHB_IODA_ADDR >> 3] = adreg;
> +    return tptr;
> +}
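
To check my reading of the index handling above: the index is first
clamped to the table size, and with AUTOINC set it advances by one and
wraps at the end of the table. A minimal sketch of that, assuming the
mask is always "table size - 1" (the helper name is hypothetical, not
something in the patch):

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns the index that
 * would be stored back in the address register after one access.
 */
static unsigned ioda_next_index(unsigned index, unsigned mask, int autoinc)
{
    index &= mask;                  /* clamp to the table size first */
    if (autoinc) {
        index = (index + 1) & mask; /* advance and wrap at the end */
    }
    return index;
}
```

With an 8-entry table (mask 7), an access at index 7 with auto-increment
leaves the register pointing back at entry 0.
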
> +
> +static uint64_t pnv_phb4_ioda_read(PnvPHB4 *phb)
> +{
> +    unsigned table, idx;
> +    uint64_t *tptr;
> +
> +    tptr = pnv_phb4_ioda_access(phb, &table, &idx);
> +    if (!tptr) {
> +        /* Special PESTA case */
> +        if (table == IODA3_TBL_PESTA) {
> +            return ((uint64_t)(phb->ioda_PEST_AB[idx] & 1)) << 63;
> +        } else if (table == IODA3_TBL_PESTB) {
> +            return ((uint64_t)(phb->ioda_PEST_AB[idx] & 2)) << 62;
> +        }
> +        /* Return 0 on unsupported tables, not ff's */
> +        return 0;
> +    }
> +    return *tptr;
> +}
> +
> +static void pnv_phb4_ioda_write(PnvPHB4 *phb, uint64_t val)
> +{
> +    unsigned table, idx;
> +    uint64_t *tptr;
> +
> +    tptr = pnv_phb4_ioda_access(phb, &table, &idx);
> +    if (!tptr) {
> +        /* Special PESTA case */
> +        if (table == IODA3_TBL_PESTA) {
> +            phb->ioda_PEST_AB[idx] &= ~1;
> +            phb->ioda_PEST_AB[idx] |= (val >> 63) & 1;
> +        } else if (table == IODA3_TBL_PESTB) {
> +            phb->ioda_PEST_AB[idx] &= ~2;
> +            phb->ioda_PEST_AB[idx] |= (val >> 62) & 2;
> +        }
> +        return;
> +    }
> +
> +    /* Handle side effects */
> +    switch (table) {
> +    case IODA3_TBL_LIST:
> +        break;
> +    case IODA3_TBL_MIST: {
> +        /* Special mask for MIST partial write */
> +        uint64_t adreg = phb->regs[PHB_IODA_ADDR >> 3];
> +        uint32_t mmask = GETFIELD(PHB_IODA_AD_MIST_PWV, adreg);
> +        uint64_t v = *tptr;
> +        if (mmask == 0) {
> +            mmask = 0xf;
> +        }
> +        if (mmask & 8) {
> +            v &= 0x0000ffffffffffffull;
> +            v |= 0xcfff000000000000ull & val;
> +        }
> +        if (mmask & 4) {
> +            v &= 0xffff0000ffffffffull;
> +            v |= 0x0000cfff00000000ull & val;
> +        }
> +        if (mmask & 2) {
> +            v &= 0xffffffff0000ffffull;
> +            v |= 0x00000000cfff0000ull & val;
> +        }
> +        if (mmask & 1) {
> +            v &= 0xffffffffffff0000ull;
> +            v |= 0x000000000000cfffull & val;
> +        }
> +        *tptr = v;
> +        break;
> +    }
> +    case IODA3_TBL_MBT:
> +        *tptr = val;
> +
> +        /* Copy across the valid bit to the other half */
> +        phb->ioda_MBT[idx ^ 1] &= 0x7fffffffffffffffull;
> +        phb->ioda_MBT[idx ^ 1] |= 0x8000000000000000ull & val;
> +
> +        /* Update mappings */
> +        pnv_phb4_check_mbt(phb, idx >> 1);
> +        break;
> +    default:
> +        *tptr = val;
> +    }
> +}
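
As I read the MIST case, the intent is a per-lane merge: each bit of the
PWV field selects one 16-bit lane of the entry, only the 0xcfff bits of
a lane are writable, and a PWV of 0 means all four lanes. A rough sketch
of that merge as a standalone function (the name and signature are
hypothetical, for illustration only):

```c
#include <stdint.h>

/*
 * Hypothetical helper: merge 'val' into 'old' for the 16-bit lanes
 * selected by 'pwv' (bit 0 = lowest lane, bit 3 = highest lane).
 */
static uint64_t mist_merge(uint64_t old, uint64_t val, unsigned pwv)
{
    uint64_t v = old;
    int lane;

    if (pwv == 0) {
        pwv = 0xf;                  /* no lane selected means all four */
    }
    for (lane = 0; lane < 4; lane++) {
        if (pwv & (1u << lane)) {
            uint64_t lane_mask = 0xffffull << (16 * lane);
            uint64_t wr_mask = 0xcfffull << (16 * lane); /* writable bits */

            v &= ~lane_mask;        /* drop the old lane contents */
            v |= val & wr_mask;     /* take only the writable bits */
        }
    }
    return v;
}
```

The value stored in the table would then be the merged one, so lanes not
selected by PWV keep their previous contents.
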
> +
> +static void pnv_phb4_rtc_invalidate(PnvPHB4 *phb, uint64_t val)
> +{
> +    PnvPhb4DMASpace *ds;
> +
> +    /* Always invalidate all for now ... */
> +    QLIST_FOREACH(ds, &phb->dma_spaces, list) {
> +        ds->pe_num = PHB_INVALID_PE;
> +    }
> +}
> +
> +static void pnv_phb4_update_msi_regions(PnvPhb4DMASpace *ds)
> +{
> +    uint64_t cfg = ds->phb->regs[PHB_PHB4_CONFIG >> 3];
> +
> +    if (cfg & PHB_PHB4C_32BIT_MSI_EN) {
> +        if (!memory_region_is_mapped(MEMORY_REGION(&ds->msi32_mr))) {
> +            memory_region_add_subregion(MEMORY_REGION(&ds->dma_mr),
> +                                        0xffff0000, &ds->msi32_mr);
> +        }
> +    } else {
> +        if (memory_region_is_mapped(MEMORY_REGION(&ds->msi32_mr))) {
> +            memory_region_del_subregion(MEMORY_REGION(&ds->dma_mr),
> +                                        &ds->msi32_mr);
> +        }
> +    }
> +
> +    if (cfg & PHB_PHB4C_64BIT_MSI_EN) {
> +        if (!memory_region_is_mapped(MEMORY_REGION(&ds->msi64_mr))) {
> +            memory_region_add_subregion(MEMORY_REGION(&ds->dma_mr),
> +                                        (1ull << 60), &ds->msi64_mr);
> +        }
> +    } else {
> +        if (memory_region_is_mapped(MEMORY_REGION(&ds->msi64_mr))) {
> +            memory_region_del_subregion(MEMORY_REGION(&ds->dma_mr),
> +                                        &ds->msi64_mr);
> +        }
> +    }
> +}
> +
> +static void pnv_phb4_update_all_msi_regions(PnvPHB4 *phb)
> +{
> +    PnvPhb4DMASpace *ds;
> +
> +    QLIST_FOREACH(ds, &phb->dma_spaces, list) {
> +        pnv_phb4_update_msi_regions(ds);
> +    }
> +}
> +
> +static void pnv_phb4_update_xsrc(PnvPHB4 *phb)
> +{
> +    int shift, flags, i, lsi_base;
> +    XiveSource *xsrc = &phb->xsrc;
> +
> +    /* The XIVE source characteristics can be set at run time */
> +    if (phb->regs[PHB_CTRLR >> 3] & PHB_CTRLR_IRQ_PGSZ_64K) {
> +        shift = XIVE_ESB_64K;
> +    } else {
> +        shift = XIVE_ESB_4K;
> +    }
> +    if (phb->regs[PHB_CTRLR >> 3] & PHB_CTRLR_IRQ_STORE_EOI) {
> +        flags = XIVE_SRC_STORE_EOI;
> +    } else {
> +        flags = 0;
> +    }
> +
> +    phb->xsrc.esb_shift = shift;
> +    phb->xsrc.esb_flags = flags;
> +
> +    lsi_base = GETFIELD(PHB_LSI_SRC_ID, phb->regs[PHB_LSI_SOURCE_ID >> 3]);
> +    lsi_base <<= 3;
> +
> +    /* TODO: handle reset values of PHB_LSI_SRC_ID */
> +    if (!lsi_base) {
> +        return;
> +    }
> +
> +    /* TODO: need a xive_source_irq_reset_lsi() */
> +    bitmap_zero(xsrc->lsi_map, xsrc->nr_irqs);
> +
> +    for (i = 0; i < xsrc->nr_irqs; i++) {
> +        bool msi = (i < lsi_base || i >= (lsi_base + 8));
> +        if (!msi) {
> +            xive_source_irq_set_lsi(xsrc, i);
> +        }
> +    }
> +}
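
For reference, the LSI/MSI split computed in the loop above boils down
to a range check: the LSI source ID field is shifted left by 3 to form a
base, and the 8 sources starting at that base are LSIs. A small sketch,
assuming a non-zero field value (the patch returns early when the base
is 0); the helper name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is source 'srcno' one of the 8 LSIs? */
static bool src_is_lsi(uint32_t srcno, uint32_t lsi_src_id)
{
    uint32_t lsi_base = lsi_src_id << 3;    /* 8 LSIs per PHB */

    return srcno >= lsi_base && srcno < lsi_base + 8;
}
```
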
> +
> +static void pnv_phb4_reg_write(void *opaque, hwaddr off, uint64_t val,
> +                               unsigned size)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(opaque);
> +    bool changed;
> +
> +    /* Special case outbound configuration data */
> +    if ((off & 0xfffc) == PHB_CONFIG_DATA) {
> +        pnv_phb4_config_write(phb, off & 0x3, size, val);
> +        return;
> +    }
> +
> +    /* Special case RC configuration space */
> +    if ((off & 0xf800) == PHB_RC_CONFIG_BASE) {
> +        pnv_phb4_rc_config_write(phb, off & 0x7ff, size, val);
> +        return;
> +    }
> +
> +    /* Other registers are 64-bit only */
> +    if (size != 8 || off & 0x7) {
> +        phb_error(phb, "Invalid register access, offset: 0x%"PRIx64" size: %d",
> +                   off, size);
> +        return;
> +    }
> +
> +    /* Handle masking */
> +    switch (off) {
> +    case PHB_LSI_SOURCE_ID:
> +        val &= PHB_LSI_SRC_ID;
> +        break;
> +    case PHB_M64_UPPER_BITS:
> +        val &= 0xff00000000000000ull;
> +        break;
> +    /* TCE Kill */
> +    case PHB_TCE_KILL:
> +        /* Clear top 3 bits which HW does to indicate successful queuing */
> +        val &= ~(PHB_TCE_KILL_ALL | PHB_TCE_KILL_PE | PHB_TCE_KILL_ONE);
> +        break;
> +    case PHB_Q_DMA_R:
> +        /*
> +         * This is enough logic to make SW happy but we aren't
> +         * actually quiescing the DMAs
> +         */
> +        if (val & PHB_Q_DMA_R_AUTORESET) {
> +            val = 0;
> +        } else {
> +            val &= PHB_Q_DMA_R_QUIESCE_DMA;
> +        }
> +        break;
> +    /* LEM stuff */
> +    case PHB_LEM_FIR_AND_MASK:
> +        phb->regs[PHB_LEM_FIR_ACCUM >> 3] &= val;
> +        return;
> +    case PHB_LEM_FIR_OR_MASK:
> +        phb->regs[PHB_LEM_FIR_ACCUM >> 3] |= val;
> +        return;
> +    case PHB_LEM_ERROR_AND_MASK:
> +        phb->regs[PHB_LEM_ERROR_MASK >> 3] &= val;
> +        return;
> +    case PHB_LEM_ERROR_OR_MASK:
> +        phb->regs[PHB_LEM_ERROR_MASK >> 3] |= val;
> +        return;
> +    case PHB_LEM_WOF:
> +        val = 0;
> +        break;
> +    /* TODO: More regs ..., maybe create a table with masks... */
> +
> +    /* Read only registers */
> +    case PHB_CPU_LOADSTORE_STATUS:
> +    case PHB_ETU_ERR_SUMMARY:
> +    case PHB_PHB4_GEN_CAP:
> +    case PHB_PHB4_TCE_CAP:
> +    case PHB_PHB4_IRQ_CAP:
> +    case PHB_PHB4_EEH_CAP:
> +        return;
> +    }
> +
> +    /* Record whether it changed */
> +    changed = phb->regs[off >> 3] != val;
> +
> +    /* Store in register cache first */
> +    phb->regs[off >> 3] = val;
> +
> +    /* Handle side effects */
> +    switch (off) {
> +    case PHB_PHB4_CONFIG:
> +        if (changed) {
> +            pnv_phb4_update_all_msi_regions(phb);
> +        }
> +        break;
> +    case PHB_M32_START_ADDR:
> +    case PHB_M64_UPPER_BITS:
> +        if (changed) {
> +            pnv_phb4_check_all_mbt(phb);
> +        }
> +        break;
> +
> +    /* IODA table accesses */
> +    case PHB_IODA_DATA0:
> +        pnv_phb4_ioda_write(phb, val);
> +        break;
> +
> +    /* RTC invalidation */
> +    case PHB_RTC_INVALIDATE:
> +        pnv_phb4_rtc_invalidate(phb, val);
> +        break;
> +
> +    /* PHB Control (Affects XIVE source) */
> +    case PHB_CTRLR:
> +    case PHB_LSI_SOURCE_ID:
> +        pnv_phb4_update_xsrc(phb);
> +        break;
> +
> +    /* Silent simple writes */
> +    case PHB_ASN_CMPM:
> +    case PHB_CONFIG_ADDRESS:
> +    case PHB_IODA_ADDR:
> +    case PHB_TCE_KILL:
> +    case PHB_TCE_SPEC_CTL:
> +    case PHB_PEST_BAR:
> +    case PHB_PELTV_BAR:
> +    case PHB_RTT_BAR:
> +    case PHB_LEM_FIR_ACCUM:
> +    case PHB_LEM_ERROR_MASK:
> +    case PHB_LEM_ACTION0:
> +    case PHB_LEM_ACTION1:
> +    case PHB_TCE_TAG_ENABLE:
> +    case PHB_INT_NOTIFY_ADDR:
> +    case PHB_INT_NOTIFY_INDEX:
> +    case PHB_DMARD_SYNC:
> +        break;
> +
> +    /* Noise on anything else */
> +    default:
> +        qemu_log_mask(LOG_UNIMP, "phb4: reg_write 0x%"PRIx64"=%"PRIx64"\n",
> +                      off, val);
> +    }
> +}
> +
> +static uint64_t pnv_phb4_reg_read(void *opaque, hwaddr off, unsigned size)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(opaque);
> +    uint64_t val;
> +
> +    if ((off & 0xfffc) == PHB_CONFIG_DATA) {
> +        return pnv_phb4_config_read(phb, off & 0x3, size);
> +    }
> +
> +    /* Special case RC configuration space */
> +    if ((off & 0xf800) == PHB_RC_CONFIG_BASE) {
> +        return pnv_phb4_rc_config_read(phb, off & 0x7ff, size);
> +    }
> +
> +    /* Other registers are 64-bit only */
> +    if (size != 8 || off & 0x7) {
> +        phb_error(phb, "Invalid register access, offset: 0x%"PRIx64" size: %d",
> +                   off, size);
> +        return ~0ull;
> +    }
> +
> +    /* Default read from cache */
> +    val = phb->regs[off >> 3];
> +
> +    switch (off) {
> +    case PHB_VERSION:
> +        return phb->version;
> +
> +    /* Read-only */
> +    case PHB_PHB4_GEN_CAP:
> +        return 0xe4b8000000000000ull;
> +    case PHB_PHB4_TCE_CAP:
> +        return phb->big_phb ? 0x4008440000000400ull : 0x2008440000000200ull;
> +    case PHB_PHB4_IRQ_CAP:
> +        return phb->big_phb ? 0x0800000000001000ull : 0x0800000000000800ull;
> +    case PHB_PHB4_EEH_CAP:
> +        return phb->big_phb ? 0x2000000000000000ull : 0x1000000000000000ull;
> +
> +    /* IODA table accesses */
> +    case PHB_IODA_DATA0:
> +        return pnv_phb4_ioda_read(phb);
> +
> +    /* Link training always appears trained */
> +    case PHB_PCIE_DLP_TRAIN_CTL:
> +        /* TODO: Do something sensible with speed ? */
> +        return PHB_PCIE_DLP_INBAND_PRESENCE | PHB_PCIE_DLP_TL_LINKACT;
> +
> +    /* DMA read sync: make it look like it's complete */
> +    case PHB_DMARD_SYNC:
> +        return PHB_DMARD_SYNC_COMPLETE;
> +
> +    /* Silent simple reads */
> +    case PHB_LSI_SOURCE_ID:
> +    case PHB_CPU_LOADSTORE_STATUS:
> +    case PHB_ASN_CMPM:
> +    case PHB_PHB4_CONFIG:
> +    case PHB_M32_START_ADDR:
> +    case PHB_CONFIG_ADDRESS:
> +    case PHB_IODA_ADDR:
> +    case PHB_RTC_INVALIDATE:
> +    case PHB_TCE_KILL:
> +    case PHB_TCE_SPEC_CTL:
> +    case PHB_PEST_BAR:
> +    case PHB_PELTV_BAR:
> +    case PHB_RTT_BAR:
> +    case PHB_M64_UPPER_BITS:
> +    case PHB_CTRLR:
> +    case PHB_LEM_FIR_ACCUM:
> +    case PHB_LEM_ERROR_MASK:
> +    case PHB_LEM_ACTION0:
> +    case PHB_LEM_ACTION1:
> +    case PHB_TCE_TAG_ENABLE:
> +    case PHB_INT_NOTIFY_ADDR:
> +    case PHB_INT_NOTIFY_INDEX:
> +    case PHB_Q_DMA_R:
> +    case PHB_ETU_ERR_SUMMARY:
> +        break;
> +
> +    /* Noise on anything else */
> +    default:
> +        qemu_log_mask(LOG_UNIMP, "phb4: reg_read 0x%"PRIx64"=%"PRIx64"\n",
> +                      off, val);
> +    }
> +    return val;
> +}
> +
> +static const MemoryRegionOps pnv_phb4_reg_ops = {
> +    .read = pnv_phb4_reg_read,
> +    .write = pnv_phb4_reg_write,
> +    .valid.min_access_size = 1,
> +    .valid.max_access_size = 8,
> +    .impl.min_access_size = 1,
> +    .impl.max_access_size = 8,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +};
> +
> +static uint64_t pnv_phb4_xscom_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(opaque);
> +    uint32_t reg = addr >> 3;
> +    uint64_t val;
> +    hwaddr offset;
> +
> +    switch (reg) {
> +    case PHB_SCOM_HV_IND_ADDR:
> +        return phb->scom_hv_ind_addr_reg;
> +
> +    case PHB_SCOM_HV_IND_DATA:
> +        if (!(phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_VALID)) {
> +            phb_error(phb, "Invalid indirect address");
> +            return ~0ull;
> +        }
> +        size = (phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_4B) ? 4 : 8;
> +        offset = GETFIELD(PHB_SCOM_HV_IND_ADDR_ADDR, phb->scom_hv_ind_addr_reg);
> +        val = pnv_phb4_reg_read(phb, offset, size);
> +        if (phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_AUTOINC) {
> +            offset += size;
> +            offset &= 0x3fff;
> +            phb->scom_hv_ind_addr_reg = SETFIELD(PHB_SCOM_HV_IND_ADDR_ADDR,
> +                                                 phb->scom_hv_ind_addr_reg,
> +                                                 offset);
> +        }
> +        return val;
> +    case PHB_SCOM_ETU_LEM_FIR:
> +    case PHB_SCOM_ETU_LEM_FIR_AND:
> +    case PHB_SCOM_ETU_LEM_FIR_OR:
> +    case PHB_SCOM_ETU_LEM_FIR_MSK:
> +    case PHB_SCOM_ETU_LEM_ERR_MSK_AND:
> +    case PHB_SCOM_ETU_LEM_ERR_MSK_OR:
> +    case PHB_SCOM_ETU_LEM_ACT0:
> +    case PHB_SCOM_ETU_LEM_ACT1:
> +    case PHB_SCOM_ETU_LEM_WOF:
> +        offset = ((reg - PHB_SCOM_ETU_LEM_FIR) << 3) + PHB_LEM_FIR_ACCUM;
> +        return pnv_phb4_reg_read(phb, offset, size);
> +    case PHB_SCOM_ETU_PMON_CONFIG:
> +    case PHB_SCOM_ETU_PMON_CTR0:
> +    case PHB_SCOM_ETU_PMON_CTR1:
> +    case PHB_SCOM_ETU_PMON_CTR2:
> +    case PHB_SCOM_ETU_PMON_CTR3:
> +        offset = ((reg - PHB_SCOM_ETU_PMON_CONFIG) << 3) + PHB_PERFMON_CONFIG;
> +        return pnv_phb4_reg_read(phb, offset, size);
> +
> +    default:
> +        qemu_log_mask(LOG_UNIMP, "phb4: xscom_read 0x%"HWADDR_PRIx"\n", addr);
> +        return ~0ull;
> +    }
> +}
> +
> +static void pnv_phb4_xscom_write(void *opaque, hwaddr addr,
> +                                 uint64_t val, unsigned size)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(opaque);
> +    uint32_t reg = addr >> 3;
> +    hwaddr offset;
> +
> +    switch (reg) {
> +    case PHB_SCOM_HV_IND_ADDR:
> +        phb->scom_hv_ind_addr_reg = val & 0xe000000000001fff;
> +        break;
> +    case PHB_SCOM_HV_IND_DATA:
> +        if (!(phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_VALID)) {
> +            phb_error(phb, "Invalid indirect address");
> +            break;
> +        }
> +        size = (phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_4B) ? 4 : 8;
> +        offset = GETFIELD(PHB_SCOM_HV_IND_ADDR_ADDR, phb->scom_hv_ind_addr_reg);
> +        pnv_phb4_reg_write(phb, offset, val, size);
> +        if (phb->scom_hv_ind_addr_reg & PHB_SCOM_HV_IND_ADDR_AUTOINC) {
> +            offset += size;
> +            offset &= 0x3fff;
> +            phb->scom_hv_ind_addr_reg = SETFIELD(PHB_SCOM_HV_IND_ADDR_ADDR,
> +                                                 phb->scom_hv_ind_addr_reg,
> +                                                 offset);
> +        }
> +        break;
> +    case PHB_SCOM_ETU_LEM_FIR:
> +    case PHB_SCOM_ETU_LEM_FIR_AND:
> +    case PHB_SCOM_ETU_LEM_FIR_OR:
> +    case PHB_SCOM_ETU_LEM_FIR_MSK:
> +    case PHB_SCOM_ETU_LEM_ERR_MSK_AND:
> +    case PHB_SCOM_ETU_LEM_ERR_MSK_OR:
> +    case PHB_SCOM_ETU_LEM_ACT0:
> +    case PHB_SCOM_ETU_LEM_ACT1:
> +    case PHB_SCOM_ETU_LEM_WOF:
> +        offset = ((reg - PHB_SCOM_ETU_LEM_FIR) << 3) + PHB_LEM_FIR_ACCUM;
> +        pnv_phb4_reg_write(phb, offset, val, size);
> +        break;
> +    case PHB_SCOM_ETU_PMON_CONFIG:
> +    case PHB_SCOM_ETU_PMON_CTR0:
> +    case PHB_SCOM_ETU_PMON_CTR1:
> +    case PHB_SCOM_ETU_PMON_CTR2:
> +    case PHB_SCOM_ETU_PMON_CTR3:
> +        offset = ((reg - PHB_SCOM_ETU_PMON_CONFIG) << 3) + PHB_PERFMON_CONFIG;
> +        pnv_phb4_reg_write(phb, offset, val, size);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_UNIMP, "phb4: xscom_write 0x%"HWADDR_PRIx
> +                      "=%"PRIx64"\n", addr, val);
> +    }
> +}
> +
> +const MemoryRegionOps pnv_phb4_xscom_ops = {
> +    .read = pnv_phb4_xscom_read,
> +    .write = pnv_phb4_xscom_write,
> +    .valid.min_access_size = 8,
> +    .valid.max_access_size = 8,
> +    .impl.min_access_size = 8,
> +    .impl.max_access_size = 8,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +};
> +
> +static int pnv_phb4_map_irq(PCIDevice *pci_dev, int irq_num)
> +{
> +    /* Check that out properly ... */
> +    return irq_num & 3;
> +}
> +
> +static void pnv_phb4_set_irq(void *opaque, int irq_num, int level)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(opaque);
> +    uint32_t lsi_base;
> +
> +    /* LSI only ... */
> +    if (irq_num > 3) {
> +        phb_error(phb, "IRQ %x is not an LSI", irq_num);
> +    }
> +    lsi_base = GETFIELD(PHB_LSI_SRC_ID, phb->regs[PHB_LSI_SOURCE_ID >> 3]);
> +    lsi_base <<= 3;
> +    qemu_set_irq(phb->qirqs[lsi_base + irq_num], level);
> +}
> +
> +static bool pnv_phb4_resolve_pe(PnvPhb4DMASpace *ds)
> +{
> +    uint64_t rtt, addr;
> +    uint16_t rte;
> +    int bus_num;
> +    int num_PEs;
> +
> +    /* Already resolved ? */
> +    if (ds->pe_num != PHB_INVALID_PE) {
> +        return true;
> +    }
> +
> +    /* We need to lookup the RTT */
> +    rtt = ds->phb->regs[PHB_RTT_BAR >> 3];
> +    if (!(rtt & PHB_RTT_BAR_ENABLE)) {
> +        phb_error(ds->phb, "DMA with RTT BAR disabled !");
> +        /* Set error bits ? fence ? ... */
> +        return false;
> +    }
> +
> +    /* Read RTE */
> +    bus_num = pci_bus_num(ds->bus);
> +    addr = rtt & PHB_RTT_BASE_ADDRESS_MASK;
> +    addr += 2 * ((bus_num << 8) | ds->devfn);
> +    if (dma_memory_read(&address_space_memory, addr, &rte, sizeof(rte))) {
> +        phb_error(ds->phb, "Failed to read RTT entry at 0x%"PRIx64, addr);
> +        /* Set error bits ? fence ? ... */
> +        return false;
> +    }
> +    rte = be16_to_cpu(rte);
> +
> +    /* Fail upon reading of invalid PE# */
> +    num_PEs = ds->phb->big_phb ? PNV_PHB4_MAX_PEs : (PNV_PHB4_MAX_PEs >> 1);
> +    if (rte >= num_PEs) {
> +        phb_error(ds->phb, "RTE for RID 0x%x invalid (%04x)", ds->devfn, rte);
> +        rte &= num_PEs - 1;
> +    }
> +    ds->pe_num = rte;
> +    return true;
> +}
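
The RTT lookup above indexes a table of 16-bit entries by requester ID
(bus number in the high byte, devfn in the low byte). A minimal sketch
of the address computation, with a hypothetical helper name and the base
address assumed to be already masked:

```c
#include <stdint.h>

/* Hypothetical helper: guest address of the RTT entry for bus/devfn. */
static uint64_t rtt_entry_addr(uint64_t rtt_base, uint8_t bus_num,
                               uint8_t devfn)
{
    uint16_t rid = ((uint16_t)bus_num << 8) | devfn;

    return rtt_base + 2 * (uint64_t)rid;    /* 2 bytes per entry */
}
```

So a full table covering all 256 buses spans 128KB from the base.
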
> +
> +static void pnv_phb4_translate_tve(PnvPhb4DMASpace *ds, hwaddr addr,
> +                                   bool is_write, uint64_t tve,
> +                                   IOMMUTLBEntry *tlb)
> +{
> +    uint64_t tta = GETFIELD(IODA3_TVT_TABLE_ADDR, tve);
> +    int32_t  lev = GETFIELD(IODA3_TVT_NUM_LEVELS, tve);
> +    uint32_t tts = GETFIELD(IODA3_TVT_TCE_TABLE_SIZE, tve);
> +    uint32_t tps = GETFIELD(IODA3_TVT_IO_PSIZE, tve);
> +
> +    /* Invalid levels */
> +    if (lev > 4) {
> +        phb_error(ds->phb, "Invalid #levels in TVE %d", lev);
> +        return;
> +    }
> +
> +    /* Invalid entry */
> +    if (tts == 0) {
> +        phb_error(ds->phb, "Access to invalid TVE");
> +        return;
> +    }
> +
> +    /* IO Page Size of 0 means untranslated, else use TCEs */
> +    if (tps == 0) {
> +        /* TODO: Handle boundaries */
> +
> +        /* Use 4k pages like q35 ... for now */
> +        tlb->iova = addr & 0xfffffffffffff000ull;
> +        tlb->translated_addr = addr & 0x0003fffffffff000ull;
> +        tlb->addr_mask = 0xfffull;
> +        tlb->perm = IOMMU_RW;
> +    } else {
> +        uint32_t tce_shift, tbl_shift, sh;
> +        uint64_t base, taddr, tce, tce_mask;
> +
> +        /* Address bits per bottom level TCE entry */
> +        tce_shift = tps + 11;
> +
> +        /* Address bits per table level */
> +        tbl_shift = tts + 8;
> +
> +        /* Top level table base address */
> +        base = tta << 12;
> +
> +        /* Total shift to first level */
> +        sh = tbl_shift * lev + tce_shift;
> +
> +        /* TODO: Limit to support IO page sizes */
> +
> +        /* TODO: Multi-level untested */
> +        while ((lev--) >= 0) {
> +            /* Grab the TCE address */
> +            taddr = base | (((addr >> sh) & ((1ull << tbl_shift) - 1)) << 3);
> +            if (dma_memory_read(&address_space_memory, taddr, &tce,
> +                                sizeof(tce))) {
> +                phb_error(ds->phb, "Failed to read TCE at 0x%"PRIx64, taddr);
> +                return;
> +            }
> +            tce = be64_to_cpu(tce);
> +
> +            /* Check permission for indirect TCE */
> +            if ((lev >= 0) && !(tce & 3)) {
> +                phb_error(ds->phb, "Invalid indirect TCE at 0x%"PRIx64, taddr);
> +                phb_error(ds->phb, " xlate %"PRIx64":%c TVE=%"PRIx64, addr,
> +                           is_write ? 'W' : 'R', tve);
> +                phb_error(ds->phb, " tta=%"PRIx64" lev=%d tts=%d tps=%d",
> +                           tta, lev, tts, tps);
> +                return;
> +            }
> +            sh -= tbl_shift;
> +            base = tce & ~0xfffull;
> +        }
> +
> +        /* We exit the loop with TCE being the final TCE */
> +        tce_mask = ~((1ull << tce_shift) - 1);
> +        tlb->iova = addr & tce_mask;
> +        tlb->translated_addr = tce & tce_mask;
> +        tlb->addr_mask = ~tce_mask;
> +        tlb->perm = tce & 3;
> +        if ((is_write && !(tce & 2)) || ((!is_write) && !(tce & 1))) {
> +            phb_error(ds->phb, "TCE access fault at 0x%"PRIx64, taddr);
> +            phb_error(ds->phb, " xlate %"PRIx64":%c TVE=%"PRIx64, addr,
> +                       is_write ? 'W' : 'R', tve);
> +            phb_error(ds->phb, " tta=%"PRIx64" lev=%d tts=%d tps=%d",
> +                       tta, lev, tts, tps);
> +        }
> +    }
> +}
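
To make the single-level case easier to follow: with lev = 0 the total
shift equals tce_shift, so the TCE index is simply the IO page number
masked to the table size, and each TCE is 8 bytes. A sketch under those
assumptions (the helper name is hypothetical; multi-level walks are not
covered):

```c
#include <stdint.h>

/*
 * Hypothetical helper: address of the TCE for 'dma_addr' in a
 * single-level table, using the same encodings as the code above
 * (IO page shift = tps + 11, index bits = tts + 8).
 */
static uint64_t tce_entry_addr(uint64_t table_base, uint64_t dma_addr,
                               uint32_t tps, uint32_t tts)
{
    uint32_t tce_shift = tps + 11;
    uint32_t tbl_shift = tts + 8;
    uint64_t index = (dma_addr >> tce_shift) & ((1ull << tbl_shift) - 1);

    return table_base | (index << 3);       /* 8 bytes per TCE */
}
```

With tps = 1 (4K IO pages) and tts = 1 (512 entries), DMA address 0x3000
selects the fourth TCE, at offset 0x18 from the table base.
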
> +
> +static IOMMUTLBEntry pnv_phb4_translate_iommu(IOMMUMemoryRegion *iommu,
> +                                              hwaddr addr,
> +                                              IOMMUAccessFlags flag,
> +                                              int iommu_idx)
> +{
> +    PnvPhb4DMASpace *ds = container_of(iommu, PnvPhb4DMASpace, dma_mr);
> +    int tve_sel;
> +    uint64_t tve, cfg;
> +    IOMMUTLBEntry ret = {
> +        .target_as = &address_space_memory,
> +        .iova = addr,
> +        .translated_addr = 0,
> +        .addr_mask = ~(hwaddr)0,
> +        .perm = IOMMU_NONE,
> +    };
> +
> +    /* Resolve PE# */
> +    if (!pnv_phb4_resolve_pe(ds)) {
> +        phb_error(ds->phb, "Failed to resolve PE# for bus @%p (%d) devfn 0x%x",
> +                   ds->bus, pci_bus_num(ds->bus), ds->devfn);
> +        return ret;
> +    }
> +
> +    /* Check top bits */
> +    switch (addr >> 60) {
> +    case 00:
> +        /* DMA or 32-bit MSI ? */
> +        cfg = ds->phb->regs[PHB_PHB4_CONFIG >> 3];
> +        if ((cfg & PHB_PHB4C_32BIT_MSI_EN) &&
> +            ((addr & 0xffffffffffff0000ull) == 0xffff0000ull)) {
> +            phb_error(ds->phb, "xlate on 32-bit MSI region");
> +            return ret;
> +        }
> +        /* Choose TVE XXX Use PHB4 Control Register */
> +        tve_sel = (addr >> 59) & 1;
> +        tve = ds->phb->ioda_TVT[ds->pe_num * 2 + tve_sel];
> +        pnv_phb4_translate_tve(ds, addr, flag & IOMMU_WO, tve, &ret);
> +        break;
> +    case 01:
> +        phb_error(ds->phb, "xlate on 64-bit MSI region");
> +        break;
> +    default:
> +        phb_error(ds->phb, "xlate on unsupported address 0x%"PRIx64, addr);
> +    }
> +    return ret;
> +}
> +
> +#define TYPE_PNV_PHB4_IOMMU_MEMORY_REGION "pnv-phb4-iommu-memory-region"
> +#define PNV_PHB4_IOMMU_MEMORY_REGION(obj) \
> +    OBJECT_CHECK(IOMMUMemoryRegion, (obj), TYPE_PNV_PHB4_IOMMU_MEMORY_REGION)
> +
> +static void pnv_phb4_iommu_memory_region_class_init(ObjectClass *klass,
> +                                                    void *data)
> +{
> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
> +
> +    imrc->translate = pnv_phb4_translate_iommu;
> +}
> +
> +static const TypeInfo pnv_phb4_iommu_memory_region_info = {
> +    .parent = TYPE_IOMMU_MEMORY_REGION,
> +    .name = TYPE_PNV_PHB4_IOMMU_MEMORY_REGION,
> +    .class_init = pnv_phb4_iommu_memory_region_class_init,
> +};
> +
> +/*
> + * MSI/MSIX memory region implementation.
> + * The handler handles both MSI and MSIX.
> + */
> +static void pnv_phb4_msi_write(void *opaque, hwaddr addr,
> +                               uint64_t data, unsigned size)
> +{
> +    PnvPhb4DMASpace *ds = opaque;
> +    PnvPHB4 *phb = ds->phb;
> +
> +    uint32_t src = ((addr >> 4) & 0xffff) | (data & 0x1f);
> +
> +    /* Resolve PE# */
> +    if (!pnv_phb4_resolve_pe(ds)) {
> +        phb_error(phb, "Failed to resolve PE# for bus @%p (%d) devfn 0x%x",
> +                   ds->bus, pci_bus_num(ds->bus), ds->devfn);
> +        return;
> +    }
> +
> +    /* TODO: Check it doesn't collide with LSIs */
> +    if (src >= phb->xsrc.nr_irqs) {
> +        phb_error(phb, "MSI %d out of bounds", src);
> +        return;
> +    }
> +
> +    /* TODO: check PE/MSI assignment */
> +
> +    qemu_irq_pulse(phb->qirqs[src]);
> +}
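
For readers following along, the source number above is built from both
the MSI address and the MSI data: bits 4..19 of the address give the
base, and the low 5 bits of the data are ORed in. A sketch of just that
computation (hypothetical helper name):

```c
#include <stdint.h>

/* Hypothetical helper: interrupt source number for one MSI write. */
static uint32_t msi_source(uint64_t addr, uint64_t data)
{
    return ((addr >> 4) & 0xffff) | (data & 0x1f);
}
```

Since the two parts are ORed rather than added, this presumably relies
on the base being aligned to the number of vectors the device uses.
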
> +
> +/* The read result is undefined by the PCI spec: log it and return all ones */
> +static uint64_t pnv_phb4_msi_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    PnvPhb4DMASpace *ds = opaque;
> +
> +    phb_error(ds->phb, "Invalid MSI read @ 0x%" HWADDR_PRIx, addr);
> +    return -1;
> +}
> +
> +static const MemoryRegionOps pnv_phb4_msi_ops = {
> +    .read = pnv_phb4_msi_read,
> +    .write = pnv_phb4_msi_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN
> +};
> +
> +static PnvPhb4DMASpace *pnv_phb4_dma_find(PnvPHB4 *phb, PCIBus *bus, int devfn)
> +{
> +    PnvPhb4DMASpace *ds;
> +
> +    QLIST_FOREACH(ds, &phb->dma_spaces, list) {
> +        if (ds->bus == bus && ds->devfn == devfn) {
> +            break;
> +        }
> +    }
> +    return ds;
> +}
> +
> +static AddressSpace *pnv_phb4_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> +{
> +    PnvPHB4 *phb = opaque;
> +    PnvPhb4DMASpace *ds;
> +    char name[32];
> +
> +    ds = pnv_phb4_dma_find(phb, bus, devfn);
> +
> +    if (ds == NULL) {
> +        ds = g_malloc0(sizeof(PnvPhb4DMASpace));
> +        ds->bus = bus;
> +        ds->devfn = devfn;
> +        ds->pe_num = PHB_INVALID_PE;
> +        ds->phb = phb;
> +        snprintf(name, sizeof(name), "phb4-%d.%d-iommu", phb->chip_id,
> +                 phb->phb_id);
> +        memory_region_init_iommu(&ds->dma_mr, sizeof(ds->dma_mr),
> +                                 TYPE_PNV_PHB4_IOMMU_MEMORY_REGION,
> +                                 OBJECT(phb), name, UINT64_MAX);
> +        address_space_init(&ds->dma_as, MEMORY_REGION(&ds->dma_mr),
> +                           name);
> +        memory_region_init_io(&ds->msi32_mr, OBJECT(phb), &pnv_phb4_msi_ops,
> +                              ds, "msi32", 0x10000);
> +        memory_region_init_io(&ds->msi64_mr, OBJECT(phb), &pnv_phb4_msi_ops,
> +                              ds, "msi64", 0x100000);
> +        pnv_phb4_update_msi_regions(ds);
> +
> +        QLIST_INSERT_HEAD(&phb->dma_spaces, ds, list);
> +    }
> +    return &ds->dma_as;
> +}
> +
> +static void pnv_phb4_instance_init(Object *obj)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(obj);
> +
> +    QLIST_INIT(&phb->dma_spaces);
> +
> +    /* XIVE interrupt source object */
> +    object_initialize_child(obj, "source", &phb->xsrc, sizeof(XiveSource),
> +                            TYPE_XIVE_SOURCE, &error_abort, NULL);
> +
> +    /* Root Port */
> +    object_initialize_child(obj, "root", &phb->root, sizeof(phb->root),
> +                            TYPE_PNV_PHB4_ROOT_PORT, &error_abort, NULL);
> +
> +    qdev_prop_set_int32(DEVICE(&phb->root), "addr", PCI_DEVFN(0, 0));
> +    qdev_prop_set_bit(DEVICE(&phb->root), "multifunction", false);
> +}
> +
> +static void pnv_phb4_realize(DeviceState *dev, Error **errp)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(dev);
> +    PCIHostState *pci = PCI_HOST_BRIDGE(dev);
> +    XiveSource *xsrc = &phb->xsrc;
> +    Error *local_err = NULL;
> +    int nr_irqs;
> +    char name[32];
> +
> +    assert(phb->stack);
> +
> +    /* Set the "big_phb" flag */
> +    phb->big_phb = phb->phb_id == 0 || phb->phb_id == 3;
> +
> +    /* Controller Registers */
> +    snprintf(name, sizeof(name), "phb4-%d.%d-regs", phb->chip_id,
> +             phb->phb_id);
> +    memory_region_init_io(&phb->mr_regs, OBJECT(phb), &pnv_phb4_reg_ops, phb,
> +                          name, 0x2000);
> +
> +    /*
> +     * PHB4 doesn't support IO space. However, qemu gets very upset if
> +     * we don't have an IO region to anchor IO BARs onto so we just
> +     * initialize one which we never hook up to anything
> +     */
> +
> +    snprintf(name, sizeof(name), "phb4-%d.%d-pci-io", phb->chip_id,
> +             phb->phb_id);
> +    memory_region_init(&phb->pci_io, OBJECT(phb), name, 0x10000);
> +
> +    snprintf(name, sizeof(name), "phb4-%d.%d-pci-mmio", phb->chip_id,
> +             phb->phb_id);
> +    memory_region_init(&phb->pci_mmio, OBJECT(phb), name,
> +                       PCI_MMIO_TOTAL_SIZE);
> +
> +    pci->bus = pci_register_root_bus(dev, "root-bus",
> +                                     pnv_phb4_set_irq, pnv_phb4_map_irq, phb,
> +                                     &phb->pci_mmio, &phb->pci_io,
> +                                     0, 4, TYPE_PNV_PHB4_ROOT_BUS);
> +    pci_setup_iommu(pci->bus, pnv_phb4_dma_iommu, phb);
> +
> +    /* Add a single Root port */
> +    qdev_prop_set_uint8(DEVICE(&phb->root), "chassis", phb->chip_id);
> +    qdev_prop_set_uint16(DEVICE(&phb->root), "slot", phb->phb_id);
> +    qdev_set_parent_bus(DEVICE(&phb->root), BUS(pci->bus));
> +    qdev_init_nofail(DEVICE(&phb->root));
> +
> +    /* Setup XIVE Source */
> +    if (phb->big_phb) {
> +        nr_irqs = PNV_PHB4_MAX_INTs;
> +    } else {
> +        nr_irqs = PNV_PHB4_MAX_INTs >> 1;
> +    }
> +    object_property_set_int(OBJECT(xsrc), nr_irqs, "nr-irqs", &error_fatal);
> +    object_property_set_link(OBJECT(xsrc), OBJECT(phb), "xive", &error_fatal);
> +    object_property_set_bool(OBJECT(xsrc), true, "realized", &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    pnv_phb4_update_xsrc(phb);
> +
> +    phb->qirqs = qemu_allocate_irqs(xive_source_set_irq, xsrc, xsrc->nr_irqs);
> +}
> +
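The interrupt sizing in pnv_phb4_realize() can be read in isolation. Below is a minimal standalone sketch of that rule, assuming only what the hunk above shows: PHB ids 0 and 3 are "big" and get the full interrupt range, the others get half. The helper names and the max_ints parameter are hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: PHBs 0 and 3 are the "big" ones in the patch. */
static bool phb4_is_big(uint32_t phb_id)
{
    return phb_id == 0 || phb_id == 3;
}

/*
 * Hypothetical helper: a big PHB gets max_ints sources, any other
 * PHB gets half of that (max_ints >> 1), as in pnv_phb4_realize().
 */
static int phb4_nr_irqs(uint32_t phb_id, int max_ints)
{
    return phb4_is_big(phb_id) ? max_ints : max_ints >> 1;
}
```

Passing the maximum as a parameter keeps the sketch independent of the actual value of PNV_PHB4_MAX_INTs, which is defined elsewhere in the series.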
> +static void pnv_phb4_reset(DeviceState *dev)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(dev);
> +    PCIDevice *root_dev = PCI_DEVICE(&phb->root);
> +
> +    /*
> +     * Configure PCI device id at reset using a property.
> +     */
> +    pci_config_set_vendor_id(root_dev->config, PCI_VENDOR_ID_IBM);
> +    pci_config_set_device_id(root_dev->config, phb->device_id);
> +}
> +
> +static const char *pnv_phb4_root_bus_path(PCIHostState *host_bridge,
> +                                          PCIBus *rootbus)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(host_bridge);
> +
> +    snprintf(phb->bus_path, sizeof(phb->bus_path), "00%02x:%02x",
> +             phb->chip_id, phb->phb_id);
> +    return phb->bus_path;
> +}
> +
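For reference, the root bus path format used above can be tried standalone. This is only a sketch of the same snprintf() format string; format_bus_path() and its caller-supplied buffer are hypothetical, where the patch uses phb->bus_path.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a "00CC:PP" style path from the chip id
 * and PHB id, using the same format string as the patch.
 */
static const char *format_bus_path(char *buf, size_t len,
                                   uint32_t chip_id, uint32_t phb_id)
{
    snprintf(buf, len, "00%02x:%02x", chip_id, phb_id);
    return buf;
}
```

With chip id 1 and PHB id 3, this should produce the string "0001:03".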
> +static void pnv_phb4_xive_notify(XiveNotifier *xf, uint32_t srcno)
> +{
> +    PnvPHB4 *phb = PNV_PHB4(xf);
> +    uint64_t notif_port = phb->regs[PHB_INT_NOTIFY_ADDR >> 3];
> +    uint32_t offset = phb->regs[PHB_INT_NOTIFY_INDEX >> 3];
> +    uint64_t data = XIVE_TRIGGER_PQ | offset | srcno;
> +    MemTxResult result;
> +
> +    address_space_stq_be(&address_space_memory, notif_port, data,
> +                         MEMTXATTRS_UNSPECIFIED, &result);
> +    if (result != MEMTX_OK) {
> +        phb_error(phb, "trigger failed @%"HWADDR_PRIx "\n", notif_port);
> +        return;
> +    }
> +}
> +
> +static Property pnv_phb4_properties[] = {
> +        DEFINE_PROP_UINT32("index", PnvPHB4, phb_id, 0),
> +        DEFINE_PROP_UINT32("chip-id", PnvPHB4, chip_id, 0),
> +        DEFINE_PROP_UINT64("version", PnvPHB4, version, 0),
> +        DEFINE_PROP_UINT16("device-id", PnvPHB4, device_id, 0),
> +        DEFINE_PROP_LINK("stack", PnvPHB4, stack, TYPE_PNV_PHB4_PEC_STACK,
> +                         PnvPhb4PecStack *),
> +        DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void pnv_phb4_class_init(ObjectClass *klass, void *data)
> +{
> +    PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    XiveNotifierClass *xfc = XIVE_NOTIFIER_CLASS(klass);
> +
> +    hc->root_bus_path   = pnv_phb4_root_bus_path;
> +    dc->realize         = pnv_phb4_realize;
> +    dc->props           = pnv_phb4_properties;
> +    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> +    dc->user_creatable  = true;
> +    dc->reset           = pnv_phb4_reset;
> +
> +    xfc->notify         = pnv_phb4_xive_notify;
> +}
> +
> +static const TypeInfo pnv_phb4_type_info = {
> +    .name          = TYPE_PNV_PHB4,
> +    .parent        = TYPE_PCIE_HOST_BRIDGE,
> +    .instance_init = pnv_phb4_instance_init,
> +    .instance_size = sizeof(PnvPHB4),
> +    .class_init    = pnv_phb4_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +            { TYPE_XIVE_NOTIFIER },
> +            { },
> +    }
> +};
> +
> +static void pnv_phb4_root_bus_class_init(ObjectClass *klass, void *data)
> +{
> +    BusClass *k = BUS_CLASS(klass);
> +
> +    /*
> +     * PHB4 has only a single root complex. Enforce the limit on the
> +     * parent bus
> +     */
> +    k->max_dev = 1;
> +}
> +
> +static const TypeInfo pnv_phb4_root_bus_info = {
> +    .name = TYPE_PNV_PHB4_ROOT_BUS,
> +    .parent = TYPE_PCIE_BUS,
> +    .class_init = pnv_phb4_root_bus_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { INTERFACE_PCIE_DEVICE },
> +        { }
> +    },
> +};
> +
> +static void pnv_phb4_root_port_reset(DeviceState *dev)
> +{
> +    PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
> +    PCIDevice *d = PCI_DEVICE(dev);
> +    uint8_t *conf = d->config;
> +
> +    rpc->parent_reset(dev);
> +
> +    pci_byte_test_and_set_mask(conf + PCI_IO_BASE,
> +                               PCI_IO_RANGE_MASK & 0xff);
> +    pci_byte_test_and_clear_mask(conf + PCI_IO_LIMIT,
> +                                 PCI_IO_RANGE_MASK & 0xff);
> +    pci_set_word(conf + PCI_MEMORY_BASE, 0);
> +    pci_set_word(conf + PCI_MEMORY_LIMIT, 0xfff0);
> +    pci_set_word(conf + PCI_PREF_MEMORY_BASE, 0x1);
> +    pci_set_word(conf + PCI_PREF_MEMORY_LIMIT, 0xfff1);
> +    pci_set_long(conf + PCI_PREF_BASE_UPPER32, 0x1); /* Hack */
> +    pci_set_long(conf + PCI_PREF_LIMIT_UPPER32, 0xffffffff);
> +}
> +
> +static void pnv_phb4_root_port_realize(DeviceState *dev, Error **errp)
> +{
> +    PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
> +    Error *local_err = NULL;
> +
> +    rpc->parent_realize(dev, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +}
> +
> +static void pnv_phb4_root_port_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +    PCIERootPortClass *rpc = PCIE_ROOT_PORT_CLASS(klass);
> +
> +    dc->desc     = "IBM PHB4 PCIE Root Port";
> +
> +    device_class_set_parent_realize(dc, pnv_phb4_root_port_realize,
> +                                    &rpc->parent_realize);
> +    device_class_set_parent_reset(dc, pnv_phb4_root_port_reset,
> +                                  &rpc->parent_reset);
> +
> +    k->vendor_id = PCI_VENDOR_ID_IBM;
> +    k->device_id = PNV_PHB4_DEVICE_ID;
> +    k->revision  = 0;
> +
> +    rpc->exp_offset = 0x48;
> +    rpc->aer_offset = 0x100;
> +
> +    dc->reset = &pnv_phb4_root_port_reset;
> +}
> +
> +static const TypeInfo pnv_phb4_root_port_info = {
> +    .name          = TYPE_PNV_PHB4_ROOT_PORT,
> +    .parent        = TYPE_PCIE_ROOT_PORT,
> +    .instance_size = sizeof(PnvPHB4RootPort),
> +    .class_init    = pnv_phb4_root_port_class_init,
> +};
> +
> +static void pnv_phb4_register_types(void)
> +{
> +    type_register_static(&pnv_phb4_root_bus_info);
> +    type_register_static(&pnv_phb4_root_port_info);
> +    type_register_static(&pnv_phb4_type_info);
> +    type_register_static(&pnv_phb4_iommu_memory_region_info);
> +}
> +
> +type_init(pnv_phb4_register_types);
> +
> +void pnv_phb4_update_regions(PnvPhb4PecStack *stack)
> +{
> +    PnvPHB4 *phb = &stack->phb;
> +
> +    /* Unmap first always */
> +    if (memory_region_is_mapped(&phb->mr_regs)) {
> +        memory_region_del_subregion(&stack->phbbar, &phb->mr_regs);
> +    }
> +    if (memory_region_is_mapped(&phb->xsrc.esb_mmio)) {
> +        memory_region_del_subregion(&stack->intbar, &phb->xsrc.esb_mmio);
> +    }
> +
> +    /* Map registers if enabled */
> +    if (memory_region_is_mapped(&stack->phbbar)) {
> +        memory_region_add_subregion(&stack->phbbar, 0, &phb->mr_regs);
> +    }
> +
> +    /* Map ESB if enabled */
> +    if (memory_region_is_mapped(&stack->intbar)) {
> +        memory_region_add_subregion(&stack->intbar, 0, &phb->xsrc.esb_mmio);
> +    }
> +
> +    /* Check/update m32 */
> +    pnv_phb4_check_all_mbt(phb);
> +}
> +
> +void pnv_phb4_pic_print_info(PnvPHB4 *phb, Monitor *mon)
> +{
> +    uint32_t offset = phb->regs[PHB_INT_NOTIFY_INDEX >> 3];
> +
> +    monitor_printf(mon, "PHB4[%x:%x] Source %08x .. %08x\n",
> +                   phb->chip_id, phb->phb_id,
> +                   offset, offset + phb->xsrc.nr_irqs - 1);
> +    xive_source_pic_print_info(&phb->xsrc, 0, mon);
> +}
> diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c
> new file mode 100644
> index 000000000000..ea400bf6a1fb
> --- /dev/null
> +++ b/hw/pci-host/pnv_phb4_pec.c
> @@ -0,0 +1,593 @@
> +/*
> + * QEMU PowerPC PowerNV (POWER9) PHB4 model
> + *
> + * Copyright (c) 2018-2020, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu-common.h"
> +#include "qemu/log.h"
> +#include "target/ppc/cpu.h"
> +#include "hw/ppc/fdt.h"
> +#include "hw/pci-host/pnv_phb4_regs.h"
> +#include "hw/pci-host/pnv_phb4.h"
> +#include "hw/ppc/pnv_xscom.h"
> +#include "hw/pci/pci_bridge.h"
> +#include "hw/pci/pci_bus.h"
> +#include "hw/ppc/pnv.h"
> +#include "hw/qdev-properties.h"
> +
> +#include <libfdt.h>
> +
> +#define phb_pec_error(pec, fmt, ...)                                    \
> +    qemu_log_mask(LOG_GUEST_ERROR, "phb4_pec[%d:%d]: " fmt "\n",        \
> +                  (pec)->chip_id, (pec)->index, ## __VA_ARGS__)
> +
> +
> +static uint64_t pnv_pec_nest_xscom_read(void *opaque, hwaddr addr,
> +                                        unsigned size)
> +{
> +    PnvPhb4PecState *pec = PNV_PHB4_PEC(opaque);
> +    uint32_t reg = addr >> 3;
> +
> +    /* TODO: add list of allowed registers and error out if not */
> +    return pec->nest_regs[reg];
> +}
> +
> +static void pnv_pec_nest_xscom_write(void *opaque, hwaddr addr,
> +                                     uint64_t val, unsigned size)
> +{
> +    PnvPhb4PecState *pec = PNV_PHB4_PEC(opaque);
> +    uint32_t reg = addr >> 3;
> +
> +    switch (reg) {
> +    case PEC_NEST_PBCQ_HW_CONFIG:
> +    case PEC_NEST_DROP_PRIO_CTRL:
> +    case PEC_NEST_PBCQ_ERR_INJECT:
> +    case PEC_NEST_PCI_NEST_CLK_TRACE_CTL:
> +    case PEC_NEST_PBCQ_PMON_CTRL:
> +    case PEC_NEST_PBCQ_PBUS_ADDR_EXT:
> +    case PEC_NEST_PBCQ_PRED_VEC_TIMEOUT:
> +    case PEC_NEST_CAPP_CTRL:
> +    case PEC_NEST_PBCQ_READ_STK_OVR:
> +    case PEC_NEST_PBCQ_WRITE_STK_OVR:
> +    case PEC_NEST_PBCQ_STORE_STK_OVR:
> +    case PEC_NEST_PBCQ_RETRY_BKOFF_CTRL:
> +        pec->nest_regs[reg] = val;
> +        break;
> +    default:
> +        phb_pec_error(pec, "%s @0x%"HWADDR_PRIx"=%"PRIx64, __func__,
> +                      addr, val);
> +    }
> +}
> +
> +static const MemoryRegionOps pnv_pec_nest_xscom_ops = {
> +    .read = pnv_pec_nest_xscom_read,
> +    .write = pnv_pec_nest_xscom_write,
> +    .valid.min_access_size = 8,
> +    .valid.max_access_size = 8,
> +    .impl.min_access_size = 8,
> +    .impl.max_access_size = 8,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +};
> +
> +static uint64_t pnv_pec_pci_xscom_read(void *opaque, hwaddr addr,
> +                                       unsigned size)
> +{
> +    PnvPhb4PecState *pec = PNV_PHB4_PEC(opaque);
> +    uint32_t reg = addr >> 3;
> +
> +    /* TODO: add list of allowed registers and error out if not */
> +    return pec->pci_regs[reg];
> +}
> +
> +static void pnv_pec_pci_xscom_write(void *opaque, hwaddr addr,
> +                                    uint64_t val, unsigned size)
> +{
> +    PnvPhb4PecState *pec = PNV_PHB4_PEC(opaque);
> +    uint32_t reg = addr >> 3;
> +
> +    switch (reg) {
> +    case PEC_PCI_PBAIB_HW_CONFIG:
> +    case PEC_PCI_PBAIB_READ_STK_OVR:
> +        pec->pci_regs[reg] = val;
> +        break;
> +    default:
> +        phb_pec_error(pec, "%s @0x%"HWADDR_PRIx"=%"PRIx64, __func__,
> +                      addr, val);
> +    }
> +}
> +
> +static const MemoryRegionOps pnv_pec_pci_xscom_ops = {
> +    .read = pnv_pec_pci_xscom_read,
> +    .write = pnv_pec_pci_xscom_write,
> +    .valid.min_access_size = 8,
> +    .valid.max_access_size = 8,
> +    .impl.min_access_size = 8,
> +    .impl.max_access_size = 8,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +};
> +
> +static uint64_t pnv_pec_stk_nest_xscom_read(void *opaque, hwaddr addr,
> +                                            unsigned size)
> +{
> +    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
> +    uint32_t reg = addr >> 3;
> +
> +    /* TODO: add list of allowed registers and error out if not */
> +    return stack->nest_regs[reg];
> +}
> +
> +static void pnv_pec_stk_update_map(PnvPhb4PecStack *stack)
> +{
> +    PnvPhb4PecState *pec = stack->pec;
> +    MemoryRegion *sysmem = pec->system_memory;
> +    uint64_t bar_en = stack->nest_regs[PEC_NEST_STK_BAR_EN];
> +    uint64_t bar, mask, size;
> +    char name[64];
> +
> +    /*
> +     * NOTE: This will really not work well if those are remapped
> +     * after the PHB has created its sub regions. We could do better
> +     * if we had a way to resize regions but we don't really care
> +     * that much in practice as the stuff below really only happens
> +     * once early during boot
> +     */
> +
> +    /* Handle unmaps */
> +    if (memory_region_is_mapped(&stack->mmbar0) &&
> +        !(bar_en & PEC_NEST_STK_BAR_EN_MMIO0)) {
> +        memory_region_del_subregion(sysmem, &stack->mmbar0);
> +    }
> +    if (memory_region_is_mapped(&stack->mmbar1) &&
> +        !(bar_en & PEC_NEST_STK_BAR_EN_MMIO1)) {
> +        memory_region_del_subregion(sysmem, &stack->mmbar1);
> +    }
> +    if (memory_region_is_mapped(&stack->phbbar) &&
> +        !(bar_en & PEC_NEST_STK_BAR_EN_PHB)) {
> +        memory_region_del_subregion(sysmem, &stack->phbbar);
> +    }
> +    if (memory_region_is_mapped(&stack->intbar) &&
> +        !(bar_en & PEC_NEST_STK_BAR_EN_INT)) {
> +        memory_region_del_subregion(sysmem, &stack->intbar);
> +    }
> +
> +    /* Update PHB */
> +    pnv_phb4_update_regions(stack);
> +
> +    /* Handle maps */
> +    if (!memory_region_is_mapped(&stack->mmbar0) &&
> +        (bar_en & PEC_NEST_STK_BAR_EN_MMIO0)) {
> +        bar = stack->nest_regs[PEC_NEST_STK_MMIO_BAR0] >> 8;
> +        mask = stack->nest_regs[PEC_NEST_STK_MMIO_BAR0_MASK];
> +        size = ((~mask) >> 8) + 1;
> +        snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-mmio0",
> +                 pec->chip_id, pec->index, stack->stack_no);
> +        memory_region_init(&stack->mmbar0, OBJECT(stack), name, size);
> +        memory_region_add_subregion(sysmem, bar, &stack->mmbar0);
> +        stack->mmio0_base = bar;
> +        stack->mmio0_size = size;
> +    }
> +    if (!memory_region_is_mapped(&stack->mmbar1) &&
> +        (bar_en & PEC_NEST_STK_BAR_EN_MMIO1)) {
> +        bar = stack->nest_regs[PEC_NEST_STK_MMIO_BAR1] >> 8;
> +        mask = stack->nest_regs[PEC_NEST_STK_MMIO_BAR1_MASK];
> +        size = ((~mask) >> 8) + 1;
> +        snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-mmio1",
> +                 pec->chip_id, pec->index, stack->stack_no);
> +        memory_region_init(&stack->mmbar1, OBJECT(stack), name, size);
> +        memory_region_add_subregion(sysmem, bar, &stack->mmbar1);
> +        stack->mmio1_base = bar;
> +        stack->mmio1_size = size;
> +    }
> +    if (!memory_region_is_mapped(&stack->phbbar) &&
> +        (bar_en & PEC_NEST_STK_BAR_EN_PHB)) {
> +        bar = stack->nest_regs[PEC_NEST_STK_PHB_REGS_BAR] >> 8;
> +        size = PNV_PHB4_NUM_REGS << 3;
> +        snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-phb",
> +                 pec->chip_id, pec->index, stack->stack_no);
> +        memory_region_init(&stack->phbbar, OBJECT(stack), name, size);
> +        memory_region_add_subregion(sysmem, bar, &stack->phbbar);
> +    }
> +    if (!memory_region_is_mapped(&stack->intbar) &&
> +        (bar_en & PEC_NEST_STK_BAR_EN_INT)) {
> +        bar = stack->nest_regs[PEC_NEST_STK_INT_BAR] >> 8;
> +        size = PNV_PHB4_MAX_INTs << 16;
> +        snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-int",
> +                 stack->pec->chip_id, stack->pec->index, stack->stack_no);
> +        memory_region_init(&stack->intbar, OBJECT(stack), name, size);
> +        memory_region_add_subregion(sysmem, bar, &stack->intbar);
> +    }
> +
> +    /* Update PHB */
> +    pnv_phb4_update_regions(stack);
> +}
> +
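The BAR decoding in pnv_pec_stk_update_map() is easier to check in isolation. A minimal standalone sketch, assuming the layout used in the hunk above (address is the register value shifted right by 8, size is derived from the inverted mask register); the BarWindow type and decode_mmio_bar() are hypothetical names, not part of the patch.

```c
#include <stdint.h>

/* Hypothetical helper type, for illustration only. */
typedef struct {
    uint64_t base;
    uint64_t size;
} BarWindow;

/*
 * Decode a BAR/mask register pair the same way the patch does:
 * base = bar >> 8, size = (~mask >> 8) + 1.
 */
static BarWindow decode_mmio_bar(uint64_t bar_reg, uint64_t mask_reg)
{
    BarWindow w;

    w.base = bar_reg >> 8;
    w.size = (~mask_reg >> 8) + 1;
    return w;
}
```

For example, a mask register of 0xffffffffff000000 decodes to a 0x10000 byte (64 KiB) window under this scheme.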
> +static void pnv_pec_stk_nest_xscom_write(void *opaque, hwaddr addr,
> +                                         uint64_t val, unsigned size)
> +{
> +    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
> +    PnvPhb4PecState *pec = stack->pec;
> +    uint32_t reg = addr >> 3;
> +
> +    switch (reg) {
> +    case PEC_NEST_STK_PCI_NEST_FIR:
> +        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] = val;
> +        break;
> +    case PEC_NEST_STK_PCI_NEST_FIR_CLR:
> +        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] &= val;
> +        break;
> +    case PEC_NEST_STK_PCI_NEST_FIR_SET:
> +        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] |= val;
> +        break;
> +    case PEC_NEST_STK_PCI_NEST_FIR_MSK:
> +        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] = val;
> +        break;
> +    case PEC_NEST_STK_PCI_NEST_FIR_MSKC:
> +        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] &= val;
> +        break;
> +    case PEC_NEST_STK_PCI_NEST_FIR_MSKS:
> +        stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] |= val;
> +        break;
> +    case PEC_NEST_STK_PCI_NEST_FIR_ACT0:
> +    case PEC_NEST_STK_PCI_NEST_FIR_ACT1:
> +        stack->nest_regs[reg] = val;
> +        break;
> +    case PEC_NEST_STK_PCI_NEST_FIR_WOF:
> +        stack->nest_regs[reg] = 0;
> +        break;
> +    case PEC_NEST_STK_ERR_REPORT_0:
> +    case PEC_NEST_STK_ERR_REPORT_1:
> +    case PEC_NEST_STK_PBCQ_GNRL_STATUS:
> +        /* Flag error ? */
> +        break;
> +    case PEC_NEST_STK_PBCQ_MODE:
> +        stack->nest_regs[reg] = val & 0xff00000000000000ull;
> +        break;
> +    case PEC_NEST_STK_MMIO_BAR0:
> +    case PEC_NEST_STK_MMIO_BAR0_MASK:
> +    case PEC_NEST_STK_MMIO_BAR1:
> +    case PEC_NEST_STK_MMIO_BAR1_MASK:
> +        if (stack->nest_regs[PEC_NEST_STK_BAR_EN] &
> +            (PEC_NEST_STK_BAR_EN_MMIO0 |
> +             PEC_NEST_STK_BAR_EN_MMIO1)) {
> +            phb_pec_error(pec, "Changing enabled BAR unsupported");
> +        }
> +        stack->nest_regs[reg] = val & 0xffffffffff000000ull;
> +        break;
> +    case PEC_NEST_STK_PHB_REGS_BAR:
> +        if (stack->nest_regs[PEC_NEST_STK_BAR_EN] & PEC_NEST_STK_BAR_EN_PHB) {
> +            phb_pec_error(pec, "Changing enabled BAR unsupported");
> +        }
> +        stack->nest_regs[reg] = val & 0xffffffffffc00000ull;
> +        break;
> +    case PEC_NEST_STK_INT_BAR:
> +        if (stack->nest_regs[PEC_NEST_STK_BAR_EN] & PEC_NEST_STK_BAR_EN_INT) {
> +            phb_pec_error(pec, "Changing enabled BAR unsupported");
> +        }
> +        stack->nest_regs[reg] = val & 0xfffffff000000000ull;
> +        break;
> +    case PEC_NEST_STK_BAR_EN:
> +        stack->nest_regs[reg] = val & 0xf000000000000000ull;
> +        pnv_pec_stk_update_map(stack);
> +        break;
> +    case PEC_NEST_STK_DATA_FRZ_TYPE:
> +    case PEC_NEST_STK_PBCQ_TUN_BAR:
> +        /* Not used for now */
> +        stack->nest_regs[reg] = val;
> +        break;
> +    default:
> +        qemu_log_mask(LOG_UNIMP, "phb4_pec: nest_xscom_write 0x%"HWADDR_PRIx
> +                      "=%"PRIx64"\n", addr, val);
> +    }
> +}
> +
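The FIR handling above uses three aliases of the same register: a plain write, an AND alias (the "CLR" offset) and an OR alias (the "SET" offset). A minimal sketch of that behaviour as modelled in this patch; the enum and fir_update() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical operation selector, for illustration only. */
enum { FIR_WRITE, FIR_AND, FIR_OR };

/*
 * Apply one write to a FIR-style register, mirroring the switch
 * cases in pnv_pec_stk_nest_xscom_write().
 */
static uint64_t fir_update(uint64_t fir, int op, uint64_t val)
{
    switch (op) {
    case FIR_WRITE:
        return val;         /* direct write replaces the value */
    case FIR_AND:
        return fir & val;   /* "CLR" alias: keep only bits set in val */
    case FIR_OR:
        return fir | val;   /* "SET" alias: raise the bits set in val */
    default:
        return fir;
    }
}
```

Note that with this model, clearing a bit through the AND alias means writing the complement of that bit, not the bit itself.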
> +static const MemoryRegionOps pnv_pec_stk_nest_xscom_ops = {
> +    .read = pnv_pec_stk_nest_xscom_read,
> +    .write = pnv_pec_stk_nest_xscom_write,
> +    .valid.min_access_size = 8,
> +    .valid.max_access_size = 8,
> +    .impl.min_access_size = 8,
> +    .impl.max_access_size = 8,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +};
> +
> +static uint64_t pnv_pec_stk_pci_xscom_read(void *opaque, hwaddr addr,
> +                                           unsigned size)
> +{
> +    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
> +    uint32_t reg = addr >> 3;
> +
> +    /* TODO: add list of allowed registers and error out if not */
> +    return stack->pci_regs[reg];
> +}
> +
> +static void pnv_pec_stk_pci_xscom_write(void *opaque, hwaddr addr,
> +                                        uint64_t val, unsigned size)
> +{
> +    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
> +    uint32_t reg = addr >> 3;
> +
> +    switch (reg) {
> +    case PEC_PCI_STK_PCI_FIR:
> +        stack->pci_regs[reg] = val;
> +        break;
> +    case PEC_PCI_STK_PCI_FIR_CLR:
> +        stack->pci_regs[PEC_PCI_STK_PCI_FIR] &= val;
> +        break;
> +    case PEC_PCI_STK_PCI_FIR_SET:
> +        stack->pci_regs[PEC_PCI_STK_PCI_FIR] |= val;
> +        break;
> +    case PEC_PCI_STK_PCI_FIR_MSK:
> +        stack->pci_regs[reg] = val;
> +        break;
> +    case PEC_PCI_STK_PCI_FIR_MSKC:
> +        stack->pci_regs[PEC_PCI_STK_PCI_FIR_MSK] &= val;
> +        break;
> +    case PEC_PCI_STK_PCI_FIR_MSKS:
> +        stack->pci_regs[PEC_PCI_STK_PCI_FIR_MSK] |= val;
> +        break;
> +    case PEC_PCI_STK_PCI_FIR_ACT0:
> +    case PEC_PCI_STK_PCI_FIR_ACT1:
> +        stack->pci_regs[reg] = val;
> +        break;
> +    case PEC_PCI_STK_PCI_FIR_WOF:
> +        stack->pci_regs[reg] = 0;
> +        break;
> +    case PEC_PCI_STK_ETU_RESET:
> +        stack->pci_regs[reg] = val & 0x8000000000000000ull;
> +        /* TODO: Implement reset */
> +        break;
> +    case PEC_PCI_STK_PBAIB_ERR_REPORT:
> +        break;
> +    case PEC_PCI_STK_PBAIB_TX_CMD_CRED:
> +    case PEC_PCI_STK_PBAIB_TX_DAT_CRED:
> +        stack->pci_regs[reg] = val;
> +        break;
> +    default:
> +        qemu_log_mask(LOG_UNIMP, "phb4_pec_stk: pci_xscom_write 0x%"HWADDR_PRIx
> +                      "=%"PRIx64"\n", addr, val);
> +    }
> +}
> +
> +static const MemoryRegionOps pnv_pec_stk_pci_xscom_ops = {
> +    .read = pnv_pec_stk_pci_xscom_read,
> +    .write = pnv_pec_stk_pci_xscom_write,
> +    .valid.min_access_size = 8,
> +    .valid.max_access_size = 8,
> +    .impl.min_access_size = 8,
> +    .impl.max_access_size = 8,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +};
> +
> +static void pnv_pec_instance_init(Object *obj)
> +{
> +    PnvPhb4PecState *pec = PNV_PHB4_PEC(obj);
> +    int i;
> +
> +    for (i = 0; i < PHB4_PEC_MAX_STACKS; i++) {
> +        object_initialize_child(obj, "stack[*]", &pec->stacks[i],
> +                                sizeof(pec->stacks[i]), TYPE_PNV_PHB4_PEC_STACK,
> +                                &error_abort, NULL);
> +    }
> +}
> +
> +static void pnv_pec_realize(DeviceState *dev, Error **errp)
> +{
> +    PnvPhb4PecState *pec = PNV_PHB4_PEC(dev);
> +    Error *local_err = NULL;
> +    char name[64];
> +    int i;
> +
> +    assert(pec->system_memory);
> +
> +    /* Create stacks */
> +    for (i = 0; i < pec->num_stacks; i++) {
> +        PnvPhb4PecStack *stack = &pec->stacks[i];
> +        Object *stk_obj = OBJECT(stack);
> +
> +        object_property_set_int(stk_obj, i, "stack-no", &error_abort);
> +        object_property_set_link(stk_obj, OBJECT(pec), "pec", &error_abort);
> +        object_property_set_bool(stk_obj, true, "realized", &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +
> +    /* Initialize the XSCOM regions for the PEC registers */
> +    snprintf(name, sizeof(name), "xscom-pec-%d.%d-nest", pec->chip_id,
> +             pec->index);
> +    pnv_xscom_region_init(&pec->nest_regs_mr, OBJECT(dev),
> +                          &pnv_pec_nest_xscom_ops, pec, name,
> +                          PHB4_PEC_NEST_REGS_COUNT);
> +
> +    snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci", pec->chip_id,
> +             pec->index);
> +    pnv_xscom_region_init(&pec->pci_regs_mr, OBJECT(dev),
> +                          &pnv_pec_pci_xscom_ops, pec, name,
> +                          PHB4_PEC_PCI_REGS_COUNT);
> +}
> +
> +static int pnv_pec_dt_xscom(PnvXScomInterface *dev, void *fdt,
> +                            int xscom_offset)
> +{
> +    PnvPhb4PecState *pec = PNV_PHB4_PEC(dev);
> +    PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(dev);
> +    uint32_t nbase = pecc->xscom_nest_base(pec);
> +    uint32_t pbase = pecc->xscom_pci_base(pec);
> +    int offset, i;
> +    char *name;
> +    uint32_t reg[] = {
> +        cpu_to_be32(nbase),
> +        cpu_to_be32(pecc->xscom_nest_size),
> +        cpu_to_be32(pbase),
> +        cpu_to_be32(pecc->xscom_pci_size),
> +    };
> +
> +    name = g_strdup_printf("pbcq@%x", nbase);
> +    offset = fdt_add_subnode(fdt, xscom_offset, name);
> +    _FDT(offset);
> +    g_free(name);
> +
> +    _FDT((fdt_setprop(fdt, offset, "reg", reg, sizeof(reg))));
> +
> +    _FDT((fdt_setprop_cell(fdt, offset, "ibm,pec-index", pec->index)));
> +    _FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 1)));
> +    _FDT((fdt_setprop_cell(fdt, offset, "#size-cells", 0)));
> +    _FDT((fdt_setprop(fdt, offset, "compatible", pecc->compat,
> +                      pecc->compat_size)));
> +
> +    for (i = 0; i < pec->num_stacks; i++) {
> +        PnvPhb4PecStack *stack = &pec->stacks[i];
> +        PnvPHB4 *phb = &stack->phb;
> +        int stk_offset;
> +
> +        name = g_strdup_printf("stack@%x", i);
> +        stk_offset = fdt_add_subnode(fdt, offset, name);
> +        _FDT(stk_offset);
> +        g_free(name);
> +        _FDT((fdt_setprop(fdt, stk_offset, "compatible", pecc->stk_compat,
> +                          pecc->stk_compat_size)));
> +        _FDT((fdt_setprop_cell(fdt, stk_offset, "reg", i)));
> +        _FDT((fdt_setprop_cell(fdt, stk_offset, "ibm,phb-index", phb->phb_id)));
> +    }
> +
> +    return 0;
> +}
> +
> +static Property pnv_pec_properties[] = {
> +        DEFINE_PROP_UINT32("index", PnvPhb4PecState, index, 0),
> +        DEFINE_PROP_UINT32("num-stacks", PnvPhb4PecState, num_stacks, 0),
> +        DEFINE_PROP_UINT32("chip-id", PnvPhb4PecState, chip_id, 0),
> +        DEFINE_PROP_LINK("system-memory", PnvPhb4PecState, system_memory,
> +                     TYPE_MEMORY_REGION, MemoryRegion *),
> +        DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static uint32_t pnv_pec_xscom_pci_base(PnvPhb4PecState *pec)
> +{
> +    return PNV9_XSCOM_PEC_PCI_BASE + 0x1000000 * pec->index;
> +}
> +
> +static uint32_t pnv_pec_xscom_nest_base(PnvPhb4PecState *pec)
> +{
> +    return PNV9_XSCOM_PEC_NEST_BASE + 0x400 * pec->index;
> +}
> +
> +static void pnv_pec_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PnvXScomInterfaceClass *xdc = PNV_XSCOM_INTERFACE_CLASS(klass);
> +    PnvPhb4PecClass *pecc = PNV_PHB4_PEC_CLASS(klass);
> +    static const char compat[] = "ibm,power9-pbcq";
> +    static const char stk_compat[] = "ibm,power9-phb-stack";
> +
> +    xdc->dt_xscom = pnv_pec_dt_xscom;
> +
> +    dc->realize = pnv_pec_realize;
> +    dc->props = pnv_pec_properties;
> +
> +    pecc->xscom_nest_base = pnv_pec_xscom_nest_base;
> +    pecc->xscom_pci_base  = pnv_pec_xscom_pci_base;
> +    pecc->xscom_nest_size = PNV9_XSCOM_PEC_NEST_SIZE;
> +    pecc->xscom_pci_size  = PNV9_XSCOM_PEC_PCI_SIZE;
> +    pecc->compat = compat;
> +    pecc->compat_size = sizeof(compat);
> +    pecc->stk_compat = stk_compat;
> +    pecc->stk_compat_size = sizeof(stk_compat);
> +}
> +
> +static const TypeInfo pnv_pec_type_info = {
> +    .name          = TYPE_PNV_PHB4_PEC,
> +    .parent        = TYPE_DEVICE,
> +    .instance_size = sizeof(PnvPhb4PecState),
> +    .instance_init = pnv_pec_instance_init,
> +    .class_init    = pnv_pec_class_init,
> +    .class_size    = sizeof(PnvPhb4PecClass),
> +    .interfaces    = (InterfaceInfo[]) {
> +        { TYPE_PNV_XSCOM_INTERFACE },
> +        { }
> +    }
> +};
> +
> +static void pnv_pec_stk_instance_init(Object *obj)
> +{
> +    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(obj);
> +
> +    object_initialize_child(obj, "phb", &stack->phb, sizeof(stack->phb),
> +                            TYPE_PNV_PHB4, &error_abort, NULL);
> +}
> +
> +static void pnv_pec_stk_realize(DeviceState *dev, Error **errp)
> +{
> +    PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(dev);
> +    PnvPhb4PecState *pec = stack->pec;
> +    char name[64];
> +
> +    assert(pec);
> +
> +    /* Initialize the XSCOM regions for the stack registers */
> +    snprintf(name, sizeof(name), "xscom-pec-%d.%d-nest-stack-%d",
> +             pec->chip_id, pec->index, stack->stack_no);
> +    pnv_xscom_region_init(&stack->nest_regs_mr, OBJECT(stack),
> +                          &pnv_pec_stk_nest_xscom_ops, stack, name,
> +                          PHB4_PEC_NEST_STK_REGS_COUNT);
> +
> +    snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci-stack-%d",
> +             pec->chip_id, pec->index, stack->stack_no);
> +    pnv_xscom_region_init(&stack->pci_regs_mr, OBJECT(stack),
> +                          &pnv_pec_stk_pci_xscom_ops, stack, name,
> +                          PHB4_PEC_PCI_STK_REGS_COUNT);
> +
> +    /* PHB pass-through */
> +    snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci-stack-%d-phb",
> +             pec->chip_id, pec->index, stack->stack_no);
> +    pnv_xscom_region_init(&stack->phb_regs_mr, OBJECT(&stack->phb),
> +                          &pnv_phb4_xscom_ops, &stack->phb, name, 0x40);
> +
> +    /*
> +     * Let the machine/chip realize the PHB object so that it can
> +     * more easily customize some fields
> +     */
> +}
> +
> +static Property pnv_pec_stk_properties[] = {
> +        DEFINE_PROP_UINT32("stack-no", PnvPhb4PecStack, stack_no, 0),
> +        DEFINE_PROP_LINK("pec", PnvPhb4PecStack, pec, TYPE_PNV_PHB4_PEC,
> +                         PnvPhb4PecState *),
> +        DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void pnv_pec_stk_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->props = pnv_pec_stk_properties;
> +    dc->realize = pnv_pec_stk_realize;
> +
> +    /* TODO: reset regs ? */
> +}
> +
> +static const TypeInfo pnv_pec_stk_type_info = {
> +    .name          = TYPE_PNV_PHB4_PEC_STACK,
> +    .parent        = TYPE_DEVICE,
> +    .instance_size = sizeof(PnvPhb4PecStack),
> +    .instance_init = pnv_pec_stk_instance_init,
> +    .class_init    = pnv_pec_stk_class_init,
> +    .interfaces    = (InterfaceInfo[]) {
> +        { TYPE_PNV_XSCOM_INTERFACE },
> +        { }
> +    }
> +};
> +
> +static void pnv_pec_register_types(void)
> +{
> +    type_register_static(&pnv_pec_type_info);
> +    type_register_static(&pnv_pec_stk_type_info);
> +}
> +
> +type_init(pnv_pec_register_types);
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index a4b073c2c529..44c74be81b66 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -40,6 +40,7 @@
>  #include "hw/intc/intc.h"
>  #include "hw/ipmi/ipmi.h"
>  #include "target/ppc/mmu-hash64.h"
> +#include "hw/pci/msi.h"
>  
>  #include "hw/ppc/xics.h"
>  #include "hw/qdev-properties.h"
> @@ -622,9 +623,17 @@ static void pnv_chip_power8_pic_print_info(PnvChip *chip, Monitor *mon)
>  static void pnv_chip_power9_pic_print_info(PnvChip *chip, Monitor *mon)
>  {
>      Pnv9Chip *chip9 = PNV9_CHIP(chip);
> +    int i, j;
>  
>      pnv_xive_pic_print_info(&chip9->xive, mon);
>      pnv_psi_pic_print_info(&chip9->psi, mon);
> +
> +    for (i = 0; i < PNV9_CHIP_MAX_PEC; i++) {
> +        PnvPhb4PecState *pec = &chip9->pecs[i];
> +        for (j = 0; j < pec->num_stacks; j++) {
> +            pnv_phb4_pic_print_info(&pec->stacks[j].phb, mon);
> +        }
> +    }
>  }
>  
>  static uint64_t pnv_chip_power8_xscom_core_base(PnvChip *chip,
> @@ -753,6 +762,9 @@ static void pnv_init(MachineState *machine)
>          }
>      }
>  
> +    /* MSIs are supported on this platform */
> +    msi_nonbroken = true;
> +
>      /*
>       * Check compatibility of the specified CPU with the machine
>       * default.
> @@ -1235,7 +1247,10 @@ static void pnv_chip_power8nvl_class_init(ObjectClass *klass, void *data)
>  
>  static void pnv_chip_power9_instance_init(Object *obj)
>  {
> +    PnvChip *chip = PNV_CHIP(obj);
>      Pnv9Chip *chip9 = PNV9_CHIP(obj);
> +    PnvChipClass *pcc = PNV_CHIP_GET_CLASS(obj);
> +    int i;
>  
>      object_initialize_child(obj, "xive", &chip9->xive, sizeof(chip9->xive),
>                              TYPE_PNV_XIVE, &error_abort, NULL);
> @@ -1253,6 +1268,17 @@ static void pnv_chip_power9_instance_init(Object *obj)
>  
>      object_initialize_child(obj, "homer",  &chip9->homer, sizeof(chip9->homer),
>                              TYPE_PNV9_HOMER, &error_abort, NULL);
> +
> +    for (i = 0; i < PNV9_CHIP_MAX_PEC; i++) {
> +        object_initialize_child(obj, "pec[*]", &chip9->pecs[i],
> +                                sizeof(chip9->pecs[i]), TYPE_PNV_PHB4_PEC,
> +                                &error_abort, NULL);
> +    }
> +
> +    /*
> +     * Number of PHBs is the chip default
> +     */
> +    chip->num_phbs = pcc->num_phbs;
>  }
>  
>  static void pnv_chip_quad_realize(Pnv9Chip *chip9, Error **errp)
> @@ -1281,6 +1307,78 @@ static void pnv_chip_quad_realize(Pnv9Chip *chip9, Error **errp)
>      }
>  }
>  
> +static void pnv_chip_power9_phb_realize(PnvChip *chip, Error **errp)
> +{
> +    Pnv9Chip *chip9 = PNV9_CHIP(chip);
> +    Error *local_err = NULL;
> +    int i, j;
> +    int phb_id = 0;
> +
> +    for (i = 0; i < PNV9_CHIP_MAX_PEC; i++) {
> +        PnvPhb4PecState *pec = &chip9->pecs[i];
> +        PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
> +        uint32_t pec_nest_base;
> +        uint32_t pec_pci_base;
> +
> +        object_property_set_int(OBJECT(pec), i, "index", &error_fatal);
> +        /*
> +         * PEC0 -> 1 stack
> +         * PEC1 -> 2 stacks
> +         * PEC2 -> 3 stacks
> +         */
> +        object_property_set_int(OBJECT(pec), i + 1, "num-stacks",
> +                                &error_fatal);
> +        object_property_set_int(OBJECT(pec), chip->chip_id, "chip-id",
> +                                 &error_fatal);
> +        object_property_set_link(OBJECT(pec), OBJECT(get_system_memory()),
> +                                 "system-memory", &error_abort);
> +        object_property_set_bool(OBJECT(pec), true, "realized", &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +
> +        pec_nest_base = pecc->xscom_nest_base(pec);
> +        pec_pci_base = pecc->xscom_pci_base(pec);
> +
> +        pnv_xscom_add_subregion(chip, pec_nest_base, &pec->nest_regs_mr);
> +        pnv_xscom_add_subregion(chip, pec_pci_base, &pec->pci_regs_mr);
> +
> +        for (j = 0; j < pec->num_stacks && phb_id < chip->num_phbs;
> +             j++, phb_id++) {
> +            PnvPhb4PecStack *stack = &pec->stacks[j];
> +            Object *obj = OBJECT(&stack->phb);
> +
> +            object_property_set_int(obj, phb_id, "index", &error_fatal);
> +            object_property_set_int(obj, chip->chip_id, "chip-id",
> +                                    &error_fatal);
> +            object_property_set_int(obj, PNV_PHB4_VERSION, "version",
> +                                    &error_fatal);
> +            object_property_set_int(obj, PNV_PHB4_DEVICE_ID, "device-id",
> +                                    &error_fatal);
> +            object_property_set_link(obj, OBJECT(stack), "stack", &error_abort);
> +            object_property_set_bool(obj, true, "realized", &local_err);
> +            if (local_err) {
> +                error_propagate(errp, local_err);
> +                return;
> +            }
> +            qdev_set_parent_bus(DEVICE(obj), sysbus_get_default());
> +
> +            /* Populate the XSCOM address space. */
> +            pnv_xscom_add_subregion(chip,
> +                                   pec_nest_base + 0x40 * (stack->stack_no + 1),
> +                                   &stack->nest_regs_mr);
> +            pnv_xscom_add_subregion(chip,
> +                                    pec_pci_base + 0x40 * (stack->stack_no + 1),
> +                                    &stack->pci_regs_mr);
> +            pnv_xscom_add_subregion(chip,
> +                                    pec_pci_base + PNV9_XSCOM_PEC_PCI_STK0 +
> +                                    0x40 * stack->stack_no,
> +                                    &stack->phb_regs_mr);
> +        }
> +    }
> +}
> +
>  static void pnv_chip_power9_realize(DeviceState *dev, Error **errp)
>  {
>      PnvChipClass *pcc = PNV_CHIP_GET_CLASS(dev);
> @@ -1383,6 +1481,13 @@ static void pnv_chip_power9_realize(DeviceState *dev, Error **errp)
>      /* Homer mmio region */
>      memory_region_add_subregion(get_system_memory(), PNV9_HOMER_BASE(chip),
>                                  &chip9->homer.regs);
> +
> +    /* PHBs */
> +    pnv_chip_power9_phb_realize(chip, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
>  }
>  
>  static uint32_t pnv_chip_power9_xscom_pcba(PnvChip *chip, uint64_t addr)
> @@ -1409,6 +1514,7 @@ static void pnv_chip_power9_class_init(ObjectClass *klass, void *data)
>      k->xscom_core_base = pnv_chip_power9_xscom_core_base;
>      k->xscom_pcba = pnv_chip_power9_xscom_pcba;
>      dc->desc = "PowerNV Chip POWER9";
> +    k->num_phbs = 6;
>  
>      device_class_set_parent_realize(dc, pnv_chip_power9_realize,
>                                      &k->parent_realize);
> @@ -1613,6 +1719,7 @@ static Property pnv_chip_properties[] = {
>      DEFINE_PROP_UINT32("nr-cores", PnvChip, nr_cores, 1),
>      DEFINE_PROP_UINT64("cores-mask", PnvChip, cores_mask, 0x0),
>      DEFINE_PROP_UINT32("nr-threads", PnvChip, nr_threads, 1),
> +    DEFINE_PROP_UINT32("num-phbs", PnvChip, num_phbs, 0),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/pci-host/Makefile.objs b/hw/pci-host/Makefile.objs
> index 9c466fab0101..8a296e2f93b2 100644
> --- a/hw/pci-host/Makefile.objs
> +++ b/hw/pci-host/Makefile.objs
> @@ -20,3 +20,4 @@ common-obj-$(CONFIG_PCI_EXPRESS_GENERIC_BRIDGE) += gpex.o
>  common-obj-$(CONFIG_PCI_EXPRESS_XILINX) += xilinx-pcie.o
>  
>  common-obj-$(CONFIG_PCI_EXPRESS_DESIGNWARE) += designware.o
> +obj-$(CONFIG_POWERNV) += pnv_phb4.o pnv_phb4_pec.o
> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
> index e27efe9a2459..354828bf132f 100644
> --- a/hw/ppc/Kconfig
> +++ b/hw/ppc/Kconfig
> @@ -135,6 +135,8 @@ config XIVE_SPAPR
>      default y
>      depends on PSERIES
>      select XIVE
> +    select PCI
> +    select PCIE_PORT
>  
>  config XIVE_KVM
>      bool
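
A note on the realize loop quoted above: PEC i gets i + 1 stacks, so the three PECs provide 1 + 2 + 3 = 6 stacks in total, which matches k->num_phbs = 6. The sketch below shows how a global PHB index maps back to a (PEC, stack) pair under that layout. It is only an illustration: phb_index_to_stack() and MAX_PEC are hypothetical names, not part of the patch, and MAX_PEC = 3 is assumed from the PEC0..PEC2 comment.

```c
#include <stdint.h>

/* Assumption: three PECs per chip, as the PEC0..PEC2 comment suggests. */
#define MAX_PEC 3

/*
 * Hypothetical helper, not part of the patch: translate a global PHB
 * index into its PEC number and stack number, following the same
 * numbering as the nested loop in pnv_chip_power9_phb_realize().
 * Returns 0 on success, -1 if the index is out of range.
 */
static int phb_index_to_stack(unsigned phb_id, unsigned *pec, unsigned *stack)
{
    unsigned base = 0;

    for (unsigned i = 0; i < MAX_PEC; i++) {
        unsigned num_stacks = i + 1;   /* PEC0 -> 1, PEC1 -> 2, PEC2 -> 3 */

        if (phb_id < base + num_stacks) {
            *pec = i;
            *stack = phb_id - base;
            return 0;
        }
        base += num_stacks;
    }
    return -1;
}
```

With this numbering, PHB 0 sits on PEC0, PHBs 1-2 on PEC1 and PHBs 3-5 on PEC2.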

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH 1/2] ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge
  2020-01-29  3:09   ` David Gibson
@ 2020-01-29  3:54     ` Oliver O'Halloran
  2020-01-29  6:16       ` David Gibson
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver O'Halloran @ 2020-01-29  3:54 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, Cédric Le Goater, Nicholas Piggin, qemu-devel

On Wed, Jan 29, 2020 at 2:09 PM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
> On Mon, Jan 27, 2020 at 03:45:05PM +0100, Cédric Le Goater wrote:
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >

*snip*

> > +
> > +/*
> > + * The CONFIG_DATA register expects little endian accesses, but as the
> > + * region is big endian, we have to swap the value.
> > + */
> > +static void pnv_phb4_config_write(PnvPHB4 *phb, unsigned off,
> > +                                  unsigned size, uint64_t val)
> > +{
> > +    uint32_t cfg_addr, limit;
> > +    PCIDevice *pdev;
> > +
> > +    pdev = pnv_phb4_find_cfg_dev(phb);
> > +    if (!pdev) {
> > +        return;
> > +    }
> > +    cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
> > +    cfg_addr |= off;
> > +    limit = pci_config_size(pdev);
> > +    if (limit <= cfg_addr) {
> > +        /*
> > +         * conventional pci device can be behind pcie-to-pci bridge.
> > +         * 256 <= addr < 4K has no effects.
> > +         */
> > +        return;
> > +    }
> > +    switch (size) {
> > +    case 1:
> > +        break;
> > +    case 2:
> > +        val = bswap16(val);
>
> I'm a little confused by these byteswaps.  As I see below the device
> is set to big endian, so the values passed in here should already be
> in host-native endian.  Why do you need the swap?  Are some of the
> registers in the bank BE and some LE?

All the registers are BE except for CONFIG_DATA, which isn't actually
a register. It's really a window into the config space of the device
specified in CONFIG_ADDR so it doesn't do any byte-swapping.
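
To make that concrete, here is a small standalone sketch (not QEMU code) of why the swap is needed. It assumes the access layer decodes every access to the region as big endian, so a little-endian config value arrives byte-reversed and has to be swapped back. load_be() and cfg_data_swap() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decode 'size' bytes as a big-endian value, as a BE region would. */
static uint64_t load_be(const uint8_t *bytes, unsigned size)
{
    uint64_t v = 0;

    for (unsigned i = 0; i < size; i++) {
        v = (v << 8) | bytes[i];
    }
    return v;
}

/*
 * Hypothetical helper: undo the big-endian decode for a little-endian
 * window. Single-byte accesses need no swap.
 */
static uint64_t cfg_data_swap(uint64_t val, unsigned size)
{
    switch (size) {
    case 1:
        return val & 0xff;
    case 2:
        return ((val & 0xff) << 8) | ((val >> 8) & 0xff);
    case 4:
        return ((val & 0x000000ffu) << 24) | ((val & 0x0000ff00u) << 8) |
               ((val & 0x00ff0000u) >> 8)  | ((val & 0xff000000u) >> 24);
    default:
        assert(0 && "unsupported access size");
        return 0;
    }
}
```

For example, the little-endian byte sequence 0x86 0x80 decodes as 0x8680 through load_be(), and cfg_data_swap() turns it back into 0x8086.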

> > +        break;
> > +    case 4:
> > +        val = bswap32(val);
> > +        break;
> > +    default:
> > +        g_assert_not_reached();
> > +    }
> > +    pci_host_config_write_common(pdev, cfg_addr, limit, val, size);
> > +}



* Re: [PATCH 1/2] ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge
  2020-01-29  3:54     ` Oliver O'Halloran
@ 2020-01-29  6:16       ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2020-01-29  6:16 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: qemu-ppc, Cédric Le Goater, Nicholas Piggin, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2327 bytes --]

On Wed, Jan 29, 2020 at 02:54:19PM +1100, Oliver O'Halloran wrote:
> On Wed, Jan 29, 2020 at 2:09 PM David Gibson
> <david@gibson.dropbear.id.au> wrote:
> >
> > On Mon, Jan 27, 2020 at 03:45:05PM +0100, Cédric Le Goater wrote:
> > > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > >
> 
> *snip*
> 
> > > +
> > > +/*
> > > + * The CONFIG_DATA register expects little endian accesses, but as the
> > > + * region is big endian, we have to swap the value.
> > > + */
> > > +static void pnv_phb4_config_write(PnvPHB4 *phb, unsigned off,
> > > +                                  unsigned size, uint64_t val)
> > > +{
> > > +    uint32_t cfg_addr, limit;
> > > +    PCIDevice *pdev;
> > > +
> > > +    pdev = pnv_phb4_find_cfg_dev(phb);
> > > +    if (!pdev) {
> > > +        return;
> > > +    }
> > > +    cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
> > > +    cfg_addr |= off;
> > > +    limit = pci_config_size(pdev);
> > > +    if (limit <= cfg_addr) {
> > > +        /*
> > > +         * conventional pci device can be behind pcie-to-pci bridge.
> > > +         * 256 <= addr < 4K has no effects.
> > > +         */
> > > +        return;
> > > +    }
> > > +    switch (size) {
> > > +    case 1:
> > > +        break;
> > > +    case 2:
> > > +        val = bswap16(val);
> >
> > I'm a little confused by these byteswaps.  As I see below the device
> > is set to big endian, so the values passed in here should already be
> > in host-native endian.  Why do you need the swap?  Are some of the
> > registers in the bank BE and some LE?
> 
> All the registers are BE except for CONFIG_DATA, which isn't actually
> a register. It's really a window into the config space of the device
> specified in CONFIG_ADDR so it doesn't do any byte-swapping.

Ah, right, that makes sense.

> 
> > > +        break;
> > > +    case 4:
> > > +        val = bswap32(val);
> > > +        break;
> > > +    default:
> > > +        g_assert_not_reached();
> > > +    }
> > > +    pci_host_config_write_common(pdev, cfg_addr, limit, val, size);
> > > +}
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges
  2020-01-27 14:45 [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges Cédric Le Goater
  2020-01-27 14:45 ` [PATCH 1/2] ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge Cédric Le Goater
  2020-01-27 14:45 ` [PATCH 2/2] ppc/pnv: Add models for POWER8 PHB3 " Cédric Le Goater
@ 2020-01-29  6:31 ` David Gibson
  2020-01-29 13:15   ` Cédric Le Goater
  2 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2020-01-29  6:31 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, Oliver O'Halloran, qemu-devel, Nicholas Piggin

[-- Attachment #1: Type: text/plain, Size: 3459 bytes --]

On Mon, Jan 27, 2020 at 03:45:04PM +0100, Cédric Le Goater wrote:
> Hello,
> 
> These are models for the PCIe Host Bridges, PHB3 and PHB4, as found on
> POWER8 and POWER9 processors. It includes the PowerBus logic interface
> (PBCQ), IOMMU support, a single PCIe Gen.3/4 Root Complex, and support
> for MSI and LSI interrupt sources as found on each system depending on
> the interrupt controller: XICS or XIVE.
> 
> No default device layout is provided and PCI devices can be added on
> any of the available PCIe Root Ports (pcie.0 .. 2) with address 0x0, as
> the firmware (skiboot) only accepts a single device per root port. To
> run a simple system with a network adapter and a storage adapter, use
> command line options such as:
> 
>   -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
>   -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0
> 
>   -device megasas,id=scsi0,bus=pcie.1,addr=0x0
>   -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
>   -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
> 
> If more are needed, include a bridge.
> 
> Multi chip is supported, each chip adding its set of PHB controllers
> and its PCI busses. The model doesn't emulate the EEH error handling
> and cold plugging PHB devices still needs some work.
> 
> XICS requires some adjustment to support the PHB3 MSI. The changes are
> provided in the PHB3 model but they could be decoupled in prereq
> patches.

Applied to ppc-for-5.0, thanks.

> 
> Thanks,
> 
> C.
> 
> Benjamin Herrenschmidt (1):
>   ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge
> 
> Cédric Le Goater (1):
>   ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge
> 
>  include/hw/pci-host/pnv_phb3.h      |  164 +++
>  include/hw/pci-host/pnv_phb3_regs.h |  450 +++++++++
>  include/hw/pci-host/pnv_phb4.h      |  230 +++++
>  include/hw/pci-host/pnv_phb4_regs.h |  553 ++++++++++
>  include/hw/pci/pcie_port.h          |    1 +
>  include/hw/ppc/pnv.h                |   11 +
>  include/hw/ppc/pnv_xscom.h          |   20 +
>  include/hw/ppc/xics.h               |    5 +
>  hw/intc/xics.c                      |   14 +-
>  hw/pci-host/pnv_phb3.c              | 1195 ++++++++++++++++++++++
>  hw/pci-host/pnv_phb3_msi.c          |  349 +++++++
>  hw/pci-host/pnv_phb3_pbcq.c         |  357 +++++++
>  hw/pci-host/pnv_phb4.c              | 1438 +++++++++++++++++++++++++++
>  hw/pci-host/pnv_phb4_pec.c          |  593 +++++++++++
>  hw/ppc/pnv.c                        |  176 +++-
>  hw/pci-host/Makefile.objs           |    2 +
>  hw/ppc/Kconfig                      |    2 +
>  17 files changed, 5557 insertions(+), 3 deletions(-)
>  create mode 100644 include/hw/pci-host/pnv_phb3.h
>  create mode 100644 include/hw/pci-host/pnv_phb3_regs.h
>  create mode 100644 include/hw/pci-host/pnv_phb4.h
>  create mode 100644 include/hw/pci-host/pnv_phb4_regs.h
>  create mode 100644 hw/pci-host/pnv_phb3.c
>  create mode 100644 hw/pci-host/pnv_phb3_msi.c
>  create mode 100644 hw/pci-host/pnv_phb3_pbcq.c
>  create mode 100644 hw/pci-host/pnv_phb4.c
>  create mode 100644 hw/pci-host/pnv_phb4_pec.c
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges
  2020-01-29  6:31 ` [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges David Gibson
@ 2020-01-29 13:15   ` Cédric Le Goater
  2020-01-29 22:14     ` David Gibson
  0 siblings, 1 reply; 9+ messages in thread
From: Cédric Le Goater @ 2020-01-29 13:15 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, Oliver O'Halloran, qemu-devel, Nicholas Piggin

On 1/29/20 7:31 AM, David Gibson wrote:
> On Mon, Jan 27, 2020 at 03:45:04PM +0100, Cédric Le Goater wrote:
>> Hello,
>>
>> These are models for the PCIe Host Bridges, PHB3 and PHB4, as found on
>> POWER8 and POWER9 processors. It includes the PowerBus logic interface
>> (PBCQ), IOMMU support, a single PCIe Gen.3/4 Root Complex, and support
>> for MSI and LSI interrupt sources as found on each system depending on
>> the interrupt controller: XICS or XIVE.
>>
>> No default device layout is provided and PCI devices can be added on
>> any of the available PCIe Root Ports (pcie.0 .. 2) with address 0x0, as
>> the firmware (skiboot) only accepts a single device per root port. To
>> run a simple system with a network adapter and a storage adapter, use
>> command line options such as:
>>
>>   -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
>>   -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0
>>
>>   -device megasas,id=scsi0,bus=pcie.1,addr=0x0
>>   -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
>>   -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
>>
>> If more are needed, include a bridge.
>>
>> Multi chip is supported, each chip adding its set of PHB controllers
>> and its PCI busses. The model doesn't emulate the EEH error handling
>> and cold plugging PHB devices still needs some work.
>>
>> XICS requires some adjustment to support the PHB3 MSI. The changes are
>> provided in the PHB3 model but they could be decoupled in prereq
>> patches.
> 
> Applied to ppc-for-5.0, thanks.

Should we add a default set of devices on PHB1, like those found on
OpenPOWER systems? On a P8 we have:

 +-[0001:00]---00.0-[01-07]----00.0-[02-07]--+-01.0-[03-04]----00.0-[04]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
 |                                           +-02.0-[05]----00.0  Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller
 |                                           +-03.0-[06]--+-00.0  Broadcom Inc. and subsidiaries NetXtreme BCM5718 Gigabit Ethernet PCIe
 |                                           |            \-00.1  Broadcom Inc. and subsidiaries NetXtreme BCM5718 Gigabit Ethernet PCIe
 |                                           \-04.0-[07]----00.0  Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller


C.



* Re: [PATCH 0/2] ppc/pnv: Add models for PHB4 and PHB3 PCIe Host bridges
  2020-01-29 13:15   ` Cédric Le Goater
@ 2020-01-29 22:14     ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2020-01-29 22:14 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, Oliver O'Halloran, qemu-devel, Nicholas Piggin

[-- Attachment #1: Type: text/plain, Size: 2863 bytes --]

On Wed, Jan 29, 2020 at 02:15:35PM +0100, Cédric Le Goater wrote:
> On 1/29/20 7:31 AM, David Gibson wrote:
> > On Mon, Jan 27, 2020 at 03:45:04PM +0100, Cédric Le Goater wrote:
> >> Hello,
> >>
> >> These are models for the PCIe Host Bridges, PHB3 and PHB4, as found on
> >> POWER8 and POWER9 processors. It includes the PowerBus logic interface
> >> (PBCQ), IOMMU support, a single PCIe Gen.3/4 Root Complex, and support
> >> for MSI and LSI interrupt sources as found on each system depending on
> >> the interrupt controller: XICS or XIVE.
> >>
> >> No default device layout is provided and PCI devices can be added on
> >> any of the available PCIe Root Ports (pcie.0 .. 2) with address 0x0, as
> >> the firmware (skiboot) only accepts a single device per root port. To
> >> run a simple system with a network adapter and a storage adapter, use
> >> command line options such as:
> >>
> >>   -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
> >>   -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0
> >>
> >>   -device megasas,id=scsi0,bus=pcie.1,addr=0x0
> >>   -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
> >>   -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
> >>
> >> If more are needed, include a bridge.
> >>
> >> Multi chip is supported, each chip adding its set of PHB controllers
> >> and its PCI busses. The model doesn't emulate the EEH error handling
> >> and cold plugging PHB devices still needs some work.
> >>
> >> XICS requires some adjustment to support the PHB3 MSI. The changes are
> >> provided in the PHB3 model but they could be decoupled in prereq
> >> patches.
> > 
> > Applied to ppc-for-5.0, thanks.
> 
> Should we add a default set of devices on PHB1, like those found on
> OpenPOWER systems? On a P8 we have:

I think that's kind of up to you.

> 
>  +-[0001:00]---00.0-[01-07]----00.0-[02-07]--+-01.0-[03-04]----00.0-[04]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
>  |                                           +-02.0-[05]----00.0  Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller
>  |                                           +-03.0-[06]--+-00.0  Broadcom Inc. and subsidiaries NetXtreme BCM5718 Gigabit Ethernet PCIe
>  |                                           |            \-00.1  Broadcom Inc. and subsidiaries NetXtreme BCM5718 Gigabit Ethernet PCIe
>  |                                           \-04.0-[07]----00.0  Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller
> 
> 
> C.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

