All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries
@ 2016-10-12 23:57 David Gibson
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 1/7] libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() David Gibson
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: David Gibson @ 2016-10-12 23:57 UTC (permalink / raw)
  To: lvivier, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh, David Gibson

The current way we organize the IO windows into PCI space for the
pseries machine type has several problems.

  - It makes it difficult to create very large MMIO spaces which is
    necessary for certain PCI devices with very large BARs.  This
    problem has been known for a while.

  - More recently we discovered a more serious problem: it prevents
    more than 1TiB of RAM being added to a pseries guest.

  - It doesn't make very efficient use of address space.

Fixing this is complicated by keeping migration from old versionss
working and working out what things belong on which side of the
abstraction barrier between the machine type and the host bridge
device.

This series addresses all these problems.  Patches 1-3/7 perform
preliminary cleanups to the spapr specific PCI test code, which we'll
need to get the tests working with the changed implementation.  4-5/7
represent a minimal fix for the most serious problem (the 1 TiB limit)
- once polished, I'll consider submiting these for the stable branch.
6-7/7 complete a more comprehensive fix.

Changes since v2:
  * Removed window sizes from placement callback.  Having them in there
    led to a small behavioural change that wasn't intended
  * Adjusted / added some comments for clarity.
Changes since v1:
  * Removed a debugging printf()
Changes since RFC:
  * Rebase
  * Fixed some bugs
  * Fixed up PCI testcases which were broken by the change (due to
    test limitations)
  * Seriously contemplated, then rejected a completely different
    approach

*** BLURB HERE ***

David Gibson (7):
  libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()
  libqos: Correct error in PCI hole sizing for spapr
  libqos: Limit spapr-pci to 32-bit MMIO for now
  spapr_pci: Delegate placement of PCI host bridges to machine type
  spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM
  spapr_pci: Add a 64-bit MMIO window
  spapr: Improved placement of PCI host bridges in guest memory map

 hw/ppc/spapr.c              | 119 +++++++++++++++++++++++++++++++++++++++++++-
 hw/ppc/spapr_pci.c          |  90 +++++++++++++++++++++++----------
 include/hw/pci-host/spapr.h |  25 +++++-----
 include/hw/ppc/spapr.h      |   4 ++
 tests/endianness-test.c     |   3 +-
 tests/libqos/pci-spapr.c    | 116 +++++++++++++++++++++++-------------------
 tests/spapr-phb-test.c      |   2 +-
 7 files changed, 265 insertions(+), 94 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCHv3 1/7] libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()
  2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
@ 2016-10-12 23:57 ` David Gibson
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 2/7] libqos: Correct error in PCI hole sizing for spapr David Gibson
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-10-12 23:57 UTC (permalink / raw)
  To: lvivier, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh, David Gibson

The libqos code for accessing PCI on the spapr machine type uses IOBASE()
and MMIOBASE() macros to determine the address in the CPU memory map of
the windows to PCI address space.

This is a detail of the implementation of PCI in the machine type, it's not
specified by the PAPR standard.  Real guests would get the addresses of the
PCI windows from the device tree.

Finding the device tree in libqos would be awkward, but we can at least
localize this knowledge of the implementation to the init function, saving
it in the QPCIBusSPAPR structure for use by the accessors.

That leaves only one place to fix if we alter the location of the PCI
windows, as we're planning to do.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 tests/libqos/pci-spapr.c | 113 +++++++++++++++++++++++++++--------------------
 1 file changed, 64 insertions(+), 49 deletions(-)

diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c
index 2f73bad..1765a54 100644
--- a/tests/libqos/pci-spapr.c
+++ b/tests/libqos/pci-spapr.c
@@ -18,30 +18,23 @@
 
 /* From include/hw/pci-host/spapr.h */
 
-#define SPAPR_PCI_BASE_BUID          0x800000020000000ULL
-
-#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
-
-#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
-#define SPAPR_PCI_WINDOW_SPACING     0x1000000000ULL
-#define SPAPR_PCI_MMIO_WIN_OFF       0xA0000000
-#define SPAPR_PCI_MMIO_WIN_SIZE      (SPAPR_PCI_WINDOW_SPACING - \
-                                     SPAPR_PCI_MEM_WIN_BUS_OFFSET)
-#define SPAPR_PCI_IO_WIN_OFF         0x80000000
-#define SPAPR_PCI_IO_WIN_SIZE        0x10000
-
-/* index is the phb index */
-
-#define BUIDBASE(index)              (SPAPR_PCI_BASE_BUID + (index))
-#define PCIBASE(index)               (SPAPR_PCI_WINDOW_BASE + \
-                                      (index) * SPAPR_PCI_WINDOW_SPACING)
-#define IOBASE(index)                (PCIBASE(index) + SPAPR_PCI_IO_WIN_OFF)
-#define MMIOBASE(index)              (PCIBASE(index) + SPAPR_PCI_MMIO_WIN_OFF)
+typedef struct QPCIWindow {
+    uint64_t pci_base;    /* window address in PCI space */
+    uint64_t size;        /* window size */
+} QPCIWindow;
 
 typedef struct QPCIBusSPAPR {
     QPCIBus bus;
     QGuestAllocator *alloc;
 
+    uint64_t buid;
+
+    uint64_t pio_cpu_base;
+    QPCIWindow pio;
+
+    uint64_t mmio_cpu_base;
+    QPCIWindow mmio;
+
     uint64_t pci_hole_start;
     uint64_t pci_hole_size;
     uint64_t pci_hole_alloc;
@@ -59,69 +52,75 @@ typedef struct QPCIBusSPAPR {
 
 static uint8_t qpci_spapr_io_readb(QPCIBus *bus, void *addr)
 {
+    QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint64_t port = (uintptr_t)addr;
     uint8_t v;
-    if (port < SPAPR_PCI_IO_WIN_SIZE) {
-        v = readb(IOBASE(0) + port);
+    if (port < s->pio.size) {
+        v = readb(s->pio_cpu_base + port);
     } else {
-        v = readb(MMIOBASE(0) + port);
+        v = readb(s->mmio_cpu_base + port);
     }
     return v;
 }
 
 static uint16_t qpci_spapr_io_readw(QPCIBus *bus, void *addr)
 {
+    QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint64_t port = (uintptr_t)addr;
     uint16_t v;
-    if (port < SPAPR_PCI_IO_WIN_SIZE) {
-        v = readw(IOBASE(0) + port);
+    if (port < s->pio.size) {
+        v = readw(s->pio_cpu_base + port);
     } else {
-        v = readw(MMIOBASE(0) + port);
+        v = readw(s->mmio_cpu_base + port);
     }
     return bswap16(v);
 }
 
 static uint32_t qpci_spapr_io_readl(QPCIBus *bus, void *addr)
 {
+    QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint64_t port = (uintptr_t)addr;
     uint32_t v;
-    if (port < SPAPR_PCI_IO_WIN_SIZE) {
-        v = readl(IOBASE(0) + port);
+    if (port < s->pio.size) {
+        v = readl(s->pio_cpu_base + port);
     } else {
-        v = readl(MMIOBASE(0) + port);
+        v = readl(s->mmio_cpu_base + port);
     }
     return bswap32(v);
 }
 
 static void qpci_spapr_io_writeb(QPCIBus *bus, void *addr, uint8_t value)
 {
+    QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint64_t port = (uintptr_t)addr;
-    if (port < SPAPR_PCI_IO_WIN_SIZE) {
-        writeb(IOBASE(0) + port, value);
+    if (port < s->pio.size) {
+        writeb(s->pio_cpu_base + port, value);
     } else {
-        writeb(MMIOBASE(0) + port, value);
+        writeb(s->mmio_cpu_base + port, value);
     }
 }
 
 static void qpci_spapr_io_writew(QPCIBus *bus, void *addr, uint16_t value)
 {
+    QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint64_t port = (uintptr_t)addr;
     value = bswap16(value);
-    if (port < SPAPR_PCI_IO_WIN_SIZE) {
-        writew(IOBASE(0) + port, value);
+    if (port < s->pio.size) {
+        writew(s->pio_cpu_base + port, value);
     } else {
-        writew(MMIOBASE(0) + port, value);
+        writew(s->mmio_cpu_base + port, value);
     }
 }
 
 static void qpci_spapr_io_writel(QPCIBus *bus, void *addr, uint32_t value)
 {
+    QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint64_t port = (uintptr_t)addr;
     value = bswap32(value);
-    if (port < SPAPR_PCI_IO_WIN_SIZE) {
-        writel(IOBASE(0) + port, value);
+    if (port < s->pio.size) {
+        writel(s->pio_cpu_base + port, value);
     } else {
-        writel(MMIOBASE(0) + port, value);
+        writel(s->mmio_cpu_base + port, value);
     }
 }
 
@@ -129,24 +128,21 @@ static uint8_t qpci_spapr_config_readb(QPCIBus *bus, int devfn, uint8_t offset)
 {
     QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint32_t config_addr = (devfn << 8) | offset;
-    return qrtas_ibm_read_pci_config(s->alloc, BUIDBASE(0),
-                                     config_addr, 1);
+    return qrtas_ibm_read_pci_config(s->alloc, s->buid, config_addr, 1);
 }
 
 static uint16_t qpci_spapr_config_readw(QPCIBus *bus, int devfn, uint8_t offset)
 {
     QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint32_t config_addr = (devfn << 8) | offset;
-    return qrtas_ibm_read_pci_config(s->alloc, BUIDBASE(0),
-                                     config_addr, 2);
+    return qrtas_ibm_read_pci_config(s->alloc, s->buid, config_addr, 2);
 }
 
 static uint32_t qpci_spapr_config_readl(QPCIBus *bus, int devfn, uint8_t offset)
 {
     QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint32_t config_addr = (devfn << 8) | offset;
-    return qrtas_ibm_read_pci_config(s->alloc, BUIDBASE(0),
-                                     config_addr, 4);
+    return qrtas_ibm_read_pci_config(s->alloc, s->buid, config_addr, 4);
 }
 
 static void qpci_spapr_config_writeb(QPCIBus *bus, int devfn, uint8_t offset,
@@ -154,8 +150,7 @@ static void qpci_spapr_config_writeb(QPCIBus *bus, int devfn, uint8_t offset,
 {
     QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint32_t config_addr = (devfn << 8) | offset;
-    qrtas_ibm_write_pci_config(s->alloc, BUIDBASE(0),
-                               config_addr, 1, value);
+    qrtas_ibm_write_pci_config(s->alloc, s->buid, config_addr, 1, value);
 }
 
 static void qpci_spapr_config_writew(QPCIBus *bus, int devfn, uint8_t offset,
@@ -163,8 +158,7 @@ static void qpci_spapr_config_writew(QPCIBus *bus, int devfn, uint8_t offset,
 {
     QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint32_t config_addr = (devfn << 8) | offset;
-    qrtas_ibm_write_pci_config(s->alloc, BUIDBASE(0),
-                               config_addr, 2, value);
+    qrtas_ibm_write_pci_config(s->alloc, s->buid, config_addr, 2, value);
 }
 
 static void qpci_spapr_config_writel(QPCIBus *bus, int devfn, uint8_t offset,
@@ -172,8 +166,7 @@ static void qpci_spapr_config_writel(QPCIBus *bus, int devfn, uint8_t offset,
 {
     QPCIBusSPAPR *s = container_of(bus, QPCIBusSPAPR, bus);
     uint32_t config_addr = (devfn << 8) | offset;
-    qrtas_ibm_write_pci_config(s->alloc, BUIDBASE(0),
-                               config_addr, 4, value);
+    qrtas_ibm_write_pci_config(s->alloc, s->buid, config_addr, 4, value);
 }
 
 static void *qpci_spapr_iomap(QPCIBus *bus, QPCIDevice *dev, int barno,
@@ -242,6 +235,15 @@ static void qpci_spapr_iounmap(QPCIBus *bus, void *data)
     /* FIXME */
 }
 
+#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
+#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
+#define SPAPR_PCI_WINDOW_SPACING     0x1000000000ULL
+#define SPAPR_PCI_MMIO_WIN_OFF       0xA0000000
+#define SPAPR_PCI_MMIO_WIN_SIZE      (SPAPR_PCI_WINDOW_SPACING - \
+                                     SPAPR_PCI_MEM_WIN_BUS_OFFSET)
+#define SPAPR_PCI_IO_WIN_OFF         0x80000000
+#define SPAPR_PCI_IO_WIN_SIZE        0x10000
+
 QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
 {
     QPCIBusSPAPR *ret;
@@ -269,6 +271,19 @@ QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
     ret->bus.iomap = qpci_spapr_iomap;
     ret->bus.iounmap = qpci_spapr_iounmap;
 
+    /* FIXME: We assume the default location of the PHB for now.
+     * Ideally we'd parse the device tree deposited in the guest to
+     * get the window locations */
+    ret->buid = 0x800000020000000ULL;
+
+    ret->pio_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_IO_WIN_OFF;
+    ret->pio.pci_base = 0;
+    ret->pio.size = SPAPR_PCI_IO_WIN_SIZE;
+
+    ret->mmio_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MMIO_WIN_OFF;
+    ret->mmio.pci_base = SPAPR_PCI_MEM_WIN_BUS_OFFSET;
+    ret->mmio.size = SPAPR_PCI_MMIO_WIN_SIZE;
+
     ret->pci_hole_start = 0xC0000000;
     ret->pci_hole_size = SPAPR_PCI_MMIO_WIN_SIZE;
     ret->pci_hole_alloc = 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCHv3 2/7] libqos: Correct error in PCI hole sizing for spapr
  2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 1/7] libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() David Gibson
@ 2016-10-12 23:57 ` David Gibson
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 3/7] libqos: Limit spapr-pci to 32-bit MMIO for now David Gibson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-10-12 23:57 UTC (permalink / raw)
  To: lvivier, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh, David Gibson

In pci-spapr.c (as in pci-pc.c from which it was derived), the
pci_hole_start/pci_hole_size and pci_iohole_start/pci_iohole_size pairs[1]
essentially define the region of PCI (not CPU) addresses in which MMIO
or PIO BARs respectively will be allocated.

The size value is relative to the start value.  But in pci-spapr.c it is
set to the entire size of the window supported by the (emulated) hardware,
but the start values are *not* at the beginning of the emulated windows.

That means if you tried to map enough PCI BARs, we'd messily overrun the
IO windows, instead of failing in iomap as we should.

This patch corrects this by calculating the hole sizes from the location
of the window in PCI space and the hole start.

[1] Those are bad names, but that's a problem for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 tests/libqos/pci-spapr.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c
index 1765a54..3192903 100644
--- a/tests/libqos/pci-spapr.c
+++ b/tests/libqos/pci-spapr.c
@@ -285,11 +285,13 @@ QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
     ret->mmio.size = SPAPR_PCI_MMIO_WIN_SIZE;
 
     ret->pci_hole_start = 0xC0000000;
-    ret->pci_hole_size = SPAPR_PCI_MMIO_WIN_SIZE;
+    ret->pci_hole_size =
+        ret->mmio.pci_base + ret->mmio.size - ret->pci_hole_start;
     ret->pci_hole_alloc = 0;
 
     ret->pci_iohole_start = 0xc000;
-    ret->pci_iohole_size = SPAPR_PCI_IO_WIN_SIZE;
+    ret->pci_iohole_size =
+        ret->pio.pci_base + ret->pio.size - ret->pci_iohole_start;
     ret->pci_iohole_alloc = 0;
 
     return &ret->bus;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCHv3 3/7] libqos: Limit spapr-pci to 32-bit MMIO for now
  2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 1/7] libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() David Gibson
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 2/7] libqos: Correct error in PCI hole sizing for spapr David Gibson
@ 2016-10-12 23:57 ` David Gibson
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 4/7] spapr_pci: Delegate placement of PCI host bridges to machine type David Gibson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-10-12 23:57 UTC (permalink / raw)
  To: lvivier, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh, David Gibson

Currently the functions in pci-spapr.c (like pci-pc.c on which it's based)
don't distinguish between 32-bit and 64-bit PCI MMIO.  At the moment, the
qemu side implementation is a bit weird and has a single MMIO window
straddling 32-bit and 64-bit regions, but we're likely to change that in
future.

In any case, pci-pc.c - and therefore the testcases using PCI - only handle
32-bit MMIOs for now.  For spapr despite whatever changes might happen with
the MMIO windows, the 32-bit window is likely to remain at 2..4 GiB in PCI
space.

So, explicitly limit pci-spapr.c to 32-bit MMIOs for now, we can add 64-bit
MMIO support back in when and if we need it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 tests/libqos/pci-spapr.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c
index 3192903..558dfc3 100644
--- a/tests/libqos/pci-spapr.c
+++ b/tests/libqos/pci-spapr.c
@@ -32,8 +32,8 @@ typedef struct QPCIBusSPAPR {
     uint64_t pio_cpu_base;
     QPCIWindow pio;
 
-    uint64_t mmio_cpu_base;
-    QPCIWindow mmio;
+    uint64_t mmio32_cpu_base;
+    QPCIWindow mmio32;
 
     uint64_t pci_hole_start;
     uint64_t pci_hole_size;
@@ -58,7 +58,7 @@ static uint8_t qpci_spapr_io_readb(QPCIBus *bus, void *addr)
     if (port < s->pio.size) {
         v = readb(s->pio_cpu_base + port);
     } else {
-        v = readb(s->mmio_cpu_base + port);
+        v = readb(s->mmio32_cpu_base + port);
     }
     return v;
 }
@@ -71,7 +71,7 @@ static uint16_t qpci_spapr_io_readw(QPCIBus *bus, void *addr)
     if (port < s->pio.size) {
         v = readw(s->pio_cpu_base + port);
     } else {
-        v = readw(s->mmio_cpu_base + port);
+        v = readw(s->mmio32_cpu_base + port);
     }
     return bswap16(v);
 }
@@ -84,7 +84,7 @@ static uint32_t qpci_spapr_io_readl(QPCIBus *bus, void *addr)
     if (port < s->pio.size) {
         v = readl(s->pio_cpu_base + port);
     } else {
-        v = readl(s->mmio_cpu_base + port);
+        v = readl(s->mmio32_cpu_base + port);
     }
     return bswap32(v);
 }
@@ -96,7 +96,7 @@ static void qpci_spapr_io_writeb(QPCIBus *bus, void *addr, uint8_t value)
     if (port < s->pio.size) {
         writeb(s->pio_cpu_base + port, value);
     } else {
-        writeb(s->mmio_cpu_base + port, value);
+        writeb(s->mmio32_cpu_base + port, value);
     }
 }
 
@@ -108,7 +108,7 @@ static void qpci_spapr_io_writew(QPCIBus *bus, void *addr, uint16_t value)
     if (port < s->pio.size) {
         writew(s->pio_cpu_base + port, value);
     } else {
-        writew(s->mmio_cpu_base + port, value);
+        writew(s->mmio32_cpu_base + port, value);
     }
 }
 
@@ -120,7 +120,7 @@ static void qpci_spapr_io_writel(QPCIBus *bus, void *addr, uint32_t value)
     if (port < s->pio.size) {
         writel(s->pio_cpu_base + port, value);
     } else {
-        writel(s->mmio_cpu_base + port, value);
+        writel(s->mmio32_cpu_base + port, value);
     }
 }
 
@@ -235,12 +235,9 @@ static void qpci_spapr_iounmap(QPCIBus *bus, void *data)
     /* FIXME */
 }
 
-#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
 #define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
-#define SPAPR_PCI_WINDOW_SPACING     0x1000000000ULL
-#define SPAPR_PCI_MMIO_WIN_OFF       0xA0000000
-#define SPAPR_PCI_MMIO_WIN_SIZE      (SPAPR_PCI_WINDOW_SPACING - \
-                                     SPAPR_PCI_MEM_WIN_BUS_OFFSET)
+#define SPAPR_PCI_MMIO32_WIN_OFF     0xA0000000
+#define SPAPR_PCI_MMIO32_WIN_SIZE    0x80000000 /* 2 GiB */
 #define SPAPR_PCI_IO_WIN_OFF         0x80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
 
@@ -280,13 +277,14 @@ QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
     ret->pio.pci_base = 0;
     ret->pio.size = SPAPR_PCI_IO_WIN_SIZE;
 
-    ret->mmio_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MMIO_WIN_OFF;
-    ret->mmio.pci_base = SPAPR_PCI_MEM_WIN_BUS_OFFSET;
-    ret->mmio.size = SPAPR_PCI_MMIO_WIN_SIZE;
+    /* 32-bit portion of the MMIO window is at PCI address 2..4 GiB */
+    ret->mmio32_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MMIO32_WIN_OFF;
+    ret->mmio32.pci_base = 0x80000000; /* 2 GiB */
+    ret->mmio32.size = SPAPR_PCI_MMIO32_WIN_SIZE;
 
     ret->pci_hole_start = 0xC0000000;
     ret->pci_hole_size =
-        ret->mmio.pci_base + ret->mmio.size - ret->pci_hole_start;
+        ret->mmio32.pci_base + ret->mmio32.size - ret->pci_hole_start;
     ret->pci_hole_alloc = 0;
 
     ret->pci_iohole_start = 0xc000;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCHv3 4/7] spapr_pci: Delegate placement of PCI host bridges to machine type
  2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
                   ` (2 preceding siblings ...)
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 3/7] libqos: Limit spapr-pci to 32-bit MMIO for now David Gibson
@ 2016-10-12 23:57 ` David Gibson
  2016-10-13  7:33   ` Laurent Vivier
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 5/7] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM David Gibson
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-10-12 23:57 UTC (permalink / raw)
  To: lvivier, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh, David Gibson

The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
and PAPR guests) to have numerous independent PHBs, each controlling a
separate PCI domain.

There are two ways of configuring the spapr-pci-host-bridge device: first
it can be done fully manually, specifying the locations and sizes of all
the IO windows.  This gives the most control, but is very awkward with 6
mandatory parameters.  Alternatively just an "index" can be specified
which essentially selects from an array of predefined PHB locations.
The PHB at index 0 is automatically created as the default PHB.

The current set of default locations causes some problems for guests with
large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
GPGPU cards via VFIO).  Obviously, for migration we can only change the
locations on a new machine type, however.

This is awkward, because the placement is currently decided within the
spapr-pci-host-bridge code, so it breaks abstraction to look inside the
machine type version.

So, this patch delegates the "default mode" PHB placement from the
spapr-pci-host-bridge device back to the machine type via a public method
in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
can do.

For now, this just changes where the calculation is done.  It doesn't
change the actual location of the host bridges, or any other behaviour.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c              | 31 +++++++++++++++++++++++++++++++
 hw/ppc/spapr_pci.c          | 21 +++++++--------------
 include/hw/pci-host/spapr.h | 11 +----------
 include/hw/ppc/spapr.h      |  3 +++
 4 files changed, 42 insertions(+), 24 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 03e3803..cb9da96 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2370,6 +2370,36 @@ static HotpluggableCPUList *spapr_query_hotpluggable_cpus(MachineState *machine)
     return head;
 }
 
+static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
+                                uint64_t *buid, hwaddr *pio, hwaddr *mmio,
+                                unsigned n_dma, uint32_t *liobns, Error **errp)
+{
+    const uint64_t base_buid = 0x800000020000000ULL;
+    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
+    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
+    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
+    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
+    const uint32_t max_index = 255;
+
+    hwaddr phb_base;
+    int i;
+
+    if (index > max_index) {
+        error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
+                   max_index);
+        return;
+    }
+
+    *buid = base_buid + index;
+    for (i = 0; i < n_dma; ++i) {
+        liobns[i] = SPAPR_PCI_LIOBN(index, i);
+    }
+
+    phb_base = phb0_base + index * phb_spacing;
+    *pio = phb_base + pio_offset;
+    *mmio = phb_base + mmio_offset;
+}
+
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -2406,6 +2436,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     mc->query_hotpluggable_cpus = spapr_query_hotpluggable_cpus;
     fwc->get_dev_path = spapr_get_fw_dev_path;
     nc->nmi_monitor_handler = spapr_nmi;
+    smc->phb_placement = spapr_phb_placement;
 }
 
 static const TypeInfo spapr_machine_info = {
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 4f00865..0e6cf4d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1311,7 +1311,8 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1;
 
     if (sphb->index != (uint32_t)-1) {
-        hwaddr windows_base;
+        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+        Error *local_err = NULL;
 
         if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] != (uint32_t)-1)
             || (sphb->dma_liobn[1] != (uint32_t)-1 && windows_supported == 2)
@@ -1322,21 +1323,13 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
             return;
         }
 
-        if (sphb->index > SPAPR_PCI_MAX_INDEX) {
-            error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
-                       SPAPR_PCI_MAX_INDEX);
+        smc->phb_placement(spapr, sphb->index, &sphb->buid,
+                           &sphb->io_win_addr, &sphb->mem_win_addr,
+                           windows_supported, sphb->dma_liobn, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
             return;
         }
-
-        sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
-        for (i = 0; i < windows_supported; ++i) {
-            sphb->dma_liobn[i] = SPAPR_PCI_LIOBN(sphb->index, i);
-        }
-
-        windows_base = SPAPR_PCI_WINDOW_BASE
-            + sphb->index * SPAPR_PCI_WINDOW_SPACING;
-        sphb->mem_win_addr = windows_base + SPAPR_PCI_MMIO_WIN_OFF;
-        sphb->io_win_addr = windows_base + SPAPR_PCI_IO_WIN_OFF;
     }
 
     if (sphb->buid == (uint64_t)-1) {
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 30dbd46..8c9ebfd 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -79,18 +79,9 @@ struct sPAPRPHBState {
     uint32_t numa_node;
 };
 
-#define SPAPR_PCI_MAX_INDEX          255
-
-#define SPAPR_PCI_BASE_BUID          0x800000020000000ULL
-
 #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
 
-#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
-#define SPAPR_PCI_WINDOW_SPACING     0x1000000000ULL
-#define SPAPR_PCI_MMIO_WIN_OFF       0xA0000000
-#define SPAPR_PCI_MMIO_WIN_SIZE      (SPAPR_PCI_WINDOW_SPACING - \
-                                     SPAPR_PCI_MEM_WIN_BUS_OFFSET)
-#define SPAPR_PCI_IO_WIN_OFF         0x80000000
+#define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
 
 #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 39dadaa..a05783c 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -40,6 +40,9 @@ struct sPAPRMachineClass {
     bool dr_lmb_enabled;       /* enable dynamic-reconfig/hotplug of LMBs */
     bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
     const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
+    void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
+                          uint64_t *buid, hwaddr *pio, hwaddr *mmio,
+                          unsigned n_dma, uint32_t *liobns, Error **errp);
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCHv3 5/7] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM
  2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
                   ` (3 preceding siblings ...)
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 4/7] spapr_pci: Delegate placement of PCI host bridges to machine type David Gibson
@ 2016-10-12 23:57 ` David Gibson
  2016-10-13  7:35   ` Laurent Vivier
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 6/7] spapr_pci: Add a 64-bit MMIO window David Gibson
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-10-12 23:57 UTC (permalink / raw)
  To: lvivier, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh, David Gibson

Currently the default PCI host bridge for the 'pseries' machine type is
constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
guest memory space.  This means that if > 1TiB of guest RAM is specified,
the RAM will collide with the PCI IO windows, causing serious problems.

Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
there's a little unused space at the bottom of the area reserved for PCI,
but essentially this means that > 1TiB of RAM has never worked with the
pseries machine type.

This patch fixes this by altering the placement of PHBs on large-RAM VMs.
Instead of always placing the first PHB at 1TiB, it is placed at the next
1 TiB boundary after the maximum RAM address.

Technically, this changes behaviour in a migration-breaking way for
existing machines with > 1TiB maximum memory, but since having > 1 TiB
memory was broken anyway, this seems like a reasonable trade-off.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index cb9da96..e6b110d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2375,15 +2375,27 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
                                 unsigned n_dma, uint32_t *liobns, Error **errp)
 {
     const uint64_t base_buid = 0x800000020000000ULL;
-    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
     const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
     const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
     const hwaddr pio_offset = 0x80000000; /* 2 GiB */
     const uint32_t max_index = 255;
+    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
 
-    hwaddr phb_base;
+    uint64_t ram_top = MACHINE(spapr)->ram_size;
+    hwaddr phb0_base, phb_base;
     int i;
 
+    /* Do we have hotpluggable memory? */
+    if (MACHINE(spapr)->maxram_size > ram_top) {
+        /* Can't just use maxram_size, because there may be an
+         * alignment gap between normal and hotpluggable memory
+         * regions */
+        ram_top = spapr->hotplug_memory.base +
+            memory_region_size(&spapr->hotplug_memory.mr);
+    }
+
+    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
+
     if (index > max_index) {
         error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
                    max_index);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCHv3 6/7] spapr_pci: Add a 64-bit MMIO window
  2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
                   ` (4 preceding siblings ...)
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 5/7] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM David Gibson
@ 2016-10-12 23:57 ` David Gibson
  2016-10-13  8:06   ` Laurent Vivier
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map David Gibson
  2016-10-14  4:10 ` [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries no-reply
  7 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-10-12 23:57 UTC (permalink / raw)
  To: lvivier, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh, David Gibson

On real hardware, and under pHyp, the PCI host bridges on Power machines
typically advertise two outbound MMIO windows from the guest's physical
memory space to PCI memory space:
  - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
  - A 64-bit window which maps onto a large region somewhere high in PCI
    address space (traditionally this used an identity mapping from guest
    physical address to PCI address, but that's not always the case)

The qemu implementation in spapr-pci-host-bridge, however, only supports a
single outbound MMIO window, however.  At least some Linux versions expect
the two windows however, so we arranged this window to map onto the PCI
memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
windows, the "32-bit" window from 2G..4G and the "64-bit" window from
4G..~64G.

This approach means, however, that the 64G window is not naturally aligned.
In turn this limits the size of the largest BAR we can map (which does have
to be naturally aligned) to roughly half of the total window.  With some
large nVidia GPGPU cards which have huge memory BARs, this is starting to
be a problem.

This patch adds true support for separate 32-bit and 64-bit outbound MMIO
windows to the spapr-pci-host-bridge implementation, each of which can
be independently configured.  The 32-bit window always maps to 2G.. in PCI
space, but the PCI address of the 64-bit window can be configured (it
defaults to the same as the guest physical address).

So as not to break possible existing configurations, as long as a 64-bit
window is not specified, a large single window can be specified.  This
will appear the same way to the guest as the old approach, although it's
now implemented by two contiguous memory regions rather than a single one.

For now, this only adds the possibility of 64-bit windows.  The default
configuration still uses the legacy mode.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c              | 10 +++++--
 hw/ppc/spapr_pci.c          | 70 ++++++++++++++++++++++++++++++++++++---------
 include/hw/pci-host/spapr.h |  8 ++++--
 include/hw/ppc/spapr.h      |  3 +-
 4 files changed, 72 insertions(+), 19 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e6b110d..8db3657 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2371,7 +2371,8 @@ static HotpluggableCPUList *spapr_query_hotpluggable_cpus(MachineState *machine)
 }
 
 static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
-                                uint64_t *buid, hwaddr *pio, hwaddr *mmio,
+                                uint64_t *buid, hwaddr *pio,
+                                hwaddr *mmio32, hwaddr *mmio64,
                                 unsigned n_dma, uint32_t *liobns, Error **errp)
 {
     const uint64_t base_buid = 0x800000020000000ULL;
@@ -2409,7 +2410,12 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
 
     phb_base = phb0_base + index * phb_spacing;
     *pio = phb_base + pio_offset;
-    *mmio = phb_base + mmio_offset;
+    *mmio32 = phb_base + mmio_offset;
+    /*
+     * We don't set the 64-bit MMIO window, relying on the PHB's
+     * fallback behaviour of automatically splitting a large "32-bit"
+     * window into contiguous 32-bit and 64-bit windows
+     */
 }
 
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 0e6cf4d..31ca6fa 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1317,14 +1317,16 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] != (uint32_t)-1)
             || (sphb->dma_liobn[1] != (uint32_t)-1 && windows_supported == 2)
             || (sphb->mem_win_addr != (hwaddr)-1)
+            || (sphb->mem64_win_addr != (hwaddr)-1)
             || (sphb->io_win_addr != (hwaddr)-1)) {
             error_setg(errp, "Either \"index\" or other parameters must"
                        " be specified for PAPR PHB, not both");
             return;
         }
 
-        smc->phb_placement(spapr, sphb->index, &sphb->buid,
-                           &sphb->io_win_addr, &sphb->mem_win_addr,
+        smc->phb_placement(spapr, sphb->index,
+                           &sphb->buid, &sphb->io_win_addr,
+                           &sphb->mem_win_addr, &sphb->mem64_win_addr,
                            windows_supported, sphb->dma_liobn, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
@@ -1353,6 +1355,38 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    if (sphb->mem64_win_size != 0) {
+        if (sphb->mem64_win_addr == (hwaddr)-1) {
+            error_setg(errp,
+                       "64-bit memory window address not specified for PHB");
+            return;
+        }
+
+        if (sphb->mem_win_size > SPAPR_PCI_MEM32_WIN_SIZE) {
+            error_setg(errp, "32-bit memory window of size 0x%"HWADDR_PRIx
+                       " (max 2 GiB)", sphb->mem_win_size);
+            return;
+        }
+
+        if (sphb->mem64_win_pciaddr == (hwaddr)-1) {
+            /* 64-bit window defaults to identity mapping */
+            sphb->mem64_win_pciaddr = sphb->mem64_win_addr;
+        }
+    } else if (sphb->mem_win_size > SPAPR_PCI_MEM32_WIN_SIZE) {
+        /*
+         * For compatibility with old configuration, if no 64-bit MMIO
+         * window is specified, but the ordinary (32-bit) memory
+         * window is specified as > 2GiB, we treat it as a 2GiB 32-bit
+         * window, with a 64-bit MMIO window following on immediately
+         * afterwards
+         */
+        sphb->mem64_win_size = sphb->mem_win_size - SPAPR_PCI_MEM32_WIN_SIZE;
+        sphb->mem64_win_addr = sphb->mem_win_addr + SPAPR_PCI_MEM32_WIN_SIZE;
+        sphb->mem64_win_pciaddr =
+            SPAPR_PCI_MEM_WIN_BUS_OFFSET + SPAPR_PCI_MEM32_WIN_SIZE;
+        sphb->mem_win_size = SPAPR_PCI_MEM32_WIN_SIZE;
+    }
+
     if (spapr_pci_find_phb(spapr, sphb->buid)) {
         error_setg(errp, "PCI host bridges must have unique BUIDs");
         return;
@@ -1366,12 +1400,19 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     sprintf(namebuf, "%s.mmio", sphb->dtbusname);
     memory_region_init(&sphb->memspace, OBJECT(sphb), namebuf, UINT64_MAX);
 
-    sprintf(namebuf, "%s.mmio-alias", sphb->dtbusname);
-    memory_region_init_alias(&sphb->memwindow, OBJECT(sphb),
+    sprintf(namebuf, "%s.mmio32-alias", sphb->dtbusname);
+    memory_region_init_alias(&sphb->mem32window, OBJECT(sphb),
                              namebuf, &sphb->memspace,
                              SPAPR_PCI_MEM_WIN_BUS_OFFSET, sphb->mem_win_size);
     memory_region_add_subregion(get_system_memory(), sphb->mem_win_addr,
-                                &sphb->memwindow);
+                                &sphb->mem32window);
+
+    sprintf(namebuf, "%s.mmio64-alias", sphb->dtbusname);
+    memory_region_init_alias(&sphb->mem64window, OBJECT(sphb),
+                             namebuf, &sphb->memspace,
+                             sphb->mem64_win_pciaddr, sphb->mem64_win_size);
+    memory_region_add_subregion(get_system_memory(), sphb->mem64_win_addr,
+                                &sphb->mem64window);
 
     /* Initialize IO regions */
     sprintf(namebuf, "%s.io", sphb->dtbusname);
@@ -1525,6 +1566,10 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
     DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
                        SPAPR_PCI_MMIO_WIN_SIZE),
+    DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
+    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
+    DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
+                       -1),
     DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
     DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
                        SPAPR_PCI_IO_WIN_SIZE),
@@ -1760,10 +1805,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
     int bus_off, i, j, ret;
     char nodename[FDT_NAME_MAX];
     uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
-    const uint64_t mmiosize = memory_region_size(&phb->memwindow);
-    const uint64_t w32max = (1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET;
-    const uint64_t w32size = MIN(w32max, mmiosize);
-    const uint64_t w64size = (mmiosize > w32size) ? (mmiosize - w32size) : 0;
     struct {
         uint32_t hi;
         uint64_t child;
@@ -1778,15 +1819,16 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
         {
             cpu_to_be32(b_ss(2)), cpu_to_be64(SPAPR_PCI_MEM_WIN_BUS_OFFSET),
             cpu_to_be64(phb->mem_win_addr),
-            cpu_to_be64(w32size),
+            cpu_to_be64(phb->mem_win_size),
         },
         {
-            cpu_to_be32(b_ss(3)), cpu_to_be64(1ULL << 32),
-            cpu_to_be64(phb->mem_win_addr + w32size),
-            cpu_to_be64(w64size)
+            cpu_to_be32(b_ss(3)), cpu_to_be64(phb->mem64_win_pciaddr),
+            cpu_to_be64(phb->mem64_win_addr),
+            cpu_to_be64(phb->mem64_win_size),
         },
     };
-    const unsigned sizeof_ranges = (w64size ? 3 : 2) * sizeof(ranges[0]);
+    const unsigned sizeof_ranges =
+        (phb->mem64_win_size ? 3 : 2) * sizeof(ranges[0]);
     uint64_t bus_reg[] = { cpu_to_be64(phb->buid), 0 };
     uint32_t interrupt_map_mask[] = {
         cpu_to_be32(b_ddddd(-1)|b_fff(0)), 0x0, 0x0, cpu_to_be32(-1)};
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 8c9ebfd..23dfb09 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -53,8 +53,10 @@ struct sPAPRPHBState {
     bool dr_enabled;
 
     MemoryRegion memspace, iospace;
-    hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size;
-    MemoryRegion memwindow, iowindow, msiwindow;
+    hwaddr mem_win_addr, mem_win_size, mem64_win_addr, mem64_win_size;
+    uint64_t mem64_win_pciaddr;
+    hwaddr io_win_addr, io_win_size;
+    MemoryRegion mem32window, mem64window, iowindow, msiwindow;
 
     uint32_t dma_liobn[SPAPR_PCI_DMA_MAX_WINDOWS];
     hwaddr dma_win_addr, dma_win_size;
@@ -80,6 +82,8 @@ struct sPAPRPHBState {
 };
 
 #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
+#define SPAPR_PCI_MEM32_WIN_SIZE     \
+    ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
 
 #define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a05783c..aeaba3e 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -41,7 +41,8 @@ struct sPAPRMachineClass {
     bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
     const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
     void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
-                          uint64_t *buid, hwaddr *pio, hwaddr *mmio,
+                          uint64_t *buid, hwaddr *pio, 
+                          hwaddr *mmio32, hwaddr *mmio64,
                           unsigned n_dma, uint32_t *liobns, Error **errp);
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map
  2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
                   ` (5 preceding siblings ...)
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 6/7] spapr_pci: Add a 64-bit MMIO window David Gibson
@ 2016-10-12 23:57 ` David Gibson
  2016-10-13  8:40   ` Laurent Vivier
  2016-10-14  4:10 ` [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries no-reply
  7 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-10-12 23:57 UTC (permalink / raw)
  To: lvivier, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh, David Gibson

Currently, the MMIO space for accessing PCI on pseries guests begins at
1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
chunk of address space in which it places its outbound PIO and 32-bit and
64-bit MMIO windows.

This scheme as several problems:
  - It limits guest RAM to 1 TiB (though we have a limited fix for this
    now)
  - It limits the total MMIO window to 64 GiB.  This is not always enough
    for some of the large nVidia GPGPU cards
  - Putting all the windows into a single 64 GiB area means that naturally
    aligning things within there will waste more address space.
In addition there was a miscalculation in some of the defaults, which meant
that the MMIO windows for each PHB actually slightly overran the 64 GiB
region for that PHB.  We got away without nasty consequences because
the overrun fit within an unused area at the beginning of the next PHB's
region, but it's not pretty.

This patch implements a new scheme which addresses those problems, and is
also closer to what bare metal hardware and pHyp guests generally use.

Because some guest versions (including most current distro kernels) can't
access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
64 TiB.  This is broken into 1 TiB chunks.  The 1 TiB contains the PIO
(64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
one PHB each.

This reduces the number of allowed PHBs (without full manual configuration
of all the windows) from 256 to 31, but this should still be plenty in
practice.

We also change some of the default window sizes for manually configured
PHBs to saner values.

Finally we adjust some tests and libqos so that it correctly uses the new
default locations.  Ideally it would parse the device tree given to the
guest, but that's a more complex problem for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c              | 126 +++++++++++++++++++++++++++++++++-----------
 hw/ppc/spapr_pci.c          |   5 +-
 include/hw/pci-host/spapr.h |   8 ++-
 tests/endianness-test.c     |   3 +-
 tests/libqos/pci-spapr.c    |   9 ++--
 tests/spapr-phb-test.c      |   2 +-
 6 files changed, 113 insertions(+), 40 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8db3657..2d952a8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2375,31 +2375,42 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
                                 hwaddr *mmio32, hwaddr *mmio64,
                                 unsigned n_dma, uint32_t *liobns, Error **errp)
 {
+    /*
+     * New-style PHB window placement.
+     *
+     * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
+     * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
+     * windows.
+     *
+     * Some guest kernels can't work with MMIO windows above 1<<46
+     * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
+     *
+     * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
+     * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
+     * has the 64-bit window for PHB1 and so forth.
+     */
     const uint64_t base_buid = 0x800000020000000ULL;
-    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
-    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
-    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
-    const uint32_t max_index = 255;
-    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
 
-    uint64_t ram_top = MACHINE(spapr)->ram_size;
-    hwaddr phb0_base, phb_base;
+    int max_phbs =
+        (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
+    hwaddr mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;
+    hwaddr mmio64_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM64_WIN_SIZE;
     int i;
 
-    /* Do we have hotpluggable memory? */
-    if (MACHINE(spapr)->maxram_size > ram_top) {
-        /* Can't just use maxram_size, because there may be an
-         * alignment gap between normal and hotpluggable memory
-         * regions */
-        ram_top = spapr->hotplug_memory.base +
-            memory_region_size(&spapr->hotplug_memory.mr);
-    }
-
-    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
-
-    if (index > max_index) {
+    /* Sanity check natural alignments */
+    assert((SPAPR_PCI_BASE % SPAPR_PCI_MEM64_WIN_SIZE) == 0);
+    assert((SPAPR_PCI_LIMIT % SPAPR_PCI_MEM64_WIN_SIZE) == 0);
+    assert((SPAPR_PCI_MEM64_WIN_SIZE % SPAPR_PCI_MEM32_WIN_SIZE) == 0);
+    assert((SPAPR_PCI_MEM32_WIN_SIZE % SPAPR_PCI_IO_WIN_SIZE) == 0);
+    /* Sanity check bounds */
+    assert((SPAPR_PCI_BASE + max_phbs * SPAPR_PCI_IO_WIN_SIZE)
+           <= mmio32_base);
+    assert(mmio32_base + max_phbs * SPAPR_PCI_MEM32_WIN_SIZE
+           <= mmio64_base);
+
+    if (index >= max_phbs) {
         error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
-                   max_index);
+                   max_phbs - 1);
         return;
     }
 
@@ -2408,14 +2419,9 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
         liobns[i] = SPAPR_PCI_LIOBN(index, i);
     }
 
-    phb_base = phb0_base + index * phb_spacing;
-    *pio = phb_base + pio_offset;
-    *mmio32 = phb_base + mmio_offset;
-    /*
-     * We don't set the 64-bit MMIO window, relying on the PHB's
-     * fallback behaviour of automatically splitting a large "32-bit"
-     * window into contiguous 32-bit and 64-bit windows
-     */
+    *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
+    *mmio32 = mmio32_base + index * SPAPR_PCI_MEM32_WIN_SIZE;
+    *mmio64 = mmio64_base + index * SPAPR_PCI_MEM64_WIN_SIZE;
 }
 
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
@@ -2519,8 +2525,67 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
 /*
  * pseries-2.7
  */
-#define SPAPR_COMPAT_2_7 \
-    HW_COMPAT_2_7 \
+#define SPAPR_COMPAT_2_7                            \
+    HW_COMPAT_2_7                                   \
+    {                                               \
+        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
+        .property = "mem_win_size",                 \
+        .value    = stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),\
+    },                                              \
+    {                                               \
+        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
+        .property = "mem64_win_size",               \
+        .value    = "0",                            \
+    },
+
+static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
+                              uint64_t *buid, hwaddr *pio,
+                              hwaddr *mmio32, hwaddr *mmio64,
+                              unsigned n_dma, uint32_t *liobns, Error **errp)
+{
+    /* Legacy PHB placement for pseries-2.7 and earlier machine types */
+    const uint64_t base_buid = 0x800000020000000ULL;
+    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
+    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
+    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
+    const uint32_t max_index = 255;
+    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
+
+    uint64_t ram_top = MACHINE(spapr)->ram_size;
+    hwaddr phb0_base, phb_base;
+    int i;
+
+    /* Do we have hotpluggable memory? */
+    if (MACHINE(spapr)->maxram_size > ram_top) {
+        /* Can't just use maxram_size, because there may be an
+         * alignment gap between normal and hotpluggable memory
+         * regions */
+        ram_top = spapr->hotplug_memory.base +
+            memory_region_size(&spapr->hotplug_memory.mr);
+    }
+
+    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
+
+    if (index > max_index) {
+        error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
+                   max_index);
+        return;
+    }
+
+    *buid = base_buid + index;
+    for (i = 0; i < n_dma; ++i) {
+        liobns[i] = SPAPR_PCI_LIOBN(index, i);
+    }
+
+    phb_base = phb0_base + index * phb_spacing;
+    *pio = phb_base + pio_offset;
+    *mmio32 = phb_base + mmio_offset;
+    /*
+     * We don't set the 64-bit MMIO window, relying on the PHB's
+     * fallback behaviour of automatically splitting a large "32-bit"
+     * window into contiguous 32-bit and 64-bit windows
+     */
+}
 
 static void spapr_machine_2_7_instance_options(MachineState *machine)
 {
@@ -2533,6 +2598,7 @@ static void spapr_machine_2_7_class_options(MachineClass *mc)
     spapr_machine_2_8_class_options(mc);
     smc->tcg_default_cpu = "POWER7";
     SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_7);
+    smc->phb_placement = phb_placement_2_7;
 }
 
 DEFINE_SPAPR_MACHINE(2_7, "2.7", false);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 31ca6fa..3ef6a26 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1565,9 +1565,10 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_UINT32("liobn64", sPAPRPHBState, dma_liobn[1], -1),
     DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
     DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
-                       SPAPR_PCI_MMIO_WIN_SIZE),
+                       SPAPR_PCI_MEM32_WIN_SIZE),
     DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
-    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
+    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size,
+                       SPAPR_PCI_MEM64_WIN_SIZE),
     DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
                        -1),
     DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 23dfb09..b92c1b5 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -84,8 +84,14 @@ struct sPAPRPHBState {
 #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
 #define SPAPR_PCI_MEM32_WIN_SIZE     \
     ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
+#define SPAPR_PCI_MEM64_WIN_SIZE     0x10000000000ULL /* 1 TiB */
 
-#define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
+/* Without manual configuration, all PCI outbound windows will be
+ * within this range */
+#define SPAPR_PCI_BASE               (1ULL << 45) /* 32 TiB */
+#define SPAPR_PCI_LIMIT              (1ULL << 46) /* 64 TiB */
+
+#define SPAPR_PCI_2_7_MMIO_WIN_SIZE  0xf80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
 
 #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
diff --git a/tests/endianness-test.c b/tests/endianness-test.c
index b7a120e..cf8d41b 100644
--- a/tests/endianness-test.c
+++ b/tests/endianness-test.c
@@ -38,7 +38,8 @@ static const TestCase test_cases[] = {
     { "ppc", "prep", 0x80000000, .bswap = true },
     { "ppc", "bamboo", 0xe8000000, .bswap = true, .superio = "i82378" },
     { "ppc64", "mac99", 0xf2000000, .bswap = true, .superio = "i82378" },
-    { "ppc64", "pseries", 0x10080000000ULL,
+    { "ppc64", "pseries", (1ULL << 45), .bswap = true, .superio = "i82378" },
+    { "ppc64", "pseries-2.7", 0x10080000000ULL,
       .bswap = true, .superio = "i82378" },
     { "sh4", "r2d", 0xfe240000, .superio = "i82378" },
     { "sh4eb", "r2d", 0xfe240000, .bswap = true, .superio = "i82378" },
diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c
index 558dfc3..2eaaf91 100644
--- a/tests/libqos/pci-spapr.c
+++ b/tests/libqos/pci-spapr.c
@@ -235,10 +235,9 @@ static void qpci_spapr_iounmap(QPCIBus *bus, void *data)
     /* FIXME */
 }
 
-#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
-#define SPAPR_PCI_MMIO32_WIN_OFF     0xA0000000
+#define SPAPR_PCI_BASE               (1ULL << 45)
+
 #define SPAPR_PCI_MMIO32_WIN_SIZE    0x80000000 /* 2 GiB */
-#define SPAPR_PCI_IO_WIN_OFF         0x80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
 
 QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
@@ -273,12 +272,12 @@ QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
      * get the window locations */
     ret->buid = 0x800000020000000ULL;
 
-    ret->pio_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_IO_WIN_OFF;
+    ret->pio_cpu_base = SPAPR_PCI_BASE;
     ret->pio.pci_base = 0;
     ret->pio.size = SPAPR_PCI_IO_WIN_SIZE;
 
     /* 32-bit portion of the MMIO window is at PCI address 2..4 GiB */
-    ret->mmio32_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MMIO32_WIN_OFF;
+    ret->mmio32_cpu_base = SPAPR_PCI_BASE + SPAPR_PCI_MMIO32_WIN_SIZE;
     ret->mmio32.pci_base = 0x80000000; /* 2 GiB */
     ret->mmio32.size = SPAPR_PCI_MMIO32_WIN_SIZE;
 
diff --git a/tests/spapr-phb-test.c b/tests/spapr-phb-test.c
index 21004a7..d3522ea 100644
--- a/tests/spapr-phb-test.c
+++ b/tests/spapr-phb-test.c
@@ -25,7 +25,7 @@ int main(int argc, char **argv)
     g_test_init(&argc, &argv, NULL);
     qtest_add_func("/spapr-phb/device", test_phb_device);
 
-    qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=100");
+    qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=30");
 
     ret = g_test_run();
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 4/7] spapr_pci: Delegate placement of PCI host bridges to machine type
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 4/7] spapr_pci: Delegate placement of PCI host bridges to machine type David Gibson
@ 2016-10-13  7:33   ` Laurent Vivier
  0 siblings, 0 replies; 18+ messages in thread
From: Laurent Vivier @ 2016-10-13  7:33 UTC (permalink / raw)
  To: David Gibson, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh



On 13/10/2016 01:57, David Gibson wrote:
> The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
> for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
> and PAPR guests) to have numerous independent PHBs, each controlling a
> separate PCI domain.
> 
> There are two ways of configuring the spapr-pci-host-bridge device: first
> it can be done fully manually, specifying the locations and sizes of all
> the IO windows.  This gives the most control, but is very awkward with 6
> mandatory parameters.  Alternatively just an "index" can be specified
> which essentially selects from an array of predefined PHB locations.
> The PHB at index 0 is automatically created as the default PHB.
> 
> The current set of default locations causes some problems for guests with
> large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
> GPGPU cards via VFIO).  Obviously, for migration we can only change the
> locations on a new machine type, however.
> 
> This is awkward, because the placement is currently decided within the
> spapr-pci-host-bridge code, so it breaks abstraction to look inside the
> machine type version.
> 
> So, this patch delegates the "default mode" PHB placement from the
> spapr-pci-host-bridge device back to the machine type via a public method
> in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
> can do.
> 
> For now, this just changes where the calculation is done.  It doesn't
> change the actual location of the host bridges, or any other behaviour.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Laurent Vivier <lvivier@redhat.com>

> ---
>  hw/ppc/spapr.c              | 31 +++++++++++++++++++++++++++++++
>  hw/ppc/spapr_pci.c          | 21 +++++++--------------
>  include/hw/pci-host/spapr.h | 11 +----------
>  include/hw/ppc/spapr.h      |  3 +++
>  4 files changed, 42 insertions(+), 24 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 03e3803..cb9da96 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2370,6 +2370,36 @@ static HotpluggableCPUList *spapr_query_hotpluggable_cpus(MachineState *machine)
>      return head;
>  }
>  
> +static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
> +                                uint64_t *buid, hwaddr *pio, hwaddr *mmio,
> +                                unsigned n_dma, uint32_t *liobns, Error **errp)
> +{
> +    const uint64_t base_buid = 0x800000020000000ULL;
> +    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
> +    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> +    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> +    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> +    const uint32_t max_index = 255;
> +
> +    hwaddr phb_base;
> +    int i;
> +
> +    if (index > max_index) {
> +        error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
> +                   max_index);
> +        return;
> +    }
> +
> +    *buid = base_buid + index;
> +    for (i = 0; i < n_dma; ++i) {
> +        liobns[i] = SPAPR_PCI_LIOBN(index, i);
> +    }
> +
> +    phb_base = phb0_base + index * phb_spacing;
> +    *pio = phb_base + pio_offset;
> +    *mmio = phb_base + mmio_offset;
> +}
> +
>  static void spapr_machine_class_init(ObjectClass *oc, void *data)
>  {
>      MachineClass *mc = MACHINE_CLASS(oc);
> @@ -2406,6 +2436,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      mc->query_hotpluggable_cpus = spapr_query_hotpluggable_cpus;
>      fwc->get_dev_path = spapr_get_fw_dev_path;
>      nc->nmi_monitor_handler = spapr_nmi;
> +    smc->phb_placement = spapr_phb_placement;
>  }
>  
>  static const TypeInfo spapr_machine_info = {
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 4f00865..0e6cf4d 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1311,7 +1311,8 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>          sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1;
>  
>      if (sphb->index != (uint32_t)-1) {
> -        hwaddr windows_base;
> +        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +        Error *local_err = NULL;
>  
>          if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] != (uint32_t)-1)
>              || (sphb->dma_liobn[1] != (uint32_t)-1 && windows_supported == 2)
> @@ -1322,21 +1323,13 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>              return;
>          }
>  
> -        if (sphb->index > SPAPR_PCI_MAX_INDEX) {
> -            error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
> -                       SPAPR_PCI_MAX_INDEX);
> +        smc->phb_placement(spapr, sphb->index, &sphb->buid,
> +                           &sphb->io_win_addr, &sphb->mem_win_addr,
> +                           windows_supported, sphb->dma_liobn, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
>              return;
>          }
> -
> -        sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
> -        for (i = 0; i < windows_supported; ++i) {
> -            sphb->dma_liobn[i] = SPAPR_PCI_LIOBN(sphb->index, i);
> -        }
> -
> -        windows_base = SPAPR_PCI_WINDOW_BASE
> -            + sphb->index * SPAPR_PCI_WINDOW_SPACING;
> -        sphb->mem_win_addr = windows_base + SPAPR_PCI_MMIO_WIN_OFF;
> -        sphb->io_win_addr = windows_base + SPAPR_PCI_IO_WIN_OFF;
>      }
>  
>      if (sphb->buid == (uint64_t)-1) {
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 30dbd46..8c9ebfd 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -79,18 +79,9 @@ struct sPAPRPHBState {
>      uint32_t numa_node;
>  };
>  
> -#define SPAPR_PCI_MAX_INDEX          255
> -
> -#define SPAPR_PCI_BASE_BUID          0x800000020000000ULL
> -
>  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
>  
> -#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
> -#define SPAPR_PCI_WINDOW_SPACING     0x1000000000ULL
> -#define SPAPR_PCI_MMIO_WIN_OFF       0xA0000000
> -#define SPAPR_PCI_MMIO_WIN_SIZE      (SPAPR_PCI_WINDOW_SPACING - \
> -                                     SPAPR_PCI_MEM_WIN_BUS_OFFSET)
> -#define SPAPR_PCI_IO_WIN_OFF         0x80000000
> +#define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
>  #define SPAPR_PCI_IO_WIN_SIZE        0x10000
>  
>  #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 39dadaa..a05783c 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -40,6 +40,9 @@ struct sPAPRMachineClass {
>      bool dr_lmb_enabled;       /* enable dynamic-reconfig/hotplug of LMBs */
>      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
>      const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
> +    void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
> +                          uint64_t *buid, hwaddr *pio, hwaddr *mmio,
> +                          unsigned n_dma, uint32_t *liobns, Error **errp);
>  };
>  
>  /**
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 5/7] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 5/7] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM David Gibson
@ 2016-10-13  7:35   ` Laurent Vivier
  0 siblings, 0 replies; 18+ messages in thread
From: Laurent Vivier @ 2016-10-13  7:35 UTC (permalink / raw)
  To: David Gibson, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh



On 13/10/2016 01:57, David Gibson wrote:
> Currently the default PCI host bridge for the 'pseries' machine type is
> constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
> guest memory space.  This means that if > 1TiB of guest RAM is specified,
> the RAM will collide with the PCI IO windows, causing serious problems.
> 
> Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
> there's a little unused space at the bottom of the area reserved for PCI,
> but essentially this means that > 1TiB of RAM has never worked with the
> pseries machine type.
> 
> This patch fixes this by altering the placement of PHBs on large-RAM VMs.
> Instead of always placing the first PHB at 1TiB, it is placed at the next
> 1 TiB boundary after the maximum RAM address.
> 
> Technically, this changes behaviour in a migration-breaking way for
> existing machines with > 1TiB maximum memory, but since having > 1 TiB
> memory was broken anyway, this seems like a reasonable trade-off.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Laurent Vivier <lvivier@redhat.com>

> ---
>  hw/ppc/spapr.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index cb9da96..e6b110d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2375,15 +2375,27 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>                                  unsigned n_dma, uint32_t *liobns, Error **errp)
>  {
>      const uint64_t base_buid = 0x800000020000000ULL;
> -    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
>      const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
>      const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
>      const hwaddr pio_offset = 0x80000000; /* 2 GiB */
>      const uint32_t max_index = 255;
> +    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
>  
> -    hwaddr phb_base;
> +    uint64_t ram_top = MACHINE(spapr)->ram_size;
> +    hwaddr phb0_base, phb_base;
>      int i;
>  
> +    /* Do we have hotpluggable memory? */
> +    if (MACHINE(spapr)->maxram_size > ram_top) {
> +        /* Can't just use maxram_size, because there may be an
> +         * alignment gap between normal and hotpluggable memory
> +         * regions */
> +        ram_top = spapr->hotplug_memory.base +
> +            memory_region_size(&spapr->hotplug_memory.mr);
> +    }
> +
> +    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> +
>      if (index > max_index) {
>          error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
>                     max_index);
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 6/7] spapr_pci: Add a 64-bit MMIO window
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 6/7] spapr_pci: Add a 64-bit MMIO window David Gibson
@ 2016-10-13  8:06   ` Laurent Vivier
  0 siblings, 0 replies; 18+ messages in thread
From: Laurent Vivier @ 2016-10-13  8:06 UTC (permalink / raw)
  To: David Gibson, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh



On 13/10/2016 01:57, David Gibson wrote:
> On real hardware, and under pHyp, the PCI host bridges on Power machines
> typically advertise two outbound MMIO windows from the guest's physical
> memory space to PCI memory space:
>   - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
>   - A 64-bit window which maps onto a large region somewhere high in PCI
>     address space (traditionally this used an identity mapping from guest
>     physical address to PCI address, but that's not always the case)
> 
> The qemu implementation in spapr-pci-host-bridge, however, only supports a
> single outbound MMIO window, however.  At least some Linux versions expect
> the two windows however, so we arranged this window to map onto the PCI
> memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
> windows, the "32-bit" window from 2G..4G and the "64-bit" window from
> 4G..~64G.
> 
> This approach means, however, that the 64G window is not naturally aligned.
> In turn this limits the size of the largest BAR we can map (which does have
> to be naturally aligned) to roughly half of the total window.  With some
> large nVidia GPGPU cards which have huge memory BARs, this is starting to
> be a problem.
> 
> This patch adds true support for separate 32-bit and 64-bit outbound MMIO
> windows to the spapr-pci-host-bridge implementation, each of which can
> be independently configured.  The 32-bit window always maps to 2G.. in PCI
> space, but the PCI address of the 64-bit window can be configured (it
> defaults to the same as the guest physical address).
> 
> So as not to break possible existing configurations, as long as a 64-bit
> window is not specified, a large single window can be specified.  This
> will appear the same way to the guest as the old approach, although it's
> now implemented by two contiguous memory regions rather than a single one.
> 
> For now, this only adds the possibility of 64-bit windows.  The default
> configuration still uses the legacy mode.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Laurent Vivier <lvivier@redhat.com>

> ---
>  hw/ppc/spapr.c              | 10 +++++--
>  hw/ppc/spapr_pci.c          | 70 ++++++++++++++++++++++++++++++++++++---------
>  include/hw/pci-host/spapr.h |  8 ++++--
>  include/hw/ppc/spapr.h      |  3 +-
>  4 files changed, 72 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e6b110d..8db3657 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2371,7 +2371,8 @@ static HotpluggableCPUList *spapr_query_hotpluggable_cpus(MachineState *machine)
>  }
>  
>  static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
> -                                uint64_t *buid, hwaddr *pio, hwaddr *mmio,
> +                                uint64_t *buid, hwaddr *pio,
> +                                hwaddr *mmio32, hwaddr *mmio64,
>                                  unsigned n_dma, uint32_t *liobns, Error **errp)
>  {
>      const uint64_t base_buid = 0x800000020000000ULL;
> @@ -2409,7 +2410,12 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>  
>      phb_base = phb0_base + index * phb_spacing;
>      *pio = phb_base + pio_offset;
> -    *mmio = phb_base + mmio_offset;
> +    *mmio32 = phb_base + mmio_offset;
> +    /*
> +     * We don't set the 64-bit MMIO window, relying on the PHB's
> +     * fallback behaviour of automatically splitting a large "32-bit"
> +     * window into contiguous 32-bit and 64-bit windows
> +     */
>  }
>  
>  static void spapr_machine_class_init(ObjectClass *oc, void *data)
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 0e6cf4d..31ca6fa 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1317,14 +1317,16 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>          if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] != (uint32_t)-1)
>              || (sphb->dma_liobn[1] != (uint32_t)-1 && windows_supported == 2)
>              || (sphb->mem_win_addr != (hwaddr)-1)
> +            || (sphb->mem64_win_addr != (hwaddr)-1)
>              || (sphb->io_win_addr != (hwaddr)-1)) {
>              error_setg(errp, "Either \"index\" or other parameters must"
>                         " be specified for PAPR PHB, not both");
>              return;
>          }
>  
> -        smc->phb_placement(spapr, sphb->index, &sphb->buid,
> -                           &sphb->io_win_addr, &sphb->mem_win_addr,
> +        smc->phb_placement(spapr, sphb->index,
> +                           &sphb->buid, &sphb->io_win_addr,
> +                           &sphb->mem_win_addr, &sphb->mem64_win_addr,
>                             windows_supported, sphb->dma_liobn, &local_err);
>          if (local_err) {
>              error_propagate(errp, local_err);
> @@ -1353,6 +1355,38 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    if (sphb->mem64_win_size != 0) {
> +        if (sphb->mem64_win_addr == (hwaddr)-1) {
> +            error_setg(errp,
> +                       "64-bit memory window address not specified for PHB");
> +            return;
> +        }
> +
> +        if (sphb->mem_win_size > SPAPR_PCI_MEM32_WIN_SIZE) {
> +            error_setg(errp, "32-bit memory window of size 0x%"HWADDR_PRIx
> +                       " (max 2 GiB)", sphb->mem_win_size);
> +            return;
> +        }
> +
> +        if (sphb->mem64_win_pciaddr == (hwaddr)-1) {
> +            /* 64-bit window defaults to identity mapping */
> +            sphb->mem64_win_pciaddr = sphb->mem64_win_addr;
> +        }
> +    } else if (sphb->mem_win_size > SPAPR_PCI_MEM32_WIN_SIZE) {
> +        /*
> +         * For compatibility with old configuration, if no 64-bit MMIO
> +         * window is specified, but the ordinary (32-bit) memory
> +         * window is specified as > 2GiB, we treat it as a 2GiB 32-bit
> +         * window, with a 64-bit MMIO window following on immediately
> +         * afterwards
> +         */
> +        sphb->mem64_win_size = sphb->mem_win_size - SPAPR_PCI_MEM32_WIN_SIZE;
> +        sphb->mem64_win_addr = sphb->mem_win_addr + SPAPR_PCI_MEM32_WIN_SIZE;
> +        sphb->mem64_win_pciaddr =
> +            SPAPR_PCI_MEM_WIN_BUS_OFFSET + SPAPR_PCI_MEM32_WIN_SIZE;
> +        sphb->mem_win_size = SPAPR_PCI_MEM32_WIN_SIZE;
> +    }
> +
>      if (spapr_pci_find_phb(spapr, sphb->buid)) {
>          error_setg(errp, "PCI host bridges must have unique BUIDs");
>          return;
> @@ -1366,12 +1400,19 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>      sprintf(namebuf, "%s.mmio", sphb->dtbusname);
>      memory_region_init(&sphb->memspace, OBJECT(sphb), namebuf, UINT64_MAX);
>  
> -    sprintf(namebuf, "%s.mmio-alias", sphb->dtbusname);
> -    memory_region_init_alias(&sphb->memwindow, OBJECT(sphb),
> +    sprintf(namebuf, "%s.mmio32-alias", sphb->dtbusname);
> +    memory_region_init_alias(&sphb->mem32window, OBJECT(sphb),
>                               namebuf, &sphb->memspace,
>                               SPAPR_PCI_MEM_WIN_BUS_OFFSET, sphb->mem_win_size);
>      memory_region_add_subregion(get_system_memory(), sphb->mem_win_addr,
> -                                &sphb->memwindow);
> +                                &sphb->mem32window);
> +
> +    sprintf(namebuf, "%s.mmio64-alias", sphb->dtbusname);
> +    memory_region_init_alias(&sphb->mem64window, OBJECT(sphb),
> +                             namebuf, &sphb->memspace,
> +                             sphb->mem64_win_pciaddr, sphb->mem64_win_size);
> +    memory_region_add_subregion(get_system_memory(), sphb->mem64_win_addr,
> +                                &sphb->mem64window);
>  
>      /* Initialize IO regions */
>      sprintf(namebuf, "%s.io", sphb->dtbusname);
> @@ -1525,6 +1566,10 @@ static Property spapr_phb_properties[] = {
>      DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
>      DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
>                         SPAPR_PCI_MMIO_WIN_SIZE),
> +    DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
> +    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
> +    DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
> +                       -1),
>      DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
>      DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
>                         SPAPR_PCI_IO_WIN_SIZE),
> @@ -1760,10 +1805,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>      int bus_off, i, j, ret;
>      char nodename[FDT_NAME_MAX];
>      uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
> -    const uint64_t mmiosize = memory_region_size(&phb->memwindow);
> -    const uint64_t w32max = (1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET;
> -    const uint64_t w32size = MIN(w32max, mmiosize);
> -    const uint64_t w64size = (mmiosize > w32size) ? (mmiosize - w32size) : 0;
>      struct {
>          uint32_t hi;
>          uint64_t child;
> @@ -1778,15 +1819,16 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>          {
>              cpu_to_be32(b_ss(2)), cpu_to_be64(SPAPR_PCI_MEM_WIN_BUS_OFFSET),
>              cpu_to_be64(phb->mem_win_addr),
> -            cpu_to_be64(w32size),
> +            cpu_to_be64(phb->mem_win_size),
>          },
>          {
> -            cpu_to_be32(b_ss(3)), cpu_to_be64(1ULL << 32),
> -            cpu_to_be64(phb->mem_win_addr + w32size),
> -            cpu_to_be64(w64size)
> +            cpu_to_be32(b_ss(3)), cpu_to_be64(phb->mem64_win_pciaddr),
> +            cpu_to_be64(phb->mem64_win_addr),
> +            cpu_to_be64(phb->mem64_win_size),
>          },
>      };
> -    const unsigned sizeof_ranges = (w64size ? 3 : 2) * sizeof(ranges[0]);
> +    const unsigned sizeof_ranges =
> +        (phb->mem64_win_size ? 3 : 2) * sizeof(ranges[0]);
>      uint64_t bus_reg[] = { cpu_to_be64(phb->buid), 0 };
>      uint32_t interrupt_map_mask[] = {
>          cpu_to_be32(b_ddddd(-1)|b_fff(0)), 0x0, 0x0, cpu_to_be32(-1)};
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 8c9ebfd..23dfb09 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -53,8 +53,10 @@ struct sPAPRPHBState {
>      bool dr_enabled;
>  
>      MemoryRegion memspace, iospace;
> -    hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size;
> -    MemoryRegion memwindow, iowindow, msiwindow;
> +    hwaddr mem_win_addr, mem_win_size, mem64_win_addr, mem64_win_size;
> +    uint64_t mem64_win_pciaddr;
> +    hwaddr io_win_addr, io_win_size;
> +    MemoryRegion mem32window, mem64window, iowindow, msiwindow;
>  
>      uint32_t dma_liobn[SPAPR_PCI_DMA_MAX_WINDOWS];
>      hwaddr dma_win_addr, dma_win_size;
> @@ -80,6 +82,8 @@ struct sPAPRPHBState {
>  };
>  
>  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
> +#define SPAPR_PCI_MEM32_WIN_SIZE     \
> +    ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
>  
>  #define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
>  #define SPAPR_PCI_IO_WIN_SIZE        0x10000
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index a05783c..aeaba3e 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -41,7 +41,8 @@ struct sPAPRMachineClass {
>      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
>      const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
>      void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
> -                          uint64_t *buid, hwaddr *pio, hwaddr *mmio,
> +                          uint64_t *buid, hwaddr *pio, 
> +                          hwaddr *mmio32, hwaddr *mmio64,
>                            unsigned n_dma, uint32_t *liobns, Error **errp);
>  };
>  
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map David Gibson
@ 2016-10-13  8:40   ` Laurent Vivier
  2016-10-13 23:29     ` David Gibson
  0 siblings, 1 reply; 18+ messages in thread
From: Laurent Vivier @ 2016-10-13  8:40 UTC (permalink / raw)
  To: David Gibson, agraf
  Cc: mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh



On 13/10/2016 01:57, David Gibson wrote:
> Currently, the MMIO space for accessing PCI on pseries guests begins at
> 1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
> chunk of address space in which it places its outbound PIO and 32-bit and
> 64-bit MMIO windows.
> 
> This scheme as several problems:
>   - It limits guest RAM to 1 TiB (though we have a limited fix for this
>     now)
>   - It limits the total MMIO window to 64 GiB.  This is not always enough
>     for some of the large nVidia GPGPU cards
>   - Putting all the windows into a single 64 GiB area means that naturally
>     aligning things within there will waste more address space.
> In addition there was a miscalculation in some of the defaults, which meant
> that the MMIO windows for each PHB actually slightly overran the 64 GiB
> region for that PHB.  We got away without nasty consequences because
> the overrun fit within an unused area at the beginning of the next PHB's
> region, but it's not pretty.
> 
> This patch implements a new scheme which addresses those problems, and is
> also closer to what bare metal hardware and pHyp guests generally use.
> 
> Because some guest versions (including most current distro kernels) can't
> access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> 64 TiB.  This is broken into 1 TiB chunks.  The 1 TiB contains the PIO
> (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
> subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
> one PHB each.
> 
> This reduces the number of allowed PHBs (without full manual configuration
> of all the windows) from 256 to 31, but this should still be plenty in
> practice.
> 
> We also change some of the default window sizes for manually configured
> PHBs to saner values.
> 
> Finally we adjust some tests and libqos so that it correctly uses the new
> default locations.  Ideally it would parse the device tree given to the
> guest, but that's a more complex problem for another time.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr.c              | 126 +++++++++++++++++++++++++++++++++-----------
>  hw/ppc/spapr_pci.c          |   5 +-
>  include/hw/pci-host/spapr.h |   8 ++-
>  tests/endianness-test.c     |   3 +-
>  tests/libqos/pci-spapr.c    |   9 ++--
>  tests/spapr-phb-test.c      |   2 +-
>  6 files changed, 113 insertions(+), 40 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 8db3657..2d952a8 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2375,31 +2375,42 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>                                  hwaddr *mmio32, hwaddr *mmio64,
>                                  unsigned n_dma, uint32_t *liobns, Error **errp)
>  {
> +    /*
> +     * New-style PHB window placement.
> +     *
> +     * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> +     * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> +     * windows.
> +     *
> +     * Some guest kernels can't work with MMIO windows above 1<<46
> +     * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> +     *
> +     * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> +     * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> +     * has the 64-bit window for PHB1 and so forth.
> +     */
>      const uint64_t base_buid = 0x800000020000000ULL;
> -    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> -    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> -    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> -    const uint32_t max_index = 255;
> -    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
>  
> -    uint64_t ram_top = MACHINE(spapr)->ram_size;
> -    hwaddr phb0_base, phb_base;
> +    int max_phbs =
> +        (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
> +    hwaddr mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;

The result is the same but I would add SPAPR_PCI_MEM_WIN_BUS_OFFSET
instead of SPAPR_PCI_MEM32_WIN_SIZE.

As SPAPR_PCI_MEM32_WIN_SIZE is defined as "((1ULL << 32) -
SPAPR_PCI_MEM_WIN_BUS_OFFSET)", I guess 0..SPAPR_PCI_MEM_WIN_BUS_OFFSET
is for PIO and SPAPR_PCI_MEM_WIN_BUS_OFFSET..(1<<32) is for MMIO.

This is what we have below with:

    *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
    *mmio32 = mmio32_base + index * SPAPR_PCI_MEM32_WIN_SIZE;

Perhaps we can see *mmio32 as

    SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE

But in this case you should directly defined SPAPR_PCI_MEM32_WIN_SIZE as
0x80000000, not as "SIZE - OFFSET". It's confusing...

If it is only a misunderstanding from my side, you can add my:

Reviewed-by: Laurent Vivier <lvivier@redhat.com>

Thanks,
Laurent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map
  2016-10-13  8:40   ` Laurent Vivier
@ 2016-10-13 23:29     ` David Gibson
  2016-10-14  7:25       ` Laurent Vivier
  0 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-10-13 23:29 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: agraf, mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh

[-- Attachment #1: Type: text/plain, Size: 20088 bytes --]

On Thu, Oct 13, 2016 at 10:40:32AM +0200, Laurent Vivier wrote:
> 
> 
> On 13/10/2016 01:57, David Gibson wrote:
> > Currently, the MMIO space for accessing PCI on pseries guests begins at
> > 1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
> > chunk of address space in which it places its outbound PIO and 32-bit and
> > 64-bit MMIO windows.
> > 
> > This scheme as several problems:
> >   - It limits guest RAM to 1 TiB (though we have a limited fix for this
> >     now)
> >   - It limits the total MMIO window to 64 GiB.  This is not always enough
> >     for some of the large nVidia GPGPU cards
> >   - Putting all the windows into a single 64 GiB area means that naturally
> >     aligning things within there will waste more address space.
> > In addition there was a miscalculation in some of the defaults, which meant
> > that the MMIO windows for each PHB actually slightly overran the 64 GiB
> > region for that PHB.  We got away without nasty consequences because
> > the overrun fit within an unused area at the beginning of the next PHB's
> > region, but it's not pretty.
> > 
> > This patch implements a new scheme which addresses those problems, and is
> > also closer to what bare metal hardware and pHyp guests generally use.
> > 
> > Because some guest versions (including most current distro kernels) can't
> > access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> > 64 TiB.  This is broken into 1 TiB chunks.  The 1 TiB contains the PIO
> > (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
> > subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
> > one PHB each.
> > 
> > This reduces the number of allowed PHBs (without full manual configuration
> > of all the windows) from 256 to 31, but this should still be plenty in
> > practice.
> > 
> > We also change some of the default window sizes for manually configured
> > PHBs to saner values.
> > 
> > Finally we adjust some tests and libqos so that it correctly uses the new
> > default locations.  Ideally it would parse the device tree given to the
> > guest, but that's a more complex problem for another time.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/ppc/spapr.c              | 126 +++++++++++++++++++++++++++++++++-----------
> >  hw/ppc/spapr_pci.c          |   5 +-
> >  include/hw/pci-host/spapr.h |   8 ++-
> >  tests/endianness-test.c     |   3 +-
> >  tests/libqos/pci-spapr.c    |   9 ++--
> >  tests/spapr-phb-test.c      |   2 +-
> >  6 files changed, 113 insertions(+), 40 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 8db3657..2d952a8 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2375,31 +2375,42 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
> >                                  hwaddr *mmio32, hwaddr *mmio64,
> >                                  unsigned n_dma, uint32_t *liobns, Error **errp)
> >  {
> > +    /*
> > +     * New-style PHB window placement.
> > +     *
> > +     * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> > +     * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> > +     * windows.
> > +     *
> > +     * Some guest kernels can't work with MMIO windows above 1<<46
> > +     * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> > +     *
> > +     * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> > +     * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> > +     * has the 64-bit window for PHB1 and so forth.
> > +     */
> >      const uint64_t base_buid = 0x800000020000000ULL;
> > -    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> > -    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> > -    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> > -    const uint32_t max_index = 255;
> > -    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> >  
> > -    uint64_t ram_top = MACHINE(spapr)->ram_size;
> > -    hwaddr phb0_base, phb_base;
> > +    int max_phbs =
> > +        (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
> > +    hwaddr mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;
> 
> The result is the same but I would add SPAPR_PCI_MEM_WIN_BUS_OFFSET
> instead of SPAPR_PCI_MEM32_WIN_SIZE.
> 
> As SPAPR_PCI_MEM32_WIN_SIZE is defined as "((1ULL << 32) -
> SPAPR_PCI_MEM_WIN_BUS_OFFSET)", I guess 0..SPAPR_PCI_MEM_WIN_BUS_OFFSET
> is for PIO and SPAPR_PCI_MEM_WIN_BUS_OFFSET..(1<<32) is for MMIO.

No, not quite.

> This is what we have below with:
> 
>     *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
>     *mmio32 = mmio32_base + index * SPAPR_PCI_MEM32_WIN_SIZE;
> 
> Perhaps we can see *mmio32 as
> 
>     SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE

That's the intended effect.  Basically we take 32..64TiB and divide it
into 1TiB regions.  Regions 1..31 are mmio64 windows.  Region 0 is
subdivided into 2GiB subregions.  Subregions 1..whatever are mmio32
windows, subregion 0 we subdivide into 64kiB pio windows.

So, all the PIO windows are stacked together, all the mmio32 windows
are stacked together and all the mmio64 windows are stacked together,
to give us minimim address space wastage while preserving natural
alignment of everything.

> But in this case you should directly defined SPAPR_PCI_MEM32_WIN_SIZE as
> 0x80000000, not as "SIZE - OFFSET". It's confusing...

No, I think the size - offset makes sense.  By definition the mmio32
window can't extend past 4G in PCI space.
SPAPR_PCI_MEM_WIN_BUS_OFFSET is the start of the window in PCI space,
so 4G - bus_offset is necessarily the size of the window.

The assert()s are there to make sure all those things really are
naturally aligned as intended.

> If it is only a misunderstanding from my side, you can add my:
> 
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>

So, I've adjusted my patch a little to try to make this clearer.  I've
updated the comment, used explicit (index + 1) logic and used compile
time rather than run-time assert()s.  Please review.

From 98f07c09c6d72218800d6cfbf44b973a88ece2aa Mon Sep 17 00:00:00 2001
From: David Gibson <david@gibson.dropbear.id.au>
Date: Fri, 14 Oct 2016 10:21:00 +1100
Subject: [PATCH] spapr: Improved placement of PCI host bridges in guest memory
 map

Currently, the MMIO space for accessing PCI on pseries guests begins at
1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
chunk of address space in which it places its outbound PIO and 32-bit and
64-bit MMIO windows.

This scheme as several problems:
  - It limits guest RAM to 1 TiB (though we have a limited fix for this
    now)
  - It limits the total MMIO window to 64 GiB.  This is not always enough
    for some of the large nVidia GPGPU cards
  - Putting all the windows into a single 64 GiB area means that naturally
    aligning things within there will waste more address space.
In addition there was a miscalculation in some of the defaults, which meant
that the MMIO windows for each PHB actually slightly overran the 64 GiB
region for that PHB.  We got away without nasty consequences because
the overrun fit within an unused area at the beginning of the next PHB's
region, but it's not pretty.

This patch implements a new scheme which addresses those problems, and is
also closer to what bare metal hardware and pHyp guests generally use.

Because some guest versions (including most current distro kernels) can't
access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
64 TiB.  This is broken into 1 TiB chunks.  The 1 TiB contains the PIO
(64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
one PHB each.

This reduces the number of allowed PHBs (without full manual configuration
of all the windows) from 256 to 31, but this should still be plenty in
practice.

We also change some of the default window sizes for manually configured
PHBs to saner values.

Finally we adjust some tests and libqos so that it correctly uses the new
default locations.  Ideally it would parse the device tree given to the
guest, but that's a more complex problem for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c              | 121 +++++++++++++++++++++++++++++++++-----------
 hw/ppc/spapr_pci.c          |   5 +-
 include/hw/pci-host/spapr.h |   8 ++-
 tests/endianness-test.c     |   3 +-
 tests/libqos/pci-spapr.c    |   9 ++--
 tests/spapr-phb-test.c      |   2 +-
 6 files changed, 108 insertions(+), 40 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8db3657..4bdf15b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2375,31 +2375,37 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
                                 hwaddr *mmio32, hwaddr *mmio64,
                                 unsigned n_dma, uint32_t *liobns, Error **errp)
 {
+    /*
+     * New-style PHB window placement.
+     *
+     * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
+     * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
+     * windows.
+     *
+     * Some guest kernels can't work with MMIO windows above 1<<46
+     * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
+     *
+     * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
+     * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
+     * has the 64-bit window for PHB1 and so forth.
+     */
     const uint64_t base_buid = 0x800000020000000ULL;
-    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
-    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
-    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
-    const uint32_t max_index = 255;
-    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
-
-    uint64_t ram_top = MACHINE(spapr)->ram_size;
-    hwaddr phb0_base, phb_base;
+    const int max_phbs =
+        (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
     int i;
 
-    /* Do we have hotpluggable memory? */
-    if (MACHINE(spapr)->maxram_size > ram_top) {
-        /* Can't just use maxram_size, because there may be an
-         * alignment gap between normal and hotpluggable memory
-         * regions */
-        ram_top = spapr->hotplug_memory.base +
-            memory_region_size(&spapr->hotplug_memory.mr);
-    }
-
-    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
+    /* Sanity check natural alignments */
+    QEMU_BUILD_BUG_ON((SPAPR_PCI_BASE % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
+    QEMU_BUILD_BUG_ON((SPAPR_PCI_LIMIT % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
+    QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM64_WIN_SIZE % SPAPR_PCI_MEM32_WIN_SIZE) != 0);
+    QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM32_WIN_SIZE % SPAPR_PCI_IO_WIN_SIZE) != 0);
+    /* Sanity check bounds */
+    QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_IO_WIN_SIZE) > SPAPR_PCI_MEM32_WIN_SIZE);
+    QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_MEM32_WIN_SIZE) > SPAPR_PCI_MEM64_WIN_SIZE);
 
-    if (index > max_index) {
+    if (index >= max_phbs) {
         error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
-                   max_index);
+                   max_phbs - 1);
         return;
     }
 
@@ -2408,14 +2414,9 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
         liobns[i] = SPAPR_PCI_LIOBN(index, i);
     }
 
-    phb_base = phb0_base + index * phb_spacing;
-    *pio = phb_base + pio_offset;
-    *mmio32 = phb_base + mmio_offset;
-    /*
-     * We don't set the 64-bit MMIO window, relying on the PHB's
-     * fallback behaviour of automatically splitting a large "32-bit"
-     * window into contiguous 32-bit and 64-bit windows
-     */
+    *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
+    *mmio32 = SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE;
+    *mmio64 = SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM64_WIN_SIZE;
 }
 
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
@@ -2519,8 +2520,67 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
 /*
  * pseries-2.7
  */
-#define SPAPR_COMPAT_2_7 \
-    HW_COMPAT_2_7 \
+#define SPAPR_COMPAT_2_7                            \
+    HW_COMPAT_2_7                                   \
+    {                                               \
+        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
+        .property = "mem_win_size",                 \
+        .value    = stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),\
+    },                                              \
+    {                                               \
+        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
+        .property = "mem64_win_size",               \
+        .value    = "0",                            \
+    },
+
+static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
+                              uint64_t *buid, hwaddr *pio,
+                              hwaddr *mmio32, hwaddr *mmio64,
+                              unsigned n_dma, uint32_t *liobns, Error **errp)
+{
+    /* Legacy PHB placement for pseries-2.7 and earlier machine types */
+    const uint64_t base_buid = 0x800000020000000ULL;
+    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
+    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
+    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
+    const uint32_t max_index = 255;
+    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
+
+    uint64_t ram_top = MACHINE(spapr)->ram_size;
+    hwaddr phb0_base, phb_base;
+    int i;
+
+    /* Do we have hotpluggable memory? */
+    if (MACHINE(spapr)->maxram_size > ram_top) {
+        /* Can't just use maxram_size, because there may be an
+         * alignment gap between normal and hotpluggable memory
+         * regions */
+        ram_top = spapr->hotplug_memory.base +
+            memory_region_size(&spapr->hotplug_memory.mr);
+    }
+
+    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
+
+    if (index > max_index) {
+        error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
+                   max_index);
+        return;
+    }
+
+    *buid = base_buid + index;
+    for (i = 0; i < n_dma; ++i) {
+        liobns[i] = SPAPR_PCI_LIOBN(index, i);
+    }
+
+    phb_base = phb0_base + index * phb_spacing;
+    *pio = phb_base + pio_offset;
+    *mmio32 = phb_base + mmio_offset;
+    /*
+     * We don't set the 64-bit MMIO window, relying on the PHB's
+     * fallback behaviour of automatically splitting a large "32-bit"
+     * window into contiguous 32-bit and 64-bit windows
+     */
+}
 
 static void spapr_machine_2_7_instance_options(MachineState *machine)
 {
@@ -2533,6 +2593,7 @@ static void spapr_machine_2_7_class_options(MachineClass *mc)
     spapr_machine_2_8_class_options(mc);
     smc->tcg_default_cpu = "POWER7";
     SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_7);
+    smc->phb_placement = phb_placement_2_7;
 }
 
 DEFINE_SPAPR_MACHINE(2_7, "2.7", false);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 31ca6fa..3ef6a26 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1565,9 +1565,10 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_UINT32("liobn64", sPAPRPHBState, dma_liobn[1], -1),
     DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
     DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
-                       SPAPR_PCI_MMIO_WIN_SIZE),
+                       SPAPR_PCI_MEM32_WIN_SIZE),
     DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
-    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
+    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size,
+                       SPAPR_PCI_MEM64_WIN_SIZE),
     DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
                        -1),
     DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 23dfb09..b92c1b5 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -84,8 +84,14 @@ struct sPAPRPHBState {
 #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
 #define SPAPR_PCI_MEM32_WIN_SIZE     \
     ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
+#define SPAPR_PCI_MEM64_WIN_SIZE     0x10000000000ULL /* 1 TiB */
 
-#define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
+/* Without manual configuration, all PCI outbound windows will be
+ * within this range */
+#define SPAPR_PCI_BASE               (1ULL << 45) /* 32 TiB */
+#define SPAPR_PCI_LIMIT              (1ULL << 46) /* 64 TiB */
+
+#define SPAPR_PCI_2_7_MMIO_WIN_SIZE  0xf80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
 
 #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
diff --git a/tests/endianness-test.c b/tests/endianness-test.c
index b7a120e..cf8d41b 100644
--- a/tests/endianness-test.c
+++ b/tests/endianness-test.c
@@ -38,7 +38,8 @@ static const TestCase test_cases[] = {
     { "ppc", "prep", 0x80000000, .bswap = true },
     { "ppc", "bamboo", 0xe8000000, .bswap = true, .superio = "i82378" },
     { "ppc64", "mac99", 0xf2000000, .bswap = true, .superio = "i82378" },
-    { "ppc64", "pseries", 0x10080000000ULL,
+    { "ppc64", "pseries", (1ULL << 45), .bswap = true, .superio = "i82378" },
+    { "ppc64", "pseries-2.7", 0x10080000000ULL,
       .bswap = true, .superio = "i82378" },
     { "sh4", "r2d", 0xfe240000, .superio = "i82378" },
     { "sh4eb", "r2d", 0xfe240000, .bswap = true, .superio = "i82378" },
diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c
index 558dfc3..2eaaf91 100644
--- a/tests/libqos/pci-spapr.c
+++ b/tests/libqos/pci-spapr.c
@@ -235,10 +235,9 @@ static void qpci_spapr_iounmap(QPCIBus *bus, void *data)
     /* FIXME */
 }
 
-#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
-#define SPAPR_PCI_MMIO32_WIN_OFF     0xA0000000
+#define SPAPR_PCI_BASE               (1ULL << 45)
+
 #define SPAPR_PCI_MMIO32_WIN_SIZE    0x80000000 /* 2 GiB */
-#define SPAPR_PCI_IO_WIN_OFF         0x80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
 
 QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
@@ -273,12 +272,12 @@ QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
      * get the window locations */
     ret->buid = 0x800000020000000ULL;
 
-    ret->pio_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_IO_WIN_OFF;
+    ret->pio_cpu_base = SPAPR_PCI_BASE;
     ret->pio.pci_base = 0;
     ret->pio.size = SPAPR_PCI_IO_WIN_SIZE;
 
     /* 32-bit portion of the MMIO window is at PCI address 2..4 GiB */
-    ret->mmio32_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MMIO32_WIN_OFF;
+    ret->mmio32_cpu_base = SPAPR_PCI_BASE + SPAPR_PCI_MMIO32_WIN_SIZE;
     ret->mmio32.pci_base = 0x80000000; /* 2 GiB */
     ret->mmio32.size = SPAPR_PCI_MMIO32_WIN_SIZE;
 
diff --git a/tests/spapr-phb-test.c b/tests/spapr-phb-test.c
index 21004a7..d3522ea 100644
--- a/tests/spapr-phb-test.c
+++ b/tests/spapr-phb-test.c
@@ -25,7 +25,7 @@ int main(int argc, char **argv)
     g_test_init(&argc, &argv, NULL);
     qtest_add_func("/spapr-phb/device", test_phb_device);
 
-    qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=100");
+    qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=30");
 
     ret = g_test_run();
 
-- 
2.7.4



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries
  2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
                   ` (6 preceding siblings ...)
  2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map David Gibson
@ 2016-10-14  4:10 ` no-reply
  2016-10-14  5:41   ` David Gibson
  7 siblings, 1 reply; 18+ messages in thread
From: no-reply @ 2016-10-14  4:10 UTC (permalink / raw)
  To: david
  Cc: famz, lvivier, agraf, mst, aik, mdroth, qemu-devel, qemu-ppc, abologna

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1476316647-9433-1-git-send-email-david@gibson.dropbear.id.au
Subject: [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
8fa0718 spapr: Improved placement of PCI host bridges in guest memory map
6416268 spapr_pci: Add a 64-bit MMIO window
94e8f4c spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM
692718a spapr_pci: Delegate placement of PCI host bridges to machine type
2b765c9 libqos: Limit spapr-pci to 32-bit MMIO for now
4543692 libqos: Correct error in PCI hole sizing for spapr
2c35727 libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()

=== OUTPUT BEGIN ===
Checking PATCH 1/7: libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()...
Checking PATCH 2/7: libqos: Correct error in PCI hole sizing for spapr...
Checking PATCH 3/7: libqos: Limit spapr-pci to 32-bit MMIO for now...
Checking PATCH 4/7: spapr_pci: Delegate placement of PCI host bridges to machine type...
Checking PATCH 5/7: spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM...
Checking PATCH 6/7: spapr_pci: Add a 64-bit MMIO window...
ERROR: trailing whitespace
#237: FILE: include/hw/ppc/spapr.h:44:
+                          uint64_t *buid, hwaddr *pio, $

total: 1 errors, 0 warnings, 170 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 7/7: spapr: Improved placement of PCI host bridges in guest memory map...
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#141: FILE: hw/ppc/spapr.c:2528:
+#define SPAPR_COMPAT_2_7                            \
+    HW_COMPAT_2_7                                   \
+    {                                               \
+        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
+        .property = "mem_win_size",                 \
+        .value    = stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),\
+    },                                              \
+    {                                               \
+        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
+        .property = "mem64_win_size",               \
+        .value    = "0",                            \
+    },

total: 1 errors, 0 warnings, 225 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries
  2016-10-14  4:10 ` [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries no-reply
@ 2016-10-14  5:41   ` David Gibson
  2016-10-14  7:07     ` Laurent Vivier
  0 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-10-14  5:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, lvivier, agraf, mst, aik, mdroth, qemu-ppc, abologna

[-- Attachment #1: Type: text/plain, Size: 3020 bytes --]

On Thu, Oct 13, 2016 at 09:10:38PM -0700, no-reply@patchew.org wrote:
> Hi,
> 
> Your series seems to have some coding style problems. See output below for
> more information:

So.. this one is a fairly standard false positive.  checkpatch
complains about multiple statements in a macro when it's actually a
macro that's supposed to expand within a data structure initializer.

> Type: series
> Message-id: 1476316647-9433-1-git-send-email-david@gibson.dropbear.id.au
> Subject: [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> 
> BASE=base
> n=1
> total=$(git log --oneline $BASE.. | wc -l)
> failed=0
> 
> # Useful git options
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> 
> commits="$(git log --format=%H --reverse $BASE..)"
> for c in $commits; do
>     echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
>     if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
>         failed=1
>         echo
>     fi
>     n=$((n+1))
> done
> 
> exit $failed
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> Switched to a new branch 'test'
> 8fa0718 spapr: Improved placement of PCI host bridges in guest memory map
> 6416268 spapr_pci: Add a 64-bit MMIO window
> 94e8f4c spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM
> 692718a spapr_pci: Delegate placement of PCI host bridges to machine type
> 2b765c9 libqos: Limit spapr-pci to 32-bit MMIO for now
> 4543692 libqos: Correct error in PCI hole sizing for spapr
> 2c35727 libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()
> 
> === OUTPUT BEGIN ===
> Checking PATCH 1/7: libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()...
> Checking PATCH 2/7: libqos: Correct error in PCI hole sizing for spapr...
> Checking PATCH 3/7: libqos: Limit spapr-pci to 32-bit MMIO for now...
> Checking PATCH 4/7: spapr_pci: Delegate placement of PCI host bridges to machine type...
> Checking PATCH 5/7: spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM...
> Checking PATCH 6/7: spapr_pci: Add a 64-bit MMIO window...
> ERROR: trailing whitespace
> #237: FILE: include/hw/ppc/spapr.h:44:
> +                          uint64_t *buid, hwaddr *pio, $
> 
> total: 1 errors, 0 warnings, 170 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.

In attempting to follow the suggestion above, I discovered that
checkpatch has no listed maintainer.  Maybe we should change that
message to "if any of these errors are false positives, then too bad,
suck it up"...

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries
  2016-10-14  5:41   ` David Gibson
@ 2016-10-14  7:07     ` Laurent Vivier
  0 siblings, 0 replies; 18+ messages in thread
From: Laurent Vivier @ 2016-10-14  7:07 UTC (permalink / raw)
  To: David Gibson, qemu-devel
  Cc: famz, agraf, mst, aik, mdroth, qemu-ppc, abologna



On 14/10/2016 07:41, David Gibson wrote:
> On Thu, Oct 13, 2016 at 09:10:38PM -0700, no-reply@patchew.org wrote:
>> Hi,
>>
>> Your series seems to have some coding style problems. See output below for
>> more information:
> 
> So.. this one is a fairly standard false positive.  checkpatch
> complains about multiple statements in a macro when it's actually a
> macro that's supposed to expand within a data structure initializer.
> 
>> Type: series
>> Message-id: 1476316647-9433-1-git-send-email-david@gibson.dropbear.id.au
>> Subject: [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries
>>
>> === TEST SCRIPT BEGIN ===
>> #!/bin/bash
>>
>> BASE=base
>> n=1
>> total=$(git log --oneline $BASE.. | wc -l)
>> failed=0
>>
>> # Useful git options
>> git config --local diff.renamelimit 0
>> git config --local diff.renames True
>>
>> commits="$(git log --format=%H --reverse $BASE..)"
>> for c in $commits; do
>>     echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
>>     if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
>>         failed=1
>>         echo
>>     fi
>>     n=$((n+1))
>> done
>>
>> exit $failed
>> === TEST SCRIPT END ===
>>
>> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
>> Switched to a new branch 'test'
>> 8fa0718 spapr: Improved placement of PCI host bridges in guest memory map
>> 6416268 spapr_pci: Add a 64-bit MMIO window
>> 94e8f4c spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM
>> 692718a spapr_pci: Delegate placement of PCI host bridges to machine type
>> 2b765c9 libqos: Limit spapr-pci to 32-bit MMIO for now
>> 4543692 libqos: Correct error in PCI hole sizing for spapr
>> 2c35727 libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()
>>
>> === OUTPUT BEGIN ===
>> Checking PATCH 1/7: libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()...
>> Checking PATCH 2/7: libqos: Correct error in PCI hole sizing for spapr...
>> Checking PATCH 3/7: libqos: Limit spapr-pci to 32-bit MMIO for now...
>> Checking PATCH 4/7: spapr_pci: Delegate placement of PCI host bridges to machine type...
>> Checking PATCH 5/7: spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM...
>> Checking PATCH 6/7: spapr_pci: Add a 64-bit MMIO window...
>> ERROR: trailing whitespace
>> #237: FILE: include/hw/ppc/spapr.h:44:
>> +                          uint64_t *buid, hwaddr *pio, $
>>
>> total: 1 errors, 0 warnings, 170 lines checked
>>
>> Your patch has style problems, please review.  If any of these errors
>> are false positives report them to the maintainer, see
>> CHECKPATCH in MAINTAINERS.
> 
> In attempting to follow the suggestion above, I discovered that
> checkpatch has no listed maintainer.  Maybe we should change that
> message to "if any of these errors are false positives, then too bad,
> suck it up"...
> 

You can try to report to Paolo:

./scripts/get_maintainer.pl -f scripts/checkpatch.pl
get_maintainer.pl: No maintainers found, printing recent contributors.
get_maintainer.pl: Do not blindly cc: them on patches!  Use common sense.

Paolo Bonzini <pbonzini@redhat.com> (commit_signer:13/19=68%)
Markus Armbruster <armbru@redhat.com> (commit_signer:6/19=32%)
Eric Blake <eblake@redhat.com> (commit_signer:4/19=21%)
Stefan Hajnoczi <stefanha@redhat.com> (commit_signer:3/19=16%)
Peter Maydell <peter.maydell@linaro.org> (commit_signer:2/19=11%)
qemu-devel@nongnu.org (open list:All patches CC here)


Laurent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map
  2016-10-13 23:29     ` David Gibson
@ 2016-10-14  7:25       ` Laurent Vivier
  2016-10-16  1:06         ` David Gibson
  0 siblings, 1 reply; 18+ messages in thread
From: Laurent Vivier @ 2016-10-14  7:25 UTC (permalink / raw)
  To: David Gibson
  Cc: agraf, mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh

On 14/10/2016 01:29, David Gibson wrote:
> From 98f07c09c6d72218800d6cfbf44b973a88ece2aa Mon Sep 17 00:00:00 2001
> From: David Gibson <david@gibson.dropbear.id.au>
> Date: Fri, 14 Oct 2016 10:21:00 +1100
> Subject: [PATCH] spapr: Improved placement of PCI host bridges in guest memory
>  map
> 
> Currently, the MMIO space for accessing PCI on pseries guests begins at
> 1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
> chunk of address space in which it places its outbound PIO and 32-bit and
> 64-bit MMIO windows.
> 
> This scheme as several problems:
>   - It limits guest RAM to 1 TiB (though we have a limited fix for this
>     now)
>   - It limits the total MMIO window to 64 GiB.  This is not always enough
>     for some of the large nVidia GPGPU cards
>   - Putting all the windows into a single 64 GiB area means that naturally
>     aligning things within there will waste more address space.
> In addition there was a miscalculation in some of the defaults, which meant
> that the MMIO windows for each PHB actually slightly overran the 64 GiB
> region for that PHB.  We got away without nasty consequences because
> the overrun fit within an unused area at the beginning of the next PHB's
> region, but it's not pretty.
> 
> This patch implements a new scheme which addresses those problems, and is
> also closer to what bare metal hardware and pHyp guests generally use.
> 
> Because some guest versions (including most current distro kernels) can't
> access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> 64 TiB.  This is broken into 1 TiB chunks.  The 1 TiB contains the PIO
> (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
> subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
> one PHB each.
> 
> This reduces the number of allowed PHBs (without full manual configuration
> of all the windows) from 256 to 31, but this should still be plenty in
> practice.
> 
> We also change some of the default window sizes for manually configured
> PHBs to saner values.
> 
> Finally we adjust some tests and libqos so that it correctly uses the new
> default locations.  Ideally it would parse the device tree given to the
> guest, but that's a more complex problem for another time.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>


I really like this new version.

Reviewed-by: Laurent Vivier <lvivier@redhat.com>

Laurent

> ---
>  hw/ppc/spapr.c              | 121 +++++++++++++++++++++++++++++++++-----------
>  hw/ppc/spapr_pci.c          |   5 +-
>  include/hw/pci-host/spapr.h |   8 ++-
>  tests/endianness-test.c     |   3 +-
>  tests/libqos/pci-spapr.c    |   9 ++--
>  tests/spapr-phb-test.c      |   2 +-
>  6 files changed, 108 insertions(+), 40 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 8db3657..4bdf15b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2375,31 +2375,37 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>                                  hwaddr *mmio32, hwaddr *mmio64,
>                                  unsigned n_dma, uint32_t *liobns, Error **errp)
>  {
> +    /*
> +     * New-style PHB window placement.
> +     *
> +     * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> +     * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> +     * windows.
> +     *
> +     * Some guest kernels can't work with MMIO windows above 1<<46
> +     * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> +     *
> +     * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> +     * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> +     * has the 64-bit window for PHB1 and so forth.
> +     */
>      const uint64_t base_buid = 0x800000020000000ULL;
> -    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> -    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> -    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> -    const uint32_t max_index = 255;
> -    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> -
> -    uint64_t ram_top = MACHINE(spapr)->ram_size;
> -    hwaddr phb0_base, phb_base;
> +    const int max_phbs =
> +        (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
>      int i;
>  
> -    /* Do we have hotpluggable memory? */
> -    if (MACHINE(spapr)->maxram_size > ram_top) {
> -        /* Can't just use maxram_size, because there may be an
> -         * alignment gap between normal and hotpluggable memory
> -         * regions */
> -        ram_top = spapr->hotplug_memory.base +
> -            memory_region_size(&spapr->hotplug_memory.mr);
> -    }
> -
> -    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> +    /* Sanity check natural alignments */
> +    QEMU_BUILD_BUG_ON((SPAPR_PCI_BASE % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
> +    QEMU_BUILD_BUG_ON((SPAPR_PCI_LIMIT % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
> +    QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM64_WIN_SIZE % SPAPR_PCI_MEM32_WIN_SIZE) != 0);
> +    QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM32_WIN_SIZE % SPAPR_PCI_IO_WIN_SIZE) != 0);
> +    /* Sanity check bounds */
> +    QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_IO_WIN_SIZE) > SPAPR_PCI_MEM32_WIN_SIZE);
> +    QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_MEM32_WIN_SIZE) > SPAPR_PCI_MEM64_WIN_SIZE);
>  
> -    if (index > max_index) {
> +    if (index >= max_phbs) {
>          error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
> -                   max_index);
> +                   max_phbs - 1);
>          return;
>      }
>  
> @@ -2408,14 +2414,9 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>          liobns[i] = SPAPR_PCI_LIOBN(index, i);
>      }
>  
> -    phb_base = phb0_base + index * phb_spacing;
> -    *pio = phb_base + pio_offset;
> -    *mmio32 = phb_base + mmio_offset;
> -    /*
> -     * We don't set the 64-bit MMIO window, relying on the PHB's
> -     * fallback behaviour of automatically splitting a large "32-bit"
> -     * window into contiguous 32-bit and 64-bit windows
> -     */
> +    *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
> +    *mmio32 = SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE;
> +    *mmio64 = SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM64_WIN_SIZE;
>  }
>  
>  static void spapr_machine_class_init(ObjectClass *oc, void *data)
> @@ -2519,8 +2520,67 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
>  /*
>   * pseries-2.7
>   */
> -#define SPAPR_COMPAT_2_7 \
> -    HW_COMPAT_2_7 \
> +#define SPAPR_COMPAT_2_7                            \
> +    HW_COMPAT_2_7                                   \
> +    {                                               \
> +        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
> +        .property = "mem_win_size",                 \
> +        .value    = stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),\
> +    },                                              \
> +    {                                               \
> +        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
> +        .property = "mem64_win_size",               \
> +        .value    = "0",                            \
> +    },
> +
> +static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
> +                              uint64_t *buid, hwaddr *pio,
> +                              hwaddr *mmio32, hwaddr *mmio64,
> +                              unsigned n_dma, uint32_t *liobns, Error **errp)
> +{
> +    /* Legacy PHB placement for pseries-2.7 and earlier machine types */
> +    const uint64_t base_buid = 0x800000020000000ULL;
> +    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> +    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> +    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> +    const uint32_t max_index = 255;
> +    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> +
> +    uint64_t ram_top = MACHINE(spapr)->ram_size;
> +    hwaddr phb0_base, phb_base;
> +    int i;
> +
> +    /* Do we have hotpluggable memory? */
> +    if (MACHINE(spapr)->maxram_size > ram_top) {
> +        /* Can't just use maxram_size, because there may be an
> +         * alignment gap between normal and hotpluggable memory
> +         * regions */
> +        ram_top = spapr->hotplug_memory.base +
> +            memory_region_size(&spapr->hotplug_memory.mr);
> +    }
> +
> +    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> +
> +    if (index > max_index) {
> +        error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
> +                   max_index);
> +        return;
> +    }
> +
> +    *buid = base_buid + index;
> +    for (i = 0; i < n_dma; ++i) {
> +        liobns[i] = SPAPR_PCI_LIOBN(index, i);
> +    }
> +
> +    phb_base = phb0_base + index * phb_spacing;
> +    *pio = phb_base + pio_offset;
> +    *mmio32 = phb_base + mmio_offset;
> +    /*
> +     * We don't set the 64-bit MMIO window, relying on the PHB's
> +     * fallback behaviour of automatically splitting a large "32-bit"
> +     * window into contiguous 32-bit and 64-bit windows
> +     */
> +}
>  
>  static void spapr_machine_2_7_instance_options(MachineState *machine)
>  {
> @@ -2533,6 +2593,7 @@ static void spapr_machine_2_7_class_options(MachineClass *mc)
>      spapr_machine_2_8_class_options(mc);
>      smc->tcg_default_cpu = "POWER7";
>      SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_7);
> +    smc->phb_placement = phb_placement_2_7;
>  }
>  
>  DEFINE_SPAPR_MACHINE(2_7, "2.7", false);
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 31ca6fa..3ef6a26 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1565,9 +1565,10 @@ static Property spapr_phb_properties[] = {
>      DEFINE_PROP_UINT32("liobn64", sPAPRPHBState, dma_liobn[1], -1),
>      DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
>      DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
> -                       SPAPR_PCI_MMIO_WIN_SIZE),
> +                       SPAPR_PCI_MEM32_WIN_SIZE),
>      DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
> -    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
> +    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size,
> +                       SPAPR_PCI_MEM64_WIN_SIZE),
>      DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
>                         -1),
>      DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 23dfb09..b92c1b5 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -84,8 +84,14 @@ struct sPAPRPHBState {
>  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
>  #define SPAPR_PCI_MEM32_WIN_SIZE     \
>      ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
> +#define SPAPR_PCI_MEM64_WIN_SIZE     0x10000000000ULL /* 1 TiB */
>  
> -#define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
> +/* Without manual configuration, all PCI outbound windows will be
> + * within this range */
> +#define SPAPR_PCI_BASE               (1ULL << 45) /* 32 TiB */
> +#define SPAPR_PCI_LIMIT              (1ULL << 46) /* 64 TiB */
> +
> +#define SPAPR_PCI_2_7_MMIO_WIN_SIZE  0xf80000000
>  #define SPAPR_PCI_IO_WIN_SIZE        0x10000
>  
>  #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
> diff --git a/tests/endianness-test.c b/tests/endianness-test.c
> index b7a120e..cf8d41b 100644
> --- a/tests/endianness-test.c
> +++ b/tests/endianness-test.c
> @@ -38,7 +38,8 @@ static const TestCase test_cases[] = {
>      { "ppc", "prep", 0x80000000, .bswap = true },
>      { "ppc", "bamboo", 0xe8000000, .bswap = true, .superio = "i82378" },
>      { "ppc64", "mac99", 0xf2000000, .bswap = true, .superio = "i82378" },
> -    { "ppc64", "pseries", 0x10080000000ULL,
> +    { "ppc64", "pseries", (1ULL << 45), .bswap = true, .superio = "i82378" },
> +    { "ppc64", "pseries-2.7", 0x10080000000ULL,
>        .bswap = true, .superio = "i82378" },
>      { "sh4", "r2d", 0xfe240000, .superio = "i82378" },
>      { "sh4eb", "r2d", 0xfe240000, .bswap = true, .superio = "i82378" },
> diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c
> index 558dfc3..2eaaf91 100644
> --- a/tests/libqos/pci-spapr.c
> +++ b/tests/libqos/pci-spapr.c
> @@ -235,10 +235,9 @@ static void qpci_spapr_iounmap(QPCIBus *bus, void *data)
>      /* FIXME */
>  }
>  
> -#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
> -#define SPAPR_PCI_MMIO32_WIN_OFF     0xA0000000
> +#define SPAPR_PCI_BASE               (1ULL << 45)
> +
>  #define SPAPR_PCI_MMIO32_WIN_SIZE    0x80000000 /* 2 GiB */
> -#define SPAPR_PCI_IO_WIN_OFF         0x80000000
>  #define SPAPR_PCI_IO_WIN_SIZE        0x10000
>  
>  QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
> @@ -273,12 +272,12 @@ QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
>       * get the window locations */
>      ret->buid = 0x800000020000000ULL;
>  
> -    ret->pio_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_IO_WIN_OFF;
> +    ret->pio_cpu_base = SPAPR_PCI_BASE;
>      ret->pio.pci_base = 0;
>      ret->pio.size = SPAPR_PCI_IO_WIN_SIZE;
>  
>      /* 32-bit portion of the MMIO window is at PCI address 2..4 GiB */
> -    ret->mmio32_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MMIO32_WIN_OFF;
> +    ret->mmio32_cpu_base = SPAPR_PCI_BASE + SPAPR_PCI_MMIO32_WIN_SIZE;
>      ret->mmio32.pci_base = 0x80000000; /* 2 GiB */
>      ret->mmio32.size = SPAPR_PCI_MMIO32_WIN_SIZE;
>  
> diff --git a/tests/spapr-phb-test.c b/tests/spapr-phb-test.c
> index 21004a7..d3522ea 100644
> --- a/tests/spapr-phb-test.c
> +++ b/tests/spapr-phb-test.c
> @@ -25,7 +25,7 @@ int main(int argc, char **argv)
>      g_test_init(&argc, &argv, NULL);
>      qtest_add_func("/spapr-phb/device", test_phb_device);
>  
> -    qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=100");
> +    qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=30");
>  
>      ret = g_test_run();
>  
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map
  2016-10-14  7:25       ` Laurent Vivier
@ 2016-10-16  1:06         ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-10-16  1:06 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: agraf, mst, abologna, aik, qemu-devel, qemu-ppc, mdroth, benh

[-- Attachment #1: Type: text/plain, Size: 15312 bytes --]

On Fri, Oct 14, 2016 at 09:25:58AM +0200, Laurent Vivier wrote:
> On 14/10/2016 01:29, David Gibson wrote:
> > From 98f07c09c6d72218800d6cfbf44b973a88ece2aa Mon Sep 17 00:00:00 2001
> > From: David Gibson <david@gibson.dropbear.id.au>
> > Date: Fri, 14 Oct 2016 10:21:00 +1100
> > Subject: [PATCH] spapr: Improved placement of PCI host bridges in guest memory
> >  map
> > 
> > Currently, the MMIO space for accessing PCI on pseries guests begins at
> > 1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
> > chunk of address space in which it places its outbound PIO and 32-bit and
> > 64-bit MMIO windows.
> > 
> > This scheme as several problems:
> >   - It limits guest RAM to 1 TiB (though we have a limited fix for this
> >     now)
> >   - It limits the total MMIO window to 64 GiB.  This is not always enough
> >     for some of the large nVidia GPGPU cards
> >   - Putting all the windows into a single 64 GiB area means that naturally
> >     aligning things within there will waste more address space.
> > In addition there was a miscalculation in some of the defaults, which meant
> > that the MMIO windows for each PHB actually slightly overran the 64 GiB
> > region for that PHB.  We got away without nasty consequences because
> > the overrun fit within an unused area at the beginning of the next PHB's
> > region, but it's not pretty.
> > 
> > This patch implements a new scheme which addresses those problems, and is
> > also closer to what bare metal hardware and pHyp guests generally use.
> > 
> > Because some guest versions (including most current distro kernels) can't
> > access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> > 64 TiB.  This is broken into 1 TiB chunks.  The 1 TiB contains the PIO
> > (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
> > subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
> > one PHB each.
> > 
> > This reduces the number of allowed PHBs (without full manual configuration
> > of all the windows) from 256 to 31, but this should still be plenty in
> > practice.
> > 
> > We also change some of the default window sizes for manually configured
> > PHBs to saner values.
> > 
> > Finally we adjust some tests and libqos so that it correctly uses the new
> > default locations.  Ideally it would parse the device tree given to the
> > guest, but that's a more complex problem for another time.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> 
> 
> I really like this new version.
> 
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>

Great!  I've merged it into ppc-for-2.8.

> 
> Laurent
> 
> > ---
> >  hw/ppc/spapr.c              | 121 +++++++++++++++++++++++++++++++++-----------
> >  hw/ppc/spapr_pci.c          |   5 +-
> >  include/hw/pci-host/spapr.h |   8 ++-
> >  tests/endianness-test.c     |   3 +-
> >  tests/libqos/pci-spapr.c    |   9 ++--
> >  tests/spapr-phb-test.c      |   2 +-
> >  6 files changed, 108 insertions(+), 40 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 8db3657..4bdf15b 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2375,31 +2375,37 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
> >                                  hwaddr *mmio32, hwaddr *mmio64,
> >                                  unsigned n_dma, uint32_t *liobns, Error **errp)
> >  {
> > +    /*
> > +     * New-style PHB window placement.
> > +     *
> > +     * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> > +     * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> > +     * windows.
> > +     *
> > +     * Some guest kernels can't work with MMIO windows above 1<<46
> > +     * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> > +     *
> > +     * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> > +     * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> > +     * has the 64-bit window for PHB1 and so forth.
> > +     */
> >      const uint64_t base_buid = 0x800000020000000ULL;
> > -    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> > -    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> > -    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> > -    const uint32_t max_index = 255;
> > -    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> > -
> > -    uint64_t ram_top = MACHINE(spapr)->ram_size;
> > -    hwaddr phb0_base, phb_base;
> > +    const int max_phbs =
> > +        (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
> >      int i;
> >  
> > -    /* Do we have hotpluggable memory? */
> > -    if (MACHINE(spapr)->maxram_size > ram_top) {
> > -        /* Can't just use maxram_size, because there may be an
> > -         * alignment gap between normal and hotpluggable memory
> > -         * regions */
> > -        ram_top = spapr->hotplug_memory.base +
> > -            memory_region_size(&spapr->hotplug_memory.mr);
> > -    }
> > -
> > -    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> > +    /* Sanity check natural alignments */
> > +    QEMU_BUILD_BUG_ON((SPAPR_PCI_BASE % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
> > +    QEMU_BUILD_BUG_ON((SPAPR_PCI_LIMIT % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
> > +    QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM64_WIN_SIZE % SPAPR_PCI_MEM32_WIN_SIZE) != 0);
> > +    QEMU_BUILD_BUG_ON((SPAPR_PCI_MEM32_WIN_SIZE % SPAPR_PCI_IO_WIN_SIZE) != 0);
> > +    /* Sanity check bounds */
> > +    QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_IO_WIN_SIZE) > SPAPR_PCI_MEM32_WIN_SIZE);
> > +    QEMU_BUILD_BUG_ON((max_phbs * SPAPR_PCI_MEM32_WIN_SIZE) > SPAPR_PCI_MEM64_WIN_SIZE);
> >  
> > -    if (index > max_index) {
> > +    if (index >= max_phbs) {
> >          error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
> > -                   max_index);
> > +                   max_phbs - 1);
> >          return;
> >      }
> >  
> > @@ -2408,14 +2414,9 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
> >          liobns[i] = SPAPR_PCI_LIOBN(index, i);
> >      }
> >  
> > -    phb_base = phb0_base + index * phb_spacing;
> > -    *pio = phb_base + pio_offset;
> > -    *mmio32 = phb_base + mmio_offset;
> > -    /*
> > -     * We don't set the 64-bit MMIO window, relying on the PHB's
> > -     * fallback behaviour of automatically splitting a large "32-bit"
> > -     * window into contiguous 32-bit and 64-bit windows
> > -     */
> > +    *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
> > +    *mmio32 = SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE;
> > +    *mmio64 = SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM64_WIN_SIZE;
> >  }
> >  
> >  static void spapr_machine_class_init(ObjectClass *oc, void *data)
> > @@ -2519,8 +2520,67 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
> >  /*
> >   * pseries-2.7
> >   */
> > -#define SPAPR_COMPAT_2_7 \
> > -    HW_COMPAT_2_7 \
> > +#define SPAPR_COMPAT_2_7                            \
> > +    HW_COMPAT_2_7                                   \
> > +    {                                               \
> > +        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
> > +        .property = "mem_win_size",                 \
> > +        .value    = stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),\
> > +    },                                              \
> > +    {                                               \
> > +        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
> > +        .property = "mem64_win_size",               \
> > +        .value    = "0",                            \
> > +    },
> > +
> > +static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
> > +                              uint64_t *buid, hwaddr *pio,
> > +                              hwaddr *mmio32, hwaddr *mmio64,
> > +                              unsigned n_dma, uint32_t *liobns, Error **errp)
> > +{
> > +    /* Legacy PHB placement for pseries-2.7 and earlier machine types */
> > +    const uint64_t base_buid = 0x800000020000000ULL;
> > +    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> > +    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> > +    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> > +    const uint32_t max_index = 255;
> > +    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> > +
> > +    uint64_t ram_top = MACHINE(spapr)->ram_size;
> > +    hwaddr phb0_base, phb_base;
> > +    int i;
> > +
> > +    /* Do we have hotpluggable memory? */
> > +    if (MACHINE(spapr)->maxram_size > ram_top) {
> > +        /* Can't just use maxram_size, because there may be an
> > +         * alignment gap between normal and hotpluggable memory
> > +         * regions */
> > +        ram_top = spapr->hotplug_memory.base +
> > +            memory_region_size(&spapr->hotplug_memory.mr);
> > +    }
> > +
> > +    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> > +
> > +    if (index > max_index) {
> > +        error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
> > +                   max_index);
> > +        return;
> > +    }
> > +
> > +    *buid = base_buid + index;
> > +    for (i = 0; i < n_dma; ++i) {
> > +        liobns[i] = SPAPR_PCI_LIOBN(index, i);
> > +    }
> > +
> > +    phb_base = phb0_base + index * phb_spacing;
> > +    *pio = phb_base + pio_offset;
> > +    *mmio32 = phb_base + mmio_offset;
> > +    /*
> > +     * We don't set the 64-bit MMIO window, relying on the PHB's
> > +     * fallback behaviour of automatically splitting a large "32-bit"
> > +     * window into contiguous 32-bit and 64-bit windows
> > +     */
> > +}
> >  
> >  static void spapr_machine_2_7_instance_options(MachineState *machine)
> >  {
> > @@ -2533,6 +2593,7 @@ static void spapr_machine_2_7_class_options(MachineClass *mc)
> >      spapr_machine_2_8_class_options(mc);
> >      smc->tcg_default_cpu = "POWER7";
> >      SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_7);
> > +    smc->phb_placement = phb_placement_2_7;
> >  }
> >  
> >  DEFINE_SPAPR_MACHINE(2_7, "2.7", false);
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 31ca6fa..3ef6a26 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1565,9 +1565,10 @@ static Property spapr_phb_properties[] = {
> >      DEFINE_PROP_UINT32("liobn64", sPAPRPHBState, dma_liobn[1], -1),
> >      DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
> >      DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
> > -                       SPAPR_PCI_MMIO_WIN_SIZE),
> > +                       SPAPR_PCI_MEM32_WIN_SIZE),
> >      DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
> > -    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
> > +    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size,
> > +                       SPAPR_PCI_MEM64_WIN_SIZE),
> >      DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
> >                         -1),
> >      DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
> > diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> > index 23dfb09..b92c1b5 100644
> > --- a/include/hw/pci-host/spapr.h
> > +++ b/include/hw/pci-host/spapr.h
> > @@ -84,8 +84,14 @@ struct sPAPRPHBState {
> >  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
> >  #define SPAPR_PCI_MEM32_WIN_SIZE     \
> >      ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
> > +#define SPAPR_PCI_MEM64_WIN_SIZE     0x10000000000ULL /* 1 TiB */
> >  
> > -#define SPAPR_PCI_MMIO_WIN_SIZE      0xf80000000
> > +/* Without manual configuration, all PCI outbound windows will be
> > + * within this range */
> > +#define SPAPR_PCI_BASE               (1ULL << 45) /* 32 TiB */
> > +#define SPAPR_PCI_LIMIT              (1ULL << 46) /* 64 TiB */
> > +
> > +#define SPAPR_PCI_2_7_MMIO_WIN_SIZE  0xf80000000
> >  #define SPAPR_PCI_IO_WIN_SIZE        0x10000
> >  
> >  #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
> > diff --git a/tests/endianness-test.c b/tests/endianness-test.c
> > index b7a120e..cf8d41b 100644
> > --- a/tests/endianness-test.c
> > +++ b/tests/endianness-test.c
> > @@ -38,7 +38,8 @@ static const TestCase test_cases[] = {
> >      { "ppc", "prep", 0x80000000, .bswap = true },
> >      { "ppc", "bamboo", 0xe8000000, .bswap = true, .superio = "i82378" },
> >      { "ppc64", "mac99", 0xf2000000, .bswap = true, .superio = "i82378" },
> > -    { "ppc64", "pseries", 0x10080000000ULL,
> > +    { "ppc64", "pseries", (1ULL << 45), .bswap = true, .superio = "i82378" },
> > +    { "ppc64", "pseries-2.7", 0x10080000000ULL,
> >        .bswap = true, .superio = "i82378" },
> >      { "sh4", "r2d", 0xfe240000, .superio = "i82378" },
> >      { "sh4eb", "r2d", 0xfe240000, .bswap = true, .superio = "i82378" },
> > diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c
> > index 558dfc3..2eaaf91 100644
> > --- a/tests/libqos/pci-spapr.c
> > +++ b/tests/libqos/pci-spapr.c
> > @@ -235,10 +235,9 @@ static void qpci_spapr_iounmap(QPCIBus *bus, void *data)
> >      /* FIXME */
> >  }
> >  
> > -#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
> > -#define SPAPR_PCI_MMIO32_WIN_OFF     0xA0000000
> > +#define SPAPR_PCI_BASE               (1ULL << 45)
> > +
> >  #define SPAPR_PCI_MMIO32_WIN_SIZE    0x80000000 /* 2 GiB */
> > -#define SPAPR_PCI_IO_WIN_OFF         0x80000000
> >  #define SPAPR_PCI_IO_WIN_SIZE        0x10000
> >  
> >  QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
> > @@ -273,12 +272,12 @@ QPCIBus *qpci_init_spapr(QGuestAllocator *alloc)
> >       * get the window locations */
> >      ret->buid = 0x800000020000000ULL;
> >  
> > -    ret->pio_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_IO_WIN_OFF;
> > +    ret->pio_cpu_base = SPAPR_PCI_BASE;
> >      ret->pio.pci_base = 0;
> >      ret->pio.size = SPAPR_PCI_IO_WIN_SIZE;
> >  
> >      /* 32-bit portion of the MMIO window is at PCI address 2..4 GiB */
> > -    ret->mmio32_cpu_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MMIO32_WIN_OFF;
> > +    ret->mmio32_cpu_base = SPAPR_PCI_BASE + SPAPR_PCI_MMIO32_WIN_SIZE;
> >      ret->mmio32.pci_base = 0x80000000; /* 2 GiB */
> >      ret->mmio32.size = SPAPR_PCI_MMIO32_WIN_SIZE;
> >  
> > diff --git a/tests/spapr-phb-test.c b/tests/spapr-phb-test.c
> > index 21004a7..d3522ea 100644
> > --- a/tests/spapr-phb-test.c
> > +++ b/tests/spapr-phb-test.c
> > @@ -25,7 +25,7 @@ int main(int argc, char **argv)
> >      g_test_init(&argc, &argv, NULL);
> >      qtest_add_func("/spapr-phb/device", test_phb_device);
> >  
> > -    qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=100");
> > +    qtest_start("-device " TYPE_SPAPR_PCI_HOST_BRIDGE ",index=30");
> >  
> >      ret = g_test_run();
> >  
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-10-16  1:22 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-12 23:57 [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries David Gibson
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 1/7] libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() David Gibson
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 2/7] libqos: Correct error in PCI hole sizing for spapr David Gibson
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 3/7] libqos: Limit spapr-pci to 32-bit MMIO for now David Gibson
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 4/7] spapr_pci: Delegate placement of PCI host bridges to machine type David Gibson
2016-10-13  7:33   ` Laurent Vivier
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 5/7] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM David Gibson
2016-10-13  7:35   ` Laurent Vivier
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 6/7] spapr_pci: Add a 64-bit MMIO window David Gibson
2016-10-13  8:06   ` Laurent Vivier
2016-10-12 23:57 ` [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map David Gibson
2016-10-13  8:40   ` Laurent Vivier
2016-10-13 23:29     ` David Gibson
2016-10-14  7:25       ` Laurent Vivier
2016-10-16  1:06         ` David Gibson
2016-10-14  4:10 ` [Qemu-devel] [PATCHv3 0/7] Improve PCI IO window orgnaization for pseries no-reply
2016-10-14  5:41   ` David Gibson
2016-10-14  7:07     ` Laurent Vivier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.