qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [for-5.0 0/4] Fixes for RMA size calculation
@ 2019-11-29  1:35 David Gibson
  2019-11-29  1:35 ` [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size() David Gibson
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: David Gibson @ 2019-11-29  1:35 UTC (permalink / raw)
  To: groug, clg; +Cc: aik, lvivier, qemu-ppc, qemu-devel, David Gibson

PAPR guests have a certain "Real Mode Area" - a subsection of memory
which can be accessed when in guest real mode (that is, with the MMU
"off" from the guest point of view).  This is advertised to the guest
in the device tree.

We want to make the RMA as large as we can, to allow for flexibility
in loading boot images, which need to fit within it.  But, there's a
somewhat complex set of constraints on the size.  At the moment, we
don't always get those correct.  This has caused crashes in some
cases, although for now those are worked around inside the guest
kernel.

These patches clarify and correct the logic here.  They will break
some cases using a host kernel with 4kiB pagesize (which doesn't
include any mainstream distro kernel nowadays).  Since that case is
very rare, and there do exist a number of workarounds for it, I think
that's worth it for the simplified logic and more consistent
behaviour.

David Gibson (4):
  spapr,ppc: Simplify signature of kvmppc_rma_size()
  spapr: Don't attempt to clamp RMA to VRMA constraint
  spapr: Clean up RMA size calculation
  spapr: Correct clamping of RMA to Node 0 size

 hw/ppc/spapr.c         | 110 ++++++++++++++++++++---------------------
 hw/ppc/spapr_hcall.c   |   4 +-
 include/hw/ppc/spapr.h |   3 +-
 target/ppc/kvm.c       |   5 +-
 target/ppc/kvm_ppc.h   |   7 ++-
 5 files changed, 63 insertions(+), 66 deletions(-)

-- 
2.23.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size()
  2019-11-29  1:35 [for-5.0 0/4] Fixes for RMA size calculation David Gibson
@ 2019-11-29  1:35 ` David Gibson
  2019-12-02  7:45   ` Cédric Le Goater
                     ` (3 more replies)
  2019-11-29  1:35 ` [for-5.0 2/4] spapr: Don't attempt to clamp RMA to VRMA constraint David Gibson
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 17+ messages in thread
From: David Gibson @ 2019-11-29  1:35 UTC (permalink / raw)
  To: groug, clg; +Cc: aik, lvivier, qemu-ppc, qemu-devel, David Gibson

This function calculates the maximum size of the RMA as implied by the
host's page size of structure of the VRMA (there are a number of other
constraints on the RMA size which will supersede this one in many
circumstances).

The current interface takes the current RMA size estimate, and clamps it
to the VRMA derived size.  The only current caller passes in an arguably
wrong value (it will match the current RMA estimate in some but not all
cases).

We want to fix that, but for now just keep concerns separated by having the
KVM helper function just return the VRMA derived limit, and let the caller
combine it with other constraints.  We call the new function
kvmppc_vrma_limit() to more clearly indicate its limited responsibility.

The helper should only ever be called in the KVM enabled case, so replace
its !CONFIG_KVM stub with an assert() rather than a dummy value.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c       | 5 +++--
 target/ppc/kvm.c     | 5 ++---
 target/ppc/kvm_ppc.h | 7 +++----
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d9c9a2bcee..069bd04a8d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1635,8 +1635,9 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
     spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
 
     if (spapr->vrma_adjust) {
-        spapr->rma_size = kvmppc_rma_size(spapr_node0_size(MACHINE(spapr)),
-                                          spapr->htab_shift);
+        hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
+
+        spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
     }
 }
 
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index c77f9848ec..09b3bd6443 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2101,7 +2101,7 @@ void kvmppc_hint_smt_possible(Error **errp)
 
 
 #ifdef TARGET_PPC64
-uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
+uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
 {
     struct kvm_ppc_smmu_info info;
     long rampagesize, best_page_shift;
@@ -2128,8 +2128,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
         }
     }
 
-    return MIN(current_size,
-               1ULL << (best_page_shift + hash_shift - 7));
+    return 1ULL << (best_page_shift + hash_shift - 7));
 }
 #endif
 
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 98bd7d5da6..4f0eec4c1b 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -45,7 +45,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
                               int *pfd, bool need_vfio);
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
-uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
+uint64_t kvmppc_vrma_limit(unsigned int hash_shift);
 bool kvmppc_has_cap_spapr_vfio(void);
 #endif /* !CONFIG_USER_ONLY */
 bool kvmppc_has_cap_epr(void);
@@ -241,10 +241,9 @@ static inline int kvmppc_reset_htab(int shift_hint)
     return 0;
 }
 
-static inline uint64_t kvmppc_rma_size(uint64_t current_size,
-                                       unsigned int hash_shift)
+static inline uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
 {
-    return ram_size;
+    g_assert_not_reached();
 }
 
 static inline bool kvmppc_hpt_needs_host_contiguous_pages(void)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [for-5.0 2/4] spapr: Don't attempt to clamp RMA to VRMA constraint
  2019-11-29  1:35 [for-5.0 0/4] Fixes for RMA size calculation David Gibson
  2019-11-29  1:35 ` [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size() David Gibson
@ 2019-11-29  1:35 ` David Gibson
  2019-12-03  3:33   ` Alexey Kardashevskiy
  2019-11-29  1:35 ` [for-5.0 3/4] spapr: Clean up RMA size calculation David Gibson
  2019-11-29  1:35 ` [for-5.0 4/4] spapr: Correct clamping of RMA to Node 0 size David Gibson
  3 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2019-11-29  1:35 UTC (permalink / raw)
  To: groug, clg; +Cc: aik, lvivier, qemu-ppc, qemu-devel, David Gibson

The Real Mode Area (RMA) is the part of memory which a guest can access
when in real (MMU off) mode.  Of course, for a guest under KVM, the MMU
isn't really turned off, it's just in a special translation mode - Virtual
Real Mode Area (VRMA) - which looks like real mode in guest mode.

The mechanics of how this works when in Hashed Page Table (HPT) mode, put
a constraint on the size of the RMA, which depends on the size of the HPT.
So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we
advertise to the guest based on this VRMA limit.

There are several things wrong with this:
 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum
    of Node 0 memory size and the VRMA limit.  That will *often* work the
    same as clamping, but there can be other constraints on RMA size which
    supersede Node 0 memory size.  We have real bugs caused by this
    (currently worked around in the guest kernel)
 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where
    we're past the point that we can actually advertise an RMA limit to the
    guest
 3) But most fundamentally, the VRMA limit depends on host configuration
    (page size) which shouldn't be visible to the guest, but this partially
    exposes it.  This can cause problems with migration in certain edge
    cases, although we will mostly get away with it.

In practice, this clamping is almost never applied anyway.  With 64kiB
pages and the normal rules for sizing of the HPT, the theoretical VRMA
limit will be 4x(guest memory size) and so never hit.  It will hit with
4kiB pages, where it will be (guest memory size)/4.  However all mainstream
distro kernels for POWER have used a 64kiB page size for at least 10 years.

So, simply replace this logic with a check that the RMA we've calculated
based only on guest visible configuration will fit within the host implied
VRMA limit.  This can break if running HPT guests on a host kernel with
4kiB page size.  As noted that's very rare.  There also exist several
possible workarounds:
  * Change the host kernel to use 64kiB pages
  * Use radix MMU (RPT) guests instead of HPT
  * Use 64kiB hugepages on the host to back guest memory
  * Increase the guest memory size so that the RMA hits one of the fixed
    limits before the RMA limit.  This is relatively easy on POWER8 which
    has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
  * Decrease guest memory size so that it's below the lower bound on VRMA
    limit (minimum HPT size is 256kiB, giving a minimum VRAM of 8MiB).
    Difficult in practice since modern guests tend to want 1-2GiB.
  * Use a guest NUMA configuration which artificially constrains the RMA
    within the VRMA limit (the RMA must always fit within Node 0).

Previously, on KVM, we also temporarily reduced the rma_size to 256M so
that the we'd load the kernel and initrd safely, regardless of the VRMA
limit.  This was a) confusing, b) could significantly limit the size of
images we could load and c) introduced a behavioural difference between
KVM and TCG.  So we remove that as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c         | 28 ++++++++++------------------
 hw/ppc/spapr_hcall.c   |  4 ++--
 include/hw/ppc/spapr.h |  3 +--
 3 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 069bd04a8d..52c39daa99 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1618,7 +1618,7 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
     spapr_set_all_lpcrs(0, LPCR_HR | LPCR_UPRT);
 }
 
-void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
+void spapr_setup_hpt(SpaprMachineState *spapr)
 {
     int hpt_shift;
 
@@ -1634,10 +1634,16 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
     }
     spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
 
-    if (spapr->vrma_adjust) {
+    if (kvm_enabled()) {
         hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
 
-        spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
+        /* Check our RMA fits in the possible VRMA */
+        if (vrma_limit < spapr->rma_size) {
+            error_report("Unable to create %" HWADDR_PRIu
+                         "MiB RMA (VRMA only allows %" HWADDR_PRIu "MiB",
+                         spapr->rma_size / MiB, vrma_limit / MiB);
+            exit(EXIT_FAILURE);
+        }
     }
 }
 
@@ -1676,7 +1682,7 @@ static void spapr_machine_reset(MachineState *machine)
         spapr->patb_entry = PATE1_GR;
         spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
     } else {
-        spapr_setup_hpt_and_vrma(spapr);
+        spapr_setup_hpt(spapr);
     }
 
     qemu_devices_reset();
@@ -2711,20 +2717,6 @@ static void spapr_machine_init(MachineState *machine)
 
     spapr->rma_size = node0_size;
 
-    /* With KVM, we don't actually know whether KVM supports an
-     * unbounded RMA (PR KVM) or is limited by the hash table size
-     * (HV KVM using VRMA), so we always assume the latter
-     *
-     * In that case, we also limit the initial allocations for RTAS
-     * etc... to 256M since we have no way to know what the VRMA size
-     * is going to be as it depends on the size of the hash table
-     * which isn't determined yet.
-     */
-    if (kvm_enabled()) {
-        spapr->vrma_adjust = 1;
-        spapr->rma_size = MIN(spapr->rma_size, 0x10000000);
-    }
-
     /* Actually we don't support unbounded RMA anymore since we added
      * proper emulation of HV mode. The max we can get is 16G which
      * also happens to be what we configure for PAPR mode so make sure
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 140f05c1c6..372ba8bd1c 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1456,7 +1456,7 @@ static void spapr_check_setup_free_hpt(SpaprMachineState *spapr,
         spapr_free_hpt(spapr);
     } else if (!(patbe_new & PATE1_GR)) {
         /* RADIX->HASH || NOTHING->HASH : Allocate HPT */
-        spapr_setup_hpt_and_vrma(spapr);
+        spapr_setup_hpt(spapr);
     }
     return;
 }
@@ -1772,7 +1772,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
          * (because the guest isn't going to use radix) then set it up here. */
         if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
             /* legacy hash or new hash: */
-            spapr_setup_hpt_and_vrma(spapr);
+            spapr_setup_hpt(spapr);
         }
         spapr->cas_reboot =
             (spapr_h_cas_compose_response(spapr, args[1], args[2],
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index d5ab5ea7b2..85c33560c3 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -154,7 +154,6 @@ struct SpaprMachineState {
     SpaprPendingHpt *pending_hpt; /* in-progress resize */
 
     hwaddr rma_size;
-    int vrma_adjust;
     uint32_t fdt_size;
     uint32_t fdt_initial_size;
     void *fdt_blob;
@@ -772,7 +771,7 @@ int spapr_h_cas_compose_response(SpaprMachineState *sm,
                                  target_ulong addr, target_ulong size,
                                  SpaprOptionVector *ov5_updates);
 void close_htab_fd(SpaprMachineState *spapr);
-void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr);
+void spapr_setup_hpt(SpaprMachineState *spapr);
 void spapr_free_hpt(SpaprMachineState *spapr);
 SpaprTceTable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
 void spapr_tce_table_enable(SpaprTceTable *tcet,
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [for-5.0 3/4] spapr: Clean up RMA size calculation
  2019-11-29  1:35 [for-5.0 0/4] Fixes for RMA size calculation David Gibson
  2019-11-29  1:35 ` [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size() David Gibson
  2019-11-29  1:35 ` [for-5.0 2/4] spapr: Don't attempt to clamp RMA to VRMA constraint David Gibson
@ 2019-11-29  1:35 ` David Gibson
  2019-12-03  3:44   ` Alexey Kardashevskiy
  2019-11-29  1:35 ` [for-5.0 4/4] spapr: Correct clamping of RMA to Node 0 size David Gibson
  3 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2019-11-29  1:35 UTC (permalink / raw)
  To: groug, clg; +Cc: aik, lvivier, qemu-ppc, qemu-devel, David Gibson

Move the calculation of the Real Mode Area (RMA) size into a helper
function.  While we're there clean it up and correct it in a few ways:
  * Add comments making it clearer where the various constraints come from
  * Remove a pointless check that the RMA fits within Node 0 (we've just
    clamped it so that it does)
  * The 16GiB limit we apply is only correct for POWER8, but there is also
    a 1TiB limit that applies on POWER9.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 57 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 52c39daa99..7efd4f2b85 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2664,6 +2664,40 @@ static PCIHostState *spapr_create_default_phb(void)
     return PCI_HOST_BRIDGE(dev);
 }
 
+static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
+{
+    MachineState *machine = MACHINE(spapr);
+    hwaddr rma_size = machine->ram_size;
+    hwaddr node0_size = spapr_node0_size(machine);
+
+    /* RMA has to fit in the first NUMA node */
+    rma_size = MIN(rma_size, node0_size);
+
+    /*
+     * VRMA access is via a special 1TiB SLB mapping, so the RMA can
+     * never exceed that
+     */
+    rma_size = MIN(rma_size, TiB);
+
+    /*
+     * RMA size is controlled in hardware by LPCR[RMLS].  On POWER8
+     * the largest RMA that can be specified there is 16GiB
+     */
+    if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00,
+                               0, spapr->max_compat_pvr)) {
+        rma_size = MIN(rma_size, 16 * GiB);
+    }
+
+    if (rma_size < (MIN_RMA_SLOF * MiB)) {
+        error_setg(errp,
+"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
+                   MIN_RMA_SLOF);
+        return -1;
+    }
+
+    return rma_size;
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void spapr_machine_init(MachineState *machine)
 {
@@ -2675,7 +2709,6 @@ static void spapr_machine_init(MachineState *machine)
     int i;
     MemoryRegion *sysmem = get_system_memory();
     MemoryRegion *ram = g_new(MemoryRegion, 1);
-    hwaddr node0_size = spapr_node0_size(machine);
     long load_limit, fw_size;
     char *filename;
     Error *resize_hpt_err = NULL;
@@ -2715,20 +2748,7 @@ static void spapr_machine_init(MachineState *machine)
         exit(1);
     }
 
-    spapr->rma_size = node0_size;
-
-    /* Actually we don't support unbounded RMA anymore since we added
-     * proper emulation of HV mode. The max we can get is 16G which
-     * also happens to be what we configure for PAPR mode so make sure
-     * we don't do anything bigger than that
-     */
-    spapr->rma_size = MIN(spapr->rma_size, 0x400000000ull);
-
-    if (spapr->rma_size > node0_size) {
-        error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
-                     spapr->rma_size);
-        exit(1);
-    }
+    spapr->rma_size = spapr_rma_size(spapr, &error_fatal);
 
     /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
     load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
@@ -2956,13 +2976,6 @@ static void spapr_machine_init(MachineState *machine)
         }
     }
 
-    if (spapr->rma_size < (MIN_RMA_SLOF * MiB)) {
-        error_report(
-            "pSeries SLOF firmware requires >= %ldM guest RMA (Real Mode Area memory)",
-            MIN_RMA_SLOF);
-        exit(1);
-    }
-
     if (kernel_filename) {
         uint64_t lowaddr = 0;
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [for-5.0 4/4] spapr: Correct clamping of RMA to Node 0 size
  2019-11-29  1:35 [for-5.0 0/4] Fixes for RMA size calculation David Gibson
                   ` (2 preceding siblings ...)
  2019-11-29  1:35 ` [for-5.0 3/4] spapr: Clean up RMA size calculation David Gibson
@ 2019-11-29  1:35 ` David Gibson
  2019-12-03  4:18   ` Alexey Kardashevskiy
  3 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2019-11-29  1:35 UTC (permalink / raw)
  To: groug, clg; +Cc: aik, lvivier, qemu-ppc, qemu-devel, David Gibson

The Real Mode Area (RMA) needs to fit within Node 0 in NUMA configurations.
We use a helper function spapr_node0_size() to calculate this.

But that function doesn't actually get the size of Node 0, it gets the
minimum size of all nodes, ever since b082d65a300 "spapr: Add a helper for
node0_size calculation".  That was added, apparently, because Node 0 in
qemu's terms might not have corresponded to Node 0 in PAPR terms (i.e. the
node with memory at address 0).

That might not have been the case at the time, but it *is* the case now
that qemu node 0 must have the lowest address, which is the node we need.
So, we can simplify this logic, folding it into spapr_rma_size(), the only
remaining caller.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7efd4f2b85..6611f75bdf 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -295,20 +295,6 @@ static void spapr_populate_pa_features(SpaprMachineState *spapr,
     _FDT((fdt_setprop(fdt, offset, "ibm,pa-features", pa_features, pa_size)));
 }
 
-static hwaddr spapr_node0_size(MachineState *machine)
-{
-    if (machine->numa_state->num_nodes) {
-        int i;
-        for (i = 0; i < machine->numa_state->num_nodes; ++i) {
-            if (machine->numa_state->nodes[i].node_mem) {
-                return MIN(pow2floor(machine->numa_state->nodes[i].node_mem),
-                           machine->ram_size);
-            }
-        }
-    }
-    return machine->ram_size;
-}
-
 static void add_str(GString *s, const gchar *s1)
 {
     g_string_append_len(s, s1, strlen(s1) + 1);
@@ -2668,10 +2654,13 @@ static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
 {
     MachineState *machine = MACHINE(spapr);
     hwaddr rma_size = machine->ram_size;
-    hwaddr node0_size = spapr_node0_size(machine);
 
     /* RMA has to fit in the first NUMA node */
-    rma_size = MIN(rma_size, node0_size);
+    if (machine->numa_state->num_nodes) {
+        hwaddr node0_size = machine->numa_state->nodes[0].node_mem;
+
+        rma_size = MIN(rma_size, node0_size);
+    }
 
     /*
      * VRMA access is via a special 1TiB SLB mapping, so the RMA can
@@ -2688,6 +2677,11 @@ static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
         rma_size = MIN(rma_size, 16 * GiB);
     }
 
+    /*
+     * RMA size must be a power of 2
+     */
+    rma_size = pow2floor(rma_size);
+
     if (rma_size < (MIN_RMA_SLOF * MiB)) {
         error_setg(errp,
 "pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size()
  2019-11-29  1:35 ` [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size() David Gibson
@ 2019-12-02  7:45   ` Cédric Le Goater
  2019-12-02  8:55   ` Greg Kurz
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Cédric Le Goater @ 2019-12-02  7:45 UTC (permalink / raw)
  To: David Gibson, groug; +Cc: aik, lvivier, qemu-ppc, qemu-devel

On 29/11/2019 02:35, David Gibson wrote:
> This function calculates the maximum size of the RMA as implied by the
> host's page size of structure of the VRMA (there are a number of other
> constraints on the RMA size which will supersede this one in many
> circumstances).
> 
> The current interface takes the current RMA size estimate, and clamps it
> to the VRMA derived size.  The only current caller passes in an arguably
> wrong value (it will match the current RMA estimate in some but not all
> cases).
> 
> We want to fix that, but for now just keep concerns separated by having the
> KVM helper function just return the VRMA derived limit, and let the caller
> combine it with other constraints.  We call the new function
> kvmppc_vrma_limit() to more clearly indicate its limited responsibility.
> 
> The helper should only ever be called in the KVM enabled case, so replace
> its !CONFIG_KVM stub with an assert() rather than a dummy value.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>




> ---
>  hw/ppc/spapr.c       | 5 +++--
>  target/ppc/kvm.c     | 5 ++---
>  target/ppc/kvm_ppc.h | 7 +++----
>  3 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d9c9a2bcee..069bd04a8d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1635,8 +1635,9 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
>      spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
>  
>      if (spapr->vrma_adjust) {
> -        spapr->rma_size = kvmppc_rma_size(spapr_node0_size(MACHINE(spapr)),
> -                                          spapr->htab_shift);
> +        hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
> +
> +        spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
>      }
>  }
>  
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index c77f9848ec..09b3bd6443 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2101,7 +2101,7 @@ void kvmppc_hint_smt_possible(Error **errp)
>  
>  
>  #ifdef TARGET_PPC64
> -uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
> +uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
>  {
>      struct kvm_ppc_smmu_info info;
>      long rampagesize, best_page_shift;
> @@ -2128,8 +2128,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
>          }
>      }
>  
> -    return MIN(current_size,
> -               1ULL << (best_page_shift + hash_shift - 7));
> +    return 1ULL << (best_page_shift + hash_shift - 7));
>  }
>  #endif
>  
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 98bd7d5da6..4f0eec4c1b 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -45,7 +45,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
>                                int *pfd, bool need_vfio);
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
> -uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
> +uint64_t kvmppc_vrma_limit(unsigned int hash_shift);
>  bool kvmppc_has_cap_spapr_vfio(void);
>  #endif /* !CONFIG_USER_ONLY */
>  bool kvmppc_has_cap_epr(void);
> @@ -241,10 +241,9 @@ static inline int kvmppc_reset_htab(int shift_hint)
>      return 0;
>  }
>  
> -static inline uint64_t kvmppc_rma_size(uint64_t current_size,
> -                                       unsigned int hash_shift)
> +static inline uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
>  {
> -    return ram_size;
> +    g_assert_not_reached();
>  }
>  
>  static inline bool kvmppc_hpt_needs_host_contiguous_pages(void)
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size()
  2019-11-29  1:35 ` [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size() David Gibson
  2019-12-02  7:45   ` Cédric Le Goater
@ 2019-12-02  8:55   ` Greg Kurz
  2019-12-02 18:18   ` Cédric Le Goater
  2019-12-03  3:32   ` Alexey Kardashevskiy
  3 siblings, 0 replies; 17+ messages in thread
From: Greg Kurz @ 2019-12-02  8:55 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, lvivier, qemu-ppc, clg, qemu-devel

On Fri, 29 Nov 2019 12:35:01 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> This function calculates the maximum size of the RMA as implied by the
> host's page size of structure of the VRMA (there are a number of other
> constraints on the RMA size which will supersede this one in many
> circumstances).
> 
> The current interface takes the current RMA size estimate, and clamps it
> to the VRMA derived size.  The only current caller passes in an arguably
> wrong value (it will match the current RMA estimate in some but not all
> cases).
> 
> We want to fix that, but for now just keep concerns separated by having the
> KVM helper function just return the VRMA derived limit, and let the caller
> combine it with other constraints.  We call the new function
> kvmppc_vrma_limit() to more clearly indicate its limited responsibility.
> 
> The helper should only ever be called in the KVM enabled case, so replace
> its !CONFIG_KVM stub with an assert() rather than a dummy value.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/ppc/spapr.c       | 5 +++--
>  target/ppc/kvm.c     | 5 ++---
>  target/ppc/kvm_ppc.h | 7 +++----
>  3 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d9c9a2bcee..069bd04a8d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1635,8 +1635,9 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
>      spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
>  
>      if (spapr->vrma_adjust) {
> -        spapr->rma_size = kvmppc_rma_size(spapr_node0_size(MACHINE(spapr)),
> -                                          spapr->htab_shift);
> +        hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
> +
> +        spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
>      }
>  }
>  
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index c77f9848ec..09b3bd6443 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2101,7 +2101,7 @@ void kvmppc_hint_smt_possible(Error **errp)
>  
>  
>  #ifdef TARGET_PPC64
> -uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
> +uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
>  {
>      struct kvm_ppc_smmu_info info;
>      long rampagesize, best_page_shift;
> @@ -2128,8 +2128,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
>          }
>      }
>  
> -    return MIN(current_size,
> -               1ULL << (best_page_shift + hash_shift - 7));
> +    return 1ULL << (best_page_shift + hash_shift - 7));
>  }
>  #endif
>  
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 98bd7d5da6..4f0eec4c1b 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -45,7 +45,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
>                                int *pfd, bool need_vfio);
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
> -uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
> +uint64_t kvmppc_vrma_limit(unsigned int hash_shift);
>  bool kvmppc_has_cap_spapr_vfio(void);
>  #endif /* !CONFIG_USER_ONLY */
>  bool kvmppc_has_cap_epr(void);
> @@ -241,10 +241,9 @@ static inline int kvmppc_reset_htab(int shift_hint)
>      return 0;
>  }
>  
> -static inline uint64_t kvmppc_rma_size(uint64_t current_size,
> -                                       unsigned int hash_shift)
> +static inline uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
>  {
> -    return ram_size;
> +    g_assert_not_reached();
>  }
>  
>  static inline bool kvmppc_hpt_needs_host_contiguous_pages(void)



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size()
  2019-11-29  1:35 ` [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size() David Gibson
  2019-12-02  7:45   ` Cédric Le Goater
  2019-12-02  8:55   ` Greg Kurz
@ 2019-12-02 18:18   ` Cédric Le Goater
  2019-12-10  3:58     ` David Gibson
  2019-12-03  3:32   ` Alexey Kardashevskiy
  3 siblings, 1 reply; 17+ messages in thread
From: Cédric Le Goater @ 2019-12-02 18:18 UTC (permalink / raw)
  To: David Gibson, groug; +Cc: aik, lvivier, qemu-ppc, qemu-devel

> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index c77f9848ec..09b3bd6443 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2101,7 +2101,7 @@ void kvmppc_hint_smt_possible(Error **errp)
>  
>  
>  #ifdef TARGET_PPC64
> -uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
> +uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
>  {
>      struct kvm_ppc_smmu_info info;
>      long rampagesize, best_page_shift;
> @@ -2128,8 +2128,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
>          }
>      }
>  
> -    return MIN(current_size,
> -               1ULL << (best_page_shift + hash_shift - 7));
> +    return 1ULL << (best_page_shift + hash_shift - 7));

The closing ')' should be removed.

C.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size()
  2019-11-29  1:35 ` [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size() David Gibson
                     ` (2 preceding siblings ...)
  2019-12-02 18:18   ` Cédric Le Goater
@ 2019-12-03  3:32   ` Alexey Kardashevskiy
  3 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-03  3:32 UTC (permalink / raw)
  To: David Gibson, groug, clg; +Cc: lvivier, qemu-ppc, qemu-devel



On 29/11/2019 12:35, David Gibson wrote:
> This function calculates the maximum size of the RMA as implied by the
> host's page size of structure of the VRMA (there are a number of other
> constraints on the RMA size which will supersede this one in many
> circumstances).
> 
> The current interface takes the current RMA size estimate, and clamps it
> to the VRMA derived size.  The only current caller passes in an arguably
> wrong value (it will match the current RMA estimate in some but not all
> cases).
> 
> We want to fix that, but for now just keep concerns separated by having the
> KVM helper function just return the VRMA derived limit, and let the caller
> combine it with other constraints.  We call the new function
> kvmppc_vrma_limit() to more clearly indicate its limited responsibility.
> 
> The helper should only ever be called in the KVM enabled case, so replace
> its !CONFIG_KVM stub with an assert() rather than a dummy value.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>


Besides not compiling, otherwise looks good.


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>




> ---
>  hw/ppc/spapr.c       | 5 +++--
>  target/ppc/kvm.c     | 5 ++---
>  target/ppc/kvm_ppc.h | 7 +++----
>  3 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d9c9a2bcee..069bd04a8d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1635,8 +1635,9 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
>      spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
>  
>      if (spapr->vrma_adjust) {
> -        spapr->rma_size = kvmppc_rma_size(spapr_node0_size(MACHINE(spapr)),
> -                                          spapr->htab_shift);
> +        hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
> +
> +        spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
>      }
>  }
>  
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index c77f9848ec..09b3bd6443 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2101,7 +2101,7 @@ void kvmppc_hint_smt_possible(Error **errp)
>  
>  
>  #ifdef TARGET_PPC64
> -uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
> +uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
>  {
>      struct kvm_ppc_smmu_info info;
>      long rampagesize, best_page_shift;
> @@ -2128,8 +2128,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
>          }
>      }
>  
> -    return MIN(current_size,
> -               1ULL << (best_page_shift + hash_shift - 7));
> +    return 1ULL << (best_page_shift + hash_shift - 7));
>  }
>  #endif
>  
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 98bd7d5da6..4f0eec4c1b 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -45,7 +45,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
>                                int *pfd, bool need_vfio);
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
> -uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
> +uint64_t kvmppc_vrma_limit(unsigned int hash_shift);
>  bool kvmppc_has_cap_spapr_vfio(void);
>  #endif /* !CONFIG_USER_ONLY */
>  bool kvmppc_has_cap_epr(void);
> @@ -241,10 +241,9 @@ static inline int kvmppc_reset_htab(int shift_hint)
>      return 0;
>  }
>  
> -static inline uint64_t kvmppc_rma_size(uint64_t current_size,
> -                                       unsigned int hash_shift)
> +static inline uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
>  {
> -    return ram_size;
> +    g_assert_not_reached();
>  }
>  
>  static inline bool kvmppc_hpt_needs_host_contiguous_pages(void)
> 

-- 
Alexey


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 2/4] spapr: Don't attempt to clamp RMA to VRMA constraint
  2019-11-29  1:35 ` [for-5.0 2/4] spapr: Don't attempt to clamp RMA to VRMA constraint David Gibson
@ 2019-12-03  3:33   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-03  3:33 UTC (permalink / raw)
  To: David Gibson, groug, clg; +Cc: lvivier, qemu-ppc, qemu-devel



On 29/11/2019 12:35, David Gibson wrote:
> The Real Mode Area (RMA) is the part of memory which a guest can access
> when in real (MMU off) mode.  Of course, for a guest under KVM, the MMU
> isn't really turned off, it's just in a special translation mode - Virtual
> Real Mode Area (VRMA) - which looks like real mode in guest mode.
> 
> The mechanics of how this works when in Hashed Page Table (HPT) mode, put
> a constraint on the size of the RMA, which depends on the size of the HPT.
> So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we
> advertise to the guest based on this VRMA limit.
> 
> There are several things wrong with this:
>  1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum
>     of Node 0 memory size and the VRMA limit.  That will *often* work the
>     same as clamping, but there can be other constraints on RMA size which
>     supersede Node 0 memory size.  We have real bugs caused by this
>     (currently worked around in the guest kernel)
>  2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where
>     we're past the point that we can actually advertise an RMA limit to the
>     guest
>  3) But most fundamentally, the VRMA limit depends on host configuration
>     (page size) which shouldn't be visible to the guest, but this partially
>     exposes it.  This can cause problems with migration in certain edge
>     cases, although we will mostly get away with it.
> 
> In practice, this clamping is almost never applied anyway.  With 64kiB
> pages and the normal rules for sizing of the HPT, the theoretical VRMA
> limit will be 4x(guest memory size) and so never hit.  It will hit with
> 4kiB pages, where it will be (guest memory size)/4.  However all mainstream
> distro kernels for POWER have used a 64kiB page size for at least 10 years.
> 
> So, simply replace this logic with a check that the RMA we've calculated
> based only on guest visible configuration will fit within the host implied
> VRMA limit.  This can break if running HPT guests on a host kernel with
> 4kiB page size.  As noted that's very rare.  There also exist several
> possible workarounds:
>   * Change the host kernel to use 64kiB pages
>   * Use radix MMU (RPT) guests instead of HPT
>   * Use 64kiB hugepages on the host to back guest memory
>   * Increase the guest memory size so that the RMA hits one of the fixed
>     limits before the RMA limit.  This is relatively easy on POWER8 which
>     has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
>   * Decrease guest memory size so that it's below the lower bound on VRMA
>     limit (minimum HPT size is 256kiB, giving a minimum VRAM of 8MiB).
>     Difficult in practice since modern guests tend to want 1-2GiB.
>   * Use a guest NUMA configuration which artificially constrains the RMA
>     within the VRMA limit (the RMA must always fit within Node 0).
> 
> Previously, on KVM, we also temporarily reduced the rma_size to 256M so
> that the we'd load the kernel and initrd safely, regardless of the VRMA
> limit.  This was a) confusing, b) could significantly limit the size of
> images we could load and c) introduced a behavioural difference between
> KVM and TCG.  So we remove that as well.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>  hw/ppc/spapr.c         | 28 ++++++++++------------------
>  hw/ppc/spapr_hcall.c   |  4 ++--
>  include/hw/ppc/spapr.h |  3 +--
>  3 files changed, 13 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 069bd04a8d..52c39daa99 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1618,7 +1618,7 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
>      spapr_set_all_lpcrs(0, LPCR_HR | LPCR_UPRT);
>  }
>  
> -void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
> +void spapr_setup_hpt(SpaprMachineState *spapr)
>  {
>      int hpt_shift;
>  
> @@ -1634,10 +1634,16 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
>      }
>      spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
>  
> -    if (spapr->vrma_adjust) {
> +    if (kvm_enabled()) {
>          hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
>  
> -        spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
> +        /* Check our RMA fits in the possible VRMA */
> +        if (vrma_limit < spapr->rma_size) {
> +            error_report("Unable to create %" HWADDR_PRIu
> +                         "MiB RMA (VRMA only allows %" HWADDR_PRIu "MiB",
> +                         spapr->rma_size / MiB, vrma_limit / MiB);
> +            exit(EXIT_FAILURE);
> +        }
>      }
>  }
>  
> @@ -1676,7 +1682,7 @@ static void spapr_machine_reset(MachineState *machine)
>          spapr->patb_entry = PATE1_GR;
>          spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
>      } else {
> -        spapr_setup_hpt_and_vrma(spapr);
> +        spapr_setup_hpt(spapr);
>      }
>  
>      qemu_devices_reset();
> @@ -2711,20 +2717,6 @@ static void spapr_machine_init(MachineState *machine)
>  
>      spapr->rma_size = node0_size;
>  
> -    /* With KVM, we don't actually know whether KVM supports an
> -     * unbounded RMA (PR KVM) or is limited by the hash table size
> -     * (HV KVM using VRMA), so we always assume the latter
> -     *
> -     * In that case, we also limit the initial allocations for RTAS
> -     * etc... to 256M since we have no way to know what the VRMA size
> -     * is going to be as it depends on the size of the hash table
> -     * which isn't determined yet.
> -     */
> -    if (kvm_enabled()) {
> -        spapr->vrma_adjust = 1;
> -        spapr->rma_size = MIN(spapr->rma_size, 0x10000000);
> -    }
> -
>      /* Actually we don't support unbounded RMA anymore since we added
>       * proper emulation of HV mode. The max we can get is 16G which
>       * also happens to be what we configure for PAPR mode so make sure
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 140f05c1c6..372ba8bd1c 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1456,7 +1456,7 @@ static void spapr_check_setup_free_hpt(SpaprMachineState *spapr,
>          spapr_free_hpt(spapr);
>      } else if (!(patbe_new & PATE1_GR)) {
>          /* RADIX->HASH || NOTHING->HASH : Allocate HPT */
> -        spapr_setup_hpt_and_vrma(spapr);
> +        spapr_setup_hpt(spapr);
>      }
>      return;
>  }
> @@ -1772,7 +1772,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>           * (because the guest isn't going to use radix) then set it up here. */
>          if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
>              /* legacy hash or new hash: */
> -            spapr_setup_hpt_and_vrma(spapr);
> +            spapr_setup_hpt(spapr);
>          }
>          spapr->cas_reboot =
>              (spapr_h_cas_compose_response(spapr, args[1], args[2],
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index d5ab5ea7b2..85c33560c3 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -154,7 +154,6 @@ struct SpaprMachineState {
>      SpaprPendingHpt *pending_hpt; /* in-progress resize */
>  
>      hwaddr rma_size;
> -    int vrma_adjust;
>      uint32_t fdt_size;
>      uint32_t fdt_initial_size;
>      void *fdt_blob;
> @@ -772,7 +771,7 @@ int spapr_h_cas_compose_response(SpaprMachineState *sm,
>                                   target_ulong addr, target_ulong size,
>                                   SpaprOptionVector *ov5_updates);
>  void close_htab_fd(SpaprMachineState *spapr);
> -void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr);
> +void spapr_setup_hpt(SpaprMachineState *spapr);
>  void spapr_free_hpt(SpaprMachineState *spapr);
>  SpaprTceTable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
>  void spapr_tce_table_enable(SpaprTceTable *tcet,
> 

-- 
Alexey


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 3/4] spapr: Clean up RMA size calculation
  2019-11-29  1:35 ` [for-5.0 3/4] spapr: Clean up RMA size calculation David Gibson
@ 2019-12-03  3:44   ` Alexey Kardashevskiy
  2019-12-03  5:06     ` Alexey Kardashevskiy
  2019-12-10  5:29     ` David Gibson
  0 siblings, 2 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-03  3:44 UTC (permalink / raw)
  To: David Gibson, groug, clg; +Cc: lvivier, qemu-ppc, qemu-devel



On 29/11/2019 12:35, David Gibson wrote:
> Move the calculation of the Real Mode Area (RMA) size into a helper
> function.  While we're there clean it up and correct it in a few ways:
>   * Add comments making it clearer where the various constraints come from
>   * Remove a pointless check that the RMA fits within Node 0 (we've just
>     clamped it so that it does)
>   * The 16GiB limit we apply is only correct for POWER8, but there is also
>     a 1TiB limit that applies on POWER9.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr.c | 57 +++++++++++++++++++++++++++++++-------------------
>  1 file changed, 35 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 52c39daa99..7efd4f2b85 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2664,6 +2664,40 @@ static PCIHostState *spapr_create_default_phb(void)
>      return PCI_HOST_BRIDGE(dev);
>  }
>  
> +static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
> +{
> +    MachineState *machine = MACHINE(spapr);
> +    hwaddr rma_size = machine->ram_size;
> +    hwaddr node0_size = spapr_node0_size(machine);
> +
> +    /* RMA has to fit in the first NUMA node */
> +    rma_size = MIN(rma_size, node0_size);
> +
> +    /*
> +     * VRMA access is via a special 1TiB SLB mapping, so the RMA can
> +     * never exceed that
> +     */
> +    rma_size = MIN(rma_size, TiB);
> +
> +    /*
> +     * RMA size is controlled in hardware by LPCR[RMLS].  On POWER8


RMA is controlled by LPCR on P8 but the RMLS bits on P9 are reserved
(also reserved in PowerISA 3.0).


> +     * the largest RMA that can be specified there is 16GiB


The P8 user manual says:
===
The following RMO sizes are available for the POWER8 processor.
The RMLS[34:37] field in the LPCR defines the RMO sizes, as described below.
1000 - 32 MB
0011 - 64 MB
0111 - 128 MB
0100 - 256 MB
0010 - 1 GB
0001 - 16 GB
0000 - 256 GB
===

The maximum seems to be 256GiB.


> +     */
> +    if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00,
> +                               0, spapr->max_compat_pvr)) {
> +        rma_size = MIN(rma_size, 16 * GiB);
> +    }
> +
> +    if (rma_size < (MIN_RMA_SLOF * MiB)) {


nit: it is time to redefine MIN_RMA_SLOF to use MiBs imho :)


> +        error_setg(errp,
> +"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
> +                   MIN_RMA_SLOF);

Something went wrong with formatting here.

Otherwise looks good. Thanks,



> +        return -1;
> +    }
> +
> +    return rma_size;
> +}
> +
>  /* pSeries LPAR / sPAPR hardware init */
>  static void spapr_machine_init(MachineState *machine)
>  {
> @@ -2675,7 +2709,6 @@ static void spapr_machine_init(MachineState *machine)
>      int i;
>      MemoryRegion *sysmem = get_system_memory();
>      MemoryRegion *ram = g_new(MemoryRegion, 1);
> -    hwaddr node0_size = spapr_node0_size(machine);
>      long load_limit, fw_size;
>      char *filename;
>      Error *resize_hpt_err = NULL;
> @@ -2715,20 +2748,7 @@ static void spapr_machine_init(MachineState *machine)
>          exit(1);
>      }
>  
> -    spapr->rma_size = node0_size;
> -
> -    /* Actually we don't support unbounded RMA anymore since we added
> -     * proper emulation of HV mode. The max we can get is 16G which
> -     * also happens to be what we configure for PAPR mode so make sure
> -     * we don't do anything bigger than that
> -     */
> -    spapr->rma_size = MIN(spapr->rma_size, 0x400000000ull);
> -
> -    if (spapr->rma_size > node0_size) {
> -        error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
> -                     spapr->rma_size);
> -        exit(1);
> -    }
> +    spapr->rma_size = spapr_rma_size(spapr, &error_fatal);
>  
>      /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
>      load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
> @@ -2956,13 +2976,6 @@ static void spapr_machine_init(MachineState *machine)
>          }
>      }
>  
> -    if (spapr->rma_size < (MIN_RMA_SLOF * MiB)) {
> -        error_report(
> -            "pSeries SLOF firmware requires >= %ldM guest RMA (Real Mode Area memory)",
> -            MIN_RMA_SLOF);
> -        exit(1);
> -    }
> -
>      if (kernel_filename) {
>          uint64_t lowaddr = 0;
>  
> 

-- 
Alexey


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 4/4] spapr: Correct clamping of RMA to Node 0 size
  2019-11-29  1:35 ` [for-5.0 4/4] spapr: Correct clamping of RMA to Node 0 size David Gibson
@ 2019-12-03  4:18   ` Alexey Kardashevskiy
  2019-12-10  5:21     ` David Gibson
  0 siblings, 1 reply; 17+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-03  4:18 UTC (permalink / raw)
  To: David Gibson, groug, clg; +Cc: lvivier, qemu-ppc, qemu-devel



On 29/11/2019 12:35, David Gibson wrote:
> The Real Mode Area (RMA) needs to fit within Node 0 in NUMA configurations.
> We use a helper function spapr_node0_size() to calculate this.
> 
> But that function doesn't actually get the size of Node 0, it gets the
> minimum size of all nodes, ever since b082d65a300 "spapr: Add a helper for
> node0_size calculation".  That was added, apparently, because Node 0 in
> qemu's terms might not have corresponded to Node 0 in PAPR terms (i.e. the
> node with memory at address 0).


After looking at this closely, I think the idea was that the first
node(s) may have only CPUs but not memory, in this case
node#0.node_mem==0 and things crash:


/home/aik/pbuild/qemu-garrison2-ppc64/ppc64-softmmu/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 2G \
-kernel /home/aik/t/vml4120le \
-initrd /home/aik/t/le.cpio \
-object memory-backend-ram,id=memdev1,size=2G \
-numa node,nodeid=0 \
-numa node,nodeid=1,memdev=memdev1 \
-numa cpu,node-id=0 \
-smp 8,threads=8 \
-machine pseries \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh37742 \
-mon chardev=SOCKET0,mode=control
QEMU PID = 12377
qemu-system-ppc64:qemu_trace_events:80: warning: trace event
'vfio_ram_register' does not exist
qemu-system-ppc64: pSeries SLOF firmware requires >= 128MiB guest RMA
(Real Mode Area)





> That might not have been the case at the time, but it *is* the case now
> that qemu node 0 must have the lowest address, which is the node we need.
> So, we can simplify this logic, folding it into spapr_rma_size(), the only
> remaining caller.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr.c | 26 ++++++++++----------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 7efd4f2b85..6611f75bdf 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -295,20 +295,6 @@ static void spapr_populate_pa_features(SpaprMachineState *spapr,
>      _FDT((fdt_setprop(fdt, offset, "ibm,pa-features", pa_features, pa_size)));
>  }
>  
> -static hwaddr spapr_node0_size(MachineState *machine)
> -{
> -    if (machine->numa_state->num_nodes) {
> -        int i;
> -        for (i = 0; i < machine->numa_state->num_nodes; ++i) {
> -            if (machine->numa_state->nodes[i].node_mem) {
> -                return MIN(pow2floor(machine->numa_state->nodes[i].node_mem),
> -                           machine->ram_size);
> -            }
> -        }
> -    }
> -    return machine->ram_size;
> -}
> -
>  static void add_str(GString *s, const gchar *s1)
>  {
>      g_string_append_len(s, s1, strlen(s1) + 1);
> @@ -2668,10 +2654,13 @@ static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
>  {
>      MachineState *machine = MACHINE(spapr);
>      hwaddr rma_size = machine->ram_size;
> -    hwaddr node0_size = spapr_node0_size(machine);
>  
>      /* RMA has to fit in the first NUMA node */
> -    rma_size = MIN(rma_size, node0_size);
> +    if (machine->numa_state->num_nodes) {
> +        hwaddr node0_size = machine->numa_state->nodes[0].node_mem;
> +
> +        rma_size = MIN(rma_size, node0_size);
> +    }
>  
>      /*
>       * VRMA access is via a special 1TiB SLB mapping, so the RMA can
> @@ -2688,6 +2677,11 @@ static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
>          rma_size = MIN(rma_size, 16 * GiB);
>      }
>  
> +    /*
> +     * RMA size must be a power of 2
> +     */
> +    rma_size = pow2floor(rma_size);
> +
>      if (rma_size < (MIN_RMA_SLOF * MiB)) {
>          error_setg(errp,
>  "pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
> 

-- 
Alexey


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 3/4] spapr: Clean up RMA size calculation
  2019-12-03  3:44   ` Alexey Kardashevskiy
@ 2019-12-03  5:06     ` Alexey Kardashevskiy
  2019-12-10  5:44       ` David Gibson
  2019-12-10  5:29     ` David Gibson
  1 sibling, 1 reply; 17+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-03  5:06 UTC (permalink / raw)
  To: David Gibson, groug, clg; +Cc: lvivier, qemu-ppc, qemu-devel



On 03/12/2019 14:44, Alexey Kardashevskiy wrote:
> 
> 
> On 29/11/2019 12:35, David Gibson wrote:
>> Move the calculation of the Real Mode Area (RMA) size into a helper
>> function.  While we're there clean it up and correct it in a few ways:
>>   * Add comments making it clearer where the various constraints come from
>>   * Remove a pointless check that the RMA fits within Node 0 (we've just
>>     clamped it so that it does)
>>   * The 16GiB limit we apply is only correct for POWER8, but there is also
>>     a 1TiB limit that applies on POWER9.
>>
>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> ---
>>  hw/ppc/spapr.c | 57 +++++++++++++++++++++++++++++++-------------------
>>  1 file changed, 35 insertions(+), 22 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 52c39daa99..7efd4f2b85 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2664,6 +2664,40 @@ static PCIHostState *spapr_create_default_phb(void)
>>      return PCI_HOST_BRIDGE(dev);
>>  }
>>  
>> +static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
>> +{
>> +    MachineState *machine = MACHINE(spapr);
>> +    hwaddr rma_size = machine->ram_size;
>> +    hwaddr node0_size = spapr_node0_size(machine);
>> +
>> +    /* RMA has to fit in the first NUMA node */
>> +    rma_size = MIN(rma_size, node0_size);
>> +
>> +    /*
>> +     * VRMA access is via a special 1TiB SLB mapping, so the RMA can
>> +     * never exceed that
>> +     */
>> +    rma_size = MIN(rma_size, TiB);
>> +
>> +    /*
>> +     * RMA size is controlled in hardware by LPCR[RMLS].  On POWER8
> 
> 
> RMA is controlled by LPCR on P8 but the RMLS bits on P9 are reserved
> (also reserved in PowerISA 3.0).
> 
> 
>> +     * the largest RMA that can be specified there is 16GiB
> 
> 
> The P8 user manual says:
> ===
> The following RMO sizes are available for the POWER8 processor.
> The RMLS[34:37] field in the LPCR defines the RMO sizes, as described below.
> 1000 - 32 MB
> 0011 - 64 MB
> 0111 - 128 MB
> 0100 - 256 MB
> 0010 - 1 GB
> 0001 - 16 GB
> 0000 - 256 GB
> ===
> 
> The maximum seems to be 256GiB.


Ah, update from Paul - we do not actually use what LPCR[RMLS] controls -
Real Mode Offset Register (RMOR).



> 
> 
>> +     */
>> +    if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00,
>> +                               0, spapr->max_compat_pvr)) {
>> +        rma_size = MIN(rma_size, 16 * GiB);
>> +    }
>> +
>> +    if (rma_size < (MIN_RMA_SLOF * MiB)) {
> 
> 
> nit: it is time to redefine MIN_RMA_SLOF to use MiBs imho :)
> 
> 
>> +        error_setg(errp,
>> +"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
>> +                   MIN_RMA_SLOF);
> 
> Something went wrong with formatting here.
> 
> Otherwise looks good. Thanks,
> 
> 
> 
>> +        return -1;
>> +    }
>> +
>> +    return rma_size;
>> +}
>> +
>>  /* pSeries LPAR / sPAPR hardware init */
>>  static void spapr_machine_init(MachineState *machine)
>>  {
>> @@ -2675,7 +2709,6 @@ static void spapr_machine_init(MachineState *machine)
>>      int i;
>>      MemoryRegion *sysmem = get_system_memory();
>>      MemoryRegion *ram = g_new(MemoryRegion, 1);
>> -    hwaddr node0_size = spapr_node0_size(machine);
>>      long load_limit, fw_size;
>>      char *filename;
>>      Error *resize_hpt_err = NULL;
>> @@ -2715,20 +2748,7 @@ static void spapr_machine_init(MachineState *machine)
>>          exit(1);
>>      }
>>  
>> -    spapr->rma_size = node0_size;
>> -
>> -    /* Actually we don't support unbounded RMA anymore since we added
>> -     * proper emulation of HV mode. The max we can get is 16G which
>> -     * also happens to be what we configure for PAPR mode so make sure
>> -     * we don't do anything bigger than that
>> -     */
>> -    spapr->rma_size = MIN(spapr->rma_size, 0x400000000ull);
>> -
>> -    if (spapr->rma_size > node0_size) {
>> -        error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
>> -                     spapr->rma_size);
>> -        exit(1);
>> -    }
>> +    spapr->rma_size = spapr_rma_size(spapr, &error_fatal);
>>  
>>      /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
>>      load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
>> @@ -2956,13 +2976,6 @@ static void spapr_machine_init(MachineState *machine)
>>          }
>>      }
>>  
>> -    if (spapr->rma_size < (MIN_RMA_SLOF * MiB)) {
>> -        error_report(
>> -            "pSeries SLOF firmware requires >= %ldM guest RMA (Real Mode Area memory)",
>> -            MIN_RMA_SLOF);
>> -        exit(1);
>> -    }
>> -
>>      if (kernel_filename) {
>>          uint64_t lowaddr = 0;
>>  
>>
> 

-- 
Alexey


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size()
  2019-12-02 18:18   ` Cédric Le Goater
@ 2019-12-10  3:58     ` David Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2019-12-10  3:58 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: aik, lvivier, qemu-ppc, groug, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]

On Mon, Dec 02, 2019 at 07:18:52PM +0100, Cédric Le Goater wrote:
> > diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> > index c77f9848ec..09b3bd6443 100644
> > --- a/target/ppc/kvm.c
> > +++ b/target/ppc/kvm.c
> > @@ -2101,7 +2101,7 @@ void kvmppc_hint_smt_possible(Error **errp)
> >  
> >  
> >  #ifdef TARGET_PPC64
> > -uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
> > +uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
> >  {
> >      struct kvm_ppc_smmu_info info;
> >      long rampagesize, best_page_shift;
> > @@ -2128,8 +2128,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
> >          }
> >      }
> >  
> > -    return MIN(current_size,
> > -               1ULL << (best_page_shift + hash_shift - 7));
> > +    return 1ULL << (best_page_shift + hash_shift - 7));
> 
> The closing ')' should be removed.

Oops, that's embarrassing.


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 4/4] spapr: Correct clamping of RMA to Node 0 size
  2019-12-03  4:18   ` Alexey Kardashevskiy
@ 2019-12-10  5:21     ` David Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2019-12-10  5:21 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: lvivier, qemu-devel, qemu-ppc, groug, clg

[-- Attachment #1: Type: text/plain, Size: 1119 bytes --]

On Tue, Dec 03, 2019 at 03:18:37PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 29/11/2019 12:35, David Gibson wrote:
> > The Real Mode Area (RMA) needs to fit within Node 0 in NUMA configurations.
> > We use a helper function spapr_node0_size() to calculate this.
> > 
> > But that function doesn't actually get the size of Node 0, it gets the
> > minimum size of all nodes, ever since b082d65a300 "spapr: Add a helper for
> > node0_size calculation".  That was added, apparently, because Node 0 in
> > qemu's terms might not have corresponded to Node 0 in PAPR terms (i.e. the
> > node with memory at address 0).
> 
> 
> After looking at this closely, I think the idea was that the first
> node(s) may have only CPUs but not memory, in this case
> node#0.node_mem==0 and things crash:

Ah!  Excellent point - I misread what the existing node0_size()
function was doing.  Corrected for the next spin.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 3/4] spapr: Clean up RMA size calculation
  2019-12-03  3:44   ` Alexey Kardashevskiy
  2019-12-03  5:06     ` Alexey Kardashevskiy
@ 2019-12-10  5:29     ` David Gibson
  1 sibling, 0 replies; 17+ messages in thread
From: David Gibson @ 2019-12-10  5:29 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: lvivier, qemu-devel, qemu-ppc, groug, clg

[-- Attachment #1: Type: text/plain, Size: 5497 bytes --]

On Tue, Dec 03, 2019 at 02:44:06PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 29/11/2019 12:35, David Gibson wrote:
> > Move the calculation of the Real Mode Area (RMA) size into a helper
> > function.  While we're there clean it up and correct it in a few ways:
> >   * Add comments making it clearer where the various constraints come from
> >   * Remove a pointless check that the RMA fits within Node 0 (we've just
> >     clamped it so that it does)
> >   * The 16GiB limit we apply is only correct for POWER8, but there is also
> >     a 1TiB limit that applies on POWER9.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/ppc/spapr.c | 57 +++++++++++++++++++++++++++++++-------------------
> >  1 file changed, 35 insertions(+), 22 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 52c39daa99..7efd4f2b85 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2664,6 +2664,40 @@ static PCIHostState *spapr_create_default_phb(void)
> >      return PCI_HOST_BRIDGE(dev);
> >  }
> >  
> > +static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
> > +{
> > +    MachineState *machine = MACHINE(spapr);
> > +    hwaddr rma_size = machine->ram_size;
> > +    hwaddr node0_size = spapr_node0_size(machine);
> > +
> > +    /* RMA has to fit in the first NUMA node */
> > +    rma_size = MIN(rma_size, node0_size);
> > +
> > +    /*
> > +     * VRMA access is via a special 1TiB SLB mapping, so the RMA can
> > +     * never exceed that
> > +     */
> > +    rma_size = MIN(rma_size, TiB);
> > +
> > +    /*
> > +     * RMA size is controlled in hardware by LPCR[RMLS].  On POWER8
> 
> 
> RMA is controlled by LPCR on P8 but the RMLS bits on P9 are reserved
> (also reserved in PowerISA 3.0).
> 
> 
> > +     * the largest RMA that can be specified there is 16GiB
> 
> 
> The P8 user manual says:
> ===
> The following RMO sizes are available for the POWER8 processor.
> The RMLS[34:37] field in the LPCR defines the RMO sizes, as described below.
> 1000 - 32 MB
> 0011 - 64 MB
> 0111 - 128 MB
> 0100 - 256 MB
> 0010 - 1 GB
> 0001 - 16 GB
> 0000 - 256 GB
> ===
> 
> The maximum seems to be 256GiB.

Huh.  Ok, looks like I was wrong about where that 16GiB clamp came
from originally.  I wonder where it *did* come from, or if it's simply
wrong.

> > +     */
> > +    if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00,
> > +                               0, spapr->max_compat_pvr)) {
> > +        rma_size = MIN(rma_size, 16 * GiB);
> > +    }
> > +
> > +    if (rma_size < (MIN_RMA_SLOF * MiB)) {
> 
> 
> nit: it is time to redefine MIN_RMA_SLOF to use MiBs imho :)

Yeah, I'd thought about that too.  I've added a patch to clean that up.

> 
> 
> > +        error_setg(errp,
> > +"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
> > +                   MIN_RMA_SLOF);
> 
> Something went wrong with formatting here.

Actually, that's intentional.  The idea is to keep the string on one
line for greppability, without going too far to the right.  I believe
that's a generally accepted exception to the usual formatting rules in
qemu.

> 
> Otherwise looks good. Thanks,
> 
> 
> 
> > +        return -1;
> > +    }
> > +
> > +    return rma_size;
> > +}
> > +
> >  /* pSeries LPAR / sPAPR hardware init */
> >  static void spapr_machine_init(MachineState *machine)
> >  {
> > @@ -2675,7 +2709,6 @@ static void spapr_machine_init(MachineState *machine)
> >      int i;
> >      MemoryRegion *sysmem = get_system_memory();
> >      MemoryRegion *ram = g_new(MemoryRegion, 1);
> > -    hwaddr node0_size = spapr_node0_size(machine);
> >      long load_limit, fw_size;
> >      char *filename;
> >      Error *resize_hpt_err = NULL;
> > @@ -2715,20 +2748,7 @@ static void spapr_machine_init(MachineState *machine)
> >          exit(1);
> >      }
> >  
> > -    spapr->rma_size = node0_size;
> > -
> > -    /* Actually we don't support unbounded RMA anymore since we added
> > -     * proper emulation of HV mode. The max we can get is 16G which
> > -     * also happens to be what we configure for PAPR mode so make sure
> > -     * we don't do anything bigger than that
> > -     */
> > -    spapr->rma_size = MIN(spapr->rma_size, 0x400000000ull);
> > -
> > -    if (spapr->rma_size > node0_size) {
> > -        error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
> > -                     spapr->rma_size);
> > -        exit(1);
> > -    }
> > +    spapr->rma_size = spapr_rma_size(spapr, &error_fatal);
> >  
> >      /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
> >      load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
> > @@ -2956,13 +2976,6 @@ static void spapr_machine_init(MachineState *machine)
> >          }
> >      }
> >  
> > -    if (spapr->rma_size < (MIN_RMA_SLOF * MiB)) {
> > -        error_report(
> > -            "pSeries SLOF firmware requires >= %ldM guest RMA (Real Mode Area memory)",
> > -            MIN_RMA_SLOF);
> > -        exit(1);
> > -    }
> > -
> >      if (kernel_filename) {
> >          uint64_t lowaddr = 0;
> >  
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [for-5.0 3/4] spapr: Clean up RMA size calculation
  2019-12-03  5:06     ` Alexey Kardashevskiy
@ 2019-12-10  5:44       ` David Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2019-12-10  5:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: lvivier, qemu-devel, qemu-ppc, groug, clg

[-- Attachment #1: Type: text/plain, Size: 3355 bytes --]

On Tue, Dec 03, 2019 at 04:06:46PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 03/12/2019 14:44, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 29/11/2019 12:35, David Gibson wrote:
> >> Move the calculation of the Real Mode Area (RMA) size into a helper
> >> function.  While we're there clean it up and correct it in a few ways:
> >>   * Add comments making it clearer where the various constraints come from
> >>   * Remove a pointless check that the RMA fits within Node 0 (we've just
> >>     clamped it so that it does)
> >>   * The 16GiB limit we apply is only correct for POWER8, but there is also
> >>     a 1TiB limit that applies on POWER9.
> >>
> >> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >> ---
> >>  hw/ppc/spapr.c | 57 +++++++++++++++++++++++++++++++-------------------
> >>  1 file changed, 35 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 52c39daa99..7efd4f2b85 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2664,6 +2664,40 @@ static PCIHostState *spapr_create_default_phb(void)
> >>      return PCI_HOST_BRIDGE(dev);
> >>  }
> >>  
> >> +static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
> >> +{
> >> +    MachineState *machine = MACHINE(spapr);
> >> +    hwaddr rma_size = machine->ram_size;
> >> +    hwaddr node0_size = spapr_node0_size(machine);
> >> +
> >> +    /* RMA has to fit in the first NUMA node */
> >> +    rma_size = MIN(rma_size, node0_size);
> >> +
> >> +    /*
> >> +     * VRMA access is via a special 1TiB SLB mapping, so the RMA can
> >> +     * never exceed that
> >> +     */
> >> +    rma_size = MIN(rma_size, TiB);
> >> +
> >> +    /*
> >> +     * RMA size is controlled in hardware by LPCR[RMLS].  On POWER8
> > 
> > 
> > RMA is controlled by LPCR on P8 but the RMLS bits on P9 are reserved
> > (also reserved in PowerISA 3.0).
> > 
> > 
> >> +     * the largest RMA that can be specified there is 16GiB
> > 
> > 
> > The P8 user manual says:
> > ===
> > The following RMO sizes are available for the POWER8 processor.
> > The RMLS[34:37] field in the LPCR defines the RMO sizes, as described below.
> > 1000 - 32 MB
> > 0011 - 64 MB
> > 0111 - 128 MB
> > 0100 - 256 MB
> > 0010 - 1 GB
> > 0001 - 16 GB
> > 0000 - 256 GB
> > ===
> > 
> > The maximum seems to be 256GiB.
> 
> 
> Ah, update from Paul - we do not actually use what LPCR[RMLS] controls -
> Real Mode Offset Register (RMOR).

Ah... I realized where the 16GiB limit was coming from.

We don't use RMLS with KVM, but we *do* use it under TCG.  The softmmu
code isn't aware of PAPR specific stuff at this point and just
consults the LPCR to handle real mode accesses.  And the TCG
implementation only supports up to 16GiB, even though POWER8 supports
more.

And, AFAICT that limit will apply for a POWER9 guest in hash mode as
well.  Not for initial boot, because we run in radix mode until we
determine that the guest wants hash, but if we drop back to real mode
after boot, this might matter.

I'm going to have to think about how to sort that mess out.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-12-10  5:45 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-29  1:35 [for-5.0 0/4] Fixes for RMA size calculation David Gibson
2019-11-29  1:35 ` [for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size() David Gibson
2019-12-02  7:45   ` Cédric Le Goater
2019-12-02  8:55   ` Greg Kurz
2019-12-02 18:18   ` Cédric Le Goater
2019-12-10  3:58     ` David Gibson
2019-12-03  3:32   ` Alexey Kardashevskiy
2019-11-29  1:35 ` [for-5.0 2/4] spapr: Don't attempt to clamp RMA to VRMA constraint David Gibson
2019-12-03  3:33   ` Alexey Kardashevskiy
2019-11-29  1:35 ` [for-5.0 3/4] spapr: Clean up RMA size calculation David Gibson
2019-12-03  3:44   ` Alexey Kardashevskiy
2019-12-03  5:06     ` Alexey Kardashevskiy
2019-12-10  5:44       ` David Gibson
2019-12-10  5:29     ` David Gibson
2019-11-29  1:35 ` [for-5.0 4/4] spapr: Correct clamping of RMA to Node 0 size David Gibson
2019-12-03  4:18   ` Alexey Kardashevskiy
2019-12-10  5:21     ` David Gibson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).