[PATCH] spapr: Rework hash<->radix transitions at CAS

From: Greg Kurz <groug@kaod.org>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Laurent Vivier <lvivier@redhat.com>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH] spapr: Rework hash<->radix transitions at CAS
Date: Thu, 13 Feb 2020 16:38:38 +0100	[thread overview]
Message-ID: <158160831807.3339719.7059822505220975954.stgit@bahia.lan> (raw)

Until the CAS negotiation is over, an HPT can be allocated on three
different paths:

1) during machine reset if the host doesn't support radix,

2) during CAS if the guest wants hash and doesn't support HPT resizing,
   in which case we pre-emptively resize the HPT to accomodate maxram,

3) during CAS if no CAS reboot was requested, the guest wants hash but
   we're currently configured for radix.

Depending on the various combinations of host or guest MMU support,
HPT resizing guest support and the possibility of a CAS reboot, it
is quite hard to know which of these allocates the HPT that will
be ultimately used by the guest that wants to do hash. Also, some of
them have bugs:

- 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
  and thus doesn't update the VRMA size, even though we've just extended
  the HPT. Not sure what issues this can cause,

- 3) doesn't check for HPT resizing support and will always allocate a
  small HPT based on the initial RAM size. This caps the total amount of
  RAM the guest can see, especially if maxram is much higher than the
  initial ram.

We only support guests that do CAS and we already assume that the HPT
isn't being used when we do the pre-emptive resizing at CAS. It thus
seems reasonable to only allocate the HPT at the end of CAS, when no
CAS reboot was requested.

Consolidate the logic so that we only create the HPT during 3), ie.
when we're done with the CAS reboot cycles, and ensure HPT resizing
is taken into account. This fixes the radix->hash transition for
all cases.

The guest can theoretically call CAS several times, without a CAS
reboot in between. Linux guests don't do that, but better safe than
sorry, let's ensure we can also handle the symmetrical hash->radix
transition correctly: free the HPT and set the GR bit in PATE.
An helper is introduced for the latter since this is already what
we do during machine reset when going for radix.

As a bonus, this removes one user of spapr->cas_reboot, which we
want to get rid of in the future.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr.c         |   25 +++++++++++++++-----
 hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
 include/hw/ppc/spapr.h |    1 +
 3 files changed, 44 insertions(+), 41 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 828e2cc1359a..88bc0e4e3ca1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
 {
     int hpt_shift;
 
+    /*
+     * HPT resizing is a bit of a special case, because when enabled
+     * we assume an HPT guest will support it until it says it
+     * doesn't, instead of assuming it won't support it until it says
+     * it does.  Strictly speaking that approach could break for
+     * guests which don't make a CAS call, but those are so old we
+     * don't care about them.  Without that assumption we'd have to
+     * make at least a temporary allocation of an HPT sized for max
+     * memory, which could be impossibly difficult under KVM HV if
+     * maxram is large.
+     */
     if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
-        || (spapr->cas_reboot
-            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
+        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
         hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
     } else {
         uint64_t current_ram_size;
@@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
     return 0;
 }
 
+void spapr_reset_patb_entry(SpaprMachineState *spapr)
+{
+    spapr->patb_entry = PATE1_GR;
+    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
+}
+
 static void spapr_machine_reset(MachineState *machine)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(machine);
@@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
          * without a HPT because KVM will start them in radix mode.
          * Set the GR bit in PATE so that we know there is no HPT.
          */
-        spapr->patb_entry = PATE1_GR;
-        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
-    } else {
-        spapr_setup_hpt_and_vrma(spapr);
+        spapr_reset_patb_entry(spapr);
     }
 
     qemu_devices_reset();
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index b8bb66b5c0d4..57ddf0fa6d05 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
     bool raw_mode_supported = false;
     bool guest_xive;
     CPUState *cs;
+    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
 
     /* CAS is supposed to be called early when only the boot vCPU is active. */
     CPU_FOREACH(cs) {
@@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 
     guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
 
-    /*
-     * HPT resizing is a bit of a special case, because when enabled
-     * we assume an HPT guest will support it until it says it
-     * doesn't, instead of assuming it won't support it until it says
-     * it does.  Strictly speaking that approach could break for
-     * guests which don't make a CAS call, but those are so old we
-     * don't care about them.  Without that assumption we'd have to
-     * make at least a temporary allocation of an HPT sized for max
-     * memory, which could be impossibly difficult under KVM HV if
-     * maxram is large.
-     */
-    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
-        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
-
-        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
-            error_report(
-                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
-            exit(1);
-        }
-
-        if (spapr->htab_shift < maxshift) {
-            /* Guest doesn't know about HPT resizing, so we
-             * pre-emptively resize for the maximum permitted RAM.  At
-             * the point this is called, nothing should have been
-             * entered into the existing HPT */
-            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
-            push_sregs_to_kvm_pr(spapr);
-        }
-    }
-
     /* NOTE: there are actually a number of ov5 bits where input from the
      * guest is always zero, and the platform/QEMU enables them independently
      * of guest input. To model these properly we'd want some sort of mask,
@@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
             error_report("Guest requested unavailable MMU mode (hash).");
             exit(EXIT_FAILURE);
         }
+        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
+            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
+            error_report(
+                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
+            exit(1);
+        }
     }
     spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
     spapr_ovec_cleanup(ov1_guest);
@@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
         void *fdt;
         SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
 
-        /* If spapr_machine_reset() did not set up a HPT but one is necessary
-         * (because the guest isn't going to use radix) then set it up here. */
-        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
-            /* legacy hash or new hash: */
-            spapr_setup_hpt_and_vrma(spapr);
+        if (!guest_radix) {
+            /*
+             * Either spapr_machine_reset() did not set up a HPT but one
+             * is necessary (because the guest isn't going to use radix),
+             * or the guest doesn't know about HPT resizing and we need to
+             * pre-emptively resize for the maximum permitted RAM. Set it
+             * up here. At the point this is called, nothing should have
+             * been entered into the existing HPT.
+             */
+            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
+                /* legacy hash or new hash: */
+                spapr_setup_hpt_and_vrma(spapr);
+                push_sregs_to_kvm_pr(spapr);
+            }
+        } else {
+            spapr_free_hpt(spapr);
+            spapr_reset_patb_entry(spapr);
         }
 
         if (fdt_bufsize < sizeof(hdr)) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 09110961a589..9d88b5596481 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
 hwaddr spapr_get_rtas_addr(void);
+void spapr_reset_patb_entry(SpaprMachineState *spapr);
 #endif /* HW_SPAPR_H */