QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] spapr: Rework hash<->radix transitions at CAS
@ 2020-02-13 15:38 Greg Kurz
  2020-02-13 15:56 ` [PATCH] spapr: Make spapr_reallocate_hpt() static Greg Kurz
  2020-02-13 22:28 ` [PATCH] spapr: Rework hash<->radix transitions at CAS David Gibson
  0 siblings, 2 replies; 4+ messages in thread
From: Greg Kurz @ 2020-02-13 15:38 UTC (permalink / raw)
  To: David Gibson; +Cc: Alexey Kardashevskiy, Laurent Vivier, qemu-ppc, qemu-devel

Until the CAS negotiation is over, an HPT can be allocated on three
different paths:

1) during machine reset if the host doesn't support radix,

2) during CAS if the guest wants hash and doesn't support HPT resizing,
   in which case we pre-emptively resize the HPT to accomodate maxram,

3) during CAS if no CAS reboot was requested, the guest wants hash but
   we're currently configured for radix.

Depending on the various combinations of host or guest MMU support,
HPT resizing guest support and the possibility of a CAS reboot, it
is quite hard to know which of these allocates the HPT that will
be ultimately used by the guest that wants to do hash. Also, some of
them have bugs:

- 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
  and thus doesn't update the VRMA size, even though we've just extended
  the HPT. Not sure what issues this can cause,

- 3) doesn't check for HPT resizing support and will always allocate a
  small HPT based on the initial RAM size. This caps the total amount of
  RAM the guest can see, especially if maxram is much higher than the
  initial ram.

We only support guests that do CAS and we already assume that the HPT
isn't being used when we do the pre-emptive resizing at CAS. It thus
seems reasonable to only allocate the HPT at the end of CAS, when no
CAS reboot was requested.

Consolidate the logic so that we only create the HPT during 3), ie.
when we're done with the CAS reboot cycles, and ensure HPT resizing
is taken into account. This fixes the radix->hash transition for
all cases.

The guest can theoretically call CAS several times, without a CAS
reboot in between. Linux guests don't do that, but better safe than
sorry, let's ensure we can also handle the symmetrical hash->radix
transition correctly: free the HPT and set the GR bit in PATE.
An helper is introduced for the latter since this is already what
we do during machine reset when going for radix.

As a bonus, this removes one user of spapr->cas_reboot, which we
want to get rid of in the future.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr.c         |   25 +++++++++++++++-----
 hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
 include/hw/ppc/spapr.h |    1 +
 3 files changed, 44 insertions(+), 41 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 828e2cc1359a..88bc0e4e3ca1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
 {
     int hpt_shift;
 
+    /*
+     * HPT resizing is a bit of a special case, because when enabled
+     * we assume an HPT guest will support it until it says it
+     * doesn't, instead of assuming it won't support it until it says
+     * it does.  Strictly speaking that approach could break for
+     * guests which don't make a CAS call, but those are so old we
+     * don't care about them.  Without that assumption we'd have to
+     * make at least a temporary allocation of an HPT sized for max
+     * memory, which could be impossibly difficult under KVM HV if
+     * maxram is large.
+     */
     if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
-        || (spapr->cas_reboot
-            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
+        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
         hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
     } else {
         uint64_t current_ram_size;
@@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
     return 0;
 }
 
+void spapr_reset_patb_entry(SpaprMachineState *spapr)
+{
+    spapr->patb_entry = PATE1_GR;
+    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
+}
+
 static void spapr_machine_reset(MachineState *machine)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(machine);
@@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
          * without a HPT because KVM will start them in radix mode.
          * Set the GR bit in PATE so that we know there is no HPT.
          */
-        spapr->patb_entry = PATE1_GR;
-        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
-    } else {
-        spapr_setup_hpt_and_vrma(spapr);
+        spapr_reset_patb_entry(spapr);
     }
 
     qemu_devices_reset();
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index b8bb66b5c0d4..57ddf0fa6d05 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
     bool raw_mode_supported = false;
     bool guest_xive;
     CPUState *cs;
+    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
 
     /* CAS is supposed to be called early when only the boot vCPU is active. */
     CPU_FOREACH(cs) {
@@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 
     guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
 
-    /*
-     * HPT resizing is a bit of a special case, because when enabled
-     * we assume an HPT guest will support it until it says it
-     * doesn't, instead of assuming it won't support it until it says
-     * it does.  Strictly speaking that approach could break for
-     * guests which don't make a CAS call, but those are so old we
-     * don't care about them.  Without that assumption we'd have to
-     * make at least a temporary allocation of an HPT sized for max
-     * memory, which could be impossibly difficult under KVM HV if
-     * maxram is large.
-     */
-    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
-        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
-
-        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
-            error_report(
-                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
-            exit(1);
-        }
-
-        if (spapr->htab_shift < maxshift) {
-            /* Guest doesn't know about HPT resizing, so we
-             * pre-emptively resize for the maximum permitted RAM.  At
-             * the point this is called, nothing should have been
-             * entered into the existing HPT */
-            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
-            push_sregs_to_kvm_pr(spapr);
-        }
-    }
-
     /* NOTE: there are actually a number of ov5 bits where input from the
      * guest is always zero, and the platform/QEMU enables them independently
      * of guest input. To model these properly we'd want some sort of mask,
@@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
             error_report("Guest requested unavailable MMU mode (hash).");
             exit(EXIT_FAILURE);
         }
+        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
+            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
+            error_report(
+                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
+            exit(1);
+        }
     }
     spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
     spapr_ovec_cleanup(ov1_guest);
@@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
         void *fdt;
         SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
 
-        /* If spapr_machine_reset() did not set up a HPT but one is necessary
-         * (because the guest isn't going to use radix) then set it up here. */
-        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
-            /* legacy hash or new hash: */
-            spapr_setup_hpt_and_vrma(spapr);
+        if (!guest_radix) {
+            /*
+             * Either spapr_machine_reset() did not set up a HPT but one
+             * is necessary (because the guest isn't going to use radix),
+             * or the guest doesn't know about HPT resizing and we need to
+             * pre-emptively resize for the maximum permitted RAM. Set it
+             * up here. At the point this is called, nothing should have
+             * been entered into the existing HPT.
+             */
+            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
+                /* legacy hash or new hash: */
+                spapr_setup_hpt_and_vrma(spapr);
+                push_sregs_to_kvm_pr(spapr);
+            }
+        } else {
+            spapr_free_hpt(spapr);
+            spapr_reset_patb_entry(spapr);
         }
 
         if (fdt_bufsize < sizeof(hdr)) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 09110961a589..9d88b5596481 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
 hwaddr spapr_get_rtas_addr(void);
+void spapr_reset_patb_entry(SpaprMachineState *spapr);
 #endif /* HW_SPAPR_H */



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] spapr: Make spapr_reallocate_hpt() static
  2020-02-13 15:38 [PATCH] spapr: Rework hash<->radix transitions at CAS Greg Kurz
@ 2020-02-13 15:56 ` Greg Kurz
  2020-02-13 22:28 ` [PATCH] spapr: Rework hash<->radix transitions at CAS David Gibson
  1 sibling, 0 replies; 4+ messages in thread
From: Greg Kurz @ 2020-02-13 15:56 UTC (permalink / raw)
  To: David Gibson; +Cc: Alexey Kardashevskiy, Laurent Vivier, qemu-ppc, qemu-devel

Its only users are in spapr.c.

Signed-off-by: Greg Kurz <groug@kaod.org>
---

I realized just after posting that spapr_reallocate_hpt() didn't need to
be extern anymore. Maybe worth squashing this into the "spapr: Rework
hash<->radix transitions at CAS" patch ?
---
 hw/ppc/spapr.c         |    4 ++--
 include/hw/ppc/spapr.h |    2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 88bc0e4e3ca1..1537866533cb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1519,8 +1519,8 @@ void spapr_free_hpt(SpaprMachineState *spapr)
     close_htab_fd(spapr);
 }
 
-void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
-                          Error **errp)
+static void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
+                                 Error **errp)
 {
     long rc;
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9d88b5596481..0be8abc70744 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -821,8 +821,6 @@ void spapr_hotplug_req_add_by_count_indexed(SpaprDrcType drc_type,
 void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                                                uint32_t count, uint32_t index);
 int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
-void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
-                          Error **errp);
 void spapr_clear_pending_events(SpaprMachineState *spapr);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] spapr: Rework hash<->radix transitions at CAS
  2020-02-13 15:38 [PATCH] spapr: Rework hash<->radix transitions at CAS Greg Kurz
  2020-02-13 15:56 ` [PATCH] spapr: Make spapr_reallocate_hpt() static Greg Kurz
@ 2020-02-13 22:28 ` David Gibson
  2020-02-14 18:19   ` Greg Kurz
  1 sibling, 1 reply; 4+ messages in thread
From: David Gibson @ 2020-02-13 22:28 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Alexey Kardashevskiy, Laurent Vivier, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 10099 bytes --]

On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> Until the CAS negotiation is over, an HPT can be allocated on three
> different paths:
> 
> 1) during machine reset if the host doesn't support radix,
> 
> 2) during CAS if the guest wants hash and doesn't support HPT resizing,
>    in which case we pre-emptively resize the HPT to accomodate maxram,
> 
> 3) during CAS if no CAS reboot was requested, the guest wants hash but
>    we're currently configured for radix.
> 
> Depending on the various combinations of host or guest MMU support,
> HPT resizing guest support and the possibility of a CAS reboot, it
> is quite hard to know which of these allocates the HPT that will
> be ultimately used by the guest that wants to do hash. Also, some of
> them have bugs:
> 
> - 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
>   and thus doesn't update the VRMA size, even though we've just extended
>   the HPT. Not sure what issues this can cause,
> 
> - 3) doesn't check for HPT resizing support and will always allocate a
>   small HPT based on the initial RAM size. This caps the total amount of
>   RAM the guest can see, especially if maxram is much higher than the
>   initial ram.
> 
> We only support guests that do CAS and we already assume that the HPT
> isn't being used when we do the pre-emptive resizing at CAS. It thus
> seems reasonable to only allocate the HPT at the end of CAS, when no
> CAS reboot was requested.
> 
> Consolidate the logic so that we only create the HPT during 3), ie.
> when we're done with the CAS reboot cycles, and ensure HPT resizing
> is taken into account. This fixes the radix->hash transition for
> all cases.

Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
need the HPT at all times there, or there's nowhere to put VRMA
entries, so we can't run even in real mode.

> The guest can theoretically call CAS several times, without a CAS
> reboot in between. Linux guests don't do that, but better safe than
> sorry, let's ensure we can also handle the symmetrical hash->radix
> transition correctly: free the HPT and set the GR bit in PATE.
> An helper is introduced for the latter since this is already what
> we do during machine reset when going for radix.
> 
> As a bonus, this removes one user of spapr->cas_reboot, which we
> want to get rid of in the future.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/ppc/spapr.c         |   25 +++++++++++++++-----
>  hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
>  include/hw/ppc/spapr.h |    1 +
>  3 files changed, 44 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 828e2cc1359a..88bc0e4e3ca1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
>  {
>      int hpt_shift;
>  
> +    /*
> +     * HPT resizing is a bit of a special case, because when enabled
> +     * we assume an HPT guest will support it until it says it
> +     * doesn't, instead of assuming it won't support it until it says
> +     * it does.  Strictly speaking that approach could break for
> +     * guests which don't make a CAS call, but those are so old we
> +     * don't care about them.  Without that assumption we'd have to
> +     * make at least a temporary allocation of an HPT sized for max
> +     * memory, which could be impossibly difficult under KVM HV if
> +     * maxram is large.
> +     */
>      if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
> -        || (spapr->cas_reboot
> -            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
> +        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
>          hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
>      } else {
>          uint64_t current_ram_size;
> @@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
>      return 0;
>  }
>  
> +void spapr_reset_patb_entry(SpaprMachineState *spapr)
> +{
> +    spapr->patb_entry = PATE1_GR;
> +    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> +}
> +
>  static void spapr_machine_reset(MachineState *machine)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> @@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
>           * without a HPT because KVM will start them in radix mode.
>           * Set the GR bit in PATE so that we know there is no HPT.
>           */
> -        spapr->patb_entry = PATE1_GR;
> -        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> -    } else {
> -        spapr_setup_hpt_and_vrma(spapr);
> +        spapr_reset_patb_entry(spapr);
>      }
>  
>      qemu_devices_reset();
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index b8bb66b5c0d4..57ddf0fa6d05 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>      bool raw_mode_supported = false;
>      bool guest_xive;
>      CPUState *cs;
> +    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
>  
>      /* CAS is supposed to be called early when only the boot vCPU is active. */
>      CPU_FOREACH(cs) {
> @@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>  
>      guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
>  
> -    /*
> -     * HPT resizing is a bit of a special case, because when enabled
> -     * we assume an HPT guest will support it until it says it
> -     * doesn't, instead of assuming it won't support it until it says
> -     * it does.  Strictly speaking that approach could break for
> -     * guests which don't make a CAS call, but those are so old we
> -     * don't care about them.  Without that assumption we'd have to
> -     * make at least a temporary allocation of an HPT sized for max
> -     * memory, which could be impossibly difficult under KVM HV if
> -     * maxram is large.
> -     */
> -    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> -        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> -
> -        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
> -            error_report(
> -                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> -            exit(1);
> -        }
> -
> -        if (spapr->htab_shift < maxshift) {
> -            /* Guest doesn't know about HPT resizing, so we
> -             * pre-emptively resize for the maximum permitted RAM.  At
> -             * the point this is called, nothing should have been
> -             * entered into the existing HPT */
> -            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
> -            push_sregs_to_kvm_pr(spapr);
> -        }
> -    }
> -
>      /* NOTE: there are actually a number of ov5 bits where input from the
>       * guest is always zero, and the platform/QEMU enables them independently
>       * of guest input. To model these properly we'd want some sort of mask,
> @@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>              error_report("Guest requested unavailable MMU mode (hash).");
>              exit(EXIT_FAILURE);
>          }
> +        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
> +            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> +            error_report(
> +                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> +            exit(1);
> +        }
>      }
>      spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
>      spapr_ovec_cleanup(ov1_guest);
> @@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>          void *fdt;
>          SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
>  
> -        /* If spapr_machine_reset() did not set up a HPT but one is necessary
> -         * (because the guest isn't going to use radix) then set it up here. */
> -        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
> -            /* legacy hash or new hash: */
> -            spapr_setup_hpt_and_vrma(spapr);
> +        if (!guest_radix) {
> +            /*
> +             * Either spapr_machine_reset() did not set up a HPT but one
> +             * is necessary (because the guest isn't going to use radix),
> +             * or the guest doesn't know about HPT resizing and we need to
> +             * pre-emptively resize for the maximum permitted RAM. Set it
> +             * up here. At the point this is called, nothing should have
> +             * been entered into the existing HPT.
> +             */
> +            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
> +                /* legacy hash or new hash: */
> +                spapr_setup_hpt_and_vrma(spapr);
> +                push_sregs_to_kvm_pr(spapr);
> +            }
> +        } else {
> +            spapr_free_hpt(spapr);
> +            spapr_reset_patb_entry(spapr);
>          }
>  
>          if (fdt_bufsize < sizeof(hdr)) {
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 09110961a589..9d88b5596481 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>  hwaddr spapr_get_rtas_addr(void);
> +void spapr_reset_patb_entry(SpaprMachineState *spapr);
>  #endif /* HW_SPAPR_H */
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] spapr: Rework hash<->radix transitions at CAS
  2020-02-13 22:28 ` [PATCH] spapr: Rework hash<->radix transitions at CAS David Gibson
@ 2020-02-14 18:19   ` Greg Kurz
  0 siblings, 0 replies; 4+ messages in thread
From: Greg Kurz @ 2020-02-14 18:19 UTC (permalink / raw)
  To: David Gibson; +Cc: Alexey Kardashevskiy, Laurent Vivier, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 10721 bytes --]

On Fri, 14 Feb 2020 09:28:35 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> > Until the CAS negotiation is over, an HPT can be allocated on three
> > different paths:
> > 
> > 1) during machine reset if the host doesn't support radix,
> > 
> > 2) during CAS if the guest wants hash and doesn't support HPT resizing,
> >    in which case we pre-emptively resize the HPT to accomodate maxram,
> > 
> > 3) during CAS if no CAS reboot was requested, the guest wants hash but
> >    we're currently configured for radix.
> > 
> > Depending on the various combinations of host or guest MMU support,
> > HPT resizing guest support and the possibility of a CAS reboot, it
> > is quite hard to know which of these allocates the HPT that will
> > be ultimately used by the guest that wants to do hash. Also, some of
> > them have bugs:
> > 
> > - 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
> >   and thus doesn't update the VRMA size, even though we've just extended
> >   the HPT. Not sure what issues this can cause,
> > 
> > - 3) doesn't check for HPT resizing support and will always allocate a
> >   small HPT based on the initial RAM size. This caps the total amount of
> >   RAM the guest can see, especially if maxram is much higher than the
> >   initial ram.
> > 
> > We only support guests that do CAS and we already assume that the HPT
> > isn't being used when we do the pre-emptive resizing at CAS. It thus
> > seems reasonable to only allocate the HPT at the end of CAS, when no
> > CAS reboot was requested.
> > 
> > Consolidate the logic so that we only create the HPT during 3), ie.
> > when we're done with the CAS reboot cycles, and ensure HPT resizing
> > is taken into account. This fixes the radix->hash transition for
> > all cases.
> 
> Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
> need the HPT at all times there, or there's nowhere to put VRMA
> entries, so we can't run even in real mode.
> 

Well it happens to be working anyway because KVM automatically
creates an HPT (default size 16MB) in kvmppc_hv_setup_htab_rma()
if QEMU didn't do so already... Would a comment to emphasize this
be enough or do you prefer I don't drop the HPT allocation currently
performed at machine reset ?

> > The guest can theoretically call CAS several times, without a CAS
> > reboot in between. Linux guests don't do that, but better safe than
> > sorry, let's ensure we can also handle the symmetrical hash->radix
> > transition correctly: free the HPT and set the GR bit in PATE.
> > An helper is introduced for the latter since this is already what
> > we do during machine reset when going for radix.
> > 
> > As a bonus, this removes one user of spapr->cas_reboot, which we
> > want to get rid of in the future.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  hw/ppc/spapr.c         |   25 +++++++++++++++-----
> >  hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
> >  include/hw/ppc/spapr.h |    1 +
> >  3 files changed, 44 insertions(+), 41 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 828e2cc1359a..88bc0e4e3ca1 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
> >  {
> >      int hpt_shift;
> >  
> > +    /*
> > +     * HPT resizing is a bit of a special case, because when enabled
> > +     * we assume an HPT guest will support it until it says it
> > +     * doesn't, instead of assuming it won't support it until it says
> > +     * it does.  Strictly speaking that approach could break for
> > +     * guests which don't make a CAS call, but those are so old we
> > +     * don't care about them.  Without that assumption we'd have to
> > +     * make at least a temporary allocation of an HPT sized for max
> > +     * memory, which could be impossibly difficult under KVM HV if
> > +     * maxram is large.
> > +     */
> >      if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
> > -        || (spapr->cas_reboot
> > -            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
> > +        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
> >          hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> >      } else {
> >          uint64_t current_ram_size;
> > @@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
> >      return 0;
> >  }
> >  
> > +void spapr_reset_patb_entry(SpaprMachineState *spapr)
> > +{
> > +    spapr->patb_entry = PATE1_GR;
> > +    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > +}
> > +
> >  static void spapr_machine_reset(MachineState *machine)
> >  {
> >      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> > @@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
> >           * without a HPT because KVM will start them in radix mode.
> >           * Set the GR bit in PATE so that we know there is no HPT.
> >           */
> > -        spapr->patb_entry = PATE1_GR;
> > -        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > -    } else {
> > -        spapr_setup_hpt_and_vrma(spapr);
> > +        spapr_reset_patb_entry(spapr);
> >      }
> >  
> >      qemu_devices_reset();
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index b8bb66b5c0d4..57ddf0fa6d05 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >      bool raw_mode_supported = false;
> >      bool guest_xive;
> >      CPUState *cs;
> > +    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> >  
> >      /* CAS is supposed to be called early when only the boot vCPU is active. */
> >      CPU_FOREACH(cs) {
> > @@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >  
> >      guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
> >  
> > -    /*
> > -     * HPT resizing is a bit of a special case, because when enabled
> > -     * we assume an HPT guest will support it until it says it
> > -     * doesn't, instead of assuming it won't support it until it says
> > -     * it does.  Strictly speaking that approach could break for
> > -     * guests which don't make a CAS call, but those are so old we
> > -     * don't care about them.  Without that assumption we'd have to
> > -     * make at least a temporary allocation of an HPT sized for max
> > -     * memory, which could be impossibly difficult under KVM HV if
> > -     * maxram is large.
> > -     */
> > -    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > -        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > -
> > -        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
> > -            error_report(
> > -                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > -            exit(1);
> > -        }
> > -
> > -        if (spapr->htab_shift < maxshift) {
> > -            /* Guest doesn't know about HPT resizing, so we
> > -             * pre-emptively resize for the maximum permitted RAM.  At
> > -             * the point this is called, nothing should have been
> > -             * entered into the existing HPT */
> > -            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
> > -            push_sregs_to_kvm_pr(spapr);
> > -        }
> > -    }
> > -
> >      /* NOTE: there are actually a number of ov5 bits where input from the
> >       * guest is always zero, and the platform/QEMU enables them independently
> >       * of guest input. To model these properly we'd want some sort of mask,
> > @@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >              error_report("Guest requested unavailable MMU mode (hash).");
> >              exit(EXIT_FAILURE);
> >          }
> > +        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
> > +            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > +            error_report(
> > +                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > +            exit(1);
> > +        }
> >      }
> >      spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
> >      spapr_ovec_cleanup(ov1_guest);
> > @@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >          void *fdt;
> >          SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
> >  
> > -        /* If spapr_machine_reset() did not set up a HPT but one is necessary
> > -         * (because the guest isn't going to use radix) then set it up here. */
> > -        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
> > -            /* legacy hash or new hash: */
> > -            spapr_setup_hpt_and_vrma(spapr);
> > +        if (!guest_radix) {
> > +            /*
> > +             * Either spapr_machine_reset() did not set up a HPT but one
> > +             * is necessary (because the guest isn't going to use radix),
> > +             * or the guest doesn't know about HPT resizing and we need to
> > +             * pre-emptively resize for the maximum permitted RAM. Set it
> > +             * up here. At the point this is called, nothing should have
> > +             * been entered into the existing HPT.
> > +             */
> > +            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
> > +                /* legacy hash or new hash: */
> > +                spapr_setup_hpt_and_vrma(spapr);
> > +                push_sregs_to_kvm_pr(spapr);
> > +            }
> > +        } else {
> > +            spapr_free_hpt(spapr);
> > +            spapr_reset_patb_entry(spapr);
> >          }
> >  
> >          if (fdt_bufsize < sizeof(hdr)) {
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 09110961a589..9d88b5596481 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> >  
> >  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> >  hwaddr spapr_get_rtas_addr(void);
> > +void spapr_reset_patb_entry(SpaprMachineState *spapr);
> >  #endif /* HW_SPAPR_H */
> > 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13 15:38 [PATCH] spapr: Rework hash<->radix transitions at CAS Greg Kurz
2020-02-13 15:56 ` [PATCH] spapr: Make spapr_reallocate_hpt() static Greg Kurz
2020-02-13 22:28 ` [PATCH] spapr: Rework hash<->radix transitions at CAS David Gibson
2020-02-14 18:19   ` Greg Kurz

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git