All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: Fabiano Rosas <farosas@linux.ibm.com>, qemu-devel@nongnu.org
Cc: aneesh.kumar@linux.ibm.com, qemu-ppc@nongnu.org, clg@kaod.org,
	npiggin@gmail.com, david@gibson.dropbear.id.au
Subject: Re: [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5
Date: Wed, 9 Mar 2022 16:09:04 -0300	[thread overview]
Message-ID: <d9cb6b05-9c06-8081-ee7f-a44f3a85848b@gmail.com> (raw)
In-Reply-To: <20220309012400.2527157-1-farosas@linux.ibm.com>



On 3/8/22 22:23, Fabiano Rosas wrote:
> QEMU reports MMU support to the guest via the ibm,architecture-vec-5
> property of the /chosen node. Byte number 26 specifies Radix Table
> Expansions, currently only GTSE (Guest Translation Shootdown
> Enable). This feature determines whether the tlbie instruction (and
> others) are HV privileged.
> 
> Up until now, we always reported GTSE=1 to guests. Even after the
> support for GTSE=0 was added. As part of that support, a kernel
> command line radix_hcall_invalidate=on was introduced that overrides
> the GTSE value received via CAS. So a guest can run with GTSE=0 and
> use the H_RPT_INVALIDATE hcall instead of tlbie.
> 
> In this scenario, having GTSE always set to 1 by QEMU leads to a crash
> when running nested KVM guests because KVM does not allow a nested
> hypervisor to set GTSE support for its nested guests. So a nested
> guest always uses the same value for LPCR_GTSE as its HV. Since the
> nested HV disabled GTSE, but the L2 QEMU always reports GTSE=1, we run
> into a crash when:
> 
> L1 LPCR_GTSE=0
> L2 LPCR_GTSE=0
> L2 CAS GTSE=1
> 
> The nested guest will run 'tlbie' and crash because the HW looks at
> LPCR_GTSE, which is clear.
> 
> Having GTSE disabled in the L1 and enabled in the L2 is not an option
> because the whole purpose of GTSE is to disallow access to tlbie and
> we cannot allow L1 to spawn L2s that can access features that L1
> itself cannot.
> 
> We also cannot have the guest check the LPCR bit, because LPCR is
> HV-privileged.
> 
> So this patch goes through the most intuitive route which is to have
> QEMU ask KVM about GTSE support and advertise the correct value to the
> guest. A new KVM_CAP_PPC_GTSE capability is being added.
> 
> TCG continues to always enable GTSE.
> 
> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
> ---


I'm not sure if I fully understood the situation, so let me recap. Once upon a time,
QEMU advertised GTSE=1 and the host would never advertise other value, and everyone
was happy.

The host started to support GTSE=0, but QEMU kept advertising GTSE=1 regardless, and no
KVM GTSE cap was added to reflect the host support. I'll then assume that:


- all guests would break if running in a GTSE=0 host prior to rpt_invalidate support (which
is necessary to allow the guest to run in GTSE=0)

- apparently no one ever tried to run a KVM guest in a GTSE=0 host, so no bugs were opened

After commit 82123b756a1a2f1 (target/ppc: Support for H_RPT_INVALIDATE hcall) we added
cap-rpt-invalidate. I didn't follow the discussions of this cap but it seems like, as with
almost every other cap we have, there would be a migration problem for a guest that was in
a rpt_invalidate aware host to migrate to another where this wouldn't be true, and the cap
solves that.

What I'm not following is why, even after having cap-rpt-invalidate, we are still "lying"
about the GTSE=1 regardless of what the host supports. We could've added the GTSE KVM cap
at the same time rpt_invalidate was introduced, and guests that want to ignore this setting
can use the cap to bypass it.


In the end this patch is a needed fix IMHO. My confusion is why we're doing this just now.


The patch itself LGTM.


Thanks,


Daniel


>   hw/ppc/spapr.c       | 38 +++++++++++++++++++++++++++++++-------
>   target/ppc/kvm.c     |  8 ++++++++
>   target/ppc/kvm_ppc.h |  6 ++++++
>   3 files changed, 45 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 4cc204f90d..3e95a1831f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -971,7 +971,7 @@ static void spapr_dt_ov5_platform_support(SpaprMachineState *spapr, void *fdt,
>           23, 0x00, /* XICS / XIVE mode */
>           24, 0x00, /* Hash/Radix, filled in below. */
>           25, 0x00, /* Hash options: Segment Tables == no, GTSE == no. */
> -        26, 0x40, /* Radix options: GTSE == yes. */
> +        26, 0x00, /* Radix options, filled in below. */
>       };
>   
>       if (spapr->irq->xics && spapr->irq->xive) {
> @@ -1000,10 +1000,16 @@ static void spapr_dt_ov5_platform_support(SpaprMachineState *spapr, void *fdt,
>           } else {
>               val[3] = 0x00; /* Hash */
>           }
> +
> +        if (kvmppc_has_cap_gtse()) {
> +            val[7] = 0x40 /* OV5_MMU_RADIX_GTSE */;
> +        }
>       } else {
>           /* V3 MMU supports both hash and radix in tcg (with dynamic switching) */
>           val[3] = 0xC0;
> +        val[7] = 0x40 /* OV5_MMU_RADIX_GTSE */;
>       }
> +
>       _FDT(fdt_setprop(fdt, chosen, "ibm,arch-vec-5-platform-support",
>                        val, sizeof(val)));
>   }
> @@ -2824,14 +2830,32 @@ static void spapr_machine_init(MachineState *machine)
>       /* Init numa_assoc_array */
>       spapr_numa_associativity_init(spapr, machine);
>   
> -    if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
> -        ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
> +    if (ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
>                                 spapr->max_compat_pvr)) {
> -        spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_300);
> -        /* KVM and TCG always allow GTSE with radix... */
> -        spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_GTSE);
> +
> +        /* TCG always supports Radix w/ GTSE */
> +        if (!kvm_enabled()) {
> +            spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_300);
> +            spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_GTSE);
> +        } else {
> +            if (kvmppc_has_cap_mmu_radix()) {
> +                spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_300);
> +            }
> +
> +            /*
> +             * Only disable Guest Translation Shootdown if KVM
> +             * supports the H_RPT_INVALIDATE hypercall, otherwise we'd
> +             * leave the guest with no way to make TLB invalidations.
> +             */
> +            if (kvmppc_has_cap_rpt_invalidate()) {
> +                if (kvmppc_has_cap_gtse()) {
> +                    spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_GTSE);
> +                }
> +            } else {
> +                spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_GTSE);
> +            }
> +        }
>       }
> -    /* ... but not with hash (currently). */
>   
>       if (kvm_enabled()) {
>           /* Enable H_LOGICAL_CI_* so SLOF can talk to in-kernel devices */
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index dc93b99189..91582c4b15 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -90,6 +90,7 @@ static int cap_ppc_nested_kvm_hv;
>   static int cap_large_decr;
>   static int cap_fwnmi;
>   static int cap_rpt_invalidate;
> +static int cap_gtse;
>   
>   static uint32_t debug_inst_opcode;
>   
> @@ -154,6 +155,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>       }
>   
>       cap_rpt_invalidate = kvm_vm_check_extension(s, KVM_CAP_PPC_RPT_INVALIDATE);
> +    cap_gtse = kvm_vm_check_extension(s, KVM_CAP_PPC_GTSE);
> +
>       kvm_ppc_register_host_cpu_type();
>   
>       return 0;
> @@ -2397,6 +2400,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>       return cap_mmu_hash_v3;
>   }
>   
> +bool kvmppc_has_cap_gtse(void)
> +{
> +    return cap_gtse;
> +}
> +
>   static bool kvmppc_power8_host(void)
>   {
>       bool ret = false;
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index ee9325bf9a..7d6980edb7 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -63,6 +63,7 @@ bool kvmppc_has_cap_fixup_hcalls(void);
>   bool kvmppc_has_cap_htm(void);
>   bool kvmppc_has_cap_mmu_radix(void);
>   bool kvmppc_has_cap_mmu_hash_v3(void);
> +bool kvmppc_has_cap_gtse(void);
>   bool kvmppc_has_cap_xive(void);
>   int kvmppc_get_cap_safe_cache(void);
>   int kvmppc_get_cap_safe_bounds_check(void);
> @@ -343,6 +344,11 @@ static inline bool kvmppc_has_cap_mmu_hash_v3(void)
>       return false;
>   }
>   
> +static inline bool kvmppc_has_cap_gtse(void)
> +{
> +    return false;
> +}
> +
>   static inline bool kvmppc_has_cap_xive(void)
>   {
>       return false;


  parent reply	other threads:[~2022-03-09 19:10 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-09  1:23 [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5 Fabiano Rosas
2022-03-09  1:24 ` [RFC PATCH 2/2] linux-headers: Add KVM_CAP_PPC_GTSE Fabiano Rosas
2022-03-09 19:09 ` Daniel Henrique Barboza [this message]
2022-03-11 14:45   ` [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5 Fabiano Rosas
2022-03-10 19:51 ` Fabiano Rosas
2022-03-11  0:30   ` Daniel Henrique Barboza
2022-03-12  9:17 ` David Gibson
2022-03-14 22:10   ` Fabiano Rosas
2022-03-31  4:29     ` David Gibson
2022-04-01  7:01       ` Aneesh Kumar K.V
2022-04-01 15:50         ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d9cb6b05-9c06-8081-ee7f-a44f3a85848b@gmail.com \
    --to=danielhb413@gmail.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=clg@kaod.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=farosas@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.