* [Qemu-devel] ppc: CPU reset must flush translation buffer
       [not found] <1337054780.6727.60.camel@pasglop>
@ 2012-05-21  2:01 ` Benjamin Herrenschmidt
  2012-05-21  6:16   ` Alexander Graf
  2012-05-21  7:39   ` [Qemu-devel] " Alexander Graf
  1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-21  2:01 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel

Without that, reset from SLOF crashes in full emulation.

Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 target-ppc/translate_init.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index ae03065..fbf7705 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -10285,6 +10285,7 @@ static void ppc_cpu_reset(CPUState *s)
     env->error_code = 0;
     /* Flush all TLBs */
     tlb_flush(env, 1);
+    tb_flush(env);
 }
 
 static void ppc_cpu_initfn(Object *obj)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] ppc: CPU reset must flush translation buffer
  2012-05-21  2:01 ` [Qemu-devel] ppc: CPU reset must flush translation buffer Benjamin Herrenschmidt
@ 2012-05-21  6:16   ` Alexander Graf
  2012-05-21  6:26     ` Benjamin Herrenschmidt
  2012-05-21  7:15     ` Peter Maydell
  0 siblings, 2 replies; 8+ messages in thread
From: Alexander Graf @ 2012-05-21  6:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Peter Maydell, qemu-devel, Andreas Färber



On 21.05.2012, at 04:01, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> Without that, reset from SLOF crashes in full emulation.
> 
> Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> target-ppc/translate_init.c |    1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index ae03065..fbf7705 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -10285,6 +10285,7 @@ static void ppc_cpu_reset(CPUState *s)
>     env->error_code = 0;
>     /* Flush all TLBs */
>     tlb_flush(env, 1);
> +    tb_flush(env);

Shouldn't this be true for all CPUs? I remember talking about reset with Peter a while ago... but don't remember the conclusions :)


Alex


* Re: [Qemu-devel] ppc: CPU reset must flush translation buffer
  2012-05-21  6:16   ` Alexander Graf
@ 2012-05-21  6:26     ` Benjamin Herrenschmidt
  2012-05-21  7:15     ` Peter Maydell
  1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-21  6:26 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Peter Maydell, qemu-devel, Andreas Färber

On Mon, 2012-05-21 at 08:16 +0200, Alexander Graf wrote:
> 
> On 21.05.2012, at 04:01, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> 
> > Without that, reset from SLOF crashes in full emulation.
> > 
> > Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com>
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > ---
> > target-ppc/translate_init.c |    1 +
> > 1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/target-ppc/translate_init.c
> b/target-ppc/translate_init.c
> > index ae03065..fbf7705 100644
> > --- a/target-ppc/translate_init.c
> > +++ b/target-ppc/translate_init.c
> > @@ -10285,6 +10285,7 @@ static void ppc_cpu_reset(CPUState *s)
> >     env->error_code = 0;
> >     /* Flush all TLBs */
> >     tlb_flush(env, 1);
> > +    tb_flush(env);
> 
> Shouldn't this be true for all CPUs? I remember talking about reset
> with Peter a while ago... but don't remember the conclusions :)

Possibly. I noticed other targets do that too (ARM, iirc). In this case
I think it's the ROM being reloaded that doesn't flush the cached
translations for the vectors (I -think-, that's from memory). But there
could be all sorts of other context changes, so it seems like the safest
thing to do.

Cheers,
Ben.


* Re: [Qemu-devel] ppc: CPU reset must flush translation buffer
  2012-05-21  6:16   ` Alexander Graf
  2012-05-21  6:26     ` Benjamin Herrenschmidt
@ 2012-05-21  7:15     ` Peter Maydell
  2012-05-21  7:39       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Maydell @ 2012-05-21  7:15 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel, Andreas Färber

On 21 May 2012 07:16, Alexander Graf <agraf@suse.de> wrote:
> Shouldn't this be true for all CPUs? I remember talking about reset
> with Peter a while ago... but don't remember the conclusions :)

The conclusion we came to is that you only need to tb_flush
in your CPU's reset function if you have some CPU state which
you handle by baking it into translated code and doing a tb_flush
when the state changes. This is relatively rare; most CPU
frontends only use the other options:
 (a) CPU state is constant for life of simulation
 (b) CPU state not baked into code
 (c) CPU state encoded in tb_flags.

In particular, target-ppc doesn't have any uses of tb_flush
at the moment, so either this fix is insufficient (and you need
to also use tb_flush at the point where the relevant state is
changed by whatever helper function) or it's the wrong fix.

If the issue is ROM reloading, then the loading code needs to
be fixed (compare the way that the memory region API correctly
handles bits of physical memory being mapped/unmapped/remapped
without the caller needing to do a tb_flush).

-- PMM


* Re: [PATCH v2 1/2] ppc64: Rudimentary Support for extra page sizes on server CPUs
       [not found] <1337054780.6727.60.camel@pasglop>
  2012-05-21  2:01 ` [Qemu-devel] ppc: CPU reset must flush translation buffer Benjamin Herrenschmidt
@ 2012-05-21  7:39   ` Alexander Graf
  1 sibling, 0 replies; 8+ messages in thread
From: Alexander Graf @ 2012-05-21  7:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm list, qemu-ppc, QEMU Developers


On 15.05.2012, at 06:06, Benjamin Herrenschmidt wrote:

> More recent Power server chips (i.e. based on the 64 bit hash MMU)
> support more than just the traditional 4k and 16M page sizes.  This
> can get quite complicated, because which page sizes are supported,
> which combinations are supported within an MMU segment and how these
> page sizes are encoded both in the SLB entry and the hash PTE can vary
> depending on the CPU model (they are not specified by the
> architecture).  In addition the firmware or hypervisor may not permit
> use of certain page sizes, for various reasons.  Whether various page
> sizes are supported on KVM, for example, depends on whether the PR or
> HV variant of KVM is in use, and on the page size of the memory
> backing the guest's RAM.
> 
> This patch adds information to the CPUState and cpu defs to describe
> the supported page sizes and encodings.  Since TCG does not yet
> support any extended page sizes, we just set this to NULL in the
> static CPU definitions, expanding this to the default 4k and 16M page
> sizes when we initialize the cpu state.  When using KVM, however, we
> instead determine available page sizes using the new
> KVM_PPC_GET_SMMU_INFO call.  For old kernels without that call, we use
> some defaults, with some guesswork which should do the right thing for
> existing HV and PR implementations.  The fallback might not be correct
> for future versions, but that's ok, because they'll have
> KVM_PPC_GET_SMMU_INFO.
> 
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> v2: - Passes checkpatch now (with the exception of the kernel header
>       bit which will eventually be replaced by the real thing when
>       it goes upstream).
>     - Moved back some fixes that were incorrectly located in the
>       second patch (such as setting slb_size)
>     - Fix some issues when using "PR" KVM without proper support
>       for 1T segments
> 
> Note: I kept the caching. It's not a "hot" path, but it's still a
>       lot of mess to do for every CPU (and we routinely have a bunch
>       on those power machines); in the end it adds up.
> 
>  linux-headers/linux/kvm.h   |   26 ++++++
>  target-ppc/cpu.h            |   31 +++++++
>  target-ppc/helper.c         |    7 ++
>  target-ppc/kvm.c            |  210 +++++++++++++++++++++++++++++++++++++++++++
>  target-ppc/kvm_ppc.h        |    5 +
>  target-ppc/translate_init.c |   21 +++++
>  6 files changed, 300 insertions(+), 0 deletions(-)
> 
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index ee7bd9c..015b6db 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> 

Please split Linux kernel header updates into their own patch, so I can easily redo them when necessary.

Also, you sent this mail to the wrong mailing lists - it's a QEMU patch :). CC'ing the correct ones.

> @@ -449,6 +449,30 @@ struct kvm_ppc_pvinfo {
>  	__u8  pad[108];
>  };
>  
> +/* for KVM_PPC_GET_SMMU_INFO */
> +#define KVM_PPC_PAGE_SIZES_MAX_SZ	8
> +
> +struct kvm_ppc_one_page_size {
> +	__u32 page_shift;	/* Page shift (or 0) */
> +	__u32 pte_enc;		/* Encoding in the HPTE (>>12) */
> +};
> +
> +struct kvm_ppc_one_seg_page_size {
> +	__u32 page_shift;	/* Base page shift of segment (or 0) */
> +	__u32 slb_enc;		/* SLB encoding for BookS */
> +	struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
> +#define KVM_PPC_PAGE_SIZES_REAL		0x00000001
> +#define KVM_PPC_1T_SEGMENTS		0x00000002
> +
> +struct kvm_ppc_smmu_info {
> +	__u64 flags;
> +	__u32 slb_size;
> +	__u32 pad;
> +	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
>  #define KVMIO 0xAE
>  
>  /* machine type bits, to be used as argument to KVM_CREATE_VM */
> @@ -789,6 +813,8 @@ struct kvm_s390_ucas_mapping {
>  /* Available with KVM_CAP_PCI_2_3 */
>  #define KVM_ASSIGN_SET_INTX_MASK  _IOW(KVMIO,  0xa4, \
>  				       struct kvm_assigned_pci_dev)
> +/* Available with KVM_CAP_PPC_GET_SMMU_INFO */
> +#define KVM_PPC_GET_SMMU_INFO     _IOR(KVMIO,  0xa5, struct kvm_ppc_smmu_info)
>  
>  /*
>   * ioctls for vcpu fds
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 84c9674..3ed75ac 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -119,6 +119,8 @@ enum powerpc_mmu_t {
>      POWERPC_MMU_620        = POWERPC_MMU_64 | 0x00000002,
>      /* Architecture 2.06 variant                               */
>      POWERPC_MMU_2_06       = POWERPC_MMU_64 | POWERPC_MMU_1TSEG | 0x00000003,
> +    /* Architecture 2.06 "degraded" (no 1T segments)           */
> +    POWERPC_MMU_2_06d      = POWERPC_MMU_64 | 0x00000003,
>  #endif /* defined(TARGET_PPC64) */
>  };
>  
> @@ -874,6 +876,29 @@ enum {
>  #define DBELL_PIRTAG_MASK              0x3fff
>  
>  /*****************************************************************************/
> +/* Segment page size information, used by recent hash MMUs
> + * The format of this structure mirrors kvm_ppc_smmu_info
> + */
> +
> +#define PPC_PAGE_SIZES_MAX_SZ   8
> +
> +struct ppc_one_page_size {
> +    uint32_t page_shift;  /* Page shift (or 0) */
> +    uint32_t pte_enc;     /* Encoding in the HPTE (>>12) */
> +};
> +
> +struct ppc_one_seg_page_size {
> +    uint32_t page_shift;  /* Base page shift of segment (or 0) */
> +    uint32_t slb_enc;     /* SLB encoding for BookS */
> +    struct ppc_one_page_size enc[PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
> +struct ppc_segment_page_sizes {
> +    struct ppc_one_seg_page_size sps[PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
> +
> +/*****************************************************************************/
>  /* The whole PowerPC CPU context */
>  #define NB_MMU_MODES 3
>  
> @@ -889,6 +914,9 @@ struct ppc_def_t {
>      powerpc_input_t bus_model;
>      uint32_t flags;
>      int bfd_mach;
> +#if defined(TARGET_PPC64)
> +    const struct ppc_segment_page_sizes *sps;
> +#endif
>      void (*init_proc)(CPUPPCState *env);
>      int  (*check_pow)(CPUPPCState *env);
>  };
> @@ -1012,6 +1040,9 @@ struct CPUPPCState {
>      uint32_t flags;
>      uint64_t insns_flags;
>      uint64_t insns_flags2;
> +#if defined(TARGET_PPC64)
> +    struct ppc_segment_page_sizes sps;
> +#endif
>  
>  #if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
>      target_phys_addr_t vpa;
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index e97e496..833d948 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -1617,6 +1617,7 @@ void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env)
>  #if defined(TARGET_PPC64)
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>          mmubooks_dump_mmu(f, cpu_fprintf, env);
>          break;
>  #endif
> @@ -1647,6 +1648,7 @@ static inline int check_physical(CPUPPCState *env, mmu_ctx_t *ctx,
>      case POWERPC_MMU_620:
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>          /* Real address are 60 bits long */
>          ctx->raddr &= 0x0FFFFFFFFFFFFFFFULL;
>          ctx->prot |= PAGE_WRITE;
> @@ -1727,6 +1729,7 @@ int get_physical_address (CPUPPCState *env, mmu_ctx_t *ctx, target_ulong eaddr,
>          case POWERPC_MMU_620:
>          case POWERPC_MMU_64B:
>          case POWERPC_MMU_2_06:
> +        case POWERPC_MMU_2_06d:
>  #endif
>              if (ret < 0) {
>                  /* We didn't match any BAT entry or don't have BATs */
> @@ -1867,6 +1870,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw,
>                  case POWERPC_MMU_620:
>                  case POWERPC_MMU_64B:
>                  case POWERPC_MMU_2_06:
> +                case POWERPC_MMU_2_06d:
>  #endif
>                      env->exception_index = POWERPC_EXCP_ISI;
>                      env->error_code = 0x40000000;
> @@ -1977,6 +1981,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw,
>                  case POWERPC_MMU_620:
>                  case POWERPC_MMU_64B:
>                  case POWERPC_MMU_2_06:
> +                case POWERPC_MMU_2_06d:
>  #endif
>                      env->exception_index = POWERPC_EXCP_DSI;
>                      env->error_code = 0;
> @@ -2299,6 +2304,7 @@ void ppc_tlb_invalidate_all (CPUPPCState *env)
>      case POWERPC_MMU_620:
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>  #endif /* defined(TARGET_PPC64) */
>          tlb_flush(env, 1);
>          break;
> @@ -2367,6 +2373,7 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr)
>      case POWERPC_MMU_620:
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>          /* tlbie invalidate TLBs for all segments */
>          /* XXX: given the fact that there are too many segments to invalidate,
>           *      and we still don't have a tlb_flush_mask(env, n, mask) in QEMU,
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 0ab7630..2100cb9 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -18,6 +18,7 @@
>  #include <sys/types.h>
>  #include <sys/ioctl.h>
>  #include <sys/mman.h>
> +#include <sys/vfs.h>
>  
>  #include <linux/kvm.h>
>  
> @@ -167,10 +168,219 @@ static int kvm_booke206_tlb_init(CPUPPCState *env)
>      return 0;
>  }
>  
> +
> +#if defined(TARGET_PPC64)
> +static void kvm_get_fallback_smmu_info(CPUPPCState *env,
> +                                       struct kvm_ppc_smmu_info *info)
> +{
> +    memset(info, 0, sizeof(*info));
> +
> +    /* We don't have the new KVM_PPC_GET_SMMU_INFO ioctl, so
> +     * need to "guess" what the supported page sizes are.
> +     *
> +     * For that to work we make a few assumptions:
> +     *
> +     * - If KVM_CAP_PPC_GET_PVINFO is supported we are running "PR"
> +     *   KVM which only supports 4K and 16M pages, but supports them
> +     *   regardless of the backing store characteristics. We also don't
> +     *   support 1T segments.
> +     *
> +     *   This is safe because, if HV KVM ever supports that capability
> +     *   or PR KVM grows support for more page/segment sizes, those
> +     *   versions will have implemented KVM_CAP_PPC_GET_SMMU_INFO and
> +     *   thus we will not hit this fallback.
> +     * - Else we are running HV KVM. This means we only support page
> +     *   sizes that fit in the backing store. Additionally we only
> +     *   advertise 64K pages if the processor is ARCH 2.06 and we assume
> +     *   P7 encodings for the SLB and hash table. Here too, we assume
> +     *   support for any newer processor will mean a kernel that
> +     *   implements KVM_CAP_PPC_GET_SMMU_INFO and thus doesn't hit
> +     *   this fallback.
> +     */
> +    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO)) {
> +        /* No flags */
> +        info->flags = 0;
> +        info->slb_size = 64;
> +
> +        /* Standard 4k base page size segment */
> +        info->sps[0].page_shift = 12;
> +        info->sps[0].slb_enc = 0;
> +        info->sps[0].enc[0].page_shift = 12;
> +        info->sps[0].enc[0].pte_enc = 0;
> +
> +        /* Standard 16M large page size segment */
> +        info->sps[1].page_shift = 24;
> +        info->sps[1].slb_enc = SLB_VSID_L;
> +        info->sps[1].enc[0].page_shift = 24;
> +        info->sps[1].enc[0].pte_enc = 0;
> +    } else {
> +        int i = 0;
> +
> +        /* HV KVM has backing store size restrictions */
> +        info->flags = KVM_PPC_PAGE_SIZES_REAL;
> +
> +        if (env->mmu_model & POWERPC_MMU_1TSEG) {
> +            info->flags |= KVM_PPC_1T_SEGMENTS;
> +        }
> +
> +        if (env->mmu_model == POWERPC_MMU_2_06) {
> +            info->slb_size = 32;
> +        } else {
> +            info->slb_size = 64;
> +        }
> +
> +        /* Standard 4k base page size segment */
> +        info->sps[i].page_shift = 12;
> +        info->sps[i].slb_enc = 0;
> +        info->sps[i].enc[0].page_shift = 12;
> +        info->sps[i].enc[0].pte_enc = 0;
> +        i++;
> +
> +        /* 64K on MMU 2.06 */
> +        if (env->mmu_model == POWERPC_MMU_2_06) {
> +            info->sps[i].page_shift = 16;
> +            info->sps[i].slb_enc = 0x110;
> +            info->sps[i].enc[0].page_shift = 16;
> +            info->sps[i].enc[0].pte_enc = 1;
> +            i++;
> +        }
> +
> +        /* Standard 16M large page size segment */
> +        info->sps[i].page_shift = 24;
> +        info->sps[i].slb_enc = SLB_VSID_L;
> +        info->sps[i].enc[0].page_shift = 24;
> +        info->sps[i].enc[0].pte_enc = 0;
> +    }
> +}
> +
> +static void kvm_get_smmu_info(CPUPPCState *env, struct kvm_ppc_smmu_info *info)
> +{
> +#ifdef KVM_CAP_PPC_GET_SMMU_INFO
> 

No need for the #ifdef anymore, because we're syncing the headers now.

> +    int ret;
> +
> +    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_SMMU_INFO)) {
> +        ret = kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_SMMU_INFO, info);
> +        if (ret == 0) {
> +            return;
> +        }
> +    }
> +#endif /* KVM_CAP_PPC_GET_SMMU_INFO */
> +
> +    kvm_get_fallback_smmu_info(env, info);
> +}
> +
> +static long getrampagesize(void)
> +{
> +    struct statfs fs;
> +    int ret;
> +
> +    if (!mem_path) {
> +        /* guest RAM is backed by normal anonymous pages */
> +        return getpagesize();
> +    }
> +
> +    do {
> +        ret = statfs(mem_path, &fs);
> +    } while (ret != 0 && errno == EINTR);
> +
> +    if (ret != 0) {
> +        fprintf(stderr, "Couldn't statfs() memory path: %s\n",
> +                strerror(errno));
> +        exit(1);
> +    }
> +
> +#define HUGETLBFS_MAGIC       0x958458f6
> +
> +    if (fs.f_type != HUGETLBFS_MAGIC) {
> +        /* Explicit mempath, but it's ordinary pages */
> +        return getpagesize();
> +    }
> +
> +    /* It's hugepage, return the huge page size */
> +    return fs.f_bsize;
> +}
> +
> +static bool kvm_valid_page_size(uint32_t flags, long rampgsize, uint32_t shift)
> +{
> +    if (!(flags & KVM_PPC_PAGE_SIZES_REAL)) {
> +        return true;
> +    }
> +
> +    return (1ul << shift) <= rampgsize;
> +}
> +
> +static void kvm_fixup_page_sizes(CPUPPCState *env)
> +{
> +    static struct kvm_ppc_smmu_info smmu_info;
> +    static bool has_smmu_info;
> +    long rampagesize;
> +    int iq, ik, jq, jk;
> +
> +    /* We only handle page sizes for 64-bit server guests for now */
> +    if (!(env->mmu_model & POWERPC_MMU_64)) {
> +        return;
> +    }
> +
> +    /* Collect MMU info from kernel if not already */
> +    if (!has_smmu_info) {
> +        kvm_get_smmu_info(env, &smmu_info);
> +        has_smmu_info = true;
> +    }
> +
> +    rampagesize = getrampagesize();
> +
> +    /* Convert to QEMU form */
> +    memset(&env->sps, 0, sizeof(env->sps));
> +
> +    for (ik = iq = 0; ik < KVM_PPC_PAGE_SIZES_MAX_SZ; ik++) {
> +        struct ppc_one_seg_page_size *qsps = &env->sps.sps[iq];
> +        struct kvm_ppc_one_seg_page_size *ksps = &smmu_info.sps[ik];
> +
> +        if (!kvm_valid_page_size(smmu_info.flags, rampagesize,
> +                                 ksps->page_shift)) {
> +            continue;
> +        }
> +        qsps->page_shift = ksps->page_shift;
> +        qsps->slb_enc = ksps->slb_enc;
> +        for (jk = jq = 0; jk < KVM_PPC_PAGE_SIZES_MAX_SZ; jk++) {
> +            if (!kvm_valid_page_size(smmu_info.flags, rampagesize,
> +                                     ksps->enc[jk].page_shift)) {
> +                continue;
> +            }
> +            qsps->enc[jq].page_shift = ksps->enc[jk].page_shift;
> +            qsps->enc[jq].pte_enc = ksps->enc[jk].pte_enc;
> +            if (++jq >= PPC_PAGE_SIZES_MAX_SZ) {
> +                break;
> +            }
> +        }
> +        if (++iq >= PPC_PAGE_SIZES_MAX_SZ) {
> +            break;
> +        }
> +    }
> +    env->slb_nr = smmu_info.slb_size;
> +    if (smmu_info.flags & KVM_PPC_1T_SEGMENTS) {
> +        env->mmu_model |= POWERPC_MMU_1TSEG;
> +    } else {
> +        env->mmu_model &= ~POWERPC_MMU_1TSEG;
> +    }
> +}
> +#else /* defined (TARGET_PPC64) */
> +
> +static inline void kvm_fixup_page_sizes(CPUPPCState *env)
> +{
> +}
> +
> +#endif /* !defined (TARGET_PPC64) */
> +
>  int kvm_arch_init_vcpu(CPUPPCState *cenv)
>  {
>      int ret;
>  
> +    /* Gather server mmu info from KVM and update the CPU state*/
> 

Missing space? :)


Alex

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/2] ppc64: Rudimentary Support for extra page sizes on server CPUs
@ 2012-05-21  7:39   ` Alexander Graf
  0 siblings, 0 replies; 8+ messages in thread
From: Alexander Graf @ 2012-05-21  7:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: qemu-ppc, kvm list, kvm-ppc, QEMU Developers


On 15.05.2012, at 06:06, Benjamin Herrenschmidt wrote:

> More recent Power server chips (i.e. based on the 64 bit hash MMU)
> support more than just the traditional 4k and 16M page sizes.  This
> can get quite complicated, because which page sizes are supported,
> which combinations are supported within an MMU segment and how these
> page sizes are encoded both in the SLB entry and the hash PTE can vary
> depending on the CPU model (they are not specified by the
> architecture).  In addition the firmware or hypervisor may not permit
> use of certain page sizes, for various reasons.  Whether various page
> sizes are supported on KVM, for example, depends on whether the PR or
> HV variant of KVM is in use, and on the page size of the memory
> backing the guest's RAM.
> 
> This patch adds information to the CPUState and cpu defs to describe
> the supported page sizes and encodings.  Since TCG does not yet
> support any extended page sizes, we just set this to NULL in the
> static CPU definitions, expanding this to the default 4k and 16M page
> sizes when we initialize the cpu state.  When using KVM, however, we
> instead determine available page sizes using the new
> KVM_PPC_GET_SMMU_INFO call.  For old kernels without that call, we use
> some defaults, with some guesswork which should do the right thing for
> existing HV and PR implementations.  The fallback might not be correct
> for future versions, but that's ok, because they'll have
> KVM_PPC_GET_SMMU_INFO.
> 
> 
> Signed-off-by: Benjamin Herrenschmidt <
> benh@kernel.crashing.org
> >
> Signed-off-by: David Gibson <
> david@gibson.dropbear.id.au
> >
> ---
> v2: - Passes checkpatch now (with the exception of the kernel header
>       bit which will eventually be replaced by the real thing when
>       it goes upstream).
>     - Moved back some fixes that were incorrectly located in the
>       second patch (such as setting slb_size)
>     - Fix some issues when using "PR" KVM without proper support
>       for 1T segments
> 
> Note: I kept the caching. It's not a "hot" path but still, it's a
>       lot of mess to do for every CPU (and we routinely have a bunch
>       on those power machines), in the end it adds up ...
> 
>  linux-headers/linux/kvm.h   |   26 ++++++
>  target-ppc/cpu.h            |   31 +++++++
>  target-ppc/helper.c         |    7 ++
>  target-ppc/kvm.c            |  210 +++++++++++++++++++++++++++++++++++++++++++
>  target-ppc/kvm_ppc.h        |    5 +
>  target-ppc/translate_init.c |   21 +++++
>  6 files changed, 300 insertions(+), 0 deletions(-)
> 
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index ee7bd9c..015b6db 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> 

Please split Linux kernel header updates into their own patch, so I can easily redo them when necessary.

Also, you sent this mail to the wrong mailing lists - it's a QEMU patch :). CC'ing the correct ones.

> @@ -449,6 +449,30 @@ struct kvm_ppc_pvinfo {
>  	__u8  pad[108];
>  };
>  
> +/* for KVM_PPC_GET_SMMU_INFO */
> +#define KVM_PPC_PAGE_SIZES_MAX_SZ	8
> +
> +struct kvm_ppc_one_page_size {
> +	__u32 page_shift;	/* Page shift (or 0) */
> +	__u32 pte_enc;		/* Encoding in the HPTE (>>12) */
> +};
> +
> +struct kvm_ppc_one_seg_page_size {
> +	__u32 page_shift;	/* Base page shift of segment (or 0) */
> +	__u32 slb_enc;		/* SLB encoding for BookS */
> +	struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
> +#define KVM_PPC_PAGE_SIZES_REAL		0x00000001
> +#define KVM_PPC_1T_SEGMENTS		0x00000002
> +
> +struct kvm_ppc_smmu_info {
> +	__u64 flags;
> +	__u32 slb_size;
> +	__u32 pad;
> +	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
>  #define KVMIO 0xAE
>  
>  /* machine type bits, to be used as argument to KVM_CREATE_VM */
> @@ -789,6 +813,8 @@ struct kvm_s390_ucas_mapping {
>  /* Available with KVM_CAP_PCI_2_3 */
>  #define KVM_ASSIGN_SET_INTX_MASK  _IOW(KVMIO,  0xa4, \
>  				       struct kvm_assigned_pci_dev)
> +/* Available with KVM_CAP_PPC_GET_SMMU_INFO */
> +#define KVM_PPC_GET_SMMU_INFO     _IOR(KVMIO,  0xa5, struct kvm_ppc_smmu_info)
>  
>  /*
>   * ioctls for vcpu fds
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 84c9674..3ed75ac 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -119,6 +119,8 @@ enum powerpc_mmu_t {
>      POWERPC_MMU_620        = POWERPC_MMU_64 | 0x00000002,
>      /* Architecture 2.06 variant                               */
>      POWERPC_MMU_2_06       = POWERPC_MMU_64 | POWERPC_MMU_1TSEG | 0x00000003,
> +    /* Architecture 2.06 "degraded" (no 1T segments)           */
> +    POWERPC_MMU_2_06d      = POWERPC_MMU_64 | 0x00000003,
>  #endif /* defined(TARGET_PPC64) */
>  };
>  
> @@ -874,6 +876,29 @@ enum {
>  #define DBELL_PIRTAG_MASK              0x3fff
>  
>  /*****************************************************************************/
> +/* Segment page size information, used by recent hash MMUs
> + * The format of this structure mirrors kvm_ppc_smmu_info
> + */
> +
> +#define PPC_PAGE_SIZES_MAX_SZ   8
> +
> +struct ppc_one_page_size {
> +    uint32_t page_shift;  /* Page shift (or 0) */
> +    uint32_t pte_enc;     /* Encoding in the HPTE (>>12) */
> +};
> +
> +struct ppc_one_seg_page_size {
> +    uint32_t page_shift;  /* Base page shift of segment (or 0) */
> +    uint32_t slb_enc;     /* SLB encoding for BookS */
> +    struct ppc_one_page_size enc[PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
> +struct ppc_segment_page_sizes {
> +    struct ppc_one_seg_page_size sps[PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
> +
> +/*****************************************************************************/
>  /* The whole PowerPC CPU context */
>  #define NB_MMU_MODES 3
>  
> @@ -889,6 +914,9 @@ struct ppc_def_t {
>      powerpc_input_t bus_model;
>      uint32_t flags;
>      int bfd_mach;
> +#if defined(TARGET_PPC64)
> +    const struct ppc_segment_page_sizes *sps;
> +#endif
>      void (*init_proc)(CPUPPCState *env);
>      int  (*check_pow)(CPUPPCState *env);
>  };
> @@ -1012,6 +1040,9 @@ struct CPUPPCState {
>      uint32_t flags;
>      uint64_t insns_flags;
>      uint64_t insns_flags2;
> +#if defined(TARGET_PPC64)
> +    struct ppc_segment_page_sizes sps;
> +#endif
>  
>  #if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
>      target_phys_addr_t vpa;
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index e97e496..833d948 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -1617,6 +1617,7 @@ void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env)
>  #if defined(TARGET_PPC64)
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>          mmubooks_dump_mmu(f, cpu_fprintf, env);
>          break;
>  #endif
> @@ -1647,6 +1648,7 @@ static inline int check_physical(CPUPPCState *env, mmu_ctx_t *ctx,
>      case POWERPC_MMU_620:
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>          /* Real address are 60 bits long */
>          ctx->raddr &= 0x0FFFFFFFFFFFFFFFULL;
>          ctx->prot |= PAGE_WRITE;
> @@ -1727,6 +1729,7 @@ int get_physical_address (CPUPPCState *env, mmu_ctx_t *ctx, target_ulong eaddr,
>          case POWERPC_MMU_620:
>          case POWERPC_MMU_64B:
>          case POWERPC_MMU_2_06:
> +        case POWERPC_MMU_2_06d:
>  #endif
>              if (ret < 0) {
>                  /* We didn't match any BAT entry or don't have BATs */
> @@ -1867,6 +1870,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw,
>                  case POWERPC_MMU_620:
>                  case POWERPC_MMU_64B:
>                  case POWERPC_MMU_2_06:
> +                case POWERPC_MMU_2_06d:
>  #endif
>                      env->exception_index = POWERPC_EXCP_ISI;
>                      env->error_code = 0x40000000;
> @@ -1977,6 +1981,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw,
>                  case POWERPC_MMU_620:
>                  case POWERPC_MMU_64B:
>                  case POWERPC_MMU_2_06:
> +                case POWERPC_MMU_2_06d:
>  #endif
>                      env->exception_index = POWERPC_EXCP_DSI;
>                      env->error_code = 0;
> @@ -2299,6 +2304,7 @@ void ppc_tlb_invalidate_all (CPUPPCState *env)
>      case POWERPC_MMU_620:
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>  #endif /* defined(TARGET_PPC64) */
>          tlb_flush(env, 1);
>          break;
> @@ -2367,6 +2373,7 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr)
>      case POWERPC_MMU_620:
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>          /* tlbie invalidate TLBs for all segments */
>          /* XXX: given the fact that there are too many segments to invalidate,
>           *      and we still don't have a tlb_flush_mask(env, n, mask) in QEMU,
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 0ab7630..2100cb9 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -18,6 +18,7 @@
>  #include <sys/types.h>
>  #include <sys/ioctl.h>
>  #include <sys/mman.h>
> +#include <sys/vfs.h>
>  
>  #include <linux/kvm.h>
>  
> @@ -167,10 +168,219 @@ static int kvm_booke206_tlb_init(CPUPPCState *env)
>      return 0;
>  }
>  
> +
> +#if defined(TARGET_PPC64)
> +static void kvm_get_fallback_smmu_info(CPUPPCState *env,
> +                                       struct kvm_ppc_smmu_info *info)
> +{
> +    memset(info, 0, sizeof(*info));
> +
> +    /* We don't have the new KVM_PPC_GET_SMMU_INFO ioctl, so
> +     * need to "guess" what the supported page sizes are.
> +     *
> +     * For that to work we make a few assumptions:
> +     *
> +     * - If KVM_CAP_PPC_GET_PVINFO is supported we are running "PR"
> +     *   KVM which only supports 4K and 16M pages, but supports them
> +     *   regardless of the backing store characteristics. We also don't
> +     *   support 1T segments.
> +     *
> +     *   This is safe as if HV KVM ever supports that capability or PR
> +     *   KVM grows support for more page/segment sizes, those versions
> +     *   will have implemented KVM_CAP_PPC_GET_SMMU_INFO and thus we
> +     *   will not hit this fallback
> +     *
> +     * - Else we are running HV KVM. This means we only support page
> +     *   sizes that fit in the backing store. Additionally we only
> +     *   advertize 64K pages if the processor is ARCH 2.06 and we assume
> +     *   P7 encodings for the SLB and hash table. Here too, we assume
> +     *   support for any newer processor will mean a kernel that
> +     *   implements KVM_CAP_PPC_GET_SMMU_INFO and thus doesn't hit
> +     *   this fallback.
> +     */
> +    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO)) {
> +        /* No flags */
> +        info->flags = 0;
> +        info->slb_size = 64;
> +
> +        /* Standard 4k base page size segment */
> +        info->sps[0].page_shift = 12;
> +        info->sps[0].slb_enc = 0;
> +        info->sps[0].enc[0].page_shift = 12;
> +        info->sps[0].enc[0].pte_enc = 0;
> +
> +        /* Standard 16M large page size segment */
> +        info->sps[1].page_shift = 24;
> +        info->sps[1].slb_enc = SLB_VSID_L;
> +        info->sps[1].enc[0].page_shift = 24;
> +        info->sps[1].enc[0].pte_enc = 0;
> +    } else {
> +        int i = 0;
> +
> +        /* HV KVM has backing store size restrictions */
> +        info->flags = KVM_PPC_PAGE_SIZES_REAL;
> +
> +        if (env->mmu_model & POWERPC_MMU_1TSEG) {
> +            info->flags |= KVM_PPC_1T_SEGMENTS;
> +        }
> +
> +        if (env->mmu_model == POWERPC_MMU_2_06) {
> +            info->slb_size = 32;
> +        } else {
> +            info->slb_size = 64;
> +        }
> +
> +        /* Standard 4k base page size segment */
> +        info->sps[i].page_shift = 12;
> +        info->sps[i].slb_enc = 0;
> +        info->sps[i].enc[0].page_shift = 12;
> +        info->sps[i].enc[0].pte_enc = 0;
> +        i++;
> +
> +        /* 64K on MMU 2.06 */
> +        if (env->mmu_model == POWERPC_MMU_2_06) {
> +            info->sps[i].page_shift = 16;
> +            info->sps[i].slb_enc = 0x110;
> +            info->sps[i].enc[0].page_shift = 16;
> +            info->sps[i].enc[0].pte_enc = 1;
> +            i++;
> +        }
> +
> +        /* Standard 16M large page size segment */
> +        info->sps[i].page_shift = 24;
> +        info->sps[i].slb_enc = SLB_VSID_L;
> +        info->sps[i].enc[0].page_shift = 24;
> +        info->sps[i].enc[0].pte_enc = 0;
> +    }
> +}
> +
> +static void kvm_get_smmu_info(CPUPPCState *env, struct kvm_ppc_smmu_info *info)
> +{
> +#ifdef KVM_CAP_PPC_GET_SMMU_INFO
> 

No need for the #ifdef anymore, because we're syncing the headers now.

> +    int ret;
> +
> +    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_SMMU_INFO)) {
> +        ret = kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_SMMU_INFO, info);
> +        if (ret == 0) {
> +            return;
> +        }
> +    }
> +#endif /* KVM_CAP_PPC_GET_SMMU_INFO */
> +
> +    kvm_get_fallback_smmu_info(env, info);
> +}
> +
> +static long getrampagesize(void)
> +{
> +    struct statfs fs;
> +    int ret;
> +
> +    if (!mem_path) {
> +        /* guest RAM is backed by normal anonymous pages */
> +        return getpagesize();
> +    }
> +
> +    do {
> +        ret = statfs(mem_path, &fs);
> +    } while (ret != 0 && errno == EINTR);
> +
> +    if (ret != 0) {
> +        fprintf(stderr, "Couldn't statfs() memory path: %s\n",
> +                strerror(errno));
> +        exit(1);
> +    }
> +
> +#define HUGETLBFS_MAGIC       0x958458f6
> +
> +    if (fs.f_type != HUGETLBFS_MAGIC) {
> +        /* Explicit mempath, but it's ordinary pages */
> +        return getpagesize();
> +    }
> +
> +    /* It's hugepage, return the huge page size */
> +    return fs.f_bsize;
> +}
> +
> +static bool kvm_valid_page_size(uint32_t flags, long rampgsize, uint32_t shift)
> +{
> +    if (!(flags & KVM_PPC_PAGE_SIZES_REAL)) {
> +        return true;
> +    }
> +
> +    return (1ul << shift) <= rampgsize;
> +}
> +
> +static void kvm_fixup_page_sizes(CPUPPCState *env)
> +{
> +    static struct kvm_ppc_smmu_info smmu_info;
> +    static bool has_smmu_info;
> +    long rampagesize;
> +    int iq, ik, jq, jk;
> +
> +    /* We only handle page sizes for 64-bit server guests for now */
> +    if (!(env->mmu_model & POWERPC_MMU_64)) {
> +        return;
> +    }
> +
> +    /* Collect MMU info from kernel if not already */
> +    if (!has_smmu_info) {
> +        kvm_get_smmu_info(env, &smmu_info);
> +        has_smmu_info = true;
> +    }
> +
> +    rampagesize = getrampagesize();
> +
> +    /* Convert to QEMU form */
> +    memset(&env->sps, 0, sizeof(env->sps));
> +
> +    for (ik = iq = 0; ik < KVM_PPC_PAGE_SIZES_MAX_SZ; ik++) {
> +        struct ppc_one_seg_page_size *qsps = &env->sps.sps[iq];
> +        struct kvm_ppc_one_seg_page_size *ksps = &smmu_info.sps[ik];
> +
> +        if (!kvm_valid_page_size(smmu_info.flags, rampagesize,
> +                                 ksps->page_shift)) {
> +            continue;
> +        }
> +        qsps->page_shift = ksps->page_shift;
> +        qsps->slb_enc = ksps->slb_enc;
> +        for (jk = jq = 0; jk < KVM_PPC_PAGE_SIZES_MAX_SZ; jk++) {
> +            if (!kvm_valid_page_size(smmu_info.flags, rampagesize,
> +                                     ksps->enc[jk].page_shift)) {
> +                continue;
> +            }
> +            qsps->enc[jq].page_shift = ksps->enc[jk].page_shift;
> +            qsps->enc[jq].pte_enc = ksps->enc[jk].pte_enc;
> +            if (++jq >= PPC_PAGE_SIZES_MAX_SZ) {
> +                break;
> +            }
> +        }
> +        if (++iq >= PPC_PAGE_SIZES_MAX_SZ) {
> +            break;
> +        }
> +    }
> +    env->slb_nr = smmu_info.slb_size;
> +    if (smmu_info.flags & KVM_PPC_1T_SEGMENTS) {
> +        env->mmu_model |= POWERPC_MMU_1TSEG;
> +    } else {
> +        env->mmu_model &= ~POWERPC_MMU_1TSEG;
> +    }
> +}
> +#else /* defined (TARGET_PPC64) */
> +
> +static inline void kvm_fixup_page_sizes(CPUPPCState *env)
> +{
> +}
> +
> +#endif /* !defined (TARGET_PPC64) */
> +
>  int kvm_arch_init_vcpu(CPUPPCState *cenv)
>  {
>      int ret;
>  
> +    /* Gather server mmu info from KVM and update the CPU state*/
> 

Missing space? :)


Alex

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] ppc64: Rudimentary Support for extra page sizes on server CPUs
@ 2012-05-21  7:39   ` Alexander Graf
  0 siblings, 0 replies; 8+ messages in thread
From: Alexander Graf @ 2012-05-21  7:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: kvm-ppc, kvm list, qemu-ppc, QEMU Developers


On 15.05.2012, at 06:06, Benjamin Herrenschmidt wrote:

> More recent Power server chips (i.e. based on the 64 bit hash MMU)
> support more than just the traditional 4k and 16M page sizes.  This
> can get quite complicated, because which page sizes are supported,
> which combinations are supported within an MMU segment and how these
> page sizes are encoded both in the SLB entry and the hash PTE can vary
> depending on the CPU model (they are not specified by the
> architecture).  In addition the firmware or hypervisor may not permit
> use of certain page sizes, for various reasons.  Whether various page
> sizes are supported on KVM, for example, depends on whether the PR or
> HV variant of KVM is in use, and on the page size of the memory
> backing the guest's RAM.
> 
> This patch adds information to the CPUState and cpu defs to describe
> the supported page sizes and encodings.  Since TCG does not yet
> support any extended page sizes, we just set this to NULL in the
> static CPU definitions, expanding this to the default 4k and 16M page
> sizes when we initialize the cpu state.  When using KVM, however, we
> instead determine available page sizes using the new
> KVM_PPC_GET_SMMU_INFO call.  For old kernels without that call, we use
> some defaults, with some guesswork which should do the right thing for
> existing HV and PR implementations.  The fallback might not be correct
> for future versions, but that's ok, because they'll have
> KVM_PPC_GET_SMMU_INFO.
> 
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> v2: - Passes checkpatch now (with the exception of the kernel header
>       bit which will eventually be replaced by the real thing when
>       it goes upstream).
>     - Moved back some fixes that were incorrectly located in the
>       second patch (such as setting slb_size)
>     - Fix some issues when using "PR" KVM without proper support
>       for 1T segments
> 
> Note: I kept the caching. It's not a "hot" path but still, it's a
>       lot of mess to do for every CPU (and we routinely have a bunch
>       on those power machines), in the end it adds up ...
> 
>  linux-headers/linux/kvm.h   |   26 ++++++
>  target-ppc/cpu.h            |   31 +++++++
>  target-ppc/helper.c         |    7 ++
>  target-ppc/kvm.c            |  210 +++++++++++++++++++++++++++++++++++++++++++
>  target-ppc/kvm_ppc.h        |    5 +
>  target-ppc/translate_init.c |   21 +++++
>  6 files changed, 300 insertions(+), 0 deletions(-)
> 
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index ee7bd9c..015b6db 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> 

Please split Linux kernel header updates into their own patch, so I can easily redo them when necessary.

Also, you sent this mail to the wrong mailing lists - it's a QEMU patch :). CC'ing the correct ones.

> @@ -449,6 +449,30 @@ struct kvm_ppc_pvinfo {
>  	__u8  pad[108];
>  };
>  
> +/* for KVM_PPC_GET_SMMU_INFO */
> +#define KVM_PPC_PAGE_SIZES_MAX_SZ	8
> +
> +struct kvm_ppc_one_page_size {
> +	__u32 page_shift;	/* Page shift (or 0) */
> +	__u32 pte_enc;		/* Encoding in the HPTE (>>12) */
> +};
> +
> +struct kvm_ppc_one_seg_page_size {
> +	__u32 page_shift;	/* Base page shift of segment (or 0) */
> +	__u32 slb_enc;		/* SLB encoding for BookS */
> +	struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
> +#define KVM_PPC_PAGE_SIZES_REAL		0x00000001
> +#define KVM_PPC_1T_SEGMENTS		0x00000002
> +
> +struct kvm_ppc_smmu_info {
> +	__u64 flags;
> +	__u32 slb_size;
> +	__u32 pad;
> +	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
>  #define KVMIO 0xAE
>  
>  /* machine type bits, to be used as argument to KVM_CREATE_VM */
> @@ -789,6 +813,8 @@ struct kvm_s390_ucas_mapping {
>  /* Available with KVM_CAP_PCI_2_3 */
>  #define KVM_ASSIGN_SET_INTX_MASK  _IOW(KVMIO,  0xa4, \
>  				       struct kvm_assigned_pci_dev)
> +/* Available with KVM_CAP_PPC_GET_SMMU_INFO */
> +#define KVM_PPC_GET_SMMU_INFO     _IOR(KVMIO,  0xa5, struct kvm_ppc_smmu_info)
>  
>  /*
>   * ioctls for vcpu fds
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 84c9674..3ed75ac 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -119,6 +119,8 @@ enum powerpc_mmu_t {
>      POWERPC_MMU_620        = POWERPC_MMU_64 | 0x00000002,
>      /* Architecture 2.06 variant                               */
>      POWERPC_MMU_2_06       = POWERPC_MMU_64 | POWERPC_MMU_1TSEG | 0x00000003,
> +    /* Architecture 2.06 "degraded" (no 1T segments)           */
> +    POWERPC_MMU_2_06d      = POWERPC_MMU_64 | 0x00000003,
>  #endif /* defined(TARGET_PPC64) */
>  };
>  
> @@ -874,6 +876,29 @@ enum {
>  #define DBELL_PIRTAG_MASK              0x3fff
>  
>  /*****************************************************************************/
> +/* Segment page size information, used by recent hash MMUs
> + * The format of this structure mirrors kvm_ppc_smmu_info
> + */
> +
> +#define PPC_PAGE_SIZES_MAX_SZ   8
> +
> +struct ppc_one_page_size {
> +    uint32_t page_shift;  /* Page shift (or 0) */
> +    uint32_t pte_enc;     /* Encoding in the HPTE (>>12) */
> +};
> +
> +struct ppc_one_seg_page_size {
> +    uint32_t page_shift;  /* Base page shift of segment (or 0) */
> +    uint32_t slb_enc;     /* SLB encoding for BookS */
> +    struct ppc_one_page_size enc[PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
> +struct ppc_segment_page_sizes {
> +    struct ppc_one_seg_page_size sps[PPC_PAGE_SIZES_MAX_SZ];
> +};
> +
> +
> +/*****************************************************************************/
>  /* The whole PowerPC CPU context */
>  #define NB_MMU_MODES 3
>  
> @@ -889,6 +914,9 @@ struct ppc_def_t {
>      powerpc_input_t bus_model;
>      uint32_t flags;
>      int bfd_mach;
> +#if defined(TARGET_PPC64)
> +    const struct ppc_segment_page_sizes *sps;
> +#endif
>      void (*init_proc)(CPUPPCState *env);
>      int  (*check_pow)(CPUPPCState *env);
>  };
> @@ -1012,6 +1040,9 @@ struct CPUPPCState {
>      uint32_t flags;
>      uint64_t insns_flags;
>      uint64_t insns_flags2;
> +#if defined(TARGET_PPC64)
> +    struct ppc_segment_page_sizes sps;
> +#endif
>  
>  #if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
>      target_phys_addr_t vpa;
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index e97e496..833d948 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -1617,6 +1617,7 @@ void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env)
>  #if defined(TARGET_PPC64)
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>          mmubooks_dump_mmu(f, cpu_fprintf, env);
>          break;
>  #endif
> @@ -1647,6 +1648,7 @@ static inline int check_physical(CPUPPCState *env, mmu_ctx_t *ctx,
>      case POWERPC_MMU_620:
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>          /* Real address are 60 bits long */
>          ctx->raddr &= 0x0FFFFFFFFFFFFFFFULL;
>          ctx->prot |= PAGE_WRITE;
> @@ -1727,6 +1729,7 @@ int get_physical_address (CPUPPCState *env, mmu_ctx_t *ctx, target_ulong eaddr,
>          case POWERPC_MMU_620:
>          case POWERPC_MMU_64B:
>          case POWERPC_MMU_2_06:
> +        case POWERPC_MMU_2_06d:
>  #endif
>              if (ret < 0) {
>                  /* We didn't match any BAT entry or don't have BATs */
> @@ -1867,6 +1870,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw,
>                  case POWERPC_MMU_620:
>                  case POWERPC_MMU_64B:
>                  case POWERPC_MMU_2_06:
> +                case POWERPC_MMU_2_06d:
>  #endif
>                      env->exception_index = POWERPC_EXCP_ISI;
>                      env->error_code = 0x40000000;
> @@ -1977,6 +1981,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw,
>                  case POWERPC_MMU_620:
>                  case POWERPC_MMU_64B:
>                  case POWERPC_MMU_2_06:
> +                case POWERPC_MMU_2_06d:
>  #endif
>                      env->exception_index = POWERPC_EXCP_DSI;
>                      env->error_code = 0;
> @@ -2299,6 +2304,7 @@ void ppc_tlb_invalidate_all (CPUPPCState *env)
>      case POWERPC_MMU_620:
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>  #endif /* defined(TARGET_PPC64) */
>          tlb_flush(env, 1);
>          break;
> @@ -2367,6 +2373,7 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr)
>      case POWERPC_MMU_620:
>      case POWERPC_MMU_64B:
>      case POWERPC_MMU_2_06:
> +    case POWERPC_MMU_2_06d:
>          /* tlbie invalidate TLBs for all segments */
>          /* XXX: given the fact that there are too many segments to invalidate,
>           *      and we still don't have a tlb_flush_mask(env, n, mask) in QEMU,
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 0ab7630..2100cb9 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -18,6 +18,7 @@
>  #include <sys/types.h>
>  #include <sys/ioctl.h>
>  #include <sys/mman.h>
> +#include <sys/vfs.h>
>  
>  #include <linux/kvm.h>
>  
> @@ -167,10 +168,219 @@ static int kvm_booke206_tlb_init(CPUPPCState *env)
>      return 0;
>  }
>  
> +
> +#if defined(TARGET_PPC64)
> +static void kvm_get_fallback_smmu_info(CPUPPCState *env,
> +                                       struct kvm_ppc_smmu_info *info)
> +{
> +    memset(info, 0, sizeof(*info));
> +
> +    /* We don't have the new KVM_PPC_GET_SMMU_INFO ioctl, so
> +     * need to "guess" what the supported page sizes are.
> +     *
> +     * For that to work we make a few assumptions:
> +     *
> +     * - If KVM_CAP_PPC_GET_PVINFO is supported we are running "PR"
> +     *   KVM which only supports 4K and 16M pages, but supports them
> +     *   regardless of the backing store characteristics. We also don't
> +     *   support 1T segments.
> +     *
> +     *   This is safe as if HV KVM ever supports that capability or PR
> +     *   KVM grows support for more page/segment sizes, those versions
> +     *   will have implemented KVM_CAP_PPC_GET_SMMU_INFO and thus we
> +     *   will not hit this fallback
> +     *
> +     * - Else we are running HV KVM. This means we only support page
> +     *   sizes that fit in the backing store. Additionally we only
> +     *   advertize 64K pages if the processor is ARCH 2.06 and we assume
> +     *   P7 encodings for the SLB and hash table. Here too, we assume
> +     *   support for any newer processor will mean a kernel that
> +     *   implements KVM_CAP_PPC_GET_SMMU_INFO and thus doesn't hit
> +     *   this fallback.
> +     */
> +    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO)) {
> +        /* No flags */
> +        info->flags = 0;
> +        info->slb_size = 64;
> +
> +        /* Standard 4k base page size segment */
> +        info->sps[0].page_shift = 12;
> +        info->sps[0].slb_enc = 0;
> +        info->sps[0].enc[0].page_shift = 12;
> +        info->sps[0].enc[0].pte_enc = 0;
> +
> +        /* Standard 16M large page size segment */
> +        info->sps[1].page_shift = 24;
> +        info->sps[1].slb_enc = SLB_VSID_L;
> +        info->sps[1].enc[0].page_shift = 24;
> +        info->sps[1].enc[0].pte_enc = 0;
> +    } else {
> +        int i = 0;
> +
> +        /* HV KVM has backing store size restrictions */
> +        info->flags = KVM_PPC_PAGE_SIZES_REAL;
> +
> +        if (env->mmu_model & POWERPC_MMU_1TSEG) {
> +            info->flags |= KVM_PPC_1T_SEGMENTS;
> +        }
> +
> +        if (env->mmu_model == POWERPC_MMU_2_06) {
> +            info->slb_size = 32;
> +        } else {
> +            info->slb_size = 64;
> +        }
> +
> +        /* Standard 4k base page size segment */
> +        info->sps[i].page_shift = 12;
> +        info->sps[i].slb_enc = 0;
> +        info->sps[i].enc[0].page_shift = 12;
> +        info->sps[i].enc[0].pte_enc = 0;
> +        i++;
> +
> +        /* 64K on MMU 2.06 */
> +        if (env->mmu_model == POWERPC_MMU_2_06) {
> +            info->sps[i].page_shift = 16;
> +            info->sps[i].slb_enc = 0x110;
> +            info->sps[i].enc[0].page_shift = 16;
> +            info->sps[i].enc[0].pte_enc = 1;
> +            i++;
> +        }
> +
> +        /* Standard 16M large page size segment */
> +        info->sps[i].page_shift = 24;
> +        info->sps[i].slb_enc = SLB_VSID_L;
> +        info->sps[i].enc[0].page_shift = 24;
> +        info->sps[i].enc[0].pte_enc = 0;
> +    }
> +}
> +
> +static void kvm_get_smmu_info(CPUPPCState *env, struct kvm_ppc_smmu_info *info)
> +{
> +#ifdef KVM_CAP_PPC_GET_SMMU_INFO
> 

No need for the #ifdef anymore, because we're syncing the headers now.

> +    int ret;
> +
> +    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_SMMU_INFO)) {
> +        ret = kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_SMMU_INFO, info);
> +        if (ret == 0) {
> +            return;
> +        }
> +    }
> +#endif /* KVM_CAP_PPC_GET_SMMU_INFO */
> +
> +    kvm_get_fallback_smmu_info(env, info);
> +}
> +
> +static long getrampagesize(void)
> +{
> +    struct statfs fs;
> +    int ret;
> +
> +    if (!mem_path) {
> +        /* guest RAM is backed by normal anonymous pages */
> +        return getpagesize();
> +    }
> +
> +    do {
> +        ret = statfs(mem_path, &fs);
> +    } while (ret != 0 && errno == EINTR);
> +
> +    if (ret != 0) {
> +        fprintf(stderr, "Couldn't statfs() memory path: %s\n",
> +                strerror(errno));
> +        exit(1);
> +    }
> +
> +#define HUGETLBFS_MAGIC       0x958458f6
> +
> +    if (fs.f_type != HUGETLBFS_MAGIC) {
> +        /* Explicit mempath, but it's ordinary pages */
> +        return getpagesize();
> +    }
> +
> +    /* It's hugepage, return the huge page size */
> +    return fs.f_bsize;
> +}
> +
> +static bool kvm_valid_page_size(uint32_t flags, long rampgsize, uint32_t shift)
> +{
> +    if (!(flags & KVM_PPC_PAGE_SIZES_REAL)) {
> +        return true;
> +    }
> +
> +    return (1ul << shift) <= rampgsize;
> +}
> +
> +static void kvm_fixup_page_sizes(CPUPPCState *env)
> +{
> +    static struct kvm_ppc_smmu_info smmu_info;
> +    static bool has_smmu_info;
> +    long rampagesize;
> +    int iq, ik, jq, jk;
> +
> +    /* We only handle page sizes for 64-bit server guests for now */
> +    if (!(env->mmu_model & POWERPC_MMU_64)) {
> +        return;
> +    }
> +
> +    /* Collect MMU info from kernel if not already */
> +    if (!has_smmu_info) {
> +        kvm_get_smmu_info(env, &smmu_info);
> +        has_smmu_info = true;
> +    }
> +
> +    rampagesize = getrampagesize();
> +
> +    /* Convert to QEMU form */
> +    memset(&env->sps, 0, sizeof(env->sps));
> +
> +    for (ik = iq = 0; ik < KVM_PPC_PAGE_SIZES_MAX_SZ; ik++) {
> +        struct ppc_one_seg_page_size *qsps = &env->sps.sps[iq];
> +        struct kvm_ppc_one_seg_page_size *ksps = &smmu_info.sps[ik];
> +
> +        if (!kvm_valid_page_size(smmu_info.flags, rampagesize,
> +                                 ksps->page_shift)) {
> +            continue;
> +        }
> +        qsps->page_shift = ksps->page_shift;
> +        qsps->slb_enc = ksps->slb_enc;
> +        for (jk = jq = 0; jk < KVM_PPC_PAGE_SIZES_MAX_SZ; jk++) {
> +            if (!kvm_valid_page_size(smmu_info.flags, rampagesize,
> +                                     ksps->enc[jk].page_shift)) {
> +                continue;
> +            }
> +            qsps->enc[jq].page_shift = ksps->enc[jk].page_shift;
> +            qsps->enc[jq].pte_enc = ksps->enc[jk].pte_enc;
> +            if (++jq >= PPC_PAGE_SIZES_MAX_SZ) {
> +                break;
> +            }
> +        }
> +        if (++iq >= PPC_PAGE_SIZES_MAX_SZ) {
> +            break;
> +        }
> +    }
> +    env->slb_nr = smmu_info.slb_size;
> +    if (smmu_info.flags & KVM_PPC_1T_SEGMENTS) {
> +        env->mmu_model |= POWERPC_MMU_1TSEG;
> +    } else {
> +        env->mmu_model &= ~POWERPC_MMU_1TSEG;
> +    }
> +}
> +#else /* defined (TARGET_PPC64) */
> +
> +static inline void kvm_fixup_page_sizes(CPUPPCState *env)
> +{
> +}
> +
> +#endif /* !defined (TARGET_PPC64) */
> +
>  int kvm_arch_init_vcpu(CPUPPCState *cenv)
>  {
>      int ret;
>  
> +    /* Gather server mmu info from KVM and update the CPU state*/
> 

Missing space? :)


Alex


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] ppc: CPU reset must flush translation buffer
  2012-05-21  7:15     ` Peter Maydell
@ 2012-05-21  7:39       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-21  7:39 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Alexander Graf, Andreas Färber, qemu-devel

On Mon, 2012-05-21 at 08:15 +0100, Peter Maydell wrote:
> The conclusion we came to is that you only need to tb_flush
> in your CPU's reset function if you have some CPU state which
> you handle by baking it into translated code and doing a tb_flush
> when the state changes. This is relatively rare, most CPU
> frontends only use the other options:
>  (a) CPU state is constant for life of simulation
>  (b) CPU state not baked into code
>  (c) CPU state encoded in tb_flags.
> 
> In particular, target-ppc doesn't have any uses of tb_flush
> at the moment, so either this fix is insufficient (and you need
> to also use tb_flush at the point where the relevant state is
> changed by whatever helper function) or it's the wrong fix.
> 
> If the issue is ROM reloading then the loading code needs to
> be fixed (compare the way that the memory region API correctly
> handles bits of physical memory being mapped/unmapped/remapped
> without the caller needing to do a tb_reset).

Hrm, the state shouldn't change in a drastic way.... we can reproduce
from SLOF which is in real mode and the reset happens in real mode... it
looks like a flush of the exception vectors problem to me.

So that would mean that the ROM reload isn't flushing properly (well,
possibly, need to investigate more). From what I can tell the reload is
done implicitely by generic qemu code creating rom objects when I call
load_image_targphys.

So if something is missing here it's from the generic code, I will dig a
bit more later, gotta take care of sick kids...

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-05-21  7:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1337054780.6727.60.camel@pasglop>
2012-05-21  2:01 ` [Qemu-devel] ppc: CPU reset must flush translation buffer Benjamin Herrenschmidt
2012-05-21  6:16   ` Alexander Graf
2012-05-21  6:26     ` Benjamin Herrenschmidt
2012-05-21  7:15     ` Peter Maydell
2012-05-21  7:39       ` Benjamin Herrenschmidt
2012-05-21  7:39 ` [PATCH v2 1/2] ppc64: Rudimentary Support for extra page sizes on server CPUs Alexander Graf
2012-05-21  7:39   ` Alexander Graf
2012-05-21  7:39   ` [Qemu-devel] " Alexander Graf
