From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Subject: Re: [PATCH v2 1/2] ppc64: Rudimentary Support for extra page sizes on server CPUs Date: Mon, 21 May 2012 09:39:24 +0200 Message-ID: <0A6CA42B-C1C1-4FF7-A271-7839D0291A0B@suse.de> References: <1337054780.6727.60.camel@pasglop> Mime-Version: 1.0 (Apple Message framework v1278) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8BIT Cc: kvm-ppc , kvm list , qemu-ppc , QEMU Developers To: Benjamin Herrenschmidt Return-path: In-Reply-To: <1337054780.6727.60.camel@pasglop> Sender: kvm-ppc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On 15.05.2012, at 06:06, Benjamin Herrenschmidt wrote: > More recent Power server chips (i.e. based on the 64 bit hash MMU) > support more than just the traditional 4k and 16M page sizes. This > can get quite complicated, because which page sizes are supported, > which combinations are supported within an MMU segment and how these > page sizes are encoded both in the SLB entry and the hash PTE can vary > depending on the CPU model (they are not specified by the > architecture). In addition the firmware or hypervisor may not permit > use of certain page sizes, for various reasons. Whether various page > sizes are supported on KVM, for example, depends on whether the PR or > HV variant of KVM is in use, and on the page size of the memory > backing the guest's RAM. > > This patch adds information to the CPUState and cpu defs to describe > the supported page sizes and encodings. Since TCG does not yet > support any extended page sizes, we just set this to NULL in the > static CPU definitions, expanding this to the default 4k and 16M page > sizes when we initialize the cpu state. When using KVM, however, we > instead determine available page sizes using the new > KVM_PPC_GET_SMMU_INFO call. For old kernels without that call, we use > some defaults, with some guesswork which should do the right thing for > existing HV and PR implementations. 
The fallback might not be correct > for future versions, but that's ok, because they'll have > KVM_PPC_GET_SMMU_INFO. > > > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > --- > v2: - Passes checkpatch now (with the exception of the kernel header > bit which will eventually be replaced by the real thing when > it goes upstream). > - Moved back some fixes that were incorrectly located in the > second patch (such as setting slb_size) > - Fix some issues when using "PR" KVM without proper support > for 1T segments > > Note: I kept the caching. It's not a "hot" path but still, it's a > lot of mess to do for every CPU (and we routinely have a bunch > on those power machines), in the end it adds up ... > > linux-headers/linux/kvm.h | 26 ++++++ > target-ppc/cpu.h | 31 +++++++ > target-ppc/helper.c | 7 ++ > target-ppc/kvm.c | 210 +++++++++++++++++++++++++++++++++++++++++++ > target-ppc/kvm_ppc.h | 5 + > target-ppc/translate_init.c | 21 +++++ > 6 files changed, 300 insertions(+), 0 deletions(-) > > diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h > index ee7bd9c..015b6db 100644 > --- a/linux-headers/linux/kvm.h > +++ b/linux-headers/linux/kvm.h > Please split Linux kernel header updates into their own patch, so I can easily redo them when necessary. Also, you sent this mail to the wrong mailing lists - it's a QEMU patch :). CC'ing the correct ones. 
> @@ -449,6 +449,30 @@ struct kvm_ppc_pvinfo { > __u8 pad[108]; > }; > > +/* for KVM_PPC_GET_SMMU_INFO */ > +#define KVM_PPC_PAGE_SIZES_MAX_SZ 8 > + > +struct kvm_ppc_one_page_size { > + __u32 page_shift; /* Page shift (or 0) */ > + __u32 pte_enc; /* Encoding in the HPTE (>>12) */ > +}; > + > +struct kvm_ppc_one_seg_page_size { > + __u32 page_shift; /* Base page shift of segment (or 0) */ > + __u32 slb_enc; /* SLB encoding for BookS */ > + struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ]; > +}; > + > +#define KVM_PPC_PAGE_SIZES_REAL 0x00000001 > +#define KVM_PPC_1T_SEGMENTS 0x00000002 > + > +struct kvm_ppc_smmu_info { > + __u64 flags; > + __u32 slb_size; > + __u32 pad; > + struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ]; > +}; > + > #define KVMIO 0xAE > > /* machine type bits, to be used as argument to KVM_CREATE_VM */ > @@ -789,6 +813,8 @@ struct kvm_s390_ucas_mapping { > /* Available with KVM_CAP_PCI_2_3 */ > #define KVM_ASSIGN_SET_INTX_MASK _IOW(KVMIO, 0xa4, \ > struct kvm_assigned_pci_dev) > +/* Available with KVM_CAP_PPC_GET_SMMU_INFO */ > +#define KVM_PPC_GET_SMMU_INFO _IOR(KVMIO, 0xa5, struct kvm_ppc_smmu_info) > > /* > * ioctls for vcpu fds > diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h > index 84c9674..3ed75ac 100644 > --- a/target-ppc/cpu.h > +++ b/target-ppc/cpu.h > @@ -119,6 +119,8 @@ enum powerpc_mmu_t { > POWERPC_MMU_620 = POWERPC_MMU_64 | 0x00000002, > /* Architecture 2.06 variant */ > POWERPC_MMU_2_06 = POWERPC_MMU_64 | POWERPC_MMU_1TSEG | 0x00000003, > + /* Architecture 2.06 "degraded" (no 1T segments) */ > + POWERPC_MMU_2_06d = POWERPC_MMU_64 | 0x00000003, > #endif /* defined(TARGET_PPC64) */ > }; > > @@ -874,6 +876,29 @@ enum { > #define DBELL_PIRTAG_MASK 0x3fff > > /*****************************************************************************/ > +/* Segment page size information, used by recent hash MMUs > + * The format of this structure mirrors kvm_ppc_smmu_info > + */ > + > +#define PPC_PAGE_SIZES_MAX_SZ 8 > + > 
+struct ppc_one_page_size { > + uint32_t page_shift; /* Page shift (or 0) */ > + uint32_t pte_enc; /* Encoding in the HPTE (>>12) */ > +}; > + > +struct ppc_one_seg_page_size { > + uint32_t page_shift; /* Base page shift of segment (or 0) */ > + uint32_t slb_enc; /* SLB encoding for BookS */ > + struct ppc_one_page_size enc[PPC_PAGE_SIZES_MAX_SZ]; > +}; > + > +struct ppc_segment_page_sizes { > + struct ppc_one_seg_page_size sps[PPC_PAGE_SIZES_MAX_SZ]; > +}; > + > + > +/*****************************************************************************/ > /* The whole PowerPC CPU context */ > #define NB_MMU_MODES 3 > > @@ -889,6 +914,9 @@ struct ppc_def_t { > powerpc_input_t bus_model; > uint32_t flags; > int bfd_mach; > +#if defined(TARGET_PPC64) > + const struct ppc_segment_page_sizes *sps; > +#endif > void (*init_proc)(CPUPPCState *env); > int (*check_pow)(CPUPPCState *env); > }; > @@ -1012,6 +1040,9 @@ struct CPUPPCState { > uint32_t flags; > uint64_t insns_flags; > uint64_t insns_flags2; > +#if defined(TARGET_PPC64) > + struct ppc_segment_page_sizes sps; > +#endif > > #if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY) > target_phys_addr_t vpa; > diff --git a/target-ppc/helper.c b/target-ppc/helper.c > index e97e496..833d948 100644 > --- a/target-ppc/helper.c > +++ b/target-ppc/helper.c > @@ -1617,6 +1617,7 @@ void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env) > #if defined(TARGET_PPC64) > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > mmubooks_dump_mmu(f, cpu_fprintf, env); > break; > #endif > @@ -1647,6 +1648,7 @@ static inline int check_physical(CPUPPCState *env, mmu_ctx_t *ctx, > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > /* Real address are 60 bits long */ > ctx->raddr &= 0x0FFFFFFFFFFFFFFFULL; > ctx->prot |= PAGE_WRITE; > @@ -1727,6 +1729,7 @@ int get_physical_address (CPUPPCState *env, mmu_ctx_t *ctx, target_ulong eaddr, > case POWERPC_MMU_620: > 
case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > #endif > if (ret < 0) { > /* We didn't match any BAT entry or don't have BATs */ > @@ -1867,6 +1870,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw, > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > #endif > env->exception_index = POWERPC_EXCP_ISI; > env->error_code = 0x40000000; > @@ -1977,6 +1981,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw, > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > #endif > env->exception_index = POWERPC_EXCP_DSI; > env->error_code = 0; > @@ -2299,6 +2304,7 @@ void ppc_tlb_invalidate_all (CPUPPCState *env) > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > #endif /* defined(TARGET_PPC64) */ > tlb_flush(env, 1); > break; > @@ -2367,6 +2373,7 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr) > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > /* tlbie invalidate TLBs for all segments */ > /* XXX: given the fact that there are too many segments to invalidate, > * and we still don't have a tlb_flush_mask(env, n, mask) in QEMU, > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c > index 0ab7630..2100cb9 100644 > --- a/target-ppc/kvm.c > +++ b/target-ppc/kvm.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > > #include > > @@ -167,10 +168,219 @@ static int kvm_booke206_tlb_init(CPUPPCState *env) > return 0; > } > > + > +#if defined(TARGET_PPC64) > +static void kvm_get_fallback_smmu_info(CPUPPCState *env, > + struct kvm_ppc_smmu_info *info) > +{ > + memset(info, 0, sizeof(*info)); > + > + /* We don't have the new KVM_PPC_GET_SMMU_INFO ioctl, so > + * need to "guess" what the supported page sizes are. 
> + * > + * For that to work we make a few assumptions: > + * > + * - If KVM_CAP_PPC_GET_PVINFO is supported we are running "PR" > + * KVM which only supports 4K and 16M pages, but supports them > + * regardless of the backing store characteristics. We also don't > + * support 1T segments. > + * > + * This is safe as if HV KVM ever supports that capability or PR > + * KVM grows support for more page/segment sizes, those versions > + * will have implemented KVM_CAP_PPC_GET_SMMU_INFO and thus we > + * will not hit this fallback. > + * > + * - Else we are running HV KVM. This means we only support page > + * sizes that fit in the backing store. Additionally we only > + * advertise 64K pages if the processor is ARCH 2.06 and we assume > + * P7 encodings for the SLB and hash table. Here too, we assume > + * support for any newer processor will mean a kernel that > + * implements KVM_CAP_PPC_GET_SMMU_INFO and thus doesn't hit > + * this fallback. > + */ > + if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO)) { > + /* No flags */ > + info->flags = 0; > + info->slb_size = 64; > + > + /* Standard 4k base page size segment */ > + info->sps[0].page_shift = 12; > + info->sps[0].slb_enc = 0; > + info->sps[0].enc[0].page_shift = 12; > + info->sps[0].enc[0].pte_enc = 0; > + > + /* Standard 16M large page size segment */ > + info->sps[1].page_shift = 24; > + info->sps[1].slb_enc = SLB_VSID_L; > + info->sps[1].enc[0].page_shift = 24; > + info->sps[1].enc[0].pte_enc = 0; > + } else { > + int i = 0; > + > + /* HV KVM has backing store size restrictions */ > + info->flags = KVM_PPC_PAGE_SIZES_REAL; > + > + if (env->mmu_model & POWERPC_MMU_1TSEG) { > + info->flags |= KVM_PPC_1T_SEGMENTS; > + } > + > + if (env->mmu_model == POWERPC_MMU_2_06) { > + info->slb_size = 32; > + } else { > + info->slb_size = 64; > + } > + > + /* Standard 4k base page size segment */ > + info->sps[i].page_shift = 12; > + info->sps[i].slb_enc = 0; > + info->sps[i].enc[0].page_shift = 12; > + 
info->sps[i].enc[0].pte_enc = 0; > + i++; > + > + /* 64K on MMU 2.06 */ > + if (env->mmu_model == POWERPC_MMU_2_06) { > + info->sps[i].page_shift = 16; > + info->sps[i].slb_enc = 0x110; > + info->sps[i].enc[0].page_shift = 16; > + info->sps[i].enc[0].pte_enc = 1; > + i++; > + } > + > + /* Standard 16M large page size segment */ > + info->sps[i].page_shift = 24; > + info->sps[i].slb_enc = SLB_VSID_L; > + info->sps[i].enc[0].page_shift = 24; > + info->sps[i].enc[0].pte_enc = 0; > + } > +} > + > +static void kvm_get_smmu_info(CPUPPCState *env, struct kvm_ppc_smmu_info *info) > +{ > +#ifdef KVM_CAP_PPC_GET_SMMU_INFO > No need for the #ifdef anymore, because we're syncing the headers now. > + int ret; > + > + if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_SMMU_INFO)) { > + ret = kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_SMMU_INFO, info); > + if (ret == 0) { > + return; > + } > + } > +#endif /* KVM_CAP_PPC_GET_SMMU_INFO */ > + > + kvm_get_fallback_smmu_info(env, info); > +} > + > +static long getrampagesize(void) > +{ > + struct statfs fs; > + int ret; > + > + if (!mem_path) { > + /* guest RAM is backed by normal anonymous pages */ > + return getpagesize(); > + } > + > + do { > + ret = statfs(mem_path, &fs); > + } while (ret != 0 && errno == EINTR); > + > + if (ret != 0) { > + fprintf(stderr, "Couldn't statfs() memory path: %s\n", > + strerror(errno)); > + exit(1); > + } > + > +#define HUGETLBFS_MAGIC 0x958458f6 > + > + if (fs.f_type != HUGETLBFS_MAGIC) { > + /* Explicit mempath, but it's ordinary pages */ > + return getpagesize(); > + } > + > + /* It's hugepage, return the huge page size */ > + return fs.f_bsize; > +} > + > +static bool kvm_valid_page_size(uint32_t flags, long rampgsize, uint32_t shift) > +{ > + if (!(flags & KVM_PPC_PAGE_SIZES_REAL)) { > + return true; > + } > + > + return (1ul << shift) <= rampgsize; > +} > + > +static void kvm_fixup_page_sizes(CPUPPCState *env) > +{ > + static struct kvm_ppc_smmu_info smmu_info; > + static bool 
has_smmu_info; > + long rampagesize; > + int iq, ik, jq, jk; > + > + /* We only handle page sizes for 64-bit server guests for now */ > + if (!(env->mmu_model & POWERPC_MMU_64)) { > + return; > + } > + > + /* Collect MMU info from kernel if not already */ > + if (!has_smmu_info) { > + kvm_get_smmu_info(env, &smmu_info); > + has_smmu_info = true; > + } > + > + rampagesize = getrampagesize(); > + > + /* Convert to QEMU form */ > + memset(&env->sps, 0, sizeof(env->sps)); > + > + for (ik = iq = 0; ik < KVM_PPC_PAGE_SIZES_MAX_SZ; ik++) { > + struct ppc_one_seg_page_size *qsps = &env->sps.sps[iq]; > + struct kvm_ppc_one_seg_page_size *ksps = &smmu_info.sps[ik]; > + > + if (!kvm_valid_page_size(smmu_info.flags, rampagesize, > + ksps->page_shift)) { > + continue; > + } > + qsps->page_shift = ksps->page_shift; > + qsps->slb_enc = ksps->slb_enc; > + for (jk = jq = 0; jk < KVM_PPC_PAGE_SIZES_MAX_SZ; jk++) { > + if (!kvm_valid_page_size(smmu_info.flags, rampagesize, > + ksps->enc[jk].page_shift)) { > + continue; > + } > + qsps->enc[jq].page_shift = ksps->enc[jk].page_shift; > + qsps->enc[jq].pte_enc = ksps->enc[jk].pte_enc; > + if (++jq >= PPC_PAGE_SIZES_MAX_SZ) { > + break; > + } > + } > + if (++iq >= PPC_PAGE_SIZES_MAX_SZ) { > + break; > + } > + } > + env->slb_nr = smmu_info.slb_size; > + if (smmu_info.flags & KVM_PPC_1T_SEGMENTS) { > + env->mmu_model |= POWERPC_MMU_1TSEG; > + } else { > + env->mmu_model &= ~POWERPC_MMU_1TSEG; > + } > +} > +#else /* defined (TARGET_PPC64) */ > + > +static inline void kvm_fixup_page_sizes(CPUPPCState *env) > +{ > +} > + > +#endif /* !defined (TARGET_PPC64) */ > + > int kvm_arch_init_vcpu(CPUPPCState *cenv) > { > int ret; > > + /* Gather server mmu info from KVM and update the CPU state*/ > Missing space? 
:) Alex
b/target-ppc/helper.c > @@ -1617,6 +1617,7 @@ void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env) > #if defined(TARGET_PPC64) > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > mmubooks_dump_mmu(f, cpu_fprintf, env); > break; > #endif > @@ -1647,6 +1648,7 @@ static inline int check_physical(CPUPPCState *env, mmu_ctx_t *ctx, > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > /* Real address are 60 bits long */ > ctx->raddr &= 0x0FFFFFFFFFFFFFFFULL; > ctx->prot |= PAGE_WRITE; > @@ -1727,6 +1729,7 @@ int get_physical_address (CPUPPCState *env, mmu_ctx_t *ctx, target_ulong eaddr, > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > #endif > if (ret < 0) { > /* We didn't match any BAT entry or don't have BATs */ > @@ -1867,6 +1870,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw, > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > #endif > env->exception_index = POWERPC_EXCP_ISI; > env->error_code = 0x40000000; > @@ -1977,6 +1981,7 @@ int cpu_ppc_handle_mmu_fault (CPUPPCState *env, target_ulong address, int rw, > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > #endif > env->exception_index = POWERPC_EXCP_DSI; > env->error_code = 0; > @@ -2299,6 +2304,7 @@ void ppc_tlb_invalidate_all (CPUPPCState *env) > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > #endif /* defined(TARGET_PPC64) */ > tlb_flush(env, 1); > break; > @@ -2367,6 +2373,7 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr) > case POWERPC_MMU_620: > case POWERPC_MMU_64B: > case POWERPC_MMU_2_06: > + case POWERPC_MMU_2_06d: > /* tlbie invalidate TLBs for all segments */ > /* XXX: given the fact that there are too many segments to invalidate, > * and we still 
don't have a tlb_flush_mask(env, n, mask) in QEMU, > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c > index 0ab7630..2100cb9 100644 > --- a/target-ppc/kvm.c > +++ b/target-ppc/kvm.c > @@ -18,6 +18,7 @@ > #include <sys/types.h> > #include <sys/ioctl.h> > #include <sys/mman.h> > +#include <sys/vfs.h> > > #include <linux/kvm.h> > > @@ -167,10 +168,219 @@ static int kvm_booke206_tlb_init(CPUPPCState *env) > return 0; > } > > + > +#if defined(TARGET_PPC64) > +static void kvm_get_fallback_smmu_info(CPUPPCState *env, > + struct kvm_ppc_smmu_info *info) > +{ > + memset(info, 0, sizeof(*info)); > + > + /* We don't have the new KVM_PPC_GET_SMMU_INFO ioctl, so > + * need to "guess" what the supported page sizes are. > + * > + * For that to work we make a few assumptions: > + * > + * - If KVM_CAP_PPC_GET_PVINFO is supported we are running "PR" > + * KVM which only supports 4K and 16M pages, but supports them > + * regardless of the backing store characteristics. We also don't > + * support 1T segments. > + * > + * This is safe as if HV KVM ever supports that capability or PR > + * KVM grows support for more page/segment sizes, those versions > + * will have implemented KVM_CAP_PPC_GET_SMMU_INFO and thus we > + * will not hit this fallback > + * > + * - Else we are running HV KVM. This means we only support page > + * sizes that fit in the backing store. Additionally we only > + * advertise 64K pages if the processor is ARCH 2.06 and we assume > + * P7 encodings for the SLB and hash table. Here too, we assume > + * support for any newer processor will mean a kernel that > + * implements KVM_CAP_PPC_GET_SMMU_INFO and thus doesn't hit > + * this fallback. 
> + */ > + if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO)) { > + /* No flags */ > + info->flags = 0; > + info->slb_size = 64; > + > + /* Standard 4k base page size segment */ > + info->sps[0].page_shift = 12; > + info->sps[0].slb_enc = 0; > + info->sps[0].enc[0].page_shift = 12; > + info->sps[0].enc[0].pte_enc = 0; > + > + /* Standard 16M large page size segment */ > + info->sps[1].page_shift = 24; > + info->sps[1].slb_enc = SLB_VSID_L; > + info->sps[1].enc[0].page_shift = 24; > + info->sps[1].enc[0].pte_enc = 0; > + } else { > + int i = 0; > + > + /* HV KVM has backing store size restrictions */ > + info->flags = KVM_PPC_PAGE_SIZES_REAL; > + > + if (env->mmu_model & POWERPC_MMU_1TSEG) { > + info->flags |= KVM_PPC_1T_SEGMENTS; > + } > + > + if (env->mmu_model == POWERPC_MMU_2_06) { > + info->slb_size = 32; > + } else { > + info->slb_size = 64; > + } > + > + /* Standard 4k base page size segment */ > + info->sps[i].page_shift = 12; > + info->sps[i].slb_enc = 0; > + info->sps[i].enc[0].page_shift = 12; > + info->sps[i].enc[0].pte_enc = 0; > + i++; > + > + /* 64K on MMU 2.06 */ > + if (env->mmu_model == POWERPC_MMU_2_06) { > + info->sps[i].page_shift = 16; > + info->sps[i].slb_enc = 0x110; > + info->sps[i].enc[0].page_shift = 16; > + info->sps[i].enc[0].pte_enc = 1; > + i++; > + } > + > + /* Standard 16M large page size segment */ > + info->sps[i].page_shift = 24; > + info->sps[i].slb_enc = SLB_VSID_L; > + info->sps[i].enc[0].page_shift = 24; > + info->sps[i].enc[0].pte_enc = 0; > + } > +} > + > +static void kvm_get_smmu_info(CPUPPCState *env, struct kvm_ppc_smmu_info *info) > +{ > +#ifdef KVM_CAP_PPC_GET_SMMU_INFO > No need for the #ifdef anymore, because we're syncing the headers now. 
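As an aside for readers following along outside the QEMU tree, the fallback guessing quoted above can be condensed into a standalone sketch. The type and constant names below (`MAX_SZ`, `SLB_L`, `struct smmu_info`, `build_fallback`) are simplified stand-ins for the `kvm_ppc_smmu_info` family from the patch, not the real headers, and the HV branch alone is modeled:

```c
#include <stdint.h>

/* Simplified stand-ins for the kvm_ppc_smmu_info family quoted above;
 * MAX_SZ and SLB_L are illustrative values, not the real header ones. */
#define MAX_SZ 8
#define SLB_L  0x100u

struct one_page_size { uint32_t page_shift, pte_enc; };
struct one_seg_page_size {
    uint32_t page_shift;
    uint32_t slb_enc;
    struct one_page_size enc[MAX_SZ];
};
struct smmu_info {
    uint32_t slb_size;
    struct one_seg_page_size sps[MAX_SZ];
};

/* Build the same guess the HV fallback makes: 4k always, 64k only on
 * a 2.06 MMU (P7 encodings assumed), 16M always; returns entry count. */
int build_fallback(struct smmu_info *info, int is_2_06)
{
    int i = 0;

    info->slb_size = is_2_06 ? 32 : 64;

    /* Standard 4k base page size segment */
    info->sps[i].page_shift = 12;
    info->sps[i].enc[0].page_shift = 12;
    info->sps[i].enc[0].pte_enc = 0;
    i++;

    /* 64K pages exist only on MMU 2.06 */
    if (is_2_06) {
        info->sps[i].page_shift = 16;
        info->sps[i].slb_enc = 0x110;
        info->sps[i].enc[0].page_shift = 16;
        info->sps[i].enc[0].pte_enc = 1;
        i++;
    }

    /* Standard 16M large page size segment */
    info->sps[i].page_shift = 24;
    info->sps[i].slb_enc = SLB_L;
    info->sps[i].enc[0].page_shift = 24;
    info->sps[i].enc[0].pte_enc = 0;

    return i + 1;
}
```

The ordering mirrors the patch: the table is filled in ascending page-shift order, and the 64K slot is simply skipped on pre-2.06 MMUs, which is why the patch tracks the index in `i` rather than hardcoding slots in the HV branch.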
> + int ret; > + > + if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_SMMU_INFO)) { > + ret = kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_SMMU_INFO, info); > + if (ret == 0) { > + return; > + } > + } > +#endif /* KVM_CAP_PPC_GET_SMMU_INFO */ > + > + kvm_get_fallback_smmu_info(env, info); > +} > + > +static long getrampagesize(void) > +{ > + struct statfs fs; > + int ret; > + > + if (!mem_path) { > + /* guest RAM is backed by normal anonymous pages */ > + return getpagesize(); > + } > + > + do { > + ret = statfs(mem_path, &fs); > + } while (ret != 0 && errno == EINTR); > + > + if (ret != 0) { > + fprintf(stderr, "Couldn't statfs() memory path: %s\n", > + strerror(errno)); > + exit(1); > + } > + > +#define HUGETLBFS_MAGIC 0x958458f6 > + > + if (fs.f_type != HUGETLBFS_MAGIC) { > + /* Explicit mempath, but it's ordinary pages */ > + return getpagesize(); > + } > + > + /* It's hugepage, return the huge page size */ > + return fs.f_bsize; > +} > + > +static bool kvm_valid_page_size(uint32_t flags, long rampgsize, uint32_t shift) > +{ > + if (!(flags & KVM_PPC_PAGE_SIZES_REAL)) { > + return true; > + } > + > + return (1ul << shift) <= rampgsize; > +} > + > +static void kvm_fixup_page_sizes(CPUPPCState *env) > +{ > + static struct kvm_ppc_smmu_info smmu_info; > + static bool has_smmu_info; > + long rampagesize; > + int iq, ik, jq, jk; > + > + /* We only handle page sizes for 64-bit server guests for now */ > + if (!(env->mmu_model & POWERPC_MMU_64)) { > + return; > + } > + > + /* Collect MMU info from kernel if not already */ > + if (!has_smmu_info) { > + kvm_get_smmu_info(env, &smmu_info); > + has_smmu_info = true; > + } > + > + rampagesize = getrampagesize(); > + > + /* Convert to QEMU form */ > + memset(&env->sps, 0, sizeof(env->sps)); > + > + for (ik = iq = 0; ik < KVM_PPC_PAGE_SIZES_MAX_SZ; ik++) { > + struct ppc_one_seg_page_size *qsps = &env->sps.sps[iq]; > + struct kvm_ppc_one_seg_page_size *ksps = &smmu_info.sps[ik]; > + > + if 
(!kvm_valid_page_size(smmu_info.flags, rampagesize, > + ksps->page_shift)) { > + continue; > + } > + qsps->page_shift = ksps->page_shift; > + qsps->slb_enc = ksps->slb_enc; > + for (jk = jq = 0; jk < KVM_PPC_PAGE_SIZES_MAX_SZ; jk++) { > + if (!kvm_valid_page_size(smmu_info.flags, rampagesize, > + ksps->enc[jk].page_shift)) { > + continue; > + } > + qsps->enc[jq].page_shift = ksps->enc[jk].page_shift; > + qsps->enc[jq].pte_enc = ksps->enc[jk].pte_enc; > + if (++jq >= PPC_PAGE_SIZES_MAX_SZ) { > + break; > + } > + } > + if (++iq >= PPC_PAGE_SIZES_MAX_SZ) { > + break; > + } > + } > + env->slb_nr = smmu_info.slb_size; > + if (smmu_info.flags & KVM_PPC_1T_SEGMENTS) { > + env->mmu_model |= POWERPC_MMU_1TSEG; > + } else { > + env->mmu_model &= ~POWERPC_MMU_1TSEG; > + } > +} > +#else /* defined (TARGET_PPC64) */ > + > +static inline void kvm_fixup_page_sizes(CPUPPCState *env) > +{ > +} > + > +#endif /* !defined (TARGET_PPC64) */ > + > int kvm_arch_init_vcpu(CPUPPCState *cenv) > { > int ret; > > + /* Gather server mmu info from KVM and update the CPU state*/ > Missing space? :) Alex
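For what it's worth, the filtering that kvm_fixup_page_sizes() performs — dropping any advertised page size larger than the page size backing guest RAM when KVM_PPC_PAGE_SIZES_REAL is set, then compacting the survivors to the front of the table — can be sketched in isolation. `FLAG_REAL` and the flat shift arrays here are stand-ins; the real code applies the same two-index pattern to the nested sps/enc tables:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for KVM_PPC_PAGE_SIZES_REAL from the patch. */
#define FLAG_REAL 0x1u

/* With the "real" restriction, a page size is usable only if it is
 * no larger than the page size backing guest RAM. */
bool valid_page_size(uint32_t flags, long rampgsize, uint32_t shift)
{
    if (!(flags & FLAG_REAL)) {
        return true;
    }
    return (1ul << shift) <= (unsigned long)rampgsize;
}

/* Compact the usable page shifts to the front of 'out', mirroring the
 * ik/iq two-index loop in kvm_fixup_page_sizes(); returns the count. */
int filter_shifts(uint32_t flags, long rampgsize,
                  const uint32_t *in, int n, uint32_t *out)
{
    int ik, iq;

    for (ik = iq = 0; ik < n; ik++) {
        if (!valid_page_size(flags, rampgsize, in[ik])) {
            continue;   /* e.g. 16M pages with 4k-backed guest RAM */
        }
        out[iq++] = in[ik];
    }
    return iq;
}
```

For example, with a kernel-advertised set of {4k, 64k, 16M} and guest RAM backed by 64k hugetlbfs pages, only the 4k and 64k entries survive; with anonymous 4k pages only 4k does; and when FLAG_REAL is clear (the PR-style case) the whole set passes through unchanged.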