* [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits @ 2014-10-06 20:30 Christoffer Dall 2014-10-06 20:30 ` [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 Christoffer Dall ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: Christoffer Dall @ 2014-10-06 20:30 UTC (permalink / raw) To: linux-arm-kernel The following two patches fixup some missing memory handling in KVM/arm64. The first patch supports 48 bit virtual address space which involves supporting a different number of levels of page tables in the host kernel and the stage-2 page tables. The second patch ensures userspace cannot create memory slots with too large IPA space given VTCR_EL2.T0SZ = 24. Finally, we enable 48-bit VA support in Linux. The following host configurations have been tested with KVM on APM Mustang: 1) 4KB + 39 bits VA space 2) 4KB + 48 bits VA space 3) 64KB + 39 bits VA space 4) 64KB + 48 bits VA space Tested on TC2 for regressions. Christoffer Dall (3): arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE arm64: Allow 48-bits VA space without ARM_SMMU arch/arm/include/asm/kvm_mmu.h | 23 +++++++ arch/arm/kvm/arm.c | 2 +- arch/arm/kvm/mmu.c | 140 +++++++++++++++++++++++++++++++-------- arch/arm64/Kconfig | 2 +- arch/arm64/include/asm/kvm_mmu.h | 128 +++++++++++++++++++++++++++++++++-- 5 files changed, 259 insertions(+), 36 deletions(-) -- 2.1.2.330.g565301e.dirty ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-06 20:30 [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Christoffer Dall @ 2014-10-06 20:30 ` Christoffer Dall 2014-10-07 10:48 ` Catalin Marinas 2014-10-07 13:40 ` Marc Zyngier 2014-10-06 20:30 ` [PATCH v2 2/3] arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE Christoffer Dall ` (2 subsequent siblings) 3 siblings, 2 replies; 18+ messages in thread From: Christoffer Dall @ 2014-10-06 20:30 UTC (permalink / raw) To: linux-arm-kernel This patch adds the necessary support for all host kernel PGSIZE and VA_SPACE configuration options for both EL2 and the Stage-2 page tables. However, for 40bit and 42bit PARange systems, the architecture mandates that VTCR_EL2.SL0 is maximum 1, resulting in fewer levels of stage-2 pagge tables than levels of host kernel page tables. At the same time, systems with a PARange > 42bit, we limit the IPA range by always setting VTCR_EL2.T0SZ to 24. To solve the situation with different levels of page tables for Stage-2 translation than the host kernel page tables, we allocate a dummy PGD with pointers to our actual inital level Stage-2 page table, in order for us to reuse the kernel pgtable manipulation primitives. Reproducing all these in KVM does not look pretty and unnecessarily complicates the 32-bit side. Systems with a PARange < 40bits are not yet supported. [ I have reworked this patch from its original form submitted by Jungseok to take the architecture constraints into consideration. There were too many changes from the original patch for me to preserve the authorship. Thanks to Catalin Marinas for his help in figuring out a good solution to this challenge. I have also fixed various bugs and missing error code handling from the original patch. - Christoffer ] Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> --- Changes [v1 -> v2]: - Use KVM_PREALLOC_LEVELS directly instead of C-variable indirection - Factored out the config changes to separate patch - Use __GFP_ZERO instead of memset - Fixed error return path in kvm_alloc_stage2_pgd() - Added WARN_ON if pgd_none() returns true - Changed some macro definitions and names arch/arm/include/asm/kvm_mmu.h | 23 +++++++ arch/arm/kvm/arm.c | 2 +- arch/arm/kvm/mmu.c | 132 +++++++++++++++++++++++++++++++-------- arch/arm64/include/asm/kvm_mmu.h | 128 ++++++++++++++++++++++++++++++++++--- 4 files changed, 250 insertions(+), 35 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 3f688b4..dbb3c5c 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -37,6 +37,11 @@ */ #define TRAMPOLINE_VA UL(CONFIG_VECTORS_BASE) +/* + * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels. + */ +#define KVM_MMU_CACHE_MIN_PAGES 2 + #ifndef __ASSEMBLY__ #include <asm/cacheflush.h> @@ -83,6 +88,11 @@ static inline void kvm_clean_pgd(pgd_t *pgd) clean_dcache_area(pgd, PTRS_PER_S2_PGD * sizeof(pgd_t)); } +static inline void kvm_clean_pmd(pmd_t *pmd) +{ + clean_dcache_area(pmd, PTRS_PER_PMD * sizeof(pmd_t)); +} + static inline void kvm_clean_pmd_entry(pmd_t *pmd) { clean_pmd_entry(pmd); @@ -127,6 +137,19 @@ static inline bool kvm_page_empty(void *ptr) #define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) #define kvm_pud_table_empty(pudp) (0) +#define KVM_PREALLOC_LEVEL 0 + +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) +{ + return 0; +} + +static inline void kvm_free_hwpgd(struct kvm *kvm) { } + +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) +{ + return virt_to_phys(kvm->arch.pgd); +} struct kvm; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 7796051..048f37f 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -409,7 +409,7 @@ static void update_vttbr(struct kvm *kvm) kvm_next_vmid++; /* update vttbr to be used with the new vmid */ - pgd_phys = virt_to_phys(kvm->arch.pgd); + pgd_phys = kvm_get_hwpgd(kvm); BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK); vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK; kvm->arch.vttbr = pgd_phys | vmid; diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index bb06f76..3b3e18f 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -42,7 +42,7 @@ static unsigned long hyp_idmap_start; static unsigned long hyp_idmap_end; static phys_addr_t hyp_idmap_vector; -#define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t)) +#define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t)) #define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x)) @@ -158,7 +158,7 @@ static void unmap_pmds(struct kvm *kvm, pud_t *pud, } } while (pmd++, addr = next, addr != end); - if (kvm_pmd_table_empty(start_pmd)) + if (kvm_pmd_table_empty(start_pmd) && (!kvm || KVM_PREALLOC_LEVEL < 2)) clear_pud_entry(kvm, pud, start_addr); } @@ -182,7 +182,7 @@ static void unmap_puds(struct kvm *kvm, pgd_t *pgd, } } while (pud++, addr = next, addr != end); - if (kvm_pud_table_empty(start_pud)) + if (kvm_pud_table_empty(start_pud) && (!kvm || KVM_PREALLOC_LEVEL < 1)) clear_pgd_entry(kvm, pgd, start_addr); } @@ -306,7 +306,7 @@ void free_boot_hyp_pgd(void) if (boot_hyp_pgd) { unmap_range(NULL, boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE); unmap_range(NULL, boot_hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE); - free_pages((unsigned long)boot_hyp_pgd, pgd_order); + free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order); boot_hyp_pgd = NULL; } @@ -343,7 +343,7 @@ void free_hyp_pgds(void) for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE) unmap_range(NULL, hyp_pgd, KERN_TO_HYP(addr), PGDIR_SIZE); - free_pages((unsigned long)hyp_pgd, pgd_order); + free_pages((unsigned long)hyp_pgd, hyp_pgd_order); hyp_pgd = NULL; } @@ -401,13 +401,46 @@ static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start, return 0; } +static int create_hyp_pud_mappings(pgd_t *pgd, unsigned long start, + unsigned long end, unsigned long pfn, + pgprot_t prot) +{ + pud_t *pud; + pmd_t *pmd; + unsigned long addr, next; + int ret; + + addr = start; + do { + pud = pud_offset(pgd, addr); + + if (pud_none_or_clear_bad(pud)) { + pmd = pmd_alloc_one(NULL, addr); + if (!pmd) { + kvm_err("Cannot allocate Hyp pmd\n"); + return -ENOMEM; + } + pud_populate(NULL, pud, pmd); + get_page(virt_to_page(pud)); + kvm_flush_dcache_to_poc(pud, sizeof(*pud)); + } + + next = pud_addr_end(addr, end); + ret = create_hyp_pmd_mappings(pud, addr, next, pfn, prot); + if (ret) + return ret; + pfn += (next - addr) >> PAGE_SHIFT; + } while (addr = next, addr != end); + + return 0; +} + static int __create_hyp_mappings(pgd_t *pgdp, unsigned long start, unsigned long end, unsigned long pfn, pgprot_t prot) { pgd_t *pgd; pud_t *pud; - pmd_t *pmd; unsigned long addr, next; int err = 0; @@ -416,22 +449,21 @@ static int __create_hyp_mappings(pgd_t *pgdp, end = PAGE_ALIGN(end); do { pgd = pgdp + pgd_index(addr); - pud = pud_offset(pgd, addr); - if (pud_none_or_clear_bad(pud)) { - pmd = pmd_alloc_one(NULL, addr); - if (!pmd) { - kvm_err("Cannot allocate Hyp pmd\n"); + if (pgd_none(*pgd)) { + pud = pud_alloc_one(NULL, addr); + if (!pud) { + kvm_err("Cannot allocate Hyp pud\n"); err = -ENOMEM; goto out; } - pud_populate(NULL, pud, pmd); - get_page(virt_to_page(pud)); - kvm_flush_dcache_to_poc(pud, sizeof(*pud)); + pgd_populate(NULL, pgd, pud); + get_page(virt_to_page(pgd)); + kvm_flush_dcache_to_poc(pgd, sizeof(*pgd)); } next = pgd_addr_end(addr, end); - err = create_hyp_pmd_mappings(pud, addr, next, pfn, prot); + err = create_hyp_pud_mappings(pgd, addr, next, pfn, prot); if (err) goto out; pfn += (next - addr) >> PAGE_SHIFT; @@ -521,6 +553,7 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr) */ int kvm_alloc_stage2_pgd(struct kvm *kvm) { + int ret; pgd_t *pgd; if (kvm->arch.pgd != NULL) { @@ -528,15 +561,38 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) return -EINVAL; } - pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, S2_PGD_ORDER); + if (KVM_PREALLOC_LEVEL > 0) { + /* + * Allocate fake pgd for the page table manipulation macros to + * work. This is not used by the hardware and we have no + * alignment requirement for this allocation. + */ + pgd = (pgd_t *)kmalloc(PTRS_PER_S2_PGD * sizeof(pgd_t), + GFP_KERNEL | __GFP_ZERO); + } else { + /* + * Allocate actual first-level Stage-2 page table used by the + * hardware for Stage-2 page table walks. + */ + pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, S2_PGD_ORDER); + } + if (!pgd) return -ENOMEM; - memset(pgd, 0, PTRS_PER_S2_PGD * sizeof(pgd_t)); + ret = kvm_prealloc_hwpgd(kvm, pgd); + if (ret) + goto out_err; + kvm_clean_pgd(pgd); kvm->arch.pgd = pgd; - return 0; +out_err: + if (KVM_PREALLOC_LEVEL > 0) + kfree(pgd); + else + free_pages((unsigned long)pgd, S2_PGD_ORDER); + return ret; } /** @@ -572,19 +628,39 @@ void kvm_free_stage2_pgd(struct kvm *kvm) return; unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE); - free_pages((unsigned long)kvm->arch.pgd, S2_PGD_ORDER); + kvm_free_hwpgd(kvm); + if (KVM_PREALLOC_LEVEL > 0) + kfree(kvm->arch.pgd); + else + free_pages((unsigned long)kvm->arch.pgd, S2_PGD_ORDER); kvm->arch.pgd = NULL; } -static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, +static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, phys_addr_t addr) { pgd_t *pgd; pud_t *pud; - pmd_t *pmd; pgd = kvm->arch.pgd + pgd_index(addr); - pud = pud_offset(pgd, addr); + if (WARN_ON(pgd_none(*pgd))) { + if (!cache) + return NULL; + pud = mmu_memory_cache_alloc(cache); + pgd_populate(NULL, pgd, pud); + get_page(virt_to_page(pgd)); + } + + return pud_offset(pgd, addr); +} + +static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, + phys_addr_t addr) +{ + pud_t *pud; + pmd_t *pmd; + + pud = stage2_get_pud(kvm, cache, addr); if (pud_none(*pud)) { if (!cache) return NULL; @@ -630,7 +706,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, pmd_t *pmd; pte_t *pte, old_pte; - /* Create stage-2 page table mapping - Level 1 */ + /* Create stage-2 page table mapping - Levels 0 and 1 */ pmd = stage2_get_pmd(kvm, cache, addr); if (!pmd) { /* @@ -688,7 +764,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) { pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE); - ret = mmu_topup_memory_cache(&cache, 2, 2); + ret = mmu_topup_memory_cache(&cache, KVM_MMU_CACHE_MIN_PAGES, + KVM_MMU_CACHE_MIN_PAGES); if (ret) goto out; spin_lock(&kvm->mmu_lock); @@ -797,7 +874,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, up_read(¤t->mm->mmap_sem); /* We need minimum second+third level pages */ - ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS); + ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES, + KVM_NR_MEM_OBJS); if (ret) return ret; @@ -1070,8 +1148,8 @@ int kvm_mmu_init(void) (unsigned long)phys_base); } - hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, pgd_order); - boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, pgd_order); + hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order); + boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order); if (!hyp_pgd || !boot_hyp_pgd) { kvm_err("Hyp mode PGD not allocated\n"); diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index a030d16..df41ae2 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -41,6 +41,18 @@ */ #define TRAMPOLINE_VA (HYP_PAGE_OFFSET_MASK & PAGE_MASK) +/* + * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation + * levels in addition to the PGD and potentially the PUD which are + * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2 + * tables use one level of tables less than the kernel. + */ +#ifdef CONFIG_ARM64_64K_PAGES +#define KVM_MMU_CACHE_MIN_PAGES 1 +#else +#define KVM_MMU_CACHE_MIN_PAGES 2 +#endif + #ifdef __ASSEMBLY__ /* @@ -53,6 +65,7 @@ #else +#include <asm/pgalloc.h> #include <asm/cachetype.h> #include <asm/cacheflush.h> @@ -65,10 +78,6 @@ #define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT) #define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL) -/* Make sure we get the right size, and thus the right alignment */ -#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) -#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) - int create_hyp_mappings(void *from, void *to); int create_hyp_io_mappings(void *from, void *to, phys_addr_t); void free_boot_hyp_pgd(void); @@ -93,6 +102,7 @@ void kvm_clear_hyp_idmap(void); #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) static inline void kvm_clean_pgd(pgd_t *pgd) {} +static inline void kvm_clean_pmd(pmd_t *pmd) {} static inline void kvm_clean_pmd_entry(pmd_t *pmd) {} static inline void kvm_clean_pte(pte_t *pte) {} static inline void kvm_clean_pte_entry(pte_t *pte) {} @@ -118,13 +128,117 @@ static inline bool kvm_page_empty(void *ptr) } #define kvm_pte_table_empty(ptep) kvm_page_empty(ptep) -#ifndef CONFIG_ARM64_64K_PAGES -#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) -#else + +#ifdef __PAGETABLE_PMD_FOLDED #define kvm_pmd_table_empty(pmdp) (0) +#else +#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) #endif + +#ifdef __PAGETABLE_PUD_FOLDED #define kvm_pud_table_empty(pudp) (0) +#else +#define kvm_pud_table_empty(pudp) kvm_page_empty(pudp) +#endif +/** + * kvm_prealloc_hwpgd - allocate inital table for VTTBR + * @kvm: The KVM struct pointer for the VM. + * @pgd: The kernel pseudo pgd + * + * When the kernel uses more levels of page tables than the guest, we allocate + * a fake PGD and pre-populate it to point to the next-level page table, which + * will be the real initial page table pointed to by the VTTBR. + * + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we + * allocate 2 consecutive PUD pages. + */ +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3 +#define KVM_PREALLOC_LEVEL 2 +#define PTRS_PER_S2_PGD 1 +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) + + +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) +{ + pud_t *pud; + pmd_t *pmd; + + pud = pud_offset(pgd, 0); + pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0); + + if (!pmd) + return -ENOMEM; + pud_populate(NULL, pud, pmd); + + return 0; +} + +static inline void kvm_free_hwpgd(struct kvm *kvm) +{ + pgd_t *pgd = kvm->arch.pgd; + pud_t *pud = pud_offset(pgd, 0); + pmd_t *pmd = pmd_offset(pud, 0); + free_pages((unsigned long)pmd, 0); +} + +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) +{ + pgd_t *pgd = kvm->arch.pgd; + pud_t *pud = pud_offset(pgd, 0); + pmd_t *pmd = pmd_offset(pud, 0); + return virt_to_phys(pmd); + +} +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4 +#define KVM_PREALLOC_LEVEL 1 +#define PTRS_PER_S2_PGD 2 +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) + +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) +{ + pud_t *pud; + + pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1); + if (!pud) + return -ENOMEM; + pgd_populate(NULL, pgd, pud); + pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD); + + return 0; +} + +static inline void kvm_free_hwpgd(struct kvm *kvm) +{ + pgd_t *pgd = kvm->arch.pgd; + pud_t *pud = pud_offset(pgd, 0); + free_pages((unsigned long)pud, 1); +} + +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) +{ + pgd_t *pgd = kvm->arch.pgd; + pud_t *pud = pud_offset(pgd, 0); + return virt_to_phys(pud); +} +#else +#define KVM_PREALLOC_LEVEL 0 +#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) + +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) +{ + return 0; +} + +static inline void kvm_free_hwpgd(struct kvm *kvm) { } + +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) +{ + return virt_to_phys(kvm->arch.pgd); +} +#endif struct kvm; -- 2.1.2.330.g565301e.dirty ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-06 20:30 ` [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 Christoffer Dall @ 2014-10-07 10:48 ` Catalin Marinas 2014-10-07 13:28 ` Marc Zyngier 2014-10-07 13:40 ` Marc Zyngier 1 sibling, 1 reply; 18+ messages in thread From: Catalin Marinas @ 2014-10-07 10:48 UTC (permalink / raw) To: linux-arm-kernel On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote: > +/** > + * kvm_prealloc_hwpgd - allocate inital table for VTTBR > + * @kvm: The KVM struct pointer for the VM. > + * @pgd: The kernel pseudo pgd > + * > + * When the kernel uses more levels of page tables than the guest, we allocate > + * a fake PGD and pre-populate it to point to the next-level page table, which > + * will be the real initial page table pointed to by the VTTBR. > + * > + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and > + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we > + * allocate 2 consecutive PUD pages. > + */ > +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3 > +#define KVM_PREALLOC_LEVEL 2 > +#define PTRS_PER_S2_PGD 1 > +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) I agree that my magic equation wasn't readable ;) (I had troubles re-understanding it as well), but you also have some constants here that are not immediately obvious where you got to them from. IIUC, KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands stage 2 pmd and pte. I guess you could look into the ARM ARM tables but it's still not clear. Let's look at PTRS_PER_S2_PGD as I think it's simpler. My proposal was: #if PGDIR_SHIFT > KVM_PHYS_SHIFT #define PTRS_PER_S2_PGD (1) #else #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) #endif In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1. The 4K and 4 levels case below is also correct. The KVM start level calculation, we could assume that KVM needs either host levels or host levels - 1 (unless we go for some weirdly small KVM_PHYS_SHIFT). So we could define them KVM_PREALLOC_LEVEL as: #if PTRS_PER_S2_PGD <= 16 #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) #else #define KVM_PREALLOC_LEVEL (0) #endif Basically if you can concatenate 16 or less pages@the level below the top, the architecture does not allow a small top level. In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the host and we add 1 to go to the next level for KVM stage 2 when PTRS_PER_S2_PGD is 16 or less. We use 0 when we don't need to preallocate. > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > +{ > + pud_t *pud; > + pmd_t *pmd; > + > + pud = pud_offset(pgd, 0); > + pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0); > + > + if (!pmd) > + return -ENOMEM; > + pud_populate(NULL, pud, pmd); > + > + return 0; > +} > + > +static inline void kvm_free_hwpgd(struct kvm *kvm) > +{ > + pgd_t *pgd = kvm->arch.pgd; > + pud_t *pud = pud_offset(pgd, 0); > + pmd_t *pmd = pmd_offset(pud, 0); > + free_pages((unsigned long)pmd, 0); > +} > + > +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) > +{ > + pgd_t *pgd = kvm->arch.pgd; > + pud_t *pud = pud_offset(pgd, 0); > + pmd_t *pmd = pmd_offset(pud, 0); > + return virt_to_phys(pmd); > + > +} > +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4 > +#define KVM_PREALLOC_LEVEL 1 > +#define PTRS_PER_S2_PGD 2 > +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39)) which is 2 and KVM_PREALLOC_LEVEL == 1. > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > +{ > + pud_t *pud; > + > + pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1); > + if (!pud) > + return -ENOMEM; > + pgd_populate(NULL, pgd, pud); > + pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD); > + > + return 0; > +} You still need to define these functions but you can make their implementation dependent solely on the KVM_PREALLOC_LEVEL rather than 64K/4K and levels combinations. If it is KVM_PREALLOC_LEVEL is 1, you allocate pud and populate the pgds (in a loop based on the PTRS_PER_S2_PGD). If it is 2, you allocate the pmd and populate the pud (still in a loop though it would probably be 1 iteration). We know based on the assumption above that you can't get KVM_PREALLOC_LEVEL == 2 and CONFIG_ARM64_PGTABLE_LEVELS == 4. -- Catalin ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-07 10:48 ` Catalin Marinas @ 2014-10-07 13:28 ` Marc Zyngier 2014-10-07 19:39 ` Christoffer Dall 0 siblings, 1 reply; 18+ messages in thread From: Marc Zyngier @ 2014-10-07 13:28 UTC (permalink / raw) To: linux-arm-kernel On 07/10/14 11:48, Catalin Marinas wrote: > On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote: >> +/** >> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR >> + * @kvm: The KVM struct pointer for the VM. >> + * @pgd: The kernel pseudo pgd >> + * >> + * When the kernel uses more levels of page tables than the guest, we allocate >> + * a fake PGD and pre-populate it to point to the next-level page table, which >> + * will be the real initial page table pointed to by the VTTBR. >> + * >> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and >> + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we >> + * allocate 2 consecutive PUD pages. >> + */ >> +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3 >> +#define KVM_PREALLOC_LEVEL 2 >> +#define PTRS_PER_S2_PGD 1 >> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > I agree that my magic equation wasn't readable ;) (I had troubles > re-understanding it as well), but you also have some constants here that > are not immediately obvious where you got to them from. IIUC, > KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands > stage 2 pmd and pte. I guess you could look into the ARM ARM tables but > it's still not clear. > > Let's look at PTRS_PER_S2_PGD as I think it's simpler. My proposal was: > > #if PGDIR_SHIFT > KVM_PHYS_SHIFT > #define PTRS_PER_S2_PGD (1) > #else > #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) > #endif > > In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1. The 4K > and 4 levels case below is also correct. > > The KVM start level calculation, we could assume that KVM needs either > host levels or host levels - 1 (unless we go for some weirdly small > KVM_PHYS_SHIFT). So we could define them KVM_PREALLOC_LEVEL as: > > #if PTRS_PER_S2_PGD <= 16 > #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) > #else > #define KVM_PREALLOC_LEVEL (0) > #endif > > Basically if you can concatenate 16 or less pages at the level below the > top, the architecture does not allow a small top level. In this case, > (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the > host and we add 1 to go to the next level for KVM stage 2 when > PTRS_PER_S2_PGD is 16 or less. We use 0 when we don't need to > preallocate. I think this makes the whole thing clearer (at least for me), as it makes the relationship between KVM_PREALLOC_LEVEL and CONFIG_ARM64_PGTABLE_LEVELS explicit (it wasn't completely obvious to me initially). >> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >> +{ >> + pud_t *pud; >> + pmd_t *pmd; >> + >> + pud = pud_offset(pgd, 0); >> + pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0); >> + >> + if (!pmd) >> + return -ENOMEM; >> + pud_populate(NULL, pud, pmd); >> + >> + return 0; >> +} >> + >> +static inline void kvm_free_hwpgd(struct kvm *kvm) >> +{ >> + pgd_t *pgd = kvm->arch.pgd; >> + pud_t *pud = pud_offset(pgd, 0); >> + pmd_t *pmd = pmd_offset(pud, 0); >> + free_pages((unsigned long)pmd, 0); >> +} >> + >> +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) >> +{ >> + pgd_t *pgd = kvm->arch.pgd; >> + pud_t *pud = pud_offset(pgd, 0); >> + pmd_t *pmd = pmd_offset(pud, 0); >> + return virt_to_phys(pmd); >> + >> +} >> +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4 >> +#define KVM_PREALLOC_LEVEL 1 >> +#define PTRS_PER_S2_PGD 2 >> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39)) > which is 2 and KVM_PREALLOC_LEVEL == 1. > >> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >> +{ >> + pud_t *pud; >> + >> + pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1); >> + if (!pud) >> + return -ENOMEM; >> + pgd_populate(NULL, pgd, pud); >> + pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD); >> + >> + return 0; >> +} > > You still need to define these functions but you can make their > implementation dependent solely on the KVM_PREALLOC_LEVEL rather than > 64K/4K and levels combinations. If it is KVM_PREALLOC_LEVEL is 1, you > allocate pud and populate the pgds (in a loop based on the > PTRS_PER_S2_PGD). If it is 2, you allocate the pmd and populate the pud > (still in a loop though it would probably be 1 iteration). We know based > on the assumption above that you can't get KVM_PREALLOC_LEVEL == 2 and > CONFIG_ARM64_PGTABLE_LEVELS == 4. > Also agreed. Most of what you wrote here could also be gathered as comments in the patch. Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-07 13:28 ` Marc Zyngier @ 2014-10-07 19:39 ` Christoffer Dall 2014-10-08 9:34 ` Marc Zyngier 2014-10-08 9:47 ` Catalin Marinas 0 siblings, 2 replies; 18+ messages in thread From: Christoffer Dall @ 2014-10-07 19:39 UTC (permalink / raw) To: linux-arm-kernel On Tue, Oct 07, 2014 at 02:28:43PM +0100, Marc Zyngier wrote: > On 07/10/14 11:48, Catalin Marinas wrote: > > On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote: > >> +/** > >> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR > >> + * @kvm: The KVM struct pointer for the VM. > >> + * @pgd: The kernel pseudo pgd > >> + * > >> + * When the kernel uses more levels of page tables than the guest, we allocate > >> + * a fake PGD and pre-populate it to point to the next-level page table, which > >> + * will be the real initial page table pointed to by the VTTBR. > >> + * > >> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and > >> + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we > >> + * allocate 2 consecutive PUD pages. > >> + */ > >> +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3 > >> +#define KVM_PREALLOC_LEVEL 2 > >> +#define PTRS_PER_S2_PGD 1 > >> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > > > I agree that my magic equation wasn't readable ;) (I had troubles > > re-understanding it as well), but you also have some constants here that > > are not immediately obvious where you got to them from. IIUC, > > KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands > > stage 2 pmd and pte. I guess you could look into the ARM ARM tables but > > it's still not clear. > > > > Let's look at PTRS_PER_S2_PGD as I think it's simpler. My proposal was: > > > > #if PGDIR_SHIFT > KVM_PHYS_SHIFT > > #define PTRS_PER_S2_PGD (1) > > #else > > #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) > > #endif > > > > In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1. The 4K > > and 4 levels case below is also correct. > > > > The KVM start level calculation, we could assume that KVM needs either > > host levels or host levels - 1 (unless we go for some weirdly small > > KVM_PHYS_SHIFT). So we could define them KVM_PREALLOC_LEVEL as: > > > > #if PTRS_PER_S2_PGD <= 16 > > #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) > > #else > > #define KVM_PREALLOC_LEVEL (0) > > #endif > > > > Basically if you can concatenate 16 or less pages at the level below the > > top, the architecture does not allow a small top level. In this case, > > (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the > > host and we add 1 to go to the next level for KVM stage 2 when > > PTRS_PER_S2_PGD is 16 or less. We use 0 when we don't need to > > preallocate. > > I think this makes the whole thing clearer (at least for me), as it > makes the relationship between KVM_PREALLOC_LEVEL and > CONFIG_ARM64_PGTABLE_LEVELS explicit (it wasn't completely obvious to me > initially). Agreed. > > >> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > >> +{ > >> + pud_t *pud; > >> + pmd_t *pmd; > >> + > >> + pud = pud_offset(pgd, 0); > >> + pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0); > >> + > >> + if (!pmd) > >> + return -ENOMEM; > >> + pud_populate(NULL, pud, pmd); > >> + > >> + return 0; > >> +} > >> + > >> +static inline void kvm_free_hwpgd(struct kvm *kvm) > >> +{ > >> + pgd_t *pgd = kvm->arch.pgd; > >> + pud_t *pud = pud_offset(pgd, 0); > >> + pmd_t *pmd = pmd_offset(pud, 0); > >> + free_pages((unsigned long)pmd, 0); > >> +} > >> + > >> +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) > >> +{ > >> + pgd_t *pgd = kvm->arch.pgd; > >> + pud_t *pud = pud_offset(pgd, 0); > >> + pmd_t *pmd = pmd_offset(pud, 0); > >> + return virt_to_phys(pmd); > >> + > >> +} > >> +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4 > >> +#define KVM_PREALLOC_LEVEL 1 > >> +#define PTRS_PER_S2_PGD 2 > >> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > > > Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39)) > > which is 2 and KVM_PREALLOC_LEVEL == 1. > > > >> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > >> +{ > >> + pud_t *pud; > >> + > >> + pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1); > >> + if (!pud) > >> + return -ENOMEM; > >> + pgd_populate(NULL, pgd, pud); > >> + pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD); > >> + > >> + return 0; > >> +} > > > > You still need to define these functions but you can make their > > implementation dependent solely on the KVM_PREALLOC_LEVEL rather than > > 64K/4K and levels combinations. If it is KVM_PREALLOC_LEVEL is 1, you > > allocate pud and populate the pgds (in a loop based on the > > PTRS_PER_S2_PGD). If it is 2, you allocate the pmd and populate the pud > > (still in a loop though it would probably be 1 iteration). We know based > > on the assumption above that you can't get KVM_PREALLOC_LEVEL == 2 and > > CONFIG_ARM64_PGTABLE_LEVELS == 4. > > > > Also agreed. Most of what you wrote here could also be gathered as > comments in the patch. > Yes, I reworded some of the text slightly as comments for the next version of the patch. However, I'm not sure I have a clear idea of how you'd like these functions to look like. I came up with the following based on your feedback, but I personally don't find it a lot easier to read than what I had already. Suggestions are welcome: diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index a030d16..7941a51 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -41,6 +41,18 @@ */ #define TRAMPOLINE_VA (HYP_PAGE_OFFSET_MASK & PAGE_MASK) +/* + * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation + * levels in addition to the PGD and potentially the PUD which are + * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2 + * tables use one level of tables less than the kernel. + */ +#ifdef CONFIG_ARM64_64K_PAGES +#define KVM_MMU_CACHE_MIN_PAGES 1 +#else +#define KVM_MMU_CACHE_MIN_PAGES 2 +#endif + #ifdef __ASSEMBLY__ /* @@ -53,6 +65,7 @@ #else +#include <asm/pgalloc.h> #include <asm/cachetype.h> #include <asm/cacheflush.h> @@ -65,10 +78,6 @@ #define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT) #define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL) -/* Make sure we get the right size, and thus the right alignment */ -#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) -#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) - int create_hyp_mappings(void *from, void *to); int create_hyp_io_mappings(void *from, void *to, phys_addr_t); void free_boot_hyp_pgd(void); @@ -93,6 +102,7 @@ void kvm_clear_hyp_idmap(void); #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) static inline void kvm_clean_pgd(pgd_t *pgd) {} +static inline void kvm_clean_pmd(pmd_t *pmd) {} static inline void kvm_clean_pmd_entry(pmd_t *pmd) {} static inline void kvm_clean_pte(pte_t *pte) {} static inline void kvm_clean_pte_entry(pte_t *pte) {} @@ -118,13 +128,115 @@ static inline bool kvm_page_empty(void *ptr) } #define kvm_pte_table_empty(ptep) kvm_page_empty(ptep) -#ifndef CONFIG_ARM64_64K_PAGES -#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) -#else + +#ifdef __PAGETABLE_PMD_FOLDED #define kvm_pmd_table_empty(pmdp) (0) +#else +#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) #endif + +#ifdef __PAGETABLE_PUD_FOLDED #define kvm_pud_table_empty(pudp) (0) +#else +#define kvm_pud_table_empty(pudp) kvm_page_empty(pudp) +#endif + +/* + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address + * the entire IPA input range with a single pgd entry, and we would only need + * one pgd entry. + */ +#if PGDIR_SHIFT > KVM_PHYS_SHIFT +#define PTRS_PER_S2_PGD (1) +#else +#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) +#endif +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) +/* + * If we are concatenating first level stage-2 page tables, we would have less + * than or equal to 16 pointers in the fake PGD, because that's what the + * architecture allows. In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS) + * represents the first level for the host, and we add 1 to go to the next + * level (which uses contatenation) for the stage-2 tables. + */ +#if PTRS_PER_S2_PGD <= 16 +#define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) +#else +#define KVM_PREALLOC_LEVEL (0) +#endif + +/** + * kvm_prealloc_hwpgd - allocate inital table for VTTBR + * @kvm: The KVM struct pointer for the VM. + * @pgd: The kernel pseudo pgd + * + * When the kernel uses more levels of page tables than the guest, we allocate + * a fake PGD and pre-populate it to point to the next-level page table, which + * will be the real initial page table pointed to by the VTTBR. + * + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we + * allocate 2 consecutive PUD pages. + */ +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) +{ + pud_t *pud; + pmd_t *pmd; + unsigned int order, i; + unsigned long hwpgd; + + if (KVM_PREALLOC_LEVEL == 0) + return 0; + + order = get_order(PTRS_PER_S2_PGD); + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); + if (!hwpgd) + return -ENOMEM; + + if (KVM_PREALLOC_LEVEL == 1) { + pud = (pud_t *)hwpgd; + for (i = 0; i < PTRS_PER_S2_PGD; i++) + pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD); + } else if (KVM_PREALLOC_LEVEL == 2) { + pud = pud_offset(pgd, 0); + pmd = (pmd_t *)hwpgd; + for (i = 0; i < PTRS_PER_S2_PGD; i++) + pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD); + } + + return 0; +} + +static inline void *kvm_get_hwpgd(struct kvm *kvm) +{ + pgd_t *pgd = kvm->arch.pgd; + pud_t *pud; + pmd_t *pmd; + + switch (KVM_PREALLOC_LEVEL) { + case 0: + return pgd; + case 1: + pud = pud_offset(pgd, 0); + return pud; + case 2: + pud = pud_offset(pgd, 0); + pmd = pmd_offset(pud, 0); + return pmd; + default: + BUG(); + return NULL; + } +} + +static inline void kvm_free_hwpgd(struct kvm *kvm) +{ + if (KVM_PREALLOC_LEVEL > 0) { + unsigned long hwpgd = (unsigned long)kvm_get_hwpgd(kvm); + free_pages(hwpgd, get_order(S2_PGD_ORDER)); + } +} struct kvm; Thanks, -Christoffer ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-07 19:39 ` Christoffer Dall @ 2014-10-08 9:34 ` Marc Zyngier 2014-10-08 9:47 ` Christoffer Dall 2014-10-08 9:47 ` Catalin Marinas 1 sibling, 1 reply; 18+ messages in thread From: Marc Zyngier @ 2014-10-08 9:34 UTC (permalink / raw) To: linux-arm-kernel On 07/10/14 20:39, Christoffer Dall wrote: > On Tue, Oct 07, 2014 at 02:28:43PM +0100, Marc Zyngier wrote: >> On 07/10/14 11:48, Catalin Marinas wrote: >>> On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote: >>>> +/** >>>> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR >>>> + * @kvm: The KVM struct pointer for the VM. >>>> + * @pgd: The kernel pseudo pgd >>>> + * >>>> + * When the kernel uses more levels of page tables than the guest, we allocate >>>> + * a fake PGD and pre-populate it to point to the next-level page table, which >>>> + * will be the real initial page table pointed to by the VTTBR. >>>> + * >>>> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and >>>> + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we >>>> + * allocate 2 consecutive PUD pages. >>>> + */ >>>> +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3 >>>> +#define KVM_PREALLOC_LEVEL 2 >>>> +#define PTRS_PER_S2_PGD 1 >>>> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) >>> >>> I agree that my magic equation wasn't readable ;) (I had troubles >>> re-understanding it as well), but you also have some constants here that >>> are not immediately obvious where you got to them from. IIUC, >>> KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands >>> stage 2 pmd and pte. I guess you could look into the ARM ARM tables but >>> it's still not clear. >>> >>> Let's look at PTRS_PER_S2_PGD as I think it's simpler. My proposal was: >>> >>> #if PGDIR_SHIFT > KVM_PHYS_SHIFT >>> #define PTRS_PER_S2_PGD (1) >>> #else >>> #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) >>> #endif >>> >>> In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1. The 4K >>> and 4 levels case below is also correct. >>> >>> The KVM start level calculation, we could assume that KVM needs either >>> host levels or host levels - 1 (unless we go for some weirdly small >>> KVM_PHYS_SHIFT). So we could define them KVM_PREALLOC_LEVEL as: >>> >>> #if PTRS_PER_S2_PGD <= 16 >>> #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) >>> #else >>> #define KVM_PREALLOC_LEVEL (0) >>> #endif >>> >>> Basically if you can concatenate 16 or less pages at the level below the >>> top, the architecture does not allow a small top level. In this case, >>> (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the >>> host and we add 1 to go to the next level for KVM stage 2 when >>> PTRS_PER_S2_PGD is 16 or less. We use 0 when we don't need to >>> preallocate. >> >> I think this makes the whole thing clearer (at least for me), as it >> makes the relationship between KVM_PREALLOC_LEVEL and >> CONFIG_ARM64_PGTABLE_LEVELS explicit (it wasn't completely obvious to me >> initially). > > Agreed. > >> >>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >>>> +{ >>>> + pud_t *pud; >>>> + pmd_t *pmd; >>>> + >>>> + pud = pud_offset(pgd, 0); >>>> + pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0); >>>> + >>>> + if (!pmd) >>>> + return -ENOMEM; >>>> + pud_populate(NULL, pud, pmd); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static inline void kvm_free_hwpgd(struct kvm *kvm) >>>> +{ >>>> + pgd_t *pgd = kvm->arch.pgd; >>>> + pud_t *pud = pud_offset(pgd, 0); >>>> + pmd_t *pmd = pmd_offset(pud, 0); >>>> + free_pages((unsigned long)pmd, 0); >>>> +} >>>> + >>>> +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) >>>> +{ >>>> + pgd_t *pgd = kvm->arch.pgd; >>>> + pud_t *pud = pud_offset(pgd, 0); >>>> + pmd_t *pmd = pmd_offset(pud, 0); >>>> + return virt_to_phys(pmd); >>>> + >>>> +} >>>> +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4 >>>> +#define KVM_PREALLOC_LEVEL 1 >>>> +#define PTRS_PER_S2_PGD 2 >>>> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) >>> >>> Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39)) >>> which is 2 and KVM_PREALLOC_LEVEL == 1. >>> >>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >>>> +{ >>>> + pud_t *pud; >>>> + >>>> + pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1); >>>> + if (!pud) >>>> + return -ENOMEM; >>>> + pgd_populate(NULL, pgd, pud); >>>> + pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD); >>>> + >>>> + return 0; >>>> +} >>> >>> You still need to define these functions but you can make their >>> implementation dependent solely on the KVM_PREALLOC_LEVEL rather than >>> 64K/4K and levels combinations. If it is KVM_PREALLOC_LEVEL is 1, you >>> allocate pud and populate the pgds (in a loop based on the >>> PTRS_PER_S2_PGD). If it is 2, you allocate the pmd and populate the pud >>> (still in a loop though it would probably be 1 iteration). We know based >>> on the assumption above that you can't get KVM_PREALLOC_LEVEL == 2 and >>> CONFIG_ARM64_PGTABLE_LEVELS == 4. >>> >> >> Also agreed. Most of what you wrote here could also be gathered as >> comments in the patch. >> > Yes, I reworded some of the text slightly as comments for the next > version of the patch. > > However, I'm not sure I have a clear idea of how you'd like these > functions to look like. > > I came up with the following based on your feedback, but I personally > don't find it a lot easier to read than what I had already. Suggestions > are welcome: > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index a030d16..7941a51 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h > @@ -41,6 +41,18 @@ > */ > #define TRAMPOLINE_VA (HYP_PAGE_OFFSET_MASK & PAGE_MASK) > > +/* > + * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation > + * levels in addition to the PGD and potentially the PUD which are > + * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2 > + * tables use one level of tables less than the kernel. > + */ > +#ifdef CONFIG_ARM64_64K_PAGES > +#define KVM_MMU_CACHE_MIN_PAGES 1 > +#else > +#define KVM_MMU_CACHE_MIN_PAGES 2 > +#endif > + > #ifdef __ASSEMBLY__ > > /* > @@ -53,6 +65,7 @@ > > #else > > +#include <asm/pgalloc.h> > #include <asm/cachetype.h> > #include <asm/cacheflush.h> > > @@ -65,10 +78,6 @@ > #define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT) > #define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL) > > -/* Make sure we get the right size, and thus the right alignment */ > -#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) > -#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > - > int create_hyp_mappings(void *from, void *to); > int create_hyp_io_mappings(void *from, void *to, phys_addr_t); > void free_boot_hyp_pgd(void); > @@ -93,6 +102,7 @@ void kvm_clear_hyp_idmap(void); > #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) > > static inline void kvm_clean_pgd(pgd_t *pgd) {} > +static inline void kvm_clean_pmd(pmd_t *pmd) {} > static inline void kvm_clean_pmd_entry(pmd_t *pmd) {} > static inline void kvm_clean_pte(pte_t *pte) {} > static inline void kvm_clean_pte_entry(pte_t *pte) {} > @@ -118,13 +128,115 @@ static inline bool kvm_page_empty(void *ptr) > } > > #define kvm_pte_table_empty(ptep) kvm_page_empty(ptep) > -#ifndef CONFIG_ARM64_64K_PAGES > -#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) > -#else > + > +#ifdef __PAGETABLE_PMD_FOLDED > #define kvm_pmd_table_empty(pmdp) (0) > +#else > +#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) > #endif > + > +#ifdef __PAGETABLE_PUD_FOLDED > #define kvm_pud_table_empty(pudp) (0) > +#else > +#define kvm_pud_table_empty(pudp) kvm_page_empty(pudp) > +#endif > + > +/* > + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address > + * the entire IPA input range with a single pgd entry, and we would only need > + * one pgd entry. > + */ > +#if PGDIR_SHIFT > KVM_PHYS_SHIFT > +#define PTRS_PER_S2_PGD (1) > +#else > +#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) > +#endif > +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > +/* > + * If we are concatenating first level stage-2 page tables, we would have less > + * than or equal to 16 pointers in the fake PGD, because that's what the > + * architecture allows. In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS) > + * represents the first level for the host, and we add 1 to go to the next > + * level (which uses contatenation) for the stage-2 tables. > + */ > +#if PTRS_PER_S2_PGD <= 16 > +#define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) > +#else > +#define KVM_PREALLOC_LEVEL (0) > +#endif > + > +/** > + * kvm_prealloc_hwpgd - allocate inital table for VTTBR > + * @kvm: The KVM struct pointer for the VM. > + * @pgd: The kernel pseudo pgd > + * > + * When the kernel uses more levels of page tables than the guest, we allocate > + * a fake PGD and pre-populate it to point to the next-level page table, which > + * will be the real initial page table pointed to by the VTTBR. > + * > + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and > + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we > + * allocate 2 consecutive PUD pages. > + */ > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > +{ > + pud_t *pud; > + pmd_t *pmd; > + unsigned int order, i; > + unsigned long hwpgd; > + > + if (KVM_PREALLOC_LEVEL == 0) > + return 0; > + > + order = get_order(PTRS_PER_S2_PGD); S2_PGD_ORDER instead? Otherwise, that doesn't seem quite right... > + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); > + if (!hwpgd) > + return -ENOMEM; > + > + if (KVM_PREALLOC_LEVEL == 1) { > + pud = (pud_t *)hwpgd; > + for (i = 0; i < PTRS_PER_S2_PGD; i++) > + pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD); > + } else if (KVM_PREALLOC_LEVEL == 2) { > + pud = pud_offset(pgd, 0); > + pmd = (pmd_t *)hwpgd; > + for (i = 0; i < PTRS_PER_S2_PGD; i++) > + pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD); > + } > + > + return 0; Shouldn't we return an error here instead? Or BUG()? > +} > + > +static inline void *kvm_get_hwpgd(struct kvm *kvm) > +{ > + pgd_t *pgd = kvm->arch.pgd; > + pud_t *pud; > + pmd_t *pmd; > + > + switch (KVM_PREALLOC_LEVEL) { > + case 0: > + return pgd; > + case 1: > + pud = pud_offset(pgd, 0); > + return pud; > + case 2: > + pud = pud_offset(pgd, 0); > + pmd = pmd_offset(pud, 0); > + return pmd; > + default: > + BUG(); > + return NULL; > + } > +} > + > +static inline void kvm_free_hwpgd(struct kvm *kvm) > +{ > + if (KVM_PREALLOC_LEVEL > 0) { > + unsigned long hwpgd = (unsigned long)kvm_get_hwpgd(kvm); > + free_pages(hwpgd, get_order(S2_PGD_ORDER)); Isn't the get_order() a bit wrong here? I'd expect S2_PGD_ORDER to be what we need already... > + } > +} I personally like this version more (Catalin may have a different opinion ;-). Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-08 9:34 ` Marc Zyngier @ 2014-10-08 9:47 ` Christoffer Dall 2014-10-08 10:27 ` Marc Zyngier 0 siblings, 1 reply; 18+ messages in thread From: Christoffer Dall @ 2014-10-08 9:47 UTC (permalink / raw) To: linux-arm-kernel On Wed, Oct 08, 2014 at 10:34:31AM +0100, Marc Zyngier wrote: > On 07/10/14 20:39, Christoffer Dall wrote: > > On Tue, Oct 07, 2014 at 02:28:43PM +0100, Marc Zyngier wrote: > >> On 07/10/14 11:48, Catalin Marinas wrote: > >>> On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote: > >>>> +/** > >>>> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR > >>>> + * @kvm: The KVM struct pointer for the VM. > >>>> + * @pgd: The kernel pseudo pgd > >>>> + * > >>>> + * When the kernel uses more levels of page tables than the guest, we allocate > >>>> + * a fake PGD and pre-populate it to point to the next-level page table, which > >>>> + * will be the real initial page table pointed to by the VTTBR. > >>>> + * > >>>> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and > >>>> + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we > >>>> + * allocate 2 consecutive PUD pages. > >>>> + */ > >>>> +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3 > >>>> +#define KVM_PREALLOC_LEVEL 2 > >>>> +#define PTRS_PER_S2_PGD 1 > >>>> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > >>> > >>> I agree that my magic equation wasn't readable ;) (I had troubles > >>> re-understanding it as well), but you also have some constants here that > >>> are not immediately obvious where you got to them from. IIUC, > >>> KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands > >>> stage 2 pmd and pte. I guess you could look into the ARM ARM tables but > >>> it's still not clear. > >>> > >>> Let's look at PTRS_PER_S2_PGD as I think it's simpler. My proposal was: > >>> > >>> #if PGDIR_SHIFT > KVM_PHYS_SHIFT > >>> #define PTRS_PER_S2_PGD (1) > >>> #else > >>> #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) > >>> #endif > >>> > >>> In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1. The 4K > >>> and 4 levels case below is also correct. > >>> > >>> The KVM start level calculation, we could assume that KVM needs either > >>> host levels or host levels - 1 (unless we go for some weirdly small > >>> KVM_PHYS_SHIFT). So we could define them KVM_PREALLOC_LEVEL as: > >>> > >>> #if PTRS_PER_S2_PGD <= 16 > >>> #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) > >>> #else > >>> #define KVM_PREALLOC_LEVEL (0) > >>> #endif > >>> > >>> Basically if you can concatenate 16 or less pages at the level below the > >>> top, the architecture does not allow a small top level. In this case, > >>> (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the > >>> host and we add 1 to go to the next level for KVM stage 2 when > >>> PTRS_PER_S2_PGD is 16 or less. We use 0 when we don't need to > >>> preallocate. > >> > >> I think this makes the whole thing clearer (at least for me), as it > >> makes the relationship between KVM_PREALLOC_LEVEL and > >> CONFIG_ARM64_PGTABLE_LEVELS explicit (it wasn't completely obvious to me > >> initially). > > > > Agreed. > > > >> > >>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > >>>> +{ > >>>> + pud_t *pud; > >>>> + pmd_t *pmd; > >>>> + > >>>> + pud = pud_offset(pgd, 0); > >>>> + pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0); > >>>> + > >>>> + if (!pmd) > >>>> + return -ENOMEM; > >>>> + pud_populate(NULL, pud, pmd); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +static inline void kvm_free_hwpgd(struct kvm *kvm) > >>>> +{ > >>>> + pgd_t *pgd = kvm->arch.pgd; > >>>> + pud_t *pud = pud_offset(pgd, 0); > >>>> + pmd_t *pmd = pmd_offset(pud, 0); > >>>> + free_pages((unsigned long)pmd, 0); > >>>> +} > >>>> + > >>>> +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) > >>>> +{ > >>>> + pgd_t *pgd = kvm->arch.pgd; > >>>> + pud_t *pud = pud_offset(pgd, 0); > >>>> + pmd_t *pmd = pmd_offset(pud, 0); > >>>> + return virt_to_phys(pmd); > >>>> + > >>>> +} > >>>> +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4 > >>>> +#define KVM_PREALLOC_LEVEL 1 > >>>> +#define PTRS_PER_S2_PGD 2 > >>>> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > >>> > >>> Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39)) > >>> which is 2 and KVM_PREALLOC_LEVEL == 1. > >>> > >>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > >>>> +{ > >>>> + pud_t *pud; > >>>> + > >>>> + pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1); > >>>> + if (!pud) > >>>> + return -ENOMEM; > >>>> + pgd_populate(NULL, pgd, pud); > >>>> + pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD); > >>>> + > >>>> + return 0; > >>>> +} > >>> > >>> You still need to define these functions but you can make their > >>> implementation dependent solely on the KVM_PREALLOC_LEVEL rather than > >>> 64K/4K and levels combinations. If it is KVM_PREALLOC_LEVEL is 1, you > >>> allocate pud and populate the pgds (in a loop based on the > >>> PTRS_PER_S2_PGD). If it is 2, you allocate the pmd and populate the pud > >>> (still in a loop though it would probably be 1 iteration). We know based > >>> on the assumption above that you can't get KVM_PREALLOC_LEVEL == 2 and > >>> CONFIG_ARM64_PGTABLE_LEVELS == 4. > >>> > >> > >> Also agreed. Most of what you wrote here could also be gathered as > >> comments in the patch. > >> > > Yes, I reworded some of the text slightly as comments for the next > > version of the patch. > > > > However, I'm not sure I have a clear idea of how you'd like these > > functions to look like. > > > > I came up with the following based on your feedback, but I personally > > don't find it a lot easier to read than what I had already. Suggestions > > are welcome: > > > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > > index a030d16..7941a51 100644 > > --- a/arch/arm64/include/asm/kvm_mmu.h > > +++ b/arch/arm64/include/asm/kvm_mmu.h > > @@ -41,6 +41,18 @@ > > */ > > #define TRAMPOLINE_VA (HYP_PAGE_OFFSET_MASK & PAGE_MASK) > > > > +/* > > + * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation > > + * levels in addition to the PGD and potentially the PUD which are > > + * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2 > > + * tables use one level of tables less than the kernel. > > + */ > > +#ifdef CONFIG_ARM64_64K_PAGES > > +#define KVM_MMU_CACHE_MIN_PAGES 1 > > +#else > > +#define KVM_MMU_CACHE_MIN_PAGES 2 > > +#endif > > + > > #ifdef __ASSEMBLY__ > > > > /* > > @@ -53,6 +65,7 @@ > > > > #else > > > > +#include <asm/pgalloc.h> > > #include <asm/cachetype.h> > > #include <asm/cacheflush.h> > > > > @@ -65,10 +78,6 @@ > > #define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT) > > #define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL) > > > > -/* Make sure we get the right size, and thus the right alignment */ > > -#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) > > -#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > - > > int create_hyp_mappings(void *from, void *to); > > int create_hyp_io_mappings(void *from, void *to, phys_addr_t); > > void free_boot_hyp_pgd(void); > > @@ -93,6 +102,7 @@ void kvm_clear_hyp_idmap(void); > > #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) > > > > static inline void kvm_clean_pgd(pgd_t *pgd) {} > > +static inline void kvm_clean_pmd(pmd_t *pmd) {} > > static inline void kvm_clean_pmd_entry(pmd_t *pmd) {} > > static inline void kvm_clean_pte(pte_t *pte) {} > > static inline void kvm_clean_pte_entry(pte_t *pte) {} > > @@ -118,13 +128,115 @@ static inline bool kvm_page_empty(void *ptr) > > } > > > > #define kvm_pte_table_empty(ptep) kvm_page_empty(ptep) > > -#ifndef CONFIG_ARM64_64K_PAGES > > -#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) > > -#else > > + > > +#ifdef __PAGETABLE_PMD_FOLDED > > #define kvm_pmd_table_empty(pmdp) (0) > > +#else > > +#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) > > #endif > > + > > +#ifdef __PAGETABLE_PUD_FOLDED > > #define kvm_pud_table_empty(pudp) (0) > > +#else > > +#define kvm_pud_table_empty(pudp) kvm_page_empty(pudp) > > +#endif > > + > > +/* > > + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address > > + * the entire IPA input range with a single pgd entry, and we would only need > > + * one pgd entry. > > + */ > > +#if PGDIR_SHIFT > KVM_PHYS_SHIFT > > +#define PTRS_PER_S2_PGD (1) > > +#else > > +#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) > > +#endif > > +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > > > +/* > > + * If we are concatenating first level stage-2 page tables, we would have less > > + * than or equal to 16 pointers in the fake PGD, because that's what the > > + * architecture allows. In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS) > > + * represents the first level for the host, and we add 1 to go to the next > > + * level (which uses contatenation) for the stage-2 tables. > > + */ > > +#if PTRS_PER_S2_PGD <= 16 > > +#define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) > > +#else > > +#define KVM_PREALLOC_LEVEL (0) > > +#endif > > + > > +/** > > + * kvm_prealloc_hwpgd - allocate inital table for VTTBR > > + * @kvm: The KVM struct pointer for the VM. > > + * @pgd: The kernel pseudo pgd > > + * > > + * When the kernel uses more levels of page tables than the guest, we allocate > > + * a fake PGD and pre-populate it to point to the next-level page table, which > > + * will be the real initial page table pointed to by the VTTBR. > > + * > > + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and > > + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we > > + * allocate 2 consecutive PUD pages. > > + */ > > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > > +{ > > + pud_t *pud; > > + pmd_t *pmd; > > + unsigned int order, i; > > + unsigned long hwpgd; > > + > > + if (KVM_PREALLOC_LEVEL == 0) > > + return 0; > > + > > + order = get_order(PTRS_PER_S2_PGD); > > S2_PGD_ORDER instead? Otherwise, that doesn't seem quite right... > no, S2_PGD_ORDER is always the order of the PGD (in linux macro-world-pgd-terms) that we allocate currently. Of course, we could rework that like this: #if PTRS_PER_S2_PGD <= 16 #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) #define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD) #else #define KVM_PREALLOC_LEVEL (0) #define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) #endif > > + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); > > + if (!hwpgd) > > + return -ENOMEM; > > + > > + if (KVM_PREALLOC_LEVEL == 1) { > > + pud = (pud_t *)hwpgd; > > + for (i = 0; i < PTRS_PER_S2_PGD; i++) > > + pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD); > > + } else if (KVM_PREALLOC_LEVEL == 2) { > > + pud = pud_offset(pgd, 0); > > + pmd = (pmd_t *)hwpgd; > > + for (i = 0; i < PTRS_PER_S2_PGD; i++) > > + pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD); > > + } > > + > > + return 0; > > Shouldn't we return an error here instead? Or BUG()? > yes, should never happen. > > +} > > + > > +static inline void *kvm_get_hwpgd(struct kvm *kvm) > > +{ > > + pgd_t *pgd = kvm->arch.pgd; > > + pud_t *pud; > > + pmd_t *pmd; > > + > > + switch (KVM_PREALLOC_LEVEL) { > > + case 0: > > + return pgd; > > + case 1: > > + pud = pud_offset(pgd, 0); > > + return pud; > > + case 2: > > + pud = pud_offset(pgd, 0); > > + pmd = pmd_offset(pud, 0); > > + return pmd; > > + default: > > + BUG(); > > + return NULL; > > + } > > +} > > + > > +static inline void kvm_free_hwpgd(struct kvm *kvm) > > +{ > > + if (KVM_PREALLOC_LEVEL > 0) { > > + unsigned long hwpgd = (unsigned long)kvm_get_hwpgd(kvm); > > + free_pages(hwpgd, get_order(S2_PGD_ORDER)); > > Isn't the get_order() a bit wrong here? I'd expect S2_PGD_ORDER to be > what we need already... > yikes! gonzo coding. what it should be is free_pages(hwpgd, get_order(PTRS_PER_S2_PGD)); but it ties into the discussion above. > > + } > > +} > > I personally like this version more (Catalin may have a different > opinion ;-). > I'm fine with this, but I wasn't sure if you guys had something more clever/beautiful in mind....? -Christoffer ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-08 9:47 ` Christoffer Dall @ 2014-10-08 10:27 ` Marc Zyngier 0 siblings, 0 replies; 18+ messages in thread From: Marc Zyngier @ 2014-10-08 10:27 UTC (permalink / raw) To: linux-arm-kernel On 08/10/14 10:47, Christoffer Dall wrote: > On Wed, Oct 08, 2014 at 10:34:31AM +0100, Marc Zyngier wrote: >> On 07/10/14 20:39, Christoffer Dall wrote: >>> On Tue, Oct 07, 2014 at 02:28:43PM +0100, Marc Zyngier wrote: >>>> On 07/10/14 11:48, Catalin Marinas wrote: >>>>> On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote: >>>>>> +/** >>>>>> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR >>>>>> + * @kvm: The KVM struct pointer for the VM. >>>>>> + * @pgd: The kernel pseudo pgd >>>>>> + * >>>>>> + * When the kernel uses more levels of page tables than the guest, we allocate >>>>>> + * a fake PGD and pre-populate it to point to the next-level page table, which >>>>>> + * will be the real initial page table pointed to by the VTTBR. >>>>>> + * >>>>>> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and >>>>>> + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we >>>>>> + * allocate 2 consecutive PUD pages. >>>>>> + */ >>>>>> +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3 >>>>>> +#define KVM_PREALLOC_LEVEL 2 >>>>>> +#define PTRS_PER_S2_PGD 1 >>>>>> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) >>>>> >>>>> I agree that my magic equation wasn't readable ;) (I had troubles >>>>> re-understanding it as well), but you also have some constants here that >>>>> are not immediately obvious where you got to them from. IIUC, >>>>> KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands >>>>> stage 2 pmd and pte. I guess you could look into the ARM ARM tables but >>>>> it's still not clear. >>>>> >>>>> Let's look at PTRS_PER_S2_PGD as I think it's simpler. My proposal was: >>>>> >>>>> #if PGDIR_SHIFT > KVM_PHYS_SHIFT >>>>> #define PTRS_PER_S2_PGD (1) >>>>> #else >>>>> #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) >>>>> #endif >>>>> >>>>> In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1. The 4K >>>>> and 4 levels case below is also correct. >>>>> >>>>> The KVM start level calculation, we could assume that KVM needs either >>>>> host levels or host levels - 1 (unless we go for some weirdly small >>>>> KVM_PHYS_SHIFT). So we could define them KVM_PREALLOC_LEVEL as: >>>>> >>>>> #if PTRS_PER_S2_PGD <= 16 >>>>> #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) >>>>> #else >>>>> #define KVM_PREALLOC_LEVEL (0) >>>>> #endif >>>>> >>>>> Basically if you can concatenate 16 or less pages at the level below the >>>>> top, the architecture does not allow a small top level. In this case, >>>>> (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the >>>>> host and we add 1 to go to the next level for KVM stage 2 when >>>>> PTRS_PER_S2_PGD is 16 or less. We use 0 when we don't need to >>>>> preallocate. >>>> >>>> I think this makes the whole thing clearer (at least for me), as it >>>> makes the relationship between KVM_PREALLOC_LEVEL and >>>> CONFIG_ARM64_PGTABLE_LEVELS explicit (it wasn't completely obvious to me >>>> initially). >>> >>> Agreed. >>> >>>> >>>>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >>>>>> +{ >>>>>> + pud_t *pud; >>>>>> + pmd_t *pmd; >>>>>> + >>>>>> + pud = pud_offset(pgd, 0); >>>>>> + pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0); >>>>>> + >>>>>> + if (!pmd) >>>>>> + return -ENOMEM; >>>>>> + pud_populate(NULL, pud, pmd); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static inline void kvm_free_hwpgd(struct kvm *kvm) >>>>>> +{ >>>>>> + pgd_t *pgd = kvm->arch.pgd; >>>>>> + pud_t *pud = pud_offset(pgd, 0); >>>>>> + pmd_t *pmd = pmd_offset(pud, 0); >>>>>> + free_pages((unsigned long)pmd, 0); >>>>>> +} >>>>>> + >>>>>> +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm) >>>>>> +{ >>>>>> + pgd_t *pgd = kvm->arch.pgd; >>>>>> + pud_t *pud = pud_offset(pgd, 0); >>>>>> + pmd_t *pmd = pmd_offset(pud, 0); >>>>>> + return virt_to_phys(pmd); >>>>>> + >>>>>> +} >>>>>> +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4 >>>>>> +#define KVM_PREALLOC_LEVEL 1 >>>>>> +#define PTRS_PER_S2_PGD 2 >>>>>> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) >>>>> >>>>> Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39)) >>>>> which is 2 and KVM_PREALLOC_LEVEL == 1. >>>>> >>>>>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >>>>>> +{ >>>>>> + pud_t *pud; >>>>>> + >>>>>> + pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1); >>>>>> + if (!pud) >>>>>> + return -ENOMEM; >>>>>> + pgd_populate(NULL, pgd, pud); >>>>>> + pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>> >>>>> You still need to define these functions but you can make their >>>>> implementation dependent solely on the KVM_PREALLOC_LEVEL rather than >>>>> 64K/4K and levels combinations. If it is KVM_PREALLOC_LEVEL is 1, you >>>>> allocate pud and populate the pgds (in a loop based on the >>>>> PTRS_PER_S2_PGD). If it is 2, you allocate the pmd and populate the pud >>>>> (still in a loop though it would probably be 1 iteration). We know based >>>>> on the assumption above that you can't get KVM_PREALLOC_LEVEL == 2 and >>>>> CONFIG_ARM64_PGTABLE_LEVELS == 4. >>>>> >>>> >>>> Also agreed. Most of what you wrote here could also be gathered as >>>> comments in the patch. >>>> >>> Yes, I reworded some of the text slightly as comments for the next >>> version of the patch. >>> >>> However, I'm not sure I have a clear idea of how you'd like these >>> functions to look like. >>> >>> I came up with the following based on your feedback, but I personally >>> don't find it a lot easier to read than what I had already. Suggestions >>> are welcome: >>> >>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h >>> index a030d16..7941a51 100644 >>> --- a/arch/arm64/include/asm/kvm_mmu.h >>> +++ b/arch/arm64/include/asm/kvm_mmu.h >>> @@ -41,6 +41,18 @@ >>> */ >>> #define TRAMPOLINE_VA (HYP_PAGE_OFFSET_MASK & PAGE_MASK) >>> >>> +/* >>> + * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation >>> + * levels in addition to the PGD and potentially the PUD which are >>> + * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2 >>> + * tables use one level of tables less than the kernel. >>> + */ >>> +#ifdef CONFIG_ARM64_64K_PAGES >>> +#define KVM_MMU_CACHE_MIN_PAGES 1 >>> +#else >>> +#define KVM_MMU_CACHE_MIN_PAGES 2 >>> +#endif >>> + >>> #ifdef __ASSEMBLY__ >>> >>> /* >>> @@ -53,6 +65,7 @@ >>> >>> #else >>> >>> +#include <asm/pgalloc.h> >>> #include <asm/cachetype.h> >>> #include <asm/cacheflush.h> >>> >>> @@ -65,10 +78,6 @@ >>> #define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT) >>> #define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL) >>> >>> -/* Make sure we get the right size, and thus the right alignment */ >>> -#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) >>> -#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) >>> - >>> int create_hyp_mappings(void *from, void *to); >>> int create_hyp_io_mappings(void *from, void *to, phys_addr_t); >>> void free_boot_hyp_pgd(void); >>> @@ -93,6 +102,7 @@ void kvm_clear_hyp_idmap(void); >>> #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) >>> >>> static inline void kvm_clean_pgd(pgd_t *pgd) {} >>> +static inline void kvm_clean_pmd(pmd_t *pmd) {} >>> static inline void kvm_clean_pmd_entry(pmd_t *pmd) {} >>> static inline void kvm_clean_pte(pte_t *pte) {} >>> static inline void kvm_clean_pte_entry(pte_t *pte) {} >>> @@ -118,13 +128,115 @@ static inline bool kvm_page_empty(void *ptr) >>> } >>> >>> #define kvm_pte_table_empty(ptep) kvm_page_empty(ptep) >>> -#ifndef CONFIG_ARM64_64K_PAGES >>> -#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) >>> -#else >>> + >>> +#ifdef __PAGETABLE_PMD_FOLDED >>> #define kvm_pmd_table_empty(pmdp) (0) >>> +#else >>> +#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp) >>> #endif >>> + >>> +#ifdef __PAGETABLE_PUD_FOLDED >>> #define kvm_pud_table_empty(pudp) (0) >>> +#else >>> +#define kvm_pud_table_empty(pudp) kvm_page_empty(pudp) >>> +#endif >>> + >>> +/* >>> + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address >>> + * the entire IPA input range with a single pgd entry, and we would only need >>> + * one pgd entry. >>> + */ >>> +#if PGDIR_SHIFT > KVM_PHYS_SHIFT >>> +#define PTRS_PER_S2_PGD (1) >>> +#else >>> +#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) >>> +#endif >>> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) >>> >>> +/* >>> + * If we are concatenating first level stage-2 page tables, we would have less >>> + * than or equal to 16 pointers in the fake PGD, because that's what the >>> + * architecture allows. In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS) >>> + * represents the first level for the host, and we add 1 to go to the next >>> + * level (which uses contatenation) for the stage-2 tables. >>> + */ >>> +#if PTRS_PER_S2_PGD <= 16 >>> +#define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) >>> +#else >>> +#define KVM_PREALLOC_LEVEL (0) >>> +#endif >>> + >>> +/** >>> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR >>> + * @kvm: The KVM struct pointer for the VM. >>> + * @pgd: The kernel pseudo pgd >>> + * >>> + * When the kernel uses more levels of page tables than the guest, we allocate >>> + * a fake PGD and pre-populate it to point to the next-level page table, which >>> + * will be the real initial page table pointed to by the VTTBR. >>> + * >>> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and >>> + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we >>> + * allocate 2 consecutive PUD pages. >>> + */ >>> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >>> +{ >>> + pud_t *pud; >>> + pmd_t *pmd; >>> + unsigned int order, i; >>> + unsigned long hwpgd; >>> + >>> + if (KVM_PREALLOC_LEVEL == 0) >>> + return 0; >>> + >>> + order = get_order(PTRS_PER_S2_PGD); >> >> S2_PGD_ORDER instead? Otherwise, that doesn't seem quite right... >> > > no, S2_PGD_ORDER is always the order of the PGD (in linux > macro-world-pgd-terms) that we allocate currently. Of course, we could > rework that like this: > > #if PTRS_PER_S2_PGD <= 16 > #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) > #define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD) > #else > #define KVM_PREALLOC_LEVEL (0) > #define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > #endif I see. Got confused by the various PGD... > >>> + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); >>> + if (!hwpgd) >>> + return -ENOMEM; >>> + >>> + if (KVM_PREALLOC_LEVEL == 1) { >>> + pud = (pud_t *)hwpgd; >>> + for (i = 0; i < PTRS_PER_S2_PGD; i++) >>> + pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD); >>> + } else if (KVM_PREALLOC_LEVEL == 2) { >>> + pud = pud_offset(pgd, 0); >>> + pmd = (pmd_t *)hwpgd; >>> + for (i = 0; i < PTRS_PER_S2_PGD; i++) >>> + pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD); >>> + } >>> + >>> + return 0; >> >> Shouldn't we return an error here instead? Or BUG()? >> > > yes, should never happen. > >>> +} >>> + >>> +static inline void *kvm_get_hwpgd(struct kvm *kvm) >>> +{ >>> + pgd_t *pgd = kvm->arch.pgd; >>> + pud_t *pud; >>> + pmd_t *pmd; >>> + >>> + switch (KVM_PREALLOC_LEVEL) { >>> + case 0: >>> + return pgd; >>> + case 1: >>> + pud = pud_offset(pgd, 0); >>> + return pud; >>> + case 2: >>> + pud = pud_offset(pgd, 0); >>> + pmd = pmd_offset(pud, 0); >>> + return pmd; >>> + default: >>> + BUG(); >>> + return NULL; >>> + } >>> +} >>> + >>> +static inline void kvm_free_hwpgd(struct kvm *kvm) >>> +{ >>> + if (KVM_PREALLOC_LEVEL > 0) { >>> + unsigned long hwpgd = (unsigned long)kvm_get_hwpgd(kvm); >>> + free_pages(hwpgd, get_order(S2_PGD_ORDER)); >> >> Isn't the get_order() a bit wrong here? I'd expect S2_PGD_ORDER to be >> what we need already... >> > > yikes! gonzo coding. > > what it should be is > free_pages(hwpgd, get_order(PTRS_PER_S2_PGD)); > > but it ties into the discussion above. Agreed. >>> + } >>> +} >> >> I personally like this version more (Catalin may have a different >> opinion ;-). >> > I'm fine with this, but I wasn't sure if you guys had something more > clever/beautiful in mind....? See Catalin's reply, but I'm happy either way. Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-07 19:39 ` Christoffer Dall 2014-10-08 9:34 ` Marc Zyngier @ 2014-10-08 9:47 ` Catalin Marinas 2014-10-09 11:01 ` Christoffer Dall 1 sibling, 1 reply; 18+ messages in thread From: Catalin Marinas @ 2014-10-08 9:47 UTC (permalink / raw) To: linux-arm-kernel On Tue, Oct 07, 2014 at 08:39:54PM +0100, Christoffer Dall wrote: > I came up with the following based on your feedback, but I personally > don't find it a lot easier to read than what I had already. Suggestions > are welcome: At least PTRS_PER_S2_PGD and KVM_PREALLOC_LEVEL are clearer to me as formulas than the magic numbers. > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index a030d16..7941a51 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h [...] > +/* > + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address > + * the entire IPA input range with a single pgd entry, and we would only need > + * one pgd entry. > + */ It may be worth here stating that this pgd is actually fake (covered below as well). Maybe something like "single (fake) pgd entry". > +#if PGDIR_SHIFT > KVM_PHYS_SHIFT > +#define PTRS_PER_S2_PGD (1) > +#else > +#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) > +#endif > +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > +/* > + * If we are concatenating first level stage-2 page tables, we would have less > + * than or equal to 16 pointers in the fake PGD, because that's what the > + * architecture allows. In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS) > + * represents the first level for the host, and we add 1 to go to the next > + * level (which uses contatenation) for the stage-2 tables. > + */ > +#if PTRS_PER_S2_PGD <= 16 > +#define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) > +#else > +#define KVM_PREALLOC_LEVEL (0) > +#endif > + > +/** > + * kvm_prealloc_hwpgd - allocate inital table for VTTBR > + * @kvm: The KVM struct pointer for the VM. > + * @pgd: The kernel pseudo pgd > + * > + * When the kernel uses more levels of page tables than the guest, we allocate > + * a fake PGD and pre-populate it to point to the next-level page table, which > + * will be the real initial page table pointed to by the VTTBR. > + * > + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and > + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we > + * allocate 2 consecutive PUD pages. > + */ I don't have a strong preference here, if you find the code easier to read as separate kvm_prealloc_hwpgd() functions, use those, as per your original patch. My point was to no longer define the functions based on #if 64K && 3-levels etc. but only on KVM_PREALLOC_LEVEL. Anyway, I think the code below looks ok, with some fixes. > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > +{ > + pud_t *pud; > + pmd_t *pmd; > + unsigned int order, i; > + unsigned long hwpgd; > + > + if (KVM_PREALLOC_LEVEL == 0) > + return 0; > + > + order = get_order(PTRS_PER_S2_PGD); Isn't order always 0 here? Based on our IRC discussion, PTRS_PER_S2_PGD is 16 or less and the order should not be used. > + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); I assume you need __get_free_pages() for alignment. > + if (!hwpgd) > + return -ENOMEM; > + > + if (KVM_PREALLOC_LEVEL == 1) { > + pud = (pud_t *)hwpgd; > + for (i = 0; i < PTRS_PER_S2_PGD; i++) > + pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD); > + } else if (KVM_PREALLOC_LEVEL == 2) { > + pud = pud_offset(pgd, 0); > + pmd = (pmd_t *)hwpgd; > + for (i = 0; i < PTRS_PER_S2_PGD; i++) > + pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD); > + } It could be slightly shorter as (I can't guarantee clearer ;)): for (i = 0; i < PTRS_PER_S2_PGD; i++) { if (KVM_PREALLOC_LEVEL == 1) pgd_populate(NULL, pgd + i, (pud_t *)hwpgd + i * PTRS_PER_PUD); else if (KVM_PREALLOC_LEVEL == 2) pud_populate(NULL, pud_offset(pgd, 0) + i, (pmd_t *)hwpgd + i * PTRS_PER_PMD) } Or you could write a kvm_populate_swpgd() to handle the ifs and casting. > + > + return 0; > +} > + > +static inline void *kvm_get_hwpgd(struct kvm *kvm) > +{ > + pgd_t *pgd = kvm->arch.pgd; > + pud_t *pud; > + pmd_t *pmd; > + > + switch (KVM_PREALLOC_LEVEL) { > + case 0: > + return pgd; > + case 1: > + pud = pud_offset(pgd, 0); > + return pud; > + case 2: > + pud = pud_offset(pgd, 0); > + pmd = pmd_offset(pud, 0); > + return pmd; > + default: > + BUG(); > + return NULL; > + } /* not needed? Use BUG_ON or BUILD_BUG_ON */ if (KVM_PREALLOC_LEVEL == 0) return pgd; pud = pud_offset(pgd, 0); if (KVM_PREALLOC_LEVEL == 1) return pud; return pmd_offset(pud, 0); You don't need KVM_PREALLOC_LEVEL == 0 case since this function wouldn't be called. So you could do with some (BUILD_)BUG_ON and 4 lines after. -- Catalin ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-08 9:47 ` Catalin Marinas @ 2014-10-09 11:01 ` Christoffer Dall 2014-10-09 13:36 ` Catalin Marinas 0 siblings, 1 reply; 18+ messages in thread From: Christoffer Dall @ 2014-10-09 11:01 UTC (permalink / raw) To: linux-arm-kernel On Wed, Oct 08, 2014 at 10:47:04AM +0100, Catalin Marinas wrote: > On Tue, Oct 07, 2014 at 08:39:54PM +0100, Christoffer Dall wrote: > > I came up with the following based on your feedback, but I personally > > don't find it a lot easier to read than what I had already. Suggestions > > are welcome: > > At least PTRS_PER_S2_PGD and KVM_PREALLOC_LEVEL are clearer to me as > formulas than the magic numbers. > Agreed. > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > > index a030d16..7941a51 100644 > > --- a/arch/arm64/include/asm/kvm_mmu.h > > +++ b/arch/arm64/include/asm/kvm_mmu.h > [...] > > +/* > > + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address > > + * the entire IPA input range with a single pgd entry, and we would only need > > + * one pgd entry. > > + */ > > It may be worth here stating that this pgd is actually fake (covered > below as well). Maybe something like "single (fake) pgd entry". > Yes. > > +#if PGDIR_SHIFT > KVM_PHYS_SHIFT > > +#define PTRS_PER_S2_PGD (1) > > +#else > > +#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT)) > > +#endif > > +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > > > +/* > > + * If we are concatenating first level stage-2 page tables, we would have less > > + * than or equal to 16 pointers in the fake PGD, because that's what the > > + * architecture allows. In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS) > > + * represents the first level for the host, and we add 1 to go to the next > > + * level (which uses contatenation) for the stage-2 tables. > > + */ > > +#if PTRS_PER_S2_PGD <= 16 > > +#define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1) > > +#else > > +#define KVM_PREALLOC_LEVEL (0) > > +#endif > > + > > +/** > > + * kvm_prealloc_hwpgd - allocate inital table for VTTBR > > + * @kvm: The KVM struct pointer for the VM. > > + * @pgd: The kernel pseudo pgd > > + * > > + * When the kernel uses more levels of page tables than the guest, we allocate > > + * a fake PGD and pre-populate it to point to the next-level page table, which > > + * will be the real initial page table pointed to by the VTTBR. > > + * > > + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and > > + * the kernel will use folded pud. When KVM_PREALLOC_LEVEL==1, we > > + * allocate 2 consecutive PUD pages. > > + */ > > I don't have a strong preference here, if you find the code easier to > read as separate kvm_prealloc_hwpgd() functions, use those, as per your > original patch. My point was to no longer define the functions based on > #if 64K && 3-levels etc. but only on KVM_PREALLOC_LEVEL. > > Anyway, I think the code below looks ok, with some fixes. > I think it's nicer too once I got used to it. > > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > > +{ > > + pud_t *pud; > > + pmd_t *pmd; > > + unsigned int order, i; > > + unsigned long hwpgd; > > + > > + if (KVM_PREALLOC_LEVEL == 0) > > + return 0; > > + > > + order = get_order(PTRS_PER_S2_PGD); > > Isn't order always 0 here? Based on our IRC discussion, PTRS_PER_S2_PGD > is 16 or less and the order should not be used. > no, if the kernel has 4K pages and 4 levels, then PGDIR_SHIFT is 39, and KVM_PHYS_SHIFT stays 40, so that means PTRS_PER_S2_PGD becomes 2, which means we concatenate two first level stage-2 page tables, which means we need to allocate two consecutive pages, giving us an order of 1, not 0. That's exactly why we use get_order(PTRS_PER_S2_PGD) instead of S2_PGD_ORDER, which is only used when we're not doing the fake PGD trick (see my response to Marc's mail). > > + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); > > I assume you need __get_free_pages() for alignment. > yes, would you prefer a comment to that fact? > > + if (!hwpgd) > > + return -ENOMEM; > > + > > + if (KVM_PREALLOC_LEVEL == 1) { > > + pud = (pud_t *)hwpgd; > > + for (i = 0; i < PTRS_PER_S2_PGD; i++) > > + pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD); > > + } else if (KVM_PREALLOC_LEVEL == 2) { > > + pud = pud_offset(pgd, 0); > > + pmd = (pmd_t *)hwpgd; > > + for (i = 0; i < PTRS_PER_S2_PGD; i++) > > + pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD); > > + } > > It could be slightly shorter as (I can't guarantee clearer ;)): > > for (i = 0; i < PTRS_PER_S2_PGD; i++) { > if (KVM_PREALLOC_LEVEL == 1) > pgd_populate(NULL, pgd + i, > (pud_t *)hwpgd + i * PTRS_PER_PUD); > else if (KVM_PREALLOC_LEVEL == 2) > pud_populate(NULL, pud_offset(pgd, 0) + i, > (pmd_t *)hwpgd + i * PTRS_PER_PMD) > } > > Or you could write a kvm_populate_swpgd() to handle the ifs and casting. > I actually quite like this, let's see how it looks in the next revision and if people really dislike it, we can look at factoring it out further. > > + > > + return 0; > > +} > > + > > +static inline void *kvm_get_hwpgd(struct kvm *kvm) > > +{ > > + pgd_t *pgd = kvm->arch.pgd; > > + pud_t *pud; > > + pmd_t *pmd; > > + > > + switch (KVM_PREALLOC_LEVEL) { > > + case 0: > > + return pgd; > > + case 1: > > + pud = pud_offset(pgd, 0); > > + return pud; > > + case 2: > > + pud = pud_offset(pgd, 0); > > + pmd = pmd_offset(pud, 0); > > + return pmd; > > + default: > > + BUG(); > > + return NULL; > > + } > > /* not needed? Use BUG_ON or BUILD_BUG_ON */ > if (KVM_PREALLOC_LEVEL == 0) > return pgd; > > pud = pud_offset(pgd, 0); > if (KVM_PREALLOC_LEVEL == 1) > return pud; > > return pmd_offset(pud, 0); I like this, but... > > You don't need KVM_PREALLOC_LEVEL == 0 case since this function wouldn't > be called. So you could do with some (BUILD_)BUG_ON and 4 lines after. > It is needed and it is called from arch/arm/kvm/arm.c in update_vttbr(). Thanks! -Christoffer ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-09 11:01 ` Christoffer Dall @ 2014-10-09 13:36 ` Catalin Marinas 2014-10-10 8:16 ` Christoffer Dall 0 siblings, 1 reply; 18+ messages in thread From: Catalin Marinas @ 2014-10-09 13:36 UTC (permalink / raw) To: linux-arm-kernel On Thu, Oct 09, 2014 at 12:01:37PM +0100, Christoffer Dall wrote: > On Wed, Oct 08, 2014 at 10:47:04AM +0100, Catalin Marinas wrote: > > On Tue, Oct 07, 2014 at 08:39:54PM +0100, Christoffer Dall wrote: > > > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > > > +{ > > > + pud_t *pud; > > > + pmd_t *pmd; > > > + unsigned int order, i; > > > + unsigned long hwpgd; > > > + > > > + if (KVM_PREALLOC_LEVEL == 0) > > > + return 0; > > > + > > > + order = get_order(PTRS_PER_S2_PGD); > > > > Isn't order always 0 here? Based on our IRC discussion, PTRS_PER_S2_PGD > > is 16 or less and the order should not be used. > > no, if the kernel has 4K pages and 4 levels, then PGDIR_SHIFT is 39, and > KVM_PHYS_SHIFT stays 40, so that means PTRS_PER_S2_PGD becomes 2, which > means we concatenate two first level stage-2 page tables, which means we > need to allocate two consecutive pages, giving us an order of 1, not 0. So if PTRS_PER_S2_PGD is 2, how come get_order(PTRS_PER_S2_PGD) == 1? My reading of the get_order() macro is that get_order(2) == 0. Did you mean get_order(PTRS_PER_S2_PGD * PAGE_SIZE)? Or you could define a PTRS_PER_S2_PGD_SHIFT as (KVM_PHYS_SHIFT - PGDIR_SHIFT) and use this as the order directly. > > > + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); > > > > I assume you need __get_free_pages() for alignment. > > yes, would you prefer a comment to that fact? No, that's fine. -- Catalin ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-09 13:36 ` Catalin Marinas @ 2014-10-10 8:16 ` Christoffer Dall 0 siblings, 0 replies; 18+ messages in thread From: Christoffer Dall @ 2014-10-10 8:16 UTC (permalink / raw) To: linux-arm-kernel On Thu, Oct 09, 2014 at 02:36:26PM +0100, Catalin Marinas wrote: > On Thu, Oct 09, 2014 at 12:01:37PM +0100, Christoffer Dall wrote: > > On Wed, Oct 08, 2014 at 10:47:04AM +0100, Catalin Marinas wrote: > > > On Tue, Oct 07, 2014 at 08:39:54PM +0100, Christoffer Dall wrote: > > > > +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > > > > +{ > > > > + pud_t *pud; > > > > + pmd_t *pmd; > > > > + unsigned int order, i; > > > > + unsigned long hwpgd; > > > > + > > > > + if (KVM_PREALLOC_LEVEL == 0) > > > > + return 0; > > > > + > > > > + order = get_order(PTRS_PER_S2_PGD); > > > > > > Isn't order always 0 here? Based on our IRC discussion, PTRS_PER_S2_PGD > > > is 16 or less and the order should not be used. > > > > no, if the kernel has 4K pages and 4 levels, then PGDIR_SHIFT is 39, and > > KVM_PHYS_SHIFT stays 40, so that means PTRS_PER_S2_PGD becomes 2, which > > means we concatenate two first level stage-2 page tables, which means we > > need to allocate two consecutive pages, giving us an order of 1, not 0. > > So if PTRS_PER_S2_PGD is 2, how come get_order(PTRS_PER_S2_PGD) == 1? My > reading of the get_order() macro is that get_order(2) == 0. > > Did you mean get_order(PTRS_PER_S2_PGD * PAGE_SIZE)? Ah, you're right. Sorry. Yes, that's what I meant. > > Or you could define a PTRS_PER_S2_PGD_SHIFT as (KVM_PHYS_SHIFT - > PGDIR_SHIFT) and use this as the order directly. > That's better. I also experimented with defining S2_HWPGD_ORDER or S2_PREALLOC_ORDER, but it didn't look much clear, so sticking with PTRS_PER_S2_PGD_SHIFT. > > > > + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); > > > > > > I assume you need __get_free_pages() for alignment. > > > > yes, would you prefer a comment to that fact? > > No, that's fine. > Thanks, -Christoffer ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-06 20:30 ` [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 Christoffer Dall 2014-10-07 10:48 ` Catalin Marinas @ 2014-10-07 13:40 ` Marc Zyngier 2014-10-08 9:48 ` Christoffer Dall 1 sibling, 1 reply; 18+ messages in thread From: Marc Zyngier @ 2014-10-07 13:40 UTC (permalink / raw) To: linux-arm-kernel Hi Christoffer, On 06/10/14 21:30, Christoffer Dall wrote: > This patch adds the necessary support for all host kernel PGSIZE and > VA_SPACE configuration options for both EL2 and the Stage-2 page tables. > > However, for 40bit and 42bit PARange systems, the architecture mandates > that VTCR_EL2.SL0 is maximum 1, resulting in fewer levels of stage-2 > pagge tables than levels of host kernel page tables. At the same time, > systems with a PARange > 42bit, we limit the IPA range by always setting > VTCR_EL2.T0SZ to 24. > > To solve the situation with different levels of page tables for Stage-2 > translation than the host kernel page tables, we allocate a dummy PGD > with pointers to our actual inital level Stage-2 page table, in order > for us to reuse the kernel pgtable manipulation primitives. Reproducing > all these in KVM does not look pretty and unnecessarily complicates the > 32-bit side. > > Systems with a PARange < 40bits are not yet supported. > > [ I have reworked this patch from its original form submitted by > Jungseok to take the architecture constraints into consideration. > There were too many changes from the original patch for me to > preserve the authorship. Thanks to Catalin Marinas for his help in > figuring out a good solution to this challenge. I have also fixed > various bugs and missing error code handling from the original > patch. - Christoffer ] > > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Catalin Marinas <catalin.marinas@arm.com> > Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> On top of Catalin's review, I have the following comments: [...] > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index bb06f76..3b3e18f 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -42,7 +42,7 @@ static unsigned long hyp_idmap_start; > static unsigned long hyp_idmap_end; > static phys_addr_t hyp_idmap_vector; > > -#define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t)) > +#define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t)) > > #define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x)) > > @@ -158,7 +158,7 @@ static void unmap_pmds(struct kvm *kvm, pud_t *pud, > } > } while (pmd++, addr = next, addr != end); > > - if (kvm_pmd_table_empty(start_pmd)) > + if (kvm_pmd_table_empty(start_pmd) && (!kvm || KVM_PREALLOC_LEVEL < 2)) This really feels clunky. Can we fold the additional tests inside kvm_pmd_table_empty(), taking kvm as an additional parameter? > clear_pud_entry(kvm, pud, start_addr); > } > > @@ -182,7 +182,7 @@ static void unmap_puds(struct kvm *kvm, pgd_t *pgd, > } > } while (pud++, addr = next, addr != end); > > - if (kvm_pud_table_empty(start_pud)) > + if (kvm_pud_table_empty(start_pud) && (!kvm || KVM_PREALLOC_LEVEL < 1)) Same here. Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 2014-10-07 13:40 ` Marc Zyngier @ 2014-10-08 9:48 ` Christoffer Dall 0 siblings, 0 replies; 18+ messages in thread From: Christoffer Dall @ 2014-10-08 9:48 UTC (permalink / raw) To: linux-arm-kernel On Tue, Oct 07, 2014 at 02:40:27PM +0100, Marc Zyngier wrote: > Hi Christoffer, > > On 06/10/14 21:30, Christoffer Dall wrote: > > This patch adds the necessary support for all host kernel PGSIZE and > > VA_SPACE configuration options for both EL2 and the Stage-2 page tables. > > > > However, for 40bit and 42bit PARange systems, the architecture mandates > > that VTCR_EL2.SL0 is maximum 1, resulting in fewer levels of stage-2 > > pagge tables than levels of host kernel page tables. At the same time, > > systems with a PARange > 42bit, we limit the IPA range by always setting > > VTCR_EL2.T0SZ to 24. > > > > To solve the situation with different levels of page tables for Stage-2 > > translation than the host kernel page tables, we allocate a dummy PGD > > with pointers to our actual inital level Stage-2 page table, in order > > for us to reuse the kernel pgtable manipulation primitives. Reproducing > > all these in KVM does not look pretty and unnecessarily complicates the > > 32-bit side. > > > > Systems with a PARange < 40bits are not yet supported. > > > > [ I have reworked this patch from its original form submitted by > > Jungseok to take the architecture constraints into consideration. > > There were too many changes from the original patch for me to > > preserve the authorship. Thanks to Catalin Marinas for his help in > > figuring out a good solution to this challenge. I have also fixed > > various bugs and missing error code handling from the original > > patch. - Christoffer ] > > > > Cc: Marc Zyngier <marc.zyngier@arm.com> > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com> > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> > > On top of Catalin's review, I have the following comments: > > [...] > > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > > index bb06f76..3b3e18f 100644 > > --- a/arch/arm/kvm/mmu.c > > +++ b/arch/arm/kvm/mmu.c > > @@ -42,7 +42,7 @@ static unsigned long hyp_idmap_start; > > static unsigned long hyp_idmap_end; > > static phys_addr_t hyp_idmap_vector; > > > > -#define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t)) > > +#define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t)) > > > > #define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x)) > > > > @@ -158,7 +158,7 @@ static void unmap_pmds(struct kvm *kvm, pud_t *pud, > > } > > } while (pmd++, addr = next, addr != end); > > > > - if (kvm_pmd_table_empty(start_pmd)) > > + if (kvm_pmd_table_empty(start_pmd) && (!kvm || KVM_PREALLOC_LEVEL < 2)) > > This really feels clunky. Can we fold the additional tests inside > kvm_pmd_table_empty(), taking kvm as an additional parameter? > > > clear_pud_entry(kvm, pud, start_addr); > > } > > > > @@ -182,7 +182,7 @@ static void unmap_puds(struct kvm *kvm, pgd_t *pgd, > > } > > } while (pud++, addr = next, addr != end); > > > > - if (kvm_pud_table_empty(start_pud)) > > + if (kvm_pud_table_empty(start_pud) && (!kvm || KVM_PREALLOC_LEVEL < 1)) > > Same here. > Sounds reasonable, I'll try to work it into the next version of the patches. -Christoffer ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 2/3] arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE 2014-10-06 20:30 [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Christoffer Dall 2014-10-06 20:30 ` [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 Christoffer Dall @ 2014-10-06 20:30 ` Christoffer Dall 2014-10-06 20:30 ` [PATCH v2 3/3] arm64: Allow 48-bits VA space without ARM_SMMU Christoffer Dall 2014-10-07 9:24 ` [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Catalin Marinas 3 siblings, 0 replies; 18+ messages in thread From: Christoffer Dall @ 2014-10-06 20:30 UTC (permalink / raw) To: linux-arm-kernel When creating or moving a memslot, make sure the IPA space is within the addressable range of the guest. Otherwise, user space can create too large a memslot and KVM would try to access potentially unallocated page table entries when inserting entries in the Stage-2 page tables. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> --- arch/arm/kvm/mmu.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 3b3e18f..123508d 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -992,6 +992,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) goto out_unlock; } + /* Userspace should not be able to register out-of-bounds IPAs */ + VM_BUG_ON(fault_ipa >= KVM_PHYS_SIZE); + ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status); if (ret == 0) ret = 1; @@ -1216,6 +1219,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, struct kvm_userspace_memory_region *mem, enum kvm_mr_change change) { + if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) { + if (memslot->base_gfn + memslot->npages >= + (KVM_PHYS_SIZE >> PAGE_SHIFT)) + return -EFAULT; + } return 0; } -- 2.1.2.330.g565301e.dirty ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 3/3] arm64: Allow 48-bits VA space without ARM_SMMU 2014-10-06 20:30 [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Christoffer Dall 2014-10-06 20:30 ` [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 Christoffer Dall 2014-10-06 20:30 ` [PATCH v2 2/3] arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE Christoffer Dall @ 2014-10-06 20:30 ` Christoffer Dall 2014-10-07 9:24 ` [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Catalin Marinas 3 siblings, 0 replies; 18+ messages in thread From: Christoffer Dall @ 2014-10-06 20:30 UTC (permalink / raw) To: linux-arm-kernel Now when KVM has been reworked to support 48-bits host VA space, we can allow systems to be configured with this option. However, the ARM SMMU driver also needs to be tweaked for 48-bit support so only allow the config option to be set when not including support for theSMMU. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> --- arch/arm64/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index fd4e81a..a76c6c3b 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -200,7 +200,7 @@ config ARM64_VA_BITS_42 config ARM64_VA_BITS_48 bool "48-bit" - depends on BROKEN + depends on !ARM_SMMU endchoice -- 2.1.2.330.g565301e.dirty ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits 2014-10-06 20:30 [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Christoffer Dall ` (2 preceding siblings ...) 2014-10-06 20:30 ` [PATCH v2 3/3] arm64: Allow 48-bits VA space without ARM_SMMU Christoffer Dall @ 2014-10-07 9:24 ` Catalin Marinas 2014-10-07 9:36 ` Christoffer Dall 3 siblings, 1 reply; 18+ messages in thread From: Catalin Marinas @ 2014-10-07 9:24 UTC (permalink / raw) To: linux-arm-kernel On Mon, Oct 06, 2014 at 09:30:24PM +0100, Christoffer Dall wrote: > The following host configurations have been tested with KVM on APM > Mustang: [...] > 3) 64KB + 39 bits VA space That would be 42-bit VA space. -- Catalin ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits 2014-10-07 9:24 ` [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Catalin Marinas @ 2014-10-07 9:36 ` Christoffer Dall 0 siblings, 0 replies; 18+ messages in thread From: Christoffer Dall @ 2014-10-07 9:36 UTC (permalink / raw) To: linux-arm-kernel On Tue, Oct 07, 2014 at 10:24:58AM +0100, Catalin Marinas wrote: > On Mon, Oct 06, 2014 at 09:30:24PM +0100, Christoffer Dall wrote: > > The following host configurations have been tested with KVM on APM > > Mustang: > [...] > > 3) 64KB + 39 bits VA space > > That would be 42-bit VA space. > Yeah, -ECUTNPASTE, sorry about that. -Christoffer ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2014-10-10 8:16 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-10-06 20:30 [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Christoffer Dall 2014-10-06 20:30 ` [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 Christoffer Dall 2014-10-07 10:48 ` Catalin Marinas 2014-10-07 13:28 ` Marc Zyngier 2014-10-07 19:39 ` Christoffer Dall 2014-10-08 9:34 ` Marc Zyngier 2014-10-08 9:47 ` Christoffer Dall 2014-10-08 10:27 ` Marc Zyngier 2014-10-08 9:47 ` Catalin Marinas 2014-10-09 11:01 ` Christoffer Dall 2014-10-09 13:36 ` Catalin Marinas 2014-10-10 8:16 ` Christoffer Dall 2014-10-07 13:40 ` Marc Zyngier 2014-10-08 9:48 ` Christoffer Dall 2014-10-06 20:30 ` [PATCH v2 2/3] arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE Christoffer Dall 2014-10-06 20:30 ` [PATCH v2 3/3] arm64: Allow 48-bits VA space without ARM_SMMU Christoffer Dall 2014-10-07 9:24 ` [PATCH v2 0/3] arm/arm64: KVM: Host 48-bit VA support and IPA limits Catalin Marinas 2014-10-07 9:36 ` Christoffer Dall
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).