* [PATCH 0/7] Support for GB pages in KVM
@ 2009-03-27 14:31 Joerg Roedel
  2009-03-27 14:31 ` [PATCH 1/7] hugetlb: export vma_kernel_pagesize to modules Joerg Roedel
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Joerg Roedel @ 2009-03-27 14:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel

Hi,

this patchset extends the KVM MMU implementation to support 1GB pages as
supported by AMD Family 10h processors. These patches enable support for
1GB pages with Nested Paging. Support for these pages in the shadow
paging code was also developed but is not yet stable. The patch for
shadow-paging support is not included in this series and will be sent
out separately.

Joerg

git diff --stat avi/master..

 arch/ia64/kvm/process.c         |    5 -
 arch/ia64/kvm/vmm_ivt.S         |   18 ++--
 arch/ia64/kvm/vtlb.c            |    3 +-
 arch/x86/include/asm/kvm_host.h |   29 ++++-
 arch/x86/kvm/mmu.c              |  232 +++++++++++++++++++++++++++------------
 arch/x86/kvm/paging_tmpl.h      |   48 ++++++---
 arch/x86/kvm/svm.c              |    7 +
 arch/x86/kvm/vmx.c              |   19 ++-
 arch/x86/kvm/x86.c              |    6 +-
 include/linux/kvm.h             |    1 +
 include/linux/kvm_host.h        |    2 +-
 mm/hugetlb.c                    |    1 +
 virt/kvm/kvm_main.c             |   26 +++++





* [PATCH 1/7] hugetlb: export vma_kernel_pagesize to modules
  2009-03-27 14:31 [PATCH 0/7] Support for GB pages in KVM Joerg Roedel
@ 2009-03-27 14:31 ` Joerg Roedel
  2009-03-27 14:31 ` [PATCH 2/7] kvm mmu: infrastructure changes for multiple huge page support Joerg Roedel
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Joerg Roedel @ 2009-03-27 14:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, linux-kernel, Joerg Roedel, William Irwin

The KVM module needs this function to determine the size of the host
pages backing guest memory.
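
For illustration, KVM will use it roughly like this (a simplified sketch
of the host_page_size() helper introduced in patch 2 of this series;
host_backing_size is just an illustrative name):

static unsigned long host_backing_size(struct kvm *kvm, gfn_t gfn)
{
	struct vm_area_struct *vma;
	unsigned long addr, size = PAGE_SIZE;

	addr = gfn_to_hva(kvm, gfn);
	if (kvm_is_error_hva(addr))
		return size;

	down_read(&current->mm->mmap_sem);
	vma = find_vma(current->mm, addr);
	if (vma)
		size = vma_kernel_pagesize(vma);	/* 4k, 2M or 1G */
	up_read(&current->mm->mmap_sem);

	return size;
}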

Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 mm/hugetlb.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 107da3d..29f4fbf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -234,6 +234,7 @@ unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
 
 	return 1UL << (hstate->order + PAGE_SHIFT);
 }
+EXPORT_SYMBOL_GPL(vma_kernel_pagesize);
 
 /*
  * Return the page size being used by the MMU to back a VMA. In the majority
-- 
1.5.6.4




* [PATCH 2/7] kvm mmu: infrastructure changes for multiple huge page support
  2009-03-27 14:31 [PATCH 0/7] Support for GB pages in KVM Joerg Roedel
  2009-03-27 14:31 ` [PATCH 1/7] hugetlb: export vma_kernel_pagesize to modules Joerg Roedel
@ 2009-03-27 14:31 ` Joerg Roedel
  2009-03-29 11:38   ` Avi Kivity
  2009-03-27 14:31 ` [PATCH 3/7] kvm mmu: add page size parameter to rmap_remove Joerg Roedel
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2009-03-27 14:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

This patch includes most of the necessary changes to the KVM SoftMMU for
supporting more than one huge page size. The changes in this patch
include:

  * introduce 'enum kvm_page_size', which represents the page size used
    for a mapping
  * change boolean is_largepage_backed() function to backing_size()
    which returns the largest page size KVM can use to map a gfn
  * change the other largepage flags to 'enum kvm_page_size'

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |   17 ++++--
 arch/x86/kvm/mmu.c              |  111 +++++++++++++++++++++------------------
 arch/x86/kvm/paging_tmpl.h      |   22 ++++----
 3 files changed, 83 insertions(+), 67 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8351c4d..f268f99 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -52,11 +52,13 @@
 #define UNMAPPED_GVA (~(gpa_t)0)
 
 /* shadow tables are PAE even on non-PAE hosts */
-#define KVM_HPAGE_SHIFT 21
-#define KVM_HPAGE_SIZE (1UL << KVM_HPAGE_SHIFT)
-#define KVM_HPAGE_MASK (~(KVM_HPAGE_SIZE - 1))
+#define KVM_2M_PAGE_SHIFT 21
+#define KVM_2M_PAGE_SIZE (1UL << KVM_2M_PAGE_SHIFT)
+#define KVM_2M_PAGE_MASK (~(KVM_2M_PAGE_SIZE - 1))
 
-#define KVM_PAGES_PER_HPAGE (KVM_HPAGE_SIZE / PAGE_SIZE)
+#define KVM_PAGES_PER_2M_PAGE (KVM_2M_PAGE_SIZE / PAGE_SIZE)
+
+#define KVM_PAGES_PER_HPAGE	KVM_PAGES_PER_2M_PAGE
 
 #define DE_VECTOR 0
 #define DB_VECTOR 1
@@ -263,6 +265,11 @@ struct kvm_mmu {
 	u64 *pae_root;
 };
 
+enum kvm_page_size {
+	KVM_PAGE_SIZE_4k = (1 << 12),
+	KVM_PAGE_SIZE_2M = (1 << 21),
+};
+
 struct kvm_vcpu_arch {
 	u64 host_tsc;
 	int interrupt_window_open;
@@ -310,7 +317,7 @@ struct kvm_vcpu_arch {
 	struct {
 		gfn_t gfn;	/* presumed gfn during guest pte update */
 		pfn_t pfn;	/* pfn corresponding to that gfn */
-		int largepage;
+		enum kvm_page_size page_size;
 		unsigned long mmu_seq;
 	} update_pte;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b625ed4..3a57c17 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -385,8 +385,8 @@ static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
 {
 	unsigned long idx;
 
-	idx = (gfn / KVM_PAGES_PER_HPAGE) -
-	      (slot->base_gfn / KVM_PAGES_PER_HPAGE);
+	idx = (gfn / KVM_PAGES_PER_2M_PAGE) -
+	      (slot->base_gfn / KVM_PAGES_PER_2M_PAGE);
 	return &slot->lpage_info[idx].write_count;
 }
 
@@ -426,11 +426,11 @@ static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
 	return 1;
 }
 
-static int host_largepage_backed(struct kvm *kvm, gfn_t gfn)
+static enum kvm_page_size host_page_size(struct kvm *kvm, gfn_t gfn)
 {
 	struct vm_area_struct *vma;
-	unsigned long addr;
-	int ret = 0;
+	unsigned long addr, size;
+	enum kvm_page_size ret = KVM_PAGE_SIZE_4k;
 
 	addr = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(addr))
@@ -438,28 +438,31 @@ static int host_largepage_backed(struct kvm *kvm, gfn_t gfn)
 
 	down_read(&current->mm->mmap_sem);
 	vma = find_vma(current->mm, addr);
-	if (vma && is_vm_hugetlb_page(vma))
-		ret = 1;
+	if (vma) {
+		size = vma_kernel_pagesize(vma);
+		if (size >= KVM_PAGE_SIZE_2M)
+			ret = KVM_PAGE_SIZE_2M;
+	}
 	up_read(&current->mm->mmap_sem);
 
 	return ret;
 }
 
-static int is_largepage_backed(struct kvm_vcpu *vcpu, gfn_t large_gfn)
+static enum kvm_page_size backing_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	struct kvm_memory_slot *slot;
 
-	if (has_wrprotected_page(vcpu->kvm, large_gfn))
-		return 0;
+	if (has_wrprotected_page(vcpu->kvm, gfn))
+		return KVM_PAGE_SIZE_4k;
 
-	if (!host_largepage_backed(vcpu->kvm, large_gfn))
-		return 0;
+	if (host_page_size(vcpu->kvm, gfn) < KVM_PAGE_SIZE_2M)
+		return KVM_PAGE_SIZE_4k;
 
-	slot = gfn_to_memslot(vcpu->kvm, large_gfn);
+	slot = gfn_to_memslot(vcpu->kvm, gfn);
 	if (slot && slot->dirty_bitmap)
-		return 0;
+		return KVM_PAGE_SIZE_4k;
 
-	return 1;
+	return KVM_PAGE_SIZE_2M;
 }
 
 /*
@@ -467,17 +470,18 @@ static int is_largepage_backed(struct kvm_vcpu *vcpu, gfn_t large_gfn)
  * Note: gfn must be unaliased before this function get called
  */
 
-static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int lpage)
+static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
+				  enum kvm_page_size psize)
 {
 	struct kvm_memory_slot *slot;
 	unsigned long idx;
 
 	slot = gfn_to_memslot(kvm, gfn);
-	if (!lpage)
+	if (psize == KVM_PAGE_SIZE_4k)
 		return &slot->rmap[gfn - slot->base_gfn];
 
-	idx = (gfn / KVM_PAGES_PER_HPAGE) -
-	      (slot->base_gfn / KVM_PAGES_PER_HPAGE);
+	idx = (gfn / KVM_PAGES_PER_2M_PAGE) -
+	      (slot->base_gfn / KVM_PAGES_PER_2M_PAGE);
 
 	return &slot->lpage_info[idx].rmap_pde;
 }
@@ -491,7 +495,8 @@ static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int lpage)
  * If rmapp bit zero is one, (then rmap & ~1) points to a struct kvm_rmap_desc
  * containing more mappings.
  */
-static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, int lpage)
+static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn,
+		     enum kvm_page_size psize)
 {
 	struct kvm_mmu_page *sp;
 	struct kvm_rmap_desc *desc;
@@ -503,7 +508,7 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, int lpage)
 	gfn = unalias_gfn(vcpu->kvm, gfn);
 	sp = page_header(__pa(spte));
 	sp->gfns[spte - sp->spt] = gfn;
-	rmapp = gfn_to_rmap(vcpu->kvm, gfn, lpage);
+	rmapp = gfn_to_rmap(vcpu->kvm, gfn, psize);
 	if (!*rmapp) {
 		rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
 		*rmapp = (unsigned long)spte;
@@ -559,6 +564,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	pfn_t pfn;
 	unsigned long *rmapp;
 	int i;
+	enum kvm_page_size psize;
 
 	if (!is_rmap_pte(*spte))
 		return;
@@ -570,7 +576,8 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 		kvm_release_pfn_dirty(pfn);
 	else
 		kvm_release_pfn_clean(pfn);
-	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], is_large_pte(*spte));
+	psize = is_large_pte(*spte) ? KVM_PAGE_SIZE_2M : KVM_PAGE_SIZE_4k;
+	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], psize);
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
 		BUG();
@@ -636,7 +643,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 	int write_protected = 0;
 
 	gfn = unalias_gfn(kvm, gfn);
-	rmapp = gfn_to_rmap(kvm, gfn, 0);
+	rmapp = gfn_to_rmap(kvm, gfn, KVM_PAGE_SIZE_4k);
 
 	spte = rmap_next(kvm, rmapp, NULL);
 	while (spte) {
@@ -658,7 +665,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 	}
 
 	/* check for huge page mappings */
-	rmapp = gfn_to_rmap(kvm, gfn, 1);
+	rmapp = gfn_to_rmap(kvm, gfn, KVM_PAGE_SIZE_2M);
 	spte = rmap_next(kvm, rmapp, NULL);
 	while (spte) {
 		BUG_ON(!spte);
@@ -719,7 +726,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 			retval |= handler(kvm,
 					  &memslot->lpage_info[
 						  gfn_offset /
-						  KVM_PAGES_PER_HPAGE].rmap_pde);
+						  KVM_PAGES_PER_2M_PAGE].rmap_pde);
 		}
 	}
 
@@ -1676,7 +1683,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 
 static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 		    unsigned pte_access, int user_fault,
-		    int write_fault, int dirty, int largepage,
+		    int write_fault, int dirty, enum kvm_page_size psize,
 		    int global, gfn_t gfn, pfn_t pfn, bool speculative,
 		    bool can_unsync)
 {
@@ -1709,7 +1716,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 		spte |= shadow_nx_mask;
 	if (pte_access & ACC_USER_MASK)
 		spte |= shadow_user_mask;
-	if (largepage)
+	if (psize > KVM_PAGE_SIZE_4k)
 		spte |= PT_PAGE_SIZE_MASK;
 	if (mt_mask) {
 		if (!kvm_is_mmio_pfn(pfn)) {
@@ -1727,7 +1734,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 	if ((pte_access & ACC_WRITE_MASK)
 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
 
-		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
+		if (psize > KVM_PAGE_SIZE_4k &&
+		    has_wrprotected_page(vcpu->kvm, gfn)) {
 			ret = 1;
 			spte = shadow_trap_nonpresent_pte;
 			goto set_pte;
@@ -1765,7 +1773,7 @@ set_pte:
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			 unsigned pt_access, unsigned pte_access,
 			 int user_fault, int write_fault, int dirty,
-			 int *ptwrite, int largepage, int global,
+			 int *ptwrite, enum kvm_page_size psize, int global,
 			 gfn_t gfn, pfn_t pfn, bool speculative)
 {
 	int was_rmapped = 0;
@@ -1781,7 +1789,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 		 * the parent of the now unreachable PTE.
 		 */
-		if (largepage && !is_large_pte(*shadow_pte)) {
+		if (psize > KVM_PAGE_SIZE_4k && !is_large_pte(*shadow_pte)) {
 			struct kvm_mmu_page *child;
 			u64 pte = *shadow_pte;
 
@@ -1795,7 +1803,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			was_rmapped = 1;
 	}
 	if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
-		      dirty, largepage, global, gfn, pfn, speculative, true)) {
+		      dirty, psize, global, gfn, pfn, speculative, true)) {
 		if (write_fault)
 			*ptwrite = 1;
 		kvm_x86_ops->tlb_flush(vcpu);
@@ -1811,7 +1819,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 
 	page_header_update_slot(vcpu->kvm, shadow_pte, gfn);
 	if (!was_rmapped) {
-		rmap_add(vcpu, shadow_pte, gfn, largepage);
+		rmap_add(vcpu, shadow_pte, gfn, psize);
 		if (!is_rmap_pte(*shadow_pte))
 			kvm_release_pfn_clean(pfn);
 	} else {
@@ -1831,7 +1839,7 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
 }
 
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-			int largepage, gfn_t gfn, pfn_t pfn)
+			enum kvm_page_size psize, gfn_t gfn, pfn_t pfn)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
@@ -1840,10 +1848,11 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == PT_PAGE_TABLE_LEVEL
-		    || (largepage && iterator.level == PT_DIRECTORY_LEVEL)) {
+		    || (psize == KVM_PAGE_SIZE_2M &&
+			iterator.level == PT_DIRECTORY_LEVEL)) {
 			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
 				     0, write, 1, &pt_write,
-				     largepage, 0, gfn, pfn, false);
+				     psize, 0, gfn, pfn, false);
 			++vcpu->stat.pf_fixed;
 			break;
 		}
@@ -1871,14 +1880,12 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 {
 	int r;
-	int largepage = 0;
 	pfn_t pfn;
 	unsigned long mmu_seq;
+	enum kvm_page_size psize = backing_size(vcpu, gfn);
 
-	if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) {
-		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
-		largepage = 1;
-	}
+	if (psize == KVM_PAGE_SIZE_2M)
+		gfn &= ~(KVM_PAGES_PER_2M_PAGE-1);
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
@@ -1894,7 +1901,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 	if (mmu_notifier_retry(vcpu, mmu_seq))
 		goto out_unlock;
 	kvm_mmu_free_some_pages(vcpu);
-	r = __direct_map(vcpu, v, write, largepage, gfn, pfn);
+	r = __direct_map(vcpu, v, write, psize, gfn, pfn);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 
@@ -2067,9 +2074,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 {
 	pfn_t pfn;
 	int r;
-	int largepage = 0;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	unsigned long mmu_seq;
+	enum kvm_page_size psize;
 
 	ASSERT(vcpu);
 	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
@@ -2078,10 +2085,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 	if (r)
 		return r;
 
-	if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) {
-		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
-		largepage = 1;
-	}
+	psize = backing_size(vcpu, gfn);
+	if (psize == KVM_PAGE_SIZE_2M)
+		gfn &= ~(KVM_PAGES_PER_2M_PAGE-1);
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 	pfn = gfn_to_pfn(vcpu->kvm, gfn);
@@ -2094,7 +2100,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 		goto out_unlock;
 	kvm_mmu_free_some_pages(vcpu);
 	r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
-			 largepage, gfn, pfn);
+			 psize, gfn, pfn);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	return r;
@@ -2333,7 +2339,7 @@ static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
 				  const void *new)
 {
 	if (sp->role.level != PT_PAGE_TABLE_LEVEL) {
-		if (!vcpu->arch.update_pte.largepage ||
+		if (vcpu->arch.update_pte.page_size != KVM_PAGE_SIZE_2M ||
 		    sp->role.glevels == PT32_ROOT_LEVEL) {
 			++vcpu->kvm->stat.mmu_pde_zapped;
 			return;
@@ -2383,7 +2389,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	u64 gpte = 0;
 	pfn_t pfn;
 
-	vcpu->arch.update_pte.largepage = 0;
+	vcpu->arch.update_pte.page_size = KVM_PAGE_SIZE_4k;
 
 	if (bytes != 4 && bytes != 8)
 		return;
@@ -2412,9 +2418,10 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		return;
 	gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
 
-	if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) {
-		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
-		vcpu->arch.update_pte.largepage = 1;
+	if (is_large_pte(gpte) &&
+	    backing_size(vcpu, gfn) != KVM_PAGE_SIZE_4k) {
+		gfn &= ~(KVM_PAGES_PER_2M_PAGE-1);
+		vcpu->arch.update_pte.page_size = KVM_PAGE_SIZE_2M;
 	}
 	vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 855eb71..9fbd049 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -241,7 +241,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 	pt_element_t gpte;
 	unsigned pte_access;
 	pfn_t pfn;
-	int largepage = vcpu->arch.update_pte.largepage;
+	enum kvm_page_size psize = vcpu->arch.update_pte.page_size;
 
 	gpte = *(const pt_element_t *)pte;
 	if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
@@ -260,7 +260,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 		return;
 	kvm_get_pfn(pfn);
 	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
-		     gpte & PT_DIRTY_MASK, NULL, largepage,
+		     gpte & PT_DIRTY_MASK, NULL, psize,
 		     gpte & PT_GLOBAL_MASK, gpte_to_gfn(gpte),
 		     pfn, true);
 }
@@ -270,7 +270,8 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
  */
 static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			 struct guest_walker *gw,
-			 int user_fault, int write_fault, int largepage,
+			 int user_fault, int write_fault,
+			 enum kvm_page_size psize,
 			 int *ptwrite, pfn_t pfn)
 {
 	unsigned access = gw->pt_access;
@@ -290,12 +291,13 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		level = iterator.level;
 		sptep = iterator.sptep;
 		if (level == PT_PAGE_TABLE_LEVEL
-		    || (largepage && level == PT_DIRECTORY_LEVEL)) {
+		    || (psize == KVM_PAGE_SIZE_2M &&
+			level == PT_DIRECTORY_LEVEL)) {
 			mmu_set_spte(vcpu, sptep, access,
 				     gw->pte_access & access,
 				     user_fault, write_fault,
 				     gw->ptes[gw->level-1] & PT_DIRTY_MASK,
-				     ptwrite, largepage,
+				     ptwrite, psize,
 				     gw->ptes[gw->level-1] & PT_GLOBAL_MASK,
 				     gw->gfn, pfn, false);
 			break;
@@ -368,7 +370,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 	int write_pt = 0;
 	int r;
 	pfn_t pfn;
-	int largepage = 0;
+	enum kvm_page_size psize = KVM_PAGE_SIZE_4k;
 	unsigned long mmu_seq;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
@@ -396,10 +398,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 
 	if (walker.level == PT_DIRECTORY_LEVEL) {
 		gfn_t large_gfn;
-		large_gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE-1);
-		if (is_largepage_backed(vcpu, large_gfn)) {
+		large_gfn = walker.gfn & ~(KVM_PAGES_PER_2M_PAGE-1);
+		if (backing_size(vcpu, large_gfn) != KVM_PAGE_SIZE_4k) {
 			walker.gfn = large_gfn;
-			largepage = 1;
+			psize = KVM_PAGE_SIZE_2M;
 		}
 	}
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
@@ -418,7 +420,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 		goto out_unlock;
 	kvm_mmu_free_some_pages(vcpu);
 	shadow_pte = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
-				  largepage, &write_pt, pfn);
+				  psize, &write_pt, pfn);
 
 	pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
 		 shadow_pte, *shadow_pte, write_pt);
-- 
1.5.6.4




* [PATCH 3/7] kvm mmu: add page size parameter to rmap_remove
  2009-03-27 14:31 [PATCH 0/7] Support for GB pages in KVM Joerg Roedel
  2009-03-27 14:31 ` [PATCH 1/7] hugetlb: export vma_kernel_pagesize to modules Joerg Roedel
  2009-03-27 14:31 ` [PATCH 2/7] kvm mmu: infrastructure changes for multiple huge page support Joerg Roedel
@ 2009-03-27 14:31 ` Joerg Roedel
  2009-03-27 14:31 ` [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting Joerg Roedel
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Joerg Roedel @ 2009-03-27 14:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

Two reasons for this:

  * To make the rmap_remove() interface consistent with rmap_add(), which
    already takes a page size parameter
  * With more than one huge page size, rmap_remove() can no longer guess
    the huge page size on its own (see the sketch below)
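
The call sites already know the size: once 1GB pages exist, the
PT_PAGE_SIZE_MASK bit is set at both the PDE and the PDPE level, so the
size has to be derived from the shadow page level rather than from the
PTE bit alone. As a sketch (mirroring what the direct-map unlink path
does later in this series):

	if (sp->role.level == PT_DIRECTORY_LEVEL)
		rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_2M);
	else	/* PT_MIDDLE_LEVEL, i.e. a 1GB mapping */
		rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_1G);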

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c         |   40 ++++++++++++++++++++++------------------
 arch/x86/kvm/paging_tmpl.h |   10 +++++++---
 2 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3a57c17..9936b45 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -556,7 +556,7 @@ static void rmap_desc_remove_entry(unsigned long *rmapp,
 	mmu_free_rmap_desc(desc);
 }
 
-static void rmap_remove(struct kvm *kvm, u64 *spte)
+static void rmap_remove(struct kvm *kvm, u64 *spte, enum kvm_page_size psize)
 {
 	struct kvm_rmap_desc *desc;
 	struct kvm_rmap_desc *prev_desc;
@@ -564,7 +564,6 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	pfn_t pfn;
 	unsigned long *rmapp;
 	int i;
-	enum kvm_page_size psize;
 
 	if (!is_rmap_pte(*spte))
 		return;
@@ -576,7 +575,6 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 		kvm_release_pfn_dirty(pfn);
 	else
 		kvm_release_pfn_clean(pfn);
-	psize = is_large_pte(*spte) ? KVM_PAGE_SIZE_2M : KVM_PAGE_SIZE_4k;
 	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], psize);
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -673,7 +671,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 		BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
 		pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn);
 		if (is_writeble_pte(*spte)) {
-			rmap_remove(kvm, spte);
+			rmap_remove(kvm, spte, KVM_PAGE_SIZE_2M);
 			--kvm->stat.lpages;
 			set_shadow_pte(spte, shadow_trap_nonpresent_pte);
 			spte = NULL;
@@ -685,7 +683,8 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 	return write_protected;
 }
 
-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
+			   enum kvm_page_size psize)
 {
 	u64 *spte;
 	int need_tlb_flush = 0;
@@ -693,7 +692,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	while ((spte = rmap_next(kvm, rmapp, NULL))) {
 		BUG_ON(!(*spte & PT_PRESENT_MASK));
 		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
-		rmap_remove(kvm, spte);
+		rmap_remove(kvm, spte, psize);
 		set_shadow_pte(spte, shadow_trap_nonpresent_pte);
 		need_tlb_flush = 1;
 	}
@@ -701,7 +700,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 }
 
 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
-			  int (*handler)(struct kvm *kvm, unsigned long *rmapp))
+			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+					 enum kvm_page_size psize))
 {
 	int i;
 	int retval = 0;
@@ -722,11 +722,12 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 		end = start + (memslot->npages << PAGE_SHIFT);
 		if (hva >= start && hva < end) {
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
-			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
+			unsigned long lidx = gfn_offset / KVM_PAGES_PER_2M_PAGE;
+			retval |= handler(kvm, &memslot->rmap[gfn_offset],
+					  KVM_PAGE_SIZE_4k);
 			retval |= handler(kvm,
-					  &memslot->lpage_info[
-						  gfn_offset /
-						  KVM_PAGES_PER_2M_PAGE].rmap_pde);
+					  &memslot->lpage_info[lidx].rmap_pde,
+					  KVM_PAGE_SIZE_2M);
 		}
 	}
 
@@ -738,7 +739,8 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 	return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
 }
 
-static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
+static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
+			 enum kvm_page_size psize)
 {
 	u64 *spte;
 	int young = 0;
@@ -1312,7 +1314,7 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 	if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
 		for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
 			if (is_shadow_present_pte(pt[i]))
-				rmap_remove(kvm, &pt[i]);
+				rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_4k);
 			pt[i] = shadow_trap_nonpresent_pte;
 		}
 		return;
@@ -1328,7 +1330,7 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 							   &pt[i]);
 			} else {
 				--kvm->stat.lpages;
-				rmap_remove(kvm, &pt[i]);
+				rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_2M);
 			}
 		}
 		pt[i] = shadow_trap_nonpresent_pte;
@@ -1798,7 +1800,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 		} else if (pfn != spte_to_pfn(*shadow_pte)) {
 			pgprintk("hfn old %lx new %lx\n",
 				 spte_to_pfn(*shadow_pte), pfn);
-			rmap_remove(vcpu->kvm, shadow_pte);
+			rmap_remove(vcpu->kvm, shadow_pte, psize);
 		} else
 			was_rmapped = 1;
 	}
@@ -2320,9 +2322,11 @@ static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
 
 	pte = *spte;
 	if (is_shadow_present_pte(pte)) {
-		if (sp->role.level == PT_PAGE_TABLE_LEVEL ||
-		    is_large_pte(pte))
-			rmap_remove(vcpu->kvm, spte);
+		if (sp->role.level == PT_PAGE_TABLE_LEVEL)
+			rmap_remove(vcpu->kvm, spte, KVM_PAGE_SIZE_4k);
+		else if (is_large_pte(pte) &&
+			 sp->role.level == PT_DIRECTORY_LEVEL)
+			rmap_remove(vcpu->kvm, spte, KVM_PAGE_SIZE_2M);
 		else {
 			child = page_header(pte & PT64_BASE_ADDR_MASK);
 			mmu_page_remove_parent_pte(child, spte);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 9fbd049..6704ec7 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -307,7 +307,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			continue;
 
 		if (is_large_pte(*sptep)) {
-			rmap_remove(vcpu->kvm, sptep);
+			rmap_remove(vcpu->kvm, sptep, KVM_PAGE_SIZE_2M);
 			set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
 			kvm_flush_remote_tlbs(vcpu->kvm);
 		}
@@ -459,12 +459,16 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 		if (level == PT_PAGE_TABLE_LEVEL ||
 		    ((level == PT_DIRECTORY_LEVEL) && is_large_pte(*sptep))) {
 			struct kvm_mmu_page *sp = page_header(__pa(sptep));
+			enum kvm_page_size psize = KVM_PAGE_SIZE_4k;
+
+			if (level == PT_DIRECTORY_LEVEL)
+				psize = KVM_PAGE_SIZE_2M;
 
 			pte_gpa = (sp->gfn << PAGE_SHIFT);
 			pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
 
 			if (is_shadow_present_pte(*sptep)) {
-				rmap_remove(vcpu->kvm, sptep);
+				rmap_remove(vcpu->kvm, sptep, psize);
 				if (is_large_pte(*sptep))
 					--vcpu->kvm->stat.lpages;
 				need_flush = 1;
@@ -575,7 +579,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		    !(gpte & PT_ACCESSED_MASK)) {
 			u64 nonpresent;
 
-			rmap_remove(vcpu->kvm, &sp->spt[i]);
+			rmap_remove(vcpu->kvm, &sp->spt[i], KVM_PAGE_SIZE_4k);
 			if (is_present_pte(gpte))
 				nonpresent = shadow_trap_nonpresent_pte;
 			else
-- 
1.5.6.4




* [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting
  2009-03-27 14:31 [PATCH 0/7] Support for GB pages in KVM Joerg Roedel
                   ` (2 preceding siblings ...)
  2009-03-27 14:31 ` [PATCH 3/7] kvm mmu: add page size parameter to rmap_remove Joerg Roedel
@ 2009-03-27 14:31 ` Joerg Roedel
  2009-03-29 11:45   ` Avi Kivity
  2009-03-29 13:26   ` Avi Kivity
  2009-03-27 14:31 ` [PATCH 5/7] kvm mmu: add support for 1GB pages to direct mapping paths Joerg Roedel
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 27+ messages in thread
From: Joerg Roedel @ 2009-03-27 14:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

This patch adds the data structures needed to track write protections
within a second, larger huge page size (1GB regions).
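
As a worked example (hypothetical slot layout): with KVM_PAGES_PER_LHPAGE
= 262144 (1GB / 4KB = 0x40000), a slot with base_gfn = 0x80200 and
npages = 0x140000 touches six 1GB-aligned regions, so six hpage_info
entries are allocated (0x140000 / 0x40000 = 5, plus one because base_gfn
is not 1GB aligned). Since neither the start nor the end of the slot is
1GB aligned, entry 0 and entry 5 start out with write_count = 1, which
keeps KVM from ever mapping those partially covered regions with a 1GB
page.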

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |   10 ++++++++++
 arch/x86/kvm/mmu.c              |   33 +++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h        |    2 +-
 virt/kvm/kvm_main.c             |   26 ++++++++++++++++++++++++++
 4 files changed, 70 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f268f99..a1df2a3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -58,7 +58,16 @@
 
 #define KVM_PAGES_PER_2M_PAGE (KVM_2M_PAGE_SIZE / PAGE_SIZE)
 
+#define KVM_1G_PAGE_SHIFT 30
+#define KVM_1G_PAGE_SIZE (1UL << KVM_1G_PAGE_SHIFT)
+#define KVM_1G_PAGE_MASK (~(KVM_1G_PAGE_SIZE - 1))
+
+#define KVM_PAGES_PER_1G_PAGE (KVM_1G_PAGE_SIZE / PAGE_SIZE)
+
+/* Huge Page */
 #define KVM_PAGES_PER_HPAGE	KVM_PAGES_PER_2M_PAGE
+/* Large Huge Page ;-) */
+#define KVM_PAGES_PER_LHPAGE	KVM_PAGES_PER_1G_PAGE
 
 #define DE_VECTOR 0
 #define DB_VECTOR 1
@@ -268,6 +277,7 @@ struct kvm_mmu {
 enum kvm_page_size {
 	KVM_PAGE_SIZE_4k = (1 << 12),
 	KVM_PAGE_SIZE_2M = (1 << 21),
+	KVM_PAGE_SIZE_1G = (1 << 30),
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9936b45..7d4162d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -390,6 +390,15 @@ static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
 	return &slot->lpage_info[idx].write_count;
 }
 
+static int *slot_hugepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
+{
+	unsigned long idx;
+
+	idx = (gfn / KVM_PAGES_PER_1G_PAGE) -
+	      (slot->base_gfn / KVM_PAGES_PER_1G_PAGE);
+	return &slot->hpage_info[idx].write_count;
+}
+
 static void account_shadowed(struct kvm *kvm, gfn_t gfn)
 {
 	int *write_count;
@@ -398,6 +407,10 @@ static void account_shadowed(struct kvm *kvm, gfn_t gfn)
 	write_count = slot_largepage_idx(gfn,
 					 gfn_to_memslot_unaliased(kvm, gfn));
 	*write_count += 1;
+
+	write_count = slot_hugepage_idx(gfn,
+					gfn_to_memslot_unaliased(kvm, gfn));
+	*write_count += 1;
 }
 
 static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
@@ -409,6 +422,11 @@ static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
 					 gfn_to_memslot_unaliased(kvm, gfn));
 	*write_count -= 1;
 	WARN_ON(*write_count < 0);
+
+	write_count = slot_hugepage_idx(gfn,
+					gfn_to_memslot_unaliased(kvm, gfn));
+	*write_count -= 1;
+	WARN_ON(*write_count < 0);
 }
 
 static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
@@ -426,6 +444,21 @@ static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
 	return 1;
 }
 
+static int has_wrprotected_largepage(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	int *hugepage_idx;
+
+	gfn = unalias_gfn(kvm, gfn);
+	slot = gfn_to_memslot_unaliased(kvm, gfn);
+	if (slot) {
+		hugepage_idx = slot_hugepage_idx(gfn, slot);
+		return *hugepage_idx;
+	}
+
+	return 1;
+}
+
 static enum kvm_page_size host_page_size(struct kvm *kvm, gfn_t gfn)
 {
 	struct vm_area_struct *vma;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 095ebb6..2f05d48 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -103,7 +103,7 @@ struct kvm_memory_slot {
 	struct {
 		unsigned long rmap_pde;
 		int write_count;
-	} *lpage_info;
+	} *lpage_info, *hpage_info;
 	unsigned long userspace_addr;
 	int user_alloc;
 };
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8aa3b95..c4842f4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1001,10 +1001,14 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 	if (!dont || free->lpage_info != dont->lpage_info)
 		vfree(free->lpage_info);
 
+	if (!dont || free->hpage_info != dont->hpage_info)
+		vfree(free->hpage_info);
+
 	free->npages = 0;
 	free->dirty_bitmap = NULL;
 	free->rmap = NULL;
 	free->lpage_info = NULL;
+	free->hpage_info = NULL;
 }
 
 void kvm_free_physmem(struct kvm *kvm)
@@ -1170,6 +1174,28 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			new.lpage_info[largepages-1].write_count = 1;
 	}
 
+#ifdef KVM_PAGES_PER_LHPAGE
+	if (npages && !new.hpage_info) {
+		int hugepages = npages / KVM_PAGES_PER_LHPAGE;
+		if (npages % KVM_PAGES_PER_LHPAGE)
+			hugepages++;
+		if (base_gfn % KVM_PAGES_PER_LHPAGE)
+			hugepages++;
+
+		new.hpage_info = vmalloc(hugepages * sizeof(*new.hpage_info));
+
+		if (!new.hpage_info)
+			goto out_free;
+
+		memset(new.hpage_info, 0, hugepages * sizeof(*new.hpage_info));
+
+		if (base_gfn % KVM_PAGES_PER_LHPAGE)
+			new.hpage_info[0].write_count = 1;
+		if ((base_gfn+npages) % KVM_PAGES_PER_LHPAGE)
+			new.hpage_info[hugepages-1].write_count = 1;
+	}
+#endif
+
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
 		unsigned dirty_bytes = ALIGN(npages, BITS_PER_LONG) / 8;
-- 
1.5.6.4




* [PATCH 5/7] kvm mmu: add support for 1GB pages to direct mapping paths
  2009-03-27 14:31 [PATCH 0/7] Support for GB pages in KVM Joerg Roedel
                   ` (3 preceding siblings ...)
  2009-03-27 14:31 ` [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting Joerg Roedel
@ 2009-03-27 14:31 ` Joerg Roedel
  2009-03-29 11:49   ` Avi Kivity
  2009-03-27 14:31 ` [PATCH 6/7] kvm mmu: enabling 1GB pages by extending backing_size function Joerg Roedel
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2009-03-27 14:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

This patch makes the MMU path for TDP aware of 1GB pages.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c         |   56 +++++++++++++++++++++++++++++++++++--------
 arch/x86/kvm/paging_tmpl.h |   14 +++++++++++
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7d4162d..3f5e20b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -116,6 +116,11 @@ module_param(oos_shadow, bool, 0644);
 #define PT64_DIR_BASE_ADDR_MASK \
 	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
 
+#define PT64_MID_BASE_ADDR_MASK (PT64_BASE_ADDR_MASK & \
+		~((1ULL << (PAGE_SHIFT + (2 * PT64_LEVEL_BITS))) - 1))
+#define PT64_MID_GFN_DELTA_MASK (PT64_BASE_ADDR_MASK & (((1ULL << \
+				(2 * PT64_LEVEL_BITS)) - 1) << PAGE_SHIFT))
+
 #define PT32_BASE_ADDR_MASK PAGE_MASK
 #define PT32_DIR_BASE_ADDR_MASK \
 	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
@@ -128,6 +133,7 @@ module_param(oos_shadow, bool, 0644);
 #define PFERR_USER_MASK (1U << 2)
 #define PFERR_FETCH_MASK (1U << 4)
 
+#define PT_MIDDLE_LEVEL 3
 #define PT_DIRECTORY_LEVEL 2
 #define PT_PAGE_TABLE_LEVEL 1
 
@@ -507,16 +513,29 @@ static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
 				  enum kvm_page_size psize)
 {
 	struct kvm_memory_slot *slot;
-	unsigned long idx;
+	unsigned long idx, *ret;
 
 	slot = gfn_to_memslot(kvm, gfn);
-	if (psize == KVM_PAGE_SIZE_4k)
-		return &slot->rmap[gfn - slot->base_gfn];
 
-	idx = (gfn / KVM_PAGES_PER_2M_PAGE) -
-	      (slot->base_gfn / KVM_PAGES_PER_2M_PAGE);
+	switch (psize) {
+	case KVM_PAGE_SIZE_4k:
+		ret = &slot->rmap[gfn - slot->base_gfn];
+		break;
+	case KVM_PAGE_SIZE_2M:
+		idx = (gfn / KVM_PAGES_PER_2M_PAGE) -
+		      (slot->base_gfn / KVM_PAGES_PER_2M_PAGE);
+		ret = &slot->lpage_info[idx].rmap_pde;
+		break;
+	case KVM_PAGE_SIZE_1G:
+		idx = (gfn / KVM_PAGES_PER_1G_PAGE) -
+		      (slot->base_gfn / KVM_PAGES_PER_1G_PAGE);
+		ret = &slot->hpage_info[idx].rmap_pde;
+		break;
+	default:
+		BUG();
+	}
 
-	return &slot->lpage_info[idx].rmap_pde;
+	return ret;
 }
 
 /*
@@ -1363,7 +1382,10 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 							   &pt[i]);
 			} else {
 				--kvm->stat.lpages;
-				rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_2M);
+				if (sp->role.level == PT_DIRECTORY_LEVEL)
+					rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_2M);
+				else
+					rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_1G);
 			}
 		}
 		pt[i] = shadow_trap_nonpresent_pte;
@@ -1769,8 +1791,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 	if ((pte_access & ACC_WRITE_MASK)
 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
 
-		if (psize > KVM_PAGE_SIZE_4k &&
-		    has_wrprotected_page(vcpu->kvm, gfn)) {
+		if ((psize == KVM_PAGE_SIZE_2M &&
+		     has_wrprotected_page(vcpu->kvm, gfn)) ||
+		    (psize == KVM_PAGE_SIZE_1G &&
+		     has_wrprotected_largepage(vcpu->kvm, gfn))) {
 			ret = 1;
 			spte = shadow_trap_nonpresent_pte;
 			goto set_pte;
@@ -1884,7 +1908,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == PT_PAGE_TABLE_LEVEL
 		    || (psize == KVM_PAGE_SIZE_2M &&
-			iterator.level == PT_DIRECTORY_LEVEL)) {
+			iterator.level == PT_DIRECTORY_LEVEL)
+		    || (psize == KVM_PAGE_SIZE_1G &&
+			iterator.level == PT_MIDDLE_LEVEL)) {
 			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
 				     0, write, 1, &pt_write,
 				     psize, 0, gfn, pfn, false);
@@ -1919,8 +1945,14 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 	unsigned long mmu_seq;
 	enum kvm_page_size psize = backing_size(vcpu, gfn);
 
-	if (psize == KVM_PAGE_SIZE_2M)
+	if (psize >= KVM_PAGE_SIZE_2M) {
+		/*
+		 * nonpaging mode uses pae page tables - so we
+		 * can't use gbpages here - take care of this
+		 */
 		gfn &= ~(KVM_PAGES_PER_2M_PAGE-1);
+		psize = KVM_PAGE_SIZE_2M;
+	}
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
@@ -2123,6 +2155,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 	psize = backing_size(vcpu, gfn);
 	if (psize == KVM_PAGE_SIZE_2M)
 		gfn &= ~(KVM_PAGES_PER_2M_PAGE-1);
+	else if (psize == KVM_PAGE_SIZE_1G)
+		gfn &= ~(KVM_PAGES_PER_1G_PAGE-1);
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 	pfn = gfn_to_pfn(vcpu->kvm, gfn);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6704ec7..67d6bfb 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -55,6 +55,7 @@
 
 #define gpte_to_gfn FNAME(gpte_to_gfn)
 #define gpte_to_gfn_pde FNAME(gpte_to_gfn_pde)
+#define gpte_to_gfn_pmd FNAME(gpte_to_gfn_pmd)
 
 /*
  * The guest_walker structure emulates the behavior of the hardware page
@@ -81,6 +82,11 @@ static gfn_t gpte_to_gfn_pde(pt_element_t gpte)
 	return (gpte & PT_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
 }
 
+static gfn_t gpte_to_gfn_pmd(pt_element_t gpte)
+{
+	return (gpte & PT64_MID_BASE_ADDR_MASK) >> PAGE_SHIFT;
+}
+
 static bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
 			 gfn_t table_gfn, unsigned index,
 			 pt_element_t orig_pte, pt_element_t new_pte)
@@ -196,6 +202,14 @@ walk:
 			break;
 		}
 
+		if (walker->level == PT_MIDDLE_LEVEL &&
+		    (pte & PT_PAGE_SIZE_MASK) &&
+		    is_long_mode(vcpu)) {
+			walker->gfn = gpte_to_gfn_pmd(pte);
+			walker->gfn += (addr & PT64_MID_GFN_DELTA_MASK) >> PAGE_SHIFT;
+			break;
+		}
+
 		pt_access = pte_access;
 		--walker->level;
 	}
-- 
1.5.6.4




* [PATCH 6/7] kvm mmu: enabling 1GB pages by extending backing_size function
  2009-03-27 14:31 [PATCH 0/7] Support for GB pages in KVM Joerg Roedel
                   ` (4 preceding siblings ...)
  2009-03-27 14:31 ` [PATCH 5/7] kvm mmu: add support for 1GB pages to direct mapping paths Joerg Roedel
@ 2009-03-27 14:31 ` Joerg Roedel
  2009-03-29 11:51   ` Avi Kivity
  2009-03-27 14:31 ` [PATCH 7/7] kvm x86: report 1GB page support to userspace Joerg Roedel
  2009-03-28 21:40 ` [PATCH 0/7] Support for GB pages in KVM Marcelo Tosatti
  7 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2009-03-27 14:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

This patch enables 1GB pages in KVM by extending backing_size() to
report them.
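
The resulting decision ladder, shown here as an equivalent sketch with
explicit fall-through comments (the patch below open-codes the same
logic inside backing_size(); mapping_size is a hypothetical name):

static enum kvm_page_size mapping_size(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);

	if (slot && slot->dirty_bitmap)		/* dirty logging: 4k only */
		return KVM_PAGE_SIZE_4k;

	switch (host_page_size(vcpu->kvm, gfn)) {
	case KVM_PAGE_SIZE_1G:
		if (!has_wrprotected_largepage(vcpu->kvm, gfn))
			return KVM_PAGE_SIZE_1G;
		/* fall through and try the next smaller size */
	case KVM_PAGE_SIZE_2M:
		if (!has_wrprotected_page(vcpu->kvm, gfn))
			return KVM_PAGE_SIZE_2M;
		/* fall through */
	default:
		return KVM_PAGE_SIZE_4k;
	}
}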

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c |   30 ++++++++++++++++++++++--------
 1 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3f5e20b..471e5d0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -479,7 +479,9 @@ static enum kvm_page_size host_page_size(struct kvm *kvm, gfn_t gfn)
 	vma = find_vma(current->mm, addr);
 	if (vma) {
 		size = vma_kernel_pagesize(vma);
-		if (size >= KVM_PAGE_SIZE_2M)
+		if (size == KVM_PAGE_SIZE_1G)
+			ret = KVM_PAGE_SIZE_1G;
+		else if (size >= KVM_PAGE_SIZE_2M)
 			ret = KVM_PAGE_SIZE_2M;
 	}
 	up_read(&current->mm->mmap_sem);
@@ -490,18 +492,30 @@ static enum kvm_page_size host_page_size(struct kvm *kvm, gfn_t gfn)
 static enum kvm_page_size backing_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	struct kvm_memory_slot *slot;
-
-	if (has_wrprotected_page(vcpu->kvm, gfn))
-		return KVM_PAGE_SIZE_4k;
-
-	if (host_page_size(vcpu->kvm, gfn) < KVM_PAGE_SIZE_2M)
-		return KVM_PAGE_SIZE_4k;
+	enum kvm_page_size host_size, ret;
 
 	slot = gfn_to_memslot(vcpu->kvm, gfn);
 	if (slot && slot->dirty_bitmap)
 		return KVM_PAGE_SIZE_4k;
 
-	return KVM_PAGE_SIZE_2M;
+	host_size = host_page_size(vcpu->kvm, gfn);
+
+	switch (host_size) {
+	case KVM_PAGE_SIZE_1G:
+		if (!has_wrprotected_largepage(vcpu->kvm, gfn)) {
+			ret = KVM_PAGE_SIZE_1G;
+			break;
+		}
+	case KVM_PAGE_SIZE_2M:
+		if (!has_wrprotected_page(vcpu->kvm, gfn)) {
+			ret = KVM_PAGE_SIZE_2M;
+			break;
+		}
+	default:
+		ret = KVM_PAGE_SIZE_4k;
+	}
+
+	return ret;
 }
 
 /*
-- 
1.5.6.4




* [PATCH 7/7] kvm x86: report 1GB page support to userspace
  2009-03-27 14:31 [PATCH 0/7] Support for GB pages in KVM Joerg Roedel
                   ` (5 preceding siblings ...)
  2009-03-27 14:31 ` [PATCH 6/7] kvm mmu: enabling 1GB pages by extending backing_size function Joerg Roedel
@ 2009-03-27 14:31 ` Joerg Roedel
  2009-03-29 11:54   ` Avi Kivity
  2009-03-28 21:40 ` [PATCH 0/7] Support for GB pages in KVM Marcelo Tosatti
  7 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2009-03-27 14:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

If userspace knows that the kernel side supports 1GB pages, it can enable
the corresponding CPUID bit so that guests actually use GB pages.
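
From the userspace side, the check would look roughly like this (a
sketch using the usual KVM_CHECK_EXTENSION pattern; error handling
omitted):

	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int kvm_has_1gb_pages(void)
	{
		int kvm_fd = open("/dev/kvm", O_RDWR);
		int r = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_1GB_PAGES);

		close(kvm_fd);
		return r > 0;
	}

Only when this returns nonzero should userspace expose the guest's
PDPE1GB CPUID bit (leaf 0x80000001, EDX bit 26).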

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    2 ++
 arch/x86/kvm/svm.c              |    7 +++++++
 arch/x86/kvm/vmx.c              |    7 +++++++
 arch/x86/kvm/x86.c              |    6 +++++-
 include/linux/kvm.h             |    1 +
 5 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a1df2a3..6593198 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -542,6 +542,8 @@ struct kvm_x86_ops {
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
 	int (*get_mt_mask_shift)(void);
+
+	bool (*gb_page_enable)(void);
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1fcbc17..d140686 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2604,6 +2604,11 @@ static int svm_get_mt_mask_shift(void)
 	return 0;
 }
 
+static bool svm_gb_page_enable(void)
+{
+	return npt_enabled;
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -2661,6 +2666,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
 	.get_mt_mask_shift = svm_get_mt_mask_shift,
+
+	.gb_page_enable = svm_gb_page_enable,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 37ae13d..e54af3f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3647,6 +3647,11 @@ static int vmx_get_mt_mask_shift(void)
 	return VMX_EPT_MT_EPTE_SHIFT;
 }
 
+static bool vmx_gb_page_enable(void)
+{
+	return false;
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -3702,6 +3707,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
 	.get_mt_mask_shift = vmx_get_mt_mask_shift,
+
+	.gb_page_enable = vmx_gb_page_enable,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ae4918c..c94f231 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,7 +1009,7 @@ out:
 
 int kvm_dev_ioctl_check_extension(long ext)
 {
-	int r;
+	int r = 0;
 
 	switch (ext) {
 	case KVM_CAP_IRQCHIP:
@@ -1027,6 +1027,10 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_ASSIGN_DEV_IRQ:
 		r = 1;
 		break;
+	case KVM_CAP_1GB_PAGES:
+		if (kvm_x86_ops->gb_page_enable())
+			r = 1;
+		break;
 	case KVM_CAP_COALESCED_MMIO:
 		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
 		break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ee755e2..e79eb26 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -413,6 +413,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_DEVICE_MSIX 28
 #endif
 #define KVM_CAP_ASSIGN_DEV_IRQ 29
+#define KVM_CAP_1GB_PAGES 30
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.5.6.4




* Re: [PATCH 0/7] Support for GB pages in KVM
  2009-03-27 14:31 [PATCH 0/7] Support for GB pages in KVM Joerg Roedel
                   ` (6 preceding siblings ...)
  2009-03-27 14:31 ` [PATCH 7/7] kvm x86: report 1GB page support to userspace Joerg Roedel
@ 2009-03-28 21:40 ` Marcelo Tosatti
  2009-03-28 21:49   ` Joerg Roedel
  2009-03-29 12:01   ` Avi Kivity
  7 siblings, 2 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2009-03-28 21:40 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, kvm, linux-kernel

On Fri, Mar 27, 2009 at 03:31:52PM +0100, Joerg Roedel wrote:
> Hi,
> 
> this patchset extends the KVM MMU implementation to support 1GB pages as
> supported by AMD Family 10h processors. These patches enable support for
> 1GB pages with Nested Paging. Support for these pages in the shadow
> paging code was also developed but is not yet stable. The patch for
> shadow-paging support is not included in this series and will be sent
> out separately.

Looks generally sane. I'm not sure it's even worthwhile to support
GB pages with the softmmu, because the chance of finding an area without
shadowed (write-protected) pages is much smaller than with 2MB pages.

Have any numbers to share?



* Re: [PATCH 0/7] Support for GB pages in KVM
  2009-03-28 21:40 ` [PATCH 0/7] Support for GB pages in KVM Marcelo Tosatti
@ 2009-03-28 21:49   ` Joerg Roedel
  2009-03-29 12:03     ` Avi Kivity
  2009-03-29 12:01   ` Avi Kivity
  1 sibling, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2009-03-28 21:49 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Joerg Roedel, Avi Kivity, kvm, linux-kernel

On Sat, Mar 28, 2009 at 06:40:08PM -0300, Marcelo Tosatti wrote:
> On Fri, Mar 27, 2009 at 03:31:52PM +0100, Joerg Roedel wrote:
> > Hi,
> > 
> > this patchset extends the KVM MMU implementation to support 1GB pages as
> > supported by AMD Family 10h processors. These patches enable support for
> > 1GB pages with Nested Paging. Support for these pages in the shadow
> > paging code was also developed but is not yet stable. The patch for
> > shadow-paging support is not included in this series and will be sent
> > out separately.
> 
> Looks generally sane. I'm not sure it's even worthwhile to support
> GB pages with the softmmu, because the chance of finding an area without
> shadowed (write-protected) pages is much smaller than with 2MB pages.

Thanks for your review.

The idea behind GB pages in the softmmu code was to provide GB pages to
the guest even if the hardware does not support them. This would work
better with live migration (the only case where we wouldn't have GB
pages then would be VMX with EPT enabled).

> Have any numbers to share?

No numbers I fully trust yet. I measured a 32% improvement in kernbench
using nested pages backed with GB pages. I will do some more measurements
and share more solid numbers.

Joerg



* Re: [PATCH 2/7] kvm mmu: infrastructure changes for multiple huge page support
  2009-03-27 14:31 ` [PATCH 2/7] kvm mmu: infrastructure changes for multiple huge page support Joerg Roedel
@ 2009-03-29 11:38   ` Avi Kivity
  0 siblings, 0 replies; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 11:38 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

Joerg Roedel wrote:
> This patch includes most of the necessary changes to the KVM SoftMMU for
> supporting more than one huge page size. The changes in this patch
> include:
>
>   * introduce 'enum kvm_page_size', which represents the page size used
>     for a mapping
>   

How about the concept of page_level instead, which returns the level of
a page in terms of the paging hierarchy? That would be easier to compare
against in the various walkers.

I've been thinking for a while about making levels zero-based, so 0 would
be a 4K page, 1 a 2M page, and 2 a 1GB page.
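
For illustration, something along these lines (not an actual patch):

	/* level: 0 = 4K, 1 = 2M, 2 = 1G */
	#define KVM_HPAGE_SHIFT(level)		(PAGE_SHIFT + (level) * 9)
	#define KVM_HPAGE_SIZE(level)		(1UL << KVM_HPAGE_SHIFT(level))
	#define KVM_PAGES_PER_HPAGE(level)	(KVM_HPAGE_SIZE(level) / PAGE_SIZE)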




-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting
  2009-03-27 14:31 ` [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting Joerg Roedel
@ 2009-03-29 11:45   ` Avi Kivity
  2009-03-29 13:03     ` Joerg Roedel
  2009-03-29 13:26   ` Avi Kivity
  1 sibling, 1 reply; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 11:45 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

Joerg Roedel wrote:
> This patch adds the data structures needed to track write protections
> within a second, larger huge page size (1GB regions).
>
>  
>  struct kvm_vcpu_arch {
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9936b45..7d4162d 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -390,6 +390,15 @@ static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
>  	return &slot->lpage_info[idx].write_count;
>  }
>  
> +static int *slot_hugepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
> +{
> +	unsigned long idx;
> +
> +	idx = (gfn / KVM_PAGES_PER_1G_PAGE) -
> +	      (slot->base_gfn / KVM_PAGES_PER_1G_PAGE);
> +	return &slot->hpage_info[idx].write_count;
> +}
>   

A page level argument would remove the need for this duplication, as 
well as all the constants.
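
Something like this (sketch only, assuming lpage_info becomes a
per-level array as suggested further down):

	static int *slot_page_idx(gfn_t gfn, struct kvm_memory_slot *slot,
				  int level)
	{
		unsigned long pages = KVM_PAGES_PER_HPAGE(level);
		unsigned long idx = (gfn / pages) - (slot->base_gfn / pages);

		return &slot->lpage_info[level][idx].write_count;
	}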

>  
> +static int has_wrprotected_largepage(struct kvm *kvm, gfn_t gfn)
> +{
> +	struct kvm_memory_slot *slot;
> +	int *hugepage_idx;
> +
> +	gfn = unalias_gfn(kvm, gfn);
> +	slot = gfn_to_memslot_unaliased(kvm, gfn);
> +	if (slot) {
> +		hugepage_idx = slot_hugepage_idx(gfn, slot);
>   

slot_largepage_idx() here?

I don't think we ever write protect large pages, so why is this needed?

> +		return *hugepage_idx;
> +	}
> +
> +	return 1;
> +}
>   


> +
>  static enum kvm_page_size host_page_size(struct kvm *kvm, gfn_t gfn)
>  {
>  	struct vm_area_struct *vma;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 095ebb6..2f05d48 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -103,7 +103,7 @@ struct kvm_memory_slot {
>  	struct {
>  		unsigned long rmap_pde;
>  		int write_count;
> -	} *lpage_info;
> +	} *lpage_info, *hpage_info;
>   

} * lpage_info[KVM_NR_PAGE_LEVELS];

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8aa3b95..c4842f4 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1001,10 +1001,14 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
>  	if (!dont || free->lpage_info != dont->lpage_info)
>  		vfree(free->lpage_info);
>  
> +	if (!dont || free->hpage_info != dont->hpage_info)
> +		vfree(free->hpage_info);
>   

loop

>  void kvm_free_physmem(struct kvm *kvm)
> @@ -1170,6 +1174,28 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  			new.lpage_info[largepages-1].write_count = 1;
>  	}
>  
> +#ifdef KVM_PAGES_PER_LHPAGE
> +	if (npages && !new.hpage_info) {
> +		int hugepages = npages / KVM_PAGES_PER_LHPAGE;
> +		if (npages % KVM_PAGES_PER_LHPAGE)
> +			hugepages++;
> +		if (base_gfn % KVM_PAGES_PER_LHPAGE)
> +			hugepages++;
> +
> +		new.hpage_info = vmalloc(hugepages * sizeof(*new.hpage_info));
> +
> +		if (!new.hpage_info)
> +			goto out_free;
> +
> +		memset(new.hpage_info, 0, hugepages * sizeof(*new.hpage_info));
> +
> +		if (base_gfn % KVM_PAGES_PER_LHPAGE)
> +			new.hpage_info[0].write_count = 1;
> +		if ((base_gfn+npages) % KVM_PAGES_PER_LHPAGE)
> +			new.hpage_info[hugepages-1].write_count = 1;
> +	}
> +#endif
>   

Loop, KVM_NR_PAGE_LEVELS defined per arch.
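
I.e. roughly (sketch only, mirroring the per-size blocks above, with
lpage_info as a per-level array):

	for (i = 1; i < KVM_NR_PAGE_LEVELS; ++i) {
		unsigned long per = KVM_PAGES_PER_HPAGE(i);
		int n = npages / per;

		if (npages % per)
			n++;
		if (base_gfn % per)
			n++;

		new.lpage_info[i] = vmalloc(n * sizeof(*new.lpage_info[i]));
		if (!new.lpage_info[i])
			goto out_free;
		memset(new.lpage_info[i], 0, n * sizeof(*new.lpage_info[i]));

		if (base_gfn % per)
			new.lpage_info[i][0].write_count = 1;
		if ((base_gfn + npages) % per)
			new.lpage_info[i][n - 1].write_count = 1;
	}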


-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 5/7] kvm mmu: add support for 1GB pages to direct mapping paths
  2009-03-27 14:31 ` [PATCH 5/7] kvm mmu: add support for 1GB pages to direct mapping paths Joerg Roedel
@ 2009-03-29 11:49   ` Avi Kivity
  0 siblings, 0 replies; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 11:49 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

Joerg Roedel wrote:
> This patch makes the MMU path for TDP aware of 1GB pages.
>
>  
> +#define PT64_MID_BASE_ADDR_MASK (PT64_BASE_ADDR_MASK & \
> +		~((1ULL << (PAGE_SHIFT + (2 * PT64_LEVEL_BITS))) - 1))
> +#define PT64_MID_GFN_DELTA_MASK (PT64_BASE_ADDR_MASK & (((1ULL << \
> +				(2 * PT64_LEVEL_BITS)) - 1) << PAGE_SHIFT))
> +
>  #define PT32_BASE_ADDR_MASK PAGE_MASK
>  #define PT32_DIR_BASE_ADDR_MASK \
>  	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
> @@ -128,6 +133,7 @@ module_param(oos_shadow, bool, 0644);
>  #define PFERR_USER_MASK (1U << 2)
>  #define PFERR_FETCH_MASK (1U << 4)
>  
> +#define PT_MIDDLE_LEVEL 3
>   

I prefer the architectural names to the Linux names (since we're talking 
about the guest), so PDPT here (even though the Linux names make a bit 
more sense).

>  #define PT_DIRECTORY_LEVEL 2
>  #define PT_PAGE_TABLE_LEVEL 1
>  
> @@ -507,16 +513,29 @@ static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
>  				  enum kvm_page_size psize)
>  {
>  	struct kvm_memory_slot *slot;
> -	unsigned long idx;
> +	unsigned long idx, *ret;
>  
>  	slot = gfn_to_memslot(kvm, gfn);
> -	if (psize == KVM_PAGE_SIZE_4k)
> -		return &slot->rmap[gfn - slot->base_gfn];
>  
> -	idx = (gfn / KVM_PAGES_PER_2M_PAGE) -
> -	      (slot->base_gfn / KVM_PAGES_PER_2M_PAGE);
> +	switch (psize) {
> +	case KVM_PAGE_SIZE_4k:
> +		ret = &slot->rmap[gfn - slot->base_gfn];
> +		break;
> +	case KVM_PAGE_SIZE_2M:
> +		idx = (gfn / KVM_PAGES_PER_2M_PAGE) -
> +		      (slot->base_gfn / KVM_PAGES_PER_2M_PAGE);
> +		ret = &slot->lpage_info[idx].rmap_pde;
> +		break;
> +	case KVM_PAGE_SIZE_1G:
> +		idx = (gfn / KVM_PAGES_PER_1G_PAGE) -
> +		      (slot->base_gfn / KVM_PAGES_PER_1G_PAGE);
> +		ret = &slot->hpage_info[idx].rmap_pde;
> +		break;
> +	default:
> +		BUG();
> +	}
>   

Ah, page_level would really make sense here.

>  
> -	return &slot->lpage_info[idx].rmap_pde;
> +	return ret;
>  }
>  
>  /*
> @@ -1363,7 +1382,10 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
>  							   &pt[i]);
>  			} else {
>  				--kvm->stat.lpages;
> -				rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_2M);
> +				if (sp->role.level == PT_DIRECTORY_LEVEL)
> +					rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_2M);
> +				else
> +					rmap_remove(kvm, &pt[i], KVM_PAGE_SIZE_1G);
>  			}
>   

And here.

>  		}
>  		pt[i] = shadow_trap_nonpresent_pte;
> @@ -1769,8 +1791,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>  	if ((pte_access & ACC_WRITE_MASK)
>  	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
>  
> -		if (psize > KVM_PAGE_SIZE_4k &&
> -		    has_wrprotected_page(vcpu->kvm, gfn)) {
> +		if ((psize == KVM_PAGE_SIZE_2M &&
> +		     has_wrprotected_page(vcpu->kvm, gfn)) ||
> +		    (psize == KVM_PAGE_SIZE_1G &&
> +		     has_wrprotected_largepage(vcpu->kvm, gfn))) {
>  			ret = 1;
>   

And here.  I'm in complete agreement with myself here.

>  			spte = shadow_trap_nonpresent_pte;
>  			goto set_pte;
> @@ -1884,7 +1908,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
>  	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>  		if (iterator.level == PT_PAGE_TABLE_LEVEL
>  		    || (psize == KVM_PAGE_SIZE_2M &&
> -			iterator.level == PT_DIRECTORY_LEVEL)) {
> +			iterator.level == PT_DIRECTORY_LEVEL)
> +		    || (psize == KVM_PAGE_SIZE_1G &&
> +			iterator.level == PT_MIDDLE_LEVEL)) {
>  			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
>  				     0, write, 1, &pt_write,
>  				     psize, 0, gfn, pfn, false);
> @@ -1919,8 +1945,14 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
>  	unsigned long mmu_seq;
>  	enum kvm_page_size psize = backing_size(vcpu, gfn);
>  
> -	if (psize == KVM_PAGE_SIZE_2M)
> +	if (psize >= KVM_PAGE_SIZE_2M) {
> +		/*
> +		 * nonpaging mode uses pae page tables - so we
> +		 * can't use gbpages here - take care of this
> +		 */
>  		gfn &= ~(KVM_PAGES_PER_2M_PAGE-1);
> +		psize = KVM_PAGE_SIZE_2M;
> +	}
>  
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
> @@ -2123,6 +2155,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
>  	psize = backing_size(vcpu, gfn);
>  	if (psize == KVM_PAGE_SIZE_2M)
>  		gfn &= ~(KVM_PAGES_PER_2M_PAGE-1);
> +	else if (psize == KVM_PAGE_SIZE_1G)
> +		gfn &= ~(KVM_PAGES_PER_1G_PAGE-1);
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
>  	pfn = gfn_to_pfn(vcpu->kvm, gfn);
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 6704ec7..67d6bfb 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -55,6 +55,7 @@
>  
>  #define gpte_to_gfn FNAME(gpte_to_gfn)
>  #define gpte_to_gfn_pde FNAME(gpte_to_gfn_pde)
> +#define gpte_to_gfn_pmd FNAME(gpte_to_gfn_pmd)
>   

gpte_to_gfn(gpte, level)?
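
Roughly (a sketch; PT_LVL_ADDR_MASK(level) is a hypothetical mask keeping
the address bits that are valid for a leaf at the given level):

	static gfn_t FNAME(gpte_to_gfn_lvl)(pt_element_t gpte, int level)
	{
		/* mask off the offset bits below the leaf, then shift to a gfn */
		return (gpte & PT_LVL_ADDR_MASK(level)) >> PAGE_SHIFT;
	}

with gpte_to_gfn() kept as a wrapper for the PT_PAGE_TABLE_LEVEL case.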


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/7] kvm mmu: enabling 1GB pages by extending backing_size function
  2009-03-27 14:31 ` [PATCH 6/7] kvm mmu: enabling 1GB pages by extending backing_size function Joerg Roedel
@ 2009-03-29 11:51   ` Avi Kivity
  0 siblings, 0 replies; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 11:51 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

Joerg Roedel wrote:
> This patch enables support for 1GB pages in KVM by implementing
> the support in backing_size().
>
> @@ -490,18 +492,30 @@ static enum kvm_page_size host_page_size(struct kvm *kvm, gfn_t gfn)
>  static enum kvm_page_size backing_size(struct kvm_vcpu *vcpu, gfn_t gfn)
>  {
>  	struct kvm_memory_slot *slot;
> -
> -	if (has_wrprotected_page(vcpu->kvm, gfn))
> -		return KVM_PAGE_SIZE_4k;
> -
> -	if (host_page_size(vcpu->kvm, gfn) < KVM_PAGE_SIZE_2M)
> -		return KVM_PAGE_SIZE_4k;
> +	enum kvm_page_size host_size, ret;
>  
>  	slot = gfn_to_memslot(vcpu->kvm, gfn);
>  	if (slot && slot->dirty_bitmap)
>  		return KVM_PAGE_SIZE_4k;
>  
> -	return KVM_PAGE_SIZE_2M;
> +	host_size = host_page_size(vcpu->kvm, gfn);
> +
> +	switch (host_size) {
> +	case KVM_PAGE_SIZE_1G:
> +		if (!has_wrprotected_largepage(vcpu->kvm, gfn)) {
> +			ret = KVM_PAGE_SIZE_1G;
> +			break;
> +		}
>   

What if there's a wrprotected_page in there?
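
What I'd expect is a walk over the levels, so that every mapping size also
rules out write protected pages below it. Just a sketch, with
has_wrprotected_page(kvm, gfn, level) being the level-parameterized variant
discussed for patch 4 and host_page_size() assumed to return a level:

	static int mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn)
	{
		struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
		int host_level, level;

		/* dirty logging works at 4k granularity only */
		if (slot && slot->dirty_bitmap)
			return PT_PAGE_TABLE_LEVEL;

		host_level = host_page_size(vcpu->kvm, gfn);
		if (host_level == PT_PAGE_TABLE_LEVEL)
			return host_level;

		/* stop at the first level that contains a write protected page */
		for (level = PT_DIRECTORY_LEVEL; level <= host_level; ++level)
			if (has_wrprotected_page(vcpu->kvm, gfn, level))
				break;

		return level - 1;
	}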

> +	case KVM_PAGE_SIZE_2M:
> +		if (!has_wrprotected_page(vcpu->kvm, gfn)) {
> +			ret = KVM_PAGE_SIZE_2M;
> +			break;
> +		}
> +	default:
> +		ret = KVM_PAGE_SIZE_4k;
> +	}
> +
> +	return ret;
>  }
>  
>  /*
>   


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace
  2009-03-27 14:31 ` [PATCH 7/7] kvm x86: report 1GB page support to userspace Joerg Roedel
@ 2009-03-29 11:54   ` Avi Kivity
  2009-03-29 12:45     ` Joerg Roedel
  0 siblings, 1 reply; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 11:54 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

Joerg Roedel wrote:
> If userspace knows that the kernel part supports 1GB pages it can enable
> the corresponding cpuid bit so that guests actually use GB pages.
>   

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a1df2a3..6593198 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -542,6 +542,8 @@ struct kvm_x86_ops {
>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>  	int (*get_tdp_level)(void);
>  	int (*get_mt_mask_shift)(void);
> +
> +	bool (*gb_page_enable)(void);
>  };
>   

Should enable unconditionally.  Of course we need to find the shadow bug 
first, maybe the has_wrprotected thingy.

> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index ee755e2..e79eb26 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -413,6 +413,7 @@ struct kvm_trace_rec {
>  #define KVM_CAP_DEVICE_MSIX 28
>  #endif
>  #define KVM_CAP_ASSIGN_DEV_IRQ 29
> +#define KVM_CAP_1GB_PAGES 30
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
>   

Need KVM_GET_SUPPORTED_CPUID2 support as well.
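
That is, the GB pages bit (CPUID 0x80000001, EDX bit 26) should only show
up in the supported cpuid list when the mmu can actually back it. Roughly
(a sketch, the exact plumbing in do_cpuid_ent() may differ):

	case 0x80000001:
		/* hide 1GB pages from the guest unless the mmu can map them */
		if (!kvm_x86_ops->gb_page_enable())
			entry->edx &= ~bit(X86_FEATURE_GBPAGES);
		break;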

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/7] Support for GB pages in KVM
  2009-03-28 21:40 ` [PATCH 0/7] Support for GB pages in KVM Marcelo Tosatti
  2009-03-28 21:49   ` Joerg Roedel
@ 2009-03-29 12:01   ` Avi Kivity
  1 sibling, 0 replies; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 12:01 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Joerg Roedel, kvm, linux-kernel

Marcelo Tosatti wrote:
> On Fri, Mar 27, 2009 at 03:31:52PM +0100, Joerg Roedel wrote:
>   
>> Hi,
>>
>> this patchset extends the KVM MMU implementation to support 1GB pages as
>> supported by AMD family 16 processors. These patches enable support for
>> 1 GB pages with Nested Paging. Support for these pages in the shadow
>> paging code was also developed but does not run stable yet. The patch
>> for shadow-paging support is not included in this series and will be
>> sent out seperatly.
>>     
>
> Looks generally sane. I'm not sure its even worthwhile to support
> GBpages with softmmu, because the chance of finding an area without
> shadowed (write protected) pages is much smaller than with 2MB pages.
>   

If the guest uses 1GB pages in userspace, then these pages would not 
have any write protected subpages.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/7] Support for GB pages in KVM
  2009-03-28 21:49   ` Joerg Roedel
@ 2009-03-29 12:03     ` Avi Kivity
  2009-03-29 12:47       ` Joerg Roedel
  0 siblings, 1 reply; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 12:03 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, Joerg Roedel, kvm, linux-kernel

Joerg Roedel wrote:
> On Sat, Mar 28, 2009 at 06:40:08PM -0300, Marcelo Tosatti wrote:
>   
>> On Fri, Mar 27, 2009 at 03:31:52PM +0100, Joerg Roedel wrote:
>>     
>>> Hi,
>>>
>>> this patchset extends the KVM MMU implementation to support 1GB pages as
>>> supported by AMD family 16 processors. These patches enable support for
>>> 1 GB pages with Nested Paging. Support for these pages in the shadow
>>> paging code was also developed but does not run stable yet. The patch
>>> for shadow-paging support is not included in this series and will be
>>> sent out seperatly.
>>>       
>> Looks generally sane. I'm not sure its even worthwhile to support
>> GBpages with softmmu, because the chance of finding an area without
>> shadowed (write protected) pages is much smaller than with 2MB pages.
>>     
>
> Thanks for your review.
>
> The idea behind GB pages in softmmu code was to provide GB pages to the
> guest even if hardware does not support it. This would work better with
> live migration (Only case where we wouldn't have gbpages then would be
> vmx with ept enabled).
>
>   
>> Have any numbers to share?
>>     
>
> No numbers I fully trust by now. I measured a 32% improvement in
> kernbench using nested pages backed with gb pages. I will do some more
> measurements and share some more solid numbers.
>
>   

Compared to 2M pages?  But we're already close to native here.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace
  2009-03-29 11:54   ` Avi Kivity
@ 2009-03-29 12:45     ` Joerg Roedel
  2009-03-29 12:49       ` Avi Kivity
  0 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2009-03-29 12:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Joerg Roedel, Marcelo Tosatti, kvm, linux-kernel

On Sun, Mar 29, 2009 at 02:54:31PM +0300, Avi Kivity wrote:
> Joerg Roedel wrote:
>> If userspace knows that the kernel part supports 1GB pages it can enable
>> the corresponding cpuid bit so that guests actually use GB pages.
>>   
>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index a1df2a3..6593198 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -542,6 +542,8 @@ struct kvm_x86_ops {
>>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>>  	int (*get_tdp_level)(void);
>>  	int (*get_mt_mask_shift)(void);
>> +
>> +	bool (*gb_page_enable)(void);
>>  };
>>   
>
> Should enable unconditionally.  Of course we need to find the shadow bug  
> first, maybe the has_wrprotected thingy.

This was the original plan. But how about VMX with EPT enabled? I am not
sure but I think this configuration will not support gbpages?

	Joerg

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/7] Support for GB pages in KVM
  2009-03-29 12:03     ` Avi Kivity
@ 2009-03-29 12:47       ` Joerg Roedel
  0 siblings, 0 replies; 27+ messages in thread
From: Joerg Roedel @ 2009-03-29 12:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Joerg Roedel, kvm, linux-kernel

On Sun, Mar 29, 2009 at 03:03:28PM +0300, Avi Kivity wrote:
> Joerg Roedel wrote:
>> On Sat, Mar 28, 2009 at 06:40:08PM -0300, Marcelo Tosatti wrote:
>>   
>>> On Fri, Mar 27, 2009 at 03:31:52PM +0100, Joerg Roedel wrote:
>>>     
>>>> Hi,
>>>>
>>>> this patchset extends the KVM MMU implementation to support 1GB pages as
>>>> supported by AMD family 16 processors. These patches enable support for
>>>> 1 GB pages with Nested Paging. Support for these pages in the shadow
>>>> paging code was also developed but does not run stable yet. The patch
>>>> for shadow-paging support is not included in this series and will be
>>>> sent out seperatly.
>>>>       
>>> Looks generally sane. I'm not sure its even worthwhile to support
>>> GBpages with softmmu, because the chance of finding an area without
>>> shadowed (write protected) pages is much smaller than with 2MB pages.
>>>     
>>
>> Thanks for your review.
>>
>> The idea behind GB pages in softmmu code was to provide GB pages to the
>> guest even if hardware does not support it. This would work better with
>> live migration (Only case where we wouldn't have gbpages then would be
>> vmx with ept enabled).
>>
>>   
>>> Have any numbers to share?
>>>     
>>
>> No numbers I fully trust by now. I measured a 32% improvement in
>> kernbench using nested pages backed with gb pages. I will do some more
>> measurements and share some more solid numbers.
>>
>>   
>
> Compared to 2M pages?  But we're already close to native here.

Yes, that's why I don't trust those numbers. I'll find out what went
wrong and provide more solid numbers.

	Joerg


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace
  2009-03-29 12:45     ` Joerg Roedel
@ 2009-03-29 12:49       ` Avi Kivity
  2009-03-29 12:54         ` Joerg Roedel
  0 siblings, 1 reply; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 12:49 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Joerg Roedel, Marcelo Tosatti, kvm, linux-kernel

Joerg Roedel wrote:
>>>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>>>  	int (*get_tdp_level)(void);
>>>  	int (*get_mt_mask_shift)(void);
>>> +
>>> +	bool (*gb_page_enable)(void);
>>>  };
>>>   
>>>       
>> Should enable unconditionally.  Of course we need to find the shadow bug  
>> first, maybe the has_wrprotected thingy.
>>     
>
> This was the original plan. But how about VMX with EPT enabled? I am not
> sure but I think this configuration will not support gbpages?
>
> 	

You're right.  Let's have a ->max_host_page_level() to handle that.  
It's ready for 0.5T pages, too.
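
Sketch of what I mean (names illustrative; PT_PDPE_LEVEL would be the
architectural name for level 3):

	/* svm.c: with npt the host can map whatever the CPU supports */
	static int svm_max_host_page_level(void)
	{
		return cpu_has_gbpages ? PT_PDPE_LEVEL : PT_DIRECTORY_LEVEL;
	}

	/* vmx.c: EPT stops at 2MB mappings for now */
	static int vmx_max_host_page_level(void)
	{
		return PT_DIRECTORY_LEVEL;
	}

and the mmu would clamp backing_size() to kvm_x86_ops->max_host_page_level().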

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace
  2009-03-29 12:49       ` Avi Kivity
@ 2009-03-29 12:54         ` Joerg Roedel
  2009-03-29 13:00           ` Avi Kivity
  0 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2009-03-29 12:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Joerg Roedel, Marcelo Tosatti, kvm, linux-kernel

On Sun, Mar 29, 2009 at 03:49:11PM +0300, Avi Kivity wrote:
> Joerg Roedel wrote:
>>>>  	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>>>>  	int (*get_tdp_level)(void);
>>>>  	int (*get_mt_mask_shift)(void);
>>>> +
>>>> +	bool (*gb_page_enable)(void);
>>>>  };
>>>>         
>>> Should enable unconditionally.  Of course we need to find the shadow 
>>> bug  first, maybe the has_wrprotected thingy.
>>>     
>>
>> This was the original plan. But how about VMX with EPT enabled? I am not
>> sure but I think this configuration will not support gbpages?
>>
>> 	
>
> You're right.  Let's have a ->max_host_page_level() to handle that.   
> It's ready for 0.5T pages, too.

Ok I will change that together with the page_size -> page_level
changes. But I doubt that there will ever be 0.5T pages ;)

	Joerg

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace
  2009-03-29 12:54         ` Joerg Roedel
@ 2009-03-29 13:00           ` Avi Kivity
  0 siblings, 0 replies; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 13:00 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Joerg Roedel, Marcelo Tosatti, kvm, linux-kernel

Joerg Roedel wrote:
> Ok I will change that together with the page_size -> page_level
> changes. But I doubt that there will ever be 0.5T pages ;)
>   

We're bloating at a rate of 1 bit per 1-2 years, so we have 8-16 years 
to prepare.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting
  2009-03-29 11:45   ` Avi Kivity
@ 2009-03-29 13:03     ` Joerg Roedel
  2009-03-29 13:15       ` Avi Kivity
  0 siblings, 1 reply; 27+ messages in thread
From: Joerg Roedel @ 2009-03-29 13:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Joerg Roedel, Marcelo Tosatti, kvm, linux-kernel

On Sun, Mar 29, 2009 at 02:45:44PM +0300, Avi Kivity wrote:
> Joerg Roedel wrote:
>> This patch adds the necessary data structures to take care of write
>> protections in place within a second huge page sized page.
>>
>>   struct kvm_vcpu_arch {
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9936b45..7d4162d 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -390,6 +390,15 @@ static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
>>  	return &slot->lpage_info[idx].write_count;
>>  }
>>  +static int *slot_hugepage_idx(gfn_t gfn, struct kvm_memory_slot 
>> *slot)
>> +{
>> +	unsigned long idx;
>> +
>> +	idx = (gfn / KVM_PAGES_PER_1G_PAGE) -
>> +	      (slot->base_gfn / KVM_PAGES_PER_1G_PAGE);
>> +	return &slot->hpage_info[idx].write_count;
>> +}
>>   
>
> A page level argument would remove the need for this duplication, as  
> well as all the constants.
>
>>  +static int has_wrprotected_largepage(struct kvm *kvm, gfn_t gfn)
>> +{
>> +	struct kvm_memory_slot *slot;
>> +	int *hugepage_idx;
>> +
>> +	gfn = unalias_gfn(kvm, gfn);
>> +	slot = gfn_to_memslot_unaliased(kvm, gfn);
>> +	if (slot) {
>> +		hugepage_idx = slot_hugepage_idx(gfn, slot);
>>   
>
> slot_largepage_idx() here?
>
> I don't think we ever write protect large pages, so why is this needed?

For 2mb pages we need to check if there is a write-protected 4k page in it
before we map a 2mb page for writing. If there is any write-protected 4k
page in a 2mb area this 2mb page is considered write-protected. These
'write-protected' 2mb pages are accounted in the account_shadowed()
function. This information is taken into account when we decide if we
can map a guest 1gb page as a 1gb page on the host too.

	Joerg

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting
  2009-03-29 13:03     ` Joerg Roedel
@ 2009-03-29 13:15       ` Avi Kivity
  2009-03-29 13:32         ` Joerg Roedel
  0 siblings, 1 reply; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 13:15 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Joerg Roedel, Marcelo Tosatti, kvm, linux-kernel

Joerg Roedel wrote:
>>     
>>>  +static int has_wrprotected_largepage(struct kvm *kvm, gfn_t gfn)
>>> +{
>>> +	struct kvm_memory_slot *slot;
>>> +	int *hugepage_idx;
>>> +
>>> +	gfn = unalias_gfn(kvm, gfn);
>>> +	slot = gfn_to_memslot_unaliased(kvm, gfn);
>>> +	if (slot) {
>>> +		hugepage_idx = slot_hugepage_idx(gfn, slot);
>>>   
>>>       
>> slot_largepage_idx() here?
>>
>> I don't think we ever write protect large pages, so why is this needed?
>>     
>
> For 2mb pages we need to check if there is a write-protected 4k page in it
> before we map a 2mb page for writing. If there is any write-protected 4k
> page in a 2mb area this 2mb page is considered write-protected. These
> 'write-protected' 2mb pages are accounted in the account_shadowed()
> function. This information is taken into account when we decide if we
> can map a guest 1gb page as a 1gb page on the host too.
>   

account_shadowed() actually increments a hugepage write_count by 1 for 
every 4K page, not 2M page, if I read the code correctly.  The code I 
commented on is right though.

The naming is confusing.  I suggest 
has_wrprotected_page_in_{large,huge}page(), although with a level 
parameter we can keep has_wrprotected_page().

Btw, if we implement account_shadowed() as you describe it (only accounting 
hugepages on a largepage write_count transition 0->1 or 1->0), we save a 
potential cacheline bounce on the hugepage write_count accounting array.
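
For illustration, something like this (slot_largepage_idx() and
slot_hugepage_idx() as in your patch; unaccount_shadowed() would mirror it
on the 1->0 transition):

	static void account_shadowed(struct kvm *kvm, gfn_t gfn)
	{
		struct kvm_memory_slot *slot;
		int *large_wc;

		gfn = unalias_gfn(kvm, gfn);
		slot = gfn_to_memslot_unaliased(kvm, gfn);
		large_wc = slot_largepage_idx(gfn, slot);

		/*
		 * Only touch the 1GB level counter when the 2MB frame first
		 * becomes write protected, so the common case dirties a
		 * single cache line.
		 */
		if ((*large_wc)++ == 0)
			*slot_hugepage_idx(gfn, slot) += 1;
	}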

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting
  2009-03-27 14:31 ` [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting Joerg Roedel
  2009-03-29 11:45   ` Avi Kivity
@ 2009-03-29 13:26   ` Avi Kivity
  2009-03-29 13:37     ` Avi Kivity
  1 sibling, 1 reply; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 13:26 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

Joerg Roedel wrote:
> This patch adds the necessary data structures to take care of write
> protections in place within a second huge page sized page.
>
>
> +#ifdef KVM_PAGES_PER_LHPAGE
> +	if (npages && !new.hpage_info) {
> +		int hugepages = npages / KVM_PAGES_PER_LHPAGE;
> +		if (npages % KVM_PAGES_PER_LHPAGE)
> +			hugepages++;
> +		if (base_gfn % KVM_PAGES_PER_LHPAGE)
> +			hugepages++;
>   

Consider a slot with base_gfn == 1 and npages == 1.  This will have 
hugepages == 2, which is wrong.

I think the right calculation is

  ((base_gfn + npages - 1) / N) - (base_gfn / N) + 1

i.e. index of last page, plus one so we can store it.
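
(With N == 512, base_gfn == 1 and npages == 1 this gives (1 + 1 - 1)/512 -
1/512 + 1 = 1, where the current code computes 0 + 1 + 1 = 2; base_gfn == 1,
npages == 513 gives 513/512 - 0 + 1 = 2 instead of 3.)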

The small huge page calculation is off as well.

> +
> +		new.hpage_info = vmalloc(hugepages * sizeof(*new.hpage_info));
> +
> +		if (!new.hpage_info)
> +			goto out_free;
> +
> +		memset(new.hpage_info, 0, hugepages * sizeof(*new.hpage_info));
> +
> +		if (base_gfn % KVM_PAGES_PER_LHPAGE)
> +			new.hpage_info[0].write_count = 1;
> +		if ((base_gfn+npages) % KVM_PAGES_PER_LHPAGE)
> +			new.hpage_info[hugepages-1].write_count = 1;
> +	}
> +#endif
> +
>   

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting
  2009-03-29 13:15       ` Avi Kivity
@ 2009-03-29 13:32         ` Joerg Roedel
  0 siblings, 0 replies; 27+ messages in thread
From: Joerg Roedel @ 2009-03-29 13:32 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Joerg Roedel, Marcelo Tosatti, kvm, linux-kernel

On Sun, Mar 29, 2009 at 04:15:07PM +0300, Avi Kivity wrote:
> Joerg Roedel wrote:
>>>     
>>>>  +static int has_wrprotected_largepage(struct kvm *kvm, gfn_t gfn)
>>>> +{
>>>> +	struct kvm_memory_slot *slot;
>>>> +	int *hugepage_idx;
>>>> +
>>>> +	gfn = unalias_gfn(kvm, gfn);
>>>> +	slot = gfn_to_memslot_unaliased(kvm, gfn);
>>>> +	if (slot) {
>>>> +		hugepage_idx = slot_hugepage_idx(gfn, slot);
>>>>         
>>> slot_largepage_idx() here?
>>>
>>> I don't think we ever write protect large pages, so why is this needed?
>>>     
>>
>> For 2mb pages we need to check if there is a write-protected 4k page in it
>> before we map a 2mb page for writing. If there is any write-protected 4k
>> page in a 2mb area this 2mb page is considered write-protected. These
>> 'write-protected' 2mb pages are accounted in the account_shadowed()
>> function. This information is taken into account when we decide if we
>> can map a guest 1gb page as a 1gb page on the host too.
>>   
>
> account_shadowed() actually increments a hugepage write_count by 1 for  
> every 4K page, not 2M page, if I read the code correctly.  The code I  
> commented on is right though.
>
> The naming is confusing.  I suggest  
> has_wrprotected_page_in_{large,huge}page().  although with the a level  
> parameter we can keep has_wrprotected_page().

Yeah true, the name is a bit confusing. I think a level parameter for
has_wrprotected_page() is the best solution.

	Joerg

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting
  2009-03-29 13:26   ` Avi Kivity
@ 2009-03-29 13:37     ` Avi Kivity
  0 siblings, 0 replies; 27+ messages in thread
From: Avi Kivity @ 2009-03-29 13:37 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

Avi Kivity wrote:
> Joerg Roedel wrote:
>> This patch adds the necessary data structures to take care of write
>> protections in place within a second huge page sized page.
>>
>>
>> +#ifdef KVM_PAGES_PER_LHPAGE
>> +    if (npages && !new.hpage_info) {
>> +        int hugepages = npages / KVM_PAGES_PER_LHPAGE;
>> +        if (npages % KVM_PAGES_PER_LHPAGE)
>> +            hugepages++;
>> +        if (base_gfn % KVM_PAGES_PER_LHPAGE)
>> +            hugepages++;
>>   
>
> Consider a slot with base_gfn == 1 and npages == 1.  This will have 
> hugepages == 2, which is wrong.
>
> I think the right calculation is
>
>  ((base_gfn + npages - 1) / N) - (base_gfn / N) + 1
>
> i.e. index of last page, plus one so we can store it.
>
> The small huge page calculation is off as well.
>

I fixed the existing case with

commit 1a967084dbe97a2f4be84139d14e2d958d7ffc46
Author: Avi Kivity <avi@redhat.com>
Date:   Sun Mar 29 16:31:25 2009 +0300

    KVM: MMU: Fix off-by-one calculating large page count
   
    The large page initialization code concludes there are two large pages
    spanned by a slot covering 1 (small) page starting at gfn 1.  This is
    incorrect, and also results in incorrect write_count initialization in
    some cases (base = 1, npages = 513 for example).
   
    Cc: stable@kernel.org
    Signed-off-by: Avi Kivity <avi@redhat.com>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8aa3b95..3d31557 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1076,6 +1076,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
        int r;
        gfn_t base_gfn;
        unsigned long npages;
+       int largepages;
        unsigned long i;
        struct kvm_memory_slot *memslot;
        struct kvm_memory_slot old, new;
@@ -1151,11 +1152,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
                        new.userspace_addr = 0;
        }
        if (npages && !new.lpage_info) {
-               int largepages = npages / KVM_PAGES_PER_HPAGE;
-               if (npages % KVM_PAGES_PER_HPAGE)
-                       largepages++;
-               if (base_gfn % KVM_PAGES_PER_HPAGE)
-                       largepages++;
+               largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE;
+               largepages -= base_gfn / KVM_PAGES_PER_HPAGE;
 
                new.lpage_info = vmalloc(largepages * sizeof(*new.lpage_info));

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2009-03-29 13:37 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-27 14:31 [PATCH 0/7] Support for GB pages in KVM Joerg Roedel
2009-03-27 14:31 ` [PATCH 1/7] hugetlb: export vma_kernel_pagesize to modules Joerg Roedel
2009-03-27 14:31 ` [PATCH 2/7] kvm mmu: infrastructure changes for multiple huge page support Joerg Roedel
2009-03-29 11:38   ` Avi Kivity
2009-03-27 14:31 ` [PATCH 3/7] kvm mmu: add page size parameter to rmap_remove Joerg Roedel
2009-03-27 14:31 ` [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting Joerg Roedel
2009-03-29 11:45   ` Avi Kivity
2009-03-29 13:03     ` Joerg Roedel
2009-03-29 13:15       ` Avi Kivity
2009-03-29 13:32         ` Joerg Roedel
2009-03-29 13:26   ` Avi Kivity
2009-03-29 13:37     ` Avi Kivity
2009-03-27 14:31 ` [PATCH 5/7] kvm mmu: add support for 1GB pages to direct mapping paths Joerg Roedel
2009-03-29 11:49   ` Avi Kivity
2009-03-27 14:31 ` [PATCH 6/7] kvm mmu: enabling 1GB pages by extending backing_size function Joerg Roedel
2009-03-29 11:51   ` Avi Kivity
2009-03-27 14:31 ` [PATCH 7/7] kvm x86: report 1GB page support to userspace Joerg Roedel
2009-03-29 11:54   ` Avi Kivity
2009-03-29 12:45     ` Joerg Roedel
2009-03-29 12:49       ` Avi Kivity
2009-03-29 12:54         ` Joerg Roedel
2009-03-29 13:00           ` Avi Kivity
2009-03-28 21:40 ` [PATCH 0/7] Support for GB pages in KVM Marcelo Tosatti
2009-03-28 21:49   ` Joerg Roedel
2009-03-29 12:03     ` Avi Kivity
2009-03-29 12:47       ` Joerg Roedel
2009-03-29 12:01   ` Avi Kivity
