* [PATCH 0/8 v3] KVM support for 1GB pages
@ 2009-06-19 13:16 Joerg Roedel
  2009-06-19 13:16 ` [PATCH 1/8] hugetlbfs: export vma_kernel_pagesize to modules Joerg Roedel
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Joerg Roedel @ 2009-06-19 13:16 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel

Hi,

this is the third version of the patches for KVM to support 1GB pages. Changes
since the last version include:

	- changed reporting of 1GB page support to userspace to use
	  SUPPORTED_CPUID ioctl
	- added support for 1GB pages to the software page walker code too

This code can still only make use of 1GB pages with nested paging enabled. I
will give the shadow paging code another debug round soon. Please comment on or
consider applying these patches.
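
For orientation while reading the diffs: the series maps hugepage sizes to
paging levels with the KVM_HPAGE_* macros (added in patch 2, extended to a
third size in patch 7). A small userspace sketch of that arithmetic, for
illustration only:

	#include <stdio.h>

	#define PAGE_SHIFT		12
	#define KVM_NR_PAGE_SIZES	3	/* 4k, 2M, 1G once patch 7 is applied */
	#define KVM_HPAGE_SHIFT(x)	(PAGE_SHIFT + (((x) - 1) * 9))
	#define KVM_HPAGE_SIZE(x)	(1UL << KVM_HPAGE_SHIFT(x))
	#define KVM_PAGES_PER_HPAGE(x)	(KVM_HPAGE_SIZE(x) / (1UL << PAGE_SHIFT))

	int main(void)
	{
		int level;

		/* 1 = PT_PAGE_TABLE_LEVEL, 2 = PT_DIRECTORY_LEVEL, 3 = PT_PDPE_LEVEL */
		for (level = 1; level <= KVM_NR_PAGE_SIZES; level++)
			printf("level %d: %10lu bytes, %6lu 4k pages\n", level,
			       KVM_HPAGE_SIZE(level), KVM_PAGES_PER_HPAGE(level));

		return 0;
	}

This prints 4k, 2M and 1G for levels 1, 2 and 3 - nothing more than macro
arithmetic, but it is the mapping the rest of the series relies on.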

Thanks,

	Joerg

Shortlog:

Joerg Roedel (8):
      hugetlbfs: export vma_kernel_pagesize to modules
      kvm: change memslot data structures for multiple hugepage sizes
      kvm/mmu: rename is_largepage_backed to mapping_level
      kvm/mmu: make rmap code aware of mapping levels
      kvm/mmu: make direct mapping paths aware of mapping levels
      kvm/mmu: add support for another level to page walker
      kvm/mmu: enable gbpages by increasing nr of pagesizes
      kvm x86: report 1GB page support to userspace

Diffstat:

 arch/ia64/include/asm/kvm_host.h    |    3 +-
 arch/powerpc/include/asm/kvm_host.h |    3 +-
 arch/x86/include/asm/kvm_host.h     |   15 ++-
 arch/x86/kvm/mmu.c                  |  219 ++++++++++++++++++++++-------------
 arch/x86/kvm/paging_tmpl.h          |   27 ++++-
 arch/x86/kvm/svm.c                  |    6 +
 arch/x86/kvm/vmx.c                  |    6 +
 arch/x86/kvm/x86.c                  |    3 +-
 include/linux/kvm_host.h            |    2 +-
 mm/hugetlb.c                        |    1 +
 virt/kvm/kvm_main.c                 |   53 ++++++---
 11 files changed, 222 insertions(+), 116 deletions(-)




* [PATCH 1/8] hugetlbfs: export vma_kernel_pagesize to modules
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
@ 2009-06-19 13:16 ` Joerg Roedel
  2009-06-19 13:16 ` [PATCH 2/8] kvm: change memslot data structures for multiple hugepage sizes Joerg Roedel
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2009-06-19 13:16 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

This function is required by KVM.
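
For reference: kvm can be built as a module, and patch 3 calls this function
from host_mapping_level() to find out how the host maps the memory behind a
gfn. Condensed from that patch:

	down_read(&current->mm->mmap_sem);
	vma = find_vma(current->mm, addr);
	if (vma)
		page_size = vma_kernel_pagesize(vma);
	up_read(&current->mm->mmap_sem);

The page size is then translated into the largest mapping level KVM may use
for that gfn.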

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 mm/hugetlb.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e83ad2c..e2016ef 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -234,6 +234,7 @@ unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
 
 	return 1UL << (hstate->order + PAGE_SHIFT);
 }
+EXPORT_SYMBOL_GPL(vma_kernel_pagesize);
 
 /*
  * Return the page size being used by the MMU to back a VMA. In the majority
-- 
1.6.3.1




* [PATCH 2/8] kvm: change memslot data structures for multiple hugepage sizes
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
  2009-06-19 13:16 ` [PATCH 1/8] hugetlbfs: export vma_kernel_pagesize to modules Joerg Roedel
@ 2009-06-19 13:16 ` Joerg Roedel
  2009-06-23 16:49   ` Marcelo Tosatti
  2009-06-19 13:16 ` [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level Joerg Roedel
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2009-06-19 13:16 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/ia64/include/asm/kvm_host.h    |    3 +-
 arch/powerpc/include/asm/kvm_host.h |    3 +-
 arch/x86/include/asm/kvm_host.h     |   12 ++++----
 arch/x86/kvm/mmu.c                  |   30 ++++++++++---------
 arch/x86/kvm/paging_tmpl.h          |    3 +-
 include/linux/kvm_host.h            |    2 +-
 virt/kvm/kvm_main.c                 |   53 +++++++++++++++++++++++-----------
 7 files changed, 65 insertions(+), 41 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 9cf1c4b..d9b6325 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -235,7 +235,8 @@ struct kvm_vm_data {
 #define KVM_REQ_PTC_G		32
 #define KVM_REQ_RESUME		33
 
-#define KVM_PAGES_PER_HPAGE	1
+#define KVM_NR_PAGE_SIZES	1
+#define KVM_PAGES_PER_HPAGE(x)	1
 
 struct kvm;
 struct kvm_vcpu;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3625424..10e884e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -34,7 +34,8 @@
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
 /* We don't currently support large pages. */
-#define KVM_PAGES_PER_HPAGE (1<<31)
+#define KVM_NR_PAGE_SIZES	1
+#define KVM_PAGES_PER_HPAGE(x)	(1<<31)
 
 struct kvm;
 struct kvm_run;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c7b0cc2..930bac2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -53,12 +53,12 @@
 #define INVALID_PAGE (~(hpa_t)0)
 #define UNMAPPED_GVA (~(gpa_t)0)
 
-/* shadow tables are PAE even on non-PAE hosts */
-#define KVM_HPAGE_SHIFT 21
-#define KVM_HPAGE_SIZE (1UL << KVM_HPAGE_SHIFT)
-#define KVM_HPAGE_MASK (~(KVM_HPAGE_SIZE - 1))
-
-#define KVM_PAGES_PER_HPAGE (KVM_HPAGE_SIZE / PAGE_SIZE)
+/* KVM Hugepage definitions for x86 */
+#define KVM_NR_PAGE_SIZES	2
+#define KVM_HPAGE_SHIFT(x)	(PAGE_SHIFT + (((x) - 1) * 9))
+#define KVM_HPAGE_SIZE(x)	(1UL << KVM_HPAGE_SHIFT(x))
+#define KVM_HPAGE_MASK(x)	(~(KVM_HPAGE_SIZE(x) - 1))
+#define KVM_PAGES_PER_HPAGE(x)	(KVM_HPAGE_SIZE(x) / PAGE_SIZE)
 
 #define DE_VECTOR 0
 #define DB_VECTOR 1
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index edf3dea..1f24d88 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -394,9 +394,9 @@ static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
 {
 	unsigned long idx;
 
-	idx = (gfn / KVM_PAGES_PER_HPAGE) -
-	      (slot->base_gfn / KVM_PAGES_PER_HPAGE);
-	return &slot->lpage_info[idx].write_count;
+	idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
+	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
+	return &slot->lpage_info[0][idx].write_count;
 }
 
 static void account_shadowed(struct kvm *kvm, gfn_t gfn)
@@ -485,10 +485,10 @@ static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int lpage)
 	if (!lpage)
 		return &slot->rmap[gfn - slot->base_gfn];
 
-	idx = (gfn / KVM_PAGES_PER_HPAGE) -
-	      (slot->base_gfn / KVM_PAGES_PER_HPAGE);
+	idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
+	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
 
-	return &slot->lpage_info[idx].rmap_pde;
+	return &slot->lpage_info[0][idx].rmap_pde;
 }
 
 /*
@@ -724,11 +724,11 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 		end = start + (memslot->npages << PAGE_SHIFT);
 		if (hva >= start && hva < end) {
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+			int idx = gfn_offset /
+			          KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL);
 			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
 			retval |= handler(kvm,
-					  &memslot->lpage_info[
-						  gfn_offset /
-						  KVM_PAGES_PER_HPAGE].rmap_pde);
+					&memslot->lpage_info[0][idx].rmap_pde);
 		}
 	}
 
@@ -1852,8 +1852,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 	pfn_t pfn;
 	unsigned long mmu_seq;
 
-	if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) {
-		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
+	if (is_largepage_backed(vcpu, gfn &
+			~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1))) {
+		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
 		largepage = 1;
 	}
 
@@ -2058,8 +2059,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 	if (r)
 		return r;
 
-	if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) {
-		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
+	if (is_largepage_backed(vcpu, gfn &
+			~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1))) {
+		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
 		largepage = 1;
 	}
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
@@ -2461,7 +2463,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
 
 	if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) {
-		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
+		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
 		vcpu->arch.update_pte.largepage = 1;
 	}
 	vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 322e811..53e129c 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -401,7 +401,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 
 	if (walker.level == PT_DIRECTORY_LEVEL) {
 		gfn_t large_gfn;
-		large_gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE-1);
+		large_gfn = walker.gfn &
+			~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
 		if (is_largepage_backed(vcpu, large_gfn)) {
 			walker.gfn = large_gfn;
 			largepage = 1;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1b48092..b3fa28f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -103,7 +103,7 @@ struct kvm_memory_slot {
 	struct {
 		unsigned long rmap_pde;
 		int write_count;
-	} *lpage_info;
+	} *lpage_info[KVM_NR_PAGE_SIZES - 1];
 	unsigned long userspace_addr;
 	int user_alloc;
 };
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8939ffa..c5e6d13 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -995,19 +995,25 @@ out:
 static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 				  struct kvm_memory_slot *dont)
 {
+	int i;
+
 	if (!dont || free->rmap != dont->rmap)
 		vfree(free->rmap);
 
 	if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
 		vfree(free->dirty_bitmap);
 
-	if (!dont || free->lpage_info != dont->lpage_info)
-		vfree(free->lpage_info);
+
+	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
+		if (!dont || free->lpage_info[i] != dont->lpage_info[i]) {
+			vfree(free->lpage_info[i]);
+			free->lpage_info[i] = NULL;
+		}
+	}
 
 	free->npages = 0;
 	free->dirty_bitmap = NULL;
 	free->rmap = NULL;
-	free->lpage_info = NULL;
 }
 
 void kvm_free_physmem(struct kvm *kvm)
@@ -1081,7 +1087,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	int r;
 	gfn_t base_gfn;
 	unsigned long npages, ugfn;
-	unsigned long largepages, i;
+	int lpages;
+	unsigned long i, j;
 	struct kvm_memory_slot *memslot;
 	struct kvm_memory_slot old, new;
 
@@ -1155,33 +1162,45 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		else
 			new.userspace_addr = 0;
 	}
-	if (npages && !new.lpage_info) {
-		largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE;
-		largepages -= base_gfn / KVM_PAGES_PER_HPAGE;
+	if (!npages)
+		goto skip_lpage;
 
-		new.lpage_info = vmalloc(largepages * sizeof(*new.lpage_info));
+	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
+		int level = i + 2;
 
-		if (!new.lpage_info)
+		if (new.lpage_info[i])
+			continue;
+
+		lpages = 1 + (base_gfn + npages - 1) /
+			     KVM_PAGES_PER_HPAGE(level);
+		lpages -= base_gfn / KVM_PAGES_PER_HPAGE(level);
+
+		new.lpage_info[i] = vmalloc(lpages * sizeof(*new.lpage_info[i]));
+
+		if (!new.lpage_info[i])
 			goto out_free;
 
-		memset(new.lpage_info, 0, largepages * sizeof(*new.lpage_info));
+		memset(new.lpage_info[i], 0,
+		       lpages * sizeof(*new.lpage_info[i]));
 
-		if (base_gfn % KVM_PAGES_PER_HPAGE)
-			new.lpage_info[0].write_count = 1;
-		if ((base_gfn+npages) % KVM_PAGES_PER_HPAGE)
-			new.lpage_info[largepages-1].write_count = 1;
+		if (base_gfn % KVM_PAGES_PER_HPAGE(level))
+			new.lpage_info[i][0].write_count = 1;
+		if ((base_gfn+npages) % KVM_PAGES_PER_HPAGE(level))
+			new.lpage_info[i][lpages - 1].write_count = 1;
 		ugfn = new.userspace_addr >> PAGE_SHIFT;
 		/*
 		 * If the gfn and userspace address are not aligned wrt each
 		 * other, or if explicitly asked to, disable large page
 		 * support for this slot
 		 */
-		if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1) ||
+		if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1) ||
 		    !largepages_enabled)
-			for (i = 0; i < largepages; ++i)
-				new.lpage_info[i].write_count = 1;
+			for (j = 0; j < lpages; ++j)
+				new.lpage_info[i][j].write_count = 1;
 	}
 
+skip_lpage:
+
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
 		unsigned dirty_bytes = ALIGN(npages, BITS_PER_LONG) / 8;
-- 
1.6.3.1




* [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
  2009-06-19 13:16 ` [PATCH 1/8] hugetlbfs: export vma_kernel_pagesize to modules Joerg Roedel
  2009-06-19 13:16 ` [PATCH 2/8] kvm: change memslot data structures for multiple hugepage sizes Joerg Roedel
@ 2009-06-19 13:16 ` Joerg Roedel
  2009-06-23 15:59   ` Marcelo Tosatti
  2009-06-19 13:16 ` [PATCH 4/8] kvm/mmu: make rmap code aware of mapping levels Joerg Roedel
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2009-06-19 13:16 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

With the new name and the corresponding backend changes, this function can now
support multiple hugepage sizes.
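
Instead of a yes/no answer the function now returns the largest mapping level
usable for the given gfn. Once patch 5 is applied the direct mapping paths
consume it roughly like this (condensed):

	level = mapping_level(vcpu, gfn);
	/* nonpaging_map() builds a PAE page table and clamps this to 2M */
	gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
	...
	r = __direct_map(vcpu, gpa, write, level, gfn, pfn);

Until then the callers simply check for mapping_level() == PT_DIRECTORY_LEVEL
to keep the behaviour of the old largepage flag.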

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c         |  100 +++++++++++++++++++++++++++++--------------
 arch/x86/kvm/paging_tmpl.h |    4 +-
 2 files changed, 69 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1f24d88..3fa6009 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -390,37 +390,52 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
  * Return the pointer to the largepage write count for a given
  * gfn, handling slots that are not large page aligned.
  */
-static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
+static int *slot_largepage_idx(gfn_t gfn,
+			       struct kvm_memory_slot *slot,
+			       int level)
 {
 	unsigned long idx;
 
-	idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
-	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
-	return &slot->lpage_info[0][idx].write_count;
+	idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
+	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
+	return &slot->lpage_info[level - 2][idx].write_count;
 }
 
 static void account_shadowed(struct kvm *kvm, gfn_t gfn)
 {
+	struct kvm_memory_slot *slot;
 	int *write_count;
+	int i;
 
 	gfn = unalias_gfn(kvm, gfn);
-	write_count = slot_largepage_idx(gfn,
-					 gfn_to_memslot_unaliased(kvm, gfn));
-	*write_count += 1;
+
+	for (i = PT_DIRECTORY_LEVEL;
+	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+		slot          = gfn_to_memslot_unaliased(kvm, gfn);
+		write_count   = slot_largepage_idx(gfn, slot, i);
+		*write_count += 1;
+	}
 }
 
 static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
 {
+	struct kvm_memory_slot *slot;
 	int *write_count;
+	int i;
 
 	gfn = unalias_gfn(kvm, gfn);
-	write_count = slot_largepage_idx(gfn,
-					 gfn_to_memslot_unaliased(kvm, gfn));
-	*write_count -= 1;
-	WARN_ON(*write_count < 0);
+	for (i = PT_DIRECTORY_LEVEL;
+	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+		slot          = gfn_to_memslot_unaliased(kvm, gfn);
+		write_count   = slot_largepage_idx(gfn, slot, i);
+		*write_count -= 1;
+		WARN_ON(*write_count < 0);
+	}
 }
 
-static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
+static int has_wrprotected_page(struct kvm *kvm,
+				gfn_t gfn,
+				int level)
 {
 	struct kvm_memory_slot *slot;
 	int *largepage_idx;
@@ -428,47 +443,67 @@ static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
 	gfn = unalias_gfn(kvm, gfn);
 	slot = gfn_to_memslot_unaliased(kvm, gfn);
 	if (slot) {
-		largepage_idx = slot_largepage_idx(gfn, slot);
+		largepage_idx = slot_largepage_idx(gfn, slot, level);
 		return *largepage_idx;
 	}
 
 	return 1;
 }
 
-static int host_largepage_backed(struct kvm *kvm, gfn_t gfn)
+static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
 {
+	unsigned long page_size = PAGE_SIZE;
 	struct vm_area_struct *vma;
 	unsigned long addr;
-	int ret = 0;
+	int i, ret = 0;
 
 	addr = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(addr))
-		return ret;
+		return page_size;
 
 	down_read(&current->mm->mmap_sem);
 	vma = find_vma(current->mm, addr);
-	if (vma && is_vm_hugetlb_page(vma))
-		ret = 1;
+	if (!vma)
+		goto out;
+
+	page_size = vma_kernel_pagesize(vma);
+
+out:
 	up_read(&current->mm->mmap_sem);
 
+	for (i = PT_PAGE_TABLE_LEVEL;
+	     i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
+		if (page_size >= KVM_HPAGE_SIZE(i))
+			ret = i;
+		else
+			break;
+	}
+
 	return ret;
 }
 
-static int is_largepage_backed(struct kvm_vcpu *vcpu, gfn_t large_gfn)
+static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
 {
 	struct kvm_memory_slot *slot;
-
-	if (has_wrprotected_page(vcpu->kvm, large_gfn))
-		return 0;
-
-	if (!host_largepage_backed(vcpu->kvm, large_gfn))
-		return 0;
+	int host_level;
+	int level = PT_PAGE_TABLE_LEVEL;
 
 	slot = gfn_to_memslot(vcpu->kvm, large_gfn);
 	if (slot && slot->dirty_bitmap)
-		return 0;
+		return PT_PAGE_TABLE_LEVEL;
 
-	return 1;
+	host_level = host_mapping_level(vcpu->kvm, large_gfn);
+
+	if (host_level == PT_PAGE_TABLE_LEVEL)
+		return host_level;
+
+	for (level = PT_DIRECTORY_LEVEL; level <= host_level; ++level) {
+
+		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
+			break;
+	}
+
+	return level - 1;
 }
 
 /*
@@ -1704,7 +1739,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if ((pte_access & ACC_WRITE_MASK)
 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
 
-		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
+		if (largepage && has_wrprotected_page(vcpu->kvm, gfn, 1)) {
 			ret = 1;
 			spte = shadow_trap_nonpresent_pte;
 			goto set_pte;
@@ -1852,8 +1887,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 	pfn_t pfn;
 	unsigned long mmu_seq;
 
-	if (is_largepage_backed(vcpu, gfn &
-			~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1))) {
+	if (mapping_level(vcpu, gfn) == PT_DIRECTORY_LEVEL) {
 		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
 		largepage = 1;
 	}
@@ -2059,8 +2093,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 	if (r)
 		return r;
 
-	if (is_largepage_backed(vcpu, gfn &
-			~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1))) {
+	if (mapping_level(vcpu, gfn) == PT_DIRECTORY_LEVEL) {
 		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
 		largepage = 1;
 	}
@@ -2462,7 +2495,8 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		return;
 	gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
 
-	if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) {
+	if (is_large_pte(gpte) &&
+	    (mapping_level(vcpu, gfn) == PT_DIRECTORY_LEVEL)) {
 		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
 		vcpu->arch.update_pte.largepage = 1;
 	}
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 53e129c..25a4437 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -402,8 +402,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 	if (walker.level == PT_DIRECTORY_LEVEL) {
 		gfn_t large_gfn;
 		large_gfn = walker.gfn &
-			~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
-		if (is_largepage_backed(vcpu, large_gfn)) {
+			    ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
+		if (mapping_level(vcpu, large_gfn) == PT_DIRECTORY_LEVEL) {
 			walker.gfn = large_gfn;
 			largepage = 1;
 		}
-- 
1.6.3.1




* [PATCH 4/8] kvm/mmu: make rmap code aware of mapping levels
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
                   ` (2 preceding siblings ...)
  2009-06-19 13:16 ` [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level Joerg Roedel
@ 2009-06-19 13:16 ` Joerg Roedel
  2009-06-19 13:16 ` [PATCH 5/8] kvm/mmu: make direct mapping paths " Joerg Roedel
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2009-06-19 13:16 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

This patch removes the largepage parameter from the rmap_add function.
Together with rmap_remove, this function now uses the role.level field to
determine whether the page is a huge page.
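
The per-level rmap slots live in lpage_info[level - 2], i.e. index 0 for 2M
mappings and index 1 for 1G mappings once patch 7 raises KVM_NR_PAGE_SIZES.
Condensed, the lookup in rmap_add now is:

	sp    = page_header(__pa(spte));
	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);

with gfn_to_rmap() returning &slot->lpage_info[level - 2][idx].rmap_pde for
anything above PT_PAGE_TABLE_LEVEL.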

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c |   53 +++++++++++++++++++++++++++------------------------
 1 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3fa6009..0ef947d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -511,19 +511,19 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
  * Note: gfn must be unaliased before this function get called
  */
 
-static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int lpage)
+static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
 {
 	struct kvm_memory_slot *slot;
 	unsigned long idx;
 
 	slot = gfn_to_memslot(kvm, gfn);
-	if (!lpage)
+	if (likely(level == PT_PAGE_TABLE_LEVEL))
 		return &slot->rmap[gfn - slot->base_gfn];
 
-	idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
-	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
+	idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
+		(slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
 
-	return &slot->lpage_info[0][idx].rmap_pde;
+	return &slot->lpage_info[level - 2][idx].rmap_pde;
 }
 
 /*
@@ -535,7 +535,7 @@ static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int lpage)
  * If rmapp bit zero is one, (then rmap & ~1) points to a struct kvm_rmap_desc
  * containing more mappings.
  */
-static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, int lpage)
+static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 {
 	struct kvm_mmu_page *sp;
 	struct kvm_rmap_desc *desc;
@@ -547,7 +547,7 @@ static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, int lpage)
 	gfn = unalias_gfn(vcpu->kvm, gfn);
 	sp = page_header(__pa(spte));
 	sp->gfns[spte - sp->spt] = gfn;
-	rmapp = gfn_to_rmap(vcpu->kvm, gfn, lpage);
+	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
 	if (!*rmapp) {
 		rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
 		*rmapp = (unsigned long)spte;
@@ -614,7 +614,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 		kvm_release_pfn_dirty(pfn);
 	else
 		kvm_release_pfn_clean(pfn);
-	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], is_large_pte(*spte));
+	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level);
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
 		BUG();
@@ -677,10 +677,10 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 {
 	unsigned long *rmapp;
 	u64 *spte;
-	int write_protected = 0;
+	int i, write_protected = 0;
 
 	gfn = unalias_gfn(kvm, gfn);
-	rmapp = gfn_to_rmap(kvm, gfn, 0);
+	rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL);
 
 	spte = rmap_next(kvm, rmapp, NULL);
 	while (spte) {
@@ -702,21 +702,24 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 	}
 
 	/* check for huge page mappings */
-	rmapp = gfn_to_rmap(kvm, gfn, 1);
-	spte = rmap_next(kvm, rmapp, NULL);
-	while (spte) {
-		BUG_ON(!spte);
-		BUG_ON(!(*spte & PT_PRESENT_MASK));
-		BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
-		pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn);
-		if (is_writeble_pte(*spte)) {
-			rmap_remove(kvm, spte);
-			--kvm->stat.lpages;
-			__set_spte(spte, shadow_trap_nonpresent_pte);
-			spte = NULL;
-			write_protected = 1;
+	for (i = PT_DIRECTORY_LEVEL;
+	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+		rmapp = gfn_to_rmap(kvm, gfn, i);
+		spte = rmap_next(kvm, rmapp, NULL);
+		while (spte) {
+			BUG_ON(!spte);
+			BUG_ON(!(*spte & PT_PRESENT_MASK));
+			BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
+			pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn);
+			if (is_writeble_pte(*spte)) {
+				rmap_remove(kvm, spte);
+				--kvm->stat.lpages;
+				__set_spte(spte, shadow_trap_nonpresent_pte);
+				spte = NULL;
+				write_protected = 1;
+			}
+			spte = rmap_next(kvm, rmapp, spte);
 		}
-		spte = rmap_next(kvm, rmapp, spte);
 	}
 
 	return write_protected;
@@ -1823,7 +1826,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	page_header_update_slot(vcpu->kvm, sptep, gfn);
 	if (!was_rmapped) {
-		rmap_add(vcpu, sptep, gfn, largepage);
+		rmap_add(vcpu, sptep, gfn);
 		if (!is_rmap_spte(*sptep))
 			kvm_release_pfn_clean(pfn);
 	} else {
-- 
1.6.3.1




* [PATCH 5/8] kvm/mmu: make direct mapping paths aware of mapping levels
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
                   ` (3 preceding siblings ...)
  2009-06-19 13:16 ` [PATCH 4/8] kvm/mmu: make rmap code aware of mapping levels Joerg Roedel
@ 2009-06-19 13:16 ` Joerg Roedel
  2009-06-23 16:47   ` Marcelo Tosatti
  2009-06-19 13:16 ` [PATCH 6/8] kvm/mmu: add support for another level to page walker Joerg Roedel
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2009-06-19 13:16 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    2 +-
 arch/x86/kvm/mmu.c              |   60 ++++++++++++++++++++++-----------------
 arch/x86/kvm/paging_tmpl.h      |    6 ++--
 3 files changed, 38 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 930bac2..397f992 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -314,7 +314,7 @@ struct kvm_vcpu_arch {
 	struct {
 		gfn_t gfn;	/* presumed gfn during guest pte update */
 		pfn_t pfn;	/* pfn corresponding to that gfn */
-		int largepage;
+		int level;
 		unsigned long mmu_seq;
 	} update_pte;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0ef947d..ef2396d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -254,7 +254,7 @@ static int is_last_spte(u64 pte, int level)
 {
 	if (level == PT_PAGE_TABLE_LEVEL)
 		return 1;
-	if (level == PT_DIRECTORY_LEVEL && is_large_pte(pte))
+	if (is_large_pte(pte))
 		return 1;
 	return 0;
 }
@@ -1708,7 +1708,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 
 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		    unsigned pte_access, int user_fault,
-		    int write_fault, int dirty, int largepage,
+		    int write_fault, int dirty, int level,
 		    gfn_t gfn, pfn_t pfn, bool speculative,
 		    bool can_unsync)
 {
@@ -1731,7 +1731,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		spte |= shadow_nx_mask;
 	if (pte_access & ACC_USER_MASK)
 		spte |= shadow_user_mask;
-	if (largepage)
+	if (level > PT_PAGE_TABLE_LEVEL)
 		spte |= PT_PAGE_SIZE_MASK;
 	if (tdp_enabled)
 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
@@ -1742,7 +1742,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if ((pte_access & ACC_WRITE_MASK)
 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
 
-		if (largepage && has_wrprotected_page(vcpu->kvm, gfn, 1)) {
+		if (level > PT_PAGE_TABLE_LEVEL &&
+		    has_wrprotected_page(vcpu->kvm, gfn, level)) {
 			ret = 1;
 			spte = shadow_trap_nonpresent_pte;
 			goto set_pte;
@@ -1780,7 +1781,7 @@ set_pte:
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			 unsigned pt_access, unsigned pte_access,
 			 int user_fault, int write_fault, int dirty,
-			 int *ptwrite, int largepage, gfn_t gfn,
+			 int *ptwrite, int level, gfn_t gfn,
 			 pfn_t pfn, bool speculative)
 {
 	int was_rmapped = 0;
@@ -1796,7 +1797,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 		 * the parent of the now unreachable PTE.
 		 */
-		if (largepage && !is_large_pte(*sptep)) {
+		if (level > PT_PAGE_TABLE_LEVEL &&
+		    !is_large_pte(*sptep)) {
 			struct kvm_mmu_page *child;
 			u64 pte = *sptep;
 
@@ -1809,8 +1811,9 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		} else
 			was_rmapped = 1;
 	}
+
 	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
-		      dirty, largepage, gfn, pfn, speculative, true)) {
+		      dirty, level, gfn, pfn, speculative, true)) {
 		if (write_fault)
 			*ptwrite = 1;
 		kvm_x86_ops->tlb_flush(vcpu);
@@ -1846,7 +1849,7 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
 }
 
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-			int largepage, gfn_t gfn, pfn_t pfn)
+			int level, gfn_t gfn, pfn_t pfn)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
@@ -1854,11 +1857,10 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 	gfn_t pseudo_gfn;
 
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
-		if (iterator.level == PT_PAGE_TABLE_LEVEL
-		    || (largepage && iterator.level == PT_DIRECTORY_LEVEL)) {
+		if (iterator.level == level) {
 			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
 				     0, write, 1, &pt_write,
-				     largepage, gfn, pfn, false);
+				     level, gfn, pfn, false);
 			++vcpu->stat.pf_fixed;
 			break;
 		}
@@ -1886,14 +1888,20 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 {
 	int r;
-	int largepage = 0;
+	int level;
 	pfn_t pfn;
 	unsigned long mmu_seq;
 
-	if (mapping_level(vcpu, gfn) == PT_DIRECTORY_LEVEL) {
-		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
-		largepage = 1;
-	}
+	level = mapping_level(vcpu, gfn);
+
+	/*
+	 * This path builds a PAE pagetable - so we can map 2mb pages at
+	 * maximum. Therefore check if the level is larger than that.
+	 */
+	if (level > PT_DIRECTORY_LEVEL)
+		level = PT_DIRECTORY_LEVEL;
+
+	gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
@@ -1909,7 +1917,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 	if (mmu_notifier_retry(vcpu, mmu_seq))
 		goto out_unlock;
 	kvm_mmu_free_some_pages(vcpu);
-	r = __direct_map(vcpu, v, write, largepage, gfn, pfn);
+	r = __direct_map(vcpu, v, write, level, gfn, pfn);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 
@@ -2085,7 +2093,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 {
 	pfn_t pfn;
 	int r;
-	int largepage = 0;
+	int level;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	unsigned long mmu_seq;
 
@@ -2096,10 +2104,10 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 	if (r)
 		return r;
 
-	if (mapping_level(vcpu, gfn) == PT_DIRECTORY_LEVEL) {
-		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
-		largepage = 1;
-	}
+	level = mapping_level(vcpu, gfn);
+
+	gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
+
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 	pfn = gfn_to_pfn(vcpu->kvm, gfn);
@@ -2112,7 +2120,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 		goto out_unlock;
 	kvm_mmu_free_some_pages(vcpu);
 	r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
-			 largepage, gfn, pfn);
+			 level, gfn, pfn);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	return r;
@@ -2419,7 +2427,7 @@ static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
 				  const void *new)
 {
 	if (sp->role.level != PT_PAGE_TABLE_LEVEL) {
-		if (!vcpu->arch.update_pte.largepage ||
+		if (vcpu->arch.update_pte.level == PT_PAGE_TABLE_LEVEL ||
 		    sp->role.glevels == PT32_ROOT_LEVEL) {
 			++vcpu->kvm->stat.mmu_pde_zapped;
 			return;
@@ -2469,7 +2477,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	u64 gpte = 0;
 	pfn_t pfn;
 
-	vcpu->arch.update_pte.largepage = 0;
+	vcpu->arch.update_pte.level = PT_PAGE_TABLE_LEVEL;
 
 	if (bytes != 4 && bytes != 8)
 		return;
@@ -2501,7 +2509,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	if (is_large_pte(gpte) &&
 	    (mapping_level(vcpu, gfn) == PT_DIRECTORY_LEVEL)) {
 		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
-		vcpu->arch.update_pte.largepage = 1;
+		vcpu->arch.update_pte.level = PT_DIRECTORY_LEVEL;
 	}
 	vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 25a4437..8fbf4e7 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -248,7 +248,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 	pt_element_t gpte;
 	unsigned pte_access;
 	pfn_t pfn;
-	int largepage = vcpu->arch.update_pte.largepage;
+	int level = vcpu->arch.update_pte.level;
 
 	gpte = *(const pt_element_t *)pte;
 	if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
@@ -267,7 +267,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 		return;
 	kvm_get_pfn(pfn);
 	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
-		     gpte & PT_DIRTY_MASK, NULL, largepage,
+		     gpte & PT_DIRTY_MASK, NULL, level,
 		     gpte_to_gfn(gpte), pfn, true);
 }
 
@@ -301,7 +301,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 				     gw->pte_access & access,
 				     user_fault, write_fault,
 				     gw->ptes[gw->level-1] & PT_DIRTY_MASK,
-				     ptwrite, largepage,
+				     ptwrite, level,
 				     gw->gfn, pfn, false);
 			break;
 		}
-- 
1.6.3.1




* [PATCH 6/8] kvm/mmu: add support for another level to page walker
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
                   ` (4 preceding siblings ...)
  2009-06-19 13:16 ` [PATCH 5/8] kvm/mmu: make direct mapping paths " Joerg Roedel
@ 2009-06-19 13:16 ` Joerg Roedel
  2009-06-20 11:19   ` Avi Kivity
  2009-06-19 13:16 ` [PATCH 7/8] kvm/mmu: enable gbpages by increasing nr of pagesizes Joerg Roedel
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2009-06-19 13:16 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

The page walker may also be used with nested paging when accessing MMIO areas.
Make it support the additional page level too.
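
For a 1GB guest mapping the walker has to return the base gfn of the gigabyte
frame plus the 4k-page offset within it. A small worked example of that
arithmetic (the numbers are made up and are not taken from the patch):

	#include <stdio.h>
	#include <stdint.h>

	#define PAGE_SHIFT	12

	int main(void)
	{
		uint64_t pdpe_base = 0x140000000ULL;	/* 1GB frame at 5GB    */
		uint64_t addr      = 0x14003f000ULL;	/* faulting guest addr */
		uint64_t gfn;

		gfn  = pdpe_base >> PAGE_SHIFT;			    /* gpte_to_gfn_pdpe() */
		gfn += (addr & ((1ULL << 30) - 1)) >> PAGE_SHIFT;   /* offset inside 1GB  */

		printf("gfn = 0x%llx\n", (unsigned long long)gfn);  /* prints 0x14003f */
		return 0;
	}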

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/mmu.c         |    6 ++++++
 arch/x86/kvm/paging_tmpl.h |   16 ++++++++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ef2396d..fc0e2fc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -117,6 +117,11 @@ module_param(oos_shadow, bool, 0644);
 #define PT64_DIR_BASE_ADDR_MASK \
 	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
 
+#define PT64_PDPE_BASE_ADDR_MASK \
+	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (2 * PT64_LEVEL_BITS))) - 1))
+#define PT64_PDPE_OFFSET_MASK \
+	(PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (2 * PT64_LEVEL_BITS))) - 1))
+
 #define PT32_BASE_ADDR_MASK PAGE_MASK
 #define PT32_DIR_BASE_ADDR_MASK \
 	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
@@ -130,6 +135,7 @@ module_param(oos_shadow, bool, 0644);
 #define PFERR_RSVD_MASK (1U << 3)
 #define PFERR_FETCH_MASK (1U << 4)
 
+#define PT_PDPE_LEVEL 3
 #define PT_DIRECTORY_LEVEL 2
 #define PT_PAGE_TABLE_LEVEL 1
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8fbf4e7..54c77be 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -55,6 +55,7 @@
 
 #define gpte_to_gfn FNAME(gpte_to_gfn)
 #define gpte_to_gfn_pde FNAME(gpte_to_gfn_pde)
+#define gpte_to_gfn_pdpe FNAME(gpte_to_gfn_pdpe)
 
 /*
  * The guest_walker structure emulates the behavior of the hardware page
@@ -81,6 +82,11 @@ static gfn_t gpte_to_gfn_pde(pt_element_t gpte)
 	return (gpte & PT_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
 }
 
+static gfn_t gpte_to_gfn_pdpe(pt_element_t gpte)
+{
+	return (gpte & PT64_PDPE_BASE_ADDR_MASK) >> PAGE_SHIFT;
+}
+
 static bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
 			 gfn_t table_gfn, unsigned index,
 			 pt_element_t orig_pte, pt_element_t new_pte)
@@ -201,6 +207,15 @@ walk:
 			break;
 		}
 
+		if (walker->level == PT_PDPE_LEVEL &&
+		    (pte & PT_PAGE_SIZE_MASK) &&
+		    is_long_mode(vcpu)) {
+			walker->gfn  = gpte_to_gfn_pdpe(pte);
+			walker->gfn += (addr & PT64_PDPE_OFFSET_MASK)
+					>> PAGE_SHIFT;
+			break;
+		}
+
 		pt_access = pte_access;
 		--walker->level;
 	}
@@ -609,4 +624,5 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 #undef PT_MAX_FULL_LEVELS
 #undef gpte_to_gfn
 #undef gpte_to_gfn_pde
+#undef gpte_to_gfn_pdpe
 #undef CMPXCHG
-- 
1.6.3.1




* [PATCH 7/8] kvm/mmu: enable gbpages by increasing nr of pagesizes
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
                   ` (5 preceding siblings ...)
  2009-06-19 13:16 ` [PATCH 6/8] kvm/mmu: add support for another level to page walker Joerg Roedel
@ 2009-06-19 13:16 ` Joerg Roedel
  2009-06-19 13:16 ` [PATCH 8/8] kvm x86: report 1GB page support to userspace Joerg Roedel
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2009-06-19 13:16 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 397f992..26ea9b8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -54,7 +54,7 @@
 #define UNMAPPED_GVA (~(gpa_t)0)
 
 /* KVM Hugepage definitions for x86 */
-#define KVM_NR_PAGE_SIZES	2
+#define KVM_NR_PAGE_SIZES	3
 #define KVM_HPAGE_SHIFT(x)	(PAGE_SHIFT + (((x) - 1) * 9))
 #define KVM_HPAGE_SIZE(x)	(1UL << KVM_HPAGE_SHIFT(x))
 #define KVM_HPAGE_MASK(x)	(~(KVM_HPAGE_SIZE(x) - 1))
-- 
1.6.3.1




* [PATCH 8/8] kvm x86: report 1GB page support to userspace
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
                   ` (6 preceding siblings ...)
  2009-06-19 13:16 ` [PATCH 7/8] kvm/mmu: enable gbpages by increasing nr of pagesizes Joerg Roedel
@ 2009-06-19 13:16 ` Joerg Roedel
  2009-06-20 11:03 ` [PATCH 0/8 v3] KVM support for 1GB pages Avi Kivity
  2009-06-24  8:43 ` Avi Kivity
  9 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2009-06-19 13:16 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, linux-kernel, Joerg Roedel

If userspace knows that the kernel part supports 1GB pages, it can enable the
corresponding CPUID bit so that guests actually use GB pages.
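
A userspace sketch of how the bit can be queried (this is not part of the
patch; it assumes the usual encoding, i.e. leaf 0x80000001, EDX bit 26, and
that 100 entries are enough for KVM_GET_SUPPORTED_CPUID):

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	int main(void)
	{
		struct kvm_cpuid2 *c;
		unsigned int i, n = 100;
		int kvm = open("/dev/kvm", O_RDWR);

		if (kvm < 0)
			return 1;

		c = calloc(1, sizeof(*c) + n * sizeof(c->entries[0]));
		c->nent = n;
		if (ioctl(kvm, KVM_GET_SUPPORTED_CPUID, c) < 0)
			return 1;

		for (i = 0; i < c->nent; i++)
			if (c->entries[i].function == 0x80000001)
				printf("1GB pages %ssupported\n",
				       (c->entries[i].edx & (1u << 26)) ? "" : "not ");
		return 0;
	}

Userspace (e.g. qemu) would then copy the bit into the guest's CPUID via
KVM_SET_CPUID2 if it wants the guest to be able to use gbpages.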

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/svm.c              |    6 ++++++
 arch/x86/kvm/vmx.c              |    6 ++++++
 arch/x86/kvm/x86.c              |    3 ++-
 4 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 26ea9b8..c120882 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -527,6 +527,7 @@ struct kvm_x86_ops {
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+	bool (*gb_page_enable)(void);
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c283201..ccb75fb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2718,6 +2718,11 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	return 0;
 }
 
+static bool svm_gb_page_enable(void)
+{
+	return npt_enabled;
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -2779,6 +2784,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
 	.get_mt_mask = svm_get_mt_mask,
+	.gb_page_enable = svm_gb_page_enable,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9332938..4418ba6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3891,6 +3891,11 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	return ret;
 }
 
+static bool vmx_gb_page_enable(void)
+{
+	return false;
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -3950,6 +3955,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
 	.get_mt_mask = vmx_get_mt_mask,
+	.gb_page_enable = vmx_gb_page_enable,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 71090d2..ddfb77e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1371,6 +1371,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			 u32 index, int *nent, int maxnent)
 {
 	unsigned f_nx = is_efer_nx() ? F(NX) : 0;
+	unsigned f_gbpages = kvm_x86_ops->gb_page_enable() ? F(GBPAGES) : 0;
 #ifdef CONFIG_X86_64
 	unsigned f_lm = F(LM);
 #else
@@ -1395,7 +1396,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
 		F(PAT) | F(PSE36) | 0 /* Reserved */ |
 		f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
-		F(FXSR) | F(FXSR_OPT) | 0 /* GBPAGES */ | 0 /* RDTSCP */ |
+		F(FXSR) | F(FXSR_OPT) | f_gbpages | 0 /* RDTSCP */ |
 		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
 	/* cpuid 1.ecx */
 	const u32 kvm_supported_word4_x86_features =
-- 
1.6.3.1




* Re: [PATCH 0/8 v3] KVM support for 1GB pages
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
                   ` (7 preceding siblings ...)
  2009-06-19 13:16 ` [PATCH 8/8] kvm x86: report 1GB page support to userspace Joerg Roedel
@ 2009-06-20 11:03 ` Avi Kivity
  2009-06-22  9:40   ` Joerg Roedel
  2009-06-24  8:43 ` Avi Kivity
  9 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2009-06-20 11:03 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

On 06/19/2009 04:16 PM, Joerg Roedel wrote:
> Hi,
>
> this is the third version of the patches for KVM to support 1GB pages. Changes
> since the last version include:
>
> 	- changed reporting of 1GB page support to userspace to use
> 	  SUPPORTED_CPUID ioctl
> 	- added support for 1GB pages to the software page walker code too
>
> This code can still only make use of 1GB pages with nested paging enabled. I
> will give the shadow paging code another debug round soon. Please comment on or
> consider applying these patches.
>
>    

Can you describe how the code fails with shadow paging?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



* Re: [PATCH 6/8] kvm/mmu: add support for another level to page walker
  2009-06-19 13:16 ` [PATCH 6/8] kvm/mmu: add support for another level to page walker Joerg Roedel
@ 2009-06-20 11:19   ` Avi Kivity
  2009-06-22  9:38     ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2009-06-20 11:19 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

On 06/19/2009 04:16 PM, Joerg Roedel wrote:
> The page walker may be used with nested paging too when accessing mmio areas.
> Make it support the additional page-level too.
>
> Signed-off-by: Joerg Roedel<joerg.roedel@amd.com>
> ---
>   arch/x86/kvm/mmu.c         |    6 ++++++
>   arch/x86/kvm/paging_tmpl.h |   16 ++++++++++++++++
>   2 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ef2396d..fc0e2fc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -117,6 +117,11 @@ module_param(oos_shadow, bool, 0644);
>   #define PT64_DIR_BASE_ADDR_MASK \
>   	(PT64_BASE_ADDR_MASK&  ~((1ULL<<  (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
>
> +#define PT64_PDPE_BASE_ADDR_MASK \
> +	(PT64_BASE_ADDR_MASK&  ~(1ULL<<  (PAGE_SHIFT + (2 * PT64_LEVEL_BITS))))
> +#define PT64_PDPE_OFFSET_MASK \
> +	(PT64_BASE_ADDR_MASK&  (1ULL<<  (PAGE_SHIFT + (2 * PT64_LEVEL_BITS))))
> +
>   #define PT32_BASE_ADDR_MASK PAGE_MASK
>   #define PT32_DIR_BASE_ADDR_MASK \
>   	(PAGE_MASK&  ~((1ULL<<  (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
> @@ -130,6 +135,7 @@ module_param(oos_shadow, bool, 0644);
>   #define PFERR_RSVD_MASK (1U<<  3)
>   #define PFERR_FETCH_MASK (1U<<  4)
> +static gfn_t gpte_to_gfn_pdpe(pt_element_t gpte)
> +{
> +	return (gpte&  PT64_PDPE_BASE_ADDR_MASK)>>  PAGE_SHIFT;
> +}
> +
>   static bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
>   			 gfn_t table_gfn, unsigned index,
>   			 pt_element_t orig_pte, pt_element_t new_pte)
> @@ -201,6 +207,15 @@ walk:
>   			break;
>   		}
>
> +		if (walker->level == PT_PDPE_LEVEL&&
> +		    (pte&  PT_PAGE_SIZE_MASK)&&
> +		    is_long_mode(vcpu)) {
> +			walker->gfn  = gpte_to_gfn_pdpe(pte);
> +			walker->gfn += (addr&  PT64_PDPE_OFFSET_MASK)
> +					>>  PAGE_SHIFT;
> +			break;
> +		}
> +
>   		pt_access = pte_access;
>    

It would be cleaner to merge this with the 2MB check earlier (and to 
rename and parametrise gpte_to_gfn_pde() rather than duplicate it).
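
Something along these lines, maybe (just a sketch, the mask name is made up,
and it ignores the 32-bit PTTYPE for the moment):

	#define PT64_LVL_ADDR_MASK(level) \
		(PT64_BASE_ADDR_MASK & \
		 ~((1ULL << (PAGE_SHIFT + ((level) - 1) * PT64_LEVEL_BITS)) - 1))

	static gfn_t FNAME(gpte_to_gfn_lvl)(pt_element_t gpte, int lvl)
	{
		return (gpte & PT64_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT;
	}

so the 2MB and the 1GB case can share one helper that takes the walker level
as argument.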

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



* Re: [PATCH 6/8] kvm/mmu: add support for another level to page walker
  2009-06-20 11:19   ` Avi Kivity
@ 2009-06-22  9:38     ` Joerg Roedel
  0 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2009-06-22  9:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, linux-kernel

On Sat, Jun 20, 2009 at 02:19:48PM +0300, Avi Kivity wrote:
> On 06/19/2009 04:16 PM, Joerg Roedel wrote:
>> The page walker may be used with nested paging too when accessing mmio areas.
>> Make it support the additional page-level too.
>>
>> Signed-off-by: Joerg Roedel<joerg.roedel@amd.com>
>> ---
>>   arch/x86/kvm/mmu.c         |    6 ++++++
>>   arch/x86/kvm/paging_tmpl.h |   16 ++++++++++++++++
>>   2 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index ef2396d..fc0e2fc 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -117,6 +117,11 @@ module_param(oos_shadow, bool, 0644);
>>   #define PT64_DIR_BASE_ADDR_MASK \
>>   	(PT64_BASE_ADDR_MASK&  ~((1ULL<<  (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
>>
>> +#define PT64_PDPE_BASE_ADDR_MASK \
>> +	(PT64_BASE_ADDR_MASK&  ~(1ULL<<  (PAGE_SHIFT + (2 * PT64_LEVEL_BITS))))
>> +#define PT64_PDPE_OFFSET_MASK \
>> +	(PT64_BASE_ADDR_MASK&  (1ULL<<  (PAGE_SHIFT + (2 * PT64_LEVEL_BITS))))
>> +
>>   #define PT32_BASE_ADDR_MASK PAGE_MASK
>>   #define PT32_DIR_BASE_ADDR_MASK \
>>   	(PAGE_MASK&  ~((1ULL<<  (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
>> @@ -130,6 +135,7 @@ module_param(oos_shadow, bool, 0644);
>>   #define PFERR_RSVD_MASK (1U<<  3)
>>   #define PFERR_FETCH_MASK (1U<<  4)
>> +static gfn_t gpte_to_gfn_pdpe(pt_element_t gpte)
>> +{
>> +	return (gpte&  PT64_PDPE_BASE_ADDR_MASK)>>  PAGE_SHIFT;
>> +}
>> +
>>   static bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
>>   			 gfn_t table_gfn, unsigned index,
>>   			 pt_element_t orig_pte, pt_element_t new_pte)
>> @@ -201,6 +207,15 @@ walk:
>>   			break;
>>   		}
>>
>> +		if (walker->level == PT_PDPE_LEVEL&&
>> +		    (pte&  PT_PAGE_SIZE_MASK)&&
>> +		    is_long_mode(vcpu)) {
>> +			walker->gfn  = gpte_to_gfn_pdpe(pte);
>> +			walker->gfn += (addr&  PT64_PDPE_OFFSET_MASK)
>> +					>>  PAGE_SHIFT;
>> +			break;
>> +		}
>> +
>>   		pt_access = pte_access;
>>    
>
> It would be cleaner to merge this with the 2MB check earlier (and to  
> rename and parametrise gpte_to_gfn_pde() rather than duplicate it).

Ok, I will merge it into the previous function.

Joerg

-- 
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    | 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center    | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
           | Registergericht München, HRB Nr. 43632



* Re: [PATCH 0/8 v3] KVM support for 1GB pages
  2009-06-20 11:03 ` [PATCH 0/8 v3] KVM support for 1GB pages Avi Kivity
@ 2009-06-22  9:40   ` Joerg Roedel
  2009-06-22  9:43     ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2009-06-22  9:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, linux-kernel

On Sat, Jun 20, 2009 at 02:03:22PM +0300, Avi Kivity wrote:
> On 06/19/2009 04:16 PM, Joerg Roedel wrote:
>> Hi,
>>
>> this is the third version of the patches for KVM to support 1GB pages. Changes
>> since the last version include:
>>
>> 	- changed reporting of 1GB page support to userspace to use
>> 	  SUPPORTED_CPUID ioctl
>> 	- added support for 1GB pages to the software page walker code too
>>
>> This code can still only make use of 1GB pages with nested paging enabled. I
>> will give the shadow paging code another debug round soon. Please comment on or
>> consider applying these patches.
>>
>>    
>
> Can you describe how the code fails with shadow paging?

After the rework it even fails to boot a guest with gbpages enabled. This week
is a bit tight but I will debug this further next week.

Joerg

-- 
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    | 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center    | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
           | Registergericht München, HRB Nr. 43632



* Re: [PATCH 0/8 v3] KVM support for 1GB pages
  2009-06-22  9:40   ` Joerg Roedel
@ 2009-06-22  9:43     ` Avi Kivity
  2009-06-22  9:49       ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2009-06-22  9:43 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

On 06/22/2009 12:40 PM, Joerg Roedel wrote:
>> Can you describe how the code fails with shadow paging?
>>      
>
> After the rework it even fails to boot a guest with gbpages enabled. This week
> is a bit tight but I will debug this further next week.
>    

Does the failure happen only when both guest/host use gbpages, or does 
it also happen if gbpages is disabled on either one?

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/8 v3] KVM support for 1GB pages
  2009-06-22  9:43     ` Avi Kivity
@ 2009-06-22  9:49       ` Joerg Roedel
  2009-06-22  9:55         ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2009-06-22  9:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, linux-kernel

On Mon, Jun 22, 2009 at 12:43:40PM +0300, Avi Kivity wrote:
> On 06/22/2009 12:40 PM, Joerg Roedel wrote:
>>> Can you describe how the code fails with shadow paging?
>>>      
>>
>> After the rework it even fails to boot a guest with gbpages enabled. This week
>> is a bit tight but I will debug this further next week.
>>    
>
> Does the failure happen only when both guest/host use gbpages, or does  
> it also happen if gbpages is disabled on either one?

It happens only when the guest uses gbpages. If I disable them at boot, the
Linux guest boots fine and runs stably.

Joerg

-- 
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    | 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center    | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
           | Registergericht München, HRB Nr. 43632



* Re: [PATCH 0/8 v3] KVM support for 1GB pages
  2009-06-22  9:49       ` Joerg Roedel
@ 2009-06-22  9:55         ` Avi Kivity
  0 siblings, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2009-06-22  9:55 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

On 06/22/2009 12:49 PM, Joerg Roedel wrote:
>>> After the rework it even fails to boot a guest with gbpages enabled. This week
>>> is a bit tight but I will debug this further next week.
>>>
>>>        
>> Does the failure happen only when both guest/host use gbpages, or does
>> it also happen if gbpages is disabled on either one?
>>      
>
> It happens only when the guest uses gbpages. If I disable them at boot the
> linux guest boots fine and runs stable.
>    

This seems to point the finger at the write protection logic: either we 
create a gbpage even though there are shadowed page tables in its area, 
or (more likely) we don't break up a gbpage when we write-protect a page.
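
(For reference, the guard for the first case is the per-level write count from
patch 3 - mapping_level() does

	for (level = PT_DIRECTORY_LEVEL; level <= host_level; ++level)
		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
			break;
	return level - 1;

and account_shadowed() bumps the count for every hugepage level covering a
shadowed gfn - which is why the breakup of an already established gbpage on
write protection looks like the more likely suspect.)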

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level
  2009-06-19 13:16 ` [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level Joerg Roedel
@ 2009-06-23 15:59   ` Marcelo Tosatti
  2009-06-23 17:00     ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2009-06-23 15:59 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, kvm, linux-kernel

Hi Joerg,

On Fri, Jun 19, 2009 at 03:16:24PM +0200, Joerg Roedel wrote:
> With the new name and the corresponding backend changes this function
> can now support multiple hugepage sizes.
> 
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/kvm/mmu.c         |  100 +++++++++++++++++++++++++++++--------------
>  arch/x86/kvm/paging_tmpl.h |    4 +-
>  2 files changed, 69 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 1f24d88..3fa6009 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -390,37 +390,52 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
>   * Return the pointer to the largepage write count for a given
>   * gfn, handling slots that are not large page aligned.
>   */
> -static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
> +static int *slot_largepage_idx(gfn_t gfn,
> +			       struct kvm_memory_slot *slot,
> +			       int level)
>  {
>  	unsigned long idx;
>  
> -	idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
> -	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
> -	return &slot->lpage_info[0][idx].write_count;
> +	idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
> +	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
> +	return &slot->lpage_info[level - 2][idx].write_count;
>  }
>  
>  static void account_shadowed(struct kvm *kvm, gfn_t gfn)
>  {
> +	struct kvm_memory_slot *slot;
>  	int *write_count;
> +	int i;
>  
>  	gfn = unalias_gfn(kvm, gfn);
> -	write_count = slot_largepage_idx(gfn,
> -					 gfn_to_memslot_unaliased(kvm, gfn));
> -	*write_count += 1;
> +
> +	for (i = PT_DIRECTORY_LEVEL;
> +	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> +		slot          = gfn_to_memslot_unaliased(kvm, gfn);

Can't you move this call out of the loop?
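
Something like:

	slot = gfn_to_memslot_unaliased(kvm, gfn);
	for (i = PT_DIRECTORY_LEVEL;
	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
		write_count   = slot_largepage_idx(gfn, slot, i);
		*write_count += 1;
	}

(and the same in unaccount_shadowed).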

> +		write_count   = slot_largepage_idx(gfn, slot, i);
> +		*write_count += 1;
> +	}
>  }
>  
>  static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
>  {
> +	struct kvm_memory_slot *slot;
>  	int *write_count;
> +	int i;
>  
>  	gfn = unalias_gfn(kvm, gfn);
> -	write_count = slot_largepage_idx(gfn,
> -					 gfn_to_memslot_unaliased(kvm, gfn));
> -	*write_count -= 1;
> -	WARN_ON(*write_count < 0);
> +	for (i = PT_DIRECTORY_LEVEL;
> +	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> +		slot          = gfn_to_memslot_unaliased(kvm, gfn);
> +		write_count   = slot_largepage_idx(gfn, slot, i);
> +		*write_count -= 1;
> +		WARN_ON(*write_count < 0);
> +	}
>  }
>  
> -static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
> +static int has_wrprotected_page(struct kvm *kvm,
> +				gfn_t gfn,
> +				int level)
>  {
>  	struct kvm_memory_slot *slot;
>  	int *largepage_idx;
> @@ -428,47 +443,67 @@ static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
>  	gfn = unalias_gfn(kvm, gfn);
>  	slot = gfn_to_memslot_unaliased(kvm, gfn);
>  	if (slot) {
> -		largepage_idx = slot_largepage_idx(gfn, slot);
> +		largepage_idx = slot_largepage_idx(gfn, slot, level);
>  		return *largepage_idx;
>  	}
>  
>  	return 1;
>  }
>  
> -static int host_largepage_backed(struct kvm *kvm, gfn_t gfn)
> +static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
>  {
> +	unsigned long page_size = PAGE_SIZE;
>  	struct vm_area_struct *vma;
>  	unsigned long addr;
> -	int ret = 0;
> +	int i, ret = 0;
>  
>  	addr = gfn_to_hva(kvm, gfn);
>  	if (kvm_is_error_hva(addr))
> -		return ret;
> +		return page_size;
>  
>  	down_read(&current->mm->mmap_sem);
>  	vma = find_vma(current->mm, addr);
> -	if (vma && is_vm_hugetlb_page(vma))
> -		ret = 1;
> +	if (!vma)
> +		goto out;
> +
> +	page_size = vma_kernel_pagesize(vma);
> +
> +out:
>  	up_read(&current->mm->mmap_sem);
>  
> +	for (i = PT_PAGE_TABLE_LEVEL;
> +	     i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
> +		if (page_size >= KVM_HPAGE_SIZE(i))
> +			ret = i;
> +		else
> +			break;
> +	}
> +
>  	return ret;
>  }
>  
> -static int is_largepage_backed(struct kvm_vcpu *vcpu, gfn_t large_gfn)
> +static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
>  {
>  	struct kvm_memory_slot *slot;
> -
> -	if (has_wrprotected_page(vcpu->kvm, large_gfn))
> -		return 0;
> -
> -	if (!host_largepage_backed(vcpu->kvm, large_gfn))
> -		return 0;
> +	int host_level;
> +	int level = PT_PAGE_TABLE_LEVEL;
>  
>  	slot = gfn_to_memslot(vcpu->kvm, large_gfn);
>  	if (slot && slot->dirty_bitmap)
> -		return 0;
> +		return PT_PAGE_TABLE_LEVEL;
>  
> -	return 1;
> +	host_level = host_mapping_level(vcpu->kvm, large_gfn);
> +
> +	if (host_level == PT_PAGE_TABLE_LEVEL)
> +		return host_level;
> +
> +	for (level = PT_DIRECTORY_LEVEL; level <= host_level; ++level) {
> +
> +		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
> +			break;
> +	}
> +
> +	return level - 1;
>  }
>  
>  /*
> @@ -1704,7 +1739,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	if ((pte_access & ACC_WRITE_MASK)
>  	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
>  
> -		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
> +		if (largepage && has_wrprotected_page(vcpu->kvm, gfn, 1)) {

It seems direct_map is missing the large pte overwrite check that
fetch() contains:

                if (is_large_pte(*sptep)) {
                        rmap_remove(vcpu->kvm, sptep);
                        __set_spte(sptep, shadow_trap_nonpresent_pte);
                        kvm_flush_remote_tlbs(vcpu->kvm);
                }

(perhaps it's not a possible scenario at the moment, but...).
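
If it does become reachable, the fix would presumably be the same three lines
at __direct_map()'s intermediate levels, after the branch that installs the
final spte; a sketch, not something the posted series contains:

	/*
	 * Hypothetical addition to __direct_map(), mirroring fetch(): tear
	 * down a stale huge spte sitting where a page table pointer is now
	 * needed, before descending further.
	 */
	if (is_large_pte(*iterator.sptep)) {
		rmap_remove(vcpu->kvm, iterator.sptep);
		__set_spte(iterator.sptep, shadow_trap_nonpresent_pte);
		kvm_flush_remote_tlbs(vcpu->kvm);
	}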



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/8] kvm/mmu: make direct mapping paths aware of mapping levels
  2009-06-19 13:16 ` [PATCH 5/8] kvm/mmu: make direct mapping paths " Joerg Roedel
@ 2009-06-23 16:47   ` Marcelo Tosatti
  2009-06-23 17:10     ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2009-06-23 16:47 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Marcelo Tosatti, kvm, linux-kernel

On Fri, Jun 19, 2009 at 03:16:26PM +0200, Joerg Roedel wrote:
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    2 +-
>  arch/x86/kvm/mmu.c              |   60 ++++++++++++++++++++++-----------------
>  arch/x86/kvm/paging_tmpl.h      |    6 ++--
>  3 files changed, 38 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 930bac2..397f992 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -314,7 +314,7 @@ struct kvm_vcpu_arch {
>  	struct {
>  		gfn_t gfn;	/* presumed gfn during guest pte update */
>  		pfn_t pfn;	/* pfn corresponding to that gfn */
> -		int largepage;
> +		int level;
>  		unsigned long mmu_seq;
>  	} update_pte;
>  
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 0ef947d..ef2396d 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -254,7 +254,7 @@ static int is_last_spte(u64 pte, int level)
>  {
>  	if (level == PT_PAGE_TABLE_LEVEL)
>  		return 1;
> -	if (level == PT_DIRECTORY_LEVEL && is_large_pte(pte))
> +	if (is_large_pte(pte))
>  		return 1;

Wouldn't it be safer to check for bit 7 only on the levels where
we're sure it means a large page?
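
Presumably something along these lines (sketch only; PT_PDPE_LEVEL stands in
for whatever constant the series uses for the third level):

	static int is_last_spte(u64 pte, int level)
	{
		if (level == PT_PAGE_TABLE_LEVEL)
			return 1;
		if ((level == PT_DIRECTORY_LEVEL || level == PT_PDPE_LEVEL) &&
		    is_large_pte(pte))
			return 1;
		return 0;
	}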

>  	return 0;
>  }
> @@ -1708,7 +1708,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>  
>  static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		    unsigned pte_access, int user_fault,
> -		    int write_fault, int dirty, int largepage,
> +		    int write_fault, int dirty, int level,
>  		    gfn_t gfn, pfn_t pfn, bool speculative,
>  		    bool can_unsync)
>  {
> @@ -1731,7 +1731,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		spte |= shadow_nx_mask;
>  	if (pte_access & ACC_USER_MASK)
>  		spte |= shadow_user_mask;
> -	if (largepage)
> +	if (level > PT_PAGE_TABLE_LEVEL)
>  		spte |= PT_PAGE_SIZE_MASK;
>  	if (tdp_enabled)
>  		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> @@ -1742,7 +1742,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	if ((pte_access & ACC_WRITE_MASK)
>  	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
>  
> -		if (largepage && has_wrprotected_page(vcpu->kvm, gfn, 1)) {
> +		if (level > PT_PAGE_TABLE_LEVEL &&
> +		    has_wrprotected_page(vcpu->kvm, gfn, level)) {
>  			ret = 1;
>  			spte = shadow_trap_nonpresent_pte;
>  			goto set_pte;
> @@ -1780,7 +1781,7 @@ set_pte:
>  static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  			 unsigned pt_access, unsigned pte_access,
>  			 int user_fault, int write_fault, int dirty,
> -			 int *ptwrite, int largepage, gfn_t gfn,
> +			 int *ptwrite, int level, gfn_t gfn,
>  			 pfn_t pfn, bool speculative)
>  {
>  	int was_rmapped = 0;
> @@ -1796,7 +1797,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
>  		 * the parent of the now unreachable PTE.
>  		 */
> -		if (largepage && !is_large_pte(*sptep)) {
> +		if (level > PT_PAGE_TABLE_LEVEL &&
> +		    !is_large_pte(*sptep)) {

We should probably get rid of this entirely (or turn it into a BUG_ON), since
a better place to handle this is fetch() / direct_map(), as mentioned in the
other email.

But we can do that later, incrementally.
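
If the callers get fixed first, the BUG_ON variant could be as small as this
sketch of the hunk above:

	/*
	 * sketch: once fetch()/direct_map() zap any conflicting entry before
	 * calling here, this case would be a bug rather than something to
	 * fix up in place
	 */
	if (level > PT_PAGE_TABLE_LEVEL && !is_large_pte(*sptep))
		BUG();
	else
		was_rmapped = 1;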

>  			struct kvm_mmu_page *child;
>  			u64 pte = *sptep;
>  
> @@ -1809,8 +1811,9 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		} else
>  			was_rmapped = 1;
>  	}
> +
>  	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
> -		      dirty, largepage, gfn, pfn, speculative, true)) {
> +		      dirty, level, gfn, pfn, speculative, true)) {
>  		if (write_fault)
>  			*ptwrite = 1;
>  		kvm_x86_ops->tlb_flush(vcpu);
> @@ -1846,7 +1849,7 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
>  }
>  
>  static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
> -			int largepage, gfn_t gfn, pfn_t pfn)
> +			int level, gfn_t gfn, pfn_t pfn)
>  {
>  	struct kvm_shadow_walk_iterator iterator;
>  	struct kvm_mmu_page *sp;
> @@ -1854,11 +1857,10 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
>  	gfn_t pseudo_gfn;
>  
>  	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
> -		if (iterator.level == PT_PAGE_TABLE_LEVEL
> -		    || (largepage && iterator.level == PT_DIRECTORY_LEVEL)) {
> +		if (iterator.level == level) {
>  			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
>  				     0, write, 1, &pt_write,
> -				     largepage, gfn, pfn, false);
> +				     level, gfn, pfn, false);
>  			++vcpu->stat.pf_fixed;
>  			break;
>  		}
> @@ -1886,14 +1888,20 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
>  static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
>  {
>  	int r;
> -	int largepage = 0;
> +	int level;
>  	pfn_t pfn;
>  	unsigned long mmu_seq;
>  
> -	if (mapping_level(vcpu, gfn) == PT_DIRECTORY_LEVEL) {
> -		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
> -		largepage = 1;
> -	}
> +	level = mapping_level(vcpu, gfn);
> +
> +	/*
> +	 * This path builds a PAE pagetable - so we can map 2mb pages at
> +	 * maximum. Therefore check if the level is larger than that.
> +	 */
> +	if (level > PT_DIRECTORY_LEVEL)
> +		level = PT_DIRECTORY_LEVEL;
> +
> +	gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
>  
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
> @@ -1909,7 +1917,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
>  	if (mmu_notifier_retry(vcpu, mmu_seq))
>  		goto out_unlock;
>  	kvm_mmu_free_some_pages(vcpu);
> -	r = __direct_map(vcpu, v, write, largepage, gfn, pfn);
> +	r = __direct_map(vcpu, v, write, level, gfn, pfn);
>  	spin_unlock(&vcpu->kvm->mmu_lock);
>  
>  
> @@ -2085,7 +2093,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
>  {
>  	pfn_t pfn;
>  	int r;
> -	int largepage = 0;
> +	int level;
>  	gfn_t gfn = gpa >> PAGE_SHIFT;
>  	unsigned long mmu_seq;
>  
> @@ -2096,10 +2104,10 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
>  	if (r)
>  		return r;
>  
> -	if (mapping_level(vcpu, gfn) == PT_DIRECTORY_LEVEL) {
> -		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
> -		largepage = 1;
> -	}
> +	level = mapping_level(vcpu, gfn);
> +
> +	gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
> +
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
>  	pfn = gfn_to_pfn(vcpu->kvm, gfn);
> @@ -2112,7 +2120,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
>  		goto out_unlock;
>  	kvm_mmu_free_some_pages(vcpu);
>  	r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
> -			 largepage, gfn, pfn);
> +			 level, gfn, pfn);
>  	spin_unlock(&vcpu->kvm->mmu_lock);
>  
>  	return r;
> @@ -2419,7 +2427,7 @@ static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
>  				  const void *new)
>  {
>  	if (sp->role.level != PT_PAGE_TABLE_LEVEL) {
> -		if (!vcpu->arch.update_pte.largepage ||
> +		if (vcpu->arch.update_pte.level == PT_PAGE_TABLE_LEVEL ||
>  		    sp->role.glevels == PT32_ROOT_LEVEL) {
>  			++vcpu->kvm->stat.mmu_pde_zapped;
>  			return;
> @@ -2469,7 +2477,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>  	u64 gpte = 0;
>  	pfn_t pfn;
>  
> -	vcpu->arch.update_pte.largepage = 0;
> +	vcpu->arch.update_pte.level = PT_PAGE_TABLE_LEVEL;
>  
>  	if (bytes != 4 && bytes != 8)
>  		return;
> @@ -2501,7 +2509,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>  	if (is_large_pte(gpte) &&
>  	    (mapping_level(vcpu, gfn) == PT_DIRECTORY_LEVEL)) {
>  		gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
> -		vcpu->arch.update_pte.largepage = 1;
> +		vcpu->arch.update_pte.level = PT_DIRECTORY_LEVEL;
>  	}
>  	vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 25a4437..8fbf4e7 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -248,7 +248,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
>  	pt_element_t gpte;
>  	unsigned pte_access;
>  	pfn_t pfn;
> -	int largepage = vcpu->arch.update_pte.largepage;
> +	int level = vcpu->arch.update_pte.level;
>  
>  	gpte = *(const pt_element_t *)pte;
>  	if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
> @@ -267,7 +267,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
>  		return;
>  	kvm_get_pfn(pfn);
>  	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
> -		     gpte & PT_DIRTY_MASK, NULL, largepage,
> +		     gpte & PT_DIRTY_MASK, NULL, level,
>  		     gpte_to_gfn(gpte), pfn, true);

It would be better to just turn off updates to large sptes via the
update_pte path: nuke them and let the page fault path handle
them.
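
Roughly, early in FNAME(update_pte), instead of forwarding the level (sketch
only):

	/*
	 * sketch: don't prefill huge sptes from the pte-write path at all;
	 * leaving the spte nonpresent lets the page fault path rebuild the
	 * mapping at the right level
	 */
	if (level > PT_PAGE_TABLE_LEVEL)
		return;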

>  }
>  
> @@ -301,7 +301,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  				     gw->pte_access & access,
>  				     user_fault, write_fault,
>  				     gw->ptes[gw->level-1] & PT_DIRTY_MASK,
> -				     ptwrite, largepage,
> +				     ptwrite, level,
>  				     gw->gfn, pfn, false);
>  			break;
>  		}
> -- 
> 1.6.3.1
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/8] kvm: change memslot data structures for multiple hugepage sizes
  2009-06-19 13:16 ` [PATCH 2/8] kvm: change memslot data structures for multiple hugepage sizes Joerg Roedel
@ 2009-06-23 16:49   ` Marcelo Tosatti
  0 siblings, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2009-06-23 16:49 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Marcelo Tosatti, kvm, linux-kernel

On Fri, Jun 19, 2009 at 03:16:23PM +0200, Joerg Roedel wrote:
>  /*
> @@ -724,11 +724,11 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  		end = start + (memslot->npages << PAGE_SHIFT);
>  		if (hva >= start && hva < end) {
>  			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
> +			int idx = gfn_offset /
> +			          KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL);
>  			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
>  			retval |= handler(kvm,
> -					  &memslot->lpage_info[
> -						  gfn_offset /
> -						  KVM_PAGES_PER_HPAGE].rmap_pde);
> +					&memslot->lpage_info[0][idx].rmap_pde);
>  		}

I can't find where you update this function to reflect 1GB pages in the
series?
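
For comparison, a sketch of what the loop body might look like once it covers
every hugepage level; it reuses the KVM_NR_PAGE_SIZES/lpage_info layout from
this patch and is not something the posted series contains:

	if (hva >= start && hva < end) {
		gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
		int j;

		retval |= handler(kvm, &memslot->rmap[gfn_offset]);

		for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
			int idx = gfn_offset /
				  KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);

			retval |= handler(kvm,
				&memslot->lpage_info[j][idx].rmap_pde);
		}
	}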


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level
  2009-06-23 15:59   ` Marcelo Tosatti
@ 2009-06-23 17:00     ` Joerg Roedel
  0 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2009-06-23 17:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm, linux-kernel

On Tue, Jun 23, 2009 at 12:59:33PM -0300, Marcelo Tosatti wrote:
> Hi Joerg,
> 
> On Fri, Jun 19, 2009 at 03:16:24PM +0200, Joerg Roedel wrote:
> >  	gfn = unalias_gfn(kvm, gfn);
> > -	write_count = slot_largepage_idx(gfn,
> > -					 gfn_to_memslot_unaliased(kvm, gfn));
> > -	*write_count += 1;
> > +
> > +	for (i = PT_DIRECTORY_LEVEL;
> > +	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> > +		slot          = gfn_to_memslot_unaliased(kvm, gfn);
> 
> Can't you move this call out of the loop?

True. Will do this.

> > @@ -1704,7 +1739,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >  	if ((pte_access & ACC_WRITE_MASK)
> >  	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
> >  
> > -		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
> > +		if (largepage && has_wrprotected_page(vcpu->kvm, gfn, 1)) {
> 
> It seems direct_map is missing the large pte overwrite check that
> fetch() contains:
> 
>                 if (is_large_pte(*sptep)) {
>                         rmap_remove(vcpu->kvm, sptep);
>                         __set_spte(sptep, shadow_trap_nonpresent_pte);
>                         kvm_flush_remote_tlbs(vcpu->kvm);
>                 }
> 
> (perhaps it's not a possible scenario at the moment, but...).


This function is only called from mmu_set_spte, which takes care of this.

Thanks,

	Joerg

-- 
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    | 
 Research  | Managing directors: Thomas M. McCoy, Giuliano Meroni
 Center    | Registered office: Dornach, municipality of Aschheim, district of München
           | Register court München, HRB no. 43632


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/8] kvm/mmu: make direct mapping paths aware of mapping levels
  2009-06-23 16:47   ` Marcelo Tosatti
@ 2009-06-23 17:10     ` Joerg Roedel
  0 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2009-06-23 17:10 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Marcelo Tosatti, kvm, linux-kernel

On Tue, Jun 23, 2009 at 01:47:28PM -0300, Marcelo Tosatti wrote:
> On Fri, Jun 19, 2009 at 03:16:26PM +0200, Joerg Roedel wrote:
> > @@ -254,7 +254,7 @@ static int is_last_spte(u64 pte, int level)
> >  {
> >  	if (level == PT_PAGE_TABLE_LEVEL)
> >  		return 1;
> > -	if (level == PT_DIRECTORY_LEVEL && is_large_pte(pte))
> > +	if (is_large_pte(pte))
> >  		return 1;
> 
> Wouldn't it be safer to check for bit 7 only on the levels where
> we're sure it means a large page?

If we are not on PT_PAGE_TABLE_LEVEL and this bit does not mean "large page",
then bit 7 is MBZ and the hardware would fault. So it should be safe to just
check for bit 7 here.


> >  	kvm_get_pfn(pfn);
> >  	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
> > -		     gpte & PT_DIRTY_MASK, NULL, largepage,
> > +		     gpte & PT_DIRTY_MASK, NULL, level,
> >  		     gpte_to_gfn(gpte), pfn, true);
> 
> It would be better to just turn off updates to large sptes via the
> update_pte path: nuke them and let the page fault path handle
> them.

Yeah, true. That's in the shadow paging patch. Maybe I can get it into a state
to be postable again ;-)

Thanks,

	Joerg

-- 
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    | 
 Research  | Managing directors: Thomas M. McCoy, Giuliano Meroni
 Center    | Registered office: Dornach, municipality of Aschheim, district of München
           | Register court München, HRB no. 43632


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8 v3] KVM support for 1GB pages
  2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
                   ` (8 preceding siblings ...)
  2009-06-20 11:03 ` [PATCH 0/8 v3] KVM support for 1GB pages Avi Kivity
@ 2009-06-24  8:43 ` Avi Kivity
  2009-06-29 13:17   ` Joerg Roedel
  9 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2009-06-24  8:43 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Marcelo Tosatti, kvm, linux-kernel

On 06/19/2009 04:16 PM, Joerg Roedel wrote:
> Hi,
>
> this is the third version of the patches for KVM to support 1GB pages. Changes
> to the last version include:
>
> 	- changed reporting of 1GB page support to userspace to use
> 	  SUPPORTED_CPUID ioctl
> 	- added support for 1GB pages to the software page walker code too
>
> This code still only can make use of 1GB pages with nested paging enabled. I
> will give the shadow paging code another debug round soon. Please comment or
> consider to apply these patches.
>
> Thanks,
>
> 	Joerg
>
> Shortlog:
>
> Joerg Roedel (8):
>        hugetlbfs: export vma_kernel_pagsize to modules
>        kvm: change memslot data structures for multiple hugepage sizes
>    

At Marcelo's suggestion I've applied the first two patches, to reduce 
your rebase load.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8 v3] KVM support for 1GB pages
  2009-06-24  8:43 ` Avi Kivity
@ 2009-06-29 13:17   ` Joerg Roedel
  0 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2009-06-29 13:17 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, linux-kernel

On Wed, Jun 24, 2009 at 11:43:00AM +0300, Avi Kivity wrote:
> On 06/19/2009 04:16 PM, Joerg Roedel wrote:
>> Hi,
>>
>> this is the third version of the patches for KVM to support 1GB pages. Changes
>> to the last version include:
>>
>> 	- changed reporting of 1GB page support to userspace to use
>> 	  SUPPORTED_CPUID ioctl
>> 	- added support for 1GB pages to the software page walker code too
>>
>> This code still only can make use of 1GB pages with nested paging enabled. I
>> will give the shadow paging code another debug round soon. Please comment or
>> consider to apply these patches.
>>
>> Thanks,
>>
>> 	Joerg
>>
>> Shortlog:
>>
>> Joerg Roedel (8):
>>        hugetlbfs: export vma_kernel_pagsize to modules
>>        kvm: change memslot data structures for multiple hugepage sizes
>>    
>
> At Marcelo's suggestion I've applied the first two patches, to reduce  your
> rebase load.

Thank you. I will soon fix your objections to the existing patches and resend
them. After that I will continue looking at gbpages with shadow paging again.

	Joerg

-- 
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    | 
 Research  | Managing directors: Thomas M. McCoy, Giuliano Meroni
 Center    | Registered office: Dornach, municipality of Aschheim, district of München
           | Register court München, HRB no. 43632


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-06-29 13:18 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-19 13:16 [PATCH 0/8 v3] KVM support for 1GB pages Joerg Roedel
2009-06-19 13:16 ` [PATCH 1/8] hugetlbfs: export vma_kernel_pagsize to modules Joerg Roedel
2009-06-19 13:16 ` [PATCH 2/8] kvm: change memslot data structures for multiple hugepage sizes Joerg Roedel
2009-06-23 16:49   ` Marcelo Tosatti
2009-06-19 13:16 ` [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level Joerg Roedel
2009-06-23 15:59   ` Marcelo Tosatti
2009-06-23 17:00     ` Joerg Roedel
2009-06-19 13:16 ` [PATCH 4/8] kvm/mmu: make rmap code aware of mapping levels Joerg Roedel
2009-06-19 13:16 ` [PATCH 5/8] kvm/mmu: make direct mapping paths " Joerg Roedel
2009-06-23 16:47   ` Marcelo Tosatti
2009-06-23 17:10     ` Joerg Roedel
2009-06-19 13:16 ` [PATCH 6/8] kvm/mmu: add support for another level to page walker Joerg Roedel
2009-06-20 11:19   ` Avi Kivity
2009-06-22  9:38     ` Joerg Roedel
2009-06-19 13:16 ` [PATCH 7/8] kvm/mmu: enable gbpages by increasing nr of pagesizes Joerg Roedel
2009-06-19 13:16 ` [PATCH 8/8] kvm x86: report 1GB page support to userspace Joerg Roedel
2009-06-20 11:03 ` [PATCH 0/8 v3] KVM support for 1GB pages Avi Kivity
2009-06-22  9:40   ` Joerg Roedel
2009-06-22  9:43     ` Avi Kivity
2009-06-22  9:49       ` Joerg Roedel
2009-06-22  9:55         ` Avi Kivity
2009-06-24  8:43 ` Avi Kivity
2009-06-29 13:17   ` Joerg Roedel
