* [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test
@ 2022-05-17 19:05 David Matlack
  2022-05-17 19:05 ` [PATCH v2 01/10] KVM: selftests: Replace x86_page_size with PG_LEVEL_XX David Matlack
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

This series adds support for taking any perf_test_util-based test and
configuring it to run vCPUs in L2 instead of L1, and adds an option to
dirty_log_perf_test to enable it.

This series was used to collect the performance data for eager page
splitting for nested MMUs [1].

[1] https://lore.kernel.org/kvm/20220422210546.458943-1-dmatlack@google.com/

v2:
 - Collect R-b tags from Peter.
 - Use level macros instead of raw numbers [Peter]
 - Remove "upper" from function name [Peter]
 - Bring back setting the A/D bits on EPT PTEs [Peter]
 - Drop "all" rule from Makefile [Peter]
 - Reserve memory for EPT pages [Peter]
 - Fix off-by-one error in nested_map_all_1g() [me]

v1: https://lore.kernel.org/kvm/20220429183935.1094599-1-dmatlack@google.com/

David Matlack (10):
  KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
  KVM: selftests: Add option to create 2M and 1G EPT mappings
  KVM: selftests: Drop stale function parameter comment for nested_map()
  KVM: selftests: Refactor nested_map() to specify target level
  KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  KVM: selftests: Add a helper to check EPT/VPID capabilities
  KVM: selftests: Link selftests directly with lib object files
  KVM: selftests: Drop unnecessary rule for $(LIBKVM_OBJS)
  KVM: selftests: Clean up LIBKVM files in Makefile
  KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2

 tools/testing/selftests/kvm/Makefile          |  49 ++++--
 .../selftests/kvm/dirty_log_perf_test.c       |  10 +-
 .../selftests/kvm/include/perf_test_util.h    |   7 +
 .../selftests/kvm/include/x86_64/processor.h  |  21 +--
 .../selftests/kvm/include/x86_64/vmx.h        |   5 +
 .../selftests/kvm/lib/perf_test_util.c        |  29 +++-
 .../selftests/kvm/lib/x86_64/perf_test_util.c |  98 ++++++++++++
 .../selftests/kvm/lib/x86_64/processor.c      |  33 ++--
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  | 147 +++++++++++-------
 .../selftests/kvm/max_guest_memory_test.c     |   2 +-
 .../selftests/kvm/x86_64/mmu_role_test.c      |   2 +-
 11 files changed, 300 insertions(+), 103 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c


base-commit: a3808d88461270c71d3fece5e51cc486ecdac7d0
-- 
2.36.0.550.gb090851708-goog


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 01/10] KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 20:26   ` Peter Xu
  2022-05-17 19:05 ` [PATCH v2 02/10] KVM: selftests: Add option to create 2M and 1G EPT mappings David Matlack
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

x86_page_size is an enum used to communicate the desired page size with
which to map a range of memory. Under the hood it just encodes the
desired level at which to map the page. This ends up being clunky in a
few ways:

 - The name suggests it encodes the size of the page rather than the
   level.
 - In other places in x86_64/processor.c we just use a raw int to encode
   the level.

Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass
around raw ints when referring to the level. This makes the code easier
to understand since these macros are very common in KVM MMU code.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 .../selftests/kvm/include/x86_64/processor.h  | 18 ++++++----
 .../selftests/kvm/lib/x86_64/processor.c      | 33 ++++++++++---------
 .../selftests/kvm/max_guest_memory_test.c     |  2 +-
 .../selftests/kvm/x86_64/mmu_role_test.c      |  2 +-
 4 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 37db341d4cc5..434a4f60f4d9 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -465,13 +465,19 @@ void vcpu_set_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
 struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
 void vm_xsave_req_perm(int bit);
 
-enum x86_page_size {
-	X86_PAGE_SIZE_4K = 0,
-	X86_PAGE_SIZE_2M,
-	X86_PAGE_SIZE_1G,
+enum pg_level {
+	PG_LEVEL_NONE,
+	PG_LEVEL_4K,
+	PG_LEVEL_2M,
+	PG_LEVEL_1G,
+	PG_LEVEL_512G,
+	PG_LEVEL_NUM
 };
-void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
-		   enum x86_page_size page_size);
+
+#define PG_LEVEL_SHIFT(_level) ((_level - 1) * 9 + 12)
+#define PG_LEVEL_SIZE(_level) (1ull << PG_LEVEL_SHIFT(_level))
+
+void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level);
 
 /*
  * Basic CPU control in CR0
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 9f000dfb5594..f733c5b02da5 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -190,7 +190,7 @@ static void *virt_get_pte(struct kvm_vm *vm, uint64_t pt_pfn, uint64_t vaddr,
 			  int level)
 {
 	uint64_t *page_table = addr_gpa2hva(vm, pt_pfn << vm->page_shift);
-	int index = vaddr >> (vm->page_shift + level * 9) & 0x1ffu;
+	int index = (vaddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
 
 	return &page_table[index];
 }
@@ -199,15 +199,15 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
 						    uint64_t pt_pfn,
 						    uint64_t vaddr,
 						    uint64_t paddr,
-						    int level,
-						    enum x86_page_size page_size)
+						    int current_level,
+						    int target_level)
 {
-	struct pageUpperEntry *pte = virt_get_pte(vm, pt_pfn, vaddr, level);
+	struct pageUpperEntry *pte = virt_get_pte(vm, pt_pfn, vaddr, current_level);
 
 	if (!pte->present) {
 		pte->writable = true;
 		pte->present = true;
-		pte->page_size = (level == page_size);
+		pte->page_size = (current_level == target_level);
 		if (pte->page_size)
 			pte->pfn = paddr >> vm->page_shift;
 		else
@@ -218,20 +218,19 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
 		 * a hugepage at this level, and that there isn't a hugepage at
 		 * this level.
 		 */
-		TEST_ASSERT(level != page_size,
+		TEST_ASSERT(current_level != target_level,
 			    "Cannot create hugepage at level: %u, vaddr: 0x%lx\n",
-			    page_size, vaddr);
+			    current_level, vaddr);
 		TEST_ASSERT(!pte->page_size,
 			    "Cannot create page table at level: %u, vaddr: 0x%lx\n",
-			    level, vaddr);
+			    current_level, vaddr);
 	}
 	return pte;
 }
 
-void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
-		   enum x86_page_size page_size)
+void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
 {
-	const uint64_t pg_size = 1ull << ((page_size * 9) + 12);
+	const uint64_t pg_size = PG_LEVEL_SIZE(level);
 	struct pageUpperEntry *pml4e, *pdpe, *pde;
 	struct pageTableEntry *pte;
 
@@ -256,20 +255,22 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 	 * early if a hugepage was created.
 	 */
 	pml4e = virt_create_upper_pte(vm, vm->pgd >> vm->page_shift,
-				      vaddr, paddr, 3, page_size);
+				      vaddr, paddr, PG_LEVEL_512G, level);
 	if (pml4e->page_size)
 		return;
 
-	pdpe = virt_create_upper_pte(vm, pml4e->pfn, vaddr, paddr, 2, page_size);
+	pdpe = virt_create_upper_pte(vm, pml4e->pfn, vaddr, paddr, PG_LEVEL_1G,
+				     level);
 	if (pdpe->page_size)
 		return;
 
-	pde = virt_create_upper_pte(vm, pdpe->pfn, vaddr, paddr, 1, page_size);
+	pde = virt_create_upper_pte(vm, pdpe->pfn, vaddr, paddr, PG_LEVEL_2M,
+				    level);
 	if (pde->page_size)
 		return;
 
 	/* Fill in page table entry. */
-	pte = virt_get_pte(vm, pde->pfn, vaddr, 0);
+	pte = virt_get_pte(vm, pde->pfn, vaddr, PG_LEVEL_4K);
 	TEST_ASSERT(!pte->present,
 		    "PTE already present for 4k page at vaddr: 0x%lx\n", vaddr);
 	pte->pfn = paddr >> vm->page_shift;
@@ -279,7 +280,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 
 void virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
 {
-	__virt_pg_map(vm, vaddr, paddr, X86_PAGE_SIZE_4K);
+	__virt_pg_map(vm, vaddr, paddr, PG_LEVEL_4K);
 }
 
 static struct pageTableEntry *_vm_get_page_table_entry(struct kvm_vm *vm, int vcpuid,
diff --git a/tools/testing/selftests/kvm/max_guest_memory_test.c b/tools/testing/selftests/kvm/max_guest_memory_test.c
index 3875c4b23a04..15f046e19cb2 100644
--- a/tools/testing/selftests/kvm/max_guest_memory_test.c
+++ b/tools/testing/selftests/kvm/max_guest_memory_test.c
@@ -244,7 +244,7 @@ int main(int argc, char *argv[])
 #ifdef __x86_64__
 		/* Identity map memory in the guest using 1gb pages. */
 		for (i = 0; i < slot_size; i += size_1gb)
-			__virt_pg_map(vm, gpa + i, gpa + i, X86_PAGE_SIZE_1G);
+			__virt_pg_map(vm, gpa + i, gpa + i, PG_LEVEL_1G);
 #else
 		for (i = 0; i < slot_size; i += vm_get_page_size(vm))
 			virt_pg_map(vm, gpa + i, gpa + i);
diff --git a/tools/testing/selftests/kvm/x86_64/mmu_role_test.c b/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
index da2325fcad87..bdecd532f935 100644
--- a/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
+++ b/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
@@ -35,7 +35,7 @@ static void mmu_role_test(u32 *cpuid_reg, u32 evil_cpuid_val)
 	run = vcpu_state(vm, VCPU_ID);
 
 	/* Map 1gb page without a backing memlot. */
-	__virt_pg_map(vm, MMIO_GPA, MMIO_GPA, X86_PAGE_SIZE_1G);
+	__virt_pg_map(vm, MMIO_GPA, MMIO_GPA, PG_LEVEL_1G);
 
 	r = _vcpu_run(vm, VCPU_ID);
 
-- 
2.36.0.550.gb090851708-goog



* [PATCH v2 02/10] KVM: selftests: Add option to create 2M and 1G EPT mappings
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
  2022-05-17 19:05 ` [PATCH v2 01/10] KVM: selftests: Replace x86_page_size with PG_LEVEL_XX David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 20:27   ` Peter Xu
  2022-05-17 19:05 ` [PATCH v2 03/10] KVM: selftests: Drop stale function parameter comment for nested_map() David Matlack
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

The current EPT mapping code in the selftests only supports mapping 4K
pages. This commit extends that support with an option to map at 2M or
1G. This will be used in a future commit to create large page mappings
to test eager page splitting.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 110 ++++++++++---------
 1 file changed, 60 insertions(+), 50 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index d089d8b850b5..fdc1e6deb922 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -392,80 +392,90 @@ void nested_vmx_check_supported(void)
 	}
 }
 
-void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
-		   uint64_t nested_paddr, uint64_t paddr)
+static void nested_create_pte(struct kvm_vm *vm,
+			      struct eptPageTableEntry *pte,
+			      uint64_t nested_paddr,
+			      uint64_t paddr,
+			      int current_level,
+			      int target_level)
+{
+	if (!pte->readable) {
+		pte->writable = true;
+		pte->readable = true;
+		pte->executable = true;
+		pte->page_size = (current_level == target_level);
+		if (pte->page_size)
+			pte->address = paddr >> vm->page_shift;
+		else
+			pte->address = vm_alloc_page_table(vm) >> vm->page_shift;
+	} else {
+		/*
+		 * Entry already present.  Assert that the caller doesn't want
+		 * a hugepage at this level, and that there isn't a hugepage at
+		 * this level.
+		 */
+		TEST_ASSERT(current_level != target_level,
+			    "Cannot create hugepage at level: %u, nested_paddr: 0x%lx\n",
+			    current_level, nested_paddr);
+		TEST_ASSERT(!pte->page_size,
+			    "Cannot create page table at level: %u, nested_paddr: 0x%lx\n",
+			    current_level, nested_paddr);
+	}
+}
+
+
+void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+		     uint64_t nested_paddr, uint64_t paddr, int target_level)
 {
-	uint16_t index[4];
-	struct eptPageTableEntry *pml4e;
+	const uint64_t page_size = PG_LEVEL_SIZE(target_level);
+	struct eptPageTableEntry *pt = vmx->eptp_hva, *pte;
+	uint16_t index;
 
 	TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
 		    "unknown or unsupported guest mode, mode: 0x%x", vm->mode);
 
-	TEST_ASSERT((nested_paddr % vm->page_size) == 0,
+	TEST_ASSERT((nested_paddr % page_size) == 0,
 		    "Nested physical address not on page boundary,\n"
-		    "  nested_paddr: 0x%lx vm->page_size: 0x%x",
-		    nested_paddr, vm->page_size);
+		    "  nested_paddr: 0x%lx page_size: 0x%lx",
+		    nested_paddr, page_size);
 	TEST_ASSERT((nested_paddr >> vm->page_shift) <= vm->max_gfn,
 		    "Physical address beyond beyond maximum supported,\n"
 		    "  nested_paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
 		    paddr, vm->max_gfn, vm->page_size);
-	TEST_ASSERT((paddr % vm->page_size) == 0,
+	TEST_ASSERT((paddr % page_size) == 0,
 		    "Physical address not on page boundary,\n"
-		    "  paddr: 0x%lx vm->page_size: 0x%x",
-		    paddr, vm->page_size);
+		    "  paddr: 0x%lx page_size: 0x%lx",
+		    paddr, page_size);
 	TEST_ASSERT((paddr >> vm->page_shift) <= vm->max_gfn,
 		    "Physical address beyond beyond maximum supported,\n"
 		    "  paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
 		    paddr, vm->max_gfn, vm->page_size);
 
-	index[0] = (nested_paddr >> 12) & 0x1ffu;
-	index[1] = (nested_paddr >> 21) & 0x1ffu;
-	index[2] = (nested_paddr >> 30) & 0x1ffu;
-	index[3] = (nested_paddr >> 39) & 0x1ffu;
-
-	/* Allocate page directory pointer table if not present. */
-	pml4e = vmx->eptp_hva;
-	if (!pml4e[index[3]].readable) {
-		pml4e[index[3]].address = vm_alloc_page_table(vm) >> vm->page_shift;
-		pml4e[index[3]].writable = true;
-		pml4e[index[3]].readable = true;
-		pml4e[index[3]].executable = true;
-	}
+	for (int level = PG_LEVEL_512G; level >= PG_LEVEL_4K; level--) {
+		index = (nested_paddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
+		pte = &pt[index];
 
-	/* Allocate page directory table if not present. */
-	struct eptPageTableEntry *pdpe;
-	pdpe = addr_gpa2hva(vm, pml4e[index[3]].address * vm->page_size);
-	if (!pdpe[index[2]].readable) {
-		pdpe[index[2]].address = vm_alloc_page_table(vm) >> vm->page_shift;
-		pdpe[index[2]].writable = true;
-		pdpe[index[2]].readable = true;
-		pdpe[index[2]].executable = true;
-	}
+		nested_create_pte(vm, pte, nested_paddr, paddr, level, target_level);
 
-	/* Allocate page table if not present. */
-	struct eptPageTableEntry *pde;
-	pde = addr_gpa2hva(vm, pdpe[index[2]].address * vm->page_size);
-	if (!pde[index[1]].readable) {
-		pde[index[1]].address = vm_alloc_page_table(vm) >> vm->page_shift;
-		pde[index[1]].writable = true;
-		pde[index[1]].readable = true;
-		pde[index[1]].executable = true;
-	}
+		if (pte->page_size)
+			break;
 
-	/* Fill in page table entry. */
-	struct eptPageTableEntry *pte;
-	pte = addr_gpa2hva(vm, pde[index[1]].address * vm->page_size);
-	pte[index[0]].address = paddr >> vm->page_shift;
-	pte[index[0]].writable = true;
-	pte[index[0]].readable = true;
-	pte[index[0]].executable = true;
+		pt = addr_gpa2hva(vm, pte->address * vm->page_size);
+	}
 
 	/*
 	 * For now mark these as accessed and dirty because the only
 	 * testcase we have needs that.  Can be reconsidered later.
 	 */
-	pte[index[0]].accessed = true;
-	pte[index[0]].dirty = true;
+	pte->accessed = true;
+	pte->dirty = true;
+
+}
+
+void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+		   uint64_t nested_paddr, uint64_t paddr)
+{
+	__nested_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
 }
 
 /*
-- 
2.36.0.550.gb090851708-goog



* [PATCH v2 03/10] KVM: selftests: Drop stale function parameter comment for nested_map()
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
  2022-05-17 19:05 ` [PATCH v2 01/10] KVM: selftests: Replace x86_page_size with PG_LEVEL_XX David Matlack
  2022-05-17 19:05 ` [PATCH v2 02/10] KVM: selftests: Add option to create 2M and 1G EPT mappings David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 19:05 ` [PATCH v2 04/10] KVM: selftests: Refactor nested_map() to specify target level David Matlack
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

nested_map() does not take a parameter named eptp_memslot. Drop the
comment referring to it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index fdc1e6deb922..baeaa35de113 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -486,7 +486,6 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
  *   nested_paddr - Nested guest physical address to map
  *   paddr - VM Physical Address
  *   size - The size of the range to map
- *   eptp_memslot - Memory region slot for new virtual translation tables
  *
  * Output Args: None
  *
-- 
2.36.0.550.gb090851708-goog



* [PATCH v2 04/10] KVM: selftests: Refactor nested_map() to specify target level
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (2 preceding siblings ...)
  2022-05-17 19:05 ` [PATCH v2 03/10] KVM: selftests: Drop stale function parameter comment for nested_map() David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 19:05 ` [PATCH v2 05/10] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h David Matlack
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

Refactor nested_map() to specify that it explicitly wants 4K mappings
(the existing behavior) and push the implementation down into
__nested_map(), which can be used in subsequent commits to create huge
page mappings.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index baeaa35de113..b8cfe4914a3a 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -486,6 +486,7 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
  *   nested_paddr - Nested guest physical address to map
  *   paddr - VM Physical Address
  *   size - The size of the range to map
+ *   level - The level at which to map the range
  *
  * Output Args: None
  *
@@ -494,22 +495,29 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
  * Within the VM given by vm, creates a nested guest translation for the
  * page range starting at nested_paddr to the page range starting at paddr.
  */
-void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
-		uint64_t nested_paddr, uint64_t paddr, uint64_t size)
+void __nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+		  uint64_t nested_paddr, uint64_t paddr, uint64_t size,
+		  int level)
 {
-	size_t page_size = vm->page_size;
+	size_t page_size = PG_LEVEL_SIZE(level);
 	size_t npages = size / page_size;
 
 	TEST_ASSERT(nested_paddr + size > nested_paddr, "Vaddr overflow");
 	TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
 
 	while (npages--) {
-		nested_pg_map(vmx, vm, nested_paddr, paddr);
+		__nested_pg_map(vmx, vm, nested_paddr, paddr, level);
 		nested_paddr += page_size;
 		paddr += page_size;
 	}
 }
 
+void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+		uint64_t nested_paddr, uint64_t paddr, uint64_t size)
+{
+	__nested_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
+}
+
 /* Prepare an identity extended page table that maps all the
  * physical pages in VM.
  */
-- 
2.36.0.550.gb090851708-goog



* [PATCH v2 05/10] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (3 preceding siblings ...)
  2022-05-17 19:05 ` [PATCH v2 04/10] KVM: selftests: Refactor nested_map() to specify target level David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 19:05 ` [PATCH v2 06/10] KVM: selftests: Add a helper to check EPT/VPID capabilities David Matlack
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

This is a VMX-related macro so move it to vmx.h. While here, open code
the mask like the rest of the VMX bitmask macros.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/include/x86_64/processor.h | 3 ---
 tools/testing/selftests/kvm/include/x86_64/vmx.h       | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 434a4f60f4d9..04f1d540bcb2 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -494,9 +494,6 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
 #define X86_CR0_CD          (1UL<<30) /* Cache Disable */
 #define X86_CR0_PG          (1UL<<31) /* Paging */
 
-/* VMX_EPT_VPID_CAP bits */
-#define VMX_EPT_VPID_CAP_AD_BITS       (1ULL << 21)
-
 #define XSTATE_XTILE_CFG_BIT		17
 #define XSTATE_XTILE_DATA_BIT		18
 
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 583ceb0d1457..3b1794baa97c 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -96,6 +96,8 @@
 #define VMX_MISC_PREEMPTION_TIMER_RATE_MASK	0x0000001f
 #define VMX_MISC_SAVE_EFER_LMA			0x00000020
 
+#define VMX_EPT_VPID_CAP_AD_BITS		0x00200000
+
 #define EXIT_REASON_FAILED_VMENTRY	0x80000000
 #define EXIT_REASON_EXCEPTION_NMI	0
 #define EXIT_REASON_EXTERNAL_INTERRUPT	1
-- 
2.36.0.550.gb090851708-goog



* [PATCH v2 06/10] KVM: selftests: Add a helper to check EPT/VPID capabilities
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (4 preceding siblings ...)
  2022-05-17 19:05 ` [PATCH v2 05/10] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 19:05 ` [PATCH v2 07/10] KVM: selftests: Link selftests directly with lib object files David Matlack
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

Create a small helper function to check if a given EPT/VPID capability
is supported. This will be re-used in a follow-up commit to check for 1G
page support.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index b8cfe4914a3a..5bf169179455 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -198,6 +198,11 @@ bool load_vmcs(struct vmx_pages *vmx)
 	return true;
 }
 
+static bool ept_vpid_cap_supported(uint64_t mask)
+{
+	return rdmsr(MSR_IA32_VMX_EPT_VPID_CAP) & mask;
+}
+
 /*
  * Initialize the control fields to the most basic settings possible.
  */
@@ -215,7 +220,7 @@ static inline void init_vmcs_control_fields(struct vmx_pages *vmx)
 		struct eptPageTablePointer eptp = {
 			.memory_type = VMX_BASIC_MEM_TYPE_WB,
 			.page_walk_length = 3, /* + 1 */
-			.ad_enabled = !!(rdmsr(MSR_IA32_VMX_EPT_VPID_CAP) & VMX_EPT_VPID_CAP_AD_BITS),
+			.ad_enabled = ept_vpid_cap_supported(VMX_EPT_VPID_CAP_AD_BITS),
 			.address = vmx->eptp_gpa >> PAGE_SHIFT_4K,
 		};
 
-- 
2.36.0.550.gb090851708-goog



* [PATCH v2 07/10] KVM: selftests: Link selftests directly with lib object files
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (5 preceding siblings ...)
  2022-05-17 19:05 ` [PATCH v2 06/10] KVM: selftests: Add a helper to check EPT/VPID capabilities David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 19:05 ` [PATCH v2 08/10] KVM: selftests: Drop unnecessary rule for $(LIBKVM_OBJS) David Matlack
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

The linker does not fully obey strong/weak symbols when linking static
libraries: it simply resolves an undefined symbol to the
first-encountered definition. This means that defining __weak
arch-generic functions and then defining arch-specific strong functions
to override them in libkvm will not always work.

More specifically, if we have:

lib/generic.c:

  void __weak foo(void)
  {
          pr_info("weak\n");
  }

  void bar(void)
  {
          foo();
  }

lib/x86_64/arch.c:

  void foo(void)
  {
          pr_info("strong\n");
  }

A selftest that calls bar() will print "weak". Now if you make
generic.o explicitly depend on arch.o (e.g. add a function to arch.c that
is called directly from generic.c) it will print "strong". In other
words, it seems that the linker is free to throw out arch.o when linking
because generic.o does not explicitly depend on it, which causes the
linker to lose the strong symbol.

One solution is to link libkvm.a with --whole-archive so that the linker
doesn't throw away object files it thinks are unnecessary. However that
is a bit difficult to plumb since we are using the common selftests
makefile rules. An easier solution is to drop libkvm.a just link
selftests with all the .o files that were originally in libkvm.a.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/Makefile | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 8c3db2f75315..cd7a9df4ad6d 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -173,12 +173,13 @@ LDFLAGS += -pthread $(no-pie-option) $(pgste-option)
 # $(TEST_GEN_PROGS) starts with $(OUTPUT)/
 include ../lib.mk
 
-STATIC_LIBS := $(OUTPUT)/libkvm.a
 LIBKVM_C := $(filter %.c,$(LIBKVM))
 LIBKVM_S := $(filter %.S,$(LIBKVM))
 LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
 LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
-EXTRA_CLEAN += $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(STATIC_LIBS) cscope.*
+LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)
+
+EXTRA_CLEAN += $(LIBKVM_OBJS) cscope.*
 
 x := $(shell mkdir -p $(sort $(dir $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ))))
 $(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c
@@ -187,13 +188,9 @@ $(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c
 $(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
-LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)
-$(OUTPUT)/libkvm.a: $(LIBKVM_OBJS)
-	$(AR) crs $@ $^
-
 x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
-all: $(STATIC_LIBS)
-$(TEST_GEN_PROGS): $(STATIC_LIBS)
+all: $(LIBKVM_OBJS)
+$(TEST_GEN_PROGS): $(LIBKVM_OBJS)
 
 cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
 cscope:
-- 
2.36.0.550.gb090851708-goog



* [PATCH v2 08/10] KVM: selftests: Drop unnecessary rule for $(LIBKVM_OBJS)
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (6 preceding siblings ...)
  2022-05-17 19:05 ` [PATCH v2 07/10] KVM: selftests: Link selftests directly with lib object files David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 20:21   ` Peter Xu
  2022-05-17 19:05 ` [PATCH v2 09/10] KVM: selftests: Clean up LIBKVM files in Makefile David Matlack
  2022-05-17 19:05 ` [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 David Matlack
  9 siblings, 1 reply; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

Drop the "all: $(LIBKVM_OBJS)" rule. The KVM selftests already depend
on $(LIBKVM_OBJS), so there is no reason to have this rule.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index cd7a9df4ad6d..0889fc17baa5 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -189,7 +189,6 @@ $(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
 x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
-all: $(LIBKVM_OBJS)
 $(TEST_GEN_PROGS): $(LIBKVM_OBJS)
 
 cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
-- 
2.36.0.550.gb090851708-goog



* [PATCH v2 09/10] KVM: selftests: Clean up LIBKVM files in Makefile
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (7 preceding siblings ...)
  2022-05-17 19:05 ` [PATCH v2 08/10] KVM: selftests: Drop unnecessary rule for $(LIBKVM_OBJS) David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 19:05 ` [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 David Matlack
  9 siblings, 0 replies; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

Break up the long lines for LIBKVM and alphabetize each architecture.
This makes reading the Makefile easier, and will make reading diffs to
LIBKVM easier.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/Makefile | 36 ++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 0889fc17baa5..83b9ffa456ea 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -37,11 +37,37 @@ ifeq ($(ARCH),riscv)
 	UNAME_M := riscv
 endif
 
-LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/rbtree.c lib/sparsebit.c lib/test_util.c lib/guest_modes.c lib/perf_test_util.c
-LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
-LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handlers.S lib/aarch64/spinlock.c lib/aarch64/gic.c lib/aarch64/gic_v3.c lib/aarch64/vgic.c
-LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
-LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c
+LIBKVM += lib/assert.c
+LIBKVM += lib/elf.c
+LIBKVM += lib/guest_modes.c
+LIBKVM += lib/io.c
+LIBKVM += lib/kvm_util.c
+LIBKVM += lib/perf_test_util.c
+LIBKVM += lib/rbtree.c
+LIBKVM += lib/sparsebit.c
+LIBKVM += lib/test_util.c
+
+LIBKVM_x86_64 += lib/x86_64/apic.c
+LIBKVM_x86_64 += lib/x86_64/handlers.S
+LIBKVM_x86_64 += lib/x86_64/processor.c
+LIBKVM_x86_64 += lib/x86_64/svm.c
+LIBKVM_x86_64 += lib/x86_64/ucall.c
+LIBKVM_x86_64 += lib/x86_64/vmx.c
+
+LIBKVM_aarch64 += lib/aarch64/gic.c
+LIBKVM_aarch64 += lib/aarch64/gic_v3.c
+LIBKVM_aarch64 += lib/aarch64/handlers.S
+LIBKVM_aarch64 += lib/aarch64/processor.c
+LIBKVM_aarch64 += lib/aarch64/spinlock.c
+LIBKVM_aarch64 += lib/aarch64/ucall.c
+LIBKVM_aarch64 += lib/aarch64/vgic.c
+
+LIBKVM_s390x += lib/s390x/diag318_test_handler.c
+LIBKVM_s390x += lib/s390x/processor.c
+LIBKVM_s390x += lib/s390x/ucall.c
+
+LIBKVM_riscv += lib/riscv/processor.c
+LIBKVM_riscv += lib/riscv/ucall.c
 
 TEST_GEN_PROGS_x86_64 = x86_64/cpuid_test
 TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
-- 
2.36.0.550.gb090851708-goog



* [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (8 preceding siblings ...)
  2022-05-17 19:05 ` [PATCH v2 09/10] KVM: selftests: Clean up LIBKVM files in Makefile David Matlack
@ 2022-05-17 19:05 ` David Matlack
  2022-05-17 20:20   ` Peter Xu
  9 siblings, 1 reply; 22+ messages in thread
From: David Matlack @ 2022-05-17 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

Add an option to dirty_log_perf_test that configures the vCPUs to run in
L2 instead of L1. This makes it possible to benchmark the dirty logging
performance of nested virtualization, which is particularly interesting
because KVM must shadow L1's EPT/NPT tables.

For now this support only works on x86_64 CPUs with VMX. Otherwise
passing -n results in the test being skipped.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/Makefile          |  1 +
 .../selftests/kvm/dirty_log_perf_test.c       | 10 +-
 .../selftests/kvm/include/perf_test_util.h    |  7 ++
 .../selftests/kvm/include/x86_64/vmx.h        |  3 +
 .../selftests/kvm/lib/perf_test_util.c        | 29 +++++-
 .../selftests/kvm/lib/x86_64/perf_test_util.c | 98 +++++++++++++++++++
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  | 13 +++
 7 files changed, 154 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 83b9ffa456ea..42cb904f6e54 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -49,6 +49,7 @@ LIBKVM += lib/test_util.c
 
 LIBKVM_x86_64 += lib/x86_64/apic.c
 LIBKVM_x86_64 += lib/x86_64/handlers.S
+LIBKVM_x86_64 += lib/x86_64/perf_test_util.c
 LIBKVM_x86_64 += lib/x86_64/processor.c
 LIBKVM_x86_64 += lib/x86_64/svm.c
 LIBKVM_x86_64 += lib/x86_64/ucall.c
diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index 7b47ae4f952e..d60a34cdfaee 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -336,8 +336,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 static void help(char *name)
 {
 	puts("");
-	printf("usage: %s [-h] [-i iterations] [-p offset] [-g]"
-	       "[-m mode] [-b vcpu bytes] [-v vcpus] [-o] [-s mem type]"
+	printf("usage: %s [-h] [-i iterations] [-p offset] [-g] "
+	       "[-m mode] [-n] [-b vcpu bytes] [-v vcpus] [-o] [-s mem type]"
 	       "[-x memslots]\n", name);
 	puts("");
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
@@ -351,6 +351,7 @@ static void help(char *name)
 	printf(" -p: specify guest physical test memory offset\n"
 	       "     Warning: a low offset can conflict with the loaded test code.\n");
 	guest_modes_help();
+	printf(" -n: Run the vCPUs in nested mode (L2)\n");
 	printf(" -b: specify the size of the memory region which should be\n"
 	       "     dirtied by each vCPU. e.g. 10M or 3G.\n"
 	       "     (default: 1G)\n");
@@ -387,7 +388,7 @@ int main(int argc, char *argv[])
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "ghi:p:m:b:f:v:os:x:")) != -1) {
+	while ((opt = getopt(argc, argv, "ghi:p:m:nb:f:v:os:x:")) != -1) {
 		switch (opt) {
 		case 'g':
 			dirty_log_manual_caps = 0;
@@ -401,6 +402,9 @@ int main(int argc, char *argv[])
 		case 'm':
 			guest_modes_cmdline(optarg);
 			break;
+		case 'n':
+			perf_test_args.nested = true;
+			break;
 		case 'b':
 			guest_percpu_mem_size = parse_size(optarg);
 			break;
diff --git a/tools/testing/selftests/kvm/include/perf_test_util.h b/tools/testing/selftests/kvm/include/perf_test_util.h
index a86f953d8d36..b6c1770ab831 100644
--- a/tools/testing/selftests/kvm/include/perf_test_util.h
+++ b/tools/testing/selftests/kvm/include/perf_test_util.h
@@ -34,6 +34,9 @@ struct perf_test_args {
 	uint64_t guest_page_size;
 	int wr_fract;
 
+	/* Run vCPUs in L2 instead of L1, if the architecture supports it. */
+	bool nested;
+
 	struct perf_test_vcpu_args vcpu_args[KVM_MAX_VCPUS];
 };
 
@@ -49,5 +52,9 @@ void perf_test_set_wr_fract(struct kvm_vm *vm, int wr_fract);
 
 void perf_test_start_vcpu_threads(int vcpus, void (*vcpu_fn)(struct perf_test_vcpu_args *));
 void perf_test_join_vcpu_threads(int vcpus);
+void perf_test_guest_code(uint32_t vcpu_id);
+
+uint64_t perf_test_nested_pages(int nr_vcpus);
+void perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus);
 
 #endif /* SELFTEST_KVM_PERF_TEST_UTIL_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 3b1794baa97c..17d712503a36 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -96,6 +96,7 @@
 #define VMX_MISC_PREEMPTION_TIMER_RATE_MASK	0x0000001f
 #define VMX_MISC_SAVE_EFER_LMA			0x00000020
 
+#define VMX_EPT_VPID_CAP_1G_PAGES		0x00020000
 #define VMX_EPT_VPID_CAP_AD_BITS		0x00200000
 
 #define EXIT_REASON_FAILED_VMENTRY	0x80000000
@@ -608,6 +609,7 @@ bool load_vmcs(struct vmx_pages *vmx);
 
 bool nested_vmx_supported(void);
 void nested_vmx_check_supported(void);
+bool ept_1g_pages_supported(void);
 
 void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
 		   uint64_t nested_paddr, uint64_t paddr);
@@ -615,6 +617,7 @@ void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
 		 uint64_t nested_paddr, uint64_t paddr, uint64_t size);
 void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
 			uint32_t memslot);
+void nested_map_all_1g(struct vmx_pages *vmx, struct kvm_vm *vm);
 void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 		  uint32_t eptp_memslot);
 void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
index 722df3a28791..530be01706d5 100644
--- a/tools/testing/selftests/kvm/lib/perf_test_util.c
+++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
@@ -40,7 +40,7 @@ static bool all_vcpu_threads_running;
  * Continuously write to the first 8 bytes of each page in the
  * specified region.
  */
-static void guest_code(uint32_t vcpu_id)
+void perf_test_guest_code(uint32_t vcpu_id)
 {
 	struct perf_test_args *pta = &perf_test_args;
 	struct perf_test_vcpu_args *vcpu_args = &pta->vcpu_args[vcpu_id];
@@ -108,7 +108,7 @@ struct kvm_vm *perf_test_create_vm(enum vm_guest_mode mode, int vcpus,
 {
 	struct perf_test_args *pta = &perf_test_args;
 	struct kvm_vm *vm;
-	uint64_t guest_num_pages;
+	uint64_t guest_num_pages, slot0_pages = DEFAULT_GUEST_PHY_PAGES;
 	uint64_t backing_src_pagesz = get_backing_src_pagesz(backing_src);
 	int i;
 
@@ -134,13 +134,20 @@ struct kvm_vm *perf_test_create_vm(enum vm_guest_mode mode, int vcpus,
 		    "Guest memory cannot be evenly divided into %d slots.",
 		    slots);
 
+	/*
+	 * If using nested, allocate extra pages for the nested page tables and
+	 * in-memory data structures.
+	 */
+	if (pta->nested)
+		slot0_pages += perf_test_nested_pages(vcpus);
+
 	/*
 	 * Pass guest_num_pages to populate the page tables for test memory.
 	 * The memory is also added to memslot 0, but that's a benign side
 	 * effect as KVM allows aliasing HVAs in memslots.
 	 */
-	vm = vm_create_with_vcpus(mode, vcpus, DEFAULT_GUEST_PHY_PAGES,
-				  guest_num_pages, 0, guest_code, NULL);
+	vm = vm_create_with_vcpus(mode, vcpus, slot0_pages, guest_num_pages, 0,
+				  perf_test_guest_code, NULL);
 
 	pta->vm = vm;
 
@@ -178,6 +185,9 @@ struct kvm_vm *perf_test_create_vm(enum vm_guest_mode mode, int vcpus,
 
 	perf_test_setup_vcpus(vm, vcpus, vcpu_memory_bytes, partition_vcpu_memory_access);
 
+	if (pta->nested)
+		perf_test_setup_nested(vm, vcpus);
+
 	ucall_init(vm, NULL);
 
 	/* Export the shared variables to the guest. */
@@ -198,6 +208,17 @@ void perf_test_set_wr_fract(struct kvm_vm *vm, int wr_fract)
 	sync_global_to_guest(vm, perf_test_args);
 }
 
+uint64_t __weak perf_test_nested_pages(int nr_vcpus)
+{
+	return 0;
+}
+
+void __weak perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus)
+{
+	pr_info("%s() not supported on this architecture, skipping.\n", __func__);
+	exit(KSFT_SKIP);
+}
+
 static void *vcpu_thread_main(void *data)
 {
 	struct vcpu_thread *vcpu = data;
diff --git a/tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c b/tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c
new file mode 100644
index 000000000000..472e7d5a182b
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * x86_64-specific extensions to perf_test_util.c.
+ *
+ * Copyright (C) 2022, Google, Inc.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "perf_test_util.h"
+#include "../kvm_util_internal.h"
+#include "processor.h"
+#include "vmx.h"
+
+void perf_test_l2_guest_code(uint64_t vcpu_id)
+{
+	perf_test_guest_code(vcpu_id);
+	vmcall();
+}
+
+extern char perf_test_l2_guest_entry[];
+__asm__(
+"perf_test_l2_guest_entry:"
+"	mov (%rsp), %rdi;"
+"	call perf_test_l2_guest_code;"
+"	ud2;"
+);
+
+static void perf_test_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
+{
+#define L2_GUEST_STACK_SIZE 64
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	unsigned long *rsp;
+
+	GUEST_ASSERT(vmx->vmcs_gpa);
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx));
+	GUEST_ASSERT(load_vmcs(vmx));
+	GUEST_ASSERT(ept_1g_pages_supported());
+
+	rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
+	*rsp = vcpu_id;
+	prepare_vmcs(vmx, perf_test_l2_guest_entry, rsp);
+
+	GUEST_ASSERT(!vmlaunch());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+	GUEST_DONE();
+}
+
+uint64_t perf_test_nested_pages(int nr_vcpus)
+{
+	/*
+	 * 513 page tables to identity-map the L2 with 1G pages, plus a few
+	 * pages per-vCPU for data structures such as the VMCS.
+	 */
+	return 513 + 10 * nr_vcpus;
+}
+
+void perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus)
+{
+	struct vmx_pages *vmx, *vmx0 = NULL;
+	struct kvm_regs regs;
+	vm_vaddr_t vmx_gva;
+	int vcpu_id;
+
+	nested_vmx_check_supported();
+
+	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
+		vmx = vcpu_alloc_vmx(vm, &vmx_gva);
+
+		if (vcpu_id == 0) {
+			prepare_eptp(vmx, vm, 0);
+			/*
+			 * Identity map L2 with 1G pages so that KVM can shadow
+			 * the EPT12 with huge pages.
+			 */
+			nested_map_all_1g(vmx, vm);
+			vmx0 = vmx;
+		} else {
+			/* Share the same EPT table across all vCPUs. */
+			vmx->eptp = vmx0->eptp;
+			vmx->eptp_hva = vmx0->eptp_hva;
+			vmx->eptp_gpa = vmx0->eptp_gpa;
+		}
+
+		/*
+		 * Override the vCPU to run perf_test_l1_guest_code() which will
+		 * bounce it into L2 before calling perf_test_guest_code().
+		 */
+		vcpu_regs_get(vm, vcpu_id, &regs);
+		regs.rip = (unsigned long) perf_test_l1_guest_code;
+		vcpu_regs_set(vm, vcpu_id, &regs);
+		vcpu_args_set(vm, vcpu_id, 2, vmx_gva, vcpu_id);
+	}
+}
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 5bf169179455..9858e56370cb 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -203,6 +203,11 @@ static bool ept_vpid_cap_supported(uint64_t mask)
 	return rdmsr(MSR_IA32_VMX_EPT_VPID_CAP) & mask;
 }
 
+bool ept_1g_pages_supported(void)
+{
+	return ept_vpid_cap_supported(VMX_EPT_VPID_CAP_1G_PAGES);
+}
+
 /*
  * Initialize the control fields to the most basic settings possible.
  */
@@ -547,6 +552,14 @@ void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
 	}
 }
 
+/* Identity map the entire guest physical address space with 1GiB pages. */
+void nested_map_all_1g(struct vmx_pages *vmx, struct kvm_vm *vm)
+{
+	uint64_t gpa_size = (vm->max_gfn + 1) << vm->page_shift;
+
+	__nested_map(vmx, vm, 0, 0, gpa_size, PG_LEVEL_1G);
+}
+
 void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 		  uint32_t eptp_memslot)
 {
-- 
2.36.0.550.gb090851708-goog



* Re: [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-17 19:05 ` [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 David Matlack
@ 2022-05-17 20:20   ` Peter Xu
  2022-05-18 13:51     ` Peter Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2022-05-17 20:20 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Tue, May 17, 2022 at 07:05:24PM +0000, David Matlack wrote:
> +uint64_t perf_test_nested_pages(int nr_vcpus)
> +{
> +	/*
> +	 * 513 page tables to identity-map the L2 with 1G pages, plus a few
> +	 * pages per-vCPU for data structures such as the VMCS.
> +	 */
> +	return 513 + 10 * nr_vcpus;

Shouldn't that 513 magic value be related to vm->max_gfn instead (rather
than assuming all hosts have 39 bits PA)?

If my math is correct, it'll require 1GB here just for the l2->l1 pgtables
on a 5-level host to run this test nested. So I had a feeling we'd better
still consider >4 level hosts some day very soon..  No strong opinion, as
long as this test is not run by default.

> +}

-- 
Peter Xu



* Re: [PATCH v2 08/10] KVM: selftests: Drop unnecessary rule for $(LIBKVM_OBJS)
  2022-05-17 19:05 ` [PATCH v2 08/10] KVM: selftests: Drop unnecessary rule for $(LIBKVM_OBJS) David Matlack
@ 2022-05-17 20:21   ` Peter Xu
  2022-05-18 17:18     ` David Matlack
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2022-05-17 20:21 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Tue, May 17, 2022 at 07:05:22PM +0000, David Matlack wrote:
> Drop the "all: $(LIBKVM_OBJS)" rule. The KVM selftests already depend
> on $(LIBKVM_OBJS), so there is no reason to have this rule.
> 
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Matlack <dmatlack@google.com>

Since previous patch touched the same line, normally for such a trivial
change I'll just squash into it.  Or at least it should be before the
previous patch then that one contains one less LOC change.  Anyway:

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

> ---
>  tools/testing/selftests/kvm/Makefile | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index cd7a9df4ad6d..0889fc17baa5 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -189,7 +189,6 @@ $(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S
>  	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
>  
>  x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
> -all: $(LIBKVM_OBJS)
>  $(TEST_GEN_PROGS): $(LIBKVM_OBJS)
>  
>  cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
> -- 
> 2.36.0.550.gb090851708-goog
> 

-- 
Peter Xu



* Re: [PATCH v2 01/10] KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
  2022-05-17 19:05 ` [PATCH v2 01/10] KVM: selftests: Replace x86_page_size with PG_LEVEL_XX David Matlack
@ 2022-05-17 20:26   ` Peter Xu
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2022-05-17 20:26 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Tue, May 17, 2022 at 07:05:15PM +0000, David Matlack wrote:
> x86_page_size is an enum used to communicate the desired page size with
> which to map a range of memory. Under the hood they just encode the
> desired level at which to map the page. This ends up being clunky in a
> few ways:
> 
>  - The name suggests it encodes the size of the page rather than the
>    level.
>  - In other places in x86_64/processor.c we just use a raw int to encode
>    the level.
> 
> Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass
> around raw ints when referring to the level. This makes the code easier
> to understand since these macros are very common in KVM MMU code.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  .../selftests/kvm/include/x86_64/processor.h  | 18 ++++++----
>  .../selftests/kvm/lib/x86_64/processor.c      | 33 ++++++++++---------
>  .../selftests/kvm/max_guest_memory_test.c     |  2 +-
>  .../selftests/kvm/x86_64/mmu_role_test.c      |  2 +-
>  4 files changed, 31 insertions(+), 24 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 37db341d4cc5..434a4f60f4d9 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -465,13 +465,19 @@ void vcpu_set_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
>  struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
>  void vm_xsave_req_perm(int bit);
>  
> -enum x86_page_size {
> -	X86_PAGE_SIZE_4K = 0,
> -	X86_PAGE_SIZE_2M,
> -	X86_PAGE_SIZE_1G,
> +enum pg_level {
> +	PG_LEVEL_NONE,
> +	PG_LEVEL_4K,
> +	PG_LEVEL_2M,
> +	PG_LEVEL_1G,
> +	PG_LEVEL_512G,
> +	PG_LEVEL_NUM
>  };

I still prefer PTE/PMD/PUD/... as I suggested, as that's how the kernel mm
handles these levels with arch-independent way across the kernel.  But
well.. I never fight hard on namings, because I know that's the major
complexity. :-)

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH v2 02/10] KVM: selftests: Add option to create 2M and 1G EPT mappings
  2022-05-17 19:05 ` [PATCH v2 02/10] KVM: selftests: Add option to create 2M and 1G EPT mappings David Matlack
@ 2022-05-17 20:27   ` Peter Xu
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2022-05-17 20:27 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Tue, May 17, 2022 at 07:05:16PM +0000, David Matlack wrote:
> The current EPT mapping code in the selftests only supports mapping 4K
> pages. This commit extends that support with an option to map at 2M or
> 1G. This will be used in a future commit to create large page mappings
> to test eager page splitting.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-17 20:20   ` Peter Xu
@ 2022-05-18 13:51     ` Peter Xu
  2022-05-18 15:24       ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2022-05-18 13:51 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Tue, May 17, 2022 at 04:20:31PM -0400, Peter Xu wrote:
> On Tue, May 17, 2022 at 07:05:24PM +0000, David Matlack wrote:
> > +uint64_t perf_test_nested_pages(int nr_vcpus)
> > +{
> > +	/*
> > +	 * 513 page tables to identity-map the L2 with 1G pages, plus a few
> > +	 * pages per-vCPU for data structures such as the VMCS.
> > +	 */
> > +	return 513 + 10 * nr_vcpus;
> 
> Shouldn't that 513 magic value be related to vm->max_gfn instead (rather
> than assuming all hosts have 39 bits PA)?
> 
> If my math is correct, it'll require 1GB here just for the l2->l1 pgtables
> on a 5-level host to run this test nested. So I had a feeling we'd better
> still consider >4 level hosts some day very soon..  No strong opinion, as
> long as this test is not run by default.

I had a feeling that when I said N level I actually meant N-1 level in all
above, since 39 bits are for 3 level not 4 level?..

Then it's ~512GB pgtables on 5 level?  If so I do think we'd better have a
nicer way to do this identity mapping..

I don't think it's very hard - walk the mem regions in kvm_vm.regions
should work for us?

-- 
Peter Xu



* Re: [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-18 13:51     ` Peter Xu
@ 2022-05-18 15:24       ` Sean Christopherson
  2022-05-18 16:12         ` David Matlack
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2022-05-18 15:24 UTC (permalink / raw)
  To: Peter Xu
  Cc: David Matlack, Paolo Bonzini, Ben Gardon, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Wed, May 18, 2022, Peter Xu wrote:
> On Tue, May 17, 2022 at 04:20:31PM -0400, Peter Xu wrote:
> > On Tue, May 17, 2022 at 07:05:24PM +0000, David Matlack wrote:
> > > +uint64_t perf_test_nested_pages(int nr_vcpus)
> > > +{
> > > +	/*
> > > +	 * 513 page tables to identity-map the L2 with 1G pages, plus a few
> > > +	 * pages per-vCPU for data structures such as the VMCS.
> > > +	 */
> > > +	return 513 + 10 * nr_vcpus;
> > 
> > Shouldn't that 513 magic value be related to vm->max_gfn instead (rather
> > than assuming all hosts have 39 bits PA)?
> > 
> > If my math is correct, it'll require 1GB here just for the l2->l1 pgtables
> > on a 5-level host to run this test nested. So I had a feeling we'd better
> > still consider >4 level hosts some day very soon..  No strong opinion, as
> > long as this test is not run by default.
> 
> I had a feeling that when I said N level I actually meant N-1 level in all
> above, since 39 bits are for 3 level not 4 level?..
> 
> Then it's ~512GB pgtables on 5 level?  If so I do think we'd better have a
> nicer way to do this identity mapping..

Agreed, mapping all theoretically possible gfns into L2 is doomed to fail for
larger MAXPHYADDR systems.

Page table allocations are currently hardcoded to come from memslot0.  memslot0
is required to be in lower DRAM, and thus tops out at ~3gb for all intents and
purposes because we need to leave room for the xAPIC.

And I would strongly prefer not to plumb back the ability to specify an alternative
memslot for page table allocations, because except for truly pathological tests that
functionality is unnecessary and pointless complexity.

> I don't think it's very hard - walk the mem regions in kvm_vm.regions
> should work for us?

Yeah.  Alternatively, the test can identity map all of memory <4gb and then also
map "guest_test_phys_mem - guest_num_pages".  I don't think there's any other memory
to deal with, is there?


* Re: [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-18 15:24       ` Sean Christopherson
@ 2022-05-18 16:12         ` David Matlack
  2022-05-18 16:37           ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: David Matlack @ 2022-05-18 16:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Xu, Paolo Bonzini, Ben Gardon, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Wed, May 18, 2022 at 8:24 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, May 18, 2022, Peter Xu wrote:
> > On Tue, May 17, 2022 at 04:20:31PM -0400, Peter Xu wrote:
> > > On Tue, May 17, 2022 at 07:05:24PM +0000, David Matlack wrote:
> > > > +uint64_t perf_test_nested_pages(int nr_vcpus)
> > > > +{
> > > > + /*
> > > > +  * 513 page tables to identity-map the L2 with 1G pages, plus a few
> > > > +  * pages per-vCPU for data structures such as the VMCS.
> > > > +  */
> > > > + return 513 + 10 * nr_vcpus;
> > >
> > > Shouldn't that 513 magic value be related to vm->max_gfn instead (rather
> > > than assuming all hosts have 39 bits PA)?
> > >
> > > If my math is correct, it'll require 1GB here just for the l2->l1 pgtables
> > > on a 5-level host to run this test nested. So I had a feeling we'd better
> > > still consider >4 level hosts some day very soon..  No strong opinion, as
> > > long as this test is not run by default.
> >
> > I had a feeling that when I said N level I actually meant N-1 level in all
> > above, since 39 bits are for 3 level not 4 level?..
> >
> > Then it's ~512GB pgtables on 5 level?  If so I do think we'd better have a
> > nicer way to do this identity mapping..
>
> Agreed, mapping all theoretically possible gfns into L2 is doomed to fail for
> larger MAXPHYADDR systems.

Peter, I think your original math was correct. For 4-level we need 1
L4 + 512 L3 tables (i.e. ~2MiB) to map the entire address space. Each
of the L3 tables contains 512 PTEs that each points to a 1GiB page,
mapping in total 512 * 512 = 256 TiB.

So for 5-level we need 1 L5 + 512 L4 + 262144 L3 tables (i.e. ~1GiB).

>
> Page table allocations are currently hardcoded to come from memslot0.  memslot0
> is required to be in lower DRAM, and thus tops out at ~3gb for all intents and
> purposes because we need to leave room for the xAPIC.
>
> And I would strongly prefer not to plumb back the ability to specify an alternative
> memslot for page table allocations, because except for truly pathological tests that
> functionality is unnecessary and pointless complexity.
>
> > I don't think it's very hard - walk the mem regions in kvm_vm.regions
> > should work for us?
>
> > Yeah.  Alternatively, the test can identity map all of memory <4gb and then also
> map "guest_test_phys_mem - guest_num_pages".  I don't think there's any other memory
> to deal with, is there?

This isn't necessary for 4-level, but also wouldn't be too hard to
implement. I can take a stab at implementing in v3 if we think 5-level
selftests are coming soon.


* Re: [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-18 16:12         ` David Matlack
@ 2022-05-18 16:37           ` Sean Christopherson
  2022-05-20 22:01             ` David Matlack
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2022-05-18 16:37 UTC (permalink / raw)
  To: David Matlack
  Cc: Peter Xu, Paolo Bonzini, Ben Gardon, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Wed, May 18, 2022, David Matlack wrote:
> On Wed, May 18, 2022 at 8:24 AM Sean Christopherson <seanjc@google.com> wrote:
> > Page table allocations are currently hardcoded to come from memslot0.  memslot0
> > is required to be in lower DRAM, and thus tops out at ~3gb for all intents and
> > purposes because we need to leave room for the xAPIC.
> >
> > And I would strongly prefer not to plumb back the ability to specify an alternative
> > memslot for page table allocations, because except for truly pathological tests that
> > functionality is unnecessary and pointless complexity.
> >
> > > I don't think it's very hard - walk the mem regions in kvm_vm.regions
> > > should work for us?
> >
> > Yeah.  Alternatively, the test can identity map all of memory <4gb and then also
> > map "guest_test_phys_mem - guest_num_pages".  I don't think there's any other memory
> > to deal with, is there?
> 
> This isn't necessary for 4-level, but also wouldn't be too hard to
> implement. I can take a stab at implementing in v3 if we think 5-level
> selftests are coming soon.

The current incarnation of nested_map_all_1g() is broken irrespective of 5-level
paging.  If MAXPHYADDR > 48, then bits 51:48 will either be ignored or will cause
reserved #PF or #GP[*].  Because the test puts memory at max_gfn, identity mapping
test memory will fail if 4-level paging is used and MAXPHYADDR > 48.

I think the easiest thing would be to restrict the "starting" upper gfn to the min
of max_gfn and the max addressable gfn based on whether 4-level or 5-level paging
is in use.
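
Roughly, the capping could look like this (a stand-alone sketch; the names
here, e.g. capped_start_gfn() and the paging_levels parameter, are
illustrative and not the selftest library's actual API):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch: cap the "starting" upper gfn to what the nested page tables
 * can actually address.  4-level EPT translates only bits 47:0 of a
 * guest-physical address; 5-level extends that (57 bits assumed here,
 * mirroring 5-level linear addressing).  Assumes 4KiB pages, so gfn =
 * gpa >> 12.
 */
static uint64_t capped_start_gfn(uint64_t max_gfn, int paging_levels)
{
	int gpa_bits = (paging_levels == 5) ? 57 : 48;
	uint64_t max_addressable_gfn = (1ULL << (gpa_bits - 12)) - 1;

	return max_gfn < max_addressable_gfn ? max_gfn : max_addressable_gfn;
}
```

So on a MAXPHYADDR=52 host with 4-level paging, the starting gfn would drop
from (1 << 40) - 1 to (1 << 36) - 1.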

[*] Intel's SDM is comically out-of-date and pretends 5-level EPT doesn't exist,
    so I'm not sure what happens if a GPA is greater than the PWL.

    Section "28.3.2 EPT Translation Mechanism" still says:

    The EPT translation mechanism uses only bits 47:0 of each guest-physical address.

    No processors supporting the Intel 64 architecture support more than 48
    physical-address bits. Thus, no such processor can produce a guest-physical
    address with more than 48 bits. An attempt to use such an address causes a
    page fault. An attempt to load CR3 with such an address causes a general-protection
    fault. If PAE paging is being used, an attempt to load CR3 that would load a
    PDPTE with such an address causes a general-protection fault.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 08/10] KVM: selftests: Drop unnecessary rule for $(LIBKVM_OBJS)
  2022-05-17 20:21   ` Peter Xu
@ 2022-05-18 17:18     ` David Matlack
  0 siblings, 0 replies; 22+ messages in thread
From: David Matlack @ 2022-05-18 17:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Tue, May 17, 2022 at 1:21 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, May 17, 2022 at 07:05:22PM +0000, David Matlack wrote:
> > Drop the "all: $(LIBKVM_OBJS)" rule. The KVM selftests already depend
> > on $(LIBKVM_OBJS), so there is no reason to have this rule.
> >
> > Suggested-by: Peter Xu <peterx@redhat.com>
> > Signed-off-by: David Matlack <dmatlack@google.com>
>
> Since the previous patch touched the same line, normally for such a trivial
> change I'd just squash it into that one.  Or at least it should come before the
> previous patch, so that one contains one less LOC change.  Anyway:

The previous patch does touch this line, but this is a logically
distinct change, so I think it makes sense to split it out.

You're right though that it'd probably make sense to re-order this
before the previous patch, i.e. drop the line "all: $(STATIC_LIBS)".



>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> Thanks,
>
> > ---
> >  tools/testing/selftests/kvm/Makefile | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index cd7a9df4ad6d..0889fc17baa5 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -189,7 +189,6 @@ $(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S
> >       $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
> >
> >  x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
> > -all: $(LIBKVM_OBJS)
> >  $(TEST_GEN_PROGS): $(LIBKVM_OBJS)
> >
> >  cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
> > --
> > 2.36.0.550.gb090851708-goog
> >
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-18 16:37           ` Sean Christopherson
@ 2022-05-20 22:01             ` David Matlack
  2022-05-20 22:49               ` David Matlack
  0 siblings, 1 reply; 22+ messages in thread
From: David Matlack @ 2022-05-20 22:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Xu, Paolo Bonzini, Ben Gardon, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Wed, May 18, 2022 at 9:37 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, May 18, 2022, David Matlack wrote:
> > On Wed, May 18, 2022 at 8:24 AM Sean Christopherson <seanjc@google.com> wrote:
> > > Page table allocations are currently hardcoded to come from memslot0.  memslot0
> > > is required to be in lower DRAM, and thus tops out at ~3gb for all intents and
> > > purposes because we need to leave room for the xAPIC.
> > >
> > > And I would strongly prefer not to plumb back the ability to specify an alternative
> > > memslot for page table allocations, because except for truly pathological tests that
> > > functionality is unnecessary and pointless complexity.
> > >
> > > > I don't think it's very hard - walk the mem regions in kvm_vm.regions
> > > > should work for us?
> > >
> > > Yeah.  Alternatively, the test can identity map all of memory <4gb and then also
> > > map "guest_test_phys_mem - guest_num_pages".  I don't think there's any other memory
> > > to deal with, is there?
> >
> > This isn't necessary for 4-level, but also wouldn't be too hard to
> > implement. I can take a stab at implementing in v3 if we think 5-level
> > selftests are coming soon.
>
> The current incarnation of nested_map_all_1g() is broken irrespective of 5-level
> paging.  If MAXPHYADDR > 48, then bits 51:48 will either be ignored or will cause
> reserved #PF or #GP[*].  Because the test puts memory at max_gfn, identity mapping
> test memory will fail if 4-level paging is used and MAXPHYADDR > 48.

Ah good point.

I wasn't able to get a machine with MAXPHYADDR > 48 to test today so
I've just made __nested_pg_map() assert that the nested_paddr fits in
48 bits. We can add the support for 5-level paging or your idea to
restrict the perf_test_util gfn to 48-bits in a subsequent series when
it becomes necessary.
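
The guard amounts to rejecting any nested_paddr with bits 51:48 set before
mapping it.  A stand-alone sketch of that check (the real version would be a
TEST_ASSERT inside __nested_pg_map(); this predicate only mirrors the idea):

```c
#include <assert.h>
#include <stdint.h>

/*
 * With 4-level EPT only bits 47:0 of a guest-physical address are
 * translated, so a nested_paddr with any of bits 51:48 set cannot be
 * mapped.  Returns non-zero if the address fits in 48 bits.
 */
static int nested_paddr_is_mappable(uint64_t nested_paddr)
{
	return (nested_paddr >> 48) == 0;
}
```

This rejects exactly the addresses that would be silently truncated or fault
on a MAXPHYADDR > 48 host, without needing such a machine to hit the assert.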

>
> I think the easiest thing would be to restrict the "starting" upper gfn to the min
> of max_gfn and the max addressable gfn based on whether 4-level or 5-level paging
> is in use.
>
> [*] Intel's SDM is comically out-of-date and pretends 5-level EPT doesn't exist,
>     so I'm not sure what happens if a GPA is greater than the PWL.
>
>     Section "28.3.2 EPT Translation Mechanism" still says:
>
>     The EPT translation mechanism uses only bits 47:0 of each guest-physical address.
>
>     No processors supporting the Intel 64 architecture support more than 48
>     physical-address bits. Thus, no such processor can produce a guest-physical
>     address with more than 48 bits. An attempt to use such an address causes a
>     page fault. An attempt to load CR3 with such an address causes a general-protection
>     fault. If PAE paging is being used, an attempt to load CR3 that would load a
>     PDPTE with such an address causes a general-protection fault.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-20 22:01             ` David Matlack
@ 2022-05-20 22:49               ` David Matlack
  0 siblings, 0 replies; 22+ messages in thread
From: David Matlack @ 2022-05-20 22:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Xu, Paolo Bonzini, Ben Gardon, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, May 20, 2022 at 3:01 PM David Matlack <dmatlack@google.com> wrote:
>
> On Wed, May 18, 2022 at 9:37 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, May 18, 2022, David Matlack wrote:
> > > On Wed, May 18, 2022 at 8:24 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > Page table allocations are currently hardcoded to come from memslot0.  memslot0
> > > > is required to be in lower DRAM, and thus tops out at ~3gb for all intents and
> > > > purposes because we need to leave room for the xAPIC.
> > > >
> > > > And I would strongly prefer not to plumb back the ability to specify an alternative
> > > > memslot for page table allocations, because except for truly pathological tests that
> > > > functionality is unnecessary and pointless complexity.
> > > >
> > > > > I don't think it's very hard - walk the mem regions in kvm_vm.regions
> > > > > should work for us?
> > > >
> > > > Yeah.  Alternatively, the test can identity map all of memory <4gb and then also
> > > > map "guest_test_phys_mem - guest_num_pages".  I don't think there's any other memory
> > > > to deal with, is there?
> > >
> > > This isn't necessary for 4-level, but also wouldn't be too hard to
> > > implement. I can take a stab at implementing in v3 if we think 5-level
> > > selftests are coming soon.
> >
> > The current incarnation of nested_map_all_1g() is broken irrespective of 5-level
> > paging.  If MAXPHYADDR > 48, then bits 51:48 will either be ignored or will cause
> > reserved #PF or #GP[*].  Because the test puts memory at max_gfn, identity mapping
> > test memory will fail if 4-level paging is used and MAXPHYADDR > 48.
>
> Ah good point.
>
> I wasn't able to get a machine with MAXPHYADDR > 48 to test today so
> I've just made __nested_pg_map() assert that the nested_paddr fits in
> 48 bits. We can add the support for 5-level paging or your idea to
> restrict the perf_test_util gfn to 48-bits in a subsequent series when
> it becomes necessary.

Never mind, I've got a machine to test on now. I'll have a v4 out in a
few minutes to address MAXPHYADDR > 48 hosts. In the meantime I've
confirmed that the new assert in __nested_pg_map() works as expected
:)

>
> >
> > I think the easiest thing would be to restrict the "starting" upper gfn to the min
> > of max_gfn and the max addressable gfn based on whether 4-level or 5-level paging
> > is in use.
> >
> > [*] Intel's SDM is comically out-of-date and pretends 5-level EPT doesn't exist,
> >     so I'm not sure what happens if a GPA is greater than the PWL.
> >
> >     Section "28.3.2 EPT Translation Mechanism" still says:
> >
> >     The EPT translation mechanism uses only bits 47:0 of each guest-physical address.
> >
> >     No processors supporting the Intel 64 architecture support more than 48
> >     physical-address bits. Thus, no such processor can produce a guest-physical
> >     address with more than 48 bits. An attempt to use such an address causes a
> >     page fault. An attempt to load CR3 with such an address causes a general-protection
> >     fault. If PAE paging is being used, an attempt to load CR3 that would load a
> >     PDPTE with such an address causes a general-protection fault.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-05-20 22:49 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-17 19:05 [PATCH v2 00/10] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
2022-05-17 19:05 ` [PATCH v2 01/10] KVM: selftests: Replace x86_page_size with PG_LEVEL_XX David Matlack
2022-05-17 20:26   ` Peter Xu
2022-05-17 19:05 ` [PATCH v2 02/10] KVM: selftests: Add option to create 2M and 1G EPT mappings David Matlack
2022-05-17 20:27   ` Peter Xu
2022-05-17 19:05 ` [PATCH v2 03/10] KVM: selftests: Drop stale function parameter comment for nested_map() David Matlack
2022-05-17 19:05 ` [PATCH v2 04/10] KVM: selftests: Refactor nested_map() to specify target level David Matlack
2022-05-17 19:05 ` [PATCH v2 05/10] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h David Matlack
2022-05-17 19:05 ` [PATCH v2 06/10] KVM: selftests: Add a helper to check EPT/VPID capabilities David Matlack
2022-05-17 19:05 ` [PATCH v2 07/10] KVM: selftests: Link selftests directly with lib object files David Matlack
2022-05-17 19:05 ` [PATCH v2 08/10] KVM: selftests: Drop unnecessary rule for $(LIBKVM_OBJS) David Matlack
2022-05-17 20:21   ` Peter Xu
2022-05-18 17:18     ` David Matlack
2022-05-17 19:05 ` [PATCH v2 09/10] KVM: selftests: Clean up LIBKVM files in Makefile David Matlack
2022-05-17 19:05 ` [PATCH v2 10/10] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 David Matlack
2022-05-17 20:20   ` Peter Xu
2022-05-18 13:51     ` Peter Xu
2022-05-18 15:24       ` Sean Christopherson
2022-05-18 16:12         ` David Matlack
2022-05-18 16:37           ` Sean Christopherson
2022-05-20 22:01             ` David Matlack
2022-05-20 22:49               ` David Matlack
