* [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test
@ 2022-04-29 18:39 David Matlack
  2022-04-29 18:39 ` [PATCH 1/9] KVM: selftests: Replace x86_page_size with raw levels David Matlack
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

This series adds support for taking any perf_test_util-based test and
configuring it to run vCPUs in L2 instead of L1, and adds an option to
dirty_log_perf_test to enable it.

This series was used to collect the performance data for eager page
splitting for nested MMUs [1].

[1] https://lore.kernel.org/kvm/20220422210546.458943-1-dmatlack@google.com/

David Matlack (9):
  KVM: selftests: Replace x86_page_size with raw levels
  KVM: selftests: Add option to create 2M and 1G EPT mappings
  KVM: selftests: Drop stale function parameter comment for nested_map()
  KVM: selftests: Refactor nested_map() to specify target level
  KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  KVM: selftests: Add a helper to check EPT/VPID capabilities
  KVM: selftests: Link selftests directly with lib object files
  KVM: selftests: Clean up LIBKVM files in Makefile
  KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2

 tools/testing/selftests/kvm/Makefile          |  50 +++++--
 .../selftests/kvm/dirty_log_perf_test.c       |  10 +-
 .../selftests/kvm/include/perf_test_util.h    |   5 +
 .../selftests/kvm/include/x86_64/processor.h  |  17 +--
 .../selftests/kvm/include/x86_64/vmx.h        |   5 +
 .../selftests/kvm/lib/perf_test_util.c        |  13 +-
 .../selftests/kvm/lib/x86_64/perf_test_util.c |  89 +++++++++++
 .../selftests/kvm/lib/x86_64/processor.c      |  27 ++--
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  | 140 +++++++++++-------
 .../selftests/kvm/max_guest_memory_test.c     |   2 +-
 .../selftests/kvm/x86_64/mmu_role_test.c      |   2 +-
 11 files changed, 262 insertions(+), 98 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c


base-commit: 84e5ffd045f33e4fa32370135436d987478d0bf7
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/9] KVM: selftests: Replace x86_page_size with raw levels
  2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
@ 2022-04-29 18:39 ` David Matlack
  2022-05-13 20:04   ` Peter Xu
  2022-04-29 18:39 ` [PATCH 2/9] KVM: selftests: Add option to create 2M and 1G EPT mappings David Matlack
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

x86_page_size is an enum used to communicate the desired page size with
which to map a range of memory. Under the hood, its values just encode
the level at which to map the page. This ends up being clunky in a
few ways:

 - The name suggests it encodes the size of the page rather than the
   level.
 - In other places in x86_64/processor.c we just use a raw int to encode
   the level.

Simplify this by admitting that x86_page_size is really the level, and
use an int and some more obviously named macros (e.g. PG_LEVEL_1G).

Signed-off-by: David Matlack <dmatlack@google.com>
---
 .../selftests/kvm/include/x86_64/processor.h  | 14 +++++-----
 .../selftests/kvm/lib/x86_64/processor.c      | 27 +++++++++----------
 .../selftests/kvm/max_guest_memory_test.c     |  2 +-
 .../selftests/kvm/x86_64/mmu_role_test.c      |  2 +-
 4 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 37db341d4cc5..b512f9f508ae 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -465,13 +465,13 @@ void vcpu_set_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
 struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
 void vm_xsave_req_perm(int bit);
 
-enum x86_page_size {
-	X86_PAGE_SIZE_4K = 0,
-	X86_PAGE_SIZE_2M,
-	X86_PAGE_SIZE_1G,
-};
-void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
-		   enum x86_page_size page_size);
+#define PG_LEVEL_4K 0
+#define PG_LEVEL_2M 1
+#define PG_LEVEL_1G 2
+
+#define PG_LEVEL_SIZE(_level) (1ull << (((_level) * 9) + 12))
+
+void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level);
 
 /*
  * Basic CPU control in CR0
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 9f000dfb5594..1a7de69e2495 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -199,15 +199,15 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
 						    uint64_t pt_pfn,
 						    uint64_t vaddr,
 						    uint64_t paddr,
-						    int level,
-						    enum x86_page_size page_size)
+						    int current_level,
+						    int target_level)
 {
-	struct pageUpperEntry *pte = virt_get_pte(vm, pt_pfn, vaddr, level);
+	struct pageUpperEntry *pte = virt_get_pte(vm, pt_pfn, vaddr, current_level);
 
 	if (!pte->present) {
 		pte->writable = true;
 		pte->present = true;
-		pte->page_size = (level == page_size);
+		pte->page_size = (current_level == target_level);
 		if (pte->page_size)
 			pte->pfn = paddr >> vm->page_shift;
 		else
@@ -218,20 +218,19 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
 		 * a hugepage at this level, and that there isn't a hugepage at
 		 * this level.
 		 */
-		TEST_ASSERT(level != page_size,
+		TEST_ASSERT(current_level != target_level,
 			    "Cannot create hugepage at level: %u, vaddr: 0x%lx\n",
-			    page_size, vaddr);
+			    current_level, vaddr);
 		TEST_ASSERT(!pte->page_size,
 			    "Cannot create page table at level: %u, vaddr: 0x%lx\n",
-			    level, vaddr);
+			    current_level, vaddr);
 	}
 	return pte;
 }
 
-void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
-		   enum x86_page_size page_size)
+void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
 {
-	const uint64_t pg_size = 1ull << ((page_size * 9) + 12);
+	const uint64_t pg_size = PG_LEVEL_SIZE(level);
 	struct pageUpperEntry *pml4e, *pdpe, *pde;
 	struct pageTableEntry *pte;
 
@@ -256,15 +255,15 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 	 * early if a hugepage was created.
 	 */
 	pml4e = virt_create_upper_pte(vm, vm->pgd >> vm->page_shift,
-				      vaddr, paddr, 3, page_size);
+				      vaddr, paddr, 3, level);
 	if (pml4e->page_size)
 		return;
 
-	pdpe = virt_create_upper_pte(vm, pml4e->pfn, vaddr, paddr, 2, page_size);
+	pdpe = virt_create_upper_pte(vm, pml4e->pfn, vaddr, paddr, 2, level);
 	if (pdpe->page_size)
 		return;
 
-	pde = virt_create_upper_pte(vm, pdpe->pfn, vaddr, paddr, 1, page_size);
+	pde = virt_create_upper_pte(vm, pdpe->pfn, vaddr, paddr, 1, level);
 	if (pde->page_size)
 		return;
 
@@ -279,7 +278,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 
 void virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
 {
-	__virt_pg_map(vm, vaddr, paddr, X86_PAGE_SIZE_4K);
+	__virt_pg_map(vm, vaddr, paddr, PG_LEVEL_4K);
 }
 
 static struct pageTableEntry *_vm_get_page_table_entry(struct kvm_vm *vm, int vcpuid,
diff --git a/tools/testing/selftests/kvm/max_guest_memory_test.c b/tools/testing/selftests/kvm/max_guest_memory_test.c
index 3875c4b23a04..15f046e19cb2 100644
--- a/tools/testing/selftests/kvm/max_guest_memory_test.c
+++ b/tools/testing/selftests/kvm/max_guest_memory_test.c
@@ -244,7 +244,7 @@ int main(int argc, char *argv[])
 #ifdef __x86_64__
 		/* Identity map memory in the guest using 1gb pages. */
 		for (i = 0; i < slot_size; i += size_1gb)
-			__virt_pg_map(vm, gpa + i, gpa + i, X86_PAGE_SIZE_1G);
+			__virt_pg_map(vm, gpa + i, gpa + i, PG_LEVEL_1G);
 #else
 		for (i = 0; i < slot_size; i += vm_get_page_size(vm))
 			virt_pg_map(vm, gpa + i, gpa + i);
diff --git a/tools/testing/selftests/kvm/x86_64/mmu_role_test.c b/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
index da2325fcad87..bdecd532f935 100644
--- a/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
+++ b/tools/testing/selftests/kvm/x86_64/mmu_role_test.c
@@ -35,7 +35,7 @@ static void mmu_role_test(u32 *cpuid_reg, u32 evil_cpuid_val)
 	run = vcpu_state(vm, VCPU_ID);
 
 	/* Map 1gb page without a backing memlot. */
-	__virt_pg_map(vm, MMIO_GPA, MMIO_GPA, X86_PAGE_SIZE_1G);
+	__virt_pg_map(vm, MMIO_GPA, MMIO_GPA, PG_LEVEL_1G);
 
 	r = _vcpu_run(vm, VCPU_ID);
 

base-commit: 84e5ffd045f33e4fa32370135436d987478d0bf7
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH 2/9] KVM: selftests: Add option to create 2M and 1G EPT mappings
  2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
  2022-04-29 18:39 ` [PATCH 1/9] KVM: selftests: Replace x86_page_size with raw levels David Matlack
@ 2022-04-29 18:39 ` David Matlack
  2022-05-16 20:32   ` Peter Xu
  2022-04-29 18:39 ` [PATCH 3/9] KVM: selftests: Drop stale function parameter comment for nested_map() David Matlack
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

The current EPT mapping code in the selftests only supports mapping 4K
pages. This commit extends that support with an option to map at 2M or
1G. This will be used in a future commit to create large page mappings
to test eager page splitting.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 105 ++++++++++---------
 1 file changed, 57 insertions(+), 48 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index d089d8b850b5..1fa2d1059ade 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -392,27 +392,60 @@ void nested_vmx_check_supported(void)
 	}
 }
 
-void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
-		   uint64_t nested_paddr, uint64_t paddr)
+static void nested_create_upper_pte(struct kvm_vm *vm,
+				    struct eptPageTableEntry *pte,
+				    uint64_t nested_paddr,
+				    uint64_t paddr,
+				    int current_level,
+				    int target_level)
+{
+	if (!pte->readable) {
+		pte->writable = true;
+		pte->readable = true;
+		pte->executable = true;
+		pte->page_size = (current_level == target_level);
+		if (pte->page_size)
+			pte->address = paddr >> vm->page_shift;
+		else
+			pte->address = vm_alloc_page_table(vm) >> vm->page_shift;
+	} else {
+		/*
+		 * Entry already present.  Assert that the caller doesn't want
+		 * a hugepage at this level, and that there isn't a hugepage at
+		 * this level.
+		 */
+		TEST_ASSERT(current_level != target_level,
+			    "Cannot create hugepage at level: %u, nested_paddr: 0x%lx\n",
+			    current_level, nested_paddr);
+		TEST_ASSERT(!pte->page_size,
+			    "Cannot create page table at level: %u, nested_paddr: 0x%lx\n",
+			    current_level, nested_paddr);
+	}
+}
+
+
+void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+		     uint64_t nested_paddr, uint64_t paddr, int target_level)
 {
+	const uint64_t page_size = PG_LEVEL_SIZE(target_level);
+	struct eptPageTableEntry *pt;
 	uint16_t index[4];
-	struct eptPageTableEntry *pml4e;
 
 	TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
 		    "unknown or unsupported guest mode, mode: 0x%x", vm->mode);
 
-	TEST_ASSERT((nested_paddr % vm->page_size) == 0,
+	TEST_ASSERT((nested_paddr % page_size) == 0,
 		    "Nested physical address not on page boundary,\n"
-		    "  nested_paddr: 0x%lx vm->page_size: 0x%x",
-		    nested_paddr, vm->page_size);
+		    "  nested_paddr: 0x%lx page_size: 0x%lx",
+		    nested_paddr, page_size);
 	TEST_ASSERT((nested_paddr >> vm->page_shift) <= vm->max_gfn,
 		    "Physical address beyond beyond maximum supported,\n"
 		    "  nested_paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
 		    paddr, vm->max_gfn, vm->page_size);
-	TEST_ASSERT((paddr % vm->page_size) == 0,
+	TEST_ASSERT((paddr % page_size) == 0,
 		    "Physical address not on page boundary,\n"
-		    "  paddr: 0x%lx vm->page_size: 0x%x",
-		    paddr, vm->page_size);
+		    "  paddr: 0x%lx page_size: 0x%lx",
+		    paddr, page_size);
 	TEST_ASSERT((paddr >> vm->page_shift) <= vm->max_gfn,
 		    "Physical address beyond beyond maximum supported,\n"
 		    "  paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
@@ -423,49 +456,25 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
 	index[2] = (nested_paddr >> 30) & 0x1ffu;
 	index[3] = (nested_paddr >> 39) & 0x1ffu;
 
-	/* Allocate page directory pointer table if not present. */
-	pml4e = vmx->eptp_hva;
-	if (!pml4e[index[3]].readable) {
-		pml4e[index[3]].address = vm_alloc_page_table(vm) >> vm->page_shift;
-		pml4e[index[3]].writable = true;
-		pml4e[index[3]].readable = true;
-		pml4e[index[3]].executable = true;
-	}
+	pt = vmx->eptp_hva;
 
-	/* Allocate page directory table if not present. */
-	struct eptPageTableEntry *pdpe;
-	pdpe = addr_gpa2hva(vm, pml4e[index[3]].address * vm->page_size);
-	if (!pdpe[index[2]].readable) {
-		pdpe[index[2]].address = vm_alloc_page_table(vm) >> vm->page_shift;
-		pdpe[index[2]].writable = true;
-		pdpe[index[2]].readable = true;
-		pdpe[index[2]].executable = true;
-	}
+	for (int current_level = 3; current_level >= 0; current_level--) {
+		struct eptPageTableEntry *pte = &pt[index[current_level]];
 
-	/* Allocate page table if not present. */
-	struct eptPageTableEntry *pde;
-	pde = addr_gpa2hva(vm, pdpe[index[2]].address * vm->page_size);
-	if (!pde[index[1]].readable) {
-		pde[index[1]].address = vm_alloc_page_table(vm) >> vm->page_shift;
-		pde[index[1]].writable = true;
-		pde[index[1]].readable = true;
-		pde[index[1]].executable = true;
-	}
+		nested_create_upper_pte(vm, pte, nested_paddr, paddr,
+					current_level, target_level);
 
-	/* Fill in page table entry. */
-	struct eptPageTableEntry *pte;
-	pte = addr_gpa2hva(vm, pde[index[1]].address * vm->page_size);
-	pte[index[0]].address = paddr >> vm->page_shift;
-	pte[index[0]].writable = true;
-	pte[index[0]].readable = true;
-	pte[index[0]].executable = true;
+		if (pte->page_size)
+			break;
 
-	/*
-	 * For now mark these as accessed and dirty because the only
-	 * testcase we have needs that.  Can be reconsidered later.
-	 */
-	pte[index[0]].accessed = true;
-	pte[index[0]].dirty = true;
+		pt = addr_gpa2hva(vm, pte->address * vm->page_size);
+	}
+}
+
+void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+		   uint64_t nested_paddr, uint64_t paddr)
+{
+	__nested_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
 }
 
 /*
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH 3/9] KVM: selftests: Drop stale function parameter comment for nested_map()
  2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
  2022-04-29 18:39 ` [PATCH 1/9] KVM: selftests: Replace x86_page_size with raw levels David Matlack
  2022-04-29 18:39 ` [PATCH 2/9] KVM: selftests: Add option to create 2M and 1G EPT mappings David Matlack
@ 2022-04-29 18:39 ` David Matlack
  2022-05-16 20:32   ` Peter Xu
  2022-04-29 18:39 ` [PATCH 4/9] KVM: selftests: Refactor nested_map() to specify target level David Matlack
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

nested_map() does not take a parameter named eptp_memslot. Drop the
comment referring to it.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 1fa2d1059ade..ac432e064fcd 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -485,7 +485,6 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
  *   nested_paddr - Nested guest physical address to map
  *   paddr - VM Physical Address
  *   size - The size of the range to map
- *   eptp_memslot - Memory region slot for new virtual translation tables
  *
  * Output Args: None
  *
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH 4/9] KVM: selftests: Refactor nested_map() to specify target level
  2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (2 preceding siblings ...)
  2022-04-29 18:39 ` [PATCH 3/9] KVM: selftests: Drop stale function parameter comment for nested_map() David Matlack
@ 2022-04-29 18:39 ` David Matlack
  2022-05-16 20:34   ` Peter Xu
  2022-04-29 18:39 ` [PATCH 5/9] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h David Matlack
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

Refactor nested_map() to specify that it explicitly wants 4K mappings
(the existing behavior) and push the implementation down into
__nested_map(), which can be used in subsequent commits to create huge
page mappings.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index ac432e064fcd..715b58f1f7bc 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -485,6 +485,7 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
  *   nested_paddr - Nested guest physical address to map
  *   paddr - VM Physical Address
  *   size - The size of the range to map
+ *   level - The level at which to map the range
  *
  * Output Args: None
  *
@@ -493,22 +494,29 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
  * Within the VM given by vm, creates a nested guest translation for the
  * page range starting at nested_paddr to the page range starting at paddr.
  */
-void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
-		uint64_t nested_paddr, uint64_t paddr, uint64_t size)
+void __nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+		  uint64_t nested_paddr, uint64_t paddr, uint64_t size,
+		  int level)
 {
-	size_t page_size = vm->page_size;
+	size_t page_size = PG_LEVEL_SIZE(level);
 	size_t npages = size / page_size;
 
 	TEST_ASSERT(nested_paddr + size > nested_paddr, "Vaddr overflow");
 	TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
 
 	while (npages--) {
-		nested_pg_map(vmx, vm, nested_paddr, paddr);
+		__nested_pg_map(vmx, vm, nested_paddr, paddr, level);
 		nested_paddr += page_size;
 		paddr += page_size;
 	}
 }
 
+void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+		uint64_t nested_paddr, uint64_t paddr, uint64_t size)
+{
+	__nested_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
+}
+
 /* Prepare an identity extended page table that maps all the
  * physical pages in VM.
  */
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH 5/9] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (3 preceding siblings ...)
  2022-04-29 18:39 ` [PATCH 4/9] KVM: selftests: Refactor nested_map() to specify target level David Matlack
@ 2022-04-29 18:39 ` David Matlack
  2022-05-16 22:12   ` Peter Xu
  2022-04-29 18:39 ` [PATCH 6/9] KVM: selftests: Add a helper to check EPT/VPID capabilities David Matlack
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

This is a VMX-related macro, so move it to vmx.h. While here, open-code
the mask like the rest of the VMX bitmask macros.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/include/x86_64/processor.h | 3 ---
 tools/testing/selftests/kvm/include/x86_64/vmx.h       | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index b512f9f508ae..5a8854e85b8f 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -488,9 +488,6 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
 #define X86_CR0_CD          (1UL<<30) /* Cache Disable */
 #define X86_CR0_PG          (1UL<<31) /* Paging */
 
-/* VMX_EPT_VPID_CAP bits */
-#define VMX_EPT_VPID_CAP_AD_BITS       (1ULL << 21)
-
 #define XSTATE_XTILE_CFG_BIT		17
 #define XSTATE_XTILE_DATA_BIT		18
 
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 583ceb0d1457..3b1794baa97c 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -96,6 +96,8 @@
 #define VMX_MISC_PREEMPTION_TIMER_RATE_MASK	0x0000001f
 #define VMX_MISC_SAVE_EFER_LMA			0x00000020
 
+#define VMX_EPT_VPID_CAP_AD_BITS		0x00200000
+
 #define EXIT_REASON_FAILED_VMENTRY	0x80000000
 #define EXIT_REASON_EXCEPTION_NMI	0
 #define EXIT_REASON_EXTERNAL_INTERRUPT	1
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH 6/9] KVM: selftests: Add a helper to check EPT/VPID capabilities
  2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (4 preceding siblings ...)
  2022-04-29 18:39 ` [PATCH 5/9] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h David Matlack
@ 2022-04-29 18:39 ` David Matlack
  2022-05-16 20:42   ` Peter Xu
  2022-04-29 18:39 ` [PATCH 7/9] KVM: selftests: Link selftests directly with lib object files David Matlack
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

Create a small helper function to check if a given EPT/VPID capability
is supported. This will be re-used in a follow-up commit to check for 1G
page support.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 715b58f1f7bc..3862d93a18ac 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -198,6 +198,11 @@ bool load_vmcs(struct vmx_pages *vmx)
 	return true;
 }
 
+static bool ept_vpid_cap_supported(uint64_t mask)
+{
+	return rdmsr(MSR_IA32_VMX_EPT_VPID_CAP) & mask;
+}
+
 /*
  * Initialize the control fields to the most basic settings possible.
  */
@@ -215,7 +220,7 @@ static inline void init_vmcs_control_fields(struct vmx_pages *vmx)
 		struct eptPageTablePointer eptp = {
 			.memory_type = VMX_BASIC_MEM_TYPE_WB,
 			.page_walk_length = 3, /* + 1 */
-			.ad_enabled = !!(rdmsr(MSR_IA32_VMX_EPT_VPID_CAP) & VMX_EPT_VPID_CAP_AD_BITS),
+			.ad_enabled = ept_vpid_cap_supported(VMX_EPT_VPID_CAP_AD_BITS),
 			.address = vmx->eptp_gpa >> PAGE_SHIFT_4K,
 		};
 
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH 7/9] KVM: selftests: Link selftests directly with lib object files
  2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (5 preceding siblings ...)
  2022-04-29 18:39 ` [PATCH 6/9] KVM: selftests: Add a helper to check EPT/VPID capabilities David Matlack
@ 2022-04-29 18:39 ` David Matlack
  2022-05-16 22:15   ` Peter Xu
  2022-04-29 18:39 ` [PATCH 8/9] KVM: selftests: Clean up LIBKVM files in Makefile David Matlack
  2022-04-29 18:39 ` [PATCH 9/9] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 David Matlack
  8 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

The linker does not reliably honor strong/weak symbols when linking
static libraries: it resolves an undefined symbol to the first
definition it encounters, weak or not.
This means that defining __weak arch-generic functions and then defining
arch-specific strong functions to override them in libkvm will not
always work.

More specifically, if we have:

lib/generic.c:

  void __weak foo(void)
  {
          pr_info("weak\n");
  }

  void bar(void)
  {
          foo();
  }

lib/x86_64/arch.c:

  void foo(void)
  {
          pr_info("strong\n");
  }

Given a selftest that calls bar(), it will print "weak". Now if you
make generic.o explicitly depend on arch.o (e.g. add a function to
arch.c that is called directly from generic.c), it will print "strong".
In other words, the linker is free to throw out arch.o when linking
because generic.o does not explicitly depend on it, which causes the
strong symbol to be lost.

One solution is to link libkvm.a with --whole-archive so that the linker
doesn't throw away object files it thinks are unnecessary. However that
is a bit difficult to plumb since we are using the common selftests
makefile rules. An easier solution is to drop libkvm.a just link
selftests with all the .o files that were originally in libkvm.a.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/Makefile | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index af582d168621..c1eb6acb30de 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -172,12 +172,13 @@ LDFLAGS += -pthread $(no-pie-option) $(pgste-option)
 # $(TEST_GEN_PROGS) starts with $(OUTPUT)/
 include ../lib.mk
 
-STATIC_LIBS := $(OUTPUT)/libkvm.a
 LIBKVM_C := $(filter %.c,$(LIBKVM))
 LIBKVM_S := $(filter %.S,$(LIBKVM))
 LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
 LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
-EXTRA_CLEAN += $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(STATIC_LIBS) cscope.*
+LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)
+
+EXTRA_CLEAN += $(LIBKVM_OBJS) cscope.*
 
 x := $(shell mkdir -p $(sort $(dir $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ))))
 $(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c
@@ -186,13 +187,9 @@ $(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c
 $(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
-LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)
-$(OUTPUT)/libkvm.a: $(LIBKVM_OBJS)
-	$(AR) crs $@ $^
-
 x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
-all: $(STATIC_LIBS)
-$(TEST_GEN_PROGS): $(STATIC_LIBS)
+all: $(LIBKVM_OBJS)
+$(TEST_GEN_PROGS): $(LIBKVM_OBJS)
 
 cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
 cscope:
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH 8/9] KVM: selftests: Clean up LIBKVM files in Makefile
  2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (6 preceding siblings ...)
  2022-04-29 18:39 ` [PATCH 7/9] KVM: selftests: Link selftests directly with lib object files David Matlack
@ 2022-04-29 18:39 ` David Matlack
  2022-05-16 22:15   ` Peter Xu
  2022-04-29 18:39 ` [PATCH 9/9] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 David Matlack
  8 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

Break up the long lines for LIBKVM and alphabetize each architecture.
This makes reading the Makefile easier, and will make reading diffs to
LIBKVM easier.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/Makefile | 36 ++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index c1eb6acb30de..1ba0d01362bd 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -37,11 +37,37 @@ ifeq ($(ARCH),riscv)
 	UNAME_M := riscv
 endif
 
-LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/rbtree.c lib/sparsebit.c lib/test_util.c lib/guest_modes.c lib/perf_test_util.c
-LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
-LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handlers.S lib/aarch64/spinlock.c lib/aarch64/gic.c lib/aarch64/gic_v3.c lib/aarch64/vgic.c
-LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
-LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c
+LIBKVM += lib/assert.c
+LIBKVM += lib/elf.c
+LIBKVM += lib/guest_modes.c
+LIBKVM += lib/io.c
+LIBKVM += lib/kvm_util.c
+LIBKVM += lib/perf_test_util.c
+LIBKVM += lib/rbtree.c
+LIBKVM += lib/sparsebit.c
+LIBKVM += lib/test_util.c
+
+LIBKVM_x86_64 += lib/x86_64/apic.c
+LIBKVM_x86_64 += lib/x86_64/handlers.S
+LIBKVM_x86_64 += lib/x86_64/processor.c
+LIBKVM_x86_64 += lib/x86_64/svm.c
+LIBKVM_x86_64 += lib/x86_64/ucall.c
+LIBKVM_x86_64 += lib/x86_64/vmx.c
+
+LIBKVM_aarch64 += lib/aarch64/gic.c
+LIBKVM_aarch64 += lib/aarch64/gic_v3.c
+LIBKVM_aarch64 += lib/aarch64/handlers.S
+LIBKVM_aarch64 += lib/aarch64/processor.c
+LIBKVM_aarch64 += lib/aarch64/spinlock.c
+LIBKVM_aarch64 += lib/aarch64/ucall.c
+LIBKVM_aarch64 += lib/aarch64/vgic.c
+
+LIBKVM_s390x += lib/s390x/diag318_test_handler.c
+LIBKVM_s390x += lib/s390x/processor.c
+LIBKVM_s390x += lib/s390x/ucall.c
+
+LIBKVM_riscv += lib/riscv/processor.c
+LIBKVM_riscv += lib/riscv/ucall.c
 
 TEST_GEN_PROGS_x86_64 = x86_64/cpuid_test
 TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 9/9] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
                   ` (7 preceding siblings ...)
  2022-04-29 18:39 ` [PATCH 8/9] KVM: selftests: Clean up LIBKVM files in Makefile David Matlack
@ 2022-04-29 18:39 ` David Matlack
  2022-05-16 22:17   ` Peter Xu
  8 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-04-29 18:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Sean Christopherson, Oliver Upton, Peter Xu,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM),
	David Matlack

Add an option to dirty_log_perf_test that configures the vCPUs to run in
L2 instead of L1. This makes it possible to benchmark the dirty logging
performance of nested virtualization, which is particularly interesting
because KVM must shadow L1's EPT/NPT tables.

For now this support only works on x86_64 CPUs with VMX. On other
architectures, passing -n results in the test being skipped.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/Makefile          |  1 +
 .../selftests/kvm/dirty_log_perf_test.c       | 10 ++-
 .../selftests/kvm/include/perf_test_util.h    |  5 ++
 .../selftests/kvm/include/x86_64/vmx.h        |  3 +
 .../selftests/kvm/lib/perf_test_util.c        | 13 ++-
 .../selftests/kvm/lib/x86_64/perf_test_util.c | 89 +++++++++++++++++++
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  | 11 +++
 7 files changed, 127 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 1ba0d01362bd..9b342239a6dd 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -49,6 +49,7 @@ LIBKVM += lib/test_util.c
 
 LIBKVM_x86_64 += lib/x86_64/apic.c
 LIBKVM_x86_64 += lib/x86_64/handlers.S
+LIBKVM_x86_64 += lib/x86_64/perf_test_util.c
 LIBKVM_x86_64 += lib/x86_64/processor.c
 LIBKVM_x86_64 += lib/x86_64/svm.c
 LIBKVM_x86_64 += lib/x86_64/ucall.c
diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index 7b47ae4f952e..d60a34cdfaee 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -336,8 +336,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 static void help(char *name)
 {
 	puts("");
-	printf("usage: %s [-h] [-i iterations] [-p offset] [-g]"
-	       "[-m mode] [-b vcpu bytes] [-v vcpus] [-o] [-s mem type]"
+	printf("usage: %s [-h] [-i iterations] [-p offset] [-g] "
+	       "[-m mode] [-n] [-b vcpu bytes] [-v vcpus] [-o] [-s mem type]"
 	       "[-x memslots]\n", name);
 	puts("");
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
@@ -351,6 +351,7 @@ static void help(char *name)
 	printf(" -p: specify guest physical test memory offset\n"
 	       "     Warning: a low offset can conflict with the loaded test code.\n");
 	guest_modes_help();
+	printf(" -n: Run the vCPUs in nested mode (L2)\n");
 	printf(" -b: specify the size of the memory region which should be\n"
 	       "     dirtied by each vCPU. e.g. 10M or 3G.\n"
 	       "     (default: 1G)\n");
@@ -387,7 +388,7 @@ int main(int argc, char *argv[])
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "ghi:p:m:b:f:v:os:x:")) != -1) {
+	while ((opt = getopt(argc, argv, "ghi:p:m:nb:f:v:os:x:")) != -1) {
 		switch (opt) {
 		case 'g':
 			dirty_log_manual_caps = 0;
@@ -401,6 +402,9 @@ int main(int argc, char *argv[])
 		case 'm':
 			guest_modes_cmdline(optarg);
 			break;
+		case 'n':
+			perf_test_args.nested = true;
+			break;
 		case 'b':
 			guest_percpu_mem_size = parse_size(optarg);
 			break;
diff --git a/tools/testing/selftests/kvm/include/perf_test_util.h b/tools/testing/selftests/kvm/include/perf_test_util.h
index a86f953d8d36..1dfdaec43321 100644
--- a/tools/testing/selftests/kvm/include/perf_test_util.h
+++ b/tools/testing/selftests/kvm/include/perf_test_util.h
@@ -34,6 +34,9 @@ struct perf_test_args {
 	uint64_t guest_page_size;
 	int wr_fract;
 
+	/* Run vCPUs in L2 instead of L1, if the architecture supports it. */
+	bool nested;
+
 	struct perf_test_vcpu_args vcpu_args[KVM_MAX_VCPUS];
 };
 
@@ -49,5 +52,7 @@ void perf_test_set_wr_fract(struct kvm_vm *vm, int wr_fract);
 
 void perf_test_start_vcpu_threads(int vcpus, void (*vcpu_fn)(struct perf_test_vcpu_args *));
 void perf_test_join_vcpu_threads(int vcpus);
+void perf_test_guest_code(uint32_t vcpu_id);
+void perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus);
 
 #endif /* SELFTEST_KVM_PERF_TEST_UTIL_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 3b1794baa97c..17d712503a36 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -96,6 +96,7 @@
 #define VMX_MISC_PREEMPTION_TIMER_RATE_MASK	0x0000001f
 #define VMX_MISC_SAVE_EFER_LMA			0x00000020
 
+#define VMX_EPT_VPID_CAP_1G_PAGES		0x00020000
 #define VMX_EPT_VPID_CAP_AD_BITS		0x00200000
 
 #define EXIT_REASON_FAILED_VMENTRY	0x80000000
@@ -608,6 +609,7 @@ bool load_vmcs(struct vmx_pages *vmx);
 
 bool nested_vmx_supported(void);
 void nested_vmx_check_supported(void);
+bool ept_1g_pages_supported(void);
 
 void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
 		   uint64_t nested_paddr, uint64_t paddr);
@@ -615,6 +617,7 @@ void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
 		 uint64_t nested_paddr, uint64_t paddr, uint64_t size);
 void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
 			uint32_t memslot);
+void nested_map_all_1g(struct vmx_pages *vmx, struct kvm_vm *vm);
 void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 		  uint32_t eptp_memslot);
 void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
index 722df3a28791..6e15c93a3577 100644
--- a/tools/testing/selftests/kvm/lib/perf_test_util.c
+++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
@@ -40,7 +40,7 @@ static bool all_vcpu_threads_running;
  * Continuously write to the first 8 bytes of each page in the
  * specified region.
  */
-static void guest_code(uint32_t vcpu_id)
+void perf_test_guest_code(uint32_t vcpu_id)
 {
 	struct perf_test_args *pta = &perf_test_args;
 	struct perf_test_vcpu_args *vcpu_args = &pta->vcpu_args[vcpu_id];
@@ -140,7 +140,7 @@ struct kvm_vm *perf_test_create_vm(enum vm_guest_mode mode, int vcpus,
 	 * effect as KVM allows aliasing HVAs in memslots.
 	 */
 	vm = vm_create_with_vcpus(mode, vcpus, DEFAULT_GUEST_PHY_PAGES,
-				  guest_num_pages, 0, guest_code, NULL);
+				  guest_num_pages, 0, perf_test_guest_code, NULL);
 
 	pta->vm = vm;
 
@@ -178,6 +178,9 @@ struct kvm_vm *perf_test_create_vm(enum vm_guest_mode mode, int vcpus,
 
 	perf_test_setup_vcpus(vm, vcpus, vcpu_memory_bytes, partition_vcpu_memory_access);
 
+	if (pta->nested)
+		perf_test_setup_nested(vm, vcpus);
+
 	ucall_init(vm, NULL);
 
 	/* Export the shared variables to the guest. */
@@ -198,6 +201,12 @@ void perf_test_set_wr_fract(struct kvm_vm *vm, int wr_fract)
 	sync_global_to_guest(vm, perf_test_args);
 }
 
+void __weak perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus)
+{
+	pr_info("%s() not supported on this architecture, skipping.\n", __func__);
+	exit(KSFT_SKIP);
+}
+
 static void *vcpu_thread_main(void *data)
 {
 	struct vcpu_thread *vcpu = data;
diff --git a/tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c b/tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c
new file mode 100644
index 000000000000..ba20a1499263
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/x86_64/perf_test_util.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * x86_64-specific extensions to perf_test_util.c.
+ *
+ * Copyright (C) 2022, Google, Inc.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "perf_test_util.h"
+#include "../kvm_util_internal.h"
+#include "processor.h"
+#include "vmx.h"
+
+void perf_test_l2_guest_code(uint64_t vcpu_id)
+{
+	perf_test_guest_code(vcpu_id);
+	vmcall();
+}
+
+extern char perf_test_l2_guest_entry[];
+__asm__(
+"perf_test_l2_guest_entry:"
+"	mov (%rsp), %rdi;"
+"	call perf_test_l2_guest_code;"
+"	ud2;"
+);
+
+static void perf_test_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
+{
+#define L2_GUEST_STACK_SIZE 64
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	unsigned long *rsp;
+
+	GUEST_ASSERT(vmx->vmcs_gpa);
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx));
+	GUEST_ASSERT(load_vmcs(vmx));
+	GUEST_ASSERT(ept_1g_pages_supported());
+
+	rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
+	*rsp = vcpu_id;
+	prepare_vmcs(vmx, perf_test_l2_guest_entry, rsp);
+
+	GUEST_ASSERT(!vmlaunch());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+	GUEST_DONE();
+}
+
+void perf_test_setup_nested(struct kvm_vm *vm, int nr_vcpus)
+{
+	struct vmx_pages *vmx, *vmx0 = NULL;
+	struct kvm_regs regs;
+	vm_vaddr_t vmx_gva;
+	int vcpu_id;
+
+	nested_vmx_check_supported();
+
+	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
+		vmx = vcpu_alloc_vmx(vm, &vmx_gva);
+
+		if (vcpu_id == 0) {
+			prepare_eptp(vmx, vm, 0);
+			/*
+			 * Identity map L2 with 1G pages so that KVM can shadow
+			 * the EPT12 with huge pages.
+			 */
+			nested_map_all_1g(vmx, vm);
+			vmx0 = vmx;
+		} else {
+			/* Share the same EPT table across all vCPUs. */
+			vmx->eptp = vmx0->eptp;
+			vmx->eptp_hva = vmx0->eptp_hva;
+			vmx->eptp_gpa = vmx0->eptp_gpa;
+		}
+
+		/*
+		 * Override the vCPU to run perf_test_l1_guest_code() which will
+		 * bounce it into L2 before calling perf_test_guest_code().
+		 */
+		vcpu_regs_get(vm, vcpu_id, &regs);
+		regs.rip = (unsigned long) perf_test_l1_guest_code;
+		vcpu_regs_set(vm, vcpu_id, &regs);
+		vcpu_args_set(vm, vcpu_id, 2, vmx_gva, vcpu_id);
+	}
+}
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 3862d93a18ac..32374a0f002c 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -203,6 +203,11 @@ static bool ept_vpid_cap_supported(uint64_t mask)
 	return rdmsr(MSR_IA32_VMX_EPT_VPID_CAP) & mask;
 }
 
+bool ept_1g_pages_supported(void)
+{
+	return ept_vpid_cap_supported(VMX_EPT_VPID_CAP_1G_PAGES);
+}
+
 /*
  * Initialize the control fields to the most basic settings possible.
  */
@@ -546,6 +551,12 @@ void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
 	}
 }
 
+/* Identity map the entire guest physical address space with 1GiB pages. */
+void nested_map_all_1g(struct vmx_pages *vmx, struct kvm_vm *vm)
+{
+	__nested_map(vmx, vm, 0, 0, vm->max_gfn << vm->page_shift, PG_LEVEL_1G);
+}
+
 void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 		  uint32_t eptp_memslot)
 {
-- 
2.36.0.464.gb9c8b46e94-goog



* Re: [PATCH 1/9] KVM: selftests: Replace x86_page_size with raw levels
  2022-04-29 18:39 ` [PATCH 1/9] KVM: selftests: Replace x86_page_size with raw levels David Matlack
@ 2022-05-13 20:04   ` Peter Xu
  2022-05-16 22:38     ` David Matlack
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2022-05-13 20:04 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, Apr 29, 2022 at 06:39:27PM +0000, David Matlack wrote:
> x86_page_size is an enum used to communicate the desired page size with
> which to map a range of memory. Under the hood they just encode the
> desired level at which to map the page. This ends up being clunky in a
> few ways:
> 
>  - The name suggests it encodes the size of the page rather than the
>    level.
>  - In other places in x86_64/processor.c we just use a raw int to encode
>    the level.
> 
> Simplify this by admitting that x86_page_size is just the level and
> using an int and some more obviously named macros (e.g. PG_LEVEL_1G).
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  .../selftests/kvm/include/x86_64/processor.h  | 14 +++++-----
>  .../selftests/kvm/lib/x86_64/processor.c      | 27 +++++++++----------
>  .../selftests/kvm/max_guest_memory_test.c     |  2 +-
>  .../selftests/kvm/x86_64/mmu_role_test.c      |  2 +-
>  4 files changed, 22 insertions(+), 23 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 37db341d4cc5..b512f9f508ae 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -465,13 +465,13 @@ void vcpu_set_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
>  struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
>  void vm_xsave_req_perm(int bit);
>  
> -enum x86_page_size {
> -	X86_PAGE_SIZE_4K = 0,
> -	X86_PAGE_SIZE_2M,
> -	X86_PAGE_SIZE_1G,
> -};
> -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> -		   enum x86_page_size page_size);
> +#define PG_LEVEL_4K 0
> +#define PG_LEVEL_2M 1
> +#define PG_LEVEL_1G 2

A nitpick is: we could have named those as PG_LEVEL_[PTE|PMD|PUD|PGD..]
rather than 4K|2M|..., then...

> +
> +#define PG_LEVEL_SIZE(_level) (1ull << (((_level) * 9) + 12))
> +
> +void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level);
>  
>  /*
>   * Basic CPU control in CR0
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index 9f000dfb5594..1a7de69e2495 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -199,15 +199,15 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
>  						    uint64_t pt_pfn,
>  						    uint64_t vaddr,
>  						    uint64_t paddr,
> -						    int level,
> -						    enum x86_page_size page_size)
> +						    int current_level,
> +						    int target_level)
>  {
> -	struct pageUpperEntry *pte = virt_get_pte(vm, pt_pfn, vaddr, level);
> +	struct pageUpperEntry *pte = virt_get_pte(vm, pt_pfn, vaddr, current_level);
>  
>  	if (!pte->present) {
>  		pte->writable = true;
>  		pte->present = true;
> -		pte->page_size = (level == page_size);
> +		pte->page_size = (current_level == target_level);
>  		if (pte->page_size)
>  			pte->pfn = paddr >> vm->page_shift;
>  		else
> @@ -218,20 +218,19 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
>  		 * a hugepage at this level, and that there isn't a hugepage at
>  		 * this level.
>  		 */
> -		TEST_ASSERT(level != page_size,
> +		TEST_ASSERT(current_level != target_level,
>  			    "Cannot create hugepage at level: %u, vaddr: 0x%lx\n",
> -			    page_size, vaddr);
> +			    current_level, vaddr);
>  		TEST_ASSERT(!pte->page_size,
>  			    "Cannot create page table at level: %u, vaddr: 0x%lx\n",
> -			    level, vaddr);
> +			    current_level, vaddr);
>  	}
>  	return pte;
>  }
>  
> -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> -		   enum x86_page_size page_size)
> +void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
>  {
> -	const uint64_t pg_size = 1ull << ((page_size * 9) + 12);
> +	const uint64_t pg_size = PG_LEVEL_SIZE(level);
>  	struct pageUpperEntry *pml4e, *pdpe, *pde;
>  	struct pageTableEntry *pte;
>  
> @@ -256,15 +255,15 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
>  	 * early if a hugepage was created.
>  	 */
>  	pml4e = virt_create_upper_pte(vm, vm->pgd >> vm->page_shift,
> -				      vaddr, paddr, 3, page_size);
> +				      vaddr, paddr, 3, level);
>  	if (pml4e->page_size)
>  		return;
>  
> -	pdpe = virt_create_upper_pte(vm, pml4e->pfn, vaddr, paddr, 2, page_size);
> +	pdpe = virt_create_upper_pte(vm, pml4e->pfn, vaddr, paddr, 2, level);
>  	if (pdpe->page_size)
>  		return;
>  
> -	pde = virt_create_upper_pte(vm, pdpe->pfn, vaddr, paddr, 1, page_size);
> +	pde = virt_create_upper_pte(vm, pdpe->pfn, vaddr, paddr, 1, level);

... here we could also potentially replace the 3/2/1s with the new macros
(though with the existing 4K|2M|... naming, level 3 would have no macro)?

>  	if (pde->page_size)
>  		return;

-- 
Peter Xu



* Re: [PATCH 2/9] KVM: selftests: Add option to create 2M and 1G EPT mappings
  2022-04-29 18:39 ` [PATCH 2/9] KVM: selftests: Add option to create 2M and 1G EPT mappings David Matlack
@ 2022-05-16 20:32   ` Peter Xu
  2022-05-16 22:38     ` David Matlack
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2022-05-16 20:32 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, Apr 29, 2022 at 06:39:28PM +0000, David Matlack wrote:
> The current EPT mapping code in the selftests only supports mapping 4K
> pages. This commit extends that support with an option to map at 2M or
> 1G. This will be used in a future commit to create large page mappings
> to test eager page splitting.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  tools/testing/selftests/kvm/lib/x86_64/vmx.c | 105 ++++++++++---------
>  1 file changed, 57 insertions(+), 48 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> index d089d8b850b5..1fa2d1059ade 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> @@ -392,27 +392,60 @@ void nested_vmx_check_supported(void)
>  	}
>  }
>  
> -void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> -		   uint64_t nested_paddr, uint64_t paddr)
> +static void nested_create_upper_pte(struct kvm_vm *vm,
> +				    struct eptPageTableEntry *pte,
> +				    uint64_t nested_paddr,
> +				    uint64_t paddr,
> +				    int current_level,
> +				    int target_level)
> +{
> +	if (!pte->readable) {
> +		pte->writable = true;
> +		pte->readable = true;
> +		pte->executable = true;
> +		pte->page_size = (current_level == target_level);
> +		if (pte->page_size)
> +			pte->address = paddr >> vm->page_shift;
> +		else
> +			pte->address = vm_alloc_page_table(vm) >> vm->page_shift;
> +	} else {
> +		/*
> +		 * Entry already present.  Assert that the caller doesn't want
> +		 * a hugepage at this level, and that there isn't a hugepage at
> +		 * this level.
> +		 */
> +		TEST_ASSERT(current_level != target_level,
> +			    "Cannot create hugepage at level: %u, nested_paddr: 0x%lx\n",
> +			    current_level, nested_paddr);
> +		TEST_ASSERT(!pte->page_size,
> +			    "Cannot create page table at level: %u, nested_paddr: 0x%lx\n",
> +			    current_level, nested_paddr);
> +	}
> +}
> +
> +
> +void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> +		     uint64_t nested_paddr, uint64_t paddr, int target_level)
>  {
> +	const uint64_t page_size = PG_LEVEL_SIZE(target_level);
> +	struct eptPageTableEntry *pt;
>  	uint16_t index[4];
> -	struct eptPageTableEntry *pml4e;
>  
>  	TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
>  		    "unknown or unsupported guest mode, mode: 0x%x", vm->mode);
>  
> -	TEST_ASSERT((nested_paddr % vm->page_size) == 0,
> +	TEST_ASSERT((nested_paddr % page_size) == 0,
>  		    "Nested physical address not on page boundary,\n"
> -		    "  nested_paddr: 0x%lx vm->page_size: 0x%x",
> -		    nested_paddr, vm->page_size);
> +		    "  nested_paddr: 0x%lx page_size: 0x%lx",
> +		    nested_paddr, page_size);
>  	TEST_ASSERT((nested_paddr >> vm->page_shift) <= vm->max_gfn,
>  		    "Physical address beyond beyond maximum supported,\n"
>  		    "  nested_paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
>  		    paddr, vm->max_gfn, vm->page_size);
> -	TEST_ASSERT((paddr % vm->page_size) == 0,
> +	TEST_ASSERT((paddr % page_size) == 0,
>  		    "Physical address not on page boundary,\n"
> -		    "  paddr: 0x%lx vm->page_size: 0x%x",
> -		    paddr, vm->page_size);
> +		    "  paddr: 0x%lx page_size: 0x%lx",
> +		    paddr, page_size);
>  	TEST_ASSERT((paddr >> vm->page_shift) <= vm->max_gfn,
>  		    "Physical address beyond beyond maximum supported,\n"
>  		    "  paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
> @@ -423,49 +456,25 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
>  	index[2] = (nested_paddr >> 30) & 0x1ffu;
>  	index[3] = (nested_paddr >> 39) & 0x1ffu;
>  
> -	/* Allocate page directory pointer table if not present. */
> -	pml4e = vmx->eptp_hva;
> -	if (!pml4e[index[3]].readable) {
> -		pml4e[index[3]].address = vm_alloc_page_table(vm) >> vm->page_shift;
> -		pml4e[index[3]].writable = true;
> -		pml4e[index[3]].readable = true;
> -		pml4e[index[3]].executable = true;
> -	}
> +	pt = vmx->eptp_hva;
>  
> -	/* Allocate page directory table if not present. */
> -	struct eptPageTableEntry *pdpe;
> -	pdpe = addr_gpa2hva(vm, pml4e[index[3]].address * vm->page_size);
> -	if (!pdpe[index[2]].readable) {
> -		pdpe[index[2]].address = vm_alloc_page_table(vm) >> vm->page_shift;
> -		pdpe[index[2]].writable = true;
> -		pdpe[index[2]].readable = true;
> -		pdpe[index[2]].executable = true;
> -	}
> +	for (int current_level = 3; current_level >= 0; current_level--) {
> +		struct eptPageTableEntry *pte = &pt[index[current_level]];
>  
> -	/* Allocate page table if not present. */
> -	struct eptPageTableEntry *pde;
> -	pde = addr_gpa2hva(vm, pdpe[index[2]].address * vm->page_size);
> -	if (!pde[index[1]].readable) {
> -		pde[index[1]].address = vm_alloc_page_table(vm) >> vm->page_shift;
> -		pde[index[1]].writable = true;
> -		pde[index[1]].readable = true;
> -		pde[index[1]].executable = true;
> -	}
> +		nested_create_upper_pte(vm, pte, nested_paddr, paddr,
> +					current_level, target_level);

This is going to run for the last level pte too, so maybe remove the
"upper" prefix in the helper?

>  
> -	/* Fill in page table entry. */
> -	struct eptPageTableEntry *pte;
> -	pte = addr_gpa2hva(vm, pde[index[1]].address * vm->page_size);
> -	pte[index[0]].address = paddr >> vm->page_shift;
> -	pte[index[0]].writable = true;
> -	pte[index[0]].readable = true;
> -	pte[index[0]].executable = true;
> +		if (pte->page_size)
> +			break;
>  
> -	/*
> -	 * For now mark these as accessed and dirty because the only
> -	 * testcase we have needs that.  Can be reconsidered later.
> -	 */
> -	pte[index[0]].accessed = true;
> -	pte[index[0]].dirty = true;

Is it intended to drop the access/dirty bits here?

> +		pt = addr_gpa2hva(vm, pte->address * vm->page_size);
> +	}
> +}
> +
> +void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> +		   uint64_t nested_paddr, uint64_t paddr)
> +{
> +	__nested_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
>  }
>  
>  /*
> -- 
> 2.36.0.464.gb9c8b46e94-goog
> 

-- 
Peter Xu



* Re: [PATCH 3/9] KVM: selftests: Drop stale function parameter comment for nested_map()
  2022-04-29 18:39 ` [PATCH 3/9] KVM: selftests: Drop stale function parameter comment for nested_map() David Matlack
@ 2022-05-16 20:32   ` Peter Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2022-05-16 20:32 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, Apr 29, 2022 at 06:39:29PM +0000, David Matlack wrote:
> nested_map() does not take a parameter named eptp_memslot. Drop the
> comment referring to it.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH 4/9] KVM: selftests: Refactor nested_map() to specify target level
  2022-04-29 18:39 ` [PATCH 4/9] KVM: selftests: Refactor nested_map() to specify target level David Matlack
@ 2022-05-16 20:34   ` Peter Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2022-05-16 20:34 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, Apr 29, 2022 at 06:39:30PM +0000, David Matlack wrote:
> Refactor nested_map() to specify that it explicitly wants 4K mappings
> (the existing behavior) and push the implementation down into
> __nested_map(), which can be used in subsequent commits to create huge
> page mappings.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH 6/9] KVM: selftests: Add a helper to check EPT/VPID capabilities
  2022-04-29 18:39 ` [PATCH 6/9] KVM: selftests: Add a helper to check EPT/VPID capabilities David Matlack
@ 2022-05-16 20:42   ` Peter Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2022-05-16 20:42 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, Apr 29, 2022 at 06:39:32PM +0000, David Matlack wrote:
> Create a small helper function to check if a given EPT/VPID capability
> is supported. This will be re-used in a follow-up commit to check for 1G
> page support.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH 5/9] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  2022-04-29 18:39 ` [PATCH 5/9] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h David Matlack
@ 2022-05-16 22:12   ` Peter Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2022-05-16 22:12 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, Apr 29, 2022 at 06:39:31PM +0000, David Matlack wrote:
> This is a VMX-related macro so move it to vmx.h. While here, open code
> the mask like the rest of the VMX bitmask macros.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH 7/9] KVM: selftests: Link selftests directly with lib object files
  2022-04-29 18:39 ` [PATCH 7/9] KVM: selftests: Link selftests directly with lib object files David Matlack
@ 2022-05-16 22:15   ` Peter Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2022-05-16 22:15 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, Apr 29, 2022 at 06:39:33PM +0000, David Matlack wrote:
> The linker does not reliably obey strong/weak symbols when linking static
> libraries; it simply resolves an undefined symbol to the first-encountered
> definition.
> This means that defining __weak arch-generic functions and then defining
> arch-specific strong functions to override them in libkvm will not
> always work.
> 
> More specifically, if we have:
> 
> lib/generic.c:
> 
>   void __weak foo(void)
>   {
>           pr_info("weak\n");
>   }
> 
>   void bar(void)
>   {
>           foo();
>   }
> 
> lib/x86_64/arch.c:
> 
>   void foo(void)
>   {
>           pr_info("strong\n");
>   }
> 
> A selftest that calls bar() will print "weak". Now if you make
> generic.o explicitly depend on arch.o (e.g. add function to arch.c that
> is called directly from generic.c) it will print "strong". In other
> words, it seems that the linker is free to throw out arch.o when linking
> because generic.o does not explicitly depend on it, which causes the
> linker to lose the strong symbol.
> 
> One solution is to link libkvm.a with --whole-archive so that the linker
> doesn't throw away object files it thinks are unnecessary. However that
> is a bit difficult to plumb since we are using the common selftests
> makefile rules. An easier solution is to drop libkvm.a and just link
> selftests with all the .o files that were originally in libkvm.a.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  tools/testing/selftests/kvm/Makefile | 13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index af582d168621..c1eb6acb30de 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -172,12 +172,13 @@ LDFLAGS += -pthread $(no-pie-option) $(pgste-option)
>  # $(TEST_GEN_PROGS) starts with $(OUTPUT)/
>  include ../lib.mk
>  
> -STATIC_LIBS := $(OUTPUT)/libkvm.a
>  LIBKVM_C := $(filter %.c,$(LIBKVM))
>  LIBKVM_S := $(filter %.S,$(LIBKVM))
>  LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
>  LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
> -EXTRA_CLEAN += $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(STATIC_LIBS) cscope.*
> +LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)
> +
> +EXTRA_CLEAN += $(LIBKVM_OBJS) cscope.*
>  
>  x := $(shell mkdir -p $(sort $(dir $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ))))
>  $(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c
> @@ -186,13 +187,9 @@ $(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c
>  $(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S
>  	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
>  
> -LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)
> -$(OUTPUT)/libkvm.a: $(LIBKVM_OBJS)
> -	$(AR) crs $@ $^
> -
>  x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
> -all: $(STATIC_LIBS)
> -$(TEST_GEN_PROGS): $(STATIC_LIBS)
> +all: $(LIBKVM_OBJS)

Can this line be dropped alongside?  Default targets should have already
been set in ../lib.mk anyway iiuc, and they all depend on the objs.

> +$(TEST_GEN_PROGS): $(LIBKVM_OBJS)

I never knew about this difference, but it does seem to be the case. Good to
learn this.

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

-- 
Peter Xu



* Re: [PATCH 8/9] KVM: selftests: Clean up LIBKVM files in Makefile
  2022-04-29 18:39 ` [PATCH 8/9] KVM: selftests: Clean up LIBKVM files in Makefile David Matlack
@ 2022-05-16 22:15   ` Peter Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2022-05-16 22:15 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, Apr 29, 2022 at 06:39:34PM +0000, David Matlack wrote:
> Break up the long lines for LIBKVM and alphabetize each architecture.
> This makes reading the Makefile easier, and will make reading diffs to
> LIBKVM easier.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH 9/9] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-04-29 18:39 ` [PATCH 9/9] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 David Matlack
@ 2022-05-16 22:17   ` Peter Xu
  2022-05-16 22:34     ` David Matlack
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2022-05-16 22:17 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, Apr 29, 2022 at 06:39:35PM +0000, David Matlack wrote:
> +static void perf_test_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
> +{
> +#define L2_GUEST_STACK_SIZE 64
> +	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
> +	unsigned long *rsp;
> +
> +	GUEST_ASSERT(vmx->vmcs_gpa);
> +	GUEST_ASSERT(prepare_for_vmx_operation(vmx));
> +	GUEST_ASSERT(load_vmcs(vmx));
> +	GUEST_ASSERT(ept_1g_pages_supported());
> +
> +	rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
> +	*rsp = vcpu_id;
> +	prepare_vmcs(vmx, perf_test_l2_guest_entry, rsp);

Just to ask: is this setting the same stack pointer for all the
vcpus?

> +
> +	GUEST_ASSERT(!vmlaunch());
> +	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
> +	GUEST_DONE();
> +}

[...]

> +/* Identity map the entire guest physical address space with 1GiB Pages. */
> +void nested_map_all_1g(struct vmx_pages *vmx, struct kvm_vm *vm)
> +{
> +	__nested_map(vmx, vm, 0, 0, vm->max_gfn << vm->page_shift, PG_LEVEL_1G);
> +}

Could max_gfn be large?  Could it consume a bunch of pages even when
mapping only 1G?

Thanks,

-- 
Peter Xu



* Re: [PATCH 9/9] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-16 22:17   ` Peter Xu
@ 2022-05-16 22:34     ` David Matlack
  2022-05-16 23:42       ` Peter Xu
  0 siblings, 1 reply; 24+ messages in thread
From: David Matlack @ 2022-05-16 22:34 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Mon, May 16, 2022 at 3:17 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Apr 29, 2022 at 06:39:35PM +0000, David Matlack wrote:
> > +static void perf_test_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
> > +{
> > +#define L2_GUEST_STACK_SIZE 64
> > +     unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
> > +     unsigned long *rsp;
> > +
> > +     GUEST_ASSERT(vmx->vmcs_gpa);
> > +     GUEST_ASSERT(prepare_for_vmx_operation(vmx));
> > +     GUEST_ASSERT(load_vmcs(vmx));
> > +     GUEST_ASSERT(ept_1g_pages_supported());
> > +
> > +     rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
> > +     *rsp = vcpu_id;
> > +     prepare_vmcs(vmx, perf_test_l2_guest_entry, rsp);
>
> Just to ask: is this setting the same stack pointer for all the
> vcpus?

No, but I understand the confusion since typically selftests use
symbols like "l2_guest_code" that are global. But "l2_guest_stack" is
actually a local variable so it will be allocated on the stack. Each
vCPU runs on a separate stack, so they will each run with their own
"l2_guest_stack".

>
> > +
> > +     GUEST_ASSERT(!vmlaunch());
> > +     GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
> > +     GUEST_DONE();
> > +}
>
> [...]
>
> > +/* Identity map the entire guest physical address space with 1GiB Pages. */
> > +void nested_map_all_1g(struct vmx_pages *vmx, struct kvm_vm *vm)
> > +{
> > +     __nested_map(vmx, vm, 0, 0, vm->max_gfn << vm->page_shift, PG_LEVEL_1G);
> > +}
>
> Could max_gfn be large?  Could it consume a bunch of pages even when
> mapping only 1G?

Since the selftests only support 4-level EPT, this will use at most
513 pages. If we add support for 5-level EPT we may need to revisit
this approach.

>
> Thanks,
>
> --
> Peter Xu
>


* Re: [PATCH 1/9] KVM: selftests: Replace x86_page_size with raw levels
  2022-05-13 20:04   ` Peter Xu
@ 2022-05-16 22:38     ` David Matlack
  0 siblings, 0 replies; 24+ messages in thread
From: David Matlack @ 2022-05-16 22:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Fri, May 13, 2022 at 1:04 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Apr 29, 2022 at 06:39:27PM +0000, David Matlack wrote:
> > x86_page_size is an enum used to communicate the desired page size with
> > which to map a range of memory. Under the hood they just encode the
> > desired level at which to map the page. This ends up being clunky in a
> > few ways:
> >
> >  - The name suggests it encodes the size of the page rather than the
> >    level.
> >  - In other places in x86_64/processor.c we just use a raw int to encode
> >    the level.
> >
> > Simplify this by just admitting that x86_page_size is just the level and
> > using an int and some more obviously named macros (e.g. PG_LEVEL_1G).
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >  .../selftests/kvm/include/x86_64/processor.h  | 14 +++++-----
> >  .../selftests/kvm/lib/x86_64/processor.c      | 27 +++++++++----------
> >  .../selftests/kvm/max_guest_memory_test.c     |  2 +-
> >  .../selftests/kvm/x86_64/mmu_role_test.c      |  2 +-
> >  4 files changed, 22 insertions(+), 23 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> > index 37db341d4cc5..b512f9f508ae 100644
> > --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> > +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> > @@ -465,13 +465,13 @@ void vcpu_set_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
> >  struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
> >  void vm_xsave_req_perm(int bit);
> >
> > -enum x86_page_size {
> > -     X86_PAGE_SIZE_4K = 0,
> > -     X86_PAGE_SIZE_2M,
> > -     X86_PAGE_SIZE_1G,
> > -};
> > -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> > -                enum x86_page_size page_size);
> > +#define PG_LEVEL_4K 0
> > +#define PG_LEVEL_2M 1
> > +#define PG_LEVEL_1G 2
>
> A nitpick is: we could have named those as PG_LEVEL_[PTE|PMD|PUD|PGD..]
> rather than 4K|2M|..., then...

I went with these names to match the KVM code (although the level
numbers themselves are off by 1).

>
> > +
> > +#define PG_LEVEL_SIZE(_level) (1ull << (((_level) * 9) + 12))
> > +
> > +void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level);
> >
> >  /*
> >   * Basic CPU control in CR0
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > index 9f000dfb5594..1a7de69e2495 100644
> > --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > @@ -199,15 +199,15 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
> >                                                   uint64_t pt_pfn,
> >                                                   uint64_t vaddr,
> >                                                   uint64_t paddr,
> > -                                                 int level,
> > -                                                 enum x86_page_size page_size)
> > +                                                 int current_level,
> > +                                                 int target_level)
> >  {
> > -     struct pageUpperEntry *pte = virt_get_pte(vm, pt_pfn, vaddr, level);
> > +     struct pageUpperEntry *pte = virt_get_pte(vm, pt_pfn, vaddr, current_level);
> >
> >       if (!pte->present) {
> >               pte->writable = true;
> >               pte->present = true;
> > -             pte->page_size = (level == page_size);
> > +             pte->page_size = (current_level == target_level);
> >               if (pte->page_size)
> >                       pte->pfn = paddr >> vm->page_shift;
> >               else
> > @@ -218,20 +218,19 @@ static struct pageUpperEntry *virt_create_upper_pte(struct kvm_vm *vm,
> >                * a hugepage at this level, and that there isn't a hugepage at
> >                * this level.
> >                */
> > -             TEST_ASSERT(level != page_size,
> > +             TEST_ASSERT(current_level != target_level,
> >                           "Cannot create hugepage at level: %u, vaddr: 0x%lx\n",
> > -                         page_size, vaddr);
> > +                         current_level, vaddr);
> >               TEST_ASSERT(!pte->page_size,
> >                           "Cannot create page table at level: %u, vaddr: 0x%lx\n",
> > -                         level, vaddr);
> > +                         current_level, vaddr);
> >       }
> >       return pte;
> >  }
> >
> > -void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> > -                enum x86_page_size page_size)
> > +void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
> >  {
> > -     const uint64_t pg_size = 1ull << ((page_size * 9) + 12);
> > +     const uint64_t pg_size = PG_LEVEL_SIZE(level);
> >       struct pageUpperEntry *pml4e, *pdpe, *pde;
> >       struct pageTableEntry *pte;
> >
> > @@ -256,15 +255,15 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> >        * early if a hugepage was created.
> >        */
> >       pml4e = virt_create_upper_pte(vm, vm->pgd >> vm->page_shift,
> > -                                   vaddr, paddr, 3, page_size);
> > +                                   vaddr, paddr, 3, level);
> >       if (pml4e->page_size)
> >               return;
> >
> > -     pdpe = virt_create_upper_pte(vm, pml4e->pfn, vaddr, paddr, 2, page_size);
> > +     pdpe = virt_create_upper_pte(vm, pml4e->pfn, vaddr, paddr, 2, level);
> >       if (pdpe->page_size)
> >               return;
> >
> > -     pde = virt_create_upper_pte(vm, pdpe->pfn, vaddr, paddr, 1, page_size);
> > +     pde = virt_create_upper_pte(vm, pdpe->pfn, vaddr, paddr, 1, level);
>
> ... here we could also potentially replace the 3/2/1s with the new macro
> (or with existing naming number 3 will be missing a macro)?

Good point. Will do.

>
> >       if (pde->page_size)
> >               return;
>
> --
> Peter Xu
>


* Re: [PATCH 2/9] KVM: selftests: Add option to create 2M and 1G EPT mappings
  2022-05-16 20:32   ` Peter Xu
@ 2022-05-16 22:38     ` David Matlack
  0 siblings, 0 replies; 24+ messages in thread
From: David Matlack @ 2022-05-16 22:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Mon, May 16, 2022 at 1:32 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Apr 29, 2022 at 06:39:28PM +0000, David Matlack wrote:
> > The current EPT mapping code in the selftests only supports mapping 4K
> > pages. This commit extends that support with an option to map at 2M or
> > 1G. This will be used in a future commit to create large page mappings
> > to test eager page splitting.
> >
> > No functional change intended.
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >  tools/testing/selftests/kvm/lib/x86_64/vmx.c | 105 ++++++++++---------
> >  1 file changed, 57 insertions(+), 48 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> > index d089d8b850b5..1fa2d1059ade 100644
> > --- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> > @@ -392,27 +392,60 @@ void nested_vmx_check_supported(void)
> >       }
> >  }
> >
> > -void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> > -                uint64_t nested_paddr, uint64_t paddr)
> > +static void nested_create_upper_pte(struct kvm_vm *vm,
> > +                                 struct eptPageTableEntry *pte,
> > +                                 uint64_t nested_paddr,
> > +                                 uint64_t paddr,
> > +                                 int current_level,
> > +                                 int target_level)
> > +{
> > +     if (!pte->readable) {
> > +             pte->writable = true;
> > +             pte->readable = true;
> > +             pte->executable = true;
> > +             pte->page_size = (current_level == target_level);
> > +             if (pte->page_size)
> > +                     pte->address = paddr >> vm->page_shift;
> > +             else
> > +                     pte->address = vm_alloc_page_table(vm) >> vm->page_shift;
> > +     } else {
> > +             /*
> > +              * Entry already present.  Assert that the caller doesn't want
> > +              * a hugepage at this level, and that there isn't a hugepage at
> > +              * this level.
> > +              */
> > +             TEST_ASSERT(current_level != target_level,
> > +                         "Cannot create hugepage at level: %u, nested_paddr: 0x%lx\n",
> > +                         current_level, nested_paddr);
> > +             TEST_ASSERT(!pte->page_size,
> > +                         "Cannot create page table at level: %u, nested_paddr: 0x%lx\n",
> > +                         current_level, nested_paddr);
> > +     }
> > +}
> > +
> > +
> > +void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> > +                  uint64_t nested_paddr, uint64_t paddr, int target_level)
> >  {
> > +     const uint64_t page_size = PG_LEVEL_SIZE(target_level);
> > +     struct eptPageTableEntry *pt;
> >       uint16_t index[4];
> > -     struct eptPageTableEntry *pml4e;
> >
> >       TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
> >                   "unknown or unsupported guest mode, mode: 0x%x", vm->mode);
> >
> > -     TEST_ASSERT((nested_paddr % vm->page_size) == 0,
> > +     TEST_ASSERT((nested_paddr % page_size) == 0,
> >                   "Nested physical address not on page boundary,\n"
> > -                 "  nested_paddr: 0x%lx vm->page_size: 0x%x",
> > -                 nested_paddr, vm->page_size);
> > +                 "  nested_paddr: 0x%lx page_size: 0x%lx",
> > +                 nested_paddr, page_size);
> >       TEST_ASSERT((nested_paddr >> vm->page_shift) <= vm->max_gfn,
> >                   "Physical address beyond beyond maximum supported,\n"
> >                   "  nested_paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
> >                   paddr, vm->max_gfn, vm->page_size);
> > -     TEST_ASSERT((paddr % vm->page_size) == 0,
> > +     TEST_ASSERT((paddr % page_size) == 0,
> >                   "Physical address not on page boundary,\n"
> > -                 "  paddr: 0x%lx vm->page_size: 0x%x",
> > -                 paddr, vm->page_size);
> > +                 "  paddr: 0x%lx page_size: 0x%lx",
> > +                 paddr, page_size);
> >       TEST_ASSERT((paddr >> vm->page_shift) <= vm->max_gfn,
> >                   "Physical address beyond beyond maximum supported,\n"
> >                   "  paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
> > @@ -423,49 +456,25 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> >       index[2] = (nested_paddr >> 30) & 0x1ffu;
> >       index[3] = (nested_paddr >> 39) & 0x1ffu;
> >
> > -     /* Allocate page directory pointer table if not present. */
> > -     pml4e = vmx->eptp_hva;
> > -     if (!pml4e[index[3]].readable) {
> > -             pml4e[index[3]].address = vm_alloc_page_table(vm) >> vm->page_shift;
> > -             pml4e[index[3]].writable = true;
> > -             pml4e[index[3]].readable = true;
> > -             pml4e[index[3]].executable = true;
> > -     }
> > +     pt = vmx->eptp_hva;
> >
> > -     /* Allocate page directory table if not present. */
> > -     struct eptPageTableEntry *pdpe;
> > -     pdpe = addr_gpa2hva(vm, pml4e[index[3]].address * vm->page_size);
> > -     if (!pdpe[index[2]].readable) {
> > -             pdpe[index[2]].address = vm_alloc_page_table(vm) >> vm->page_shift;
> > -             pdpe[index[2]].writable = true;
> > -             pdpe[index[2]].readable = true;
> > -             pdpe[index[2]].executable = true;
> > -     }
> > +     for (int current_level = 3; current_level >= 0; current_level--) {
> > +             struct eptPageTableEntry *pte = &pt[index[current_level]];
> >
> > -     /* Allocate page table if not present. */
> > -     struct eptPageTableEntry *pde;
> > -     pde = addr_gpa2hva(vm, pdpe[index[2]].address * vm->page_size);
> > -     if (!pde[index[1]].readable) {
> > -             pde[index[1]].address = vm_alloc_page_table(vm) >> vm->page_shift;
> > -             pde[index[1]].writable = true;
> > -             pde[index[1]].readable = true;
> > -             pde[index[1]].executable = true;
> > -     }
> > +             nested_create_upper_pte(vm, pte, nested_paddr, paddr,
> > +                                     current_level, target_level);
>
> This is going to run for the last level pte too, so maybe remove the
> "upper" prefix in the helper?

Good idea. Will do.

>
> >
> > -     /* Fill in page table entry. */
> > -     struct eptPageTableEntry *pte;
> > -     pte = addr_gpa2hva(vm, pde[index[1]].address * vm->page_size);
> > -     pte[index[0]].address = paddr >> vm->page_shift;
> > -     pte[index[0]].writable = true;
> > -     pte[index[0]].readable = true;
> > -     pte[index[0]].executable = true;
> > +             if (pte->page_size)
> > +                     break;
> >
> > -     /*
> > -      * For now mark these as accessed and dirty because the only
> > -      * testcase we have needs that.  Can be reconsidered later.
> > -      */
> > -     pte[index[0]].accessed = true;
> > -     pte[index[0]].dirty = true;
>
> Is it intended to to drop the access/dirty bits here?

This was not intentional, thanks for catching it!
>
> > +             pt = addr_gpa2hva(vm, pte->address * vm->page_size);
> > +     }
> > +}
> > +
> > +void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> > +                uint64_t nested_paddr, uint64_t paddr)
> > +{
> > +     __nested_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
> >  }
> >
> >  /*
> > --
> > 2.36.0.464.gb9c8b46e94-goog
> >
>
> --
> Peter Xu
>


* Re: [PATCH 9/9] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-16 22:34     ` David Matlack
@ 2022-05-16 23:42       ` Peter Xu
  2022-05-16 23:47         ` David Matlack
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2022-05-16 23:42 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Mon, May 16, 2022 at 03:34:28PM -0700, David Matlack wrote:
> On Mon, May 16, 2022 at 3:17 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Apr 29, 2022 at 06:39:35PM +0000, David Matlack wrote:
> > > +static void perf_test_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
> > > +{
> > > +#define L2_GUEST_STACK_SIZE 64
> > > +     unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
> > > +     unsigned long *rsp;
> > > +
> > > +     GUEST_ASSERT(vmx->vmcs_gpa);
> > > +     GUEST_ASSERT(prepare_for_vmx_operation(vmx));
> > > +     GUEST_ASSERT(load_vmcs(vmx));
> > > +     GUEST_ASSERT(ept_1g_pages_supported());
> > > +
> > > +     rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
> > > +     *rsp = vcpu_id;
> > > +     prepare_vmcs(vmx, perf_test_l2_guest_entry, rsp);
> >
> > Just to ask: is this setting the same stack pointer for all the
> > vcpus?
> 
> No, but I understand the confusion since typically selftests use
> symbols like "l2_guest_code" that are global. But "l2_guest_stack" is
> actually a local variable so it will be allocated on the stack. Each
> vCPU runs on a separate stack, so they will each run with their own
> "l2_guest_stack".

Ahh that's correct!

> 
> >
> > > +
> > > +     GUEST_ASSERT(!vmlaunch());
> > > +     GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
> > > +     GUEST_DONE();
> > > +}
> >
> > [...]
> >
> > > +/* Identity map the entire guest physical address space with 1GiB Pages. */
> > > +void nested_map_all_1g(struct vmx_pages *vmx, struct kvm_vm *vm)
> > > +{
> > > +     __nested_map(vmx, vm, 0, 0, vm->max_gfn << vm->page_shift, PG_LEVEL_1G);
> > > +}
> >
> > Could max_gfn be large?  Could it consume a bunch of pages even when
> > mapping only 1G?
> 
> Since the selftests only support 4-level EPT, this will use at most
> 513 pages. If we add support for 5-level EPT we may need to revisit
> this approach.

It's just that AFAICT vm_alloc_page_table() allocates from slot 0 for all
kinds of pgtables, including EPT.  I'm not sure whether this could fail
under some conditions, since the consumption isn't accounted for when the
vm is created.  Maybe we should reserve the pages explicitly so we can be
sure they'll be available?

-- 
Peter Xu



* Re: [PATCH 9/9] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  2022-05-16 23:42       ` Peter Xu
@ 2022-05-16 23:47         ` David Matlack
  0 siblings, 0 replies; 24+ messages in thread
From: David Matlack @ 2022-05-16 23:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Ben Gardon, Sean Christopherson, Oliver Upton,
	Vitaly Kuznetsov, Andrew Jones,
	open list:KERNEL VIRTUAL MACHINE (KVM)

On Mon, May 16, 2022 at 4:42 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, May 16, 2022 at 03:34:28PM -0700, David Matlack wrote:
> > On Mon, May 16, 2022 at 3:17 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Apr 29, 2022 at 06:39:35PM +0000, David Matlack wrote:
> > > > +static void perf_test_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
> > > > +{
> > > > +#define L2_GUEST_STACK_SIZE 64
> > > > +     unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
> > > > +     unsigned long *rsp;
> > > > +
> > > > +     GUEST_ASSERT(vmx->vmcs_gpa);
> > > > +     GUEST_ASSERT(prepare_for_vmx_operation(vmx));
> > > > +     GUEST_ASSERT(load_vmcs(vmx));
> > > > +     GUEST_ASSERT(ept_1g_pages_supported());
> > > > +
> > > > +     rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
> > > > +     *rsp = vcpu_id;
> > > > +     prepare_vmcs(vmx, perf_test_l2_guest_entry, rsp);
> > >
> > > Just to ask: is this setting the same stack pointer for all the
> > > vcpus?
> >
> > No, but I understand the confusion since typically selftests use
> > symbols like "l2_guest_code" that are global. But "l2_guest_stack" is
> > actually a local variable so it will be allocated on the stack. Each
> > vCPU runs on a separate stack, so they will each run with their own
> > "l2_guest_stack".
>
> Ahh that's correct!
>
> >
> > >
> > > > +
> > > > +     GUEST_ASSERT(!vmlaunch());
> > > > +     GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
> > > > +     GUEST_DONE();
> > > > +}
> > >
> > > [...]
> > >
> > > > +/* Identity map the entire guest physical address space with 1GiB Pages. */
> > > > +void nested_map_all_1g(struct vmx_pages *vmx, struct kvm_vm *vm)
> > > > +{
> > > > +     __nested_map(vmx, vm, 0, 0, vm->max_gfn << vm->page_shift, PG_LEVEL_1G);
> > > > +}
> > >
> > > Could max_gfn be large?  Could it consume a bunch of pages even when
> > > mapping only 1G?
> >
> > Since the selftests only support 4-level EPT, this will use at most
> > 513 pages. If we add support for 5-level EPT we may need to revisit
> > this approach.
>
> It's just that AFAICT vm_alloc_page_table() allocates from slot 0 for all
> kinds of pgtables, including EPT.  I'm not sure whether this could fail
> under some conditions, since the consumption isn't accounted for when the
> vm is created.  Maybe we should reserve the pages explicitly so we can be
> sure they'll be available?

So far in my tests perf_test_util seemed to allocate enough pages in
slot 0 that this just worked, so I didn't bother to explicitly reserve
the extra pages. But that's just an accident waiting to happen as you
point out, so I'll fix that in v2.

>
> --
> Peter Xu
>


end of thread, other threads:[~2022-05-16 23:48 UTC | newest]

Thread overview: 24+ messages
2022-04-29 18:39 [PATCH 0/9] KVM: selftests: Add nested support to dirty_log_perf_test David Matlack
2022-04-29 18:39 ` [PATCH 1/9] KVM: selftests: Replace x86_page_size with raw levels David Matlack
2022-05-13 20:04   ` Peter Xu
2022-05-16 22:38     ` David Matlack
2022-04-29 18:39 ` [PATCH 2/9] KVM: selftests: Add option to create 2M and 1G EPT mappings David Matlack
2022-05-16 20:32   ` Peter Xu
2022-05-16 22:38     ` David Matlack
2022-04-29 18:39 ` [PATCH 3/9] KVM: selftests: Drop stale function parameter comment for nested_map() David Matlack
2022-05-16 20:32   ` Peter Xu
2022-04-29 18:39 ` [PATCH 4/9] KVM: selftests: Refactor nested_map() to specify target level David Matlack
2022-05-16 20:34   ` Peter Xu
2022-04-29 18:39 ` [PATCH 5/9] KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h David Matlack
2022-05-16 22:12   ` Peter Xu
2022-04-29 18:39 ` [PATCH 6/9] KVM: selftests: Add a helper to check EPT/VPID capabilities David Matlack
2022-05-16 20:42   ` Peter Xu
2022-04-29 18:39 ` [PATCH 7/9] KVM: selftests: Link selftests directly with lib object files David Matlack
2022-05-16 22:15   ` Peter Xu
2022-04-29 18:39 ` [PATCH 8/9] KVM: selftests: Clean up LIBKVM files in Makefile David Matlack
2022-05-16 22:15   ` Peter Xu
2022-04-29 18:39 ` [PATCH 9/9] KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 David Matlack
2022-05-16 22:17   ` Peter Xu
2022-05-16 22:34     ` David Matlack
2022-05-16 23:42       ` Peter Xu
2022-05-16 23:47         ` David Matlack
