* [PATCH 0/4] Verify dirty logging works properly with page stats
@ 2022-03-21  0:26 Mingwei Zhang
  2022-03-21  0:26 ` [PATCH 1/4] selftests: KVM: Dump VM stats in binary stats test Mingwei Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-21  0:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Mingwei Zhang,
	David Matlack, Jing Zhang, Peter Xu, Ben Gardon

This patchset verifies that dirty logging works properly by checking page
stats from the per-VM binary stats interface. While writing it, we
discovered a performance bug in disallowed_hugepage_adjust() that prevents
KVM from recovering large pages for the guest after dirty logging is
disabled. The selftest logic added in the later patches helps validate the
problem and the fix.

The patchset borrows two patches from Ben's series "[PATCH 00/13]
KVM: x86: Add a cap to disable NX hugepages on a VM" [1], which add the
selftest library functions needed to use the stats interface.

[1] https://lore.kernel.org/all/20220310164532.1821490-2-bgardon@google.com/T/
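
For context, the page stats checks added in patch 4 boil down to
assertions built on the vm_get_single_stat() helper introduced in patch
2, along these lines (a sketch; see patch 4 for the actual checks):

	TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
		    "2M page is zero");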

Ben Gardon (2):
  selftests: KVM: Dump VM stats in binary stats test
  selftests: KVM: Test reading a single stat

Mingwei Zhang (2):
  KVM: x86/mmu: explicitly check nx_hugepage in
    disallowed_hugepage_adjust()
  selftests: KVM: use dirty logging to check if page stats work
    correctly

 arch/x86/kvm/mmu/mmu.c                        |  14 +-
 .../selftests/kvm/dirty_log_perf_test.c       |  52 +++++
 .../selftests/kvm/include/kvm_util_base.h     |   2 +
 .../selftests/kvm/kvm_binary_stats_test.c     |   6 +
 tools/testing/selftests/kvm/lib/kvm_util.c    | 196 ++++++++++++++++++
 5 files changed, 268 insertions(+), 2 deletions(-)

-- 
2.35.1.894.gb6a874cedc-goog



* [PATCH 1/4] selftests: KVM: Dump VM stats in binary stats test
  2022-03-21  0:26 [PATCH 0/4] Verify dirty logging works properly with page stats Mingwei Zhang
@ 2022-03-21  0:26 ` Mingwei Zhang
  2022-03-21  0:26 ` [PATCH 2/4] selftests: KVM: Test reading a single stat Mingwei Zhang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-21  0:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Mingwei Zhang,
	David Matlack, Jing Zhang, Peter Xu, Ben Gardon

From: Ben Gardon <bgardon@google.com>

Add kvm_util library functions to read KVM stats through the binary
stats interface and then dump them to stdout when running the binary
stats test. Subsequent commits will extend the kvm_util code and use it
to make assertions in a test for NX hugepages.
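
For reference, the helpers below parse the binary stats file exposed by
KVM_GET_STATS_FD; all offsets come from struct kvm_stats_header. A rough
sketch of the layout, matching what the code reads:

	/*
	 * offset 0           : struct kvm_stats_header
	 * header.id_offset   : id string, name_size bytes
	 * header.desc_offset : num_desc descriptors, each occupying
	 *                      sizeof(struct kvm_stats_desc) + name_size
	 *                      bytes
	 * header.data_offset : stat data; a given stat's values live at
	 *                      data_offset + desc->offset and span
	 *                      desc->size 64-bit words
	 */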

CC: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
 .../selftests/kvm/include/kvm_util_base.h     |   1 +
 .../selftests/kvm/kvm_binary_stats_test.c     |   3 +
 tools/testing/selftests/kvm/lib/kvm_util.c    | 143 ++++++++++++++++++
 3 files changed, 147 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 4ed6aa049a91..160b9ad8474a 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -392,6 +392,7 @@ void assert_on_unhandled_exception(struct kvm_vm *vm, uint32_t vcpuid);
 
 int vm_get_stats_fd(struct kvm_vm *vm);
 int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid);
+void dump_vm_stats(struct kvm_vm *vm);
 
 uint32_t guest_get_vcpuid(void);
 
diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
index 17f65d514915..afc4701ce8dd 100644
--- a/tools/testing/selftests/kvm/kvm_binary_stats_test.c
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -174,6 +174,9 @@ static void vm_stats_test(struct kvm_vm *vm)
 	stats_test(stats_fd);
 	close(stats_fd);
 	TEST_ASSERT(fcntl(stats_fd, F_GETFD) == -1, "Stats fd not freed");
+
+	/* Dump VM stats */
+	dump_vm_stats(vm);
 }
 
 static void vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index d8cf851ab119..d9c660913403 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2517,3 +2517,146 @@ int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid)
 
 	return ioctl(vcpu->fd, KVM_GET_STATS_FD, NULL);
 }
+
+/* Caller is responsible for freeing the returned kvm_stats_header. */
+static struct kvm_stats_header *read_vm_stats_header(int stats_fd)
+{
+	struct kvm_stats_header *header;
+	ssize_t ret;
+
+	/* Read kvm stats header */
+	header = malloc(sizeof(*header));
+	TEST_ASSERT(header, "Allocate memory for stats header");
+
+	ret = read(stats_fd, header, sizeof(*header));
+	TEST_ASSERT(ret == sizeof(*header), "Read stats header");
+
+	return header;
+}
+
+static void dump_header(int stats_fd, struct kvm_stats_header *header)
+{
+	ssize_t ret;
+	char *id;
+
+	printf("flags: %u\n", header->flags);
+	printf("name size: %u\n", header->name_size);
+	printf("num_desc: %u\n", header->num_desc);
+	printf("id_offset: %u\n", header->id_offset);
+	printf("desc_offset: %u\n", header->desc_offset);
+	printf("data_offset: %u\n", header->data_offset);
+
+	/* Read kvm stats id string */
+	id = malloc(header->name_size);
+	TEST_ASSERT(id, "Allocate memory for id string");
+	ret = pread(stats_fd, id, header->name_size, header->id_offset);
+	TEST_ASSERT(ret == header->name_size, "Read id string");
+
+	printf("id: %s\n", id);
+
+	free(id);
+}
+
+static ssize_t stats_desc_size(struct kvm_stats_header *header)
+{
+	return sizeof(struct kvm_stats_desc) + header->name_size;
+}
+
+/* Caller is responsible for freeing the returned kvm_stats_desc. */
+static struct kvm_stats_desc *read_vm_stats_desc(int stats_fd,
+						 struct kvm_stats_header *header)
+{
+	struct kvm_stats_desc *stats_desc;
+	size_t size_desc;
+	ssize_t ret;
+
+	size_desc = header->num_desc * stats_desc_size(header);
+
+	/* Allocate memory for stats descriptors */
+	stats_desc = malloc(size_desc);
+	TEST_ASSERT(stats_desc, "Allocate memory for stats descriptors");
+
+	/* Read kvm stats descriptors */
+	ret = pread(stats_fd, stats_desc, size_desc, header->desc_offset);
+	TEST_ASSERT(ret == size_desc, "Read KVM stats descriptors");
+
+	return stats_desc;
+}
+
+/* Caller is responsible for freeing the memory *data. */
+static int read_stat_data(int stats_fd, struct kvm_stats_header *header,
+			  struct kvm_stats_desc *desc, uint64_t **data)
+{
+	u64 *stats_data;
+	ssize_t ret;
+
+	stats_data = malloc(desc->size * sizeof(*stats_data));
+
+	ret = pread(stats_fd, stats_data, desc->size * sizeof(*stats_data),
+		    header->data_offset + desc->offset);
+
+	/* ret is in bytes. */
+	ret = ret / sizeof(*stats_data);
+
+	TEST_ASSERT(ret == desc->size,
+		    "Read data of KVM stats: %s", desc->name);
+
+	*data = stats_data;
+
+	return ret;
+}
+
+static void dump_stat(int stats_fd, struct kvm_stats_header *header,
+		      struct kvm_stats_desc *desc)
+{
+	u64 *stats_data;
+	ssize_t ret;
+	int i;
+
+	printf("\tflags: %u\n", desc->flags);
+	printf("\texponent: %u\n", desc->exponent);
+	printf("\tsize: %u\n", desc->size);
+	printf("\toffset: %u\n", desc->offset);
+	printf("\tbucket_size: %u\n", desc->bucket_size);
+	printf("\tname: %s\n", (char *)&desc->name);
+
+	ret = read_stat_data(stats_fd, header, desc, &stats_data);
+
+	printf("\tdata: %lu", *stats_data);
+	for (i = 1; i < ret; i++)
+		printf(", %lu", *(stats_data + i));
+	printf("\n\n");
+
+	free(stats_data);
+}
+
+void dump_vm_stats(struct kvm_vm *vm)
+{
+	struct kvm_stats_desc *stats_desc;
+	struct kvm_stats_header *header;
+	struct kvm_stats_desc *desc;
+	size_t size_desc;
+	int stats_fd;
+	int i;
+
+	stats_fd = vm_get_stats_fd(vm);
+
+	header = read_vm_stats_header(stats_fd);
+	dump_header(stats_fd, header);
+
+	stats_desc = read_vm_stats_desc(stats_fd, header);
+
+	size_desc = stats_desc_size(header);
+
+	/* Read kvm stats data one by one */
+	for (i = 0; i < header->num_desc; ++i) {
+		desc = (void *)stats_desc + (i * size_desc);
+		dump_stat(stats_fd, header, desc);
+	}
+
+	free(stats_desc);
+	free(header);
+
+	close(stats_fd);
+}
+
-- 
2.35.1.894.gb6a874cedc-goog



* [PATCH 2/4] selftests: KVM: Test reading a single stat
  2022-03-21  0:26 [PATCH 0/4] Verify dirty logging works properly with page stats Mingwei Zhang
  2022-03-21  0:26 ` [PATCH 1/4] selftests: KVM: Dump VM stats in binary stats test Mingwei Zhang
@ 2022-03-21  0:26 ` Mingwei Zhang
  2022-03-21  0:26 ` [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust() Mingwei Zhang
  2022-03-21  0:26 ` [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly Mingwei Zhang
  3 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-21  0:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Mingwei Zhang,
	David Matlack, Jing Zhang, Peter Xu, Ben Gardon

From: Ben Gardon <bgardon@google.com>

Retrieve the value of a single stat by name in the binary stats test to
ensure the kvm_util library functions work.
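
In short, the helper returns the value of a stat that has exactly one
u64 element, and asserts otherwise, so callers can write, e.g. (sketch):

	uint64_t flushes = vm_get_single_stat(vm, "remote_tlb_flush");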

CC: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
 .../selftests/kvm/include/kvm_util_base.h     |  1 +
 .../selftests/kvm/kvm_binary_stats_test.c     |  3 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 53 +++++++++++++++++++
 3 files changed, 57 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 160b9ad8474a..a07964c95941 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -393,6 +393,7 @@ void assert_on_unhandled_exception(struct kvm_vm *vm, uint32_t vcpuid);
 int vm_get_stats_fd(struct kvm_vm *vm);
 int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid);
 void dump_vm_stats(struct kvm_vm *vm);
+uint64_t vm_get_single_stat(struct kvm_vm *vm, const char *stat_name);
 
 uint32_t guest_get_vcpuid(void);
 
diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
index afc4701ce8dd..97bde355f105 100644
--- a/tools/testing/selftests/kvm/kvm_binary_stats_test.c
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -177,6 +177,9 @@ static void vm_stats_test(struct kvm_vm *vm)
 
 	/* Dump VM stats */
 	dump_vm_stats(vm);
+
+	/* Read a single stat. */
+	printf("remote_tlb_flush: %lu\n", vm_get_single_stat(vm, "remote_tlb_flush"));
 }
 
 static void vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index d9c660913403..dad54f5d57e7 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2660,3 +2660,56 @@ void dump_vm_stats(struct kvm_vm *vm)
 	close(stats_fd);
 }
 
+static int vm_get_stat_data(struct kvm_vm *vm, const char *stat_name,
+			    uint64_t **data)
+{
+	struct kvm_stats_desc *stats_desc;
+	struct kvm_stats_header *header;
+	struct kvm_stats_desc *desc;
+	size_t size_desc;
+	int stats_fd;
+	int ret = -EINVAL;
+	int i;
+
+	*data = NULL;
+
+	stats_fd = vm_get_stats_fd(vm);
+
+	header = read_vm_stats_header(stats_fd);
+
+	stats_desc = read_vm_stats_desc(stats_fd, header);
+
+	size_desc = stats_desc_size(header);
+
+	/* Read kvm stats data one by one */
+	for (i = 0; i < header->num_desc; ++i) {
+		desc = (void *)stats_desc + (i * size_desc);
+
+		if (strcmp(desc->name, stat_name))
+			continue;
+
+		ret = read_stat_data(stats_fd, header, desc, data);
+	}
+
+	free(stats_desc);
+	free(header);
+
+	close(stats_fd);
+
+	return ret;
+}
+
+uint64_t vm_get_single_stat(struct kvm_vm *vm, const char *stat_name)
+{
+	uint64_t *data;
+	uint64_t value;
+	int ret;
+
+	ret = vm_get_stat_data(vm, stat_name, &data);
+	TEST_ASSERT(ret == 1, "Stat %s expected to have 1 element, but has %d",
+		    stat_name, ret);
+	value = *data;
+	free(data);
+	return value;
+}
+
-- 
2.35.1.894.gb6a874cedc-goog



* [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()
  2022-03-21  0:26 [PATCH 0/4] Verify dirty logging works properly with page stats Mingwei Zhang
  2022-03-21  0:26 ` [PATCH 1/4] selftests: KVM: Dump VM stats in binary stats test Mingwei Zhang
  2022-03-21  0:26 ` [PATCH 2/4] selftests: KVM: Test reading a single stat Mingwei Zhang
@ 2022-03-21  0:26 ` Mingwei Zhang
  2022-03-21 17:56   ` Ben Gardon
  2022-03-21 22:00   ` David Matlack
  2022-03-21  0:26 ` [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly Mingwei Zhang
  3 siblings, 2 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-21  0:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Mingwei Zhang,
	David Matlack, Jing Zhang, Peter Xu, Ben Gardon

Add an extra check to explicitly identify the NX hugepage case and allow
KVM to reconstruct large mappings after dirty logging is disabled. The
existing check is meant only for NX hugepages, but the condition is too
general in that it does not consider other use cases (such as dirty
logging). Moreover, the existing code assumes that a present PMD or PUD
indicates that there exist 'smaller SPTEs' under the paging structure. This
assumption may not be true when the MMU zaps only leaf SPTEs.

Missing the check causes KVM to incorrectly regard the faulting page as an
NX huge page and refuse to map it at the desired level, which leads to bad
performance in the shadow MMU and potentially in the TDP MMU.

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable@vger.kernel.org

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5628d0ba637e..4d358c273f6c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte)) {
+		struct kvm_mmu_page *sp;
+		u64 page_mask;
+		/*
+		 * When nx hugepage flag is not set, there is no reason to
+		 * go down to another level. This helps demand paging to
+		 * generate large mappings.
+		 */
+		sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
+		if (!sp->lpage_disallowed)
+			return;
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch)
 		 * and __direct_map would like to create a large PTE
@@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 		 * patching back for them into pfn the next 9 bits of
 		 * the address.
 		 */
-		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
-				KVM_PAGES_PER_HPAGE(cur_level - 1);
+		page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+			KVM_PAGES_PER_HPAGE(cur_level - 1);
 		fault->pfn |= fault->gfn & page_mask;
 		fault->goal_level--;
 	}
-- 
2.35.1.894.gb6a874cedc-goog



* [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly
  2022-03-21  0:26 [PATCH 0/4] Verify dirty logging works properly with page stats Mingwei Zhang
                   ` (2 preceding siblings ...)
  2022-03-21  0:26 ` [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust() Mingwei Zhang
@ 2022-03-21  0:26 ` Mingwei Zhang
  2022-03-21 17:55   ` Ben Gardon
  2022-03-21 18:08   ` Ben Gardon
  3 siblings, 2 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-21  0:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Mingwei Zhang,
	David Matlack, Jing Zhang, Peter Xu, Ben Gardon

When dirty logging is enabled, KVM splits all hugepage mappings in
NPT/EPT into the smallest 4K size. This property can be used to check
whether the page stats metrics work properly in the KVM MMU. At the same
time, the logic can be used the other way around: using page stats to
verify whether dirty logging really splits all huge pages. Moreover, when
dirty logging is disabled, KVM zaps the corresponding SPTEs, and we can
check whether the large pages come back when the guest touches the pages
again.

So add page stats checks to the dirty logging performance selftest. In
particular, add checks in four locations:
 - just after the vm is created;
 - after populating memory into the vm but before enabling dirty logging;
 - just after turning on dirty logging;
 - after one final iteration after turning off dirty logging.

Tested using commands:
 - ./dirty_log_perf_test -s anonymous_hugetlb_1gb
 - ./dirty_log_perf_test -s anonymous_thp

Cc: Sean Christopherson <seanjc@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: Jing Zhang <jingzhangos@google.com>
Cc: Peter Xu <peterx@redhat.com>

Suggested-by: Ben Gardon <bgorden@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 .../selftests/kvm/dirty_log_perf_test.c       | 52 +++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index 1954b964d1cf..ab0457d91658 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -19,6 +19,10 @@
 #include "perf_test_util.h"
 #include "guest_modes.h"
 
+#ifdef __x86_64__
+#include "processor.h"
+#endif
+
 /* How many host loops to run by default (one KVM_GET_DIRTY_LOG for each loop)*/
 #define TEST_HOST_LOOP_N		2UL
 
@@ -185,6 +189,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 				 p->slots, p->backing_src,
 				 p->partition_vcpu_memory_access);
 
+#ifdef __x86_64__
+	TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") == 0,
+		    "4K page is non zero");
+	TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
+		    "2M page is non zero");
+	TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
+		    "1G page is non zero");
+#endif
 	perf_test_set_wr_fract(vm, p->wr_fract);
 
 	guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm);
@@ -222,6 +234,16 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	pr_info("Populate memory time: %ld.%.9lds\n",
 		ts_diff.tv_sec, ts_diff.tv_nsec);
 
+#ifdef __x86_64__
+	TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
+		    "4K page is zero");
+	if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
+		TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
+			    "2M page is zero");
+	if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
+		TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
+			    "1G page is zero");
+#endif
 	/* Enable dirty logging */
 	clock_gettime(CLOCK_MONOTONIC, &start);
 	enable_dirty_logging(vm, p->slots);
@@ -267,6 +289,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 				iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
 		}
 	}
+#ifdef __x86_64__
+	TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
+		    "4K page is zero after dirty logging");
+	TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
+		    "2M page is non-zero after dirty logging");
+	TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
+		    "1G page is non-zero after dirty logging");
+#endif
 
 	/* Disable dirty logging */
 	clock_gettime(CLOCK_MONOTONIC, &start);
@@ -275,6 +305,28 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	pr_info("Disabling dirty logging time: %ld.%.9lds\n",
 		ts_diff.tv_sec, ts_diff.tv_nsec);
 
+#ifdef __x86_64__
+	/*
+	 * Increment iteration to run the vcpus again to verify if huge pages
+	 * come back.
+	 */
+	iteration++;
+	pr_info("Starting the final iteration to verify page stats\n");
+
+	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
+		while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id])
+		       != iteration)
+			;
+	}
+
+	if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
+		TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
+			    "2M page is zero");
+	if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
+		TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
+			    "1G page is zero");
+#endif
+
 	/* Tell the vcpu thread to quit */
 	host_quit = true;
 	perf_test_join_vcpu_threads(nr_vcpus);
-- 
2.35.1.894.gb6a874cedc-goog



* Re: [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly
  2022-03-21  0:26 ` [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly Mingwei Zhang
@ 2022-03-21 17:55   ` Ben Gardon
  2022-03-22  5:01     ` Mingwei Zhang
  2022-03-23 18:21     ` Mingwei Zhang
  2022-03-21 18:08   ` Ben Gardon
  1 sibling, 2 replies; 15+ messages in thread
From: Ben Gardon @ 2022-03-21 17:55 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Matlack,
	Jing Zhang, Peter Xu, Ben Gardon

On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
>
> When dirty logging is enabled, KVM splits the all hugepage mapping in
> NPT/EPT into the smallest 4K size. This property could be used to check if

Note this is only true if eager page splitting is enabled. It would be
more accurate to say:
"While dirty logging is enabled, KVM will re-map any accessed page in
NPT/EPT at 4K."

> the page stats metrics work properly in KVM mmu. At the same time, this
> logic might be used the other way around: using page stats to verify if
> dirty logging really splits all huge pages. Moreover, when dirty logging is

It might be worth having a follow-up commit which checks whether eager
splitting is enabled and changes the assertions accordingly.

> disabled, KVM zaps corresponding SPTEs and we could check whether the large
> pages come back when guest touches the pages again.
>
> So add page stats checking in dirty logging performance selftest. In
> particular, add checks in three locations:
>  - just after vm is created;
>  - after populating memory into vm but before enabling dirty logging;
>  - just after turning on dirty logging.

Note a key stage here is after dirty logging is enabled, and then the
VM touches all the memory in the data region.
I believe that's the point at which you're making the assertion that
all mappings are 4k currently, which is the right place if eager
splitting is not enabled.

>  - after one final iteration after turning off dirty logging.
>
> Tested using commands:
>  - ./dirty_log_perf_test -s anonymous_hugetlb_1gb
>  - ./dirty_log_perf_test -s anonymous_thp
>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: David Matlack <dmatlack@google.com>
> Cc: Jing Zhang <jingzhangos@google.com>
> Cc: Peter Xu <peterx@redhat.com>
>
> Suggested-by: Ben Gardon <bgorden@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  .../selftests/kvm/dirty_log_perf_test.c       | 52 +++++++++++++++++++
>  1 file changed, 52 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> index 1954b964d1cf..ab0457d91658 100644
> --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> @@ -19,6 +19,10 @@
>  #include "perf_test_util.h"
>  #include "guest_modes.h"
>
> +#ifdef __x86_64__
> +#include "processor.h"
> +#endif
> +
>  /* How many host loops to run by default (one KVM_GET_DIRTY_LOG for each loop)*/
>  #define TEST_HOST_LOOP_N               2UL
>
> @@ -185,6 +189,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>                                  p->slots, p->backing_src,
>                                  p->partition_vcpu_memory_access);
>
> +#ifdef __x86_64__
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") == 0,
> +                   "4K page is non zero");
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> +                   "2M page is non zero");
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> +                   "1G page is non zero");
> +#endif
>         perf_test_set_wr_fract(vm, p->wr_fract);
>
>         guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm);
> @@ -222,6 +234,16 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>         pr_info("Populate memory time: %ld.%.9lds\n",
>                 ts_diff.tv_sec, ts_diff.tv_nsec);
>
> +#ifdef __x86_64__
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> +                   "4K page is zero");
> +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)

This should also handle 2M hugetlb memory.
I think there might be a library function to translate backing src
type to page size too, which could make this check cleaner.
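
Something like this, perhaps (untested sketch, assuming the
get_backing_src_pagesz() helper in test_util.h reports the hugetlb
sizes):

	size_t backing_pagesz = get_backing_src_pagesz(p->backing_src);

	if (backing_pagesz == 2 * 1024 * 1024)
		TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
			    "2M page is zero");
	else if (backing_pagesz == 1024 * 1024 * 1024)
		TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
			    "1G page is zero");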

> +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> +                           "2M page is zero");
> +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> +                           "1G page is zero");
> +#endif
>         /* Enable dirty logging */
>         clock_gettime(CLOCK_MONOTONIC, &start);
>         enable_dirty_logging(vm, p->slots);
> @@ -267,6 +289,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>                                 iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
>                 }
>         }
> +#ifdef __x86_64__
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> +                   "4K page is zero after dirty logging");
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> +                   "2M page is non-zero after dirty logging");
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> +                   "1G page is non-zero after dirty logging");
> +#endif

Note this is after dirty logging has been enabled, AND all pages in
the data region have been written by the guest.

>
>         /* Disable dirty logging */
>         clock_gettime(CLOCK_MONOTONIC, &start);
> @@ -275,6 +305,28 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>         pr_info("Disabling dirty logging time: %ld.%.9lds\n",
>                 ts_diff.tv_sec, ts_diff.tv_nsec);
>
> +#ifdef __x86_64__
> +       /*
> +        * Increment iteration to run the vcpus again to verify if huge pages
> +        * come back.
> +        */
> +       iteration++;
> +       pr_info("Starting the final iteration to verify page stats\n");
> +
> +       for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> +               while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id])
> +                      != iteration)
> +                       ;
> +       }

We might as well do this on all archs. Even without the stats, it at
least validates that disabling dirty logging doesn't break the VM.
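
I.e., roughly (sketch):

	/*
	 * Run the vCPUs one more time on all architectures; only the
	 * page size stat assertions stay x86-specific.
	 */
	iteration++;
	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
		while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id])
		       != iteration)
			;
	}
#ifdef __x86_64__
	/* pages_2m / pages_1g assertions here */
#endif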

> +
> +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
> +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> +                           "2M page is zero");
> +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> +                           "1G page is zero");
> +#endif
> +
>         /* Tell the vcpu thread to quit */
>         host_quit = true;
>         perf_test_join_vcpu_threads(nr_vcpus);
> --
> 2.35.1.894.gb6a874cedc-goog
>


* Re: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()
  2022-03-21  0:26 ` [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust() Mingwei Zhang
@ 2022-03-21 17:56   ` Ben Gardon
  2022-03-22  4:28     ` Mingwei Zhang
  2022-03-21 22:00   ` David Matlack
  1 sibling, 1 reply; 15+ messages in thread
From: Ben Gardon @ 2022-03-21 17:56 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Matlack,
	Jing Zhang, Peter Xu, Ben Gardon

On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
>
> Add extra check to specify the case of nx hugepage and allow KVM to
> reconstruct large mapping after dirty logging is disabled. Existing code
> works only for nx hugepage but the condition is too general in that does
> not consider other usage case (such as dirty logging). Moreover, existing
> code assumes that a present PMD or PUD indicates that there exist 'smaller
> SPTEs' under the paging structure. This assumption may no be true if
> consider the zapping leafs only behavior in MMU.
>
> Missing the check causes KVM incorrectly regards the faulting page as a NX
> huge page and refuse to map it at desired level. And this leads to back
> performance in shadow mmu and potentiall TDP mmu.
>
> Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> Cc: stable@vger.kernel.org
>
> Reviewed-by: Ben Gardon <bgardon@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5628d0ba637e..4d358c273f6c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>             cur_level == fault->goal_level &&
>             is_shadow_present_pte(spte) &&
>             !is_large_pte(spte)) {
> +               struct kvm_mmu_page *sp;
> +               u64 page_mask;
> +               /*
> +                * When nx hugepage flag is not set, there is no reason to
> +                * go down to another level. This helps demand paging to
> +                * generate large mappings.
> +                */

This comment is relevant to Google's internal demand paging scheme,
but isn't really relevant to UFFD demand paging.
Still, as demonstrated by the next commit, this is important for dirty
logging, so I'd suggest updating this comment to refer to that instead.
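
Maybe something along the lines of (just a sketch):

		/*
		 * Only force the mapping down to a lower level when the
		 * existing shadow page disallows NX huge pages for this
		 * gfn. Otherwise, e.g. when only leaf SPTEs were zapped
		 * after dirty logging was disabled, KVM should rebuild
		 * the huge mapping.
		 */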


> +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +               if (!sp->lpage_disallowed)
> +                       return;
>                 /*
>                  * A small SPTE exists for this pfn, but FNAME(fetch)
>                  * and __direct_map would like to create a large PTE
> @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>                  * patching back for them into pfn the next 9 bits of
>                  * the address.
>                  */
> -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> +                       KVM_PAGES_PER_HPAGE(cur_level - 1);
>                 fault->pfn |= fault->gfn & page_mask;
>                 fault->goal_level--;
>         }
> --
> 2.35.1.894.gb6a874cedc-goog
>


* Re: [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly
  2022-03-21  0:26 ` [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly Mingwei Zhang
  2022-03-21 17:55   ` Ben Gardon
@ 2022-03-21 18:08   ` Ben Gardon
  2022-03-22  5:09     ` Mingwei Zhang
  1 sibling, 1 reply; 15+ messages in thread
From: Ben Gardon @ 2022-03-21 18:08 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Matlack,
	Jing Zhang, Peter Xu, Ben Gardon

On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
>
> When dirty logging is enabled, KVM splits the all hugepage mapping in
> NPT/EPT into the smallest 4K size. This property could be used to check if
> the page stats metrics work properly in KVM mmu. At the same time, this
> logic might be used the other way around: using page stats to verify if
> dirty logging really splits all huge pages. Moreover, when dirty logging is
> disabled, KVM zaps corresponding SPTEs and we could check whether the large
> pages come back when guest touches the pages again.
>
> So add page stats checking in dirty logging performance selftest. In
> particular, add checks in three locations:
>  - just after vm is created;
>  - after populating memory into vm but before enabling dirty logging;
>  - just after turning on dirty logging.
>  - after one final iteration after turning off dirty logging.
>
> Tested using commands:
>  - ./dirty_log_perf_test -s anonymous_hugetlb_1gb
>  - ./dirty_log_perf_test -s anonymous_thp
>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: David Matlack <dmatlack@google.com>
> Cc: Jing Zhang <jingzhangos@google.com>
> Cc: Peter Xu <peterx@redhat.com>
>
> Suggested-by: Ben Gardon <bgorden@google.com>

Woops, got a mail bounce from this. Should be:
Suggested-by: Ben Gardon <bgardon@google.com>

> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  .../selftests/kvm/dirty_log_perf_test.c       | 52 +++++++++++++++++++
>  1 file changed, 52 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> index 1954b964d1cf..ab0457d91658 100644
> --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> @@ -19,6 +19,10 @@
>  #include "perf_test_util.h"
>  #include "guest_modes.h"
>
> +#ifdef __x86_64__
> +#include "processor.h"
> +#endif
> +
>  /* How many host loops to run by default (one KVM_GET_DIRTY_LOG for each loop)*/
>  #define TEST_HOST_LOOP_N               2UL
>
> @@ -185,6 +189,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>                                  p->slots, p->backing_src,
>                                  p->partition_vcpu_memory_access);
>
> +#ifdef __x86_64__
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") == 0,
> +                   "4K page is non zero");
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> +                   "2M page is non zero");
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> +                   "1G page is non zero");
> +#endif
>         perf_test_set_wr_fract(vm, p->wr_fract);
>
>         guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm);
> @@ -222,6 +234,16 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>         pr_info("Populate memory time: %ld.%.9lds\n",
>                 ts_diff.tv_sec, ts_diff.tv_nsec);
>
> +#ifdef __x86_64__
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> +                   "4K page is zero");
> +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
> +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> +                           "2M page is zero");
> +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> +                           "1G page is zero");
> +#endif
>         /* Enable dirty logging */
>         clock_gettime(CLOCK_MONOTONIC, &start);
>         enable_dirty_logging(vm, p->slots);
> @@ -267,6 +289,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>                                 iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
>                 }
>         }
> +#ifdef __x86_64__
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> +                   "4K page is zero after dirty logging");
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> +                   "2M page is non-zero after dirty logging");
> +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> +                   "1G page is non-zero after dirty logging");
> +#endif
>
>         /* Disable dirty logging */
>         clock_gettime(CLOCK_MONOTONIC, &start);
> @@ -275,6 +305,28 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>         pr_info("Disabling dirty logging time: %ld.%.9lds\n",
>                 ts_diff.tv_sec, ts_diff.tv_nsec);
>
> +#ifdef __x86_64__
> +       /*
> +        * Increment iteration to run the vcpus again to verify if huge pages
> +        * come back.
> +        */
> +       iteration++;
> +       pr_info("Starting the final iteration to verify page stats\n");
> +
> +       for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> +               while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id])
> +                      != iteration)
> +                       ;
> +       }
> +
> +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
> +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> +                           "2M page is zero");
> +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> +                           "1G page is zero");
> +#endif
> +
>         /* Tell the vcpu thread to quit */
>         host_quit = true;
>         perf_test_join_vcpu_threads(nr_vcpus);
> --
> 2.35.1.894.gb6a874cedc-goog
>


* Re: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()
  2022-03-21  0:26 ` [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust() Mingwei Zhang
  2022-03-21 17:56   ` Ben Gardon
@ 2022-03-21 22:00   ` David Matlack
  2022-03-21 22:16     ` David Matlack
  1 sibling, 1 reply; 15+ messages in thread
From: David Matlack @ 2022-03-21 22:00 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML, Ben Gardon,
	Jing Zhang, Peter Xu, Ben Gardon

On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
>
> Add extra check to specify the case of nx hugepage and allow KVM to
> reconstruct large mapping after dirty logging is disabled. Existing code
> works only for nx hugepage but the condition is too general in that does
> not consider other usage case (such as dirty logging).

KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
disabled. Why is that not sufficient?

> Moreover, existing
> code assumes that a present PMD or PUD indicates that there exist 'smaller
> SPTEs' under the paging structure. This assumption may no be true if
> consider the zapping leafs only behavior in MMU.

Good point, although that code just got reverted. Maybe say something like:

  This assumption may not be true in the future if KVM gains support
for zapping only leaf SPTEs.

>
> Missing the check causes KVM incorrectly regards the faulting page as a NX
> huge page and refuse to map it at desired level. And this leads to back
> performance in shadow mmu and potentiall TDP mmu.

s/potentiall/potentially/

>
> Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> Cc: stable@vger.kernel.org
>
> Reviewed-by: Ben Gardon <bgardon@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5628d0ba637e..4d358c273f6c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>             cur_level == fault->goal_level &&
>             is_shadow_present_pte(spte) &&
>             !is_large_pte(spte)) {
> +               struct kvm_mmu_page *sp;
> +               u64 page_mask;
> +               /*
> +                * When nx hugepage flag is not set, there is no reason to
> +                * go down to another level. This helps demand paging to
> +                * generate large mappings.
> +                */
> +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +               if (!sp->lpage_disallowed)
> +                       return;
>                 /*
>                  * A small SPTE exists for this pfn, but FNAME(fetch)
>                  * and __direct_map would like to create a large PTE
> @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>                  * patching back for them into pfn the next 9 bits of
>                  * the address.
>                  */
> -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> +                       KVM_PAGES_PER_HPAGE(cur_level - 1);
>                 fault->pfn |= fault->gfn & page_mask;
>                 fault->goal_level--;
>         }
> --
> 2.35.1.894.gb6a874cedc-goog
>


* Re: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()
  2022-03-21 22:00   ` David Matlack
@ 2022-03-21 22:16     ` David Matlack
  2022-03-22  4:33       ` Mingwei Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: David Matlack @ 2022-03-21 22:16 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML, Ben Gardon,
	Jing Zhang, Peter Xu, Ben Gardon

On Mon, Mar 21, 2022 at 3:00 PM David Matlack <dmatlack@google.com> wrote:
>
> On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
> >
> > Add extra check to specify the case of nx hugepage and allow KVM to
> > reconstruct large mapping after dirty logging is disabled. Existing code
> > works only for nx hugepage but the condition is too general in that does
> > not consider other usage case (such as dirty logging).
>
> KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
> disabled. Why is that not sufficient?

Ahh I see, kvm_mmu_zap_collapsible_sptes() only zaps the leaf SPTEs.
Could you add a blurb about this in the commit message for future
reference?

>
> > Moreover, existing
> > code assumes that a present PMD or PUD indicates that there exist 'smaller
> > SPTEs' under the paging structure. This assumption may no be true if
> > consider the zapping leafs only behavior in MMU.
>
> Good point. Although, that code just got reverted. Maybe say something like:
>
>   This assumption may not be true in the future if KVM gains support
> for zapping only leaf SPTEs.

Nevermind, support for zapping leaf SPTEs already exists for zapping
collapsible SPTEs.



>
> >
> > Missing the check causes KVM incorrectly regards the faulting page as a NX
> > huge page and refuse to map it at desired level. And this leads to back
> > performance in shadow mmu and potentiall TDP mmu.
>
> s/potentiall/potentially/
>
> >
> > Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> > Cc: stable@vger.kernel.org
> >
> > Reviewed-by: Ben Gardon <bgardon@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 5628d0ba637e..4d358c273f6c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> >             cur_level == fault->goal_level &&
> >             is_shadow_present_pte(spte) &&
> >             !is_large_pte(spte)) {
> > +               struct kvm_mmu_page *sp;
> > +               u64 page_mask;
> > +               /*
> > +                * When nx hugepage flag is not set, there is no reason to
> > +                * go down to another level. This helps demand paging to
> > +                * generate large mappings.
> > +                */
> > +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> > +               if (!sp->lpage_disallowed)
> > +                       return;
> >                 /*
> >                  * A small SPTE exists for this pfn, but FNAME(fetch)
> >                  * and __direct_map would like to create a large PTE
> > @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> >                  * patching back for them into pfn the next 9 bits of
> >                  * the address.
> >                  */
> > -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> > +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > +                       KVM_PAGES_PER_HPAGE(cur_level - 1);
> >                 fault->pfn |= fault->gfn & page_mask;
> >                 fault->goal_level--;
> >         }
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >


* Re: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()
  2022-03-21 17:56   ` Ben Gardon
@ 2022-03-22  4:28     ` Mingwei Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-22  4:28 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Matlack,
	Jing Zhang, Peter Xu, Ben Gardon

On Mon, Mar 21, 2022, Ben Gardon wrote:
> On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
> >
> > Add extra check to specify the case of nx hugepage and allow KVM to
> > reconstruct large mapping after dirty logging is disabled. Existing code
> > works only for nx hugepage but the condition is too general in that does
> > not consider other usage case (such as dirty logging). Moreover, existing
> > code assumes that a present PMD or PUD indicates that there exist 'smaller
> > SPTEs' under the paging structure. This assumption may no be true if
> > consider the zapping leafs only behavior in MMU.
> >
> > Missing the check causes KVM incorrectly regards the faulting page as a NX
> > huge page and refuse to map it at desired level. And this leads to back
> > performance in shadow mmu and potentiall TDP mmu.
> >
> > Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> > Cc: stable@vger.kernel.org
> >
> > Reviewed-by: Ben Gardon <bgardon@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 5628d0ba637e..4d358c273f6c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> >             cur_level == fault->goal_level &&
> >             is_shadow_present_pte(spte) &&
> >             !is_large_pte(spte)) {
> > +               struct kvm_mmu_page *sp;
> > +               u64 page_mask;
> > +               /*
> > +                * When nx hugepage flag is not set, there is no reason to
> > +                * go down to another level. This helps demand paging to
> > +                * generate large mappings.
> > +                */
> 
> This comment is relevant to Google's internal demand paging scheme,
> but isn't really relevant to UFFD demand paging.
> Still, as demonstrated by the next commit, this is important for dirty
> loggin, so I'd suggest updating this comment to refer to that instead.
> 

Ah, leaking my true motivation :-) Definitely will update the comment.

> > +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> > +               if (!sp->lpage_disallowed)
> > +                       return;
> >                 /*
> >                  * A small SPTE exists for this pfn, but FNAME(fetch)
> >                  * and __direct_map would like to create a large PTE
> > @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> >                  * patching back for them into pfn the next 9 bits of
> >                  * the address.
> >                  */
> > -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> > +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > +                       KVM_PAGES_PER_HPAGE(cur_level - 1);
> >                 fault->pfn |= fault->gfn & page_mask;
> >                 fault->goal_level--;
> >         }
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >


* Re: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()
  2022-03-21 22:16     ` David Matlack
@ 2022-03-22  4:33       ` Mingwei Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-22  4:33 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML, Ben Gardon,
	Jing Zhang, Peter Xu, Ben Gardon

On Mon, Mar 21, 2022, David Matlack wrote:
> On Mon, Mar 21, 2022 at 3:00 PM David Matlack <dmatlack@google.com> wrote:
> >
> > On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
> > >
> > > Add extra check to specify the case of nx hugepage and allow KVM to
> > > reconstruct large mapping after dirty logging is disabled. Existing code
> > > works only for nx hugepage but the condition is too general in that does
> > > not consider other usage case (such as dirty logging).
> >
> > KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
> > disabled. Why is that not sufficient?
> 
> Ahh I see, kvm_mmu_zap_collapsible_sptes() only zaps the leaf SPTEs.
> Could you add a blurb about this in the commit message for future
> reference?
> 

will do.

> >
> > > Moreover, existing
> > > code assumes that a present PMD or PUD indicates that there exist 'smaller
> > > SPTEs' under the paging structure. This assumption may no be true if
> > > consider the zapping leafs only behavior in MMU.
> >
> > Good point. Although, that code just got reverted. Maybe say something like:
> >
> >   This assumption may not be true in the future if KVM gains support
> > for zapping only leaf SPTEs.
> 
> Nevermind, support for zapping leaf SPTEs already exists for zapping
> collapsible SPTEs.
>
> 
> 
> >
> > >
> > > Missing the check causes KVM incorrectly regards the faulting page as a NX
> > > huge page and refuse to map it at desired level. And this leads to back
> > > performance in shadow mmu and potentiall TDP mmu.
> >
> > s/potentiall/potentially/

Thanks for that.
> >
> > >
> > > Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> > > Cc: stable@vger.kernel.org
> > >
> > > Reviewed-by: Ben Gardon <bgardon@google.com>
> > > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 5628d0ba637e..4d358c273f6c 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > >             cur_level == fault->goal_level &&
> > >             is_shadow_present_pte(spte) &&
> > >             !is_large_pte(spte)) {
> > > +               struct kvm_mmu_page *sp;
> > > +               u64 page_mask;
> > > +               /*
> > > +                * When nx hugepage flag is not set, there is no reason to
> > > +                * go down to another level. This helps demand paging to
> > > +                * generate large mappings.
> > > +                */
> > > +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> > > +               if (!sp->lpage_disallowed)
> > > +                       return;
> > >                 /*
> > >                  * A small SPTE exists for this pfn, but FNAME(fetch)
> > >                  * and __direct_map would like to create a large PTE
> > > @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > >                  * patching back for them into pfn the next 9 bits of
> > >                  * the address.
> > >                  */
> > > -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > > -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> > > +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > > +                       KVM_PAGES_PER_HPAGE(cur_level - 1);
> > >                 fault->pfn |= fault->gfn & page_mask;
> > >                 fault->goal_level--;
> > >         }
> > > --
> > > 2.35.1.894.gb6a874cedc-goog
> > >


* Re: [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly
  2022-03-21 17:55   ` Ben Gardon
@ 2022-03-22  5:01     ` Mingwei Zhang
  2022-03-23 18:21     ` Mingwei Zhang
  1 sibling, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-22  5:01 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Matlack,
	Jing Zhang, Peter Xu, Ben Gardon

On Mon, Mar 21, 2022, Ben Gardon wrote:
> On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
> >
> > When dirty logging is enabled, KVM splits the all hugepage mapping in
> > NPT/EPT into the smallest 4K size. This property could be used to check if
> 
> Note this is only true if eager page splitting is enabled. It would be
> more accurate to say:
> "While dirty logging is enabled, KVM will re-map any accessed page in
> NPT/EPT at 4K."
> 
> > the page stats metrics work properly in KVM mmu. At the same time, this
> > logic might be used the other way around: using page stats to verify if
> > dirty logging really splits all huge pages. Moreover, when dirty logging is
> 
> It might be worth having a follow up commit which checks if eager
> splitting is enabled and changes the assertions accordingly.

So eager splitting is still pending review, right? But yes, I can
add one after the feature gets merged.

> 
> > disabled, KVM zaps corresponding SPTEs and we could check whether the large
> > pages come back when guest touches the pages again.
> >
> > So add page stats checking in dirty logging performance selftest. In
> > particular, add checks in three locations:
> >  - just after vm is created;
> >  - after populating memory into vm but before enabling dirty logging;
> >  - just after turning on dirty logging.
> 
> Note a key stage here is after dirty logging is enabled, and then the
> VM touches all the memory in the data region.
> I believe that's the point at which you're making the assertion that
> all mappings are 4k currently, which is the right place if eager
> splitting is not enabled.

Oh, sorry. This one should be after dirty logging is done, not 'just
after turning on dirty logging'. Will update it.

> 
> >  - after one final iteration after turning off dirty logging.
> >
> > Tested using commands:
> >  - ./dirty_log_perf_test -s anonymous_hugetlb_1gb
> >  - ./dirty_log_perf_test -s anonymous_thp
> >
> > Cc: Sean Christopherson <seanjc@google.com>
> > Cc: David Matlack <dmatlack@google.com>
> > Cc: Jing Zhang <jingzhangos@google.com>
> > Cc: Peter Xu <peterx@redhat.com>
> >
> > Suggested-by: Ben Gardon <bgorden@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  .../selftests/kvm/dirty_log_perf_test.c       | 52 +++++++++++++++++++
> >  1 file changed, 52 insertions(+)
> >
> > diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> > index 1954b964d1cf..ab0457d91658 100644
> > --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
> > +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> > @@ -19,6 +19,10 @@
> >  #include "perf_test_util.h"
> >  #include "guest_modes.h"
> >
> > +#ifdef __x86_64__
> > +#include "processor.h"
> > +#endif
> > +
> >  /* How many host loops to run by default (one KVM_GET_DIRTY_LOG for each loop)*/
> >  #define TEST_HOST_LOOP_N               2UL
> >
> > @@ -185,6 +189,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                                  p->slots, p->backing_src,
> >                                  p->partition_vcpu_memory_access);
> >
> > +#ifdef __x86_64__
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") == 0,
> > +                   "4K page is non zero");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> > +                   "2M page is non zero");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> > +                   "1G page is non zero");
> > +#endif
> >         perf_test_set_wr_fract(vm, p->wr_fract);
> >
> >         guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm);
> > @@ -222,6 +234,16 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >         pr_info("Populate memory time: %ld.%.9lds\n",
> >                 ts_diff.tv_sec, ts_diff.tv_nsec);
> >
> > +#ifdef __x86_64__
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> > +                   "4K page is zero");
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
> 
> This should also handle 2M hugetlb memory.
> I think there might be a library function to translate backing src
> type to page size too, which could make this check cleaner.

Ack.
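Something like the sketch below might be what that cleanup looks like —
untested, and it assumes get_backing_src_pagesz() from test_util.h reports
2M/1G page sizes for the THP and hugetlb backing sources:

#include "kvm_util.h"
#include "test_util.h"

/*
 * Sketch: drive the assertions from the backing source's page size
 * instead of comparing against individual source types.
 */
static void assert_stats_populated(struct kvm_vm *vm,
				   enum vm_mem_backing_src_type src)
{
	size_t pagesz = get_backing_src_pagesz(src);

	TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
		    "4K page is zero");
	if (pagesz == 2 * 1024 * 1024)
		TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
			    "2M page is zero");
	else if (pagesz == 1024UL * 1024 * 1024)
		TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
			    "1G page is zero");
}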
> 
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> > +                           "2M page is zero");
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> > +                           "1G page is zero");
> > +#endif
> >         /* Enable dirty logging */
> >         clock_gettime(CLOCK_MONOTONIC, &start);
> >         enable_dirty_logging(vm, p->slots);
> > @@ -267,6 +289,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                                 iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
> >                 }
> >         }
> > +#ifdef __x86_64__
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> > +                   "4K page is zero after dirty logging");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> > +                   "2M page is non-zero after dirty logging");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> > +                   "1G page is non-zero after dirty logging");
> > +#endif
> 
> Note this is after dirty logging has been enabled, AND all pages in
> the data region have been written by the guest.
> 
> >
> >         /* Disable dirty logging */
> >         clock_gettime(CLOCK_MONOTONIC, &start);
> > @@ -275,6 +305,28 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >         pr_info("Disabling dirty logging time: %ld.%.9lds\n",
> >                 ts_diff.tv_sec, ts_diff.tv_nsec);
> >
> > +#ifdef __x86_64__
> > +       /*
> > +        * Increment iteration to run the vcpus again to verify if huge pages
> > +        * come back.
> > +        */
> > +       iteration++;
> > +       pr_info("Starting the final iteration to verify page stats\n");
> > +
> > +       for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> > +               while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id])
> > +                      != iteration)
> > +                       ;
> > +       }
> 
> We might as well do this on all archs. Even without the stats, it at
> least validates that disabling dirty logging doesn't break the VM.
> 
Ack.
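A sketch of that shape — the run-again loop made unconditional, with only
the stat assertions staying under the #ifdef (mirrors the hunk above,
untested):

	/*
	 * Run the vcpus one more iteration on all architectures; even
	 * without stats, this verifies the VM keeps making progress
	 * after dirty logging is disabled.
	 */
	iteration++;
	pr_info("Starting the final iteration to verify page stats\n");

	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
		while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id])
		       != iteration)
			;
	}

#ifdef __x86_64__
	/* Huge pages should have been reassembled by the writes above. */
	if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
		TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
			    "2M page is zero");
	if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
		TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
			    "1G page is zero");
#endif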
> > +
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> > +                           "2M page is zero");
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> > +                           "1G page is zero");
> > +#endif
> > +
> >         /* Tell the vcpu thread to quit */
> >         host_quit = true;
> >         perf_test_join_vcpu_threads(nr_vcpus);
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly
  2022-03-21 18:08   ` Ben Gardon
@ 2022-03-22  5:09     ` Mingwei Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-22  5:09 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Matlack,
	Jing Zhang, Peter Xu, Ben Gardon

On Mon, Mar 21, 2022, Ben Gardon wrote:
> On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
> >
> > When dirty logging is enabled, KVM splits the all hugepage mapping in
> > NPT/EPT into the smallest 4K size. This property could be used to check if
> > the page stats metrics work properly in KVM mmu. At the same time, this
> > logic might be used the other way around: using page stats to verify if
> > dirty logging really splits all huge pages. Moreover, when dirty logging is
> > disabled, KVM zaps corresponding SPTEs and we could check whether the large
> > pages come back when guest touches the pages again.
> >
> > So add page stats checking in dirty logging performance selftest. In
> > particular, add checks in three locations:
> >  - just after vm is created;
> >  - after populating memory into vm but before enabling dirty logging;
> >  - just after turning on dirty logging.
> >  - after one final iteration after turning off dirty logging.
> >
> > Tested using commands:
> >  - ./dirty_log_perf_test -s anonymous_hugetlb_1gb
> >  - ./dirty_log_perf_test -s anonymous_thp
> >
> > Cc: Sean Christopherson <seanjc@google.com>
> > Cc: David Matlack <dmatlack@google.com>
> > Cc: Jing Zhang <jingzhangos@google.com>
> > Cc: Peter Xu <peterx@redhat.com>
> >
> > Suggested-by: Ben Gardon <bgorden@google.com>
> 
> Woops, got a mail bounce from this. Should be:
> Suggested-by: Ben Gardon <bgardon@google.com>
> 

Oh... sorry about that. I'll discuss it with you offline; I really want to
avoid this kind of mistake in the future.

> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  .../selftests/kvm/dirty_log_perf_test.c       | 52 +++++++++++++++++++
> >  1 file changed, 52 insertions(+)
> >
> > diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> > index 1954b964d1cf..ab0457d91658 100644
> > --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
> > +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> > @@ -19,6 +19,10 @@
> >  #include "perf_test_util.h"
> >  #include "guest_modes.h"
> >
> > +#ifdef __x86_64__
> > +#include "processor.h"
> > +#endif
> > +
> >  /* How many host loops to run by default (one KVM_GET_DIRTY_LOG for each loop)*/
> >  #define TEST_HOST_LOOP_N               2UL
> >
> > @@ -185,6 +189,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                                  p->slots, p->backing_src,
> >                                  p->partition_vcpu_memory_access);
> >
> > +#ifdef __x86_64__
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") == 0,
> > +                   "4K page is non zero");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> > +                   "2M page is non zero");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> > +                   "1G page is non zero");
> > +#endif
> >         perf_test_set_wr_fract(vm, p->wr_fract);
> >
> >         guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm);
> > @@ -222,6 +234,16 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >         pr_info("Populate memory time: %ld.%.9lds\n",
> >                 ts_diff.tv_sec, ts_diff.tv_nsec);
> >
> > +#ifdef __x86_64__
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> > +                   "4K page is zero");
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> > +                           "2M page is zero");
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> > +                           "1G page is zero");
> > +#endif
> >         /* Enable dirty logging */
> >         clock_gettime(CLOCK_MONOTONIC, &start);
> >         enable_dirty_logging(vm, p->slots);
> > @@ -267,6 +289,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                                 iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
> >                 }
> >         }
> > +#ifdef __x86_64__
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> > +                   "4K page is zero after dirty logging");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> > +                   "2M page is non-zero after dirty logging");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> > +                   "1G page is non-zero after dirty logging");
> > +#endif
> >
> >         /* Disable dirty logging */
> >         clock_gettime(CLOCK_MONOTONIC, &start);
> > @@ -275,6 +305,28 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >         pr_info("Disabling dirty logging time: %ld.%.9lds\n",
> >                 ts_diff.tv_sec, ts_diff.tv_nsec);
> >
> > +#ifdef __x86_64__
> > +       /*
> > +        * Increment iteration to run the vcpus again to verify if huge pages
> > +        * come back.
> > +        */
> > +       iteration++;
> > +       pr_info("Starting the final iteration to verify page stats\n");
> > +
> > +       for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> > +               while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id])
> > +                      != iteration)
> > +                       ;
> > +       }
> > +
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> > +                           "2M page is zero");
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> > +                           "1G page is zero");
> > +#endif
> > +
> >         /* Tell the vcpu thread to quit */
> >         host_quit = true;
> >         perf_test_join_vcpu_threads(nr_vcpus);
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly
  2022-03-21 17:55   ` Ben Gardon
  2022-03-22  5:01     ` Mingwei Zhang
@ 2022-03-23 18:21     ` Mingwei Zhang
  1 sibling, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2022-03-23 18:21 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Matlack,
	Jing Zhang, Peter Xu, Ben Gardon

On Mon, Mar 21, 2022, Ben Gardon wrote:
> On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
> >
> > When dirty logging is enabled, KVM splits the all hugepage mapping in
> > NPT/EPT into the smallest 4K size. This property could be used to check if
> 
> Note this is only true if eager page splitting is enabled. It would be
> more accurate to say:
> "While dirty logging is enabled, KVM will re-map any accessed page in
> NPT/EPT at 4K."
> 
> > the page stats metrics work properly in KVM mmu. At the same time, this
> > logic might be used the other way around: using page stats to verify if
> > dirty logging really splits all huge pages. Moreover, when dirty logging is
> 
> It might be worth having a follow up commit which checks if eager
> splitting is enabled and changes the assertions accordingly.
> 
> > disabled, KVM zaps corresponding SPTEs and we could check whether the large
> > pages come back when guest touches the pages again.
> >
> > So add page stats checking in dirty logging performance selftest. In
> > particular, add checks in three locations:
> >  - just after vm is created;
> >  - after populating memory into vm but before enabling dirty logging;
> >  - just after turning on dirty logging.
> 
> Note a key stage here is after dirty logging is enabled, and then the
> VM touches all the memory in the data region.
> I believe that's the point at which you're making the assertion that
> all mappings are 4k currently, which is the right place if eager
> splitting is not enabled.
> 
> >  - after one final iteration after turning off dirty logging.
> >
> > Tested using commands:
> >  - ./dirty_log_perf_test -s anonymous_hugetlb_1gb
> >  - ./dirty_log_perf_test -s anonymous_thp
> >
> > Cc: Sean Christopherson <seanjc@google.com>
> > Cc: David Matlack <dmatlack@google.com>
> > Cc: Jing Zhang <jingzhangos@google.com>
> > Cc: Peter Xu <peterx@redhat.com>
> >
> > Suggested-by: Ben Gardon <bgorden@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  .../selftests/kvm/dirty_log_perf_test.c       | 52 +++++++++++++++++++
> >  1 file changed, 52 insertions(+)
> >
> > diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> > index 1954b964d1cf..ab0457d91658 100644
> > --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
> > +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> > @@ -19,6 +19,10 @@
> >  #include "perf_test_util.h"
> >  #include "guest_modes.h"
> >
> > +#ifdef __x86_64__
> > +#include "processor.h"
> > +#endif
> > +
> >  /* How many host loops to run by default (one KVM_GET_DIRTY_LOG for each loop)*/
> >  #define TEST_HOST_LOOP_N               2UL
> >
> > @@ -185,6 +189,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                                  p->slots, p->backing_src,
> >                                  p->partition_vcpu_memory_access);
> >
> > +#ifdef __x86_64__
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") == 0,
> > +                   "4K page is non zero");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> > +                   "2M page is non zero");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> > +                   "1G page is non zero");
> > +#endif
> >         perf_test_set_wr_fract(vm, p->wr_fract);
> >
> >         guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm);
> > @@ -222,6 +234,16 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >         pr_info("Populate memory time: %ld.%.9lds\n",
> >                 ts_diff.tv_sec, ts_diff.tv_nsec);
> >
> > +#ifdef __x86_64__
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> > +                   "4K page is zero");
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
> 
> This should also handle 2M hugetlb memory.
> I think there might be a library function to translate backing src
> type to page size too, which could make this check cleaner.
> 
Just went through the selftest code again; it seems this logic is quite
x86-specific, and there are no similar checks in other places. So I think
I'll just add another condition here for now.
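For instance, simply widening the 2M check (a sketch;
VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB is the existing explicit-size source in
the selftest library):

	if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP ||
	    p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB)
		TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
			    "2M page is zero");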

> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> > +                           "2M page is zero");
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> > +                           "1G page is zero");
> > +#endif
> >         /* Enable dirty logging */
> >         clock_gettime(CLOCK_MONOTONIC, &start);
> >         enable_dirty_logging(vm, p->slots);
> > @@ -267,6 +289,14 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                                 iteration, ts_diff.tv_sec, ts_diff.tv_nsec);
> >                 }
> >         }
> > +#ifdef __x86_64__
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_4k") != 0,
> > +                   "4K page is zero after dirty logging");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") == 0,
> > +                   "2M page is non-zero after dirty logging");
> > +       TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") == 0,
> > +                   "1G page is non-zero after dirty logging");
> > +#endif
> 
> Note this is after dirty logging has been enabled, AND all pages in
> the data region have been written by the guest.
> 
> >
> >         /* Disable dirty logging */
> >         clock_gettime(CLOCK_MONOTONIC, &start);
> > @@ -275,6 +305,28 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >         pr_info("Disabling dirty logging time: %ld.%.9lds\n",
> >                 ts_diff.tv_sec, ts_diff.tv_nsec);
> >
> > +#ifdef __x86_64__
> > +       /*
> > +        * Increment iteration to run the vcpus again to verify if huge pages
> > +        * come back.
> > +        */
> > +       iteration++;
> > +       pr_info("Starting the final iteration to verify page stats\n");
> > +
> > +       for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> > +               while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id])
> > +                      != iteration)
> > +                       ;
> > +       }
> 
> We might as well do this on all archs. Even without the stats, it at
> least validates that disabling dirty logging doesn't break the VM.
> 
> > +
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_THP)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_2m") != 0,
> > +                           "2M page is zero");
> > +       if (p->backing_src == VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB)
> > +               TEST_ASSERT(vm_get_single_stat(vm, "pages_1g") != 0,
> > +                           "1G page is zero");
> > +#endif
> > +
> >         /* Tell the vcpu thread to quit */
> >         host_quit = true;
> >         perf_test_join_vcpu_threads(nr_vcpus);
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
-- links below jump to the message on this page --
2022-03-21  0:26 [PATCH 0/4] Verify dirty logging works properly with page stats Mingwei Zhang
2022-03-21  0:26 ` [PATCH 1/4] selftests: KVM: Dump VM stats in binary stats test Mingwei Zhang
2022-03-21  0:26 ` [PATCH 2/4] selftests: KVM: Test reading a single stat Mingwei Zhang
2022-03-21  0:26 ` [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust() Mingwei Zhang
2022-03-21 17:56   ` Ben Gardon
2022-03-22  4:28     ` Mingwei Zhang
2022-03-21 22:00   ` David Matlack
2022-03-21 22:16     ` David Matlack
2022-03-22  4:33       ` Mingwei Zhang
2022-03-21  0:26 ` [PATCH 4/4] selftests: KVM: use dirty logging to check if page stats work correctly Mingwei Zhang
2022-03-21 17:55   ` Ben Gardon
2022-03-22  5:01     ` Mingwei Zhang
2022-03-23 18:21     ` Mingwei Zhang
2022-03-21 18:08   ` Ben Gardon
2022-03-22  5:09     ` Mingwei Zhang
