[PATCH 0/5] KVM: selftests: exercise userfaultfd minor faults

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] KVM: selftests: exercise userfaultfd minor faults
@ 2021-05-12 21:44 Axel Rasmussen
  2021-05-12 21:44 ` [PATCH 1/5] KVM: selftests: allow different backing memory types for demand paging Axel Rasmussen
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-05-12 21:44 UTC (permalink / raw)
  To: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Ben Gardon, Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang
  Cc: kvm, linux-kernel, linux-kselftest, Axel Rasmussen

Base
====

These patches are based upon Andrew Morton's v5.13-rc1-mmots-2021-05-10-22-15
tag. This is because this series depends on:

- UFFD minor fault support for hugetlbfs (in v5.13-rc1) [1]
- UFFD minor fault support for shmem (in Andrew's tree) [2]

[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/
[2] https://lore.kernel.org/patchwork/cover/1420967/

Overview
========

Minor fault handling is a new userfaultfd feature whose goal is generally to
improve performance. In particular, it is intended for use with demand paging.
There are more details in the cover letters for this new feature (linked above),
but at a high level the idea is that we think of these three phases of live
migration of a VM:

1. Precopy, where we copy "some" pages from the source to the target, while the
   VM is still running on the source machine.
2. Blackout, where execution stops on the source, and begins on the target.
3. Postcopy, where the VM is running on the target, some pages are already up
   to date, and others are not (because they weren't copied, or were modified
   after being copied).

During postcopy, the first time the guest touches memory, we intercept a minor
fault. Userspace checks whether or not the page is already up to date. If
needed, we copy the final version of the page from the soure machine. This
could be done with RDMA for example, to do it truly in place / with no copying.
At this point, all that's left is to setup PTEs for the guest: so we issue
UFFDIO_CONTINUE. No copying or page allocation needed.

Because of this use case, it's useful to exercise this as part of the demand
paging test. It lets us ensure the use case works correctly end-to-end, and also
gives us an in-tree way to profile the end-to-end flow for future performance
improvements.

Axel Rasmussen (5):
  KVM: selftests: allow different backing memory types for demand paging
  KVM: selftests: add shmem backing source type
  KVM: selftests: create alias mappings when using shared memory
  KVM: selftests: allow using UFFD minor faults for demand paging
  KVM: selftests: add shared hugetlbfs backing source type

 .../selftests/kvm/demand_paging_test.c        | 146 +++++++++++++-----
 .../testing/selftests/kvm/include/kvm_util.h  |   1 +
 .../testing/selftests/kvm/include/test_util.h |  11 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  79 +++++++++-
 .../selftests/kvm/lib/kvm_util_internal.h     |   2 +
 tools/testing/selftests/kvm/lib/test_util.c   |  46 ++++--
 6 files changed, 222 insertions(+), 63 deletions(-)

--
2.31.1.607.g51e8a6a459-goog

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] KVM: selftests: allow different backing memory types for demand paging
  2021-05-12 21:44 [PATCH 0/5] KVM: selftests: exercise userfaultfd minor faults Axel Rasmussen
@ 2021-05-12 21:44 ` Axel Rasmussen
  2021-05-13 21:32   ` Ben Gardon
  2021-05-12 21:44 ` [PATCH 2/5] KVM: selftests: add shmem backing source type Axel Rasmussen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Axel Rasmussen @ 2021-05-12 21:44 UTC (permalink / raw)
  To: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Ben Gardon, Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang
  Cc: kvm, linux-kernel, linux-kselftest, Axel Rasmussen

Add an argument which lets us specify a different backing memory type
for the test. The default is just to use anonymous, matching existing
behavior (if the argument is omitted).

This is in preparation for testing UFFD minor faults. For that, we need
to use a new backing memory type which is setup with MAP_SHARED.

This notably requires one other change. Perhaps counter-intuitively,
perf_test_args.host_page_size is the host's *native* page size, not the
size of the pages the host is using to back the guest. This means, if we
try to run the test with e.g. VM_MEM_SRC_ANONYMOUS_HUGETLB, we'll try to
do demand paging with 4k pages instead of 2M hugepages.

So, convert everything to use a new demand_paging_size, computed based
on the backing memory type.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 .../selftests/kvm/demand_paging_test.c        | 24 +++++++++++++------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index 5f7a229c3af1..10c7ba76a9c6 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -38,6 +38,7 @@
 
 static int nr_vcpus = 1;
 static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE;
+static size_t demand_paging_size;
 static char *guest_data_prototype;
 
 static void *vcpu_worker(void *data)
@@ -83,7 +84,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
 
 	copy.src = (uint64_t)guest_data_prototype;
 	copy.dst = addr;
-	copy.len = perf_test_args.host_page_size;
+	copy.len = demand_paging_size;
 	copy.mode = 0;
 
 	clock_gettime(CLOCK_MONOTONIC, &start);
@@ -100,7 +101,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
 	PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
 		       timespec_to_ns(ts_diff));
 	PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
-		       perf_test_args.host_page_size, addr, tid);
+		       demand_paging_size, addr, tid);
 
 	return 0;
 }
@@ -250,6 +251,7 @@ static int setup_demand_paging(struct kvm_vm *vm,
 struct test_params {
 	bool use_uffd;
 	useconds_t uffd_delay;
+	enum vm_mem_backing_src_type src_type;
 	bool partition_vcpu_memory_access;
 };
 
@@ -267,14 +269,16 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	int r;
 
 	vm = perf_test_create_vm(mode, nr_vcpus, guest_percpu_mem_size,
-				 VM_MEM_SRC_ANONYMOUS);
+				 p->src_type);
 
 	perf_test_args.wr_fract = 1;
 
-	guest_data_prototype = malloc(perf_test_args.host_page_size);
+	demand_paging_size = get_backing_src_pagesz(p->src_type);
+
+	guest_data_prototype = malloc(demand_paging_size);
 	TEST_ASSERT(guest_data_prototype,
 		    "Failed to allocate buffer for guest data pattern");
-	memset(guest_data_prototype, 0xAB, perf_test_args.host_page_size);
+	memset(guest_data_prototype, 0xAB, demand_paging_size);
 
 	vcpu_threads = malloc(nr_vcpus * sizeof(*vcpu_threads));
 	TEST_ASSERT(vcpu_threads, "Memory allocation failed");
@@ -388,7 +392,7 @@ static void help(char *name)
 {
 	puts("");
 	printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
-	       "          [-b memory] [-v vcpus] [-o]\n", name);
+	       "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
 	guest_modes_help();
 	printf(" -u: use User Fault FD to handle vCPU page\n"
 	       "     faults.\n");
@@ -398,6 +402,8 @@ static void help(char *name)
 	printf(" -b: specify the size of the memory region which should be\n"
 	       "     demand paged by each vCPU. e.g. 10M or 3G.\n"
 	       "     Default: 1G\n");
+	printf(" -t: The type of backing memory to use. Default: anonymous\n");
+	backing_src_help();
 	printf(" -v: specify the number of vCPUs to run.\n");
 	printf(" -o: Overlap guest memory accesses instead of partitioning\n"
 	       "     them into a separate region of memory for each vCPU.\n");
@@ -409,13 +415,14 @@ int main(int argc, char *argv[])
 {
 	int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
 	struct test_params p = {
+		.src_type = VM_MEM_SRC_ANONYMOUS,
 		.partition_vcpu_memory_access = true,
 	};
 	int opt;
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "hm:ud:b:v:o")) != -1) {
+	while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
 		switch (opt) {
 		case 'm':
 			guest_modes_cmdline(optarg);
@@ -430,6 +437,9 @@ int main(int argc, char *argv[])
 		case 'b':
 			guest_percpu_mem_size = parse_size(optarg);
 			break;
+		case 't':
+			p.src_type = parse_backing_src_type(optarg);
+			break;
 		case 'v':
 			nr_vcpus = atoi(optarg);
 			TEST_ASSERT(nr_vcpus > 0 && nr_vcpus <= max_vcpus,
-- 
2.31.1.607.g51e8a6a459-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] KVM: selftests: add shmem backing source type
  2021-05-12 21:44 [PATCH 0/5] KVM: selftests: exercise userfaultfd minor faults Axel Rasmussen
  2021-05-12 21:44 ` [PATCH 1/5] KVM: selftests: allow different backing memory types for demand paging Axel Rasmussen
@ 2021-05-12 21:44 ` Axel Rasmussen
  2021-05-13 21:46   ` Ben Gardon
  2021-05-12 21:45 ` [PATCH 3/5] KVM: selftests: create alias mappings when using shared memory Axel Rasmussen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Axel Rasmussen @ 2021-05-12 21:44 UTC (permalink / raw)
  To: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Ben Gardon, Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang
  Cc: kvm, linux-kernel, linux-kselftest, Axel Rasmussen

This lets us run the demand paging test on top of a shmem-backed area.
In follow-up commits, we'll 1) leverage this new capability to create an
alias mapping, and then 2) use the alias mapping to exercise UFFD minor
faults.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 .../testing/selftests/kvm/include/test_util.h |  1 +
 tools/testing/selftests/kvm/lib/kvm_util.c    | 24 ++++++++---
 tools/testing/selftests/kvm/lib/test_util.c   | 40 +++++++++++--------
 3 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index fade3130eb01..7377f00469ef 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -84,6 +84,7 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB,
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB,
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
+	VM_MEM_SRC_SHMEM,
 	NUM_SRC_TYPES,
 };
 
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index fc83f6c5902d..6fbe124e0e16 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -663,8 +663,8 @@ int kvm_memcmp_hva_gva(void *hva, struct kvm_vm *vm, vm_vaddr_t gva, size_t len)
  *
  * Input Args:
  *   vm - Virtual Machine
- *   backing_src - Storage source for this region.
- *                 NULL to use anonymous memory.
+ *   src_type - Storage source for this region.
+ *              NULL to use anonymous memory.
  *   guest_paddr - Starting guest physical address
  *   slot - KVM region slot
  *   npages - Number of physical pages
@@ -755,11 +755,25 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	if (alignment > 1)
 		region->mmap_size += alignment;
 
+	region->fd = -1;
+	if (src_type == VM_MEM_SRC_SHMEM) {
+		region->fd = memfd_create("kvm_selftest", MFD_CLOEXEC);
+		TEST_ASSERT(region->fd != -1,
+			    "memfd_create failed, errno: %i", errno);
+
+		ret = ftruncate(region->fd, region->mmap_size);
+		TEST_ASSERT(ret == 0, "ftruncate failed, errno: %i", errno);
+
+		ret = fallocate(region->fd,
+				FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0,
+				region->mmap_size);
+		TEST_ASSERT(ret == 0, "fallocate failed, errno: %i", errno);
+	}
+
 	region->mmap_start = mmap(NULL, region->mmap_size,
 				  PROT_READ | PROT_WRITE,
-				  MAP_PRIVATE | MAP_ANONYMOUS
-				  | vm_mem_backing_src_alias(src_type)->flag,
-				  -1, 0);
+				  vm_mem_backing_src_alias(src_type)->flag,
+				  region->fd, 0);
 	TEST_ASSERT(region->mmap_start != MAP_FAILED,
 		    "test_malloc failed, mmap_start: %p errno: %i",
 		    region->mmap_start, errno);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 63d2bc7d757b..c7a265da5090 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -168,70 +168,77 @@ size_t get_def_hugetlb_pagesz(void)
 
 const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 {
+	static const int anon_flags = MAP_PRIVATE | MAP_ANONYMOUS;
+	static const int anon_huge_flags = anon_flags | MAP_HUGETLB;
+
 	static const struct vm_mem_backing_src_alias aliases[] = {
 		[VM_MEM_SRC_ANONYMOUS] = {
 			.name = "anonymous",
-			.flag = 0,
+			.flag = anon_flags,
 		},
 		[VM_MEM_SRC_ANONYMOUS_THP] = {
 			.name = "anonymous_thp",
-			.flag = 0,
+			.flag = anon_flags,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB] = {
 			.name = "anonymous_hugetlb",
-			.flag = MAP_HUGETLB,
+			.flag = anon_huge_flags,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_16KB] = {
 			.name = "anonymous_hugetlb_16kb",
-			.flag = MAP_HUGETLB | MAP_HUGE_16KB,
+			.flag = anon_huge_flags | MAP_HUGE_16KB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_64KB] = {
 			.name = "anonymous_hugetlb_64kb",
-			.flag = MAP_HUGETLB | MAP_HUGE_64KB,
+			.flag = anon_huge_flags | MAP_HUGE_64KB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_512KB] = {
 			.name = "anonymous_hugetlb_512kb",
-			.flag = MAP_HUGETLB | MAP_HUGE_512KB,
+			.flag = anon_huge_flags | MAP_HUGE_512KB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_1MB] = {
 			.name = "anonymous_hugetlb_1mb",
-			.flag = MAP_HUGETLB | MAP_HUGE_1MB,
+			.flag = anon_huge_flags | MAP_HUGE_1MB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB] = {
 			.name = "anonymous_hugetlb_2mb",
-			.flag = MAP_HUGETLB | MAP_HUGE_2MB,
+			.flag = anon_huge_flags | MAP_HUGE_2MB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_8MB] = {
 			.name = "anonymous_hugetlb_8mb",
-			.flag = MAP_HUGETLB | MAP_HUGE_8MB,
+			.flag = anon_huge_flags | MAP_HUGE_8MB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_16MB] = {
 			.name = "anonymous_hugetlb_16mb",
-			.flag = MAP_HUGETLB | MAP_HUGE_16MB,
+			.flag = anon_huge_flags | MAP_HUGE_16MB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_32MB] = {
 			.name = "anonymous_hugetlb_32mb",
-			.flag = MAP_HUGETLB | MAP_HUGE_32MB,
+			.flag = anon_huge_flags | MAP_HUGE_32MB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_256MB] = {
 			.name = "anonymous_hugetlb_256mb",
-			.flag = MAP_HUGETLB | MAP_HUGE_256MB,
+			.flag = anon_huge_flags | MAP_HUGE_256MB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_512MB] = {
 			.name = "anonymous_hugetlb_512mb",
-			.flag = MAP_HUGETLB | MAP_HUGE_512MB,
+			.flag = anon_huge_flags | MAP_HUGE_512MB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB] = {
 			.name = "anonymous_hugetlb_1gb",
-			.flag = MAP_HUGETLB | MAP_HUGE_1GB,
+			.flag = anon_huge_flags | MAP_HUGE_1GB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB] = {
 			.name = "anonymous_hugetlb_2gb",
-			.flag = MAP_HUGETLB | MAP_HUGE_2GB,
+			.flag = anon_huge_flags | MAP_HUGE_2GB,
 		},
 		[VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB] = {
 			.name = "anonymous_hugetlb_16gb",
-			.flag = MAP_HUGETLB | MAP_HUGE_16GB,
+			.flag = anon_huge_flags | MAP_HUGE_16GB,
+		},
+		[VM_MEM_SRC_SHMEM] = {
+			.name = "shmem",
+			.flag = MAP_SHARED,
 		},
 	};
 	_Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
@@ -250,6 +257,7 @@ size_t get_backing_src_pagesz(uint32_t i)
 
 	switch (i) {
 	case VM_MEM_SRC_ANONYMOUS:
+	case VM_MEM_SRC_SHMEM:
 		return getpagesize();
 	case VM_MEM_SRC_ANONYMOUS_THP:
 		return get_trans_hugepagesz();
-- 
2.31.1.607.g51e8a6a459-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] KVM: selftests: create alias mappings when using shared memory
  2021-05-12 21:44 [PATCH 0/5] KVM: selftests: exercise userfaultfd minor faults Axel Rasmussen
  2021-05-12 21:44 ` [PATCH 1/5] KVM: selftests: allow different backing memory types for demand paging Axel Rasmussen
  2021-05-12 21:44 ` [PATCH 2/5] KVM: selftests: add shmem backing source type Axel Rasmussen
@ 2021-05-12 21:45 ` Axel Rasmussen
  2021-05-13 22:33   ` Ben Gardon
  2021-05-12 21:45 ` [PATCH 4/5] KVM: selftests: allow using UFFD minor faults for demand paging Axel Rasmussen
  2021-05-12 21:45 ` [PATCH 5/5] KVM: selftests: add shared hugetlbfs backing source type Axel Rasmussen
  4 siblings, 1 reply; 14+ messages in thread
From: Axel Rasmussen @ 2021-05-12 21:45 UTC (permalink / raw)
  To: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Ben Gardon, Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang
  Cc: kvm, linux-kernel, linux-kselftest, Axel Rasmussen

When a memory region is added with a src_type specifying that it should
use some kind of shared memory, also create an alias mapping to the same
underlying physical pages.

And, add an API so tests can get access to these alias addresses.
Basically, for a guest physical address, let us look up the analogous
host *alias* address.

In a future commit, we'll modify the demand paging test to take
advantage of this to exercise UFFD minor faults. The idea is, we
pre-fault the underlying pages *via the alias*. When the *guest*
faults, it gets a "minor" fault (PTEs don't exist yet, but a page is
already in the page cache). Then, the userfaultfd theads can handle the
fault: they could potentially modify the underlying memory *via the
alias* if they wanted to, and then they install the PTEs and let the
guest carry on via a UFFDIO_CONTINUE ioctl.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 .../testing/selftests/kvm/include/kvm_util.h  |  1 +
 tools/testing/selftests/kvm/lib/kvm_util.c    | 50 +++++++++++++++++++
 .../selftests/kvm/lib/kvm_util_internal.h     |  2 +
 3 files changed, 53 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index a8f022794ce3..0624f25a6803 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -146,6 +146,7 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa);
 void *addr_gva2hva(struct kvm_vm *vm, vm_vaddr_t gva);
 vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva);
+void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa);
 
 /*
  * Address Guest Virtual to Guest Physical
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 6fbe124e0e16..838d58633f7e 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -809,6 +809,19 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 
 	/* Add to linked-list of memory regions. */
 	list_add(&region->list, &vm->userspace_mem_regions);
+
+	/* If shared memory, create an alias. */
+	if (region->fd >= 0) {
+		region->mmap_alias = mmap(NULL, region->mmap_size,
+					  PROT_READ | PROT_WRITE,
+					  vm_mem_backing_src_alias(src_type)->flag,
+					  region->fd, 0);
+		TEST_ASSERT(region->mmap_alias != MAP_FAILED,
+			    "mmap of alias failed, errno: %i", errno);
+
+		/* Align host alias address */
+		region->host_alias = align(region->mmap_alias, alignment);
+	}
 }
 
 /*
@@ -1237,6 +1250,43 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva)
 	return -1;
 }
 
+/*
+ * Address VM physical to Host Virtual *alias*.
+ *
+ * Input Args:
+ *   vm - Virtual Machine
+ *   gpa - VM physical address
+ *
+ * Output Args: None
+ *
+ * Return:
+ *   Equivalent address within the host virtual *alias* area, or NULL
+ *   (without failing the test) if the guest memory is not shared (so
+ *   no alias exists).
+ *
+ * When vm_create() and related functions are called with a shared memory
+ * src_type, we also create a writable, shared alias mapping of the
+ * underlying guest memory. This allows the host to manipulate guest memory,
+ * e.g. to implement demand paging.
+ */
+void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa)
+{
+	struct userspace_mem_region *region;
+
+	list_for_each_entry(region, &vm->userspace_mem_regions, list) {
+		if (!region->host_alias)
+			continue;
+
+		if ((gpa >= region->region.guest_phys_addr)
+			&& (gpa <= (region->region.guest_phys_addr
+				+ region->region.memory_size - 1)))
+			return (void *) ((uintptr_t) region->host_alias
+				+ (gpa - region->region.guest_phys_addr));
+	}
+
+	return NULL;
+}
+
 /*
  * VM Create IRQ Chip
  *
diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
index 91ce1b5d480b..a25af33d4a9c 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h
+++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
@@ -16,7 +16,9 @@ struct userspace_mem_region {
 	int fd;
 	off_t offset;
 	void *host_mem;
+	void *host_alias;
 	void *mmap_start;
+	void *mmap_alias;
 	size_t mmap_size;
 	struct list_head list;
 };
-- 
2.31.1.607.g51e8a6a459-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] KVM: selftests: allow using UFFD minor faults for demand paging
  2021-05-12 21:44 [PATCH 0/5] KVM: selftests: exercise userfaultfd minor faults Axel Rasmussen
                   ` (2 preceding siblings ...)
  2021-05-12 21:45 ` [PATCH 3/5] KVM: selftests: create alias mappings when using shared memory Axel Rasmussen
@ 2021-05-12 21:45 ` Axel Rasmussen
  2021-05-17 23:51   ` Ben Gardon
  2021-05-12 21:45 ` [PATCH 5/5] KVM: selftests: add shared hugetlbfs backing source type Axel Rasmussen
  4 siblings, 1 reply; 14+ messages in thread
From: Axel Rasmussen @ 2021-05-12 21:45 UTC (permalink / raw)
  To: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Ben Gardon, Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang
  Cc: kvm, linux-kernel, linux-kselftest, Axel Rasmussen

UFFD handling of MINOR faults is a new feature whose use case is to
speed up demand paging (compared to MISSING faults). So, it's
interesting to let this selftest exercise this new mode.

Modify the demand paging test to have the option of using UFFD minor
faults, as opposed to missing faults. Now, when turning on userfaultfd
with '-u', the desired mode has to be specified ("MISSING" or "MINOR").

If we're in minor mode, before registering, prefault via the *alias*.
This way, the guest will trigger minor faults, instead of missing
faults, and we can UFFDIO_CONTINUE to resolve them.

Modify the page fault handler function to use the right ioctl depending
on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 .../selftests/kvm/demand_paging_test.c        | 124 ++++++++++++------
 1 file changed, 87 insertions(+), 37 deletions(-)

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index 10c7ba76a9c6..ff29aaea3120 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -72,33 +72,57 @@ static void *vcpu_worker(void *data)
 	return NULL;
 }
 
-static int handle_uffd_page_request(int uffd, uint64_t addr)
+static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr)
 {
-	pid_t tid;
+	const char *ioctl_name;
+	pid_t tid = syscall(__NR_gettid);
 	struct timespec start;
 	struct timespec ts_diff;
-	struct uffdio_copy copy;
 	int r;
 
-	tid = syscall(__NR_gettid);
+	if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) {
+		struct uffdio_copy copy;
 
-	copy.src = (uint64_t)guest_data_prototype;
-	copy.dst = addr;
-	copy.len = demand_paging_size;
-	copy.mode = 0;
+		ioctl_name = "UFFDIO_COPY";
 
-	clock_gettime(CLOCK_MONOTONIC, &start);
+		copy.src = (uint64_t)guest_data_prototype;
+		copy.dst = addr;
+		copy.len = demand_paging_size;
+		copy.mode = 0;
 
-	r = ioctl(uffd, UFFDIO_COPY, &copy);
-	if (r == -1) {
-		pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n",
-			addr, tid, errno);
-		return r;
-	}
+		clock_gettime(CLOCK_MONOTONIC, &start);
 
-	ts_diff = timespec_elapsed(start);
+		r = ioctl(uffd, UFFDIO_COPY, &copy);
+		if (r == -1) {
+			pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n",
+				addr, tid, errno);
+			return r;
+		}
+
+		ts_diff = timespec_elapsed(start);
+	} else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
+		struct uffdio_continue cont = {0};
+
+		ioctl_name = "UFFDIO_CONTINUE";
+
+		cont.range.start = addr;
+		cont.range.len = demand_paging_size;
+
+		clock_gettime(CLOCK_MONOTONIC, &start);
+
+		r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
+		if (r == -1) {
+			pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n",
+				addr, tid, errno);
+			return r;
+		}
 
-	PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
+		ts_diff = timespec_elapsed(start);
+	} else {
+		TEST_FAIL("Invalid uffd mode %d", uffd_mode);
+	}
+
+	PER_PAGE_DEBUG("%s %d \t%ld ns\n", ioctl_name, tid,
 		       timespec_to_ns(ts_diff));
 	PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
 		       demand_paging_size, addr, tid);
@@ -109,6 +133,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
 bool quit_uffd_thread;
 
 struct uffd_handler_args {
+	int uffd_mode;
 	int uffd;
 	int pipefd;
 	useconds_t delay;
@@ -170,7 +195,7 @@ static void *uffd_handler_thread_fn(void *arg)
 		if (r == -1) {
 			if (errno == EAGAIN)
 				continue;
-			pr_info("Read of uffd gor errno %d", errno);
+			pr_info("Read of uffd got errno %d\n", errno);
 			return NULL;
 		}
 
@@ -185,7 +210,7 @@ static void *uffd_handler_thread_fn(void *arg)
 		if (delay)
 			usleep(delay);
 		addr =  msg.arg.pagefault.address;
-		r = handle_uffd_page_request(uffd, addr);
+		r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr);
 		if (r < 0)
 			return NULL;
 		pages++;
@@ -201,17 +226,32 @@ static void *uffd_handler_thread_fn(void *arg)
 
 static int setup_demand_paging(struct kvm_vm *vm,
 			       pthread_t *uffd_handler_thread, int pipefd,
+			       int uffd_mode,
 			       useconds_t uffd_delay,
 			       struct uffd_handler_args *uffd_args,
-			       void *hva, uint64_t len)
+			       void *hva, void *alias, uint64_t len)
 {
 	int uffd;
 	struct uffdio_api uffdio_api;
 	struct uffdio_register uffdio_register;
+	uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY;
+
+	/* In order to get minor faults, prefault via the alias. */
+	if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
+		size_t p;
+
+		expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE;
+
+		TEST_ASSERT(alias != NULL, "Alias required for minor faults");
+		for (p = 0; p < (len / demand_paging_size); ++p) {
+			memcpy(alias + (p * demand_paging_size),
+			       guest_data_prototype, demand_paging_size);
+		}
+	}
 
 	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
 	if (uffd == -1) {
-		pr_info("uffd creation failed\n");
+		pr_info("uffd creation failed, errno: %d\n", errno);
 		return -1;
 	}
 
@@ -224,18 +264,18 @@ static int setup_demand_paging(struct kvm_vm *vm,
 
 	uffdio_register.range.start = (uint64_t)hva;
 	uffdio_register.range.len = len;
-	uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
+	uffdio_register.mode = uffd_mode;
 	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
 		pr_info("ioctl uffdio_register failed\n");
 		return -1;
 	}
 
-	if ((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) !=
-			UFFD_API_RANGE_IOCTLS) {
-		pr_info("unexpected userfaultfd ioctl set\n");
+	if ((uffdio_register.ioctls & expected_ioctls) != expected_ioctls) {
+		pr_info("missing userfaultfd ioctls\n");
 		return -1;
 	}
 
+	uffd_args->uffd_mode = uffd_mode;
 	uffd_args->uffd = uffd;
 	uffd_args->pipefd = pipefd;
 	uffd_args->delay = uffd_delay;
@@ -249,7 +289,7 @@ static int setup_demand_paging(struct kvm_vm *vm,
 }
 
 struct test_params {
-	bool use_uffd;
+	int uffd_mode;
 	useconds_t uffd_delay;
 	enum vm_mem_backing_src_type src_type;
 	bool partition_vcpu_memory_access;
@@ -286,7 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size,
 			      p->partition_vcpu_memory_access);
 
-	if (p->use_uffd) {
+	if (p->uffd_mode) {
 		uffd_handler_threads =
 			malloc(nr_vcpus * sizeof(*uffd_handler_threads));
 		TEST_ASSERT(uffd_handler_threads, "Memory allocation failed");
@@ -300,6 +340,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
 			vm_paddr_t vcpu_gpa;
 			void *vcpu_hva;
+			void *vcpu_alias;
 			uint64_t vcpu_mem_size;
 
 
@@ -314,8 +355,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 			PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n",
 				       vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size);
 
-			/* Cache the HVA pointer of the region */
+			/* Cache the host addresses of the region */
 			vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
+			vcpu_alias = addr_gpa2alias(vm, vcpu_gpa);
 
 			/*
 			 * Set up user fault fd to handle demand paging
@@ -327,9 +369,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 
 			r = setup_demand_paging(vm,
 						&uffd_handler_threads[vcpu_id],
-						pipefds[vcpu_id * 2],
+						pipefds[vcpu_id * 2], p->uffd_mode,
 						p->uffd_delay, &uffd_args[vcpu_id],
-						vcpu_hva, vcpu_mem_size);
+						vcpu_hva, vcpu_alias,
+						vcpu_mem_size);
 			if (r < 0)
 				exit(-r);
 		}
@@ -359,7 +402,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 
 	pr_info("All vCPU threads joined\n");
 
-	if (p->use_uffd) {
+	if (p->uffd_mode) {
 		char c;
 
 		/* Tell the user fault fd handler threads to quit */
@@ -381,7 +424,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 
 	free(guest_data_prototype);
 	free(vcpu_threads);
-	if (p->use_uffd) {
+	if (p->uffd_mode) {
 		free(uffd_handler_threads);
 		free(uffd_args);
 		free(pipefds);
@@ -391,11 +434,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 static void help(char *name)
 {
 	puts("");
-	printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
+	printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"
 	       "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
 	guest_modes_help();
-	printf(" -u: use User Fault FD to handle vCPU page\n"
-	       "     faults.\n");
+	printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
+	       "     UFFD registration mode: 'MISSING' or 'MINOR'.\n");
 	printf(" -d: add a delay in usec to the User Fault\n"
 	       "     FD handler to simulate demand paging\n"
 	       "     overheads. Ignored without -u.\n");
@@ -422,13 +465,17 @@ int main(int argc, char *argv[])
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
+	while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) {
 		switch (opt) {
 		case 'm':
 			guest_modes_cmdline(optarg);
 			break;
 		case 'u':
-			p.use_uffd = true;
+			if (!strcmp("MISSING", optarg))
+				p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
+			else if (!strcmp("MINOR", optarg))
+				p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
+			TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
 			break;
 		case 'd':
 			p.uffd_delay = strtoul(optarg, NULL, 0);
@@ -455,6 +502,9 @@ int main(int argc, char *argv[])
 		}
 	}
 
+	TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
+		    "userfaultfd MINOR mode requires shared memory; pick a different -t");
+
 	for_each_guest_mode(run_test, &p);
 
 	return 0;
-- 
2.31.1.607.g51e8a6a459-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] KVM: selftests: add shared hugetlbfs backing source type
  2021-05-12 21:44 [PATCH 0/5] KVM: selftests: exercise userfaultfd minor faults Axel Rasmussen
                   ` (3 preceding siblings ...)
  2021-05-12 21:45 ` [PATCH 4/5] KVM: selftests: allow using UFFD minor faults for demand paging Axel Rasmussen
@ 2021-05-12 21:45 ` Axel Rasmussen
  2021-05-17 23:57   ` Ben Gardon
  4 siblings, 1 reply; 14+ messages in thread
From: Axel Rasmussen @ 2021-05-12 21:45 UTC (permalink / raw)
  To: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Ben Gardon, Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang
  Cc: kvm, linux-kernel, linux-kselftest, Axel Rasmussen

This lets us run the demand paging test on top of a shared
hugetlbfs-backed area. The "shared" is key, as this allows us to
exercise userfaultfd minor faults on hugetlbfs.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/kvm/demand_paging_test.c |  6 ++++--
 tools/testing/selftests/kvm/include/test_util.h  | 10 ++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c       |  9 +++++++--
 tools/testing/selftests/kvm/lib/test_util.c      |  6 ++++++
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index ff29aaea3120..32942c9e0376 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -502,8 +502,10 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
-		    "userfaultfd MINOR mode requires shared memory; pick a different -t");
+	if (p.uffd_mode == UFFDIO_REGISTER_MODE_MINOR &&
+	    !backing_src_is_shared(p.src_type)) {
+		TEST_FAIL("userfaultfd MINOR mode requires shared memory; pick a different -t");
+	}
 
 	for_each_guest_mode(run_test, &p);
 
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 7377f00469ef..852d6d2cc285 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -85,9 +85,19 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB,
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
+	VM_MEM_SRC_SHARED_HUGETLB,
 	NUM_SRC_TYPES,
 };
 
+/*
+ * Whether or not the given source type is shared memory (as opposed to
+ * anonymous).
+ */
+static inline bool backing_src_is_shared(enum vm_mem_backing_src_type t)
+{
+	return t == VM_MEM_SRC_SHMEM || t == VM_MEM_SRC_SHARED_HUGETLB;
+}
+
 struct vm_mem_backing_src_alias {
 	const char *name;
 	uint32_t flag;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 838d58633f7e..fed02153c919 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -756,8 +756,13 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 		region->mmap_size += alignment;
 
 	region->fd = -1;
-	if (src_type == VM_MEM_SRC_SHMEM) {
-		region->fd = memfd_create("kvm_selftest", MFD_CLOEXEC);
+	if (backing_src_is_shared(src_type)) {
+		int memfd_flags = MFD_CLOEXEC;
+
+		if (src_type == VM_MEM_SRC_SHARED_HUGETLB)
+			memfd_flags |= MFD_HUGETLB;
+
+		region->fd = memfd_create("kvm_selftest", memfd_flags);
 		TEST_ASSERT(region->fd != -1,
 			    "memfd_create failed, errno: %i", errno);
 
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index c7a265da5090..65fb8b43782c 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -240,6 +240,11 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 			.name = "shmem",
 			.flag = MAP_SHARED,
 		},
+		[VM_MEM_SRC_SHARED_HUGETLB] = {
+			.name = "shared_hugetlb",
+			/* No MAP_HUGETLB, we use MFD_HUGETLB instead. */
+			.flag = MAP_SHARED,
+		},
 	};
 	_Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
 		       "Missing new backing src types?");
@@ -262,6 +267,7 @@ size_t get_backing_src_pagesz(uint32_t i)
 	case VM_MEM_SRC_ANONYMOUS_THP:
 		return get_trans_hugepagesz();
 	case VM_MEM_SRC_ANONYMOUS_HUGETLB:
+	case VM_MEM_SRC_SHARED_HUGETLB:
 		return get_def_hugetlb_pagesz();
 	default:
 		return MAP_HUGE_PAGE_SIZE(flag);
-- 
2.31.1.607.g51e8a6a459-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] KVM: selftests: allow different backing memory types for demand paging
  2021-05-12 21:44 ` [PATCH 1/5] KVM: selftests: allow different backing memory types for demand paging Axel Rasmussen
@ 2021-05-13 21:32   ` Ben Gardon
  2021-05-13 22:02     ` Axel Rasmussen
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Gardon @ 2021-05-13 21:32 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang, kvm, LKML, linux-kselftest

On Wed, May 12, 2021 at 2:45 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> Add an argument which lets us specify a different backing memory type
> for the test. The default is just to use anonymous, matching existing
> behavior (if the argument is omitted).
>
> This is in preparation for testing UFFD minor faults. For that, we need
> to use a new backing memory type which is setup with MAP_SHARED.
>
> This notably requires one other change. Perhaps counter-intuitively,
> perf_test_args.host_page_size is the host's *native* page size, not the
> size of the pages the host is using to back the guest. This means, if we
> try to run the test with e.g. VM_MEM_SRC_ANONYMOUS_HUGETLB, we'll try to
> do demand paging with 4k pages instead of 2M hugepages.

Would it make sense to factor this change out into another commit
preceding this one? Perhaps only worth it if you send a v2.

When you say "we'll try to do demand paging with 4k pages instead of
2M hugepages," what would that mean? Would we only copy 4k worth of
the contents of the 2M page in, leading to the guest seeing bad
memory? Do we have the capability to do demand paging at a smaller
granularity than the backing page size with UFFD?

Otherwise this patch looks reasonable to me. I'll try to review the
rest of your patches today / Monday.

>
> So, convert everything to use a new demand_paging_size, computed based
> on the backing memory type.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>  .../selftests/kvm/demand_paging_test.c        | 24 +++++++++++++------
>  1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> index 5f7a229c3af1..10c7ba76a9c6 100644
> --- a/tools/testing/selftests/kvm/demand_paging_test.c
> +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> @@ -38,6 +38,7 @@
>
>  static int nr_vcpus = 1;
>  static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE;
> +static size_t demand_paging_size;
>  static char *guest_data_prototype;
>
>  static void *vcpu_worker(void *data)
> @@ -83,7 +84,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
>
>         copy.src = (uint64_t)guest_data_prototype;
>         copy.dst = addr;
> -       copy.len = perf_test_args.host_page_size;
> +       copy.len = demand_paging_size;
>         copy.mode = 0;
>
>         clock_gettime(CLOCK_MONOTONIC, &start);
> @@ -100,7 +101,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
>         PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
>                        timespec_to_ns(ts_diff));
>         PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
> -                      perf_test_args.host_page_size, addr, tid);
> +                      demand_paging_size, addr, tid);
>
>         return 0;
>  }
> @@ -250,6 +251,7 @@ static int setup_demand_paging(struct kvm_vm *vm,
>  struct test_params {
>         bool use_uffd;
>         useconds_t uffd_delay;
> +       enum vm_mem_backing_src_type src_type;
>         bool partition_vcpu_memory_access;
>  };
>
> @@ -267,14 +269,16 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>         int r;
>
>         vm = perf_test_create_vm(mode, nr_vcpus, guest_percpu_mem_size,
> -                                VM_MEM_SRC_ANONYMOUS);
> +                                p->src_type);
>
>         perf_test_args.wr_fract = 1;
>
> -       guest_data_prototype = malloc(perf_test_args.host_page_size);
> +       demand_paging_size = get_backing_src_pagesz(p->src_type);
> +
> +       guest_data_prototype = malloc(demand_paging_size);
>         TEST_ASSERT(guest_data_prototype,
>                     "Failed to allocate buffer for guest data pattern");
> -       memset(guest_data_prototype, 0xAB, perf_test_args.host_page_size);
> +       memset(guest_data_prototype, 0xAB, demand_paging_size);
>
>         vcpu_threads = malloc(nr_vcpus * sizeof(*vcpu_threads));
>         TEST_ASSERT(vcpu_threads, "Memory allocation failed");
> @@ -388,7 +392,7 @@ static void help(char *name)
>  {
>         puts("");
>         printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
> -              "          [-b memory] [-v vcpus] [-o]\n", name);
> +              "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
>         guest_modes_help();
>         printf(" -u: use User Fault FD to handle vCPU page\n"
>                "     faults.\n");
> @@ -398,6 +402,8 @@ static void help(char *name)
>         printf(" -b: specify the size of the memory region which should be\n"
>                "     demand paged by each vCPU. e.g. 10M or 3G.\n"
>                "     Default: 1G\n");
> +       printf(" -t: The type of backing memory to use. Default: anonymous\n");
> +       backing_src_help();
>         printf(" -v: specify the number of vCPUs to run.\n");
>         printf(" -o: Overlap guest memory accesses instead of partitioning\n"
>                "     them into a separate region of memory for each vCPU.\n");
> @@ -409,13 +415,14 @@ int main(int argc, char *argv[])
>  {
>         int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
>         struct test_params p = {
> +               .src_type = VM_MEM_SRC_ANONYMOUS,
>                 .partition_vcpu_memory_access = true,
>         };
>         int opt;
>
>         guest_modes_append_default();
>
> -       while ((opt = getopt(argc, argv, "hm:ud:b:v:o")) != -1) {
> +       while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
>                 switch (opt) {
>                 case 'm':
>                         guest_modes_cmdline(optarg);
> @@ -430,6 +437,9 @@ int main(int argc, char *argv[])
>                 case 'b':
>                         guest_percpu_mem_size = parse_size(optarg);
>                         break;
> +               case 't':
> +                       p.src_type = parse_backing_src_type(optarg);
> +                       break;
>                 case 'v':
>                         nr_vcpus = atoi(optarg);
>                         TEST_ASSERT(nr_vcpus > 0 && nr_vcpus <= max_vcpus,
> --
> 2.31.1.607.g51e8a6a459-goog
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] KVM: selftests: add shmem backing source type
  2021-05-12 21:44 ` [PATCH 2/5] KVM: selftests: add shmem backing source type Axel Rasmussen
@ 2021-05-13 21:46   ` Ben Gardon
  0 siblings, 0 replies; 14+ messages in thread
From: Ben Gardon @ 2021-05-13 21:46 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang, kvm, LKML, linux-kselftest

On Wed, May 12, 2021 at 2:45 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> This lets us run the demand paging test on top of a shmem-backed area.
> In follow-up commits, we'll 1) leverage this new capability to create an
> alias mapping, and then 2) use the alias mapping to exercise UFFD minor
> faults.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>  .../testing/selftests/kvm/include/test_util.h |  1 +
>  tools/testing/selftests/kvm/lib/kvm_util.c    | 24 ++++++++---
>  tools/testing/selftests/kvm/lib/test_util.c   | 40 +++++++++++--------
>  3 files changed, 44 insertions(+), 21 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
> index fade3130eb01..7377f00469ef 100644
> --- a/tools/testing/selftests/kvm/include/test_util.h
> +++ b/tools/testing/selftests/kvm/include/test_util.h
> @@ -84,6 +84,7 @@ enum vm_mem_backing_src_type {
>         VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB,
>         VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB,
>         VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
> +       VM_MEM_SRC_SHMEM,
>         NUM_SRC_TYPES,
>  };
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index fc83f6c5902d..6fbe124e0e16 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -663,8 +663,8 @@ int kvm_memcmp_hva_gva(void *hva, struct kvm_vm *vm, vm_vaddr_t gva, size_t len)
>   *
>   * Input Args:
>   *   vm - Virtual Machine
> - *   backing_src - Storage source for this region.
> - *                 NULL to use anonymous memory.
> + *   src_type - Storage source for this region.
> + *              NULL to use anonymous memory.

It would probably make sense to separate this rename out into a separate commit.

>   *   guest_paddr - Starting guest physical address
>   *   slot - KVM region slot
>   *   npages - Number of physical pages
> @@ -755,11 +755,25 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
>         if (alignment > 1)
>                 region->mmap_size += alignment;
>
> +       region->fd = -1;
> +       if (src_type == VM_MEM_SRC_SHMEM) {
> +               region->fd = memfd_create("kvm_selftest", MFD_CLOEXEC);
> +               TEST_ASSERT(region->fd != -1,
> +                           "memfd_create failed, errno: %i", errno);
> +
> +               ret = ftruncate(region->fd, region->mmap_size);
> +               TEST_ASSERT(ret == 0, "ftruncate failed, errno: %i", errno);
> +
> +               ret = fallocate(region->fd,
> +                               FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0,
> +                               region->mmap_size);
> +               TEST_ASSERT(ret == 0, "fallocate failed, errno: %i", errno);
> +       }
> +
>         region->mmap_start = mmap(NULL, region->mmap_size,
>                                   PROT_READ | PROT_WRITE,
> -                                 MAP_PRIVATE | MAP_ANONYMOUS
> -                                 | vm_mem_backing_src_alias(src_type)->flag,
> -                                 -1, 0);
> +                                 vm_mem_backing_src_alias(src_type)->flag,
> +                                 region->fd, 0);
>         TEST_ASSERT(region->mmap_start != MAP_FAILED,
>                     "test_malloc failed, mmap_start: %p errno: %i",
>                     region->mmap_start, errno);
> diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
> index 63d2bc7d757b..c7a265da5090 100644
> --- a/tools/testing/selftests/kvm/lib/test_util.c
> +++ b/tools/testing/selftests/kvm/lib/test_util.c
> @@ -168,70 +168,77 @@ size_t get_def_hugetlb_pagesz(void)
>
>  const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
>  {
> +       static const int anon_flags = MAP_PRIVATE | MAP_ANONYMOUS;
> +       static const int anon_huge_flags = anon_flags | MAP_HUGETLB;
> +
>         static const struct vm_mem_backing_src_alias aliases[] = {
>                 [VM_MEM_SRC_ANONYMOUS] = {
>                         .name = "anonymous",
> -                       .flag = 0,
> +                       .flag = anon_flags,

You could also do this refactoring in a separate commit, which would
make the change to support shmem quite small.

>                 },
>                 [VM_MEM_SRC_ANONYMOUS_THP] = {
>                         .name = "anonymous_thp",
> -                       .flag = 0,
> +                       .flag = anon_flags,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB] = {
>                         .name = "anonymous_hugetlb",
> -                       .flag = MAP_HUGETLB,
> +                       .flag = anon_huge_flags,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_16KB] = {
>                         .name = "anonymous_hugetlb_16kb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_16KB,
> +                       .flag = anon_huge_flags | MAP_HUGE_16KB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_64KB] = {
>                         .name = "anonymous_hugetlb_64kb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_64KB,
> +                       .flag = anon_huge_flags | MAP_HUGE_64KB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_512KB] = {
>                         .name = "anonymous_hugetlb_512kb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_512KB,
> +                       .flag = anon_huge_flags | MAP_HUGE_512KB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_1MB] = {
>                         .name = "anonymous_hugetlb_1mb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_1MB,
> +                       .flag = anon_huge_flags | MAP_HUGE_1MB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB] = {
>                         .name = "anonymous_hugetlb_2mb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_2MB,
> +                       .flag = anon_huge_flags | MAP_HUGE_2MB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_8MB] = {
>                         .name = "anonymous_hugetlb_8mb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_8MB,
> +                       .flag = anon_huge_flags | MAP_HUGE_8MB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_16MB] = {
>                         .name = "anonymous_hugetlb_16mb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_16MB,
> +                       .flag = anon_huge_flags | MAP_HUGE_16MB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_32MB] = {
>                         .name = "anonymous_hugetlb_32mb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_32MB,
> +                       .flag = anon_huge_flags | MAP_HUGE_32MB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_256MB] = {
>                         .name = "anonymous_hugetlb_256mb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_256MB,
> +                       .flag = anon_huge_flags | MAP_HUGE_256MB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_512MB] = {
>                         .name = "anonymous_hugetlb_512mb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_512MB,
> +                       .flag = anon_huge_flags | MAP_HUGE_512MB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB] = {
>                         .name = "anonymous_hugetlb_1gb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_1GB,
> +                       .flag = anon_huge_flags | MAP_HUGE_1GB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB] = {
>                         .name = "anonymous_hugetlb_2gb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_2GB,
> +                       .flag = anon_huge_flags | MAP_HUGE_2GB,
>                 },
>                 [VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB] = {
>                         .name = "anonymous_hugetlb_16gb",
> -                       .flag = MAP_HUGETLB | MAP_HUGE_16GB,
> +                       .flag = anon_huge_flags | MAP_HUGE_16GB,
> +               },
> +               [VM_MEM_SRC_SHMEM] = {
> +                       .name = "shmem",
> +                       .flag = MAP_SHARED,
>                 },
>         };
>         _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
> @@ -250,6 +257,7 @@ size_t get_backing_src_pagesz(uint32_t i)
>
>         switch (i) {
>         case VM_MEM_SRC_ANONYMOUS:
> +       case VM_MEM_SRC_SHMEM:
>                 return getpagesize();
>         case VM_MEM_SRC_ANONYMOUS_THP:
>                 return get_trans_hugepagesz();
> --
> 2.31.1.607.g51e8a6a459-goog
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] KVM: selftests: allow different backing memory types for demand paging
  2021-05-13 21:32   ` Ben Gardon
@ 2021-05-13 22:02     ` Axel Rasmussen
  0 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-05-13 22:02 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang, kvm, LKML, linux-kselftest

On Thu, May 13, 2021 at 2:32 PM Ben Gardon <bgardon@google.com> wrote:
>
> On Wed, May 12, 2021 at 2:45 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
> >
> > Add an argument which lets us specify a different backing memory type
> > for the test. The default is just to use anonymous, matching existing
> > behavior (if the argument is omitted).
> >
> > This is in preparation for testing UFFD minor faults. For that, we need
> > to use a new backing memory type which is setup with MAP_SHARED.
> >
> > This notably requires one other change. Perhaps counter-intuitively,
> > perf_test_args.host_page_size is the host's *native* page size, not the
> > size of the pages the host is using to back the guest. This means, if we
> > try to run the test with e.g. VM_MEM_SRC_ANONYMOUS_HUGETLB, we'll try to
> > do demand paging with 4k pages instead of 2M hugepages.
>
> Would it make sense to factor this change out into another commit
> preceding this one? Perhaps only worth it if you send a v2.
>
> When you say "we'll try to do demand paging with 4k pages instead of
> 2M hugepages," what would that mean? Would we only copy 4k worth of
> the contents of the 2M page in, leading to the guest seeing bad
> memory? Do we have the capability to do demand paging at a smaller
> granularity than the backing page size with UFFD?

Basically I think the existing behavior is to always use 4k - so we
UFFDIO_COPY that size at a time.

This is totally fine for anonymous, or even for THPs. But, once we're
using hugetlbfs, the UFFDIO_COPY ioctl will just fail unless we do 2M
(or some multiple thereof) at a time. If memory serves it returns
-EINVAL in this case.

It wouldn't be so much trouble to factor it out, so just let me know
what is preferred. For now, I'll do it if a v2 is needed for other
reasons.

>
> Otherwise this patch looks reasonable to me. I'll try to review the
> rest of your patches today / Monday.
>
> >
> > So, convert everything to use a new demand_paging_size, computed based
> > on the backing memory type.
> >
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > ---
> >  .../selftests/kvm/demand_paging_test.c        | 24 +++++++++++++------
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> > index 5f7a229c3af1..10c7ba76a9c6 100644
> > --- a/tools/testing/selftests/kvm/demand_paging_test.c
> > +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> > @@ -38,6 +38,7 @@
> >
> >  static int nr_vcpus = 1;
> >  static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE;
> > +static size_t demand_paging_size;
> >  static char *guest_data_prototype;
> >
> >  static void *vcpu_worker(void *data)
> > @@ -83,7 +84,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
> >
> >         copy.src = (uint64_t)guest_data_prototype;
> >         copy.dst = addr;
> > -       copy.len = perf_test_args.host_page_size;
> > +       copy.len = demand_paging_size;
> >         copy.mode = 0;
> >
> >         clock_gettime(CLOCK_MONOTONIC, &start);
> > @@ -100,7 +101,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
> >         PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
> >                        timespec_to_ns(ts_diff));
> >         PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
> > -                      perf_test_args.host_page_size, addr, tid);
> > +                      demand_paging_size, addr, tid);
> >
> >         return 0;
> >  }
> > @@ -250,6 +251,7 @@ static int setup_demand_paging(struct kvm_vm *vm,
> >  struct test_params {
> >         bool use_uffd;
> >         useconds_t uffd_delay;
> > +       enum vm_mem_backing_src_type src_type;
> >         bool partition_vcpu_memory_access;
> >  };
> >
> > @@ -267,14 +269,16 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >         int r;
> >
> >         vm = perf_test_create_vm(mode, nr_vcpus, guest_percpu_mem_size,
> > -                                VM_MEM_SRC_ANONYMOUS);
> > +                                p->src_type);
> >
> >         perf_test_args.wr_fract = 1;
> >
> > -       guest_data_prototype = malloc(perf_test_args.host_page_size);
> > +       demand_paging_size = get_backing_src_pagesz(p->src_type);
> > +
> > +       guest_data_prototype = malloc(demand_paging_size);
> >         TEST_ASSERT(guest_data_prototype,
> >                     "Failed to allocate buffer for guest data pattern");
> > -       memset(guest_data_prototype, 0xAB, perf_test_args.host_page_size);
> > +       memset(guest_data_prototype, 0xAB, demand_paging_size);
> >
> >         vcpu_threads = malloc(nr_vcpus * sizeof(*vcpu_threads));
> >         TEST_ASSERT(vcpu_threads, "Memory allocation failed");
> > @@ -388,7 +392,7 @@ static void help(char *name)
> >  {
> >         puts("");
> >         printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
> > -              "          [-b memory] [-v vcpus] [-o]\n", name);
> > +              "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
> >         guest_modes_help();
> >         printf(" -u: use User Fault FD to handle vCPU page\n"
> >                "     faults.\n");
> > @@ -398,6 +402,8 @@ static void help(char *name)
> >         printf(" -b: specify the size of the memory region which should be\n"
> >                "     demand paged by each vCPU. e.g. 10M or 3G.\n"
> >                "     Default: 1G\n");
> > +       printf(" -t: The type of backing memory to use. Default: anonymous\n");
> > +       backing_src_help();
> >         printf(" -v: specify the number of vCPUs to run.\n");
> >         printf(" -o: Overlap guest memory accesses instead of partitioning\n"
> >                "     them into a separate region of memory for each vCPU.\n");
> > @@ -409,13 +415,14 @@ int main(int argc, char *argv[])
> >  {
> >         int max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
> >         struct test_params p = {
> > +               .src_type = VM_MEM_SRC_ANONYMOUS,
> >                 .partition_vcpu_memory_access = true,
> >         };
> >         int opt;
> >
> >         guest_modes_append_default();
> >
> > -       while ((opt = getopt(argc, argv, "hm:ud:b:v:o")) != -1) {
> > +       while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
> >                 switch (opt) {
> >                 case 'm':
> >                         guest_modes_cmdline(optarg);
> > @@ -430,6 +437,9 @@ int main(int argc, char *argv[])
> >                 case 'b':
> >                         guest_percpu_mem_size = parse_size(optarg);
> >                         break;
> > +               case 't':
> > +                       p.src_type = parse_backing_src_type(optarg);
> > +                       break;
> >                 case 'v':
> >                         nr_vcpus = atoi(optarg);
> >                         TEST_ASSERT(nr_vcpus > 0 && nr_vcpus <= max_vcpus,
> > --
> > 2.31.1.607.g51e8a6a459-goog
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] KVM: selftests: create alias mappings when using shared memory
  2021-05-12 21:45 ` [PATCH 3/5] KVM: selftests: create alias mappings when using shared memory Axel Rasmussen
@ 2021-05-13 22:33   ` Ben Gardon
  0 siblings, 0 replies; 14+ messages in thread
From: Ben Gardon @ 2021-05-13 22:33 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang, kvm, LKML, linux-kselftest

On Wed, May 12, 2021 at 2:45 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> When a memory region is added with a src_type specifying that it should
> use some kind of shared memory, also create an alias mapping to the same
> underlying physical pages.
>
> And, add an API so tests can get access to these alias addresses.
> Basically, for a guest physical address, let us look up the analogous
> host *alias* address.
>
> In a future commit, we'll modify the demand paging test to take
> advantage of this to exercise UFFD minor faults. The idea is, we
> pre-fault the underlying pages *via the alias*. When the *guest*
> faults, it gets a "minor" fault (PTEs don't exist yet, but a page is
> already in the page cache). Then, the userfaultfd theads can handle the
> fault: they could potentially modify the underlying memory *via the
> alias* if they wanted to, and then they install the PTEs and let the
> guest carry on via a UFFDIO_CONTINUE ioctl.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> ---
>  .../testing/selftests/kvm/include/kvm_util.h  |  1 +
>  tools/testing/selftests/kvm/lib/kvm_util.c    | 50 +++++++++++++++++++
>  .../selftests/kvm/lib/kvm_util_internal.h     |  2 +
>  3 files changed, 53 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index a8f022794ce3..0624f25a6803 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -146,6 +146,7 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
>  void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa);
>  void *addr_gva2hva(struct kvm_vm *vm, vm_vaddr_t gva);
>  vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva);
> +void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa);
>
>  /*
>   * Address Guest Virtual to Guest Physical
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 6fbe124e0e16..838d58633f7e 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -809,6 +809,19 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
>
>         /* Add to linked-list of memory regions. */
>         list_add(&region->list, &vm->userspace_mem_regions);
> +
> +       /* If shared memory, create an alias. */
> +       if (region->fd >= 0) {
> +               region->mmap_alias = mmap(NULL, region->mmap_size,
> +                                         PROT_READ | PROT_WRITE,
> +                                         vm_mem_backing_src_alias(src_type)->flag,
> +                                         region->fd, 0);
> +               TEST_ASSERT(region->mmap_alias != MAP_FAILED,
> +                           "mmap of alias failed, errno: %i", errno);
> +
> +               /* Align host alias address */
> +               region->host_alias = align(region->mmap_alias, alignment);
> +       }
>  }
>
>  /*
> @@ -1237,6 +1250,43 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva)
>         return -1;
>  }
>
> +/*
> + * Address VM physical to Host Virtual *alias*.
> + *
> + * Input Args:
> + *   vm - Virtual Machine
> + *   gpa - VM physical address
> + *
> + * Output Args: None
> + *
> + * Return:
> + *   Equivalent address within the host virtual *alias* area, or NULL
> + *   (without failing the test) if the guest memory is not shared (so
> + *   no alias exists).
> + *
> + * When vm_create() and related functions are called with a shared memory
> + * src_type, we also create a writable, shared alias mapping of the
> + * underlying guest memory. This allows the host to manipulate guest memory,
> + * e.g. to implement demand paging.

I would amend this to: "This allows the host to manipulate guest
memory without mapping that memory in the guest's address space," or
something to that effect.

> + */
> +void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa)
> +{
> +       struct userspace_mem_region *region;
> +
> +       list_for_each_entry(region, &vm->userspace_mem_regions, list) {
> +               if (!region->host_alias)
> +                       continue;
> +
> +               if ((gpa >= region->region.guest_phys_addr)
> +                       && (gpa <= (region->region.guest_phys_addr
> +                               + region->region.memory_size - 1)))
> +                       return (void *) ((uintptr_t) region->host_alias
> +                               + (gpa - region->region.guest_phys_addr));
> +       }
> +
> +       return NULL;
> +}
> +
>  /*
>   * VM Create IRQ Chip
>   *
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
> index 91ce1b5d480b..a25af33d4a9c 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h
> +++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
> @@ -16,7 +16,9 @@ struct userspace_mem_region {
>         int fd;
>         off_t offset;
>         void *host_mem;
> +       void *host_alias;
>         void *mmap_start;
> +       void *mmap_alias;
>         size_t mmap_size;
>         struct list_head list;
>  };
> --
> 2.31.1.607.g51e8a6a459-goog
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] KVM: selftests: allow using UFFD minor faults for demand paging
  2021-05-12 21:45 ` [PATCH 4/5] KVM: selftests: allow using UFFD minor faults for demand paging Axel Rasmussen
@ 2021-05-17 23:51   ` Ben Gardon
  2021-05-18 18:17     ` Axel Rasmussen
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Gardon @ 2021-05-17 23:51 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang, kvm, LKML, linux-kselftest

On Wed, May 12, 2021 at 2:45 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> UFFD handling of MINOR faults is a new feature whose use case is to
> speed up demand paging (compared to MISSING faults). So, it's
> interesting to let this selftest exercise this new mode.
>
> Modify the demand paging test to have the option of using UFFD minor
> faults, as opposed to missing faults. Now, when turning on userfaultfd
> with '-u', the desired mode has to be specified ("MISSING" or "MINOR").
>
> If we're in minor mode, before registering, prefault via the *alias*.
> This way, the guest will trigger minor faults, instead of missing
> faults, and we can UFFDIO_CONTINUE to resolve them.
>
> Modify the page fault handler function to use the right ioctl depending
> on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>  .../selftests/kvm/demand_paging_test.c        | 124 ++++++++++++------
>  1 file changed, 87 insertions(+), 37 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> index 10c7ba76a9c6..ff29aaea3120 100644
> --- a/tools/testing/selftests/kvm/demand_paging_test.c
> +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> @@ -72,33 +72,57 @@ static void *vcpu_worker(void *data)
>         return NULL;
>  }
>
> -static int handle_uffd_page_request(int uffd, uint64_t addr)
> +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr)
>  {
> -       pid_t tid;
> +       const char *ioctl_name;
> +       pid_t tid = syscall(__NR_gettid);
>         struct timespec start;
>         struct timespec ts_diff;
> -       struct uffdio_copy copy;
>         int r;
>
> -       tid = syscall(__NR_gettid);
> +       if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) {
> +               struct uffdio_copy copy;
>
> -       copy.src = (uint64_t)guest_data_prototype;
> -       copy.dst = addr;
> -       copy.len = demand_paging_size;
> -       copy.mode = 0;
> +               ioctl_name = "UFFDIO_COPY";
>
> -       clock_gettime(CLOCK_MONOTONIC, &start);
> +               copy.src = (uint64_t)guest_data_prototype;
> +               copy.dst = addr;
> +               copy.len = demand_paging_size;
> +               copy.mode = 0;
>
> -       r = ioctl(uffd, UFFDIO_COPY, &copy);
> -       if (r == -1) {
> -               pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n",
> -                       addr, tid, errno);
> -               return r;
> -       }
> +               clock_gettime(CLOCK_MONOTONIC, &start);

Nit: It'd probably be fine to factor the timing calls out of the if
statement to deduplicate.

>
> -       ts_diff = timespec_elapsed(start);
> +               r = ioctl(uffd, UFFDIO_COPY, &copy);
> +               if (r == -1) {
> +                       pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n",
> +                               addr, tid, errno);
> +                       return r;
> +               }
> +
> +               ts_diff = timespec_elapsed(start);
> +       } else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
> +               struct uffdio_continue cont = {0};
> +
> +               ioctl_name = "UFFDIO_CONTINUE";
> +
> +               cont.range.start = addr;
> +               cont.range.len = demand_paging_size;
> +
> +               clock_gettime(CLOCK_MONOTONIC, &start);
> +
> +               r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
> +               if (r == -1) {
> +                       pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n",
> +                               addr, tid, errno);
> +                       return r;
> +               }
>
> -       PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
> +               ts_diff = timespec_elapsed(start);
> +       } else {
> +               TEST_FAIL("Invalid uffd mode %d", uffd_mode);
> +       }
> +
> +       PER_PAGE_DEBUG("%s %d \t%ld ns\n", ioctl_name, tid,
>                        timespec_to_ns(ts_diff));

As far as I can see this is the only use of ioctl_name and it's not
going to change in a test run, so it might make sense to not print the
ioctl name here and just do it once somewhere else.

>         PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
>                        demand_paging_size, addr, tid);
> @@ -109,6 +133,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
>  bool quit_uffd_thread;
>
>  struct uffd_handler_args {
> +       int uffd_mode;
>         int uffd;
>         int pipefd;
>         useconds_t delay;
> @@ -170,7 +195,7 @@ static void *uffd_handler_thread_fn(void *arg)
>                 if (r == -1) {
>                         if (errno == EAGAIN)
>                                 continue;
> -                       pr_info("Read of uffd gor errno %d", errno);
> +                       pr_info("Read of uffd got errno %d\n", errno);

If you end up doing some kind of cleanups patch, it might be worth
moving this in there.

>                         return NULL;
>                 }
>
> @@ -185,7 +210,7 @@ static void *uffd_handler_thread_fn(void *arg)
>                 if (delay)
>                         usleep(delay);
>                 addr =  msg.arg.pagefault.address;
> -               r = handle_uffd_page_request(uffd, addr);
> +               r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr);
>                 if (r < 0)
>                         return NULL;
>                 pages++;
> @@ -201,17 +226,32 @@ static void *uffd_handler_thread_fn(void *arg)
>
>  static int setup_demand_paging(struct kvm_vm *vm,
>                                pthread_t *uffd_handler_thread, int pipefd,
> +                              int uffd_mode,
>                                useconds_t uffd_delay,
>                                struct uffd_handler_args *uffd_args,
> -                              void *hva, uint64_t len)
> +                              void *hva, void *alias, uint64_t len)
>  {
>         int uffd;
>         struct uffdio_api uffdio_api;
>         struct uffdio_register uffdio_register;
> +       uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY;
> +
> +       /* In order to get minor faults, prefault via the alias. */
> +       if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
> +               size_t p;
> +
> +               expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE;
> +
> +               TEST_ASSERT(alias != NULL, "Alias required for minor faults");
> +               for (p = 0; p < (len / demand_paging_size); ++p) {
> +                       memcpy(alias + (p * demand_paging_size),
> +                              guest_data_prototype, demand_paging_size);
> +               }
> +       }
>
>         uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
>         if (uffd == -1) {
> -               pr_info("uffd creation failed\n");
> +               pr_info("uffd creation failed, errno: %d\n", errno);
>                 return -1;
>         }

Huh, I wonder why I put all these return -1 statements in here. The
caller just does exit(-r) if r < 0. Seems like these could all just be
converted to TEST_ASSERTs.

>
> @@ -224,18 +264,18 @@ static int setup_demand_paging(struct kvm_vm *vm,
>
>         uffdio_register.range.start = (uint64_t)hva;
>         uffdio_register.range.len = len;
> -       uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
> +       uffdio_register.mode = uffd_mode;
>         if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
>                 pr_info("ioctl uffdio_register failed\n");
>                 return -1;
>         }
>
> -       if ((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) !=
> -                       UFFD_API_RANGE_IOCTLS) {
> -               pr_info("unexpected userfaultfd ioctl set\n");
> +       if ((uffdio_register.ioctls & expected_ioctls) != expected_ioctls) {
> +               pr_info("missing userfaultfd ioctls\n");
>                 return -1;
>         }
>
> +       uffd_args->uffd_mode = uffd_mode;
>         uffd_args->uffd = uffd;
>         uffd_args->pipefd = pipefd;
>         uffd_args->delay = uffd_delay;
> @@ -249,7 +289,7 @@ static int setup_demand_paging(struct kvm_vm *vm,
>  }
>
>  struct test_params {
> -       bool use_uffd;
> +       int uffd_mode;
>         useconds_t uffd_delay;
>         enum vm_mem_backing_src_type src_type;
>         bool partition_vcpu_memory_access;
> @@ -286,7 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>         perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size,
>                               p->partition_vcpu_memory_access);
>
> -       if (p->use_uffd) {
> +       if (p->uffd_mode) {
>                 uffd_handler_threads =
>                         malloc(nr_vcpus * sizeof(*uffd_handler_threads));
>                 TEST_ASSERT(uffd_handler_threads, "Memory allocation failed");
> @@ -300,6 +340,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>                 for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
>                         vm_paddr_t vcpu_gpa;
>                         void *vcpu_hva;
> +                       void *vcpu_alias;
>                         uint64_t vcpu_mem_size;
>
>
> @@ -314,8 +355,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>                         PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n",
>                                        vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size);
>
> -                       /* Cache the HVA pointer of the region */
> +                       /* Cache the host addresses of the region */
>                         vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
> +                       vcpu_alias = addr_gpa2alias(vm, vcpu_gpa);
>
>                         /*
>                          * Set up user fault fd to handle demand paging
> @@ -327,9 +369,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>
>                         r = setup_demand_paging(vm,
>                                                 &uffd_handler_threads[vcpu_id],
> -                                               pipefds[vcpu_id * 2],
> +                                               pipefds[vcpu_id * 2], p->uffd_mode,
>                                                 p->uffd_delay, &uffd_args[vcpu_id],
> -                                               vcpu_hva, vcpu_mem_size);
> +                                               vcpu_hva, vcpu_alias,
> +                                               vcpu_mem_size);
>                         if (r < 0)
>                                 exit(-r);
>                 }
> @@ -359,7 +402,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>
>         pr_info("All vCPU threads joined\n");
>
> -       if (p->use_uffd) {
> +       if (p->uffd_mode) {
>                 char c;
>
>                 /* Tell the user fault fd handler threads to quit */
> @@ -381,7 +424,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>
>         free(guest_data_prototype);
>         free(vcpu_threads);
> -       if (p->use_uffd) {
> +       if (p->uffd_mode) {
>                 free(uffd_handler_threads);
>                 free(uffd_args);
>                 free(pipefds);
> @@ -391,11 +434,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>  static void help(char *name)
>  {
>         puts("");
> -       printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
> +       printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"
>                "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
>         guest_modes_help();
> -       printf(" -u: use User Fault FD to handle vCPU page\n"
> -              "     faults.\n");
> +       printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
> +              "     UFFD registration mode: 'MISSING' or 'MINOR'.\n");
>         printf(" -d: add a delay in usec to the User Fault\n"
>                "     FD handler to simulate demand paging\n"
>                "     overheads. Ignored without -u.\n");
> @@ -422,13 +465,17 @@ int main(int argc, char *argv[])
>
>         guest_modes_append_default();
>
> -       while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
> +       while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) {
>                 switch (opt) {
>                 case 'm':
>                         guest_modes_cmdline(optarg);
>                         break;
>                 case 'u':
> -                       p.use_uffd = true;
> +                       if (!strcmp("MISSING", optarg))
> +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
> +                       else if (!strcmp("MINOR", optarg))
> +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
> +                       TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
>                         break;
>                 case 'd':
>                         p.uffd_delay = strtoul(optarg, NULL, 0);
> @@ -455,6 +502,9 @@ int main(int argc, char *argv[])
>                 }
>         }
>
> +       TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
> +                   "userfaultfd MINOR mode requires shared memory; pick a different -t");
> +
>         for_each_guest_mode(run_test, &p);
>
>         return 0;
> --
> 2.31.1.607.g51e8a6a459-goog
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] KVM: selftests: add shared hugetlbfs backing source type
  2021-05-12 21:45 ` [PATCH 5/5] KVM: selftests: add shared hugetlbfs backing source type Axel Rasmussen
@ 2021-05-17 23:57   ` Ben Gardon
  0 siblings, 0 replies; 14+ messages in thread
From: Ben Gardon @ 2021-05-17 23:57 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang, kvm, LKML, linux-kselftest

On Wed, May 12, 2021 at 2:45 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> This lets us run the demand paging test on top of a shared
> hugetlbfs-backed area. The "shared" is key, as this allows us to
> exercise userfaultfd minor faults on hugetlbfs.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>  tools/testing/selftests/kvm/demand_paging_test.c |  6 ++++--
>  tools/testing/selftests/kvm/include/test_util.h  | 10 ++++++++++
>  tools/testing/selftests/kvm/lib/kvm_util.c       |  9 +++++++--
>  tools/testing/selftests/kvm/lib/test_util.c      |  6 ++++++
>  4 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> index ff29aaea3120..32942c9e0376 100644
> --- a/tools/testing/selftests/kvm/demand_paging_test.c
> +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> @@ -502,8 +502,10 @@ int main(int argc, char *argv[])
>                 }
>         }
>
> -       TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
> -                   "userfaultfd MINOR mode requires shared memory; pick a different -t");
> +       if (p.uffd_mode == UFFDIO_REGISTER_MODE_MINOR &&
> +           !backing_src_is_shared(p.src_type)) {
> +               TEST_FAIL("userfaultfd MINOR mode requires shared memory; pick a different -t");
> +       }
>
>         for_each_guest_mode(run_test, &p);
>
> diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
> index 7377f00469ef..852d6d2cc285 100644
> --- a/tools/testing/selftests/kvm/include/test_util.h
> +++ b/tools/testing/selftests/kvm/include/test_util.h
> @@ -85,9 +85,19 @@ enum vm_mem_backing_src_type {
>         VM_MEM_SRC_ANONYMOUS_HUGETLB_2GB,
>         VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
>         VM_MEM_SRC_SHMEM,
> +       VM_MEM_SRC_SHARED_HUGETLB,
>         NUM_SRC_TYPES,
>  };
>
> +/*
> + * Whether or not the given source type is shared memory (as opposed to
> + * anonymous).
> + */
> +static inline bool backing_src_is_shared(enum vm_mem_backing_src_type t)
> +{
> +       return t == VM_MEM_SRC_SHMEM || t == VM_MEM_SRC_SHARED_HUGETLB;
> +}

Would it make sense to check (vm_mem_backing_src_alias(t)->flag &
MAP_SHARED) == MAP_SHARED instead?

> +
>  struct vm_mem_backing_src_alias {
>         const char *name;
>         uint32_t flag;
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 838d58633f7e..fed02153c919 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -756,8 +756,13 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
>                 region->mmap_size += alignment;
>
>         region->fd = -1;
> -       if (src_type == VM_MEM_SRC_SHMEM) {
> -               region->fd = memfd_create("kvm_selftest", MFD_CLOEXEC);
> +       if (backing_src_is_shared(src_type)) {
> +               int memfd_flags = MFD_CLOEXEC;
> +
> +               if (src_type == VM_MEM_SRC_SHARED_HUGETLB)
> +                       memfd_flags |= MFD_HUGETLB;
> +
> +               region->fd = memfd_create("kvm_selftest", memfd_flags);
>                 TEST_ASSERT(region->fd != -1,
>                             "memfd_create failed, errno: %i", errno);
>
> diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
> index c7a265da5090..65fb8b43782c 100644
> --- a/tools/testing/selftests/kvm/lib/test_util.c
> +++ b/tools/testing/selftests/kvm/lib/test_util.c
> @@ -240,6 +240,11 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
>                         .name = "shmem",
>                         .flag = MAP_SHARED,
>                 },
> +               [VM_MEM_SRC_SHARED_HUGETLB] = {
> +                       .name = "shared_hugetlb",
> +                       /* No MAP_HUGETLB, we use MFD_HUGETLB instead. */

Is it worth calling out the difference between the two flags here?

> +                       .flag = MAP_SHARED,
> +               },
>         };
>         _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
>                        "Missing new backing src types?");
> @@ -262,6 +267,7 @@ size_t get_backing_src_pagesz(uint32_t i)
>         case VM_MEM_SRC_ANONYMOUS_THP:
>                 return get_trans_hugepagesz();
>         case VM_MEM_SRC_ANONYMOUS_HUGETLB:
> +       case VM_MEM_SRC_SHARED_HUGETLB:
>                 return get_def_hugetlb_pagesz();
>         default:
>                 return MAP_HUGE_PAGE_SIZE(flag);
> --
> 2.31.1.607.g51e8a6a459-goog
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] KVM: selftests: allow using UFFD minor faults for demand paging
  2021-05-17 23:51   ` Ben Gardon
@ 2021-05-18 18:17     ` Axel Rasmussen
  2021-05-18 19:05       ` Ben Gardon
  0 siblings, 1 reply; 14+ messages in thread
From: Axel Rasmussen @ 2021-05-18 18:17 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang, kvm, LKML, linux-kselftest

On Mon, May 17, 2021 at 4:51 PM Ben Gardon <bgardon@google.com> wrote:
>
> On Wed, May 12, 2021 at 2:45 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
> >
> > UFFD handling of MINOR faults is a new feature whose use case is to
> > speed up demand paging (compared to MISSING faults). So, it's
> > interesting to let this selftest exercise this new mode.
> >
> > Modify the demand paging test to have the option of using UFFD minor
> > faults, as opposed to missing faults. Now, when turning on userfaultfd
> > with '-u', the desired mode has to be specified ("MISSING" or "MINOR").
> >
> > If we're in minor mode, before registering, prefault via the *alias*.
> > This way, the guest will trigger minor faults, instead of missing
> > faults, and we can UFFDIO_CONTINUE to resolve them.
> >
> > Modify the page fault handler function to use the right ioctl depending
> > on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.
> >
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > ---
> >  .../selftests/kvm/demand_paging_test.c        | 124 ++++++++++++------
> >  1 file changed, 87 insertions(+), 37 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> > index 10c7ba76a9c6..ff29aaea3120 100644
> > --- a/tools/testing/selftests/kvm/demand_paging_test.c
> > +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> > @@ -72,33 +72,57 @@ static void *vcpu_worker(void *data)
> >         return NULL;
> >  }
> >
> > -static int handle_uffd_page_request(int uffd, uint64_t addr)
> > +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr)
> >  {
> > -       pid_t tid;
> > +       const char *ioctl_name;
> > +       pid_t tid = syscall(__NR_gettid);
> >         struct timespec start;
> >         struct timespec ts_diff;
> > -       struct uffdio_copy copy;
> >         int r;
> >
> > -       tid = syscall(__NR_gettid);
> > +       if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) {
> > +               struct uffdio_copy copy;
> >
> > -       copy.src = (uint64_t)guest_data_prototype;
> > -       copy.dst = addr;
> > -       copy.len = demand_paging_size;
> > -       copy.mode = 0;
> > +               ioctl_name = "UFFDIO_COPY";
> >
> > -       clock_gettime(CLOCK_MONOTONIC, &start);
> > +               copy.src = (uint64_t)guest_data_prototype;
> > +               copy.dst = addr;
> > +               copy.len = demand_paging_size;
> > +               copy.mode = 0;
> >
> > -       r = ioctl(uffd, UFFDIO_COPY, &copy);
> > -       if (r == -1) {
> > -               pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n",
> > -                       addr, tid, errno);
> > -               return r;
> > -       }
> > +               clock_gettime(CLOCK_MONOTONIC, &start);
>
> Nit: It'd probably be fine to factor the timing calls out of the if
> statement to deduplicate.
>
> >
> > -       ts_diff = timespec_elapsed(start);
> > +               r = ioctl(uffd, UFFDIO_COPY, &copy);
> > +               if (r == -1) {
> > +                       pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n",
> > +                               addr, tid, errno);
> > +                       return r;
> > +               }
> > +
> > +               ts_diff = timespec_elapsed(start);
> > +       } else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
> > +               struct uffdio_continue cont = {0};
> > +
> > +               ioctl_name = "UFFDIO_CONTINUE";
> > +
> > +               cont.range.start = addr;
> > +               cont.range.len = demand_paging_size;
> > +
> > +               clock_gettime(CLOCK_MONOTONIC, &start);
> > +
> > +               r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
> > +               if (r == -1) {
> > +                       pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n",
> > +                               addr, tid, errno);
> > +                       return r;
> > +               }
> >
> > -       PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
> > +               ts_diff = timespec_elapsed(start);
> > +       } else {
> > +               TEST_FAIL("Invalid uffd mode %d", uffd_mode);
> > +       }
> > +
> > +       PER_PAGE_DEBUG("%s %d \t%ld ns\n", ioctl_name, tid,
> >                        timespec_to_ns(ts_diff));
>
> As far as I can see this is the only use of ioctl_name and it's not
> going to change in a test run, so it might make sense to not print the
> ioctl name here and just do it once somewhere else.
>
> >         PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
> >                        demand_paging_size, addr, tid);
> > @@ -109,6 +133,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
> >  bool quit_uffd_thread;
> >
> >  struct uffd_handler_args {
> > +       int uffd_mode;
> >         int uffd;
> >         int pipefd;
> >         useconds_t delay;
> > @@ -170,7 +195,7 @@ static void *uffd_handler_thread_fn(void *arg)
> >                 if (r == -1) {
> >                         if (errno == EAGAIN)
> >                                 continue;
> > -                       pr_info("Read of uffd gor errno %d", errno);
> > +                       pr_info("Read of uffd got errno %d\n", errno);
>
> If you end up doing some kind of cleanups patch, it might be worth
> moving this in there.
>
> >                         return NULL;
> >                 }
> >
> > @@ -185,7 +210,7 @@ static void *uffd_handler_thread_fn(void *arg)
> >                 if (delay)
> >                         usleep(delay);
> >                 addr =  msg.arg.pagefault.address;
> > -               r = handle_uffd_page_request(uffd, addr);
> > +               r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr);
> >                 if (r < 0)
> >                         return NULL;
> >                 pages++;
> > @@ -201,17 +226,32 @@ static void *uffd_handler_thread_fn(void *arg)
> >
> >  static int setup_demand_paging(struct kvm_vm *vm,
> >                                pthread_t *uffd_handler_thread, int pipefd,
> > +                              int uffd_mode,
> >                                useconds_t uffd_delay,
> >                                struct uffd_handler_args *uffd_args,
> > -                              void *hva, uint64_t len)
> > +                              void *hva, void *alias, uint64_t len)
> >  {
> >         int uffd;
> >         struct uffdio_api uffdio_api;
> >         struct uffdio_register uffdio_register;
> > +       uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY;
> > +
> > +       /* In order to get minor faults, prefault via the alias. */
> > +       if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
> > +               size_t p;
> > +
> > +               expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE;
> > +
> > +               TEST_ASSERT(alias != NULL, "Alias required for minor faults");
> > +               for (p = 0; p < (len / demand_paging_size); ++p) {
> > +                       memcpy(alias + (p * demand_paging_size),
> > +                              guest_data_prototype, demand_paging_size);
> > +               }
> > +       }
> >
> >         uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> >         if (uffd == -1) {
> > -               pr_info("uffd creation failed\n");
> > +               pr_info("uffd creation failed, errno: %d\n", errno);
> >                 return -1;
> >         }
>
> Huh, I wonder why I put all these return -1 statements in here. The
> caller just does exit(-r) if r < 0. Seems like these could all just be
> converted to TEST_ASSERTs.

I agree that change makes sense, but it seems better to do it in a
separate commit as it's maybe a 10-20 line change.

Would you prefer I add that into this series, or just keep the status quo?

>
> >
> > @@ -224,18 +264,18 @@ static int setup_demand_paging(struct kvm_vm *vm,
> >
> >         uffdio_register.range.start = (uint64_t)hva;
> >         uffdio_register.range.len = len;
> > -       uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
> > +       uffdio_register.mode = uffd_mode;
> >         if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
> >                 pr_info("ioctl uffdio_register failed\n");
> >                 return -1;
> >         }
> >
> > -       if ((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) !=
> > -                       UFFD_API_RANGE_IOCTLS) {
> > -               pr_info("unexpected userfaultfd ioctl set\n");
> > +       if ((uffdio_register.ioctls & expected_ioctls) != expected_ioctls) {
> > +               pr_info("missing userfaultfd ioctls\n");
> >                 return -1;
> >         }
> >
> > +       uffd_args->uffd_mode = uffd_mode;
> >         uffd_args->uffd = uffd;
> >         uffd_args->pipefd = pipefd;
> >         uffd_args->delay = uffd_delay;
> > @@ -249,7 +289,7 @@ static int setup_demand_paging(struct kvm_vm *vm,
> >  }
> >
> >  struct test_params {
> > -       bool use_uffd;
> > +       int uffd_mode;
> >         useconds_t uffd_delay;
> >         enum vm_mem_backing_src_type src_type;
> >         bool partition_vcpu_memory_access;
> > @@ -286,7 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >         perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size,
> >                               p->partition_vcpu_memory_access);
> >
> > -       if (p->use_uffd) {
> > +       if (p->uffd_mode) {
> >                 uffd_handler_threads =
> >                         malloc(nr_vcpus * sizeof(*uffd_handler_threads));
> >                 TEST_ASSERT(uffd_handler_threads, "Memory allocation failed");
> > @@ -300,6 +340,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                 for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> >                         vm_paddr_t vcpu_gpa;
> >                         void *vcpu_hva;
> > +                       void *vcpu_alias;
> >                         uint64_t vcpu_mem_size;
> >
> >
> > @@ -314,8 +355,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >                         PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n",
> >                                        vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size);
> >
> > -                       /* Cache the HVA pointer of the region */
> > +                       /* Cache the host addresses of the region */
> >                         vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
> > +                       vcpu_alias = addr_gpa2alias(vm, vcpu_gpa);
> >
> >                         /*
> >                          * Set up user fault fd to handle demand paging
> > @@ -327,9 +369,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >
> >                         r = setup_demand_paging(vm,
> >                                                 &uffd_handler_threads[vcpu_id],
> > -                                               pipefds[vcpu_id * 2],
> > +                                               pipefds[vcpu_id * 2], p->uffd_mode,
> >                                                 p->uffd_delay, &uffd_args[vcpu_id],
> > -                                               vcpu_hva, vcpu_mem_size);
> > +                                               vcpu_hva, vcpu_alias,
> > +                                               vcpu_mem_size);
> >                         if (r < 0)
> >                                 exit(-r);
> >                 }
> > @@ -359,7 +402,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >
> >         pr_info("All vCPU threads joined\n");
> >
> > -       if (p->use_uffd) {
> > +       if (p->uffd_mode) {
> >                 char c;
> >
> >                 /* Tell the user fault fd handler threads to quit */
> > @@ -381,7 +424,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >
> >         free(guest_data_prototype);
> >         free(vcpu_threads);
> > -       if (p->use_uffd) {
> > +       if (p->uffd_mode) {
> >                 free(uffd_handler_threads);
> >                 free(uffd_args);
> >                 free(pipefds);
> > @@ -391,11 +434,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> >  static void help(char *name)
> >  {
> >         puts("");
> > -       printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
> > +       printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"
> >                "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
> >         guest_modes_help();
> > -       printf(" -u: use User Fault FD to handle vCPU page\n"
> > -              "     faults.\n");
> > +       printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
> > +              "     UFFD registration mode: 'MISSING' or 'MINOR'.\n");
> >         printf(" -d: add a delay in usec to the User Fault\n"
> >                "     FD handler to simulate demand paging\n"
> >                "     overheads. Ignored without -u.\n");
> > @@ -422,13 +465,17 @@ int main(int argc, char *argv[])
> >
> >         guest_modes_append_default();
> >
> > -       while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
> > +       while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) {
> >                 switch (opt) {
> >                 case 'm':
> >                         guest_modes_cmdline(optarg);
> >                         break;
> >                 case 'u':
> > -                       p.use_uffd = true;
> > +                       if (!strcmp("MISSING", optarg))
> > +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
> > +                       else if (!strcmp("MINOR", optarg))
> > +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
> > +                       TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
> >                         break;
> >                 case 'd':
> >                         p.uffd_delay = strtoul(optarg, NULL, 0);
> > @@ -455,6 +502,9 @@ int main(int argc, char *argv[])
> >                 }
> >         }
> >
> > +       TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
> > +                   "userfaultfd MINOR mode requires shared memory; pick a different -t");
> > +
> >         for_each_guest_mode(run_test, &p);
> >
> >         return 0;
> > --
> > 2.31.1.607.g51e8a6a459-goog
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] KVM: selftests: allow using UFFD minor faults for demand paging
  2021-05-18 18:17     ` Axel Rasmussen
@ 2021-05-18 19:05       ` Ben Gardon
  0 siblings, 0 replies; 14+ messages in thread
From: Ben Gardon @ 2021-05-18 19:05 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Aaron Lewis, Alexander Graf, Andrew Jones, Andrew Morton,
	Emanuele Giuseppe Esposito, Eric Auger, Jacob Xu,
	Makarand Sonare, Oliver Upton, Paolo Bonzini, Peter Xu,
	Shuah Khan, Yanan Wang, kvm, LKML, linux-kselftest

On Tue, May 18, 2021 at 11:17 AM Axel Rasmussen
<axelrasmussen@google.com> wrote:
>
> On Mon, May 17, 2021 at 4:51 PM Ben Gardon <bgardon@google.com> wrote:
> >
> > On Wed, May 12, 2021 at 2:45 PM Axel Rasmussen <axelrasmussen@google.com> wrote:
> > >
> > > UFFD handling of MINOR faults is a new feature whose use case is to
> > > speed up demand paging (compared to MISSING faults). So, it's
> > > interesting to let this selftest exercise this new mode.
> > >
> > > Modify the demand paging test to have the option of using UFFD minor
> > > faults, as opposed to missing faults. Now, when turning on userfaultfd
> > > with '-u', the desired mode has to be specified ("MISSING" or "MINOR").
> > >
> > > If we're in minor mode, before registering, prefault via the *alias*.
> > > This way, the guest will trigger minor faults, instead of missing
> > > faults, and we can UFFDIO_CONTINUE to resolve them.
> > >
> > > Modify the page fault handler function to use the right ioctl depending
> > > on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.
> > >
> > > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > > ---
> > >  .../selftests/kvm/demand_paging_test.c        | 124 ++++++++++++------
> > >  1 file changed, 87 insertions(+), 37 deletions(-)
> > >
> > > diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
> > > index 10c7ba76a9c6..ff29aaea3120 100644
> > > --- a/tools/testing/selftests/kvm/demand_paging_test.c
> > > +++ b/tools/testing/selftests/kvm/demand_paging_test.c
> > > @@ -72,33 +72,57 @@ static void *vcpu_worker(void *data)
> > >         return NULL;
> > >  }
> > >
> > > -static int handle_uffd_page_request(int uffd, uint64_t addr)
> > > +static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr)
> > >  {
> > > -       pid_t tid;
> > > +       const char *ioctl_name;
> > > +       pid_t tid = syscall(__NR_gettid);
> > >         struct timespec start;
> > >         struct timespec ts_diff;
> > > -       struct uffdio_copy copy;
> > >         int r;
> > >
> > > -       tid = syscall(__NR_gettid);
> > > +       if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) {
> > > +               struct uffdio_copy copy;
> > >
> > > -       copy.src = (uint64_t)guest_data_prototype;
> > > -       copy.dst = addr;
> > > -       copy.len = demand_paging_size;
> > > -       copy.mode = 0;
> > > +               ioctl_name = "UFFDIO_COPY";
> > >
> > > -       clock_gettime(CLOCK_MONOTONIC, &start);
> > > +               copy.src = (uint64_t)guest_data_prototype;
> > > +               copy.dst = addr;
> > > +               copy.len = demand_paging_size;
> > > +               copy.mode = 0;
> > >
> > > -       r = ioctl(uffd, UFFDIO_COPY, &copy);
> > > -       if (r == -1) {
> > > -               pr_info("Failed Paged in 0x%lx from thread %d with errno: %d\n",
> > > -                       addr, tid, errno);
> > > -               return r;
> > > -       }
> > > +               clock_gettime(CLOCK_MONOTONIC, &start);
> >
> > Nit: It'd probably be fine to factor the timing calls out of the if
> > statement to deduplicate.
> >
> > >
> > > -       ts_diff = timespec_elapsed(start);
> > > +               r = ioctl(uffd, UFFDIO_COPY, &copy);
> > > +               if (r == -1) {
> > > +                       pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n",
> > > +                               addr, tid, errno);
> > > +                       return r;
> > > +               }
> > > +
> > > +               ts_diff = timespec_elapsed(start);
> > > +       } else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
> > > +               struct uffdio_continue cont = {0};
> > > +
> > > +               ioctl_name = "UFFDIO_CONTINUE";
> > > +
> > > +               cont.range.start = addr;
> > > +               cont.range.len = demand_paging_size;
> > > +
> > > +               clock_gettime(CLOCK_MONOTONIC, &start);
> > > +
> > > +               r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
> > > +               if (r == -1) {
> > > +                       pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n",
> > > +                               addr, tid, errno);
> > > +                       return r;
> > > +               }
> > >
> > > -       PER_PAGE_DEBUG("UFFDIO_COPY %d \t%ld ns\n", tid,
> > > +               ts_diff = timespec_elapsed(start);
> > > +       } else {
> > > +               TEST_FAIL("Invalid uffd mode %d", uffd_mode);
> > > +       }
> > > +
> > > +       PER_PAGE_DEBUG("%s %d \t%ld ns\n", ioctl_name, tid,
> > >                        timespec_to_ns(ts_diff));
> >
> > As far as I can see this is the only use of ioctl_name and it's not
> > going to change in a test run, so it might make sense to not print the
> > ioctl name here and just do it once somewhere else.
> >
> > >         PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
> > >                        demand_paging_size, addr, tid);
> > > @@ -109,6 +133,7 @@ static int handle_uffd_page_request(int uffd, uint64_t addr)
> > >  bool quit_uffd_thread;
> > >
> > >  struct uffd_handler_args {
> > > +       int uffd_mode;
> > >         int uffd;
> > >         int pipefd;
> > >         useconds_t delay;
> > > @@ -170,7 +195,7 @@ static void *uffd_handler_thread_fn(void *arg)
> > >                 if (r == -1) {
> > >                         if (errno == EAGAIN)
> > >                                 continue;
> > > -                       pr_info("Read of uffd gor errno %d", errno);
> > > +                       pr_info("Read of uffd got errno %d\n", errno);
> >
> > If you end up doing some kind of cleanups patch, it might be worth
> > moving this in there.
> >
> > >                         return NULL;
> > >                 }
> > >
> > > @@ -185,7 +210,7 @@ static void *uffd_handler_thread_fn(void *arg)
> > >                 if (delay)
> > >                         usleep(delay);
> > >                 addr =  msg.arg.pagefault.address;
> > > -               r = handle_uffd_page_request(uffd, addr);
> > > +               r = handle_uffd_page_request(uffd_args->uffd_mode, uffd, addr);
> > >                 if (r < 0)
> > >                         return NULL;
> > >                 pages++;
> > > @@ -201,17 +226,32 @@ static void *uffd_handler_thread_fn(void *arg)
> > >
> > >  static int setup_demand_paging(struct kvm_vm *vm,
> > >                                pthread_t *uffd_handler_thread, int pipefd,
> > > +                              int uffd_mode,
> > >                                useconds_t uffd_delay,
> > >                                struct uffd_handler_args *uffd_args,
> > > -                              void *hva, uint64_t len)
> > > +                              void *hva, void *alias, uint64_t len)
> > >  {
> > >         int uffd;
> > >         struct uffdio_api uffdio_api;
> > >         struct uffdio_register uffdio_register;
> > > +       uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY;
> > > +
> > > +       /* In order to get minor faults, prefault via the alias. */
> > > +       if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
> > > +               size_t p;
> > > +
> > > +               expected_ioctls = ((uint64_t) 1) << _UFFDIO_CONTINUE;
> > > +
> > > +               TEST_ASSERT(alias != NULL, "Alias required for minor faults");
> > > +               for (p = 0; p < (len / demand_paging_size); ++p) {
> > > +                       memcpy(alias + (p * demand_paging_size),
> > > +                              guest_data_prototype, demand_paging_size);
> > > +               }
> > > +       }
> > >
> > >         uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> > >         if (uffd == -1) {
> > > -               pr_info("uffd creation failed\n");
> > > +               pr_info("uffd creation failed, errno: %d\n", errno);
> > >                 return -1;
> > >         }
> >
> > Huh, I wonder why I put all these return -1 statements in here. The
> > caller just does exit(-r) if r < 0. Seems like these could all just be
> > converted to TEST_ASSERTs.
>
> I agree that change makes sense, but it seems better to do it in a
> separate commit as it's maybe a 10-20 line change.
>
> Would you prefer I add that into this series, or just keep the status quo?

If you don't mind adding it to this series, that would be awesome.

>
> >
> > >
> > > @@ -224,18 +264,18 @@ static int setup_demand_paging(struct kvm_vm *vm,
> > >
> > >         uffdio_register.range.start = (uint64_t)hva;
> > >         uffdio_register.range.len = len;
> > > -       uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
> > > +       uffdio_register.mode = uffd_mode;
> > >         if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
> > >                 pr_info("ioctl uffdio_register failed\n");
> > >                 return -1;
> > >         }
> > >
> > > -       if ((uffdio_register.ioctls & UFFD_API_RANGE_IOCTLS) !=
> > > -                       UFFD_API_RANGE_IOCTLS) {
> > > -               pr_info("unexpected userfaultfd ioctl set\n");
> > > +       if ((uffdio_register.ioctls & expected_ioctls) != expected_ioctls) {
> > > +               pr_info("missing userfaultfd ioctls\n");
> > >                 return -1;
> > >         }
> > >
> > > +       uffd_args->uffd_mode = uffd_mode;
> > >         uffd_args->uffd = uffd;
> > >         uffd_args->pipefd = pipefd;
> > >         uffd_args->delay = uffd_delay;
> > > @@ -249,7 +289,7 @@ static int setup_demand_paging(struct kvm_vm *vm,
> > >  }
> > >
> > >  struct test_params {
> > > -       bool use_uffd;
> > > +       int uffd_mode;
> > >         useconds_t uffd_delay;
> > >         enum vm_mem_backing_src_type src_type;
> > >         bool partition_vcpu_memory_access;
> > > @@ -286,7 +326,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> > >         perf_test_setup_vcpus(vm, nr_vcpus, guest_percpu_mem_size,
> > >                               p->partition_vcpu_memory_access);
> > >
> > > -       if (p->use_uffd) {
> > > +       if (p->uffd_mode) {
> > >                 uffd_handler_threads =
> > >                         malloc(nr_vcpus * sizeof(*uffd_handler_threads));
> > >                 TEST_ASSERT(uffd_handler_threads, "Memory allocation failed");
> > > @@ -300,6 +340,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> > >                 for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> > >                         vm_paddr_t vcpu_gpa;
> > >                         void *vcpu_hva;
> > > +                       void *vcpu_alias;
> > >                         uint64_t vcpu_mem_size;
> > >
> > >
> > > @@ -314,8 +355,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> > >                         PER_VCPU_DEBUG("Added VCPU %d with test mem gpa [%lx, %lx)\n",
> > >                                        vcpu_id, vcpu_gpa, vcpu_gpa + vcpu_mem_size);
> > >
> > > -                       /* Cache the HVA pointer of the region */
> > > +                       /* Cache the host addresses of the region */
> > >                         vcpu_hva = addr_gpa2hva(vm, vcpu_gpa);
> > > +                       vcpu_alias = addr_gpa2alias(vm, vcpu_gpa);
> > >
> > >                         /*
> > >                          * Set up user fault fd to handle demand paging
> > > @@ -327,9 +369,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> > >
> > >                         r = setup_demand_paging(vm,
> > >                                                 &uffd_handler_threads[vcpu_id],
> > > -                                               pipefds[vcpu_id * 2],
> > > +                                               pipefds[vcpu_id * 2], p->uffd_mode,
> > >                                                 p->uffd_delay, &uffd_args[vcpu_id],
> > > -                                               vcpu_hva, vcpu_mem_size);
> > > +                                               vcpu_hva, vcpu_alias,
> > > +                                               vcpu_mem_size);
> > >                         if (r < 0)
> > >                                 exit(-r);
> > >                 }
> > > @@ -359,7 +402,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> > >
> > >         pr_info("All vCPU threads joined\n");
> > >
> > > -       if (p->use_uffd) {
> > > +       if (p->uffd_mode) {
> > >                 char c;
> > >
> > >                 /* Tell the user fault fd handler threads to quit */
> > > @@ -381,7 +424,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> > >
> > >         free(guest_data_prototype);
> > >         free(vcpu_threads);
> > > -       if (p->use_uffd) {
> > > +       if (p->uffd_mode) {
> > >                 free(uffd_handler_threads);
> > >                 free(uffd_args);
> > >                 free(pipefds);
> > > @@ -391,11 +434,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> > >  static void help(char *name)
> > >  {
> > >         puts("");
> > > -       printf("usage: %s [-h] [-m mode] [-u] [-d uffd_delay_usec]\n"
> > > +       printf("usage: %s [-h] [-m mode] [-u mode] [-d uffd_delay_usec]\n"
> > >                "          [-b memory] [-t type] [-v vcpus] [-o]\n", name);
> > >         guest_modes_help();
> > > -       printf(" -u: use User Fault FD to handle vCPU page\n"
> > > -              "     faults.\n");
> > > +       printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
> > > +              "     UFFD registration mode: 'MISSING' or 'MINOR'.\n");
> > >         printf(" -d: add a delay in usec to the User Fault\n"
> > >                "     FD handler to simulate demand paging\n"
> > >                "     overheads. Ignored without -u.\n");
> > > @@ -422,13 +465,17 @@ int main(int argc, char *argv[])
> > >
> > >         guest_modes_append_default();
> > >
> > > -       while ((opt = getopt(argc, argv, "hm:ud:b:t:v:o")) != -1) {
> > > +       while ((opt = getopt(argc, argv, "hm:u:d:b:t:v:o")) != -1) {
> > >                 switch (opt) {
> > >                 case 'm':
> > >                         guest_modes_cmdline(optarg);
> > >                         break;
> > >                 case 'u':
> > > -                       p.use_uffd = true;
> > > +                       if (!strcmp("MISSING", optarg))
> > > +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
> > > +                       else if (!strcmp("MINOR", optarg))
> > > +                               p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
> > > +                       TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
> > >                         break;
> > >                 case 'd':
> > >                         p.uffd_delay = strtoul(optarg, NULL, 0);
> > > @@ -455,6 +502,9 @@ int main(int argc, char *argv[])
> > >                 }
> > >         }
> > >
> > > +       TEST_ASSERT(p.uffd_mode != UFFDIO_REGISTER_MODE_MINOR || p.src_type == VM_MEM_SRC_SHMEM,
> > > +                   "userfaultfd MINOR mode requires shared memory; pick a different -t");
> > > +
> > >         for_each_guest_mode(run_test, &p);
> > >
> > >         return 0;
> > > --
> > > 2.31.1.607.g51e8a6a459-goog
> > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-05-18 19:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-12 21:44 [PATCH 0/5] KVM: selftests: exercise userfaultfd minor faults Axel Rasmussen
2021-05-12 21:44 ` [PATCH 1/5] KVM: selftests: allow different backing memory types for demand paging Axel Rasmussen
2021-05-13 21:32   ` Ben Gardon
2021-05-13 22:02     ` Axel Rasmussen
2021-05-12 21:44 ` [PATCH 2/5] KVM: selftests: add shmem backing source type Axel Rasmussen
2021-05-13 21:46   ` Ben Gardon
2021-05-12 21:45 ` [PATCH 3/5] KVM: selftests: create alias mappings when using shared memory Axel Rasmussen
2021-05-13 22:33   ` Ben Gardon
2021-05-12 21:45 ` [PATCH 4/5] KVM: selftests: allow using UFFD minor faults for demand paging Axel Rasmussen
2021-05-17 23:51   ` Ben Gardon
2021-05-18 18:17     ` Axel Rasmussen
2021-05-18 19:05       ` Ben Gardon
2021-05-12 21:45 ` [PATCH 5/5] KVM: selftests: add shared hugetlbfs backing source type Axel Rasmussen
2021-05-17 23:57   ` Ben Gardon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).