linux-coco.lists.linux.dev archive mirror
* [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page
@ 2023-09-21 20:14 isaku.yamahata
  2023-09-21 20:14 ` [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole isaku.yamahata
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: isaku.yamahata @ 2023-09-21 20:14 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

This patch series implements test cases for the KVM gmem error_remove_page
method.
- Update punch hole method to truncate pages
- Add a new ioctl KVM_GUEST_MEMORY_FAILURE to inject memory failure on
  offset of gmem

TODO:
- Update TDX KVM to handle it and add test cases for TDX.
  This will be done in its own patch series.

Changes:
v2:
- rebased to [RFC PATCH v12 00/33] KVM: guest_memfd() and per-page attributes
  https://lore.kernel.org/all/20230914015531.1419405-1-seanjc@google.com/
- introduce new ioctl to inject memory fault on gmem and drop FIBMAP hack
- Implement test cases

v1:
https://lore.kernel.org/all/cover.1694599703.git.isaku.yamahata@intel.com/

Isaku Yamahata (6):
  KVM: gmem: Truncate pages on punch hole
  KVM: selftests: Add negative test cases for punch hole for
    guest_memfd()
  KVM: selftests: Add tests for punch hole on guest_memfd
  KVM: gmem: Add ioctl to inject memory failure on guest memfd
  KVM: selftests: Add test cases for KVM_GUEST_MEMORY_FAILURE
  KVM: guest_memfd: selftest: Add test case for error_remove_page method

 include/uapi/linux/kvm.h                      |   6 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  |  80 ++++
 .../kvm/x86_64/private_mem_conversions_test.c |  26 +-
 .../kvm/x86_64/private_mem_hwpoison_test.c    | 367 ++++++++++++++++++
 virt/kvm/guest_mem.c                          |  82 +++-
 6 files changed, 554 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_hwpoison_test.c


base-commit: 42dc814987c1feb6410904e58cfd4c36c4146150
prerequisite-patch-id: 3c646922da088ceebe447974a90217f377b76e4a
prerequisite-patch-id: 635c4dca26af8d105a4fd8f603e4a9cf830395c7
prerequisite-patch-id: eede5382aa76c0602a853fc93a1450995c651345
prerequisite-patch-id: 5549aad02248eb5a0c2853058dc7c102044c7a9d
prerequisite-patch-id: ab6557f79edb77246ee1e9955be81a10841e65fd
prerequisite-patch-id: bf75388851ee37a83b37bfa7cb0084f27301f6bc
prerequisite-patch-id: cecffe9a2445f2ea01b9fdb1356b1c87eb6b8fe7
prerequisite-patch-id: e1692d657690f974d836ba3efdd277ea82e675ca
prerequisite-patch-id: 747debe72334ef0cd12bbb42d1acb281eb59cd98
prerequisite-patch-id: 2d1df1bad8af51577ec15a37510ea8bf018b0d4f
prerequisite-patch-id: d05f7429ca893fe0fe3beb460bba1400379cd0d1
prerequisite-patch-id: 9916b6553a61beb9a5435bc0b1fcacf0a87165ea
prerequisite-patch-id: 313219882d617e4d4cb226760d1f071f52b3f882
prerequisite-patch-id: 5412c4037793bc0999377f9732290b9944257b7c
prerequisite-patch-id: b737a534d8da531f6d030be5e864a3097ca97384
prerequisite-patch-id: cafe1f6964532d261b950e1879e091dc8c0b4386
prerequisite-patch-id: 20a5d5b1f853828061ccb07ed08459b9922e5214
prerequisite-patch-id: 756402fa0914a86ac7db59aa54e36a7bca9d0770
prerequisite-patch-id: 0e93d19cb59f3a052a377a56ff0a4399046818aa
prerequisite-patch-id: 8576bf7ec9f786870e72b78299dcab5dd4eb0d23
prerequisite-patch-id: 0693f3aa65226ff9ffa1f70b6f6da2410ebd0588
prerequisite-patch-id: 301dbdf8448175ea609664c890a3694750ecf740
prerequisite-patch-id: 8f67d4366ca5c9875c4ef7f445941e3ad3162c75
prerequisite-patch-id: 2d62d84b4f9db4148af35495ce1b188c6b46f421
prerequisite-patch-id: b4526dee5b5a95da0a13116ae0c73d4e69efa3c6
prerequisite-patch-id: 8a2b4167ea632413ba8f6d39788ddf19eb928ab0
prerequisite-patch-id: 5618d2414a1ef641b4c247b5e28076f67a765b24
prerequisite-patch-id: 4415e2df6492fbb64746e3503deba6e991a0e08b
prerequisite-patch-id: ba50138fe73e9a5f58e19eebaab2c17a2dc231e3
prerequisite-patch-id: 1225df90aeae430a74354bc5ad0ddf508d0707db
prerequisite-patch-id: 4c0e6ab89b9a3ed3a2cb236e942b5842926bf868
prerequisite-patch-id: 59f78d417ca58088e336f8e2a9540d1c95bd2f6c
prerequisite-patch-id: b6a6a8916fe89e7da1fadefb7d311960732e0faf
-- 
2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole
  2023-09-21 20:14 [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page isaku.yamahata
@ 2023-09-21 20:14 ` isaku.yamahata
  2023-09-21 20:37   ` Sean Christopherson
  2023-09-21 20:14 ` [RFC PATCH v2 2/6] KVM: selftests: Add negative test cases for punch hole for guest_memfd() isaku.yamahata
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: isaku.yamahata @ 2023-09-21 20:14 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Although kvm_gmem_punch_hole() currently keeps all pages in the mapping
when punching a hole, the common expectation is that the pages are
truncated.  Truncate the pages on punch hole.  Because page contents can
be encrypted, avoid zeroing a partial folio by refusing a partial punch
hole.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/guest_mem.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index a819367434e9..01fb4ca861d0 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -130,22 +130,32 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->private_list;
+	struct address_space *mapping  = inode->i_mapping;
 	pgoff_t start = offset >> PAGE_SHIFT;
 	pgoff_t end = (offset + len) >> PAGE_SHIFT;
 	struct kvm_gmem *gmem;
 
+	/*
+	 * Punching a hole may result in zeroing a partial area.  As pages
+	 * can be encrypted, prohibit zeroing a partial area.
+	 */
+	if (offset & ~PAGE_MASK || len & ~PAGE_MASK)
+		return -EINVAL;
+
 	/*
 	 * Bindings must stable across invalidation to ensure the start+end
 	 * are balanced.
 	 */
-	filemap_invalidate_lock(inode->i_mapping);
+	filemap_invalidate_lock(mapping);
 
 	list_for_each_entry(gmem, gmem_list, entry) {
 		kvm_gmem_invalidate_begin(gmem, start, end);
 		kvm_gmem_invalidate_end(gmem, start, end);
 	}
 
-	filemap_invalidate_unlock(inode->i_mapping);
+	truncate_inode_pages_range(mapping, offset, offset + len - 1);
+
+	filemap_invalidate_unlock(mapping);
 
 	return 0;
 }
-- 
2.25.1



* [RFC PATCH v2 2/6] KVM: selftests: Add negative test cases for punch hole for guest_memfd()
  2023-09-21 20:14 [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page isaku.yamahata
  2023-09-21 20:14 ` [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole isaku.yamahata
@ 2023-09-21 20:14 ` isaku.yamahata
  2023-09-21 20:14 ` [RFC PATCH v2 3/6] KVM: selftests: Add tests for punch hole on guest_memfd isaku.yamahata
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: isaku.yamahata @ 2023-09-21 20:14 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add test cases to check that punch hole on guest_memfd() rejects an
unaligned offset or size.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 32 +++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 75073645aaa1..d5b4bfcdc3fe 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -91,6 +91,37 @@ static void test_fallocate(int fd, size_t page_size, size_t total_size)
 	TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed");
 }
 
+/* Negative tests */
+static void test_fallocate_fail(int fd, size_t page_size, size_t total_size)
+{
+	struct {
+		off_t offset;
+		off_t len;
+	} cases[] = {
+		{0, 1},
+		{0, page_size - 1},
+		{0, page_size + 1},
+
+		{1, 1},
+		{1, page_size - 1},
+		{1, page_size},
+		{1, page_size + 1},
+
+		{page_size, 1},
+		{page_size, page_size - 1},
+		{page_size, page_size + 1},
+	};
+	int ret;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+				cases[i].offset, cases[i].len);
+		TEST_ASSERT(ret == -1,
+			    "fallocate(PUNCH_HOLE) with unaligned offset and/or size should fail");
+	}
+}
+
 static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
 {
 	uint64_t valid_flags = 0;
@@ -160,6 +191,7 @@ int main(int argc, char *argv[])
 	test_mmap(fd, page_size);
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
+	test_fallocate_fail(fd, page_size, total_size);
 
 	close(fd);
 }
-- 
2.25.1



* [RFC PATCH v2 3/6] KVM: selftests: Add tests for punch hole on guest_memfd
  2023-09-21 20:14 [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page isaku.yamahata
  2023-09-21 20:14 ` [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole isaku.yamahata
  2023-09-21 20:14 ` [RFC PATCH v2 2/6] KVM: selftests: Add negative test cases for punch hole for guest_memfd() isaku.yamahata
@ 2023-09-21 20:14 ` isaku.yamahata
  2023-09-21 20:40   ` Sean Christopherson
  2023-09-21 20:14 ` [RFC PATCH v2 4/6] KVM: gmem: Add ioctl to inject memory failure on guest memfd isaku.yamahata
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: isaku.yamahata @ 2023-09-21 20:14 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Punching a hole implies the region is zeroed out.  Add tests to check
that the punched region reads as zero.
Opportunistically remove the unused member, pattern, in guest_run_test().

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 .../kvm/x86_64/private_mem_conversions_test.c | 26 ++++++++++++++-----
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c
index 50541246d6fd..c05c725645af 100644
--- a/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c
@@ -85,9 +85,10 @@ static void guest_sync_private(uint64_t gpa, uint64_t size, uint8_t pattern)
 /* Arbitrary values, KVM doesn't care about the attribute flags. */
 #define MAP_GPA_SHARED		BIT(0)
 #define MAP_GPA_DO_FALLOCATE	BIT(1)
+#define MAP_GPA_FALLOCATE_ONLY	BIT(2)
 
 static void guest_map_mem(uint64_t gpa, uint64_t size, bool map_shared,
-			  bool do_fallocate)
+			  bool do_fallocate, bool fallocate_only)
 {
 	uint64_t flags = 0;
 
@@ -95,17 +96,24 @@ static void guest_map_mem(uint64_t gpa, uint64_t size, bool map_shared,
 		flags |= MAP_GPA_SHARED;
 	if (do_fallocate)
 		flags |= MAP_GPA_DO_FALLOCATE;
+	if (fallocate_only)
+		flags |= MAP_GPA_FALLOCATE_ONLY;
 	kvm_hypercall_map_gpa_range(gpa, size, flags);
 }
 
 static void guest_map_shared(uint64_t gpa, uint64_t size, bool do_fallocate)
 {
-	guest_map_mem(gpa, size, true, do_fallocate);
+	guest_map_mem(gpa, size, true, do_fallocate, false);
 }
 
 static void guest_map_private(uint64_t gpa, uint64_t size, bool do_fallocate)
 {
-	guest_map_mem(gpa, size, false, do_fallocate);
+	guest_map_mem(gpa, size, false, do_fallocate, false);
+}
+
+static void guest_punch_hole_private(uint64_t gpa, uint64_t size)
+{
+	guest_map_mem(gpa, size, true, true, true);
 }
 
 static void guest_run_test(uint64_t base_gpa, bool do_fallocate)
@@ -113,7 +121,6 @@ static void guest_run_test(uint64_t base_gpa, bool do_fallocate)
 	struct {
 		uint64_t offset;
 		uint64_t size;
-		uint8_t pattern;
 	} stages[] = {
 		GUEST_STAGE(0, PAGE_SIZE),
 		GUEST_STAGE(0, SZ_2M),
@@ -156,6 +163,10 @@ static void guest_run_test(uint64_t base_gpa, bool do_fallocate)
 
 		if (size > PAGE_SIZE) {
 			memset((void *)gpa, p2, PAGE_SIZE);
+
+			/* Test if punch hole results in zeroing page. */
+			guest_punch_hole_private(gpa, PAGE_SIZE);
+			memcmp_g(gpa, 0, PAGE_SIZE);
 			goto skip;
 		}
 
@@ -229,6 +240,7 @@ static void handle_exit_hypercall(struct kvm_vcpu *vcpu)
 	uint64_t size = run->hypercall.args[1] * PAGE_SIZE;
 	bool map_shared = run->hypercall.args[2] & MAP_GPA_SHARED;
 	bool do_fallocate = run->hypercall.args[2] & MAP_GPA_DO_FALLOCATE;
+	bool fallocate_only = run->hypercall.args[2] & MAP_GPA_FALLOCATE_ONLY;
 	struct kvm_vm *vm = vcpu->vm;
 
 	TEST_ASSERT(run->hypercall.nr == KVM_HC_MAP_GPA_RANGE,
@@ -238,8 +250,10 @@ static void handle_exit_hypercall(struct kvm_vcpu *vcpu)
 	if (do_fallocate)
 		vm_guest_mem_fallocate(vm, gpa, size, map_shared);
 
-	vm_set_memory_attributes(vm, gpa, size,
-				 map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE);
+	if (!fallocate_only)
+		vm_set_memory_attributes(vm, gpa, size,
+					 map_shared ?
+					 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE);
 	run->hypercall.ret = 0;
 }
 
-- 
2.25.1



* [RFC PATCH v2 4/6] KVM: gmem: Add ioctl to inject memory failure on guest memfd
  2023-09-21 20:14 [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page isaku.yamahata
                   ` (2 preceding siblings ...)
  2023-09-21 20:14 ` [RFC PATCH v2 3/6] KVM: selftests: Add tests for punch hole on guest_memfd isaku.yamahata
@ 2023-09-21 20:14 ` isaku.yamahata
  2023-09-21 21:29   ` Sean Christopherson
  2023-09-21 21:53   ` Sean Christopherson
  2023-09-21 20:14 ` [RFC PATCH v2 5/6] KVM: selftests: Add test cases for KVM_GUEST_MEMORY_FAILURE isaku.yamahata
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 20+ messages in thread
From: isaku.yamahata @ 2023-09-21 20:14 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

To test the error_remove_page() method of KVM gmem, add a new ioctl to
inject memory failure at a given offset of a guest memfd.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 include/uapi/linux/kvm.h |  6 ++++
 virt/kvm/guest_mem.c     | 68 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 65fc983af840..4160614bcc0f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -2323,4 +2323,10 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_GUEST_MEMORY_FAILURE	_IOWR(KVMIO,  0xd5, struct kvm_guest_memory_failure)
+
+struct kvm_guest_memory_failure {
+	__u64 offset;
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 01fb4ca861d0..bc9dae50004b 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -291,10 +291,78 @@ static struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
 	return file;
 }
 
+static int kvm_gmem_inject_failure(struct file *file,
+				   struct kvm_guest_memory_failure *mf)
+{
+	struct inode *inode = file_inode(file);
+	struct address_space *mapping = inode->i_mapping;
+	pgoff_t index = mf->offset >> PAGE_SHIFT;
+	struct folio *folio;
+	unsigned long pfn;
+	int err = 0;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	filemap_invalidate_lock_shared(mapping);
+
+	/* Don't allocate page. */
+	folio = filemap_get_folio(mapping, index);
+	if (!folio) {
+		err = -ENOENT;
+		goto out;
+	}
+	if (IS_ERR(folio)) {
+		err = PTR_ERR(folio);
+		goto out;
+	}
+
+	pfn = folio_pfn(folio) + (index - folio_index(folio));
+	folio_put(folio);
+
+out:
+	filemap_invalidate_unlock_shared(mapping);
+	if (err)
+		return err;
+
+	/*
+	 * The pfn can race: memory_failure() and unpoison_memory() take the
+	 * invalidate lock while the error recovery logic removes pages from
+	 * the mapping.
+	 */
+	if (!pfn_valid(pfn))
+		return -ENXIO;
+	return memory_failure(pfn, MF_SW_SIMULATED);
+}
+
+static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
+			   unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	int r = -EINVAL;
+
+	switch (ioctl) {
+	case KVM_GUEST_MEMORY_FAILURE: {
+		struct kvm_guest_memory_failure mf;
+
+		r = -EFAULT;
+		if (copy_from_user(&mf, argp, sizeof(mf)))
+			break;
+		r = kvm_gmem_inject_failure(file, &mf);
+		break;
+	}
+	default:
+		break;
+	}
+
+	return r;
+}
+
 static const struct file_operations kvm_gmem_fops = {
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
+	.unlocked_ioctl	= kvm_gmem_ioctl,
 };
 
 static int kvm_gmem_migrate_folio(struct address_space *mapping,
-- 
2.25.1



* [RFC PATCH v2 5/6] KVM: selftests: Add test cases for KVM_GUEST_MEMORY_FAILURE
  2023-09-21 20:14 [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page isaku.yamahata
                   ` (3 preceding siblings ...)
  2023-09-21 20:14 ` [RFC PATCH v2 4/6] KVM: gmem: Add ioctl to inject memory failure on guest memfd isaku.yamahata
@ 2023-09-21 20:14 ` isaku.yamahata
  2023-09-21 20:14 ` [RFC PATCH v2 6/6] KVM: guest_memfd: selftest: Add test case for error_remove_page method isaku.yamahata
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: isaku.yamahata @ 2023-09-21 20:14 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add test cases for the guest_memfd KVM_GUEST_MEMORY_FAILURE ioctl.
+ ioctl(KVM_GUEST_MEMORY_FAILURE) succeeds with backing pages
+ ioctl(KVM_GUEST_MEMORY_FAILURE) fails with ENOENT without a backing page
+ interaction with fallocate(PUNCH_HOLE)

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 48 +++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index d5b4bfcdc3fe..f8b242c9319d 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -122,6 +122,53 @@ static void test_fallocate_fail(int fd, size_t page_size, size_t total_size)
 	}
 }
 
+static void test_memory_failure(int fd, size_t page_size, size_t total_size)
+{
+	struct kvm_guest_memory_failure mf;
+	int ret;
+	int i;
+
+	/* Make whole file unallocated as known initial state */
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+			0, total_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) on the whole file should succeed");
+
+	/* Because there is no backing page, fail to inject memory failure. */
+	for (i = 0; i < total_size / page_size; i++) {
+		mf = (struct kvm_guest_memory_failure) {
+			.offset = i * page_size,
+		};
+
+		ret = ioctl(fd, KVM_GUEST_MEMORY_FAILURE, &mf);
+		if (ret == -1 && errno == EPERM) {
+			pr_info("KVM_GUEST_MEMORY_FAILURE requires CAP_SYS_ADMIN. Skipping.\n");
+			return;
+		}
+		TEST_ASSERT(ret == -1 && errno == ENOENT,
+			    "ioctl(KVM_GUEST_MEMORY_FAILURE) should fail i %d", i);
+	}
+
+	/* Allocate pages with index one and two. */
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, page_size, page_size * 2);
+	TEST_ASSERT(!ret, "fallocate beginning at page_size should succeed");
+
+	for (i = 0; i < total_size / page_size; i++) {
+		mf = (struct kvm_guest_memory_failure) {
+			.offset = i * page_size,
+		};
+
+		ret = ioctl(fd, KVM_GUEST_MEMORY_FAILURE, &mf);
+
+		if (i == 1 || i == 2) {
+			TEST_ASSERT(!ret || (ret == -1 && errno == EBUSY),
+				    "ioctl(KVM_GUEST_MEMORY_FAILURE) should succeed i %d", i);
+		} else {
+			TEST_ASSERT(ret == -1 && errno == ENOENT,
+				    "ioctl(KVM_GUEST_MEMORY_FAILURE) should fail i %d", i);
+		}
+	}
+}
+
 static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
 {
 	uint64_t valid_flags = 0;
@@ -192,6 +239,7 @@ int main(int argc, char *argv[])
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
 	test_fallocate_fail(fd, page_size, total_size);
+	test_memory_failure(fd, page_size, total_size);
 
 	close(fd);
 }
-- 
2.25.1



* [RFC PATCH v2 6/6] KVM: guest_memfd: selftest: Add test case for error_remove_page method
  2023-09-21 20:14 [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page isaku.yamahata
                   ` (4 preceding siblings ...)
  2023-09-21 20:14 ` [RFC PATCH v2 5/6] KVM: selftests: Add test cases for KVM_GUEST_MEMORY_FAILURE isaku.yamahata
@ 2023-09-21 20:14 ` isaku.yamahata
  2023-09-21 23:22   ` Sean Christopherson
  2023-09-21 20:29 ` [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page Sean Christopherson
  2023-09-29  2:22 ` Sean Christopherson
  7 siblings, 1 reply; 20+ messages in thread
From: isaku.yamahata @ 2023-09-21 20:14 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

This test case implements fault injection into guest memory by
madvise(MADV_HWPOISON) for shared(conventional) memory region and
KVM_GUEST_MEMORY_FAILURE for private gmem region.  Once page is poisoned,
free the poisoned page and try to run vcpu again to see a new zero page is
assigned.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/private_mem_hwpoison_test.c    | 367 ++++++++++++++++++
 2 files changed, 368 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_hwpoison_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index f7fdd8244547..a72d0946c233 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -82,6 +82,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
 TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
 TEST_GEN_PROGS_x86_64 += x86_64/private_mem_conversions_test
+TEST_GEN_PROGS_x86_64 += x86_64/private_mem_hwpoison_test
 TEST_GEN_PROGS_x86_64 += x86_64/private_mem_kvm_exits_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
 TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_hwpoison_test.c b/tools/testing/selftests/kvm/x86_64/private_mem_hwpoison_test.c
new file mode 100644
index 000000000000..78242ee8c8db
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/private_mem_hwpoison_test.c
@@ -0,0 +1,367 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022, Google LLC.
+ * Copyright (C) 2023, Intel Corp.
+ */
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <fcntl.h>
+#include <limits.h>
+#include <pthread.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <setjmp.h>
+
+#include <linux/compiler.h>
+#include <linux/kernel.h>
+#include <linux/kvm_para.h>
+#include <linux/memfd.h>
+#include <linux/sizes.h>
+#include <linux/fs.h>
+
+#include <test_util.h>
+#include <kvm_util.h>
+#include <processor.h>
+
+#define BASE_DATA_SLOT		10
+#define BASE_DATA_GPA		((uint64_t)(1ull << 32))
+#define PER_CPU_DATA_SIZE	((uint64_t)(SZ_2M))
+
+enum ucall_syncs {
+	HWPOISON_SHARED,
+	HWPOISON_PRIVATE,
+};
+
+static void guest_sync_shared(uint64_t gpa)
+{
+	GUEST_SYNC2(HWPOISON_SHARED, gpa);
+}
+
+static void guest_sync_private(uint64_t gpa)
+{
+	GUEST_SYNC2(HWPOISON_PRIVATE, gpa);
+}
+
+/* Arbitrary values, KVM doesn't care about the attribute flags. */
+#define MAP_GPA_SHARED		BIT(0)
+#define MAP_GPA_DO_FALLOCATE	BIT(1)
+#define MAP_GPA_HWPOISON	BIT(2)
+
+static void guest_map_mem(uint64_t gpa, uint64_t size, bool map_shared,
+			  bool do_fallocate, bool hwpoison)
+{
+	uint64_t flags = 0;
+
+	if (map_shared)
+		flags |= MAP_GPA_SHARED;
+	if (do_fallocate)
+		flags |= MAP_GPA_DO_FALLOCATE;
+	if (hwpoison)
+		flags |= MAP_GPA_HWPOISON;
+	kvm_hypercall_map_gpa_range(gpa, size, flags);
+}
+
+static void guest_map_shared(uint64_t gpa, uint64_t size, bool do_fallocate,
+			     bool hwpoison)
+{
+	guest_map_mem(gpa, size, true, do_fallocate, hwpoison);
+}
+
+static void guest_map_private(uint64_t gpa, uint64_t size, bool do_fallocate,
+			      bool hwpoison)
+{
+	guest_map_mem(gpa, size, false, do_fallocate, hwpoison);
+}
+
+static void guest_run_test(uint64_t base_gpa, bool huge_page,
+			   bool test_shared)
+{
+	uint64_t gpa = base_gpa + (huge_page ? 0 : PAGE_SIZE);
+	uint64_t size = huge_page ? SZ_2M : PAGE_SIZE;
+	const uint8_t init_p = 0xcc;
+	uint64_t r;
+
+	/* Memory should be shared by default. */
+	guest_map_shared(base_gpa, PER_CPU_DATA_SIZE, true, false);
+	memset((void *)base_gpa, 0, PER_CPU_DATA_SIZE);
+
+	/*
+	 * Set the test region to non-zero to differentiate it from the page
+	 * newly assigned.
+	 */
+	memset((void *)gpa, init_p, size);
+
+	/* Ask VMM to convert to private/shared the page and poison it. */
+	if (test_shared) {
+		guest_map_shared(gpa, size, true, true);
+		guest_sync_shared(gpa);
+	} else {
+		guest_map_private(gpa, size, true, true);
+		guest_sync_private(gpa);
+	}
+
+	/* Consume poisoned data. */
+	r = READ_ONCE(*(uint64_t *)gpa);
+	/* Discard the poisoned page and assign a new page. */
+	GUEST_ASSERT_EQ((uint8_t)r, 0);
+}
+
+static void guest_code(uint64_t base_gpa, bool huge_page, bool test_shared)
+{
+	guest_run_test(base_gpa, huge_page, test_shared);
+	GUEST_DONE();
+}
+
+static void handle_exit_hypercall(struct kvm_vcpu *vcpu)
+{
+	struct kvm_run *run = vcpu->run;
+	uint64_t gpa = run->hypercall.args[0];
+	uint64_t size = run->hypercall.args[1] * PAGE_SIZE;
+	bool map_shared = run->hypercall.args[2] & MAP_GPA_SHARED;
+	bool do_fallocate = run->hypercall.args[2] & MAP_GPA_DO_FALLOCATE;
+	struct kvm_vm *vm = vcpu->vm;
+
+	TEST_ASSERT(run->hypercall.nr == KVM_HC_MAP_GPA_RANGE,
+		    "Wanted MAP_GPA_RANGE (%u), got '%llu'",
+		    KVM_HC_MAP_GPA_RANGE, run->hypercall.nr);
+
+	if (do_fallocate)
+		vm_guest_mem_fallocate(vm, gpa, size, map_shared);
+
+	vm_set_memory_attributes(vm, gpa, size,
+				 map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE);
+	run->hypercall.ret = 0;
+}
+
+static void inject_memory_failure(int gmem_fd, uint64_t gpa)
+{
+	/* See vm_mem_add() in test_mem_failure() */
+	uint64_t offset = gpa - BASE_DATA_GPA;
+	struct kvm_guest_memory_failure mf = {
+		.offset = offset,
+	};
+	int ret;
+
+	ret = ioctl(gmem_fd, KVM_GUEST_MEMORY_FAILURE, &mf);
+	__TEST_REQUIRE(!(ret == -1 && errno == EPERM),
+		       "Injecting memory fault requires CAP_SYS_ADMIN");
+	TEST_ASSERT(!ret || (ret == -1 && errno == EBUSY),
+		    "ioctl(KVM_GUEST_MEMORY_FAILURE) should succeed");
+}
+
+static sigjmp_buf sigbuf;
+
+static void sigbus_handler(int sig, siginfo_t *info, void *data)
+{
+	TEST_ASSERT(sig == SIGBUS, "Unknown signal received %d\n", sig);
+	siglongjmp(sigbuf, 1);
+}
+
+static bool run_vcpus;
+
+struct test_args {
+	struct kvm_vcpu *vcpu;
+	int gmem_fd;
+	bool huge_page;
+	bool test_shared;
+};
+
+static void *__test_mem_failure(void *__args)
+{
+	struct test_args *args = __args;
+	struct kvm_vcpu *vcpu = args->vcpu;
+	struct kvm_run *run = vcpu->run;
+	struct kvm_vm *vm = vcpu->vm;
+	int gmem_fd = args->gmem_fd;
+	struct ucall uc;
+
+	while (!READ_ONCE(run_vcpus))
+		;
+
+	for ( ;; ) {
+		vcpu_run(vcpu);
+
+		if (run->exit_reason == KVM_EXIT_HYPERCALL) {
+			handle_exit_hypercall(vcpu);
+			continue;
+		}
+
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Wanted KVM_EXIT_IO, got exit reason: %u (%s)",
+			    run->exit_reason, exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+			break;
+		case UCALL_SYNC: {
+			uint64_t gpa = uc.args[1];
+
+			TEST_ASSERT(uc.args[0] == HWPOISON_SHARED ||
+				    uc.args[0] == HWPOISON_PRIVATE,
+				    "Unknown sync command '%ld'", uc.args[0]);
+
+			if (uc.args[0] == HWPOISON_PRIVATE) {
+				int ret;
+
+				inject_memory_failure(gmem_fd, gpa);
+				ret = _vcpu_run(vcpu);
+				TEST_ASSERT(ret == -1 && errno == EHWPOISON &&
+					    run->exit_reason == KVM_EXIT_MEMORY_FAULT,
+					    "exit_reason 0x%x",
+					    run->exit_reason);
+				/* Discard the poisoned page and assign new page. */
+				vm_guest_mem_fallocate(vm, gpa, PAGE_SIZE, true);
+			} else {
+				uint8_t *hva = addr_gpa2hva(vm, gpa);
+				int r;
+
+				r = madvise(hva, 8, MADV_HWPOISON);
+				__TEST_REQUIRE(!(r == -1 && errno == EPERM),
+					       "madvise(MADV_HWPOISON) requires CAP_SYS_ADMIN");
+				TEST_ASSERT(!r, "madvise(MADV_HWPOISON) should succeed");
+				if (sigsetjmp(sigbuf, 1)) {
+					TEST_ASSERT(!sigaction(SIGBUS, NULL, NULL),
+						    "sigaction should succeed");
+					r = madvise(hva, PAGE_SIZE, MADV_FREE);
+					TEST_ASSERT(!r, "madvise(MADV_FREE) should succeed");
+				} else {
+					struct sigaction sa = {
+						.sa_sigaction = sigbus_handler,
+						.sa_flags = SA_SIGINFO,
+					};
+					TEST_ASSERT(!sigaction(SIGBUS, &sa, NULL),
+						    "sigaction should succeed");
+					/* Trigger SIGBUS */
+					vcpu_run(vcpu);
+				}
+			}
+			break;
+		}
+		case UCALL_DONE:
+			return NULL;
+		default:
+			TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+		}
+	}
+}
+
+static void test_mem_failure(enum vm_mem_backing_src_type src_type, uint32_t nr_vcpus,
+			     uint32_t nr_memslots, bool huge_page, bool test_shared)
+{
+	/*
+	 * Allocate enough memory so that each vCPU's chunk of memory can be
+	 * naturally aligned with respect to the size of the backing store.
+	 */
+	const size_t size = align_up(PER_CPU_DATA_SIZE, get_backing_src_pagesz(src_type));
+	const size_t memfd_size = size * nr_vcpus;
+	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+	pthread_t threads[KVM_MAX_VCPUS];
+	uint64_t gmem_flags;
+	struct kvm_vm *vm;
+	int memfd, i;
+
+	const struct vm_shape shape = {
+		.mode = VM_MODE_DEFAULT,
+		.type = KVM_X86_SW_PROTECTED_VM,
+	};
+
+	vm = __vm_create_with_vcpus(shape, nr_vcpus, 0, guest_code, vcpus);
+
+	vm_enable_cap(vm, KVM_CAP_EXIT_HYPERCALL, (1 << KVM_HC_MAP_GPA_RANGE));
+
+	if (huge_page && !backing_src_can_be_huge(src_type))
+		TEST_FAIL("Huge page is requested, but not supported");
+	if (backing_src_can_be_huge(src_type))
+		gmem_flags = KVM_GUEST_MEMFD_ALLOW_HUGEPAGE;
+	else
+		gmem_flags = 0;
+	memfd = vm_create_guest_memfd(vm, memfd_size, gmem_flags);
+
+	for (i = 0; i < nr_memslots; i++)
+		vm_mem_add(vm, src_type, BASE_DATA_GPA + size * i,
+			   BASE_DATA_SLOT + i, size / vm->page_size,
+			   KVM_MEM_PRIVATE, memfd, size * i);
+
+	for (i = 0; i < nr_vcpus; i++) {
+		uint64_t gpa = BASE_DATA_GPA + i * size;
+		struct test_args *args;
+
+		vcpu_args_set(vcpus[i], 3, gpa, huge_page, test_shared);
+
+		virt_map(vm, gpa, gpa, size / vm->page_size);
+
+		/*
+		 * Give each vCPU thread its own heap-allocated args; a single
+		 * loop-local struct would be overwritten by the next iteration
+		 * while the previous thread may still be reading it.
+		 */
+		args = malloc(sizeof(*args));
+		TEST_ASSERT(args, "malloc() failed");
+		*args = (struct test_args) {
+			.vcpu = vcpus[i],
+			.gmem_fd = memfd,
+			.huge_page = huge_page,
+			.test_shared = test_shared,
+		};
+		pthread_create(&threads[i], NULL, __test_mem_failure, args);
+	}
+
+	WRITE_ONCE(run_vcpus, true);
+
+	for (i = 0; i < nr_vcpus; i++)
+		pthread_join(threads[i], NULL);
+
+	kvm_vm_free(vm);
+
+	close(memfd);
+}
+
+static void help(const char *prog_name)
+{
+	printf("usage: %s [-h] [-m] [-n nr_vcpus] [-s mem_type] [-?]\n"
+	       " -h: use huge page\n"
+	       " -m: use multiple memslots (default: 1)\n"
+	       " -n: specify the number of vcpus (default: 1)\n"
+	       " -s: specify the memory type\n"
+	       " -?: print this message\n",
+	       prog_name);
+}
+
+int main(int argc, char *argv[])
+{
+	enum vm_mem_backing_src_type src_type = DEFAULT_VM_MEM_SRC;
+	bool use_multiple_memslots = false;
+	bool huge_page = false;
+	uint32_t nr_vcpus = 1;
+	uint32_t nr_memslots;
+	int opt;
+
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_EXIT_HYPERCALL));
+	TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) &
+		     BIT(KVM_X86_SW_PROTECTED_VM));
+
+	while ((opt = getopt(argc, argv, "hmn:s:?")) != -1) {
+		switch (opt) {
+		case 'h':
+			huge_page = true;
+			break;
+		case 'm':
+			use_multiple_memslots = true;
+			break;
+		case 'n':
+			nr_vcpus = atoi_positive("nr_vcpus", optarg);
+			break;
+		case 's':
+			src_type = parse_backing_src_type(optarg);
+			break;
+		case '?':
+		default:
+			help(argv[0]);
+			exit(0);
+		}
+	}
+
+	nr_memslots = use_multiple_memslots ? nr_vcpus : 1;
+
+	test_mem_failure(src_type, nr_vcpus, nr_memslots, huge_page, true);
+	test_mem_failure(src_type, nr_vcpus, nr_memslots, huge_page, false);
+
+	return 0;
+}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page
  2023-09-21 20:14 [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page isaku.yamahata
                   ` (5 preceding siblings ...)
  2023-09-21 20:14 ` [RFC PATCH v2 6/6] KVM: guest_memfd: selftest: Add test case for error_remove_page method isaku.yamahata
@ 2023-09-21 20:29 ` Sean Christopherson
  2023-09-22 19:40   ` Isaku Yamahata
  2023-09-29  2:22 ` Sean Christopherson
  7 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2023-09-21 20:29 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> This patch series is to implement test cases for the KVM gmem error_remove_page
> method.
> - Update punch hole method to truncate pages
> - Add a new ioctl KVM_GUEST_MEMORY_FAILURE to inject memory failure on
>   offset of gmem

Doh.  Please try to communicate what you're working on.  I was just about to hit
SEND on a series to fix the truncation bug, and to add a similar test.  I would
have happily punted that in your direction, but I had no idea that you were aware
of the bug[*], let alone working on a fix.  I could have explicitly stated that
I was going to fix the bug, but I thought that it was implied that I needed to
clean up my own mess.

Anyways, I'm going to hit SEND anyways, my changes are slightly different to your
patch 1 (truncate needs to happen between begin() and end()) and patch 3 (I added
a slightly more comprehensive test).

[*] https://lore.kernel.org/all/20230918163647.m6bjgwusc7ww5tyu@amd.com


* Re: [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole
  2023-09-21 20:14 ` [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole isaku.yamahata
@ 2023-09-21 20:37   ` Sean Christopherson
  2023-09-21 21:34     ` Sean Christopherson
  0 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2023-09-21 20:37 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Although kvm_gmem_punch_hole() keeps all pages in mapping on punching hole,
> it's common expectation that pages are truncated.  Truncate pages on
> punching hole.  As page contents can be encrypted, avoid zeroing partial
> folio by refusing partial punch hole.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  virt/kvm/guest_mem.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> index a819367434e9..01fb4ca861d0 100644
> --- a/virt/kvm/guest_mem.c
> +++ b/virt/kvm/guest_mem.c
> @@ -130,22 +130,32 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
>  static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  {
>  	struct list_head *gmem_list = &inode->i_mapping->private_list;
> +	struct address_space *mapping  = inode->i_mapping;
>  	pgoff_t start = offset >> PAGE_SHIFT;
>  	pgoff_t end = (offset + len) >> PAGE_SHIFT;
>  	struct kvm_gmem *gmem;
>  
> +	/*
> +	 * punch hole may result in zeroing partial area.  As pages can be
> +	 * encrypted, prohibit zeroing partial area.
> +	 */
> +	if (offset & ~PAGE_MASK || len & ~PAGE_MASK)
> +		return -EINVAL;

This should be unnecessary, kvm_gmem_fallocate() does

	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
		return -EINVAL;

before invoking kvm_gmem_punch_hole().  If that's not working, i.e. your test
fails, then that code needs to be fixed.  I'll run your test to double-check,
but AFAICT this is unnecessary.

> +
>  	/*
>  	 * Bindings must be stable across invalidation to ensure the start+end
>  	 * are balanced.
>  	 */
> -	filemap_invalidate_lock(inode->i_mapping);
> +	filemap_invalidate_lock(mapping);
>  
>  	list_for_each_entry(gmem, gmem_list, entry) {
>  		kvm_gmem_invalidate_begin(gmem, start, end);
>  		kvm_gmem_invalidate_end(gmem, start, end);
>  	}
>  
> -	filemap_invalidate_unlock(inode->i_mapping);
> +	truncate_inode_pages_range(mapping, offset, offset + len - 1);

The truncate needs to happen between begin() and end(), otherwise KVM can create
mappings to the memory between end() and truncate().

> +
> +	filemap_invalidate_unlock(mapping);
>  
>  	return 0;
>  }
> -- 
> 2.25.1
> 


* Re: [RFC PATCH v2 3/6] KVM: selftests: Add tests for punch hole on guest_memfd
  2023-09-21 20:14 ` [RFC PATCH v2 3/6] KVM: selftests: Add tests for punch hole on guest_memfd isaku.yamahata
@ 2023-09-21 20:40   ` Sean Christopherson
  0 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-09-21 20:40 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Punch hole implies the region is zeroed out.  Add tests that the punched
> region reads back as zeros.
> Opportunistically remove the unused member, pattern, in guest_run_test().

Heh, I did pretty much all the same stuff, except I opted to have
SET_ATTRIBUTE instead of FALLOCATE_ONLY, e.g. in case we end up with a test
that wants a third flavor (or somehow neither?).

> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  .../kvm/x86_64/private_mem_conversions_test.c | 26 ++++++++++++++-----
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 

...

> @@ -156,6 +163,10 @@ static void guest_run_test(uint64_t base_gpa, bool do_fallocate)
>  
>  		if (size > PAGE_SIZE) {
>  			memset((void *)gpa, p2, PAGE_SIZE);
> +
> +			/* Test if punch hole results in zeroing page. */
> +			guest_punch_hole_private(gpa, PAGE_SIZE);
> +			memcmp_g(gpa, 0, PAGE_SIZE);

I added a dedicated sub-test to provide a bit more variety in the tests.  I highly
doubt my version will ever find a bug that isn't caught by this approach, but if
nothing else, it's nice to have a completely separate sub-test.


* Re: [RFC PATCH v2 4/6] KVM: gmem: Add ioctl to inject memory failure on guest memfd
  2023-09-21 20:14 ` [RFC PATCH v2 4/6] KVM: gmem: Add ioctl to inject memory failure on guest memfd isaku.yamahata
@ 2023-09-21 21:29   ` Sean Christopherson
  2023-09-21 21:53   ` Sean Christopherson
  1 sibling, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-09-21 21:29 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> To test error_remove_page() method of KVM gmem, add a new ioctl to
> inject memory failure based on offset of guest memfd.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  include/uapi/linux/kvm.h |  6 ++++
>  virt/kvm/guest_mem.c     | 68 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 74 insertions(+)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 65fc983af840..4160614bcc0f 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -2323,4 +2323,10 @@ struct kvm_create_guest_memfd {
>  	__u64 reserved[6];
>  };
>  
> +#define KVM_GUEST_MEMORY_FAILURE	_IOWR(KVMIO,  0xd5, struct kvm_guest_memory_failure)

If we're going to add a KVM ioctl(), my vote is to make it a generic ioctl(), not
something that's specific to guest_memfd().  IIUC, all we need is the PFN, so the
only downside is that it'd require valid memslots.  But the test isn't all that
interesting unless there are memslots, so I don't see that as a negative.

And if we add an ioctl(), it should be conditioned on CONFIG_HWPOISON_INJECT.

An alternative I think we should seriously consider is using the FAULT_INJECTION
framework to poison pages.  We (Google) have plans to utilize fault injection for
other things in KVM, e.g. to inject "failures" on CMPXCHG in atomic SPTE updates
to force KVM down unlikely slow paths.  I don't expect us to get patches posted
until early next year due to priorities, but hell or high water we will get patches
posted at some point.

The fault injection framework might be overkill for injecting memory errors, e.g.
a single ioctl() is definitely simpler to set up, but I suspect it would also be
much more powerful in the long run.


* Re: [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole
  2023-09-21 20:37   ` Sean Christopherson
@ 2023-09-21 21:34     ` Sean Christopherson
  2023-10-05 17:52       ` Michael Roth
  0 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2023-09-21 21:34 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, Sep 21, 2023, Sean Christopherson wrote:
> On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > 
> > Although kvm_gmem_punch_hole() keeps all pages in mapping on punching hole,
> > it's common expectation that pages are truncated.  Truncate pages on
> > punching hole.  As page contents can be encrypted, avoid zeroing partial
> > folio by refusing partial punch hole.
> > 
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > ---
> >  virt/kvm/guest_mem.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> > index a819367434e9..01fb4ca861d0 100644
> > --- a/virt/kvm/guest_mem.c
> > +++ b/virt/kvm/guest_mem.c
> > @@ -130,22 +130,32 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
> >  static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> >  {
> >  	struct list_head *gmem_list = &inode->i_mapping->private_list;
> > +	struct address_space *mapping  = inode->i_mapping;
> >  	pgoff_t start = offset >> PAGE_SHIFT;
> >  	pgoff_t end = (offset + len) >> PAGE_SHIFT;
> >  	struct kvm_gmem *gmem;
> >  
> > +	/*
> > +	 * punch hole may result in zeroing partial area.  As pages can be
> > +	 * encrypted, prohibit zeroing partial area.
> > +	 */
> > +	if (offset & ~PAGE_MASK || len & ~PAGE_MASK)
> > +		return -EINVAL;
> 
> This should be unnecessary, kvm_gmem_fallocate() does
> 
> 	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
> 		return -EINVAL;
> 
> before invoking kvm_gmem_punch_hole().  If that's not working, i.e. your test
> fails, then that code needs to be fixed.  I'll run your test to double-check,
> but AFAICT this is unnecessary.

I confirmed that the testcase passes without the extra checks.  Just to close the
loop, what prompted adding more checks to kvm_gmem_punch_hole()?


* Re: [RFC PATCH v2 4/6] KVM: gmem: Add ioctl to inject memory failure on guest memfd
  2023-09-21 20:14 ` [RFC PATCH v2 4/6] KVM: gmem: Add ioctl to inject memory failure on guest memfd isaku.yamahata
  2023-09-21 21:29   ` Sean Christopherson
@ 2023-09-21 21:53   ` Sean Christopherson
  1 sibling, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-09-21 21:53 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> +	if (!pfn_valid(pfn))
> +		return -ENXIO;
> +	return memory_failure(pfn, MF_SW_SIMULATED);

memory_failure is defined iff CONFIG_MEMORY_FAILURE=y.  All of this code would
need to be conditioned on that (in addition to the injection configs).

address_space_operations.error_remove_page() arguably should be conditioned on
that as well, but that's a much bigger change and not a problem that needs to be
solved anytime soon.


* Re: [RFC PATCH v2 6/6] KVM: guest_memfd: selftest: Add test case for error_remove_page method
  2023-09-21 20:14 ` [RFC PATCH v2 6/6] KVM: guest_memfd: selftest: Add test case for error_remove_page method isaku.yamahata
@ 2023-09-21 23:22   ` Sean Christopherson
  0 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-09-21 23:22 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> This test case implements fault injection into guest memory by
> madvise(MADV_HWPOISON) for the shared (conventional) memory region and
> KVM_GUEST_MEMORY_FAILURE for the private gmem region.  Once a page is
> poisoned, free it and run the vCPU again to verify that a new zeroed
> page is assigned.

Thanks much for the test!  I think for the initial merge it makes sense to leave
this out, mainly because I don't think we want a KVM specific ioctl().  But I'll
definitely keep this around to do manual point testing.

> +#define BASE_DATA_SLOT		10
> +#define BASE_DATA_GPA		((uint64_t)(1ull << 32))
> +#define PER_CPU_DATA_SIZE	((uint64_t)(SZ_2M))
> +
> +enum ucall_syncs {
> +	HWPOISON_SHARED,
> +	HWPOISON_PRIVATE,
> +};
> +
> +static void guest_sync_shared(uint64_t gpa)

Probably guest_poison_{shared,private}(), or maybe just open code the GUEST_SYNC2()
calls.  I added helpers in the other tests because the ucalls were a bit more
involved than just passing the GPA.

However, I don't see any reason to do hypercalls and on-demand mapping/fallocate.
Just have two separate sub-tests, one for private and one for shared, each with
its own host-side handling.  I'm pretty sure the guest code can be the same,
e.g. I believe it
would just boil down to:

static void guest_code(uint64_t gpa)
{
	uint64_t *addr = (void *)gpa;

	WRITE_ONCE(*addr, <some pattern>);

	/* Ask the host to poison the page. */
	GUEST_SYNC(HWPOISON);

	/*
	 * Access the poisoned page.  The host should see a SIGBUS or EHWPOISON
	 * and then truncate the page.  After truncation, the page should be
	 * faulted back and read zeros, all before the read completes.
	 */
	GUEST_ASSERT_EQ(*(uint64_t *)gpa, 0);
	GUEST_DONE();
}

> +			if (uc.args[0] == HWPOISON_PRIVATE) {
> +				int ret;
> +
> +				inject_memory_failure(gmem_fd, gpa);
> +				ret = _vcpu_run(vcpu);
> +				TEST_ASSERT(ret == -1 && errno == EHWPOISON &&

Honestly, I'm kinda surprised the KVM code actually works :-)

> +					    run->exit_reason == KVM_EXIT_MEMORY_FAULT,
> +					    "exit_reason 0x%x",
> +					    run->exit_reason);
> +				/* Discard the poisoned page and assign new page. */
> +				vm_guest_mem_fallocate(vm, gpa, PAGE_SIZE, true);
> +			} else {
> +				uint8_t *hva = addr_gpa2hva(vm, gpa);
> +				int r;
> +
> +				r = madvise(hva, 8, MADV_HWPOISON);

Huh.  TIL there's an MADV_HWPOISON.  We've already talked about adding fbind(),
adding an fadvise() seems like the obvious solution.  Or maybe overload
fallocate() with a new flag?  Regardless, I think we should add or extend a generic
fd-based syscall(), not throw in something KVM specific.


* Re: [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page
  2023-09-21 20:29 ` [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page Sean Christopherson
@ 2023-09-22 19:40   ` Isaku Yamahata
  2023-09-22 20:32     ` Sean Christopherson
  0 siblings, 1 reply; 20+ messages in thread
From: Isaku Yamahata @ 2023-09-22 19:40 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Michael Roth,
	Paolo Bonzini, erdemaktas, Sagi Shahar, David Matlack, Kai Huang,
	Zhi Wang, chen.bo, linux-coco, Chao Peng, Ackerley Tng,
	Vishal Annapurve, Yuan Yao, Jarkko Sakkinen, Xu Yilun,
	Quentin Perret, wei.w.wang, Fuad Tabba, isaku.yamahata

On Thu, Sep 21, 2023 at 01:29:59PM -0700,
Sean Christopherson <seanjc@google.com> wrote:

> On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > 
> > This patch series is to implement test cases for the KVM gmem error_remove_page
> > method.
> > - Update punch hole method to truncate pages
> > - Add a new ioctl KVM_GUEST_MEMORY_FAILURE to inject memory failure on
> >   offset of gmem
> 
> Doh.  Please try to communicate what you're working on.  I was just about to hit
> SEND on a series to fix the truncation bug, and to add a similar test.  I would
> have happily punted that in your direction, but I had no idea that you were aware
> of the bug[*], let alone working on a fix.  I could have explicitly stated that
> I was going to fix the bug, but I thought that it was implied that I needed to
> clean up my own mess.

Oops, sorry.  Now I'm considering machine check injection,
i.e. somehow triggering kvm_machine_check(), with its own test cases.
-- 
Isaku Yamahata <isaku.yamahata@linux.intel.com>


* Re: [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page
  2023-09-22 19:40   ` Isaku Yamahata
@ 2023-09-22 20:32     ` Sean Christopherson
  2023-09-28 17:14       ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2023-09-22 20:32 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Michael Roth,
	Paolo Bonzini, erdemaktas, Sagi Shahar, David Matlack, Kai Huang,
	Zhi Wang, chen.bo, linux-coco, Chao Peng, Ackerley Tng,
	Vishal Annapurve, Yuan Yao, Jarkko Sakkinen, Xu Yilun,
	Quentin Perret, wei.w.wang, Fuad Tabba

On Fri, Sep 22, 2023, Isaku Yamahata wrote:
> On Thu, Sep 21, 2023 at 01:29:59PM -0700,
> Sean Christopherson <seanjc@google.com> wrote:
> 
> > On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > 
> > > This patch series is to implement test cases for the KVM gmem error_remove_page
> > > method.
> > > - Update punch hole method to truncate pages
> > > - Add a new ioctl KVM_GUEST_MEMORY_FAILURE to inject memory failure on
> > >   offset of gmem
> > 
> > Doh.  Please try to communicate what you're working on.  I was just about to hit
> > SEND on a series to fix the truncation bug, and to add a similar test.  I would
> > have happily punted that in your direction, but I had no idea that you were aware
> > of the bug[*], let alone working on a fix.  I could have explicitly stated that
> > I was going to fix the bug, but I thought that it was implied that I needed to
> > clean up my own mess.
> 
> Oops sorry.  Now I'm considering about machine check injection.
> i.e. somehow trigger kvm_machine_check() and its own test cases.

Unless we can't extend fadvise() for some reason, I think we should pursue
FADV_HWPOISON.  The enabling should be downright trivial, e.g. just implement
file_operations.fadvise() for guest_memfd, have it handle FADV_HWPOISON, and pass
everything else to generic_fadvise().

It'll basically be your ioctl() just without a dedicated ioctl().

At the very least, we should run the idea past the fs maintainers.


* Re: [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page
  2023-09-22 20:32     ` Sean Christopherson
@ 2023-09-28 17:14       ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2023-09-28 17:14 UTC (permalink / raw)
  To: Sean Christopherson, Isaku Yamahata
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Michael Roth,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On 9/22/23 22:32, Sean Christopherson wrote:
> Unless we can't extend fadvise() for some reason, I think we should pursue
> FADV_HWPOISION.  The enabling should be downright trivial, e.g. just implement
> file_operations.fadvise() for guest_memfd, have it handle FADV_HWPOISON, and pass
> everything else to generic_fadvise().
> 
> It'll basically be your ioctl() just without a dedicated ioctl().
> 
> At the very least, we should run the idea past the fs maintainers.

fadvise() is different from madvise() though, and not necessarily a great
match.  Looking at the list of flags in madvise(), something like
FADV_POPULATE_READ, FADV_PAGEOUT or FADV_COLD would make sense, but I
can't really think of any other flag that would be useful in a general
case for fadvise.  Everything else would have to be very specific to
memfd or guest_memfd.

In particular, FADV_HWPOISON would not make sense for anything that is
not backed by memory.  There are some flags that could be useful on
gmem file descriptors, such as hypothetically {WIPE,KEEP}ONFORK or
SOFT_OFFLINE, but again they're not something that can be applied to
fadvise().

So an ioctl implementation does have some advantages after all.  I
suggest that we reuse MADV_* flags in the ioctl arguments, to leave the
door open for future extensions and avoid ioctl proliferation.  The
ioctl could be implemented by memfd, too, and perhaps even by /dev/zero.

Paolo



* Re: [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page
  2023-09-21 20:14 [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page isaku.yamahata
                   ` (6 preceding siblings ...)
  2023-09-21 20:29 ` [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page Sean Christopherson
@ 2023-09-29  2:22 ` Sean Christopherson
  7 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-09-29  2:22 UTC (permalink / raw)
  To: Sean Christopherson, kvm, linux-kernel, isaku.yamahata
  Cc: isaku.yamahata, Michael Roth, Paolo Bonzini, erdemaktas,
	Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, chen.bo,
	linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao,
	Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, 21 Sep 2023 13:14:33 -0700, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> This patch series is to implement test cases for the KVM gmem error_remove_page
> method.
> - Update punch hole method to truncate pages
> - Add a new ioctl KVM_GUEST_MEMORY_FAILURE to inject memory failure on
>   offset of gmem
> 
> [...]

Applied the punch hole negative test to kvm-x86 guest_memfd, thanks!

[2/6] KVM: selftests: Add negative test cases for punch hole for guest_memfd()
      https://github.com/kvm-x86/linux/commit/e2bbfd5549be

--
https://github.com/kvm-x86/linux/tree/next


* Re: [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole
  2023-09-21 21:34     ` Sean Christopherson
@ 2023-10-05 17:52       ` Michael Roth
  2023-10-05 23:48         ` Sean Christopherson
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Roth @ 2023-10-05 17:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, Sep 21, 2023 at 02:34:46PM -0700, Sean Christopherson wrote:
> On Thu, Sep 21, 2023, Sean Christopherson wrote:
> > On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > 
> > > Although kvm_gmem_punch_hole() keeps all pages in mapping on punching hole,
> > > it's common expectation that pages are truncated.  Truncate pages on
> > > punching hole.  As page contents can be encrypted, avoid zeroing partial
> > > folio by refusing partial punch hole.
> > > 
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > ---
> > >  virt/kvm/guest_mem.c | 14 ++++++++++++--
> > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> > > index a819367434e9..01fb4ca861d0 100644
> > > --- a/virt/kvm/guest_mem.c
> > > +++ b/virt/kvm/guest_mem.c
> > > @@ -130,22 +130,32 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
> > >  static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> > >  {
> > >  	struct list_head *gmem_list = &inode->i_mapping->private_list;
> > > +	struct address_space *mapping  = inode->i_mapping;
> > >  	pgoff_t start = offset >> PAGE_SHIFT;
> > >  	pgoff_t end = (offset + len) >> PAGE_SHIFT;
> > >  	struct kvm_gmem *gmem;
> > >  
> > > +	/*
> > > +	 * punch hole may result in zeroing partial area.  As pages can be
> > > +	 * encrypted, prohibit zeroing partial area.
> > > +	 */
> > > +	if (offset & ~PAGE_MASK || len & ~PAGE_MASK)
> > > +		return -EINVAL;
> > 
> > This should be unnecessary, kvm_gmem_fallocate() does
> > 
> > 	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
> > 		return -EINVAL;
> > 
> > before invoking kvm_gmem_punch_hole().  If that's not working, i.e. your test
> > fails, then that code needs to be fixed.  I'll run your test to double-check,
> > but AFAICT this is unnecessary.
> 
> I confirmed that the testcase passes without the extra checks.  Just to close the
> loop, what prompted adding more checks to kvm_gmem_punch_hole()?

I don't know if it's the same issue that Isaku ran into, but for SNP we
hit a similar issue with the truncate_inode_pages_range(lstart, lend) call.

The issue in that case was a bit more subtle:

  - userspace does a hole-punch on a 4K range of its gmem FD, which happens
    to be backed by a 2MB folio.
  - truncate_inode_pages_range() gets called for that 4K range
  - truncate_inode_pages_range() does special handling on the folios at the
    start/end of the range in case they are partial and passes these to
    truncate_inode_partial_folio(folio, lstart, lend). In this case, there's
    just the 1 backing folio. But it *still* gets the special treatment, and
    so gets passed to truncate_inode_partial_folio().
  - truncate_inode_partial_folio() will then zero that 4K range, even though
    it is page-aligned, based on the following rationale in the comments:

        /*
         * We may be zeroing pages we're about to discard, but it avoids
         * doing a complex calculation here, and then doing the zeroing
         * anyway if the page split fails.
         */
        folio_zero_range(folio, offset, length);

  - after that, the .invalidate_folio callback is issued, then the folio is
    split, and the caller (truncate_inode_pages_range()) does another pass
    through the whole range and can free the now-split folios, at which point
    the .free_folio callbacks are issued.

Because of that, we can't rely on .invalidate_folio/.free_folio to handle
putting the page back into a normal host-accessible state, because the
zeroing will happen beforehand.  That's why we ended up needing to do this
for the SNP patches, to make sure arch-specific invalidation callbacks are
issued before the truncation occurs:

  https://github.com/mdroth/linux/commit/4ebcc04b84dd691fc6daccb9b7438402520b0704#diff-77306411fdaeb7f322a1ca756dead9feb75363aa6117b703ac118576153ddb37R233

I'd planned to post those as a separate RFC to discuss, but when I came across
this it seemed like it might be relevant to what the TDX folks may have run
into here.

If not for the zeroing logic mentioned above, for SNP at least, the
.free_folio() callback ends up working pretty nicely for both truncation and
fput(), and even plays nicely with the live update use case where the
destination gmem instance shares the inode->i_mapping, since iput() won't
trigger truncate_inode_pages_final() until the last reference goes away, so
we don't have to do anything special in kvm_gmem_release() to determine when
we should/shouldn't issue the arch invalidations to clean up things like the
RMP table.

It seems like the above zeroing logic could be reworked to only zero
non-page-aligned ranges (as the comments above truncate_inode_pages_range()
claim should be the case), which would avoid the issue for the gmem use-case.
But I wonder if some explicit "dont-zero-these-pages" flag might be more robust.

Or maybe there's some other way we should be going about this?


* Re: [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole
  2023-10-05 17:52       ` Michael Roth
@ 2023-10-05 23:48         ` Sean Christopherson
  0 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-10-05 23:48 UTC (permalink / raw)
  To: Michael Roth
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Thu, Oct 05, 2023, Michael Roth wrote:
> On Thu, Sep 21, 2023 at 02:34:46PM -0700, Sean Christopherson wrote:
> > On Thu, Sep 21, 2023, Sean Christopherson wrote:
> > > > diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> > > > index a819367434e9..01fb4ca861d0 100644
> > > > --- a/virt/kvm/guest_mem.c
> > > > +++ b/virt/kvm/guest_mem.c
> > > > @@ -130,22 +130,32 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
> > > >  static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> > > >  {
> > > >  	struct list_head *gmem_list = &inode->i_mapping->private_list;
> > > > +	struct address_space *mapping  = inode->i_mapping;
> > > >  	pgoff_t start = offset >> PAGE_SHIFT;
> > > >  	pgoff_t end = (offset + len) >> PAGE_SHIFT;
> > > >  	struct kvm_gmem *gmem;
> > > >  
> > > > +	/*
> > > > +	 * punch hole may result in zeroing partial area.  As pages can be
> > > > +	 * encrypted, prohibit zeroing partial area.
> > > > +	 */
> > > > +	if (offset & ~PAGE_MASK || len & ~PAGE_MASK)
> > > > +		return -EINVAL;
> > > 
> > > This should be unnecessary, kvm_gmem_fallocate() does
> > > 
> > > 	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
> > > 		return -EINVAL;
> > > 
> > > before invoking kvm_gmem_punch_hole().  If that's not working, i.e. your test
> > > fails, then that code needs to be fixed.  I'll run your test to double-check,
> > > but AFAICT this is unnecessary.
> > 
> > I confirmed that the testcase passes without the extra checks.  Just to close the
> > loop, what prompted adding more checks to kvm_gmem_punch_hole()?
> 
> I don't know if it's the same issue that Isaku ran into, but for SNP we
> hit a similar issue with the truncate_inode_pages_range(lstart, lend) call.
> 
> The issue in that case was a bit more subtle:
> 
>   - userspace does a hole-punch on a 4K range of its gmem FD, which happens
>     to be backed by a 2MB folio.
>   - truncate_inode_pages_range() gets called for that 4K range
>   - truncate_inode_pages_range() does special handling on the folios at the
>     start/end of the range in case they are partial and passes these to
>     truncate_inode_partial_folio(folio, lstart, lend). In this case, there's
>     just the 1 backing folio. But it *still* gets the special treatment, and
>     so gets passed to truncate_inode_partial_folio().
>   - truncate_inode_partial_folio() will then zero that 4K range, even though
>     it is page-aligned, based on the following rationale in the comments:
> 
>         /*
>          * We may be zeroing pages we're about to discard, but it avoids
>          * doing a complex calculation here, and then doing the zeroing
>          * anyway if the page split fails.
>          */
>         folio_zero_range(folio, offset, length);
> 
>   - after that, the .invalidate_folio callback is issued, then the folio is
>     split, and the caller (truncate_inode_pages_range()) does another pass
>     through the whole range and can free the now-split folio, at which point
>     the .free_folio callbacks are issued.
> 
> Because of that, we can't rely on .invalidate_folio/.free_folio to handle
> putting the page back into a normal host-accessible state, because the
> zeroing will happen beforehand.

Argh, and that causes an RMP violation #PF.

FWIW, I don't *think* zeroing would be problematic for TDX.  The page would get
poisoned, but KVM would re-zero the memory with MOVDIR64B and flush the cache.

> That's why, for the SNP patches, we ended up needing to make sure the
> arch-specific invalidation callbacks are issued before the truncation occurs:
> 
>   https://github.com/mdroth/linux/commit/4ebcc04b84dd691fc6daccb9b7438402520b0704#diff-77306411fdaeb7f322a1ca756dead9feb75363aa6117b703ac118576153ddb37R233
> 
> I'd planned to post those as a separate RFC for discussion, but when I came
> across this it seemed like it might be relevant to what the TDX folks may
> have run into here.
> 
> If not for the zeroing logic mentioned above, for SNP at least, the
> .free_folio() callback works pretty nicely for both truncation and fput(),
> and even plays nicely with live update use-case where the destination gmem
> instance shares the inode->i_mapping, since iput() won't trigger the
> truncate_inode_pages_final() until the last reference goes away so we don't
> have to do anything special in kvm_gmem_release() to determine when we
> should/shouldn't issue the arch-invalidations to clean up things like the
> RMP table.
> 
> It seems like the above zeroing logic could be reworked to only zero
> non-page-aligned ranges (as the comments above truncate_inode_pages_range()
> claim should be the case), which would avoid the issue for the gmem
> use-case. But I wonder if some explicit "dont-zero-these-pages" flag might
> be more robust.
> 
> Or maybe there's some other way we should be going about this?

Skipping the write seems like the obvious solution.  An address_space flag,
e.g. AS_INACCESSIBLE, would be the easiest thing to implement.  Or maybe even
make it AS_DONT_ZERO_ON_TRUNCATE_DAMMIT (mostly joking).

Or a hook in address_space_operations to zero the folio, which conceptually is
better in many ways, but feels like overkill.



Thread overview: 20+ messages
2023-09-21 20:14 [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page isaku.yamahata
2023-09-21 20:14 ` [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole isaku.yamahata
2023-09-21 20:37   ` Sean Christopherson
2023-09-21 21:34     ` Sean Christopherson
2023-10-05 17:52       ` Michael Roth
2023-10-05 23:48         ` Sean Christopherson
2023-09-21 20:14 ` [RFC PATCH v2 2/6] KVM: selftests: Add negative test cases for punch hole for guest_memfd() isaku.yamahata
2023-09-21 20:14 ` [RFC PATCH v2 3/6] KVM: selftests: Add tests for punch hole on guest_memfd isaku.yamahata
2023-09-21 20:40   ` Sean Christopherson
2023-09-21 20:14 ` [RFC PATCH v2 4/6] KVM: gmem: Add ioctl to inject memory failure on guest memfd isaku.yamahata
2023-09-21 21:29   ` Sean Christopherson
2023-09-21 21:53   ` Sean Christopherson
2023-09-21 20:14 ` [RFC PATCH v2 5/6] KVM: selftests: Add test cases for KVM_GUEST_MEMORY_FAILURE isaku.yamahata
2023-09-21 20:14 ` [RFC PATCH v2 6/6] KVM: guest_memfd: selftest: Add test case for error_remove_page method isaku.yamahata
2023-09-21 23:22   ` Sean Christopherson
2023-09-21 20:29 ` [RFC PATCH v2 0/6] KVM: gmem: Implement test cases for error_remove_page Sean Christopherson
2023-09-22 19:40   ` Isaku Yamahata
2023-09-22 20:32     ` Sean Christopherson
2023-09-28 17:14       ` Paolo Bonzini
2023-09-29  2:22 ` Sean Christopherson
