* [RFC PATCH 0/6] KVM: gmem: Implement error_remove_page
@ 2023-09-13 10:48 isaku.yamahata
From: isaku.yamahata @ 2023-09-13 10:48 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

This patch series shares my progress on the KVM gmem error_remove_page task.
I'm still working on test cases, but I don't want to hold the patches locally
until they are finished.

- Update the error_remove_page method: unmap the gfns of poisoned pages and
  pass the related arguments.  Unfortunately, the error_remove_page callback
  is passed a struct page, so the callback can't know the actual poisoned
  address and range, even though memory poisoning is reported at cache-line
  granularity.
- Add a new flag to KVM_EXIT_MEMORY_FAULT to indicate the page is poisoned.
- Add a check in kvm_faultin_pfn_private(): when the page is poisoned, exit
  with KVM_EXIT_MEMORY_FAULT(HWPOISON).
- So far the only test case exercises ioctl(FIBMAP); the rest are TODO.

TODOs
- Implement test cases that inject HWPOISON or MCE via hwpoison injection
  (/sys/kernel/debug/hwpoison/corrupt-pfn) or MCE injection
  (/sys/kernel/debug/mce-inject); a sketch of the injection step follows
  below.
- Update qemu to handle KVM_EXIT_MEMORY_FAULT(HWPOISON).
- Update TDX KVM to handle it and add test cases for TDX.
- Try to inject HWPOISON as soon as the poison is detected.
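
For reference, the injection step of such a test could look like the
following sketch (assuming CONFIG_HWPOISON_INJECT provides
/sys/kernel/debug/hwpoison/corrupt-pfn, and a pfn obtained e.g. via
ioctl(FIBMAP) from patch 4/6; error handling is minimal and the exact
debugfs semantics depend on the running kernel):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Hedged sketch: poison the page backing @pfn via hwpoison-inject. */
  static int inject_hwpoison(unsigned long pfn)
  {
  	char buf[32];
  	int fd, len;

  	fd = open("/sys/kernel/debug/hwpoison/corrupt-pfn", O_WRONLY);
  	if (fd < 0)
  		return -1;

  	len = snprintf(buf, sizeof(buf), "0x%lx\n", pfn);
  	if (write(fd, buf, len) != len) {
  		close(fd);
  		return -1;
  	}
  	return close(fd);
  }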

Isaku Yamahata (6):
  KVM: guest_memfd: Add config to show the capability to handle error
    page
  KVM: guest_memfd: Make error_remove_page callback unmap guest
    memory
  KVM: guest_memfd, x86: MEMORY_FAULT exit with hw poisoned page
  KVM: guest_memfd: Implement bmap inode operation
  KVM: selftests: Add selftest for guest_memfd() fibmap
  KVM: x86: Allow KVM gmem hwpoison test cases

 arch/x86/kvm/Kconfig                          |  2 +
 arch/x86/kvm/mmu/mmu.c                        | 21 +++--
 include/linux/kvm_host.h                      |  2 +
 include/uapi/linux/kvm.h                      |  3 +-
 .../testing/selftests/kvm/guest_memfd_test.c  | 45 ++++++++++
 virt/kvm/Kconfig                              |  7 ++
 virt/kvm/guest_mem.c                          | 82 +++++++++++++++----
 7 files changed, 139 insertions(+), 23 deletions(-)


base-commit: a5accd8596fa84b9fe00076444b5ef628d2351b9
-- 
2.25.1


* [RFC PATCH 1/6] KVM: guest_memfd: Add config to show the capability to handle error page
From: isaku.yamahata @ 2023-09-13 10:48 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a config, HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR, to indicate that the
KVM arch code can handle gmem error pages.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/Kconfig     | 3 +++
 virt/kvm/guest_mem.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 1a48cb530092..624df45baff0 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -112,3 +112,6 @@ config KVM_GENERIC_PRIVATE_MEM
        select KVM_GENERIC_MEMORY_ATTRIBUTES
        select KVM_PRIVATE_MEM
        bool
+
+config HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR
+	bool
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 85903c32163f..35d8f03e7937 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -307,6 +307,9 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
 	pgoff_t start, end;
 	gfn_t gfn;
 
+	if (!IS_ENABLED(CONFIG_HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR))
+		return MF_IGNORED;
+
 	filemap_invalidate_lock_shared(mapping);
 
 	start = page->index;
-- 
2.25.1


* [RFC PATCH 2/6] KVM: guest_memfd: Make error_remove_page callback unmap guest memory
From: isaku.yamahata @ 2023-09-13 10:48 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Implement the error_remove_page address space operation for KVM gmem.
Update struct kvm_gfn_range to indicate that gfns are being unmapped
because the page is poisoned.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 include/linux/kvm_host.h |  2 ++
 virt/kvm/guest_mem.c     | 47 +++++++++++++++++++++++++++-------------
 2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 091bc89ae805..e81a7123c84f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -266,8 +266,10 @@ struct kvm_gfn_range {
 		pte_t pte;
 		unsigned long attributes;
 		u64 raw;
+		struct page *page;
 	} arg;
 	bool may_block;
+	bool memory_error;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 35d8f03e7937..746e683df589 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -305,7 +305,7 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
 	struct kvm_gmem *gmem;
 	unsigned long index;
 	pgoff_t start, end;
-	gfn_t gfn;
+	bool flush;
 
 	if (!IS_ENABLED(CONFIG_HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR))
 		return MF_IGNORED;
@@ -316,26 +316,43 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
 	end = start + thp_nr_pages(page);
 
 	list_for_each_entry(gmem, gmem_list, entry) {
+		struct kvm *kvm = gmem->kvm;
+
+		KVM_MMU_LOCK(kvm);
+		kvm_mmu_invalidate_begin(kvm);
+		KVM_MMU_UNLOCK(kvm);
+
+		flush = false;
 		xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
-			for (gfn = start; gfn < end; gfn++) {
-				if (WARN_ON_ONCE(gfn < slot->base_gfn ||
-						gfn >= slot->base_gfn + slot->npages))
-					continue;
-
-				/*
-				 * FIXME: Tell userspace that the *private*
-				 * memory encountered an error.
-				 */
-				send_sig_mceerr(BUS_MCEERR_AR,
-						(void __user *)gfn_to_hva_memslot(slot, gfn),
-						PAGE_SHIFT, current);
-			}
+			pgoff_t pgoff;
+
+			if (WARN_ON_ONCE(end < slot->base_gfn ||
+					 start >= slot->base_gfn + slot->npages))
+				continue;
+
+			pgoff = slot->gmem.pgoff;
+			struct kvm_gfn_range gfn_range = {
+				.slot = slot,
+				.start = slot->base_gfn + max(pgoff, start) - pgoff,
+				.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
+				.arg.page = page,
+				.may_block = true,
+				.memory_error = true,
+			};
+
+			flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
 		}
+		if (flush)
+			kvm_flush_remote_tlbs(kvm);
+
+		KVM_MMU_LOCK(kvm);
+		kvm_mmu_invalidate_end(kvm);
+		KVM_MMU_UNLOCK(kvm);
 	}
 
 	filemap_invalidate_unlock_shared(mapping);
 
-	return 0;
+	return MF_DELAYED;
 }
 
 static const struct address_space_operations kvm_gmem_aops = {
-- 
2.25.1


* [RFC PATCH 3/6] KVM: guest_memfd, x86: MEMORY_FAULT exit with hw poisoned page
From: isaku.yamahata @ 2023-09-13 10:48 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

When a KVM page fault hits a hwpoisoned page, exit to user space with the
HWPOISON flag set so that the user-space VMM, e.g. qemu, can handle it.

- Add a new flag, KVM_MEMORY_EXIT_FLAG_HWPOISON, to KVM_EXIT_MEMORY_FAULT
  to indicate that the page is poisoned.
- Make kvm_gmem_get_pfn() report the hwpoison state by returning -EHWPOISON
  when the folio is hw-poisoned.
- When a page is hw-poisoned while faulting in private gmem, return
  KVM_EXIT_MEMORY_FAULT with the HWPOISON flag set (see the sketch below).
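
As an illustration, a user-space VMM could consume the new exit roughly as
follows (a minimal sketch against the proposed uAPI; handle_poisoned_gpa()
and handle_memory_fault() are hypothetical VMM helpers, and the recovery
policy is entirely the VMM's choice):

  #include <linux/kvm.h>
  #include <stdbool.h>

  /* Hypothetical VMM helpers, not part of this series. */
  extern void handle_poisoned_gpa(__u64 gpa, __u64 size, bool private);
  extern void handle_memory_fault(__u64 gpa, __u64 size, bool private);

  /*
   * Hedged sketch of the VMM side, called when ioctl(KVM_RUN) returns and
   * run->exit_reason == KVM_EXIT_MEMORY_FAULT.
   */
  static void vmm_handle_memory_fault(struct kvm_run *run)
  {
  	bool private = run->memory.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE;

  	if (run->memory.flags & KVM_MEMORY_EXIT_FLAG_HWPOISON)
  		/* e.g. discard the page, notify the guest, or stop the VM */
  		handle_poisoned_gpa(run->memory.gpa, run->memory.size, private);
  	else
  		/* e.g. convert private<->shared, then re-enter the guest */
  		handle_memory_fault(run->memory.gpa, run->memory.size, private);
  }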

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/mmu/mmu.c   | 21 +++++++++++++++------
 include/uapi/linux/kvm.h |  3 ++-
 virt/kvm/guest_mem.c     |  4 +++-
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 05943ccb55a4..5dc9d1fdadca 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4335,19 +4335,24 @@ static inline u8 kvm_max_level_for_order(int order)
 	return PG_LEVEL_4K;
 }
 
-static int kvm_do_memory_fault_exit(struct kvm_vcpu *vcpu,
-				    struct kvm_page_fault *fault)
+static int __kvm_do_memory_fault_exit(struct kvm_vcpu *vcpu,
+				      struct kvm_page_fault *fault, __u64 flags)
 {
 	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
 	if (fault->is_private)
-		vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
-	else
-		vcpu->run->memory.flags = 0;
+		flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
+	vcpu->run->memory.flags = flags;
 	vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
 	vcpu->run->memory.size = PAGE_SIZE;
 	return RET_PF_USER;
 }
 
+static int kvm_do_memory_fault_exit(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault)
+{
+	return __kvm_do_memory_fault_exit(vcpu, fault, 0);
+}
+
 static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault)
 {
@@ -4358,12 +4363,16 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 
 	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
 			     &max_order);
-	if (r)
+	if (r && r != -EHWPOISON)
 		return r;
 
 	fault->max_level = min(kvm_max_level_for_order(max_order),
 			       fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
+
+	if (r == -EHWPOISON)
+		return __kvm_do_memory_fault_exit(vcpu, fault,
+						  KVM_MEMORY_EXIT_FLAG_HWPOISON);
 	return RET_PF_CONTINUE;
 }
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index eb900344a054..48329cb44415 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -527,7 +527,8 @@ struct kvm_run {
 		} notify;
 		/* KVM_EXIT_MEMORY_FAULT */
 		struct {
-#define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE	BIT_ULL(3)
+#define KVM_MEMORY_EXIT_FLAG_HWPOISON	BIT_ULL(4)
 			__u64 flags;
 			__u64 gpa;
 			__u64 size;
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 746e683df589..3678287d7c9d 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -589,6 +589,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 {
 	pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
 	struct kvm_gmem *gmem;
+	bool hwpoison = false;
 	struct folio *folio;
 	struct page *page;
 	struct file *file;
@@ -610,6 +611,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		return -ENOMEM;
 	}
 
+	hwpoison = folio_test_hwpoison(folio);
 	page = folio_file_page(folio, index);
 
 	*pfn = page_to_pfn(page);
@@ -618,7 +620,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	folio_unlock(folio);
 	fput(file);
 
-	return 0;
+	return hwpoison ? -EHWPOISON : 0;
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
-- 
2.25.1


* [RFC PATCH 4/6] KVM: guest_memfd: Implement bmap inode operation
From: isaku.yamahata @ 2023-09-13 10:48 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

To inject a memory failure, the physical address of the page is needed.
Implement the bmap() method to convert a file offset into a physical
address.
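
For example, a test could translate a guest_memfd page index into a pfn
like this (a sketch; FIBMAP requires CAP_SYS_RAWIO, its block unit here is
PAGE_SIZE, and gmem_pfn() is a hypothetical helper, not part of this
series):

  #include <linux/fs.h>
  #include <sys/ioctl.h>

  /* Hedged sketch: return the pfn backing page index @idx of a gmem fd. */
  static long gmem_pfn(int gmem_fd, int idx)
  {
  	int block = idx;	/* in/out: page index in, pfn out */

  	if (ioctl(gmem_fd, FIBMAP, &block) < 0)
  		return -1;

  	return block;		/* 0 means the page isn't allocated */
  }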

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/Kconfig     |  4 ++++
 virt/kvm/guest_mem.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 624df45baff0..eb008f0e7cc3 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -115,3 +115,7 @@ config KVM_GENERIC_PRIVATE_MEM
 
 config HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR
 	bool
+
+config KVM_GENERIC_PRIVATE_MEM_BMAP
+	depends on KVM_GENERIC_PRIVATE_MEM
+	bool
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 3678287d7c9d..90dfdfab1f8c 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -355,12 +355,40 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
 	return MF_DELAYED;
 }
 
+#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM_BMAP
+static sector_t kvm_gmem_bmap(struct address_space *mapping, sector_t block)
+{
+	struct folio *folio;
+	sector_t pfn = 0;
+
+	filemap_invalidate_lock_shared(mapping);
+
+	if (block << PAGE_SHIFT > i_size_read(mapping->host))
+		goto out;
+
+	folio = filemap_get_folio(mapping, block);
+	if (IS_ERR_OR_NULL(folio))
+		goto out;
+
+	pfn = folio_pfn(folio) + (block - folio->index);
+	folio_put(folio);
+
+out:
+	filemap_invalidate_unlock_shared(mapping);
+	return pfn;
+
+}
+#endif
+
 static const struct address_space_operations kvm_gmem_aops = {
 	.dirty_folio = noop_dirty_folio,
 #ifdef CONFIG_MIGRATION
 	.migrate_folio	= kvm_gmem_migrate_folio,
 #endif
 	.error_remove_page = kvm_gmem_error_page,
+#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM_BMAP
+	.bmap = kvm_gmem_bmap,
+#endif
 };
 
 static int  kvm_gmem_getattr(struct mnt_idmap *idmap,
-- 
2.25.1


* [RFC PATCH 5/6] KVM: selftests: Add selftest for guest_memfd() fibmap
From: isaku.yamahata @ 2023-09-13 10:48 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 4d2b110ab0d6..c20b4a14e9c7 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -10,6 +10,7 @@
 #include "kvm_util_base.h"
 #include <linux/bitmap.h>
 #include <linux/falloc.h>
+#include <linux/fs.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -91,6 +92,49 @@ static void test_fallocate(int fd, size_t page_size, size_t total_size)
 	TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed");
 }
 
+static void test_fibmap(int fd, size_t page_size, size_t total_size)
+{
+	int ret;
+	int b;
+	int i;
+
+	/* Punch a hole over the whole file for a known initial state */
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+			0, total_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) at while file should succeed");
+
+	for (i = 0; i < total_size / page_size; i++) {
+		b = i;
+		ret = ioctl(fd, FIBMAP, &b);
+		if (ret == -1 && errno == EINVAL) {
+			print_skip("FIBMAP unsupported, set CONFIG_KVM_GENERIC_PRIVATE_MEM_BMAP=y");
+			return;
+		}
+		TEST_ASSERT(!ret, "ioctl(FIBMAP) should succeed");
+		TEST_ASSERT(!b, "ioctl(FIBMAP) should return zero 0x%x", b);
+	}
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, page_size, page_size * 2);
+	TEST_ASSERT(!ret, "fallocate beginning at page_size should succeed");
+
+	for (i = 0; i < total_size / page_size; i++) {
+		b = i;
+		ret = ioctl(fd, FIBMAP, &b);
+		if (ret == -1 && errno == EINVAL) {
+			print_skip("FIBMAP unsupported, set CONFIG_KVM_GENERIC_PRIVATE_MEM_BMAP=y");
+			return;
+		}
+		TEST_ASSERT(!ret, "ioctl(FIBMAP) should succeed");
+
+		if (i == 1 || i == 2) {
+			TEST_ASSERT(b, "ioctl(FIBMAP) should return non-zero 0x%x", b);
+		} else {
+			TEST_ASSERT(!b, "ioctl(FIBMAP) should return zero, 0x%x", b);
+		}
+	}
+
+}
+
 static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
 {
 	uint64_t valid_flags = 0;
@@ -158,6 +202,7 @@ int main(int argc, char *argv[])
 	test_mmap(fd, page_size);
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
+	test_fibmap(fd, page_size, total_size);
 
 	close(fd);
 }
-- 
2.25.1


* [RFC PATCH 6/6] KVM: x86: Allow KVM gmem hwpoison test cases
From: isaku.yamahata @ 2023-09-13 10:48 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Michael Roth, Paolo Bonzini,
	Sean Christopherson, erdemaktas, Sagi Shahar, David Matlack,
	Kai Huang, Zhi Wang, chen.bo, linux-coco, Chao Peng,
	Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
	Xu Yilun, Quentin Perret, wei.w.wang, Fuad Tabba

From: Isaku Yamahata <isaku.yamahata@intel.com>

Select HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR and KVM_GENERIC_PRIVATE_MEM_BMAP
to allow test cases for KVM gmem hwpoison handling.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 029c76bcd1a5..46fdedde9c0f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -82,6 +82,8 @@ config KVM_SW_PROTECTED_VM
 	depends on EXPERT
 	depends on X86_64
 	select KVM_GENERIC_PRIVATE_MEM
+	select HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR
+	select KVM_GENERIC_PRIVATE_MEM_BMAP
 	help
 	  Enable support for KVM software-protected VMs.  Currently "protected"
 	  means the VM can be backed with memory provided by
-- 
2.25.1


* Re: [RFC PATCH 1/6] KVM: guest_memfd: Add config to show the capability to handle error page
From: Sean Christopherson @ 2023-09-13 16:16 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Wed, Sep 13, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Add a config, HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR, to indicate that the
> KVM arch code can handle gmem error pages.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  virt/kvm/Kconfig     | 3 +++
>  virt/kvm/guest_mem.c | 3 +++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 1a48cb530092..624df45baff0 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -112,3 +112,6 @@ config KVM_GENERIC_PRIVATE_MEM
>         select KVM_GENERIC_MEMORY_ATTRIBUTES
>         select KVM_PRIVATE_MEM
>         bool
> +
> +config HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR
> +	bool
> diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> index 85903c32163f..35d8f03e7937 100644
> --- a/virt/kvm/guest_mem.c
> +++ b/virt/kvm/guest_mem.c
> @@ -307,6 +307,9 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
>  	pgoff_t start, end;
>  	gfn_t gfn;
>  
> +	if (!IS_ENABLED(CONFIG_HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR))
> +		return MF_IGNORED;

I don't see the point; KVM can and should always zap SPTEs, i.e. can force the
guest to re-fault on the affected memory.  At that point kvm_gmem_get_pfn() will
return -EHWPOISON and architectures that don't support graceful recovery can
simply terminate the VM.

* Re: [RFC PATCH 2/6] KVM: guest_memfd: Make error_remove_page callback unmap guest memory
From: Sean Christopherson @ 2023-09-13 16:28 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Wed, Sep 13, 2023, isaku.yamahata@intel.com wrote:
> @@ -316,26 +316,43 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
>  	end = start + thp_nr_pages(page);
>  
>  	list_for_each_entry(gmem, gmem_list, entry) {
> +		struct kvm *kvm = gmem->kvm;
> +
> +		KVM_MMU_LOCK(kvm);
> +		kvm_mmu_invalidate_begin(kvm);
> +		KVM_MMU_UNLOCK(kvm);
> +
> +		flush = false;
>  		xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
> -			for (gfn = start; gfn < end; gfn++) {
> -				if (WARN_ON_ONCE(gfn < slot->base_gfn ||
> -						gfn >= slot->base_gfn + slot->npages))
> -					continue;
> -
> -				/*
> -				 * FIXME: Tell userspace that the *private*
> -				 * memory encountered an error.
> -				 */
> -				send_sig_mceerr(BUS_MCEERR_AR,
> -						(void __user *)gfn_to_hva_memslot(slot, gfn),
> -						PAGE_SHIFT, current);
> -			}
> +			pgoff_t pgoff;
> +
> +			if (WARN_ON_ONCE(end < slot->base_gfn ||
> +					 start >= slot->base_gfn + slot->npages))
> +				continue;
> +
> +			pgoff = slot->gmem.pgoff;
> +			struct kvm_gfn_range gfn_range = {
> +				.slot = slot,
> +				.start = slot->base_gfn + max(pgoff, start) - pgoff,
> +				.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
> +				.arg.page = page,
> +				.may_block = true,
> +				.memory_error = true,

Why pass arg.page and memory_error?  There's no usage in this mini-series, and no
explanation of what arch code would do with the information.  And I can't think of why
arch would need to do anything but zap the SPTEs.  If the memory error is directly
related to the current instruction, the vCPU will fault on the zapped SPTE, see
-HWPOISON, and exit to userspace.  If the memory is unrelated, then the delayed
notification is less than ideal, but not fundamentally broken, e.g. it's no worse
than TDX's behavior of not signaling #MC until a poisoned cache line is actually
accessed.

I don't get arg.page in particular, because having the gfn should be enough for
arch code to take action beyond zapping SPTEs.

And _if_ we want to communicate the error to arch code, it would be much better
to add a dedicated arch hook instead of piggybacking kvm_mmu_unmap_gfn_range()
with a "memory_error" flag. 

If we just zap SPTEs, then can't this simply be?

  static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
  {
	struct list_head *gmem_list = &mapping->private_list;
	struct kvm_gmem *gmem;
	pgoff_t start, end;

	filemap_invalidate_lock_shared(mapping);

	start = page->index;
	end = start + thp_nr_pages(page);

	list_for_each_entry(gmem, gmem_list, entry)
		kvm_gmem_invalidate_begin(gmem, start, end);

	/*
	 * Do not truncate the range, what action is taken in response to the
	 * error is userspace's decision (assuming the architecture supports
	 * gracefully handling memory errors).  If/when the guest attempts to
	 * access a poisoned page, kvm_gmem_get_pfn() will return -EHWPOISON,
	 * at which point KVM can either terminate the VM or propagate the
	 * error to userspace.
	 */

	list_for_each_entry(gmem, gmem_list, entry)
		kvm_gmem_invalidate_end(gmem, start, end);

	filemap_invalidate_unlock_shared(mapping);

	return MF_DELAYED;
  }

* Re: [RFC PATCH 3/6] KVM: guest_memfd, x86: MEMORY_FAULT exit with hw poisoned page
From: Sean Christopherson @ 2023-09-13 17:37 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Wed, Sep 13, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> When a KVM page fault hits a hwpoisoned page, exit to user space with the
> HWPOISON flag set so that the user-space VMM, e.g. qemu, can handle it.
> 
> - Add a new flag, KVM_MEMORY_EXIT_FLAG_HWPOISON, to KVM_EXIT_MEMORY_FAULT
>   to indicate that the page is poisoned.
> - Make kvm_gmem_get_pfn() report the hwpoison state by returning -EHWPOISON
>   when the folio is hw-poisoned.
> - When a page is hw-poisoned while faulting in private gmem, return
>   KVM_EXIT_MEMORY_FAULT with the HWPOISON flag set (see the sketch below).
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---

...

> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index eb900344a054..48329cb44415 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -527,7 +527,8 @@ struct kvm_run {
>  		} notify;
>  		/* KVM_EXIT_MEMORY_FAULT */
>  		struct {
> -#define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
> +#define KVM_MEMORY_EXIT_FLAG_PRIVATE	BIT_ULL(3)
> +#define KVM_MEMORY_EXIT_FLAG_HWPOISON	BIT_ULL(4)

Rather than add a flag, I think we should double down on returning -1 + errno
when exiting with vcpu->run->exit_reason == KVM_EXIT_MEMORY_FAULT, as is being
proposed in Anish's series for accelerating UFFD-like behavior in KVM[*].

Then KVM can simply return -EFAULT or -EHWPOISON to communicate why KVM is
exiting at a higher level, and let the kvm_run structure provide the finer
details about the access itself.  E.g. kvm_faultin_pfn_private() can simply
propagate the return value from kvm_gmem_get_pfn() without having to identify
*why* kvm_gmem_get_pfn() failed.

static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
				   struct kvm_page_fault *fault)
{
	int max_order, r;

	if (!kvm_slot_can_be_private(fault->slot)) {
		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
		return -EFAULT;
	}

	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
			     &max_order);
	if (r) {
		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
		return r;
	}

	...
}

[*] https://lore.kernel.org/all/20230908222905.1321305-5-amoorthy@google.com

* Re: [RFC PATCH 4/6] KVM: guest_memfd: Implement bmap inode operation
From: Sean Christopherson @ 2023-09-13 17:46 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: kvm, linux-kernel, isaku.yamahata, Michael Roth, Paolo Bonzini,
	erdemaktas, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang,
	chen.bo, linux-coco, Chao Peng, Ackerley Tng, Vishal Annapurve,
	Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang,
	Fuad Tabba

On Wed, Sep 13, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> To inject a memory failure, the physical address of the page is needed.
> Implement the bmap() method to convert a file offset into a physical
> address.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  virt/kvm/Kconfig     |  4 ++++
>  virt/kvm/guest_mem.c | 28 ++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 624df45baff0..eb008f0e7cc3 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -115,3 +115,7 @@ config KVM_GENERIC_PRIVATE_MEM
>  
>  config HAVE_GENERIC_PRIVATE_MEM_HANDLE_ERROR
>  	bool
> +
> +config KVM_GENERIC_PRIVATE_MEM_BMAP
> +	depends on KVM_GENERIC_PRIVATE_MEM
> +	bool
> diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> index 3678287d7c9d..90dfdfab1f8c 100644
> --- a/virt/kvm/guest_mem.c
> +++ b/virt/kvm/guest_mem.c
> @@ -355,12 +355,40 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
>  	return MF_DELAYED;
>  }
>  
> +#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM_BMAP
> +static sector_t kvm_gmem_bmap(struct address_space *mapping, sector_t block)
> +{
> +	struct folio *folio;
> +	sector_t pfn = 0;
> +
> +	filemap_invalidate_lock_shared(mapping);
> +
> +	if (block << PAGE_SHIFT > i_size_read(mapping->host))
> +		goto out;
> +
> +	folio = filemap_get_folio(mapping, block);
> +	if (IS_ERR_OR_NULL(folio))
> +		goto out;
> +
> +	pfn = folio_pfn(folio) + (block - folio->index);
> +	folio_put(folio);
> +
> +out:
> +	filemap_invalidate_unlock_shared(mapping);
> +	return pfn;

IIUC, hijacking bmap() is a gigantic hack to propagate a host pfn to userspace
without adding a new ioctl() or syscall.  If we want to support targeted
injection, I would much, much rather add a KVM ioctl(), e.g. to let userspace
inject errors for a gfn.  Returning a pfn for something that AFAICT has nothing to do with pfns
is gross, e.g. the whole "0 is the error code" thing is technically wrong because
'0' is a perfectly valid pfn.
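
For concreteness, such a uAPI could look roughly like this (a purely
hypothetical sketch; the ioctl number and struct layout are made up for
illustration and nothing like this exists today):

  #include <linux/types.h>

  /* Hypothetical sketch: let userspace inject a memory error for a gfn range. */
  struct kvm_inject_memory_error {
  	__u64 gpa;	/* guest physical address to poison */
  	__u64 size;	/* must be a multiple of PAGE_SIZE */
  	__u64 flags;	/* reserved, must be zero */
  };

  #define KVM_INJECT_MEMORY_ERROR _IOW(KVMIO, 0xd4, struct kvm_inject_memory_error)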

My vote is to drop this and not extend the injection information for the initial
merge, i.e. rely on point testing to verify kvm_gmem_error_page(), and defer adding
uAPI to let selftests inject errors.

> +
> +}
> +#endif
> +
>  static const struct address_space_operations kvm_gmem_aops = {
>  	.dirty_folio = noop_dirty_folio,
>  #ifdef CONFIG_MIGRATION
>  	.migrate_folio	= kvm_gmem_migrate_folio,
>  #endif
>  	.error_remove_page = kvm_gmem_error_page,
> +#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM_BMAP
> +	.bmap = kvm_gmem_bmap,
> +#endif
>  };
>  
>  static int  kvm_gmem_getattr(struct mnt_idmap *idmap,
> -- 
> 2.25.1
> 
